The Character Set Converter

charconv is a command-line utility that converts a text file from one character set encoding to another. Either the input or the output file must be encoded in Unicode; the tool cannot convert directly between two non-Unicode encodings. The tool is useful for converting resource files written in a local character set encoding into the standard encoding expected by the device (e.g. UTF-8).
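
Because one side of every conversion must be Unicode, getting from one non-Unicode encoding to another takes two passes, with Unicode as the intermediate form. The following Python sketch illustrates that idea only. It uses Python's built-in codecs rather than charconv, and the function name and the choice of Latin-1 and code page 437 are assumptions made for the example.

```python
def convert_via_unicode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Convert between two non-Unicode encodings by pivoting through Unicode.

    Hypothetical helper for illustration; it is not part of charconv.
    """
    # Pass 1: foreign encoding -> Unicode (a Python str holds Unicode text).
    text = data.decode(source_encoding)
    # Pass 2: Unicode -> the other foreign encoding.
    return text.encode(target_encoding)


# "café" in Latin-1, where é is the single byte 0xE9.
latin1_bytes = "caf\u00e9".encode("latin-1")

# The same text in code page 437, where é is the single byte 0x82.
cp437_bytes = convert_via_unicode(latin1_bytes, "latin-1", "cp437")
```

With charconv itself, the equivalent would be two separate runs: one from the first encoding into Unicode, and one from that Unicode file into the second encoding.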

The names of the input and output files and how they are encoded are passed as parameters to the tool. If the input filename is omitted, the tool takes its input from stdin, and if the output filename is omitted, it writes its output to stdout.
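
The fallback to stdin and stdout follows a common command-line pattern, sketched below in Python. This is an illustration of the pattern under assumed names (open_input and open_output are hypothetical helpers), not a description of how charconv is implemented.

```python
import sys
from typing import BinaryIO, Optional


def open_input(path: Optional[str] = None) -> BinaryIO:
    """Open the named file for binary reading, or fall back to stdin."""
    if path is None:
        return sys.stdin.buffer
    return open(path, "rb")


def open_output(path: Optional[str] = None) -> BinaryIO:
    """Open the named file for binary writing, or fall back to stdout."""
    if path is None:
        return sys.stdout.buffer
    return open(path, "wb")
```

Binary mode is used because the tool's job is to interpret the bytes itself; letting the runtime decode them first would defeat the purpose.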

Two further parameters apply to the file encoded in Unicode: its byte order and a byte order mark. The Unicode byte order can be specified as little endian or big endian; little endian is assumed by default. When converting from a foreign encoding into Unicode, -byteordermark can be specified to add a byte order mark to the Unicode output. If -byteordermark is specified, the Unicode byte order must also be specified explicitly.
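
The effect of byte order and the byte order mark on the output bytes can be seen in a short Python sketch. It assumes that the Unicode file is 16-bit (UTF-16) text, which this description does not state explicitly, and the function and parameter names are hypothetical rather than charconv options.

```python
import codecs


def encode_utf16(text: str, big_endian: bool = False, byte_order_mark: bool = False) -> bytes:
    """Encode text as UTF-16 with an explicit byte order and an optional BOM.

    Little endian is the default, mirroring the default described above.
    """
    codec = "utf-16-be" if big_endian else "utf-16-le"
    payload = text.encode(codec)
    if not byte_order_mark:
        return payload
    # The BOM is U+FEFF written in the chosen byte order.
    bom = codecs.BOM_UTF16_BE if big_endian else codecs.BOM_UTF16_LE
    return bom + payload


little = encode_utf16("A")                                        # b"A\x00"
big_with_bom = encode_utf16("A", big_endian=True, byte_order_mark=True)  # b"\xfe\xff\x00A"
```

A reader of the file can inspect the first two bytes: FF FE indicates little endian and FE FF indicates big endian.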

The tool is invoked using charconv.bat. Data files for the supported non-Unicode character set encodings are stored as .dat files in the \epoc32\tools\charconv\ directory. In addition, the tool supports conversion in both directions between Unicode and UTF-8. If no data file exists for the specified encoding, the tool outputs an appropriate error message and terminates.

When converting a foreign-encoded input text file into Unicode, any unknown characters are replaced by the Unicode replacement character, U+FFFD. When converting a Unicode input text file into a foreign encoding, any unknown characters are replaced by the replacement character specified in the data file for that encoding.
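
Python's codecs show analogous behaviour and make a convenient way to see both substitution rules in action. This is a sketch using Python's "replace" error handler, not charconv; note one difference, which is that Python always substitutes "?" when encoding, whereas charconv takes the replacement character from the data file. The function names are hypothetical.

```python
UNICODE_REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def to_unicode(data: bytes, encoding: str) -> str:
    """Decode foreign bytes; undecodable bytes become U+FFFD."""
    return data.decode(encoding, errors="replace")


def from_unicode(text: str, encoding: str) -> bytes:
    """Encode Unicode text; unencodable characters become '?' in Python."""
    return text.encode(encoding, errors="replace")


# 0xFF is not valid ASCII, so it is replaced by U+FFFD.
decoded = to_unicode(b"abc\xff", "ascii")

# é has no ASCII equivalent, so it is replaced by "?".
encoded = from_unicode("caf\u00e9", "ascii")
```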