Symbian OS v6.1 Edition for C++ » API Reference » Character Conversion Plug-In Provider
CnvUtilities
Location: convutils.h
Link against: convutils.lib
CnvUtilities
Support
Supported from 6.0
Description
Provides static character conversion utilities for complex
encodings. Its functions may be called from a plug-in DLL's implementation of
ConvertFromUnicode() and ConvertToUnicode().
These utility functions are provided for use when converting
to/from complex character set encodings, including modal encodings. Modal
encodings are those where the interpretation of a given byte of data is
dependent on the current mode; mode changing is performed by escape sequences
which occur in the byte stream. A non-modal complex encoding is one in which
characters are encoded using variable numbers of bytes. The number of bytes
used to encode a character depends on the value of the initial byte.
Defined in CnvUtilities: ConvertFromIntermediateBufferInPlace(),
ConvertFromUnicode(), ConvertToUnicodeFromHeterogeneousForeign(),
ConvertToUnicodeFromModalForeign(), FConvertFromIntermediateBufferInPlace,
FConvertToIntermediateBufferInPlace, FNumberOfBytesAbleToConvert,
SCharacterSet, SMethod, SState
Convert from Unicode to complex foreign
static TInt ConvertFromUnicode(CCnvCharacterSetConverter::TEndianness aDefaultEndiannessOfForeignCharacters, const TDesC8& aReplacementForUnconvertibleUnicodeCharacters, TDes8& aForeign, const TDesC16& aUnicode, CCnvCharacterSetConverter::TArrayOfAscendingIndices& aIndicesOfUnconvertibleCharacters, const TArray<SCharacterSet>& aArrayOfCharacterSets);
Description
Converts Unicode text into a complex foreign character set
encoding. This is an encoding which cannot be converted simply by calling
CCnvCharacterSetConverter::DoConvertFromUnicode(). It may be modal
(e.g. JIS) or non-modal (e.g. Shift-JIS).
The Unicode text specified in aUnicode is converted using the array of
conversion data objects (aArrayOfCharacterSets) provided by the plug-in for
the complex character set encoding, and the converted text is returned in
aForeign. Any existing contents in aForeign are overwritten.
Unlike CCnvCharacterSetConverter::DoConvertFromUnicode(), multiple
character sets can be specified. aUnicode is converted using the
first character conversion data object in the array. When a character is found
which cannot be converted using that data, each character set in the array is
tried in turn. If it cannot be converted using any object in the array, the
index of the character is appended to aIndicesOfUnconvertibleCharacters
and the character is replaced by
aReplacementForUnconvertibleUnicodeCharacters.
If it can be converted using another object in the array, that
object is used to convert all subsequent characters until another unconvertible
character is found.
Parameters
CCnvCharacterSetConverter::TEndianness aDefaultEndiannessOfForeignCharacters | The default endian-ness to use when writing the characters in the foreign character set. If an endian-ness for foreign characters is specified in the current conversion data object, then that is used instead and the value of aDefaultEndiannessOfForeignCharacters is ignored.
const TDesC8& aReplacementForUnconvertibleUnicodeCharacters | The single character (one or more byte values) which is used to replace unconvertible characters.
TDes8& aForeign | On return, contains the converted text in the non-Unicode character set.
const TDesC16& aUnicode | The source Unicode text to be converted.
CCnvCharacterSetConverter::TArrayOfAscendingIndices& aIndicesOfUnconvertibleCharacters | On return, holds an ascending array of the indices of each Unicode character in the source text which could not be converted (because none of the target character sets have an equivalent character).
const TArray<SCharacterSet>& aArrayOfCharacterSets | Array of character conversion data objects, representing the character sets which comprise a complex character set encoding. These are used in sequence to convert the Unicode text. There must be at least one character set in this array and no character set may have any NULL member data, or a panic occurs.
Return value
TInt | The number of unconverted characters left at the end of the input descriptor (e.g. because aForeign was not long enough to hold all the text), or a negative error value, as defined in CCnvCharacterSetConverter::TError.
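The character-set selection rule described above can be modelled outside Symbian. The following is a minimal standalone sketch in standard C++, not the CnvUtilities implementation: CharacterSetModel, ConvertFromUnicodeModel and the lookup tables are hypothetical stand-ins for SCharacterSet and its conversion data, std::string stands in for the 8-bit descriptors, and escape sequences and output-length limits are left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for SCharacterSet: maps a Unicode code unit to the
// bytes that encode it in one constituent character set.
struct CharacterSetModel {
    std::map<char16_t, std::string> table;
};

// Models the documented rule: keep using the current character set until a
// character cannot be converted, then try each set in array order.
// Returns the (ascending) indices of characters that no set could convert.
std::vector<int> ConvertFromUnicodeModel(const std::u16string& unicode,
                                         const std::vector<CharacterSetModel>& sets,
                                         const std::string& replacement,
                                         std::string& foreign) {
    assert(!sets.empty());       // the documented function panics on an empty array
    foreign.clear();             // this overload overwrites the output
    std::vector<int> unconvertible;
    std::size_t current = 0;     // conversion starts with the first set
    for (std::size_t i = 0; i < unicode.size(); ++i) {
        auto hit = sets[current].table.find(unicode[i]);
        if (hit == sets[current].table.end()) {
            bool found = false;
            for (std::size_t s = 0; s < sets.size() && !found; ++s) {
                hit = sets[s].table.find(unicode[i]);
                if (hit != sets[s].table.end()) {
                    current = s; // stay with this set for later characters
                    found = true;
                }
            }
            if (!found) {
                unconvertible.push_back(static_cast<int>(i));
                foreign += replacement;
                continue;
            }
        }
        foreign += hit->second;
    }
    return unconvertible;
}
```

In this model the set that converts a character stays selected, so a run of characters from one set costs a single switch. A real modal plug-in would also have to emit the set's escape sequence at each switch.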
static TInt ConvertFromUnicode(CCnvCharacterSetConverter::TEndianness aDefaultEndiannessOfForeignCharacters, const TDesC8& aReplacementForUnconvertibleUnicodeCharacters, TDes8& aForeign, const TDesC16& aUnicode, CCnvCharacterSetConverter::TArrayOfAscendingIndices& aIndicesOfUnconvertibleCharacters, const TArray<SCharacterSet>& aArrayOfCharacterSets, TUint& aOutputConversionFlags, TUint aInputConversionFlags);
Description
Converts Unicode text into a complex foreign character set
encoding. This is an encoding which cannot be converted simply by a call to
CCnvCharacterSetConverter::DoConvertFromUnicode(). It may be modal
(e.g. JIS) or non-modal (e.g. Shift-JIS).
The Unicode text specified in aUnicode is converted using the array of
conversion data objects (aArrayOfCharacterSets) provided by the plug-in for
the complex character set encoding and the converted text is returned in
aForeign. The function can either append to aForeign or overwrite its
contents (if any).
Unlike CCnvCharacterSetConverter::DoConvertFromUnicode(), multiple
character sets can be specified. aUnicode is converted using the
first character conversion data object in the array. When a character is found
which cannot be converted using that data, each character set in the array is
tried in turn. If it cannot be converted using any object in the array, the
index of the character is appended to aIndicesOfUnconvertibleCharacters
and the character is replaced by
aReplacementForUnconvertibleUnicodeCharacters.
If it can be converted using another object in the array, that
object is used to convert all subsequent characters until another unconvertible
character is found.
Parameters
CCnvCharacterSetConverter::TEndianness aDefaultEndiannessOfForeignCharacters | The default endian-ness to use when writing the characters in the foreign character set. If an endian-ness for foreign characters is specified in the current conversion data object, then that is used instead and the value of aDefaultEndiannessOfForeignCharacters is ignored.
const TDesC8& aReplacementForUnconvertibleUnicodeCharacters | The single character (one or more byte values) which is used to replace unconvertible characters.
TDes8& aForeign | On return, contains the converted text in the non-Unicode character set. This may already contain some text. If it does, and if aInputConversionFlags specifies EInputConversionFlagAppend, then the converted text is appended to this descriptor.
const TDesC16& aUnicode | The source Unicode text to be converted.
CCnvCharacterSetConverter::TArrayOfAscendingIndices& aIndicesOfUnconvertibleCharacters | On return, holds an ascending array of the indices of each Unicode character in the source text which could not be converted (because none of the target character sets have an equivalent character).
const TArray<SCharacterSet>& aArrayOfCharacterSets | Array of character set data objects. These are used in sequence to convert the Unicode text. There must be at least one character set in this array and no character set may have any NULL member data, or a panic occurs.
TUint& aOutputConversionFlags | If the input descriptor ended in a truncated sequence, e.g. the first half only of a Unicode surrogate pair, this returns with the EOutputConversionFlagInputIsTruncated flag set.
TUint aInputConversionFlags | Specify CCnvCharacterSetConverter::EInputConversionFlagAppend to append the text to aForeign. Specify CCnvCharacterSetConverter::EInputConversionFlagAllowTruncatedInputNotEvenPartlyConsumable to prevent the function from returning the error-code EErrorIllFormedInput when the input descriptor consists of nothing but a truncated sequence. The CCnvCharacterSetConverter::EInputConversionFlagStopAtFirstUnconvertibleCharacter flag must not be set, otherwise a panic occurs.
Return value
TInt | The number of unconverted characters left at the end of the input descriptor (e.g. because aForeign was not long enough to hold all the text), or a negative error value, as defined in CCnvCharacterSetConverter::TError.
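The effect of the two flag parameters can be illustrated in isolation. This is a hedged sketch in standard C++ that models only the append flag and the truncated-input report. The flag values kInputAppend and kOutputInputIsTruncated and the function ConvertWithFlagsModel are invented for the example and are not the CCnvCharacterSetConverter enumerator values. The "conversion" itself is a placeholder that passes ASCII through and writes '?' for everything else, and the EErrorIllFormedInput case is not modelled.

```cpp
#include <cassert>
#include <string>

// Hypothetical flag values for this sketch only.
enum : unsigned {
    kInputAppend = 0x1,
    kOutputInputIsTruncated = 0x1
};

// True for the first (high) half of a UTF-16 surrogate pair.
inline bool IsHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }

// Honours the append flag, reports a trailing unpaired high surrogate through
// outputFlags, and returns the number of input code units left unconverted.
int ConvertWithFlagsModel(const std::u16string& unicode, std::string& foreign,
                          unsigned& outputFlags, unsigned inputFlags) {
    outputFlags = 0;
    if (!(inputFlags & kInputAppend))
        foreign.clear();                       // overwrite unless asked to append
    std::size_t end = unicode.size();
    if (end > 0 && IsHighSurrogate(unicode[end - 1])) {
        --end;                                 // leave the half pair unconsumed
        outputFlags |= kOutputInputIsTruncated;
    }
    for (std::size_t i = 0; i < end; ++i)      // placeholder conversion
        foreign += unicode[i] < 0x80 ? static_cast<char>(unicode[i]) : '?';
    return static_cast<int>(unicode.size() - end);
}
```

A caller that sees the truncated flag would typically keep the unconsumed code units and prepend them to the next chunk of input before calling again with the append flag set.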
ConvertFromIntermediateBufferInPlace()
static void ConvertFromIntermediateBufferInPlace(TInt aStartPositionInDescriptor, TDes8& aDescriptor, TInt& aNumberOfCharactersThatDroppedOut, const TDesC8& aEscapeSequence, TInt aNumberOfBytesPerCharacter);
Description
Inserts an escape sequence into the descriptor.
This function is provided to help in the implementation of
ConvertFromUnicode() for modal character set encodings. Each
SCharacterSet object in the array passed to ConvertFromUnicode() must have
its iConvertFromIntermediateBufferInPlace member assigned. To do this
for a modal character set encoding, implement a function whose signature
matches that of FConvertFromIntermediateBufferInPlace and which
calls this function, passing all arguments unchanged, and specifying the
character set's escape sequence and the number of bytes per character.
Parameters
TInt aStartPositionInDescriptor | The byte position in aDescriptor at which the escape sequence is inserted. If the character set uses more than one byte per character, this position must be the start of a character, otherwise a panic occurs.
TDes8& aDescriptor | The descriptor into which the escape sequence is inserted.
TInt& aNumberOfCharactersThatDroppedOut | The escape sequence is inserted into the start of aDescriptor and any characters that need to drop out to make room for the escape sequence (because the descriptor's maximum length was not long enough) drop out from the end of the buffer. This parameter indicates the number of characters that needed to drop out.
const TDesC8& aEscapeSequence | The escape sequence for the character set.
TInt aNumberOfBytesPerCharacter | The number of bytes per character.
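The drop-out arithmetic is easier to follow in code. Below is a minimal sketch in standard C++ under stated assumptions: std::string plus an explicit maxLength parameter stands in for TDes8 and its maximum length, InsertEscapeModel is a hypothetical name, and the handling of the case where the whole run would have to drop out (the run is discarded and no escape sequence is written) is an assumption of this sketch rather than documented behaviour.

```cpp
#include <cassert>
#include <string>

// Inserts 'escape' at byte offset 'start' of a buffer whose capacity is
// 'maxLength', dropping whole characters from the end if there is no room.
void InsertEscapeModel(std::size_t start, std::string& buffer, std::size_t maxLength,
                       int& charactersDropped, const std::string& escape,
                       std::size_t bytesPerCharacter) {
    assert(bytesPerCharacter > 0);
    assert(start <= buffer.size() && buffer.size() <= maxLength);
    // The run after 'start' must consist of whole characters.
    assert((buffer.size() - start) % bytesPerCharacter == 0);

    const std::size_t freeBytes = maxLength - buffer.size();
    const std::size_t shortfall =
        escape.size() > freeBytes ? escape.size() - freeBytes : 0;
    // Round the shortfall up to a whole number of characters.
    std::size_t dropped = (shortfall + bytesPerCharacter - 1) / bytesPerCharacter;
    const std::size_t runCharacters = (buffer.size() - start) / bytesPerCharacter;

    if (dropped >= runCharacters) {
        // Assumption: nothing of the run survives, so the escape is omitted.
        dropped = runCharacters;
        buffer.resize(start);
    } else {
        buffer.resize(buffer.size() - dropped * bytesPerCharacter);
        buffer.insert(start, escape);
    }
    charactersDropped = static_cast<int>(dropped);
}
```

For example, with a 9-byte capacity, an 8-byte buffer holding a 2-byte prefix followed by three 2-byte characters, and a 3-byte escape sequence, one character drops out and the buffer ends up exactly full.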
Convert to Unicode from complex foreign
ConvertToUnicodeFromModalForeign()
static TInt ConvertToUnicodeFromModalForeign(CCnvCharacterSetConverter::TEndianness aDefaultEndiannessOfForeignCharacters, TDes16& aUnicode, const TDesC8& aForeign, TInt& aState, TInt& aNumberOfUnconvertibleCharacters, TInt& aIndexOfFirstByteOfFirstUnconvertibleCharacter, const TArray<SState>& aArrayOfStates);
Description
Converts text from a modal foreign character set encoding into
Unicode.
The non-Unicode text specified in aForeign is converted using the array of
character set conversion objects (aArrayOfStates) provided by the plug-in,
and the converted text is returned in aUnicode. Overwrites the contents, if
any, of aUnicode. The first element in aArrayOfStates is
taken to be the default mode (i.e. the mode to assume by default if there is no
preceding escape sequence).
Parameters
CCnvCharacterSetConverter::TEndianness aDefaultEndiannessOfForeignCharacters | The default endian-ness of the foreign characters. If an endian-ness for foreign characters is specified in the conversion data, then that is used instead and the value of aDefaultEndiannessOfForeignCharacters is ignored.
TDes16& aUnicode | On return, contains the text converted into Unicode.
const TDesC8& aForeign | The non-Unicode source text to be converted.
TInt& aState | Used to store a modal character set encoding's current mode across multiple calls to ConvertToUnicode() on the same input descriptor. This argument should be passed the same object as passed to the plug-in's ConvertToUnicode() exported function.
TInt& aNumberOfUnconvertibleCharacters | On return, contains the number of characters in aForeign which were not converted. Characters which cannot be converted are output as Unicode replacement characters (0xfffd).
TInt& aIndexOfFirstByteOfFirstUnconvertibleCharacter | On return, the index of the first byte of the first unconvertible character. For instance, if the first character in the input descriptor (aForeign) could not be converted, then this parameter is set to the index of the first byte of that character, i.e. zero. A negative value is returned if all the characters were converted.
const TArray<SState>& aArrayOfStates | Array of character set conversion data objects, and their escape sequences ("modes"). There must be one or more modes in this array, none of the modes can have any NULL member data, and each mode's escape sequence must begin with KControlCharacterEscape (0x1b) or a panic occurs.
Return value
TInt | The number of unconverted bytes left at the end of the input descriptor, or a negative error value, as defined in CCnvCharacterSetConverter::TError.
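The role of aState and of the default mode can be shown with a small model. The sketch below is standard C++ and is not the Symbian implementation: StateModel, ConvertModalModel and the per-mode lookup table are hypothetical stand-ins for SState and its conversion data, every mode is assumed to use a fixed number of bytes per character, and the unconvertible-character counters are omitted.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for SState: an escape sequence plus a lookup table.
struct StateModel {
    std::string escape;                     // must begin with ESC (0x1b)
    std::size_t bytesPerCharacter;
    std::map<std::string, char16_t> table;  // foreign bytes -> Unicode
};

// Decodes 'foreign', switching mode whenever a known escape sequence is seen.
// 'state' is the index of the current mode and is kept by the caller, so a
// mode selected in one call is still in force in the next (0 = default mode).
// Returns the number of bytes left unconverted at the end of the input.
int ConvertModalModel(std::u16string& unicode, const std::string& foreign,
                      int& state, const std::vector<StateModel>& states) {
    assert(!states.empty());
    assert(state >= 0 && static_cast<std::size_t>(state) < states.size());
    for (const StateModel& s : states)
        assert(!s.escape.empty() && s.escape[0] == '\x1b' && s.bytesPerCharacter > 0);

    unicode.clear();
    std::size_t pos = 0;
    while (pos < foreign.size()) {
        if (foreign[pos] == '\x1b') {
            bool matched = false;
            for (std::size_t s = 0; s < states.size() && !matched; ++s) {
                if (foreign.compare(pos, states[s].escape.size(), states[s].escape) == 0) {
                    state = static_cast<int>(s);
                    pos += states[s].escape.size();
                    matched = true;
                }
            }
            if (!matched) break;            // unknown or truncated escape sequence
            continue;
        }
        const StateModel& mode = states[state];
        if (foreign.size() - pos < mode.bytesPerCharacter) break;  // truncated character
        auto hit = mode.table.find(foreign.substr(pos, mode.bytesPerCharacter));
        unicode += hit != mode.table.end() ? hit->second : char16_t(0xFFFD);
        pos += mode.bytesPerCharacter;
    }
    return static_cast<int>(foreign.size() - pos);
}
```

If the input is fed in chunks, the caller passes the same state variable each time. A chunk that begins in the middle of a two-byte run is then still decoded with the two-byte table rather than with the default mode.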
ConvertToUnicodeFromModalForeign()
static TInt ConvertToUnicodeFromModalForeign(CCnvCharacterSetConverter::TEndianness aDefaultEndiannessOfForeignCharacters, TDes16& aUnicode, const TDesC8& aForeign, TInt& aState, TInt& aNumberOfUnconvertibleCharacters, TInt& aIndexOfFirstByteOfFirstUnconvertibleCharacter, const TArray<SState>& aArrayOfStates, TUint& aOutputConversionFlags, TUint aInputConversionFlags);
Description
Converts text from a modal foreign character set encoding into
Unicode.
The non-Unicode text specified in aForeign is converted using the array of
character set conversion objects (aArrayOfStates) provided by the plug-in,
and the converted text is returned in aUnicode. The function can either
append to aUnicode or overwrite its contents (if any), depending on the
input conversion flags specified. The first element in aArrayOfStates is
taken to be the default mode (i.e. the mode to assume by default if there is
no preceding escape sequence).
Parameters
CCnvCharacterSetConverter::TEndianness aDefaultEndiannessOfForeignCharacters | The default endian-ness for the foreign characters. If an endian-ness for foreign characters is specified in the conversion data, then that is used instead and the value of aDefaultEndiannessOfForeignCharacters is ignored.
TDes16& aUnicode | On return, contains the text converted into Unicode.
const TDesC8& aForeign | The non-Unicode source text to be converted.
TInt& aState | Used to store a modal character set encoding's current mode across multiple calls to ConvertToUnicode() on the same input descriptor. This argument should be passed the same object as passed to the plug-in's ConvertToUnicode() exported function.
TInt& aNumberOfUnconvertibleCharacters | On return, contains the number of characters in aForeign which were not converted. Characters which cannot be converted are output as Unicode replacement characters (0xfffd).
TInt& aIndexOfFirstByteOfFirstUnconvertibleCharacter | On return, the index of the first byte of the first unconvertible character. For instance, if the first character in the input descriptor (aForeign) could not be converted, then this parameter is set to the index of the first byte of that character, i.e. zero. A negative value is returned if all the characters were converted.
const TArray<SState>& aArrayOfStates | Array of character set conversion data objects, and their escape sequences. There must be one or more modes in this array, none of the modes can have any NULL member data, and each mode's escape sequence must begin with KControlCharacterEscape (0x1b) or a panic occurs.
TUint& aOutputConversionFlags | If the input descriptor ended in a truncated sequence, e.g. an incomplete multi-byte character, aOutputConversionFlags returns with the EOutputConversionFlagInputIsTruncated flag set.
TUint aInputConversionFlags | Specify CCnvCharacterSetConverter::EInputConversionFlagAppend to append the text to aUnicode. Specify EInputConversionFlagAllowTruncatedInputNotEvenPartlyConsumable to prevent the function from returning the error-code EErrorIllFormedInput when the input descriptor consists of nothing but a truncated sequence. The CCnvCharacterSetConverter::EInputConversionFlagStopAtFirstUnconvertibleCharacter flag must not be set, otherwise a panic occurs.
Return value
TInt | The number of unconverted bytes left at the end of the input descriptor, or a negative error value, as defined in CCnvCharacterSetConverter::TError.
ConvertToUnicodeFromHeterogeneousForeign()
static TInt ConvertToUnicodeFromHeterogeneousForeign(CCnvCharacterSetConverter::TEndianness aDefaultEndiannessOfForeignCharacters, TDes16& aUnicode, const TDesC8& aForeign, TInt& aNumberOfUnconvertibleCharacters, TInt& aIndexOfFirstByteOfFirstUnconvertibleCharacter, const TArray<SMethod>& aArrayOfMethods);
Description
Converts text from a non-modal complex character set encoding
(e.g. Shift-JIS or EUC-JP) into Unicode.
The non-Unicode text specified in aForeign is converted using the array of
character set conversion methods (aArrayOfMethods) provided by the plug-in,
and the converted text is returned in aUnicode. Overwrites the contents, if
any, of aUnicode.
Parameters
CCnvCharacterSetConverter::TEndianness aDefaultEndiannessOfForeignCharacters | The default endian-ness of the foreign characters. If an endian-ness for foreign characters is specified in the conversion data, then that is used instead and the value of aDefaultEndiannessOfForeignCharacters is ignored.
TDes16& aUnicode | On return, contains the text converted into Unicode.
const TDesC8& aForeign | The non-Unicode source text to be converted.
TInt& aNumberOfUnconvertibleCharacters | On return, contains the number of characters in aForeign which were not converted. Characters which cannot be converted are output as Unicode replacement characters (0xfffd).
TInt& aIndexOfFirstByteOfFirstUnconvertibleCharacter | On return, the index of the first byte of the first unconvertible character. For instance, if the first character in the input descriptor (aForeign) could not be converted, then this parameter is set to the index of the first byte of that character, i.e. zero. A negative value is returned if all the characters were converted.
const TArray<SMethod>& aArrayOfMethods | Array of conversion methods. There must be one or more methods in this array and none of the methods in the array can have any NULL member data or a panic occurs.
Return value
TInt | The number of unconverted bytes left at the end of the input descriptor, or a negative error value, as defined in CCnvCharacterSetConverter::TError.
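How an array of methods might be applied to a byte stream can be sketched as a simple driver loop. This is a standalone model in standard C++, written under stated assumptions and not taken from the Symbian source: MethodModel and ConvertHeterogeneousModel are hypothetical names, std::function members stand in for the function pointers and conversion data of SMethod, and the model assumes that a method returning 0 bytes means "not mine, try the next method".

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for SMethod.
struct MethodModel {
    // Number of leading bytes this method can handle (0 if none, <0 on error).
    std::function<int(const std::string&)> numberOfBytesAbleToConvert;
    // Rewrites a run in place into the form the lookup expects.
    std::function<void(std::string&)> convertToIntermediate;
    // Converts one prepared character to Unicode (stands in for iConversionData).
    std::function<char16_t(const std::string&)> lookup;
    std::size_t bytesPerCharacter;
};

// Splits 'foreign' into runs, each handled by the first method that accepts it.
// Returns the number of bytes left unconverted, or a negative error value.
int ConvertHeterogeneousModel(std::u16string& unicode, const std::string& foreign,
                              const std::vector<MethodModel>& methods) {
    assert(!methods.empty());
    unicode.clear();
    std::size_t pos = 0;
    while (pos < foreign.size()) {
        const std::string rest = foreign.substr(pos);
        bool progressed = false;
        for (const MethodModel& m : methods) {
            const int n = m.numberOfBytesAbleToConvert(rest);
            if (n < 0) return n;            // error in the encoding
            if (n == 0) continue;           // try the next method
            std::string run = rest.substr(0, static_cast<std::size_t>(n));
            m.convertToIntermediate(run);
            for (std::size_t i = 0; i + m.bytesPerCharacter <= run.size();
                 i += m.bytesPerCharacter)
                unicode += m.lookup(run.substr(i, m.bytesPerCharacter));
            pos += static_cast<std::size_t>(n);
            progressed = true;
            break;
        }
        if (!progressed) break;             // e.g. a truncated trailing character
    }
    return static_cast<int>(foreign.size() - pos);
}
```

The design point the model tries to capture is that each method only has to recognise and prepare its own runs. The driver owns the loop, the output buffer and the bookkeeping of how much input was consumed.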
ConvertToUnicodeFromHeterogeneousForeign()
static TInt ConvertToUnicodeFromHeterogeneousForeign(CCnvCharacterSetConverter::TEndianness aDefaultEndiannessOfForeignCharacters, TDes16& aUnicode, const TDesC8& aForeign, TInt& aNumberOfUnconvertibleCharacters, TInt& aIndexOfFirstByteOfFirstUnconvertibleCharacter, const TArray<SMethod>& aArrayOfMethods, TUint& aOutputConversionFlags, TUint aInputConversionFlags);
Description
Converts text from a non-modal complex character set encoding
(e.g. Shift-JIS or EUC-JP) into Unicode.
The non-Unicode text specified in aForeign is converted using the array of
character set conversion methods (aArrayOfMethods) provided by the plug-in,
and the converted text is returned in aUnicode. The function can either
append to aUnicode or overwrite its contents, depending on the input
conversion flags specified.
Parameters
CCnvCharacterSetConverter::TEndianness aDefaultEndiannessOfForeignCharacters | The default endian-ness for the foreign characters. If an endian-ness for foreign characters is specified in the conversion data, then that is used instead and the value of aDefaultEndiannessOfForeignCharacters is ignored.
TDes16& aUnicode | On return, contains the text converted into Unicode.
const TDesC8& aForeign | The non-Unicode source text to be converted.
TInt& aNumberOfUnconvertibleCharacters | On return, contains the number of characters in aForeign which were not converted. Characters which cannot be converted are output as Unicode replacement characters (0xfffd).
TInt& aIndexOfFirstByteOfFirstUnconvertibleCharacter | On return, the index of the first byte of the first unconvertible character. For instance, if the first character in the input descriptor (aForeign) could not be converted, then this parameter is set to the index of the first byte of that character, i.e. zero. A negative value is returned if all the characters were converted.
const TArray<SMethod>& aArrayOfMethods | Array of conversion methods. There must be one or more methods in this array and none of the methods in the array can have any NULL member data or a panic occurs.
TUint& aOutputConversionFlags | If the input descriptor ended in a truncated sequence, e.g. an incomplete multi-byte character, aOutputConversionFlags returns with the EOutputConversionFlagInputIsTruncated flag set.
TUint aInputConversionFlags | Specify CCnvCharacterSetConverter::EInputConversionFlagAppend to append the text to aUnicode. Specify EInputConversionFlagAllowTruncatedInputNotEvenPartlyConsumable to prevent the function from returning the error-code EErrorIllFormedInput when the input descriptor consists of nothing but a truncated sequence. The CCnvCharacterSetConverter::EInputConversionFlagStopAtFirstUnconvertibleCharacter flag must not be set, otherwise a panic occurs.
Return value
TInt | The number of unconverted bytes left at the end of the input descriptor, or a negative error value, as defined in CCnvCharacterSetConverter::TError.
SCharacterSet
Description
Character conversion data for one of the character sets which
is specified in a complex character set encoding. An array of these structs is
used when converting from Unicode to a complex character set encoding (see
CnvUtilities::ConvertFromUnicode()). None of the members may be NULL.
Defined in CnvUtilities::SCharacterSet: iConversionData,
iConvertFromIntermediateBufferInPlace, iEscapeSequence
iConversionData
const SCnvConversionData* iConversionData
Description
The conversion data.
iConvertFromIntermediateBufferInPlace
FConvertFromIntermediateBufferInPlace iConvertFromIntermediateBufferInPlace
Description
A pointer to a function which "mangles" the text in a way
appropriate to the target complex character set. For instance it might insert a
shifting character, escape sequence, or other special characters.
iEscapeSequence
const TDesC8* iEscapeSequence
Description
The escape sequence which introduces the character set, i.e.
it identifies this character set as the next one to use. Must not be NULL. If
the character set is non-modal, this should be set to an empty
descriptor.
SState
Description
Character conversion data for one of the character sets which
is specified in a modal character set encoding. An array of these structs is
used when converting from a modal character set into Unicode, using
CnvUtilities::ConvertToUnicodeFromModalForeign(). Neither of the
members may be NULL.
Defined in CnvUtilities::SState: iConversionData, iEscapeSequence
iEscapeSequence
const TDesC8* iEscapeSequence
Description
The escape sequence which introduces the character set, i.e.
it identifies this character set as the next one to use. This must begin with
KControlCharacterEscape.
iConversionData
const SCnvConversionData* iConversionData
Description
The conversion data.
SMethod
Description
Character conversion data for one of the character sets which
is specified in a non-modal complex character set encoding. An array of these
structs is used when converting from a non-modal complex character set encoding
into Unicode using CnvUtilities::ConvertToUnicodeFromHeterogeneousForeign().
None of the members may be NULL.
Defined in CnvUtilities::SMethod: iConversionData,
iConvertToIntermediateBufferInPlace, iNumberOfBytesAbleToConvert,
iNumberOfBytesPerCharacter, iNumberOfCoreBytesPerCharacter
iNumberOfBytesAbleToConvert
FNumberOfBytesAbleToConvert iNumberOfBytesAbleToConvert
Description
A pointer to a function which calculates the number of
consecutive bytes in the remainder of the foreign descriptor which can be
converted using the current character set's conversion data. It may return a
negative CCnvCharacterSetConverter::TError value to indicate an
error in the encoding.
iConvertToIntermediateBufferInPlace
FConvertToIntermediateBufferInPlace iConvertToIntermediateBufferInPlace
Description
A pointer to a function which prepares the text for
conversion into Unicode. For instance it might remove any shifting or other
special characters.
iConversionData
const SCnvConversionData* iConversionData
Description
The conversion data.
iNumberOfBytesPerCharacter
TInt16 iNumberOfBytesPerCharacter
Description
The number of bytes per character.
iNumberOfCoreBytesPerCharacter
TInt16 iNumberOfCoreBytesPerCharacter
Description
The number of core bytes per character.
Typedef FConvertFromIntermediateBufferInPlace
typedef void (*FConvertFromIntermediateBufferInPlace)(TInt aStartPositionInDescriptor, TDes8& aDescriptor, TInt& aNumberOfCharactersThatDroppedOut);
Description
A pointer to a function which "mangles" text when converting
from Unicode into a complex modal or non-modal foreign character set encoding.
It might insert a shifting character, escape sequence, or other special
characters.
If the target character set encoding is modal, the
implementation of this function may call the
CnvUtilities::ConvertFromIntermediateBufferInPlace() utility
function which is provided because many modal character sets require an
identical implementation of this function.
Typedef FNumberOfBytesAbleToConvert
typedef TInt (*FNumberOfBytesAbleToConvert)(const TDesC8& aDescriptor);
Description
A pointer to a function which calculates the number of
consecutive bytes in the remainder of the foreign descriptor which can be
converted using the current character set's conversion data. Called when
converting from a non-modal complex character set encoding into Unicode. It may
return a negative CCnvCharacterSetConverter::TError value to
indicate an error in the encoding.
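As an illustration of what such a function might compute, the sketch below counts the bytes at the start of a Shift-JIS buffer that form complete double-byte characters. It is a hedged example in standard C++: std::string stands in for TDesC8, both function names are hypothetical, only the standard lead-byte ranges 0x81-0x9F and 0xE0-0xEF are recognised (vendor extensions are ignored), trail bytes are not validated, and no error value is ever returned.

```cpp
#include <cassert>
#include <string>

// Lead bytes of double-byte Shift-JIS characters (standard ranges only).
inline bool IsShiftJisLeadByte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xEF);
}

// Same shape as FNumberOfBytesAbleToConvert. Returns the length in bytes of
// the leading run of complete double-byte characters; a lead byte with no
// trail byte after it is not counted.
int NumberOfBytesAbleToConvertToTwoByteSet(const std::string& rest) {
    std::size_t n = 0;
    while (n + 1 < rest.size() &&
           IsShiftJisLeadByte(static_cast<unsigned char>(rest[n])))
        n += 2;
    return static_cast<int>(n);
}
```

A plug-in would pair a function like this with the two-byte conversion data, and supply sibling functions for the single-byte ranges so that every byte in valid input is claimed by exactly one method.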
Typedef FConvertToIntermediateBufferInPlace
typedef void (*FConvertToIntermediateBufferInPlace)(TDes8& aDescriptor);
Description
A pointer to a function which prepares the text for conversion
into Unicode. For instance it might remove any shifting or other special
characters. Called when converting from a non-modal complex character set
encoding into Unicode.
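A concrete case of such preparation is EUC-JP, where each byte of a two-byte JIS X 0208 character is stored with its top bit set. The sketch below, in standard C++ with std::string standing in for TDes8 and a hypothetical function name, clears that bit so the run can be looked up as plain JIS codes. It assumes the run has already been limited to two-byte JIS X 0208 characters and does nothing about single-shift prefixes.

```cpp
#include <cassert>
#include <string>

// Same shape as FConvertToIntermediateBufferInPlace. Clears the top bit of
// every byte in the run, turning EUC-JP bytes into the underlying JIS bytes.
void ConvertEucJpRunToJisInPlace(std::string& run) {
    for (char& byte : run)
        byte = static_cast<char>(static_cast<unsigned char>(byte) & 0x7F);
}
```

Because the rewrite happens in place and never changes the length of the run, the caller can hand the same buffer straight to the table-driven conversion that follows.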