Location: e32std.h
Link against: euser.lib
TChar
Supported from 5.0
Holds a character value and provides a number of utility functions to manipulate it and test its properties. For example, there are functions to convert the character to uppercase and to test whether it is a control character.
The character value is stored as a 32-bit unsigned integer. The shorthand “TChar value” is used to describe the character value wrapped by a TChar object.
TChar can be used to represent Unicode values outside plane 0 (that is, the extended Unicode range from 0x10000 to 0x10FFFF). This differentiates it from TText, which can only be used for 16-bit Unicode character values.
Defined in TChar:
Anonymous, Compose(), Decompose(), EAlphaGroup, EArabicNumber, EBlockSeparator, EBoundaryNeutral, ECcCategory, ECfCategory, ECnCategory, ECoCategory, ECommonNumberSeparator, EControlGroup, ECsCategory, EEuropeanNumber, EEuropeanNumberSeparator, EEuropeanNumberTerminator, EFoldAccents, EFoldAll, EFoldCase, EFoldDigits, EFoldKana, EFoldSpaces, EFoldStandard, EFoldWidth, EFullWidth, EHalfWidth, ELeft, ELeftToRight, ELeftToRightEmbedding, ELeftToRightOverride, ELetterModifierGroup, ELetterOtherGroup, ELlCategory, ELmCategory, ELoCategory, ELtCategory, ELuCategory, EMarkGroup, EMaxAssignedCategory, EMaxAssignedGroup, EMaxGraphicCategory, EMaxLetterCategory, EMaxLetterOrLetterModifierCategory, EMaxPrintableCategory, EMcCategory, EMeCategory, EMnCategory, ENarrow, ENdCategory, ENeutralWidth, ENlCategory, ENoCategory, ENonSpacingMark, ENumberGroup, EOtherNeutral, EParagraphSeparator, EPcCategory, EPdCategory, EPeCategory, EPoCategory, EPopDirectionalFormat, EPsCategory, EPunctuationGroup, ERight, ERightToLeft, ERightToLeftArabic, ERightToLeftEmbedding, ERightToLeftOverride, EScCategory, ESegmentSeparator, ESeparatorGroup, EShiftJIS, ESkCategory, ESmCategory, ESoCategory, ESymbolGroup, EUnassignedGroup, EUnicode, EWhiteSpace, EWide, EZlCategory, EZpCategory, EZsCategory, Eos(), Fold(), GetBDCategory(), GetBdCategory(), GetCJKWidth(), GetCategory(), GetCjkWidth(), GetCombiningClass(), GetInfo(), GetLowerCase(), GetNumericValue(), GetTitleCase(), GetUpperCase(), IsAlpha(), IsAlphaDigit(), IsAssigned(), IsControl(), IsDigit(), IsGraph(), IsHexDigit(), IsLower(), IsMirrored(), IsPrint(), IsPunctuation(), IsSpace(), IsTitle(), IsUpper(), LowerCase(), TBDCategory, TBdCategory, TCJKWidth, TCategory, TChar(), TCharInfo, TCjkWidth, TEncoding, TitleCase(), UpperCase(), operator TUint(), operator+(), operator+=(), operator-(), operator-=(), operator=()
TChar(TUint aChar);
Constructs this character object and initialises it with the specified value.
TChar(const TChar& aChar);
Withdrawn in 6.0
Constructs this character object from another TChar object.
TChar operator+(TUint aChar);
Returns the result of adding an unsigned integer value to this character object. This character object is not changed.
TChar& operator+=(TUint aChar);
Adds an unsigned integer value to this character object. This character object is changed by the operation.
TChar& operator=(TUint aChar);
Withdrawn in 6.0
Assigns an unsigned integer value to this character object.
TChar& operator=(const TChar& aChar);
Withdrawn in 6.0
Assigns the specified character object to this character object.
TChar operator-(TUint aChar);
Returns the result of subtracting an unsigned integer value from this character object. This character object is not changed.
TChar& operator-=(TUint aChar);
Subtracts an unsigned integer value from this character object. This character object is changed by the operation.
operator TUint() const;
Returns the value of the character as an unsigned integer. The operator casts a TChar to a TUint, returning the TUint value wrapped by this character object.
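The difference between the value-returning operators and the compound-assignment operators can be sketched without the Symbian headers. The class CharSketch below is a hypothetical stand-in, not the real TChar; it only mirrors the operator signatures listed above on a 32-bit unsigned value, and DemoOperators() is an illustrative helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for TChar: wraps a 32-bit unsigned character value.
class CharSketch {
public:
    CharSketch(std::uint32_t aChar) : iChar(aChar) {}

    // Value-returning forms: this object is not changed.
    CharSketch operator+(std::uint32_t aChar) const { return CharSketch(iChar + aChar); }
    CharSketch operator-(std::uint32_t aChar) const { return CharSketch(iChar - aChar); }

    // Compound forms: this object is changed.
    CharSketch& operator+=(std::uint32_t aChar) { iChar += aChar; return *this; }
    CharSketch& operator-=(std::uint32_t aChar) { iChar -= aChar; return *this; }

    // Conversion back to the wrapped unsigned value.
    operator std::uint32_t() const { return iChar; }

private:
    std::uint32_t iChar;
};

// Small self-check showing which operations modify the object.
inline void DemoOperators() {
    CharSketch c(0x61u);                       // 'a'
    CharSketch next = c + 1u;                  // c itself stays 'a'
    assert(std::uint32_t(c) == 0x61u);
    assert(std::uint32_t(next) == 0x62u);      // 'b'
    c += 2u;                                   // c itself becomes 'c'
    assert(std::uint32_t(c) == 0x63u);
}
```

The unsigned literals (1u, 2u) are deliberate in this sketch: with both a conversion operator and an operator+ taking an unsigned value, a plain int operand can make the call ambiguous in standard C++.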
TUint GetLowerCase() const;
Returns the character value after conversion to lowercase or the character's own value, if no lowercase form exists. The character object itself is not changed.
TUint GetUpperCase() const;
Returns the character value after conversion to uppercase or the character's own value, if no uppercase form exists. The character object itself is not changed.
TUint GetTitleCase() const;
Returns the character value after conversion to titlecase or the character's own value, if no titlecase form exists.
The titlecase form of a character is identical to its uppercase form unless a specific titlecase form exists.
In ER5, this function is only defined and implemented for a Unicode build.
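The fallback rule for titlecase (use the specific titlecase form if one exists, otherwise the uppercase form) can be sketched as follows. This is a portable illustration and not the Symbian implementation: the function names and the two small tables are hypothetical, and only a couple of characters are covered.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Illustrative case data only; a real implementation consults full Unicode tables.
// U+01C6 (small dz-with-caron digraph) has uppercase U+01C4 and a distinct
// titlecase form U+01C5.
static const std::map<std::uint32_t, std::uint32_t> kUpper = {
    {0x0061, 0x0041},   // a -> A
    {0x01C6, 0x01C4},   // small digraph -> all-capitals digraph
};
static const std::map<std::uint32_t, std::uint32_t> kTitle = {
    {0x01C6, 0x01C5},   // small digraph -> capital-then-small digraph
};

// Returns the uppercase form, or the value itself if none is known.
std::uint32_t UpperCaseSketch(std::uint32_t c) {
    auto it = kUpper.find(c);
    return it != kUpper.end() ? it->second : c;
}

// Returns the specific titlecase form if one exists, else the uppercase form.
std::uint32_t TitleCaseSketch(std::uint32_t c) {
    auto it = kTitle.find(c);
    if (it != kTitle.end()) return it->second;
    return UpperCaseSketch(c);
}

// Small self-check of the fallback behaviour.
inline void DemoTitleCase() {
    assert(TitleCaseSketch(0x0061) == 0x0041);   // falls back to uppercase
    assert(TitleCaseSketch(0x01C6) == 0x01C5);   // specific titlecase form
    assert(TitleCaseSketch(0x0031) == 0x0031);   // '1' has no case forms
}
```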
void Fold();
Converts the character to a form which can be used in tolerant comparisons without control over the operations performed.
Tolerant comparisons are those which ignore character differences like case and accents.
This function can be used when searching for a string in a text file or a file in a directory. Folding performs the following conversions: converts to lowercase, strips accents, converts all digits representing the values 0..9 to the ordinary digit characters '0'..'9', converts all spaces (standard, non-break, fixed-width, ideographic, etc.) to the ordinary space character (0x0020), converts Japanese characters in the hiragana syllabary to katakana, and converts East Asian halfwidth and fullwidth variants to their ordinary forms. You can choose to perform any subset of these operations by using the other function overload.
void Fold(TInt aFlags);
Converts the character to a form which can be used in tolerant comparisons allowing selection of the specific fold operations to be performed.
In ER5, this function is only defined and implemented for a Unicode build.
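Selecting fold operations with flags can be illustrated with a small portable sketch. It does not use the Symbian API: the flag names only echo EFoldCase, EFoldDigits and EFoldSpaces, their numeric values are assumptions, and just a handful of characters are handled.

```cpp
#include <cassert>
#include <cstdint>

// Flag names echo the TChar fold flags; the numeric values are assumptions.
enum FoldFlagSketch : unsigned {
    kFoldCase   = 1u << 0,
    kFoldDigits = 1u << 1,
    kFoldSpaces = 1u << 2,
};

// Folds one character value. Only a few ranges are handled, for illustration.
std::uint32_t FoldSketch(std::uint32_t c, unsigned flags) {
    if ((flags & kFoldCase) && c >= 0x41u && c <= 0x5Au)
        return c + 0x20u;                      // ASCII uppercase -> lowercase
    if ((flags & kFoldDigits) && c >= 0x0660u && c <= 0x0669u)
        return 0x30u + (c - 0x0660u);          // Arabic-Indic digit -> '0'..'9'
    if ((flags & kFoldSpaces) && (c == 0x00A0u || c == 0x3000u))
        return 0x0020u;                        // no-break or ideographic space -> space
    return c;
}

// Tolerant comparison: two values match if they fold to the same value.
bool EqualFolded(std::uint32_t a, std::uint32_t b, unsigned flags) {
    return FoldSketch(a, flags) == FoldSketch(b, flags);
}

// Small self-check: the same pair matches or not depending on the flags chosen.
inline void DemoFold() {
    assert(EqualFolded(0x41u, 0x61u, kFoldCase));      // 'A' vs 'a', case folded
    assert(!EqualFolded(0x41u, 0x61u, kFoldDigits));   // case not folded
}
```

Passing a combination such as kFoldCase | kFoldSpaces applies both operations, which mirrors the idea of choosing a subset of the standard fold.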
void LowerCase();
Converts the character to its lowercase form. Characters lacking a lowercase form are unchanged.
void UpperCase();
Converts the character to its uppercase form. Characters lacking an uppercase form are unchanged.
void TitleCase();
Converts the character to its titlecase form. The titlecase form of a character is identical to its uppercase form unless a specific titlecase form exists. Characters lacking a titlecase form are unchanged.
In ER5, this function is only defined and implemented for a Unicode build.
TBool Eos() const;
Tests whether the character is the C/C++ end-of-string character, 0.
TBool IsAlpha() const;
Tests whether the character is alphabetic.
For Unicode, the function returns TRUE for all letters, including those from syllabaries and ideographic scripts. The function returns FALSE for letter-like characters that are in fact diacritics. Specifically, the function returns TRUE for categories: ELuCategory, ELtCategory, ELlCategory, and ELoCategory; it returns FALSE for all other categories including ELmCategory.
TBool IsDigit() const;
Tests whether the character is a standard decimal digit.
For Unicode, this function returns TRUE only for the digits '0'...'9' (U+0030...U+0039), not for other digits in scripts like Arabic, Tamil, etc.
TBool IsAlphaDigit() const;
Tests whether the character is alphabetic or a decimal digit.
It is identical to (IsAlpha() || IsDigit()).
TBool IsGraph() const;
Tests whether the character is a graphic character.
For Unicode, graphic characters include printable characters but not the space character. Specifically, graphic characters are any character except those in categories: EZsCategory, EZlCategory, EZpCategory, ECcCategory, ECfCategory, ECsCategory, ECoCategory, and ECnCategory.
Note that for ISO Latin-1, all alphanumeric and punctuation characters are graphic.
TBool IsHexDigit() const;
Tests whether the character is a hexadecimal digit (0-9, a-f, A-F).
TBool IsLower() const;
Tests whether the character is lowercase.
TBool IsUpper() const;
Tests whether the character is uppercase.
TBool IsPrint() const;
Tests whether the character is a printable character.
For Unicode, printable characters are any character except those in categories: ECcCategory, ECfCategory, ECsCategory, ECoCategory and ECnCategory.
Note that for ISO Latin-1, all alphanumeric and punctuation characters, plus space, are printable.
TBool IsPunctuation() const;
Tests whether the character is a punctuation character.
For Unicode, punctuation characters are any character in the categories: EPcCategory, EPdCategory, EPsCategory, EPeCategory, EPiCategory, EPfCategory, EPoCategory.
TBool IsSpace() const;
Tests whether the character is a white space character. White space includes spaces, tabs and separators.
For Unicode, the function returns TRUE for all characters in the categories: EZsCategory, EZlCategory and EZpCategory, and also for the characters 0x0009 (horizontal tab), 0x000A (linefeed), 0x000B (vertical tab), 0x000C (form feed), and 0x000D (carriage return).
TBool IsControl() const;
Tests whether the character is a control character.
For Unicode, the function returns TRUE for all characters in the categories: ECcCategory, ECfCategory, ECsCategory, ECoCategory and ECnCategory.
TBDCategory GetBDCategory() const;
Withdrawn in 6.0
Returns the bi-directional category of a character.
For more information on the bi-directional algorithm, see Unicode Technical Report No. 9 available at: http://www.unicode.org/unicode/reports/tr9.
This function is only defined and implemented for a Unicode build.
TBdCategory GetBdCategory() const;
Supported from 6.0
Returns the bi-directional category of a character.
For more information on the bi-directional algorithm, see Unicode Technical Report No. 9 available at: http://www.unicode.org/unicode/reports/tr9/.
TCJKWidth GetCJKWidth() const;
Withdrawn in 6.0
Returns the Chinese, Japanese, Korean (CJK) notional width.
Some display systems used in East Asia display characters on a grid of fixed-width character cells — like the standard MSDOS display mode.
Some characters, e.g. the Japanese katakana syllabary, take up a single character cell and some characters, e.g., kanji, Chinese characters used in Japanese, take up two. These are called half-width and full-width characters. This property is fixed and cannot be overridden for particular locales.
For more information on returned widths, see Unicode Technical Report 11 on East Asian Width available at: http://www.unicode.org/unicode/reports/tr11/
This function is only defined and implemented for a Unicode build.
TCjkWidth GetCjkWidth() const;
Supported from 6.0
Returns the Chinese, Japanese, Korean (CJK) notional width.
Some display systems used in East Asia display characters on a grid of fixed-width character cells — like the standard MSDOS display mode.
Some characters, e.g. the Japanese katakana syllabary, take up a single character cell and some characters, e.g., kanji, Chinese characters used in Japanese, take up two. These are called half-width and full-width characters. This property is fixed and cannot be overridden for particular locales.
For more information on returned widths, see Unicode Technical Report 11 on East Asian Width available at: http://www.unicode.org/unicode/reports/tr11/
TCategory GetCategory() const;
Returns this character's Unicode category.
In ER5, this function is only defined and implemented for a Unicode build.
TInt GetCombiningClass() const;
Returns this character's combining class. Note that diacritics and other combining characters have non-zero combining classes.
In ER5, this function is only defined and implemented for a Unicode build.
void GetInfo(TCharInfo& aInfo) const;
Returns this character’s standard category information. This includes everything except its CJK width and decomposition, if any.
In ER5, this function is only defined and implemented for a Unicode build.
TInt GetNumericValue() const;
Returns the integer numeric value of this character.
Numeric values need not be in the range 0..9; the Unicode character set includes various other numeric characters such as the Roman and Tamil numerals for 500, 1000, etc.
In ER5, this function is only defined and implemented for a Unicode build.
TBool IsAssigned() const;
Tests whether this character has an assigned meaning in the Unicode encoding.
All characters outside the range 0x0000 - 0xFFFF are unassigned and there are also many unassigned characters within the Unicode range.
Locales can change the assigned/unassigned status of characters. This means that the precise behaviour of this function is locale-dependent.
In ER5, this function is only defined and implemented for a Unicode build.
TBool IsMirrored() const;
Tests whether this character has the mirrored property.
Mirrored characters, like ( ) [ ] < >, change direction according to the directionality of the surrounding characters. For example, an opening parenthesis 'faces right' in Hebrew or Arabic, and to say that 2 < 3 you would have to say that 3 > 2, where the '>' is, in this example, a less-than sign to be read right-to-left.
In ER5, this function is only defined and implemented for a Unicode build.
TBool IsTitle() const;
Tests whether this character is in titlecase.
In ER5, this function is only defined and implemented for a Unicode build.
static TBool Compose(TUint16& aResult,const TDesC16& aSource);
Composes a string of Unicode characters to produce a single character result. For example, 0061 ('a') and 030A (combining ring above) compose to give 00E5 ('a' with ring above).
A canonical decomposition is a relationship between a string of characters — usually a base character and one or more diacritics — and a composed character. The Unicode standard requires that compliant software treats composed characters identically with their canonical decompositions. The mappings used by these functions are fixed and cannot be overridden for particular locales.
In ER5, this function is only defined and implemented for a Unicode build.
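The shape of a composition lookup (a source string in, a single 16-bit character out, with a boolean reporting success) can be sketched in portable C++. ComposeSketch and its one-entry table are hypothetical and only loosely mirror the Compose() signature; the single mapping is the example given above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative composition table keyed by the full source string.
static const std::map<std::u16string, char16_t> kCompositions = {
    {u"\x0061\x030A", char16_t(0x00E5)},   // a + combining ring above -> a with ring above
};

// Writes the composed character to aResult and returns true if the whole
// source string has a known composition; otherwise leaves aResult untouched.
bool ComposeSketch(char16_t& aResult, const std::u16string& aSource) {
    auto it = kCompositions.find(aSource);
    if (it == kCompositions.end()) return false;
    aResult = it->second;
    return true;
}

// Small self-check using the example from the text.
inline void DemoCompose() {
    char16_t result = 0;
    assert(ComposeSketch(result, u"\x0061\x030A"));
    assert(result == 0x00E5);
}
```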
TBool Decompose(TPtrC16& aResult) const;
Maps this character to its canonical decomposition. For example, 01E1 ('a' with dot above and macron) decomposes into 0061 ('a') 0307 (dot) and 0304 (macron).
Decomposition is not maximal. To achieve a maximal decomposition, call this function repeatedly for the resulting characters until FALSE is returned for all of them.
Note that this function is used during collation, as performed by the Mem::CompareC() function, to convert the compared strings to their maximal canonical decompositions.
In ER5, this function is only defined and implemented for a Unicode build.
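Repeating a single decomposition step until nothing decomposes further can be sketched as a short recursion. This is a portable illustration, not the Symbian implementation: DecomposeOnce and DecomposeFully are hypothetical names, and the assumption that one step yields 0227 0304 for 01E1 comes from the Unicode character data rather than from this page.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Illustrative single-step canonical decompositions.
static const std::map<std::uint32_t, std::vector<std::uint32_t>> kDecompositions = {
    {0x01E1, {0x0227, 0x0304}},   // a with dot above and macron -> a with dot above + macron
    {0x0227, {0x0061, 0x0307}},   // a with dot above -> a + combining dot above
};

// One step: returns false if the character has no decomposition.
bool DecomposeOnce(std::uint32_t c, std::vector<std::uint32_t>& out) {
    auto it = kDecompositions.find(c);
    if (it == kDecompositions.end()) return false;
    out = it->second;
    return true;
}

// Repeats the single step on every resulting character until none of them
// decomposes further, appending the final characters to out in order.
void DecomposeFully(std::uint32_t c, std::vector<std::uint32_t>& out) {
    std::vector<std::uint32_t> step;
    if (!DecomposeOnce(c, step)) {
        out.push_back(c);
        return;
    }
    for (std::uint32_t part : step) DecomposeFully(part, out);
}

// Small self-check: 01E1 ends up as 0061 0307 0304.
inline void DemoDecompose() {
    std::vector<std::uint32_t> out;
    DecomposeFully(0x01E1, out);
    assert((out == std::vector<std::uint32_t>{0x0061, 0x0307, 0x0304}));
}
```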
TCategory
General Unicode character category. The high nybble encodes the major category (Mark, Number, etc.) and the low nybble encodes the subdivisions of that category.
The category codes can be used in three ways: (i) as unique constants: there is one for each Unicode category, with a name of the form E<XX>Category, where <XX> is the category name given by the Unicode database (e.g., the constant ELuCategory is used for uppercase letters, category Lu); (ii) as numbers in certain ranges: letter categories are all <= EMaxLetterCategory; and (iii) as codes in which the upper nybble gives the category group (e.g., punctuation categories all yield TRUE for the test (category & 0xF0) == EPunctuationGroup).
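The range test and the group-mask test can be sketched with a small stand-in enumeration. The names below echo TChar::TCategory, but the numeric values are assumptions chosen to follow the high-nybble/low-nybble layout described above; they are not taken from e32std.h.

```cpp
#include <cassert>

// Hypothetical values: group in the high nybble, subdivision in the low nybble.
enum CategorySketch : unsigned {
    kAlphaGroup       = 0x00,
    kLetterOtherGroup = 0x10,
    kPunctuationGroup = 0x50,
    kSymbolGroup      = 0x60,

    kLuCategory = kAlphaGroup | 0,
    kLlCategory = kAlphaGroup | 1,
    kLoCategory = kLetterOtherGroup | 0,
    kMaxLetterCategory = kLetterOtherGroup | 0x0F,

    kPdCategory = kPunctuationGroup | 1,
    kPoCategory = kPunctuationGroup | 6,
    kSmCategory = kSymbolGroup | 0,
};

// Way (ii): letter categories are all <= the maximum letter category.
bool IsLetterCategory(unsigned category) {
    return category <= kMaxLetterCategory;
}

// Way (iii): mask off the low nybble and compare with the group code.
bool IsInGroup(unsigned category, unsigned group) {
    return (category & 0xF0u) == group;
}

// Small self-check of both tests.
inline void DemoCategory() {
    assert(IsLetterCategory(kLuCategory));
    assert(!IsLetterCategory(kPdCategory));
    assert(IsInGroup(kPoCategory, kPunctuationGroup));
    assert(!IsInGroup(kSmCategory, kPunctuationGroup));
}
```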
TBDCategory
Withdrawn in 6.0
The bi-directional Unicode character category.
For more information on the bi-directional algorithm, see Unicode Technical Report No. 9 available at: http://www.unicode.org/unicode/reports/tr9.
TBdCategory
Supported from 6.0
The bi-directional Unicode character category.
For more information on the bi-directional algorithm, see Unicode Technical Report No. 9 available at: http://www.unicode.org/unicode/reports/tr9.
TCJKWidth
Withdrawn in 6.0
Notional character width as known to East Asian (Chinese, Japanese, Korean (CJK)) coding systems.
TCjkWidth
Supported from 6.0
Notional character width as known to East Asian (Chinese, Japanese, Korean (CJK)) coding systems.
TEncoding
Encoding systems used by the translation functions.
Anonymous
Flags defining operations to be performed using TChar::Fold().
TCharInfo
Character information
Defined in TChar::TCharInfo:
iBDCategory, iBdCategory, iCategory, iCombiningClass, iLowerCase, iMirrored, iNumericValue, iTitleCase, iUpperCase
iCategory
TCategory iCategory
General category
iBdCategory
TBdCategory iBdCategory
Supported from 6.0
Bi-directional category
iBDCategory
TBDCategory iBDCategory
Withdrawn in 6.0
Bi-directional category
iCombiningClass
TUint16 iCombiningClass
Combining class: number (currently) in the range 0..234
iLowerCase
TUint16 iLowerCase
Lower case form
iMirrored
TBool iMirrored
True if the character is mirrored
iNumericValue
TInt16 iNumericValue
Integer numeric value: -1 if none, -2 if a fraction
iTitleCase
TUint16 iTitleCase
Title case form
iUpperCase
TUint16 iUpperCase
Upper case form