Files Reference

Character Set Description (charmap) Source File Format

Purpose

Defines character symbols as character encodings.

Description

The character set description (charmap) source file defines character symbols as character encodings. The /usr/lib/nls/charmap directory contains charmap source files for supported locales. The localedef command recognizes two sections in charmap source files, the CHARMAP section and the CHARSETID section:

CHARMAP Maps symbolic character names to code points. This section is mandatory and must precede all other sections.
CHARSETID Maps the code points within the code set to character set IDs. This section is optional.

The CHARMAP Section

The CHARMAP section of the charmap file maps symbolic character names to code points. Every supported code set contains the portable character set as a proper subset, so only symbols that are not in the portable character set need to be defined in the CHARMAP section. The portable character set consists of the following character symbols (listed by their standardized symbolic names) and encodings:

Symbol Name                 Code (hexadecimal)
<NUL>                       000
<SOH>                       001
<STX>                       002
<ETX>                       003
<EOT>                       004
<ENQ>                       005
<ACK>                       006
<alert>                     007
<backspace>                 008
<tab>                       009
<new-line>                  00A
<vertical-tab>              00B
<form-feed>                 00C
<carriage-return>           00D
<SO>                        00E
<SI>                        00F
<DLE>                       010
<DC1>                       011
<DC2>                       012
<DC3>                       013
<DC4>                       014
<NAK>                       015
<SYN>                       016
<ETB>                       017
<CAN>                       018
<EM>                        019
<SUB>                       01A
<ESC>                       01B
<IS4>                       01C
<IS3>                       01D
<IS2>                       01E
<IS1>                       01F
<space>                     020
<exclamation-mark>          021
<quotation-mark>            022
<number-sign>               023
<dollar-sign>               024
<percent>                   025
<ampersand>                 026
<apostrophe>                027
<left-parenthesis>          028
<right-parenthesis>         029
<asterisk>                  02A
<plus-sign>                 02B
<comma>                     02C
<hyphen>                    02D
<period>                    02E
<slash>                     02F
<zero>                      030
<one>                       031
<two>                       032
<three>                     033
<four>                      034
<five>                      035
<six>                       036
<seven>                     037
<eight>                     038
<nine>                      039
<colon>                     03A
<semi-colon>                03B
<less-than>                 03C
<equal-sign>                03D
<greater-than>              03E
<question-mark>             03F
<commercial-at>             040
<A>                         041
<B>                         042
<C>                         043
<D>                         044
<E>                         045
<F>                         046
<G>                         047
<H>                         048
<I>                         049
<J>                         04A
<K>                         04B
<L>                         04C
<M>                         04D
<N>                         04E
<O>                         04F
<P>                         050
<Q>                         051
<R>                         052
<S>                         053
<T>                         054
<U>                         055
<V>                         056
<W>                         057
<X>                         058
<Y>                         059
<Z>                         05A
<left-bracket>              05B
<backslash>                 05C
<right-bracket>             05D
<circumflex>                05E
<underscore>                05F
<grave-accent>              060
<a>                         061
<b>                         062
<c>                         063
<d>                         064
<e>                         065
<f>                         066
<g>                         067
<h>                         068
<i>                         069
<j>                         06A
<k>                         06B
<l>                         06C
<m>                         06D
<n>                         06E
<o>                         06F
<p>                         070
<q>                         071
<r>                         072
<s>                         073
<t>                         074
<u>                         075
<v>                         076
<w>                         077
<x>                         078
<y>                         079
<z>                         07A
<left-brace>                07B
<vertical-line>             07C
<right-brace>               07D
<tilde>                     07E
<DEL>                       07F
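A minimal Python sketch can turn the table above into a lookup and report which symbols a charmap would have to define explicitly. The name lists are transcribed from the table in code order, so each name's code point is its position (0x00 through 0x7F, the same values as ASCII). PORTABLE and needs_charmap_definition are hypothetical names used only for illustration.

```python
import string

# Symbolic names from the portable character set table, in code order.
_CONTROLS = ["NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "alert",
             "backspace", "tab", "new-line", "vertical-tab", "form-feed",
             "carriage-return", "SO", "SI", "DLE", "DC1", "DC2", "DC3",
             "DC4", "NAK", "SYN", "ETB", "CAN", "EM", "SUB", "ESC",
             "IS4", "IS3", "IS2", "IS1"]                      # 0x00-0x1F
_PUNCT_20 = ["space", "exclamation-mark", "quotation-mark", "number-sign",
             "dollar-sign", "percent", "ampersand", "apostrophe",
             "left-parenthesis", "right-parenthesis", "asterisk",
             "plus-sign", "comma", "hyphen", "period", "slash"]  # 0x20-0x2F
_DIGITS = ["zero", "one", "two", "three", "four",
           "five", "six", "seven", "eight", "nine"]            # 0x30-0x39
_PUNCT_3A = ["colon", "semi-colon", "less-than", "equal-sign",
             "greater-than", "question-mark", "commercial-at"]  # 0x3A-0x40
_PUNCT_5B = ["left-bracket", "backslash", "right-bracket", "circumflex",
             "underscore", "grave-accent"]                      # 0x5B-0x60
_PUNCT_7B = ["left-brace", "vertical-line", "right-brace", "tilde",
             "DEL"]                                             # 0x7B-0x7F

_ORDERED = (_CONTROLS + _PUNCT_20 + _DIGITS + _PUNCT_3A
            + list(string.ascii_uppercase) + _PUNCT_5B
            + list(string.ascii_lowercase) + _PUNCT_7B)

# Map "<name>" to its code point; list position equals the code (0x00-0x7F).
PORTABLE = {f"<{name}>": code for code, name in enumerate(_ORDERED)}


def needs_charmap_definition(symbols):
    """Return the symbols that are not in the portable character set."""
    return [s for s in symbols if s not in PORTABLE]
```

For example, needs_charmap_definition(["<A>", "<nobreakspace>", "<tilde>"]) returns ["<nobreakspace>"], because <A> and <tilde> are portable symbols.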

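The CHARMAP section contains the following sections:

Header Declares properties of the code set. Each declaration is a keyword, one or more blanks, and a value: <code_set_name> (the name of the code set), <mb_cur_max> (the maximum number of bytes in a character), <mb_cur_min> (the minimum number of bytes in a character), <escape_char> (the escape character, backslash by default), and <comment_char> (the comment character, # by default).
Character definitions Map each symbolic character name to its encoding. Each definition is a symbolic name, one or more blanks, and the encoding, written as one or more bytes. Each byte is the escape character followed by x and two hexadecimal digits (\x41), by d and a decimal number (\d65), or by an octal number (\101).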

Examples

The following is an example of a portion of a possible CHARMAP section from a charmap file:

CHARMAP
<code_set_name>          ISO8859-1
<mb_cur_max>             1
<mb_cur_min>             1
<escape_char>            \
<comment_char>           #
<NUL>                        \x00
<SOH>                        \x01
<STX>                        \x02
<ETX>                        \x03
<EOT>                        \x04
<ENQ>                        \x05
<ACK>                        \x06
<alert>                      \x07
<backspace>                  \x08
<tab>                        \x09
<new-line>                   \x0a
<vertical-tab>               \x0b
<form-feed>                  \x0c
<carriage-return>            \x0d
END CHARMAP
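The following Python sketch reads CHARMAP body lines like those in the example above. It assumes the header keywords shown in the example, whitespace-separated fields, and POSIX-style byte notation introduced by the escape character (\xhh hexadecimal, \d followed by decimal digits, or octal digits); symbol ranges are not handled. The functions parse_charmap and decode_encoding are hypothetical and do not reproduce the localedef parser.

```python
import re

HEADER_KEYS = {"<code_set_name>", "<mb_cur_max>", "<mb_cur_min>",
               "<escape_char>", "<comment_char>"}


def decode_encoding(text, esc="\\"):
    """Decode one or more escaped bytes, e.g. \\x41, \\d65 or \\101, to bytes."""
    pattern = re.compile(re.escape(esc)
                         + r"(?:x([0-9A-Fa-f]{2})|d([0-9]{1,3})|([0-7]{1,3}))")
    out, pos = bytearray(), 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            raise ValueError(f"bad encoding at {text[pos:]!r}")
        hexv, decv, octv = m.groups()
        if hexv is not None:
            value = int(hexv, 16)
        elif decv is not None:
            value = int(decv, 10)
        else:
            value = int(octv, 8)
        if value > 0xFF:
            raise ValueError(f"byte value out of range: {m.group(0)!r}")
        out.append(value)
        pos = m.end()
    return bytes(out)


def parse_charmap(lines):
    """Return (header, symbols) from CHARMAP body lines."""
    header, symbols = {}, {}
    esc, comment = "\\", "#"  # defaults until the header overrides them
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(comment):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"missing value in {line!r}")
        name, value = fields[0], fields[1]  # trailing fields are ignored
        if name in HEADER_KEYS:
            header[name] = value
            if name == "<escape_char>":
                esc = value
            elif name == "<comment_char>":
                comment = value
        elif "..." in name:
            raise NotImplementedError("symbol ranges are not handled here")
        else:
            symbols[name] = decode_encoding(value, esc)
    return header, symbols
```

For a multibyte code set, decode_encoding(r"\x81\x40") returns the two-byte sequence b"\x81\x40".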

The CHARSETID Section

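The CHARSETID section maps the code points within the code set to character set IDs. The section begins with a line containing the keyword CHARSETID and ends with a line containing END CHARSETID; each line between them is a character set ID mapping statement.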

Character set ID mappings are defined by listing symbolic names or code points for symbolic names and their associated character set IDs. The following are possible formats for a character set ID mapping statement:

<character_symbol>                               number
<character_symbol>...<character_symbol>          number
character_constant                               number
character_constant...character_constant          number

Each <character_symbol> must already be defined, either in the CHARMAP section or in the portable character set. Each character_constant must use the same encoding format as the character definitions in the CHARMAP section.

An individual character is mapped by giving either its symbolic name (defined in the CHARMAP section or in the portable character set) or its code point, followed by the character set ID value. Separate the symbolic name or code point from the character set ID value with one or more blank characters. To map a range of code points to a single character set ID value, give the two endpoints of the range, each a symbolic name or a code point, separated by an ellipsis (...) that stands for the intermediate characters, and follow them with the character set ID for the range. The first endpoint must be less than or equal to the second endpoint.
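The mapping rules above can be sketched in Python as a function that expands CHARSETID statements into a table from code point to character set ID. The sketch assumes a dictionary from symbolic names to integer code points (for instance, built from the CHARMAP section) and accepts constants only in \xhh notation, read big-endian when there are several bytes. The names expand_charsetid and _endpoint are hypothetical, and the localedef command's own handling may differ.

```python
import re


def _endpoint(token, charmap):
    """Resolve a symbolic name or a \\xhh constant to an integer code point."""
    if token.startswith("<"):
        if token not in charmap:
            raise KeyError(f"undefined symbol {token}")
        return charmap[token]
    # Assumption: constants are one or more \xhh bytes, read big-endian.
    parts = re.findall(r"\\x([0-9A-Fa-f]{2})", token)
    if not parts or "".join("\\x" + p for p in parts) != token:
        raise ValueError(f"unsupported constant {token!r}")
    return int("".join(parts), 16)


def expand_charsetid(lines, charmap):
    """Map each code point named in CHARSETID statements to its ID."""
    ids = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        target, csid = line.split()  # blanks separate target and ID value
        if "..." in target:
            lo_tok, hi_tok = target.split("...")
            lo, hi = _endpoint(lo_tok, charmap), _endpoint(hi_tok, charmap)
            if lo > hi:
                raise ValueError(f"range endpoints out of order: {target}")
            for code in range(lo, hi + 1):  # endpoints are inclusive
                ids[code] = int(csid)
        else:
            ids[_endpoint(target, charmap)] = int(csid)
    return ids
```

For example, expand_charsetid(["<space>...<tilde>  0"], {"<space>": 0x20, "<tilde>": 0x7E}) maps every code point from 0x20 through 0x7E to character set ID 0, and a range whose first endpoint is greater than its second raises ValueError.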

Examples

The following is an example of a portion of a possible CHARSETID section from a charmap file:

CHARSETID
<space>...<tilde>                  0
<nobreakspace>...<y-diaeresis>     1
END CHARSETID

Implementation Specifics

This file format is part of the Base Operating System (BOS) Runtime.

Related Information

Locale Definition Source File Format, Locale Method Source File Format.

For specific information about the locale categories and keywords, see the LC_COLLATE category, LC_CTYPE category, LC_MESSAGES category, LC_MONETARY category, LC_NUMERIC category, and LC_TIME category.

The locale command, localedef command.

For information on converting data between code sets, see Converters Overview for System Management, National Language Support Overview for System Management, and Understanding the Character Set Description (charmap) Source File in AIX Version 4.3 System Management Concepts: Operating System and Devices.

