#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: man-pages-l10n VERSION\n"
"POT-Creation-Date: 2014-07-17 17:57+0900\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#. type: TH
#: man-pages/man7/unicode.7:29
#, no-wrap
msgid "UNICODE"
msgstr ""

#. type: TH
#: man-pages/man7/unicode.7:29
#, no-wrap
msgid "2014-06-13"
msgstr ""

#. type: TH
#: man-pages/man7/unicode.7:29
#, no-wrap
msgid "GNU"
msgstr ""

#. type: TH
#: man-pages/man7/unicode.7:29
#, no-wrap
msgid "Linux Programmer's Manual"
msgstr ""

#. type: SH
#: man-pages/man7/unicode.7:30
#, no-wrap
msgid "NAME"
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:32
msgid "Unicode - universal character set"
msgstr ""

#. type: SH
#: man-pages/man7/unicode.7:32
#, no-wrap
msgid "DESCRIPTION"
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:40
msgid ""
"The international standard ISO 10646 defines the Universal Character Set "
"(UCS). UCS contains all characters of all other character set standards. "
"It also guarantees \"round-trip compatibility\"; in other words, conversion "
"tables can be built such that no information is lost when a string is "
"converted from any other encoding to UCS and back."
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:62
msgid ""
"UCS contains the characters required to represent practically all known "
"languages. This includes not only the Latin, Greek, Cyrillic, Hebrew, "
"Arabic, Armenian, and Georgian scripts, but also Chinese, Japanese and "
"Korean Han ideographs as well as scripts such as Hiragana, Katakana, Hangul, "
"Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, "
"Malayalam, Thai, Lao, Khmer, Bopomofo, Tibetan, Runic, Ethiopic, Canadian "
"Syllabics, Cherokee, Mongolian, Ogham, Myanmar, Sinhala, Thaana, Yi, and "
"others. For scripts not yet covered, research on how to best encode them "
"for computer usage is still going on and they will be added eventually. "
"This might eventually include not only Hieroglyphs and various historic Indo-"
"European languages, but even some selected artistic scripts such as Tengwar, "
"Cirth, and Klingon. UCS also covers a large number of graphical, "
"typographical, mathematical, and scientific symbols, including those "
"provided by TeX, Postscript, APL, MS-DOS, MS-Windows, Macintosh, OCR fonts, "
"as well as many word processing and publishing systems, and more are being "
"added."
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:93
msgid ""
"The UCS standard (ISO 10646) describes a 31-bit character set architecture "
"consisting of 128 24-bit I<groups>, each divided into 256 16-bit I<planes> "
"made up of 256 8-bit I<rows> with 256 I<column> positions, one for each "
"character. Part 1 of the standard (ISO 10646-1) defines the first 65534 "
"code positions (0x0000 to 0xfffd), which form the I<Basic Multilingual "
"Plane> (BMP), that is plane 0 in group 0. Part 2 of the standard (ISO 10646-"
"2) adds characters to group 0 outside the BMP in several I<supplemental "
"planes> in the range 0x10000 to 0x10ffff. There are no plans to add "
"characters beyond 0x10ffff to the standard, therefore of the entire code "
"space, only a small fraction of group 0 will ever be actually used in the "
"foreseeable future. The BMP contains all characters found in the commonly "
"used other character sets. The supplemental planes added by ISO 10646-2 "
"cover only more exotic characters for special scientific, dictionary "
"printing, publishing industry, higher-level protocol and enthusiast needs."
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:101
msgid ""
"The representation of each UCS character as a 2-byte word is referred to as "
"the UCS-2 form (only for BMP characters), whereas UCS-4 is the "
"representation of each character by a 4-byte word. In addition, there exist "
"two encoding forms UTF-8 for backward compatibility with ASCII processing "
"software and UTF-16 for the backward-compatible handling of non-BMP "
"characters up to 0x10ffff by UCS-2 software."
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:107
msgid ""
"The UCS characters 0x0000 to 0x007f are identical to those of the classic US-"
"ASCII character set and the characters in the range 0x0000 to 0x00ff are "
"identical to those in ISO 8859-1 (Latin-1)."
msgstr ""

#. type: SS
#: man-pages/man7/unicode.7:107
#, no-wrap
msgid "Combining characters"
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:123
msgid ""
"Some code points in UCS have been assigned to I<combining characters>. "
"These are similar to the nonspacing accent keys on a typewriter. A "
"combining character just adds an accent to the previous character. The most "
"important accented characters have codes of their own in UCS, however, the "
"combining character mechanism allows us to add accents and other diacritical "
"marks to any character. The combining characters always follow the "
"character which they modify. For example, the German character Umlaut-A "
"(\"Latin capital letter A with diaeresis\") can either be represented by the "
"precomposed UCS code 0x00c4, or alternatively as the combination of a normal "
"\"Latin capital letter A\" followed by a \"combining diaeresis\": 0x0041 "
"0x0308."
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:127
msgid ""
"Combining characters are essential for instance for encoding the Thai script "
"or for mathematical typesetting and users of the International Phonetic "
"Alphabet."
msgstr ""

#. type: SS
#: man-pages/man7/unicode.7:127
#, no-wrap
msgid "Implementation levels"
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:132
msgid ""
"As not all systems are expected to support advanced mechanisms like "
"combining characters, ISO 10646-1 specifies the following three "
"I<implementation levels> of UCS:"
msgstr ""

#.
#. type: TP
#: man-pages/man7/unicode.7:132
#, no-wrap
msgid "Level 1"
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:138
msgid ""
"Combining characters and Hangul Jamo (a variant encoding of the Korean "
"script, where a Hangul syllable glyph is coded as a triplet or pair of vowel/"
"consonant codes) are not supported."
msgstr ""

#. type: TP
#: man-pages/man7/unicode.7:138
#, no-wrap
msgid "Level 2"
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:143
msgid ""
"In addition to level 1, combining characters are now allowed for some "
"languages where they are essential (e.g., Thai, Lao, Hebrew, Arabic, "
"Devanagari, Malayalam)."
msgstr ""

#. type: TP
#: man-pages/man7/unicode.7:143
#, no-wrap
msgid "Level 3"
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:146
msgid "All UCS characters are supported."
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:159
msgid ""
"The Unicode 3.0 Standard published by the Unicode Consortium contains "
"exactly the UCS Basic Multilingual Plane at implementation level 3, as "
"described in ISO 10646-1:2000. Unicode 3.1 added the supplemental planes of "
"ISO 10646-2. The Unicode standard and technical reports published by the "
"Unicode Consortium provide much additional information on the semantics and "
"recommended usages of various characters. They provide guidelines and "
"algorithms for editing, sorting, comparing, normalizing, converting, and "
"displaying Unicode strings."
msgstr ""

#. type: SS
#: man-pages/man7/unicode.7:159
#, no-wrap
msgid "Unicode under Linux"
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:169
msgid ""
"Under GNU/Linux, the C type I<wchar_t> is a signed 32-bit integer type. Its "
"values are always interpreted by the C library as UCS code values (in all "
"locales), a convention that is signaled by the GNU C library to applications "
"by defining the constant B<__STDC_ISO_10646__> as specified in the ISO C99 "
"standard."
msgstr ""

#.
#. type: Plain text
#: man-pages/man7/unicode.7:178
msgid ""
"UCS/Unicode can be used just like ASCII in input/output streams, terminal "
"communication, plaintext files, filenames, and environment variables in the "
"ASCII compatible UTF-8 multibyte encoding. To signal the use of UTF-8 as "
"the character encoding to all applications, a suitable I<locale> has to be "
"selected via environment variables (e.g., \"LANG=en_GB.UTF-8\")."
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:193
msgid ""
"The B<nl_langinfo(CODESET)> function returns the name of the selected "
"encoding. Library functions such as B<wctomb>(3) and B<mbsrtowcs>(3) can "
"be used to transform the internal I<wchar_t> characters and strings into the "
"system character encoding and back and B<wcwidth>(3) tells how many "
"positions (0\\(en2) the cursor is advanced by the output of a character."
msgstr ""

#. type: SS
#: man-pages/man7/unicode.7:194
#, no-wrap
msgid "Private area"
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:208
msgid ""
"In the Basic Multilingual Plane, the range 0xe000 to 0xf8ff will never be "
"assigned to any characters by the standard and is reserved for private usage."
" For the Linux community, this private area has been subdivided further "
"into the range 0xe000 to 0xefff which can be used individually by any end-"
"user and the Linux zone in the range 0xf000 to 0xf8ff where extensions are "
"coordinated among all Linux users. The registry of the characters assigned "
"to the Linux zone is maintained by LANANA and the registry itself is "
"I<Documentation/unicode.txt> in the Linux kernel sources."
msgstr ""

#. type: SS
#: man-pages/man7/unicode.7:208
#, no-wrap
msgid "Literature"
msgstr ""

#. type: IP
#: man-pages/man7/unicode.7:209 man-pages/man7/unicode.7:219
#: man-pages/man7/unicode.7:223 man-pages/man7/unicode.7:233
#: man-pages/man7/unicode.7:239 man-pages/man7/unicode.7:245
#, no-wrap
msgid "*"
msgstr ""

#.
#. type: Plain text
#: man-pages/man7/unicode.7:214
msgid ""
"Information technology \\(em Universal Multiple-Octet Coded Character Set "
"(UCS) \\(em Part 1: Architecture and Basic Multilingual Plane. "
"International Standard ISO/IEC 10646-1, International Organization for "
"Standardization, Geneva, 2000."
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:219
msgid ""
"This is the official specification of UCS. Available from E<.UR http://www."
"iso.ch/> E<.UE .>"
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:223
msgid ""
"The Unicode Standard, Version 3.0. The Unicode Consortium, Addison-Wesley, "
"Reading, MA, 2000, ISBN 0-201-61633-5."
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:226
msgid ""
"S. Harbison, G. Steele. C: A Reference Manual. Fourth edition, Prentice "
"Hall, Englewood Cliffs, 1995, ISBN 0-13-326224-3."
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:233
msgid ""
"A good reference book about the C programming language. The fourth edition "
"covers the 1994 Amendment 1 to the ISO C90 standard, which adds a large "
"number of new C library functions for handling wide and multibyte character "
"encodings, but it does not yet cover ISO C99, which improved wide and "
"multibyte character support even further."
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:235
msgid "Unicode Technical Reports."
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:238
msgid "E<.UR http://www.unicode.org\\:/reports/> E<.UE>"
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:241
msgid "Markus Kuhn: UTF-8 and Unicode FAQ for UNIX/Linux."
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:244
msgid "E<.UR http://www.cl.cam.ac.uk\\:/~mgk25\\:/unicode.html> E<.UE>"
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:247
msgid "Bruno Haible: Unicode HOWTO."
msgstr ""

#.
#. type: Plain text
#: man-pages/man7/unicode.7:250
msgid "E<.UR http://www.tldp.org\\:/HOWTO\\:/Unicode-HOWTO.html> E<.UE>"
msgstr ""

#. .SH AUTHOR
#. Markus Kuhn
#. type: SH
#: man-pages/man7/unicode.7:253
#, no-wrap
msgid "SEE ALSO"
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:258
msgid "B<locale>(1), B<setlocale>(3), B<charsets>(7), B<utf-8>(7)"
msgstr ""

#. type: SH
#: man-pages/man7/unicode.7:258
#, no-wrap
msgid "COLOPHON"
msgstr ""

#. type: Plain text
#: man-pages/man7/unicode.7:266
msgid ""
"This page is part of release 3.70 of the Linux I<man-pages> project. A "
"description of the project, information about reporting bugs, and the latest "
"version of this page, can be found at \\%http://www.kernel.org/doc/man-pages/"
"."
msgstr ""