#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: man-pages-l10n VERSION\n"
"POT-Creation-Date: 2014-07-17 12:07+0900\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#. type: TH
#: man-pages/man7/charsets.7:21
#, no-wrap
msgid "CHARSETS"
msgstr ""

#. type: TH
#: man-pages/man7/charsets.7:21
#, no-wrap
msgid "2014-05-28"
msgstr ""

#. type: TH
#: man-pages/man7/charsets.7:21
#, no-wrap
msgid "Linux"
msgstr ""

#. type: TH
#: man-pages/man7/charsets.7:21
#, no-wrap
msgid "Linux Programmer's Manual"
msgstr ""

#. type: SH
#: man-pages/man7/charsets.7:22
#, no-wrap
msgid "NAME"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:24
msgid ""
"charsets - programmer's view of character sets and internationalization"
msgstr ""

#. type: SH
#: man-pages/man7/charsets.7:24
#, no-wrap
msgid "DESCRIPTION"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:31
msgid ""
"Linux is an international operating system.  Various of its utilities and "
"device drivers (including the console driver) support multilingual character "
"sets including Latin-alphabet letters with diacritical marks, accents, "
"ligatures, and entire non-Latin alphabets including Greek, Cyrillic, Arabic, "
"and Hebrew."
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:40
msgid ""
"This manual page presents a programmer's-eye view of different character-set "
"standards and how they fit together on Linux.  Standards discussed include "
"ASCII, ISO 8859, KOI8-R, Unicode, ISO 2022 and ISO 4873.  The primary "
"emphasis is on character sets actually used as locale character sets, not "
"the myriad others that can be found in data from other systems."
msgstr ""

#. type: SS
#: man-pages/man7/charsets.7:40
#, no-wrap
msgid "ASCII"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:44
msgid ""
"ASCII (American Standard Code for Information Interchange) is the original 7-"
"bit character set, originally designed for American English.  It is "
"currently described by the ECMA-6 standard."
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:53
msgid ""
"Various 7-bit ASCII variants exist that replace the dollar sign with other "
"currency symbols and replace punctuation with non-English alphabetic "
"characters to cover German, French, Spanish, and other languages.  All are "
"deprecated; glibc doesn't support locales whose character sets aren't true "
"supersets of ASCII.  (These sets are also known as ISO-646, a close relative "
"of ASCII that permitted replacing these characters.)"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:56
msgid ""
"As Linux was written for hardware designed in the US, it natively supports "
"ASCII."
msgstr ""

#. type: SS
#: man-pages/man7/charsets.7:56
#, no-wrap
msgid "ISO 8859"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:60
msgid ""
"ISO 8859 is a series of 15 8-bit character sets, all of which have US ASCII "
"in their low (7-bit) half, invisible control characters in positions 128 to "
"159, and 96 fixed-width graphics in positions 160 to 255."
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:65
msgid ""
"Of these, the most important is ISO 8859-1 (Latin-1).  It is natively "
"supported in the Linux console driver, fairly well supported in X11R6, and "
"is the base character set of HTML."
msgstr ""

#.  // some distributions still have the deprecated consolechars
#. type: Plain text
#: man-pages/man7/charsets.7:73
msgid ""
"Console support for the other 8859 character sets is available under Linux "
"through user-mode utilities (such as B<setfont>(8))  that modify keyboard "
"bindings and the EGA graphics table and employ the \"user mapping\" font "
"table in the console driver."
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:75
msgid "Here are brief descriptions of each set:"
msgstr ""

#. type: TP
#: man-pages/man7/charsets.7:75
#, no-wrap
msgid "8859-1 (Latin-1)"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:83
msgid ""
"Latin-1 covers most Western European languages such as Albanian, Catalan, "
"Danish, Dutch, English, Faroese, Finnish, French, German, Galician, Irish, "
"Icelandic, Italian, Norwegian, Portuguese, Spanish, and Swedish.  The lack "
"of the ligatures Dutch ij, French oe and old-style ,,German`` quotation "
"marks is considered tolerable."
msgstr ""

#. type: TP
#: man-pages/man7/charsets.7:83
#, no-wrap
msgid "8859-2 (Latin-2)"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:88
msgid ""
"Latin-2 supports most Latin-written Slavic and Central European languages: "
"Croatian, Czech, German, Hungarian, Polish, Romanian, Slovak, and Slovene."
msgstr ""

#. type: TP
#: man-pages/man7/charsets.7:88
#, no-wrap
msgid "8859-3 (Latin-3)"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:92
msgid ""
"Latin-3 is popular with authors of Esperanto, Galician, and Maltese.  "
"(Turkish is now written with 8859-9 instead.)"
msgstr ""

#. type: TP
#: man-pages/man7/charsets.7:92
#, no-wrap
msgid "8859-4 (Latin-4)"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:96
msgid ""
"Latin-4 introduced letters for Estonian, Latvian, and Lithuanian.  It is "
"essentially obsolete; see 8859-10 (Latin-6) and 8859-13 (Latin-7)."
msgstr ""

#. type: TP
#: man-pages/man7/charsets.7:96
#, no-wrap
msgid "8859-5"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:104
msgid ""
"Cyrillic letters supporting Bulgarian, Byelorussian, Macedonian, Russian, "
"Serbian, and Ukrainian.  Ukrainians read the letter \"ghe\" with downstroke "
"as \"heh\" and would need a ghe with upstroke to write a correct ghe.  See "
"the discussion of KOI8-R below."
msgstr ""

#. type: TP
#: man-pages/man7/charsets.7:104
#, no-wrap
msgid "8859-6"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:110
msgid ""
"Supports Arabic.  The 8859-6 glyph table is a fixed font of separate letter "
"forms, but a proper display engine should combine these using the proper "
"initial, medial, and final forms."
msgstr ""

#. type: TP
#: man-pages/man7/charsets.7:110
#, no-wrap
msgid "8859-7"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:113
msgid "Supports Modern Greek."
msgstr ""

#. type: TP
#: man-pages/man7/charsets.7:113
#, no-wrap
msgid "8859-8"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:119
msgid ""
"Supports modern Hebrew without niqud (punctuation signs).  Niqud and full-"
"fledged Biblical Hebrew are outside the scope of this character set; under "
"Linux, UTF-8 is the preferred encoding for these."
msgstr ""

#. type: TP
#: man-pages/man7/charsets.7:119
#, no-wrap
msgid "8859-9 (Latin-5)"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:123
msgid ""
"This is a variant of Latin-1 that replaces Icelandic letters with Turkish "
"ones."
msgstr ""

#. type: TP
#: man-pages/man7/charsets.7:123
#, no-wrap
msgid "8859-10 (Latin-6)"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:130
msgid ""
"Latin 6 adds the last Inuit (Greenlandic) and Sami (Lappish) letters that "
"were missing in Latin 4 to cover the entire Nordic area.  RFC 1345 listed a "
"preliminary and different \"latin6\".  Skolt Sami still needs a few more "
"accents than these."
msgstr ""

#. type: TP
#: man-pages/man7/charsets.7:130
#, no-wrap
msgid "8859-11"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:135
msgid ""
"Supports the Thai alphabet and is nearly identical to the TIS-620 standard, "
"which is used under Linux for Thai."
msgstr ""

#. type: TP
#: man-pages/man7/charsets.7:135
#, no-wrap
msgid "8859-12"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:143
msgid ""
"This set does not exist.  While Vietnamese has been suggested for this "
"space, it does not fit within the 96 (noncombining) characters ISO 8859 "
"offers.  UTF-8 is the preferred character set for Vietnamese use under "
"Linux."
msgstr ""

#. type: TP
#: man-pages/man7/charsets.7:143
#, no-wrap
msgid "8859-13 (Latin-7)"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:147
msgid ""
"Supports the Baltic Rim languages; in particular, it includes Latvian "
"characters not found in Latin-4."
msgstr ""

#. type: TP
#: man-pages/man7/charsets.7:147
#, no-wrap
msgid "8859-14 (Latin-8)"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:151
msgid ""
"This is the Celtic character set, covering Gaelic and Welsh.  This charset "
"also contains the dotted characters needed for Old Irish."
msgstr ""

#. type: TP
#: man-pages/man7/charsets.7:151
#, no-wrap
msgid "8859-15 (Latin-9)"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:155
msgid ""
"This adds the Euro sign and French and Finnish letters that were missing in "
"Latin-1."
msgstr ""

#. type: TP
#: man-pages/man7/charsets.7:155
#, no-wrap
msgid "8859-16 (Latin-10)"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:159
msgid ""
"This set covers many of the languages covered by 8859-2, and supports "
"Romanian more completely than that set does."
msgstr ""

#. type: SS
#: man-pages/man7/charsets.7:159
#, no-wrap
msgid "KOI8-R"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:168
msgid ""
"KOI8-R is a non-ISO character set popular in Russia.  The lower half is US "
"ASCII; the upper is a Cyrillic character set somewhat better designed than "
"ISO 8859-5.  KOI8-U is a common character set, based on KOI8-R, that has "
"better support for Ukrainian.  Neither of these sets is ISO-2022 "
"compatible, unlike the ISO-8859 series."
msgstr ""

#.  Thanks to Tomohiro KUBOTA for the following sections about
#.  national standards.
#. type: Plain text
#: man-pages/man7/charsets.7:174
msgid ""
"Console support for KOI8-R is available under Linux through user-mode "
"utilities that modify keyboard bindings and the EGA graphics table, and "
"employ the \"user mapping\" font table in the console driver."
msgstr ""

#. type: SS
#: man-pages/man7/charsets.7:174
#, no-wrap
msgid "JIS X 0208"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:191
msgid ""
"JIS X 0208 is a Japanese national standard character set.  Although there "
"are other Japanese national standard character sets (such as JIS X 0201, "
"JIS X 0212, and JIS X 0213), this is the most important one.  Characters are "
"mapped into a 94x94 two-byte matrix, each byte of which is in the range "
"0x21-0x7e.  Note that JIS X 0208 is a character set, not an encoding.  This "
"means "
"that JIS X 0208 itself is not used for expressing text data.  JIS X 0208 is "
"used as a component to construct encodings such as EUC-JP, Shift_JIS, and "
"ISO-2022-JP.  EUC-JP is the most important encoding for Linux and includes "
"US ASCII and JIS X 0208.  In EUC-JP, JIS X 0208 characters are expressed in "
"two bytes, each of which is the JIS X 0208 code plus 0x80."
msgstr ""

#. type: SS
#: man-pages/man7/charsets.7:191
#, no-wrap
msgid "KS X 1001"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:200
msgid ""
"KS X 1001 is a Korean national standard character set.  As in JIS X 0208, "
"characters are mapped into a 94x94 two-byte matrix.  KS X 1001 is used like "
"JIS X 0208, as a component to construct encodings such as EUC-KR, Johab, and "
"ISO-2022-KR.  EUC-KR is the most important encoding for Linux and includes "
"US ASCII and KS X 1001.  KS C 5601 is an older name for KS X 1001."
msgstr ""

#. type: SS
#: man-pages/man7/charsets.7:200
#, no-wrap
msgid "GB 2312"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:209
msgid ""
"GB 2312 is a mainland Chinese national standard character set used to "
"express simplified Chinese.  As in JIS X 0208, characters are mapped "
"into a 94x94 two-byte matrix used to construct EUC-CN.  EUC-CN is the most "
"important encoding for Linux and includes US ASCII and GB 2312.  Note that "
"EUC-CN is often called GB, GB 2312, or CN-GB."
msgstr ""

#. type: SS
#: man-pages/man7/charsets.7:209
#, no-wrap
msgid "Big5"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:218
msgid ""
"Big5 is a popular character set in Taiwan for expressing traditional "
"Chinese.  (Big5 is both a character set and an encoding.)  It is a superset "
"of US ASCII.  Non-ASCII characters are expressed in two bytes.  Bytes "
"0xa1-0xfe are used as leading bytes for two-byte characters.  Big5 and its "
"extensions are widely used in Taiwan and Hong Kong.  Big5 is not ISO "
"2022-compliant."
msgstr ""

#. type: SS
#: man-pages/man7/charsets.7:218
#, no-wrap
msgid "TIS 620"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:225
msgid ""
"TIS 620 is a Thai national standard character set and a superset of US "
"ASCII.  As in the ISO 8859 series, Thai characters are mapped into 0xa1-"
"0xfe.  TIS 620 is the only commonly used character set under Linux besides "
"UTF-8 to have combining characters."
msgstr ""

#. type: SS
#: man-pages/man7/charsets.7:225
#, no-wrap
msgid "UNICODE"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:237
msgid ""
"Unicode (ISO 10646) is a standard which aims to unambiguously represent "
"every character in every human language.  Unicode's structure permits 20.1 "
"bits to encode every character.  Since most computers don't include 20.1-bit "
"integers, Unicode is usually encoded as 32-bit integers internally and "
"either a series of 16-bit integers (UTF-16) (needing two 16-bit integers "
"only when encoding certain rare characters) or a series of 8-bit bytes (UTF-"
"8).  Information on Unicode is available at E<.UR http://www.unicode.org> E<."
"UE .>"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:244
msgid ""
"Linux represents Unicode using the 8-bit Unicode Transformation Format (UTF-"
"8).  UTF-8 is a variable length encoding of Unicode.  It uses 1 byte to code "
"7 bits, 2 bytes for 11 bits, 3 bytes for 16 bits, 4 bytes for 21 bits, 5 "
"bytes for 26 bits, 6 bytes for 31 bits."
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:252
msgid ""
"Let 0,1,x stand for a zero, one, or arbitrary bit.  A byte 0xxxxxxx stands "
"for the Unicode 00000000 0xxxxxxx which codes the same symbol as the ASCII "
"0xxxxxxx.  Thus, ASCII goes unchanged into UTF-8, and people using only "
"ASCII do not notice any change: not in code, and not in file size."
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:260
msgid ""
"A byte 110xxxxx is the start of a 2-byte code, and 110xxxxx 10yyyyyy is "
"assembled into 00000xxx xxyyyyyy.  A byte 1110xxxx is the start of a 3-byte "
"code, and 1110xxxx 10yyyyyy 10zzzzzz is assembled into xxxxyyyy yyzzzzzz.  "
"(When UTF-8 is used to code the 31-bit ISO 10646 then this progression "
"continues up to 6-byte codes.)"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:274
msgid ""
"For most people who use ISO-8859 character sets, this means that the "
"characters outside of ASCII are now coded with two bytes.  This tends to "
"expand ordinary text files by only one or two percent.  For Russian or Greek "
"users, this expands ordinary text files by 100%, since text in those "
"languages is mostly outside of ASCII.  For Japanese users this means that "
"the 16-bit codes now in common use will take three bytes.  While there are "
"algorithmic conversions from some character sets (especially ISO-8859-1) to "
"Unicode, general conversion requires carrying around conversion tables, "
"which can be quite large for 16-bit codes."
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:281
msgid ""
"Note that UTF-8 is self-synchronizing: 10xxxxxx is a tail, any other byte is "
"the head of a code.  Note that the only way ASCII bytes occur in a UTF-8 "
"stream is as themselves.  In particular, there are no embedded NULs "
"(\\(aq\\e0\\(aq) or \\(aq/\\(aqs that form part of some larger code."
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:286
msgid ""
"Since ASCII, and, in particular, NUL and \\(aq/\\(aq, are unchanged, the "
"kernel does not notice that UTF-8 is being used.  It does not care at all "
"what the bytes it is handling stand for."
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:295
msgid ""
"Rendering of Unicode data streams is typically handled through \"subfont\" "
"tables which map a subset of Unicode to glyphs.  Internally the kernel uses "
"Unicode to describe the subfont loaded in video RAM.  This means that in UTF-"
"8 mode one can use a character set with 512 different symbols.  This is not "
"enough for Japanese, Chinese and Korean, but it is enough for most other "
"purposes."
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:300
msgid ""
"At the current time, the console driver does not handle combining "
"characters.  As a result, Thai, Sioux, and any other script needing "
"combining characters can't be handled on the console."
msgstr ""

#. type: SS
#: man-pages/man7/charsets.7:300
#, no-wrap
msgid "ISO 2022 and ISO 4873"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:307
msgid ""
"The ISO 2022 and 4873 standards describe a font-control model based on VT100 "
"practice.  This model is (partially) supported by the Linux kernel and by "
"B<xterm>(1).  It is popular in Japan and Korea."
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:317
msgid ""
"There are 4 graphic character sets, called G0, G1, G2, and G3, and one of "
"them is the current character set for codes with high bit zero (initially "
"G0), and one of them is the current character set for codes with high bit "
"one (initially G1).  Each graphic character set has 94 or 96 characters, and "
"is essentially a 7-bit character set.  It uses codes either 040-0177 (041-"
"0176) or 0240-0377 (0241-0376).  G0 always has size 94 and uses codes 041-"
"0176."
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:328
msgid ""
"Switching between character sets is done using the shift functions B<^N> (SO "
"or LS1), B<^O> (SI or LS0), ESC n (LS2), ESC o (LS3), ESC N (SS2), ESC O "
"(SS3), ESC ~ (LS1R), ESC } (LS2R), ESC | (LS3R).  The function LSI<n> makes "
"character set GI<n> the current one for codes with high bit zero.  The "
"function LSI<n>R makes character set GI<n> the current one for codes with "
"high bit one.  The function SSI<n> makes character set GI<n> (I<n>=2 or 3)  "
"the current one for the next character only (regardless of the value of its "
"high order bit)."
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:340
msgid ""
"A 94-character set is designated as the GI<n> character set by an escape "
"sequence ESC ( xx (for G0), ESC ) xx (for G1), ESC * xx (for G2), ESC + xx "
"(for G3), where xx is a symbol or a pair of symbols found in the ISO 2375 "
"International Register of Coded Character Sets.  For example, ESC ( @ "
"selects the ISO 646 character set as G0, ESC ( A selects the UK standard "
"character set (with pound instead of number sign), ESC ( B selects ASCII "
"(with dollar instead of currency sign), ESC ( M selects a character set for "
"African languages, ESC ( ! A selects the Cuban character set, and so on."
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:345
msgid ""
"A 96-character set is designated as the GI<n> character set by an escape "
"sequence ESC - xx (for G1), ESC . xx (for G2)  or ESC / xx (for G3).  For "
"example, ESC - G selects the Hebrew alphabet as G1."
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:352
msgid ""
"A multibyte character set is designated as the GI<n> character set by an "
"escape "
"sequence ESC $ xx or ESC $ ( xx (for G0), ESC $ ) xx (for G1), ESC $ * xx "
"(for G2), ESC $ + xx (for G3).  For example, ESC $ ( C selects the Korean "
"character set for G0.  The Japanese character set selected by ESC $ B has a "
"more recent version selected by ESC & @ ESC $ B."
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:359
msgid ""
"ISO 4873 stipulates a narrower use of character sets, where G0 is fixed "
"(always ASCII), so that G1, G2 and G3 can be invoked only for codes with the "
"high order bit set.  In particular, B<^N> and B<^O> are not used anymore, "
"ESC ( xx can be used only with xx=B, and ESC ) xx, ESC * xx, ESC + xx are "
"equivalent to ESC - xx, ESC . xx, ESC / xx, respectively."
msgstr ""

#. type: SH
#: man-pages/man7/charsets.7:359
#, no-wrap
msgid "SEE ALSO"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:367
msgid ""
"B<console>(4), B<console_codes>(4), B<console_ioctl>(4), B<ascii>(7), "
"B<iso_8859-1>(7), B<unicode>(7), B<utf-8>(7)"
msgstr ""

#. type: SH
#: man-pages/man7/charsets.7:367
#, no-wrap
msgid "COLOPHON"
msgstr ""

#. type: Plain text
#: man-pages/man7/charsets.7:375
msgid ""
"This page is part of release 3.68 of the Linux I<man-pages> project.  A "
"description of the project, information about reporting bugs, and the latest "
"version of this page, can be found at \\%http://www.kernel.org/doc/man-pages/"
"."
msgstr ""