Unicode reserves 1,114,112 (= 2^20 + 2^16) code points and currently assigns characters to more than 96,000 of them. The first 256 code points precisely match those of ISO 8859-1, the most popular 8-bit character encoding in the "Western world"; as a result, the first 128 characters are also identical to ASCII. The Unicode code space is divided into 17 "planes", each containing 65,536 (= 2^16) code points. There is much controversy among CJK specialists, particularly Japanese ones, about the desirability and technical merit of the "Han unification" process used to map multiple Chinese and Japanese character sets into a single set of unified characters (see Chinese character encoding). The cap of roughly 2^20 code points exists to maintain compatibility with the UTF-16 encoding, which can address only that range (see below). Only about ten percent of the Unicode code space is currently in use. Furthermore, ranges of characters have been tentatively blocked out for every known unencoded script (see [1] (http://www.unicode.org/roadmaps/)), and while Unicode may need another plane for ideographic characters, the ten remaining unused planes would be needed only if previously unknown scripts with tens of thousands of characters were discovered. This roughly 20-bit limit is therefore unlikely to be reached in the near future.
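The arithmetic above can be checked directly. The following is a minimal Python sketch, not taken from the original article: `utf16_units` is a hypothetical helper written for illustration. It confirms the size of the code space, the ISO 8859-1 overlap, and shows why UTF-16 caps the range: a code point above U+FFFF is stored as a surrogate pair carrying 10 + 10 = 20 payload bits, so the highest reachable code point is 0x10000 + (2^20 − 1) = U+10FFFF.

```python
# The code space: 17 planes of 2**16 code points each.
PLANE_SIZE = 2 ** 16
NUM_PLANES = 17
CODE_SPACE = NUM_PLANES * PLANE_SIZE
assert CODE_SPACE == 2 ** 20 + 2 ** 16 == 1_114_112

# The first 256 code points coincide with ISO 8859-1 (Latin-1):
# byte value b decodes to the code point with the same number.
assert bytes(range(256)).decode("latin-1") == "".join(map(chr, range(256)))


def utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for one code point (illustrative helper)."""
    if not 0 <= cp < CODE_SPACE:
        raise ValueError(f"U+{cp:X} is outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is a surrogate and cannot be encoded alone")
    if cp < PLANE_SIZE:
        return [cp]  # BMP: a single 16-bit unit
    offset = cp - PLANE_SIZE          # 0 .. 2**20 - 1, i.e. exactly 20 bits
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low surrogate
    return [high, low]


# The largest 20-bit offset maps to the last valid pair, so U+10FFFF is the ceiling.
assert utf16_units(0x10FFFF) == [0xDBFF, 0xDFFF]
```

For example, `utf16_units(0x1F600)` returns `[0xD83D, 0xDE00]`, the same pair Python produces for `"\U0001F600".encode("utf-16-be")`.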
Basic Multilingual Plane

The first plane (plane 0), the Basic Multilingual Plane (BMP), is where most characters have been assigned so far. The BMP contains characters for almost all modern languages, as well as a large number of special characters. Most of the allocated code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.
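Because every plane holds exactly 2^16 code points, the plane number of a code point is simply its value shifted right by 16 bits. The sketch below is illustrative only: `plane_of`, `describe`, and the `PLANE_NAMES` table are hypothetical names, and the table covers just the planes named in this article.

```python
# Plane names as used in this article; other planes are left unnamed here.
PLANE_NAMES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
    14: "Supplementary Special-purpose Plane (SSP)",
    15: "Private use (Plane 15)",
    16: "Private use (Plane 16)",
}


def plane_of(cp: int) -> int:
    """Return the plane (0-16) containing code point cp."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:X} is outside the Unicode code space")
    return cp >> 16  # each plane spans 2**16 code points


def describe(ch: str) -> str:
    """Summarize which plane a single character belongs to."""
    cp = ord(ch)
    name = PLANE_NAMES.get(plane_of(cp), "unnamed in this article")
    return f"U+{cp:04X} is in plane {plane_of(cp)}: {name}"
```

For instance, `describe("A")` gives `"U+0041 is in plane 0: Basic Multilingual Plane (BMP)"`, while U+10000 (the first Linear B syllable) lands in plane 1.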
Several scripts are expected to be included in the next revision of Unicode:
Several other scripts are proposed for inclusion in the BMP, including:

Supplementary Multilingual Plane

Plane 1, the Supplementary Multilingual Plane (SMP), is mostly used for historic scripts such as Linear B, but it also holds musical and mathematical symbols. As of Unicode 4.0.1, Plane One includes the following scripts:
Several scripts are expected to be included in the next revision of Unicode:
Many other scripts are proposed for inclusion in Plane One, including:
Private Use Area

A Private Use Area is one of several ranges reserved for private use; the Unicode standard does not assign any characters to these ranges. The Basic Multilingual Plane includes a Private Use Area in the range U+E000–U+F8FF (57344–63743), and Plane Fifteen (U+F0000–U+FFFFF) and Plane Sixteen (U+100000–U+10FFFF) are reserved for private use as well, apart from the last two code points of each of those planes, which are noncharacters. The Private Use Area concept was inherited from certain Asian encoding systems, which used private use areas to encode Japanese gaiji (rare personal-name characters) in application-specific ways. Similarly, the ConScript Unicode Registry aims to coördinate the mapping onto the PUAs of scripts that are not yet encoded in, or have been rejected by, Unicode.

Other planes

Plane 2, the Supplementary Ideographic Plane (SIP), is used for about 40,000 rare Chinese characters; most are historic, although some are in modern use. Plane 14, the Supplementary Special-purpose Plane (SSP), currently contains some non-recommended language tag characters and some variation selectors.
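The private use ranges described above are easy to test for programmatically. This is a minimal sketch under stated assumptions: `is_private_use` and `PRIVATE_USE_RANGES` are hypothetical names, and the Plane 15 and 16 ranges stop at U+FFFFD and U+10FFFD because the final two code points of every plane are noncharacters. Python's standard `unicodedata` module, which reports the general category `"Co"` for private use characters, serves as a cross-check.

```python
import unicodedata

# Private use ranges as inclusive (first, last) code points.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Plane 15 (U+FFFFE and U+FFFFF are noncharacters)
    (0x100000, 0x10FFFD),  # Plane 16 (U+10FFFE and U+10FFFF are noncharacters)
)


def is_private_use(cp: int) -> bool:
    """True if code point cp lies in one of the private use ranges."""
    return any(first <= cp <= last for first, last in PRIVATE_USE_RANGES)


# Cross-check a few boundary points against Python's Unicode database,
# where private use characters have the general category "Co".
for cp in (0xE000, 0xF8FF, 0xF900, 0xF0000, 0xFFFFE, 0x10FFFD, 0x10FFFF):
    assert is_private_use(cp) == (unicodedata.category(chr(cp)) == "Co")

# Total number of private use code points: 6,400 + 65,534 + 65,534.
print(sum(last - first + 1 for first, last in PRIVATE_USE_RANGES))  # 137468
```

Hard-coding the ranges keeps the check independent of any particular Unicode database version; the `unicodedata` comparison is only there to confirm the boundaries.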
Copyright 2008 WordIQ.com

This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Mapping of Unicode characters".