UTF-32 - Definition and Overview

UTF-32 is a method of encoding Unicode characters that uses exactly 32 bits for each code point. It can be regarded as the simplest Unicode encoding, since all other Unicode Transformation Formats use a variable number of bytes depending on the character. A notable drawback, however, is that UTF-32 requires up to four times the storage space of traditional encodings: text that fits in one byte per character takes four bytes per character in UTF-32. For this reason it is rarely used for external storage and is mostly used internally, where character handling must be as efficient as possible.
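The fixed width and the storage cost can be illustrated with a minimal Python sketch. It relies on Python's built-in "utf-32-be" codec (big-endian, no byte order mark). The sample string and the helper name code_point_at are chosen here for illustration only.

```python
# Four code points that need 1, 2, 3 and 4 bytes respectively in UTF-8:
# U+0061 (a), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (emoji).
text = "a\u00e9\u20ac\U0001F600"

utf32 = text.encode("utf-32-be")  # fixed width: 4 bytes per code point
utf8 = text.encode("utf-8")       # variable width: 1 to 4 bytes per code point

# Split the UTF-32 bytes into 32-bit units; each unit is one code point.
units = [int.from_bytes(utf32[i:i + 4], "big") for i in range(0, len(utf32), 4)]


def code_point_at(data: bytes, index: int) -> int:
    """Return the code point at position `index` in big-endian UTF-32 data.

    Hypothetical helper: because every code point occupies 4 bytes,
    the byte offset is simply 4 * index, with no scanning required.
    """
    offset = 4 * index
    return int.from_bytes(data[offset:offset + 4], "big")


print(len(utf32), len(utf8))            # byte counts of the two encodings
print([hex(u) for u in units])          # the code points, one per 32-bit unit
print(hex(code_point_at(utf32, 3)))     # direct access to the fourth code point
```

The constant-time lookup in code_point_at is the kind of efficient character handling that makes the encoding attractive internally, while the byte counts show why it is costly for storage.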

UCS-4

ISO 10646 defines a 32-bit encoding form called UCS-4, in which each encoded character in the Universal Character Set (UCS) is represented by a 32-bit code value in the code space of integers between 0 and hexadecimal 7FFFFFFF.

UCS-4 is more than sufficient to represent the entire Unicode code space, which has 1,114,112 (= 2^20 + 2^16) code points and therefore requires code values only up to hexadecimal 10FFFF. Some people consider it wasteful to reserve such a large code space for mapping a relatively small set of code points, so a new encoding form, UTF-32, was proposed. UTF-32 is a subset of UCS-4 that uses 32-bit code values only in the code space from 0 to hexadecimal 10FFFF.
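The arithmetic and the two ranges can be checked with a short Python sketch. The constant and function names are hypothetical, and the functions test only the numeric ranges described above, not any further Unicode rules.

```python
UNICODE_CODE_POINTS = 2**20 + 2**16   # size of the Unicode code space
UTF32_MAX = 0x10FFFF                  # highest UTF-32 code value
UCS4_MAX = 0x7FFFFFFF                 # highest UCS-4 code value


def in_ucs4_range(value: int) -> bool:
    """True if `value` lies in the UCS-4 code space, 0 to 7FFFFFFF."""
    return 0 <= value <= UCS4_MAX


def in_utf32_range(value: int) -> bool:
    """True if `value` lies in the UTF-32 code space, 0 to 10FFFF."""
    return 0 <= value <= UTF32_MAX


# The code space runs from 0 to 10FFFF inclusive, hence the "+ 1".
print(UNICODE_CODE_POINTS == UTF32_MAX + 1)

# A value just above 10FFFF is still a UCS-4 value but not a UTF-32 one.
print(in_ucs4_range(0x110000), in_utf32_range(0x110000))
```

Every value accepted by in_utf32_range is also accepted by in_ucs4_range, which is what it means for UTF-32 to be a subset of UCS-4.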

UTF-32 and UCS-4

UTF-32 was originally a subset of the UCS-4 standard. However, the Principles and Procedures document of JTC1/SC2/WG2 states that all future assignments of characters will be constrained to the BMP or the first 14 supplementary planes, and it has removed the former provisions for private-use code positions in groups 60 to 7F and in planes E0 to FF.

Accordingly, UCS-4 and UTF-32 can now be taken to be identical, except that the UTF-32 standard has additional Unicode semantics that must be observed.

This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "UTF-32".