
Standard Compression Scheme for Unicode - Definition and Overview



The Standard Compression Scheme for Unicode (SCSU) is a Unicode Technical Standard (UTS #6) for reducing the number of bytes needed to represent text, especially text that uses mostly characters from a small number of Unicode blocks. It does so by dynamically mapping the byte values 128-255 to movable "windows" of 128 consecutive code points, while most byte values below 128 keep their ASCII meaning. Because the letters of most alphabets lie within 128 contiguous code points, many text files can be encoded at one byte per character, plus a small overhead for the tags that select windows. For scripts with large character sets, such as Chinese, SCSU can switch to a Unicode mode in which characters are stored as UTF-16 code units.
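The window mechanism described above can be sketched as a small encoder. This is a minimal illustration, not a reference implementation: it assumes the tag values and default window offsets of UTS #6 as listed in the comments, and the function name `scsu_encode`, the round-robin choice of which window to redefine, and the rule for entering and leaving Unicode mode are hypothetical simplifications. Real encoders use lookahead to decide when a window switch or mode change pays off; this naive version, for example, leaves Unicode mode for every ASCII character, even a single space between Han characters.

```python
# Naive SCSU encoder sketch (after UTS #6). Assumption: the window-choice and
# mode-switch policy is a hypothetical simplification, not the standard's advice.

SQ0 = 0x01  # quote one character from static window 0 (ASCII / C0 controls)
SCU = 0x0F  # switch to Unicode mode (UTF-16BE code units)
SC0 = 0x10  # SC0..SC7: select dynamic window n
SD0 = 0x18  # SD0..SD7: define dynamic window n (next byte = index), select it
UC0 = 0xE0  # Unicode mode only: UC0..UC7 = back to single-byte mode, window n
UQU = 0xF0  # Unicode mode only: quote a code unit whose high byte looks like a tag

# Offsets of the eight dynamic windows at the start of every SCSU stream.
DEFAULT_WINDOWS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]
PASS_THROUGH = {0x00, 0x09, 0x0A, 0x0D}  # NUL, TAB, LF, CR are sent as-is


def scsu_encode(text: str) -> bytes:
    """Encode text with a small subset of SCSU and return the bytes."""
    windows = list(DEFAULT_WINDOWS)
    active = 0           # dynamic window currently selected
    next_to_define = 0   # hypothetical round-robin choice of window to redefine
    unicode_mode = False
    out = bytearray()

    for ch in text:
        cp = ord(ch)
        # Find a dynamic window containing cp, preferring the active one.
        window = next((n for n in [active, *range(8)]
                       if windows[n] <= cp < windows[n] + 0x80), None)
        # ASCII, window hits, and anything a new window can cover (< U+3400)
        # use single-byte mode; everything else (e.g. Han) uses Unicode mode.
        if cp < 0x3400 or window is not None:
            if unicode_mode:
                out.append(UC0 + active)      # return to single-byte mode
                unicode_mode = False
            if cp in PASS_THROUGH or 0x20 <= cp < 0x80:
                out.append(cp)                # ASCII is stored literally
            elif cp < 0x20:
                out += bytes([SQ0, cp])       # other C0 controls must be quoted
            else:
                if window is None:
                    # Define a window at index cp >> 7, i.e. offset index * 0x80.
                    window = next_to_define
                    next_to_define = (next_to_define + 1) % 8
                    windows[window] = (cp >> 7) << 7
                    out += bytes([SD0 + window, cp >> 7])
                elif window != active:
                    out.append(SC0 + window)  # switch to an existing window
                active = window
                out.append(0x80 + cp - windows[window])
        else:
            if not unicode_mode:
                out.append(SCU)
                unicode_mode = True
            units = ch.encode("utf-16-be")    # one or two 16-bit code units
            for i in range(0, len(units), 2):
                if 0xE0 <= units[i] <= 0xF2:
                    out.append(UQU)           # escape high bytes that are tags
                out += units[i:i + 2]
    return bytes(out)
```

Under these assumptions, `scsu_encode("Мир")` produces an SC2 tag selecting the default Cyrillic window at U+0400 followed by one byte per letter, and Latin-1 text such as "Öl" comes out byte-for-byte identical to ISO-8859-1.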

SCSU has not been widely adopted. Few applications need to compress enough Unicode text to justify a poorly supported compression scheme. Treated purely as a compression format, it is inferior to most commonly used compression programs for texts longer than a few kilobytes. It can be used as a text encoding, but it is awkward to process directly, because the meaning of each byte depends on the tags that precede it. In addition, the percentage savings of SCSU over UTF-16 or UTF-8 shrink once the text is compressed externally, dramatically so with bzip2 and other modern compressors. Its main advantage is that SCSU can compress texts only a few characters long, whereas most general-purpose compressors need a few kilobytes of data to overcome their overhead.
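The point about short texts can be checked with a general-purpose compressor. The sketch below assumes Python's standard `zlib` module as a stand-in for "full-scale compressors" and uses an arbitrary 11-character Russian sample; it compares the raw UTF-8 and UTF-16 sizes with the zlib output. zlib's fixed 2-byte header and 4-byte Adler-32 checksum alone make a string this short grow rather than shrink. Worked out by hand, and assuming the default SCSU window for Cyrillic at U+0400, the same text fits in 12 bytes of SCSU: one tag selecting that window, then one byte per character.

```python
import zlib

# Arbitrary short Cyrillic sample chosen for illustration.
text = "Привет, мир"

utf8 = text.encode("utf-8")       # 2 bytes per Cyrillic letter, 1 per ASCII char
utf16 = text.encode("utf-16-be")  # 2 bytes per character, no byte order mark
deflated = zlib.compress(utf8)    # adds a header and checksum around the data

print("UTF-8:", len(utf8))            # 20
print("UTF-16:", len(utf16))          # 22
print("zlib(UTF-8):", len(deflated))  # larger than the UTF-8 input at this length
```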

Reuters, the organization that proposed the first draft of SCSU, is believed to use SCSU internally.


Copyright 2009 WordIQ.com
This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Standard Compression Scheme for Unicode".