Letter frequencies - Definition and Overview

The frequency of letters in text has often been studied for use in cryptography, and in frequency analysis in particular. An exact ranking is not possible, as each person writes slightly differently; however, an approximate order of frequency in English is ETAOIN SHRDL UCMFG YPWBV KXJQZ.
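As a rough illustration of how such an ordering is obtained, the sketch below counts the letters in a sample text and sorts them by frequency. The function names `letter_frequencies` and `frequency_order` are hypothetical, chosen for this example; only the letters a to z are counted, and case is ignored.

```python
from collections import Counter
import string

def letter_frequencies(text):
    """Return a dict mapping each letter a-z to its relative frequency in text."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    if total == 0:
        # No letters at all: report zero for every letter instead of dividing by zero.
        return {ch: 0.0 for ch in string.ascii_lowercase}
    return {ch: counts[ch] / total for ch in string.ascii_lowercase}

def frequency_order(text):
    """Letters that occur in text, most frequent first (ties broken alphabetically)."""
    freqs = letter_frequencies(text)
    present = [ch for ch in freqs if freqs[ch] > 0]
    return "".join(sorted(present, key=lambda ch: (-freqs[ch], ch)))
```

For example, `frequency_order("banana")` returns `"anb"`. On a short sample the order will usually differ from ETAOIN SHRDLU; only a large and varied body of text can be expected to approach it.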

An analysis based on all the words in the Cambridge Encyclopedia gave a letter frequency order noticeably different from the one found in most lists. From most common to least common, it gave EATIN ORSLH DCMUF PGBYW VKXJZQ. Note that A appeared more often than T. The author stated that the difference from standard lists could be due to the many foreign words repeated within articles. Note, too, that X is more frequent than J in this work.

This brings up an interesting point. Letter frequencies, like word frequencies, tend to vary, both by writer and by subject. You cannot write about x-rays without using many x's, and you cannot use a letter at all if its key on your keyboard is broken. Letter, digraph, trigraph, and word frequencies can be used to prove or disprove authorship. Measures such as average word length and sentence length are also used. Everyone writes differently: Hemingway is not Faulkner, and so on. A precise average usage could only be obtained from a huge mass of differing inputs, for example by analyzing a number of different chatrooms or by covertly checking email.
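The word-length and sentence-length measures mentioned above can be sketched as follows. This is a minimal illustration, not an authorship-attribution method: the function name `style_features` is hypothetical, sentences are assumed to end at `.`, `!` or `?`, and a word is assumed to be any run of ASCII letters.

```python
import re

def style_features(text):
    """Return average word length (in letters) and average sentence length (in words)."""
    # Assumption: sentences are separated by runs of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of ASCII letters; digits and punctuation are ignored.
    words = re.findall(r"[A-Za-z]+", text)
    if not words or not sentences:
        return {"avg_word_length": 0.0, "avg_sentence_length": 0.0}
    return {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
    }
```

Comparing these figures for two samples gives only a coarse signal; in practice they would be combined with the letter, digraph, trigraph, and word frequencies described above.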

Relative Frequencies of Text

Letter  Frequency    Letter  Frequency
a       0.08167      n       0.06749
b       0.01492      o       0.07507
c       0.02782      p       0.01929
d       0.04253      q       0.00095
e       0.12702      r       0.05987
f       0.02228      s       0.06327
g       0.02015      t       0.09056
h       0.06094      u       0.02758
i       0.06966      v       0.00978
j       0.00153      w       0.02360
k       0.00772      x       0.00150
l       0.04025      y       0.01974
m       0.02406      z       0.00074
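One way these relative frequencies are used in cryptography is to break a simple shift (Caesar) cipher: try all 26 shifts and keep the one whose letter counts best match the table. The sketch below uses a chi-squared score for the comparison, which is a common choice but an assumption here, since the text does not name a scoring method. The names `ENGLISH_FREQ`, `shift_text`, `chi_squared`, and `guess_caesar_shift` are hypothetical.

```python
# Relative letter frequencies, copied from the table above.
ENGLISH_FREQ = {
    "a": 0.08167, "b": 0.01492, "c": 0.02782, "d": 0.04253, "e": 0.12702,
    "f": 0.02228, "g": 0.02015, "h": 0.06094, "i": 0.06966, "j": 0.00153,
    "k": 0.00772, "l": 0.04025, "m": 0.02406, "n": 0.06749, "o": 0.07507,
    "p": 0.01929, "q": 0.00095, "r": 0.05987, "s": 0.06327, "t": 0.09056,
    "u": 0.02758, "v": 0.00978, "w": 0.02360, "x": 0.00150, "y": 0.01974,
    "z": 0.00074,
}

def shift_text(text, shift):
    """Shift every ASCII letter by `shift` positions (mod 26); keep other characters."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

def chi_squared(text):
    """Chi-squared distance between the letter counts of text and the table."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    total = len(letters)
    if total == 0:
        return float("inf")
    score = 0.0
    for letter, freq in ENGLISH_FREQ.items():
        expected = freq * total
        observed = letters.count(letter)
        score += (observed - expected) ** 2 / expected
    return score

def guess_caesar_shift(ciphertext):
    """Return the shift (0-25) whose decryption looks most like English."""
    return min(range(26), key=lambda k: chi_squared(shift_text(ciphertext, -k)))
```

A ciphertext made with `shift_text(plaintext, 3)` can then be passed to `guess_caesar_shift`, and the result fed back as `shift_text(ciphertext, -shift)`. The guess is only reliable when the message is long enough for its letter counts to resemble the table; very short or unusual texts can defeat it.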

Top 10 Beginning of Word Letters

Letter  Frequency
t       0.1594
a       0.155
i       0.0823
s       0.0775
o       0.0712
c       0.0597
m       0.0426
f       0.0408
p       0.040
w       0.0382

Top 10 End of Word Letters

Letter  Frequency
e       0.1917
s       0.1435
d       0.0923
t       0.0864
n       0.0786
y       0.0730
r       0.0693
o       0.0467
l       0.0456
f       0.0408
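Figures like those in the two tables above can be computed by looking only at the first or last letter of each word. The following is a minimal sketch under stated assumptions: the function name `edge_letter_frequencies` is hypothetical, and a word is taken to be any run of the letters a to z, so a contraction such as "don't" is split at the apostrophe.

```python
import re
from collections import Counter

def edge_letter_frequencies(text, position="first"):
    """Relative frequency of word-initial or word-final letters, most common first."""
    # Assumption: words are runs of a-z after lowercasing.
    words = re.findall(r"[a-z]+", text.lower())
    if position == "first":
        index = 0
    elif position == "last":
        index = -1
    else:
        raise ValueError("position must be 'first' or 'last'")
    counts = Counter(word[index] for word in words)
    total = sum(counts.values())
    # most_common() yields letters in descending order of count.
    return {letter: n / total for letter, n in counts.most_common()}
```

Taking the first ten items of the returned dict gives a top-10 list in the same form as the tables. The source text used for those tables is not stated, so results on other samples should not be expected to match them exactly.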

Most Common Digrams (in order)

th, he, in, en, nt, re, er, an, ti, es, on, at, se, nd, or, ar, al, te, co, de, to, ra, et, ed, it, sa, em, ro.

Most Common Trigrams (in order)

the, and, tha, ent, ing, ion, tio, for, nde, has, nce, edt, tis, oft, sth, men
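Digram and trigram lists like the two above come from counting every run of two or three consecutive letters. Entries such as "edt" and "oft" suggest that spaces were removed before counting, so that sequences spanning two words are included; that is an inference from the list, not something the text states. The sketch below supports both conventions through a hypothetical `across_words` flag, and the function name `ngram_counts` is likewise made up for this example.

```python
import re
from collections import Counter

def ngram_counts(text, n, across_words=True):
    """Count letter n-grams; optionally let them span word boundaries."""
    words = re.findall(r"[a-z]+", text.lower())
    # With across_words=True the words are joined first, so "used to"
    # contributes the trigram "edt"; otherwise each word is counted alone.
    chunks = ["".join(words)] if across_words else words
    counts = Counter()
    for chunk in chunks:
        for i in range(len(chunk) - n + 1):
            counts[chunk[i:i + n]] += 1
    return counts
```

Calling `ngram_counts(text, 2).most_common(28)` or `ngram_counts(text, 3).most_common(16)` yields ranked lists of the same length as those shown, though the ranking depends heavily on the sample text.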

See Also

ETAOIN SHRDLU

This article is licensed under the GNU Free Documentation License. It uses material from this Wikipedia article.