Appendix A: Letter Frequency Analysis
First of all, to be clear about terms: by the frequency count of a symbol in
a given text, such as a letter in a ciphertext, we mean the number of
occurrences of that symbol in the text. By the frequency distribution of a
symbol (or of a group of symbols, such as a digram or trigram) we mean the
ratio of its frequency count to the total number of symbols in the body of
text under consideration. For instance, we will concentrate upon English
language texts, and compare the frequency distribution of given groups of
symbols in a cryptogram with the frequency distribution of those symbols over
a large body of English text. However, it must be stressed that no table (and
there are many of them) can be definitive, since no table is capable of taking
into account every kind of English text. Nevertheless, there are some
commonalities which will serve as lampposts to guide us in our cryptographic
journey throughout the text. For instance, the following are the most common
words, in order of frequency distribution.


THE, OF, ARE, I, AND, YOU, A, CAN, TO, HE,
HER, THAT, IN, WAS, IS, HAS, IT, HIM, HIS
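
The definitions of frequency count and frequency distribution given above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name letter_frequencies and the sample string are hypothetical, only the letters A to Z are counted, and each ratio is taken against the total number of letters in the same text.

```python
from collections import Counter

def letter_frequencies(text):
    """Return (counts, ratios) for the letters A-Z occurring in text."""
    # Assumption: case is ignored and every non-letter symbol is dropped.
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    counts = Counter(letters)          # frequency count of each letter
    total = len(letters)               # total number of letters counted
    # Frequency distribution: count divided by the total number of letters.
    ratios = {ch: n / total for ch, n in counts.items()} if total else {}
    return counts, ratios

counts, ratios = letter_frequencies("Attack at dawn")
print(counts["A"], counts["T"])        # 4 3
print(round(ratios["A"], 3))           # 0.333
```

Comparing such ratios for a cryptogram against a table compiled from a large body of English text is the basic step of the analysis described here.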

The following are the most common letters to end a word, in order of
frequency distribution. This is an example of positional frequency, wherein
the frequency count of a given letter in a given position is taken in ratio
with the total number of letters occurring in that position over all English
texts.

E, T, S, D, N, R, Y
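
Positional frequency, as just described, can be sketched as follows. The function name positional_frequencies and the sample sentence are hypothetical, and splitting words on runs of the letters A to Z is a simplifying assumption (a word such as "don't" would be split in two).

```python
import re
from collections import Counter

def positional_frequencies(text, index=-1):
    """Ratios of the letters found at one word position (default: last)."""
    # Assumption: a word is any maximal run of the letters A-Z.
    words = re.findall(r"[A-Z]+", text.upper())
    # Skip words too short to have a letter at the requested position.
    counts = Counter(w[index] for w in words if -len(w) <= index < len(w))
    total = sum(counts.values())
    # most_common() lists letters from most to least frequent.
    return [(ch, n / total) for ch, n in counts.most_common()]

sample = "the cat sat on the mat"
last_letters = positional_frequencies(sample)            # final letters
first_letters = positional_frequencies(sample, index=0)  # initial letters
print(last_letters[0])                                   # ('T', 0.5)
```

A sample this small says nothing about English in general; the same function applied to a large corpus would be needed to reproduce a list like the one above.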

However, the frequency distribution of letters at the beginnings of words is
different. More English words begin with the letter S than with any other
letter, whereas the letter E is about halfway into the list and X is last.
The most common digrams in the English language, ordered by frequency
distribution, are:


TH IN ER RE AN HE AR EN TI
TE AT ON HA OU IT ES ST OR

The most common trigrams are given as follows.



THE AND THA HAT ENT ION FOR TIO HAS
EDT TIS ERS RES TER CON ING MEN THO
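
Digram and trigram counts of the kind listed above can be gathered with a sliding window. The sketch below is an illustration only: the name ngram_counts is hypothetical, and it assumes that spaces and punctuation are removed first, so that an n-gram may span a word boundary. Entries such as EDT suggest the list above was compiled that way, but the text does not say so.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    """Count every run of n consecutive letters in text."""
    # Assumption: non-letters are dropped, so n-grams may cross word breaks.
    letters = re.sub(r"[^A-Z]", "", text.upper())
    # Slide a window of width n across the letter string.
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))

digrams = ngram_counts("the theme", 2)
trigrams = ngram_counts("the theme", 3)
print(digrams["TH"], trigrams["THE"])   # 2 2
```

Calling most_common() on the returned counter orders the n-grams by frequency count, as the lists in this appendix are ordered.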



© 2003 by CRC Press LLC

Consider the following table, where letters are ordered by frequency count in
sets of printer's type. The row below each row of letters gives the frequency
count of the individual letters.

Table A.1
E        T        A        I        N        O        S        H        R
12,000   9,000    8,000    8,000    8,000    8,000    8,000    6,400    6,200
D        L        U        C        M        F        W        Y        G
4,400    4,000    3,400    3,000    3,000    2,500    2,000    2,000    1,700
P        B        V        K        Q        J        X        Z
1,700    1,600    1,200    800      500      400      400      200

Table A.1 was originally given by Samuel Morse (see Footnote A.1). He was
primarily concerned with knowing the frequency of letters so that he could
give the simplest codes to the most frequently used letters. However, it
should be noted that Table A.1 gives the frequency of letters in an English
text, which is dominated by a relatively small number of common words. In
various tables, the order of the letters varies in terms of their frequency
distributions. However, E is always the first and T is always the second. In
general, the letters

E, T, A, I, N, O, S, H, R

the first row in Table A.1, make up about 70% of English text. In the case of
Morse's Table A.1, the counts as listed give 73,600 of 106,400, or about 69%.
Some tables are better suited to specialized situations, such as that
encountered by Morse.
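
The share of the first row can be checked directly from the counts in Table A.1. The short computation below uses only the figures as they appear in the table; the names MORSE_COUNTS and FIRST_ROW are hypothetical. With the counts as listed, the nine letters account for 73,600 of 106,400, which is about 69%.

```python
# Frequency counts copied from Table A.1 (sets of printer's type).
MORSE_COUNTS = {
    "E": 12000, "T": 9000, "A": 8000, "I": 8000, "N": 8000,
    "O": 8000, "S": 8000, "H": 6400, "R": 6200,
    "D": 4400, "L": 4000, "U": 3400, "C": 3000, "M": 3000,
    "F": 2500, "W": 2000, "Y": 2000, "G": 1700,
    "P": 1700, "B": 1600, "V": 1200, "K": 800,
    "Q": 500, "J": 400, "X": 400, "Z": 200,
}
FIRST_ROW = "ETAINOSHR"  # the nine letters of the first row

total = sum(MORSE_COUNTS.values())
first_row_total = sum(MORSE_COUNTS[ch] for ch in FIRST_ROW)
first_row_share = first_row_total / total

print(total, first_row_total)      # 106400 73600
print(f"{first_row_share:.1%}")    # 69.2%
```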
With all this being said, for our purposes, the above tables will provide a
working template. Moreover, the concrete facts, such as the predominance of E
and T, as well as the predominance of the letters displayed above will serve us
well in our trip through the text.

A.1 Samuel Finley Breese Morse (1791–1872) was born on April 27 in Charlestown,
Massachusetts to Reverend Jedidiah Morse and Elizabeth Breese. Jedidiah was also known as
the “father of American geography” and was author of the first text on the subject,
“Geography Made Easy”, published in 1784, which saw twenty-five editions in his lifetime.
Samuel attended Phillips Academy in Andover, Massachusetts, then entered Yale College in
1805, graduating in 1810. In 1811, he left for England to study painting, and when he returned
in 1815, he became a well-known wayfaring portrait painter, settling in New York in 1825.
He founded the “National Academy of Design” and served as its first president from 1826 to
1845. Although Morse had no formal training in electricity, he nevertheless came to realize
that pulses of electrical current could be used to convey information over wires. In 1832, he
first conceived of a telegraph and had a complete working model by 1837. The telegraph was,
incidentally, independently and almost simultaneously developed by the two British inventors
Sir William Cooke and Sir Charles Wheatstone (see Footnote 1.8 on page 19). They took out a
joint patent in 1837 for the first electric telegraph put into practical use by the British
railway system. By 1838, Samuel had invented the Morse Code, and in 1854, his patent rights
were upheld by the U.S. Supreme Court. The first telegraph line in America was established
between Baltimore and Washington, and the first message was sent May 24, 1844: “What hath
God wrought?” By 1861, the U.S. was linked coast-to-coast by telegraph. Morse died April 2,
1872 in New York City.


