---[  Phrack Magazine   Volume 7, Issue 51 September 01, 1997, article 13 of 17
-------------------------[  Monoalphabetic Cryptanalysis (Cyphers, Part One)
--------[  Jeff Thompson aka 'Mythrandir' 
Written for Phrack and completed on Sunday, August 31st, 1997.
---------
First a quick hello to all of those I met at DefCon this year.  It was 
incredible fun to finally put faces to many of the people I have been talking 
with for some time.  It truly was a treat to meet so many others who are 
alive with the spirit of discovery.  
----------
This is the first in a series of articles on cryptology that I am writing.  
The goal of these articles is to convey some of the excitement and fun of 
cyphers.  Much of the current discussion of cryptography centers on 
computer-based cyphers such as DES and RSA, and on implementations such as 
PGP.  I will not be discussing these.  Rather, these articles will cover what 
I will term classical cryptology: cryptology as it existed before fast 
number-crunching machines came into existence.  These are the sorts of cyphers 
which have interested cryptographers throughout history, and they continue to 
be found to this very day.  Even now, companies are producing software whose 
encryption methods are attackable.  You will find these commonly among 
password protection schemes for software programs.  Through the course of these
articles I will explain in practical terms several common cypher types and 
various implementations of them, as well as cryptanalytic techniques for 
breaking these cyphers.
Creating cyphers is fun and all, but the real excitement, and oftentimes the 
tedium, is found in cryptanalysis.  Many of the ideas presented in these 
articles will be based on three sources.  The first two are books: The 
Codebreakers by David Kahn (ISBN: 0-684-83130-9) and Decrypted Secrets by 
F.L. Bauer (ISBN: 3-540-60418-9).  The authors have put together wonderful 
books which cover both the history and the methods of cryptology.  Do yourself 
and the authors a favor and purchase these books.  You will be very pleased 
with the lot.  Finally, a minuscule amount of these articles will be based on 
my own personal experience.  
The fun is in the journey and I welcome you on what is certain to be an 
interesting trip.  Please feel free to raise questions, engage me in 
discussions, correct me, or simply offer suggestions at [email protected].
Please be patient with me, as I am currently traveling extensively and may 
occasionally be away from the computer at length.  
Out the door and into the wild...
--Monoalphabetic Cyphers
Monoalphabetic cyphers are commonly found today in simple cryptograms in books
and magazines.  These are just simple substitution cyphers.  This does not 
mean that they are always simple for the beginning amateur to solve.
Three common kinds of monoalphabetic cypher are substitution, cyclical, and 
keyed cyphers.
-Substitution Cyphers 
By taking an alphabet and replacing each letter with another letter, so that 
no two plaintext letters share the same cypher letter, you create a simple 
monoalphabetic cypher.  
Plaintext Alphabet	A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cypher Alphabet		Z I K M O Q S U W Y A C E B D F H J L N P R T V X G
Plaintext Message
The blue cow will rise during the second moon from the west field.
Cyphertext Message
nuo icpo kdt twcc jwlo mpjwbs nuo lokdbm eddb qjde nuo toln qwocm.
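The table above translates almost directly into code.  What follows is only a 
minimal sketch in Python (a language picked for illustration; nothing about 
the cypher depends on it).  The names PLAIN, CYPHER, encypher, and decypher 
are invented here.

```python
import string

PLAIN = string.ascii_uppercase
CYPHER = "ZIKMOQSUWYACEBDFHJLNPRTVXG"  # cypher alphabet from the table above


def encypher(message, cypher_alphabet=CYPHER):
    """Replace each letter using the table; leave everything else alone."""
    table = str.maketrans(PLAIN, cypher_alphabet)
    return message.upper().translate(table).lower()


def decypher(message, cypher_alphabet=CYPHER):
    """Run the same table backwards."""
    table = str.maketrans(cypher_alphabet, PLAIN)
    return message.upper().translate(table).lower()


print(encypher("The blue cow will rise during the second moon from the west field."))
# nuo icpo kdt twcc jwlo mpjwbs nuo lokdbm eddb qjde nuo toln qwocm.
```

Because the mapping is one to one, decypherment is just the same table read in 
the other direction.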
-Cyclical Cyphers
By taking an alphabet and aligning it with a rotated alphabet you get a 
cyclical cypher.  For example:
Plaintext Alphabet	A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cypher Alphabet		N O P Q R S T U V W X Y Z A B C D E F G H I J K L M
Indeed, you may recognize this cypher as ROT13, which is commonly used on 
newsgroups to obscure messages.
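A cyclical cypher needs nothing more than the amount of rotation.  Here is a 
small Python sketch; the function name rotate and its shift parameter are my 
own choices for illustration.

```python
import string


def rotate(message, shift=13):
    """Cyclical cypher: align the alphabet with a copy rotated by `shift`."""
    shift %= 26
    upper = string.ascii_uppercase
    lower = string.ascii_lowercase
    table = str.maketrans(
        upper + lower,
        upper[shift:] + upper[:shift] + lower[shift:] + lower[:shift],
    )
    return message.translate(table)


print(rotate("Hello"))          # Uryyb
print(rotate(rotate("Hello")))  # Hello
```

With a shift of 13 the cypher is its own inverse, which is part of why ROT13 
is so convenient.  Any other shift needs the opposite rotation (26 - shift) to 
decypher.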
-Keyed Cypher
Another way to create a monoalphabetic cypher is to choose a keyword or phrase 
as the beginning of the cypher alphabet. Usually, only the unique letters from 
the phrase are used in order to make sure the plaintext to cyphertext behaves 
in a one to one fashion.
For example:
Plaintext Alphabet:	A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cypher Alphabet		L E T O S H D G F W A R B C I J K M N P Q U V X Y Z
The passphrase in this cypher is "Let loose the dogs of war".  The advantage of
such a system is that the encryption method is easy to remember.  Also, a 
method of key change can be created without ever having to distribute the keys.
For example, one could use four words at a time from some piece of literature.
Every message could use the next four words.  Indeed, this change could occur 
more frequently, but that is a subject for another article. 
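Building a keyed alphabet by hand is easy to get wrong, so here is a short 
Python sketch of the procedure described above: keep the first occurrence of 
each letter of the passphrase, then append the unused letters in their normal 
order.  The function name keyed_alphabet is hypothetical.

```python
import string


def keyed_alphabet(passphrase):
    """Unique passphrase letters first, then the rest of the alphabet."""
    seen = []
    for ch in passphrase.upper() + string.ascii_uppercase:
        if ch in string.ascii_uppercase and ch not in seen:
            seen.append(ch)
    return "".join(seen)


print(" ".join(keyed_alphabet("Let loose the dogs of war")))
# L E T O S H D G F W A R B C I J K M N P Q U V X Y Z
```

Note that the tail of such an alphabet tends to fall back into normal order 
(here X, Y, and Z map to themselves).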
-Bipartite Substitution
Bipartite substitution is the use of symbol pairs to represent plaintext.  
Later we will see that this sort of substitution can easily be made more 
difficult to analyze.  Two examples of this are:
    1 2 3 4 5                  A B C D E
  1 A B C D E                A A B C D E
  2 F G H I J                B F G H I J
  3 K L M N O                C K L M N O
  4 P Q R S T       or       D P Q R S T
  5 U V W X Y                E U V W X Y
  6 Z 0 1 2 3                F Z 0 1 2 3
  7 4 5 6 7 8                G 4 5 6 7 8
  8 9 . - ? ,                H 9 . - ? ,
Obviously, the letters do not need to be placed in this order; an arrangement 
this regular would not be difficult to guess.
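As a rough illustration of bipartite substitution, the Python sketch below 
turns each symbol into a row label followed by a column label.  The grid 
contents follow the examples above; the label strings are passed in as 
arguments and the ones used here are just one possible choice.  All of the 
names (ROWS, make_table, encypher, decypher) are my own.

```python
ROWS = ["ABCDE", "FGHIJ", "KLMNO", "PQRST", "UVWXY", "Z0123", "45678", "9.-?,"]


def make_table(row_labels="12345678", col_labels="12345"):
    """Map every symbol in the grid to its (row label + column label) pair."""
    return {
        symbol: r + c
        for r, row in zip(row_labels, ROWS)
        for c, symbol in zip(col_labels, row)
    }


def encypher(message, table):
    # Symbols that are not in the grid (such as spaces) are simply dropped.
    return " ".join(table[ch] for ch in message.upper() if ch in table)


def decypher(pairs, table):
    reverse = {pair: symbol for symbol, pair in table.items()}
    return "".join(reverse[p] for p in pairs.split())


digits = make_table()
letters = make_table("ABCDEFGH", "ABCDE")
print(encypher("The west field.", digits))
# 45 23 15 53 15 44 45 21 24 15 32 14 82
print(encypher("The", letters))
# DE BC AE
```

Shuffling the contents of ROWS is all it takes to get away from the easily 
guessed alphabetical layout.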
--Cryptanalysis
Previously we created a cyphered message:
nuo icpo kdt twcc jwlo mpjwbs nuo lokdbm eddb qjde nuo toln qwocm.
If one were to receive this message, figuring out its contents might seem 
fairly daunting.  However, there are some very good methods for recovering the 
plaintext from the cyphertext.  The following discussion will work under the 
assumption that we know the cypher we are dealing with is monoalphabetic.
-Frequency Analysis
The first method we will use is frequency analysis.  Natural languages have 
many qualities which are very useful for the analysis of cyphertext.  Languages
have letters which occur more commonly in text, collections of letters which 
are more frequent, patterns in words, and other related letter occurrences.  
Counting up the occurrences of letters we find that there are...
letter	occurrences
b		3
c		4
d		5
e		2
i		1
j		3
k		2
l		3
m		3
n		4
o		8
p		2
q		2
s		1
t		3
u		3
w		4
The order of greatest frequency to least is:
 8   5     4          3           2       1
{o} {d} {c n w} {b j l m t u} {e k p q} {i s} 
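Counting by hand gets old quickly.  A few lines of Python will produce the 
same tally; this is only a sketch, and the names letter_counts and 
frequency_groups are made up for the example.

```python
from collections import Counter


def letter_counts(text):
    """Count each letter, ignoring spaces and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def frequency_groups(text):
    """Group letters by how often they occur, most frequent group first."""
    groups = {}
    for letter, n in letter_counts(text).items():
        groups.setdefault(n, []).append(letter)
    return [(n, sorted(groups[n])) for n in sorted(groups, reverse=True)]


cyphertext = "nuo icpo kdt twcc jwlo mpjwbs nuo lokdbm eddb qjde nuo toln qwocm."
for n, letters in frequency_groups(cyphertext):
    print(n, " ".join(letters))
# 8 o
# 5 d
# 4 c n w
# 3 b j l m t u
# 2 e k p q
# 1 i s
```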
If this sort of analysis were run on many volumes of English you would find that
a pattern would emerge.  From most frequent to least, it would look like this:
{e} {t} {a o i n} {s r h} {l d} {c u m f} {p g w y b} {v k} {x j q z}
You will notice an immediate correspondence between cyphertext 'o', the most 
frequent letter in our message, and 'e', the most frequent letter in English.  
However, for the rest of the letters we cannot be very certain.  In fact, with 
a text this short we cannot be very certain about 'e' either.
Since this text is short it is helpful to take a look at some of the other 
behaviors of this text.
Counting up the first, second, third, and last letters of the words in this 
text we find the following frequencies:
First letter in word		Occurrences
e					1
i					1
j					1
k					1
l					1
m					1
n					3
q					2
t					2
Order:
n q t e i j k l m
Second letter in word		Occurrences
c					1
d					2
j					1
o					2
p					1
u					3
w					3
Order:
u w d o c j p
Third letter in word		Occurrences
c					1
d					2
j					1
k					1
l					2
o					4
p					1
t					1
Order:
o d l c j k p t
Last letter in word		Occurrences
b					1
c					1
e					1
m					2
n					1
o					5
s					1
t					1
Order:
o m b c e n s t
English frequency for first letter:
t a o m h w 
Second letter:
h o e i a u 
Third letter:
e s a r n i
Last letter:
e t s d n r 
The high count for 'o' as the third and last letter of words, together with 
its absence as the first letter of any word, gives us strong reason to believe 
that 'o' substitutes for 'e'.  This is the first wedge into solving this 
cypher.
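Tallying letters by their position within words can be automated in the same 
way as the overall count.  The Python sketch below is one way to do it; 
positional_counts is a name I made up, and it uses Python's indexing 
convention (0 is the first letter, -1 is the last).

```python
from collections import Counter


def positional_counts(text, position):
    """Count the letter found at `position` in each word of the text."""
    counts = Counter()
    for word in text.lower().split():
        letters = [ch for ch in word if ch.isalpha()]
        try:
            counts[letters[position]] += 1
        except IndexError:
            pass  # word is too short to have a letter at this position
    return counts


cyphertext = "nuo icpo kdt twcc jwlo mpjwbs nuo lokdbm eddb qjde nuo toln qwocm."
for name, pos in [("first", 0), ("second", 1), ("third", 2), ("last", -1)]:
    print(name, dict(positional_counts(cyphertext, pos)))
```

Every word contributes exactly one letter per position, so each tally should 
add up to the number of words that are long enough, which is a handy check 
against miscounting.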
However, do not be fooled by the apparent strengths of frequency analysis.  
Entire books have been written without the use of some letters in the English 
alphabet.  For instance, Ernest Vincent Wright's novel Gadsby was written 
without using the letter 'e' in any word of the story.
Another item to analyze in cyphertext documents is the appearance of letters 
in groups.  Groups of two and three letters are called bigrams and trigrams.  
For example, 'th' is a very common letter pairing in the English language.  
Also, as no surprise, 'the' is a very common trigram.  Analysis of English 
documents will find these results for you.
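Bigram and trigram counts are just as mechanical.  The sketch below (in 
Python, with the made-up name ngram_counts) only counts groups that fall 
inside a single word.  That is a choice made for this example, not a rule.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every run of n consecutive letters within each word."""
    counts = Counter()
    for word in text.lower().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts


cyphertext = "nuo icpo kdt twcc jwlo mpjwbs nuo lokdbm eddb qjde nuo toln qwocm."
print(ngram_counts(cyphertext, 2).most_common(3))
print(ngram_counts(cyphertext, 3).most_common(3))
```

On our message the trigram 'nuo' turns up three times, which makes it a 
natural candidate for 'the'.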
So now that we have developed a simple way of starting to attack cyphers, 
let's examine a few ways to make them more difficult to break.
--Strengthening Cyphers
-Removing word and sentence boundaries
A simple way to complicate decypherment of a cyphertext is to remove all 
spacing and punctuation.  This makes it more difficult to perform a frequency 
analysis on letter positions.  However, it is possible to make reasonable 
guesses as to word positions once you begin to study the document.  Another 
method is to break the cyphertext into fixed blocks.  For example, a space is 
placed after every four letters.
The previous cypher text would appear as this:
nuoicpokdttwccjwlompjwbsnuolokdbmeddbqjdenuotolnqwocm.
or this:
nuoi cpok dttw ccjw lomp jwbs nuol okdb medd bqjd enuo toln qwoc m
You will notice that the above line ends with a single character.  This gives 
away the end of the text and would be better served by the placement of nulls, 
or garbage characters.  The above line becomes:
nuoi cpok dttw ccjw lomp jwbs nuol okdb medd bqjd enuo toln qwoc mhew
'hew' will decypher to 'qmi' which will clearly appear to be nulls to the 
intended recipient.
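Stripping the boundaries and blocking the text can be sketched in Python as 
follows.  The function names are hypothetical, and picking the nulls at random 
is my own assumption; in practice you would want to check that the padding 
does not decypher into something that looks like part of the message.

```python
import random
import string


def strip_boundaries(cyphertext):
    """Remove all spacing and punctuation."""
    return "".join(ch for ch in cyphertext.lower() if ch.isalpha())


def block(cyphertext, size=4, pad=None):
    """Split into fixed blocks, filling the last block with nulls."""
    letters = strip_boundaries(cyphertext)
    shortfall = -len(letters) % size
    if pad is None:
        # Assumption: random letters are good enough as garbage characters.
        pad = "".join(random.choice(string.ascii_lowercase) for _ in range(shortfall))
    letters += pad[:shortfall]
    return " ".join(letters[i:i + size] for i in range(0, len(letters), size))


cyphertext = "nuo icpo kdt twcc jwlo mpjwbs nuo lokdbm eddb qjde nuo toln qwocm."
print(block(cyphertext, pad="hew"))
# nuoi cpok dttw ccjw lomp jwbs nuol okdb medd bqjd enuo toln qwoc mhew
```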
-Nulls
Nulls are characters used in messages which have no meaning.  A message could 
be sent which uses numbers as nulls.  This makes decypherment more difficult, 
as part of the message has no meaning.  Until the decypherer realizes this, he 
may have a hard time solving the message.
-Polyphony
Another method that can be applied is the use of polyphones.  A polyphone is a 
piece of cyphertext which represents more than one piece of plaintext.  For 
example, a cyphertext 'e' may represent either an 'a' or an 'r'.  This does 
complicate decypherment, but it is dangerous: such messages are prone to 
errors, and a single cyphertext may decypher into more than one plausible 
plaintext.
A new cyphertext alphabet would be
Cyphertext alphabet	A B C D E F G H I J L N P
Plaintext alphabet	Z X U S Q O M K H N R V W
			B D F G I A C E L P J T Y
Our old plaintext message becomes
nih aich gfp peii ledh bclejd nih dhgfjb gffj clfg nih phdn cehib
This decypherment becomes very tricky for someone to accomplish.  Having some 
knowledge of the text would be a great help.
If it appears that very few distinct letters are being used in a document, 
then you may wish to suspect the use of polyphones.
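To get a feel for how ambiguous polyphones are, the Python sketch below 
encyphers with the polyphonic alphabet given above and then lists every 
plaintext a cyphertext word could stand for.  The names ENCYPHER, CANDIDATES, 
encypher, and readings are invented for this example.

```python
from itertools import product

CYPHER = "ABCDEFGHIJLNP"    # cyphertext alphabet from the table above
PLAIN_1 = "ZXUSQOMKHNRVW"   # first plaintext letter for each cypher letter
PLAIN_2 = "BDFGIACELPJTY"   # second plaintext letter for each cypher letter

ENCYPHER = {}
CANDIDATES = {}
for c, p1, p2 in zip(CYPHER, PLAIN_1, PLAIN_2):
    ENCYPHER[p1] = ENCYPHER[p2] = c
    CANDIDATES[c] = (p1, p2)


def encypher(message):
    return "".join(ENCYPHER.get(ch, ch) for ch in message.upper()).lower()


def readings(word):
    """Every plaintext that a single cyphertext word could represent."""
    choices = [CANDIDATES[ch] for ch in word.upper()]
    return ["".join(r).lower() for r in product(*choices)]


print(encypher("The blue cow will rise"))
# nih aich gfp peii ledh
print(len(readings("nih")), "the" in readings("nih"))
# 8 True
```

Each cyphertext letter doubles the number of possible readings, so a six 
letter word already has 64 of them.  The recipient has to rely on context to 
pick the right one.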
-Homophones
Homophones are similar to polyphones except that the roles are reversed: there 
is more than one cyphertext symbol for every plaintext letter.  They are 
useful in that they can flatten the frequencies of symbols in a message so 
that an analysis yields little information.  This is very easy to do with 
bipartite substitution cyphers.  For example:
         a b c d e
       a a b c d e
       b f g h i j
       c k l m n o
       d p q r s t
       e u v w x y
       f z * * * *
*(fb, fc, fd, fe are NULLS)
We can add homophones by giving each row several interchangeable labels, like 
this:
          a b c d e
 i h g a  a b c d e
   k j b  f g h i j
   n l c  k l m n o
   o m d  p q r s t
     p e  u v w x y
       f  z * * * *
The optimal way to set up these homophones is to calculate how often the 
letters in each row appear in the natural language you are using.  Rows which 
appear more often should be given more homophones, so that the frequency of 
each individual homophone in the cyphertext is reduced to a level where 
frequency analysis would yield little information.
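Here is a rough Python sketch of the homophonic table above.  Each row has 
several labels and one is picked at random for every letter encyphered.  The 
names GRID, ROW_LABELS, COL_LABELS, encypher, and decypher are mine, and the 
use of Python's random module is an assumption made purely for illustration.

```python
import random

GRID = ["abcde", "fghij", "klmno", "pqrst", "uvwxy", "z"]
ROW_LABELS = ["ihga", "kjb", "nlc", "omd", "pe", "f"]  # homophones per row
COL_LABELS = "abcde"


def encypher(message, rng=random):
    pairs = []
    for ch in message.lower():
        for row, labels in zip(GRID, ROW_LABELS):
            if ch in row:
                # Any of the row's labels will do; pick one at random.
                pairs.append(rng.choice(labels) + COL_LABELS[row.index(ch)])
    return " ".join(pairs)


def decypher(pairs):
    row_of = {label: row for row, labels in zip(GRID, ROW_LABELS) for label in labels}
    out = []
    for pair in pairs.split():
        row = row_of[pair[0]]
        idx = COL_LABELS.index(pair[1])
        if idx < len(row):  # fb, fc, fd, fe are nulls and are skipped
            out.append(row[idx])
    return "".join(out)


secret = encypher("the blue cow")
print(secret)
print(decypher(secret))  # thebluecow
```

The same plaintext will usually encypher differently every time, yet it always 
decyphers to the same thing, because every label belongs to exactly one row.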
-Code Words
One final method which can be used is that of code words.  Simply replace 
important words in the plaintext with code words which stand for other words.
For example, the nonsense plaintext that has been chosen for this document,
The blue cow will rise during the second moon from the west field.
could actually mean:
The king is angry and will attack in two weeks with the 1st cavalry by way of 
the foothills.
blue is angry
cow is king
rise is attack
second is two weeks
moon is 1st cavalry
west field stands for some foothills on the west side of the kingdom.
Throughout this document I have mentioned frequency analysis of English 
documents.  This is a fairly tedious task to do by hand, and so I am 
developing software to aid in frequency analysis of documents.  I will be 
making it available via my website at http://www.cu-online.com/~jwthomp/ on 
Monday, September 8th.  Please watch for it in the Cryptography section.
OK, now try your hand at a few cyphertexts.
This one has to do with war.
1)
kau noelb'd oerf xmtt okkopw ok qoxb euoqf kau kurhtoe wbmcakds, obq dkemwu amd
podktu xamtu xu altq amr   
This one is an excerpt from a technical document.
2)
etdsalwqs kpjsjljdq gwur orrh frurdjkrf sj qtkkjps npjtk ljeethalwsajhq   
sgrqr kpjsjljdq tqr w jhr sj ewhy kwpwfane ijp spwhqeaqqajh sykalwddy tqahn 
ldwqq f ahsrphrs kpjsjljd wffprqqrq sj qkrlaiy qkrlaial etdsalwqs npjtkq
Mail me your answers and I'll put the first person who solves each cypher in 
the next Phrack.
In fact, I would enjoy seeing some participation in this for the next Phrack.  
After reading this, I welcome the submission of any "Monoalphabetic" cypher 
based on the discussions of this article.  Please do not yet submit any 
polyalphabetic cyphers (Next article).  When submitting to me, please send me 
two letters.  The first mail should include only the encyphered text.  Make 
sure it is enough so that a reasonable examination can be made of the cypher.
This first mail should have a subject "Cyphertext submission".  If you are 
using a method of encypherment not found in this article, please enclose a 
brief description of the type of method you used.  Follow this mail up with 
another entitled "Cyphertext Solution" along with a description of the 
encyphering method as well as the key or table used.
I will select a number of these texts to be printed in the next Phrack, where
readers may have a chance at solving the cyphers.  The reason I ask for two
separate mailings is that I will want to take a crack at these myself.  
Finally, the following Phrack will name the first person to solve each cypher,
as well as whoever solves the most cyphers prior to the next Phrack release 
(real name or pseudonym is fine).
Please mail all submissions to [email protected]
I welcome any comments, suggestions, questions, or whatever at 
[email protected]
----[  EOF