---[ Phrack Magazine Volume 7, Issue 51 September 01, 1997, article 13 of 17 -------------------------[ Monoalphabetic Cryptanalysis (Cyphers, Part One) --------[ Jeff Thompson aka 'Mythrandir'Written for Phrack and completed on Sunday, August 31st, 1997. --------- First a quick hello to all of those I met at DefCon this year. It was incredible fun to finally put faces to many of the people I have been talking with for some time. It was truly was a treat to meet so many others who are alive with the spirit of discovery. ---------- This is the first in a series of articles on Cryptology that I am writing. The goals of these articles will be to attempt to convey some of the excitement and fun of cyphers. A topic of much discussion in regards to cryptography currently, is about computer based cyphers such as DES, RSA, and the PGP implementation. I will not be discussing these. Rather, these articles will cover what I will term classical cryptology. Or cryptology as it existed before fast number crunching machines came into existance. These are the sorts of cyphers which interested cryptographers throughout time and continue to be found even to this very day. Even today, companies are producing software whose encryption methods are attackable. You will find these commonly among password protection schemes for software programs. Through the course of these articles I will explain in practical terms several common cypher types and various implementations of them as well as cryptanalytic techniques for breaking these cyphers. Creating cyphers is fun and all, but the real excitement and often times tedium is found in Cryptanalysis. Many of the ideas presented in these articles will based on three sources. The following two books: The Codebreakers by David Kahn (ISBN: 0-684-83130-9) and Decrypted Secrets by F.L. Bauer (ISBN: 3-540-60418-9). Both authors have put together wonderful books which both cover the history and methods of Cryptology. Do yourself and the authors a favor and purchase these books. You will be very pleased with the lot. Finally, a miniscule amount of these articles will be written based on my own personal experience. The fun is in the journey and I welcome you on what is certain to be an interesting trip. Please feel free to raise questions, engage me in discussions, correct me, or simply offer suggestions at [email protected]. Please be patient with me as I am traveling extensively currently, and may be away from the computer at length occasionally. Out the door and into the wild... --Monoalphabetic Cyphers Monoalphabetic cyphers are often currently found in simple cryptograms in books and magazines. These are just simple substitution cyphers. This does not mean that they are always simple for the beginning amateur to solve. Three common monoalphabetic cyphers which are used are substitution, cyclical, and keyed cyphers. -Substitution Cyphers By taking an alphabet and replacing each letter with another letter in a unique fashion you create a simple monoalphabetic cypher. Plaintext Alphabet A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Cypher Alphabet Z I K M O Q S U W Y A C E B D F H J L N P R T V X G Plaintext Message The blue cow will rise during the second moon from the west field. Cyphertext Message nuo icpo kdt twcc jwlo mpjwbs nuo lokdbm eddb qjde nuo toln qwocm. -Cyclical Cyphers By taking an alphabet and aligning it with a rotated alphabet you get a cyclical cypher. For example: Plaintext Alphabet A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Cypher Alphabet N O P Q R S T U V W X Y Z A B C D E F G H I J K L M Indeed, you may recognize this cypher as a ROT13 which is commonly used on news groups to obscure messages. -Keyed Cypher Another way to create a monoalphabetic cypher is to choose a keyword or phrase as the beginning of the cypher alphabet. Usually, only the unique letters from the phrase are used in order to make sure the plaintext to cyphertext behaves in a one to one fashion. For example: Plaintext Alphabet: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Cypher Alphabet L E T O S H D G F W A R B C I J K M N P Q U V X Y Z The passphrase in this cypher is "Let loose the dogs of war" The advantage of such a system is that the encryption method is easy to remember. Also, a method of key change can be created without ever having to distribute the keys. For example, one could use the 4 words at a time of some piece of literature. Every message could use the next four words. Indeed, this change could occur more frequently, but that is a subject for another article. -Bipartite Substitution Bipartite substition is the use of symbol pairs to represent plaintext. Later we will see that this sort of substitution lends itself to be easily made more difficult to analyze. Two examples of this are: 1 2 3 4 5 A B C D E 1 A B C D E A A B C D E 2 F G H I J B F G H I J 3 K L M N O C K L M N O 4 P Q R S T or D P Q R S T 5 U V W X Y E U V W X Y 6 Z 0 1 2 3 F Z 0 1 2 3 7 4 5 6 7 8 G 4 5 6 7 8 9 9 . - ? , H 9 . - ? , Obviously, the letters do not need to be placed in this order as their solutions would not be that difficult to guess. --Cryptanalysis Previously we created a cyphered message: nuo icpo kdt twcc jwlo mpjwbs nuo lokdbm eddb qjde nuo toln qwocm. If one were to receive this message, figuring out its contents might seem fairly daunting. However, there are some very good methods for recovering the plaintext from the cyphertext. The following discussion will work under the assumption that we know the cyphers with which we are dealing are monoalphabetics. -Frequency Analysis The first method we will use is frequency analysis. Natural languages have many qualities which are very useful for the analysis of cyphertext. Languages have letters which occur more commonly in text, collections of letters which are more frequent, patterns in words, and other related letter occurances. Counting up the occurances of letters we find that there are... letter occurances b 3 c 4 d 5 e 2 i 1 j 3 k 2 l 3 m 3 n 4 o 8 p 2 q 2 s 1 t 3 u 3 w 4 The order of greatest frequency to least is: 8 5 4 3 2 1 {o} {d} {c n w} {b j l m t u} {e k p q} {i s} If this sort of analysis were run on many volumes of english you would find that a pattern would emerge. It would look like this: {e} {t} {a o i n} {s r h} {l d} {c u m f} {p g w y b} {v k} {x j q z} You will notice an immediate correlation between e and o. However, for the rest of the letters we can not be very certain. In fact, we can not be very certain about e either. Since this text is short it is helpful to take a look at some of the other behaviors of this text. Counting up the first, second, third, and last letters of the words in this text we find the following frequencies: First Letter in word Occurances e 1 i 1 j 1 k 1 l 1 m 1 n 3 q 2 t 2 Order: n q t e i j k l m Second letter in word Occurances c 1 d 2 i 1 n 1 o 2 p 1 u 3 w 3 Order: u w d o c i n p Third letter in word Occurances c 1 d 2 i 1 k 1 l 2 o 4 p 1 t 1 u 1 Order: o d l c i k p t u Last letter in word Occurances b 1 c 1 e 1 m 1 n 1 o 5 s 1 t 1 English frequency for first letter: t a o m h w Second letter: h o e i a u Third letter: e s a r n i Last letter: e t s d n r Noticing the higher frequency count for 'o' in the third and last letters of words in addition to its absence as a first letter in any words gives us strong reason to believe that 'o' substitutes for 'e'. This is the first wedge into solving this cypher. However, do not be fooled by the apparent strengths of frequency analysis. Entire books have been written without the use of some letters in the English alphabet. For instance The Great Gatsby was written without using the letter 'e' in one word of the book. Other items to analyze in cyphertext documents is the appearance of letters in groups. These are called bigrams and trigrams. For example, 'th' is a very common letter pairing in the english language. Also, as no surprise 'the' is a very common trigram. Analysis of english documents will find these results for you. So now that that we have developed a simple way of starting to attack cyphers lets examine a few ways to make them more difficult to break. --Strengthening Cyphers -Removing word and sentence boundaries A simple way to complicate decypherment of a cyphertext is to remove all spacing and punctuation. This makes it more difficult to perform a frequency analysis on letter positions. However, it is possible to make reasonable guesses as to word positions once yoy begin to study the document. Another method is to break the cyphertext into fixed blocks. For example after every four letters a space is placed. The previous cypher text would appear as this: nuoicpokdttwccjwlompjwbsnuolokdbmeddbqjdenuotolnqwocm. or this: nuoi cpok dttw ccjw lomp jwbs nuol okdb medd bqjd enuo toln qwoc m You will notice that the above line ends with a single character. This gives away the end of the text and would be better served by the placement of nulls, or garbage characters. The above line becomes: nuoi cpok dttw ccjw lomp jwbs nuol okdb medd bqjd enuo toln qwoc mhew 'hew' will decypher to 'qmi' which will clearly appear to be nulls to the intended recipient. -Nulls Nulls are characters used in messages which have no meanings. A message could be sent which uses numbers as nulls. This makes decypherment more difficult as part of the message has no meaning. Until the decypherer realizes this, he may have a hard time of solving the message. -Polyphony Another method that can be applied is the use of polyphones. Polyphones are simply using a piece of cyphertext to represent more than one piece of plaintext. For example a cyphertext 'e' may represent an 'a' and a 'r'. This does complicate decypherment and may result in multiple messages. This is dangerous as these messages are prone to errors and may even decypher into multiple texts. A new cyphertext alphabet would be Cyphertext alphabet A B C D E F G H I J L N P Plaintext alphabet Z X U S Q O M K H N R V W B D F G I A C E L P J T Y Our old plaintext message becomes nih aich gfp peii ledh bclejd nih dhgfjb gffj clfg nih phdn cehib This decypherment becomes very tricky for someone to accomplish. Having some knowledge of the text would be a great help. If it appears that very few letters are being used in a document then you may wish to suspect the use of polyphones within a document. -Homophones Homophones are similar to polyphones except that there is more than one cyphertext letter for every plaintext letter. They are useful to use in that they can reduce the frequencies of letters in a message so that an analysis yields little information. This is very easy to do with bipartite substitution cyphers. For example: a b c d e a a b c d e b f g h i j c k l m n o d p q r s t e u v w x y f z * * * * *(fb, fc, fd, fe are NULLS) We can add homophones to the message like this: a b c d e i h g a a b c d e k j b f g h i j n l c k l m n o o m d p q r s t p e u v w x y f z * * * * The optimal way to set up these homophones is to calculate the frequency of appearance in the natural language you are using of each row of letters. Homophones should be added so that the cyphertext appearance of each homophone is reduced to a level where frequency analysis would yield little information. -Code Words One final method which can be used is that of code words. Simply replace important words in the plaintext with code words which represent another word. For example the nonsense plaintext that has been chosen for this document could actually mean: The blue cow will rise during the second moon from the west field. The king is angry and will attack in two weeks with the 1st calvary by way of the foothills. blue is angry cow is king rise is attack second is two weeks moon is 1st calvary west field stands for some foothills on the west side of the kingdom. Throughout this document I have mentioned frequency analysis of english documents. This is a fairly tedious task to do by hand, and so I am developing software to aid in frequency analysis of documents. I will be making it available via my website at http://www.cu-online.com/~jwthomp/ on Monday, September 8th. Please watch for it in the Cryptography section. Ok, now to try your hand at a few cyphertexts.. This one has to do with war. 1) kau noelb'd oerf xmtt okkopw ok qoxb euoqf kau kurhtoe wbmcakds, obq dkemwu amd podktu xamtu xu altq amr This one is an excerpt from a technical document. 2) etdsalwqs kpjsjljdq gwur orrh frurdjkrf sj qtkkjps npjtk ljeethalwsajhq sgrqr kpjsjljdq tqr w jhr sj ewhy kwpwfane ijp spwhqeaqqajh sykalwddy tqahn ldwqq f ahsrphrs kpjsjljd wffprqqrq sj qkrlaiy qkrlaial etdsalwqs npjtkq Mail me your answers and I'll put the first person who solves each cypher in the next Phrack. In fact, I would enjoy seeing some participation in this for the next Phrack. After reading this, I welcome the submission of any "Monoalphabetic" cypher based on the discussions of this article. Please do not yet submit any polyalphabetic cyphers (Next article). When submitting to me, please send me two letters. The first mail should include only the encyphered text. Make sure it is enough so that a reasonable examination can be made of the cypher. This first mail should have a subject "Cyphertext submission". If you are using a method of encypherment not found in this article, please enclose a brief description of the type of method you used. Follow this mail up with another entitled "Cyphertext Solution" along with a description of the encyphering method as well as the key or table used. I will select a number of these texts to be printed in the next Phrack, where readers may have a chance at solving the cyphers. The reason I ask for two seperate mailing is that I will want to take a crack at these myself. Finally, the names of individuals will be placed in the following phrack of the first to solve each cypher, and whomever solves the most cyphers prior to the next Phrack release (real name or pseudonym is fine). Please mail all submissions to [email protected] I welcome any comments, suggestions, questions, or whatever at [email protected] ----[ EOF