![]() |
Information Technologies
and the Information Profession |
|
Cryptography and Pretty Good PrivacyCopyright © 2000 by R. E. WyllysIntroductionWhat is Cryptography?The Oxford English Dictionary gives the first date of use of the term "cryptography" as 1641 and defines the term as
CiphersA cipher is a secret message in which each letter of the original message, the "plain text,"has been replaced by a different letter or in which the original sequence of letters has been rearranged in a nontrivial fashion. A simple method of replacement is one attributed to Julius Caesar: each letter is replaced by the letter following it in the alphabet (or by the second letter following, or the third, etc.). If B is substituted for A, C for B, . . . , Z for Y, and A for Z, then through encipherment the plain text GO HOME AT ONCE JOE becomes the cipher text HP IPNF BU PODF KPF which, in practice, would probably be written in groups of 5 letters to make it harder to decipher: HPIPN FBUPO DFKPF This is an example of a "substitution cipher." The substitution schemes used in real life nowadays are far more complicated than that which apparently sufficed for Caesar. Ciphers produced by rearrangement of the sequence of letters are known as "transposition ciphers." As a very simple example, consider a rearrangement scheme in which each set of four letters, with position numbers 1234, is replaced by moving the letters to positions 4213. In this scheme all messages must contain a multiple of four letters, so when needed spurious extra letters are added to the plain text to make the total number of letters a multiple of four. For the purpose of encipherment, the plain-text example above would become, say, GO HOME AT ONCE JOEQ and would be transposed into the cipher text OOGH TEMA ENOC QOJE Nowadays transposition ciphers are used far less often than substitution ciphers. CodesAnother form of secret writing is "codes." In this technique, whole words or phrases of the plain text are replaced by arbitrary groups of characters. A two-column, two-part dictionary is used in the replacement process. Encoding (i.e., going from plain text to code text) uses the part of the dictionary in which the left-hand column is arranged in alphabetical order by the words and phrases, with the right-hand column providing the matching code groups. Decoding (i.e., going from code text to plain text) uses the part of the dictionary in which the left-hand column is arranged in alphanumeric order by the code groups, with the matching words and phrases in the right-hand column. Though codes are rarely used nowadays for secrecy, they are sometimes used for brevity. Indeed, the use in email of abbreviations like "BTW" for "by the way,""OTOH" for "on the other hand," and "PMFJI" for "pardon me for jumping in," can be considered the use of a code (albeit a very low-level code). TerminologyStrictly speaking, the terms "encipherment" and "decipherment" refer to the use of substitution and/or transposition processes, and the terms "encoding" and "decoding" refer to the use of codes for secret communication. Again speaking strictly, the terms "cryptography," "encryption," and "decryption" refer to the hiding and unhiding of plain text through the use of either codes or ciphers. "Cryptology" refers to the science and practice of engaging in secret communication by either ciphers or codes. Encryption and Decryption and the InternetEncipherment and decipherment, usually called (somewhat loosely) encryption and decryption, have become extraordinarily important techniques in the development of the Internet and the World-Wide Web. Indeed, one student of the development of the Internet, Lawrence Lessig writes:
Practical Encryption and Decryption Using ComputersWith computers, and on the Internet, the techniques of cryptography are, for all practical purposes, based exclusively on using substitution methods to accomplish encipherment and decipherment. Let us consider in some detail how computer-based substitution methods work. The following is an example of one way of encrypting a string of text via substitution. Suppose we want to encipher the sentence shown below as the plain text. Plain Text (PT) We begin by noting the numerical equivalents of the letters and punctuation in decimal ASCII values (Endnote 2). Numerical Values of
Letters (ASCII) F O X J
U M P E D O V E
R T H E L
A Z Y D O G S
. We then encrypt the above sentence by using a stream of numbers called the "key" (K). Streams of numbers used for keys are generated by processes that are designed to appear random, i.e., to avoid containing any discernible patterns that could be used by people wanting to "break into" enciphered messages and recover the plain texts. Such processes yield key streams that are called "pseudo-random" because no one knows how to generate truly random key streams. The generation processes ensure that the streams of digits will minimize the occurrence of potential useful patterns and will not actually start to repeat themselves till astronomical quantities of digits have been produced. The digits in the key stream are added in a special way (explained below) to the numerical equivalents of the letters and punctuation. In the encryption table below, "CT" stands for cipher text. Note: In real-life encryption, the spaces between letters and words would not be present; they are shown here merely to make it easier for non-computers (i.e., humans) to see what is going on. ENCRYPTION PT:70 79 88 74 85 77
80 69 68 79 86 69 82 PT:84 72 69 76 65 90
89 68 79 71 83 46 If you look carefully at the stream of digits called CT, you will note that the addition operations above are performed without "carrying". That is, the addition is performed in a strictly columnar fashion, and when a sum equals 10 or more, the initial digit "1" is ignored. This is known as "modulo 10" addition. Decryption is performed in the reverse way, by subtraction; and the subtraction is performed without the reverse of "carrying", viz., "borrowing". This is known as "modulo 10" subtraction. DECRYPTION CT:85 35 29 78 78 97
29 82 41 30 99 81 91 CT:33 23 27 61 66 16
62 51 21 38 80 18 Note: The cipher text could be sent as a single stream: e.g., 9410609680846828260280683285352978789729824130998191 etc.; or, for ease of handling, the cipher text could be broken into 5-digit groups: e.g., 94106 09680 84682 82602 80683 28535 29787 89729 82413 09981 91 etc. The foregoing examples use decimal arithmetic to illustrate the concepts. In actual computer and telecommunications practice, the letters are expressed as binary numbers, as are the key values, and the arithmetic is performed modulo 2. Computers can, of course, carry out arithmetic at extremely high speeds, so they accomplish this kind of encryption and decryption with great rapidity. Traditional Encryption And DecryptionIn traditional encryption and decryption, both the sender and the recipient must have a copy of the key stream. Historically, this was often accomplished by providing both sender and recipient with a copy of a page from a "one-time pad," i.e., both sender and recipient would have a copy of the same set of random digits to be used for encipherment and decipherment. Alternatively, both sender and recipient would be provided with a a machine that could be set, by pre-arrangement, so as to generate the same "pseudo-random" key with which to process a given message. Such machines are widely used by military and diplomatic organizations. What happens if a third party somehow knows (or guesses) the plain text? He or she will be able to recover the key, as follows: RECOVERING THE KEY
and so on. If the same key is used for another message--either by mistake, or because the generation of "pseudo-random key" involves a process that eventually starts repeating itself--then the third party would be able to apply the key to the second message and decipher it. Public-Key Encryption and Decryption (PKED)We have seen that in traditional encryption and decryption, the sender and the recipient must share the same key. Furthermore, this key must be kept a secret, known only to the sender and the recipient, or else other people may be able to use the key to read messages that the sender and the recipient think are private. Public-key encryption and decryption (PKED) is a brilliant step up from traditional methods. It works with pairs of keys, each pair being related to each other in a special way (which can be explained by a sophisticated mathematical argument that we omit here). William Stallings (Endnote 3) uses the analogy of a lockbox with a right-handed key and a left-handed key. The relation between the keys is such that the lockbox can be initially locked with either key, but if it has been locked with, say, the left-handed key, then the right-handed key must be used to unlock it, and vice versa. In public-key encryption and decryption, the pairs of keys are matched pairs of keys that are called the "public-key" and the "private key." (Using the lockbox analogy, you might think of the public key as the left-handed key and the private key as the right-handed key.) Here is how Stallings summarizes what is done:
With such an arrangement, it is possible for people who have never met, and have not made advance arrangements to share a single key in common, to communicate privately with each other by using public-private pairs of keys. Furthermore, public-key encryption-decryption makes it possible for any pair of persons to use traditional encryption-decryption if they prefer to do so. They may well prefer to do so, because PKED makes heavy demands on computer time and, hence, may operate slowly, especially on microcomputers (though this is becoming less of a problem as hardware speeds increase). For example, a sender may use a recipient's public key to transmit secretly to that recipient a traditional kind of encryption-decryption key. A practical example might be Jane's sending Karl a short public-key encrypted message (using Karl's public key) telling Karl that she will shortly send him a long message that has been traditionally encrypted by using as the key stream the sequence of characters starting with the 17th line on page 123 of a particular edition of a book that Jane and Karl have previously agreed to use for such a purpose. For a message of only a few thousand characters this could be a reasonably secure, yet quick-and-easy way, of defining and sharing a traditional key. Using Public-Key Encryption-Decryption for AuthenticationAnother use of PKED is to provide authentication that the purported sender of a message is the real sender. Suppose that Karl has sent Lila a message using his private key, that Lila has decrypted the message by using Karl's public key, and that Karl subsequently wishes to deny having sent the message. All Lila has to do is to produce the encrypted message and show that Karl's public key suffices to decrypt it. This means that only Karl, using his private key (which, you will recall, is paired with his public key) could have encrypted the message originally. Of course, if Karl had originally wanted to reserve the possibility of later denying his authorship, he could simply have used Lila's public key to encrypt the message; and she would then have decrypted it using her private key. Since anyone could have used Lila's public key to send her an encrypted message, Karl could plausibly deny authorship. But in many circumstances, Karl may have good reasons for wanting to assure Lila that the message is from him. In such a circumstance, he can encrypt the message using his private key, as we noted earlier. A possible problem with Karl's using his private key for the entire message is that it may be so long that the PKED process will take an unacceptable amount of time. If this is the case and if, further, the message itself does not need to be secret (i.e., if the only concern is with assuring that Karl is the sender), then a variant of PKED known as the "secure hash function" technique can be useful. In the secure-hash-function technique, Karl uses a fast-acting, "hashing" algorithm (which is also available to Lila) to generate, from the original message, a relatively short string of characters. This string is the "hash code", and it is then encrypted by Karl, using his private key. The result is appended to the original message. This procedure serves to authenticate the message because Lila can do three things: (1) She can use Karl's public key to decrypt the encrypted hash code, thus obtaining the original hash code; (2) she can use the hashing algorithm on the text of the original message, to generate her version of the hash code; and (3) she can then compare the original hash code, as decrypted by her, with the hash code that she directly generated. If they agree, then the message (almost surely) must have come from Karl. Furthermore, the message (almost surely) cannot have been changed during the transmission process; for if it had been changed, then when Lila employed the hashing algorithm on the message she received, the hashing algorithm would have generated a hash code different from the one that Karl encrypted. Thus, almost surely, the message has come unchanged from Karl. However, there could be a possibility for fraud. Someone might have forged a plausible hash-function authenticator and attached it to the message. Is this possible? Yes. Is it likely? Probably not, but let us consider the details. Stallings lists the following five properties that the hash algorithm must possess:
If the size of the hash code is small, then it will be easy to forge. For example, if the hash code is 8 bits long, then there are only 28 = 256 possible values for it. Since in any functioning communications environment there will be far more than 256 possible messages, it follows that many different messages will yield the same one of the 256 possible hash codes. Stallings borrows an example from a book by Davies and Price (Endnote 4) to illustrate this point. Here is the example. To illustrate the problem, we assume we are using a 32-bit hash code. This means that there will be 232 possible distinct codes. Suppose a forger wants to send a false message to Alice, as follows:
The forger could capture a genuine signed message from Bob and could attach the signature to the above forgery. The forger could also use Bob's public key to decrypt the signature and obtain the hash code applicable to the genuine message from Bob. At this point, all the forger has to do is apply the hashing algorithm to his (the forger's) message and see what hash code it generates. If the forger's first trial of the algorithm on his forgery yields the same hash code as the genuine message, the forger is done. He can send the forgery and be sure that it will be undetectable. Of course, there is only one chance in 232 that the forger will be so lucky. But our forger is perseverant. He can make change after change in his forgery, using minor changes that leave unaltered the essential content of the message. For example, the first sentence might be rewritten as
or as
or as
and so on. After each change, the forger can test to see if the change has yielded the desired hash code, that of the genuine message from Bob. If the forger makes at least 232 changes, he will eventually find a minor variant of his forgery that generates the same hash code as the genuine message. The forger can then use that variant to send a fraudulent message to Alice. Certainly it will take the forger some time to try 232 or more changes, and this is definitely not a task to be undertaken by a human with only pencil and paper. But our forger, of course, has a computer to assist him. As a practical matter, the question is how fast the changes can be made and tested to see whether they generate the hash code of the genuine message. The larger the size of the hash code, the longer the trial-and-error process of changes and tests will take. The consensus is that with current computer technology, a 128-bit hash code is adequate, in the sense that for a forger the testing of 2128 possible changes would take an unfeasibly long time. The hash-code version of PKED is essentially what is used in the Secure-Socket Layer (SSL) protocol that implements secure communication between your browser and the browser-server of a company from which you want to order something and pay for it by sending the company your credit-card number. That is, PKED is what your browser and the company's server mutually agree to start using when you click on a Website that begins with "https://". Once your browser and the server have established a path for using encrypted messages, the little padlock icon in the lower left corner of your Netscape window changes from unlocked to locked position, or if you are using Internet Explorer (IE), a locked padlock icon appears toward the right side of the bottom bar of the IE window. Then you can send your credit-card number over the Internet with confidence. Note: I strongly recommend that, if you are eligible to do so (i.e., if you are a citizen or permanent resident of the U.S. or Canada), you use the 128-bit-security version of either IE or Netscape. Earlier versions used only a 40-bit hash code, and computers have now become so fast that 40-bit hash codes are no longer reasonably secure against a determined attack. The 128-bit versions are available free from Microsoft and Netscape. VerificationWe need to consider still another point: If Karl sends Lila an encrypted message by the procedures described above, Lila can be (almost) sure that the sender of the message is Karl. But how can Lila be sure that Karl is really the Karl that he claims to be, and not somebody else who has taken on the identity of Karl? To answer the question of whether Karl is really Karl, a further kind of assurance is needed. Here is how Lessig describes the problem and its solution:
The kind of assurance provided by a "trustworthy third party," as Lessig puts it, is referred to by such terms as "digital identification," "digital authentication," or "digital signature." As you may already be aware, there are commercial services, e.g., Verisign, that have been established in order to provide this kind of assurance, under such tradenames as "Digital ID." A useful guide to information on the subject is The Center for Democracy & Technology. Pretty Good PrivacyPGP, or Pretty Good Privacy, is a piece of software that provides a practical means of encrypting messages and providing digital signatures (hash-code procedures) for authentication. That is to say, PGP is a handy way of implementing on your own microcomputer the techniques of public-key encryption and decryption that we have described in this lesson. PGP was originally freeware, but now is also available in several commercial packages. Versions exist for both PCs and Macintoshes. PGP was written in 1991 by Philip R. Zimmermann, who released it as freeware because of his strong personal commitment to the concept of open-source software. Because PGP became widely available around the world after Zimmermann released it, the U.S. Government charged Zimmermann with violating U.S. export laws regarding the distribution of cryptographic techniques. However, the charges were eventually dropped. In my opinion, the charges against Zimmermann should never have been brought in the first place. PGP was simply an efficient and convenient software implementation of ideas that had been in the public domain--and well known around the world--since 1977, when R. L. Rivest, A. Shamir, and L. Adelman published their ideas on public-key, private-key cryptography and their algorithm for implementing those ideas (Endnote 6) and also filed a patent application for the algorithm. The algorithm has become known as the "RSA Algorithm" in honor of the inventors. In fact, the idea of public-key, private-key cryptography is even older than the RSA publication and patent, for in 1976 a paper on a very similar idea had been written by S. C. Pohlig and M. E. Hellman (Endnote 7). In short, Zimmermann had not revealed any secrets; he had merely made PKED easier for ordinary users to use. A good source for further information on PGP is the Website of the MIT Distribution Center for PGP. Historical BackgroundAn excellent, very thorough, yet quite readable book on cryptography and its history was written in 1967 by David Kahn under the title, The Codebreakers (Endnote 8). A second edition, revised to include the concept of public-key, private-key cryptography was published in 1996 (Endnote 9). Endnotes1. Lessig, Lawrence. Code and Other Laws of Cyberspace. New York, NY: Basic Books; 1999. ISBN:0-465-03913-8. Pp. 35-36. [Note: The word "code" in the title of this book refers not to cryptographic codes but to the code of law and the code of custom, the codes by which human society regulates itself--or at least attempts to do so.] 2. A handy reference for ASCII characters and their numerical equivalents is a Webpage entitled ASCII - ISO 8859-1 (Latin-1) Table with HTML Entity Names. 3. Stallings, William. Protect Your Privacy: The PGP User's Guide. Englewood Cliffs, NJ: Prentice-Hall; 1995. ISBN:0-13-185596-4. 4. Davies, Donald W.; Price, W. L. Security for Computer Networks: An Introduction to Data Security in Teleprocessing and Electronic Funds Transfer. 2nd ed. New York, NY: Wiley; 1989. ISBN:0-471-92137-8. 5. Lessig, op. cit., pp. 36-37. 6. Rivest, Ronald L.; Shamir, Adi; Adelman, Len. On Digital Signatures and Public Key Cryptosystems: Technical Memorandum 82. Cambridge, MA: MIT Laboratory for Computer Science; 1977. 7. Pohlig, Stephen C.; Hellman, Martin E. An Improved Algorithm for Computing Logarithms over GF(p) and its Cryptographic Significance. 1978 January: IEEE Transactions on Information Theory. (Article submitted on 1976 June 17.) 8. Kahn, David. The Codebreakers: The Story of Secret Writing. New York, NY: Macmillan; 1967. LC:63-16109. 9. Kahn, David. The Codebreakers: The Comprehensive History of Secret Communication from Ancient Times to the Internet. New York, NY: Scribner; 1996. ISBN: 0-684-83130-9.
|
|||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||
| © As of July 2000, the material displayed
here is under copyright by the LIS 386.13 class team at the Graduate School
of Library and Information Science at UT-Austin: Ronald Wyllys, Philip
Doty, Quinn Stewart, Carlos Ovalle, Lori Eichelberger, Tony Cherian, and
Don Drumtra.
Appropriate educational and other non-profit use of the material is encouraged, provided that this copyright notice is appended, full attribution is given, and no fees are charged for access to the material. For-profit use is strictly forbidden. |
||||||||||||||||||||||||||||||||||||||||||