This is my evaluation of Week 26 of the Google IT Support Professional certification course on coursera.org. This week: Cryptology has always interested me, even though I don’t understand mathematics. I think it is just a fascination with information, secrecy, and the importance of these concepts as they relate to history. I don’t know, codes are cool.
Keeping information secret has always been an important part of the human experience. Here are some key terms relating to the obscured world of cryptography…
Cryptography is the art of “hiding messages from potential enemies.” It has changed significantly with the advent of modern computing.
Encryption is the process by which one takes a plaintext message and applies an operation to it, called a cipher, which produces an unreadable message as output, called ciphertext.
Decryption is the reverse of this process, when a ciphertext is decoded into plaintext.
A cipher is made of two parts:
The Encryption Algorithm is the “underlying logic [or] process that’s used to convert plaintext into ciphertext.” This is usually a complex mathematical algorithm.
The Key is an element that adds a layer of obfuscation to your algorithm, so that someone else using the same algorithm will not be able to decrypt your information.
Security through obscurity is the concept of keeping encryption algorithms secret. If attackers don’t know your security practices it will be harder to design an attack.
Kerckhoffs’s Principle states that a cryptosystem, a “collection of algorithms for key generation and encryption and decryption operations that comprise a cryptographic service,” should remain secure even if everything about the system is known, except the key.
This may also be known as Shannon’s Maxim or as “the enemy knows the system.” Keeping keys secure should mean that an enemy will not be able to break your encryption.
Cryptology is the study of cryptography.
Searching for and studying weaknesses in encryption, with the goal of breaking it, is called cryptanalysis.
Frequency analysis is “the practice of studying the frequency with which letters appear in a ciphertext.” This is a method used to help break encryption.
The most commonly used letters in the English language are E, T, A, and O. The most commonly seen pairs of these letters are TH, ER, ON, and AN.
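Here is a minimal Python sketch of the counting step behind frequency analysis. The function name `letter_frequencies` and the sample ciphertext are made up for illustration; a real analysis would need far more text than this.

```python
from collections import Counter

def letter_frequencies(text: str) -> list:
    """Count how often each letter appears, most common first."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    return Counter(letters).most_common()

# Hypothetical sample ciphertext; spaces are ignored by the counter.
ciphertext = "URYYB JBEYQ"
for letter, count in letter_frequencies(ciphertext)[:3]:
    print(letter, count)
```

An analyst would compare the most frequent ciphertext letters against the common English letters listed above and guess at the mapping from there.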
The first programmable digital computer, Colossus, was built during WWII at Bletchley Park to work on cryptanalysis.
Steganography is the practice of hiding information without encrypting it. Modern steganography methods can be used to hide information within files that do not appear to contain messages, such as image files.
Reading: The Future of Cryptanalysis
Here’s a reading on integer factorization.
Remember that the key must remain secret under Kerckhoffs’s Principle. This is especially important when working with a symmetric-key algorithm, because it uses the same key to encrypt and decrypt messages.
A simple encryption mechanism is a substitution cipher, which replaces parts of plaintext with ciphertext:
E = O
O = Y
“Hello World” becomes “Holly Wyrld.”
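The two-letter substitution above can be sketched in a few lines of Python. The table and the function name `substitute` are just for illustration; I mapped both upper and lower case so the example works as written.

```python
# Substitution table from the example: E becomes O, and O becomes Y.
SUBSTITUTIONS = str.maketrans({"e": "o", "o": "y", "E": "O", "O": "Y"})

def substitute(plaintext: str) -> str:
    """Replace each character found in the table; others pass through."""
    return plaintext.translate(SUBSTITUTIONS)

print(substitute("Hello World"))  # Holly Wyrld
```

Note that `translate` applies every replacement in one pass, so the O produced from an E is not replaced again with a Y.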
A popular substitution cipher is known as the Caesar Cipher, where characters in the alphabet are replaced with others, usually by shifting the entire alphabet a certain number of letters—making that the key. If someone learns that key, they can easily decrypt and encrypt messages.
A Caesar cipher that uses a key of 13 is referred to as ROT13, and would work like this:
“HELLO WORLD” – – > ROT13 – – > “URYYB JBEYQ”
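A Caesar cipher is short enough to sketch in Python. This is an illustrative version, not anything from the course; the function name `caesar` is my own. The shift amount is the key, and decrypting is just shifting back by the same amount.

```python
import string

def caesar(text: str, key: int) -> str:
    """Shift each letter by `key` places, wrapping around the alphabet."""
    result = []
    for ch in text:
        if ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        elif ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        else:
            result.append(ch)  # leave spaces and punctuation alone
    return "".join(result)

encrypted = caesar("HELLO WORLD", 13)
print(encrypted)               # URYYB JBEYQ
print(caesar(encrypted, -13))  # HELLO WORLD
```

With a key of 13, applying the cipher twice also returns the original text, because 13 + 13 wraps all the way around the 26-letter alphabet.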
A Stream Cipher takes a stream of input characters and encrypts it one character at a time, outputting it as it goes. This means there is a 1:1 relationship between input and encrypted output.
A Block cipher takes data input, places it in a block of data of a fixed size, and encodes the block as one unit.
Generally, stream ciphers are faster and easier to implement, but can be less secure than block ciphers.
To avoid key re-use, some random data called an initialization vector, or I.V., may be combined with the encryption key. This allows the creation of a “master” key, which can be combined with an I.V. to create a one-time encryption key.
The I.V. is sent in plaintext before the encrypted segments.
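To see how a master key and an I.V. fit together, here is a toy stream cipher in Python. Loud assumption: this is NOT a real or secure cipher. I made up the keystream construction (hashing key + I.V. + counter with SHA-256) purely to show the idea, and all the function names are hypothetical.

```python
import hashlib
import os

def keystream(master_key: bytes, iv: bytes, length: int) -> bytes:
    """Toy keystream: hash key + IV + counter until enough bytes exist."""
    out = b""
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(master_key + iv + counter.to_bytes(8, "big"))
        out += block.digest()
        counter += 1
    return out[:length]

def stream_encrypt(master_key: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the keystream; prepend the IV in the clear."""
    iv = os.urandom(16)  # fresh random IV for every message
    stream = keystream(master_key, iv, len(plaintext))
    body = bytes(p ^ k for p, k in zip(plaintext, stream))
    return iv + body

def stream_decrypt(master_key: bytes, message: bytes) -> bytes:
    """Split off the IV, rebuild the same keystream, and XOR again."""
    iv, body = message[:16], message[16:]
    stream = keystream(master_key, iv, len(body))
    return bytes(c ^ k for c, k in zip(body, stream))

key = b"0123456789abcdef"  # illustrative master key
message = stream_encrypt(key, b"attack at dawn")
print(stream_decrypt(key, message))
```

Each plaintext byte becomes exactly one ciphertext byte (the 1:1 relationship), and because the I.V. changes every time, encrypting the same message twice with the same master key gives different output.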
Symmetric Encryption Algorithms
One of the first encryption standards was known as the DES, or data encryption standard. It was designed in the 1970s with IBM and the NSA. It was adopted as the government standard for securing and encrypting data.
DES keys are 64 bits, but there is an 8-bit component used for error-checking, making the effective size only 56 bits.
The longer the key, the more secure the data. Key length defines the maximum strength of the encryption system.
A brute-force attack tries to guess the encryption key. This is time-consuming but often ultimately successful. Longer key lengths help protect against brute-force attacks.
A 56-bit key length has a maximum of 2^56 (about 72,000,000,000,000,000, or 72 quadrillion) possible keys. That sounds like a lot, but with computers today a key of this size is not effective.
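The arithmetic is easy to check in Python. The guessing rate below is an assumption I picked for illustration (one billion keys per second), not a measured figure for any real attacker.

```python
def keyspace(bits: int) -> int:
    """Number of possible keys for a key length given in bits."""
    return 2 ** bits

# ASSUMPTION: a hypothetical attacker testing one billion keys per second.
GUESSES_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

for bits in (56, 128, 256):
    years = keyspace(bits) / GUESSES_PER_SECOND / SECONDS_PER_YEAR
    print(f"{bits}-bit: {keyspace(bits):,} keys, roughly {years:.3g} years")
```

Every extra bit doubles the number of keys, which is why the jump from 56 bits to 128 or 256 bits matters so much.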
In 2001 the Advanced Encryption Standard (AES) was adopted. AES is the only public encryption standard approved for use with government data by the NSA.
AES is a symmetric block cipher, similar to DES, but it uses 128-bit blocks and 128-bit, 192-bit, and 256-bit key lengths.
A 256-bit key length is still theoretically vulnerable to a brute-force attack, but the time and computing power necessary is considered “unfeasible.”
Speed of encryption and ease of implementation are important considerations when designing security protocol. Overly-complicated implementation processes can cause errors that result in data loss or vulnerability to attack.
To address speed, some cryptographic capability can be built into hardware. Many modern CPUs have AES instructions built into them to lessen computational workloads when processing encryption.
Rivest Cipher 4 (RC4) is a symmetric stream cipher that gained widespread adoption because of its speed and simplicity. RC4 supports key sizes up to 2048 bits, making it resilient to brute-force attack, but there are many examples of real-world attacks succeeding against RC4.
RC4 was used in many popular protocols, including WEP and its successor, WPA. It was also supported in SSL and TLS until 2015 when it was dropped by TLS.
The current standard is TLS 1.2 with AES GCM, a mode that effectively makes the AES block cipher into a stream cipher.
Symmetric encryption is relatively easy to maintain and implement, but has the inherent weakness of one shared key.
Reading: Symmetric Encryption
Read about RC4 No More.
Rob – What a Security Engineer Does
They make sure things are secure from internet mischief.
Public Key or Asymmetric Encryption
Symmetric ciphers use the same key to encrypt and decrypt. Asymmetric ciphers use different keys.
Two parties that want to communicate using asymmetric encryption will first create a private key, which will be used to generate a public key. Once these public/private key pairs have been created for each party, they will exchange public keys.
Person A in this arrangement will use Person B’s public key to encrypt a message and send it to Person B, who will use their private key to decrypt. In response, Person B can encrypt messages using Person A’s public key, and only Person A will be able to decrypt by using their private key.
Another useful feature of this kind of system is the public key signature, which is generated by combining the message and the private key to create a special code which will validate the message.
Asymmetric encryption systems grant:
- Confidentiality: through encryption/decryption
- Authenticity: digital signature mechanism
- Non-repudiation: the author cannot dispute the origin
Asymmetric encryption is more secure, but more complex and computationally expensive than symmetric systems. Some security solutions use both symmetric and asymmetric encryption for different functions.
A final item related to asymmetric encryption is the Message Authentication Code (MAC). A MAC is “a bit of information that allows authentication of a received message,” which ensures the message came from the alleged sender.
HMAC is a keyed-hash message authentication code, and is sent along with the coded message, and verified by the recipient.
CMACs are cipher-based message authentication codes, in which a symmetric cipher with a shared key is used to encrypt the message and the output of that is used as a MAC.
A popular CMAC is the CBC-MAC, or cipher block-chaining MAC. This uses block ciphers to encrypt a message in CBC mode, which incorporates a previously encrypted block into the next block’s plaintext, creating a “chain.” This means that any modification to any block will create discrepancies in every resulting block.
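Python's standard library has an `hmac` module, so the HMAC idea above can be sketched directly. The function names `make_tag` and `verify_tag`, the key, and the messages are hypothetical; SHA-256 is just my choice of hash for the example.

```python
import hashlib
import hmac

def make_tag(shared_key: bytes, message: bytes) -> bytes:
    """Sender: compute an HMAC-SHA256 tag to send along with the message."""
    return hmac.new(shared_key, message, hashlib.sha256).digest()

def verify_tag(shared_key: bytes, message: bytes, tag: bytes) -> bool:
    """Recipient: recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(shared_key, message), tag)

key = b"illustrative shared key"
tag = make_tag(key, b"transfer 10 dollars")
print(verify_tag(key, b"transfer 10 dollars", tag))    # True
print(verify_tag(key, b"transfer 1000 dollars", tag))  # False
```

Someone who alters the message in transit cannot produce a matching tag without the shared key.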
Asymmetric Encryption Algorithms
RSA was one of the first practical asymmetric encryption systems. The math is too complicated to go into here, but it is important to know that the key generation process is based on two unique, randomly-generated and large prime numbers.
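For the curious, here is a toy sketch of how two primes turn into an RSA key pair. The tiny primes 61 and 53 and the exponent 17 are a textbook-style example I chose for illustration; real keys use primes hundreds of digits long plus padding schemes that this sketch ignores entirely.

```python
# Toy RSA key generation. Do not use numbers this small for anything real.
p, q = 61, 53              # the two secret primes
n = p * q                  # public modulus, part of both keys
phi = (p - 1) * (q - 1)    # used only while generating the keys
e = 17                     # public exponent, must share no factors with phi
d = pow(e, -1, phi)        # private exponent (needs Python 3.8 or later)

def rsa_encrypt(m: int) -> int:
    """Encrypt a small number with the public key (e, n)."""
    return pow(m, e, n)

def rsa_decrypt(c: int) -> int:
    """Decrypt with the private key (d, n)."""
    return pow(c, d, n)

ciphertext = rsa_encrypt(65)
print(ciphertext, rsa_decrypt(ciphertext))
```

The security rests on how hard it is to recover p and q from n, which is trivial here and impractical at real key sizes.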
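DSA is the digital signature algorithm, another asymmetric system, used for signing and verifying data. DSA uses a randomly-selected number to “seed” each signature, and that number must stay secret and never be reused. Sony’s failure to handle this number properly (it reused the same value with ECDSA, the elliptic curve variant) allowed the PlayStation 3’s private signing key to be recovered in 2010, which led to large-scale piracy.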
Diffie-Hellman (DH) is a popular key exchange algorithm, used to agree over a public channel on a key that will subsequently be used in a symmetric system.
First, the two parties choose a random number to begin a session, which can be public. Then, they each choose their own secret random number, and combine that with the public number. They then share the result with each other, and each combines the other’s result with their own secret number. Both arrive at the same shared secret.
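The exchange can be sketched with modular exponentiation in Python. The public values P = 23 and G = 5 are tiny textbook numbers I picked for illustration, and the function names are hypothetical; real systems use very large primes or elliptic curves.

```python
import secrets

# Public values both parties agree on (tiny, for illustration only).
P = 23  # prime modulus
G = 5   # generator

def public_value(secret: int) -> int:
    """Combine a private secret with the public numbers."""
    return pow(G, secret, P)

def shared_secret(their_public: int, my_secret: int) -> int:
    """Combine the other party's public value with my own secret."""
    return pow(their_public, my_secret, P)

# Each party picks a secret in the range 1..P-2 and never sends it.
a_secret = secrets.randbelow(P - 2) + 1
b_secret = secrets.randbelow(P - 2) + 1

# Only these two values cross the network.
a_public = public_value(a_secret)
b_public = public_value(b_secret)

print(shared_secret(b_public, a_secret) == shared_secret(a_public, b_secret))
```

An eavesdropper sees P, G, and the two public values, but at realistic sizes cannot feasibly work backward to either secret.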
Elliptic curve cryptography (ECC) is a public-key encryption system that uses the algebraic structure of elliptic curves to generate keys. An elliptic curve is the set of points that satisfy an equation such as y^2=x^3+ax+b.
ECC systems are more efficient: for example, a 256-bit ECC key is comparable to a 3072-bit RSA key.
Diffie-Hellman and DSA have elliptic curve variants, known as ECDH and ECDSA.
Reading: Asymmetric Encryption Attack
The Playstation 3 attack and again.
Hashing (or a hash function) is a type of operation that “takes in an arbitrary data input and maps it to an output of fixed size, called a hash or digest.”
A hash size will usually be expressed in bits and is often included in the name of the hash function. For a given function the output is always the same size, no matter how large or small the input is, and ideally each distinct input produces a distinct digest.
Hashing is a popular way of uniquely identifying data, and is common throughout computing. Hashing can also be used to identify duplicate data sets in databases or to speed up searching data tables.
Cryptographic hashing functions are used for message authentication and integrity, data corruption detection and digital signatures.
It is at this point in the video that a banner pops up telling me I can change sessions to accommodate how behind I am in the course (two weeks). I accept and am now back on schedule. That is a good feature, and really kind of makes sense in the online ecosystem. It is the least they can do.
Hash functions are similar to encryption in that you can input plaintext into a hash function, but the hash is one directional and will not produce output that can be turned back into plaintext.
An “ideal” cryptographic hash function should be deterministic: “The same input value should always return the same hash value.” A change in the input should result in a change in the output, but there should be no correlation between the change in the input and the change in the output. And the function must not create hash collisions, meaning that different inputs do not create the same outputs.
Hashes work on blocks of data much like cryptographic block ciphers, and some hashes are based on similar principles and functions.
Here’s an example of an imaginary hash function:
Input: “Hello World” | [hash function] | E49A00FF
Input: “hello world” | [hash function] | FF1832AE
Note the simple change from uppercase to lowercase in the second line and the complete change in the output. This isn’t really an example of how a hash works like the guy said it would be.
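To try this with a real hash function instead of an imaginary one, Python's `hashlib` works. SHA-256 is my choice for the sketch, and the helper name `sha256_hex` is made up.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of the text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

for message in ("Hello World", "hello world"):
    print(f"{message!r} -> {sha256_hex(message)}")
```

Both digests are 64 hex characters (256 bits) long, and changing the case of two letters produces a digest with no visible resemblance to the first.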
Here are some popular hashing algorithms.
MD5 operates on 512-bit blocks and creates 128-bit digests. MD5 was found to have a design flaw in 1996, and security researchers recommended the alternative SHA1, although the flaw was not considered critical.
In 2004 MD5 was found to create hash collisions, and researchers were able to create different files with matching hash digests. A very serious problem! In 2008 security researchers were able to create a fake SSL certificate that validated due to MD5 hash collisions. After some real-world exploits it was recommended that SHA1 replace MD5.
SHA1 (Secure Hash Algorithm) was developed by the NSA, and published in 1995. It operates on 512-bit blocks and produces 160-bit hash digests. It is used in TLS/SSL, PGP, SSH, and IPsec protocols.
Since 2010 the National Institute of Standards and Technology (NIST) has recommended using SHA2 instead of SHA1. There have been several demonstrations of partial hash collisions using SHA1, and until recently full collisions were still considered impractical (but not impossible), as they would cost tens of thousands of dollars in computing power.
In 2017 a full collision was demonstrated using CPU and GPU cloud computing—an estimated equivalent of a single CPU running for 6500 years, and a single GPU running for 110 years.
A Message Integrity Check (MIC) is basically a hash digest of a message, like a checksum for a message to indicate that the message has not been altered. But there is no secret key, so there is no protection against an attacker modifying a message and re-calculating the MIC. A MIC will protect against corruption or errors, but not tampering.
Reading: SHA1 Attacks
Some theoretical attacks against SHA1 were created demonstrating partial collisions. Then there was a full collision.
Hashing Algorithms (cont’d)
Authentication is a crucial application of hash functions. When you enter a username and a password, somewhere the system is comparing the password you entered to the one it has on file. But passwords should never be stored in plaintext, as that is an enormous security vulnerability. (This is found to be the case far more regularly than it should be.) Instead, during authentication the system is comparing hashes of the passwords. You enter your password, a hash function processes it, and the resulting hash digest is compared to the stored hash digest.
If a hacker does steal the stored hashes, they can only perform a brute-force attack to try to find the corresponding passwords.
It should be noted that brute-force attacks are technically impossible to completely protect against. Their success is a function of the attacker’s time and computing resources.
Processing through the hashing algorithm multiple times can help strengthen against brute-force attacks.
A rainbow table is a pre-computed table of plaintext passwords and corresponding hashes. This allows an attacker to look up a hash without having to compute a brute-force attack. It trades CPU power and time for disk space.
A password salt is randomized data that is added into a hashing function to generate a hash that is unique to the password+salt combination. This helps protect against rainbow tables.
The protection depends on the size of the salt used. Early UNIX systems used a 12-bit salt, which would require an attacker to compute 4096 salted hashes per password for a rainbow table. Modern salts can be up to 128-bit, meaning there are 2^128 possible salt values, or 340 undecillion possible values. That is 340 with 36 zeroes after it. That is a large enough number to be considered impractical to calculate within a reasonable timeframe.
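Salting and repeated hashing can be sketched together using `hashlib.pbkdf2_hmac` from Python's standard library. The iteration count, the function names, and the sample password are my own assumptions for illustration, not recommendations from the course.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# ASSUMPTION: an illustrative work factor, not a vetted recommendation.
ITERATIONS = 100_000

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a new 128-bit salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest

def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare digests."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored)

salt, stored = hash_password("correct horse")
print(check_password("correct horse", salt, stored))  # True
print(check_password("wrong guess", salt, stored))    # False
```

The salt is stored next to the digest. Two users with the same password end up with different digests, so a single pre-computed table no longer covers both of them.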
Public Key Infrastructure
PKI (public key infrastructure) is a critical part of securing communications over the internet. It defines the system of creation, distribution, and storage of digital certificates.
A certificate contains information about a public key and the organization that it belongs to, as well as a signature from another party that has verified the information.
This entity that has stored and signed the certificate is called the CA, or certificate authority. There is also an RA, or registration authority, that verifies the identities of anyone seeking to have certificates signed or stored with the CA.
A central repository is used to securely store keys, and a certificate management system will be employed to maintain the system.
An SSL/TLS server certificate is presented to a client when a connection is being made between a client and a server. The client will verify that the name of the server matches that of the certificate, and that the certificate is signed by a trusted CA.
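Python's `ssl` module performs these same two checks by default. The sketch below only builds a client context and defines a helper; the function name `fetch_server_certificate` is hypothetical, and actually calling it needs network access and a real hostname.

```python
import socket
import ssl

# The default client context loads the operating system's trusted root CAs.
context = ssl.create_default_context()

# Both checks described above are switched on by default.
print(context.verify_mode == ssl.CERT_REQUIRED)  # certificate must validate
print(context.check_hostname)                    # name must match the cert

def fetch_server_certificate(hostname: str, port: int = 443) -> dict:
    """Connect, let the TLS handshake validate the server, return its cert."""
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()
```

If the certificate is not signed by a trusted CA, or the name does not match, `wrap_socket` raises an error instead of returning a connection.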
A self-signed certificate is one that has been signed by the same entity that issued it—essentially signing the public key with the private key. This certificate will fail to verify unless the key is already trusted.
SSL/TLS client certificates are less common, but they allow the server to authenticate a client and allow access control to an SSL/TLS server.
Code signing certificates are used to verify the authenticity of executable programs.
PKI works on a “chain of trust” principle, and the chain begins with a root certificate authority, which creates a self-signed certificate (because it is the first link in the chain, or the highest authority). The root private key can then be used to create other certificates, which will inherit the trust needed to create other intermediate certificates. This creates a tree structure of trust emanating from the root CA.
A certificate without any authority as a CA is called an end-entity, or leaf certificate. It is at the end of the chain of trust.
Each major OS vendor ships a large number of root CA certificates with its OS, and usually has a program to distribute them. Browsers will use the provided store of root certificates.
The X.509 standard defines the format of digital certificates. This standard also defines a certificate revocation list (CRL) which is a way to distribute a list of certificates that are no longer valid.
X.509 was first implemented in 1988, and is currently in version 3.
X.509 defines the following fields in a certificate:
- Serial number: a unique identifier used by the CA to manage certificates
- Certificate Signature Algorithm: which public key algorithm and which hashing algorithm are used to sign the certificate.
- Issuer Name
- Validity: “Not before” and “not after” define the dates for when the certificate is valid.
- Subject: info about the entity the certificate was issued to
- Subject public key info: define the algorithm of the public key and the public key itself.
- Certificate Signature Algorithm: Same as the Certificate Signature Algorithm field above; these two fields must match.
- Certificate signature value: The signature data itself.
There are also certificate fingerprints, which are hash digests of the entire certificate, used and computed by clients when validating certificates.
In addition to the PKI chain of trust model there is what is called the web of trust. This is when individuals sign each other’s public keys, after verifying their identities.
Reading: X.509 Standard
I feel like this may be a very exciting read.
Cryptography in Action
Here we cover some real-world applications of encryption that we’ve gone over.
Here’s how digital certificates work in securing website traffic using HTTPS, the secure version of Hypertext Transfer Protocol.
HTTPS uses either SSL or TLS to create a secure channel of communication. Though the two names are used almost interchangeably, SSL 3.0 is deprecated and no longer recommended; TLS 1.2 or later should be used.
TLS is an independent protocol, and can be used in different applications including web browsing, VoIP, instant messaging and even Wi-Fi security. It offers three things:
- Secure communication, protected from eavesdropping.
- The ability to authenticate both parties (although usually only a server is authenticated to a client.)
- Integrity of communications: messages have checks to ensure they are not lost or altered.
A TLS connection begins with a TLS handshake.
The handshake is initiated by a client sending a signal to a TLS service, called the ClientHello, which includes information about the client. The server responds with a ServerHello message, including cipher information and a certificate. It then sends a ServerHelloDone message.
The client will then validate the certificate. If it checks out, it will send a ClientKeyExchange message, which includes a key exchange mechanism to create a shared secret with the server; that secret is used with a symmetric encryption cipher to encrypt all further communication. The client also sends a ChangeCipherSpec message, indicating that it is switching to encrypted communication. This is followed by an encrypted Finished message, indicating that the handshake has completed.
The server replies with its own ChangeCipherSpec message and encrypted Finished message.
Now, application data can begin to pass between server and client over the secure channel.
The session key is the shared symmetric encryption key used in TLS. Because this key is derived using the server’s public-private key pair, if the private key is compromised an attacker could decrypt any communication that was protected by that session key.
The concept of forward secrecy helps defend against this.
SSH (Secure Shell) is a secure network protocol that has many applications, most commonly remote login for command line access. It is critical that remote login applications and protocols use encryption. SSH uses public key cryptography to authenticate remote machines to clients. A key pair is generated by a user who wants to authenticate, then they distribute the public key to any system they want to authenticate to. SSH then checks that the public and private keys match.
PGP (Pretty Good Privacy) is an encryption application that uses asymmetric encryption to allow authentication of data as well as privacy. It is commonly used in email encryption, but can also be used for disk encryption. PGP was released in 1991 by an anti-nuclear activist named Phil Zimmermann. It was available for anyone to use, and because it became so popular so quickly, Zimmermann ran afoul of the US government, which considered encryption technology that used keys larger than 40 bits to be subject to the same restrictions as munitions.
PGP was designed to use keys no smaller than 128 bits, which caused it to violate export restrictions. Zimmermann published the source code as a hardcover book, which provided it with First Amendment protections. He was not charged with any violations.
PGP is very secure, and is not known to have been broken.
Read about Zimmermann and PGP.
Securing Network Traffic
Encryption is used to protect both the privacy and the integrity of information. Sometimes, data is too sensitive to expose directly to the internet.
A virtual private network (VPN) is “a mechanism that allows you to remotely connect a host or network to an internal private network, passing data over a public channel, like the internet.”
There are many forms of VPNs available, using different mechanisms and protocols.
IPsec is a VPN protocol originally created in conjunction with IPv6. IPsec works by encapsulating an IP packet inside an IPsec packet, which is routed to an endpoint, decapsulated and decrypted and sent to the final destination.
IPsec has a transport mode that only encrypts the IP packet payload, leaving IP headers unencrypted.
If IPsec is used in tunnel mode the entire IP packet is encrypted, and is encapsulated in a new IP packet with new headers.
L2TP (Layer 2 Tunneling Protocol) is used to support VPNs, and can be used in conjunction with IPsec to encapsulate data in protocols on networks that may not support that type of traffic.
Think of both working together (L2TP IPsec) as L2TP providing a tunnel for data to pass between two networks, and the secure channel in that tunnel being provided by IPsec, offering confidentiality, integrity, and authentication.
OpenVPN uses SSL/TLS and the OpenSSL library to handle key exchange and encryption. It can use certificate authentication, username/password authentication, and pre-shared secrets. Certificates are the most secure, but will require more overhead and management. Username/password can be implemented in conjunction with certificates for an added layer of security.
OpenVPN can operate using TCP or UDP, usually over port 1194. It can rely on a layer 3 IP tunnel or a layer 2 Ethernet tap, which is more flexible.
The OpenSSL library supports up to 256-bit encryption.
Reading: Securing Network Traffic
Here you go, kids. IETF RFC 3193, whatever the hell that is, and OpenVPN.
The Trusted Platform Module (TPM) is a hardware device that is integrated into a computer, and handles cryptographic processing. TPMs offer:
- Secure generation of keys
- Random number generation
- Remote attestation: A system can authenticate its hardware and software to a remote system.
- Data binding and sealing
A TPM contains a unique RSA key embedded in it, which can be used to generate a hash of its system configuration, and allows for remote attestation.
The TPM can also use this embedded key for binding and sealing, meaning that only a unique key derived from the TPM key can be used to encrypt data, and only keys installed in the TPM can be used to decrypt data.
TPMs also feature tamper resistance, to defeat physical attacks on the chips. Mobile devices have a similar chip called a secure element, which can be integrated into a processor or main board.
A Trusted Execution Environment (TEE) is an isolated environment running alongside the main OS, which serves to isolate sensitive processes.
Attacks on TPMs are possible, and TPMs have been criticized because manufacturers have access to the TPM’s key at the time of manufacture.
Full disk encryption (FDE) is exactly what it sounds like. Examples include PGP, BitLocker from Microsoft, FileVault 2 from Apple, and the open-source software dm-crypt for Linux systems.
Encrypted disks will have a main partition that is encrypted, and a small unencrypted boot partition.
Encryption relies on random numbers, because if encryption is based on anything else there is a chance that patterns could be detected which would allow breaking the encryption. Something that is not truly random is called pseudo-random. Operating systems contain what is called an entropy pool, which is a source of random data used to seed random number generators.
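The difference between pseudo-random and OS-sourced randomness is easy to see in Python. The function names below are hypothetical; `random.Random` is a seeded pseudo-random generator, while `secrets.token_bytes` draws from the operating system's random source.

```python
import random
import secrets

def pseudo_random_bytes(seed: int, count: int) -> bytes:
    """Seeded generator: the same seed always gives the same bytes."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(count))

def secure_random_bytes(count: int) -> bytes:
    """Bytes from the operating system's random source, fit for keys."""
    return secrets.token_bytes(count)

# Anyone who learns the seed can reproduce the "random" output exactly.
print(pseudo_random_bytes(42, 8) == pseudo_random_bytes(42, 8))  # True

# OS-sourced bytes are not reproducible from a seed.
print(secure_random_bytes(16).hex())
```

That predictability is why keys, salts, and I.V.s should come from the `secrets` module (or `os.urandom`) rather than from `random`.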
Reading: TPM Attacks
Here’s a reading on a physical attack against a TPM.
>>>>>Create/inspect key pair, encrypt/decrypt and sign/verify using openSSL
>>>>> Hands-On With Hashing
These two exercises were actually kind of fun—creating private and public keys, encrypting and decrypting messages, and signing and verifying files using hashes. Then doing more hashing and verifying in the second exercise—all in Linux instances so they worked the first time and I was able to sign in with no problems!
See you all next week for “AAA Security”!