Format Preserved Encryption

So what is Format Preserved Encryption exactly?

Soham Kamble
5 min readMar 5, 2021

Format-Preserving Encryption (FPE), refers to encrypting in such a way that the output (the ciphertext) is in the same format as the input (the plaintext). The meaning of “format” varies.

FPE encrypts plaintext that is a certain length and produces a ciphertext that is the same length as the plaintext and uses the same set of values as the plaintext.

Using the example of a 16- digit credit card number with a plaintext of 1483920193402918, the ciphertext created with FPE could produce an output of 1483666666662918.

For finite domains, the cipher is equivalent to a permutation of N integers{0,.., N-1} where N is the size of the domain

Some of the famous Algorithms are:

1.The FPE constructions of Black and Rogaway
2. The Thorp Shuffle
3. VIL mode
4. Hasty pudding Cipher
5. The FFSEM/FFX mode of AES
6. FPE for JPEG 2000 encryption

How FPE works:

The problem of FPE is to generate a pseudorandom permutation from a secret key, in such a way that the computation time for a single value is small (ideally constant, but most importantly smaller than O(N))

The measure of a GOOD FPE is whether an attacker can distinguish the FPE from a truly random permutation.

Format Preserved Encryption in Google Cloud Platform

GCP gives users access to a de-identification technique called pseudonymization. Pseudonymization is a technique that replaces sensitive data with cryptographically generated tokens.

Google Cloud uses a type of FPE called FPE-FFX. FFX focuses on two different FPE methods, FF1 and FF3, to encrypt data.

FFX uses multiple rounds of a Feistel function on the plaintext, along with the use of a key, to create the ciphertext.

A Feistel function splits the plaintext into two parts and does a permutation each round on each half of the plaintext, and then swaps the left half of text to the right and vice versa.

Diagram of a Feistel network. The input (left side) is split into two halves and passed
through a number of rounds. In the Luby-Rackoff variant, each function “F” is implemented via an independent, keyed pseudorandom function. Decryption is accomplished by reversing the process.

The FF1 method uses 10 rounds of the Feistel function, and FF3 uses 8 rounds of the Feistel function. FPE-FFX has several steps necessary to encrypt data.

To begin encryption, the alphabet being used to de-identify the data must be specified in one of three ways:
1. Using one of four values that represent the most common character sets/alphabets
2. Using a radix value specifying the size of the alphabet. Specifying 2 gives an alphabet consisting of the numbers 0 and 1 while specifying 95 gives an alphabet with all numeric, upper-case alpha, lower-case alpha, and symbol characters
3. By building an alphabet containing the exact characters to be used.

When using FPE-FFX in Google Cloud, the data is encrypted, but can also be prepended with a surrogate annotation, resulting in a final token. The token takes the following form when a surrogate annotation is included:
surrogate_infotype(surrogate_length): surrogate_value.

The surrogate annotation is surrogate_infotype(surrogate_length). The infotype is defined by the user and the surrogate value is the resulting ciphertext. If no surrogate annotation is specified, then the final token is just the surrogate value. To re-identify unstructured data, the full token, including a surrogate annotation, is necessary, while structured data only needs the surrogate value.

Applications

Format-preserving encryption is mostly used in on-premise encryption and tokenization solutions. It is used to protect sensitive data sets such as payment card data, Social Security numbers, and country identifiers that are commonly used and stored in retail, healthcare, and financial databases and applications.

Drawbacks in the FPE market

The format-preserving encryption market appears to be losing stability because two of the three FFX methods used for format-preserving encryption are not considered to be cryptographically secure. Additionally, organizations that are producing format-preserving encryption solutions leveraging FF2 and FF3 methods are struggling to validate their solutions with NIST 800–38G, which will prove to be challenging. There only remains one patented method that can be used, limiting competition and increasing pricing.

One of the biggest milestones in FPE research was the Black and Rogaway paper so here are a few key observations made in them.

Observations made in the Black and Rogaway paper

  1. There are all sorts of applications where you need to preserve the format of input data.
  2. You could design your own cipher to do it, but it would probably suck.
  3. It would sure be nice if we could do it securely using standard ciphers, like AES.

What Black and Rogaway noticed is that there are really two problems here. The first is to build a cipher that works on arbitrary bit lengths. The second problem is to use such a cipher to encrypt arbitrary sets.
The first problem is usually accomplished with a Feistel network construction like the one used in the DES cipher.

To build a cipher that operates on 54-bit blocks, for example, we would cut the input block into two 27-bit halves, L0 and R0, and push them through a Feistel network for several rounds. This is an amazingly simple construction; it requires only XOR and some non-linear function F(), which we’ll implement using an existing cipher like AES (with truncation of the output).

Note that each call to AES must use a different key, although we can derive these all from one single key using standard techniques. Since AES is much better than the DES S-boxes, we won’t need to run the network for 16 rounds. Three or four rounds is probably sufficient.

How the cycling algorithm works

Let’s say I want to encrypt my Visa card number (“4532294977918448“). I’ll follow these steps:

  1. First, I’ll encode the number as a 54-bit integer, then encrypt it using the fancy 54-bit block cipher I built above.
  2. At the end of this process, I should have 54-bit ciphertext C. Now, there’s a very good chance that represented as an integer, C will be ‘small’ enough to represent as a 16-digit integer. If so, then I’m done. That’s my output!
  3. But what if C is too large? No problem. If first, you don’t succeed, just try again. Take that first ciphertext C, and encrypt it again using the block cipher to get a new C. Go back to step 2.

Conclusion

Format preserving encryption is extremely important for users who wish to keep the ciphertext after encryption at the same length as the plaintext. Of the several different FPE-FFX methods used on Google Cloud, FF1 is the best practice method to use, due to the extra rounds of the Feistel function it goes through.

Structured data requires a surrogate annotation to be prepended on the ciphertext to allow for re-identification of data. Google Cloud has a strong implementation of FPE in place for customer use. For those in need of same-length plaintext and ciphertext, Google Cloud’s FPE-FFX is their best choice.

References

--

--