pefs crypto primitives
This is the second part of a series of posts devoted to pefs (a stacked cryptographic filesystem). The first part was about benchmarking encryption overhead.
So I’m going to shed some light on the cryptographic primitives used in pefs.
Supported data encryption algorithms: AES, Camellia, Salsa20.
Salsa20 is a new stream cipher by Daniel Bernstein from the eSTREAM portfolio. Available AES and Camellia key sizes are 128, 192 and 256 bits. Adding another block cipher is trivial (choosing one with a 128-bit block size would save you from writing a few more lines of code). Stream ciphers are a bit different, as they usually do not support encryption at an arbitrary offset in the key stream (which is what CTR mode provides for block ciphers). Most of them also lack support for tweaked encryption. Salsa20 fits well here, mostly because it’s designed on top of a hash function. Another possible candidate for inclusion is Skein, a hash function by Bruce Schneier et al. submitted to the SHA-3 competition. Its authors suggest using Skein as a stream cipher.
File names are always encrypted using AES-128 in CBC mode. An encrypted file name consists of a checksum, a unique per-file tweak and the name itself:
XBase64(checksum || E(tweak || filename))
The checksum is a VMAC of the encrypted tweak and file name:
checksum = VMAC(E(tweak || filename))
The main reason for not providing alternatives is to keep it simple: “complexity is insecurity”. Data encryption is entirely different: encrypted data is not parsed in any way by pefs, and the user expects to be able to use the secure-fast-best_name cipher of their choice.
The name has this structure to work around some of the shortcomings of CBC. A random tweak value is placed at the beginning of the first encrypted block. That gives us unique encrypted file names and eliminates the need to deal with the IV. (For those who care: the initial IV is zero and the name is padded with zeros.)
An Encrypt-then-Authenticate construction is used. Besides being the most secure variant at the moment, it allows checking whether a name was encrypted with a given key without performing decryption. VMAC was chosen because of its performance characteristics and its ability to produce a 64-bit MAC (without truncating the original result, as would be needed with HMAC). 64 bits is almost mandatory here, because a larger MAC would result in a much longer file name while hardly improving security. But the real reason is that no real “authentication” is performed. It’s designed to be just a cryptographic checksum (sounds incorrect, but I can’t find better wording), so that breaking VMAC wouldn’t result in breaking encrypted data, and this VMAC doesn’t “authenticate” encrypted data. Its main purpose is to find the key a file is encrypted with.
An encrypted directory name also contains a tweak, but it’s used solely to randomize the first CBC cipher block and to keep the name structure uniform.
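To make the "find the key without decrypting" idea a bit more concrete, here is a minimal Python sketch. It is not pefs code. Python has no VMAC in its standard library, so a truncated HMAC-SHA256 stands in for it (exactly the truncation the real design avoids), and the names checksum and find_key are made up for illustration. The input is assumed to be the raw name after XBase64 decoding.

```python
import hashlib
import hmac

MAC_LEN = 8  # 64-bit checksum, as described in the post


def checksum(mac_key: bytes, encrypted_name: bytes) -> bytes:
    # Stand-in only: truncated HMAC-SHA256 instead of VMAC.
    return hmac.new(mac_key, encrypted_name, hashlib.sha256).digest()[:MAC_LEN]


def find_key(raw_name: bytes, mac_keys):
    """Return the index of the key whose checksum matches, or None.

    raw_name is checksum || E(tweak || filename); nothing is decrypted here.
    """
    tag, body = raw_name[:MAC_LEN], raw_name[MAC_LEN:]
    for index, key in enumerate(mac_keys):
        if hmac.compare_digest(tag, checksum(key, body)):
            return index
    return None
```

With several keys loaded, each name costs at most one checksum computation per key, and the CBC decryption only has to run once the right key is known.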
Data encryption uses the tweak to get unique per-file ciphertext (for files with the same data). Block ciphers (AES, Camellia) operate in counter (CTR) mode, with 64 bits of the block containing the tweak and the other 64 bits containing the file offset. Salsa20 supports tweaked encryption out of the box. Choosing such a mode of operation eliminates the need to perform I/O in units of the cipher block size, which significantly simplifies the entire filesystem code: I/O, page mapping, unchanged file sizes, etc. (it removes a ton of error-prone code, believe me, I’ve tried it the other way).
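A rough Python sketch of why this mode allows I/O at any byte offset. It is an illustration under assumptions, not the pefs implementation: a keyed BLAKE2b stands in for the AES/Camellia block encryption, the counter half holds a block index in little-endian order (the post only says it contains the file offset), and all function names are hypothetical.

```python
import hashlib
import struct

BLOCK_SIZE = 16  # AES and Camellia block size in bytes


def counter_block(tweak: bytes, block_index: int) -> bytes:
    # 64 bits of tweak followed by 64 bits of position (layout is an assumption).
    if len(tweak) != 8:
        raise ValueError("tweak must be 8 bytes")
    return tweak + struct.pack("<Q", block_index)


def keystream_block(key: bytes, tweak: bytes, block_index: int) -> bytes:
    # Stand-in for E_key(counter_block): keyed BLAKE2b, NOT a real block cipher.
    block = counter_block(tweak, block_index)
    return hashlib.blake2b(block, key=key, digest_size=BLOCK_SIZE).digest()


def xor_at_offset(key: bytes, tweak: bytes, offset: int, data: bytes) -> bytes:
    """Encrypt or decrypt data located at an arbitrary byte offset of the file."""
    out = bytearray(len(data))
    pos = 0
    while pos < len(data):
        # Find the block covering this byte and skip the unused keystream prefix.
        index, skip = divmod(offset + pos, BLOCK_SIZE)
        stream = keystream_block(key, tweak, index)[skip:]
        chunk = data[pos:pos + len(stream)]
        out[pos:pos + len(chunk)] = bytes(a ^ b for a, b in zip(chunk, stream))
        pos += len(chunk)
    return bytes(out)
```

Encrypting bytes 37..59 on their own gives the same result as encrypting the whole file and slicing it, the output is exactly as long as the input, and the same call decrypts. That is what lets the filesystem avoid block-aligned reads and writes.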
As one has probably guessed, there are 3 keys involved: one each for name encryption, VMAC and data encryption. These are derived from the user-supplied key using the HKDF algorithm (HMAC-based key derivation function, IETF draft). The kernel part expects a cryptographically strong key from userspace. This key is generated by PBKDF on top of HMAC-SHA512.
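The two-stage derivation can be sketched in Python roughly like this. Treat it as an illustration under assumptions rather than what pefs does byte for byte: HKDF follows the extract-and-expand scheme as later published in RFC 5869 (the draft may have differed), PBKDF2 is assumed for "PBKDF", and the hash used inside HKDF, the info labels, the key lengths, the salt handling and the iteration count are all made up for the example.

```python
import hashlib
import hmac

HASH_LEN = 64  # SHA-512 output size in bytes


def hkdf_sha512(ikm: bytes, info: bytes, length: int, salt: bytes = b"") -> bytes:
    # Extract: concentrate the input key material into a pseudorandom key.
    prk = hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha512).digest()
    # Expand: chain HMAC blocks until enough output is produced.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_keys(passphrase: bytes, salt: bytes, iterations: int = 50000) -> dict:
    # Userspace step: stretch the passphrase into a strong key.
    master = hashlib.pbkdf2_hmac("sha512", passphrase, salt, iterations)
    # Kernel step: three independent keys from the one supplied key.
    # Labels and lengths below are hypothetical.
    return {
        "name": hkdf_sha512(master, b"name", 16),
        "vmac": hkdf_sha512(master, b"vmac", 16),
        "data": hkdf_sha512(master, b"data", 32),
    }
```

Using a different info label per purpose means that learning one derived key tells an attacker nothing useful about the other two.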
Standard implementations of the ciphers are used, but I do not use the opencrypto framework, so there is no hardware acceleration available. opencrypto is not used mainly because it lacks support for CTR mode. It’s also unlikely to ever support Salsa20 the way it’s used in pefs. opencrypto is rather heavyweight, so using it solely for name encryption would probably worsen performance (hardware initialization costs for encrypting short chunks with different keys) and add an extra dependency.
Besides that, pefs supports multiple keys, mixing files encrypted with different keys in a single directory, a transparent (unencrypted) mode, key chaining (adding a series of keys by entering just one of them) and more. I’m going to write about it soon.