Make encrypted file indistinguishable from random data? #312

Closed
msimonsson opened this issue Jun 1, 2021 · 4 comments

Comments

@msimonsson

Hi,

If I strip the first 16 bytes from an encrypted file, i.e. the magic string + parameters, the remaining bytes should be indistinguishable from random data. Is that correct?

If I don't control the parameters (version <= 1.3.0), the missing 16 bytes should be easy to brute-force using the header checksum?

Thanks,
Mikael

@gperciva
Member

Interesting question! Sorry for the delay.

I can't speak to any theoretical guarantees, but it looks promising empirically. We can strip the first 16 bytes with dd, and use ent to evaluate the randomness. (You might need to install ent separately.)

Then, we can make a small script to do this testing:

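#!/bin/sh
# Encrypt the file named in $1 with an empty passphrase (read from stdin),
# drop the first 16 bytes (magic string + parameters), and test the rest.
echo "" | scrypt enc -P "$1"    \
        | dd ibs=16 skip=1      \
        | ent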

Sample output of testing the scrypt binary itself:

$ ./test_random.sh scrypt 
26152+1 records in
817+1 records out
418440 bytes transferred in 0.013690 secs (30564280 bytes/sec)
Entropy = 7.999478 bits per byte.

Optimum compression would reduce the size
of this 418440 byte file by 0 percent.

Chi square distribution for 418440 samples is 302.41, and randomly
would exceed this value 2.22 percent of the times.

Arithmetic mean value of data bytes is 127.3885 (127.5 = random).
Monte Carlo value for Pi is 3.145856037 (error 0.14 percent).
Serial correlation coefficient is -0.004233 (totally uncorrelated = 0.0).
$ 

(You might want to add a 2>/dev/null to the dd line.)
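
For readers without ent installed, two of the statistics it reports can be approximated in a few lines of Python. This is only a rough sketch: the function name `byte_stats` is hypothetical, and it computes just the Shannon entropy and the chi-square statistic against a uniform byte distribution, not the percentile, Monte Carlo, or serial-correlation figures shown in the sample output.

```python
import math
from collections import Counter
from typing import Tuple


def byte_stats(data: bytes) -> Tuple[float, float]:
    """Return (Shannon entropy in bits per byte, chi-square statistic)."""
    if not data:
        raise ValueError("need at least one byte")
    n = len(data)
    counts = Counter(data)
    # Shannon entropy: 8.0 means every byte value is equally likely.
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    # Chi-square against a uniform distribution over the 256 byte values.
    expected = n / 256
    chi_square = sum(
        (counts.get(b, 0) - expected) ** 2 / expected for b in range(256)
    )
    return entropy, chi_square
```

To use it in place of ent in the pipeline, one could read the stripped bytes from standard input with `sys.stdin.buffer.read()` and pass them to `byte_stats`. As with ent, a good score only shows that these particular tests found no structure; it is not a proof of indistinguishability.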

@cperciva
Member

Without the 16 byte "scrypt" + parameters, you can still distinguish scrypt data from random by searching for parameters which make the hash starting at byte 48 correct.

I'm not clear on why you care about making the file indistinguishable from random though...?
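
The parameter search described above can be sketched roughly as follows. The header layout is an assumption taken from my reading of scrypt's FORMAT documentation: bytes 0-5 are "scrypt", byte 6 is the version (0), byte 7 is log2(N), bytes 8-11 and 12-15 are r and p as big-endian integers, bytes 16-47 are the salt, and bytes 48-63 are the first 16 bytes of SHA-256 over bytes 0-47. The function name `recover_params` and the bounds `max_r` and `max_p` are hypothetical and chosen only to keep the search small.

```python
import hashlib
import struct
from typing import Optional, Tuple

MAGIC = b"scrypt\x00"  # 6-byte magic string plus version byte 0 (assumed layout)


def recover_params(
    stripped: bytes, max_r: int = 32, max_p: int = 32
) -> Optional[Tuple[int, int, int]]:
    """Search for (logN, r, p) that make the header checksum match.

    `stripped` is the file with its first 16 bytes removed, so it is assumed
    to start with the 32-byte salt followed by the 16-byte checksum.
    """
    if len(stripped) < 48:
        return None
    salt, checksum = stripped[:32], stripped[32:48]
    for log_n in range(1, 64):
        for r in range(1, max_r + 1):
            for p in range(1, max_p + 1):
                # Rebuild the candidate 48-byte header and hash it.
                header = MAGIC + struct.pack(">BII", log_n, r, p) + salt
                if hashlib.sha256(header).digest()[:16] == checksum:
                    return log_n, r, p
    return None
```

With the default bounds this is at most 63 * 32 * 32 = 64512 SHA-256 computations. A match on a 16-byte checksum is overwhelmingly unlikely for genuinely random data, so a hit both identifies the file as scrypt output and recovers the stripped parameters.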

@gperciva
Member

gperciva commented Jun 15, 2021

I can think of two reasons:

  1. security through obscurity. If an attacker doesn't know which algorithm was used to encrypt the file, it would be harder to decrypt.
    (I'm not saying that this is a good reason; if our implementation of the scrypt algorithm is secure, then adding "obscurity" doesn't provide any noticeable extra protection.)

  2. plausible (?) deniability, particularly in a jurisdiction which has a "you can be legally compelled to decrypt files upon request" law.
    A programmer could have a couple of files of random data on her system (maybe to have a reproducible source of random data for a game or physical simulation?), and random-02.data could be where she stores her personal info.

We've said that

"As a result of these exceptions, network administrators can identify Tarsnap network traffic. Tarsnap is not a tool designed to hide its usage"
https://www.tarsnap.com/network.html

and I imagine that this is true of scrypt as well.

@msimonsson
Author

Outstanding replies @gperciva!

"I can't speak to any theoretical guarantees, but it looks promising empirically."

It does, I did not know about ent, thank you!

"I'm not clear on why you care about making the file indistinguishable from random though...?"

@gperciva nailed it: security through obscurity and plausible deniability. I don't want to write too much about it here, but for example, say you're traveling and you're robbed of all your possessions. How do you recover? If your tarsnap.key (or whatever you need to recover) is stored on a publicly accessible web server, it's easy: just download it and go from there. But there's absolutely no reason to advertise to anyone (VPS provider, hackers, etc.) what this file contains.

I'm closing this issue for now, thanks again!
