Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

solutions/forms.py should use task.max_file_size if uploaded solution is a zip-file in an other way. #15

Closed
ifrh opened this issue Dec 31, 2022 · 8 comments

Comments

@ifrh
Copy link
Contributor

ifrh commented Dec 31, 2022

Currently HSO-Praktomat blocks Zip-files which are bigger than task.max_file_size.

if sum(fileinfo.file_size for fileinfo in zip.infolist()) > max_file_size:

If one add n single non-zip-files as solution than the upload limit is n*max_file_size.
Therefor I think a possible fix is:
change solutions/forms.py L50
from if sum(fileinfo.file_size for fileinfo in zip.infolist()) > max_file_size:
to if sum(fileinfo.file_size for fileinfo in zip.infolist()) > (max_file_size * len(zip.infolist()):

(cf. KITPraktomatTeam/Praktomat#359 )

@hannesbraun
Copy link
Contributor

The problem that I see with this method is that it doesn't protect against zip bombs as good as before. An example: this now allows a zip file to contain a large number of files with 0 bytes in size as well as one easily compressible large file. Sure, there is still a limit. But I'm not sure if that's effective anymore.

Maybe something like if sum(fileinfo.file_size for fileinfo in zip.infolist()) > (max_file_size * max(len(zip.infolist()), 8): is a better tradeoff. After all, the zip file as a whole still has to be smaller than max_file_size. With an increasing number of files, we allow for higher compression ratios. However, at a maximum of eight files or more, we allow a compression ratio of up to 8:1. I think that would probably suffice... And these assumptions are made under the condition that you're always right at the limit. If you've got a lot of small files that don't exceed the "zip bomb limit" (which is somewhat common for programming exercises), you're fine anyway.

@ifrh
Copy link
Contributor Author

ifrh commented Jan 7, 2023

Well , in our Praktomat instance for Database Lecture, our students should include not only UTF8-encoded textfiles inside their submissions but also some binary files like PDF or JPG. Perhaps introcuding max_zip_file_size in Task-Model would be a better way to prevent "zip bombs" one the one hand and be adaptive enought to match expected filesizes.

@hannesbraun
Copy link
Contributor

our students should include not only UTF8-encoded textfiles inside their submissions but also some binary files like PDF or JPG

In that case, this should matter even less as JPG itself is already a compressed file format. Contained in a ZIP file, you're not likely to compress the file by a lot. About the same should be true for PDF. Then you're hitting the max_file_size limit first.

Introducing another parameter for a task would just add unnecessary complexity here in my opinion. Or am I missing something important?

@ifrh
Copy link
Contributor Author

ifrh commented Mar 8, 2023

Hannesbraun wrote in #15 (comment) :

Maybe something like if sum(fileinfo.file_size for fileinfo in zip.infolist()) > (max_file_size * max(len(zip.infolist()), 8): is a better tradeoff.

What is the semantical reason to use "8" in above condition?

Just a funny view, the code-snippet 8): looks like an emoticon
grafik

@hannesbraun
Copy link
Contributor

What is the semantical reason to use "8" in above condition?

Without the surrounding max(<number of files>, 8), the number of files could increase endlessly. If the number of files increases endlessly, so does the allowed size for the sum of all file sizes. That again allows for a zip bomb where you place a huge number of files with 0 bytes in size into the zip file and one with the size of max_file_size multiplied by the number of files in that zip file. That huge file would of course be very easy to compress, which is why the size of the zip file itself would still be small.

That's the reason I'm limiting the number of files in that calculation to 8. It means that if the sum of file sizes is bigger than 8 * max_file_size, we'll reject the submission as that looks like a suspiciously high compression ratio given that we know what students will submit and set the max_file_size to a rational value. If the sum of file sizes is still smaller (even though the contents of the zip file are highly compressed with a ratio of more than 8:1), we will still accept the submission.

I hope this explanation makes a bit more sense. If not, let me know :)

Just a funny view, the code-snippet 8): looks like an emoticon

What a happy accident ;)

@ifrh
Copy link
Contributor Author

ifrh commented Mar 12, 2023

Hannes wrote.

Without the surrounding max(<number of files>, 8), the number of files could increase endlessly.

Well, if 8 should be the limit, than you need to use min and not max.

@hannesbraun
Copy link
Contributor

Of course, yes. Somehow, I got it mixed up... Thanks for noticing.

@hannesbraun
Copy link
Contributor

Fixed in #24

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants