-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add isLiveDataset
field
#494
Changes from all commits
d8829b2
c650c24
7a66ce6
a32f2ef
3833257
4886642
2fcdb8b
541b1da
36840f2
8542526
6462f34
62a3371
faae477
3c66c31
405e7e9
aa05e1f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,7 +14,6 @@ | |
|
||
from mlcroissant._src.core import constants | ||
from mlcroissant._src.core.constants import EncodingFormat | ||
from mlcroissant._src.core.context import CroissantVersion | ||
from mlcroissant._src.core.optional import deps | ||
from mlcroissant._src.core.path import get_fullpath | ||
from mlcroissant._src.core.path import Path | ||
|
@@ -168,9 +167,17 @@ def _check_hash(self, filepath: epath.Path): | |
logging.info( | ||
"Hash of downloaded file is not identical with reference in metadata.json" | ||
) | ||
# In v0.8 only, hashes were not checked. | ||
|
||
ctx = self.node.ctx | ||
if ctx.conforms_to and ctx.conforms_to > CroissantVersion.V_0_8: | ||
# For live datasets, we do not raise an error if the hashes checks fail, but | ||
# only a warning. | ||
if ctx.is_live_dataset: | ||
logging.warning( | ||
"Hash of downloaded file not identical with reference in metadata.json!" | ||
) | ||
return | ||
# In v0.8 only, hashes were not checked. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Instead of the comment, use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I just moved the pre-existing line below. I think ctx.is_v0() would do the same. |
||
if not ctx.is_v0(): | ||
raise ValueError( | ||
f"Hash of downloaded file {filepath} is not identical with the" | ||
f" reference in the Croissant JSON-LD. Expected: {expected_hash} -" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a unit test in download_test.py? Something like:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point, done.