Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added connector folder and HF file #313
base: main
Are you sure you want to change the base?
added connector folder and HF file #313
Changes from 11 commits
68df99a
16257b9
4cd48ff
86b230f
497e04b
74be0dc
a5a41f4
152a99e
2f783ca
9f0575b
d30529d
829d7df
8bf9c52
9ae14f4
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question here about large datasets - are we sure this will not break?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could I solve this with batch processing and by passing
streaming=True
through load dataset? Aaron mentioned that so it seems like that could prevent that issueThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two potential issues here:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we making an index here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite; that part mainly just converts anything that isn't a string, like booleans or lists, into strings because it will error out if I don't. I didn't really like how it is just a bunch of conditional statements but I couldn't find a better way to resolve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about instead of making this an interactive script, use argparse: https://docs.python.org/3.10/library/argparse.html?highlight=argparse#module-argparse
that way it's easier to handle optional args like split and limit