Pipeline.Text #95

Open
sahilgupta2105 opened this issue May 26, 2021 · 1 comment

@sahilgupta2105
Collaborator

Some of the methods feel incomplete. For example, from_folder ingests a set of text files from a folder, but what about the labels?

sahilgupta2105 added the bug label May 26, 2021
sahilgupta2105 self-assigned this May 26, 2021
@aiqc
Owner

aiqc commented May 26, 2021

Option a) Accept a list argument for the labels, and validate the number of list elements against the number of text data entries?
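A rough sketch of what Option a could look like, in plain Python rather than the actual AIQC API. The helper name `read_texts_with_labels`, the `*.txt` glob, and the sorted file order are all assumptions made for illustration:

```python
from pathlib import Path
from typing import List, Sequence, Tuple


def read_texts_with_labels(folder: str, labels: Sequence) -> List[Tuple[str, object]]:
    """Hypothetical helper (not part of AIQC): pair each .txt file with a label."""
    # Sort the paths so that the file order, and therefore the label
    # alignment, is deterministic across runs.
    paths = sorted(Path(folder).glob("*.txt"))

    # The validation described in Option a: one label per text entry.
    if len(labels) != len(paths):
        raise ValueError(
            f"Got {len(labels)} labels for {len(paths)} text files in {folder!r}."
        )

    # Read each file and attach the label at the same position.
    return [(p.read_text(encoding="utf-8"), label) for p, label in zip(paths, labels)]
```

The weak point of this approach is that the caller has to know the order in which the files are read, which is why the sketch sorts the paths explicitly.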

Option b) When I faced this problem for Dataset.Image, I opted to create the higher-level Pipeline.Image, which constructs both a Dataset.Tabular for the label and a Dataset.Image for the images. That is the main reason why Splitset accepts labels and features from different datasets.

If you choose not to include label columns in Dataset.Text, then you are free to name the columns whatever you like, and the text-based encoding methods can be applied to every column by default.
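To make Option b concrete, here is a minimal sketch of a pipeline that keeps the text features and the label in two separate datasets. The classes `TextDataset` and `LabelDataset`, the function `make_text_pipeline`, and the default column names are hypothetical stand-ins, not the real Dataset.Text, Dataset.Tabular, or Pipeline classes:

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class TextDataset:
    """Hypothetical stand-in for a features-only text dataset."""
    columns: Dict[str, List[str]]


@dataclass
class LabelDataset:
    """Hypothetical stand-in for a tabular dataset that holds only the label."""
    column: str
    values: List[object]


def make_text_pipeline(
    texts: Sequence[str],
    labels: Sequence,
    text_column: str = "TextData",  # assumed name; any name works
    label_column: str = "label",    # assumed name
) -> Tuple[TextDataset, LabelDataset]:
    """Build one dataset for the text features and one for the label."""
    if len(texts) != len(labels):
        raise ValueError(f"Got {len(texts)} texts but {len(labels)} labels.")

    # No label column lives in the text dataset, so every column in it
    # can be treated as text and encoded the same way.
    features = TextDataset(columns={text_column: list(texts)})
    label = LabelDataset(column=label_column, values=list(labels))
    return features, label
```

Because the label never enters the text dataset, nothing has to distinguish label columns from text columns when encoders are applied. The cost is an extra dataset to create and track.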

So there are pros and cons to each.

aiqc added the feature label May 8, 2022
aiqc changed the title from "revisit data ingestion methods for text dataset" to "Pipeline.Text" May 8, 2022
aiqc removed the bug label May 8, 2022