Refactor datasets logic #43

Merged
voorhs merged 20 commits into dev from refactor/datasets
Nov 25, 2024

Conversation

truff4ut
Collaborator

No description provided.

@truff4ut truff4ut requested a review from voorhs November 12, 2024 05:36
Collaborator

@voorhs voorhs left a comment

Nice! We can keep working in this direction.

What I like most is that train/test/val splits are supported here.

@truff4ut truff4ut requested review from voorhs and Samoed November 20, 2024 11:01
Collaborator

@voorhs voorhs left a comment

What's left is to add validation and pushing to the HF Hub.

Comment on lines +32 to +33
    if self.dataset.multilabel:
        self.dataset = self.dataset.encode_labels()
Collaborator

Is there no way to move these steps into the dataset loading stage?
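For illustration only, here is a minimal sketch of what encoding at load time could look like. Only the `multilabel` attribute and the `encode_labels()` method appear in the diff above; the `ToyDataset` class, its `load` classmethod, the record layout, the multi-hot encoding, and inferring `multilabel` from the label counts are all assumptions made for this example, not the project's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyDataset:
    """Hypothetical stand-in for the dataset class discussed in this PR."""

    utterances: list[str]
    labels: list[list[int]]
    n_classes: int
    multilabel: bool = False
    encoded: bool = False

    def encode_labels(self) -> ToyDataset:
        """Return a copy whose labels are multi-hot vectors (assumed behavior)."""
        multi_hot = [
            [1 if i in sample else 0 for i in range(self.n_classes)]
            for sample in self.labels
        ]
        return ToyDataset(
            self.utterances, multi_hot, self.n_classes, self.multilabel, encoded=True
        )

    @classmethod
    def load(cls, records: list[dict], n_classes: int) -> ToyDataset:
        """Build the dataset and encode labels once, as part of loading."""
        utterances = [r["utterance"] for r in records]
        labels = [list(r["labels"]) for r in records]
        # Assumption: a dataset is multilabel if any sample has several labels.
        multilabel = any(len(sample) > 1 for sample in labels)
        dataset = cls(utterances, labels, n_classes, multilabel)
        if dataset.multilabel:
            dataset = dataset.encode_labels()
        return dataset


records = [
    {"utterance": "book a flight and a hotel", "labels": [0, 2]},
    {"utterance": "cancel my order", "labels": [1]},
]
dataset = ToyDataset.load(records, n_classes=3)
```

With something like this, the handler would receive an already encoded dataset and would not need the `if self.dataset.multilabel:` branch at all.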

Collaborator Author

Need to think about it.

autointent/context/data_handler/data_handler.py Outdated (resolved)
Collaborator

@voorhs voorhs left a comment

It looks like we should delete the file autointent/context/data_handler/scheme.py and its neighbors multilabel_generation.py and sampling.py. They are not used and yet they cause typing errors.

Also, we need to run the formatter: make lint

I also just discovered that we have a whole feature not covered by tests: Tags. We will need to take care of this before the release.

pyproject.toml Outdated (resolved)
pyproject.toml Outdated (resolved)
@voorhs voorhs merged commit 4e1d43f into dev Nov 25, 2024
20 checks passed
@voorhs voorhs deleted the refactor/datasets branch November 25, 2024 15:33
3 participants