pdf_features and a few other libraries are not imported #118

Open
asleroid opened this issue Apr 24, 2024 · 1 comment
Comments

@asleroid

Even though pdf_features is among the libraries installed in my venv, 'pip list' does not list it.

As a result, when running the following command, the script errors out:
(venv) asleroid@Aslis-MBP pdf_paragraphs_extraction % python src/create_paragraph_extractor_model.py /Users/asleroid/Code/pdf-labeled-data/labeled_data/paragraph_extraction
loading one_column_test from /Users/asleroid/Code/pdf-labeled-data/labeled_data/paragraph_extraction/one_column_test
Traceback (most recent call last):
  File "/Users/asleroid/Code/pdf_paragraphs_extraction/src/create_paragraph_extractor_model.py", line 25, in <module>
    train_model()
  File "/Users/asleroid/Code/pdf_paragraphs_extraction/src/create_paragraph_extractor_model.py", line 12, in train_model
    pdf_paragraph_tokens_list = load_labeled_data(PDF_LABELED_DATA_ROOT_PATH)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/asleroid/Code/pdf_paragraphs_extraction/src/paragraph_extraction_trainer/load_labeled_data.py", line 34, in load_labeled_data
    pdf_paragraph_tokens = PdfParagraphTokens.from_labeled_data(pdf_labeled_data_root_path, dataset, pdf_name)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/asleroid/Code/pdf_paragraphs_extraction/venv/lib/python3.11/site-packages/pdf_features/PdfFeatures.py", line 126, in from_labeled_data
    pdf_features.set_token_types(token_type_labels)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'set_token_types'

@gabriel-piles
Member

Thank you for reaching out.

The PdfFeatures class is provided by the pdf-tokens-type-labeler package, which you can install with the following command:

pip install git+https://github.com/huridocs/pdf-tokens-type-labeler@1c12c368887372164ab4981c3277a49e9dc43b9a

Let us know if this solves your problem.
