When loading a PDF document from the web, populating the Vector Store fails with the following error:
2025-Jan-09 19:19:04 - INFO - (modules.utilities): Response for https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/oracle-ai-vector-search-users-guide.pdf: 200
2025-Jan-09 19:19:04 - INFO - (chunk_embed): Loading PDF from web to /tmp/tmpotjj_jne/oracle-ai-vector-search-users-guide.pdf
2025-Jan-09 19:19:04 - INFO - (chunk_embed): Wrote /tmp/tmpotjj_jne/oracle-ai-vector-search-users-guide.pdf
2025-Jan-09 19:19:04 - INFO - (modules.split): Loading oracle-ai-vector-search-users-guide.pdf (6270 bytes)
2025-Jan-09 19:19:04 - WARNING - (pypdf._reader): invalid pdf header: b'<!DOC'
2025-Jan-09 19:19:04 - WARNING - (pypdf._reader): EOF marker not found
2025-Jan-09 19:19:04 - ERROR - (chunk_embed): Operation Failed: Stream has ended unexpectedly
Traceback (most recent call last):
  File "/app/content/split_embed.py", line 427, in main
    split_docos, _ = split.load_and_split_documents(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/modules/split.py", line 171, in load_and_split_documents
    loaded_doc = loader.load()
                 ^^^^^^^^^^^^^
  File "/opt/venv/lib64/python3.11/site-packages/langchain_core/document_loaders/base.py", line 31, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib64/python3.11/site-packages/langchain_community/document_loaders/pdf.py", line 257, in lazy_load
    yield from self.parser.parse(blob)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib64/python3.11/site-packages/langchain_core/document_loaders/base.py", line 127, in parse
    return list(self.lazy_parse(blob))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib64/python3.11/site-packages/langchain_community/document_loaders/parsers/pdf.py", line 123, in lazy_parse
    pdf_reader = pypdf.PdfReader(pdf_file_obj, password=self.password)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib64/python3.11/site-packages/pypdf/_reader.py", line 133, in __init__
    self._initialize_stream(stream)
  File "/opt/venv/lib64/python3.11/site-packages/pypdf/_reader.py", line 155, in _initialize_stream
    self.read(stream)
  File "/opt/venv/lib64/python3.11/site-packages/pypdf/_reader.py", line 608, in read
    self._find_eof_marker(stream)
  File "/opt/venv/lib64/python3.11/site-packages/pypdf/_reader.py", line 716, in _find_eof_marker
    line = read_previous_line(stream)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib64/python3.11/site-packages/pypdf/_utils.py", line 288, in read_previous_line
    raise PdfStreamError(STREAM_TRUNCATED_PREMATURELY)
pypdf.errors.PdfStreamError: Stream has ended unexpectedly
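
The `invalid pdf header: b'<!DOC'` warning, together with the 6270-byte file size, suggests that the URL returned an HTML page rather than the PDF itself, even though the response status was 200. One possible guard, shown here only as a sketch, is to inspect the first bytes of the download before handing the file to the loader. The names `looks_like_pdf` and `describe_payload` are hypothetical and are not part of the application's code.

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the PDF signature appears near the start of the payload.

    Assumption: checking the first 1024 bytes is lenient enough to accept
    PDFs with a little leading junk while still rejecting HTML pages.
    """
    return PDF_MAGIC in data[:1024]


def describe_payload(data: bytes) -> str:
    """Classify a downloaded payload as 'pdf', 'html', or 'unknown'."""
    if looks_like_pdf(data):
        return "pdf"
    # An HTML page typically starts with a doctype or an <html> tag.
    head = data[:64].lstrip().lower()
    if head.startswith(b"<!doc") or head.startswith(b"<html"):
        return "html"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical payloads standing in for real downloads.
    print(describe_payload(b"%PDF-1.7\n%..."))
    print(describe_payload(b"<!DOCTYPE html><html>...</html>"))
```

With a check like this, the download step could report that the server sent HTML instead of failing later inside `pypdf` with `Stream has ended unexpectedly`.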