-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#81 Venomx integration #96
Conversation
Currently seeing these errors on my system: ============================================================================================== short test summary info ===============================================================================================
FAILED tests/cli/test_store_cli.py::test_store_management - AssertionError: assert 1 == 0
FAILED tests/store/test_duckdb_adapter.py::test_store_variations[all-MiniLM-L6-v2-False-model-True] - duckdb.duckdb.Error: Failure while replaying WAL file "/home/harry/curate-gpt/tests/output/duckdbvss.db.wal": Cannot bind index 'test_collection', unknown index type 'HNSW'. You need to load the extension that...
FAILED tests/store/test_duckdb_adapter.py::test_store_variations[None-False-model-True] - duckdb.duckdb.Error: Failure while replaying WAL file "/home/harry/curate-gpt/tests/output/duckdbvss.db.wal": Cannot bind index 'test_collection', unknown index type 'HNSW'. You need to load the extension that...
FAILED tests/store/test_duckdb_adapter.py::test_store_variations[all-MiniLM-L6-v2-False-id-True] - duckdb.duckdb.Error: Failure while replaying WAL file "/home/harry/curate-gpt/tests/output/duckdbvss.db.wal": Cannot bind index 'test_collection', unknown index type 'HNSW'. You need to load the extension that...
FAILED tests/store/test_duckdb_adapter.py::test_store_variations[None-False-id-True] - duckdb.duckdb.Error: Failure while replaying WAL file "/home/harry/curate-gpt/tests/output/duckdbvss.db.wal": Cannot bind index 'test_collection', unknown index type 'HNSW'. You need to load the extension that...
FAILED tests/store/test_duckdb_adapter.py::test_fetch_all_memory_safe - duckdb.duckdb.Error: Failure while replaying WAL file "/home/harry/curate-gpt/tests/output/duckdbvss.db.wal": Cannot bind index 'test_collection', unknown index type 'HNSW'. You need to load the extension that...
FAILED tests/store/test_duckdb_adapter.py::test_the_embedding_function_variations[None-None-False] - duckdb.duckdb.Error: Failure while replaying WAL file "/home/harry/curate-gpt/tests/output/duckdbvss.db.wal": Cannot bind index 'test_collection', unknown index type 'HNSW'. You need to load the extension that...
FAILED tests/store/test_duckdb_adapter.py::test_the_embedding_function_variations[one_collection-None-False] - duckdb.duckdb.Error: Failure while replaying WAL file "/home/harry/curate-gpt/tests/output/duckdbvss.db.wal": Cannot bind index 'test_collection', unknown index type 'HNSW'. You need to load the extension that...
ERROR tests/store/test_duckdb_adapter.py::test_diversified_search - duckdb.duckdb.Error: Failure while replaying WAL file "/home/harry/curate-gpt/tests/output/duckdbvss.db.wal": Cannot bind index 'test_collection', unknown index type 'HNSW'. You need to load the extension that...
==================================================================== 8 failed, 62 passed, 96 skipped, 8426 warnings, 1 error in 82.86s (0:01:22) =====================================================================
py: exit 1 (86.02 seconds) /home/harry/curate-gpt> poetry run pytest pid=269016
py: FAIL code 1 (86.49=setup[0.48]+cmd[86.02] seconds)
evaluation failed :( (86.69 seconds) |
We looked into the WAL errors before: #85 (comment) |
@caufieldjh There must be different behaviours regarding file-clean ups/ file locks and your machine might see the WAL and wants to recover but does not have the extensions loaded (because by now the connection is first established and than the extensions are loaded) DuckDB's documentation mentions this scenario with HNSW indexes, recommending to load the VSS extension before any database recovery attempts. I'm implementing this pre-loading step to ensure consistent behavior across platforms, especially for test scenarios where databases are frequently created and reset. |
@caufieldjh |
OK - now the WAL errors in Strangely, I'm still getting the FAILED tests/cli/test_store_cli.py::test_store_management - AssertionError: assert 1 == 0 stack trace: ____________________________________________________________________________________ test_store_management ____________________________________________________________________________________
runner = <click.testing.CliRunner object at 0x7f53374c1db0>
def test_store_management(runner):
result = runner.invoke(
main, ["ontology", "index", ONT_DB, "-D", "chromadb", "-m", "all-MiniLM-L6-v2", "-c", "oai"]
)
assert result.exit_code == 0
result = runner.invoke(main, ["ontology", "index", ONT_DB, "-c", "default"])
assert result.exit_code == 0
result = runner.invoke(main, ["search", "-c", "default", "nuclear membrane"])
assert result.exit_code == 0
assert "nuclear membrane" in result.output
result = runner.invoke(main, ["search", "-c", "oai", "nuclear membrane"])
assert result.exit_code == 0
assert "nuclear membrane" in result.output
result = runner.invoke(main, ["collections", "list"])
> assert result.exit_code == 0
E AssertionError: assert 1 == 0
E + where 1 = <Result KeyError('_venomx')>.exit_code
tests/cli/test_store_cli.py:21: AssertionError
-------------------------------------------------------------------------------------- Captured log call --------------------------------------------------------------------------------------
ERROR curategpt.store.chromadb_adapter:chromadb_adapter.py:293 Failed to get collection oai: Collection oai does not exist.
ERROR curategpt.store.chromadb_adapter:chromadb_adapter.py:293 Failed to get collection oai: Collection oai does not exist.
ERROR curategpt.store.chromadb_adapter:chromadb_adapter.py:293 Failed to get collection default: Collection default does not exist.
ERROR curategpt.store.chromadb_adapter:chromadb_adapter.py:293 Failed to get collection default: Collection default does not exist. |
Or the more useful stack trace: $ curategpt collections list
Traceback (most recent call last):
File "/home/harry/curate-gpt/.venv/bin/curategpt", line 6, in <module>
sys.exit(main())
File "/home/harry/curate-gpt/.venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/harry/curate-gpt/.venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/harry/curate-gpt/.venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/harry/curate-gpt/.venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/harry/curate-gpt/.venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/harry/curate-gpt/.venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/harry/curate-gpt/src/curategpt/cli.py", line 2063, in list_collections
cm = db.collection_metadata(cn, include_derived=derived)
File "/home/harry/curate-gpt/src/curategpt/store/chromadb_adapter.py", line 298, in collection_metadata
cm = Metadata.deserialize_venomx_metadata_from_adapter(metadata_data, self.name)
File "/home/harry/curate-gpt/src/curategpt/store/metadata.py", line 49, in deserialize_venomx_metadata_from_adapter
venomx_json = metadata_dict.pop("_venomx")
KeyError: '_venomx' |
Ah, I think this is a me problem: I had a bunch of existing chromadb collections made before this PR, so they don't have any of the venomx metadata. The ERROR:curategpt.store.chromadb_adapter:Failed to get object count: "Metadata" object has no field "object_count"
## Collection: vbo_new N=19762 meta={'hnsw_space': 'cosine', 'model': 'openai:', 'name': 'vbo_new', 'object_type': 'OntologyClass'} // venomx=None hnsw_space='cosine' object_type='OntologyClass' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will need to mention in the release notes and docs that this is a breaking change re: older ChromaDB collections (but I think duckdb collections will be OK)
🥳 Any ontology indexed with the All fields that were prior carried by the old metadata are also carried with venomx. |
this is an ongoing PR to migrate to the venomx metadata format as described in #81