Fixing tests with provided OPENAI_API_KEY #85
…enAI API key was passed - Implement abstractmethod in dae_evaluator and refactor to new OpenAI API - Refactor dimension function in duckdb_adapter - test refactoring: clinvar wrapper, dae evaluator, splitter - Fix issues with readonly database and linting
I'm still getting some strange duckdb errors locally - any idea what is going on, @iQuxLE?
==================== short test summary info ====================
FAILED tests/store/test_duckdb_adapter.py::test_store_variations[all-MiniLM-L6-v2-False] - duckdb.duckdb.Error: Failure while replaying WAL file "/home/harry/curate-gpt/tests/output/duckdbvss.db.wal": Cannot bind index 'test_collection', unknown index type 'HNSW'. You need ...
FAILED tests/store/test_duckdb_adapter.py::test_store_variations[None-False] - duckdb.duckdb.Error: Failure while replaying WAL file "/home/harry/curate-gpt/tests/output/duckdbvss.db.wal": Cannot bind index 'test_collection', unknown index type 'HNSW'. You need ...
FAILED tests/store/test_duckdb_adapter.py::test_fetch_all_memory_safe - duckdb.duckdb.Error: Failure while replaying WAL file "/home/harry/curate-gpt/tests/output/duckdbvss.db.wal": Cannot bind index 'test_collection', unknown index type 'HNSW'. You need ...
FAILED tests/store/test_duckdb_adapter.py::test_the_embedding_function_variations[None-None-False] - duckdb.duckdb.Error: Failure while replaying WAL file "/home/harry/curate-gpt/tests/output/duckdbvss.db.wal": Cannot bind index 'test_collection', unknown index type 'HNSW'. You need ...
FAILED tests/store/test_duckdb_adapter.py::test_the_embedding_function_variations[one_collection-None-False] - duckdb.duckdb.Error: Failure while replaying WAL file "/home/harry/curate-gpt/tests/output/duckdbvss.db.wal": Cannot bind index 'test_collection', unknown index type 'HNSW'. You need ...
ERROR tests/store/test_duckdb_adapter.py::test_diversified_search - duckdb.duckdb.Error: Failure while replaying WAL file "/home/harry/curate-gpt/tests/output/duckdbvss.db.wal": Cannot bind index 'test_collection', unknown index type 'HNSW'. You need ... |
More of a note to myself than anyone else: reminder that running |
test_runner also seems to have trouble finding the test collection:
FAILED tests/evaluation/test_runner.py::test_runner[20-4-fields_to_predict0-fields_to_mask0] - ValueError: Insufficient test objects in collection terms_go_testing_4; 0 < 4
FAILED tests/evaluation/test_runner.py::test_runner[20-4-fields_to_predict1-fields_to_mask1] - ValueError: Insufficient test objects in collection terms_go_testing_4; 0 < 4
FAILED tests/evaluation/test_runner.py::test_runner[20-4-fields_to_predict2-fields_to_mask2] - ValueError: Insufficient test objects in collection terms_go_testing_4; 0 < 4 |
These WAL errors have not come up for me so far. However, this is a known problem with the persistent duckdb vector similarity search feature.
For now I handle it by killing the old process when connecting to the db, as here. I will keep you updated and try to find a better solution. |
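As an illustration only, one blunt way to keep a leftover write-ahead log from being replayed in a later test run is to delete it before connecting. This is a minimal sketch and not what the PR does (the PR kills the old process instead): remove_stale_wal is a hypothetical helper, and throwing away the WAL discards any uncommitted changes, so it is only reasonable for disposable test databases such as the ones under tests/output.

```python
from pathlib import Path


def remove_stale_wal(db_path: str) -> bool:
    """Delete a leftover '<db_path>.wal' file next to a throwaway test database.

    Hypothetical helper: returns True if a WAL file was found and removed,
    False if there was nothing to clean up. Never use this on a database
    whose contents matter, because pending changes in the WAL are lost.
    """
    wal_file = Path(f"{db_path}.wal")
    if wal_file.exists():
        wal_file.unlink()
        return True
    return False
```

Calling remove_stale_wal("tests/output/duckdbvss.db") before opening the connection would drop a stale duckdbvss.db.wal if one exists and do nothing otherwise.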
I also ran into those in some cases, but most of the time the tests did not fail. I have already done a lot of debugging on this: the data is stratified correctly, but then it suddenly fails to find it. I think the reason might also be some "parallel" read problem, where different processes are looking into the same db. |
Some |
OK, excellent. I'm not too worried about the duckdb issues as there are so many ways that can break and we can't account for all of them, plus we have chromadb as a fallback. But catching the errors around creating the DBs is a good idea. |
…e isolated dbs for each test, introduce tmp_path and DEBUG mode, refactor to cleaner code
This is a quick fix so that not all dbs from tests/wrappers are written out locally. It enables better debugging and isolated dbs for all wrapper tests.
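The idea of one isolated db per test can be sketched with the standard library alone. The commit itself mentions tmp_path (pytest's built-in fixture); the isolated_db_path helper below is a hypothetical stand-in that shows the same principle without depending on pytest, and the file name test.db is an assumption.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def isolated_db_path(name: str = "test.db") -> Iterator[Path]:
    """Yield a database path inside a fresh temporary directory.

    Hypothetical helper: each call gets its own directory, so tests cannot
    see each other's files, and the directory is removed on exit.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        yield Path(tmp_dir) / name
```

A test would wrap its store setup in `with isolated_db_path() as db_path:` and pass str(db_path) to whatever adapter it exercises. A DEBUG switch that keeps the files around for inspection, as the commit describes, is not modelled here.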
|
Fixing all tests that were failing even though an API key was given.
test_mapper.py sometimes randomly cannot find the "test" collection, but this is not a consistent error.
Most relevant changes:
- In the wrapper module we are now using actual temporary files to prevent readonly database operations
- DatabaseAugmentedCompletionEvaluator was missing an implementation of an abstract method
- openai_extractor.py is now fully refactored to use the updated OpenAI API
- duckdb_adapter.py still had some problems with fetching the right dimensions for a model
  - "openai:" for the default text-embedding-ada-002, "openai:text-embedding-ada-002" or any other model after the colon (e.g. "openai:text-embedding-3-small"); if no model is provided, default to sentence-transformer
- clinvar_wrapper.py now correctly fetches the "clinical_significance" via "germline_classification" and does not break if "trait_set" is not in the data
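The model-string convention described for duckdb_adapter.py can be sketched roughly as follows. This is not the adapter's actual code: resolve_model, embedding_dimension and KNOWN_DIMENSIONS are hypothetical names, and all-MiniLM-L6-v2 as the sentence-transformer default is an assumption taken from the test ids in this thread. The dimensions listed are the published output sizes of those models.

```python
from typing import Optional, Tuple

DEFAULT_OPENAI_MODEL = "text-embedding-ada-002"
DEFAULT_LOCAL_MODEL = "all-MiniLM-L6-v2"  # assumed sentence-transformer default

# Published embedding sizes for a few common models (hypothetical lookup table).
KNOWN_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "all-MiniLM-L6-v2": 384,
}


def resolve_model(spec: Optional[str]) -> Tuple[str, str]:
    """Split a model spec into (provider, model_name).

    "openai:"        -> OpenAI with the default model
    "openai:<model>" -> OpenAI with the named model
    None or ""       -> sentence-transformer default
    anything else    -> treated as a sentence-transformer model name
    """
    if not spec:
        return "sentence-transformer", DEFAULT_LOCAL_MODEL
    if spec.startswith("openai:"):
        name = spec.split(":", 1)[1].strip()
        return "openai", name or DEFAULT_OPENAI_MODEL
    return "sentence-transformer", spec


def embedding_dimension(spec: Optional[str]) -> int:
    """Look up the vector size for a model spec; raises KeyError if unknown."""
    _, name = resolve_model(spec)
    return KNOWN_DIMENSIONS[name]
```

Keeping the parsing separate from the dimension lookup means the "openai:" shorthand and the explicit "openai:text-embedding-ada-002" form resolve to the same model and therefore to the same vector size.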