If I have a pandas dataframe whose index does not include a 0, I get an error if I try to query it: chdb.ChdbError: Code: 1001. DB::Exception: pybind11::error_already_set: KeyError: 0. If I include an index with 0, but missing any other numbers, everything works as expected.
import pandas as pd
import chdb

df_okay = pd.DataFrame(
    data={"id": ["id1", "id2", "id3"], "name": ["nm1", "nm2", "nm3"]},
    index=[0, 2, 3],
)
# this is fine:
chdb.query("SELECT * FROM Python(df_okay)").show()

df_problem = pd.DataFrame(
    data={"id": ["id1", "id2", "id3"], "name": ["nm1", "nm2", "nm3"]},
    index=[1, 2, 3],
)
# error:
chdb.query("SELECT * FROM Python(df_problem)").show()
This appears in several Python versions (3.9-3.12) and on both macOS and Ubuntu. For reference:
chdb==2.1.1
pandas==2.2.3
Forgive my ignorance, I really didn't know that a DataFrame can have its index set. I will debug the issue you mentioned.
BTW, I'm curious about the application scenarios and objectives of setting an index like this?
No worries, thanks!
I am no expert as I don't really ever work with pandas indexes directly, but my understanding is that they are more performant for certain data operations, like row selection and joins (DataFrame.merge in pandas terms) than using ordinary columns. So you may want to have an index with semantically meaningful data in it, if it makes sense for the kind of operations you would be doing.
In my case it is happening as a side effect of subsetting some data: the pandas frame I receive is a (row) subset of some other frame, and by default this leaves the index values unchanged (so I run into this whenever the rows I select don't include the row with index value 0). In this case I think it is fairly easy to work around: I can reset the index before running a query on it.
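A minimal sketch of that workaround, assuming the subset comes from a boolean filter on a larger frame. The names full, subset and df_reset are made up for illustration, and the chdb call is skipped if chdb is not installed:

```python
import pandas as pd

full = pd.DataFrame(
    data={"id": ["id1", "id2", "id3"], "name": ["nm1", "nm2", "nm3"]},
)

# Row subsetting keeps the original index labels, so label 0 is missing here.
subset = full[full["id"] != "id1"]
print(list(subset.index))

# Renumber the index from 0 and discard the old labels.
df_reset = subset.reset_index(drop=True)
print(list(df_reset.index))

# Optional: query the re-indexed frame with chdb when it is available.
try:
    import chdb
except ImportError:
    chdb = None

if chdb is not None:
    chdb.query("SELECT * FROM Python(df_reset)").show()
```

Passing drop=True throws away the old index labels. If those labels carry meaning, calling reset_index() without drop keeps them as an ordinary column instead.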