Cannot query pandas table with index starting above zero #282

ADBond · 2024-10-30T11:19:11Z

If I query a pandas DataFrame whose index does not include 0, I get an error: `chdb.ChdbError: Code: 1001. DB::Exception: pybind11::error_already_set: KeyError: 0`. If the index does include 0, everything works as expected, even when other values are missing from it.

```python
import pandas as pd
import chdb

df_okay = pd.DataFrame(
    data={"id": ["id1", "id2", "id3"], "name": ["nm1", "nm2", "nm3"]},
    index=[0, 2, 3],
)
# this is fine:
chdb.query("SELECT * FROM Python(df_okay)").show()

df_problem = pd.DataFrame(
    data={"id": ["id1", "id2", "id3"], "name": ["nm1", "nm2", "nm3"]},
    index=[1, 2, 3],
)
# error:
chdb.query("SELECT * FROM Python(df_problem)").show()
```
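The `KeyError: 0` is consistent with a label-based lookup such as `series[0]` somewhere in the conversion path. This is an assumption, not something confirmed from the chdb source. On a Series with an integer index, `series[0]` looks up the *label* 0 rather than the first row, which would explain why `index=[0, 2, 3]` works while `index=[1, 2, 3]` fails. A pandas-only sketch of that behaviour (the helper names are hypothetical):

```python
import pandas as pd


def first_value_by_label(s: pd.Series):
    # Hypothetical stand-in for the kind of lookup that MAY happen inside chdb:
    # on an integer index, s[0] means "the row labelled 0", not "the first row"
    return s[0]


def first_value_by_position(s: pd.Series):
    # .iloc[0] is positional, so it returns the first row whatever the labels are
    return s.iloc[0]


ok = pd.Series(["id1", "id2", "id3"], index=[0, 2, 3])
problem = pd.Series(["id1", "id2", "id3"], index=[1, 2, 3])

print(first_value_by_label(ok))          # id1
print(first_value_by_position(problem))  # id1
try:
    first_value_by_label(problem)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 0
```

Positional access with `.iloc` does not depend on the index labels, which is why it succeeds on both frames.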

This occurs on several Python versions (3.9 to 3.12), on both macOS and Ubuntu. For reference:

chdb==2.1.1
pandas==2.2.3

auxten · 2024-10-31T10:35:21Z

Forgive my ignorance, I really didn't know that a DataFrame can have its index set. I will debug the issue you mentioned.
BTW, I'm curious: what are the use cases and goals for setting an index like this?

ADBond · 2024-10-31T11:30:06Z

> Forgive my ignorance, I really didn't know that a DataFrame can have its index set. I will debug the issue you mentioned.
> BTW, I'm curious: what are the use cases and goals for setting an index like this?

No worries, thanks!

I am no expert, as I rarely work with pandas indexes directly, but my understanding is that an index can be faster than ordinary columns for certain operations, such as row selection and joins (`DataFrame.merge` in pandas terms). So you may want an index that holds semantically meaningful data, if that suits the operations you will be doing.
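A small sketch of what an index with meaningful labels looks like, using hypothetical `people` and `scores` frames keyed by `id`: `.loc` selects rows by index label, and `DataFrame.merge` with `left_index=True, right_index=True` joins on the indexes instead of on columns. It only illustrates the usage and makes no performance claim.

```python
import pandas as pd

# Hypothetical example frames; "id" is promoted from a column to the index
people = pd.DataFrame(
    {"id": ["id1", "id2", "id3"], "name": ["nm1", "nm2", "nm3"]}
).set_index("id")
scores = pd.DataFrame({"id": ["id3", "id1"], "score": [10, 20]}).set_index("id")

# Row selection by index label
print(people.loc["id2", "name"])  # nm2

# Join on the indexes of both frames (left join keeps every row of people)
joined = people.merge(scores, how="left", left_index=True, right_index=True)
print(joined)
```

Because `id2` has no match in `scores`, its `score` is `NaN`, and the column becomes floating point.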

In my case it happens as a side effect of subsetting some data: the pandas frame I receive is a row subset of some other frame, and subsetting leaves the index values unchanged by default, so I run into this whenever the selected rows don't include the row with index value 0. Here I think there is a fairly easy workaround: I can reset the index before running a query on it.
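A minimal sketch of the reset-index workaround, assuming (as the report indicates) that chdb 2.1.1 handles frames whose index includes 0. `with_default_index` is a hypothetical helper; `reset_index(drop=True)` replaces the subset's leftover labels with a fresh 0-based `RangeIndex` and discards the old labels instead of adding them as a column. The chdb call is left as a comment so the snippet runs with pandas alone.

```python
import pandas as pd


def with_default_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with a 0..n-1 RangeIndex."""
    # drop=True throws the old labels away rather than inserting an "index" column
    return df.reset_index(drop=True)


full = pd.DataFrame(
    data={"id": ["id0", "id1", "id2", "id3"], "name": ["nm0", "nm1", "nm2", "nm3"]}
)
# Row subsetting keeps the original labels, so label 0 disappears here
subset = full[full["id"] != "id0"]
print(list(subset.index))  # [1, 2, 3]

clean = with_default_index(subset)
print(list(clean.index))  # [0, 1, 2]

# Assumption: with chdb installed, the re-indexed frame can be queried as in the report:
#   import chdb
#   chdb.query("SELECT * FROM Python(clean)").show()
```

If the original index carries meaningful data, `reset_index()` without `drop=True` keeps it as a regular column that SQL can still query.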

auxten self-assigned this Oct 31, 2024

ADBond mentioned this issue Dec 5, 2024

Occasional test failures ADBond/splinkclickhouse#40 (Open)