Missing DataFrame index in Result.data #88

Open
menezesandre opened this issue Apr 23, 2025 · 4 comments
Comments

@menezesandre
When a DataFrame is displayed, the corresponding Result has the data attribute in the format {column -> [values]} (equivalent to df.to_dict(orient="list")). This means that we lose the table index, which can be relevant. Is it possible to use a format that preserves this information?

To keep consistency with pandas' to_dict, any of these options would work:

  • 'dict' (default) : dict like {column -> {index -> value}}
  • 'split' : dict like {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}
  • 'tight' : dict like {'index' -> [index], 'columns' -> [columns], 'data' -> [values], 'index_names' -> [index.names], 'column_names' -> [column.names]}
  • 'index' : dict like {index -> {column -> value}}

(Note: 'tight' is the only option that preserves the full information, including the index name)
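To make the options concrete, here is a small pandas-only sketch (no sandbox involved) that applies each orientation to the same grouped frame used in the example. It assumes pandas 1.4 or later, since that is where `orient="tight"` was added to both `to_dict` and `DataFrame.from_dict`.

```python
import pandas as pd

# Same grouped frame as in the example: index named "key", one column "value".
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
grouped = df.groupby("key").sum()

# Shape of Result.data today, as described in this issue: the index is dropped.
as_list = grouped.to_dict(orient="list")
print(as_list)  # {'value': [4, 6]}

# Orientations listed above; each of these keeps the index labels.
for orient in ("dict", "split", "index", "tight"):
    print(orient, grouped.to_dict(orient=orient))

# 'tight' also keeps the index name, so the frame can be rebuilt exactly
# (requires pandas >= 1.4).
tight = grouped.to_dict(orient="tight")
restored = pd.DataFrame.from_dict(tight, orient="tight")
pd.testing.assert_frame_equal(restored, grouped)
```

The final `assert_frame_equal` passing is what "preserves the full information" means in practice: labels, values, and the `key` index name all survive the round trip.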

Example

```python
from e2b_code_interpreter import Sandbox

code = """
import pandas as pd
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
display(df.groupby("key").sum())
"""
with Sandbox() as sandbox:
    execution = sandbox.run_code(code)

result = execution.results[0]
print("Text:")
print(result.text)
print("Data:")
print(result.data)
```

```
Text:
     value
key       
a        4
b        6
Data:
{'value': [4, 6]}
```

Expected (one of the options):

```
Data:
{'index': ['a', 'b'], 'columns': ['value'], 'data': [[4], [6]], 'index_names': ['key'], 'column_names': [None]}
```
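Until the format changes, one possible workaround is to move the index into ordinary columns with `reset_index()` before calling `display`, so it survives the `{column -> [values]}` format. The sketch below only checks this locally with pandas: `current_format` is a hypothetical stand-in for the sandbox's serializer, assuming it behaves like `to_dict(orient="list")` as described at the top of this issue.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
grouped = df.groupby("key").sum()

# Hypothetical stand-in for how Result.data is built today (index dropped).
def current_format(frame: pd.DataFrame) -> dict:
    return frame.to_dict(orient="list")

print(current_format(grouped))                # {'value': [4, 6]}
# reset_index() turns the "key" index into a regular first column.
print(current_format(grouped.reset_index()))  # {'key': ['a', 'b'], 'value': [4, 6]}
```

In the sandbox code from the example this would mean `display(df.groupby("key").sum().reset_index())`, or grouping with `as_index=False`. The trade-off is that the consumer has to know which columns to turn back into the index.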
linear bot commented Apr 23, 2025

@jakubno
Member

jakubno commented Apr 24, 2025

Hey @menezesandre,

I'll look into this. It might take some time since it's a breaking change due to the incompatibility with the current format.

@menezesandre
Author

Hi @jakubno,

> It might take some time since it's a breaking change due to the incompatibility with the current format.

To avoid a breaking change, this could be addressed by adding an argument (e.g. in run_code) that controls this behavior. The default value could match the current format (keeping the change non-breaking), and callers who need the index could set it explicitly.
Following the example above, I could get the expected output with something like sandbox.run_code(code, data_orient="tight") (here I'm following pandas' naming, but it could also be a boolean that switches between the current format and the full one).
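A minimal sketch of the opt-in idea, assuming the serialization happens through pandas' `to_dict`. Both `serialize_dataframe` and `data_orient` are hypothetical names taken from this proposal, not part of the `e2b_code_interpreter` API.

```python
import pandas as pd

# Hypothetical sketch, NOT the e2b_code_interpreter API.
# The default keeps today's {column -> [values]} shape, so existing callers see no change.
DEFAULT_ORIENT = "list"
# Orientations that return a dict (the current one plus those listed in this issue).
ALLOWED_ORIENTS = {"list", "dict", "split", "tight", "index"}


def serialize_dataframe(df: pd.DataFrame, data_orient: str = DEFAULT_ORIENT) -> dict:
    """Build the Result.data payload for a displayed DataFrame."""
    if data_orient not in ALLOWED_ORIENTS:
        raise ValueError(f"unsupported data_orient: {data_orient!r}")
    return df.to_dict(orient=data_orient)


grouped = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]}).groupby("key").sum()
print(serialize_dataframe(grouped))                       # {'value': [4, 6]}
print(serialize_dataframe(grouped, data_orient="tight"))  # index, columns, data, names
```

Restricting the accepted values keeps the payload a plain dict; orientations such as `"records"` (which returns a list) are rejected rather than silently changing the shape of `Result.data`. A boolean flag would be a narrower version of the same design.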

@antonioalegria

Hi, any news on this topic? This is currently blocking us from using the DataFrame data format returned by E2B. Thanks!
