Missing DataFrame index in `Result.data` #88

menezesandre · 2025-04-23T11:28:26Z

When a DataFrame is displayed, the corresponding Result has the data attribute in the format {column -> [values]} (equivalent to df.to_dict(orient="list")). This means that we lose the table index, which can be relevant. Is it possible to use a format that preserves this information?

To keep consistency with pandas' to_dict, any of these options would work:

'dict' (default) : dict like {column -> {index -> value}}
'split' : dict like {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}
'tight' : dict like {'index' -> [index], 'columns' -> [columns], 'data' -> [values], 'index_names' -> [index.names], 'column_names' -> [column.names]}
'index' : dict like {index -> {column -> value}}

(Note: 'tight' is the only option that preserves the full information, including the index name)
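
For comparison, here is a minimal sketch of what each orientation returns for the grouped frame used in the example in this issue. It uses only pandas' `DataFrame.to_dict` and runs locally without a sandbox; note that `orient="tight"` requires pandas 1.4 or newer.

```python
import pandas as pd

# Same frame as the example in this issue: the index is named "key".
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
grouped = df.groupby("key").sum()

as_list = grouped.to_dict(orient="list")    # matches the current Result.data; index is dropped
as_dict = grouped.to_dict(orient="dict")    # {column -> {index -> value}}
as_split = grouped.to_dict(orient="split")  # index values kept, index name dropped
as_index = grouped.to_dict(orient="index")  # {index -> {column -> value}}
as_tight = grouped.to_dict(orient="tight")  # index values and index name kept (pandas >= 1.4)

for name, value in [
    ("list", as_list),
    ("dict", as_dict),
    ("split", as_split),
    ("index", as_index),
    ("tight", as_tight),
]:
    print(f"{name}: {value}")
```

Only the 'tight' result carries the index name ("key") alongside the index values.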

Example

from e2b_code_interpreter import Sandbox

code = """
import pandas as pd
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
display(df.groupby("key").sum())
"""
with Sandbox() as sandbox:
    execution = sandbox.run_code(code)

result = execution.results[0]
print("Text:")
print(result.text)
print("Data:")
print(result.data)

Text:
     value
key       
a        4
b        6
Data:
{'value': [4, 6]}

Expected (one of the options):

Data:
{'index': ['a', 'b'], 'columns': ['value'], 'data': [[4], [6]], 'index_names': ['key'], 'column_names': [None]}
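
One reason the 'tight' layout is attractive: the client can rebuild the original DataFrame from it, index included. A minimal sketch, assuming pandas 1.4 or newer (where `DataFrame.from_dict` accepts `orient="tight"`) and using the expected dict above as a literal rather than an actual `Result.data` value:

```python
import pandas as pd

# The expected 'tight' payload from this issue, written out by hand.
tight = {
    "index": ["a", "b"],
    "columns": ["value"],
    "data": [[4], [6]],
    "index_names": ["key"],
    "column_names": [None],
}

# Rebuild the frame; the index values and the index name are restored.
restored = pd.DataFrame.from_dict(tight, orient="tight")
print(restored)
print(restored.index.name)
```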

linear · 2025-04-23T11:28:29Z

E2B-2090 Missing DataFrame index in `Result.data`

jakubno · 2025-04-24T08:37:42Z

Hey @menezesandre,

I'll look into this. It might take some time since it's a breaking change due to the incompatibility with the current format.

menezesandre · 2025-04-29T11:17:02Z

Hi @jakubno,

> It might take some time since it's a breaking change due to the incompatibility with current format.

To avoid a breaking change, this could be addressed by adding an argument (e.g. in run_code) to control this behavior. The default value can correspond to the current format (making it non-breaking), but then we can explicitly set it to get the desired format.
Following the example above, I could get the expected output with something like sandbox.run_code(code, data_orient="tight") (here I'm following pandas, but could also just be a boolean to switch between the current and full formats).
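
To make the suggestion concrete, here is a rough sketch of how an opt-in argument could keep the default non-breaking. Everything here is hypothetical: `serialize_dataframe` and `data_orient` are illustrative names, not part of the E2B SDK, and the real change would have to live wherever the DataFrame is currently converted to `Result.data`.

```python
import pandas as pd


def serialize_dataframe(df: pd.DataFrame, data_orient: str = "list") -> dict:
    """Hypothetical helper (not an E2B API): convert a DataFrame to a plain dict.

    The default "list" reproduces the current {column -> [values]} format,
    so existing callers would see no change. Other values are passed
    straight to pandas' DataFrame.to_dict.
    """
    return df.to_dict(orient=data_orient)


df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
grouped = df.groupby("key").sum()

current = serialize_dataframe(grouped)                     # unchanged default
full = serialize_dataframe(grouped, data_orient="tight")   # opt-in, keeps the index

print(current)
print(full)
```

Passing the value through to pandas keeps the option names familiar; a boolean switch would be simpler but would fix the choice of "full" format in the SDK.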

antonioalegria · 2025-05-12T11:16:36Z

Hi, any news on this topic? This is currently a blocking issue for us to use the dataframe data format coming out of E2B. Thanks!

mlejva assigned jakubno Apr 23, 2025
