Enhance __repr__ and _repr_html_ with a note for additional rows #1026

Open
konjac opened this issue Feb 18, 2025 · 2 comments · May be fixed by #1041
Labels
enhancement New feature or request

Comments

@konjac

konjac commented Feb 18, 2025

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently, __repr__ and _repr_html_ show only 10 rows. If the DataFrame contains more than 10 rows, we could show a message indicating that some rows were truncated.

Describe the solution you'd like
@timsaucer shared a thought in #discussion_r1957173896: fetch 11 rows internally, and if 11 rows come back, display a message noting that there are additional rows.

Describe alternatives you've considered
n/a

Additional context
n/a

@kylebarron
Contributor

You could also implement a "config" system like pandas uses, so the user can opt-in to displaying more columns or rows https://pandas.pydata.org/docs/user_guide/options.html#overview

@Spaarsh

Spaarsh commented Feb 28, 2025

I'd like to work on this issue. Adding a few lines of code in dataframe.rs along the lines of:

fn __repr__(&self, py: Python) -> PyDataFusionResult<String> {
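    // Fetch one row more than is displayed, so truncation can be detected.
    let df = self.df.as_ref().clone().limit(0, Some(11))?;
    let batches = wait_for_future(py, df.collect())?;
    let num_rows = batches.iter().map(|batch| batch.num_rows()).sum::<usize>();

    // Keep at most 10 rows in total. Note that `.take(10)` on the iterator
    // would keep 10 batches, not 10 rows, so slice by row count instead.
    let mut remaining = 10usize;
    let mut limited_batches = Vec::new();
    for batch in &batches {
        if remaining == 0 {
            break;
        }
        let n = batch.num_rows().min(remaining);
        limited_batches.push(batch.slice(0, n));
        remaining -= n;
    }
    let batches_as_string = pretty::pretty_format_batches(&limited_batches);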

    match batches_as_string {
        Ok(batch) => {
            if num_rows > 10 {
                Ok(format!("DataFrame()\n{batch}\nand more..."))
            } else {
                Ok(format!("DataFrame()\n{batch}"))
            }
        }
        Err(err) => Ok(format!("Error: {:?}", err.to_string())),
    }
}

Should suffice, I suppose?
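One detail worth checking in a snippet like this: the 11 fetched rows may arrive split across several record batches, so the cut down to 10 has to count rows rather than batches. Below is a minimal, std-only sketch of that row counting. The `Batch` alias, `truncate_rows`, and `render` are hypothetical names for illustration, with plain vectors standing in for record batches; this is not the project's actual code.

```rust
/// Hypothetical stand-in for a record batch: just a vector of row values.
type Batch = Vec<i32>;

/// Keep at most `max_rows` rows in total across all batches, and report
/// whether any rows were left out.
fn truncate_rows(batches: &[Batch], max_rows: usize) -> (Vec<Batch>, bool) {
    let total: usize = batches.iter().map(|b| b.len()).sum();
    let mut remaining = max_rows;
    let mut kept = Vec::new();
    for batch in batches {
        if remaining == 0 {
            break;
        }
        // Take the whole batch, or only as many rows as still fit.
        let n = batch.len().min(remaining);
        kept.push(batch[..n].to_vec());
        remaining -= n;
    }
    (kept, total > max_rows)
}

/// Render the kept rows one per line, adding a note when rows were truncated.
fn render(batches: &[Batch], max_rows: usize) -> String {
    let (kept, truncated) = truncate_rows(batches, max_rows);
    let rows: Vec<String> = kept.iter().flatten().map(|v| v.to_string()).collect();
    let mut out = format!("DataFrame()\n{}", rows.join("\n"));
    if truncated {
        out.push_str("\nand more...");
    }
    out
}
```

With three batches of 4, 4, and 3 rows and `max_rows = 10`, the last batch is cut to 2 rows and the note is appended; with fewer than 10 rows in total, nothing is cut and no note appears.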

> You could also implement a "config" system like pandas uses, so the user can opt-in to displaying more columns or rows https://pandas.pydata.org/docs/user_guide/options.html#overview

As for the config, we'd need to decide on a format. I would suggest TOML, since Cargo uses it. That probably deserves its own issue, though, since I am sure a host of other things could benefit from such a system.

We could start from this issue itself too if it is alright.
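To make the config idea a bit more concrete, here is a rough sketch of what an in-memory display option could look like, independent of whichever file format is chosen. Everything here is an assumption for illustration: `DisplayConfig`, `max_rows`, `fetch_limit`, and `truncation_note` are hypothetical names loosely modelled on pandas' `display.max_rows` option, not an existing API in this project.

```rust
/// Hypothetical display options, loosely modelled on pandas' `display.max_rows`.
#[derive(Debug, Clone, PartialEq)]
struct DisplayConfig {
    /// Maximum number of rows shown by the repr.
    max_rows: usize,
}

impl Default for DisplayConfig {
    fn default() -> Self {
        // 10 matches the row count the issue says is shown today.
        Self { max_rows: 10 }
    }
}

impl DisplayConfig {
    /// Number of rows to request: one extra, so truncation can be detected.
    fn fetch_limit(&self) -> usize {
        self.max_rows.saturating_add(1)
    }

    /// Note to append when more rows came back than will be displayed.
    fn truncation_note(&self, rows_fetched: usize) -> Option<String> {
        if rows_fetched > self.max_rows {
            Some(format!(
                "Showing the first {} rows; more rows are available.",
                self.max_rows
            ))
        } else {
            None
        }
    }
}
```

The default reproduces the "fetch 11, show 10" behaviour discussed above, while a user who opts in to, say, `max_rows: 50` would get a fetch limit of 51 without any change to the repr logic itself.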

3 participants