describe does not handle mixed case or dots in column names #16017
Comments
take
Just an update. I was able to reproduce the error with the following goofy test added to the dataframe tests:

    #[tokio::test]
    async fn bad_describe_behavior() -> Result<()> {
        let config = SessionConfig::from_string_hash_map(&HashMap::from([(
            "datafusion.sql_parser.enable_ident_normalization".to_owned(),
            "true".to_owned(),
        )]))?;
        let ctx = SessionContext::new_with_config(config);
        let name = "aggregate_test_100";
        register_aggregate_csv(&ctx, name).await?;
        let df = ctx.table(name);
        let df = df
            .await?
            .filter(col("c2").eq(lit(3)).and(col("c1").eq(lit("a"))))?
            .limit(0, Some(1))?
            .sort(vec![
                // make the test deterministic
                col("c1").sort(true, true),
                col("c2").sort(true, true),
                col("c3").sort(true, true),
            ])?
            .select_columns(&["c1"])?;
        let df_renamed = df.clone().with_column_renamed("c1", "CoLu.Mn1")?;
        let res = &df_renamed.clone().collect().await?;
        println!("{:?}", df_renamed.describe().await.unwrap());
        Ok(())
    }

I get a similar error to the one you shared, @johnkerl:
One of my initial thoughts about where this error comes from is unexpected behavior in the identifier normalization, which is enabled by default. This seems to be the right thread to pull at, as disabling
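To make the suspicion about identifier normalization concrete, here is a minimal, library-free sketch of how a name like CoLu.Mn1 could be mangled if it is treated as an unquoted, possibly qualified identifier. The helper parse_unquoted is hypothetical and is not DataFusion's parser; it only assumes that normalization lowercases unquoted names and that a dot is read as a qualifier separator.

```rust
/// Hypothetical helper, NOT DataFusion's parser: treats `name` as an unquoted,
/// possibly qualified identifier. The text after the last dot is taken as the
/// column and the text before it as the relation; both parts are lowercased
/// when `normalize` is true.
fn parse_unquoted(name: &str, normalize: bool) -> (Option<String>, String) {
    let fix = |s: &str| if normalize { s.to_lowercase() } else { s.to_string() };
    match name.rsplit_once('.') {
        // A dot is interpreted as "relation.column", not as part of the name.
        Some((relation, column)) => (Some(fix(relation)), fix(column)),
        None => (None, fix(name)),
    }
}

fn main() {
    // The column was renamed to the literal name "CoLu.Mn1" ...
    let (relation, column) = parse_unquoted("CoLu.Mn1", true);
    // ... but a lookup built from this parse targets a different column.
    println!("relation = {:?}, column = {:?}", relation, column);
}
```

Under these assumptions the lookup would search for column mn1 in relation colu, neither of which exists in the schema, which is the kind of mismatch that can surface as a "column not found" style error.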
Thanks @jfahne! :)
I figured it out! The error is coming from here in the call to the
Describe the bug
Doing describe on a table with any upper-case or dot characters in column names results in
To Reproduce
./Cargo.toml
src/main.rs
./desc-good.csv
./desc-bad.csv
Expected behavior
With column names abc,def,ghi we see
cargo run ./desc-good.csv
With column names abc,Def,gh.i I would expect similar. But I actually see:
cargo run ./desc-bad.csv
Additional context
No response