Scala case class fields shuffling #225

Closed
AlexisBRENON opened this issue Nov 26, 2021 · 2 comments


@AlexisBRENON
Contributor

Hi, I just created #224 to highlight some odd behavior when using ScalaPB encoders with a "classic" Scala case class.

Given a case class with two fields of the same type (AddressLike(street: Option[String], city: Option[String])), I save a dataset using the first field (street) as the partitioning column.
Then, when loading the dataset, Spark creates a DataFrame with the columns city and street, in that order.
Finally, when collecting this DataFrame to an Array[AddressLike], the value of the street field ends up in the city field and vice versa.

This seems to happen because the dataframe schema has the street field at the end (while it is the first field of the case class):

root
 |-- city: string (nullable = true)
 |-- street: string (nullable = true)

Providing the schema before calling .load does not change the resulting schema.

Then, when deserializing to the Scala case class, the mapping seems to be done by position instead of by name, so city values are mapped to the street field and vice versa.
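To make the suspected mechanism concrete, here is a minimal plain-Scala sketch (no Spark, ScalaPB, or frameless involved). The object name FieldOrderDemo, the helpers decodeByPosition and decodeByName, and the sample values are all hypothetical and only model the behavior described above; they are not the encoders' actual code.

```scala
object FieldOrderDemo {
  final case class AddressLike(street: Option[String], city: Option[String])

  // Column order as reported for the loaded DataFrame: the partition column comes last.
  val loadedSchema: Vector[String] = Vector("city", "street")

  // One row whose values follow the order of loadedSchema (sample data, assumed).
  val row: Vector[Option[String]] = Vector(Some("Paris"), Some("1 Main St"))

  // Positional decoding: the i-th column feeds the i-th constructor parameter.
  def decodeByPosition(values: Vector[Option[String]]): AddressLike =
    AddressLike(values(0), values(1))

  // Name-based decoding: each field is looked up by its column name.
  def decodeByName(schema: Vector[String], values: Vector[Option[String]]): AddressLike = {
    val byName = schema.zip(values).toMap
    AddressLike(byName.getOrElse("street", None), byName.getOrElse("city", None))
  }

  // The city value lands in street and vice versa.
  val swapped: AddressLike = decodeByPosition(row)

  // Fields receive the values of the columns with matching names.
  val correct: AddressLike = decodeByName(loadedSchema, row)
}
```

Because both fields have the same type, the positional variant produces a swapped value without any type error, which would explain why the problem goes unnoticed until the data is inspected.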

This index-based mapping can also be problematic if you have a DataFrame with "useless" extra columns and try to "cast" it to a case class with fewer fields. I will add a test to highlight this too.
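The extra-column case can be sketched the same way in plain Scala. The object ExtraColumnsDemo, the helper project, the id column, and the sample values are hypothetical; the sketch only assumes that decoding is positional and shows that selecting the wanted columns by name, in case class order, before decoding avoids the mismatch.

```scala
object ExtraColumnsDemo {
  final case class AddressLike(street: Option[String], city: Option[String])

  // Field names of the target case class, in constructor order.
  val targetFields: Vector[String] = Vector("street", "city")

  // A wider row: an extra "id" column precedes the two wanted columns (sample data, assumed).
  val schema: Vector[String] = Vector("id", "city", "street")
  val row: Vector[Option[String]] = Vector(Some("42"), Some("Paris"), Some("1 Main St"))

  // Select the wanted columns by name and return them in the requested order.
  def project(
      schema: Vector[String],
      values: Vector[Option[String]],
      wanted: Vector[String]
  ): Vector[Option[String]] = {
    val byName = schema.zip(values).toMap
    wanted.map(name => byName.getOrElse(name, None))
  }

  // Naive positional decoding of the wide row: "id" ends up in street.
  val naive: AddressLike = AddressLike(row(0), row(1))

  // Projecting by name first makes positional decoding safe again.
  val projected: Vector[Option[String]] = project(schema, row, targetFields)
  val fixed: AddressLike = AddressLike(projected(0), projected(1))
}
```

In Spark terms, the analogous step would presumably be an explicit select of the case class fields, in declaration order, before converting to the typed dataset.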

Maybe this is more related to the frameless encoders than to the ScalaPB ones. I can forward this issue there if needed.

@AlexisBRENON
Contributor Author

This seems to be a frameless issue.
I reproduced the errors on their repo: typelevel/frameless@master...AlexisBRENON:case_class_support#diff-dd83f3b1d1a249804b5620473177ce6034efbc5f36b45a9b1ef01283cafd50f9R540
And it seems that others have already reported it: typelevel/frameless#411

@thesamet
Contributor

I suggest we keep tracking it upstream. My understanding from the above is that it is not actionable by ScalaPB.
