Hi, I just created #224 to highlight a weird behavior when using scalapb encoders with a "classic" Scala case class.
Given a case class with two fields of the same type (AddressLike(street: Option[String], city: Option[String])), I save a dataset using the first field (street) as the partitioning column.
Then, when loading the dataset, Spark creates a dataframe with the columns city and street, in that order.
Finally, when collecting this dataframe to an Array[AddressLike], the value of the street field ends up in the city field and vice versa.
This seems to happen because the dataframe schema has the street field at the end (while it is the first field of the case class):
Providing the schema before calling .load does not change the schema that is actually loaded.
Finally, when deserializing to the Scala case class, the mapping seems to be done by position instead of by name, so city values are mapped to the street field and vice versa.
This index-based mapping can also be a problem if you have a dataframe with "useless" columns and try to "cast" it to a case class with fewer fields. I will add a test to highlight this too.
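To make the suspected cause concrete, here is a minimal sketch in plain Scala with no Spark involved. It only models the behavior described above and is not the actual scalapb or frameless decoding code: the Row alias, MappingSketch, decodeByPosition, decodeByName and alignTo are hypothetical names introduced for illustration, and the sample values are made up.

```scala
// Illustrative model only: a "row" is an ordered list of (columnName, value) pairs.
final case class AddressLike(street: Option[String], city: Option[String])

object MappingSketch {
  // Hypothetical stand-in for a loaded dataframe row.
  type Row = Seq[(String, Option[String])]

  // Positional decoding: ignores column names and reads slots 0 and 1.
  def decodeByPosition(row: Row): AddressLike =
    AddressLike(row(0)._2, row(1)._2)

  // Name-based decoding: looks each field up by its column name.
  def decodeByName(row: Row): AddressLike = {
    val byName = row.toMap
    AddressLike(byName.getOrElse("street", None), byName.getOrElse("city", None))
  }

  // Reorders (and prunes) columns to match the case class field order,
  // so that a positional decoder sees the values in the expected slots.
  def alignTo(fieldNames: Seq[String], row: Row): Row = {
    val byName = row.toMap
    fieldNames.map(n => n -> byName.getOrElse(n, None))
  }

  def main(args: Array[String]): Unit = {
    // The partition column (street) comes last, as in the loaded schema.
    val loaded: Row = Seq("city" -> Some("Paris"), "street" -> Some("Rue de Rivoli"))

    println(decodeByPosition(loaded)) // street and city are swapped
    println(decodeByName(loaded))     // fields are filled correctly
    println(decodeByPosition(alignTo(Seq("street", "city"), loaded))) // correct again
  }
}
```

If the encoder really does map by position, the Spark-side equivalent of alignTo would be to select the columns in case class order before converting (for example df.select("street", "city")). I have not verified that this works around the problem with the scalapb encoders.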
Maybe this is more related to the frameless encoders than to the scalapb ones. I can forward this issue there if required.