mismatch for datatypes when providing Features with Array2D and user specified dtype and using with_format("numpy") #7254

Open
Akhil-CM opened this issue Oct 26, 2024 · 1 comment

@Akhil-CM

Describe the bug

If the Features value passed to datasets.Dataset contains Array2D members with an explicit dtype, that dtype is not respected by with_format("numpy"). The returned np.array should have the dtype the user declared for the Array2D. Instead, it seems that floats are always set to float32 and ints to int64.

Steps to reproduce the bug

import numpy as np
import datasets
from datasets import Dataset, Features, Array2D

print(f"datasets version: {datasets.__version__}")

data_info = {
    "arr_float" : "float64",
    "arr_int" : "int32"
}

sample = {key : [np.zeros([4, 5], dtype=dtype)] for key, dtype in data_info.items()}

features = {key : Array2D(shape=(None, 5), dtype=dtype) for key, dtype in data_info.items()}
features = Features(features)

dataset = Dataset.from_dict(sample, features=features)

ds = dataset.with_format("numpy")
for key in features:
    print(f"{key} feature dtype: ", ds.features[key].dtype)
    print(f"{key} dtype:", ds[key].dtype)

Output:

datasets version: 3.0.2
arr_float feature dtype:  float64
arr_float dtype: float32
arr_int feature dtype:  int32
arr_int dtype: int64

Expected behavior

with_format("numpy") should return a np.array whose dtype matches the one the user declared for the corresponding member of the Features value.
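
Until the formatter honors the declared dtype, one possible stopgap is to cast the returned arrays back by hand. The sketch below uses only NumPy; the helper name cast_to_declared and the stand-in arrays are hypothetical, chosen to mirror the dtypes shown in the output above. Note that casting float32 back to float64 restores the dtype only, not any precision already lost in the float32 conversion.

```python
import numpy as np


def cast_to_declared(columns, declared_dtypes):
    """Return a new dict with each array cast to its declared dtype.

    `columns` maps column name -> np.ndarray (as returned by a formatted dataset);
    `declared_dtypes` maps column name -> dtype string (as declared in the features).
    Both names are hypothetical helpers for this sketch, not part of `datasets`.
    """
    result = {}
    for name, array in columns.items():
        target = declared_dtypes.get(name)
        if target is None:
            # Columns without a declared dtype are passed through unchanged.
            result[name] = array
        else:
            # copy=False avoids a copy when the dtype already matches.
            result[name] = np.asarray(array).astype(target, copy=False)
    return result


# Stand-ins for the arrays reported in the output above (float32 / int64).
formatted = {
    "arr_float": np.zeros((1, 4, 5), dtype=np.float32),
    "arr_int": np.zeros((1, 4, 5), dtype=np.int64),
}
declared = {"arr_float": "float64", "arr_int": "int32"}

restored = cast_to_declared(formatted, declared)
for name, array in restored.items():
    print(f"{name} dtype:", array.dtype)
```

In the reproduction script, the declared dtypes could presumably be collected with {key: features[key].dtype for key in features} and the formatted columns taken from ds[:]; that wiring is an assumption and is not tested here.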

Environment info

  • datasets version: 3.0.2
  • Platform: Linux-6.11.5-arch1-1-x86_64-with-glibc2.40
  • Python version: 3.12.7
  • huggingface_hub version: 0.26.1
  • PyArrow version: 16.1.0
  • Pandas version: 2.2.2
  • fsspec version: 2024.5.0
@Akhil-CM

This appears to be the same issue as #5517.

It was mentioned there that this would be fixed in version 3.x, but the behavior above still occurs with 3.0.2.
