I tried to add input_ids to a dataset with map(), and I passed return_tensors='pt', so why do I get the values back as type list?
Steps to reproduce the bug
Expected behavior
Sorry for this silly question, I'm new to this tool. But I think it should return a tensor, since I passed return_tensors='pt'?
When I tokenize only one sentence using tokenized_input = tokenizer(input, return_tensors='pt'), it does return a tensor. Why doesn't it work in map()?
Hi! datasets uses Arrow as its storage backend, which is agnostic to deep learning frameworks like torch, so values written by map() are stored as Arrow data and read back as Python lists by default. If you want to get torch tensors back, you need to do dataset = dataset.with_format("torch")
Environment info
transformers>=4.41.2,<=4.45.0
datasets>=2.16.0,<=2.21.0
accelerate>=0.30.1,<=0.34.2
peft>=0.11.1,<=0.12.0
trl>=0.8.6,<=0.9.6
gradio>=4.0.0
pandas>=2.0.0
scipy
einops
sentencepiece
tiktoken
protobuf
uvicorn
pydantic
fastapi
sse-starlette
matplotlib>=3.7.0
fire
packaging
pyyaml
numpy<2.0.0