Why doesn't return_tensors='pt' work? #7291

bw-wang19 · 2024-11-15T15:01:23Z

Describe the bug

I tried to add input_ids to a dataset with map(), passing return_tensors='pt' to the tokenizer. Why does the resulting column come back as a List instead of a tensor?

Steps to reproduce the bug

Expected behavior

Sorry for this silly question, I'm new to this tool. I expected a tensor back, since I passed return_tensors='pt'.
When I tokenize a single sentence with tokenized_input = tokenizer(input, return_tensors='pt'), it does return a tensor. Why doesn't it work in map()?

Environment info

transformers>=4.41.2,<=4.45.0
datasets>=2.16.0,<=2.21.0
accelerate>=0.30.1,<=0.34.2
peft>=0.11.1,<=0.12.0
trl>=0.8.6,<=0.9.6
gradio>=4.0.0
pandas>=2.0.0
scipy
einops
sentencepiece
tiktoken
protobuf
uvicorn
pydantic
fastapi
sse-starlette
matplotlib>=3.7.0
fire
packaging
pyyaml
numpy<2.0.0

lhoestq · 2024-11-18T11:26:44Z

Hi! datasets uses Arrow as its storage backend, which is agnostic to deep learning frameworks like torch, so whatever map() returns is stored as Arrow data rather than as tensors. If you want to get torch tensors back, you need to call dataset = dataset.with_format("torch")

bw-wang19 · 2024-11-18T13:47:06Z

It does work! Thanks for your suggestion!
