Fix: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. #1237

Merged
merged 1 commit into from
Jan 26, 2025

Conversation

liuzhaoze
Contributor

When I trained the agent on a Mac, the following error occurred:

Epoch #1:   0%|                                                                                                                                 | 0/10000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/rocco/Documents/code/faas-resource-drl/run.py", line 290, in <module>
    result, ma_policy = train_agent(args)
                        ^^^^^^^^^^^^^^^^^
  File "/Users/rocco/Documents/code/faas-resource-drl/run.py", line 266, in train_agent
    ).run()
      ^^^^^
  File "/Users/rocco/Documents/code/tianshou-dev/tianshou/trainer/base.py", line 629, in run
    deque(self, maxlen=0)  # feed the entire iterator into a zero-length deque
    ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rocco/Documents/code/tianshou-dev/tianshou/trainer/base.py", line 334, in __next__
    train_stat, update_stat, self.stop_fn_flag = self.training_step()
                                                 ^^^^^^^^^^^^^^^^^^^^
  File "/Users/rocco/Documents/code/tianshou-dev/tianshou/trainer/base.py", line 483, in training_step
    training_stats = self.policy_update_fn(collect_stats)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rocco/Documents/code/tianshou-dev/tianshou/trainer/base.py", line 715, in policy_update_fn
    update_stat = self._sample_and_update(self.train_collector.buffer)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rocco/Documents/code/tianshou-dev/tianshou/trainer/base.py", line 651, in _sample_and_update
    update_stat = self.policy.update(sample_size=self.batch_size, buffer=buffer)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rocco/Documents/code/tianshou-dev/tianshou/policy/base.py", line 545, in update
    batch = self.process_fn(batch, buffer, indices)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rocco/Documents/code/tianshou-dev/tianshou/policy/multiagent/mapolicy.py", line 158, in process_fn
    results[agent] = policy.process_fn(tmp_batch, buffer, tmp_indice)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rocco/Documents/code/tianshou-dev/tianshou/policy/modelfree/dqn.py", line 148, in process_fn
    return self.compute_nstep_return(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rocco/Documents/code/tianshou-dev/tianshou/policy/base.py", line 715, in compute_nstep_return
    batch.returns = to_torch_as(n_step_return_IA, target_q_torch_IA)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rocco/Documents/code/tianshou-dev/tianshou/data/utils/converter.py", line 75, in to_torch_as
    return to_torch(x, dtype=y.dtype, device=y.device)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rocco/Documents/code/tianshou-dev/tianshou/data/utils/converter.py", line 48, in to_torch
    x = torch.from_numpy(x).to(device)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.

I added a debug print to converter.py:

if isinstance(x, np.ndarray) and issubclass(
    x.dtype.type,
    np.bool_ | np.number,
):  # most often case
    print(tmp := torch.from_numpy(x), tmp.dtype, tmp.device, dtype)
    x = torch.from_numpy(x).to(device)
    if dtype is not None:
        x = x.type(dtype)
    return x

The last output printed before the error is:

tensor([[-4.4213, -4.2664, -4.1114,  ...,  3.0171,  3.1721,  3.3271],
        [-9.0000, -8.6400, -8.2800,  ...,  8.2800,  8.6400,  9.0000],
        [-4.4213, -4.2664, -4.1114,  ...,  3.0171,  3.1721,  3.3271],
        ...,
        [-6.1470, -5.9108, -5.6746,  ...,  5.1905,  5.4267,  5.6628],
        [-4.4213, -4.2664, -4.1114,  ...,  3.0171,  3.1721,  3.3271],
        [-4.4213, -4.2664, -4.1114,  ...,  3.0171,  3.1721,  3.3271]],
       dtype=torch.float64) torch.float64 cpu torch.float32

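The array is float64 on the CPU while the requested dtype is float32, so the failure comes from moving the float64 tensor to the device before casting it. Therefore, I think the tensor's dtype should be set before the tensor is moved to the device, just as in the second often case: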

if isinstance(x, torch.Tensor):  # second often case
    if dtype is not None:
        x = x.type(dtype)
    return x.to(device)
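As a minimal sketch of the proposed ordering (not the merged diff itself), the ndarray branch could cast on the CPU first and move afterwards. The function name `to_torch_sketch` is hypothetical, and the example uses the CPU as the target device so that it runs on any machine:

```python
import numpy as np
import torch


def to_torch_sketch(x, dtype=None, device="cpu"):
    """Hypothetical stand-in for the ndarray branch of to_torch.

    The dtype is changed while the tensor is still on the CPU, so a
    float64 array never reaches a device that cannot hold float64.
    """
    if isinstance(x, np.ndarray) and issubclass(x.dtype.type, (np.bool_, np.number)):
        t = torch.from_numpy(x)  # stays on the CPU, keeps the NumPy dtype
        if dtype is not None:
            t = t.type(dtype)  # cast first ...
        return t.to(device)  # ... then move to the target device
    raise TypeError(f"unsupported input type: {type(x)}")


# NumPy creates float64 by default, as in the printed tensor above.
returns = np.linspace(-9.0, 9.0, 5)
out = to_torch_sketch(returns, dtype=torch.float32, device="cpu")
print(out.dtype, out.device)
```

On a Mac, passing `device="mps"` together with `dtype=torch.float32` should follow the same path, since the tensor is already float32 by the time `.to(device)` runs.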

@MischaPanch
Collaborator

Looks good, but for some reason the test on Mac is failing now. I can't see how the failure is related to this change; maybe it's a fluke or a mistake in the test itself. I'll look into it and get back to you.

Did changing the type prior to moving the tensor to the device solve your issue, or have you not tried with the adjusted code yet?

@liuzhaoze
Contributor Author

The adjusted code solved my issue. I also tested it with a small demo:

Python 3.11.11 (main, Dec 11 2024, 10:25:04) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.1.1'
>>> a = torch.rand(2, dtype=torch.float64)
>>> a.device
device(type='cpu')
>>> b = a.to('mps')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
>>> c = a.type(torch.float32)
>>> b = c.to('mps')
>>> b.device
device(type='mps', index=0)

It seems that MPS cannot accept float64 tensors, so the tensor's dtype should be changed to float32 before the tensor is moved to MPS.

The same behavior occurs in the latest version of PyTorch:

Python 3.11.11 (main, Dec 11 2024, 10:25:04) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.5.1'
>>> a = torch.rand(2, dtype=torch.float64)
>>> a.device
device(type='cpu')
>>> b = a.to('mps')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
>>> b = a.type(torch.float32).to('mps')
>>> b
tensor([0.6099, 0.9129], device='mps:0')
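The demos above suggest a general pattern: pick a dtype the target device can hold, cast on the CPU, and only then move the tensor. Below is a rough sketch of that pattern. The helper names `mps_safe_dtype` and `move_to_device` are hypothetical and are not part of Tianshou or PyTorch, and the sketch only handles the float64 case shown in this thread:

```python
import torch


def mps_safe_dtype(dtype, device):
    """Return float32 in place of float64 when the target is an MPS device.

    Hypothetical helper: it only covers the float64 case from the error
    message and leaves every other dtype unchanged.
    """
    if str(device).startswith("mps") and dtype == torch.float64:
        return torch.float32
    return dtype


def move_to_device(t, device):
    """Cast on the current device first, then move to the target device."""
    target = mps_safe_dtype(t.dtype, device)
    return t.to(dtype=target).to(device)


a = torch.rand(2, dtype=torch.float64)
print(mps_safe_dtype(a.dtype, "mps"))   # dtype that would be used on MPS
print(move_to_device(a, "cpu").dtype)   # float64 is kept on the CPU
```

The dtype decision is kept in a separate pure function so that it can be checked on machines without an MPS device.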

@MischaPanch MischaPanch merged commit 0a79016 into thu-ml:master Jan 26, 2025
4 checks passed