Can't run in Macbook Pro M1 #315

Open
lawrencekwong2 opened this issue Oct 7, 2024 · 3 comments

lawrencekwong2 commented Oct 7, 2024

I'm trying to run Meridian on a MacBook Pro M1 2021 (32 GB, macOS 13.5.1), but I get an error that is either undocumented online or whose suggested fix requires downgrading tensorflow-macos. Does anyone know how to resolve this? Thank you in advance.

Code

roi_mu = 0.2     # Mu for ROI prior for each media channel.
roi_sigma = 0.9  # Sigma for ROI prior for each media channel.
prior = prior_distribution.PriorDistribution(
    roi_m=tfp.distributions.LogNormal(roi_mu, roi_sigma, name=constants.ROI_M)
)
model_spec = spec.ModelSpec(prior=prior)

with tf.device("/GPU:0"):
    mmm = model.Meridian(input_data=data, model_spec=model_spec)

Error

2024-10-07 18:35:00.459460: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at xla_ops.cc:574 : NOT_FOUND: could not find registered platform with id: 0x11ded9820

---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
Cell In[23], line 9
      6 model_spec = spec.ModelSpec(prior=prior)
      8 with tf.device("/GPU:0"):
----> 9     mmm = model.Meridian(input_data=data, model_spec=model_spec)

File /opt/miniconda3/envs/meridian_gpu/lib/python3.11/site-packages/meridian/model/model.py:174, in Meridian.__init__(self, input_data, model_spec)
    172 self._validate_custom_priors()
    173 self._validate_geo_invariants()
--> 174 self._validate_time_invariants()

File /opt/miniconda3/envs/meridian_gpu/lib/python3.11/site-packages/meridian/model/model.py:580, in Meridian._validate_time_invariants(self)
    576 def _validate_time_invariants(self):
    577   """Validates model time invariants."""
    579   self._check_if_no_time_variation(
--> 580       self.controls_scaled,
    581       constants.CONTROLS,
    582       self.input_data.controls.coords[constants.CONTROL_VARIABLE].values,
    583   )
    584   if self.input_data.media is not None:
    585     self._check_if_no_time_variation(
    586         self.media_tensors.media_scaled,
    587         constants.MEDIA,
    588         self.input_data.media.coords[constants.MEDIA_CHANNEL].values,
    589     )

File /opt/miniconda3/envs/meridian_gpu/lib/python3.11/functools.py:1001, in cached_property.__get__(self, instance, owner)
    999 val = cache.get(self.attrname, _NOT_FOUND)
   1000 if val is _NOT_FOUND:
-> 1001     val = self.func(instance)
   1002     try:
   1003         cache[self.attrname] = val

File /opt/miniconda3/envs/meridian_gpu/lib/python3.11/site-packages/meridian/model/model.py:293, in Meridian.controls_scaled(self)
    291 @functools.cached_property
    292 def controls_scaled(self) -> tf.Tensor:
--> 293   return self.controls_transformer.forward(self.controls)

File /opt/miniconda3/envs/meridian_gpu/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    151 except Exception as e:
    152   filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153   raise e.with_traceback(filtered_tb) from None
    154 finally:
    155   del filtered_tb

File /opt/miniconda3/envs/meridian_gpu/lib/python3.11/site-packages/tensorflow/python/eager/execute.py:53, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     51 try:
     52   ctx.ensure_initialized()
---> 53   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     54                                       inputs, attrs, num_outputs)
     55 except core._NotOkStatusException as e:
     56   if name is not None:

NotFoundError: could not find registered platform with id: 0x11ded9820 [Op:__inference_forward_1175]

Packages

absl-py==2.1.0
altair==4.2.2
appnope @ file:///home/conda/feedstock_root/build_artifacts/appnope_1707233003401/work
arviz==0.20.0
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work
astunparse==1.6.3
attrs==24.2.0
cachetools==5.5.0
certifi==2024.8.30
charset-normalizer==3.3.2
cloudpickle==3.0.0
comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1710320294760/work
contourpy==1.3.0
cycler==0.12.1
debugpy @ file:///private/var/folders/k1/30mswbxs7r1g6zwn8y4fyt500000gp/T/abs_563_nwtkoc/croot/debugpy_1690905063850/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
dm-tree==0.1.8
entrypoints==0.4
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1720869315914/work
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1725214404607/work
flatbuffers==24.3.25
fonttools==4.54.1
gast==0.6.0
google-auth==2.35.0
google-auth-oauthlib==1.2.1
google-pasta==0.2.0
grpcio==1.66.2
h5netcdf==1.4.0
h5py==3.12.1
idna==3.10
immutabledict==4.2.0
importlib_metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1726082825846/work
ipykernel @ file:///Users/runner/miniforge3/conda-bld/ipykernel_1719845458456/work
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1727944696411/work
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1696326070614/work
Jinja2==3.1.4
joblib==1.4.2
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1726610684920/work
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1727163409502/work
keras==2.15.0
kiwisolver==1.4.7
libclang==18.1.1
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.9.2
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1713250518406/work
mdurl==0.1.2
meridian @ git+https://github.com/google/meridian.git@43269092016510ac9a1512244e392d77d4190821
ml-dtypes==0.3.2
namex==0.0.8
nest_asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1705850609492/work
numpy==1.26.4
oauthlib==3.2.2
opt_einsum==3.4.0
optree==0.13.0
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1718189413536/work
pandas==1.5.3
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1712320355065/work
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1706113125309/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
pillow==10.4.0
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1726613481435/work
prompt_toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1727341649933/work
protobuf==4.25.5
psutil @ file:///Users/cbousseau/work/recipes/ci_py311_2/psutil_1678995687212/work
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure_eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1721585709575/work
pyasn1==0.6.1
pyasn1_modules==0.4.1
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1714846767233/work
pyparsing==3.1.4
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1709299778482/work
pytz==2024.2
pyzmq @ file:///private/var/folders/k1/30mswbxs7r1g6zwn8y4fyt500000gp/T/abs_43pxpbos3z/croot/pyzmq_1705605108344/work
referencing==0.35.1
requests==2.32.3
requests-oauthlib==2.0.0
rich==13.9.2
rpds-py==0.20.0
rsa==4.9
scikit-learn==1.5.2
scipy==1.12.0
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
tensorboard==2.15.2
tensorboard-data-server==0.7.2
tensorflow==2.15.1
tensorflow-estimator==2.15.0
tensorflow-io-gcs-filesystem==0.37.1
tensorflow-macos==2.15.1
tensorflow-metal==1.1.0
tensorflow-probability==0.23.0
termcolor==2.5.0
threadpoolctl==3.5.0
toolz==1.0.0
tornado @ file:///private/var/folders/k1/30mswbxs7r1g6zwn8y4fyt500000gp/T/abs_a4w03z48br/croot/tornado_1718740114858/work
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1713535121073/work
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1717802530399/work
urllib3==2.2.3
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1704731205417/work
Werkzeug==3.0.4
wrapt==1.14.1
xarray==2024.3.0
xarray-einstats==0.8.0

More info

I followed the tutorial in this YouTube video, https://www.youtube.com/watch?v=cGEIEnekmRM, to enable TensorFlow on Apple silicon MacBooks. It involves installing tensorflow-macos and tensorflow-metal.

GPU check 1

# check if GPU is available
import tensorflow as tf
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
print("Num CPUs Available: ", len(tf.config.experimental.list_physical_devices('CPU')))
Your runtime has 34.4 gigabytes of available RAM

Num GPUs Available:  1
Num CPUs Available:  1

GPU check 2

devices = tf.config.list_physical_devices()
print("\nDevices: ", devices)

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  details = tf.config.experimental.get_device_details(gpus[0])
  print("GPU details: ", details)  
Devices:  [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
GPU details:  {'device_name': 'METAL'}

andyl7an commented Oct 9, 2024

Hi @lawrencekwong2

Thanks for reporting this issue and providing detailed information! We appreciate you helping us improve Meridian.

It appears the problem is rooted in the way Apple silicon handles TensorFlow's JIT compilation with tensorflow-macos and tensorflow-metal. Unfortunately, the suggested workaround of downgrading these packages isn't feasible for us at the moment due to compatibility issues with Meridian's required TensorFlow version.

We're keeping an eye on the upstream issues in the TensorFlow repository and Apple forums. Hopefully, a solution will be available soon that allows for GPU usage on Macbooks without compromising compatibility.

In the meantime, we're exploring the possibility of adding a parameter to Meridian to toggle jit_compile on/off. This could potentially allow you to use the GPU on your Macbook by disabling JIT compilation. We'll update this issue with our progress on that front.
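One way to check whether JIT compilation, rather than Meridian itself, triggers the error is to run the same tiny function with and without XLA. The following is only a diagnostic sketch: `probe_jit` and `scaled_sum` are hypothetical names, not part of Meridian or TensorFlow, and the only TensorFlow features assumed are `tf.function(jit_compile=...)` and `tf.device`. It reports each outcome instead of raising, and reports TensorFlow as missing if it cannot be imported.

```python
def probe_jit(device: str = "/CPU:0") -> dict:
    """Run a tiny function with and without XLA and report what happened.

    Hypothetical diagnostic helper; not part of Meridian or TensorFlow.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return {"tensorflow": "not installed"}

    results = {"tensorflow": tf.__version__}
    for jit in (False, True):
        # Same computation each time; only the jit_compile flag changes.
        @tf.function(jit_compile=jit)
        def scaled_sum(x):
            return tf.reduce_sum(x * 2.0)

        key = "jit_compile=%s" % jit
        try:
            with tf.device(device):
                value = scaled_sum(tf.constant([1.0, 2.0, 3.0]))
            results[key] = "ok (%.1f)" % float(value)
        except Exception as err:  # e.g. the NotFoundError reported above
            results[key] = "%s: %s" % (type(err).__name__, err)
    return results


if __name__ == "__main__":
    # Pass "/GPU:0" on the affected machine; "/CPU:0" is the safe default.
    for name, outcome in probe_jit().items():
        print(name, "->", outcome)
```

If the `jit_compile=False` run succeeds on `/GPU:0` while the `jit_compile=True` run fails with the same platform error, that would be consistent with the XLA explanation given here.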

For now, you can still run Meridian on your Macbook using the CPU. While not ideal, it should allow you to use the model. You can also consider using Google Colab which supports GPU out of the box.
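A minimal sketch of forcing a CPU-only run, assuming the standard `tf.config.set_visible_devices` call is enough to keep ops off the Metal device on this setup (not verified here). `hide_gpus` is a hypothetical helper name; it has to run before TensorFlow does any other work, because visible devices cannot be changed once the runtime has initialized.

```python
def hide_gpus() -> str:
    """Hide all GPUs from TensorFlow so every op runs on the CPU.

    Hypothetical helper; call it before any other TensorFlow work.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return "tensorflow not installed"
    try:
        # An empty list means "no GPU is visible to this process".
        tf.config.set_visible_devices([], "GPU")
    except RuntimeError as err:
        # Raised when TensorFlow has already initialized its devices.
        return "devices already initialized: %s" % err
    visible_gpus = tf.config.get_visible_devices("GPU")
    return "visible GPUs: %d" % len(visible_gpus)


print(hide_gpus())
```

With the GPU hidden, the model from the original report can be built without the `with tf.device("/GPU:0"):` block, that is, with `mmm = model.Meridian(input_data=data, model_spec=model_spec)` on its own.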

We appreciate your patience as we work towards a resolution. We'll update this issue as soon as we have more information.

@lawrencekwong2

Hi @andyl7an,

Thank you for the suggestions.

Unfortunately, Meridian doesn't work on the CPU with my dataset. It gives very high R-hat values:
Image

The exact same code on Colab gives reasonable R-hat values:
Image

I believe this is not supposed to happen?
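For context on what a very high R-hat signals: the statistic compares the variance between MCMC chains with the variance within each chain, so values near 1 suggest the chains agree and large values suggest they have not converged to the same distribution. The sketch below implements the classic Gelman-Rubin formula in plain Python. It is an illustration only and is not necessarily the exact variant that Meridian or ArviZ report (ArviZ, for example, defaults to a rank-normalized split version). `gelman_rubin_rhat` is a hypothetical name.

```python
from math import sqrt
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for one scalar parameter.

    `chains` is a list of equal-length lists of draws, one list per chain.
    """
    m = len(chains)
    n = len(chains[0]) if m else 0
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")

    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])  # W: average within-chain variance
    between = n * variance(chain_means)           # B: between-chain variance
    if within == 0:
        raise ValueError("within-chain variance is zero")

    pooled = (n - 1) / n * within + between / n   # pooled variance estimate
    return sqrt(pooled / within)


# Chains that cover the same values versus chains stuck in different places.
agreeing = [[1, 2, 3, 4], [1, 2, 3, 4]]
disagreeing = [[1, 2, 3, 4], [11, 12, 13, 14]]
print(gelman_rubin_rhat(agreeing))
print(gelman_rubin_rhat(disagreeing))
```

The second call returns a value well above 1 because the two chains never overlap, which is the kind of disagreement a very high R-hat points to.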

andyl7an commented Oct 11, 2024

Hi @lawrencekwong2,

Thanks for providing the screenshots and letting us know about this! It's unexpected that you're seeing this sampling error when running on the CPU. I tried to reproduce the issue on my MacBook Pro M1 with the same TensorFlow version and sampling parameters, but I haven't encountered the same error yet.

[Images: screenshots of the TensorFlow version and sampling parameters]

To help us investigate this further and figure out what's going on, could you share either a minimal repro notebook or the dimensions of your dataset?
