Add osx-arm64 platform to conda-lock.yml file and GitHub Actions CI #164
Conversation
Need to increase the `check-added-large-files` limit from 512 KB to 768 KB because conda-lock.yml is now >512 KB!
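For reference, a minimal sketch of what that bump looks like in `.pre-commit-config.yaml` (the hook's `--maxkb` argument sets the limit; the exact `rev` pinned in this repo may differ):

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0  # illustrative revision, not necessarily the one pinned here
    hooks:
      - id: check-added-large-files
        args: ["--maxkb=768"]  # raised from the previous 512 KB limit
```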
Test on macos-14 failing at https://github.com/Clay-foundation/model/actions/runs/8042592659/job/21963448881#step:4:79:

============================= test session starts ==============================
platform darwin -- Python 3.11.8, pytest-8.0.2, pluggy-1.4.0 -- /Users/runner/micromamba/envs/claymodel/bin/python
cachedir: .pytest_cache
rootdir: /Users/runner/work/model/model
plugins: anyio-4.3.0
collecting ... collected 16 items
src/tests/test_callbacks.py::test_callbacks_wandb_log_mae_reconstruction PASSED [ 6%]
src/tests/test_datamodule.py::test_datapipemodule[fit-train_dataloader-ClayDataModule] PASSED [ 12%]
src/tests/test_datamodule.py::test_datapipemodule[fit-train_dataloader-GeoTIFFDataPipeModule] PASSED [ 18%]
src/tests/test_datamodule.py::test_datapipemodule[predict-predict_dataloader-ClayDataModule] PASSED [ 25%]
src/tests/test_datamodule.py::test_datapipemodule[predict-predict_dataloader-GeoTIFFDataPipeModule] PASSED [ 31%]
src/tests/test_datamodule.py::test_geotiffdatapipemodule_list_from_s3_bucket PASSED [ 37%]
src/tests/test_model.py::test_model_vit_fit FAILED [ 43%]
src/tests/test_model.py::test_model_predict[mean-CLAYModule-32-true] FAILED [ 50%]
src/tests/test_model.py::test_model_predict[mean-ViTLitModule-16-mixed] FAILED [ 56%]
src/tests/test_model.py::test_model_predict[patch-CLAYModule-32-true] FAILED [ 62%]
src/tests/test_model.py::test_model_predict[patch-ViTLitModule-16-mixed] FAILED [ 68%]
src/tests/test_model.py::test_model_predict[group-CLAYModule-32-true] FAILED [ 75%]
src/tests/test_model.py::test_model_predict[group-ViTLitModule-16-mixed] FAILED [ 81%]
src/tests/test_trainer.py::test_cli_main[fit] PASSED [ 87%]
src/tests/test_trainer.py::test_cli_main[validate] PASSED [ 93%]
src/tests/test_trainer.py::test_cli_main[test] PASSED [100%]
=================================== FAILURES ===================================
______________________________ test_model_vit_fit ______________________________
datapipe = IterableWrapperIterDataPipe
def test_model_vit_fit(datapipe):
"""
Run a full train and validation loop using 1 batch.
"""
# Get some random data
dataloader = torchdata.dataloader2.DataLoader2(datapipe=datapipe)
# Initialize model
model: L.LightningModule = ViTLitModule()
# Run tests in a temporary folder
with tempfile.TemporaryDirectory() as tmpdirname:
# Training
trainer: L.Trainer = L.Trainer(
accelerator="auto",
devices=1,
precision="16-mixed",
fast_dev_run=True,
default_root_dir=tmpdirname,
)
> trainer.fit(model=model, train_dataloaders=dataloader)
src/tests/test_model.py:84:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:544: in fit
call._call_and_handle_interrupt(
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py:44: in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:580: in _fit_impl
self._run(model, ckpt_path=ckpt_path)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:965: in _run
self.strategy.setup(self)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:77: in setup
self.model_to_device()
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:74: in model_to_device
self.model.to(self.root_device)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/fabric/utilities/device_dtype_mixin.py:54: in to
return super().to(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1160: in to
return self._apply(convert)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:833: in _apply
param_applied = fn(param)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
t = Parameter containing:
tensor([[[[-7.8660e-03, -3.7126e-03, 1.1882e-02, ..., 3.5900e-03,
2.2451e-02, 4....[ 1.9821e-02, -1.4641e-02, -3.9173e-02, ..., -1.5309e-02,
-3.2961e-02, 2.5180e-02]]]], requires_grad=True)
def convert(t):
if convert_to_format is not None and t.dim() in (4, 5):
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
non_blocking, memory_format=convert_to_format)
> return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
E RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 156.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1158: RuntimeError
----------------------------- Captured stderr call -----------------------------
INFO: Using 16bit Automatic Mixed Precision (AMP)
INFO: GPU available: True (mps), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
INFO: Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
------------------------------ Captured log call -------------------------------
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Using 16bit Automatic Mixed Precision (AMP)
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 GPU available: True (mps), used: True
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 TPU available: False, using: 0 TPU cores
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 IPU available: False, using: 0 IPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 HPU available: False, using: 0 HPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
_________________ test_model_predict[mean-CLAYModule-32-true] __________________
datapipe = IterableWrapperIterDataPipe
litmodule = <class 'src.model_clay.CLAYModule'>, precision = '32-true'
embeddings_level = 'mean'
@pytest.mark.parametrize(
"litmodule,precision",
[
(CLAYModule, "16-mixed" if torch.cuda.is_available() else "32-true"),
(ViTLitModule, "16-mixed"),
],
)
@pytest.mark.parametrize("embeddings_level", ["mean", "patch", "group"])
def test_model_predict(datapipe, litmodule, precision, embeddings_level):
"""
Run a single prediction loop using 1 batch.
"""
# Get some random data
dataloader = torchdata.dataloader2.DataLoader2(datapipe=datapipe)
# Initialize model
if litmodule == CLAYModule:
litargs = {
"embeddings_level": embeddings_level,
}
else:
litargs = {}
model: L.LightningModule = litmodule(**litargs)
# Run tests in a temporary folder
with tempfile.TemporaryDirectory() as tmpdirname:
# Training
trainer: L.Trainer = L.Trainer(
accelerator="auto",
devices="auto",
precision=precision,
fast_dev_run=True,
default_root_dir=tmpdirname,
)
# Prediction
> trainer.predict(model=model, dataloaders=dataloader)
src/tests/test_model.py:124:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:864: in predict
return call._call_and_handle_interrupt(
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py:44: in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:903: in _predict_impl
results = self._run(model, ckpt_path=ckpt_path)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:965: in _run
self.strategy.setup(self)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:77: in setup
self.model_to_device()
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:74: in model_to_device
self.model.to(self.root_device)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/fabric/utilities/device_dtype_mixin.py:54: in to
return super().to(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1160: in to
return self._apply(convert)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:833: in _apply
param_applied = fn(param)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
t = Parameter containing:
tensor([[-0.6110, 0.1869],
[-0.5562, -0.1639],
[ 0.4407, 0.0668],
...,
[-0.2467, 0.0192],
[ 0.3921, 0.6417],
[ 0.1073, -0.1948]], requires_grad=True)
def convert(t):
if convert_to_format is not None and t.dim() in (4, 5):
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
non_blocking, memory_format=convert_to_format)
> return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
E RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 6.00 KB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1158: RuntimeError
----------------------------- Captured stderr call -----------------------------
INFO: GPU available: True (mps), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
INFO: Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
------------------------------ Captured log call -------------------------------
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 GPU available: True (mps), used: True
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 TPU available: False, using: 0 TPU cores
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 IPU available: False, using: 0 IPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 HPU available: False, using: 0 HPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
________________ test_model_predict[mean-ViTLitModule-16-mixed] ________________
datapipe = IterableWrapperIterDataPipe
litmodule = <class 'src.model_vit.ViTLitModule'>, precision = '16-mixed'
embeddings_level = 'mean'
@pytest.mark.parametrize(
"litmodule,precision",
[
(CLAYModule, "16-mixed" if torch.cuda.is_available() else "32-true"),
(ViTLitModule, "16-mixed"),
],
)
@pytest.mark.parametrize("embeddings_level", ["mean", "patch", "group"])
def test_model_predict(datapipe, litmodule, precision, embeddings_level):
"""
Run a single prediction loop using 1 batch.
"""
# Get some random data
dataloader = torchdata.dataloader2.DataLoader2(datapipe=datapipe)
# Initialize model
if litmodule == CLAYModule:
litargs = {
"embeddings_level": embeddings_level,
}
else:
litargs = {}
model: L.LightningModule = litmodule(**litargs)
# Run tests in a temporary folder
with tempfile.TemporaryDirectory() as tmpdirname:
# Training
trainer: L.Trainer = L.Trainer(
accelerator="auto",
devices="auto",
precision=precision,
fast_dev_run=True,
default_root_dir=tmpdirname,
)
# Prediction
> trainer.predict(model=model, dataloaders=dataloader)
src/tests/test_model.py:124:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:864: in predict
return call._call_and_handle_interrupt(
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py:44: in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:903: in _predict_impl
results = self._run(model, ckpt_path=ckpt_path)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:965: in _run
self.strategy.setup(self)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:77: in setup
self.model_to_device()
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:74: in model_to_device
self.model.to(self.root_device)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/fabric/utilities/device_dtype_mixin.py:54: in to
return super().to(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1160: in to
return self._apply(convert)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:833: in _apply
param_applied = fn(param)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
t = Parameter containing:
tensor([[[[-1.8605e-02, -7.1301e-03, 7.6448e-03, ..., 4.5676e-02,
-1.9168e-02, 1....[-2.3598e-02, -1.7550e-02, -3.5928e-03, ..., 1.3235e-02,
1.6877e-02, 5.0347e-02]]]], requires_grad=True)
def convert(t):
if convert_to_format is not None and t.dim() in (4, 5):
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
non_blocking, memory_format=convert_to_format)
> return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
E RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 156.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1158: RuntimeError
----------------------------- Captured stderr call -----------------------------
INFO: Using 16bit Automatic Mixed Precision (AMP)
INFO: GPU available: True (mps), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
INFO: Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
------------------------------ Captured log call -------------------------------
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Using 16bit Automatic Mixed Precision (AMP)
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 GPU available: True (mps), used: True
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 TPU available: False, using: 0 TPU cores
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 IPU available: False, using: 0 IPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 HPU available: False, using: 0 HPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
_________________ test_model_predict[patch-CLAYModule-32-true] _________________
datapipe = IterableWrapperIterDataPipe
litmodule = <class 'src.model_clay.CLAYModule'>, precision = '32-true'
embeddings_level = 'patch'
@pytest.mark.parametrize(
"litmodule,precision",
[
(CLAYModule, "16-mixed" if torch.cuda.is_available() else "32-true"),
(ViTLitModule, "16-mixed"),
],
)
@pytest.mark.parametrize("embeddings_level", ["mean", "patch", "group"])
def test_model_predict(datapipe, litmodule, precision, embeddings_level):
"""
Run a single prediction loop using 1 batch.
"""
# Get some random data
dataloader = torchdata.dataloader2.DataLoader2(datapipe=datapipe)
# Initialize model
if litmodule == CLAYModule:
litargs = {
"embeddings_level": embeddings_level,
}
else:
litargs = {}
model: L.LightningModule = litmodule(**litargs)
# Run tests in a temporary folder
with tempfile.TemporaryDirectory() as tmpdirname:
# Training
trainer: L.Trainer = L.Trainer(
accelerator="auto",
devices="auto",
precision=precision,
fast_dev_run=True,
default_root_dir=tmpdirname,
)
# Prediction
> trainer.predict(model=model, dataloaders=dataloader)
src/tests/test_model.py:124:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:864: in predict
return call._call_and_handle_interrupt(
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py:44: in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:903: in _predict_impl
results = self._run(model, ckpt_path=ckpt_path)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:965: in _run
self.strategy.setup(self)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:77: in setup
self.model_to_device()
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:74: in model_to_device
self.model.to(self.root_device)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/fabric/utilities/device_dtype_mixin.py:54: in to
return super().to(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1160: in to
return self._apply(convert)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:833: in _apply
param_applied = fn(param)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
t = Parameter containing:
tensor([[ 0.4197, 0.7064],
[-0.3017, -0.5617],
[ 0.3429, -0.5639],
...,
[-0.0225, 0.3200],
[ 0.1219, -0.0069],
[ 0.4542, -0.6156]], requires_grad=True)
def convert(t):
if convert_to_format is not None and t.dim() in (4, 5):
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
non_blocking, memory_format=convert_to_format)
> return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
E RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 6.00 KB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1158: RuntimeError
----------------------------- Captured stderr call -----------------------------
INFO: GPU available: True (mps), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
INFO: Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
------------------------------ Captured log call -------------------------------
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 GPU available: True (mps), used: True
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 TPU available: False, using: 0 TPU cores
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 IPU available: False, using: 0 IPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 HPU available: False, using: 0 HPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
_______________ test_model_predict[patch-ViTLitModule-16-mixed] ________________
datapipe = IterableWrapperIterDataPipe
litmodule = <class 'src.model_vit.ViTLitModule'>, precision = '16-mixed'
embeddings_level = 'patch'
@pytest.mark.parametrize(
"litmodule,precision",
[
(CLAYModule, "16-mixed" if torch.cuda.is_available() else "32-true"),
(ViTLitModule, "16-mixed"),
],
)
@pytest.mark.parametrize("embeddings_level", ["mean", "patch", "group"])
def test_model_predict(datapipe, litmodule, precision, embeddings_level):
"""
Run a single prediction loop using 1 batch.
"""
# Get some random data
dataloader = torchdata.dataloader2.DataLoader2(datapipe=datapipe)
# Initialize model
if litmodule == CLAYModule:
litargs = {
"embeddings_level": embeddings_level,
}
else:
litargs = {}
model: L.LightningModule = litmodule(**litargs)
# Run tests in a temporary folder
with tempfile.TemporaryDirectory() as tmpdirname:
# Training
trainer: L.Trainer = L.Trainer(
accelerator="auto",
devices="auto",
precision=precision,
fast_dev_run=True,
default_root_dir=tmpdirname,
)
# Prediction
> trainer.predict(model=model, dataloaders=dataloader)
src/tests/test_model.py:124:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:864: in predict
return call._call_and_handle_interrupt(
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py:44: in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:903: in _predict_impl
results = self._run(model, ckpt_path=ckpt_path)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:965: in _run
self.strategy.setup(self)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:77: in setup
self.model_to_device()
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:74: in model_to_device
self.model.to(self.root_device)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/fabric/utilities/device_dtype_mixin.py:54: in to
return super().to(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1160: in to
return self._apply(convert)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:833: in _apply
param_applied = fn(param)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
t = Parameter containing:
tensor([[[[ 1.6769e-02, 2.0459e-04, 3.0793e-02, ..., 4.2431e-02,
1.5604e-03, 2....[ 1.7419e-02, 6.7345e-04, -5.1334e-03, ..., -2.9248e-02,
-1.8247e-02, 3.1454e-02]]]], requires_grad=True)
def convert(t):
if convert_to_format is not None and t.dim() in (4, 5):
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
non_blocking, memory_format=convert_to_format)
> return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
E RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 156.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1158: RuntimeError
----------------------------- Captured stderr call -----------------------------
INFO: Using 16bit Automatic Mixed Precision (AMP)
INFO: GPU available: True (mps), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
INFO: Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
------------------------------ Captured log call -------------------------------
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Using 16bit Automatic Mixed Precision (AMP)
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 GPU available: True (mps), used: True
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 TPU available: False, using: 0 TPU cores
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 IPU available: False, using: 0 IPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 HPU available: False, using: 0 HPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
_________________ test_model_predict[group-CLAYModule-32-true] _________________
datapipe = IterableWrapperIterDataPipe
litmodule = <class 'src.model_clay.CLAYModule'>, precision = '32-true'
embeddings_level = 'group'
@pytest.mark.parametrize(
"litmodule,precision",
[
(CLAYModule, "16-mixed" if torch.cuda.is_available() else "32-true"),
(ViTLitModule, "16-mixed"),
],
)
@pytest.mark.parametrize("embeddings_level", ["mean", "patch", "group"])
def test_model_predict(datapipe, litmodule, precision, embeddings_level):
"""
Run a single prediction loop using 1 batch.
"""
# Get some random data
dataloader = torchdata.dataloader2.DataLoader2(datapipe=datapipe)
# Initialize model
if litmodule == CLAYModule:
litargs = {
"embeddings_level": embeddings_level,
}
else:
litargs = {}
model: L.LightningModule = litmodule(**litargs)
# Run tests in a temporary folder
with tempfile.TemporaryDirectory() as tmpdirname:
# Training
trainer: L.Trainer = L.Trainer(
accelerator="auto",
devices="auto",
precision=precision,
fast_dev_run=True,
default_root_dir=tmpdirname,
)
# Prediction
> trainer.predict(model=model, dataloaders=dataloader)
src/tests/test_model.py:124:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:864: in predict
return call._call_and_handle_interrupt(
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py:44: in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:903: in _predict_impl
results = self._run(model, ckpt_path=ckpt_path)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:965: in _run
self.strategy.setup(self)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:77: in setup
self.model_to_device()
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:74: in model_to_device
self.model.to(self.root_device)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/fabric/utilities/device_dtype_mixin.py:54: in to
return super().to(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1160: in to
return self._apply(convert)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:833: in _apply
param_applied = fn(param)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
t = Parameter containing:
tensor([[-0.1077, 0.0463],
[ 0.0930, -0.4663],
[ 0.5140, 0.1373],
...,
[ 0.3736, 0.0872],
[ 0.5844, 0.0184],
[ 0.5122, 0.6130]], requires_grad=True)
def convert(t):
if convert_to_format is not None and t.dim() in (4, 5):
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
non_blocking, memory_format=convert_to_format)
> return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
E RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 6.00 KB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1158: RuntimeError
----------------------------- Captured stderr call -----------------------------
INFO: GPU available: True (mps), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
INFO: Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
------------------------------ Captured log call -------------------------------
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 GPU available: True (mps), used: True
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 TPU available: False, using: 0 TPU cores
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 IPU available: False, using: 0 IPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 HPU available: False, using: 0 HPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
_______________ test_model_predict[group-ViTLitModule-16-mixed] ________________
datapipe = IterableWrapperIterDataPipe
litmodule = <class 'src.model_vit.ViTLitModule'>, precision = '16-mixed'
embeddings_level = 'group'
@pytest.mark.parametrize(
"litmodule,precision",
[
(CLAYModule, "16-mixed" if torch.cuda.is_available() else "32-true"),
(ViTLitModule, "16-mixed"),
],
)
@pytest.mark.parametrize("embeddings_level", ["mean", "patch", "group"])
def test_model_predict(datapipe, litmodule, precision, embeddings_level):
"""
Run a single prediction loop using 1 batch.
"""
# Get some random data
dataloader = torchdata.dataloader2.DataLoader2(datapipe=datapipe)
# Initialize model
Unable to serialize instance <lightning.pytorch.plugins.io.async_plugin.AsyncCheckpointIO object at 0x144974950>
warning(val)
src/tests/test_trainer.py::test_cli_main[validate]
/Users/runner/micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/cli.py:518: LightningCLI's args parameter is intended to run from within Python like if it were from the command line. To prevent mistakes it is not recommended to provide both args and command line arguments, got: sys.argv[1:]=['--verbose', 'src/tests/'], args=['validate', '--print_config=skip_null'].
src/tests/test_trainer.py::test_cli_main[test]
/Users/runner/micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/cli.py:518: LightningCLI's args parameter is intended to run from within Python like if it were from the command line. To prevent mistakes it is not recommended to provide both args and command line arguments, got: sys.argv[1:]=['--verbose', 'src/tests/'], args=['test', '--print_config=skip_null'].
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED src/tests/test_model.py::test_model_vit_fit - RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 156.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
FAILED src/tests/test_model.py::test_model_predict[mean-CLAYModule-32-true] - RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 6.00 KB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
FAILED src/tests/test_model.py::test_model_predict[mean-ViTLitModule-16-mixed] - RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 156.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
FAILED src/tests/test_model.py::test_model_predict[patch-CLAYModule-32-true] - RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 6.00 KB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
FAILED src/tests/test_model.py::test_model_predict[patch-ViTLitModule-16-mixed] - RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 156.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
FAILED src/tests/test_model.py::test_model_predict[group-CLAYModule-32-true] - RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 6.00 KB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
FAILED src/tests/test_model.py::test_model_predict[group-ViTLitModule-16-mixed] - RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 156.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
============= 7 failed, 9 passed, 28 warnings in 67.73s (0:01:07) ==============

The main error message is the `RuntimeError: MPS backend out of memory` shown above. @leothomas, if you have time, could you try installing from this branch and running the tests locally?
Did a fresh install with micromamba and was able to run the tests without errors on this branch, although I just realized that I have a Mac M2 and not M1 (not sure if that makes an important difference).
To try and fix `RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB)` on GitHub Actions CI.
Cool, M2 should be fine too (probably more memory than M1). I tried setting …
See if this can disable MPS on GitHub Actions.
Xref https://pre-commit.com/#config-exclude Co-authored-by: Chuck Daniels <[email protected]>
Use the environment variable `PYTORCH_MPS_PREFER_METAL=0` to fall back to running on CPU on the macos-14 GitHub Actions runner whose MPS hardware doesn't actually work. Inspired by neuml/txtai@e7552a6
Equality check didn't work because we were matching the integer 0, not the string '0'.
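A tiny sketch of why the string comparison is needed (illustrative only; `os.getenv` always returns a string or `None`):

```python
import os

os.environ["PYTORCH_MPS_PREFER_METAL"] = "0"  # environment variable values are always strings

value = os.getenv("PYTORCH_MPS_PREFER_METAL")
print(value == 0)    # False: a string never equals the integer 0
print(value == "0")  # True: compare against the string '0' instead
```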
Ok, have fixed the MPS-related unit test failure on the `macos-14` GitHub Actions runner by falling back to the CPU. Ready for final review!
accelerator=(
    "cpu"  # fallback to CPU on osx-arm64 CI
    if os.getenv("PYTORCH_MPS_PREFER_METAL") == "0"
    and torch.backends.mps.is_available()
    else "auto"
),
Using this environment variable hack to fall back to CPU on the macos-14 GitHub Actions runners. If someone with a Mac M1/M2 device can run `python -m pytest --verbose src/tests/` locally and get the unit tests to pass, that would be great!
@leothomas, can you do this? My Mac is pre-M1.
I'll merge this into the `main` branch first, and then Leo can merge this into the changes in his branch at #166. Might be easier to test this way.
.github/workflows/test.yml (outdated)
env:
  PYTORCH_MPS_PREFER_METAL: 0 # disable MPS which doesn't work on macos-14 runner
This disables MPS for all platforms in the matrix. Is that what you want, or do you want to disable it only for macos-14?
MPS is only available on macos-14, and I've added an if-condition in the unit tests to only disable MPS on devices with MPS, so it should be ok? Open to a better way of doing things though.
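For what it's worth, one hypothetical way to scope the variable to macos-14 only could look like this (assuming the workflow matrix exposes the runner image as `matrix.os`; just a sketch, not what this PR does):

```yaml
env:
  # Set "0" only on the macos-14 runner; the other platforms get an empty value.
  PYTORCH_MPS_PREFER_METAL: ${{ matrix.os == 'macos-14' && '0' || '' }}
```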
Ah, I see. I suppose the comment may have thrown me off. Here, "doesn't work" doesn't mean that we can't choose to use it, but rather that when we do choose to use it, the test fails due to running out of memory. So it would work if we had enough memory.
Ok, maybe I can reword it a little bit, how about this:
Before:
env:
  PYTORCH_MPS_PREFER_METAL: 0 # disable MPS which doesn't work on macos-14 runner

After:
env:
  PYTORCH_MPS_PREFER_METAL: 0 # disable MPS which runs out of memory on macos-14 runner
We disable MPS because it runs out of memory (or rather, it doesn't allow allocating GPU-like MPS memory?) on the macos-14 GitHub Actions runner, as mentioned at actions/runner-images#9254 (comment).
Ok, reworded at 7451104
Looks good to me. I'm approving, but I don't know if you want to wait for @leothomas to test using his M1 Mac, since mine is pre-M1.
Moved advanced section from main README.md to docs/installation.md, and reorganized the sections to make it clearer how to install the claymodel environment on macOS using conda-lock.
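As a rough sketch of that documented install flow on an Apple Silicon Mac (assuming the `conda-lock.yml` at the repository root; the authoritative steps live in `docs/installation.md`):

```bash
pip install conda-lock                               # or install conda-lock via pipx/micromamba
conda-lock install --name claymodel conda-lock.yml   # create the claymodel environment from the lockfile
conda activate claymodel                             # or: micromamba activate claymodel
python -m pytest --verbose src/tests/                # run the same test suite as CI
```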
Thanks @chuckwondo for reviewing. I'll merge this in first so that a Mac M1 user can get the install working on their device (#161 (comment)), and will let Leo test things later once the changes here get incorporated into #166 as mentioned at #164 (comment).
Used the wrong syntax
Support installation on macOS ARM64 devices (M1 chips) by:

- Adding `osx-arm64` as another platform in the `environment.yml` file (see the sketch at the end of this description)
- Regenerating the `conda-lock.yml` file with `conda-lock lock --mamba --file environment.yml --with-cuda=12.0`
- Adding `macos-14` to the `.github/workflows/test.yml` GitHub Actions workflow

References:
Addresses #161 and extends #162.
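A sketch of the two files touched, for orientation (contents are illustrative, not verbatim from the PR):

```yaml
# environment.yml -- conda-lock reads this platforms list when generating conda-lock.yml
platforms:
  - linux-64
  - osx-arm64

# .github/workflows/test.yml -- add the Apple Silicon runner image to the test matrix
# (runner image names here are assumptions; the repo's actual matrix may differ)
strategy:
  matrix:
    os: [ubuntu-22.04, macos-14]
```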