Add osx-arm64 platform to conda-lock.yml file and GitHub Actions CI #164
Conversation
Need to increase the `check-added-large-files` limit from 512 KB to 768 KB because conda-lock.yml is now >512 KB!
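For reference, a minimal sketch of what that bump looks like in `.pre-commit-config.yaml` (the hook's `--maxkb` argument sets the limit; the exact `rev` pinned in this repo may differ):

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0  # illustrative revision, not necessarily the one pinned here
    hooks:
      - id: check-added-large-files
        args: ["--maxkb=768"]  # raised from the previous 512 KB limit
```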
Test on macos-14 failing at https://github.com/Clay-foundation/model/actions/runs/8042592659/job/21963448881#step:4:79:

============================= test session starts ==============================
platform darwin -- Python 3.11.8, pytest-8.0.2, pluggy-1.4.0 -- /Users/runner/micromamba/envs/claymodel/bin/python
cachedir: .pytest_cache
rootdir: /Users/runner/work/model/model
plugins: anyio-4.3.0
collecting ... collected 16 items
src/tests/test_callbacks.py::test_callbacks_wandb_log_mae_reconstruction PASSED [ 6%]
src/tests/test_datamodule.py::test_datapipemodule[fit-train_dataloader-ClayDataModule] PASSED [ 12%]
src/tests/test_datamodule.py::test_datapipemodule[fit-train_dataloader-GeoTIFFDataPipeModule] PASSED [ 18%]
src/tests/test_datamodule.py::test_datapipemodule[predict-predict_dataloader-ClayDataModule] PASSED [ 25%]
src/tests/test_datamodule.py::test_datapipemodule[predict-predict_dataloader-GeoTIFFDataPipeModule] PASSED [ 31%]
src/tests/test_datamodule.py::test_geotiffdatapipemodule_list_from_s3_bucket PASSED [ 37%]
src/tests/test_model.py::test_model_vit_fit FAILED [ 43%]
src/tests/test_model.py::test_model_predict[mean-CLAYModule-32-true] FAILED [ 50%]
src/tests/test_model.py::test_model_predict[mean-ViTLitModule-16-mixed] FAILED [ 56%]
src/tests/test_model.py::test_model_predict[patch-CLAYModule-32-true] FAILED [ 62%]
src/tests/test_model.py::test_model_predict[patch-ViTLitModule-16-mixed] FAILED [ 68%]
src/tests/test_model.py::test_model_predict[group-CLAYModule-32-true] FAILED [ 75%]
src/tests/test_model.py::test_model_predict[group-ViTLitModule-16-mixed] FAILED [ 81%]
src/tests/test_trainer.py::test_cli_main[fit] PASSED [ 87%]
src/tests/test_trainer.py::test_cli_main[validate] PASSED [ 93%]
src/tests/test_trainer.py::test_cli_main[test] PASSED [100%]
=================================== FAILURES ===================================
______________________________ test_model_vit_fit ______________________________
datapipe = IterableWrapperIterDataPipe
def test_model_vit_fit(datapipe):
"""
Run a full train and validation loop using 1 batch.
"""
# Get some random data
dataloader = torchdata.dataloader2.DataLoader2(datapipe=datapipe)
# Initialize model
model: L.LightningModule = ViTLitModule()
# Run tests in a temporary folder
with tempfile.TemporaryDirectory() as tmpdirname:
# Training
trainer: L.Trainer = L.Trainer(
accelerator="auto",
devices=1,
precision="16-mixed",
fast_dev_run=True,
default_root_dir=tmpdirname,
)
> trainer.fit(model=model, train_dataloaders=dataloader)
src/tests/test_model.py:84:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:544: in fit
call._call_and_handle_interrupt(
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py:44: in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:580: in _fit_impl
self._run(model, ckpt_path=ckpt_path)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:965: in _run
self.strategy.setup(self)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:77: in setup
self.model_to_device()
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:74: in model_to_device
self.model.to(self.root_device)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/fabric/utilities/device_dtype_mixin.py:54: in to
return super().to(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1160: in to
return self._apply(convert)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:833: in _apply
param_applied = fn(param)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
t = Parameter containing:
tensor([[[[-7.8660e-03, -3.7126e-03, 1.1882e-02, ..., 3.5900e-03,
2.2451e-02, 4....[ 1.9821e-02, -1.4641e-02, -3.9173e-02, ..., -1.5309e-02,
-3.2961e-02, 2.5180e-02]]]], requires_grad=True)
def convert(t):
if convert_to_format is not None and t.dim() in (4, 5):
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
non_blocking, memory_format=convert_to_format)
> return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
E RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 156.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1158: RuntimeError
----------------------------- Captured stderr call -----------------------------
INFO: Using 16bit Automatic Mixed Precision (AMP)
INFO: GPU available: True (mps), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
INFO: Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
------------------------------ Captured log call -------------------------------
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Using 16bit Automatic Mixed Precision (AMP)
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 GPU available: True (mps), used: True
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 TPU available: False, using: 0 TPU cores
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 IPU available: False, using: 0 IPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 HPU available: False, using: 0 HPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
_________________ test_model_predict[mean-CLAYModule-32-true] __________________
datapipe = IterableWrapperIterDataPipe
litmodule = <class 'src.model_clay.CLAYModule'>, precision = '32-true'
embeddings_level = 'mean'
@pytest.mark.parametrize(
"litmodule,precision",
[
(CLAYModule, "16-mixed" if torch.cuda.is_available() else "32-true"),
(ViTLitModule, "16-mixed"),
],
)
@pytest.mark.parametrize("embeddings_level", ["mean", "patch", "group"])
def test_model_predict(datapipe, litmodule, precision, embeddings_level):
"""
Run a single prediction loop using 1 batch.
"""
# Get some random data
dataloader = torchdata.dataloader2.DataLoader2(datapipe=datapipe)
# Initialize model
if litmodule == CLAYModule:
litargs = {
"embeddings_level": embeddings_level,
}
else:
litargs = {}
model: L.LightningModule = litmodule(**litargs)
# Run tests in a temporary folder
with tempfile.TemporaryDirectory() as tmpdirname:
# Training
trainer: L.Trainer = L.Trainer(
accelerator="auto",
devices="auto",
precision=precision,
fast_dev_run=True,
default_root_dir=tmpdirname,
)
# Prediction
> trainer.predict(model=model, dataloaders=dataloader)
src/tests/test_model.py:124:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:864: in predict
return call._call_and_handle_interrupt(
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py:44: in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:903: in _predict_impl
results = self._run(model, ckpt_path=ckpt_path)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:965: in _run
self.strategy.setup(self)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:77: in setup
self.model_to_device()
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:74: in model_to_device
self.model.to(self.root_device)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/fabric/utilities/device_dtype_mixin.py:54: in to
return super().to(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1160: in to
return self._apply(convert)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:833: in _apply
param_applied = fn(param)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
t = Parameter containing:
tensor([[-0.6110, 0.1869],
[-0.5562, -0.1639],
[ 0.4407, 0.0668],
...,
[-0.2467, 0.0192],
[ 0.3921, 0.6417],
[ 0.1073, -0.1948]], requires_grad=True)
def convert(t):
if convert_to_format is not None and t.dim() in (4, 5):
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
non_blocking, memory_format=convert_to_format)
> return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
E RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 6.00 KB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1158: RuntimeError
----------------------------- Captured stderr call -----------------------------
INFO: GPU available: True (mps), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
INFO: Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
------------------------------ Captured log call -------------------------------
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 GPU available: True (mps), used: True
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 TPU available: False, using: 0 TPU cores
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 IPU available: False, using: 0 IPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 HPU available: False, using: 0 HPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
________________ test_model_predict[mean-ViTLitModule-16-mixed] ________________
datapipe = IterableWrapperIterDataPipe
litmodule = <class 'src.model_vit.ViTLitModule'>, precision = '16-mixed'
embeddings_level = 'mean'
@pytest.mark.parametrize(
"litmodule,precision",
[
(CLAYModule, "16-mixed" if torch.cuda.is_available() else "32-true"),
(ViTLitModule, "16-mixed"),
],
)
@pytest.mark.parametrize("embeddings_level", ["mean", "patch", "group"])
def test_model_predict(datapipe, litmodule, precision, embeddings_level):
"""
Run a single prediction loop using 1 batch.
"""
# Get some random data
dataloader = torchdata.dataloader2.DataLoader2(datapipe=datapipe)
# Initialize model
if litmodule == CLAYModule:
litargs = {
"embeddings_level": embeddings_level,
}
else:
litargs = {}
model: L.LightningModule = litmodule(**litargs)
# Run tests in a temporary folder
with tempfile.TemporaryDirectory() as tmpdirname:
# Training
trainer: L.Trainer = L.Trainer(
accelerator="auto",
devices="auto",
precision=precision,
fast_dev_run=True,
default_root_dir=tmpdirname,
)
# Prediction
> trainer.predict(model=model, dataloaders=dataloader)
src/tests/test_model.py:124:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:864: in predict
return call._call_and_handle_interrupt(
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py:44: in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:903: in _predict_impl
results = self._run(model, ckpt_path=ckpt_path)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:965: in _run
self.strategy.setup(self)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:77: in setup
self.model_to_device()
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:74: in model_to_device
self.model.to(self.root_device)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/fabric/utilities/device_dtype_mixin.py:54: in to
return super().to(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1160: in to
return self._apply(convert)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:833: in _apply
param_applied = fn(param)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
t = Parameter containing:
tensor([[[[-1.8605e-02, -7.1301e-03, 7.6448e-03, ..., 4.5676e-02,
-1.9168e-02, 1....[-2.3598e-02, -1.7550e-02, -3.5928e-03, ..., 1.3235e-02,
1.6877e-02, 5.0347e-02]]]], requires_grad=True)
def convert(t):
if convert_to_format is not None and t.dim() in (4, 5):
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
non_blocking, memory_format=convert_to_format)
> return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
E RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 156.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1158: RuntimeError
----------------------------- Captured stderr call -----------------------------
INFO: Using 16bit Automatic Mixed Precision (AMP)
INFO: GPU available: True (mps), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
INFO: Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
------------------------------ Captured log call -------------------------------
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Using 16bit Automatic Mixed Precision (AMP)
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 GPU available: True (mps), used: True
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 TPU available: False, using: 0 TPU cores
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 IPU available: False, using: 0 IPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 HPU available: False, using: 0 HPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
_________________ test_model_predict[patch-CLAYModule-32-true] _________________
datapipe = IterableWrapperIterDataPipe
litmodule = <class 'src.model_clay.CLAYModule'>, precision = '32-true'
embeddings_level = 'patch'
@pytest.mark.parametrize(
"litmodule,precision",
[
(CLAYModule, "16-mixed" if torch.cuda.is_available() else "32-true"),
(ViTLitModule, "16-mixed"),
],
)
@pytest.mark.parametrize("embeddings_level", ["mean", "patch", "group"])
def test_model_predict(datapipe, litmodule, precision, embeddings_level):
"""
Run a single prediction loop using 1 batch.
"""
# Get some random data
dataloader = torchdata.dataloader2.DataLoader2(datapipe=datapipe)
# Initialize model
if litmodule == CLAYModule:
litargs = {
"embeddings_level": embeddings_level,
}
else:
litargs = {}
model: L.LightningModule = litmodule(**litargs)
# Run tests in a temporary folder
with tempfile.TemporaryDirectory() as tmpdirname:
# Training
trainer: L.Trainer = L.Trainer(
accelerator="auto",
devices="auto",
precision=precision,
fast_dev_run=True,
default_root_dir=tmpdirname,
)
# Prediction
> trainer.predict(model=model, dataloaders=dataloader)
src/tests/test_model.py:124:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:864: in predict
return call._call_and_handle_interrupt(
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py:44: in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:903: in _predict_impl
results = self._run(model, ckpt_path=ckpt_path)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:965: in _run
self.strategy.setup(self)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:77: in setup
self.model_to_device()
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:74: in model_to_device
self.model.to(self.root_device)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/fabric/utilities/device_dtype_mixin.py:54: in to
return super().to(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1160: in to
return self._apply(convert)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:833: in _apply
param_applied = fn(param)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
t = Parameter containing:
tensor([[ 0.4197, 0.7064],
[-0.3017, -0.5617],
[ 0.3429, -0.5639],
...,
[-0.0225, 0.3200],
[ 0.1219, -0.0069],
[ 0.4542, -0.6156]], requires_grad=True)
def convert(t):
if convert_to_format is not None and t.dim() in (4, 5):
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
non_blocking, memory_format=convert_to_format)
> return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
E RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 6.00 KB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1158: RuntimeError
----------------------------- Captured stderr call -----------------------------
INFO: GPU available: True (mps), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
INFO: Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
------------------------------ Captured log call -------------------------------
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 GPU available: True (mps), used: True
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 TPU available: False, using: 0 TPU cores
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 IPU available: False, using: 0 IPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 HPU available: False, using: 0 HPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
_______________ test_model_predict[patch-ViTLitModule-16-mixed] ________________
datapipe = IterableWrapperIterDataPipe
litmodule = <class 'src.model_vit.ViTLitModule'>, precision = '16-mixed'
embeddings_level = 'patch'
@pytest.mark.parametrize(
"litmodule,precision",
[
(CLAYModule, "16-mixed" if torch.cuda.is_available() else "32-true"),
(ViTLitModule, "16-mixed"),
],
)
@pytest.mark.parametrize("embeddings_level", ["mean", "patch", "group"])
def test_model_predict(datapipe, litmodule, precision, embeddings_level):
"""
Run a single prediction loop using 1 batch.
"""
# Get some random data
dataloader = torchdata.dataloader2.DataLoader2(datapipe=datapipe)
# Initialize model
if litmodule == CLAYModule:
litargs = {
"embeddings_level": embeddings_level,
}
else:
litargs = {}
model: L.LightningModule = litmodule(**litargs)
# Run tests in a temporary folder
with tempfile.TemporaryDirectory() as tmpdirname:
# Training
trainer: L.Trainer = L.Trainer(
accelerator="auto",
devices="auto",
precision=precision,
fast_dev_run=True,
default_root_dir=tmpdirname,
)
# Prediction
> trainer.predict(model=model, dataloaders=dataloader)
src/tests/test_model.py:124:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:864: in predict
return call._call_and_handle_interrupt(
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py:44: in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:903: in _predict_impl
results = self._run(model, ckpt_path=ckpt_path)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:965: in _run
self.strategy.setup(self)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:77: in setup
self.model_to_device()
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:74: in model_to_device
self.model.to(self.root_device)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/fabric/utilities/device_dtype_mixin.py:54: in to
return super().to(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1160: in to
return self._apply(convert)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:833: in _apply
param_applied = fn(param)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
t = Parameter containing:
tensor([[[[ 1.6769e-02, 2.0459e-04, 3.0793e-02, ..., 4.2431e-02,
1.5604e-03, 2....[ 1.7419e-02, 6.7345e-04, -5.1334e-03, ..., -2.9248e-02,
-1.8247e-02, 3.1454e-02]]]], requires_grad=True)
def convert(t):
if convert_to_format is not None and t.dim() in (4, 5):
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
non_blocking, memory_format=convert_to_format)
> return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
E RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 156.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1158: RuntimeError
----------------------------- Captured stderr call -----------------------------
INFO: Using 16bit Automatic Mixed Precision (AMP)
INFO: GPU available: True (mps), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
INFO: Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
------------------------------ Captured log call -------------------------------
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Using 16bit Automatic Mixed Precision (AMP)
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 GPU available: True (mps), used: True
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 TPU available: False, using: 0 TPU cores
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 IPU available: False, using: 0 IPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 HPU available: False, using: 0 HPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
_________________ test_model_predict[group-CLAYModule-32-true] _________________
datapipe = IterableWrapperIterDataPipe
litmodule = <class 'src.model_clay.CLAYModule'>, precision = '32-true'
embeddings_level = 'group'
@pytest.mark.parametrize(
"litmodule,precision",
[
(CLAYModule, "16-mixed" if torch.cuda.is_available() else "32-true"),
(ViTLitModule, "16-mixed"),
],
)
@pytest.mark.parametrize("embeddings_level", ["mean", "patch", "group"])
def test_model_predict(datapipe, litmodule, precision, embeddings_level):
"""
Run a single prediction loop using 1 batch.
"""
# Get some random data
dataloader = torchdata.dataloader2.DataLoader2(datapipe=datapipe)
# Initialize model
if litmodule == CLAYModule:
litargs = {
"embeddings_level": embeddings_level,
}
else:
litargs = {}
model: L.LightningModule = litmodule(**litargs)
# Run tests in a temporary folder
with tempfile.TemporaryDirectory() as tmpdirname:
# Training
trainer: L.Trainer = L.Trainer(
accelerator="auto",
devices="auto",
precision=precision,
fast_dev_run=True,
default_root_dir=tmpdirname,
)
# Prediction
> trainer.predict(model=model, dataloaders=dataloader)
src/tests/test_model.py:124:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:864: in predict
return call._call_and_handle_interrupt(
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py:44: in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:903: in _predict_impl
results = self._run(model, ckpt_path=ckpt_path)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py:965: in _run
self.strategy.setup(self)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:77: in setup
self.model_to_device()
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/strategies/single_device.py:74: in model_to_device
self.model.to(self.root_device)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/fabric/utilities/device_dtype_mixin.py:54: in to
return super().to(*args, **kwargs)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1160: in to
return self._apply(convert)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:810: in _apply
module._apply(fn)
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:833: in _apply
param_applied = fn(param)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
t = Parameter containing:
tensor([[-0.1077, 0.0463],
[ 0.0930, -0.4663],
[ 0.5140, 0.1373],
...,
[ 0.3736, 0.0872],
[ 0.5844, 0.0184],
[ 0.5122, 0.6130]], requires_grad=True)
def convert(t):
if convert_to_format is not None and t.dim() in (4, 5):
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
non_blocking, memory_format=convert_to_format)
> return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
E RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 6.00 KB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
../../../micromamba/envs/claymodel/lib/python3.11/site-packages/torch/nn/modules/module.py:1158: RuntimeError
----------------------------- Captured stderr call -----------------------------
INFO: GPU available: True (mps), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
INFO: Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
------------------------------ Captured log call -------------------------------
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 GPU available: True (mps), used: True
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 TPU available: False, using: 0 TPU cores
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 IPU available: False, using: 0 IPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 HPU available: False, using: 0 HPUs
INFO lightning.pytorch.utilities.rank_zero:rank_zero.py:64 Running in `fast_dev_run` mode: will run the requested loop using 1 batch(es). Logging and checkpointing is suppressed.
_______________ test_model_predict[group-ViTLitModule-16-mixed] ________________
datapipe = IterableWrapperIterDataPipe
litmodule = <class 'src.model_vit.ViTLitModule'>, precision = '16-mixed'
embeddings_level = 'group'
@pytest.mark.parametrize(
"litmodule,precision",
[
(CLAYModule, "16-mixed" if torch.cuda.is_available() else "32-true"),
(ViTLitModule, "16-mixed"),
],
)
@pytest.mark.parametrize("embeddings_level", ["mean", "patch", "group"])
def test_model_predict(datapipe, litmodule, precision, embeddings_level):
"""
Run a single prediction loop using 1 batch.
"""
# Get some random data
dataloader = torchdata.dataloader2.DataLoader2(datapipe=datapipe)
# Initialize model
Unable to serialize instance <lightning.pytorch.plugins.io.async_plugin.AsyncCheckpointIO object at 0x144974950>
warning(val)
src/tests/test_trainer.py::test_cli_main[validate]
/Users/runner/micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/cli.py:518: LightningCLI's args parameter is intended to run from within Python like if it were from the command line. To prevent mistakes it is not recommended to provide both args and command line arguments, got: sys.argv[1:]=['--verbose', 'src/tests/'], args=['validate', '--print_config=skip_null'].
src/tests/test_trainer.py::test_cli_main[test]
/Users/runner/micromamba/envs/claymodel/lib/python3.11/site-packages/lightning/pytorch/cli.py:518: LightningCLI's args parameter is intended to run from within Python like if it were from the command line. To prevent mistakes it is not recommended to provide both args and command line arguments, got: sys.argv[1:]=['--verbose', 'src/tests/'], args=['test', '--print_config=skip_null'].
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED src/tests/test_model.py::test_model_vit_fit - RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 156.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
FAILED src/tests/test_model.py::test_model_predict[mean-CLAYModule-32-true] - RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 6.00 KB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
FAILED src/tests/test_model.py::test_model_predict[mean-ViTLitModule-16-mixed] - RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 156.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
FAILED src/tests/test_model.py::test_model_predict[patch-CLAYModule-32-true] - RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 6.00 KB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
FAILED src/tests/test_model.py::test_model_predict[patch-ViTLitModule-16-mixed] - RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 156.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
FAILED src/tests/test_model.py::test_model_predict[group-CLAYModule-32-true] - RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 6.00 KB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
FAILED src/tests/test_model.py::test_model_predict[group-ViTLitModule-16-mixed] - RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 156.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
============= 7 failed, 9 passed, 28 warnings in 67.73s (0:01:07) ==============

The main error message is the `RuntimeError: MPS backend out of memory` shown above. @leothomas, if you have time, could you try installing from this branch and running the tests locally?
Did a fresh install with micromamba and was able to run the tests without errors on this branch, although I just realized that I have a Mac M2 and not M1 (not sure if that makes an important difference).
To try and fix `RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB)` on GitHub Actions CI.
Cool, M2 should be fine too (probably more memory than M1). I tried setting …
See if this can disable MPS on GitHub Actions.
Xref https://pre-commit.com/#config-exclude Co-authored-by: Chuck Daniels <[email protected]>
Use the environment variable `PYTORCH_MPS_PREFER_METAL=0` to fall back to running on CPU on the macos-14 GitHub Actions runner whose MPS hardware doesn't actually work. Inspired by neuml/txtai@e7552a6
Equality check didn't work because we were matching the integer 0, not the string '0'.
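A tiny sketch of why the string comparison is needed (illustrative only; `os.getenv` always returns a string or `None`):

```python
import os

os.environ["PYTORCH_MPS_PREFER_METAL"] = "0"  # environment variable values are always strings

value = os.getenv("PYTORCH_MPS_PREFER_METAL")
print(value == 0)    # False: a string never equals the integer 0
print(value == "0")  # True: compare against the string '0' instead
```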
Ok, have fixed the MPS-related unit test failure on the `macos-14` GitHub Actions runner by falling back to the CPU. Ready for final review!
accelerator=(
    "cpu"  # fallback to CPU on osx-arm64 CI
    if os.getenv("PYTORCH_MPS_PREFER_METAL") == "0"
    and torch.backends.mps.is_available()
    else "auto"
),
Using this environment variable hack to fall back to CPU on the macos-14 GitHub Actions runners. If someone with a Mac M1/M2 device can run `python -m pytest --verbose src/tests/` locally and get the unit tests to pass, that would be great!
@leothomas, can you do this? My Mac is pre-M1.
I'll merge this into the `main` branch first, and then Leo can merge this into the changes in his branch at #166. Might be easier to test this way.
.github/workflows/test.yml (outdated)
env:
  PYTORCH_MPS_PREFER_METAL: 0 # disable MPS which doesn't work on macos-14 runner
This disables MPS for all platforms in the matrix. Is that what you want, or do you want to disable it only for macos-14?
MPS is only available on macos-14, and I've added an if-condition in the unit tests to only disable MPS on devices with MPS, so it should be ok? Open to a better way of doing things though.
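For what it's worth, one hypothetical way to scope the variable to macos-14 only could look like this (assuming the workflow matrix exposes the runner image as `matrix.os`; just a sketch, not what this PR does):

```yaml
env:
  # Set "0" only on the macos-14 runner; the other platforms get an empty value.
  PYTORCH_MPS_PREFER_METAL: ${{ matrix.os == 'macos-14' && '0' || '' }}
```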
Ah, I see. I suppose the comment may have thrown me off. Here, "doesn't work" doesn't mean that we can't choose to use it, but rather that when we do choose to use it, the test fails due to running out of memory. So it would work if we had enough memory.
Ok, maybe I can reword it a little bit, how about this:
Before:
env:
  PYTORCH_MPS_PREFER_METAL: 0 # disable MPS which doesn't work on macos-14 runner

After:
env:
  PYTORCH_MPS_PREFER_METAL: 0 # disable MPS which runs out of memory on macos-14 runner
We disable MPS because it runs out of memory (or rather, it doesn't allow allocating GPU-like MPS memory?) on the macos-14 GitHub Actions runner, as mentioned at actions/runner-images#9254 (comment).
Ok, reworded at 7451104
Looks good to me. I'm approving, but I don't know if you want to wait for @leothomas to test using his M1 Mac, since mine is pre-M1.
Moved advanced section from main README.md to docs/installation.md, and reorganized the sections to make it clearer how to install the claymodel environment on macOS using conda-lock.
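As a rough sketch of that documented install flow on an Apple Silicon Mac (assuming the `conda-lock.yml` at the repository root; the authoritative steps live in `docs/installation.md`):

```bash
pip install conda-lock                               # or install conda-lock via pipx/micromamba
conda-lock install --name claymodel conda-lock.yml   # create the claymodel environment from the lockfile
conda activate claymodel                             # or: micromamba activate claymodel
python -m pytest --verbose src/tests/                # run the same test suite as CI
```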
Thanks @chuckwondo for reviewing. I'll merge this in first so that a Mac M1 user can get the install working on their device (#161 (comment)), and will let Leo test things later once the changes here get incorporated into #166 as mentioned at #164 (comment).
Used the wrong syntax
Support installation on macOS ARM64 devices (M1 chips) by:

- Adding `osx-arm64` as another platform in the `environment.yml` file (see the sketch at the end of this description)
- Regenerating the `conda-lock.yml` file with `conda-lock lock --mamba --file environment.yml --with-cuda=12.0`
- Adding `macos-14` to the `.github/workflows/test.yml` GitHub Actions workflow

References:
Addresses #161 and extends #162.
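A sketch of the two files touched, for orientation (contents are illustrative, not verbatim from the PR):

```yaml
# environment.yml -- conda-lock reads this platforms list when generating conda-lock.yml
platforms:
  - linux-64
  - osx-arm64

# .github/workflows/test.yml -- add the Apple Silicon runner image to the test matrix
# (runner image names here are assumptions; the repo's actual matrix may differ)
strategy:
  matrix:
    os: [ubuntu-22.04, macos-14]
```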