Add additive manufacturing example - from shape deviation prediction and compensation #661



Open
wants to merge 71 commits into
base: main

Conversation

dearleiii
Contributor

@dearleiii dearleiii commented Aug 28, 2024

Modulus Pull Request

Description

Checklist

Dependencies

@dearleiii dearleiii changed the title add additive manufacturing example - from Data-driven shape deviation prediction and compensation Add additive manufacturing example - from shape deviation prediction and compensation Aug 28, 2024
dearleiii and others added 19 commits August 28, 2024 16:06
Signed-off-by: Chen <[email protected]>
Signed-off-by: Chen <[email protected]>
…nto compensation

pull update from remote modulus before merging
@mnabian
Collaborator

mnabian commented Jan 23, 2025

/blossom-ci

@mnabian mnabian self-assigned this Jan 23, 2025
@mnabian mnabian self-requested a review January 23, 2025 02:29
@mnabian mnabian assigned dearleiii and unassigned mnabian Jan 23, 2025
ktangsali and others added 30 commits March 18, 2025 12:46
* post merge name changes

* some more updates

* updates
* initial regen release

* add readme

* cleanup figures, use existing crps routine

* update changelog
* fixed grid effect

* added entrypoint fix

* whitespace

* V2 name change

* fixed registry

* fixed registry

* CI

* removed back check

* fixed broken docstring

* blaa fix

---------

Co-authored-by: Oliver Hennigh <[email protected]>
* This commit addresses version compatibility issues with PyTorch.

Many new features of physicsnemo's distributed utilities, targeting domain parallelism,
require PyTorch's DTensor package, which was introduced in PyTorch 2.6.0. But we don't
want to limit physicsnemo usage unnecessarily.

This commit introduces version checking utilities, which are then applied to ShardTensor.
If torch is below 2.6.0, the distributed utilities will not import ShardTensor but
will still work. If a user attempts to import ShardTensor directly, avoiding the
__init__.py file, the version checking utilities will raise an exception.

Tests on ShardTensor are likewise skipped if torch 2.6.0 is not installed.

Finally, an additional test file is included to validate the version checking tools.

* This commit further protects against older versions of PyTorch
- change the ShardTensor minimum version to 2.5.9 to accommodate the 2.6.0a alpha release
- set the minimum PyTorch version for DeviceMesh to 2.4.0
- introduce a function decorator that raises an exception when unavailable functions are called
- add a little more protection in the tests to differentiate

---------
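The version gate described in the commits above can be sketched as a small decorator factory. This is an illustrative sketch only, not physicsnemo's actual utility: the names here (`require_min_version`, `parse`) are hypothetical, and the real implementation compares `torch.__version__` against the 2.5.9/2.6.0 thresholds mentioned in the commit.

```python
from functools import wraps


def require_min_version(available: str, required: str):
    """Decorator factory: calling the wrapped function raises ImportError
    when `available` < `required`. Hypothetical sketch of a version gate."""

    def parse(v: str):
        # Keep only the leading numeric release segment, e.g. "2.6.0a0" -> (2, 6, 0)
        parts = []
        for p in v.split("+")[0].split("."):
            digits = "".join(ch for ch in p if ch.isdigit())
            if not digits:
                break
            parts.append(int(digits))
        return tuple(parts)

    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if parse(available) < parse(required):
                raise ImportError(
                    f"{fn.__name__} requires version >= {required}, "
                    f"found {available}"
                )
            return fn(*args, **kwargs)

        return wrapper

    return decorator
```

Gating at call time (rather than import time) is what lets the rest of the distributed utilities keep working on older torch while only ShardTensor-dependent paths raise.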
Replace `modulus` links with updated `physicsnemo` links.

Co-authored-by: Nicholas Geneva <[email protected]>
* Update README.md

---------

Co-authored-by: Nicholas Geneva <[email protected]>
* Update README.md
* Update dockerfile

* Update dockerfile

* Order swap

* update

* Swap again

* add FORCE_CUDA flags to torch-scatter and torch-cluster source installs, install makani and fignet dependencies explicitly

---------

Co-authored-by: Kaustubh Tangsali <[email protected]>
* Working changes to be cleaned up.

* Rename msc_config.yaml

* Fixed pytorch test issue by removing MSC Cache

* Updated project dependencies

* Find MSC config using absolute path.

* Re-added cuda test parameter.

* Add test to read from public S3 bucket using MSC.

* Revert save_checkpoint_freq value.

* Remove temporary printing

* Remove unnecessary dependency

* Switched to use consistent mechanism for detecting msc URIs

* Moved fsspec.filesystem logic into filesystem.py

* Use cache for non-file protocols when reading non-modulus models.

* Moved code to generate checkpoint directory.

* Added get_checkpoint_dir import

* Address review feedback.

* Changes from code review.

* Addressed file test issue from review.

* Fix to file existence check.

* Fix merge conflicts due to project name change.

* Updated CHANGELOG.

* Added Multi-Storage Client to allow checkpointing to/from Object Storage

Signed-off-by: Chris Hawes <[email protected]>

* Addressed issues identified by pre-commit.

* Update filesystem.py

* Update __init__.py

* Update Dockerfile

---------

Signed-off-by: Chris Hawes <[email protected]>
Co-authored-by: Nicholas Geneva <[email protected]>
* Fixes DeprecationWarning introduced in setuptools>=77

* setuptools does not allow redundant license specification in project.license and project.classifiers
…ining (NVIDIA#790)

* Add recent checkpoints option, adjust configs

* Doc for deterministic_sampler

* Typo fix

* Bugfix and cleanup of corrdiff regression loss and UNet

* Minor fix in docstrings

* Bugfix + doc for corrdiff regression CE loss

* Refactor corrdiff configs for custom dataset

* Bugfix in configs

* Added info in corrdiff docs for custom training

* Minor change in corrdiff config

* bring back base config file removed by mistake

* Added config for generation on custom dataset

* Forgot some config files

* Fixed overlap pixel in custom config based on discussion in PR NVIDIA#703

* Corrdiff fixes to enable non-square images and/or non-square patches. Needs testing.

* Fix small bug in config

* Removed argument redundancy in patching utilities + fixed height-width order

* Cleanup

* Added tests for rectangle images and patches

* Added wandb logging for corrdiff training

* Implements patching API. Refactors corrdiff train and generate to use it

* Corrdiff function to register new custom dataset

* Reorganize configs again

* Correction in configs: training duration is NOT in kilo images

* Readme re-write

* Updated CHANGELOG

* Fixed formatting

* Test fixes

* Typo fix

* Fixes on patching API

* Fixed patching bug and tests

* Simplifications in corrdiff diffusion step

* Forgot to propagate change to test for corrdiff diffusion step

* Renamed patching API to explicit 2D

* Fixed shape in test

* Replace loops with fold/unfold patching for perf

* Added method to dynamically change number of patches in RandomPatching

* Adds safety checks for patch shapes in patching function. Fixes tests

* Fixes docs

* Forgot a fix in docs

* New embedding selection strategy in CorrDiff UNet models

* Updated CHANGELOG.md

* Fixed tests for SongUNet position embeddings

* More robust tests for patching

* Fixed docs bug

* More bugfixes in doc tests

* Some renaming

Signed-off-by: Charlelie Laurent <[email protected]>

* Bugfixes, cleanup, docstrings

Signed-off-by: Charlelie Laurent <[email protected]>

* Docstring improvement for UNet and EDMPrecondSR

Signed-off-by: Charlelie Laurent <[email protected]>

* Docs for InfiniteSampler

Signed-off-by: Charlelie Laurent <[email protected]>

* Corrected Readme info about training/generate from checkpoints

Signed-off-by: Charlelie Laurent <[email protected]>

* Bugfixes in generate scripts, cleanup debugging flags

Signed-off-by: Charlelie Laurent <[email protected]>

* Removed blank line from changelog

Signed-off-by: Charlelie Laurent <[email protected]>

* Fixes in CI tests

Signed-off-by: Charlelie Laurent <[email protected]>

* Forgot to commit one of the CI fixes

Signed-off-by: Charlelie Laurent <[email protected]>

* Fix example in doc

Signed-off-by: Charlelie Laurent <[email protected]>

---------

Signed-off-by: Charlelie Laurent <[email protected]>
Co-authored-by: Peter Harrington <[email protected]>
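The loop-free patching mentioned in the commits above ("Replace loops with fold/unfold patching for perf") rests on reshaping an image into a patch grid in one vectorized operation instead of slicing in a Python loop. Below is a minimal NumPy sketch of that idea for non-overlapping patches; CorrDiff's actual patching API uses PyTorch's fold/unfold and supports overlapping patches, so the names and signatures here are hypothetical.

```python
import numpy as np


def extract_patches(img: np.ndarray, ph: int, pw: int) -> np.ndarray:
    """Split an (H, W) image into non-overlapping (ph, pw) patches without
    Python loops. Assumes H % ph == 0 and W % pw == 0."""
    H, W = img.shape
    assert H % ph == 0 and W % pw == 0, "patch shape must divide image shape"
    # reshape + swapaxes performs the "unfold": grid of shape (H//ph, W//pw, ph, pw)
    grid = img.reshape(H // ph, ph, W // pw, pw).swapaxes(1, 2)
    return grid.reshape(-1, ph, pw)


def assemble_patches(patches: np.ndarray, H: int, W: int) -> np.ndarray:
    """Inverse "fold": reassemble the patches back into the (H, W) image."""
    n, ph, pw = patches.shape
    grid = patches.reshape(H // ph, W // pw, ph, pw).swapaxes(1, 2)
    return grid.reshape(H, W)
```

The perf win comes from replacing O(number-of-patches) Python-level slicing with a single strided reshape that the backend executes natively.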
* Add common dataloader interface

* Training script runs with refactored dataloader

* More trainer refactoring

* Refactor inference

* Add support for gradient accumulation

* Add support for AMP 16-bit training

* Align training parameters with StormCast paper

* Add comments to inference.py

* Add lite configs

* Add lite configs

* Small bug fixes

* Add support for compiling model

* Validation fixes

* Refactor checkpoint loading at startup

* Support wandb offline mode

* Fix regression_model_forward

* Update CHANGELOG.md
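The gradient-accumulation support added in this commit series follows the standard pattern: each micro-batch gradient is scaled by 1/accum_steps, and the optimizer steps only once every accum_steps micro-batches, emulating a larger effective batch size. A framework-free sketch of the arithmetic (hypothetical names, not the StormCast trainer's actual API):

```python
def train_with_accumulation(grads, accum_steps):
    """Average micro-batch gradients over `accum_steps` before each
    optimizer step. `grads` stands in for per-micro-batch gradients;
    returns the effective gradient applied at each optimizer step."""
    steps = []
    running = 0.0
    for i, g in enumerate(grads, start=1):
        running += g / accum_steps   # scale each micro-batch contribution
        if i % accum_steps == 0:     # step the optimizer every accum_steps
            steps.append(running)
            running = 0.0
    return steps
```

In a real PyTorch loop the same structure appears as `(loss / accum_steps).backward()` on every micro-batch, with `optimizer.step()` and `optimizer.zero_grad()` guarded by the same modulo check.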
* Update era5 download example

* Update changelog
* add code to measure time spent in pytest

* speed up datapipe tests

* fix cleanup of dist vars (was causing slowdown in test_capture.py)

* speed up model tests

* bring back some parameterizations, reduced cpu tests