Multi-thread fix and request merging #205

Merged
merged 45 commits into from
Sep 5, 2024

houjun
Member

@houjun houjun commented Jul 1, 2024

Related Issues / Pull Requests

#190

Description

  • Add a client-side optimization that merges region transfer requests. Merging is currently limited to 1D regions and applies only when there are more than 50 requests (PDC_MERGE_TRANSFER_MIN_COUNT).
  • Add new PDCregion_transfer_start(_all)_mpi APIs that internally call MPI_Barrier to better coordinate metadata and data operations, so that the original PDCregion_transfer_start(_all) APIs no longer force collective operations (Performance improvement for vpicio #211).
  • Fix compile issues when multi-threading is enabled.
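
The merging idea in the first bullet can be sketched roughly as follows. This is an illustrative sketch only, not the code from this pull request: the region_req_1d struct and the merge_contiguous_1d function are hypothetical names, and the requests are assumed to already be sorted by offset. Only the PDC_MERGE_TRANSFER_MIN_COUNT name and its value of 50 come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Threshold named after the macro in the PR description (value assumed to be 50). */
#define PDC_MERGE_TRANSFER_MIN_COUNT 50

/* Hypothetical 1D request covering the byte range [offset, offset + size). */
typedef struct {
    uint64_t offset;
    uint64_t size;
} region_req_1d;

/*
 * Merge adjacent requests in place when each one starts exactly where the
 * previous one ends. Returns the new number of requests. Batches with no
 * more than PDC_MERGE_TRANSFER_MIN_COUNT requests are left untouched.
 */
size_t
merge_contiguous_1d(region_req_1d *reqs, size_t n)
{
    assert(reqs != NULL || n == 0);
    if (n <= PDC_MERGE_TRANSFER_MIN_COUNT)
        return n;

    size_t out = 0; /* index of the request currently being extended */
    for (size_t i = 1; i < n; i++) {
        if (reqs[out].offset + reqs[out].size == reqs[i].offset)
            reqs[out].size += reqs[i].size; /* contiguous: extend the run */
        else
            reqs[++out] = reqs[i]; /* gap: start a new run */
    }
    return out + 1;
}
```

With 60 back-to-back 8-byte requests this sketch collapses the batch into one 480-byte request, while a batch of 50 or fewer requests, or a batch whose requests are separated by gaps, keeps its original count.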

What changes are proposed in this pull request?

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality not to work as expected; for instance, examples in this repository must be updated too)
  • This change requires a documentation update

Checklist:

  • My code modifies existing public API, or introduces new public API, and I updated or wrote docstrings
  • I have commented my code
  • My code requires documentation updates, and I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

houjun and others added 25 commits February 23, 2024 14:51
* Removing gres option for ctest
* Removing gres option from scripts
* Update check for core

---------

Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Jean Luca Bez <[email protected]>
…mands in run scripts to detect Perlmutter compute nodes
* Update getting_started.rst (#184)
* Removing gres option for ctest (#182)
* Removing gres option for ctest
* Removing gres option from scripts
* Update check for core

---------

Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Jean Luca Bez <[email protected]>

* enable cache by default (#187)
* Removing PDC macro (#189)
* Removing gres option for ctest
* Removing gres option from scripts
* Update check for core
* Remove PDC macro
* Committing clang-format changes

---------

Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Jean Luca Bez <[email protected]>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* BDCATS fix (#193)
* Fix issues with bdcats_batch
* Committing clang-format changes

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* Update mpi_test.sh (#197)
* Update .gitlab-ci.yml (#195)
* Updates for latest integration with Jacamar and Gitlab tokens in CI
* VPICIO bugfix (#196)
* Fix VPICIO bug
* Add more checks and error out when no server is selected
* Committing clang-format changes
* Add VPICIO and BDCATS to MPI test

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <[email protected]>

* Fix vpicio_mts (#199)

---------

Co-authored-by: Houjun Tang <[email protected]>
Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
@houjun houjun marked this pull request as ready for review July 1, 2024 21:37
@houjun houjun marked this pull request as draft July 1, 2024 21:38
@houjun houjun marked this pull request as ready for review August 30, 2024 22:42
@jeanbez jeanbez merged commit 275919a into develop Sep 5, 2024
8 checks passed
@houjun houjun deleted the wait_all_fix branch September 10, 2024 18:52
jeanbez added a commit that referenced this pull request Dec 3, 2024
* Performance improvement for vpicio (#211)

* Partial fix for the region transfer/wait performance issue

* Committing clang-format changes

* Improve the async processing for vpicio_mts_all, also fix a few compile issues

* Committing clang-format changes

* Minor change

* Continue to optimize start_all performance for vpicio, add a few time-related convenience functions

* Committing clang-format changes

* Fix hanging issue in CI testing

* Committing clang-format changes

* Disable debug prints

* Revert back for non-all ops

* Better pthread management

* Better pthread management

* Fix timeout issue with CI testing and clang-formatting

* Committing clang-format changes

* Test

* Trigger test

* Committing clang-format changes

* Trigger CI

* Committing clang-format changes

* Trigger CI

* Switch to static partition for vpicio

* Replace vpicio_mts with new implementation

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* Multi-thread fix and request merging (#205)

* Update getting_started.rst (#184)
* Removing gres option for ctest (#182)
* Removing gres option from scripts
* Update check for core

---------

Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Jean Luca Bez <[email protected]>

* Fix an issue with region transfer request
* Committing clang-format changes
* Merge small requests when they are contiguous and 1D, change srun commands in run scripts to detect Perlmutter compute nodes
* Merge only for REGION_LOCAL partition
* Committing clang-format changes
* Fix a bug that causes some tests to fail
* Fix a couple of issues with start/wait all
* Committing clang-format changes
* Add aggregation support for contiguous read operations
* Committing clang-format changes
* Fix compile issue when multithread is enabled
* Committing clang-format changes
* minor change with test code
* Committing clang-format changes
* Remove metadata mutex for multi threading
* Committing clang-format changes
* Fix mutex
* Committing clang-format changes
* Fix an issue when closing an obj
* Sync develop to stable (v.0.5) (#201)
* Update getting_started.rst (#184)
* Removing gres option for ctest (#182)
* Removing gres option for ctest
* Removing gres option from scripts
* Update check for core

---------

Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Jean Luca Bez <[email protected]>

* enable cache by default (#187)
* Removing PDC macro (#189)
* Removing gres option for ctest
* Removing gres option from scripts
* Update check for core
* Remove PDC macro
* Committing clang-format changes

---------

Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Jean Luca Bez <[email protected]>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* BDCATS fix (#193)
* Fix issues with bdcats_batch
* Committing clang-format changes

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* Update mpi_test.sh (#197)
* Update .gitlab-ci.yml (#195)
* Updates for latest integration with Jacamar and Gitlab tokens in CI
* VPICIO bugfix (#196)
* Fix VPICIO bug
* Add more checks and error out when no server is selected
* Committing clang-format changes
* Add VPICIO and BDCATS to MPI test

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <[email protected]>

* Fix vpicio_mts (#199)

---------

Co-authored-by: Houjun Tang <[email protected]>
Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* Committing clang-format changes
* Fix rebase issue
* Add timers
* Committing clang-format changes
* Add explicit transfer start (all) with MPI communicator
* Committing clang-format changes
* MPI fix
* remove debug msg
* Committing clang-format changes
* Add function comment for doc
* Revert script changes
* Committing clang-format changes
* Revert script changes
* Committing clang-format changes
* Revert script setting

---------

Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Jean Luca Bez <[email protected]>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* Update CMakeLists.txt to bump version number (#202)

* Update CMakeLists.txt to bump version number

* Update clang-format-check.yml

* IDIOMS Update & BULKI v0.1 (#203)

* fix cmake mercury_util not found issue

* update for Julia support

* fix hdf5.h not found for src/tools

* update container config

* add libhdf5-dev for Github Actions

* update CMake for HDF5 in tools

* update logic for finding HDF5

* update

* remove use system hdf5

* delete useless find library

* update findHDF5

* Feature/dart (#11)

Update to avoid fixing compilation issue on src/tools (due to : HDF5 cannot be found)

* Use cc on Perlmutter (#161)

Dr. Tang fixed a compilation issue in NERSC CI where HDF5 could not be detected even when the cray-parallel-hdf5 module was loaded on Perlmutter.

* update with fixes on tools and llsm example

* add gitignore for llsm

* update gitignore

* Feature/dart (#12)

* fix formatting

* update clangformat10

* update base dockerfile

* Add clang-format10 to docker container. Also fixed clang-format.

Add clang-format10 to docker container. Also fixed clang-format.

* Fix pdc ls (#154)

* pdc import, export, ls compiled successfully

* removed requested files

* formatting issues

* changed install tools

* gets checkpoint files

* grabbing checkpoint files from within sub-directories, minor comments

* Committing clang-format changes

* Committing clang-format changes

* Fix a few issues with pdc_ls

* Committing clang-format changes

---------

Co-authored-by: nickaruwang <[email protected]>
Co-authored-by: Nick Wang <[email protected]>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <[email protected]>

* update documentation

* update document 

update document

* sync branch 

sync branch

* no UUID module is required

* update document and make UUID an optional package

* update docker repo name

* updating docker repo name and make UUID optional

* Complete support for Docker and Github Codespace  (#157)

Include support for Docker and Github Codespace so we can run our dev environment with the support of Docker.

* SQLite and RocksDB support for KVtags (#165)

SQLite and RocksDB support for KVtags

* fix round for tag delete

* update test

* bulki update

* BULKI base type worked

* BULKI all tests done

* new index code

* update

* update new test

* update csv bench

* update

* update script

* adding python scripts for generating large metadata set for LLSM application

* update json schema

* better json validator

* update importer

* update code for non-MPI compatibility

* update llsm converter

* update LLSM data converter

* split files

* update .gitignore

* update

* add timing info

* update

* update tag size

* detect object creation failure

* update

* update object name with date

* update for robustness

* update

* update JMD_DEBUG option

* update output for overall output

* update inttypes.h

* update

* update extractor

* update inttypes.h

* update converter

* update importer information

* Update getting_started.rst (#184)

* Removing gres option for ctest (#182)

* Removing gres option for ctest
* Removing gres option from scripts
* Update check for core

---------

Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Jean Luca Bez <[email protected]>

* fix issue

* fixed search issues

* update for infix

* update

* index persistence still needs improvement

* update

* enable cache by default (#187)

* Removing PDC macro (#189)

* Removing gres option for ctest
* Removing gres option from scripts
* Update check for core
* Remove PDC macro
* Committing clang-format changes

---------

Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Jean Luca Bez <[email protected]>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* update

* range query done

* range query local test passed

* multi-condition in progress

* clean up code

* add comments

* new benchmark

* update

* update range query test

* update cmake:

* update

* update

* update

* update

* someta range query

* someta range query

* someta range query

* fix value serialization

* update

* update double free

* update

* update

* update

* fixed pointer issue

* rb_tree delete fixed, now need to check index persistence

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* BDCATS fix (#193)

* Fix issues with bdcats_batch

* Committing clang-format changes

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* update

* clean up code

* update test sh

* IDIOMS persistence DONE

* update

* remove old kvtag benchmarks

* update

* update

* update changes

* dart info

* update

* multi data type for the same key, supported now

* Monitoring changes from feature/dart to develop (#18)

Major changes: 
* IDIOMS -> affix-based query benchmark
* IDIOMS -> Simulation Test
* IDIOMS -> Multi data type supported for the same key
* IDIOMS -> Range Query and Exact Query for Numeric Values
* IDIOMS -> benchmark for numeric values (exact search and range query)
* IDIOMS -> Index Persistence
* BULKI -> A data serialization and deserialization mechanism.

* fix CMakeLists.txt

* update

* update format

* update BULKI interface order

* BULKI API sorted

* add idioms ci test

* Feature/dart (#20)

1. add documentation about BULKI and IDIOMS query conditions
2. add ci test for IDIOMS
3. optimized BULKI to save space on its metadata fields.

* Feature/dart (#22)

update version

* update

* update

* update

* remove unnecessary .bin file

* update

* update

---------

Co-authored-by: Houjun Tang <[email protected]>
Co-authored-by: nickaruwang <[email protected]>
Co-authored-by: Nick Wang <[email protected]>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <[email protected]>
Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Hyunju Oh <[email protected]>

* Fix region transfer with object static partitioning (#214)

* Update pdc_region_transfer.c

* Committing clang-format changes

* Update .gitlab-ci.yml

Fix issue with Perlmutter CI libfabric module

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* EQSIM benchmark code and fixes (#213)

* Update getting_started.rst (#184)

* Removing gres option for ctest (#182)

* Removing gres option for ctest
* Removing gres option from scripts
* Update check for core

---------

Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Jean Luca Bez <[email protected]>

* enable cache by default (#187)

* Benchmark code for EQSIM data

* Committing clang-format changes

* Minor adjustments

* Committing clang-format changes

* Updates

* Committing clang-format changes

* Change vpicio to use local server partitioning, add some debug prints

* Committing clang-format changes

* Add metadata query to benchmark code

* Committing clang-format changes

* Add ZFP compression for read and write

* Committing clang-format changes

* Add an option to use more ranks to read data so that the total data of each rank is less than the 4GB chunk limit

* Committing clang-format changes

* Add a data query code for EQSIM data

* Committing clang-format changes

* Minor adjustments for the HDF5 read code

* Committing clang-format changes

* Fix an issue with periodic data flush, minor changes to benchmark code

* Committing clang-format changes

* fix an issue with 3d read segfault

* Committing clang-format changes

* Fix compile issue

* Update .gitlab-ci.yml

* Update sleep time

* Replace function

* Replace function

* Minor updates and doc changes

* Committing clang-format changes

* Update

---------

Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Jean Luca Bez <[email protected]>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

---------

Co-authored-by: Houjun Tang <[email protected]>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Zhang Wei <[email protected]>
Co-authored-by: nickaruwang <[email protected]>
Co-authored-by: Nick Wang <[email protected]>
jeanbez added a commit that referenced this pull request Dec 3, 2024
* Update getting_started.rst (#184)
* Removing gres option for ctest (#182)
* Removing gres option from scripts
* Update check for core

---------

Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Jean Luca Bez <[email protected]>

* Fix an issue with region transfer request
* Committing clang-format changes
* Merge small requests when they are contiguous and 1D, change srun commands in run scripts to detect Perlmutter compute nodes
* Merge only for REGION_LOCAL partition
* Committing clang-format changes
* Fix a bug that causes some tests to fail
* Fix a couple of issues with start/wait all
* Committing clang-format changes
* Add aggregation support for contiguous read operations
* Committing clang-format changes
* Fix compile issue when multithread is enabled
* Committing clang-format changes
* minor change with test code
* Committing clang-format changes
* Remove metadata mutex for multi threading
* Committing clang-format changes
* Fix mutex
* Committing clang-format changes
* Fix an issue when closing an obj
* Sync develop to stable (v.0.5) (#201)
* Update getting_started.rst (#184)
* Removing gres option for ctest (#182)
* Removing gres option for ctest
* Removing gres option from scripts
* Update check for core

---------

Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Jean Luca Bez <[email protected]>

* enable cache by default (#187)
* Removing PDC macro (#189)
* Removing gres option for ctest
* Removing gres option from scripts
* Update check for core
* Remove PDC macro
* Committing clang-format changes

---------

Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Jean Luca Bez <[email protected]>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* BDCATS fix (#193)
* Fix issues with bdcats_batch
* Committing clang-format changes

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* Update mpi_test.sh (#197)
* Update .gitlab-ci.yml (#195)
* Updates for latest integration with Jacamar and Gitlab tokens in CI
* VPICIO bugfix (#196)
* Fix VPICIO bug
* Add more checks and error out when no server is selected
* Committing clang-format changes
* Add VPICIO and BDCATS to MPI test

---------

Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jean Luca Bez <[email protected]>

* Fix vpicio_mts (#199)

---------

Co-authored-by: Houjun Tang <[email protected]>
Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>

* Committing clang-format changes
* Fix rebase issue
* Add timers
* Committing clang-format changes
* Add explicit transfer start (all) with MPI communicator
* Committing clang-format changes
* MPI fix
* remove debug msg
* Committing clang-format changes
* Add function comment for doc
* Revert script changes
* Committing clang-format changes
* Revert script changes
* Committing clang-format changes
* Revert script setting

---------

Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Hyunju Oh <[email protected]>
Co-authored-by: Jean Luca Bez <[email protected]>
Co-authored-by: github-actions <github-actions[bot]@users.noreply.github.com>
4 participants