[rapids] removed spark tests, updated to a more recent rapids release (#1219)

* [gpu] clean-up of sources.list and keyring file assertion
* merge from master
* allow main to access dkms certs
* remove full upgrade
* tested sources.list cleanup function
* only unhold systemd on debian12, where the build breaks otherwise
* merged from custom-images/examples/secure-boot/install_gpu_driver.sh
* added comments for difficult-to-understand functions
* tested with 24.06 ; using conda for cuda 12
* tested with 24.06 ; using conda for cuda 12 ; inlined functions and re-ordered definitions ; using 22.08 max for cuda 11
* removed OS check functions and their uses
* capturing runtime of mamba install
* retry failed mamba with conda (see the mamba fallback sketch below)
* increase machine type ; reduce disk size ; test 11.8 (12.4 is default)
* spark does not yet have 24.08.0
* tested with 2.1 and 2.2
* always create environment ; run test scripts with python from envs/dask-rapids/bin
* skipping dask-with-yarn runtime tests for now
* added copyright block
* temporary changes to improve test performance
* increasing machine type, attempting 2024.06 again now that I have fixed the conda mismatch
* refactored code a bit
* how did this get in this change?
* we are seeing an error in this config file ; investigate
* temporary changes to improve test performance
* Adding disable shielded boot flag and disk type ssd flag to enhance cluster creation (#1209)
* Disabling secure boot for all the gpu-dependent init action scripts
* tested on debian11 w/ cuda11
* added skein tests for dask-yarn
* accidentally using the wrong bigtable.sh in this PR ; checking out master version
* using correct conda env for dask-yarn environment
* added skein test for dask
* that was the wrong filename
* perform the skein tests before skipping the dask ones
* whitespace changes
* removing the excessive logging
* taking master hostname from argv ; added array test
* defining two separate services to ease debugging (see the systemd unit sketch below)
* dask service tests are passing
* refactored yarn tests into their own py file ; updated rapids.sh to separate services into their own units
* tested with debian and rocky
* added skein test
* reduced operations slightly when setting master hostname
* python operators. amirite?
* status fails ; list-units | grep works (see the service check sketch below)
* explicitly including cudf
* corrected variable name
* working with cuda12 + yarn as dask runtime ; specifying a recent dask for rapids with cuda12 ; specifying yarn yaml environment using path to python ; applied fixes to gpu driver installer from gpu-20240813
* removed pinning for numba as per jakirkham
* easing the version constraints some
* fully changing the variable name
* removing test_skein.py
* removed extra lines from rebase
* reducing line count
* relaxed cuda version to 11.8
* disabling rocky9 tests for now
* skipping the whole test on rocky9 for now
* trying 24.08
* increase max cluster age for rocky9 ; using CUDA_VERSION=11.8 for non-spark rapids runtime (this should be changed)
* increase timeout for init actions as well as max-age from previous commit
* reverted attempt to change a read-only variable
* trying with 24.08
* removing spark from the rapids tests
* 2.2.20 is known to work
* using the newfangled key management path (see the apt keyring sketch below)
* explicitly specifying path to curl ; also installing curl
* perform update before install
* modified to run as a custom-images script
* remove delta from master for gpu/
* recently tested to have worked with n1-standard-4 and 54GB
* reduce log noise from Dockerfile
* removing delta from dask on master
* update verify_dask_instance test to use the systemd unit defined in the dask and rapids init actions
* removing quotes from systemctl command
* protecting from empty string state
* replacing removed dask-runtime=yarn instance test
* [dask-rapids] merge from custom-images rapids/BUILD
* removed dependence on verify_xgboost_spark.scala - this belongs in [spark-rapids]
* removed dependence on dask rapids/rapids.sh
* added utility functions
* reverted dask_spec="dask>=2024.5"
* using realpath to /opt/conda/miniconda3/bin/mamba instead of the default symlink
* remove conda environment [dask] if installed
* asserting existence of directory depended on by the script when run as a custom-images script
* created exit_handler and prepare_to_install functions to set up and clean up (see the trap sketch below)
* rapids/test_rapids.py: refactored to make use of the systemd unit defined in rapids.sh
* added retry to ssh (see the ssh retry sketch below)
* removed condition that kept tests from running on 2.0 images
* revert to master
* refactored to match dask ; removed all spark code paths (see spark-rapids)
* added some testing helpers and documentation
* dask-yarn tests do not work ; disabling until a new release of dask-yarn is produced
* increase max idle time ; print the command to be run
* cleaned up comment positioning and content
* using ram disk for temp files if we have it (see the tmpfs sketch below)
* double quotes will allow the temp directory variable to be expanded correctly
* using else instead of is_rocky
* corrected release version names
* revert to mainline
* simplify and modernize this comment
* default to using internal IP ; have not yet renamed rapids to dask-rapids ; tunnel through IAP
* prepare layout for rename of rapids to dask-rapids
* reduce noise from docker run
* reduce noise in docker build
* removing older GPU from list
* removing delta from master
* Thread.yield()
* improved documentation
* default to non-private IP ; maybe that is why this last run failed
* revert dataproc_test_case.py to last known good
* using correct df command ; using greater-or-equal-to rapids version ; dask>=2024.7 ; correctly capturing retval of installer program

---------

Co-authored-by: Prince Datta <[email protected]>
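A minimal sketch of the mamba-install-with-conda-retry pattern from the list above, which also times the solver run and resolves the real mamba binary with realpath. The environment name, channels, and version specs are assumptions drawn from the commit messages, not the script's exact values.

```bash
readonly CONDA_BIN=/opt/conda/miniconda3/bin
# Resolve the real mamba binary rather than relying on the default symlink.
readonly MAMBA="$(realpath "${CONDA_BIN}/mamba")"

function install_dask_rapids() {
  local solver="$1"  # path to a mamba or conda binary
  # `time` captures how long each solver run takes.
  time "${solver}" create -y -n dask-rapids \
    -c rapidsai -c conda-forge -c nvidia \
    'rapids=24.08' 'cuda-version=12' dask
}

# Try the faster mamba solver first; if it fails, retry with conda.
install_dask_rapids "${MAMBA}" || install_dask_rapids "${CONDA_BIN}/conda"
```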
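The "two separate services" items refer to running the Dask processes under systemd units written by rapids.sh. A hedged sketch of one such unit follows; the actual unit names, descriptions, and ExecStart paths in rapids.sh may differ.

```bash
# Hypothetical unit file; rapids.sh defines the real ones.
cat <<EOF >/etc/systemd/system/dask-scheduler.service
[Unit]
Description=Dask scheduler for the dask-rapids environment

[Service]
Type=simple
ExecStart=/opt/conda/miniconda3/envs/dask-rapids/bin/dask-scheduler
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now dask-scheduler.service
```

Splitting the scheduler and worker into separate units lets each process's journal be inspected independently, which is the debugging win the commit mentions.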
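"status fails ; list-units | grep works" most likely refers to `systemctl status` exiting non-zero (and paging its output) for units that are not active, which aborts a `set -e` test script; grepping `systemctl list-units` avoids that. The unit name below is illustrative.

```bash
# Read the SUB column (e.g. "running") for the unit, if it is loaded.
service_state="$(systemctl list-units \
  | grep 'dask-scheduler' | awk '{print $4}')"
# Protect against an empty string before testing the state.
if [[ -n "${service_state}" && "${service_state}" == "running" ]]; then
  echo "dask-scheduler is up"
fi
```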
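The "newfangled key management path" bullets point at the modern apt keyring layout that replaces apt-key, together with installing curl and invoking it by explicit path. The repo URL, keyring filename, and package below are placeholders, not the script's actual values.

```bash
# Placeholders throughout; the real repo details live in the script.
apt-get update && apt-get install -y curl
/usr/bin/curl -fsSL 'https://example.com/repo/gpg.key' \
  | gpg --dearmor -o /usr/share/keyrings/example-archive-keyring.gpg
echo 'deb [signed-by=/usr/share/keyrings/example-archive-keyring.gpg] https://example.com/repo stable main' \
  >/etc/apt/sources.list.d/example.list
# Update again so apt sees the new source before installing from it.
apt-get update && apt-get install -y example-package
```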
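A sketch of the exit_handler/prepare_to_install pairing; the commit only says these set up and clean up, so the real handlers in rapids.sh certainly do more than shown here.

```bash
function exit_handler() {
  # Runs on every exit, success or failure, via the trap below.
  rm -rf "${workdir}"
}

function prepare_to_install() {
  workdir="$(mktemp -d)"
  trap exit_handler EXIT
}

prepare_to_install
# ... installation steps go here ...
```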
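"added retry to ssh" and "tunnel through IAP" suggest a pattern like the following in the test harness; the flags are standard gcloud options, but the host and command variables are hypothetical.

```bash
for attempt in 1 2 3; do
  if gcloud compute ssh "${master_host}" --tunnel-through-iap \
       --command="${test_cmd}"; then
    break  # succeeded; stop retrying
  fi
  sleep $(( attempt * 10 ))  # back off before the next attempt
done
```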
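The ram-disk and double-quoting bullets combine into something like the sketch below; the tmpfs mount point is an assumption.

```bash
# Prefer a ramdisk for temporary files when one is mounted.
install_tmpdir=/tmp
if grep -q ' /mnt/shm ' /proc/mounts; then
  install_tmpdir=/mnt/shm
fi
# Double quotes let ${install_tmpdir} expand while protecting spaces.
readonly workdir="${install_tmpdir}/dask-rapids-install"
mkdir -p "${workdir}"
```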