## Description
### Context
Check #2865 for a bit of historical context, which led to #3091.
In #2865, we proposed implementing PyHPS to interact with HPC clusters. While PyHPS is very powerful, it is not a scheduler, so it must be installed on top of a scheduler (such as SLURM) and depends on it.
In this PR, we are going to support SLURM HPC clusters only, and directly, without PyHPS.
### Research
Check #3397 for the research done on launching MAPDL and PyMAPDL on SLURM clusters.
### Introduction
For the moment, we are going to focus on launching single MAPDL instances, leaving aside MapdlPool, since it creates issues regarding resource splitting. Coming up with a good default resource-sharing scheme might be a bit tricky.
Also, we are going to focus on the most useful scenarios:
- [Case 1] Batch script submission (scenario A in #2865). A minimal sketch is shown after this list.
- [Case 2] Interactive MAPDL instance on the HPC cluster, with PyMAPDL on the entrypoint node (scenario B in #2865).
- [Case 3] Interactive MAPDL instance on the HPC cluster, with PyMAPDL on a machine outside the cluster (similar to scenario B in #2865). We might need to SSH into the entrypoint machine.
- [Case 4] Batch submission from a machine outside the cluster. This one is tricky because attaching files is complicated. The issue is solved when running interactively, because PyMAPDL can take care of uploading the files to the instance, so we will leave this case for the very end.
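
To make Case 1 concrete, here is a minimal sketch of a PyMAPDL script meant to run inside a SLURM allocation (for example, submitted with `sbatch --wrap "python job.py"`). The file name, resource values, and the comment about tight integration are illustrative, not a final design.

```python
# job.py -- minimal PyMAPDL script intended to run inside a SLURM job (Case 1).
# Submitted, for example, with: sbatch --ntasks=4 --wrap "python job.py"
# (the resource values are illustrative).
from ansys.mapdl.core import launch_mapdl

# Launch MAPDL locally on the allocated compute node. With the tight
# integration work (#3500), PyMAPDL can pass the SLURM environment
# variables on to MAPDL; here we simply rely on the defaults.
mapdl = launch_mapdl()

mapdl.prep7()
mapdl.et(1, "SOLID185")  # trivial command as a smoke test
print(mapdl)             # print instance info to the job log

mapdl.exit()
```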
## Roadmap
This will be implemented in the following PRs:
- Basic reorg of the main files. Just a reorg, to avoid triggering Kathy's review: docs: reorg hpc section #3436, docs: another hpc docs reorg #3465
- Add information about MAPDL running on clusters, environment variables, etc.
- [Case 1] Running PyMAPDL locally on a cluster smoothly: feat: passing tight integration env vars to mapdl #3500, docs: documenting using pymapdl on clusters #3466
- Allow PyMAPDL to start and connect to interactive sessions. To be broken into different PRs depending on where the client is running (see the sketch after this list):
  - [Case 2] Client is running on the headnode: feat: support for launching an MAPDL instance in an SLURM HPC cluster #3497
  - [Case 3] Client is running outside the cluster: feat: adding support to launch MAPDL on remote HPC clusters #3713, Remote launching MAPDL instances on an HPC cluster #3562
- Check whether we can implement a command-line interface to simplify submitting scripts or launching remote MAPDL instances: HPC command line interface #2961
- ... More to be defined
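
As a rough sketch of how Cases 2 and 3 might look from the user's side, the snippet below assumes the `launch_on_hpc` and `scheduler_options` arguments proposed in #3497; the resource values, hostname, and port are placeholders, and the final API may differ.

```python
# Sketch only: "launch_on_hpc" and "scheduler_options" follow the proposal
# in #3497 and may differ from the final API.
from ansys.mapdl.core import launch_mapdl

# [Case 2] Client on the headnode: ask PyMAPDL to submit an interactive
# MAPDL job to SLURM and connect to it once the job starts.
mapdl = launch_mapdl(
    launch_on_hpc=True,                            # proposed in #3497
    scheduler_options={"nodes": 1, "ntasks": 4},   # illustrative resources
)
print(mapdl)
mapdl.exit()

# [Case 3] Client outside the cluster: connect over gRPC to an MAPDL
# instance already running on a compute node. "node001" and 50052 are
# placeholders for the actual node and port.
mapdl = launch_mapdl(start_instance=False, ip="node001", port=50052)
print(mapdl)
```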
## Other features
- Detach the remote SSH session from the cluster. We should be able to launch MAPDL instances on remote machines without a SLURM scheduler.
- Check if `run_location` exists (see the sketch below).
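
A minimal sketch of the `run_location` check, assuming it is done client-side before launching; the helper name is hypothetical and not part of PyMAPDL's current API.

```python
# Sketch only: validate a user-supplied "run_location" before launching
# MAPDL so a missing directory fails early with a clear message.
# The helper name "ensure_run_location" is hypothetical.
import os


def ensure_run_location(run_location: str) -> str:
    """Return the absolute run_location, raising if the directory is missing."""
    path = os.path.abspath(run_location)
    if not os.path.isdir(path):
        raise FileNotFoundError(f"run_location does not exist: {path}")
    return path
```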