Python >= 3.10
Clone the repository
git clone https://github.com/TomMelt/mdb.git
cd mdb
Create a virtual Python virtual environment
python3 -m venv $HOME/mdb-venv
Activate the virtual environement
source $HOME/mdb-venv/bin/activate
Install mdb and the optional dependency with pip3
pip3 install .
pip3 install termgraph
As part of the CONQUEST project. I (Connor Aird) tried using mdb on the UCL cluster, Myriad. I installed mdb following the above guide within an interactive session with the following modules loaded.
Currently Loaded Modulefiles:
1) beta-modules 3) compilers/intel/2022.2 5) libxc/6.2.2/intel-2022 7) openssl/1.1.1u 9) gerun 11) emacs/28.1
2) gcc-libs/10.2.0 4) mpi/intel/2021.6.0/intel 6) cmake/3.21.1 8) python/3.11.4 10) git/2.41.0-lfs-3.3.0 12) userscripts/1.4.0
I then launched an mdb sessions in the background using
mdb launch -b gdb -n 4 -t ./Conquest
Ctrl+z
bg
Then, attached to the session, which appeared to be successful.
(mdb-venv) [ccaecai@node-d00a-124 np-4-debug]$ mdb attach
mdb - mpi debugger - built on various backends. Type ? for more info. To exit interactive mode type "q", "quit", "Ctrl+D" or "Ctrl+]".
(mdb 0-3)
Adding a breakpoint seems to be successful,
(mdb 0-3) command break exx_kernel_default.f90:204
0: Breakpoint 2 at 0x7ee299: file exx_kernel_default.f90, line 204.
************************************************************************
1: Breakpoint 2 at 0x7ee299: file exx_kernel_default.f90, line 204.
************************************************************************
2: Breakpoint 2 at 0x7ee299: file exx_kernel_default.f90, line 204.
************************************************************************
3: Breakpoint 2 at 0x7ee299: file exx_kernel_default.f90, line 204.
However, trying to run until hitting this breakpoint results in an error
(mdb 0-3) command c
0: Continuing.
0: [cli_0]: write_line error; fd=9 buf=:cmd=init pmi_version=1 pmi_subversion=1
0: :
0: system msg for write_line failure : Bad file descriptor
0: [cli_0]: Unable to write to PMI_fd
0: [cli_0]: write_line error; fd=9 buf=:cmd=get_appnum
0: :
0: system msg for write_line failure : Bad file descriptor
0: Abort(1090831) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
0: MPIR_Init_thread(178):
0: MPID_Init(1427)......:
0: MPIR_pmi_init(129)...: PMI_Get_appnum returned -1
0: [cli_0]: write_line error; fd=9 buf=:cmd=abort exitcode=1090831
0: :
0: system msg for write_line failure : Bad file descriptor
0:
0: Program received signal SIGSEGV, Segmentation fault.
0: MPIR_Err_return_comm (comm_ptr=0x7fffffff1100, fcname=0x0, errcode=1090831) at ../../src/mpi/errhan/errutil.c:309
0: 309 ../../src/mpi/errhan/errutil.c: No such file or directory.
0: Missing separate debuginfos, use: debuginfo-install numactl-libs-2.0.12-5.el7.x86_64
************************************************************************
1: Continuing.
1: [cli_1]: write_line error; fd=10 buf=:cmd=init pmi_version=1 pmi_subversion=1
1: :
1: system msg for write_line failure : Bad file descriptor
1: [cli_1]: Unable to write to PMI_fd
1: [cli_1]: write_line error; fd=10 buf=:cmd=get_appnum
1: :
1: system msg for write_line failure : Bad file descriptor
1: Abort(1090831) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
1: MPIR_Init_thread(178):
1: MPID_Init(1427)......:
1: MPIR_pmi_init(129)...: PMI_Get_appnum returned -1
1: [cli_1]: write_line error; fd=10 buf=:cmd=abort exitcode=1090831
1: :
1: system msg for write_line failure : Bad file descriptor
1:
1: Program received signal SIGSEGV, Segmentation fault.
1: MPIR_Err_return_comm (comm_ptr=0x7fffffff1100, fcname=0x0, errcode=1090831) at ../../src/mpi/errhan/errutil.c:309
1: 309 ../../src/mpi/errhan/errutil.c: No such file or directory.
1: Missing separate debuginfos, use: debuginfo-install numactl-libs-2.0.12-5.el7.x86_64
************************************************************************
2: Continuing.
2: [cli_2]: write_line error; fd=15 buf=:cmd=init pmi_version=1 pmi_subversion=1
2: :
2: system msg for write_line failure : Bad file descriptor
2: [cli_2]: Unable to write to PMI_fd
2: [cli_2]: write_line error; fd=15 buf=:cmd=get_appnum
2: :
2: system msg for write_line failure : Bad file descriptor
2: Abort(1090831) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
2: MPIR_Init_thread(178):
2: MPID_Init(1427)......:
2: MPIR_pmi_init(129)...: PMI_Get_appnum returned -1
2: [cli_2]: write_line error; fd=15 buf=:cmd=abort exitcode=1090831
2: :
2: system msg for write_line failure : Bad file descriptor
2:
2: Program received signal SIGSEGV, Segmentation fault.
2: MPIR_Err_return_comm (comm_ptr=0x7fffffff1100, fcname=0x0, errcode=1090831) at ../../src/mpi/errhan/errutil.c:309
2: 309 ../../src/mpi/errhan/errutil.c: No such file or directory.
2: Missing separate debuginfos, use: debuginfo-install numactl-libs-2.0.12-5.el7.x86_64
************************************************************************
3: Continuing.
3: [cli_3]: write_line error; fd=19 buf=:cmd=init pmi_version=1 pmi_subversion=1
3: :
3: system msg for write_line failure : Bad file descriptor
3: [cli_3]: Unable to write to PMI_fd
3: [cli_3]: write_line error; fd=19 buf=:cmd=get_appnum
3: :
3: system msg for write_line failure : Bad file descriptor
3: Abort(1090831) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
3: MPIR_Init_thread(178):
3: MPID_Init(1427)......:
3: MPIR_pmi_init(129)...: PMI_Get_appnum returned -1
3: [cli_3]: write_line error; fd=19 buf=:cmd=abort exitcode=1090831
3: :
3: system msg for write_line failure : Bad file descriptor
3:
3: Program received signal SIGSEGV, Segmentation fault.
3: MPIR_Err_return_comm (comm_ptr=0x7fffffff1100, fcname=0x0, errcode=1090831) at ../../src/mpi/errhan/errutil.c:309
3: 309 ../../src/mpi/errhan/errutil.c: No such file or directory.
3: Missing separate debuginfos, use: debuginfo-install numactl-libs-2.0.12-5.el7.x86_64