Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adds support for Serial workflow #36

Merged
merged 1 commit into from
Oct 20, 2021

Conversation

VMois
Copy link

@VMois VMois commented Sep 28, 2021

Closes #35

@VMois VMois self-assigned this Sep 28, 2021
@VMois VMois marked this pull request as draft September 28, 2021 14:27
@VMois VMois marked this pull request as ready for review September 28, 2021 14:47
Copy link
Member

@tiborsimko tiborsimko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments:

  • Please call it reana-debug.yaml, since we don't use "-serial" elsewhere and we don't really want to use reana.yaml as that would be Serial advertising. (This is similar naming as in h4l example, see the issue description.)

  • Please put scripts into reana-debug.yaml so that they don't live separately. If you would like them to live separately, we'd need to include one into "eventselection" and one into "statanalysis", because that's how ATLAS is packaging their code base. However, we would like to stick to what ATLAS typically uses, so that there would be no surprises for the researchers such as "do I need to create these files?", hence it'd be better to include commands into reana-debug.yaml. The name is scary enough so that people won't look there

@VMois
Copy link
Author

VMois commented Sep 29, 2021

(1) yes, will do it.

(2) How to add a script to the Serial workflow? Yadage has the script option, Serial doesn't according to the code. I can chain commands together, maybe.

@mvidalgarcia
Copy link
Member

For reference, in other demos we chain commands: https://github.com/reanahub/reana-demo-cms-h4l/blob/master/reana-debug.yaml

@VMois VMois requested a review from tiborsimko October 19, 2021 08:49
signal_file: 'eventselection/submitDir/hist-sample.root'
background_file: /code/data/background.root
dxaod_file: https://recastwww.web.cern.ch/recastwww/data/reana-recast-demo/mc15_13TeV.123456.cap_recast_demo_signal_one.root
workflow:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works well 👍 Two observations:

  1. Comparing workspace files, we see config and results and other extra files for the debug serial workflow when compared to yadage workflow:
$ reana-client ls -w debug
NAME                                                SIZE    LAST-MODIFIED      
recast_xsecs.txt                                    81      2021-10-19T09:30:06
recast_inputs.txt                                   113     2021-10-19T09:30:06
config/meas.xml                                     403     2021-10-19T09:30:27
config/meas_channel1.xml                            791     2021-10-19T09:30:27
results/meas_meas.root                              459     2021-10-19T09:30:28
results/meas_results.table                          15      2021-10-19T09:30:28
results/meas_channel1_meas_model.root               36751   2021-10-19T09:30:28
results/meas_combined_meas_model.root               37960   2021-10-19T09:30:28
eventselection/submitDir/driver.root                1378    2021-10-19T09:30:06
eventselection/submitDir/location                   126     2021-10-19T09:30:06
eventselection/submitDir/hist-sample.root           10337   2021-10-19T09:30:20
eventselection/submitDir/submitted                  0       2021-10-19T09:30:20
statanalysis/fitresults/pre.png                     10142   2021-10-19T09:30:29
statanalysis/fitresults/post.png                    10120   2021-10-19T09:30:29
statanalysis/fitresults/limit.png                   17047   2021-10-19T09:30:31
statanalysis/fitresults/limit_data.json             174     2021-10-19T09:30:31
statanalysis/fitresults/limit_data_nomsignal.json   176     2021-10-19T09:30:31
eventselection/submitDir/input/sample.root          2114    2021-10-19T09:30:06
eventselection/submitDir/hist/sample.root           2050    2021-10-19T09:30:06

$ reana-client ls -w recast | grep -v ^_ | grep -v ^workflow
NAME                                                SIZE     LAST-MODIFIED      
eventselection/submitDir/driver.root                1378     2021-10-19T09:30:17
eventselection/submitDir/location                   126      2021-10-19T09:30:17
eventselection/submitDir/hist-sample.root           10338    2021-10-19T09:30:39
eventselection/submitDir/submitted                  0        2021-10-19T09:30:39
statanalysis/fitresults/pre.png                     10142    2021-10-19T09:30:45
statanalysis/fitresults/post.png                    10120    2021-10-19T09:30:46
statanalysis/fitresults/limit.png                   17047    2021-10-19T09:30:47
statanalysis/fitresults/limit_data.json             174      2021-10-19T09:30:47
statanalysis/fitresults/limit_data_nomsignal.json   176      2021-10-19T09:30:47
eventselection/submitDir/input/sample.root          2114     2021-10-19T09:30:17
eventselection/submitDir/hist/sample.root           2050     2021-10-19T09:30:17

This is because the Yadage workflow changes working directory upon execution:

$ docker run -i -t --rm reanahub/reana-demo-atlas-recast-eventselection:1.0 /bin/bash
[bash][atlas]:build > source /home/atlas/release_setup.sh
Configured GCC from: /opt/lcg/gcc/6.2.0binutils/x86_64-slc6
Configured AnalysisBase from: /usr/AnalysisBase/21.2.51/InstallArea/x86_64-slc6-gcc62-opt
[bash][atlas AnalysisBase-21.2.51]:build > source /analysis/build/x86*/setup.sh
[bash][atlas AnalysisBase-21.2.51]:build > pwd
/analysis/build

So the business creating recast_xsecs.txt etc stays "hidden" there, in /analysis/build, which is not under current working directory but under container root, i.e. using ephemeral storage. (This is BTW why some of the ATLAS RECAST analyses are failing on production due to low ephemeral disk storage space. CC @lukasheinrich. Is this the case for all the latest ATLAS images? Perhaps we should change that?)

@VMois What you could do to make this more apparent in the debug serial example is to simply create a new directory analysis_build under usual user workspace where all these extra files would live, that is recast_*.txt, config/*, and results/*.

  1. Please squash your commits into one, there is no need to keep the intermediate version.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Should I do this only for serial, or yadage too?

  2. Yes. I wanted to do it after the final review. Doesn't make sense to squash a couple of times.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WRT 1, just for serial. We should keep Yadage untouched as it emulates exactly what ATLAS RECAST workflows are usually doing in production.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I can see recast*.txt in the workspace tab in UI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you see it for vanilla RECAST example? I posted above the full output, and it is not there. See also:

$ reana-client ls -w recast | grep -c recast
0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WRT 2, after you amend the analysis_build business, you can safely squash, because this is good to go.

Copy link
Author

@VMois VMois Oct 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. For Serial workflow I can see the following:
reana-client ls -w recast-serial3.3 | grep -c recast
2

I don't understand what I should amend in serial if recast* file is presented. Should I just mkdir analysis_build in the step and put all recast* files there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's exactly what I mentioned above "to simply create a new directory analysis_build under usual user workspace...".

The desired result would look like:

$ reana-client ls -w debug
NAME                                                 SIZE    LAST-MODIFIED      
analysis_build/recast_xsecs.txt                      81      2021-10-19T09:30:06
analysis_build/recast_inputs.txt                     113     2021-10-19T09:30:06
analysis_build/config/meas.xml                       403     2021-10-19T09:30:27
analysis_build/config/meas_channel1.xml              791     2021-10-19T09:30:27
analysis_build/results/meas_meas.root                459     2021-10-19T09:30:28
analysis_build/results/meas_results.table            15      2021-10-19T09:30:28
analysis_build/results/meas_channel1_meas_model.root 36751   2021-10-19T09:30:28
analysis_build/results/meas_combined_meas_model.root 37960   2021-10-19T09:30:28
eventselection/submitDir/driver.root                 1378    2021-10-19T09:30:06
eventselection/submitDir/location                    126     2021-10-19T09:30:06
eventselection/submitDir/hist-sample.root            10337   2021-10-19T09:30:20
eventselection/submitDir/submitted                   0       2021-10-19T09:30:20
statanalysis/fitresults/pre.png                      10142   2021-10-19T09:30:29
statanalysis/fitresults/post.png                     10120   2021-10-19T09:30:29
statanalysis/fitresults/limit.png                    17047   2021-10-19T09:30:31
statanalysis/fitresults/limit_data.json              174     2021-10-19T09:30:31
statanalysis/fitresults/limit_data_nomsignal.json    176     2021-10-19T09:30:31
eventselection/submitDir/input/sample.root           2114    2021-10-19T09:30:06
eventselection/submitDir/hist/sample.root            2050    2021-10-19T09:30:06

In this way, Serial's ./analysis_build directory would exactly emulate what Yadage is doing in a somewhat "hidden" manner in /analysis_build directory. (note the dot, relative vs absolute path)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a separate issue for this as it requires more work than expected. It is not possible to control where the config/ and results/ folders will be created.

@VMois
Copy link
Author

VMois commented Oct 19, 2021

Rebase done. Ready for merge

@tiborsimko tiborsimko merged commit b796810 into reanahub:master Oct 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

workflow: develop serial workflow for debugging
3 participants