
Draft: Add color.h test #524

Open
jodavies wants to merge 5 commits into vermaseren:master from jodavies:test-color

Conversation

@jodavies (Collaborator) commented on May 21, 2024

This test seems trickier than expected. Currently:

mpirun -np {1,2} parform: OK
mpirun -np {3,4} parform: hang?
mpirun -np {5,6} parform: crash

valgrind vorm: OK
valgrind tvorm -w2: mostly hangs (takes 100-200s), sometimes finishes in 2s
valgrind tvorm -w4: OK, finishes in 2s

The CI sees valgrind errors in vorm and tvorm that I can't reproduce locally. Edit: I can reproduce them on Ubuntu 20.04 (which the runners use) but not on 22.04.

@jodavies (Collaborator, Author) commented:

I added some print statements to PF_UnpackRedefinedPreVars to try to work out what happens. There are many successful redefines, and then we have:

0 i = 0
0 trying to redefine ik1 (35) to 1
0    loop j = 0; j < 2
0       AC.pfirstnum[0] (ik1c) (37), index 35
0       AC.pfirstnum[1] (adj) (38), index 35

So it fails to find, in AC.pfirstnum, the variable it is trying to redefine. It then evaluates

if ( AC.inputnumbers[j] < inputnumber ) {

with j = 2, causing the "Conditional jump or move depends on uninitialised value(s)" error.

So, why is ik1 no longer in the AC.pfirstnum array? Earlier in the program it was there (and had index 35).
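
For illustration, here is a minimal, self-contained sketch of that access pattern (NOT the actual FORM source; the variable names are hypothetical stand-ins based on the debug output above) which produces exactly this kind of Valgrind report:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Stand-ins for AC.pfirstnum / AC.inputnumbers: allocated with room to
       spare, but only the first numpfirstnum entries are ever written. */
    int *pfirstnum    = malloc(4 * sizeof(int));
    int *inputnumbers = malloc(4 * sizeof(int));
    int numpfirstnum = 2;
    int index = 35, inputnumber = 1, j;

    pfirstnum[0] = 37; inputnumbers[0] = 0;   /* ik1c */
    pfirstnum[1] = 38; inputnumbers[1] = 0;   /* adj  */

    /* Look for the variable being redefined (ik1, index 35). */
    for ( j = 0; j < numpfirstnum; j++ ) {
        if ( pfirstnum[j] == index ) break;
    }
    /* ik1 is not found, so the loop falls through with j == 2, and the next
       comparison reads inputnumbers[2], which was never written: Valgrind
       reports "Conditional jump or move depends on uninitialised value(s)". */
    if ( inputnumbers[j] < inputnumber ) {
        printf("redefinition accepted\n");
    }

    free(pfirstnum);
    free(inputnumbers);
    return 0;
}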

@tueda (Collaborator) commented on May 22, 2024

Thanks for the investigation. I will look into the ParFORM issue. (You know, in programming, the person you were a month ago is a stranger. Then, the person you were more than 10 years ago is...)

By the way, maybe this (the code that gives the Valgrind error) should be broken up into small unit tests.

@jodavies (Collaborator, Author) commented on Jun 12, 2024

It seems the problem with valgrind and tvorm -w2 is due to the load balancing; the same issue happens with -w3. If I run this test under callgrind, we see that ThreadsProcessor makes hundreds of millions of calls to LoadReadjusted (which steals terms from the working thread and distributes them to the idle threads), each of which also involves locking.

With -w4, there are only ~400 calls to LoadReadjusted.

Edit: it seems to be some kind of race condition, though: if I add a MesPrint in LoadReadjusted, it prints only ~400 times, even when running under valgrind.

The easiest solution is to just disable valgrind for this test...
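
To make the pattern concrete, here is a hedged, self-contained sketch (an assumed structure, not the actual FORM source) of the kind of mutex-protected rebalancing call described above; if the surrounding loop re-triggers it on almost every term, the locking itself dominates the run time under valgrind's serialised thread scheduling:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t rebalance_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for LoadReadjusted: under the lock, steal terms from
   the loaded worker and hand them to the idle workers. */
static int load_readjusted_sketch(void) {
    int moved = 0;
    pthread_mutex_lock(&rebalance_lock);
    /* ... inspect the per-worker queues and move terms around ... */
    pthread_mutex_unlock(&rebalance_lock);
    return moved;
}

int main(void) {
    /* Stand-in for the master loop in ThreadsProcessor: keep asking for a
       readjustment while terms remain.  In the callgrind run described above
       this happened hundreds of millions of times. */
    long calls = 0;
    for ( long i = 0; i < 1000000L; i++ ) {
        load_readjusted_sketch();
        calls++;
    }
    printf("LoadReadjusted-like calls: %ld\n", calls);
    return 0;
}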

jodavies added a commit to jodavies/form that referenced this pull request Jun 13, 2024
Don't run this test with tform and less than 4 threads, it goes extremely slowly.
See discussion in vermaseren#524 .
@jodavies (Collaborator, Author) commented:

Once #525 is merged we can rebase this on top of it, and the ParFORM tests will also run successfully.

@jodavies (Collaborator, Author) commented:

Here is a simpler way to reproduce the load-balancing issue under valgrind. I cut the expression out of the color test.

This script does basically nothing at all, but on my system valgrind tform -w2 test-lb.frm takes about 30 seconds and results in millions of calls to LoadReadjusted and its pthread mutex. Using 4 threads or regular form, it finishes in a fraction of a second.

#-
#include color.h

Local Q10 =
    + NA*I2R*cR*[cR-cA/2]^3
    - 1/2*NA*I2R*cA*[cR-cA/2]^3
    - 3*NA*I2R*cA*cR*[cR-cA/2]^2
    + 5/2*NA*I2R*cA^2*[cR-cA/2]^2
    + 7/4*NA*I2R*cA^2*cR*[cR-cA/2]
    - 25/12*NA*I2R*cA^3*[cR-cA/2]
    - 1/12*NA*I2R*cA^3*cR
    + 19/48*NA*I2R*cA^4
    + 5*cOlTt(cOlpA1,cOlpA1,cOlpA1,cOlpA1)*[cR-cA/2]
    - 17/6*cOlTt(cOlpA1,cOlpA1,cOlpA1,cOlpA1)*cA
    + 2*cOlTt(cOlpA1,cOlpA1,cOlpA1,N1_?,N2_?)*f(cOlpA1,N1_?,N2_?)*i_
    + cOlTt(cOlpA1,cOlpA1,N1_?,cOlpA1,N2_?)*f(cOlpA1,N1_?,N2_?)*i_
    + cOlTt(cOlpA1,cOlpA1,N1_?,N2_?)*f(cOlpA1,N1_?,N3_?)*f(cOlpA1,N2_?,N3_?)
    - 1/6*cOlTt(N1_?,N2_?,N3_?,N4_?,N5_?)*f(N1_?,N3_?,N6_?)*f(N2_?,N5_?,N7_?)*f(N4_?,N6_?,N7_?)*i_*cA
    + 1/6*cOlTt(N1_?,N2_?,N3_?,N4_?,N5_?)*f(N1_?,N5_?,N6_?)*f(N2_?,N3_?,N7_?)*f(N4_?,N6_?,N7_?)*i_*cA
    - 1/12*cOlTt(N1_?,N2_?,N3_?,N4_?,N5_?)*f(N1_?,N6_?,N7_?)*f(N2_?,N3_?,N6_?)*f(N4_?,N5_?,N7_?)*i_*cA
    - 1/12*cOlTt(N1_?,N2_?,N3_?,N4_?,N5_?)*f(N1_?,N6_?,N7_?)*f(N2_?,N3_?,N7_?)*f(N4_?,N5_?,N6_?)*i_*cA
    - 1/12*cOlTt(N1_?,N2_?,N3_?,N4_?,N5_?)*f(N1_?,N6_?,N7_?)*f(N2_?,N5_?,N6_?)*f(N3_?,N4_?,N7_?)*i_*cA
    - 1/12*cOlTt(N1_?,N2_?,N3_?,N4_?,N5_?)*f(N1_?,N6_?,N7_?)*f(N2_?,N5_?,N7_?)*f(N3_?,N4_?,N6_?)*i_*cA
    - 2/3*cOlTt(N1_?,N2_?,N3_?,N4_?)*f(N1_?,N2_?,N5_?)*f(N3_?,N4_?,N5_?)*cA*[cR-cA/2]
    + 7/18*cOlTt(N1_?,N2_?,N3_?,N4_?)*f(N1_?,N2_?,N5_?)*f(N3_?,N4_?,N5_?)*cA^2
    + 2/3*cOlTt(N1_?,N2_?,N3_?,N4_?)*f(N1_?,N4_?,N5_?)*f(N2_?,N3_?,N5_?)*cA*[cR-cA/2]
    - 7/18*cOlTt(N1_?,N2_?,N3_?,N4_?)*f(N1_?,N4_?,N5_?)*f(N2_?,N3_?,N5_?)*cA^2
   ;
.sort

Print +s;
.end

@jodavies added this to the v4.3.2 milestone on Nov 6, 2024
@coveralls commented on Nov 14, 2024

Coverage Status
coverage: 53.172% (+2.9%) from 50.227% when pulling 9c77706 on jodavies:test-color into 0289435 on vermaseren:master.

@tueda (Collaborator) commented on Nov 15, 2024

Do you eventually want to put color.frm and color.h here?

The former does not comply with the rule in the README (modulo user.frm). The latter is a large file and is originally distributed from the Nikhef site.

Perhaps the rule could be reconsidered and rewritten. Maybe the check.rb script could include functionality to automatically download libraries as needed.

@jodavies (Collaborator, Author) commented:

color.h could be fetched by the CI config as forcer.h currently is, and the test run in the same way? It then doesn't run locally with make check, but is still tested on each commit.

@tueda (Collaborator) commented on Nov 15, 2024

If we want to run the CI tests for color.frm in the same way as we do for forcer.frm (and it is not necessary for everyone to test color.frm locally), then we can configure this in test.yaml.

As for running Valgrind/coverage jobs for color.frm on the CI, I'm currently not sure whether the setup is ready for that, as I forget whether I wrote it in an extensible and general way. Forcer is not compatible with Valgrind, so I didn't need it.

@jodavies (Collaborator, Author) commented:

What do you mean by forcer not being compatible with valgrind?

@tueda (Collaborator) commented on Nov 15, 2024

If my memory is correct, Forcer always hits memory leaks in FORM.

@jodavies (Collaborator, Author) commented:

OK, I will try to test that. Ideally both the color and forcer tests can pass valgrind too.

@tueda (Collaborator) commented on Nov 15, 2024

Here is the issue that is problematic for Forcer with Valgrind: #252

@jodavies (Collaborator, Author) commented:

Currently the forcer test (and the color test, if it moves to the same status) is not part of the coveralls statistics. Perhaps they should also go into that run. At least on my machine, the forcer test takes only 8s when run with vorm.

@jodavies (Collaborator, Author) commented:

Do you think it makes sense to keep things like forcer, color, etc. in separate directories, or would it be better to create a single directory ("extra", for example) containing such tests? Then the CI can run everything in the extra directory (but not for the valgrind run).

@tueda (Collaborator) commented on Nov 29, 2024

I think collecting the extra test cases into an extra directory is a good idea. We have the benchmarks directory in #541, so the same can be done for the extra tests. (Note: formunit/fu.frm is a benchmark test.)

@jodavies (Collaborator, Author) commented on Dec 3, 2024

@tueda how does this seem? Now "extra" runs both the forcer and color tests, and I added them to the coverage run. Further tests which require external libraries, or which otherwise don't fit in the default tests, could also go into "extra".

@tueda (Collaborator) commented on Dec 3, 2024

Oh, now I see what you meant, but just to clarify: (1) You don't want to collect checkpoint and multithreaded into extra, do you? And the big tests will be put in extra. In the future, adding many big tests to extra in this way may slow down this job, and we would need to split it again. (2) Will you add a Valgrind check for extra? See:

# we measure code coverage only for tests checked with Valgrind.

That said, this principle may be open to discussion.

By the way, in the future, duplicated code in the workflows could be rewritten using reusable workflows.

@jodavies (Collaborator, Author) commented on Dec 4, 2024

In principle checkpoint and multithreaded could also move into extra. Then the yaml file is a bit simpler, in that there are fewer "special" jobs. With this structure it is also easier to add tests to extra in the future, since one then only needs to add to the list of libraries to download, if anything (I had in mind tests for mincer, harmpol, summer, Opiter?).

If they take a long time, can't we just split the runs up with the "groups" mechanism that the valgrind tests already use?

In principle it would be nice to add valgrind tests for extra (currently, as you say, we would need to skip the forcer test, and also the color test when running under tform).

Is there a particular reason why the coverage check should only be for the tests which run under valgrind? This isn't actually quite the case at the moment anyway, since a few tests are skipped for the valgrind run but are still run with lcov.

@tueda (Collaborator) commented on Dec 4, 2024

OK, placing all the extra tests in extra is beneficial. I agree.

Is there a particular reason why the coverage check should only be for the tests which run under valgrind?

It depends on how we define the scope of test coverage. Measuring coverage for Valgrind-validated tests ensures that our tests include verification of the absence of memory errors, at least those detectable by Valgrind (though there are currently some #pend_if valgrind? directives, which skip this verification during coverage measurement).

@jodavies (Collaborator, Author) commented on Dec 4, 2024

Right, this does make sense; it is good to know the percentage of code which is both covered and valgrind-error-free. In this case, is there a way to fake the valgrind flag, so that the tests which are marked #pend_if valgrind? are also skipped in the lcov run?

Then all test runs could "just run everything" and rely on the #pend_if flag to do the skipping. We could then have, for example, the color test running under valgrind with vorm but not with tvorm for now.

@tueda (Collaborator) commented on Dec 4, 2024

is there a way to fake the valgrind flag

I think I can add a command-line option for this. --fake-valgrind, or do you have a better name?

@jodavies (Collaborator, Author) commented on Dec 4, 2024

--fake-valgrind sounds fine to me. So once that exists, the next steps would be:

  • move multithreaded and checkpoint to "extra"
  • run the "extra" tests also in the valgrind run, with #pend_if valgrind? where necessary (e.g. the forcer test)
  • run the lcov checks with --fake-valgrind

More generally, is it possible to have a "build" job which caches the executables, which are then used by all of the checks? Currently all 35 valgrind jobs compile form, as I understand it. Maybe this would improve performance a bit, though I expect some of the harder valgrind tests will still be the bottleneck.

tueda added a commit to tueda/form that referenced this pull request Dec 4, 2024
@tueda (Collaborator) commented on Dec 5, 2024

is it possible to have a "build" job which caches the executables, which are then used by all of the checks?

This might be worth a try. But there could be some time overhead from uploading/downloading the executables, and hardware or OS versions may not be guaranteed to fully match across different jobs(?). (I've had a bad experience with Google Colab VMs, though that is a different vendor.)

@jodavies (Collaborator, Author) commented on Dec 5, 2024

OK, then in that case maybe it is fine as it is. At least, the tests only build the binaries that they use, so the build is fairly fast.
