Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stabilisation fixes for JNA (and Tests) #1603

Merged

Conversation

matthiasblaesing
Copy link
Member

The primary reason for this PR is e8cade5:

After 4bf9f92, which reduces the number of WeakReferences used when callbacks are created, higher error rates were observed in unittests. The errors were not normal problems (i.e. Exceptions), but hard JVM crashes. Investigation showed, that in callback code cleanup routines assumed, that fields not holding NULL need to be freed:

jna/native/callback.c

Lines 338 to 346 in e8cade5

if (cb->arg_classes) {
unsigned i;
for (i=0;i < cb->cif.nargs;i++) {
if (cb->arg_classes[i]) {
(*env)->DeleteWeakGlobalRef(env, cb->arg_classes[i]);
}
}
free(cb->arg_classes);
}

That assumption is false as malloc was used for memory allocation, which does not zero the allocated memory. That behavior does not matter if the memory is allocated from the OS, which zeros pages before handing them to the application, but if memory is freed and later reused by the same application, zeroing needs to be done by the application.

Instead of malloc allocations are now done with calloc, which zeros memory. It is assumed, that the benefit (higher runtime safety) if worth more than the, assumed small, performance hit by the zeroing step.

To prevent other potential problems, calloc is also applied in other locations, that might not be fully initialized.

The other commits address problems rendering unittests unstable:

  • wrong assumptions about certificates in the Windows ROOT store
  • wrong assumption about java environment variable JAVA_HOME in windows service test
  • unstable Secur32 tests
  • invalid assumptions about decryptability of read-only files in Advapi32Test
  • invalid assumption about DNS servers in IPHlpAPITest in an IPv6 environment

…gned certificates

The tests assumption that certificates in the ROOT store are all self
signed does not hold anymore.
Use system property java.home in favor of environment variable
JAVA_HOME to register the windows service.
@matthiasblaesing
Copy link
Member Author

@dbwiddis it would be great if you could have a look at this. I have two intentions here. The first is a general check of the changes here. The motivation for the first four changes was, that even locally I could no run the unittests reliably anymore. With the changes I can run the tests and get clean runs most of the time. The reason I got working was the fifth commit, which was motivated by unittests, that became unstable after recent changes (see above).

The second intention is a request for help for the IPHlpAPITest. The test seems to make wrong assumptions:

https://ci.appveyor.com/project/matthiasblaesing/jna/build/job/8a0wpm4q3snj2wqq#L2372

[junit] Testcase: testGetUdpStatistics(com.sun.jna.platform.win32.IPHlpAPITest): FAILED
[junit] Datagrams received with errors (332) or no port (2) should be less than inbound datagrams (97).
[junit] junit.framework.AssertionFailedError: Datagrams received with errors (332) or no port (2) should be less than inbound datagrams (97).
[junit] at com.sun.jna.platform.win32.IPHlpAPITest.testGetUdpStatistics(IPHlpAPITest.java:169)

I see your reasoning, but it seems, that windows does not listen to reason :-)

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious about this one. What memory do we malloc where we then try to delete pointers in random locations of it? Meaning, do you have an actual example where a block is allocated with non-zero bytes, then other code calls free on a particular byte within that block?

Also left some minor questions/comments that you can disregard if you like.


// Launch IE in a manner that should ensure it opens even if the system default browser is Chrome, Firefox, or something else.
Runtime.getRuntime().exec("cmd /c start iexplore.exe -nohome \"about:blank\"");
hr = Ole32.INSTANCE.CoInitializeEx(Pointer.NULL, Ole32.COINIT_MULTITHREADED);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will tearDown() still run if this fails? Making sure you don't over- CoUninitialize().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will tearDown() still run if this fails? Making sure you don't over- CoUninitialize().

"fails" would only include an hr of RPC_E_CHANGED_MODE. Both S_OK and S_FALSE require the uninitialize.

OleAuto.INSTANCE.VariantClear(url);
OleAuto.INSTANCE.VariantClear(result);

ieDispatch.Release();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not cleanup with a failure, move the release into tearDown()?

Runtime.getRuntime().exec("cmd /c start iexplore.exe -nomerge -nohome \"http://www.google.com\"");
WinNT.HRESULT hr;

hr = Ole32.INSTANCE.CoInitializeEx(Pointer.NULL, Ole32.COINIT_MULTITHREADED);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make a base class since this is repeated?

Copy link
Contributor

@dbwiddis dbwiddis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM generally.

The UDP stats test should be accurate per the docs. I guess a weak fallback would be to test that if either errors or no port is positive, then received is positive, e.g.,

Math.min(1, stats.dwNoPorts + stats.dwInErrors) <= stats.dwInDatagrams


// Launch IE in a manner that should ensure it opens even if the system default browser is Chrome, Firefox, or something else.
Runtime.getRuntime().exec("cmd /c start iexplore.exe -nohome \"about:blank\"");
hr = Ole32.INSTANCE.CoInitializeEx(Pointer.NULL, Ole32.COINIT_MULTITHREADED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will tearDown() still run if this fails? Making sure you don't over- CoUninitialize().

"fails" would only include an hr of RPC_E_CHANGED_MODE. Both S_OK and S_FALSE require the uninitialize.

- Advapi32#DecryptFile can be called on read-only file
- Secur32Test is unstable with ImpersonateSecurityContext, splitting it
  into separate test improves stability
- Fix IPHlpAPITest#testGetUdpStatistics - the statistics that is
  returned can be inconsistent
malloc does not clear the memory it allocates and thus random data can
be present. It was observed, that cleanup routines assume, that fields
not holding NULL need to be freed for example. That is not true if the
field was not cleared on allocation. One such example was
callback->arg_classes in free_callback.
@matthiasblaesing
Copy link
Member Author

I was curious about this one. What memory do we malloc where we then try to delete pointers in random locations of it? Meaning, do you have an actual example where a block is allocated with non-zero bytes, then other code calls free on a particular byte within that block?

Have a look in callback.c. The arg_classes array is an array of jobjects. It is malloced here:

cb->arg_classes = (jobject*)malloc(sizeof(jobject) * argc);

the arguments are further processed and while doing so, the entries of the arg_classes array are initialized, UNLESS, one of these places is hit:

jtype = get_java_type(env, cls);

if (!cb->java_arg_types[i+3]) {

if (jtype == -1) {

if (!cb->arg_types[i]) {

Then a jump to failure_cleanup occurs:

jna/native/callback.c

Lines 316 to 320 in 0a33062

failure_cleanup:
free_callback(env, cb);
if (throw_type) {
throwByName(env, throw_type, throw_msg);
}

And in free_callback:

jna/native/callback.c

Lines 329 to 337 in 0a33062

if (cb->arg_classes) {
unsigned i;
for (i=0;i < cb->cif.nargs;i++) {
if (cb->arg_classes[i]) {
(*env)->DeleteWeakGlobalRef(env, cb->arg_classes[i]);
}
}
free(cb->arg_classes);
}

arg_classes members might be uninitialized, have a non-zero value and are then passed to DeleteWeakGlobalRef.

There might be other places and I modified the most obvious candidates.

The UDP stats test should be accurate per the docs. I guess a weak fallback would be to test that if either errors or no port is positive, then received is positive, e.g.,

Math.min(1, stats.dwNoPorts + stats.dwInErrors) <= stats.dwInDatagrams

Good idea - I'll add it - lets see if it helps.

@dblock @dbwiddis in general I agree, that he COM invocation of the IE starting could be improved, but I stared at this to long already and the biggest part is yet to come (the native parts need to be rebuild again).

I pushed an update.

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is there any measurable performance difference between using malloc/calloc?

Code LGTM.

@dblock
Copy link
Member

dblock commented Apr 25, 2024

@matthiasblaesing Thank you for the detailed explanation.

@dblock
Copy link
Member

dblock commented Apr 25, 2024

@matthiasblaesing feel free to merge

@matthiasblaesing
Copy link
Member Author

matthiasblaesing commented May 3, 2024

@Panxuefeng-loongson you offered to rebuild the loongarch64 binaries if necessary. I would appreciate it, if you could rebuild the binaries based on this PR/my stabilize branch. If you intent to build, please use the current head fb7fbe9 2e94624. I bumped the version for the native part to 7.0.2 in that commit, so that it differs from the last rebuild (though that was not officially released).
Thank you for your support!

@Panxuefeng-loongson
Copy link
Contributor

@matthiasblaesing I have already sent it to you via email. If there are any issues, please provide feedback

@matthiasblaesing
Copy link
Member Author

Appveyor is happy with this build (tests that are runnable in CI ran clean after a single restart). github-actions agrees.

@Panxuefeng-loongson kindly provided the binary for linux-loongarch64 (downloadable independently from https://maven.loongnix.cn/loongarch/maven/cn/loongson/jna/native/linux-loongarch64/)

@dbwiddis and @dblock (via email) reviewed.

Thank you all.

@matthiasblaesing matthiasblaesing merged commit f6670c6 into java-native-access:master May 30, 2024
9 checks passed
@matthiasblaesing matthiasblaesing deleted the stabilize branch July 13, 2024 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants