Make DatasetUpdateService respect isBulkLoad() for audit purposes #6791

bbimber · 2025-06-24T22:37:36Z

We have a server that runs an ETL to populate datasets. I noticed the dataset audit log was enormous, despite bulkLoad=true being set. In general, the QUS (QueryUpdateService) uses Summary audit logging when bulkLoad=true. This PR makes DatasetUpdateService respect this behavior for delete and update. It was already doing that for insert and truncate.
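To make the intended behavior concrete, here is a minimal, self-contained sketch of the idea: per-row audit events for normal operations, and a single summary event when the bulk-load flag is set. It is not the LabKey implementation. `BulkAuditSketch`, `Op`, `AuditLog`, and `apply` are all hypothetical names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration only; none of these are LabKey classes. */
final class BulkAuditSketch
{
    enum Op { INSERT, UPDATE, DELETE }

    /** Collects audit events in memory so they can be counted. */
    static final class AuditLog
    {
        final List<String> events = new ArrayList<>();
    }

    /**
     * Pretends to apply an operation to the given rows.
     * With bulkLoad set, one summary event is written for the whole batch;
     * otherwise one detailed event is written per row.
     */
    static void apply(Op op, List<String> rowIds, boolean bulkLoad, AuditLog log)
    {
        if (rowIds.isEmpty())
            return; // nothing changed, nothing to audit

        if (bulkLoad)
        {
            log.events.add(op + " summary: " + rowIds.size() + " row(s)");
        }
        else
        {
            for (String id : rowIds)
                log.events.add(op + " detail: " + id);
        }
    }

    public static void main(String[] args)
    {
        AuditLog detailed = new AuditLog();
        apply(Op.UPDATE, List.of("r1", "r2", "r3"), false, detailed);

        AuditLog summary = new AuditLog();
        apply(Op.UPDATE, List.of("r1", "r2", "r3"), true, summary);

        System.out.println("detailed events: " + detailed.events.size());
        System.out.println("summary events: " + summary.events.size());
    }
}
```

With an ETL that touches thousands of rows per run, the difference between the two branches is what determines how quickly the audit table grows.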

I bet it's pretty easy for ETLs to accumulate a lot of audit records in a folder without an admin noticing. Even if other ETL users have not brought this up, it would be easy for it to be happening on many servers.

An alternative strategy would be to make this code inspect configParameters for DetailedAuditLogDataIterator.AuditConfigs.AuditBehavior=NONE or something to that effect. This would give the developer direct control and change existing behavior less. I'm happy to switch to this or another option if you prefer.
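A rough sketch of that alternative, assuming the caller passes an explicit audit behavior in a configuration map and the service otherwise falls back to its existing default. The enum and method names below (`AuditConfigSketch`, `ConfigKey`, `AuditBehavior`, `resolve`) are hypothetical and only mirror the shape of the idea; they are not the actual LabKey types.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical illustration of an explicit audit override; not LabKey code. */
final class AuditConfigSketch
{
    enum AuditBehavior { NONE, SUMMARY, DETAILED }

    enum ConfigKey { AUDIT_BEHAVIOR }

    /**
     * Returns the behavior explicitly requested in configParameters, if any.
     * When no override is present (or the value has the wrong type), the
     * default passed by the caller is returned, so existing behavior is unchanged.
     */
    static AuditBehavior resolve(Map<ConfigKey, Object> configParameters, AuditBehavior defaultBehavior)
    {
        Object requested = configParameters == null ? null : configParameters.get(ConfigKey.AUDIT_BEHAVIOR);
        if (requested instanceof AuditBehavior)
            return (AuditBehavior) requested;
        return defaultBehavior;
    }

    public static void main(String[] args)
    {
        Map<ConfigKey, Object> config = new EnumMap<>(ConfigKey.class);
        System.out.println("no override: " + resolve(config, AuditBehavior.DETAILED));

        config.put(ConfigKey.AUDIT_BEHAVIOR, AuditBehavior.NONE);
        System.out.println("with override: " + resolve(config, AuditBehavior.DETAILED));
    }
}
```

The trade-off is the one described above: an explicit override leaves current defaults alone, but every ETL author has to know to set it.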

labkey-jeckels · 2025-06-28T16:47:32Z

@bbimber I hoped I'd be able to work on this before leaving for vacation but I ran out of time. From an initial glance it seems reasonable. @labkey-klum @jryurkanin @labkey-martyp might have a chance to review before I return.

labkey-jeckels · 2025-07-10T23:37:28Z

@bbimber I'm back from vacation and catching up. This change looks good to me. I'd like to get some test coverage for both the bulk and non-bulk scenarios. While the test ETL XML files are in the testAutomation repo, the tests themselves are in a private repo so I don't think it would be easy to ask you to add coverage. I'll get this into the queue.

bbimber · 2025-07-10T23:41:04Z

OK. I'm happy to write a simple integration test using the vehicles schema. ETL isn't really needed here.

labkey-jeckels · 2025-07-10T23:45:07Z

> OK. I'm happy to write a simple integration test using the vehicles schema. ETL isn't really needed here.

I'm certainly not against adding more coverage on the vehicles schema, but these changes are all dataset-specific, so we should be sure to get coverage against dataset tables, and bulkLoad is an ETL concept. We have some existing ETL tests that set up a study, so that's a logical place to build from.

bbimber · 2025-07-10T23:51:38Z

Yes, I forgot that. No vehicles schema, then. I'm just trying to minimize the burden placed on LabKey here.

It probably still would be reasonable to try to create a study in Java code in the integration test (though I haven't actually tried this recently).

Anyway, I'm willing to give something a try if it reduces the LabKey burden, but if you're going to create the dataintegration test no matter what, I would rather not duplicate the work. Just let me know which route you want.

bbimber · 2025-07-11T01:17:57Z

Hi @labkey-jeckels: I got back to my computer and looked at the code. There is already an existing integration test under DatasetUpdateService. It should be quite easy to piggyback off this, so I will try to write something simple tomorrow to exercise this.

bbimber · 2025-07-11T13:55:26Z

Good morning @labkey-jeckels: it seems like the integration test I added passed. Can you please let me know if you think that's sufficient here?

labkey-jeckels

Thanks for the tests. I pushed a couple of small changes, to use constants and just skip the audit handler completely for bulk loads (similar to insert and update). Please double-check that they seem reasonable and see my question about whether the second insert is improving the coverage.

bbimber · 2025-07-11T18:52:20Z

> Thanks for the tests. I pushed a couple of small changes, to use constants and just skip the audit handler completely for bulk loads (similar to insert and update). Please double-check that they seem reasonable and see my question about whether the second insert is improving the coverage.

Yes, I saw those come through. This all seems reasonable to me. Certainly using constants is better than not.

Sorry, what question are you referring to? You mention 'second insert', so do you mean ~line 927 of the test? That code inserts 2 rows, checks for 2 audit rows, and then inserts one more, checking for one new audit row. If that's what you mean, then I agree the second single-row insert doesn't add a heck of a lot, but it also doesn't seem that negative either.
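For readers without the test open, the pattern being discussed (count audit rows, perform an insert, check how many new audit rows appeared) can be sketched as follows. This is a simplified in-memory illustration, not the actual test; `AuditDeltaCheckSketch` and its methods are hypothetical names, and the stand-in service always writes one audit row per inserted row.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in used to show the count-before/count-after check. */
final class AuditDeltaCheckSketch
{
    private final List<String> auditRows = new ArrayList<>();

    /** Pretends to insert rows, writing one detailed audit row per inserted row. */
    void insert(List<String> rowIds)
    {
        for (String id : rowIds)
            auditRows.add("insert " + id);
    }

    int auditCount()
    {
        return auditRows.size();
    }

    /** Inserts the rows and fails if the number of new audit rows differs from expected. */
    static void expectNewAuditRows(AuditDeltaCheckSketch service, List<String> rowIds, int expected)
    {
        int before = service.auditCount();
        service.insert(rowIds);
        int added = service.auditCount() - before;
        if (added != expected)
            throw new AssertionError("expected " + expected + " new audit row(s) but saw " + added);
    }

    public static void main(String[] args)
    {
        AuditDeltaCheckSketch service = new AuditDeltaCheckSketch();
        expectNewAuditRows(service, List.of("r1", "r2"), 2); // two rows, two audit rows
        expectNewAuditRows(service, List.of("r3"), 1);       // one more row, one more audit row
        System.out.println("total audit rows: " + service.auditCount());
    }
}
```

Checking the delta rather than the absolute count keeps each assertion independent of whatever audit rows earlier steps left behind.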

Make DatasetUpdateService respect isBulkLoad() for audit purposes

e307e92

bbimber requested a review from labkey-jeckels June 24, 2025 22:37

Add test case

ee0db60

labkey-jeckels self-assigned this Jul 11, 2025

labkey-jeckels added 2 commits July 11, 2025 11:33

Use constants

ba5c3ab

No need to register handler for bulk loads

3983821

labkey-jeckels approved these changes Jul 11, 2025
