Make DatasetUpdateService respect isBulkLoad() for audit purposes #6791
base: release25.3-SNAPSHOT
Conversation
@bbimber I hoped I'd be able to work on this before leaving for vacation, but I ran out of time. From an initial glance it seems reasonable. @labkey-klum @jryurkanin @labkey-martyp might have a chance to review before I return.
@bbimber I'm back from vacation and catching up. This change looks good to me. I'd like to get some test coverage for both the bulk and non-bulk scenarios. While the test ETL XML files are in the …
OK. I'm happy to write a simple integration test using the vehicles schema. ETL isn't really needed here.
I'm certainly not against adding more coverage on the vehicles schema, but these changes are all dataset-specific so we should be sure to get coverage against dataset tables, and bulkLoad is an ETL concept. We have some existing ETL tests that set up a study, so that's a logical place to build from.
Yes, I forgot that; no vehicles then. I'm just trying to minimize the burden placed on LabKey here. It probably still would be reasonable to try to create a study in Java code in the integration test (though I haven't actually tried this recently). Anyway, I'm willing to give something a try if it reduces the LabKey burden, but if you're going to create the dataintegration test no matter what, I probably shouldn't duplicate that effort. Just let me know which route you want.
Hi @labkey-jeckels: I got back to my computer and looked at the code. There is already an existing integration test under DatasetUpdateService. It should be quite easy to piggyback off this, so I will try to write something simple tomorrow to exercise this.
Good morning @labkey-jeckels: it seems like the integration test I added passed. Can you please let me know if you think that's sufficient here?
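
For reference, a minimal sketch of the shape such a test might take. The helper methods here are hypothetical stand-ins for whatever the existing DatasetUpdateService test harness provides, with stub state so the example is self-contained; the real helpers would go through DatasetUpdateService and query the dataset audit log table.

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class DatasetBulkLoadAuditTest
{
    // Stub state so the sketch stands alone; the real test would query
    // the dataset audit log table instead.
    private long auditRecordCount = 0;

    @Test
    public void testBulkAndNonBulkAudit()
    {
        // Non-bulk insert: expect one detailed audit record per data row.
        long before = countDatasetAuditRecords();
        insertDatasetRows(2, false /* bulkLoad */);
        assertEquals(before + 2, countDatasetAuditRecords());

        // Bulk insert: no new per-row audit records should appear.
        long beforeBulk = countDatasetAuditRecords();
        insertDatasetRows(2, true /* bulkLoad */);
        assertEquals(beforeBulk, countDatasetAuditRecords());
    }

    private long countDatasetAuditRecords()
    {
        return auditRecordCount;
    }

    private void insertDatasetRows(int count, boolean bulkLoad)
    {
        // The real helper would call DatasetUpdateService.insertRows() with
        // a context whose isBulkLoad() matches the flag; this stub just
        // models the expected audit behavior.
        if (!bulkLoad)
            auditRecordCount += count;
    }
}
```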
Thanks for the tests. I pushed a couple of small changes to use constants and just skip the audit handler completely for bulk loads (similar to insert and update). Please double-check that they seem reasonable, and see my question about whether the second insert is improving the coverage.
Yes, I saw those come through. This all seems reasonable to me; certainly using constants is better than not. Sorry, what question are you referring to? You mention 'second insert', so do you mean ~line 927 of the test? That code inserts 2 rows, checks for 2 audit rows, and then inserts 1 more, checking for one new audit row. If that's what you mean, then I agree the second single-row insert doesn't add a heck of a lot, but it also doesn't seem that negative either.
We have a server that runs an ETL to populate datasets. I noticed the dataset audit log was enormous, despite bulkLoad=true being set. In general, the QueryUpdateService (QUS) uses summary audit logging when bulkLoad=true. This PR makes DatasetUpdateService respect this behavior for delete and update; it was already doing that for insert and truncate.
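
A minimal sketch of the pattern being applied, using stand-in types rather than the real LabKey classes: when the caller has flagged the operation as a bulk load, skip per-row audit records and write a single summary entry instead.

```java
import java.util.List;
import java.util.Map;

class BulkLoadAuditSketch
{
    // Stand-in for the real data iterator / update context.
    interface Context
    {
        boolean isBulkLoad();
    }

    void auditDeletedRows(Context context, List<Map<String, Object>> rows)
    {
        if (context.isBulkLoad())
        {
            // Bulk load: one summary record, matching what insert and
            // truncate already did before this change.
            writeSummaryAuditEvent(rows.size());
            return;
        }

        // Normal path: one detailed audit record per affected row.
        for (Map<String, Object> row : rows)
            writeDetailedAuditEvent(row);
    }

    void writeSummaryAuditEvent(int rowCount) { /* single summary event */ }
    void writeDetailedAuditEvent(Map<String, Object> row) { /* per-row event */ }
}
```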
I bet it's pretty easy for ETLs to accumulate a lot of audit records in a folder without an admin noticing. Even if other ETL users have not brought this up, it would be easy for it to be happening on many servers.
An alternative strategy would be to make this code inspect configParameters for DetailedAuditLogDataIterator.AuditConfigs.AuditBehavior=NONE or something to that effect. This would give the developer direct control and change existing behavior less. I'm happy to switch to this or another option if you prefer.
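
A rough sketch of what that alternative might look like. AuditConfigs and AuditBehaviorType mirror the LabKey names mentioned above but are declared locally here so the example stands alone; the lookup logic is illustrative, not the actual implementation.

```java
import java.util.Map;

class AuditConfigSketch
{
    // Local stand-ins for DetailedAuditLogDataIterator.AuditConfigs and
    // the corresponding audit behavior enum.
    enum AuditConfigs { AuditBehavior }
    enum AuditBehaviorType { NONE, SUMMARY, DETAILED }

    // The service would consult the caller-supplied setting before writing
    // detailed audit records, leaving default behavior unchanged when no
    // parameter is passed.
    boolean skipDetailedAudit(Map<AuditConfigs, Object> configParameters)
    {
        return configParameters != null
                && configParameters.get(AuditConfigs.AuditBehavior) == AuditBehaviorType.NONE;
    }
}
```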