-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create worker to sync basefields #1259
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1259 +/- ##
==========================================
- Coverage 87.34% 87.31% -0.03%
==========================================
Files 185 195 +10
Lines 2387 2563 +176
Branches 321 343 +22
==========================================
+ Hits 2085 2238 +153
- Misses 276 298 +22
- Partials 26 27 +1 ☔ View full report in Codecov by Sentry. |
de7e177
to
daba11e
Compare
@slifty @jasonaowen finally got this (mostly) tested! I'll try and get to full coverage before merging but I've taken this out of draft status as I believe this is ready for review, with one caveat: I think the entity name needs to be changed :/ I got very deep into this before considering that. It's not a particularly intuitive name, and it feels awkward to write out in documentation. I figured I'd wait for your reviews before changing. Ready and willing to refactor everything as soon as we decide on a better name for the entity/task. |
1f27e03
to
6254d60
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts on naming and a few little things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work, @hminsky2002!
6254d60
to
9704bee
Compare
4076491
to
4b20c24
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good progress; a fair bit of feedback. If it would help to talk through it let me know.
return; | ||
} | ||
|
||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I missing a test? Codecov is saying that all of these lines are covered... but I'm not seeing any tests that actually run anything.
} | ||
resolve(parsedData); | ||
} catch (err) { | ||
reject( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will reject
cause it to try again? Do we want it to try again if the format is unexpected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that on promise rejection, the copyBasefields function that calls the fetch intercepts the failure and updates the status of the task, which I believe means that it will not get called again (which I think is what we want, so long as it is logged!)
ecb1b38
to
a2fd57a
Compare
a2fd57a
to
37794a2
Compare
@slifty I have rebased this PR on to #1346 to address the AddBulkUploadTask refactor, but I think this puppy is ready to go! I am a leeeetle unsure about my implementation of error logging for the |
This commit adds an upsert query and operation for basefields, following the pattern laid out for basefield localizations. We dont want the synchronization to be destructive, so we update on shortcode.
This commit creates the new entity 'BaseFieldsCopyTask', along with the necessary database operations to access and update them. BaseFieldsCopyTasks, like BulkUploadTasks, are intended to serve as user-accessible entities that log the status of a basefield copy job between a local and remote pdc instance.
37794a2
to
80bcf81
Compare
80bcf81
to
1d849e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some naming cleanup is still needed -- also some code legibility + processing simplification requests around the actual task logic itself.
The `BaseFieldsCopyTask` worker is designed to copy basefields from a remote to a local instance of the pdc service. An administrative user looking to seed their database can make a POST request to the `/tasks/baseFieldsCopy` route, specifying the remote url in the body of the form: | ||
|
||
```json | ||
{ "synchronizationUrl": "https://remote.pdc.instance" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment applies to many places -- synchronizationUrl
isn't the right name for this attribute any longer. It's a copy, not a sync, but more importantly I think it's not a clear name even if it was copyUrl
. This should actually be pdcApiUrl
@@ -180,7 +180,11 @@ Migrations should be named according to the following pattern: `####-{action}-{t | |||
|
|||
For example: `0001-create-users` or `0001-modify-users` | |||
|
|||
In `/src/databases/seeds` there is seed or starter data. The contents can be run manually to help developers get data in their databases. The scripts are not referenced by the software and are included for convenience. The migrations must run prior to using seed scripts. | |||
The `BaseFieldsCopyTask` worker is designed to copy basefields from a remote to a local instance of the pdc service. An administrative user looking to seed their database can make a POST request to the `/tasks/baseFieldsCopy` route, specifying the remote url in the body of the form: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also applies to many places: we made the chance to use the verb form in actual task (which I think is correct!); that should be mirrored here: CopyBaseFieldsTask
.
export * from './BulkUploadTask'; | ||
export * from './Bundle'; | ||
export * from './Changemaker'; | ||
export * from './ChangemakerProposal'; | ||
export * from './CheckResult'; | ||
export * from './CopyBaseFieldsJobPayload'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't your issue / you're matching the existing pattern, but it seems odd to call things Tasks almost everywhere but then Job
sometimes. Again, not something to change here but might be a tiny cleanup task (or job?) later.
updateValues: Partial<InternallyWritableBaseFieldsCopyTask>, | ||
): Promise<BaseFieldsCopyTask> => { | ||
const { statusUpdatedAt, status } = updateValues; | ||
const defaultValues = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good news: this is no longer how we do it! Now that we know undefined
is a valid thing to pass to a query, we can skip passing default values at all.
Also, and this isn't relevant since you're gonna trash this block, fileSize
is copypasta / isn't relevant here.
id: number, | ||
updateValues: Partial<InternallyWritableBaseFieldsCopyTask>, | ||
): Promise<BaseFieldsCopyTask> => { | ||
const { statusUpdatedAt, status } = updateValues; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
statusUpdatedAt
should not be something that is passed to this function, or even passed to SQL, but rather is updated by SQL.
To do this I believe you would use a function and a trigger.
The function should be defined in the same migration that defines the copy_base_fields_tasks
table (rather than initialization). Don't necesarily take this literally since it came from GPT, but it would be something like:
CREATE OR REPLACE FUNCTION reconcile_status_updated_at()
RETURNS TRIGGER AS $$
BEGIN
IF NEW.status IS DISTINCT FROM OLD.status THEN
NEW.status_updated_at := NOW();
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
And then you would add the trigger:
CREATE TRIGGER update_status_updated_at
BEFORE UPDATE ON your_table_name
FOR EACH ROW
EXECUTE FUNCTION update_status_timestamp();
logger: Logger, | ||
): Promise<BaseField[]> => | ||
new Promise((resolve, reject) => { | ||
https |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use https
for this -- having to handle the pipe directly is madness!
Just use fetch
which is in node 20 and is fully stable in node 22. Per spec fetch is already a promise so you don't have to worry about resolve / reject, which means you can collapse all of this into:
// These two lines need to be wrapped in individual try / catches still
const response = await fetch(`${synchronizationUrl}/baseFields`);
const data = await response.json();
// Validate the stuff
// return it
let processBaseFieldsCopyTaskFailed = false; | ||
|
||
try { | ||
await updateBaseFieldsCopyTask(baseFieldsCopyTask.id, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if this throws an error? That error would be consumed by the fetch's catch.
You should put this update in outside of the try.
Generally speaking, be really concise about what goes in a given try / catch
pair. The more you put in, the more complex and thoughtful your catch needs to be.
}); | ||
await Promise.all( | ||
Object.entries(baseField.localizations).map( | ||
async (keyAndBaseFieldLocalization) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be more legible as ([language, baseFieldLocalization]) => ...
which destructures in the function definition.
Once you do that I think you'll find the inner function can be more concise.
|
||
try { | ||
await Promise.all( | ||
remoteBaseFields.map<Promise<void>>(async (baseField) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is pretty dense, let's move it to a named helper function copyBaseField
const copyBaseField = async (targetBaseField:BaseField) => {
...
}
); | ||
} catch (err) { | ||
helpers.logger.info('Basefields copy has failed', { err }); | ||
processBaseFieldsCopyTaskFailed = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We renamed the context so this variable should be copyBaseFieldsTaskFailed
It could probably be shortened to just taskFailed
, since there's only one task context
This PR introduces a new worker to the API, which allows an administrator to synchronize BaseField data from a remote service instance to a local one. It heavily follows the patterns already laid out for the bulkUpload task/worker, creating and storing an entity (
SyncBaseField
) in the database to track the status of a synchronization job. Joyously, this functionality means we no longer need a seed file (see commit 2c8879b).Closes #756