Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create worker to sync basefields #1259

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

hminsky2002
Copy link
Contributor

@hminsky2002 hminsky2002 commented Oct 16, 2024

This PR introduces a new worker to the API, which allows an administrator to synchronize BaseField data from a remote service instance to a local one. It heavily follows the patterns already laid out for the bulkUpload task/worker, creating and storing an entity (SyncBaseField) in the database to track the status of a synchronization job. Joyously, this functionality means we no longer need a seed file (see commit 2c8879b).

Closes #756

Copy link

codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 87.15084% with 23 lines in your changes missing coverage. Please review.

Project coverage is 87.31%. Comparing base (d7c27fd) to head (1d849e5).

Files with missing lines Patch % Lines
src/handlers/baseFieldsCopyTasksHandlers.ts 65.85% 13 Missing and 1 partial ⚠️
src/tasks/copyBaseFields.ts 91.22% 5 Missing ⚠️
...e/operations/baseFields/createOrUpdateBaseField.ts 90.00% 1 Missing ⚠️
...ns/baseFieldsCopyTasks/createBaseFieldsCopyTask.ts 88.88% 1 Missing ⚠️
...ions/baseFieldsCopyTasks/loadBaseFieldsCopyTask.ts 88.88% 1 Missing ⚠️
...ns/baseFieldsCopyTasks/updateBaseFieldsCopyTask.ts 90.90% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1259      +/-   ##
==========================================
- Coverage   87.34%   87.31%   -0.03%     
==========================================
  Files         185      195      +10     
  Lines        2387     2563     +176     
  Branches      321      343      +22     
==========================================
+ Hits         2085     2238     +153     
- Misses        276      298      +22     
- Partials       26       27       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hminsky2002 hminsky2002 force-pushed the 756-create-worker-to-sync-basefields branch 2 times, most recently from de7e177 to daba11e Compare October 17, 2024 04:30
@hminsky2002 hminsky2002 marked this pull request as ready for review October 17, 2024 04:48
@hminsky2002
Copy link
Contributor Author

hminsky2002 commented Oct 17, 2024

@slifty @jasonaowen finally got this (mostly) tested! I'll try and get to full coverage before merging but I've taken this out of draft status as I believe this is ready for review, with one caveat:

I think the entity name needs to be changed :/ I got very deep into this before considering that. It's not a particularly intuitive name, and it feels awkward to write out in documentation. I figured I'd wait for your reviews before changing. Ready and willing to refactor everything as soon as we decide on a better name for the entity/task.

@hminsky2002 hminsky2002 force-pushed the 756-create-worker-to-sync-basefields branch 2 times, most recently from 1f27e03 to 6254d60 Compare October 17, 2024 05:02
Copy link
Member

@slifty slifty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts on naming and a few little things

src/__tests__/syncBaseFields.int.test.ts Outdated Show resolved Hide resolved
src/database/migrations/0040-create-sync_basefields.sql Outdated Show resolved Hide resolved
src/database/migrations/0040-create-sync_basefields.sql Outdated Show resolved Hide resolved
src/database/migrations/0040-create-sync_basefields.sql Outdated Show resolved Hide resolved
Copy link
Contributor

@jasonaowen jasonaowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work, @hminsky2002!

src/database/migrations/0040-create-sync_basefields.sql Outdated Show resolved Hide resolved
src/database/migrations/0040-create-sync_basefields.sql Outdated Show resolved Hide resolved
src/tasks/__tests__/processSyncBaseFields.int.test.ts Outdated Show resolved Hide resolved
src/tasks/__tests__/processSyncBaseFields.int.test.ts Outdated Show resolved Hide resolved
@hminsky2002 hminsky2002 force-pushed the 756-create-worker-to-sync-basefields branch from 6254d60 to 9704bee Compare October 25, 2024 02:10
@hminsky2002 hminsky2002 force-pushed the 756-create-worker-to-sync-basefields branch 10 times, most recently from 4076491 to 4b20c24 Compare November 18, 2024 03:47
Copy link
Member

@slifty slifty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good progress; a fair bit of feedback. If it would help to talk through it let me know.

README.md Outdated Show resolved Hide resolved
src/database/queries/baseFieldsCopyTasks/updateById.sql Outdated Show resolved Hide resolved
src/tasks/__tests__/processBaseFieldsCopyTask.unit.test.ts Outdated Show resolved Hide resolved
return;
}

try {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I missing a test? Codecov is saying that all of these lines are covered... but I'm not seeing any tests that actually run anything.

src/tasks/processBaseFieldsCopyTask.ts Outdated Show resolved Hide resolved
src/tasks/processBaseFieldsCopyTask.ts Outdated Show resolved Hide resolved
}
resolve(parsedData);
} catch (err) {
reject(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will reject cause it to try again? Do we want it to try again if the format is unexpected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that on promise rejection, the copyBasefields function that calls the fetch intercepts the failure and updates the status of the task, which I believe means that it will not get called again (which I think is what we want, so long as it is logged!)

src/types/BaseField.ts Outdated Show resolved Hide resolved
src/types/BaseField.ts Outdated Show resolved Hide resolved
@hminsky2002 hminsky2002 force-pushed the 756-create-worker-to-sync-basefields branch 2 times, most recently from ecb1b38 to a2fd57a Compare December 2, 2024 01:12
@hminsky2002
Copy link
Contributor Author

@slifty I have rebased this PR on to #1346 to address the AddBulkUploadTask refactor, but I think this puppy is ready to go! I am a leeeetle unsure about my implementation of error logging for the fetchBaseFieldsFromRemote function in src/tasks/copyBaseFields, so would appreciate a look through there, but otherwise I believe all issues have been addressed

This commit adds an upsert query and operation for basefields, following
the pattern laid out for basefield localizations. We dont want the
synchronization to be destructive, so we update on shortcode.
This commit creates the new entity 'BaseFieldsCopyTask', along with the
necessary database operations to access and update them. BaseFieldsCopyTasks,
like BulkUploadTasks, are intended to serve as user-accessible entities that
log the status of a basefield copy job between a local and
remote pdc instance.
@hminsky2002 hminsky2002 force-pushed the 756-create-worker-to-sync-basefields branch from 37794a2 to 80bcf81 Compare December 2, 2024 16:08
@hminsky2002 hminsky2002 force-pushed the 756-create-worker-to-sync-basefields branch from 80bcf81 to 1d849e5 Compare December 2, 2024 21:37
Copy link
Member

@slifty slifty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some naming cleanup is still needed -- also some code legibility + processing simplification requests around the actual task logic itself.

The `BaseFieldsCopyTask` worker is designed to copy basefields from a remote to a local instance of the pdc service. An administrative user looking to seed their database can make a POST request to the `/tasks/baseFieldsCopy` route, specifying the remote url in the body of the form:

```json
{ "synchronizationUrl": "https://remote.pdc.instance" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment applies to many places -- synchronizationUrl isn't the right name for this attribute any longer. It's a copy, not a sync, but more importantly I think it's not a clear name even if it was copyUrl. This should actually be pdcApiUrl

@@ -180,7 +180,11 @@ Migrations should be named according to the following pattern: `####-{action}-{t

For example: `0001-create-users` or `0001-modify-users`

In `/src/databases/seeds` there is seed or starter data. The contents can be run manually to help developers get data in their databases. The scripts are not referenced by the software and are included for convenience. The migrations must run prior to using seed scripts.
The `BaseFieldsCopyTask` worker is designed to copy basefields from a remote to a local instance of the pdc service. An administrative user looking to seed their database can make a POST request to the `/tasks/baseFieldsCopy` route, specifying the remote url in the body of the form:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also applies to many places: we made the chance to use the verb form in actual task (which I think is correct!); that should be mirrored here: CopyBaseFieldsTask.

export * from './BulkUploadTask';
export * from './Bundle';
export * from './Changemaker';
export * from './ChangemakerProposal';
export * from './CheckResult';
export * from './CopyBaseFieldsJobPayload';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't your issue / you're matching the existing pattern, but it seems odd to call things Tasks almost everywhere but then Job sometimes. Again, not something to change here but might be a tiny cleanup task (or job?) later.

updateValues: Partial<InternallyWritableBaseFieldsCopyTask>,
): Promise<BaseFieldsCopyTask> => {
const { statusUpdatedAt, status } = updateValues;
const defaultValues = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good news: this is no longer how we do it! Now that we know undefined is a valid thing to pass to a query, we can skip passing default values at all.

Also, and this isn't relevant since you're gonna trash this block, fileSize is copypasta / isn't relevant here.

id: number,
updateValues: Partial<InternallyWritableBaseFieldsCopyTask>,
): Promise<BaseFieldsCopyTask> => {
const { statusUpdatedAt, status } = updateValues;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

statusUpdatedAt should not be something that is passed to this function, or even passed to SQL, but rather is updated by SQL.

To do this I believe you would use a function and a trigger.

The function should be defined in the same migration that defines the copy_base_fields_tasks table (rather than initialization). Don't necesarily take this literally since it came from GPT, but it would be something like:

CREATE OR REPLACE FUNCTION reconcile_status_updated_at()
RETURNS TRIGGER AS $$
BEGIN
  IF NEW.status IS DISTINCT FROM OLD.status THEN
    NEW.status_updated_at := NOW();
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

And then you would add the trigger:

CREATE TRIGGER update_status_updated_at
BEFORE UPDATE ON your_table_name
FOR EACH ROW
EXECUTE FUNCTION update_status_timestamp();

logger: Logger,
): Promise<BaseField[]> =>
new Promise((resolve, reject) => {
https
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use https for this -- having to handle the pipe directly is madness!

Just use fetch which is in node 20 and is fully stable in node 22. Per spec fetch is already a promise so you don't have to worry about resolve / reject, which means you can collapse all of this into:


// These two lines need to be wrapped in individual try / catches still
const response = await fetch(`${synchronizationUrl}/baseFields`);
const data = await response.json();

// Validate the stuff
// return it

let processBaseFieldsCopyTaskFailed = false;

try {
await updateBaseFieldsCopyTask(baseFieldsCopyTask.id, {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this throws an error? That error would be consumed by the fetch's catch.

You should put this update in outside of the try.

Generally speaking, be really concise about what goes in a given try / catch pair. The more you put in, the more complex and thoughtful your catch needs to be.

});
await Promise.all(
Object.entries(baseField.localizations).map(
async (keyAndBaseFieldLocalization) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be more legible as ([language, baseFieldLocalization]) => ... which destructures in the function definition.

Once you do that I think you'll find the inner function can be more concise.


try {
await Promise.all(
remoteBaseFields.map<Promise<void>>(async (baseField) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is pretty dense, let's move it to a named helper function copyBaseField

const copyBaseField = async (targetBaseField:BaseField) => {
   ...
}

);
} catch (err) {
helpers.logger.info('Basefields copy has failed', { err });
processBaseFieldsCopyTaskFailed = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We renamed the context so this variable should be copyBaseFieldsTaskFailed

It could probably be shortened to just taskFailed, since there's only one task context

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a worker that synchronizes with an instance of the PDC for base fields
3 participants