📜 add history to JSON objects #53

jadudm · 2024-12-28T22:08:09Z

As objects pass through the queued pipeline, we should build up a list of processes that touched the object.

For example, when fetch grabs a page, we create both a .raw and a .json object in S3. The JSON object should have a field, history (or similar), that contains a list of values. To start, it would contain fetch. After we walk the page, it should contain fetch,walk; after extraction, fetch,walk,extract; and so on.
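A minimal sketch of how the list might accumulate, assuming the metadata is a plain JSON object and the field is named `history`. The helper `append_history` and the sample `key` value are hypothetical, not part of the existing code:

```python
import json


def append_history(obj: dict, stage: str) -> dict:
    """Return a copy of obj with `stage` appended to its history list."""
    # Copy the existing list (or start a new one) so the input is not mutated.
    history = list(obj.get("history", []))
    history.append(stage)
    return {**obj, "history": history}


# Hypothetical metadata for one fetched page.
doc = {"key": "example.gov/index.html"}

# Each service would append its own name once it finishes with the object.
for stage in ("fetch", "walk", "extract"):
    doc = append_history(doc, stage)

print(json.dumps(doc))
```

Appending only once a stage has finished would let the list double as a completion record, which ties into the question of knowing when a process is "done."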

validate can then tell whether or not objects have gone through the entire pipeline.
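One way such a check could look, as a sketch only. The stage list below is an assumption (the real pipeline may have more stages), and `is_complete` and `missing_stages` are hypothetical helper names:

```python
# Assumed stage order; the actual pipeline may include further stages.
EXPECTED_PIPELINE = ["fetch", "walk", "extract"]


def is_complete(obj: dict, expected=EXPECTED_PIPELINE) -> bool:
    """True when the object's history matches the expected stages, in order."""
    return obj.get("history", []) == list(expected)


def missing_stages(obj: dict, expected=EXPECTED_PIPELINE) -> list:
    """List the expected stages that have not touched the object yet."""
    seen = set(obj.get("history", []))
    return [stage for stage in expected if stage not in seen]


partial = {"history": ["fetch", "walk"]}
print(is_complete(partial), missing_stages(partial))
```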

Part of this, however, involves us knowing when a process is "done."

Food for thought

Do we want to store this with the object, or in a work database? Decoupling it could be more trouble, but it would let us analyze history more quickly and easily than retrieving every object to inspect it.
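To make the work-database option concrete, here is a rough sketch using SQLite in memory. The table name, columns, and helper functions are all assumptions for illustration; nothing here reflects an existing schema:

```python
import sqlite3

# In-memory database for illustration; a real work database would be persistent.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE history (
        object_key TEXT NOT NULL,
        stage      TEXT NOT NULL,
        PRIMARY KEY (object_key, stage)
    )
    """
)


def record(conn, object_key, stage):
    """Note that `stage` has finished with `object_key` (idempotent)."""
    conn.execute(
        "INSERT OR IGNORE INTO history (object_key, stage) VALUES (?, ?)",
        (object_key, stage),
    )


def incomplete(conn, expected):
    """Return keys of objects that have not passed through every expected stage."""
    placeholders = ",".join("?" for _ in expected)
    rows = conn.execute(
        f"""
        SELECT object_key FROM history
        GROUP BY object_key
        HAVING SUM(CASE WHEN stage IN ({placeholders}) THEN 1 ELSE 0 END) < ?
        ORDER BY object_key
        """,
        (*expected, len(expected)),
    ).fetchall()
    return [row[0] for row in rows]


for stage in ("fetch", "walk", "extract"):
    record(conn, "a.json", stage)
record(conn, "b.json", "fetch")

print(incomplete(conn, ["fetch", "walk", "extract"]))
```

The trade-off sketched here is that a single query answers "which objects are unfinished?" without reading anything from S3, at the cost of keeping the database in sync with the objects.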

jadudm added this to jemison Dec 28, 2024

jadudm converted this from a draft issue Dec 28, 2024

jadudm changed the title ~~add history to JSON objects~~ 📜 add history to JSON objects Dec 28, 2024
