Update pages and requests schemas on staging to match crawl dataset #18
Comments
The transformation of the `pages` and `requests` schemas written by the agent to BQ to a new one matching the `all` dataset schema.

@pmeenan let's update the wptagent to match.
HTTPArchive/dataform#33 to sync the pipeline with the adjustments.
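For context, a minimal sketch of what a staging table with native JSON columns might look like; the table name, columns, and partitioning here are assumptions for illustration, not the actual schema:

```sql
-- Hypothetical sketch only: the general shape of a staging table that
-- stores agent output in native JSON columns instead of serialized STRING.
-- All table and column names are assumptions, not the real schema.
CREATE TABLE IF NOT EXISTS `crawl_staging.pages_new`
(
  date DATE,
  client STRING,
  page STRING,
  payload JSON,          -- previously a STRING column holding serialized JSON
  custom_metrics JSON
)
PARTITION BY date
CLUSTER BY client;
```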
I hope it's not going to be needed, but in case there are issues with ingesting data into JSON columns we can fall back to parsing STRING data within the pipeline as we do currently.
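As a rough sketch of that fallback, assuming the agent output lands in a STRING column named `payload` (all names hypothetical):

```sql
-- Hypothetical fallback sketch: keep the agent output as STRING and
-- convert it during the transformation step instead of at ingestion.
-- Table and column names are assumptions.
SELECT
  date,
  client,
  page,
  SAFE.PARSE_JSON(payload) AS payload  -- NULL instead of an error on bad JSON
FROM `crawl_staging.pages`;
```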
@pmeenan from the data in crawl_staging it looks like … and … are still present. True, or are these rows outdated?
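To double-check which rows still carry a removed field, a quick count like the following could work (a minimal sketch; the table, the `payload` column, and the `$._detected` key are assumptions for illustration, not necessarily the fields discussed above):

```sql
-- Hypothetical check sketch: count how many rows per crawl date still
-- carry a given key in the payload. All names are assumed.
SELECT
  date,
  COUNTIF(JSON_QUERY(payload, '$._detected') IS NOT NULL) AS rows_with_field,
  COUNT(*) AS total_rows
FROM `crawl_staging.pages`
GROUP BY date
ORDER BY date DESC;
```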
Some of these fields were removed in #15. How should we proceed?
Were they present in all the records? The first few were before I removed the fields, but the last 1-2 should have most of them removed.
Yeah, if sorted by …
The older data schema is being reprocessed using these queries:
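The queries themselves aren't reproduced in this thread. Purely as an illustration of the kind of backfill involved, assuming the legacy rows hold serialized JSON strings (all names hypothetical):

```sql
-- Hypothetical reprocessing sketch: rewrite older STRING-based rows into
-- the new JSON schema. Table and column names are assumptions; the real
-- queries are linked from the comment above but not reproduced here.
CREATE OR REPLACE TABLE `crawl_staging.pages` AS
SELECT
  date,
  client,
  page,
  SAFE.PARSE_JSON(payload) AS payload,
  SAFE.PARSE_JSON(custom_metrics) AS custom_metrics
FROM `crawl_staging.pages_legacy`;
```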
After we promote these new schemas to be the new default, we need to update agent processing.
We should be able to just do `SELECT *` when copying data from `crawl_staging` to `crawl` in the `crawl_complete` pipeline.
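A minimal sketch of that copy, using the dataset names from the thread (the WHERE clause and its parameter are assumptions about how the pipeline scopes a single crawl):

```sql
-- Sketch of the straight copy described above. Dataset and table names
-- come from the thread; the date filter and parameter are assumed.
INSERT INTO `crawl.pages`
SELECT *
FROM `crawl_staging.pages`
WHERE date = @crawl_date;  -- assumed parameter for the crawl being promoted
```

This only works once the staging and production schemas match exactly, which is the point of promoting the new schemas first.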