Thanks for creating such a great project!
I ran into a bug parsing microdata content where `itemprop` contains multiple properties, as in these examples, and thought I'd share what I found:
```html
<meta data-rh="true" property="article:published" itemprop="datePublished dateCreated" content="2019-07-21T09:00:06.000Z"/>

<span itemProp="publisher copyrightHolder provider sourceOrganization" itemscope="" itemType="http://schema.org/NewsMediaOrganization" itemID="https://www.nytimes.com">

<figure itemprop="associatedMedia image" itemscope itemtype="http://schema.org/ImageObject" data-component="image" class="element element-image img--landscape fig--narrow-caption fig--has-shares " data-media-id="f82028d62b1edd7417d7d3773c4abf0d4fa86174" id="img-3">
  <meta itemprop="url" content="https://i.guim.co.uk/img/media/f82028d62b1edd7417d7d3773c4abf0d4fa86174/0_272_6435_3861/master/6435.jpg?width=700&quality=85&auto=format&fit=max&s=016df6a3f33eabe3cbca39eb389a60fb">
</figure>
```
Markup like this is parsed correctly by Google's Structured Data Testing Tool, but web-auto-extractor does not currently split the `itemprop` value on spaces, so a string such as `datePublished dateCreated` ends up as a single property key.
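For reference, the HTML microdata specification defines the `itemprop` value as an unordered set of unique space-separated tokens, so each whitespace-separated name is its own property. A minimal sketch of that tokenization is below; `splitItemprop` is a hypothetical helper name, not part of web-auto-extractor, and it assumes the ASCII whitespace characters (space, tab, LF, FF, CR) that HTML uses for token lists.

```javascript
// Hypothetical helper (not a web-auto-extractor API): split an itemprop
// attribute value into individual property names.
// Tokens are separated by ASCII whitespace; empty tokens are dropped and
// duplicates removed, keeping the first-seen order.
function splitItemprop(value) {
  const tokens = value.split(/[ \t\n\f\r]+/).filter(token => token.length > 0);
  return [...new Set(tokens)];
}

console.log(splitItemprop('datePublished dateCreated'));
// → [ 'datePublished', 'dateCreated' ]
console.log(splitItemprop('  associatedMedia   image '));
// → [ 'associatedMedia', 'image' ]
```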
I worked around this in a project that uses web-auto-extractor by post-processing the extracted data:
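```javascript
// Post-process web-auto-extractor output so that a microdata key holding
// several space-separated itemprop names (e.g. "datePublished dateCreated")
// is copied to one key per property name.
const __transformStructuredData = (structuredData) => {
  const result = structuredData // note: this mutates the input in place
  Object.keys(result.microdata).forEach(schema => {
    result.microdata[schema].forEach(object => {
      Object.keys(object).forEach(key => {
        if (/\s/.test(key)) {
          // Split on any run of whitespace and skip empty tokens,
          // so "a  b" or " a b" do not produce an empty-string key
          key.trim().split(/\s+/).filter(Boolean).forEach(newKey => {
            object[newKey] = object[key]
          })
          delete object[key]
        }
      })
    })
  })
  return result
}
```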
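The workaround above only rewrites keys one level deep inside each item. If nested items (for example an object produced by an `itemscope` element such as the `publisher` span above) also carry multi-token keys, a recursive pass could cover them. This is a minimal sketch under assumptions: it assumes the microdata output consists of plain objects, arrays, and strings; `expandItempropKeys` is a hypothetical name, not a web-auto-extractor API; it returns a new structure instead of mutating its input; and where a property name already has a value it collects both into an array rather than overwriting, which is a choice of this sketch, not library behavior.

```javascript
// Hypothetical helper (not part of web-auto-extractor): recursively expand
// keys that hold several whitespace-separated itemprop names, including
// keys inside nested items. Assumes values are strings, plain objects,
// or arrays of these.
function expandItempropKeys(value) {
  if (Array.isArray(value)) {
    return value.map(expandItempropKeys);
  }
  if (value === null || typeof value !== 'object') {
    return value; // strings and other primitives pass through unchanged
  }
  const out = {};
  for (const [key, child] of Object.entries(value)) {
    const expanded = expandItempropKeys(child);
    const names = key.split(/[ \t\n\f\r]+/).filter(name => name.length > 0);
    // Keep a whitespace-only or empty key as-is rather than dropping its value
    for (const name of names.length > 0 ? names : [key]) {
      if (Object.prototype.hasOwnProperty.call(out, name)) {
        // Design choice of this sketch: keep both values instead of overwriting
        out[name] = [].concat(out[name], expanded);
      } else {
        out[name] = expanded; // names from one key share the same value object
      }
    }
  }
  return out;
}

const sample = {
  NewsArticle: [
    {
      'datePublished dateCreated': '2019-07-21T09:00:06.000Z',
      image: 'a.jpg',
      'associatedMedia image': 'b.jpg',
    },
  ],
};
console.log(JSON.stringify(expandItempropKeys(sample)));
// → {"NewsArticle":[{"datePublished":"2019-07-21T09:00:06.000Z","dateCreated":"2019-07-21T09:00:06.000Z","image":["a.jpg","b.jpg"],"associatedMedia":"b.jpg"}]}
```

To apply it to extractor output, one could call `expandItempropKeys(result.microdata)`, assuming `microdata` is an object keyed by schema type as in the workaround above.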
I'm aware there are some other open PRs related to whitespace trimming.
If an enhancement like this appeals, I'd be happy to raise a PR.