Hi.
For the two pages you mention, JavaScript is not the problem.
The problem is the semantic content of the two pages: each is structured to repeat similar content in a row.
At this point, there's not much Reader can do about it.
Maybe try setting the x-return-format: pageshot header to get a graphical screenshot of the page, then present the screenshot to the LLM along with the text content.
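As a rough sketch of that suggestion, the request could look something like the following in Python. The https://r.jina.ai/ prefix is assumed to be the Reader endpoint, and the function names and output path are made up for illustration. I have not confirmed whether the response body is the image itself or a link to it, so the sketch just saves whatever comes back.

```python
import urllib.request

# Assumption: Reader is reached by prefixing the target URL with this base.
READER_ENDPOINT = "https://r.jina.ai/"


def build_pageshot_request(target_url: str) -> urllib.request.Request:
    """Build a Reader request that asks for a page screenshot instead of text."""
    return urllib.request.Request(
        READER_ENDPOINT + target_url,
        headers={"x-return-format": "pageshot"},
    )


def fetch_pageshot(target_url: str, out_path: str, timeout: float = 60.0) -> None:
    """Send the request and write the raw response body to out_path.

    The body is saved unmodified; inspect it before assuming it is a PNG.
    """
    request = build_pageshot_request(target_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        with open(out_path, "wb") as fh:
            fh.write(response.read())
```

Calling fetch_pageshot("https://www.apple.com/macbook-pro/compare/", "compare_pageshot.bin") would then save the response for one of the pages in question, which can be checked and passed to the LLM together with the text output.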
This is still not working for me.
Is there any other guidance on scraping content from these types of web pages (ones that are heavily JavaScript-based or make heavy use of AJAX calls)?
Any tips on crawling the websites below?
https://www.samsung.com/us/smartphones/galaxy-s25-ultra/compare/
https://www.apple.com/macbook-pro/compare/
So far, I'm only able to get gibberish content in the Jina Reader API response.