Unable to crawl heavy Javascript based website #1148

deathofabat · 2025-02-20T02:59:46Z

Any tips on crawling the websites below?

https://www.samsung.com/us/smartphones/galaxy-s25-ultra/compare/
https://www.apple.com/macbook-pro/compare/

So far, I'm only able to get gibberish content in the Jina Reader API response.

nomagick · 2025-02-24T04:00:10Z

Hi.
For the two pages you mention, JavaScript is not the problem.
The problem is the semantic structure of the two pages: they are built to repeat similar content in a row.

At this point, there's not much Reader can do about it.
Maybe try the x-return-format: pageshot header to get a graphical screenshot of the page, then present the screenshot to an LLM along with the text content.
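
For anyone trying this, here is a minimal sketch of sending that header from Python. The `https://r.jina.ai/` endpoint prefix and the helper names `build_reader_request` and `fetch_pageshot` are assumptions for illustration and are not stated in this thread. Only the `x-return-format: pageshot` header comes from the suggestion above. The shape of the response body isn't described here either, so the sketch returns raw bytes.

```python
import urllib.request

# Assumption: Reader is reached by prefixing the target URL with this endpoint.
READER_ENDPOINT = "https://r.jina.ai/"


def build_reader_request(target_url: str,
                         return_format: str = "pageshot") -> urllib.request.Request:
    """Build (but do not send) a Reader request asking for the given return format."""
    return urllib.request.Request(
        READER_ENDPOINT + target_url,
        headers={"x-return-format": return_format},
    )


def fetch_pageshot(target_url: str, timeout: float = 60.0) -> bytes:
    """Send the request and return the raw response body.

    The body format is not specified in this thread, so it is left uninterpreted.
    """
    request = build_reader_request(target_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Usage (performs a network call, so it is left commented out):
# body = fetch_pageshot("https://www.apple.com/macbook-pro/compare/")
```

The request is built separately from the network call so the URL and header can be inspected before anything is sent.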

deathofabat · 2025-03-18T15:39:42Z

This is still not working for me.
Is there any other guidance on scraping content from these types of webpages (which are JS-heavy or make heavy use of AJAX calls)?
