Does Jina.ai scrape the websites anonymously or non-anonymously? #1150

Open
deathofabat opened this issue Feb 24, 2025 · 1 comment

@deathofabat

Hi Team,

I'd like to understand whether Jina.ai scrapes websites anonymously or non-anonymously, for a use case at my company.
If we have legal approval from website owners to scrape their sites, does Jina.ai announce who it is scraping on behalf of?

@nomagick
Member

Hi @deathofabat.

Reader scrapes websites using a headless Chrome browser, with the corresponding Chrome browser UA.

You can customize this UA, though, using the x-user-agent header.
In addition, we recently added an x-robots-txt option that checks the site's robots.txt and fails the request when scraping is disallowed, ensuring that scraping the page is not explicitly prohibited by the site owner.
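
For illustration, here is a minimal Python sketch of how the two headers might be sent together. Only the header names x-user-agent and x-robots-txt come from the comment above. The https://r.jina.ai/ endpoint prefix, the bot name ExampleCorpBot, the helper names, and the idea that x-robots-txt takes the bot name to check against robots.txt are all assumptions, so please verify them against the Reader documentation before relying on this.

```python
import urllib.request

# Assumption: Reader is called by prefixing the target URL with this endpoint.
READER_ENDPOINT = "https://r.jina.ai/"


def build_reader_headers(user_agent, robots_agent=None):
    """Build the request headers that identify who is scraping.

    user_agent   -- UA string the target site should see (x-user-agent).
    robots_agent -- hypothetical: bot name to check against robots.txt
                    (x-robots-txt); the header is omitted when None.
    """
    headers = {"x-user-agent": user_agent}
    if robots_agent is not None:
        headers["x-robots-txt"] = robots_agent
    return headers


def fetch_via_reader(target_url, headers, timeout=30):
    """Send the request through Reader and return the response body as text."""
    request = urllib.request.Request(READER_ENDPOINT + target_url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical identity for a company that has the site owner's approval.
headers = build_reader_headers(
    user_agent="ExampleCorpBot/1.0 (+https://example.com/bot-info)",
    robots_agent="ExampleCorpBot",
)

# Uncomment to perform the actual network call:
# print(fetch_via_reader("https://example.com/", headers))
```

A descriptive UA with a contact URL is one common way to announce on whose behalf the scraping happens. Whether the target site actually sees that string unchanged is something to confirm with a test request.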
