Blocking ByteDance for Hyper-Aggressive / Malicious ByteSpider bot #48

Open
kkarhan opened this issue Dec 19, 2023 · 3 comments

kkarhan commented Dec 19, 2023

Apparently ByteDance is generating huge amounts of traffic with its ByteSpider crawler.

Whilst crawlers on their own aren't a problem per se, the way it completely disregards the robots.txt file, unlike any bona fide crawler, makes it basically malicious, and it should be considered a DDoS attack.

The Internet Archive also doesn't honour the robots.txt file, but its behaviour can be considered 'negligible interference', as it mostly archives sites manually based on user requests to do so. ByteSpider's behaviour cannot: it basically siphons absurd amounts of data from sites.

Sadly, ClownFlare (aka Cloudflare) and AWS are somewhat complicit, so blocking the Bytespider user agent server-side is a must as well, similar to blocking GPTBot. Whilst recent reports indicate that ByteSpider now honours robots.txt, I wouldn't count on that being true...
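For reference, a minimal sketch of what a robots.txt rule for this crawler would express, checked with Python's standard `urllib.robotparser`. The rule set, the helper name `allowed` and the `example.org` URL are assumptions for illustration, not something taken from this repo. Keep in mind that robots.txt is purely advisory: it only has an effect if the crawler chooses to obey it, which is exactly why server-side blocking is still needed.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (assumption): deny Bytespider, allow everyone else.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots.txt permits user_agent to fetch url.

    This only reports what the file says. It enforces nothing;
    a crawler that ignores robots.txt is not stopped by it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("Bytespider", "https://example.org/page"))
    print(allowed("SomeOtherBot", "https://example.org/page"))
```

A well-behaved crawler would run a check like this before every request; the complaint above is that ByteSpider apparently does not.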

Adding the allocations of AS138699 and AS396986, which it uses, to the ASN, IPv4 & IPv6 blocklists should also be considered...

Needless to say, hyper-aggressive bots are a problem that needs resolution.

@kkarhan kkarhan added good first issue Good for newcomers request for addition request adding an entry to a list WIP Work In Progress labels Dec 19, 2023
@kkarhan kkarhan self-assigned this Dec 19, 2023

kkarhan commented Dec 19, 2023

I guess adding an .htaccess file that bans these bots among others may be useful:

Mb2345Browser
LieBaoFast
zh-CN
MicroMessenger
zh_CN
Kinza
OPPO A33
Aspeigel
PetalBot
Baiduspider
Sogou web spider
Yisouspider
Linespider
Yeti
coccocbot
Mobile/11A465

Note that even some western companies do that shite:

facebookexternalhit
Twitterbot
Applebot
Qwantify
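As a rough sketch of how the first user-agent list could be applied server-side, here is a case-insensitive substring check in Python. This is not the proposed .htaccess file, just the same matching idea in a form that is easy to test. The function name `is_blocked_user_agent` and the sample user-agent strings are made up for illustration, and `Bytespider` is added to the list because it is the subject of this issue.

```python
# Tokens from the list in this comment, plus Bytespider (assumption: it
# belongs on the list too, since it is what this issue is about).
BLOCKED_TOKENS = (
    "Bytespider",
    "Mb2345Browser",
    "LieBaoFast",
    "zh-CN",
    "MicroMessenger",
    "zh_CN",
    "Kinza",
    "OPPO A33",
    "Aspeigel",
    "PetalBot",
    "Baiduspider",
    "Sogou web spider",
    "Yisouspider",
    "Linespider",
    "Yeti",
    "coccocbot",
    "Mobile/11A465",
)


def is_blocked_user_agent(user_agent, tokens=BLOCKED_TOKENS):
    """Return True if any blocked token occurs in the User-Agent header.

    Matching is a plain case-insensitive substring test, so short tokens
    such as "Yeti" or "zh-CN" can also hit ordinary browsers.
    """
    ua = user_agent.lower()
    return any(token.lower() in ua for token in tokens)


if __name__ == "__main__":
    # Hypothetical header values, for demonstration only.
    print(is_blocked_user_agent("ExampleAgent/1.0 (compatible; Bytespider)"))
    print(is_blocked_user_agent("Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"))
```

The second list (facebookexternalhit, Twitterbot, Applebot, Qwantify) could be appended the same way, but those fetchers also generate link previews, so blocking them has visible side effects.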


kkarhan commented Dec 19, 2023

Also, apparently Huawei Cloud Hong Kong is a prime source of said crawlers...

Blocking AS136907 as well as 159.138.128.0/19, 114.119.160.0/21 and 114.119.176.0/20 should also help.
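To sanity-check whether a given client address falls inside these prefixes, something like the following works with Python's standard `ipaddress` module. It is a minimal sketch: the function name `is_blocked_ip` is hypothetical, only the three prefixes from this comment are included, and resolving a whole ASN such as AS136907 to its current allocations is out of scope here.

```python
import ipaddress

# The three IPv4 prefixes named in this comment.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "159.138.128.0/19",
        "114.119.160.0/21",
        "114.119.176.0/20",
    )
]


def is_blocked_ip(address, networks=BLOCKED_NETWORKS):
    """Return True if address lies inside any of the blocked networks.

    Raises ValueError if address is not a valid IPv4 or IPv6 address.
    """
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


if __name__ == "__main__":
    # 192.0.2.1 is from the documentation range and should not match.
    for candidate in ("159.138.130.1", "114.119.180.5", "192.0.2.1"):
        print(candidate, is_blocked_ip(candidate))
```

Note that the /21 and the /20 are not adjacent: 114.119.168.0 to 114.119.175.255 lies between them and is not covered by either prefix.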

@kkarhan kkarhan added the help wanted Extra attention is needed label Dec 19, 2023

kkarhan commented Dec 19, 2023

Sadly, this is still ongoing:

110.240.0.0/12
111.224.0.0/14
36.110.162.63

Someone may be interested in using 7G "Firewall" or nG-SetEnvIf instead...
