set crawl time for each URL in a list of URLs #1181

Open
ayushkr12 opened this issue Feb 15, 2025 · 3 comments
Labels
Type: Enhancement Most issues will probably ask for additions or changes.

Comments

@ayushkr12

ayushkr12 commented Feb 15, 2025

P.S. - I originally asked this question in Discussions but didn't get any response, so I'm assuming katana doesn't have this feature implemented yet.

I have a file named sites.txt like this:

https://google.com
https://example.com

I want to scan them using katana; however, the default -l flag won't work for me because I need to set a crawl duration for each URL using the -ct flag. I tried the following:

cat sites.txt | while read URL; do katana -u $URL -ct 10s; done

Output

$ cat foobar | while read URL; do katana -u $URL -ct 10s; done

   __        __                
  / /_____ _/ /____ ____  ___ _
 /  '_/ _  / __/ _  / _ \/ _  /
/_/\_\\_,_/\__/\_,_/_//_/\_,_/							 

		projectdiscovery.io

[INF] Current katana version v1.1.0 (latest)
[INF] Started standard crawling for => https://example.com
[INF] Started standard crawling for => https://google.com

As you can see from the above output, katana starts to crawl both sites immediately instead of waiting for the first to finish or applying any limits. This can cause several issues, such as:

  1. If there are many sites in the file, say 1000, my system will crash immediately since katana starts crawling all of them at once without any limits.
  2. I checked whether my approach above gives the desired result, i.e. "crawl for a maximum of 10s per site", but it didn't work; see the output below.
$ time cat sites.txt | while read URL; do katana -u $URL -ct 10s; done

   __        __                
  / /_____ _/ /____ ____  ___ _
 /  '_/ _  / __/ _  / _ \/ _  /
/_/\_\\_,_/\__/\_,_/_//_/\_,_/							 

		projectdiscovery.io

[INF] Started standard crawling for => https://google.com
[INF] Started standard crawling for => https://example.com
https://google.com
https://example.com

real	0m15.171s
user	0m0.089s
sys	0m0.027s

As you can see, it took only about 15s to execute the command, but it should have taken > 20s since I set 10s + 10s for the two sites. So this approach didn't work.

I would be grateful if anyone could bless me with a fix for this.
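For reference, below is a rough, untested sketch of the kind of loop I was hoping for. It assumes katana also picks up targets from piped stdin (which might explain why both sites started in a single run above), so it feeds the loop from the file and gives katana an empty stdin to keep each run to the single URL passed with -u:

# minimal sketch, assuming katana also reads targets from stdin;
# redirecting stdin from /dev/null should keep each invocation to
# one URL with its own -ct budget
while IFS= read -r URL; do
    katana -u "$URL" -ct 10s < /dev/null
done < sites.txt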

@ayushkr12 ayushkr12 added the Type: Enhancement Most issues will probably ask for additions or changes. label Feb 15, 2025
@GeorginaReeder

Thanks for your feature request @ayushkr12, we'll take a look into this!

@ehsandeep
Member

@ayushkr12 if katana doesn't find anything on example.com, it will simply skip it and move to the next URL, even if you specify a custom time with -ct 10s or any other value.

@ayushkr12
Author

> @ayushkr12 if katana doesn't find anything on example.com, it will simply skip it and move to the next URL, even if you specify a custom time with -ct 10s or any other value.

Yes, but the behaviour is the same with every domain, not just example.com.
