Regex web crawler that matches custom regexes against each site it crawls to find the information you're looking for!

Regex Web Crawler


An advanced web crawler built for bug bounty hunters!

The tool recursively crawls a target website, performs regex-based searches on each page's content, and saves the results in structured YAML files.

It also includes optional security analysis for reconnaissance.


Features:

Validates URLs before crawling to prevent errors.

Extracts all internal links recursively up to a specified depth.

Performs regex-based searches on each page's content using a user-defined regex list.

Optionally enables advanced security checks, such as scanning HTTP headers and HTML comments for potential leaks.

Stores all crawled URLs and results in structured YAML format for easy analysis.
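As an illustration of the recursive crawl and regex search described above, a minimal sketch built on the requests and beautifulsoup4 dependencies might look like this. The function names and structure here are hypothetical, not taken from para.py:

```python
# Illustrative sketch of a recursive crawl with regex matching.
# Not the actual para.py implementation; names are hypothetical.
import re
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def find_matches(text, patterns):
    """Return every substring of `text` matched by any pattern."""
    return [m.group(0) for p in patterns for m in re.finditer(p, text)]


def crawl(base_url, patterns, depth=1):
    """Visit internal links up to `depth`, collecting regex hits per URL."""
    seen, results = set(), {}

    def visit(url, remaining):
        if url in seen or remaining < 0:
            return
        seen.add(url)
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            return  # skip unreachable pages instead of aborting the crawl
        hits = find_matches(resp.text, patterns)
        if hits:
            results[url] = hits
        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"])
            if urlparse(link).netloc == urlparse(base_url).netloc:
                visit(link, remaining - 1)  # internal links only

    visit(base_url, depth)
    return results
```

Restricting recursion to links on the same host is what keeps the crawl "internal"; the depth counter bounds how far it descends from the base URL.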


How To Run

Step 1: Configure the config.yaml file to set up the target URL and crawling options.
Step 2: Run the Python script and let it crawl the target website while extracting valuable information.
Step 3: Review the structured results saved in results.yaml.
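The exact layout of results.yaml is defined by the script, but a structured report of crawled URLs and their regex hits could take a shape like this (purely illustrative, not the tool's actual schema):

```yaml
crawled_urls:
  - https://example.com/
  - https://example.com/login
matches:
  https://example.com/login:
    - 'password = "changeme"'
```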

Requirements:

requests
beautifulsoup4
pyyaml

Install the required dependencies with:

pip install -r requirements.txt

Usage:

  1. Set up your configuration in config.yaml:
    base_url: "https://example.com"
    crawl_depth: 1
    advanced: true
    regex_file: "regex_patterns.txt"
    output_file: "results.yaml"
  2. Create or edit your regex patterns in regex_patterns.txt (one per line):
    (?i)password\s*[:=]\s*['"][^'"]+['"]
    (?i)secret\s*[:=]\s*['"][^'"]+['"]
  3. Run the script:
    python para.py
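Before launching a full crawl, the patterns from step 2 can be sanity-checked against sample text with Python's standard re module (the sample string below is made up for the demonstration):

```python
import re

# The two example patterns from regex_patterns.txt above.
patterns = [
    r"(?i)password\s*[:=]\s*['\"][^'\"]+['\"]",
    r"(?i)secret\s*[:=]\s*['\"][^'\"]+['\"]",
]

sample = 'var config = { password: "hunter2", SECRET = "abc123" };'

for p in patterns:
    for m in re.finditer(p, sample):
        print(m.group(0))
# prints:
# password: "hunter2"
# SECRET = "abc123"
```

The (?i) flag makes each pattern case-insensitive, which is why the uppercase SECRET assignment is still caught.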

Contribute:

Feel free to suggest improvements or contribute by visiting https://github.com/zebbern/regex-crawler.


Warning

This tool is intended for ethical hacking and bug bounty purposes only. Unauthorized scanning of third-party websites is illegal and unethical. Always obtain explicit permission before testing any target.
