Regex web crawler that matches custom regexes against each site it crawls to find the information you're looking for!

Regex Web Crawler


An advanced web crawler built for bug bounty hunters!

The tool recursively crawls a target website, performs regex-based searches on each page's content, and saves the results in structured YAML files.

It also includes optional security analysis for reconnaissance.


Features:

Validates URLs before crawling to prevent errors.

Extracts all internal links recursively up to a specified depth.

Performs regex-based searches on each page's content using a user-defined regex list.

Optionally enables advanced security checks, such as scanning HTTP headers and HTML comments for potential leaks.

Stores all crawled URLs and results in structured YAML format for easy analysis.
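As an illustration of the recursive crawl and regex search described above, a minimal sketch built on the requests and beautifulsoup4 dependencies might look like this. The function names and structure here are hypothetical, not taken from para.py:

```python
# Illustrative sketch of a recursive crawl with regex matching.
# Not the actual para.py implementation; names are hypothetical.
import re
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def find_matches(text, patterns):
    """Return every substring of `text` matched by any pattern."""
    return [m.group(0) for p in patterns for m in re.finditer(p, text)]


def crawl(base_url, patterns, depth=1):
    """Visit internal links up to `depth`, collecting regex hits per URL."""
    seen, results = set(), {}

    def visit(url, remaining):
        if url in seen or remaining < 0:
            return
        seen.add(url)
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            return  # skip unreachable pages instead of aborting the crawl
        hits = find_matches(resp.text, patterns)
        if hits:
            results[url] = hits
        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"])
            if urlparse(link).netloc == urlparse(base_url).netloc:
                visit(link, remaining - 1)  # internal links only

    visit(base_url, depth)
    return results
```

Restricting recursion to links on the same host is what keeps the crawl "internal"; the depth counter bounds how far it descends from the base URL.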


How To Run

Step 1: Configure the config.yaml file to set up the target URL and crawling options.
Step 2: Run the Python script and let it crawl the target website while extracting valuable information.
Step 3: Review the structured results saved in results.yaml.
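The exact layout of results.yaml is defined by the script, but a structured report of crawled URLs and their regex hits could take a shape like this (purely illustrative, not the tool's actual schema):

```yaml
crawled_urls:
  - https://example.com/
  - https://example.com/login
matches:
  https://example.com/login:
    - 'password = "changeme"'
```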

Requirements:

requests
beautifulsoup4
pyyaml

Install the required dependencies with:

pip install -r requirements.txt

Usage:

  1. Set up your configuration in config.yaml:
    base_url: "https://example.com"
    crawl_depth: 1
    advanced: true
    regex_file: "regex_patterns.txt"
    output_file: "results.yaml"
  2. Create or edit your regex patterns in regex_patterns.txt (one per line):
    (?i)password\s*[:=]\s*['"][^'"]+['"]
    (?i)secret\s*[:=]\s*['"][^'"]+['"]
  3. Run the script:
    python para.py
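Before launching a full crawl, the patterns from step 2 can be sanity-checked against sample text with Python's standard re module (the sample string below is made up for the demonstration):

```python
import re

# The two example patterns from regex_patterns.txt above.
patterns = [
    r"(?i)password\s*[:=]\s*['\"][^'\"]+['\"]",
    r"(?i)secret\s*[:=]\s*['\"][^'\"]+['\"]",
]

sample = 'var config = { password: "hunter2", SECRET = "abc123" };'

for p in patterns:
    for m in re.finditer(p, sample):
        print(m.group(0))
# prints:
# password: "hunter2"
# SECRET = "abc123"
```

The (?i) flag makes each pattern case-insensitive, which is why the uppercase SECRET assignment is still caught.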

Contribute:

Feel free to suggest improvements or contribute by visiting https://github.com/zebbern/regex-crawler.


Warning

This tool is intended for ethical hacking and bug bounty purposes only. Unauthorized scanning of third-party websites is illegal and unethical. Always obtain explicit permission before testing any target.
