Commit 8d55134

committed Feb 17, 2015
Initial Push
1 parent 25fbb6d commit 8d55134

File tree

3 files changed: +168 -0 lines changed

‎brutescrape.banner (+15)

====================================================================================
__________ __ _________
\______ \_______ __ ___/ |_ ____ / _____/ ________________ ______ ____
| | _/\_ __ \ | \ __\/ __ \ \_____ \_/ ___\_ __ \__ \ \____ \_/ __ \
| | \ | | \/ | /| | \ ___/ / \ \___| | \// __ \| |_> > ___/
|______ / |__| |____/ |__| \___ >_______ /\___ >__| (____ / __/ \___ >
\/ \/ \/ \/ \/|__| \/

Brutescrape | A web scraper for generating password files based on plain text found
             in specific web pages.
Written by Peter Kim <Author, The Hacker Playbook>
                     <CEO, Secure Planet LLC>

Usage | python brutescrape.py
====================================================================================

‎brutescrape.py (+104)

#!/usr/bin/python

#Secure Planet LLC

import urllib2
import re
import os, sys

from collections import OrderedDict

banner_file = "brutescrape.banner"

def banner():
    global banner_file
    open_banner = open(banner_file, "r")
    for line in open_banner:
        print line.rstrip()
    open_banner.close()

def stripHTMLTags(html):
    #Strip HTML tags from any string and transform special entities.
    text = html

    #Apply rules in given order.
    rules = [
        { r'>\s+' : u'>'},                              # Remove spaces after a tag opens or closes.
        { r'\s+' : u' '},                               # Collapse consecutive whitespace.
        { r'\s*<br\s*/?>\s*' : u'\n'},                  # Newline after a <br>.
        { r'</(div)\s*>\s*' : u'\n'},                   # Newline after </div>.
        { r'</(p|h\d)\s*>\s*' : u'\n\n'},               # Blank line after </p> and headings.
        { r'<head>.*<\s*(/head|body)[^>]*>' : u'' },    # Remove <head> to </head>.
        { r'<a\s+href="([^"]+)"[^>]*>.*</a>' : r'\1' }, # Show link targets instead of link text.
        { r'[ \t]*<[^<]*?/?>' : u'' },                  # Remove remaining tags.
        { r'^\s+' : u'' }                               # Remove leading whitespace.
    ]

    for rule in rules:
        for (k, v) in rule.items():
            try:
                regex = re.compile(k)
                text = regex.sub(v, text)
            except re.error:
                pass #Skip any rule that fails to apply.

    #Replace special HTML entities.
    special = {
        '&nbsp;' : ' ', '&amp;' : '&', '&quot;' : '"',
        '&lt;' : '<', '&gt;' : '>'
    }

    for (k, v) in special.items():
        text = text.replace(k, v)

    return text

banner()
#Create an empty list for generation logic.
y_arr = []

try:
    file_list = open('sites.scrape', 'r')
    sites = file_list.read().split(',')
    file_list.close()
except IOError:
    print "[!] Could not read sites.scrape. Exiting."
    sys.exit(1)

for site in sites:
    try:
        site = site.strip()
        print "[*] Downloading Content For : " + site
        x_arr = []
        response = urllib2.urlopen(site)
        x = stripHTMLTags(response.read())
        #Replace junk found in our response.
        x = x.replace('\n', ' ')
        x = x.replace(',', ' ')
        x = x.replace('.', ' ')
        x = x.replace('/', ' ')
        x = re.sub('[^A-Za-z0-9]+', ' ', x)
        x_arr = x.split(' ')
        for y in x_arr:
            y = y.strip()
            if y and (len(y) > 4):
                #Strip leftover URL-encoding prefixes (%2F, %23, %3F, %3D with the '%' already removed).
                if y[:2] in ('2F', '23', '3F', '3D'):
                    y = y[2:]
                y_arr.append(y)
    except:
        pass #Skip sites that fail to download or parse.

#Deduplicate while preserving first-seen order.
y_arr_unique = OrderedDict.fromkeys(y_arr).keys()
print "[*] Processing List"
f_write = open("passwordList.txt", "w")
for yy in y_arr_unique:
    #Skip purely numeric tokens.
    if yy.strip().isdigit():
        pass
    else:
        f_write.write(yy.strip() + "\n")
f_write.close()
print "[*] Wordlist Generation Complete."
print "[*] Output Located: passwordList.txt"
print "[*] Total Count of Passwords >> " + str(len(y_arr_unique))

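The dedupe step above relies on OrderedDict.fromkeys keeping one copy of each word in first-seen order, so the wordlist follows page order rather than being sorted. A quick Python 3 illustration of that behavior:

```python
from collections import OrderedDict

# Duplicates in scrape order, as they might come out of the tokenizer.
words = ["hackme", "secure", "hackme", "planet", "secure"]

# fromkeys builds one dict key per unique word, preserving first appearance;
# listing the keys yields the deduplicated sequence.
unique = list(OrderedDict.fromkeys(words))
print(unique)  # ['hackme', 'secure', 'planet']
```

A plain dict would behave the same on Python 3.7+, but OrderedDict matches the script and is explicit about the ordering guarantee.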
‎readme.txt (+49)

====================================================================================
__________ __ _________
\______ \_______ __ ___/ |_ ____ / _____/ ________________ ______ ____
| | _/\_ __ \ | \ __\/ __ \ \_____ \_/ ___\_ __ \__ \ \____ \_/ __ \
| | \ | | \/ | /| | \ ___/ / \ \___| | \// __ \| |_> > ___/
|______ / |__| |____/ |__| \___ >_______ /\___ >__| (____ / __/ \___ >
\/ \/ \/ \/ \/|__| \/

Brutescrape | A web scraper for generating password files based on plain text found
             in specific web pages.
Written by Peter Kim <Author, The Hacker Playbook>
                     <CEO, Secure Planet LLC>

Usage | python brutescrape.py
====================================================================================

< About >

Brutescrape is a tool designed to parse out text from specific web pages and generate password lists for bruteforcing with that text.
The main idea is to create password lists that are specific to an organization. The user ends up with a password list containing
keywords specific to the target entity, which gives a better chance of recovering credentials used within that entity. Furthermore,
rule files in the user's favorite password cracking tool can further increase the chances of recovering plain text passwords from
an organization.

Ex. >> The user is performing a penetration test against HackMe, Inc. The user knows HackMe has a website, http://www.hackme.com/, and
runs Brutescrape against this site. The user now has a password file created by parsing the text within HackMe's website. The user
then runs this wordlist against a list of hashes found during an earlier phase of the pentest, using oclHashcat, and recovers the
plain text of a hash: "hackme".

In this example, the user found a very weak password. Such cases are rare, as organizations usually have password policies in place.
Rule files are generally more effective at recovering these plain text hash values, so the user cracks the hashes again, this time
using a rule file that appends 4 digits from 0000 - 9999 to the end of every word in the list.

Ah! More hashes are cracked: "hackme4331, hackme9901". How about a rule that converts every word to leet speak?

More hashes cracked: "h4ckm3, h4ckm3,inc.P455". And so on and so forth.
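The digit-append and leet-speak rules can be sketched in a few lines. A minimal Python 3 illustration (the function names are made up for this example; real crackers like oclHashcat use their own rule syntax):

```python
# Two common mangling rules approximated in plain Python:
# a 4-digit suffix (0000-9999) and a simple leet substitution.

# Character substitutions for a basic leet-speak rule.
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})

def append_digits(word, width=4, limit=10000):
    # "hackme" -> ["hackme0000", "hackme0001", ..., "hackme9999"]
    return [word + str(n).zfill(width) for n in range(limit)]

def leetify(word):
    # "hackme" -> "h4ckm3"
    return word.translate(LEET)

candidates = append_digits("hackme")
print(candidates[4331])   # hackme4331
print(leetify("hackme"))  # h4ckm3
```

One base word expands to 10,000 suffixed candidates, which is why rule files multiply the value of a small, targeted wordlist.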

< Usage >

Using the script is simple. List the target webpage(s) in your "sites.scrape" file like so-

http://www.site.com,http://www.site2.com,http://www.site3.com/index.php,http://www.site4.com/admin

Then run the script-

python brutescrape.py

And that's it. The target sites defined in your "sites.scrape" file will be parsed, and the extracted words will be written to a file
named "passwordList.txt".
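Note that urllib2 ties brutescrape.py to Python 2. On a modern interpreter, the core pipeline (fetch, strip tags, tokenize, filter, dedupe) can be sketched like this; the words_from_html helper is illustrative, not part of Brutescrape:

```python
import re
from collections import OrderedDict
from urllib.request import urlopen  # Python 3 replacement for urllib2

# Crude tag stripper; the real script applies a longer ordered rule list.
TAG_RE = re.compile(r"<[^>]+>")

def words_from_html(html):
    # Mirror brutescrape.py's filters: strip tags, keep only alphanumerics,
    # drop tokens of 4 chars or fewer and purely numeric tokens, then
    # dedupe while preserving first-seen order.
    text = TAG_RE.sub(" ", html)
    text = re.sub(r"[^A-Za-z0-9]+", " ", text)
    tokens = [t for t in text.split() if len(t) > 4 and not t.isdigit()]
    return list(OrderedDict.fromkeys(tokens))

# Against a live site (network required), roughly:
#   html = urlopen("http://www.hackme.com/").read().decode("utf-8", "replace")
#   words = words_from_html(html)

sample = "<p>Welcome to HackMe, Inc. Security matters at HackMe.</p>"
print(words_from_html(sample))  # ['Welcome', 'HackMe', 'Security', 'matters']
```

Writing the resulting list one word per line reproduces the passwordList.txt format the script emits.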
