Skip to content

This is a Project w/ Python & library lxml to get info from web and use it

Notifications You must be signed in to change notification settings

arturoaviles/Crawl_Web_Python

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 

Repository files navigation

Crawl_Web_Python

This is a Project w/ Python & library lxml to get info from web and use it

How to put hands to work!

1. Install Python

Enter to python.org and install python lastest version

2. Install pip

sudo easy_install pip
sudo easy_install pip3

3. Update pip

pip install --upgrade pip

4. Install lxml

**Normally:**
	pip install lxml

**Another way:**
	sudo CPATH=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk/usr/include/libxml2 CFLAGS=-Qunused-arguments CPPFLAGS=-Qunused-arguments pip install lxml

5. Install Requests

pip install requests

6. See if you can import those libraries

python
python3

>>>from lxml import html
>>>import requests
>>>quit()

7. Find a website you like

Example: (https://itunes.apple.com/us/app/angry-birds-2/id880047117?mt=8)

How to Use it:

1. Download the file .py3
2. Open Terminal
3. Navigate to the file folder
4. Write:
		python3 file.py3 or python file.py *See Considerations
5. Hit Enter

If it did'nt deliver anything or deliver strange things you should edit the paths:

class AppCrawler
	def get_info_from_link
	name = tree.xpath('//PATH')[0]
	developer = tree.xpath('//PATH')[0]
	price = tree.xpath('//PATH')[0]
	links = tree.xpath('//PATH')

If you want to use an specific URL you can change it in the last lines.

Considerations:

1. If you are using python 2.* in 
class App 
	def __str__(self) you should put:

	return ("Name: " + self.name.encode('UTF-8') + 
    	"\r\nDeveloper: " + self.developer.encode('UTF-8') + 
    	"\r\nPrice: " + self.price.encode('UTF-8') + "\r\n")

More ways to crawl:

  • Beautiful Soup
  • Scrappy

About

This is a Project w/ Python & library lxml to get info from web and use it

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published