C implementation of the Readability algorithm (plus some goodies)

Dependencies:

libxml2
libpcre or ICU for regular expression support

Building:

make

Building with ICU rather than pcre:

ICU=1 make

By default, both the readable program and the Python extension will be built.

Building for OS X using Xcode

Create a new directory named readable
Copy readable.h and readable.c into the newly created directory
Copy the directory named unicode from the ICU headers into your project (you can get it from the iPhoneSimulator SDK, under /usr/include/unicode)
Add the readable parent directory, the unicode parent directory and /usr/include/libxml2 to Header Search Path under Build Settings
Add libicucore.dylib and libxml2.dylib to the Link Binary With Libraries build phase
In your code, import readable.h

Building for iOS using Xcode

Create a new directory named readable
Copy readable.h and readable.c into the newly created directory
Add the readable parent directory and /usr/include/libxml2 to Header Search Path under Build Settings
Add libicucore.dylib and libxml2.dylib to the Link Binary With Libraries build phase
In your code, import readable.h

API:

char * readable(const char *html, const char *url, const char *encoding, int options)

Parses HTML to extract the interesting contents.

html: HTML code to parse
url: URL where this HTML was fetched from
encoding: HTML encoding
options: See readable.h for the available options
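
The call described above can be sketched roughly as follows. Several details are assumptions rather than facts from this README: that a NULL return signals failure, that the returned string is heap-allocated and released by the caller with free(), and that 0 is an acceptable placeholder for options (readable.h lists the real flags). So that the sketch compiles without the library, the extractor is passed in as a function pointer with the documented signature, and echo_extract is a hypothetical stand-in; in a real program you would include readable.h and pass readable itself.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Function-pointer type with the same signature as the documented
 * readable(). With the library available, pass `readable` itself. */
typedef char *(*extract_fn)(const char *html, const char *url,
                            const char *encoding, int options);

/* Hypothetical helper (not part of the library): extract the article
 * from `html` and write it to `out`. Returns 0 on success, -1 if the
 * extractor returned NULL. */
static int write_article(extract_fn extract, const char *html,
                         const char *url, FILE *out)
{
    /* Assumption: 0 is only a placeholder for `options`; readable.h
     * lists the real flags. */
    char *article = extract(html, url, "UTF-8", 0);
    if (article == NULL)
        return -1;

    fputs(article, out);
    fputc('\n', out);

    /* Assumption: the result is heap-allocated and owned by the caller. */
    free(article);
    return 0;
}

/* Stand-in extractor so the sketch runs without the library: it simply
 * returns a copy of the input HTML. */
static char *echo_extract(const char *html, const char *url,
                          const char *encoding, int options)
{
    (void)url;
    (void)encoding;
    (void)options;

    char *copy = malloc(strlen(html) + 1);
    if (copy != NULL)
        strcpy(copy, html);
    return copy;
}
```

With the real library linked in, the call would look like write_article(readable, html, url, stdout).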

char * next_page_url(const char *html, const char *url, const char *encoding);

Returns the URL of the next page in a multipage article (this feature is pretty much in alpha):

html: HTML code to parse
url: URL where this HTML was fetched from
encoding: HTML encoding
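
One way to use this for a multipage article is to follow next-page links in a loop, sketched below. This is only an illustration under stated assumptions: that next_page_url() returns NULL when there is no further page, and that the returned string is heap-allocated and freed by the caller. The library does not download pages, so fetch_fn is a hypothetical downloader you would supply, and demo_fetch and demo_next_page are stand-ins that let the sketch run without the library or a network.

```c
#include <stdlib.h>
#include <string.h>

/* Same signature as the documented next_page_url(). */
typedef char *(*next_page_fn)(const char *html, const char *url,
                              const char *encoding);

/* Hypothetical downloader: returns heap-allocated HTML for `url`,
 * or NULL on failure. Not provided by the library. */
typedef char *(*fetch_fn)(const char *url);

/* Portable replacement for strdup(). */
static char *copy_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* Follows next-page links starting at `start_url`, visiting at most
 * `max_pages` pages. Returns the number of pages fetched. */
static int follow_pages(fetch_fn fetch, next_page_fn next_page,
                        const char *start_url, int max_pages)
{
    int pages = 0;
    char *url = copy_string(start_url);

    while (url != NULL && pages < max_pages) {
        char *html = fetch(url);
        if (html == NULL)
            break;
        pages++;
        /* A real program would pass `html` to readable() here. */

        /* Assumption: NULL means "no next page". */
        char *next = next_page(html, url, "UTF-8");
        free(html);

        /* Stop if the page links to itself, to avoid looping. */
        if (next != NULL && strcmp(next, url) == 0) {
            free(next);
            next = NULL;
        }
        free(url);
        url = next;
    }
    free(url);
    return pages;
}

/* Stand-in downloader: the "HTML" is just a copy of the URL. */
static char *demo_fetch(const char *url)
{
    return copy_string(url);
}

/* Stand-in for next_page_url(): pages are numbered by the last
 * character of the URL, and page 3 is the last one. */
static char *demo_next_page(const char *html, const char *url,
                            const char *encoding)
{
    (void)html;
    (void)encoding;

    size_t len = strlen(url);
    if (len == 0 || url[len - 1] < '1' || url[len - 1] >= '3')
        return NULL;

    char *next = copy_string(url);
    if (next != NULL)
        next[len - 1]++;
    return next;
}
```

The max_pages cap is a deliberate guard: since the feature is described as alpha, a bound on the number of pages followed keeps a wrong guess from turning into an endless crawl.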

License

This code is licensed under the AGPLv3. If you'd like to use the code under a different license, drop me a line at alberto@garciahierro.com
