Skip to content

CodeBit: A full-featured, reliable HTML parser for .NET that implements the XmlReader interface.

License

Notifications You must be signed in to change notification settings

FileMeta/HtmlReader

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HtmlReader

HtmlReader is a simple but full-featured HTML parser that implements the .NET XmlReader interface. This allows a programmer to use the rich XML features in .NET on HTML documents.

The software is distributed as a CodeBit located here.

This project include the master copy of HtmlReader.cs plus a set of unit tests that may also be examined as sample code.

Potential Applications for HtmlReader

  • Translate arbitrary HTML into well-formatted and indented XHTML.
  • Automated HTML processing such as templated content, link processing, and so forth.
  • Check HTML for adherence to practices such as WCAG compliance.
  • Screen-scraping websites.
  • Automated reprocessing of HTML.

HtmlReader follows the HTML5 parsing rules but tolerates malformed HTML whenever possible. In this, it's similar to the parsers built into web browsers. Future enhancements may include configurable tolerance and reporting of syntax errors.

Sample Use

Here's an example of loading HTML into a .NET XmlDocument:

XmlDocument doc = new XmlDocument();
HtmlReaderSettings settings = new HtmlReaderSettings();
settings.CloseInput = true;
using (HtmlReader reader = new HtmlReader(new StreamReader("sample.htm", Encoding.UTF8, true), settings))
{
  doc.Load(reader);
}

About CodeBits

A CodeBit is a way to share common code that's lighter weight than NuGet. Each CodeBit consists of a single source code file. A structured comment at the beginning of the file indicates where to find the master copy so that automated tools can retrieve and update CodeBits to the latest version.

License

Offered under the MIT Open Source License.

About

CodeBit: A full-featured, reliable HTML parser for .NET that implements the XmlReader interface.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published