Domain Metadata Analysis

  • Root Domain Crawl

    • JavaScript / Cookie Tracking
    • JavaScript Libs
    • SSL available
    • Page Speed
    • Domain Whois Data
    • Security Issues
    • HTTP Server
    • HTTP Protocol
    • Structured Data (schema.org)
    • Used HTML Tags ("iframe", "svg", ...)
    • Content Management Systems
    • PHP Versions
    • RSS/Atom feeds
  • Full Domain Crawl

    • Match Tracking Data against the Data Privacy Statement
    • Referrer
    • Redirects
    • Broken Links
  • Time-consuming Crawl

    • SSL Implementation / Rating
    • HTML Validation (w3.org)
    • Ports (MySQL, MongoDB, ...)
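
Two of the root-domain checks listed above (used HTML tags and RSS/Atom feeds) can be illustrated with the Python standard library alone. This is a minimal sketch, not the project's actual implementation: the names `TagCounter` and `analyze_html` are hypothetical, and it assumes the page HTML has already been fetched into a string.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Hypothetical helper: counts start tags and collects feed links."""

    FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()  # e.g. {"iframe": 1, "svg": 2}
        self.feed_urls = []          # href values of RSS/Atom <link> tags

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        self.tag_counts[tag] += 1
        if tag == "link":
            attr = dict(attrs)
            rel = (attr.get("rel") or "").lower().split()
            link_type = (attr.get("type") or "").lower()
            if "alternate" in rel and link_type in self.FEED_TYPES:
                self.feed_urls.append(attr.get("href"))


def analyze_html(html):
    """Return (tag counts, feed URLs) for an HTML string."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.tag_counts, parser.feed_urls
```

A caller could then check `tag_counts["iframe"]` or `tag_counts["svg"]` to see whether a domain's start page uses those tags.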

Other Similar Projects

Domain Lists

Used Libs and Formats

Splash - Lightweight, scriptable browser as a service with an HTTP API

adblockparser - Parser for Adblock Plus rules

HTTP Archive format (HAR)

HTTP Archive format (HAR) Viewer
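
A HAR file is a JSON document whose `log.entries` list holds one record per request, each with a `request.url` field. As a hedged sketch of how such a file might feed the tracking analysis, the following counts the hosts a page contacted. The function name `requested_hosts` is hypothetical, and the HAR text is assumed to have been captured elsewhere (for example by Splash).

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def requested_hosts(har_text):
    """Return a Counter of hostnames requested in a HAR document."""
    har = json.loads(har_text)
    hosts = Counter()
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        host = urlsplit(url).hostname  # None for empty or relative URLs
        if host:
            hosts[host] += 1
    return hosts
```

Hosts that differ from the crawled domain are candidates for third-party tracking and could be matched against filter rules in a later step.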

Publish

Keywords

"Webometrie", "Webometrics", "Cybermetrics", "Web Mining", "Internet Data Mining", "Internet Research", "Internet Technology Trends"

Crawler Performance without Threads

avg. seconds per domain * domain count = duration in seconds
duration in seconds / 86400 = duration in days

Example: 5 s * 1,000,000 domains = 5,000,000 s; 5,000,000 / 86,400 ≈ 57.9 days
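
The estimate above can be reproduced with a few lines of Python. This is a rough sketch: the function name `crawl_duration_days` is hypothetical, and the optional `workers` parameter is an added assumption that models ideal linear speedup from parallel crawling, which real crawls will not fully reach.

```python
SECONDS_PER_DAY = 86_400


def crawl_duration_days(avg_seconds_per_domain, domain_count, workers=1):
    """Estimate crawl duration in days.

    workers=1 corresponds to the sequential (no threads) case.
    Values above 1 assume perfect parallel scaling, which is optimistic.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    total_seconds = avg_seconds_per_domain * domain_count
    return total_seconds / SECONDS_PER_DAY / workers


if __name__ == "__main__":
    # Sequential crawl of one million domains at 5 seconds each.
    print(round(crawl_duration_days(5, 1_000_000), 1))  # 57.9
```

Under the idealized scaling assumption, ten parallel workers would reduce the same crawl to roughly 5.8 days.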