Elixir/Erlang bindings for lexborisov's myhtml. THIS IS A MIRROR, real repo at https://git.pleroma.social/pleroma/elixir-libraries/fast_html

rinpatch/fast_html

Latest commit: fc1d67c by rinpatch, Sep 1, 2020

FastHTML

A C Node wrapping lexborisov's myhtml. Primarily used with FastSanitize.

  • Available as a hex package: {:fast_html, "~> 2.0"}
  • Documentation
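
As a minimal sketch of how the hex package above is typically pulled in, the dependency tuple goes into the `deps/0` function of a project's `mix.exs`. The surrounding function is the standard Mix layout, not something specific to this README:

```elixir
# mix.exs (fragment): only the dependency tuple comes from this README.
defp deps do
  [
    {:fast_html, "~> 2.0"}
  ]
end
```

After `mix deps.get`, decoding would look roughly like the following. This assumes the `:fast_html.decode/1` and `:fast_html.decode_fragment/1` entry points from the package documentation and an application that has been started by Mix; check the documentation for the exact return shapes.

```elixir
# Assumption: both calls return {:ok, tree} on success.
{:ok, document_tree} = :fast_html.decode("<h1>Hello world</h1>")
{:ok, fragment_tree} = :fast_html.decode_fragment("<h1>Hello world</h1>")

IO.inspect(document_tree, label: "document")
IO.inspect(fragment_tree, label: "fragment")
```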

Benchmarks

The following table lists the median time each HTML parser usable from Elixir takes to decode a string into a tree. Benchmarks were conducted on a machine with an AMD Ryzen 9 3950X (32) @ 3.500GHz CPU and 32GB of RAM. To reproduce the benchmark yourself, run the mix fast_html.bench task.

| File | fast_html (Port) | mochiweb_html (Erlang) | html5ever (Rust NIF) | Myhtmlex (NIF)¹ |
| --- | --- | --- | --- | --- |
| document-large.html (6.9M) | 125.12 ms | 1778.34 ms | 395.21 ms | 327.17 ms |
| document-medium.html (85K) | 1.93 ms | 12.10 ms | 4.74 ms | 3.82 ms |
| document-small.html (25K) | 0.50 ms | 2.76 ms | 1.72 ms | 1.19 ms |
| fragment-large.html (33K) | 0.93 ms | 4.78 ms | 2.34 ms | 2.15 ms |
| fragment-small.html² (757B) | 44.60 μs | 42.13 μs | 43.58 μs | 289.71 μs |

Full benchmark output can be seen in this snippet

  1. Myhtmlex has a C-Node mode, but it wasn't benchmarked here because it segfaults on document-large.html.
  2. The slowdown on fragment-small.html is due to Port overhead. Unlike html5ever and Myhtmlex in NIF mode, fast_html runs the parser in an isolated process and communicates with it over stdio, so even a fatal crash in the parser won't bring down the entire VM.
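
To make the Port trade-off in note 2 concrete, here is a generic sketch of how an Elixir process talks to an external OS process over stdio. It is not fast_html's actual implementation: it uses `cat` as a stand-in for the parser process and assumes a Unix-like system with `cat` on the PATH.

```elixir
# Generic illustration of the Port mechanism, NOT fast_html's own code.
# Assumption: a Unix-like system where the `cat` executable is available.
port = Port.open({:spawn, "cat"}, [:binary])

# Data sent to the port arrives on the external process's stdin ...
Port.command(port, "<p>hello</p>")

# ... and whatever the process writes to stdout comes back as a message.
receive do
  {^port, {:data, data}} -> IO.puts("echoed back: " <> data)
after
  1_000 -> IO.puts("no reply within one second")
end

# Closing the port only affects the external process; the VM keeps running.
Port.close(port)
```

Every call pays for a round trip through the operating system's pipes, which is why the overhead is most visible on very small inputs. In exchange, a crash in the external process surfaces as a port exit rather than taking the VM down with it.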

Contribution / Bug Reports

  • Please make sure you run git submodule update after every checkout or pull
  • The project aims to be fully tested
