xml-i

xml-i ("XML eye") is a personal playground project where I experiment with implementing a simple program in various programming languages (and libraries), focusing on efficiency and performance in each approach.

The CLI should look the same in every language, roughly like this:

./<path_to_program> <path_to_xml_file> [node_name1 node_name2 ...]

The program expects at least one argument: the path to the XML file.
Additional arguments can be provided to specify node names to filter.
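As an illustration only, the argument handling described above might be sketched in Rust with nothing but the standard library. The names CliArgs and parse_args are hypothetical and are not taken from the actual sources in src; the real implementation may parse its arguments differently.

```rust
use std::env;

/// Parsed command line: the XML file path plus an optional node-name filter.
/// (Hypothetical type, used only for this sketch.)
#[derive(Debug, PartialEq)]
struct CliArgs {
    xml_path: String,
    /// Empty means "count every node".
    node_names: Vec<String>,
}

/// Parse the arguments that follow the program name.
/// The first argument is required; everything after it is a node name.
fn parse_args(args: &[String]) -> Result<CliArgs, String> {
    match args.split_first() {
        Some((path, names)) => Ok(CliArgs {
            xml_path: path.clone(),
            node_names: names.to_vec(),
        }),
        None => Err("usage: xml-i <path_to_xml_file> [node_name1 node_name2 ...]".to_string()),
    }
}

fn main() {
    // Skip argv[0], which is the program name itself.
    let args: Vec<String> = env::args().skip(1).collect();
    match parse_args(&args) {
        Ok(cli) => println!("file: {}, filter: {:?}", cli.xml_path, cli.node_names),
        Err(msg) => {
            eprintln!("{}", msg);
            std::process::exit(1);
        }
    }
}
```

Keeping the parsing in a small function that takes a slice makes it easy to test without spawning the binary.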

The program counts the occurrences of distinct XML nodes in a given XML document. A custom list of node names may be passed if only a subset of the nodes should be counted (partly to see whether that makes any difference in runtime or memory consumption).
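The counting step itself can be sketched as follows. This is a simplified, standard-library-only stand-in and not the quick-xml based baseline: start_tag_names is a deliberately naive scanner that assumes well-formed input and does not understand comments or CDATA sections that themselves contain '<'. Both function names are hypothetical.

```rust
use std::collections::{BTreeMap, HashSet};

/// Yield the names of start tags and self-closing tags in `xml`.
/// Naive on purpose: closing tags, processing instructions, comments and
/// DOCTYPE are skipped, but a '<' inside a comment or CDATA is misread.
fn start_tag_names<'a>(xml: &'a str) -> impl Iterator<Item = &'a str> {
    // Every chunk after the first one begins right after a '<'.
    xml.split('<').skip(1).filter_map(|chunk| {
        if chunk.starts_with(|c: char| c == '/' || c == '?' || c == '!') {
            return None;
        }
        // The tag name ends at whitespace, '/' or '>'.
        let end = chunk
            .find(|c: char| c.is_whitespace() || c == '/' || c == '>')
            .unwrap_or(chunk.len());
        let name = &chunk[..end];
        if name.is_empty() { None } else { Some(name) }
    })
}

/// Count node names; an empty `filter` means "count every node".
fn count_nodes<'a>(
    names: impl Iterator<Item = &'a str>,
    filter: &[&str],
) -> BTreeMap<String, u64> {
    let wanted: HashSet<&str> = filter.iter().copied().collect();
    let mut counts = BTreeMap::new();
    for name in names {
        if wanted.is_empty() || wanted.contains(name) {
            *counts.entry(name.to_string()).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    let xml = r#"<root><boing/><blips id="1">x</blips><boing></boing></root>"#;
    println!("Node counts:");
    for (name, n) in count_nodes(start_tag_names(xml), &["boing", "blips"]) {
        println!("{}: {}", name, n);
    }
}
```

With a filter, names that are not wanted are dropped before any map entry is created, which is one place where a subset run could plausibly save memory; whether it matters in practice is what the benchmarks are meant to show.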

A basic benchmarking feature is included to demonstrate and compare the performance differences between the language implementations.

Baseline implementation: Written in Rust, using quick-xml, serving as the primary reference for efficiency and performance.
Alternative implementations: Code in other languages can be found in the alien directory.

Example Usage

./target/release/xml-i ./test/huge.xml boing blips
Node counts:
boing: 1342440
blips: 1279

./alien/bin/xml-i-xerces ./test/_/huge.xml boing blips
Node counts:
boing: 1342440
blips: 1279

Build / TestData / Benchmarking

Each variant defines its build config via New-AppDecl. All <something>.bc.ps1 files are automatically picked up and dot-sourced by xml-i.build.ps1. Use PowerShell and Invoke-Build.

See the Rust quick-xml build config as an example.

Basis for my benchmark results:

OS: Linux 6.something-MANJARO
Model: ThinkPad X13 Laptop
CPU: 12th Gen Intel(R) Core(TM) i7-1270P
RAM: 32GB
HDD: WD Black SN770 / PC SN740 256GB / PC SN560 (DRAM-less) NVMe SSD

=> See benchmark_results.md

The results show that Rust using quick-xml (a StAX-style parser) is consistently the fastest and most memory-efficient implementation across all file sizes, often by a wide margin. C++ parsers (especially pugixml and rapidxml, both DOM) are also very fast, but can use significantly more memory, especially with large files. SAX-style parsers (such as C++ libxml2-SAX, Java SAX, and .NET XmlReader) generally use less memory than DOM parsers and perform well, but are usually a bit slower than the fastest C++ DOM parsers and Rust with quick-xml.

Motivation

This repository serves as a learning and benchmarking tool, helping to explore language-specific approaches to XML processing and performance optimization.

License

MIT License - see LICENSE.txt

