LogLSHD

LogLSHD is an enhanced log parsing framework built upon Loghub-2.0, integrating Locality-Sensitive Hashing (LSH) techniques to improve log template extraction and clustering efficiency.

Method Overview

LogLSHD consists of the following stages:

Preprocessing In this stage, raw log messages are preprocessed to replace common variables (such as IP addresses and domain names) with placeholders using domain-specific regular expressions.
Initial Log Grouping This stage employs simple heuristic methods to initially group unstructured log messages. It assesses whether logs belong to the same group by comparing the token count, content length, and specific position characters.
Merging Initial Groups into Log Clusters After initial grouping, a stricter similarity comparison is applied to aggregate the grouped logs. Using a Jaccard similarity threshold, similar log messages are clustered together for template extraction.
Template Extraction Finally, the method randomly selects a representative sample of 10 logs from each group for template extraction, resulting in the formation of final log templates. The template extraction process leverages Dynamic Time Warping (DTW) to optimize accuracy in conjunction with similarity grouping.

Getting Started

To use LogLSHD for log parsing, follow these steps:

1. Datasets download

Please first download the full datasets of Loghub-2.0 via Zenodo.

After downloading, place the datasets in the full_dataset/ directory, ensuring the format matches the 2k_dataset directory.

2. Install Dependencies

Install python >= 3.8
pip install -r requirements.txt

3. Run Benchmark

Evaluate LogLSHD performance on 14 datasets:

cd benchmark/
./run_lsh_full.sh

Run all statistic-based log parsers on Loghub-2k:

cd benchmark/
./run_statistic_2k.sh

Run all statistic-based log parsers on Loghub-2.0:

cd benchmark/
./run_statistic_full.sh

Default Jaccard Thresholds of LSH for Different Datasets

The following table presents the default Jaccard similarity thresholds used in LogLSHD for various datasets.

Dataset	Jaccard Threshold
Proxifier	1.00
Linux	0.65
Apache	0.65
Zookeeper	0.80
Hadoop	0.85
HealthApp	0.65
OpenStack	0.70
HPC	0.70
Mac	0.95
OpenSSH	0.85
Spark	0.90
Thunderbird	0.60
BGL	0.90
HDFS	0.80

Research Paper

For more details, please refer to the paper: LogLSHD: Locality-Sensitive Hashing for Accurate and Efficient Log Parsing.

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
2k_dataset		2k_dataset
RQs_experiments		RQs_experiments
benchmark		benchmark
full_dataset		full_dataset
result		result
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LogLSHD

Method Overview

Getting Started

1. Datasets download

2. Install Dependencies

3. Run Benchmark

Default Jaccard Thresholds of LSH for Different Datasets

Research Paper

About

Releases

Packages

Languages

License

mooselab/LogLSHD

Folders and files

Latest commit

History

Repository files navigation

LogLSHD

Method Overview

Getting Started

1. Datasets download

2. Install Dependencies

3. Run Benchmark

Default Jaccard Thresholds of LSH for Different Datasets

Research Paper

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages