MicroRank

MicroRank is a novel system for locating the root causes of latency issues in microservice environments.

MicroRank extracts service latency from tracing data and then runs anomaly detection on it.

By combining PageRank and spectrum analysis, MicroRank assigns high scores to the service instances that cause latency issues.
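To give a feel for the spectrum-analysis half, the following minimal sketch ranks operations with the Ochiai formula, one common spectrum formula. It is an illustration only: the function name `ochiai_scores`, the input layout (one set of operation names per trace), and the plain unweighted counts are assumptions, not MicroRank's actual code. How MicroRank combines such a score with PageRank is described in the paper and is not reproduced here.

```python
import math
from collections import Counter


def ochiai_scores(abnormal_traces, normal_traces):
    """Rank operations by Ochiai suspiciousness (hypothetical helper).

    Each trace is a set of the operation names it passed through.
    """
    ef = Counter()  # abnormal traces that cover each operation
    ep = Counter()  # normal traces that cover each operation
    for trace in abnormal_traces:
        ef.update(set(trace))
    for trace in normal_traces:
        ep.update(set(trace))

    total_abnormal = len(abnormal_traces)
    scores = {}
    for op in set(ef) | set(ep):
        # Ochiai: ef / sqrt((ef + nf) * (ef + ep)), where ef + nf is the
        # total number of abnormal traces.
        denom = math.sqrt(total_abnormal * (ef[op] + ep[op]))
        scores[op] = ef[op] / denom if denom > 0 else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up example data: "db" appears only in abnormal traces.
abnormal = [{"frontend", "cart", "db"}, {"frontend", "db"}]
normal = [{"frontend", "cart"}, {"frontend", "search"}]
ranking = ochiai_scores(abnormal, normal)
```

In this toy input, `db` is covered by every abnormal trace and by no normal trace, so it ends up at the top of `ranking`.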

Paper Download

Our paper was published at WWW'2021.

The paper can be downloaded below:

MicroRank: End-to-End Latency Issue Localization with Extended Spectrum Analysis in Microservice Environments

Reference

Please cite our paper if you find this work helpful.

@inproceedings{microrank,
  title={MicroRank: End-to-End Latency Issue Localization with Extended Spectrum Analysis in Microservice Environments},
  author={Guangba Yu and Pengfei Chen and Hongyang Chen and Zijie Guan and Zicheng Huang and Linxiao Jing and Tianjun Weng and Xinmeng Sun and Xiaoyun Li},
  booktitle={Proceedings of the Web Conference 2021 (WWW'2021)},
  year={2021},
  organization={ACM},
  pages={3087--3098},
  doi={10.1145/3442381.3449905}
}

Running MicroRank

Notices

If you want to use MicroRank in a production system, consider the following notices.

  • Our anomaly detection module is not suitable for every microservice system. If you have a better anomaly detection module for your system, we recommend replacing ours with it before running root cause analysis (RCA).
  • MicroRank needs more PageRank iterations when your system is a large microservice system, and the accuracy of RCA may decline in such a system.
  • We acknowledge that the accuracy of RCA may degrade when intermittent failures or broken traces are encountered.
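To make the note about iteration counts concrete, here is a minimal, generic power-iteration PageRank. It is not the code in pagerank.py, which this README does not show; the function signature and the parameter names `damping`, `tol`, and `max_iter` are assumptions chosen for illustration. The function returns how many iterations it used, so you can check whether a larger graph hits the iteration cap before it converges.

```python
def pagerank(graph, damping=0.85, tol=1e-6, max_iter=100):
    """Generic power-iteration PageRank on {node: [successors]}.

    Hypothetical sketch; returns (scores, iterations_used).
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for iteration in range(1, max_iter + 1):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for v in nodes:
            targets = graph.get(v, [])
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        # Stop once the total change between iterations is small enough.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            return rank, iteration
    return rank, max_iter


# Made-up call graph for illustration.
call_graph = {
    "frontend": ["cart", "search"],
    "cart": ["db"],
    "search": ["db"],
    "db": [],
}
scores, iterations_used = pagerank(call_graph)
```

If `iterations_used` equals `max_iter`, the scores may not have converged, and raising the cap (or loosening `tol`) is the knob to turn.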

Replace Database

Edit line 12 of preprocess_data.py so that it points to your Elasticsearch instance and root index.

# ES address
es_url = 'http://11.11.11.24:9200'
root_index = 'root'

Replace Normal Window

Edit line 32 of online_rca.py.

We need to set a normal window to calculate the normal average latency and variance for each microservice.

A longer window is preferred.

# Replace with a time range in which your system was healthy
normal_start = '2020-08-28 14:56:43'
normal_end = '2020-08-28 14:57:44'

span_list = get_span(start=timestamp(normal_start), end=timestamp(normal_end))
# print(span_list)
operation_list = get_service_operation_list(span_list)
print(operation_list)
operation_slo = get_operation_slo(
    service_operation_list=operation_list, span_list=span_list)
print(operation_slo)
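The idea behind the normal window can be sketched as follows: compute a mean and standard deviation per operation from healthy data, then flag a trace whose duration exceeds a budget derived from them. Everything here is an assumption for illustration: the helper names `operation_slo` and `is_trace_abnormal`, the dictionary layout, and the mean-plus-`k`-standard-deviations rule are not taken from MicroRank's code, and the real return format of `get_operation_slo` is not shown in this README.

```python
from statistics import mean, pstdev


def operation_slo(durations_by_operation):
    """Return {operation: (mean, std)} from normal-window durations."""
    return {
        op: (mean(values), pstdev(values))
        for op, values in durations_by_operation.items()
        if values
    }


def is_trace_abnormal(trace_operations, trace_duration, slo, k=3.0):
    """Flag a trace whose duration exceeds the summed per-operation budget.

    The budget for each operation is mean + k * std (an assumed rule).
    Operations missing from the SLO table contribute nothing.
    """
    budget = sum(
        slo[op][0] + k * slo[op][1]
        for op in trace_operations
        if op in slo
    )
    return trace_duration > budget


# Made-up durations (milliseconds) observed during a normal window.
normal_window = {
    "frontend/checkout": [100.0, 110.0, 90.0, 100.0],
    "cart/get": [20.0, 22.0, 18.0, 20.0],
}
slo = operation_slo(normal_window)
ops = ["frontend/checkout", "cart/get"]
fast_trace_flag = is_trace_abnormal(ops, 130.0, slo)
slow_trace_flag = is_trace_abnormal(ops, 200.0, slo)
```

A longer normal window gives more samples per operation, which makes the mean and standard deviation less sensitive to a few unusual requests.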

Start MicroRank

python online_rca.py

File content

- anomaly_detector
  - get_slo                                 # get the average latency and variance for each operation
  - system_anormaly_detect                  # determine whether the system is abnormal 
  - trace_anormaly_detect                   # determine whether the single trace is abnormal 
  - trace_list_partition                    # divide traces into normal and abnormal traces
- online_rca.py
  - calculate_spectrum_without_delay_list   # calculate spectrum result
  - online_anomaly_detect_RCA               # run MicroRank
- pagerank.py                               # calculate PageRank result
- preprocess_data.py
  - get_span 
  - get_service_operation_list 
  - get_operation_slo 
  - get_operation_duration_data 
  - get_pagerank_graph