Graph-Based Trace Analysis for Microservice Architecture Understanding and Problem Diagnosis
Microservice systems are highly dynamic and complex. Operations engineers and developers of such systems rely heavily on trace analysis to understand architectures and diagnose problems such as service failures and quality degradation.
However, the huge number of traces produced at runtime makes it challenging to capture the required information in real time. To address these challenges, we propose GMTA, a graph-based microservice trace analysis approach for understanding architectures and diagnosing various problems.
Built on a graph-based representation, GMTA includes efficient processing of traces produced on the fly.
It abstracts traces into different paths and further groups them into business flows.
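The abstraction step can be illustrated with a minimal sketch. It is not GMTA's implementation: the trace format (a list of caller/callee operation pairs), the hashing scheme, the rule that a business flow is the set of paths passing through a chosen operation, and all names such as `path_id`, `abstract_paths`, and `business_flow` are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def path_id(edges):
    """Derive a stable id from the set of caller -> callee edges in one trace."""
    canonical = sorted(set(edges))  # ignore repeated calls and span arrival order
    return hashlib.sha1(repr(canonical).encode("utf-8")).hexdigest()[:12]


def abstract_paths(traces):
    """Map each path id to its edge set and the ids of the traces that follow it."""
    paths = defaultdict(lambda: {"edges": None, "trace_ids": []})
    for trace_id, edges in traces.items():
        pid = path_id(edges)
        paths[pid]["edges"] = sorted(set(edges))
        paths[pid]["trace_ids"].append(trace_id)
    return dict(paths)


def business_flow(paths, operation):
    """Collect the ids of paths that pass through the given operation (assumed rule)."""
    return sorted(
        pid
        for pid, path in paths.items()
        if any(operation in edge for edge in path["edges"])
    )


# Hypothetical traces: t1 and t2 invoke the same operations in a different order.
traces = {
    "t1": [("web.checkout", "cart.get"), ("web.checkout", "pay.charge")],
    "t2": [("web.checkout", "pay.charge"), ("web.checkout", "cart.get")],
    "t3": [("web.search", "catalog.query")],
}
paths = abstract_paths(traces)
checkout_flow = business_flow(paths, "pay.charge")
```

In this sketch, traces with the same call structure collapse into one path, so later analysis works on a few paths rather than on every individual trace.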
To support various analytical applications, GMTA includes an efficient storage and access mechanism by combining a graph database and a real-time analytics database and using a carefully designed storage structure.
Based on GMTA, we construct analytical applications for architecture understanding and problem diagnosis. These applications support various needs such as visualizing service dependencies, making architectural decisions, analyzing changes in service behavior, detecting performance issues, and locating root causes.
GMTA has been implemented and deployed in eBay.
An experimental study based on trace data produced by eBay demonstrates GMTA's effectiveness and efficiency for architecture understanding and problem diagnosis.
Case studies conducted in eBay's monitoring team and Site Reliability Engineering (SRE) team further confirm GMTA's substantial benefits in industrial-scale microservice systems.
Wed 11 Nov (times shown in UTC, Coordinated Universal Time)
01:30 - 02:00
Cloud / Services 2
Paper Presentations / Research Papers / Industry Papers at Virtual room 2
|A Principled Approach to GraphQL Query Cost Analysis (ACM SIGSOFT Distinguished Paper Award)|
Alan Cha IBM Research, USA, Erik Wittern IBM, USA, Guillaume Baudart IBM Research, USA, James C. Davis Purdue University, USA, Louis Mandel IBM Research, USA, Jim A. Laredo IBM Research, USA
|Block Public Access: Trust Safety Verification of Access Control Policies|
Malik Bouchet Amazon, USA, Byron Cook Amazon, Bryant Cutler Amazon, USA, Anna Druzkina Amazon, USA, Andrew Gacek Amazon, USA, Liana Hadarean Amazon, Ranjit Jhala Amazon, USA, Brad Marshall Amazon, USA, Dan Peebles Amazon, USA, Neha Rungta Amazon Web Services, Cole Schlesinger Amazon, USA, Chriss Stephens Amazon, USA, Carsten Varming Amazon, USA, Andy Warfield Amazon, USA
|Efficient Incident Identification from Multi-dimensional Issue Reports via Meta-heuristic Search|
Jiazhen Gu Fudan University, China, Chuan Luo Microsoft Research, China, Si Qin Microsoft Research, n.n., Bo Qiao Microsoft Research, China, Qingwei Lin Microsoft Research, China, Hongyu Zhang University of Newcastle, Australia, Ze Li Microsoft, USA, Yingnong Dang Microsoft, USA, Shaowei Cai Institute of Software at Chinese Academy of Sciences, China, Wei-Cheng Wu University of Southern California, USA, Yangfan Zhou Fudan University, China, Murali Chintalapati Microsoft, n.n., Dongmei Zhang Microsoft Research, China
|Graph-Based Trace Analysis for Microservice Architecture Understanding and Problem Diagnosis|
Xiaofeng Guo Fudan University, China, Xin Peng Fudan University, China, Hanzhang Wang eBay, Wanxue Li eBay, USA, Huai Jiang eBay, USA, Dan Ding Fudan University, China, Tao Xie Peking University, Liangfei Su eBay, USA
|Real-Time Incident Prediction for Online Service Systems|
Nengwen Zhao Tsinghua University, Junjie Chen Tianjin University, China, Zhou Wang BizSeer, China, Xiao Peng Beijing University of Posts and Telecommunications, China, Gang Wang China EverBright Bank, Yong Wu China EverBright Bank, Fang Zhou China EverBright Bank, Zhen Feng EverBright Bank, China, Xiaohui Nie EverBright Bank, China, Wenchi Zhang Tsinghua University, China, Kaixin Sui BizSeer, Dan Pei BizSeer, China
|Scaling Static Taint Analysis to Industrial SOA Applications: A Case Study at Alibaba|
Jie Wang Peking University, China / Ant Group, China / Alibaba Group, China, Yunguang Wu Ant Group, China, Gang Zhou Ant Group, China, Yiming Yu Ant Group, China, Zhenyu Guo Ant Group, China, Yingfei Xiong Peking University
|Conversations on Cloud / Services 2|
Alan Cha IBM Research, USA, Andrew Gacek, Jiazhen Gu, Jie Wang Institute of Software, Chinese Academy of Sciences, Nengwen Zhao Tsinghua University, Xiaofeng Guo Fudan University, China, M: Satish Chandra Facebook, USA