Efficient Incident Identification from Multi-dimensional Issue Reports via Meta-heuristic Search
In large-scale cloud systems, unplanned service interruptions and outages may cause severe degradation of service availability. Such incidents can occur in a bursty manner, which will deteriorate user satisfaction. Identifying incidents rapidly and accurately is critical to the operation and maintenance of a cloud system. In industrial practice, incidents are typically detected through analyzing the issue reports, which are generated over time by monitoring cloud services. Identifying incidents in a large number of issue reports is quite challenging. An issue report is typically multi-dimensional: it has many categorical attributes. It is difficult to identify a specific attribute combination that indicates an incident. Existing methods generally rely on pruning-based search, which is time-consuming given high-dimensional data, thus not practical to incident detection in large-scale cloud systems. In this paper, we propose MID (Multi-dimensional Incident Detection), a novel framework for identifying incidents from large-amount, multi-dimensional issue reports effectively and efficiently. Key to the MID design is encoding the problem into a combinatorial optimization problem. Then a specific-tailored meta-heuristic search method is designed, which can rapidly identify attribute combinations that indicate incidents. We evaluate MID with extensive experiments using both synthetic data and real-world data collected from a large-scale production cloud system. The experimental results show that MID significantly outperforms the current state-of-the-art methods in terms of effectiveness and efficiency. Additionally, MID has been successfully applied to Microsoft's cloud systems and helped greatly reduce manual maintenance effort.
Wed 11 NovDisplayed time zone: (UTC) Coordinated Universal Time change
01:30 - 02:00 | |||
01:30 2mTalk | A Principled Approach to GraphQL Query Cost AnalysisACM SIGSOFT Distinguished Paper Award Research Papers Alan Cha IBM Research, USA, Erik Wittern IBM, USA, Guillaume Baudart IBM Research, USA, James C. Davis Purdue University, USA, Louis Mandel IBM Research, USA, Jim A. Laredo IBM Research, USA DOI Pre-print Media Attached | ||
01:33 1mTalk | Block Public Access: Trust Safety Verification of Access Control Policies Research Papers Malik Bouchet Amazon, USA, Byron Cook Amazon, Bryant Cutler Amazon, USA, Anna Druzkina Amazon, USA, Andrew Gacek Amazon, USA, Liana Hadarean Amazon, Ranjit Jhala Amazon, USA, Brad Marshall Amazon, USA, Dan Peebles Amazon, USA, Neha Rungta Amazon Web Services, Cole Schlesinger Amazon, USA, Chriss Stephens Amazon, USA, Carsten Varming Amazon, USA, Andy Warfield Amazon, USA DOI | ||
01:35 1mTalk | Efficient Incident Identification from Multi-dimensional Issue Reports via Meta-heuristic Search Research Papers Jiazhen Gu Fudan University, China, Chuan Luo Microsoft Research, China, Si Qin Microsoft Research, n.n., Bo Qiao Microsoft Research, China, Qingwei Lin Microsoft Research, China, Hongyu Zhang University of Newcastle, Australia, Ze Li Microsoft, USA, Yingnong Dang Microsoft, USA, Shaowei Cai Institute of Software at Chinese Academy of Sciences, China, Wei-Cheng Wu University of Southern California, USA, Yangfan Zhou Fudan University, China, Murali Chintalapati Microsoft, n.n., Dongmei Zhang Microsoft Research, China DOI | ||
01:37 1mTalk | Graph-Based Trace Analysis for Microservice Architecture Understanding and Problem Diagnosis Industry Papers Xiaofeng Guo Fudan University, China, Xin Peng Fudan University, China, Hanzhang Wang eBay, Wanxue Li eBay, USA, Huai Jiang eBay, USA, Dan Ding Fudan University, China, Tao Xie Peking University, Liangfei Su eBay, USA DOI | ||
01:39 1mTalk | Real-Time Incident Prediction for Online Service Systems Research Papers Nengwen Zhao Tsinghua University, Junjie Chen Tianjin University, China, Zhou Wang BizSeer, China, Xiao Peng Beijing University of Posts and Telecommunications, China, Gang Wang China EverBright Bank, Yong Wu China EverBright Bank, Fang Zhou China EverBright Bank, Zhen Feng EverBright Bank, China, Xiaohui Nie EverBright Bank, China, Wenchi Zhang Tsinghua University, China, Kaixin Sui BizSeer, Dan Pei BizSeer, China DOI | ||
01:41 1mTalk | Scaling Static Taint Analysis to Industrial SOA Applications: A Case Study at Alibaba Industry Papers Jie Wang Peking University, China / Ant Group, China / Alibaba Group, China, Yunguang Wu Ant Group, China, Gang Zhou Ant Group, China, Yiming Yu Ant Group, China, Zhenyu Guo Ant Group, China, Yingfei Xiong Peking University DOI | ||
01:43 17mTalk | Conversations on Cloud / Services 2 Paper Presentations Alan Cha IBM Research, USA, Andrew Gacek , Jiazhen Gu , Jie Wang Institute of Software, Chinese Academy of Sciences, Nengwen Zhao Tsinghua University, Xiaofeng Guo Fudan University, China, M: Satish Chandra Facebook, USA |