Wed 11 Nov 2020 01:39 - 01:40 at Virtual room 2 - Cloud / Services 2

Incidents in online service systems can dramatically degrade system availability and harm user experience. To guarantee service quality and reduce economic loss, it is essential to predict incidents in advance so that engineers can take proactive action to prevent them. In this work, we propose an effective and interpretable incident prediction approach, called eWarn, which uses historical data to forecast, from alert data in real time, whether an incident will happen in the near future. More specifically, eWarn first extracts a set of effective features (both textual and statistical) to represent omen alert patterns via careful feature engineering. To reduce the influence of noisy alerts (those not relevant to the occurrence of incidents), eWarn then incorporates a multi-instance learning formulation. Finally, eWarn builds a classification model via machine learning and generates an interpretable report on the prediction result via a state-of-the-art explanation technique (LIME). In this way, an early warning signal, along with its interpretable report, can be sent to engineers to help them understand and handle the upcoming incident. An extensive study on 11 real-world online service systems from a large commercial bank demonstrates the effectiveness of eWarn, which outperforms state-of-the-art alert-based incident prediction approaches and the existing practice of incident prediction with alerts. In particular, we have applied eWarn at two large commercial banks in practice and share some success stories and lessons learned from real deployment.
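The multi-instance formulation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Alert` fields, the severity levels, the fixed window and horizon, and the helper names `bag_features` and `build_bags` are all assumptions made for illustration. The idea shown is only that alerts are grouped into bags (time windows), each bag is turned into statistical and textual features, and the label is attached to the bag rather than to individual alerts.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Alert:
    timestamp: int   # seconds on some common clock (hypothetical unit)
    severity: str    # e.g. "low" or "high" (hypothetical levels)
    message: str     # free-text alert content


def bag_features(alerts: List[Alert], vocabulary: List[str]) -> List[float]:
    """Build one feature vector for a bag (window) of alerts."""
    # Textual features: how often each vocabulary word appears in the bag.
    tokens = Counter(w for a in alerts for w in a.message.lower().split())
    # Statistical features: number of alerts and number of high-severity alerts.
    high = sum(1 for a in alerts if a.severity == "high")
    stats = [float(len(alerts)), float(high)]
    return stats + [float(tokens[w]) for w in vocabulary]


def build_bags(
    alerts: List[Alert],
    incident_times: List[int],
    window: int,
    horizon: int,
    vocabulary: List[str],
) -> List[Tuple[List[float], int]]:
    """Group alerts into fixed windows and label each bag.

    A bag is labeled 1 if an incident starts within `horizon` seconds
    after the window ends, otherwise 0. Individual alerts get no label,
    which is the multi-instance assumption in this sketch.
    """
    if not alerts:
        return []
    ordered = sorted(alerts, key=lambda a: a.timestamp)
    t, end = ordered[0].timestamp, ordered[-1].timestamp
    bags = []
    while t <= end:
        in_window = [a for a in ordered if t <= a.timestamp < t + window]
        if in_window:  # skip empty windows
            label = int(any(t + window <= inc < t + window + horizon
                            for inc in incident_times))
            bags.append((bag_features(in_window, vocabulary), label))
        t += window
    return bags
```

The resulting `(features, label)` pairs could be passed to any standard classifier, whose predictions could then be explained with a tool such as LIME; neither step is shown here.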

Wed 11 Nov
Times are displayed in time zone: (UTC) Coordinated Universal Time

01:30 - 02:00: Cloud / Services 2 (Paper Presentations / Research Papers / Industry Papers) at Virtual room 2
01:30 - 01:32
Talk
A Principled Approach to GraphQL Query Cost Analysis (ACM SIGSOFT Distinguished Paper Award)
Research Papers
Alan Cha (IBM Research, USA), Erik Wittern (IBM, USA), Guillaume Baudart (IBM Research, USA), James C. Davis (Purdue University, USA), Louis Mandel (IBM Research, USA), Jim A. Laredo (IBM Research, USA)
01:33 - 01:34
Talk
Block Public Access: Trust Safety Verification of Access Control Policies
Research Papers
Malik Bouchet (Amazon, USA), Byron Cook (Amazon), Bryant Cutler (Amazon, USA), Anna Druzkina (Amazon, USA), Andrew Gacek (Amazon, USA), Liana Hadarean (Amazon), Ranjit Jhala (Amazon, USA), Brad Marshall (Amazon, USA), Dan Peebles (Amazon, USA), Neha Rungta (Amazon Web Services), Cole Schlesinger (Amazon, USA), Chriss Stephens (Amazon, USA), Carsten Varming (Amazon, USA), Andy Warfield (Amazon, USA)
01:35 - 01:36
Talk
Efficient Incident Identification from Multi-dimensional Issue Reports via Meta-heuristic Search
Research Papers
Jiazhen Gu (Fudan University, China), Chuan Luo (Microsoft Research, China), Si Qin (Microsoft Research, n.n.), Bo Qiao (Microsoft Research, China), Qingwei Lin (Microsoft Research, China), Hongyu Zhang (University of Newcastle, Australia), Ze Li (Microsoft, USA), Yingnong Dang (Microsoft, USA), Shaowei Cai (Institute of Software at Chinese Academy of Sciences, China), Wei-Cheng Wu (University of Southern California, USA), Yangfan Zhou (Fudan University, China), Murali Chintalapati (Microsoft, n.n.), Dongmei Zhang (Microsoft Research, China)
01:37 - 01:38
Talk
Graph-Based Trace Analysis for Microservice Architecture Understanding and Problem Diagnosis
Industry Papers
Xiaofeng Guo (Fudan University, China), Xin Peng (Fudan University, China), Hanzhang Wang (eBay), Wanxue Li (eBay, USA), Huai Jiang (eBay, USA), Dan Ding (Fudan University, China), Tao Xie (Peking University), Liangfei Su (eBay, USA)
01:39 - 01:40
Talk
Real-Time Incident Prediction for Online Service Systems
Research Papers
Nengwen Zhao (Tsinghua University), Junjie Chen (Tianjin University, China), Zhou Wang (BizSeer, China), Xiao Peng (Beijing University of Posts and Telecommunications, China), Gang Wang (China EverBright Bank), Yong Wu (China EverBright Bank), Fang Zhou (China EverBright Bank), Zhen Feng (EverBright Bank, China), Xiaohui Nie (EverBright Bank, China), Wenchi Zhang (Tsinghua University, China), Kaixin Sui (BizSeer), Dan Pei (BizSeer, China)
01:41 - 01:42
Talk
Scaling Static Taint Analysis to Industrial SOA Applications: A Case Study at Alibaba
Industry Papers
Jie Wang (Peking University, China / Ant Group, China / Alibaba Group, China), Yunguang Wu (Ant Group, China), Gang Zhou (Ant Group, China), Yiming Yu (Ant Group, China), Zhenyu Guo (Ant Group, China), Yingfei Xiong (Peking University)
01:43 - 02:00
Talk
Conversations on Cloud / Services 2
Paper Presentations
Alan Cha (IBM Research, USA), Andrew Gacek, Jiazhen Gu, Jie Wang (Institute of Software, Chinese Academy of Sciences), Nengwen Zhao (Tsinghua University), Xiaofeng Guo (Fudan University, China), M: Satish Chandra (Facebook, USA)