Is Neuron Coverage a Meaningful Measure for Testing Deep Neural Networks? (ESEC/FSE 2020 - Research Papers)

Who

Fabrice Harel-Canada, Lingxiao Wang, Muhammad Ali Gulzar, Quanquan Gu, Miryung Kim

Track

ESEC/FSE 2020 Research Papers

Time Zone

The program is currently displayed in (UTC) Coordinated Universal Time.

When

Tue 10 Nov 2020 01:41 - 01:42 at Virtual room 1 - ML Testing 1

Abstract

Recent effort to test deep learning systems has produced an intuitive and compelling test criterion called neuron coverage (NC), which resembles the notion of traditional code coverage. NC measures the proportion of neurons activated in a neural network, and it is implicitly assumed that increasing NC improves the quality of a test suite. In an attempt to automatically generate a test suite that increases NC, we design a novel diversity-promoting regularizer that can be plugged into existing adversarial attack algorithms. We then assess whether such attempts to increase NC could generate a test suite that (1) detects adversarial attacks successfully, (2) produces natural inputs, and (3) is unbiased to particular class predictions. Contrary to expectation, our extensive evaluation finds that increasing NC actually makes it harder to generate an effective test suite: higher neuron coverage leads to fewer defects detected, less natural inputs, and more biased prediction preferences. Our results cast doubt on whether increasing neuron coverage is a meaningful objective for generating tests for deep neural networks, and call for a new test generation technique that considers defect detection, naturalness, and output impartiality in tandem.
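
To make the NC definition above concrete, here is a minimal sketch of one way the proportion of activated neurons could be computed for a test suite. It is an illustration only, not the authors' implementation: the function name `neuron_coverage`, the activation threshold of 0.0, and the toy activation values are all assumptions, and the abstract does not say how activations are scaled or thresholded in the paper.

```python
def neuron_coverage(layer_activations, threshold=0.0):
    """Return the fraction of neurons activated by at least one test input.

    layer_activations: one entry per layer; each entry is a list of rows,
    one row per test input, holding that layer's neuron outputs.
    threshold: hypothetical cutoff above which a neuron counts as activated.
    """
    covered = 0
    total = 0
    for acts in layer_activations:
        num_neurons = len(acts[0])
        for j in range(num_neurons):
            # A neuron is covered if any input in the suite pushes it
            # above the threshold.
            if any(row[j] > threshold for row in acts):
                covered += 1
        total += num_neurons
    return covered / total


# Toy activations (made up) for a 2-layer network and a suite of 3 inputs.
layer1 = [[0.9, 0.0, 0.0],
          [0.2, 0.0, 0.0],
          [0.0, 0.0, 0.7]]
layer2 = [[0.0, 0.4],
          [0.0, 0.1],
          [0.0, 0.0]]

nc = neuron_coverage([layer1, layer2])
print(f"neuron coverage: {nc:.2f}")
```

In this toy suite, three of the five neurons fire for at least one input. Raising the threshold lowers the measured coverage, so any comparison of NC values needs to use the same threshold.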

DOI

https://doi.org/10.1145/3368089.3409754

Fabrice Harel-Canada

University of California at Los Angeles, USA

Lingxiao Wang

University of California at Los Angeles, USA

Muhammad Ali Gulzar

University of California at Los Angeles, USA

Quanquan Gu

University of California at Los Angeles, USA

Miryung Kim

University of California at Los Angeles, USA

Session Program

Tue 10 Nov
Displayed time zone: (UTC) Coordinated Universal Time

01:30 - 02:00	ML Testing 1 (Research Papers / Journal First) at Virtual room 1

01:30 2m Talk		Correlations between Deep Neural Network Model Coverage Criteria and Model Quality (Research Papers): Shenao Yan (Rutgers University, USA), Guanhong Tao (Purdue University, USA), Xuwei Liu (Purdue University, USA), Juan Zhai (Rutgers University, USA), Shiqing Ma (Rutgers University, USA), Lei Xu (Nanjing University, China), Xiangyu Zhang (Purdue University)
01:33 1m Talk		Deep Learning Library Testing via Effective Model Generation (Research Papers, ACM SIGSOFT Distinguished Paper Award): Zan Wang (Tianjin University, China), Ming Yan (Tianjin University, China), Junjie Chen (Tianjin University, China), Shuang Liu (Tianjin University, China), Dongdi Zhang (Tianjin University, China)
01:35 1m Talk		Detecting Numerical Bugs in Neural Network Architectures (Research Papers, ACM SIGSOFT Distinguished Paper Award): Yuhao Zhang (Peking University), Luyao Ren (Peking University, China), Liqian Chen (National University of Defense Technology, China), Yingfei Xiong (Peking University), Shing-Chi Cheung (Hong Kong University of Science and Technology, China), Tao Xie (Peking University)
01:37 1m Talk		Dynamic Slicing for Deep Neural Networks (Research Papers): Ziqi Zhang (Peking University, China), Yuanchun Li (Microsoft Research, China), Yao Guo (Peking University), Xiangqun Chen (Peking University), Yunxin Liu (Microsoft Research, China)
01:39 1m Talk		Grammar Based Directed Testing of Machine Learning Systems (Journal First): Sakshi Udeshi (Singapore University of Technology and Design), Sudipta Chattopadhyay (Singapore University of Technology and Design)
01:41 1m Talk		Is Neuron Coverage a Meaningful Measure for Testing Deep Neural Networks? (Research Papers): Fabrice Harel-Canada (University of California at Los Angeles, USA), Lingxiao Wang (University of California at Los Angeles, USA), Muhammad Ali Gulzar (University of California at Los Angeles, USA), Quanquan Gu (University of California at Los Angeles, USA), Miryung Kim (University of California at Los Angeles, USA)
01:43 1m Talk		Operational Calibration: Debugging Confidence Errors for DNNs in the Field (Research Papers): Zenan Li (Nanjing University, China), Xiaoxing Ma (Nanjing University, China), Chang Xu (Nanjing University, China), Jingwei Xu (Nanjing University, China), Chun Cao (Nanjing University, China), Jian Lv (Nanjing University, China)
01:45 15m Talk		Conversations on ML Testing 1 (Research Papers): Fabrice Harel-Canada (University of California at Los Angeles, USA), Ming Yan (Tianjin University, China), Sakshi Udeshi (Singapore University of Technology and Design), Shenao Yan (Rutgers University, USA), Yuhao Zhang (Peking University), Zenan Li (Nanjing University, China), Ziqi Zhang (Peking University, China), M: Hamid Bagheri (University of Nebraska-Lincoln, USA)