A Generalized Method for Fault Detection and Diagnosis in SCADA Sensor Data via Classification with Uncertain Labels

Abstract

Supervisory control and data acquisition (SCADA) systems automatically collect data from arrays of sensors inside large-scale industrial machines such as wind turbines and chemical reaction chambers. We propose a new generalized method that uses SCADA sensor data in real time to detect potential faults and to indicate the sensors associated with those faults. Because the onset time of each fault is not known, the problem can be regarded as classification with uncertain, or soft, labels. We propose a novel data transformation technique that allows any weight-sensitive classification algorithm to be applied to data with soft labels. The method uses a decreasing function of the time interval before the failure to assign a weight to each instance. Each instance then appears in the model as both a positive and a negative example, with different weights. The transformed data can be used with any instance-weighted classification algorithm, such as random forest or SVM. We compare this soft-label approach with a naive hard-label approach and also propose within-outage and cross-outage testing schemes for a more comprehensive evaluation. We also develop a novel diagnosis method based on feature importance scores. Experimental results on wind turbine sensor data support the effectiveness of the proposed method at both detecting and diagnosing faults. Our analysis of the calculated feature importance scores provides evidence of multiple potential root causes for a particular fault class in wind turbines that belong to the same wind farm.
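The soft-label transformation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The abstract does not specify the decreasing function or how the negative weight is derived. This sketch therefore assumes an exponential decay `exp(-t / tau)` for the positive copy and the complement `1 - w` for the negative copy, and it treats readings with no subsequent failure as purely negative. The names `fault_weight`, `soft_label_transform`, and `tau` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def fault_weight(hours_before_failure: Optional[float], tau: float = 24.0) -> float:
    """Hypothetical decreasing weight: exponential decay with time scale tau (hours).

    Assumption: None means no failure follows the reading, so its fault weight is 0.
    """
    if hours_before_failure is None:
        return 0.0
    if hours_before_failure < 0:
        raise ValueError("time before failure must be non-negative")
    return math.exp(-hours_before_failure / tau)


def soft_label_transform(
    features: Sequence[Sequence[float]],
    hours_before_failure: Sequence[Optional[float]],
    tau: float = 24.0,
) -> Tuple[List[List[float]], List[int], List[float]]:
    """Duplicate each instance as a positive (label 1) and a negative (label 0) example.

    Assumption: the positive copy gets weight w and the negative copy gets 1 - w.
    Copies with zero weight are dropped because they add nothing to a weighted loss.
    """
    X: List[List[float]] = []
    y: List[int] = []
    sample_weight: List[float] = []
    for row, t in zip(features, hours_before_failure):
        w = fault_weight(t, tau)
        for label, weight in ((1, w), (0, 1.0 - w)):
            if weight > 0.0:
                X.append(list(row))
                y.append(label)
                sample_weight.append(weight)
    return X, y, sample_weight


# Usage with three made-up readings: at failure, 24 h before failure, and a normal period.
X, y, sw = soft_label_transform(
    [[0.9, 55.0], [0.7, 48.0], [0.2, 40.0]],
    [0.0, 24.0, None],
)
print(y)   # [1, 1, 0, 0]
print(sw)  # weights 1.0, exp(-1), 1 - exp(-1), 1.0
```

The resulting `X`, `y`, and `sw` can be passed to any classifier that accepts per-sample weights. One example is scikit-learn's `RandomForestClassifier.fit(X, y, sample_weight=sw)`, whose `feature_importances_` attribute is one possible way to rank sensors for diagnosis. Whether this matches the paper's diagnosis procedure is an assumption.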

Publication
14th International Conference on Data Science (ICDATA)
Date
