Development of a methodology based on Explainable Artificial Intelligence for validating pseudo-labels in the diagnosis of faults in wind turbines
Reliable fault diagnosis in wind turbines relies on observational data such as that from the Supervisory Control and Data Acquisition (SCADA) system. Because labeled faults are scarce in such data, unsupervised anomaly detection algorithms (such as Isolation Forest and DBSCAN) are used to generate pseudo-labels indicating potential system faults. These pseudo-labels, however, often lack transparency and reliability, confusing real fault signals with benign operational outliers. To overcome this challenge, this work proposes a methodology that uses explainable artificial intelligence (XAI) techniques, specifically SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME), to validate and refine pseudo-labels systematically. The methodology starts with anomaly identification and pseudo-label generation by unsupervised methods. Then, XAI is applied to explain which specific SCADA signals led to each anomaly classification. Finally, these explanations are analyzed from an engineering perspective to verify their correspondence with known patterns of mechanical or electrical failures. This step is crucial because it transforms the opaque pseudo-labels into “explained pseudo-labels”, which carry diagnostic information and allow genuine failures to be distinguished from false alarms. The fundamental contribution lies in the proactive use of XAI to validate the pseudo-label generation process, as opposed to its conventional use in interpreting predictive outputs. This application clarifies how the anomaly detection model flags deviations, supporting the selection of algorithms suited to specific types of failures and the identification of spurious benign anomalies in wind turbines.
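The three steps described above (pseudo-label generation, explanation, engineering check) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a simple distance-from-baseline score stands in for Isolation Forest or DBSCAN, exact Shapley values are enumerated directly instead of calling the SHAP library, and the signal names, the threshold, and the one-entry fault rule table are hypothetical.

```python
import random
from itertools import combinations
from math import factorial, sqrt
from statistics import mean, pstdev

# Hypothetical SCADA signal names, chosen only for illustration.
SIGNALS = ["wind_speed", "generator_rpm", "gearbox_temp"]

# Hypothetical engineering rule table: signals whose dominance in an
# explanation matches a known failure pattern.
KNOWN_FAULT_SIGNALS = {"gearbox_temp": "possible gearbox overheating"}


def fit_baseline(rows):
    """Return (mean, standard deviation) per signal from historical rows."""
    return [(mean(col), pstdev(col)) for col in zip(*rows)]


def z_scores(x, baseline):
    return [(v - m) / s for v, (m, s) in zip(x, baseline)]


def score(z, subset):
    """Stand-in anomaly score: Euclidean norm of the z-scores in `subset`.
    Signals outside `subset` are treated as sitting at their baseline mean."""
    return sqrt(sum(z[i] ** 2 for i in subset))


def shapley(z):
    """Exact Shapley value of each signal for the score above."""
    n = len(z)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                total += weight * (score(z, subset + (i,)) - score(z, subset))
        phi.append(total)
    return phi


def explain_pseudo_label(x, baseline, threshold=4.0):
    """Pseudo-label one observation, then attach an explanation and a verdict."""
    z = z_scores(x, baseline)
    total = score(z, range(len(z)))
    if total <= threshold:  # threshold is an assumed value
        return {"pseudo_label": "normal", "score": total}
    phi = shapley(z)
    top = SIGNALS[max(range(len(phi)), key=phi.__getitem__)]
    verdict = KNOWN_FAULT_SIGNALS.get(top, "likely benign operational outlier")
    return {
        "pseudo_label": "anomaly",
        "score": total,
        "attributions": dict(zip(SIGNALS, phi)),
        "top_signal": top,
        "verdict": verdict,
    }


# Synthetic history standing in for healthy SCADA records.
rng = random.Random(0)
history = [(rng.gauss(8, 2), rng.gauss(1500, 50), rng.gauss(60, 3))
           for _ in range(200)]
baseline = fit_baseline(history)

fault = explain_pseudo_label((8.0, 1500.0, 90.0), baseline)   # hot gearbox
gust = explain_pseudo_label((20.0, 1500.0, 60.0), baseline)   # strong wind only
normal = explain_pseudo_label((8.0, 1500.0, 60.0), baseline)

for name, result in [("fault", fault), ("gust", gust), ("normal", normal)]:
    print(name, result["pseudo_label"], result.get("top_signal"), result.get("verdict"))
```

Both the hot-gearbox and the strong-wind observations receive the same opaque pseudo-label, and only the attribution separates them. That separation is the role the abstract assigns to the "explained pseudo-label". In practice the rule table would be replaced by expert review of the SHAP or LIME output.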
This work is expected to provide a framework that improves the reliability of unsupervised fault diagnosis by clarifying the origin of each anomaly and making SCADA-based alerts more accurate. The methodology also allows different anomaly detection algorithms to be compared, guiding their selection and promoting a more interpretable approach to predictive maintenance in wind turbines.