A novel transformer-based DL model enhanced by position-sensitive attention and gated hierarchical LSTM for aero-engine RUL prediction

Chen, Xinping

doi:10.1038/s41598-024-59095-3


Article
Open access
Published: 02 May 2024


Scientific Reports volume 14, Article number: 10061 (2024)


Abstract

Accurate prediction of remaining useful life (RUL) for aircraft engines is essential for proactive maintenance and safety assurance. However, existing methods such as physics-based models, classical recurrent neural networks, and convolutional neural networks face limitations in capturing long-term dependencies and modeling complex degradation patterns. In this study, we propose a novel deep-learning model based on the Transformer architecture to address these limitations. Specifically, to address the issue of insensitivity to local context in the attention mechanism employed by the Transformer encoder, we introduce a position-sensitive self-attention (PSA) unit to enhance the model's ability to incorporate local context by attending to the positional relationships of the input data at each time step. Additionally, a gated hierarchical long short-term memory network (GHLSTM) is designed to perform regression prediction at different time scales on the latent features, thereby improving the accuracy of RUL estimation for mechanical equipment. Experiments on the C-MAPSS dataset demonstrate that the proposed model outperforms existing methods in RUL prediction, showcasing its effectiveness in modeling complex degradation patterns and long-term dependencies.


Introduction

Accurate prediction of remaining useful life (RUL) is crucial for proactive maintenance and for reducing casualties and economic losses. RUL prediction methods can be classified into physics-based, artificial intelligence-based, and hybrid models^{1,2,3,4,5,6,7}. Physics-based methods build models of specific failure mechanisms to explain degradation patterns and combine them with real-time monitoring data to assess the RUL. However, their applicability is limited in complex mechanical systems^{8,9,10,11}. Artificial intelligence methods learn degradation patterns directly from observational data without requiring expert knowledge. They excel at predicting the behavior of complex systems for which physical or statistical models are inadequate, and they have attracted growing attention as computing technology has advanced¹². Hybrid methods combine the advantages of different approaches but may be limited when applied to complex rotating machinery¹³.

With the accumulation of data and the rapid growth of computing power, deep learning (DL) has been successfully applied in many engineering fields, and its combination with prognostics and health management (PHM) has become popular in both academia and industry. Early methods employed classical recurrent neural networks (RNNs) for regression on time-series data. However, RNNs suffer from vanishing or exploding gradients¹⁴, which limits their performance on long-sequence prediction tasks. RNN variants such as LSTM^{15,16} and GRU¹⁷ were introduced to address this; they use nonlinear gating mechanisms to control the flow of information and partially alleviate these limitations. Research on gated networks for RUL prediction has since grown rapidly. Zhang et al.¹⁸ proposed an LSTM-Fusion network for estimating the RUL of aircraft engines, which integrates observation sequences of different lengths to extract hidden information effectively. Miao et al.¹⁹ introduced a dual-task stacked LSTM method that simultaneously evaluates the degradation stage and predicts the RUL of aircraft engines. Liu et al.²⁰ presented a multi-level prediction approach for aircraft engine health that combines LSTM with statistical process analysis for bearing fault prediction. Zhang et al.²¹ proposed a dual-task network based on a bidirectional GRU and a mixture of gated expert units, enabling simultaneous evaluation of aircraft engine health status and RUL prediction. Ma et al.²² introduced a prediction model based on deep wavelet sequence gated recurrent units for the RUL prediction of rotating machinery, in which a wavelet layer generates wavelet sequences at different scales. Xiao et al.²³ improved the robustness of a BLSTM model for RUL prediction by adding Gaussian white noise to health indicators derived from principal component analysis. Song et al.²⁴ constructed aircraft engine health indicators with variational autoencoders and used a BLSTM model for RUL prediction.

Besides enhancing temporal processing through recurrence, an alternative is to use convolutional neural networks (CNNs), whose local receptive fields with shared weights improve spatial feature extraction²⁵. CNN-based models have also been applied successfully to RUL prediction with competitive performance. Zhu et al.²⁶ proposed a multi-scale CNN for predicting the RUL of bearings; compared with traditional CNNs, this network preserves both global and local information. Li et al.²⁷ introduced a deep-CNN approach for RUL prediction from raw data. Yang et al.²⁸ employed a dual-CNN model in which the first CNN identifies the early fault point and the second predicts the RUL. Jiang et al.²⁹ transformed time-series data into multi-channel data and used a CNN to construct health indicators, improving the accuracy of residual life prediction.

The Transformer^{30,31,32}, one of the most popular deep learning architectures in recent years, has been introduced for sequence modeling. It processes long sequences in parallel and can be applied to time-series data of varying lengths. It has achieved remarkable success in many application domains, including natural language processing³³, machine vision³⁴, and medical diagnosis³⁵, and has recently been applied to RUL prediction as well. Zhang et al.³⁶ introduced a Transformer-based bidirectional self-attention deep model for RUL prediction; it is a fully self-attention-based encoder-decoder structure without any RNN or CNN modules. Su et al.³⁷ proposed an adaptive Transformer that combines attention mechanisms with recurrent structures to predict the RUL of rolling bearings; it directly models the relationship between shallow features and RUL, mitigating the vanishing gradient problem and better representing complex temporal degradation patterns. Based on a proposed shared temporal attention layer, Chadha et al.³⁸ developed two Transformer models for multivariate time-series data and applied them to predict the RUL of aircraft engines. Chang et al.³⁹ proposed a Transformer for RUL prediction based on sparse multi-head self-attention and knowledge distillation, which reduces the computational burden and improves domain adaptation on raw rolling-bearing signals. Ren et al.⁴⁰ introduced a T2 tensor-assisted multiscale Transformer to predict the RUL of industrial components. Ding et al.⁴¹ presented a convolutional Transformer capable of extracting degradation-related information from the original signals at both local and global levels.

In this study, we propose a DL model built on a Transformer-based auto-encoder for RUL prediction. Unlike RNN and CNN models, the Transformer processes an entire sequence in a single pass through attention, so any part of the history can be accessed regardless of its distance from the current step. This makes it potentially more powerful at capturing long-term dependencies. However, the dot-product self-attention used in Transformers makes the extracted high-level features insensitive to their local context at each time step³⁴, so the model must work harder to estimate the corresponding RUL. We therefore introduce a position-sensitive self-attention (PSA) unit that attends to the positional relationships of the input data at each time step and improves the incorporation of local context. In addition, to exploit the improved features extracted by the encoder, we design a gated hierarchical long short-term memory network (GHLSTM) that performs regression at different time scales, further improving the accuracy of RUL prediction for mechanical equipment. The main contributions of this article are as follows.

(1) The traditional attention mechanism in the Transformer encoder is insensitive to local context, which is essential for RUL prediction. The proposed position-sensitive self-attention (PSA) mechanism captures the positional relationships of the input data, enabling the model to incorporate local context and generate more effective hidden features, which improves RUL prediction accuracy.
(2) To strengthen the modeling of long-term dependencies and improve performance on large-scale sequential data, a gated hierarchical long short-term memory (GHLSTM) network is proposed. It learns features at different time scales and performs regression at multiple scales, providing comprehensive feature learning and improving RUL prediction accuracy.
(3) Experimental results on a widely used aerospace dataset show, based on quantitative evaluation metrics, that the proposed method outperforms existing methods.

The rest of the article is organized as follows. Section "Introduction" introduces the research topic. Section "Theoretical basis" presents the theoretical background. Section "Proposed methodology" derives the proposed DL model in detail. Section "Experimental analysis" reports the experiments and the related analysis. Finally, Section "Conclusion" concludes the article.

Theoretical basis

The Transformer was first introduced in 2017 for NLP tasks⁴². It is a sequence-to-sequence model with an encoder-decoder (auto-encoder-like) structure, composed of an encoder module and a decoder module. The encoder maps the input sequence to high-dimensional hidden representations, which are then fed into the decoder to generate the output sequence. Unlike recurrent networks, which consume their inputs sequentially, the Transformer captures long-term dependencies through dot-product self-attention. Transformer-based models have achieved remarkable performance in various sequence-modeling tasks, spanning natural language processing, computer vision, and PHM.

The proposed model primarily focuses on the improved structure of the Transformer encoder module. Therefore, in this section, we provide a detailed explanation of the main components and architecture of the Transformer encoder module. The Transformer encoder structure, as shown in Fig. 1, mainly consists of multi-head attention, feed-forward networks, and position encoding.

Multi-head self-attention

The multi-head self-attention mechanism is a variant of the attention mechanism widely used in natural language processing and machine translation tasks. It is an extension of the self-attention mechanism designed to enhance the modeling capacity of the model for different semantic information. The self-attention mechanism allows the model to interact and exchange information between different positions in the input sequence, while the multi-head self-attention mechanism further expands this interactive capability. It achieves this by applying the attention mechanism to different projections in multiple subspaces, creating multiple attention heads. Each attention head has its own set of parameters and can learn different attention weights to capture the associations between different semantic information.

The multi-head self-attention is computed as follows. The input sequence ${\textbf{f}} = \left\{ f_{i} \right\}_{i = 1}^{t}$, where $f_{i} \in R^{d}$ is the feature corresponding to $x_{i}$, is projected into multiple subspaces through linear transformations. Each attention head uses its own parameter matrices for this projection, yielding a representation in its own subspace. Denoting the number of parallel attention heads by H, the multi-head attention mechanism is:

$$\mathrm{MultiHead}({\textbf{Q}}, {\textbf{K}}, {\textbf{V}}) = \mathrm{Concat}\left( \{ head_{j} \}_{j = 1}^{H} \right){\textbf{W}}^{A} ,$$

(1)

$$head_{j} = \mathrm{Attention}({\textbf{Q}}, {\textbf{K}}, {\textbf{V}})_{j} = \mathrm{softmax}\left( \frac{{\textbf{Q}}_{j} {\textbf{K}}_{j}^{T}}{\sqrt{d_{k}}} \right){\textbf{V}}_{j} ,$$

(2)

$$\begin{aligned} {\textbf{K}}_{j} &= {\textbf{f}}{\textbf{W}}_{j}^{k} \\ {\textbf{V}}_{j} &= {\textbf{f}}{\textbf{W}}_{j}^{v} \\ {\textbf{Q}}_{j} &= {\textbf{f}}{\textbf{W}}_{j}^{q} \end{aligned}$$

(3)

where ${\textbf{W}}^{A} \in {\textbf{R}}^{H d_{k} \times d}$ and $d_{k} = d/H$; ${\textbf{K}}_{j}$, ${\textbf{V}}_{j}$ and ${\textbf{Q}}_{j}$ are the key, value and query matrices of the jth head; $head_{j}$ is the output of the jth attention head; and ${\textbf{W}}_{j}^{k}$, ${\textbf{W}}_{j}^{v}$, ${\textbf{W}}_{j}^{q} \in {\textbf{R}}^{d \times d_{k}}$ are trainable matrices.
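To make Eqs. (1) to (3) concrete, the following is a minimal NumPy sketch of multi-head self-attention with random weights. The sizes (t = 30, d = 16, H = 4) are illustrative assumptions, not the hyper-parameters in Table 1, and the row-vector convention ${\textbf{f}}{\textbf{W}}$ of Eq. (3) is used. The last lines also check a property behind the motivation for PSA: without position information, shuffling the time steps only shuffles the outputs, so plain self-attention carries no notion of order by itself.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(f, Wq, Wk, Wv, WA):
    """Eqs. (1)-(3). f: (t, d); Wq, Wk, Wv: (H, d, d_k); WA: (H * d_k, d)."""
    H, _, d_k = Wq.shape
    heads = []
    for j in range(H):
        Q, K, V = f @ Wq[j], f @ Wk[j], f @ Wv[j]   # Eq. (3), each (t, d_k)
        scores = Q @ K.T / np.sqrt(d_k)              # scaled dot products, (t, t)
        heads.append(softmax(scores, axis=-1) @ V)   # Eq. (2), (t, d_k)
    return np.concatenate(heads, axis=-1) @ WA       # Eq. (1), (t, d)

rng = np.random.default_rng(0)
t, d, H = 30, 16, 4          # illustrative sizes, not the values in Table 1
d_k = d // H
f = rng.standard_normal((t, d))
Wq, Wk, Wv = (0.1 * rng.standard_normal((H, d, d_k)) for _ in range(3))
WA = 0.1 * rng.standard_normal((H * d_k, d))

out = multi_head_self_attention(f, Wq, Wk, Wv, WA)
print(out.shape)  # (30, 16)

# Without position information, attention is permutation-equivariant:
# shuffling the time steps only shuffles the output rows.
perm = rng.permutation(t)
print(np.allclose(multi_head_self_attention(f[perm], Wq, Wk, Wv, WA), out[perm]))  # True
```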

Feedforward neural network and position encoding

The feed-forward network consists of two fully connected (FC) layers with a ReLU activation between them:

$$F({\textbf{x}}) = {\textbf{W}}_{2} \cdot \mathrm{ReLU}({\textbf{W}}_{1} {\textbf{x}} + {\textbf{b}}_{1} ) + {\textbf{b}}_{2} ,$$

(4)

where ${\textbf{W}}_{1}$, ${\textbf{W}}_{2}$ and ${\textbf{b}}_{1}$, ${\textbf{b}}_{2}$ are the weights and biases of the two FC layers, and ${\textbf{x}}$ is the input of the feed-forward network.
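A short NumPy sketch of Eq. (4), assuming the usual Transformer practice of applying the same two FC layers independently at every time step. Because rows are time steps here, ${\textbf{W}}_{1}{\textbf{x}}$ is written as `x @ W1`; the inner width `d_ff = 64` is an arbitrary illustrative choice.

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    """Eq. (4): two FC layers with ReLU in between. x is (t, d)."""
    hidden = np.maximum(0.0, x @ W1 + b1)   # ReLU(W1 x + b1), applied per time step
    return hidden @ W2 + b2                 # W2 . hidden + b2

rng = np.random.default_rng(1)
t, d, d_ff = 30, 16, 64                     # illustrative sizes
x = rng.standard_normal((t, d))
W1, b1 = 0.1 * rng.standard_normal((d, d_ff)), np.zeros(d_ff)
W2, b2 = 0.1 * rng.standard_normal((d_ff, d)), np.zeros(d)

y = feed_forward(x, W1, b1, W2, b2)
print(y.shape)  # (30, 16)
```

Since no information flows between rows, feeding a single time step gives the same result as the corresponding row of the full output.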

The position encoding is defined as follows,

$$\begin{aligned} p_{i}^{(2s)} &= \sin \left( i/10000^{2s/d} \right) \\ p_{i}^{(2s + 1)} &= \cos \left( i/10000^{2s/d} \right) \end{aligned}$$

(5)

where i is the time-step position, s indexes the pairs of encoding dimensions, and d is the feature dimension. With this design, for any fixed offset l, $p_{i + l}$ is a linear function of $p_{i}$, which helps the model learn the sequential relationships. The final input of the Transformer encoder module is therefore ${\textbf{X}} = {\textbf{x}} + {\textbf{p}}$.
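The linear-relationship claim follows from the angle-addition identities: each (sin, cos) pair of $p_{i+l}$ is a fixed 2×2 rotation of the same pair of $p_{i}$. The sketch below builds Eq. (5) in NumPy and checks this numerically. Positions are assumed to start at 0, and the helper names `positional_encoding` and `shift_matrix` are hypothetical.

```python
import numpy as np

def positional_encoding(n, d):
    """Eq. (5): row i is p_i; even columns hold sin, odd columns cos (d even)."""
    i = np.arange(n)[:, None]                   # positions 0, ..., n - 1
    s = np.arange(d // 2)[None, :]              # frequency index s
    angle = i / np.power(10000.0, 2 * s / d)    # i / 10000^(2s/d)
    p = np.zeros((n, d))
    p[:, 0::2] = np.sin(angle)
    p[:, 1::2] = np.cos(angle)
    return p

def shift_matrix(l, d):
    """Block-diagonal rotation R(l) with p_{i+l} = R(l) p_i for every i."""
    R = np.zeros((d, d))
    for s in range(d // 2):
        a = l / 10000.0 ** (2 * s / d)
        c, sn = np.cos(a), np.sin(a)
        # sin(x + a) = sin x cos a + cos x sin a
        R[2 * s, 2 * s], R[2 * s, 2 * s + 1] = c, sn
        # cos(x + a) = cos x cos a - sin x sin a
        R[2 * s + 1, 2 * s], R[2 * s + 1, 2 * s + 1] = -sn, c
    return R

n, d, l = 60, 16, 7
P = positional_encoding(n, d)
R = shift_matrix(l, d)
# Rows of P are p_i, so applying R(l) to every p_i at once is P @ R.T.
print(np.allclose(P[l:], P[:-l] @ R.T))  # True

x = np.random.default_rng(2).standard_normal((30, d))  # placeholder features
X = x + positional_encoding(30, d)                     # encoder input X = x + p
```

Note that R(l) depends only on the offset l and not on i, which is what lets a linear layer exploit relative positions.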

Proposed methodology

The proposed enhanced Transformer model

The proposed enhanced Transformer model consists of three parts: a feature extraction module, an encoding module, and a regression module, as shown in Fig. 2. The feature extraction module consists of a fully connected (FC) layer and position encoding; it performs a simple non-linear dimensionality reduction on the multidimensional raw data and adds positional information. The encoding module further compresses the extracted features and extracts valuable latent features from them. Compared with the encoder of the traditional Transformer, the proposed model replaces the self-attention component with position-sensitive self-attention (PSA), which addresses the insensitivity of the Transformer encoder to local context and lets the encoder incorporate positional relationships and local context at each time step. PSA thus helps generate more effective latent features, which benefits RUL prediction in the regression module. The regression module uses the proposed GHLSTM, which performs regression on hidden features at different time scales. Compared with ordinary linear regression or recurrent-network-based regression, it learns more effectively from the hidden features and thereby improves the accuracy of RUL prediction.

Suppose that for each sample i, the predicted RUL is $\overline{Rul}_{i}$ and the true RUL is $Rul_{i}$. A mean-square-error (MSE) loss is used to tune the learnable parameters $\theta$ of the proposed enhanced Transformer model during training with the Adam optimizer. It is written below in halved-sum form, which differs from the per-sample mean only by the constant factor N/2 and therefore has the same minimizer:

$$L\left( {MSE,\theta } \right) = \frac{1}{2}\sum\limits_{i = 1}^{N} {\left( {\overline{Rul}_{i} - Rul_{i} } \right)^{2} } .$$

(6)
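A small worked example of Eq. (6) on four made-up predictions, comparing the halved sum of squared errors with the per-sample mean. The values are illustrative only. For a fixed batch size the two differ by the factor N/2, and Adam's update is, apart from its small ε term, invariant to a constant rescaling of the gradient, so the choice of form mainly affects the reported loss value.

```python
import numpy as np

def half_sse_loss(pred, true):
    """Eq. (6): L = 1/2 * sum_i (pred_i - true_i)^2."""
    return 0.5 * np.sum((pred - true) ** 2)

def mse(pred, true):
    """Per-sample mean of the squared errors."""
    return np.mean((pred - true) ** 2)

pred = np.array([110.0, 80.0, 30.0, 125.0])   # made-up predicted RULs
true = np.array([112.0, 75.0, 35.0, 120.0])   # made-up true RULs
print(half_sse_loss(pred, true))  # 39.5
print(mse(pred, true))            # 19.75   (39.5 = 4 / 2 * 19.75)
```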

Table 1 lists the hyper-parameters of the proposed DL method, which were optimized by grid search. The pseudocode of the proposed prediction method is summarized in Table 2.

Table 1 The hyper-parameters of the proposed enhanced Transformer model.


Table 2 The pseudocode of the proposed RUL prediction method.


Position-sensitive self-attention (PSA)

To overcome the insensitivity of high-level features to local context in Transformer encoders, we introduce a position-sensitive self-attention (PSA) unit into the proposed model. PSA enables the model to focus on the positional relationships of the input data at each time step and thus to capture local context. Concretely, it computes a similarity score between each input element and every other element, taking both content and position encodings into account. Attention weights are computed from these scores, and the output is the weighted sum of the inputs. By incorporating positional information, PSA yields more accurate attention weights and better feature representations, producing more effective hidden features for predicting the RUL of mechanical equipment. The derivation of PSA is as follows.

(1) Construction of the input:

The input of PSA consists of the input sequence ${\textbf{x}} = \left\{ x_{1}, x_{2}, \ldots, x_{n - 1}, x_{n} \right\}$ and the corresponding position encoding ${\textbf{p}} = \left\{ p_{1}, p_{2}, \ldots, p_{n - 1}, p_{n} \right\}$, where $x_{i}$ is the feature representation of the ith element and $p_{i}$ is the position encoding of position i, given by Eq. (5).

(2) The calculation of the similarity score:

For each element $x_{i}$ in the input sequence, compute its similarity score $s_{ij}$ with each element $x_{j}$ of ${\textbf{x}}$, taking into account the influence of the position encoding $p_{ij}$:

$$\begin{aligned} s_{ij}^{1} &= \mathrm{similarity}\left( x_{i}, p_{ij} \right) = \frac{x_{i} \cdot p_{ij}}{\left\| x_{i} \right\| \left\| p_{ij} \right\|} \\ s_{ij}^{2} &= \mathrm{similarity}\left( x_{i}, p_{ij} \right) = \frac{x_{i} \cdot p_{ij}}{\left\| x_{i} \right\| \left\| p_{ij} \right\|} \\ s_{ij} &= \mathrm{similarity}\left( s_{ij}^{1}, s_{ij}^{2} \right) = \frac{s_{ij}^{1} \cdot s_{ij}^{2}}{\left\| s_{ij}^{1} \right\| \left\| s_{ij}^{2} \right\|} \end{aligned}$$

(7)

(3) The calculation of attention weights:

For each element $x_{i}$, the attention weights $w_{ij}$ are obtained by normalizing the similarity scores with a softmax over j:

$$w_{ij} = \mathrm{softmax}\left( s_{ij} \right) = \frac{\exp\left( s_{ij} \right)}{\sum_{k = 1}^{n} \exp\left( s_{ik} \right)}.$$

(8)

(4) Finally, the output element $\tilde{x}_{i}$ of PSA is the sum of the input elements $x_{j}$ weighted by the attention weights $w_{ij}$:

$$\tilde{x}_{i} = \sum_{j = 1}^{n} w_{ij} x_{j} .$$

(9)

The position-sensitive attention mechanism therefore considers both the correlation between elements and the influence of position encoding, resulting in more accurate and position-aware attention weights.
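As printed, Eq. (7) gives $s_{ij}^{1}$ and $s_{ij}^{2}$ the same expression and then takes a cosine similarity of two scalars, so the exact score terms cannot be recovered from the text. The NumPy sketch below is therefore one hypothetical reading, not the author's implementation: it assumes a content term $\cos(x_i, x_j)$, a position term $\cos(p_i, p_j)$, and combines them by multiplication. Eqs. (8) and (9), the softmax and the weighted sum, are implemented as written.

```python
import numpy as np

def sinusoidal_pe(n, d):
    """Eq. (5) position encodings, one row per time step (d even)."""
    angle = np.arange(n)[:, None] / np.power(10000.0, 2 * np.arange(d // 2)[None, :] / d)
    p = np.zeros((n, d))
    p[:, 0::2], p[:, 1::2] = np.sin(angle), np.cos(angle)
    return p

def cosine_matrix(a, b, eps=1e-8):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a_n @ b_n.T

def psa(x, p):
    """Illustrative reading of Eqs. (7)-(9); the score terms are ASSUMPTIONS.

    s1_ij = cos(x_i, x_j)   content similarity   (assumed)
    s2_ij = cos(p_i, p_j)   position similarity  (assumed)
    s_ij  = s1_ij * s2_ij   combined score       (assumed)
    """
    s = cosine_matrix(x, x) * cosine_matrix(p, p)
    s = s - s.max(axis=1, keepdims=True)                  # stabilise the softmax
    w = np.exp(s) / np.exp(s).sum(axis=1, keepdims=True)  # Eq. (8), softmax over j
    return w @ x, w                                       # Eq. (9): sum_j w_ij x_j

rng = np.random.default_rng(3)
n, d = 30, 16
x = rng.standard_normal((n, d))
P = sinusoidal_pe(n, d)
x_tilde, w = psa(x, P)
print(x_tilde.shape, np.allclose(w.sum(axis=1), 1.0))  # (30, 16) True

# The position term depends only on the offset i - j.
C = cosine_matrix(P, P)
print(np.isclose(C[0, 5], C[10, 15]))  # True
```

Under this assumption the position term equals 1 at $j = i$ and is typically smaller for distant steps, because $p_i \cdot p_j = \sum_s \cos\big((i - j)/10000^{2s/d}\big)$; this is one way such a score could bias attention toward local context.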

Gated hierarchical long short-term memory network (GHLSTM)

The gated hierarchical LSTM (GHLSTM) aims to further strengthen the LSTM's ability to model long-term dependencies and to improve its performance on large-scale sequential data. It does so by using multiple levels of gating to model dependencies at different time scales, as shown in Fig. 3. The network consists of two hierarchical LSTM layers: a top-level LSTM that processes the entire input sequence to capture global long-term dependencies, and a bottom-level LSTM that processes half of the input sequence to capture medium-term dependencies. The outputs of the two LSTMs are combined into a comprehensive temporal representation; the weighted sum in Eq. (22) is equivalent to concatenating the two outputs and applying a single linear projection. This lets the model adaptively focus on relevant features across different time scales, improving overall RUL prediction accuracy.

For the top hierarchical LSTM, the whole input sequence ${\textbf{x}}_{1t}$ is fed into the LSTM cell:

$${\textbf{i}}_{1t} = \sigma \left( {{\textbf{w}}_{1ix} {\textbf{x}}_{1t} + {\textbf{w}}_{1ih} {\textbf{h}}_{1t - 1} + {\textbf{b}}_{1i} } \right),$$

(10)

$${\textbf{f}}_{1t} = \sigma \left( {{\textbf{w}}_{1fx} {\textbf{x}}_{1t} + {\textbf{w}}_{1fh} {\textbf{h}}_{1t - 1} + {\textbf{b}}_{1f} } \right),$$

(11)

$${\textbf{o}}_{1t} = \sigma \left( {{\textbf{w}}_{1ox} {\textbf{x}}_{1t} + {\textbf{w}}_{1oh} {\textbf{h}}_{1t - 1} + {\textbf{b}}_{1o} } \right),$$

(12)

$${\overline{\textbf{c}}}_{1t} = \tanh \left( {{\textbf{w}}_{1cx} {\textbf{x}}_{1t} + {\textbf{w}}_{1ch} {\textbf{h}}_{1t - 1} + {\textbf{b}}_{1c} } \right),$$

(13)

$${\textbf{c}}_{1t} = {\textbf{f}}_{1t} \odot {\textbf{c}}_{1t - 1} + {\textbf{i}}_{1t} \odot {\overline{\textbf{c}}}_{1t} ,$$

(14)

$${\textbf{h}}_{1t} = {\textbf{o}}_{1t} \odot \tanh \left( {{\textbf{c}}_{1t} } \right),$$

(15)

For the bottom hierarchical LSTM, half of the input sequence, ${\textbf{x}}_{2t}$, is fed into the LSTM cell to extract hidden features at another time scale. Note that the time scale can be chosen according to the requirements. The bottom hierarchical LSTM cell is defined as:

$${\textbf{i}}_{2t} = \sigma \left( {{\textbf{w}}_{2ix} {\textbf{x}}_{2t} + {\textbf{w}}_{2ih} {\textbf{h}}_{2t - 1} + {\textbf{b}}_{2i} } \right),$$

(16)

$${\textbf{f}}_{2t} = \sigma \left( {{\textbf{w}}_{2fx} {\textbf{x}}_{2t} + {\textbf{w}}_{2fh} {\textbf{h}}_{2t - 1} + {\textbf{b}}_{2f} } \right),$$

(17)

$${\textbf{o}}_{2t} = \sigma \left( {{\textbf{w}}_{2ox} {\textbf{x}}_{2t} + {\textbf{w}}_{2oh} {\textbf{h}}_{2t - 1} + {\textbf{b}}_{2o} } \right),$$

(18)

$${\overline{\textbf{c}}}_{2t} = \tanh \left( {{\textbf{w}}_{2cx} {\textbf{x}}_{2t} + {\textbf{w}}_{2ch} {\textbf{h}}_{2t - 1} + {\textbf{b}}_{2c} } \right),$$

(19)

$${\textbf{c}}_{2t} = {\textbf{f}}_{2t} \odot {\textbf{c}}_{2t - 1} + {\textbf{i}}_{2t} \odot {\overline{\textbf{c}}}_{2t} ,$$

(20)

$${\textbf{h}}_{2t} = {\textbf{o}}_{2t} \odot \tanh \left( {{\textbf{c}}_{2t} } \right),$$

(21)

where, omitting the layer index 1 or 2, ${\textbf{w}}_{ix}$ (${\textbf{w}}_{fx}$, ${\textbf{w}}_{ox}$ and ${\textbf{w}}_{cx}$) and ${\textbf{w}}_{ih}$ (${\textbf{w}}_{fh}$, ${\textbf{w}}_{oh}$ and ${\textbf{w}}_{ch}$) are the input and recurrent weight matrices, ${\textbf{b}}_{i}$ (${\textbf{b}}_{f}$, ${\textbf{b}}_{o}$ and ${\textbf{b}}_{c}$) are the biases of the hidden layer, ${\overline{\textbf{c}}}_{t}$ denotes the candidate internal state of the cell, ${\textbf{c}}_{t}$ denotes the memory cell state, $\sigma$ is the sigmoid function, $\tanh$ is the hyperbolic tangent activation, and $\odot$ denotes element-wise multiplication. The two LSTM cells are defined identically but have separate parameters.

The outputs of the top hierarchical LSTM cell, ${\textbf{h}}_{1t}$, and of the bottom hierarchical LSTM cell, ${\textbf{h}}_{2t}$, are then combined into the final GHLSTM output ${\textbf{h}}_{t}$:

$${\textbf{h}}_{t} = {\textbf{w}}_{1} {\textbf{h}}_{1t} + {\textbf{w}}_{2} {\textbf{h}}_{2t} .$$

(22)

where ${\textbf{w}}_{1}$ and ${\textbf{w}}_{2}$ are learnable connection weights that map the two outputs to the same dimension.
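The following NumPy sketch runs two LSTM cells built from Eqs. (10) to (21) and combines their last hidden states with Eq. (22). Two details are assumptions not fixed by the text: the bottom LSTM is fed the most recent half of the window, and only the final hidden state of each LSTM is used. All sizes are illustrative. The last check confirms that the weighted sum in Eq. (22) is the same as one projection of the concatenated outputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm(rng, n_in, n_hidden, scale=0.1):
    """Random parameters for one LSTM cell, stacked in gate order i, f, o, c."""
    return {"W": scale * rng.standard_normal((4, n_hidden, n_in)),      # input weights
            "U": scale * rng.standard_normal((4, n_hidden, n_hidden)),  # recurrent weights
            "b": np.zeros((4, n_hidden))}                               # biases

def run_lstm(params, xs):
    """Eqs. (10)-(15) (or (16)-(21)) over xs of shape (T, n_in); returns the last h."""
    W, U, b = params["W"], params["U"], params["b"]
    h = np.zeros(b.shape[1])
    c = np.zeros(b.shape[1])
    for x in xs:
        z = [W[k] @ x + U[k] @ h + b[k] for k in range(4)]
        i, f, o = sigmoid(z[0]), sigmoid(z[1]), sigmoid(z[2])  # input, forget, output gates
        c_bar = np.tanh(z[3])                                   # candidate state
        c = f * c + i * c_bar                                   # memory cell update
        h = o * np.tanh(c)                                      # hidden output
    return h

def ghlstm(xs, top, bottom, w1, w2):
    """Eq. (22): h = w1 h1 + w2 h2.
    ASSUMPTION: the bottom LSTM reads the most recent half of the window."""
    h1 = run_lstm(top, xs)                    # whole window: long-term scale
    h2 = run_lstm(bottom, xs[len(xs) // 2:])  # half window: medium-term scale
    return w1 @ h1 + w2 @ h2

rng = np.random.default_rng(4)
T, n_in, n1, n2, n_out = 30, 16, 32, 16, 8   # illustrative sizes
xs = rng.standard_normal((T, n_in))
top, bottom = init_lstm(rng, n_in, n1), init_lstm(rng, n_in, n2)
w1 = 0.1 * rng.standard_normal((n_out, n1))
w2 = 0.1 * rng.standard_normal((n_out, n2))
h = ghlstm(xs, top, bottom, w1, w2)
print(h.shape)  # (8,)

# The weighted sum equals one projection of the concatenated outputs [h1; h2].
h1, h2 = run_lstm(top, xs), run_lstm(bottom, xs[T // 2:])
print(np.allclose(h, np.hstack([w1, w2]) @ np.concatenate([h1, h2])))  # True
```

A final dense layer mapping ${\textbf{h}}_{t}$ to a scalar RUL would complete a regression head; its exact form is not specified in this section.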

Experimental analysis

Evaluation indexes

Two widely used evaluation indexes for RUL prediction, the Score and the root mean square error (RMSE), are adopted to evaluate model performance quantitatively. They are defined as follows,

$$A_{i} = \begin{cases} \exp\left( -\left( \overline{Rul}_{i} - Rul_{i} \right)/13 \right) - 1, & \overline{Rul}_{i} < Rul_{i} \\ \exp\left( \left( \overline{Rul}_{i} - Rul_{i} \right)/10 \right) - 1, & \overline{Rul}_{i} \ge Rul_{i} \end{cases}$$

(23)

$$Score = \sum\limits_{i = 1}^{N} {A_{i} } ,$$

(24)

$$RMSE = \sqrt {\frac{1}{N}\sum\limits_{i = 1}^{N} {\left( {Rul_{{\text{i}}} - \overline{{Rul_{i} }} } \right)^{2} } } ,$$

(25)

Figure 4 plots both evaluation metrics against the prediction error. When the error is positive (the predicted RUL exceeds the true RUL), the Score grows much faster than for a negative error of the same magnitude, so the Score penalizes late predictions more heavily. This asymmetry matches practical engineering requirements, since an overestimated RUL can lead to failure before maintenance is performed. The Score is therefore considered more reasonable than RMSE.
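The asymmetry can be checked numerically. The NumPy sketch below implements Eqs. (23) to (25) and scores one early and one late prediction, each off by 10 cycles; the single-engine inputs are made up for illustration.

```python
import numpy as np

def phm_score(pred, true):
    """Eqs. (23)-(24) with error d_i = predicted RUL - true RUL."""
    d = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    a = np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0)
    return float(a.sum())

def rmse(pred, true):
    """Eq. (25)."""
    d = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

true = [50.0]
early, late = [40.0], [60.0]        # both off by 10 cycles
print(rmse(early, true), rmse(late, true))                                # 10.0 10.0
print(round(phm_score(early, true), 3), round(phm_score(late, true), 3))  # 1.158 1.718
```

RMSE cannot distinguish the two cases, while the Score charges the late prediction about 1.5 times as much, because its exponent uses 10 rather than 13 as the divisor.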

The description of the C-MAPSS dataset

To demonstrate the effectiveness and superiority of the proposed method in predicting the RUL of aircraft engines, we use the C-MAPSS dataset provided by NASA²⁷, whose schematic is shown in Fig. 5. The dataset contains sensor data from a fleet of simulated aircraft engines. To show the capabilities of the proposed method under different operating conditions and fault modes, the simplest subset, FD001, and the most complex subset, FD004, are used for validation.

FD001 consists of 100 engines operating under a single operating condition with a single fault mode. The engines have different lifespans, ranging from 128 to 362 operational cycles. The dataset includes sensor measurements such as fan speed, compressor speed, oil pressure, and various temperatures, along with operational settings such as throttle setting and true airspeed. FD004 is a more complex subset derived from the same engine simulations, containing 249 engines that operate under 6 different operating conditions and exhibit 2 different fault modes. Its engines have lifespans ranging from 128 to 543 operational cycles. The sensor measurements and operational settings are similar to those in FD001, but the multiple operating conditions and fault modes make FD004 considerably more challenging for RUL prediction. Details of the data are given in Table 3. The task is the same for both subsets: to accurately predict the RUL of each engine.

Table 3 The details of dataset C-MAPSS.


The preprocessing of input

First, we remove the sensor measurements that remain stable and therefore carry little degradation information (sensors 1, 5, 6, 10, 16, 18, and 19). According to the literature²⁷, operating condition information is also helpful for RUL prediction, so the final input matrix consists of the remaining 14 sensor measurements and the three operational settings. Second, the data are segmented with a sliding window, as shown in Fig. 6. For the ith input window, with n input dimensions and sequence length (window size) l, the corresponding RUL label is set to $T - l - (i - 1) \times m$, where m is the sliding step and T is the full-lifecycle length of the engine. Based on an experimental greedy search, l and m are set to 30 and 1, respectively. Finally, the RUL labels are processed with a piecewise linear function with $Rul_{\max } = 125$:

$$Rul = \begin{cases} Rul, & \text{if } Rul \le Rul_{\max} \\ Rul_{\max}, & \text{if } Rul > Rul_{\max} \end{cases}$$

(26)
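The three preprocessing steps can be sketched in NumPy for a single training engine that runs to failure. The sensor ids, l = 30, m = 1 and $Rul_{\max} = 125$ come from the text. The 200-cycle trajectory filled with random numbers, and the helper names `select_columns` and `make_windows`, are hypothetical, and the sketch omits any normalization step, since none is described here.

```python
import numpy as np

DROPPED_SENSORS = {1, 5, 6, 10, 16, 18, 19}   # 1-based sensor ids removed in the text
RUL_MAX = 125

def select_columns(settings, sensors):
    """settings: (T, 3) operational settings; sensors: (T, 21) raw sensor readings.
    Returns (T, 17): the 3 settings followed by the 14 retained sensors."""
    keep = [k for k in range(sensors.shape[1]) if k + 1 not in DROPPED_SENSORS]
    return np.hstack([settings, sensors[:, keep]])

def make_windows(features, l=30, m=1, rul_max=RUL_MAX):
    """Cut one run-to-failure trajectory of T cycles into windows of length l, step m.
    Window i (1-based) gets the label T - l - (i - 1) * m, clipped at rul_max (Eq. 26)."""
    T = len(features)
    xs, ys = [], []
    for i, start in enumerate(range(0, T - l + 1, m), start=1):
        xs.append(features[start:start + l])
        ys.append(min(T - l - (i - 1) * m, rul_max))
    return np.stack(xs), np.array(ys, dtype=float)

rng = np.random.default_rng(5)
T = 200                                   # hypothetical engine lifetime in cycles
raw = select_columns(rng.standard_normal((T, 3)), rng.standard_normal((T, 21)))
X, y = make_windows(raw)
print(raw.shape, X.shape)   # (200, 17) (171, 30, 17)
print(y[:2], y[-2:])        # [125. 125.] [1. 0.]
```

The label of the last window is 0 because that window ends at the final recorded cycle, and early windows are capped at 125 by Eq. (26).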

The analysis and comparison of RUL prediction results

RUL prediction performance of the proposed method

The predictions of the proposed model on the FD001 and FD004 subsets are shown in Figs. 7 and 8. The x-axis denotes the test engine number within the subset, and the y-axis denotes the RUL (in cycles). The predicted and actual RUL values of the test engines are shown by the red solid line and the purple dashed line, respectively. Overall, the predicted RUL values in both subsets roughly match the actual values, indicating that the proposed method is effective on these two subsets. The errors between predicted and actual RUL in Fig. 7 are smaller than those in Fig. 8, showing that the proposed model performs better on FD001 than on FD004. This is because the degradation trend of an aero-engine under a single operating condition is relatively simple, and the degradation trends of the training and test engines overlap substantially. Therefore, the proposed method predicts RUL more accurately under a single operating condition and a single fault mode than under multiple operating conditions and compound faults.

Figures 9a–d and 10a–d show the model's predictions over the complete degradation process for four randomly selected engines from each subset. The predicted RUL (PR) and actual RUL (AR) are shown by the blue and red lines, respectively, and the absolute error (AE) between PR and AR at each time instant is shown by the green bars. The mean of all AE values (MAE) represents the average error. As indicated by the MAE values, the overall RUL prediction results for FD001 are significantly better than those for FD004. As the number of cycles increases, the degradation trend of the engines becomes apparent, and for most engines the proposed model predicts RUL more accurately in the later stages of degradation than in the earlier stages, as shown in Figs. 9a, c, d and 10a–d.

Ablation experiments

To validate the proposed method, namely the effectiveness of PSA and GHLSTM, a series of ablation experiments was conducted. Model m1 is the proposed enhanced Transformer model. Model m2 uses the same architecture except for the attention part, which adopts the traditional multi-head self-attention module. Model m3 uses the same architecture except for the regression module, which uses a traditional LSTM. Model m4 uses the same architecture, but both its regression module and its self-attention module are the traditional versions. All models are fine-tuned, and ten parallel experiments are conducted for each. The mean and standard deviation (std) of RMSE and Score across all experiments are listed in Table 4; the mean is taken as the final result, while the std quantifies the robustness of the RUL prediction. The table shows that the proposed model achieves the lowest metric values and thus the best predictive performance among the compared models, and the std values are much smaller than the corresponding means. Moreover, m2 and m3 both outperform m4. These observations indicate that both proposed techniques contribute to more accurate RUL prediction.

Table 4 The RUL prediction comparisons of different methods on subsets FD001 and FD002.


Comparison with state-of-the-art methods

To further highlight the advantages of the proposed enhanced Transformer for RUL prediction, it was compared with several state-of-the-art methods^{42,43,44,45,46,47,48,49}. For a fair evaluation, all compared models use the same training and testing sets, each model was fine-tuned to maximize RUL prediction accuracy, and 10 parallel experiments were conducted on the FD001 and FD004 subsets. The resulting Score and RMSE values of all methods are listed in Table 5. The table shows that every method performs best on FD001 and worst on FD004. This is because FD001 has a single operating condition and a single fault mode, whereas FD004 is the most complex subset, with multiple operating conditions and fault modes and a larger number of tested engines.

Table 5 The RUL prediction comparisons of different methods on subsets FD001 and FD002.


On the FD001 subset, which contains a single operating condition and fault mode, the proposed model achieved a Score of 220 ± 23 and an RMSE of 13.14 ± 0.21. Its Score is 4% better than that of the best-performing existing method (the acyclic graph network), which obtained a Score of 229 and an RMSE of 11.96. Although the RMSE of the proposed method is higher than that of the acyclic graph network, its Score is lower, and the Score index, which penalizes late RUL predictions more heavily than early ones, better reflects the requirements of real engine maintenance. This suggests that the proposed method offers the best overall performance among the compared models.
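The argument above rests on the asymmetry of the Score index. The evaluation indexes are defined in an earlier section not reproduced here, so the sketch below assumes the scoring function commonly used with C-MAPSS: with error d = PR - AR, an early prediction (d < 0) costs exp(-d/13) - 1 and a late prediction (d >= 0) costs exp(d/10) - 1, summed over test engines. The constants 13 and 10 and the function names are assumptions of this Python sketch.

```python
import math
from typing import Sequence


def rmse(predicted_rul: Sequence[float], actual_rul: Sequence[float]) -> float:
    """Root-mean-square error over all test engines."""
    if len(predicted_rul) != len(actual_rul) or not predicted_rul:
        raise ValueError("need equal-length, non-empty sequences")
    errors = [pr - ar for pr, ar in zip(predicted_rul, actual_rul)]
    return math.sqrt(sum(d * d for d in errors) / len(errors))


def score(predicted_rul: Sequence[float], actual_rul: Sequence[float]) -> float:
    """Asymmetric C-MAPSS-style score (assumed form): late predictions cost more."""
    total = 0.0
    for pr, ar in zip(predicted_rul, actual_rul):
        d = pr - ar
        if d < 0:
            total += math.exp(-d / 13.0) - 1.0  # early prediction: milder penalty
        else:
            total += math.exp(d / 10.0) - 1.0   # late prediction: steeper penalty
    return total


# Same error magnitude, opposite sign (hypothetical values).
print(score([90], [100]))            # early by 10 cycles, about 1.158
print(score([110], [100]))           # late by 10 cycles, about 1.718
print(rmse([90, 110], [100, 100]))   # 10.0
```

Because RMSE treats both errors identically while the Score does not, a model with a somewhat higher RMSE can still obtain a lower Score if its errors lean toward early predictions.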

On the more complex FD004 subset, which encompasses multiple operating conditions and fault modes, the proposed model achieved a Score of 1420 ± 125 and an RMSE of 14.25 ± 0.25, improvements of about 9% in Score and 6% in RMSE over the best-performing existing method (SIGRNNDWI), which obtained a Score of 1568 and an RMSE of 15.12. Overall, the proposed model achieved the best Score on both subsets and, on FD004, the best RMSE as well, with the largest gains on the more complex subset. These results support the effectiveness of the proposed PSA and GHLSTM techniques for aero-engine RUL prediction.
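The percentages quoted for FD001 and FD004 can be recomputed from the reported mean values. The Python sketch below does so; it ignores the ± std terms, treats both metrics as lower-is-better, and the helper name is illustrative.

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Percentage reduction of a lower-is-better metric relative to a baseline.

    A negative value means the proposed model is worse than the baseline.
    """
    return 100.0 * (baseline - proposed) / baseline


# (best existing method, proposed model) mean values quoted in the text.
reported = {
    ("FD001", "Score"): (229.0, 220.0),    # acyclic graph network
    ("FD001", "RMSE"): (11.96, 13.14),     # acyclic graph network
    ("FD004", "Score"): (1568.0, 1420.0),  # SIGRNNDWI
    ("FD004", "RMSE"): (15.12, 14.25),     # SIGRNNDWI
}

improvements = {key: round(relative_improvement(*pair), 1) for key, pair in reported.items()}
for (subset, metric), pct in improvements.items():
    print(f"{subset} {metric}: {pct:+.1f}%")
# FD001 Score: +3.9%
# FD001 RMSE: -9.9%
# FD004 Score: +9.4%
# FD004 RMSE: +5.8%
```

The negative FD001 RMSE entry reflects that the proposed model's RMSE (13.14) is higher than the acyclic graph network's (11.96), even though its Score is lower.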

Conclusion

To predict the RUL of aero-engines accurately, this article proposed a novel enhanced Transformer-based DL method incorporating the PSA mechanism and GHLSTM. The main contributions are twofold. First, the proposed PSA mechanism addresses a limitation of the traditional attention mechanism, namely that the extracted high-level features are insensitive to their local context at each time step. Second, the developed GHLSTM learns hidden features at different time scales, which helps improve RUL prediction accuracy. The effectiveness of both techniques has been validated by ablation experiments. In terms of the common evaluation indexes, the proposed method achieves an average improvement of 7% in Score and 11% in RMSE over the compared methods on the FD001 and FD004 RUL prediction tasks.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

References

Lei, Y. et al. Machinery health prognostics: A systematic review from data acquisition to RUL prediction. Mech. Syst. Signal Process. 104, 799–834 (2018).
Gebraeel, N., Lei, Y., Li, N., Si, X. & Zio, E. Prognostics and Remaining Useful Life Prediction of Machinery: Advances, Opportunities and Challenges. Journal of Dynamics, Monitoring and Diagnostics 2(1), 1–12 (2023).
Zhang, J. et al. A variational local weighted deep sub-domain adaptation network for remaining useful life prediction facing cross-domain condition. Reliab. Eng. Syst. Saf. 231, 108986 (2023).
Ma, F., Zhang, H., Gong, Q. & Hon, K. K. B. A novel energy efficiency grade evaluation approach for machining systems based on inherent energy efficiency. Int. J. Prod. Res. 59, 6022–6033 (2021).
Shu, H., Zou, C., Chen, J. & Wang, S. Research on micro/nano surface flatness evaluation method based on improved particle swarm optimization algorithm. Front. Bioeng. Biotechnol. 9, 775455 (2021).
Yu, T., Chen, W., Junfeng, G. & Poxi, H. Intelligent detection method of forgings defects detection based on improved EfficientNet and memetic algorithm. IEEE Access 10, 79553–79563 (2022).
Duan, L. et al. State of charge estimation of lithium-ion batteries based on second-order adaptive extended Kalman filter with correspondence analysis. Energy 280, 128159 (2023).
Al-Greer, M. & Bashir, I. Physics-based model informed smooth particle filter for remaining useful life prediction of lithium-ion battery. Measurement 214, 112838 (2023).
Fordal, J. M. et al. Application of sensor data based predictive maintenance and artificial neural networks to enable Industry 4.0. Adv. Manuf. 11, 248–263 (2023).
Yousaf, M. Z., Khalid, S., Tahir, M. F., Tzes, A. & Raza, A. A novel dc fault protection scheme based on intelligent network for meshed dc grids. Int. J. Electr. Power Energy Syst. 154, 109423 (2023).
Shi, Y., Mao, Y., Xu, X. & Xia, J. Machine learning-assisted dual fiber Bragg grating-based flexible direction sensing. IEEE Sens. J. 23, 25572–25578 (2023).
Dong, S. et al. Deep transfer learning based on Bi-LSTM and attention for remaining useful life prediction of rolling bearing. Reliab. Eng. Syst. Saf. 230, 108914 (2023).
Liu, X. et al. A hybrid multi-stage methodology for remaining useful life prediction of control system: Subsea Christmas tree as a case study. Expert Syst. Appl. 215, 119335 (2023).
Jiang, B. et al. A holistic feature selection method for enhanced short-term load forecasting of power system. IEEE Trans. Instrum. Meas. 72, 1–11 (2022).
Liu, Q., Jia, M., Gao, Z., Xu, L. & Liu, Y. Correntropy long short term memory soft sensor for quality prediction in industrial polyethylene process. Chemometr. Intell. Lab. Syst. 231, 104678 (2022).
Xiang, S., Li, P., Luo, J. & Qin, Y. Micro transfer learning mechanism for cross-domain equipment RUL prediction. IEEE Trans. Autom. Sci. Eng. https://doi.org/10.1109/TASE.2024.3366288 (2024).
Wu, Q., Zhou, X. & Pan, X. Cutting tool wear monitoring in milling processes by integrating deep residual convolution network and gated recurrent unit with an attention mechanism. Proc. Inst. Mech. Eng. B J. Eng. Manuf. 237, 1171–1181 (2023).
Zhang, Y., Hutchinson, P., Lieven, N. A. & Nunez-Yanez, J. Remaining useful life estimation using long short-term memory neural networks and deep fusion. IEEE Access 8, 19033–19045 (2020).
Miao, H., Li, B., Sun, C. & Liu, J. Joint learning of degradation assessment and RUL prediction for aeroengines via dual-task deep LSTM networks. IEEE Trans. Ind. Inform. 15(9), 5023–5032 (2019).
Liu, J., Lei, F., Pan, C., Hu, D. & Zuo, H. Prediction of remaining useful life of multi-stage aero-engine based on clustering and LSTM fusion. Reliab. Eng. Syst. Saf. 214, 107807 (2021).
Zhang, Y., Xin, Y., Liu, Z.-W., Chi, M. & Ma, G. Health status assessment and remaining useful life prediction of aero-engine based on BiGRU and MMoE. Reliab. Eng. Syst. Saf. 220, 108263 (2022).
Ma, M. & Mao, Z. Deep wavelet sequence-based gated recurrent units for the prognosis of rotating machinery. Struct. Health Monit. 20(4), 1794–1804 (2021).
Xiao, L., Duan, F., Tang, J. & Abbott, D. A noise-boosted remaining useful life prediction method for rotating machines under different conditions. IEEE Trans. Instrum. Meas. 70, 1–12 (2021).
Song, Y., Shi, G., Chen, L., Huang, X. & Xia, T. Remaining useful life prediction of turbofan engine using hybrid model based on autoencoder and bidirectional long short-term memory. J. Shanghai Jiaotong Univ. (Sci.) 23, 85–94 (2018).
Gu, J. et al. Recent advances in convolutional neural networks. Pattern Recognit. 77, 354–377 (2018).
Zhu, J., Chen, N. & Peng, W. Estimation of bearing remaining useful life based on multiscale convolutional neural network. IEEE Trans. Ind. Electron. 66(4), 3208–3216 (2018).
Li, X., Ding, Q. & Sun, J.-Q. Remaining useful life estimation in prognostics using deep convolution neural networks. Reliab. Eng. Syst. Saf. 172, 1–11 (2018).
Yang, B., Liu, R. & Zio, E. Remaining useful life prediction based on a double-convolutional neural network architecture. IEEE Trans. Ind. Electron. 66(12), 9521–9530 (2019).
Jiang, J.-R., Lee, J.-E. & Zeng, Y.-M. Time series multiple channel convolutional neural network with attention-based long short-term memory for predicting bearing remaining useful life. Sensors 20(1), 166 (2019).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 1–11 (2017).
Jin, X.-B. et al. End-to-end GPS tracker based on switchable fuzzy normalization codec for assistive drone application. IEEE Trans. Consum. Electron. 8, 33 (2023).
Kong, J. et al. ADCT-Net: Adaptive traffic forecasting neural network via dual-graphic cross-fused transformer. Inf. Fusion 103, 102122 (2024).
Tetko, I. V., Karpov, P., Van Deursen, R. & Godin, G. State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis. Nat. Commun. 11(1), 5575 (2020).
Naseer, M. M. et al. Intriguing properties of vision transformers. Adv. Neural Inf. Process. Syst. 34, 23296–23308 (2021).
Zhou, H.-Y. et al. A transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics. Nat. Biomed. Eng. 7, 1–13 (2023).
Zhang, Z., Song, W. & Li, Q. Dual-aspect self-attention based on transformer for remaining useful life prediction. IEEE ASME Trans. Mechatron. 71, 1–11 (2022).
Su, X., Liu, H., Tao, L., Lu, C. & Suo, M. An end-to-end framework for remaining useful life prediction of rolling bearing based on feature pre-extraction mechanism and deep adaptive transformer model. Comput. Ind. Eng. 161, 107531 (2021).
Chadha, G. S., Shah, S. R. B., Schwung, A. & Ding, S. X. Shared temporal attention transformer for remaining useful lifetime estimation. IEEE Access 10, 74244–74258 (2022).
Chang, Y., Li, F., Chen, J., Liu, Y. & Li, Z. Efficient temporal flow Transformer accompanied with multi-head probsparse self-attention mechanism for remaining useful life prognostics. Reliab. Eng. Syst. Saf. 226, 108701 (2022).
Ren, L., Jia, Z., Wang, X., Dong, J. & Wang, W. A $ T^{2} $-tensor-aided multiscale transformer for remaining useful life prediction in IIoT. IEEE Trans. Ind. Inform. 18(11), 8108–8118 (2022).
Ding, Y. & Jia, M. Convolutional transformer: An enhanced attention mechanism architecture for remaining useful life estimation of bearings. IEEE ASME Trans. Mechatron. 71, 1–10 (2022).
Zhang, C., Lim, P., Qin, A. K. & Tan, K. C. Multiobjective deep belief networks ensemble for remaining useful life estimation in prognostics. IEEE Trans. Neural Netw. Learn. Syst. 28(10), 2306–2318 (2016).
Zhao, R. et al. Machine health monitoring using local feature-based gated recurrent unit networks. IEEE Trans. Ind. Electron. 65(2), 1539–1548 (2017).
Li, J., Li, X. & He, D. A directed acyclic graph network combined with CNN and LSTM for remaining useful life prediction. IEEE Access 7, 75464–75475 (2019).
Cheng, Y., Hu, K., Wu, J., Zhu, H. & Shao, X. Autoencoder quasi-recurrent neural networks for remaining useful life prediction of engineering systems. IEEE ASME Trans. Mechatron. 27(2), 1081–1092 (2021).
Xiang, S., Qin, Y., Luo, J., Pu, H. & Tang, B. Multicellular LSTM-based deep learning model for aero-engine remaining useful life prediction. Reliab. Eng. Syst. Saf. 216, 107927 (2021).
Xiang, S., Qin, Y., Luo, J. & Pu, H. Spatiotemporally multidifferential processing deep neural network and its application to equipment remaining useful life prediction. IEEE Trans. Ind. Inform. 18(10), 7230–7239 (2021).
Xiang, S., Li, P., Huang, Y., Luo, J. & Qin, Y. Single gated RNN with differential weighted information storage mechanism and its application to machine RUL prediction. Reliab. Eng. Syst. Saf. 242, 109741 (2024).
Xu, T., Han, G., Zhu, H., Lin, C. & Peng, J. Multiscale BLS-based lightweight prediction model for remaining useful life of aero-engine. IEEE Trans. Reliab. https://doi.org/10.1109/TR.2023.3349201 (2024).


Acknowledgements

This work was supported by the Chongqing Technical Innovation and Application Development Special General Project (cstc2019jscx-msxmX0168, cstc2019jscx-msxmX0312, cstc2019jscx-msxmX0008, cstc2020jscx-msxmX0119) and partially supported by a school-level research project (120777). Experimental conditions were provided by Chongqing University of Posts and Telecommunications and Chongqing University.

Author information

Authors and Affiliations

College of Artificial Intelligence and Big Data, Chongqing College of Electronic Engineering, Chongqing, 401331, China
Xinping Chen

Contributions

Chen wrote the main manuscript text.

Corresponding author

Correspondence to Xinping Chen.

Ethics declarations

Competing interests

The author declares no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Chen, X. A novel transformer-based DL model enhanced by position-sensitive attention and gated hierarchical LSTM for aero-engine RUL prediction. Sci Rep 14, 10061 (2024). https://doi.org/10.1038/s41598-024-59095-3


Received: 26 July 2023
Accepted: 08 April 2024
Published: 02 May 2024
DOI: https://doi.org/10.1038/s41598-024-59095-3
