Introduction

Falls are a leading cause of severe injury among older adults worldwide and undermine their ability to live independently and comfortably. Statistics show that falls are the leading cause of injury-related death among people aged 80 and over. Most falls happen at home owing to common household hazards1, including clutter, poor lighting, slippery floors, obstructed walkways, pets, and unsteady furniture. Older adults with neurological conditions such as dementia and epilepsy are more prone to falls and fall-related injuries than the general older population2. The growing trend of older adults living alone, away from family members, is another leading cause of fall-related casualties: falls in cluttered surroundings can lead to bleeding, concussions, and other serious health risks that may prove fatal3. When older adults live alone and no fall-detection technology is present, emergency services cannot respond to fall events in time. Numerous assisted-surveillance methods have been proposed to compensate for the absence of round-the-clock nursing4. Because it is very difficult to create an environment that is entirely fall-proof, fall detection and rescue services are needed to guarantee the safety of the older population.

Fall-detection methods distinguish falls from non-fall activities so that a warning can be sent automatically to a remote monitoring point when a patient falls, or so that a protective airbag can be deployed5. In recent years, numerous models have been proposed that use different sensors with varying levels of performance. The sensor technologies most commonly employed for fall detection fall into three categories: infrared, wearable, and camera-based6. Camera- and infrared-based sensors are generally costly and record audio-visual signals; their drawbacks concern patient privacy and the static nature of the installation. Owing to these restrictions, wearable sensors offer a cheaper alternative that identifies falls using one or more sensors attached to the user's body, so they can be carried anywhere7. Vision-based fall detection is a strong alternative that delivers a low-cost solution to the fall-detection problem, and artificial intelligence (AI), specifically deep learning (DL), is highly effective for this challenge.

Similarly, owing to the increased use of IoT solutions and cameras in public places such as bus stops, airports, roads, railway stations, streets, and residences, vision-based models are an excellent choice for future fall-detection systems8. Among the many fall-detection techniques, DL-based models are advancing rapidly. Compared with other approaches, DL requires no hand-crafted feature extractor, since features are extracted automatically9. DL is also known for its generalization ability: a model trained on one database can be applied to a different problem through transfer learning (TL), and the performance of DL-based approaches is excellent compared with other models10. DL can even be deployed on low-compute edge devices using few-shot learning and TL.

This study introduces a new Deep Feature Fusion with Computer Vision for Fall Detection and Classification (DFFCV-FDC) technique. The primary purpose of the DFFCV-FDC methodology is to employ computer vision (CV) to detect fall events. The DFFCV-FDC technique first applies Gaussian filtering (GF) for noise removal. A deep feature fusion process comprising MobileNet, DenseNet, and ResNet models is then used for feature extraction. To improve the performance of the DFFCV-FDC technique, hyperparameter selection based on an improved pelican optimization algorithm (IPOA) is performed. Finally, falls are detected using a denoising autoencoder (DAE) model. The performance of the DFFCV-FDC approach is examined on benchmark fall databases.

Literature review

Durga Bhavani and Ferni Ukrit11 proposed a novel Inception with deep CNN-based fall detection and classification (INDCNN-FDC) method. The INDCNN-FDC technique performs two stages of data preprocessing, namely guided image filtering (GIF)-based image smoothing and GF-based image sharpening. The INDCNN-FDC method then uses the deep TL-based Inceptionv3 model to produce a valuable set of feature vectors; lastly, a DCNN takes the feature vector as input and performs fall recognition. Şengül et al.12 proposed a mobile application that collects accelerometer and gyroscope sensor data and transfers it to the cloud, where a DL model classifies the activity into the assumed classes. The method also utilizes Bica cubic Hermite interpolation, and a Bi-LSTM neural network is implemented for activity identification. Kabir et al.13 developed a class-based ensemble technique built on CNN and LSTM models for three-class fall detection (non-fall, pre-fall, and fall) using gyroscope and accelerometer data. This technique leverages the CNN as a robust feature extractor from the gyroscope and accelerometer data, while the LSTM models the temporal dynamics of the fall process. In14, a complete system called TSFallDetect is proposed, comprising a receiving device based on an embedded sensor, a mobile DL deployment platform, and a server for collecting models and data for future development. The study employs sequential DL models to forecast falling gestures using data collected from both video and inertial pressure sensors, and introduces a new DL technique specifically designed for analyzing time-series data in the context of fall prediction.

Mohammad et al.15 proposed a concept for a wearable monitoring system, demonstrated through the offline study of a combined DNN architecture based on an RNN and a CNN. The proposed technique used the CNN as a robust feature extractor from the gyroscope and accelerometer data and the RNN to model the temporal dynamics of the falling process. A class-based ensemble structure was utilized, in which each ensemble member recognizes a specific class. In16, a DL-based pre-impact fall detection system (FDS) is offered; to attain this, an automated feature extraction technique is proposed that can extract temporal features from all kinds of human-fall data gathered using wearable sensors. Ong et al.17 presented comprehensive research on fall detection and prediction for reconfigurable stair-accessing machines by leveraging DL. The developed architecture incorporates ML models and RNNs, specifically LSTM and BiLSTM, for fall detection of service robots on stairs; the fall data needed to train the models is produced in a simulation environment. Alabdulkreem et al.18 proposed a Chameleon Swarm Algorithm with Improved Fuzzy Deep Learning for Fall Detection (CSA-IDFLFD) model. The CSA-IDFLFD approach encompasses two stages: in the first stage, the IDFL technique identifies fall events; in the second stage, the parameters of the IDFL approach are optimally selected by the CSA technique.

Limitations and research gap

The studies cited highlight several advances and open challenges in fall detection systems. Methods such as Inception with deep CNNs raise computational-intensity and robustness concerns, suggesting a research gap in scalability and real-world applicability under varied environmental conditions. Mobile applications that collect sensor data face data-transmission delays and privacy concerns, signalling gaps in optimizing data handling and security protocols. Techniques integrating CNN and LSTM models may need extensive training data, revealing a gap in data acquisition and model generalization. Integrating embedded sensors with DL models may face difficulties in real-time data processing and synchronization, underscoring a gap in real-time performance and reliability. Wearable fall-monitoring devices may face user-acceptance and sensor-reliability issues, necessitating improvements in usability and technological integration. DL-based pre-impact fall detection methods may struggle with variability in human motion patterns and real-time processing demands, underscoring gaps in algorithm robustness and effectiveness. Lastly, combining swarm intelligence and fuzzy logic with DL in fall detection systems may introduce optimization complexities, pointing to a gap in fusion-model integration and performance enhancement across diverse datasets and scenarios.

Proposed methodology

System architecture

In this study, a novel DFFCV-FDC methodology is proposed. The main aim of the DFFCV-FDC methodology is to employ the CV concept to detect fall events. Accordingly, the DFFCV-FDC methodology comprises GF-based noise elimination, a deep feature fusion process, parameter optimization, and a DAE-based fall detection and classification process. Figure 1 depicts the overall flow of the DFFCV-FDC technique.

Fig. 1. Working flow of the DFFCV-FDC technique.

Noise elimination module

At the primary level, the DFFCV-FDC technique uses the GF approach for noise eradication. Image preprocessing prepares the images for training and inference19 and is not restricted to orientation correction, resizing, and colour adjustment. Moreover, preprocessing decreases training time and speeds up model inference. Here, a Gaussian function is employed to eliminate the noise.

GF: Smoothing images with a GF is highly effective because it mimics the process of visual perception, which serves as its basis: neurons examining visual images apply a similar filter. The non-uniform low-pass filter is defined in Eq. (1).

$${\mathfrak{R}}_{pre}=\frac{1}{\sqrt{2\pi {\sigma }^{2}}}{e}^{-\frac{{y}^{2}}{2{\sigma }^{2}}}$$
(1)

where \(\sigma \) denotes the standard deviation (the variance is \({\sigma }^{2}\)) of a Gaussian with zero mean, and \({\mathfrak{R}}_{pre}\) signifies the preprocessed image.
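A minimal sketch of this preprocessing step is given below: the one-dimensional kernel of Eq. (1) is built explicitly and applied separably along the rows and columns of the image. The kernel radius and \(\sigma \) value are illustrative assumptions, as they are not reported here.

```python
import numpy as np
from scipy.ndimage import convolve1d

def gaussian_kernel(radius: int = 2, sigma: float = 1.0) -> np.ndarray:
    """Discrete zero-mean Gaussian of Eq. (1), normalized to sum to 1."""
    y = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-y**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    return k / k.sum()

def gaussian_denoise(image: np.ndarray, radius: int = 2, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian smoothing: filter the rows, then the columns."""
    k = gaussian_kernel(radius, sigma)
    smoothed = convolve1d(image.astype(float), k, axis=0)
    return convolve1d(smoothed, k, axis=1)
```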

DL model architecture

Deep feature fusion process

At this level, a deep feature fusion process comprising MobileNet, DenseNet, and ResNet models is involved. Each backbone is described below, and a sketch of the fusion step follows the ResNet description.

MobileNet

A CNN architecture known as MobileNet eliminates the need for excessive computational power20. MobileNet factorizes convolution into depthwise and pointwise operations, applying Batch Normalization (BN) and ReLU after the depthwise and pointwise convolutions, respectively. The significant difference between MobileNet and standard CNN models lies in replacing the usual convolutional layer, whose filter depth matches the depth of the input feature maps, with two distinct operations: a depthwise convolution and a pointwise convolution.

Each depthwise separable convolution block therefore comprises a depthwise layer and a pointwise layer, each followed by BN and ReLU activation. MobileNet's body consists of depthwise separable convolutions, with the exception of the first layer, which is a full convolution. Defining the network in such simple terms permits easy topology exploration when developing the network. A sketch of one such block follows.
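The block below is a minimal Keras sketch of one depthwise separable convolution block as described above (depthwise convolution, BN, ReLU, then pointwise 1x1 convolution, BN, ReLU). The filter count and stride are illustrative assumptions.

```python
from tensorflow.keras import layers

def depthwise_separable_block(x, pointwise_filters: int, stride: int = 1):
    """Depthwise 3x3 conv -> BN -> ReLU -> pointwise 1x1 conv -> BN -> ReLU."""
    x = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(pointwise_filters, 1, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)
```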

DenseNet

The DenseNet201 technique is based on the DenseNet structure presented by Huang et al. DenseNet introduces dense connectivity, in which every layer is connected to all subsequent layers21. This network structure enhances gradient flow, decreases the parameter count, and encourages feature reuse. DenseNet201 extends this idea with a deeper network of 201 layers comprising dense blocks and transition layers.

The structure contains four dense blocks, with transition layers between every pair of neighbouring blocks. These transition layers play a vital role in changing the feature-map sizes using convolution and pooling operations. By integrating these transition layers, the method efficiently manages the data flow and adjusts the feature-map sizes to enable effective learning and information propagation through the network. This design permits DenseNet to exploit the benefits of dense connections while adaptively fine-tuning the feature sizes, enhancing performance in several DL tasks.

The model architecture used here employs DenseNet201 as the feature extractor and dense layers for classification. It contains an input layer, a DenseNet201 layer, a global average pooling layer, a flatten layer, a dense layer, and a classification layer. The model includes a total of 19,371,458 parameters, of which 19,142,402 are trainable, and is compiled with the Adam optimizer using a learning rate of 0.001.
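A hedged Keras sketch of this branch is shown below, wiring DenseNet201 into the pooling/flatten/dense head listed above. The head width (128 units) and the two-class softmax (fall / no fall) are assumptions; the flatten layer is kept for fidelity to the listing even though it is a no-op after global average pooling.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

backbone = tf.keras.applications.DenseNet201(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))

x = layers.GlobalAveragePooling2D()(backbone.output)
x = layers.Flatten()(x)                        # no-op after pooling, kept per the listing
x = layers.Dense(128, activation="relu")(x)    # assumed head width
outputs = layers.Dense(2, activation="softmax")(x)

model = models.Model(backbone.input, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
```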

ResNet

As the layer count increases, the quality of the weight matrix diminishes, reducing the network's ability to learn features; this degradation problem limits very deep networks. He et al.22 presented the ResNet model to resolve the problems of gradient explosion and vanishing effectively. It also markedly improves the training performance and efficiency of DNNs, which continues to promote the progress of DL technology. ResNet is available at various depths, such as ResNet-18, ResNet-50, and ResNet-101; in this research, the ResNet-34 network is employed. First, an image of size \(224\times 224\) is input into a convolutional layer with a \(7\times 7\) kernel, 64 kernels, a stride of 2, and a padding of 3, and the output is \(64\times 112\times 112\). It should also be noted that two types of shortcut connections exist, drawn as solid and dashed lines: solid lines indicate that the input and output sizes are the same, so they are added directly, while dashed lines indicate a change in dimensions that requires a projection. Figure 2 depicts the framework of ResNet.

Fig. 2. Architecture of ResNet.
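As a hedged illustration of the fusion step, the sketch below pools and concatenates deep features from the three pretrained backbones. Concatenation as the fusion operator is an assumption (the exact operator is not specified above), and Keras ships ResNet50 rather than ResNet34, so ResNet50 stands in for the ResNet branch.

```python
import numpy as np
from tensorflow.keras import applications

def make_extractor(app_fn, preprocess):
    """Build a frozen backbone that returns globally average-pooled features."""
    base = app_fn(include_top=False, weights="imagenet", pooling="avg",
                  input_shape=(224, 224, 3))
    return lambda imgs: base.predict(preprocess(imgs.astype("float32").copy()), verbose=0)

extractors = [
    make_extractor(applications.MobileNet,   applications.mobilenet.preprocess_input),
    make_extractor(applications.DenseNet201, applications.densenet.preprocess_input),
    make_extractor(applications.ResNet50,    applications.resnet.preprocess_input),
]

def fuse_features(images: np.ndarray) -> np.ndarray:
    """Deep feature fusion: concatenate the three pooled feature vectors."""
    return np.concatenate([extract(images) for extract in extractors], axis=1)
```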

Parameter optimization

To improve the performance of the DFFCV-FDC technique, IPOA-based hyperparameter selection is performed. Trojovský et al. proposed POA in 2022, a metaheuristic algorithm that simulates the hunting behaviour of pelicans23. Approaching the prey (the exploration phase) and surface flight (the development phase) are the two phases of POA.

Initialization

The population must be initialized before hunting, with each individual representing a candidate solution. This can be mathematically derived as follows:

$${X}_{i,j}={l}_{j}+rand*\left({u}_{j}-{l}_{j}\right),\quad i=1,2,\dots ,N,\; j=1,2,\dots ,m$$
(2)

where \({X}_{i,j}\) is the position of the \(i\)th pelican in the \(j\)th dimension, \(N\) is the number of pelicans in the population, \(m\) is the dimensionality of the problem, and \(rand\) is a random number within \([0,1]\). The upper and lower bounds of the \(j\)th dimension are \({u}_{j}\) and \({l}_{j}\), respectively.
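A minimal sketch of this initialization (Eq. 2) follows; the bounds shown are illustrative, since in this work each dimension would encode one hyperparameter of the classifier.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def init_population(n_pelicans: int, lower, upper) -> np.ndarray:
    """Uniform-random population inside the box [lower, upper] (Eq. 2)."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    return lower + rng.random((n_pelicans, lower.size)) * (upper - lower)

X = init_population(30, lower=[1e-4, 0.1], upper=[1e-1, 0.9])  # e.g. learning rate, dropout
```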

Exploration stage

Initially, the prey position is generated randomly in the search range, and the pelican locates the prey. If the prey's objective value is lower (better) than the pelican's own, the pelican approaches the prey; otherwise, it moves away from the prey, as follows.

$${X}_{i}^{{P}_{1}}=\left\{\begin{array}{l}{X}_{i}+rand*\left(P-I*{X}_{i}\right), {F}_{p}<{F}_{i}\\ {X}_{i}+rand*\left({X}_{i}-P\right), else\end{array}\right.$$
(3)

where \({X}_{i}^{{P}_{1}}\) is the position of the \(i\)th pelican after the first-stage update, \(I\) is a random integer equal to 1 or 2, \(P\) is the prey location, \(rand\) is a random value within [0,1], and \({F}_{p}\) and \({F}_{i}\) are the fitness values (FVs) of the prey and the \(i\)th pelican, respectively.

After approaching the prey, the pelican accepts the new position only if the FV of the new position is superior to that of the previous one.

$${X}_{i}=\left\{\begin{array}{l}{X}_{i}^{new}, {F}_{i}^{new}<{F}_{i}\\ {X}_{i}, else\end{array}\right.$$
(4)

where \({X}_{i}^{new}\) denotes the updated position of the \(i\)th pelican, and \({F}_{i}^{new}\) denotes the FV of the updated position.
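A hedged sketch of this exploration step (Eqs. 3-4) is given below, assuming `fitness` maps a position vector to the classifier error rate being minimized.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def exploration_step(X, fit, prey, prey_fit, fitness):
    """POA phase 1: move towards or away from the prey, keep only improvements."""
    for i in range(X.shape[0]):
        I = rng.integers(1, 3)                       # random integer: 1 or 2
        r = rng.random(X.shape[1])
        if prey_fit < fit[i]:
            candidate = X[i] + r * (prey - I * X[i])  # approach the prey (Eq. 3, first case)
        else:
            candidate = X[i] + r * (X[i] - prey)      # move away from the prey (Eq. 3, second case)
        candidate_fit = fitness(candidate)
        if candidate_fit < fit[i]:                    # greedy acceptance (Eq. 4)
            X[i], fit[i] = candidate, candidate_fit
    return X, fit
```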

Development stage

In this stage, the pelicans capture the prey upon reaching the water surface: each pelican searches points in the neighbourhood of its position to achieve the best convergence.

$${X}_{i}^{{P}_{2}}={X}_{i}+R*\left(1-\frac{t}{T}\right)*\left(2*rand-1\right)*{X}_{i}$$
(5)

where \({X}_{i}^{{P}_{2}}\) is the position of the \(i\)th pelican after the second-phase update, \(R\) is the constant 0.2, \(rand\) is a random value within [0,1], and \(t\) and \(T\) are the current and maximum iteration counts, respectively.
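The following minimal sketch implements this development step (Eq. 5), with the same greedy acceptance rule as the exploration phase.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def development_step(X, fit, t, T, fitness, R=0.2):
    """POA phase 2: local search in a neighbourhood that shrinks over iterations."""
    radius = R * (1 - t / T)
    for i in range(X.shape[0]):
        candidate = X[i] + radius * (2 * rng.random(X.shape[1]) - 1) * X[i]
        candidate_fit = fitness(candidate)
        if candidate_fit < fit[i]:                   # keep the move only if it improves
            X[i], fit[i] = candidate, candidate_fit
    return X, fit
```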

The crisscross optimization algorithm (CSO) is a search technique that exploits horizontal and vertical crossovers to update individuals' positions in the population. The horizontal crossover is an arithmetic crossover applied dimension by dimension between two individuals.

$$M{S}_{hc}(i, d)={r}_{1}*X(i, d)+(1-{r}_{1})*X(j, d)+{C}_{1}*(X(i, d)-X(j, d))$$
(6)
$$M{S}_{hc}(j, d)={r}_{2}*X(j, d)+(1-{r}_{2})*X(i, d)+{C}_{2}*\left(X\left(j, d\right)-X\left(i, d\right)\right)$$
(7)

where \(X\left(i,d\right)\) and \(X\left(j,d\right)\) are the positions of the \(d\)th dimension of the \(i\)th and \(j\)th individuals, respectively; \({r}_{1}\) and \({r}_{2}\) are random values within [0,1]; and \({C}_{1}\) and \({C}_{2}\) are random values within [\(-1\),1]. \(M{S}_{hc}(i,d)\) and \(M{S}_{hc}(j,d)\) are the offspring generated by horizontal crossover.

The vertical crossover is an arithmetic crossover applied within a single individual between two of its dimensions.

$$M{S}_{vc}\left(i, d1\right)=r*X\left(i, {d}_{1}\right)+\left(1-r\right)*X\left(i, {d}_{2}\right)$$
(8)

where \(X\left(i, {d}_{1}\right)\) and \(X\left(i, {d}_{2}\right)\) are the positions of the \({d}_{1}\)th and \({d}_{2}\)th dimensions of the \(i\)th individual, respectively, \(r\) is a random value within [0,1], and \(M{S}_{vc}(i, {d}_{1})\) is the offspring generated by vertical crossover. A sketch of both operators follows.
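The block below is a minimal sketch of the two CSO operators (Eqs. 6-8).

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def horizontal_crossover(x_i, x_j):
    """Arithmetic crossover between two individuals, dimension by dimension (Eqs. 6-7)."""
    r1, r2 = rng.random(x_i.shape), rng.random(x_j.shape)
    c1 = rng.uniform(-1, 1, x_i.shape)
    c2 = rng.uniform(-1, 1, x_j.shape)
    ms_i = r1 * x_i + (1 - r1) * x_j + c1 * (x_i - x_j)
    ms_j = r2 * x_j + (1 - r2) * x_i + c2 * (x_j - x_i)
    return ms_i, ms_j

def vertical_crossover(x_i, d1, d2):
    """Arithmetic crossover between two dimensions of one individual (Eq. 8)."""
    r = rng.random()
    child = x_i.copy()
    child[d1] = r * x_i[d1] + (1 - r) * x_i[d2]
    return child
```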

The POA can become trapped in local optima because the pelican individuals move within a small range. The CSO is incorporated into the local search to enhance the ability to escape local optima, owing to its strong local exploitation and global exploration abilities. In this work, the current individual moves away from a random individual when the random individual's FV is inferior to its own. The horizontal crossover is introduced so that the randomly generated individual is fully exploited, guiding the current individual towards the target position and improving both local exploitation and the ability to escape local optima.

$${X}_{i}^{{p}_{1}}\left(i,j\right)={r}_{1}*X\left(i,j\right)+\left(1-{r}_{1}\right)*P\left(i,j\right)+\text{sin}\left({r}_{2}\right)*\left(X\left(i,j\right)-P\left(i,j\right)\right)$$
(9)

where \(X\left(i,j\right)\) and \(P\left(i,j\right)\) are the current and random individuals, and \({r}_{1}\) and \({r}_{2}\) are random values within [0,1] and [\(0\), \(2\pi \)], respectively.
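A minimal sketch of this improved update (Eq. 9) is shown below.

```python
import numpy as np

rng = np.random.default_rng(seed=4)

def improved_crossover_update(x_i, p_rand):
    """IPOA update of Eq. (9): mix the current individual with a random one."""
    r1 = rng.random(x_i.shape)                 # in [0, 1]
    r2 = rng.random(x_i.shape) * 2 * np.pi     # in [0, 2*pi]
    return r1 * x_i + (1 - r1) * p_rand + np.sin(r2) * (x_i - p_rand)
```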

The IPOA derives a fitness function (FF) to obtain higher classification effectiveness. The FF returns a positive value in which smaller values characterize better candidate solutions; here, the classifier error rate is taken as the FF.

$$fitness\left({x}_{i}\right)=ClassifierErrorRate\left({x}_{i}\right)$$
$$=\frac{No.\, of\, misclassified\, samples}{Total\, No.\, of\, samples}\times 100$$
(10)
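A minimal sketch of this fitness evaluation (Eq. 10) follows; `train_and_predict` is a hypothetical helper that trains the downstream classifier with the decoded candidate hyperparameters and returns validation predictions.

```python
import numpy as np

def fitness(candidate, X_val, y_val, train_and_predict):
    """Classifier error rate (%) on held-out data for one candidate (Eq. 10)."""
    y_pred = train_and_predict(candidate, X_val)   # hypothetical helper
    return float(np.mean(y_pred != y_val)) * 100
```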

Classifier for fall detection

Finally, the detection of falls is performed using the DAE method. The DAE is designed to prevent overfitting in autoencoders (AEs) and to improve generalization by adding noise to the input layer24. At the same time, the DAE avoids trivially copying inputs to outputs and learns effective data representations. The DAE neural network has three layers (the input, hidden (HL), and output layers), which perform two major operations, namely encoding and decoding.

After noise is added to the original data, the corrupted vector \({x}_{noise}\) is taken as the input-layer data. The encoder is then applied to the input layer to obtain the hidden representation \(h\), and the decoder is applied to the HL to produce the final output \(z\).

$${x}_{noise}=A\cdot {x}_{in}$$
(11)
$$h=f\left({W}_{encoder}\cdot {x}_{noise}+{b}_{encoder}\right)$$
(12)
$$z=g\left({W}_{decoder}\cdot h+{b}_{decoder}\right)$$
(13)

where \(A\) denotes a random matrix with entries within [0,1], \({W}_{encoder}\), \({W}_{decoder}\), \({b}_{encoder}\), and \({b}_{decoder}\) are the network parameters of the encoder and decoder, and \(f(\cdot )\) and \(g(\cdot )\) are nonlinear activation functions.

The DAE aims to recover the original information from the corrupted input, i.e., the final output \(z\) must be close to the original data \({x}_{in}\).

$$l=\frac{1}{n}{\sum }_{i=1}^{n}{\Vert {z}^{(i)}-{x}_{in}^{\left(i\right)}\Vert }^{2}$$
(14)

Here, \(l\) indicates the error during the training process, and \(n\) signifies the number of training samples.
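A hedged NumPy sketch of the DAE forward pass and loss (Eqs. 11-14) is given below. The sigmoid activations and the elementwise reading of the corruption in Eq. (11) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=5)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def dae_forward(x_in, W_enc, b_enc, W_dec, b_dec):
    """Corrupt, encode, decode (Eqs. 11-13)."""
    A = rng.random(x_in.shape)                 # random corruption matrix in [0,1]
    x_noise = A * x_in                         # Eq. (11), taken elementwise here
    h = sigmoid(W_enc @ x_noise + b_enc)       # encoder, Eq. (12)
    z = sigmoid(W_dec @ h + b_dec)             # decoder, Eq. (13)
    return z

def dae_loss(Z, X_in):
    """Mean squared reconstruction error over n samples (Eq. 14)."""
    return float(np.mean(np.sum((Z - X_in) ** 2, axis=1)))
```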

Results analysis

Data used

The FD outcomes of the DFFCV-FDC technique are examined using the multiple cameras fall (MCF) database25 with the frontal sequence and the URFD database26 with the overhead sequence, as described in Table 1. Figure 3 portrays sample images from the MCF and URFD datasets. The MCF dataset encompasses 24 scenarios recorded using 8 IP video cameras: the first 22 scenarios feature falls along with confounding events, while the last 2 scenarios depict only confounding events. The URFD dataset encompasses 70 sequences, comprising 30 falls and 40 activities of daily living (ADL). Fall events are captured using 2 Microsoft Kinect cameras along with accelerometer data, while ADL events are recorded using camera 0 and accelerometer data. Sensor data is accumulated using PS Move devices at 60 Hz and x-IMU devices at 256 Hz. The proposed DFFCV-FDC technique is simulated using Python 3.6.5 on a PC with an i5-8600K CPU, a GeForce GTX 1050 Ti 4 GB GPU, 16 GB RAM, a 250 GB SSD, and a 1 TB HDD. The parameter settings are: learning rate 0.01, ReLU activation, 50 epochs, dropout 0.5, and batch size 5.
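For reproducibility, the reported settings can be collected in a single configuration sketch; how these values are wired into a specific framework is left as an assumption.

```python
# Experimental settings as reported above.
CONFIG = {
    "learning_rate": 0.01,
    "activation": "relu",
    "epochs": 50,
    "dropout": 0.5,
    "batch_size": 5,
}
```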

Table 1 Details on database.
Fig. 3. Sample images: (a) multiple cameras fall detection dataset [Source: E. Auvinet, C. Rougier, J. Meunier, A. St-Arnaud and J. Rousseau, "Multiple cameras fall dataset," DIRO, Université de Montréal, Montreal, QC, Canada, Tech. Rep. 1350, 2010]; (b) UR fall detection (URFD) dataset with an overhead sequence (available at http://fenix.univ.rzeszow.pl/~mkepski/ds/uf.html).

Result analysis on frontal sequence database

Figure 4 presents the confusion matrices generated by the DFFCV-FDC model on the frontal sequence database over numerous epochs. The results show that the DFFCV-FDC model efficiently detects the fall and no-fall instances under all classes.

Fig. 4. Confusion matrices of the DFFCV-FDC technique on the frontal sequence database, (a-f) epochs 500-3000.

In Table 2 and Fig. 5, the FD outcomes of the DFFCV-FDC approach are reported on the frontal sequence database. The results show that the DFFCV-FDC approach appropriately recognizes the fall and no-fall events. With 500 epochs, the DFFCV-FDC technique attains an average \(acc{u}_{y}\) of 99.68%, \(pre{c}_{n}\) of 99.33%, \(rec{a}_{l}\) of 99.79%, \({F}_{score}\) of 99.56%, and \({G}_{measure}\) of 99.56%. With 1000 epochs, the DFFCV-FDC model obtains an average \(acc{u}_{y}\) of 98.73%, \(pre{c}_{n}\) of 98.69%, \(rec{a}_{l}\) of 97.76%, \({F}_{score}\) of 98.22%, and \({G}_{measure}\) of 98.22%. Meanwhile, with 1500 epochs, the DFFCV-FDC model attains an average \(acc{u}_{y}\) of 96.82%, \(pre{c}_{n}\) of 96.90%, \(rec{a}_{l}\) of 94.18%, \({F}_{score}\) of 95.45%, and \({G}_{measure}\) of 95.50%. With 2000 epochs, the DFFCV-FDC approach achieves an average \(acc{u}_{y}\) of 98.90%, \(pre{c}_{n}\) of 97.35%, \(rec{a}_{l}\) of 97.35%, \({F}_{score}\) of 97.35%, and \({G}_{measure}\) of 97.35%. Finally, with 3000 epochs, the DFFCV-FDC approach obtains an average \(acc{u}_{y}\) of 98.41%, \(pre{c}_{n}\) of 98.01%, \(rec{a}_{l}\) of 97.56%, \({F}_{score}\) of 97.78%, and \({G}_{measure}\) of 97.78%.

Table 2 FD outcome of DFFCV-FDC technique on frontal sequence database.
Fig. 5. Average outcome of the DFFCV-FDC technique on the frontal sequence database.

Figure 6 shows the performance of the DFFCV-FDC approach through the training accuracy (TRAAC) and validation accuracy (VALAC) curves on the frontal sequence database. The figure offers a useful view of the behaviour of the DFFCV-FDC technique over numerous epochs, illustrating its learning process and generalization ability. Notably, both TRAAC and VALAC improve steadily as the epochs progress, confirming the adaptive nature of the DFFCV-FDC approach in the pattern-recognition process on both datasets. The rising trend in VALAC shows that the DFFCV-FDC technique fits the training data while still delivering precise identification of unseen data, indicating strong generalization skills.

Fig. 6. Accu_y curve of the DFFCV-FDC technique on the frontal sequence database.

Figure 7 presents the validation loss (VALLS) and training loss (TRALS) curves of the DFFCV-FDC technique on the frontal sequence database. The progressive reduction in TRALS shows the DFFCV-FDC method refining its weights and diminishing the classification error on both datasets. The figure gives a clear picture of how the DFFCV-FDC method fits the training data, emphasizing its ability to capture patterns within both databases. Notably, the DFFCV-FDC approach continuously tunes its parameters to reduce the gap between the predicted and actual training classes.

Fig. 7. Loss curve of the DFFCV-FDC technique on the frontal sequence database.

Inspecting the precision-recall (PR) curve presented in Fig. 8, the results confirm that the DFFCV-FDC approach consistently achieves high PR values for every class on the frontal sequence database, confirming its superior ability to discriminate between the classes.

Fig. 8. PR curve of the DFFCV-FDC technique on the frontal sequence database.

Besides, the ROC curves in Fig. 9 show that the DFFCV-FDC approach excels in identifying the different labels on the frontal sequence database. They offer a complete view of the tradeoff between the false positive rate (FPR) and true positive rate (TPR) across detection thresholds and epoch counts, highlighting the boosted classification results of the DFFCV-FDC approach under all classes and its efficiency in addressing varied identification problems.

Fig. 9. ROC curve of the DFFCV-FDC technique on the frontal sequence database.

In Fig. 10, the comparative outcomes of the DFFCV-FDC approach on the frontal sequence database are described. The outcomes indicate that the 1D-CNN, 2D-CNN, and ResNet101 models show the weakest performance, with the lowest \(acc{u}_{y}\) values of 94.31%, 95.63%, and 96.33%, respectively. The VGG16 and VGG19 models attain slightly higher \(acc{u}_{y}\) of 97.66% and 98.25%, respectively. Although the EADL-FDC and IWODL-FDDP models accomplish closer \(acc{u}_{y}\) of 99.33% and 99.04%, respectively, the DFFCV-FDC technique performs best with an increased \(acc{u}_{y}\) of 99.68%.

Fig. 10. Accu_y outcome of the DFFCV-FDC technique on the frontal sequence database25,26.

Result analysis on overhead sequence database

Figure 11 presents the confusion matrices generated by the DFFCV-FDC approach on the overhead sequence database over numerous epochs. The results show that the DFFCV-FDC approach effectively detects the fall and no-fall samples under all classes.

Fig. 11. Confusion matrices of the DFFCV-FDC technique on the overhead sequence database, (a-f) epochs 500-3000.

In Table 3 and Fig. 12, the FD outcomes of the DFFCV-FDC method are described on the overhead sequence database. The outcomes show that the DFFCV-FDC model correctly identifies the fall and no-fall events. With 500 epochs, the DFFCV-FDC method achieves an average \(acc{u}_{y}\) of 96.36%, \(pre{c}_{n}\) of 94.95%, \(rec{a}_{l}\) of 95.35%, \({F}_{score}\) of 95.14%, and \({G}_{measure}\) of 95.14%. With 1000 epochs, the DFFCV-FDC approach obtains an average \(acc{u}_{y}\) of 97.02%, \(pre{c}_{n}\) of 95.48%, \(rec{a}_{l}\) of 96.68%, \({F}_{score}\) of 96.06%, and \({G}_{measure}\) of 96.07%. Meanwhile, with 1500 epochs, the DFFCV-FDC approach achieves an average \(acc{u}_{y}\) of 97.68%, \(pre{c}_{n}\) of 96.70%, \(rec{a}_{l}\) of 97.12%, \({F}_{score}\) of 96.91%, and \({G}_{measure}\) of 96.91%. Furthermore, with 2000 epochs, the DFFCV-FDC model reaches an average \(acc{u}_{y}\) of 98.01%, \(pre{c}_{n}\) of 97.34%, \(rec{a}_{l}\) of 97.34%, \({F}_{score}\) of 97.34%, and \({G}_{measure}\) of 97.34%. Lastly, with 3000 epochs, the DFFCV-FDC methodology obtains an average \(acc{u}_{y}\) of 97.35%, \(pre{c}_{n}\) of 96.45%, \(rec{a}_{l}\) of 96.45%, \({F}_{score}\) of 96.45%, and \({G}_{measure}\) of 96.45%.

Table 3 FD outcome of DFFCV-FDC technique on overhead sequence database.
Fig. 12. Average outcome of the DFFCV-FDC technique on the overhead sequence database.

Figure 13 shows the performance of the DFFCV-FDC approach through the TRAAC and VALAC curves on the overhead sequence database. The figure offers a useful view of the behaviour of the DFFCV-FDC method over numerous epochs, illustrating its learning process and generalization ability. Notably, both TRAAC and VALAC improve steadily as the epochs progress, confirming the adaptive nature of the DFFCV-FDC approach in the pattern-recognition process on both datasets. The rising trend in VALAC shows the DFFCV-FDC technique fitting the training data while still offering precise identification of unseen data, indicating strong generalization skills.

Fig. 13. Accu_y curve of the DFFCV-FDC technique on the overhead sequence database.

Figure 14 presents the TRALS and VALLS curves of the DFFCV-FDC approach on the overhead sequence database. The progressive decline in TRALS shows the DFFCV-FDC technique adjusting its weights and reducing the classification error on both datasets. The figure gives a clear picture of how the DFFCV-FDC model fits the training data, emphasizing its ability to capture patterns within both databases. Notably, the DFFCV-FDC method continuously tunes its parameters to reduce the gap between the predicted and actual training class labels.

Fig. 14. Loss curve of the DFFCV-FDC technique on the overhead sequence database.

Inspecting the PR curve shown in Fig. 15, the results confirm that the DFFCV-FDC model consistently accomplishes high PR values for every class on the overhead sequence database, confirming its improved ability to identify the distinct classes.

Fig. 15. PR curve of the DFFCV-FDC technique on the overhead sequence database.

In addition, the ROC curves produced by the DFFCV-FDC model in Fig. 16 show its strength in identifying the different labels on the overhead sequence database. They offer a complete view of the tradeoff between the FPR and TPR across detection thresholds and epoch counts, highlighting the superior classification results of the DFFCV-FDC technique under every class and its efficacy in handling varied identification problems.

Fig. 16. ROC curve of the DFFCV-FDC technique on the overhead sequence database.

In Fig. 17, the comparative outcomes of the DFFCV-FDC approach on the overhead sequence database are described. The results indicate that the 1D-CNN, 2D-CNN, and ResNet101 methodologies show the weakest performance, with the lowest \(acc{u}_{y}\) values of 92.69%, 95.48%, and 96.69%, respectively. The VGG16 and VGG19 techniques attain slightly higher \(acc{u}_{y}\) of 95.16% and 96.56%, respectively. Although the EADL-FDC and IWODL-FDDP models accomplish closer \(acc{u}_{y}\) of 97.34% and 97.02%, respectively, the DFFCV-FDC methodology achieves the best result with an increased \(acc{u}_{y}\) of 98.34%.

Fig. 17. Accu_y outcome of the DFFCV-FDC technique on the overhead sequence database25,26.

Hence, the DFFCV-FDC technique can be applied for enhanced fall recognition results.

Conclusion

In this study, a novel DFFCV-FDC methodology is introduced. The main aim of the DFFCV-FDC methodology is to employ the CV concept to detect fall events. Accordingly, the DFFCV-FDC methodology comprises GF-based noise elimination, a deep feature fusion process, parameter optimization, and a DAE-based fall detection and classification process. At the primary level, the DFFCV-FDC technique uses the GF approach for noise eradication. Besides, a deep feature fusion process comprising MobileNet, DenseNet, and ResNet models is involved. IPOA-based hyperparameter selection is performed to improve the performance of the DFFCV-FDC methodology. Finally, falls are detected using the DAE model. The performance of the DFFCV-FDC approach was examined on benchmark fall databases, and a widespread comparative study confirmed the improvement of the DFFCV-FDC technique over existing models. The limitations of the DFFCV-FDC technique include the need for robustness testing across varied real-world camera settings and lighting conditions to confirm consistent performance. Future work may focus on improving the model's sensitivity to subtle fall cues, incorporating more sophisticated anomaly detection techniques, and addressing privacy concerns associated with continuous video surveillance in private spaces. Furthermore, exploring real-time implementation challenges and optimizing computational efficiency for deployment on edge devices is significant for practical application in assisted living and healthcare environments.