Introduction

Numerous epidemics and infectious viral outbreaks, such as Zika, Ebola, MERS, and pneumonia, have erupted in the past, wreaking havoc on human lives and on the economies of the world’s main countries and regions1. The globe is now dealing with a contagious disease caused by SARS-CoV-2, which was first identified in China and has since spread to all continents2. As of August 1, 2022, 228 countries and territories worldwide had reported a total of 600.74 million coronavirus cases and 6.47 million deaths3. The disease spreads rapidly and has claimed the lives of millions of individuals worldwide. Discovering the transmission pattern and making forecasts using scientific approaches may help policymakers frame measures and limit the epidemic.

Time series forecasting is among the most important statistical methodologies and plays a significant part in decision making under uncertainty. A time series is a set of related observations of a phenomenon recorded over successive periods of time that are typically equal in length4. Fluctuation is a time series feature that may be used to evaluate the pattern and make inferences. A fuzzy time series (FTS) attaches semantic significance to fluctuating data and is therefore more informative than a regular time series. Many authors have used FTS techniques and non-linear growth models separately to predict COVID-19 active cases, but their predictions are not very close to the actual counts. To obtain better predictions, the current paper combines FTS techniques with the logistic non-linear growth model into a hybrid FTS technique for forecasting upcoming coronavirus positive cases in several countries. The model’s forecasts aim to give decision makers a foundation for creating a protective strategy during the coronavirus pandemic. Coronavirus prediction models may also aid medical resource planning, such as the number of ventilators, beds, and emergency healthcare facilities allocated to coronavirus patients.

The objectives and contributions of the present study are outlined below:

  1. Development of a logistic growth technique to estimate infection rates.

  2. Design of a hybrid technique that combines the fuzzy time series technique with the non-linear growth model.

  3. Forecasting of the number of coronavirus-infected people for trained and untrained data in several countries.

Literature review

Time-series forecasting is vital throughout epidemics because it gives critical data that might lead to disease suppression. Various forecasting methods are being utilized to anticipate future coronavirus infections.

The official statistics for COVID-19 are very ambiguous5, and fuzzy logic is designed to represent this kind of uncertainty6. Fuzzy logic techniques that can effectively handle the ambiguity in the coronavirus data are therefore needed, and numerous researchers have used fuzzy logic in their prediction models. To anticipate the coronavirus time series in Mexico, Patra et al.7 suggested several fuzzy ensemble neural network models. To estimate and predict the confirmed cases of coronavirus in China, Al-Qaness et al.8 suggested an adaptive neuro-fuzzy inference system (ANFIS) optimized using the flower pollination and salp swarm algorithms. Dhiman and Sharma9 suggested fuzzy logic inference for the detection and treatment of coronavirus. Fong et al.10 proposed hybridized deep learning and fuzzy rule generation for investigating the coronavirus epidemic. Fatima et al.11 used IoT techniques with a related fuzzy inference technique to allow smart monitoring of coronavirus disease. Van Tinh et al.12 used a fuzzy forecasting model in conjunction with particle swarm optimization (PSO) for coronavirus prediction. Verma et al.13 combined ARIMA and FTS techniques. Castillo et al.14 proposed a hybrid intelligent technique that combines fuzzy logic and fractal mathematics for predicting coronavirus active cases. Zhao et al.15 utilized an SEIR technique to estimate the number of coronavirus patients in the US. Chowdhury et al.16 suggested the FPASSA-ANFIS technique to predict the number of positive coronavirus cases in China. Khatua et al.17 presented a coronavirus optimal control technique employing a fuzzy dynamical framework. For estimating the trend of coronavirus in India, Bedi et al.18 suggested a modified SEIRD model. To simulate the coronavirus epidemic in Malaysia, Hassan et al.19 implemented a fuzzy graph technique utilizing the c-FACS technique.
Asl et al.20 suggested a COVID-19-specific interval type-2 fuzzy expert system for predicting ICU admission. Iwendi et al.21 demonstrated the usefulness of an ANFIS for categorizing COVID-19 infection based on symptoms. Yogachi et al.22 predicted the percentage rise in confirmed coronavirus patients in West Java using the Mamdani fuzzy logic approach. Kumar et al.23 used a PSO approach to optimize all three hyperparameters for FTS forecasts of the coronavirus pandemic. Kumar and Kaur et al.24 suggested a hybrid approach based on self-organized maps and FTS for predicting future COVID-19 cases in India. Luo et al.25 utilized XGBoost and long short-term memory algorithms to forecast coronavirus positive patients in America. Recently, various authors26,27,28,29,30,31,32 have used fuzzy time series techniques and non-linear growth models to predict COVID-19 active, death and recovery cases.

Methodology

Fuzzy time series (FTS) model representation

FTS is a popular model in analytics that examines both prior and recent data to create predictions. Fuzzy logic has been applied to time series prediction in the literature and provides a considerable improvement over standard statistical approaches, since it is non-linear and can approximate a complicated non-linear system more accurately than linear statistical approaches. According to the fuzzy logic given by Zadeh et al.33, an element may belong to a set only partially: its degree of membership takes a value in the range [0, 1]. Chen et al.34 proposed dividing the universe of discourse into several intervals (fuzzy sets), which serve as the linguistic terms of a linguistic variable that portrays the numerical time series, and observing how each region behaves (deriving rules from the time series sequences).

Definition 1:

Fuzzy Set: The universe of discourse can be depicted as \(U=\{{u}_{1}, {u}_{2},\dots ,{u}_{n}\}\). A fuzzy set \({A}_{i}\) of \(U\) is written as \({A}_{i}={f}_{{A}_{i}}({u}_{1})/{u}_{1}+{f}_{{A}_{i}}({u}_{2})/{u}_{2}+\dots +{f}_{{A}_{i}}({u}_{n})/{u}_{n}\), where \({f}_{{A}_{i}}\) denotes the membership function of \({A}_{i}\). If \({u}_{j}\) is an element of \({A}_{i}\), then \({f}_{{A}_{i}}({u}_{j})\) is the degree to which \({u}_{j}\) belongs to \({A}_{i}\), with \({f}_{{A}_{i}}({u}_{j})\in [0,1]\) and \(1\le j\le n\). The following is a basic overview of the definition of FTS:

Definition 2:

FTS: Let \(X(t)\ (t=0,1,2,\dots )\), a subset of \(\mathbb{R}\), be the universe of discourse on which the fuzzy sets \({f}_{i}(t)\ (i=1,2,\dots )\) are defined. \(F(t)\) is called an FTS on \(X(t)\) if and only if \(F(t)\) is a collection of \({f}_{i}(t)\ (i=1,2,\dots )\).

Definition 3:

FLR: \(F(t)\) is assumed to be generated by \(F(t-1)\) if there is a fuzzy relation \(R(t-1,t)\) such that \(F(t)=F(t-1)*R(t-1,t)\), in which \(*\) acts as an operator. The relationship between \(F(t-1)\) and \(F(t)\) is called a fuzzy logical relationship (FLR) and is denoted by \({A}_{i}\to {A}_{j}\) when \(F(t-1)={A}_{i}\) and \(F(t)={A}_{j}\).

Definition 4:

FLRG: FLRs that share the same fuzzy set on the left-hand side may be grouped into fuzzy logical relationship groups (FLRGs). Consider the case where \({A}_{i}\to {A}_{j1}\), \({A}_{i}\to {A}_{j2}\), \(\dots \), \({A}_{i}\to {A}_{jm}\) are fuzzy logical relationships. They form the FLRG \({A}_{i}\to {A}_{j1},{A}_{j2},\dots ,{A}_{jm}\). We used Chen’s model34 to estimate the forecast values of the COVID-19 data sets, while Huarng’s average-based length algorithm35 is used to establish the interval length. The FTS model’s algorithm is described below.

In the first step, we define the universe of discourse, denoted by \(U=[{L}_{c}-{D}_{1}, {U}_{c}+{D}_{2}]\), where \({L}_{c}\) and \({U}_{c}\) are the lower and upper bounds of the data and \({D}_{1}\) and \({D}_{2}\) are constants obtained by taking 10% of \({L}_{c}\) and \({U}_{c}\), respectively. As a security margin, we have thus expanded the lower and upper boundaries by 10%. In the second step, following Huarng et al.35, we split the universe of discourse into intervals of length \(l\) by utilizing the average-based length algorithm. In the third step, we obtain the intervals \({u}_{1}, {u}_{2},\dots ,{u}_{n}\), define the fuzzy sets \({A}_{1}, {A}_{2},\dots ,{A}_{n}\), and fuzzify the time series data. In the fourth step, we identify FLRs and FLRGs using Definitions 3 and 4. In the fifth step, we use Chen’s approach for defuzzification and estimate the predicted values. In the sixth step, we use the root mean square error (RMSE) to evaluate performance.
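The first five steps can be illustrated with a minimal pure-Python sketch. It is not the pyFTS implementation used in the study. The function names, the toy series, and the simplified rule for picking the rounding base of the interval length are assumptions made for illustration, and the 10% margin is applied to the absolute value of each bound so that the sketch also works for negative data.

```python
import math

def average_based_length(series):
    """Interval length: half the mean absolute first difference, rounded to its base.

    The base is approximated here by the power of ten of the half-average
    (a simplification of the published base table).
    """
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    half_avg = sum(diffs) / len(diffs) / 2.0  # assumes the series is not constant
    base = 10 ** math.floor(math.log10(half_avg))
    return max(base, round(half_avg / base) * base)

def chen_fts_forecast(series, margin=0.10):
    """One-step-ahead forecasts: forecasts[k] is the forecast made from series[k]."""
    lo, hi = min(series), max(series)
    lower = lo - margin * abs(lo)   # step 1: widen the universe of discourse
    upper = hi + margin * abs(hi)
    length = average_based_length(series)         # step 2: interval length
    n = math.ceil((upper - lower) / length)
    mids = [lower + (i + 0.5) * length for i in range(n)]  # interval midpoints

    def fuzzify(x):
        # step 3: index of the interval (fuzzy set A_i) that contains x
        return min(int((x - lower) // length), n - 1)

    labels = [fuzzify(x) for x in series]

    # step 4: FLRs A_i -> A_j, grouped by left-hand side into FLRGs
    groups = {}
    for i, j in zip(labels, labels[1:]):
        groups.setdefault(i, set()).add(j)

    # step 5: defuzzify with the mean midpoint of the right-hand sides;
    # fall back to the current interval midpoint when the group is empty
    forecasts = []
    for i in labels:
        rhs = groups.get(i)
        if rhs:
            forecasts.append(sum(mids[j] for j in rhs) / len(rhs))
        else:
            forecasts.append(mids[i])
    return forecasts

toy_series = [100, 120, 140, 160, 140, 160]  # hypothetical data
print(average_based_length(toy_series))
print(chen_fts_forecast(toy_series))
```

The last entry of the returned list is the out-of-sample forecast for the step after the final observation; the earlier entries can be compared with the observed values to compute the RMSE of step six.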

Logistic growth model

The COVID-19 epidemic may be followed using a logistic growth model36. This model has been commonly used to simulate population growth where there are physical and resource constraints, and logistic models are frequently used in epidemiology. It is normal practice to examine the risk factors linked with a particular disease and to estimate the likelihood that the disease will manifest given those risk factors. Through logistic regression analysis, we can roughly forecast how an epidemic will evolve and spread.

$${Y}_{t}=\frac{a}{1+\text{exp}\left(b-c\left(t-{t}_{0}\right)\right)}$$

Here \({Y}_{t}\) is the cumulative number of cases at time t, and a is the anticipated maximum number of confirmed cases. The fitting coefficients are b and c. The number of days since the first case is represented by t, and the first case occurred at time \({t}_{0}\).
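As a minimal sketch of how such a curve behaves, the snippet below evaluates the logistic function and recovers b and c from synthetic data for a fixed ceiling a. The linearised least-squares fit, the function names, and all numeric values are assumptions chosen for illustration; they are not the fitting procedure or the parameters used in the study.

```python
import math

def logistic(t, a, b, c, t0=0.0):
    """Y_t = a / (1 + exp(b - c * (t - t0)))."""
    return a / (1.0 + math.exp(b - c * (t - t0)))

def fit_b_c(days, cases, a, t0=0.0):
    """Least-squares estimate of b and c for a fixed ceiling a.

    Uses the linearisation ln(a / Y_t - 1) = b - c * (t - t0),
    which is only defined while 0 < Y_t < a.
    """
    xs = [t - t0 for t in days]
    zs = [math.log(a / y - 1.0) for y in cases]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return intercept, -slope  # b is the intercept, c is minus the slope

# Synthetic, noise-free data generated from hypothetical parameters.
days = list(range(0, 60, 5))
cases = [logistic(t, a=1000.0, b=4.0, c=0.2) for t in days]
b_hat, c_hat = fit_b_c(days, cases, a=1000.0)
print(b_hat, c_hat)
```

The curve reaches half of its ceiling a when b = c(t − t0), which gives a quick sanity check on fitted coefficients. With real data the ceiling a is unknown and all three parameters are usually estimated jointly with a non-linear least-squares routine.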

Calculate R2 score (regression coefficient)

The following formula gives the regression coefficient \({R}^{2}\) (the coefficient of determination), which is used to assess how well different approaches fit the data.

$$R^{2} = 1 - \frac{\sum\limits_{i = 1}^{n} {\left( {y_{i} - \hat{y}_{i}} \right)}^{2}}{\sum\limits_{i = 1}^{n} {\left( {y_{i} - \overline{y}} \right)}^{2}}$$

Here \(\hat{y}_{i}\) denotes the predicted cumulative positive cases, \({y}_{i}\) the actual cumulative positive cases, and \(\overline{y}\) the average of the actual cumulative positive cases. The closer \({R}^{2}\) is to 1, the better the fit.
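The two evaluation measures mentioned in the methodology, \({R}^{2}\) and RMSE, can be computed as in the sketch below. The function names and the four data points are hypothetical and serve only to make the arithmetic concrete.

```python
import math

def r_squared(actual, predicted):
    """R^2 = 1 - sum((y_i - yhat_i)^2) / sum((y_i - ybar)^2)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical values: every prediction is off by exactly 10 cases.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]
print(r_squared(actual, predicted))  # residual sum 400, total sum 50000
print(rmse(actual, predicted))
```

In this toy case the residual sum of squares is 400 against a total sum of squares of 50,000, so \({R}^{2}\) is 0.992 and the RMSE is 10.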

Hybrid model

We combined the FTS and logistic growth models in the hybrid model. It is assumed that a time series consists of two components: a linear autocorrelation framework element and a non-linear element. The hybrid approach is summarized below.

\({T}_{t}={L}_{t}+{N}_{t}\), where \({T}_{t}\) denotes the actual time series and \({L}_{t}\) and \({N}_{t}\) denote its linear and non-linear elements. We first evaluate the residuals \({E}_{t}\) of the logistic growth model as \({E}_{t}={T}_{t}-\widehat{{L}_{t}}\). Using the FTS model, we then model these residuals as \({E}_{t}=f({E}_{t-1},{E}_{t-2},\dots ,{E}_{t-n})+{\varepsilon }_{t}\), where \(f\) is the non-linear function estimated by the model and \({\varepsilon }_{t}\) is the random error. This yields the non-linear element \(\widehat{{N}_{t}}\) of the time series, so that the forecast values are given by \(\widehat{{T}_{t}}=\widehat{{L}_{t}}+\widehat{{N}_{t}}\).
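The additive combination can be sketched as follows. To keep the example short, the residual model is passed in as a function, and a simple persistence rule (each residual is predicted by the previous one) stands in for the FTS model of the study. That stand-in, the function names, and the numbers are all assumptions for illustration.

```python
def hybrid_forecast(actual, logistic_fit, residual_model):
    """Return T_hat[t] = L_hat[t] + N_hat[t] for every time step."""
    # E_t = T_t - L_hat_t: what the logistic curve fails to explain
    residuals = [y - l_hat for y, l_hat in zip(actual, logistic_fit)]
    # N_hat_t: residual predicted from earlier residuals
    n_hat = residual_model(residuals)
    return [l_hat + n for l_hat, n in zip(logistic_fit, n_hat)]

def persistence(residuals):
    """Stand-in residual model: N_hat[t] = E[t-1], and 0 for the first step.

    In the study this role is played by the FTS model; persistence is used
    here only so that the sketch stays self-contained.
    """
    return [0.0] + residuals[:-1]

# Hypothetical observed counts and logistic fitted values.
actual = [10, 22, 41, 75]
logistic_fit = [12, 20, 40, 70]
print(hybrid_forecast(actual, logistic_fit, persistence))
```

Passing the residual model as an argument keeps the decomposition explicit: any forecaster that maps past residuals to predicted residuals can be substituted without changing the combination step.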

Experimental results and discussion

For this study, we utilized time series data on COVID-19 infections from Johns Hopkins University that was made available on GitHub37. Phase 1 data collection runs from 28 January 2020 through 5 June 2020. Phase 2 data collection runs from 10 October 2021 until 15 January 2022. Positive, recovered and death cases for all countries are included in the data. The models are taken from the “pyFTS” library, and the methods are implemented in Python (version 3.7)38. The Python Matplotlib package is used to create the plots. The forecast of coronavirus positive cases has been studied for Italy, Brazil, India, Germany, Pakistan, and Myanmar up to 5 June 2020 in phase 1 and 15 January 2022 in phase 2, and projections of the number of active cases for the next 26 and 14 days, respectively, have been produced. The COVID-19 active cases for these countries are presented in Table 1 (phase 1) and Table 2 (phase 2). In this article, we propose a hybrid model that combines the logistic growth model with fuzzy time series techniques. Accurate estimation of coronavirus active cases requires a fitting coefficient \({R}^{2}\) close to 1, so the technique is designed to achieve a higher \({R}^{2}\) value and a lower RMSE value. The proposed model’s major objective is to obtain the best fit when compared with the other models.

Table 1 Coronavirus patients forecasted by the proposed model in Italy, Brazil, India, Germany, Pakistan, and Myanmar over the 26 days beginning 6 June 2020.
Table 2 Coronavirus patients forecasted by the proposed model in Italy, Brazil, India, Germany, Pakistan, and Myanmar over the 14 days beginning 16 January 2022.

Phase-1 Analysis

Analysis of coronavirus active cases for Italy

As demonstrated in Fig. 1, the proposed model fits the data for Italy better than the other models (FTS and logistic growth). The reported number of coronavirus positive patients on 1 July 2020 is 240,760, while the proposed model predicts 240,502. The x-axis represents the number of days. The y-axis represents the number of COVID-19 active cases.

Figure 1

Forecast vs. actual plot of positive coronavirus patients in Italy from 6 June 2020 to 1 July 2020. According to the proposed model’s predictions, Italy has 240,502 COVID-19 cases on 1 July 2020.

Analysis of coronavirus active cases for Brazil

As demonstrated in Fig. 2, the proposed model fits the data for Brazil better than the other models. The reported number of coronavirus positive patients on 1 July 2020 is 1,459,494, while the proposed model predicts 1,455,432.

Figure 2

Forecast vs. actual plot of positive coronavirus patients in Brazil from 6 June 2020 to 1 July 2020. According to the proposed model’s predictions, Brazil has 1,455,432 COVID-19 cases on 1 July 2020.

Analysis of coronavirus active cases for India

As demonstrated in Fig. 3, the proposed model fits the data for India better than the other models. The reported number of coronavirus positive patients on 1 July 2020 is 604,641, while the proposed model predicts 601,480.

Figure 3

Forecast vs. actual plot of positive coronavirus patients in India from 6 June 2020 to 1 July 2020. According to the proposed model’s predictions, India has 601,480 COVID-19 cases on 1 July 2020.

Analysis of coronavirus active cases for Germany

As demonstrated in Fig. 4, the proposed model fits the data for Germany better than the other models. The reported number of coronavirus positive patients on 1 July 2020 is 194,725, while the proposed model predicts 194,269.

Figure 4

Forecast vs. actual plot of positive coronavirus patients in Germany from 6 June 2020 to 1 July 2020. According to the proposed model’s predictions, Germany has 194,269 COVID-19 cases on 1 July 2020.

Analysis of coronavirus active cases for Pakistan

As demonstrated in Fig. 5, the proposed model fits the data for Pakistan better than the other models. The reported number of coronavirus positive patients on 1 July 2020 is 217,809, while the proposed model predicts 216,681.

Figure 5

Forecast vs. actual plot of positive coronavirus patients in Pakistan from 6 June 2020 to 1 July 2020. According to the proposed model’s predictions, Pakistan has 216,681 COVID-19 cases on 1 July 2020.

Analysis of coronavirus active cases for Myanmar

As demonstrated in Fig. 6, the proposed model fits the data for Myanmar better than the other models. The reported number of coronavirus positive patients on 1 July 2020 is 304, while the proposed model predicts 299.

Figure 6

Forecast vs. actual plot of positive coronavirus patients in Myanmar from 6 June 2020 to 1 July 2020. According to the proposed model’s predictions, Myanmar has 299 COVID-19 cases on 1 July 2020.

Phase-2 Analysis

Analysis of coronavirus active cases for Italy

As demonstrated in Fig. 7, the proposed model fits the data for Italy better than the other models. The reported number of coronavirus positive patients on 29 January 2022 is 10,821,375, while the proposed model predicts 10,819,855.

Figure 7

Forecast vs. actual plot of positive coronavirus patients in Italy from 16 January 2022 to 29 January 2022. According to the proposed model’s predictions, Italy has 10,819,855 COVID-19 cases on 29 January 2022.

Analysis of coronavirus active cases for Brazil

As demonstrated in Fig. 8, the proposed model fits the data for Brazil better than the other models. The reported number of coronavirus positive patients on 29 January 2022 is 25,256,198, while the proposed model predicts 25,254,306.

Figure 8

Forecast vs. actual plot of positive coronavirus patients in Brazil from 16 January 2022 to 29 January 2022. According to the proposed model’s predictions, Brazil has 25,254,306 COVID-19 cases on 29 January 2022.

Analysis of coronavirus active cases for India

As demonstrated in Fig. 9, the proposed model fits the data for India better than the other models. The reported number of coronavirus positive patients on 29 January 2022 is 41,092,522, while the proposed model predicts 41,090,386.

Figure 9

Forecast vs. actual plot of positive coronavirus patients in India from 16 January 2022 to 29 January 2022. According to the proposed model’s predictions, India has 41,090,386 COVID-19 cases on 29 January 2022.

Analysis of coronavirus active cases for Germany

As demonstrated in Fig. 10, the proposed model fits the data for Germany better than the other models. The reported number of coronavirus positive patients on 29 January 2022 is 9,618,245, while the proposed model predicts 9,617,381.

Figure 10

Forecast vs. actual plot of positive coronavirus patients in Germany from 16 January 2022 to 29 January 2022. According to the proposed model’s predictions, Germany has 9,617,381 COVID-19 cases on 29 January 2022.

Analysis of coronavirus active cases for Pakistan

As demonstrated in Fig. 11, the proposed model fits the data for Pakistan better than the other models. The reported number of coronavirus positive patients on 29 January 2022 is 1,417,991, while the proposed model predicts 1,417,481.

Figure 11

Forecast vs. actual plot of positive coronavirus patients in Pakistan from 16 January 2022 to 29 January 2022. According to the proposed model’s predictions, Pakistan has 1,417,481 COVID-19 cases on 29 January 2022.

Analysis of coronavirus active cases for Myanmar

As demonstrated in Fig. 12, the proposed model fits the data for Myanmar better than the other models. The reported number of coronavirus positive patients on 29 January 2022 is 535,080, while the proposed model predicts 534,898.

Figure 12

Forecast vs. actual plot of positive coronavirus patients in Myanmar from 16 January 2022 to 29 January 2022. According to the proposed model’s predictions, Myanmar has 534,898 COVID-19 cases on 29 January 2022.
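To put the end-of-horizon figures on a common scale, the sketch below computes the absolute percentage error for each country and phase. It takes the first figure quoted in each subsection as the reported count and the second as the hybrid model’s prediction; that reading, and the helper name, are assumptions of this sketch.

```python
# (reported, predicted) pairs quoted in the subsections above. Treating the
# first figure as the reported count is an assumption of this sketch.
phase1 = {
    "Italy": (240760, 240502),
    "Brazil": (1459494, 1455432),
    "India": (604641, 601480),
    "Germany": (194725, 194269),
    "Pakistan": (217809, 216681),
    "Myanmar": (304, 299),
}
phase2 = {
    "Italy": (10821375, 10819855),
    "Brazil": (25256198, 25254306),
    "India": (41092522, 41090386),
    "Germany": (9618245, 9617381),
    "Pakistan": (1417991, 1417481),
    "Myanmar": (535080, 534898),
}

def percentage_error(reported, predicted):
    """Absolute percentage error of a single forecast."""
    return abs(reported - predicted) / reported * 100.0

for phase_name, phase in (("phase 1", phase1), ("phase 2", phase2)):
    for country, (reported, predicted) in phase.items():
        error = percentage_error(reported, predicted)
        print(f"{phase_name} {country}: {error:.3f}%")
```

A relative measure is useful here because the case counts differ by several orders of magnitude between countries, so the same absolute gap can represent very different forecast quality.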

Performance analysis

Compared with the other models, the proposed model has the highest \({R}^{2}\) values, 0.9992 in phase 1 and 0.9784 in phase 2. These \({R}^{2}\) values are close to 1, which indicates that the forecast of COVID-19 positive patients is accurate. The logistic growth model has a higher \({R}^{2}\) than the FTS model.

Discussion

Strengths and weaknesses of the proposed model

The proposed hybrid model, which combines the FTS technique with the non-linear logistic growth model, stands out for its ability to make accurate predictions of the active cases of COVID-19. The model is highly efficient in terms of predictive capacity because it can identify both linear and non-linear trends in the data. The model’s \({R}^{2}\) values of 0.9992 in phase 1 and 0.9784 in phase 2 underscore its robustness and reliability. Additionally, the model’s adaptability to different countries with varying epidemic dynamics showcases its flexibility and generalizability. However, the FTS component, while adept at handling non-linearity, can be sensitive to the choice of intervals and the fuzzification process, potentially impacting prediction accuracy. Moreover, the logistic growth model assumes a saturation point, which might not be applicable in scenarios with fluctuating infection rates due to external interventions like lockdowns or mass vaccination drives.

Limitations and differences in prediction results

Further research should be focused on the reasons for variation in predictions for different countries and different time intervals. The differences in the prediction accuracy can be attributed to several factors:

Government interventions

Measures such as lockdowns, social distancing, and vaccination programs differ greatly in stringency and timing across nations. These interventions can cause sudden changes in infection trends, which can be difficult for the model to follow.

Healthcare Infrastructure

Differences in health care systems and their capacity around the world influence both the spread of the disease and the ability to treat it, and thereby affect the model’s ability to predict the disease. Countries with well-established health care systems may therefore have different epidemic trends than those with constrained health care systems.

Cultural and behavioral factors

The literature also indicates that people’s adherence to advised health measures, cultural practices, and social norms influence disease transmission dynamics and thus cause the prediction results to vary.

Phase-specific factors

The two phases analyzed include January 28, 2020, to June 5, 2020 (phase-1), and October 10, 2021 to January 15, 2022 (phase-2), which are two distinct eras of the pandemic. Factors affecting the model during the early phase include high transmission rate and few treatments, while factors during the late phase include vaccination and improved management techniques.

To overcome these limitations and improve the reliability of our model in the future, more variables that are known to affect virus spread will be included, such as population mobility, social distancing, vaccination data, humidity, and average temperature. We also envisage the use of more complex algorithms, such as neuro-fuzzy systems and weighted sum algorithms, in order to increase prediction precision.

Conclusion

Considering the exponential rise in COVID-19 cases globally, nations must arm themselves with the necessary tools to combat this pandemic. This may be accomplished by making accurate forecasts that will enable governments to make appropriate decisions and build additional infrastructure as needed. When we examine the impact of lockdowns in various nations, we find that total lockdowns considerably lowered the number of cases. Although lockdowns have gradually been lifted, the use of the media and social media to disseminate awareness and information about preventative measures, together with monitoring, testing, the implementation of policy proposals in various countries, and the strength of health care services, has been more effective in reducing disease spread. The current research focuses on providing patterns and forecasts for the COVID-19 outbreak in Italy, Brazil, India, Germany, Pakistan, and Myanmar. In this paper, we used logistic growth and FTS techniques and proposed a hybrid model for analyzing and predicting the COVID-19 epidemic. The proposed model predicts better outcomes than the other models, with \({R}^{2}\) values of 0.9992 in phase 1 and 0.9784 in phase 2. In phase 1, the proposed model forecasted 240,502, 1,455,432, 601,480, 194,269, 216,681 and 299 positive coronavirus patients for Italy, Brazil, India, Germany, Pakistan and Myanmar, respectively, as of 1 July 2020. In phase 2, it forecasted 10,819,855, 25,254,306, 41,090,386, 9,617,381, 1,417,481 and 534,898 positive coronavirus patients for the same countries, respectively, as of 29 January 2022. In future work, we will evaluate the estimated number of isolation beds and ICU needs for every region. The impact of humidity and temperature on the number of coronavirus infection cases was not taken into account by any of the models presented. We might also experiment with more advanced algorithms, such as neuro-fuzzy and weighted sum algorithms.