Research Article - (2021) Volume 9, Issue 11

Forecasting the Number of COVID-19 Active Cases in Indonesia Using the Statistical Multilayer Perceptron Feedforwards Neural Networks
Yuyun Hidayat and Dhika Surya Pangestu*
 
Faculty of Mathematics and Natural Sciences, Department of Statistics, Padjadjaran University, Jatinangor, West Java, Indonesia
 
*Correspondence: Dhika Surya Pangestu, Faculty of Mathematics and Natural Sciences, Department of Statistics, Padjadjaran University, Jatinangor, West Java, Indonesia, Email:

Received: 09-Nov-2021 Published: 29-Nov-2021

Abstract

COVID-19 was first confirmed in Indonesia on March 2, 2020. Since then, the number of COVID-19 cases in Indonesia has continued to increase: as of 29 May 2021, 1,809,926 people had been infected, of which 99,690 were active cases. Active cases represent COVID-19 patients who need medical care and are therefore directly related to hospital capacity, which makes predicting the number of active cases a strategic matter. In this study, active cases were predicted using the Multilayer Perceptron (MLP). The data came from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University and comprise the number of positive cases, recoveries, and deaths among COVID-19 sufferers in Indonesia for the period 10 January 2020–29 May 2021. Over the testing period 19 September 2020–29 May 2021 (37 weeks), forecasting active cases with a (7,10,2) MLP architecture and a learning rate of 0.01 gave the most accurate results among the window widths and architectures compared. The mean absolute percentage error (MAPE) is 5.27%, the root mean square error (RMSE) is 8849.01, and the mean absolute error (MAE) is 5703.59. This research is useful as a reference for the government in preparing hospital bed capacity for the next two weeks based on accurate predictions of active COVID-19 cases in Indonesia.

Keywords

Active Case; COVID-19; Multilayer Perceptron

Introduction

COVID-19 is an infectious disease caused by SARS-CoV-2, a type of coronavirus. From its appearance on December 17, 2019, in the city of Wuhan, Hubei, China, until 29 May 2021, more than 170 million people were infected worldwide, with a death toll of more than 3,500,000. The disease has spread across 220 countries, making it a global pandemic. Indonesia is one of the countries currently hit by this pandemic. COVID-19 was first confirmed in Indonesia on March 2, 2020, when two people contracted it after contact with a Japanese national. This became known after the Japanese citizen was declared infected with the coronavirus after leaving Indonesia and arriving in Malaysia. Since then, the number of COVID-19 cases in Indonesia has continued to increase: as of 29 May 2021, 1,809,926 people had been infected, with 99,690 active cases. According to Worldometer, Indonesia ranks 18th in the world and 4th in Asia for positive COVID-19 cases. The Indonesian government has taken various steps to curb the spread of COVID-19. On March 16, 2020, two weeks after the first case emerged, the government implemented social distancing in several regions. Two weeks later, on March 30, 2020, it implemented the Large-Scale Social Restrictions. On May 21, 2020, the government said that Indonesia had entered a "new normal" stage and that social restrictions were no longer in effect. On 27 July 2020, the government replaced the term "new normal" with "New Habit Adaptation" because it felt that "new normal" did not accurately describe Indonesia's condition at that time. On 13 January 2021, the government officially started the vaccination program in Indonesia.
Government interventions have changed from time to time, but they have not been able to contain the spread of COVID-19 in Indonesia to date. The number of active and positive COVID-19 cases in Indonesia is increasing at an accelerating pace.

Active cases represent COVID-19 patients who need medical care and are therefore directly related to hospital capacity. According to the Ministry of Health, on 26 January 2021 Indonesia's Bed Occupancy Rate (BOR) for COVID-19 cases was 63.66%, meaning that 63.66% of the isolation beds for COVID-19 patients in Indonesia were occupied. The BOR is a measure that reflects a hospital's ability to provide proper care to patients. The Australian Medical Association, the Irish Medical Organization [1], and the Australasian College for Emergency Medicine [2] consider that bed occupancy rates above 85% harm safe and efficient hospital care. In the United Kingdom, the Department of Health found that bed occupancy rates exceeding 85% can cause problems in hospital services when handling emergency cases.

It can be concluded that the BOR of hospitals in Indonesia as a whole has not reached a dangerous level and still has a buffer, bearing in mind that so far not all active cases have been treated in health facilities such as hospitals. This is a result of health triage, a classification of COVID-19 patients based on the severity of their symptoms. Triage consists of three levels: emergency sign, priority sign, and non-urgent. Patients in the emergency and priority categories are given priority for hospitalization, while those in the non-urgent category undergo independent isolation at their own residences. Even so, independent isolation is a hidden danger that the parties involved need to be aware of.

The trend of positive COVID-19 cases in Indonesia is still increasing. One of the challenges the Indonesian government faces in this regard is a possible increase in active COVID-19 cases. Although hospitals in Indonesia as a whole have not experienced overcapacity, it has occurred in several areas such as Bekasi, Jakarta, Bengkulu, and Pekanbaru. This warrants attention because, if the number of active cases exceeds hospital capacity, the death rate may jump due to patients not receiving proper medical treatment.

The government can prepare health facilities to deal with a future increase in active cases, but to do so it needs a basis for its decisions. The prediction of the number of COVID-19 cases is therefore a strategic matter. Predictions of active cases can serve as a reference for preparing existing health facilities and for ensuring that the Indonesian government has sufficient resources to deal with the number of active cases in the future. In this way, the policies made can reduce the death rate and increase the recovery rate during the COVID-19 pandemic. Accurate predictions will greatly assist the government in determining steps and setting policies to deal with COVID-19 in Indonesia.

Materials and Methods

Corona Virus Disease 2019 (COVID-19)

COVID-19 is a contagious disease caused by the coronavirus SARS-CoV-2, a pathogen that attacks the respiratory tract. WHO first became aware of the new virus in Wuhan, People's Republic of China, on December 31, 2019. Coronaviruses are viruses that circulate among animals, with some infecting humans. Bats are considered to be the natural hosts of these viruses, and several other animal species are also known sources. For example, Middle East Respiratory Syndrome Coronavirus (MERS-CoV) is transmitted to humans from camels, whereas Severe Acute Respiratory Syndrome Coronavirus-1 (SARS-CoV-1) is transmitted to humans from civets. People who have tested positive for COVID-19 have reported a variety of symptoms, from mild symptoms to severe illness. Symptoms can appear 2-14 days after exposure to the virus. The most common symptoms are fever and cough, but other symptoms are possible as well. On March 11, 2020, WHO declared COVID-19 a pandemic. At that time, data from China showed that older adults, especially those with underlying diseases, had a higher risk of developing severe COVID-19 and a higher mortality rate than younger people. Data from the EU / European Economic Area (from countries for which data is available) show that around 20-30% of diagnosed COVID-19 cases are hospitalized, and 2% of them suffer from severe disease. However, people with more severe symptoms are more likely to be tested than people with less severe symptoms, so the actual proportion of infected persons requiring hospitalization is lower than this figure indicates. Hospitalization rates are higher for those aged 60 years and over, and for those with underlying health conditions.

Artificial Neural Networks

Artificial Neural Networks (ANN) are a set of computational units, or nodes, modeled on the function of neurons in animals. The processing ability of an ANN resides in the connections between neurons, called weights, which are obtained by learning from a set of patterns in the training data. ANN are commonly used for statistical analysis and data modeling, as well as for classification and forecasting [3]. An ANN has three types of layers: the input layer, the hidden layer, and the output layer. ANN are divided into two types, namely Feed Forward Neural Networks and Recurrent Neural Networks. Feed Forward Neural Networks (FFNN) are networks in which the connections between neurons do not form a cycle, which means that the input only propagates forward from the input layer to the output layer. If there is no hidden layer between the input and output layers, the network is called a perceptron, whereas if there is a hidden layer it is called a multilayer perceptron. When a feed-forward neural network is extended to include feedback connections, the network is called a Recurrent Neural Network (RNN). Because its neuron layers have connections to themselves, an RNN is considered a network with memory [4].

Multilayer Perceptron

A Multilayer Perceptron (MLP) is a Feed Forward Neural Network (FFNN) with at least one hidden layer between the input layer and the output layer; without a hidden layer, the network is a simple perceptron. The MLP is a universal approximator, and this ability comes from the nonlinearity of its computational units (neurons) [5]. When the network runs, each neuron in the hidden layer computes a result from its inputs and passes it on to the nodes of the next layer. MLP has proven its strength in forecasting in the field of epidemiology, and many predictive epidemiological studies use it [6,7,8].
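To make the forward propagation concrete, the following is a minimal sketch of a forward pass through a (7,10,2) network, read here as 7 input neurons, 10 hidden neurons, and 2 output neurons. The tanh hidden activation, the linear output layer, the random initial weights, and all function names are assumptions made for illustration; the study does not state these details.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(inputs, weights, biases, activation=None):
    """Compute activation(W x + b) for one layer (identity if no activation)."""
    outputs = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs

def mlp_forward(x, hidden, output):
    """Forward pass: input -> tanh hidden layer (assumed) -> linear output layer."""
    h = dense(x, *hidden, activation=math.tanh)
    return dense(h, *output)

rng = random.Random(0)
hidden = init_layer(7, 10, rng)   # 7 input neurons -> 10 hidden neurons
output = init_layer(10, 2, rng)   # 10 hidden neurons -> 2 output neurons

x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]  # illustrative scaled inputs, not study data
y_hat = mlp_forward(x, hidden, output)
print(len(y_hat))  # one value per output neuron
```

The weights here are untrained; in practice they would be fitted by backpropagation before the outputs carry any meaning.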

Methodology

The data used in this research are secondary data, obtained from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University [9]. The data were converted into Covid Weekly Data (CWD), which consists of four variables: confirmed, recovered, death, and active cases. The result is a weekly case series covering 10 January 2020–29 May 2021.
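The conversion to weekly data might look like the sketch below. It rests on two assumptions that the text does not spell out: that active cases are derived as confirmed minus recovered minus deaths, and that the weekly series takes the cumulative value at the end of each 7-day block. The numbers are made up for illustration.

```python
def active_cases(confirmed, recovered, deaths):
    """Active cases under the assumed definition: confirmed - recovered - deaths."""
    return [c - r - d for c, r, d in zip(confirmed, recovered, deaths)]

def to_weekly(daily, step=7):
    """Keep every `step`-th value of a cumulative daily series (end-of-week snapshots)."""
    return daily[step - 1::step]

# Illustrative cumulative daily counts for 14 days (made-up numbers)
confirmed = [10, 12, 15, 20, 26, 30, 35, 41, 48, 55, 60, 68, 75, 80]
recovered = [0, 0, 1, 2, 4, 6, 8, 10, 13, 16, 20, 24, 28, 33]
deaths    = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4]

daily_active = active_cases(confirmed, recovered, deaths)
weekly_active = to_weekly(daily_active)
print(weekly_active)  # [25, 43]
```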

Supervised Learning Data Conversion Using Sliding Window Method

Time-series data from CSSE JHU are converted into supervised learning data using the sliding window method. This method constructs a window classifier $h_w$ that maps an input window of the time series to an individual output $y$. Specifically, the window $(x_{i,t-d}, x_{i,t-d+1}, \ldots, x_{i,t}, \ldots, x_{i,t+d-1}, x_{i,t+d})$ is used to predict each $y_{i,t}$. This converts a sequential supervised learning problem into a classical supervised learning problem, so that classical supervised learning algorithms such as backpropagation can be applied. In this research, 3, 4, 5, 6, 7, and 8 lags are used to form supervised learning data from the time series.
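As an illustration, the sketch below builds supervised pairs from a series using only past values (lags) as inputs, which is the one-sided form of the window that forecasting requires. The function name and the two-step target (one value per forecast week) are assumptions for illustration.

```python
def sliding_window(series, n_lags, horizon=2):
    """Turn a 1-D series into (X, Y) pairs: n_lags inputs, `horizon` targets."""
    X, Y = [], []
    for start in range(len(series) - n_lags - horizon + 1):
        X.append(series[start:start + n_lags])
        Y.append(series[start + n_lags:start + n_lags + horizon])
    return X, Y

series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # toy weekly series
X, Y = sliding_window(series, n_lags=3, horizon=2)
print(X[0], Y[0])    # [1, 2, 3] [4, 5]
print(X[-1], Y[-1])  # [6, 7, 8] [9, 10]
print(len(X))        # 6
```

A wider window gives the network more history per sample but leaves fewer training pairs, which is one reason several lag values are worth comparing.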

Multilayer Perceptron Feed forwards Neural Networks

The forecasting method used in this study is the Multilayer Perceptron Feed Forward Neural Network. The training data are the time-series data of active COVID-19 cases in Indonesia, converted into supervised learning data. The neural network development procedure consists of determining the hyperparameters, training the model, and evaluating the model. The modelling was done in Python using the Spyder environment.

Parameter Setting: Before training the neural networks, the parameters are set. Parameter values can be chosen randomly or by an algorithm, and they need to be determined carefully to obtain the optimum model. The parameters considered are as follows.

Input Neurons: The neurons in the input layer are called input neurons. Input neurons receive input patterns from the outside that describe a problem. The number of nodes or neurons in the input layer depends on the number of inputs in the model and each input determines one neuron.

Learning rate: The learning rate controls how much the weights of the neurons are updated at each step of training as the network moves toward the smallest local error value. It therefore determines how fast the network learns. The learning rate takes values between 0 and 1. If the learning rate is close to 0, training takes a long time to reach the smallest error, but if it is close to 1, training may get stuck at a point that is not the smallest error. The learning rate is a parameter of the optimizer. The optimizer used in this research is Adaptive Moment Estimation (Adam), chosen because it can efficiently solve regression problems and deep learning problems. There is no definite analytical method for determining the best learning rate, so trial and error is usually used. In this study, the learning rates tried are 0.1, 0.01, and 0.001. Using learning rates on a log scale is an initial recommendation for trial and error with the grid search technique [10].
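The effect of the learning rate can be seen on a toy problem. The sketch below implements the standard Adam update for a single weight and minimizes the illustrative loss (w - 3)^2 with the three learning rates used in this study. The toy loss, the 200-step budget, and the default values of beta1, beta2, and eps are assumptions chosen for the example, not settings reported by the study.

```python
import math

def adam_minimize(grad, w0, learning_rate, steps, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a 1-D function with the Adam update rule, given its gradient."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)           # bias-corrected second moment
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return w

def grad(w):
    """Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

results = {lr: adam_minimize(grad, w0=0.0, learning_rate=lr, steps=200)
           for lr in (0.1, 0.01, 0.001)}
for lr, w in results.items():
    print(f"learning rate {lr}: w = {w:.3f}")
```

With a fixed budget of steps, the smallest rate barely moves away from the starting point, which mirrors the slow training described above.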

Hidden Layer: The number of hidden layers can be determined by trial and error to get the optimum network. Adding more layers does not necessarily produce a better model, because it can also cause overfitting. In general, one hidden layer is sufficient to solve the problem. Using two hidden layers increases the risk of converging to a point that is only a local minimum.

Hidden Neurons: The number of neurons in the hidden layer is determined by trial and error. Increasing the number of neurons increases the capacity of the model to represent an event, but it also increases the time and memory used in modeling. Too many neurons can also cause overfitting, a condition in which the model performs well only on the training data. Conversely, too few neurons can reduce the ability of the network to fit the training data and to generalize to the testing data.

Maximum Epoch: An epoch is one complete pass of all the training data through the neural network. Each epoch can be partitioned into multiple batches, which is also an efficiency optimization. Table 1 shows the parameters tested in this study to predict the number of active COVID-19 cases in Indonesia in the next two weeks.

MLP Parameter Setting

Neural Networks Evaluation: After the training process has produced the candidate neural networks, the networks are evaluated on the testing data to measure their accuracy. Each model is evaluated using the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). RMSE is widely used when comparing forecasting methods on one or several time series with the same unit, while MAE and MAPE are used as additional comparisons. This evaluation identifies the ideal lag and the most appropriate architecture for forecasting active COVID-19 cases in Indonesia using MLP.

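$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|, \qquad \mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|$$

where $y_t$ is the actual value, $\hat{y}_t$ is the forecast, and $n$ is the number of forecasts.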
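The three error measures can be computed with a few lines of Python using their standard definitions. The function names and the sample values are illustrative only and are not taken from the study.

```python
import math

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [100.0, 200.0, 400.0]       # illustrative values
predicted = [110.0, 180.0, 400.0]

print(round(rmse(actual, predicted), 3))  # 12.91
print(round(mae(actual, predicted), 3))   # 10.0
print(round(mape(actual, predicted), 3))  # 6.667
```

RMSE and MAE are in the unit of the series (cases), whereas MAPE is scale-free, so MAPE is the easier one to compare across periods with very different case counts.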

Each forecasting result is tested using the feedforwards validation technique. Figure 1 shows the testing scheme used in this study.

Figure 1: Feedforwards Validation.
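Since the figure is not reproduced here, the sketch below shows one plausible reading of the scheme, assuming it resembles what is commonly called walk-forward validation with an expanding training window: at each test week the model is refitted on all earlier data and then asked for a forecast. The `fit` and `predict` callables and the naive stand-in model are hypothetical placeholders, not the study's MLP.

```python
def walk_forward(series, n_test, fit, predict):
    """Produce one forecast per test point, refitting on an expanding history."""
    forecasts = []
    for i in range(len(series) - n_test, len(series)):
        history = series[:i]          # everything observed before week i
        model = fit(history)          # refit on the expanded history
        forecasts.append(predict(model, history))
    return forecasts

def fit_naive(history):
    """Stand-in for model training; the naive forecaster has nothing to fit."""
    return None

def predict_naive(model, history):
    """Stand-in forecast: repeat the last observed value."""
    return history[-1]

series = [10, 12, 15, 19, 24, 30, 37, 45]   # toy weekly series
forecasts = walk_forward(series, n_test=3, fit=fit_naive, predict=predict_naive)
print(forecasts)  # [24, 30, 37]
```

The key property is that no forecast ever sees data from its own week or later, which keeps the test errors honest for a time series.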

Results and Discussion

Forecasting COVID-19 cases in Indonesia using the MLP method proceeds as follows. The data used in this study are weekly active case data for the period 10 January 2020–29 May 2021. These data are converted into supervised learning data using sliding windows with the specified number of lags, and are then used to train the MLP to forecast active COVID-19 cases in Indonesia. The performance of each MLP architecture and window width is evaluated using RMSE, MAPE, and MAE. Table 2 shows the MAPE of the MLP architectures and window widths used in this study.

Based on testing over the period 19 September 2020–29 May 2021, the largest MAPE value, 8.43%, was obtained with 5 lags and a learning rate of 0.01, while the smallest MAPE value, 5.27%, was obtained with 7 lags and a learning rate of 0.01. Both the largest and the smallest MAPE values therefore occur at a learning rate of 0.01. For comparison, Table 3 shows the RMSE values of all settings used to forecast the number of active COVID-19 cases in Indonesia.

| Hyperparameter | Setting |
| --- | --- |
| Learning Rate | 0.1, 0.01, 0.001 |
| Input Neurons | 3, 4, 5, 6, 7, 8 |
| Hidden Layer | 1 |
| Hidden Neurons | 10 |
| Maximum Epochs | 1000 |

Table 1: MLP Parameter Setting.
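The settings in Table 1 define a small grid of candidate models. A minimal sketch of enumerating that grid is given below; the dictionary layout and the two output neurons (one per forecast week, as in the (7,10,2) architecture) are assumptions for illustration.

```python
from itertools import product

learning_rates = [0.1, 0.01, 0.001]
lags = [3, 4, 5, 6, 7, 8]      # number of input neurons
HIDDEN_NEURONS = 10            # single hidden layer
OUTPUT_NEURONS = 2             # assumed: one output per forecast week
MAX_EPOCHS = 1000

grid = [
    {
        "learning_rate": lr,
        "architecture": (lag, HIDDEN_NEURONS, OUTPUT_NEURONS),
        "max_epochs": MAX_EPOCHS,
    }
    for lr, lag in product(learning_rates, lags)
]
print(len(grid))  # 18 candidate configurations
```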

| Learning Rate | 3 Lags | 4 Lags | 5 Lags | 6 Lags | 7 Lags | 8 Lags |
| --- | --- | --- | --- | --- | --- | --- |
| 0.1 | 5.89% | 6.21% | 6.04% | 5.43% | 5.64% | 6.90% |
| 0.01 | 6.52% | 7.12% | 8.43% | 5.98% | 5.27% | 7.98% |
| 0.001 | 5.52% | 5.83% | 5.91% | 5.43% | 5.83% | 6.62% |

Table 2: MAPE value.

Based on Table 3, the smallest RMSE value is obtained with 7 lags and a learning rate of 0.01, while the largest RMSE value is obtained with 8 lags and a learning rate of 0.001. The use of 7 lags and a learning rate of 0.01 thus gives the smallest error value, the same result as the earlier evaluation using MAPE. Table 4 shows the MAE values of all settings used for forecasting the number of active COVID-19 cases in Indonesia.

Table 4 shows a pattern similar to Table 3. The smallest MAE value, 5703.593, is obtained with 7 lags and a learning rate of 0.01, and the largest value, 17544.1, is obtained with 8 lags and a learning rate of 0.001. Based on all three evaluation measures, namely MAPE, RMSE, and MAE, the use of 7 lags and a learning rate of 0.01 gives the smallest error value. In addition to evaluating the overall performance of the MLP with MAPE, RMSE, and MAE, its performance is also observed for each forecasting period. Graph 1 plots the actual and predicted values of active COVID-19 cases using the (7,10,2) architecture with a learning rate (alpha) of 0.01 and 1000 epochs for the first-week forecast.

| Learning Rate | 3 Lags | 4 Lags | 5 Lags | 6 Lags | 7 Lags | 8 Lags |
| --- | --- | --- | --- | --- | --- | --- |
| 0.1 | 6178.395 | 6416.58 | 6664.675 | 6032.253 | 6382.053 | 7257.202 |
| 0.01 | 7531.044 | 8236.684 | 8983.004 | 6897.677 | 5703.593 | 9161.941 |
| 0.001 | 7894.519 | 15274.03 | 10346.03 | 12465.45 | 11811.86 | 17544.1 |

Table 4: MAE value.

Graph 1 shows that the first-week predictions of active COVID-19 cases for the period 19 September 2020–29 May 2021 do not differ significantly from the actual cases. This indicates that, for the first forecast week, the MLP with a (7,10,2) architecture, a learning rate of 0.01, and 1000 epochs provides accurate results. MLP performance also needs to be considered for the second forecast week. Graph 2 plots the actual values and the forecasts of active COVID-19 cases using the (7,10,2) architecture with a learning rate (alpha) of 0.01 and 1000 epochs for the second-week forecast.

Graph 1: Actual cases of COVID-19 and first-week prediction results.

Graph 2: Actual cases of COVID-19 and second-week prediction results.

For the second forecast week, the predicted values of the MLP tend to be close to the actual values, except for the period January 16–February 20, 2021, where a large difference was found between the predicted and actual values of active COVID-19 cases in Indonesia, as can be seen in Graph 2. During that period, the number of positive COVID-19 cases in Indonesia increased significantly compared to the previous period, which made the predictions for that period differ greatly from the actual values. Compared to the first week, the second-week predictions are less accurate. This is natural, because the longer the forecasting horizon, the greater the uncertainty. After comparing the actual and predicted values of active COVID-19 cases in Indonesia, we calculate the error in each forecasting period using the Absolute Percentage Error (APE). Graph 3 shows the APE values of the first-week and second-week forecasts for the period 19 September 2020–29 May 2021.

Based on Graph 3, the APE values of the second-week forecasts are generally higher than those of the first-week forecasts, in line with Graphs 1 and 2. The largest APE value was found in the forecast for week 46, January 16, 2021. On that date, Indonesia experienced a significant increase of 78,256 active COVID-19 cases, 18,343 more than the previous increase in positive cases of 59,913 recorded on January 9, 2021.

Graph 3: APE Value for 19 September 2020 – 29 May 2021 COVID-19 Active Case Forecast.
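For reference, the APE of a single forecast can be computed as in the sketch below. The function name and the sample values are illustrative and are not the study's forecasts; the last line simply checks the difference between the two weekly increases quoted in the text.

```python
def ape(actual, predicted):
    """Absolute percentage error of a single forecast, in percent."""
    return 100.0 * abs(actual - predicted) / abs(actual)

# Illustrative values only (not taken from the study's forecasts).
print(ape(200.0, 150.0))  # 25.0

# Arithmetic check of the weekly increases quoted in the text.
print(78256 - 59913)  # 18343
```

Averaging the APE over all forecast periods gives the MAPE used for the overall comparison.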

| Learning Rate | 3 Lags | 4 Lags | 5 Lags | 6 Lags | 7 Lags | 8 Lags |
| --- | --- | --- | --- | --- | --- | --- |
| 0.1 | 9367.843 | 9382.373 | 10272.6 | 9602.118 | 10002.81 | 10411.41 |
| 0.01 | 11611.7 | 12518.77 | 12720.13 | 10540.13 | 8849.006 | 13736.51 |
| 0.001 | 11831.01 | 20821.36 | 14050.53 | 16419.98 | 17486.52 | 26875.48 |

Table 3: RMSE value.
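The agreement of the three measures can be checked directly from the reported values. The sketch below transcribes Tables 2 to 4 into dictionaries keyed by learning rate and picks the setting with the smallest error in each; the helper name `best_setting` is illustrative.

```python
LAGS = [3, 4, 5, 6, 7, 8]

# Values transcribed from Table 2 (MAPE, %), Table 3 (RMSE), and Table 4 (MAE)
mape_table = {
    0.1:   [5.89, 6.21, 6.04, 5.43, 5.64, 6.90],
    0.01:  [6.52, 7.12, 8.43, 5.98, 5.27, 7.98],
    0.001: [5.52, 5.83, 5.91, 5.43, 5.83, 6.62],
}
rmse_table = {
    0.1:   [9367.843, 9382.373, 10272.6, 9602.118, 10002.81, 10411.41],
    0.01:  [11611.7, 12518.77, 12720.13, 10540.13, 8849.006, 13736.51],
    0.001: [11831.01, 20821.36, 14050.53, 16419.98, 17486.52, 26875.48],
}
mae_table = {
    0.1:   [6178.395, 6416.58, 6664.675, 6032.253, 6382.053, 7257.202],
    0.01:  [7531.044, 8236.684, 8983.004, 6897.677, 5703.593, 9161.941],
    0.001: [7894.519, 15274.03, 10346.03, 12465.45, 11811.86, 17544.1],
}

def best_setting(table):
    """Return (learning_rate, lags) with the smallest error in a results table."""
    candidates = ((lr, lag) for lr in table for lag in LAGS)
    return min(candidates, key=lambda s: table[s[0]][LAGS.index(s[1])])

for name, table in [("MAPE", mape_table), ("RMSE", rmse_table), ("MAE", mae_table)]:
    print(name, best_setting(table))  # (0.01, 7) for each measure
```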

Conclusion

Based on the research results, forecasting weekly active cases for the next two weeks using MLP performs relatively well overall, with APE values not exceeding 20% for the first-week forecasts. For the second-week forecasts, the APE values are relatively larger than for the first week, with the largest APE, 21.92%, found in week 46. Among the window widths and architectures tested, the (7,10,2) architecture with a learning rate of 0.01 and 1000 epochs provides the best performance for forecasting active COVID-19 cases in Indonesia over the last 37 weeks. We hope these results can be used as a reference by the Indonesian government to forecast active cases accurately for the next two weeks, so that the government can adjust the number of hospital beds according to the predictions, all patients who are positive for COVID-19 can be given good care, and no one is turned away because a hospital is full.

Acknowledgments

The authors thank the supervisor who supervised and guided this research, and the Statistics Department of Padjadjaran University for providing a place to study so that this research could be carried out.

REFERENCES

Citation: Hidayat Y, Pangestu DS (2021) Forecasting the Number of COVID-19 Active Cases in Indonesia Using the Statistical Multilayer Perceptron Feed-forwards Neural Networks. Int J Account Res 9:235. doi: 10.35248/2472-114X.21.9.235

Copyright: © 2021 Hidayat Y, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.