Prediction of COVID-19 New Cases Using Multiple Linear Regression Model Based on May to June 2020 Data in Ethiopia

The aim of this study was to predict COVID-19 new cases using a multiple linear regression model based on May to June 2020 data in Ethiopia. The COVID-19 case data were collected from the Ethiopian Ministry of Health Facebook page. Pearson's correlation analysis and linear regression models were used in the study. COVID-19 new cases were positively correlated with the number of days, daily laboratory tests, new cases of males, new cases of females, new cases from Addis Ababa city, and new cases from foreign natives. In the multiple linear regression model, COVID-19 new cases were significantly predicted by the number of days at the 5% level, the number of daily laboratory tests at the 10% level, and the number of new cases from Addis Ababa city at the 1% level of significance. The researchers recommend that the Ethiopian Government, the Ministry of Health, and the Addis Ababa city administration raise public awareness, strengthen protective measures, and open more COVID-19 laboratory testing centers. This study will help the government and doctors in preparing their plans for the coming period.
Original Research Article. Argawu et al.; JPRI, 33(51A): 54-63, 2021; Article no.JPRI.76078


INTRODUCTION
Coronavirus disease (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). COVID-19 was first reported on 31 December 2019 in the city of Wuhan, the capital of Hubei Province in China. Common signs of COVID-19 include fever, shortness of breath, and dry cough. Less common symptoms include muscle pain, mild diarrhea, abdominal pain, sputum production, loss of smell, and sore throat [1][2][3]. On 11 March 2020, the WHO declared it a global pandemic [4].
According to the Worldometer coronavirus update of 15 June 2020, there were 8,028,253 total COVID-19 cases and 4,148,128 total recoveries worldwide, a recovery rate of 51.7%. Ranked from highest to lowest by total cases, the world regions were: North America (2,480,701 cases, 1st), Europe (2,398,779 cases, 2nd), Asia (1,616,962 cases, 3rd), South America (1,425,696 cases, 4th), Africa (244,578 cases, 5th), and Oceania (8,931 cases, last). In the report, males and females accounted for 71% and 29% of cases, respectively [5]. On the same date, Worldometer ranked Ethiopia 2nd, 15th, 16th, and 23rd among African countries on the table, with 176 new cases, 3,521 total COVID-19 cases, and 620 total recoveries (17.6%) [5].
The same report listed Ethiopia in 27th place for COVID-19 laboratory testing capacity, at 1,629 tests per 1,000,000 population. This is concerning for Ethiopia. With a current population of nearly 115 million, Ethiopia has a very low proportion of COVID-19 laboratory tests compared with other countries. The report indicates that Ethiopia needs greater efforts and better strategies to increase daily laboratory testing. Otherwise, Ethiopia risks becoming the next "African USA" in COVID-19 case counts.
The researchers also observed that COVID-19 new cases increased alarmingly during the study period (12 May to 10 June 2020), as shown in the reports on the Ethiopian Ministry of Health Facebook page. This study therefore aimed to predict COVID-19 new cases using a multiple linear regression model based on May to June 2020 data in Ethiopia.

Source of Data and Study Period
The COVID-19 new case data were collected from the Ethiopian Ministry of Health Facebook page. The study period ran from 12 May to 10 June 2020 (30 days), since complete information was available on the Facebook page only for this period and not for earlier dates. The total number of COVID-19 new cases, date of record, number of new recoveries, number of new cases from Addis Ababa city and some regions, number of male and female cases, and maximum and minimum ages of the patients were collected and included in the study.

Pearson's Correlation Coefficient
Correlation is a statistical method used to assess a possible linear association between two continuous variables. It is simple both to calculate and to interpret. The sample Pearson's correlation coefficient is denoted by r. For variables x and y, it is calculated as follows [6]:
$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$
Where: $x_i$ and $y_i$ are the values of variables x and y for the i-th individual, $\bar{x}$ and $\bar{y}$ are the sample means, and n is the sample size.
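As a minimal sketch of how the sample Pearson's correlation coefficient can be computed, the following Python function follows the standard definition. The function name `pearson_r` and the example values are hypothetical illustrations and are not the study data.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of cross-products of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator terms: sums of squared deviations
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustration values, NOT the study data
days = [1, 2, 3, 4, 5, 6]
new_cases = [4, 9, 11, 18, 22, 30]
r = pearson_r(days, new_cases)
print(round(r, 3))
```

A value of r close to +1 indicates a strong positive linear association, and a value close to -1 a strong negative one.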

Linear Regression Model
The regression model has many variants, such as linear regression, polynomial regression, and others [7]. In this study, fitted simple and multiple linear regression models were used to identify the most important predictor variables for COVID-19 new cases from 12 May to 10 June 2020 in Ethiopia.
The fitted simple linear regression model is written as follows:
$$
\hat{Y} = b_0 + b_1 X_1
$$
Where: $\hat{Y}$ is the estimated number of COVID-19 new cases, $X_1$ is an independent variable with its corresponding estimated coefficient ($b_1$), and $b_0$ is the intercept of the model.
The fitted multiple linear regression model is given as follows:
$$
\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3
$$
Where: $\hat{Y}$ is the estimated number of COVID-19 new cases, $X_1$, $X_2$, and $X_3$ are independent variables with their corresponding estimated coefficients ($b_1$, $b_2$, and $b_3$), and $b_0$ is the intercept of the model.
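The following is a minimal sketch of how the coefficients of such a model can be estimated by ordinary least squares, solving the normal equations in plain Python. The helper names (`solve_linear_system`, `fit_ols`, `predict`) and the data are hypothetical; the example response is generated from known coefficients so the fit can be checked, and it is not the study data. The study itself reports R-software output, so this is an illustration rather than the authors' procedure.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so the caller's inputs are not modified
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x

def fit_ols(rows, y):
    """Return [b0, b1, ..., bp] for the least-squares fit y ~ b0 + sum(bj * Xj)."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def predict(coeffs, row):
    """Fitted value b0 + b1*X1 + ... for one observation."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], row))

# Hypothetical rows: (day, daily tests in thousands, Addis Ababa new cases)
rows = [(1, 2.0, 3), (2, 2.5, 1), (3, 2.2, 8), (4, 3.1, 5), (5, 2.9, 12), (6, 4.0, 9)]
# Synthetic response built from known coefficients (illustration only)
y = [2.0 + 1.5 * d + 3.0 * t + 1.1 * a for d, t, a in rows]

coeffs = fit_ols(rows, y)
print([round(c, 3) for c in coeffs])
```

Because the synthetic response has no noise, the fitted coefficients should recover the values used to build it, which is a quick way to check that the fitting code behaves as intended before applying it to real data.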

Polynomial Regression Model
The fitted quadratic and cubic regression models were used to estimate the parameters of the independent variable in the forms $X$, $X^2$, and $X^3$. Together, the estimated parameters ($b_1$, $b_2$, and $b_3$) describe the change in Y when the independent variable changes from x to x+1 [7]. The fitted quadratic regression model is given as follows:
$$
\hat{Y} = b_0 + b_1 X + b_2 X^2
$$
The fitted cubic regression model is given as follows:
$$
\hat{Y} = b_0 + b_1 X + b_2 X^2 + b_3 X^3
$$
Where: all terms are as defined above.
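As a short worked derivation under the model forms described above, the change in the fitted value when the independent variable moves from x to x+1 can be written out explicitly. It shows that in the polynomial models the unit change depends on the starting value x, whereas in the linear model it is simply $b_1$.

$$
\begin{aligned}
\text{Linear:}\quad & \hat{Y}(x+1) - \hat{Y}(x) = b_1 \\
\text{Quadratic:}\quad & \hat{Y}(x+1) - \hat{Y}(x) = b_1 + b_2\left[(x+1)^2 - x^2\right] = b_1 + b_2(2x + 1) \\
\text{Cubic:}\quad & \hat{Y}(x+1) - \hat{Y}(x) = b_1 + b_2(2x + 1) + b_3(3x^2 + 3x + 1)
\end{aligned}
$$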

COVID-19 New Cases by Regions and Genders
Of the 2,257 total COVID-19 new cases, the majority (64%) were males and 36% were females. This indicates that the share of male cases in Ethiopia was lower than the worldwide share of 71%. Addis Ababa city accounted for the majority (74%) of the new cases.

Descriptive Statistics of COVID-19 Cases
An average of 4,065 COVID-19 laboratory tests were conducted per day during the study period, with a minimum of 1,775 and a maximum of 6,187. The average number of COVID-19 new cases was 75 per day, with minimum and maximum values of 2 and 190, respectively. Addis Ababa city recorded the highest average number of COVID-19 new cases (56 per day), with a minimum of 0 and a maximum of 153 new cases. The city accounted for more than 70% of total cases in the country. The minimum and maximum ages of COVID-19 new cases averaged 9.4 years and 71 years, with the youngest and oldest patients aged 1 month and 115 years, respectively (Table 3).

Correlation Analysis for COVID-19 New Cases
The correlation analysis showed significant positive correlations between COVID-19 new cases and the number of days, daily laboratory tests, new cases of males, new cases of females, new cases from Addis Ababa city, and new cases from foreign natives (Table 4).

Regression Model for COVID-19 New Cases
The linear regression model had the highest F-value (120.7) and the smallest MSE (637.4) compared with the quadratic and cubic models. The number of days was a significant predictor of new cases in the linear regression model (p-value < 0.001). However, this variable and its squared and cubed terms were not significant predictors in the quadratic and cubic regression models. The fitted linear regression model therefore fit better than the quadratic and cubic models, although the three models had similar R-square values, explaining 81% to 82% of the variation in COVID-19 new cases (Table 5 and Fig. 1).
The estimated linear regression equation (coefficient B = 5.85 for the number of days) implied that new cases would increase to 585 after 100 days.
Daily laboratory tests were also a significant predictor of new cases in the linear regression model (p-value < 0.001). The fitted linear regression model had the highest F-value (19.5) but not the smallest MSE (1993.3) compared with the quadratic and cubic models. Thus, the fitted linear regression model was judged better than the quadratic and cubic models. However, the cubic regression model had a higher R-square value, explaining 46% of the variation in COVID-19 new cases, whereas the linear regression model explained 41% (Table 5 and Fig. 2).
The fitted linear regression equation (coefficient B = 0.034 for daily laboratory tests) indicated that new cases would rise to 3,400 if 100,000 laboratory tests were conducted daily.
Similarly, new cases from Addis Ababa city were a significant predictor of total new cases in the linear regression model (p-value < 0.001), explaining 93% of the variation in new cases (R² = 93%) (Table 5 and Fig. 3).
The estimated linear regression equation (coefficient B = 1.2 for new cases from Addis Ababa city) suggested that the country's new cases would increase to 12,000 if 10,000 new cases were found in Addis Ababa city.
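The three projections in this section can be reproduced, as a rough sketch, by multiplying each reported slope by the scenario value. The slopes (B = 5.85, 0.034, and 1.2) are those reported in the Discussion; the intercepts are not given in the text, so they are assumed to be zero here, and the function name `extrapolate` is hypothetical. The sketch also compares each scenario value with the largest value observed in the study period, since all three projections lie far outside the observed range.

```python
def extrapolate(slope, x, intercept=0.0):
    """Point prediction from a simple linear fit: intercept + slope * x."""
    return intercept + slope * x

# Slopes B as reported in the paper's Discussion. Intercepts are not given
# in the text, so they are ASSUMED to be 0 here for illustration.
scenarios = {
    "days": (5.85, 100),
    "daily laboratory tests": (0.034, 100_000),
    "Addis Ababa new cases": (1.2, 10_000),
}
predictions = {name: extrapolate(b, x) for name, (b, x) in scenarios.items()}

# Largest values observed in the 30-day study period (descriptive statistics)
observed_max = {
    "days": 30,
    "daily laboratory tests": 6187,
    "Addis Ababa new cases": 153,
}

for name, (b, x) in scenarios.items():
    ratio = x / observed_max[name]
    print(f"{name}: predicted {predictions[name]:.0f} new cases; "
          f"scenario value is {ratio:.1f} times the observed maximum")
```

Projections this far beyond the observed data rest entirely on the assumption that the linear relationship continues to hold, so they are best read as illustrations of the fitted slopes rather than as forecasts.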

Multiple Linear Regression Model for COVID-19 New Cases
In this model, COVID-19 new cases were significantly predicted by the number of days, daily laboratory tests, and new cases from Addis Ababa city at the 5%, 10%, and 1% levels of significance, respectively (Fig. 4, R-software output).

Multiple Linear Regression Assumptions
The multiple linear regression assumptions were checked, as shown in Fig. 5.
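The text does not list which checks were run, so the following is only a sketch of two common residual diagnostics, assumed here for illustration and not taken from the paper: the mean of the residuals, which should be close to zero, and the Durbin-Watson statistic, which is relevant for daily data because consecutive days may be correlated. The function names and the observed and fitted values are hypothetical.

```python
def residuals(observed, fitted):
    """Observed minus fitted values."""
    return [o - f for o, f in zip(observed, fitted)]

def durbin_watson(res):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    num = sum((res[t] - res[t - 1]) ** 2 for t in range(1, len(res)))
    den = sum(e ** 2 for e in res)
    if den == 0:
        raise ValueError("all residuals are zero")
    return num / den

# Hypothetical observed and fitted values, NOT the study data
observed = [10, 14, 19, 27, 30, 38]
fitted = [11, 15, 18, 26, 31, 37]

res = residuals(observed, fitted)
mean_res = sum(res) / len(res)
dw = durbin_watson(res)
print(mean_res, dw)
```

Values of the statistic well below 2 point to positive autocorrelation in the residuals, and values well above 2 to negative autocorrelation.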

DISCUSSION
In the correlation analysis, COVID-19 new cases had significant and positive correlations with the number of days (r = 0.901), daily laboratory tests (r = 0.641), new recoveries (r = 0.389), new cases from males (r = 0.985), new cases from females (r = 0.964), new cases from Addis Ababa city (r = 0.965), and new cases from foreign natives (r = 0.416).
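The correlations above can be related to the simple regression results reported earlier. In a simple linear regression, R² equals r², and the overall F-statistic follows from R² and the sample size. The sketch below assumes one observation per day of the 30-day study period (n = 30); the function names are hypothetical.

```python
def r_squared_from_r(r):
    """In simple linear regression, R^2 is the square of Pearson's r."""
    return r * r

def f_statistic(r2, n, p=1):
    """Overall F-statistic of a regression with p predictors and n observations."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

n_days = 30  # ASSUMPTION: one observation per day of the 30-day study period

# Correlations with COVID-19 new cases as reported in the text
reported_r = {
    "number of days": 0.901,
    "daily laboratory tests": 0.641,
    "Addis Ababa new cases": 0.965,
}

for name, r in reported_r.items():
    r2 = r_squared_from_r(r)
    print(f"{name}: R^2 = {r2:.3f}, F = {f_statistic(r2, n_days):.1f}")
```

Under these assumptions the sketch gives R² of about 0.81, 0.41, and 0.93, and F-values of about 120.8 and 19.5 for the first two predictors, which is close to the values reported in the regression results (81%, 41%, 93%, F = 120.7, and F = 19.5).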
The simple linear regression model was a better fit for the COVID-19 new case data than the quadratic and cubic regression models. In the fitted simple models, COVID-19 new cases were significantly predicted by the number of days (B = 5.85), daily laboratory tests (B = 0.034), and new cases from Addis Ababa city (B = 1.2) at the 5% level of significance. A study from India found that the linear regression growth model was more specific than the exponential growth model for predicting the number of COVID-19 cases. These models are used for forecasting over long-term intervals. Another study from India showed that the linear model was the best fitting model for Region III from May 3rd to May 15th [8][9][10].

Fig. 5. R-Output of multiple linear regression assumptions
In the multiple linear regression model, COVID-19 new cases were significantly predicted by the number of days, daily laboratory tests, and new cases from Addis Ababa city at the 5%, 10%, and 1% levels of significance, respectively. Thus, COVID-19 new cases were predicted to increase by 135, 503, and 881 when the number of days increased by 100 days, the daily laboratory tests increased by 100,000 tests, and the new cases from Addis Ababa city increased by 10,000 cases, respectively, while holding the other variables constant.
Odhiambo et al. [9] from Kenya showed a correlation between COVID-19 new cases and the contacts of confirmed cases, as well as the number of flights from foreign countries to Kenya. A univariate analysis using the generalized linear model showed that contact persons in Kenya had an effect of 0.265 on COVID-19 cases in Kenya. In the multivariate analysis, contact persons and flights to Kenya had effects of 0.278 and 3,309 on COVID-19 cases in Kenya at the 5% and 10% levels of significance, respectively. The researchers also used a compound Poisson regression model, which showed that as the number of COVID-19 days increased by 235, COVID-19 new cases were projected to reach 83,418.
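As a worked check, assuming each stated increase is simply a partial regression coefficient multiplied by the change in its predictor, the per-unit effects implied by these figures are:

$$
\begin{aligned}
b_{\text{days}} &\approx \frac{135}{100} = 1.35 \ \text{new cases per additional day} \\
b_{\text{tests}} &\approx \frac{503}{100{,}000} = 0.00503 \ \text{new cases per additional daily test} \\
b_{\text{Addis Ababa}} &\approx \frac{881}{10{,}000} = 0.0881 \ \text{new cases per additional Addis Ababa case}
\end{aligned}
$$

These values are back-calculated from the stated increases and are not read from the R-software output in Fig. 4.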
Ghosal et al. [11] from India used a linear regression analysis to predict the average week 5 and 6 death counts. Thus, our study agreed with this study on the correlation analysis but not on the linear regression analysis.
Mahnnty et al. [12] analyzed COVID-19 cases from India, Pakistan, Myanmar (Burma), Brazil, Italy, and Germany up to June 4, 2020, and made predictions of the number of positive cases for the next 28 days. In that study, the Verhulst model fitted better than the Gompertz and SIR models, with an R-score of 0.9973. The proposed model performed better than the three existing models, with an R-score of 0.9981. These models can be adapted to forecast over long-term intervals, based on the predictions for a short interval as of June.
Pandey et al. [13] from India analyzed COVID-19 cases from 30 January to 30 March 2020 and made predictions of the number of cases for the next 2 weeks. An SEIR model and a regression model were used for predictions based on data collected from the Johns Hopkins University repository. The performance of the models was evaluated using the root mean squared log error (RMSLE), which was 1.52 for the SEIR model and 1.75 for the regression model. The RMSLE error rate between the SEIR model and the regression model was found to be 2.01. The value of R0, which measures the spread of the disease, was calculated to be 2.02. Expected cases might rise to between 5,000 and 6,000 in the following two weeks.
Ayyoubzadeh et al. [14] from Iran used a linear regression model that predicted the incidence with a root mean square error (RMSE) of 7.562 (SD 6.492). The most effective factors besides previous-day incidence included the search frequency of hand washing, hand sanitizer, and antiseptic topics. The RMSE of the long short-term memory model was 27.187 (SD 20.705).
Rath et al. [15] compared linear regression and multiple linear regression models, whose scores were 0.99 and 1.0, indicating strong prediction models for forecasting active cases in the coming days. Using the multiple linear regression model with data as of July, the study forecast 52,290 active cases in India and 9,358 active cases in Odisha by 15 August, if the situation continued unchanged.
Teresa et al. [16] from Canada found a positive but not statistically significant association between cumulative incidence and ambient temperature (14.2 per 100,000 people; 95% CI: −0.60 to 29.0) using multiple linear regression models. The study also showed no statistically significant association between total cases or the effective reproductive number of COVID-19 and ambient temperature.
Lee et al. [17] from South Korea found that newly confirmed COVID-19 patients had been decreasing since March 2020 while traffic had been increasing. Using non-linear regression and simple linear regression models, the study also showed that increasing traffic indicates greater contact between people, which in turn increases the risk of further COVID-19 spread.
Another researcher, Sansa N.A [18] from China, found a significant positive correlation between COVID-19 confirmed cases and recovered cases using a simple linear regression model for the period from 20 January 2020 to 23 February 2020.
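The two error measures cited in the studies above, RMSE and RMSLE, can be sketched as follows using their standard definitions. The function names and the example values are hypothetical illustrations and are not taken from the cited studies.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between observed and predicted counts."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def rmsle(actual, predicted):
    """Root mean squared log error; log1p(v) = log(1 + v) keeps zero counts valid."""
    n = len(actual)
    total = sum((math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / n)

# Hypothetical daily counts, NOT data from the cited studies
actual = [12, 30, 45, 80]
predicted = [10, 33, 50, 70]
print(round(rmse(actual, predicted), 3), round(rmsle(actual, predicted), 3))
```

RMSE is expressed in the same units as the counts and is dominated by large errors on high-count days, while RMSLE measures relative error on a logarithmic scale, so the two are not directly comparable across studies.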

Conclusion
The total number of COVID-19 cases in Ethiopia from 12 May to 10 June 2020 was 9.1 times the total recorded from 14 March to 11 May 2020. In the correlation analysis, COVID-19 new cases were significantly correlated with the number of days, daily laboratory tests, new recoveries, new cases of males, new cases of females, and new cases from Addis Ababa city. In the simple linear regression, the number of days, daily laboratory tests, new recoveries, new cases of males, new cases of females, new cases from Addis Ababa city, and new cases from foreign natives each significantly predicted COVID-19 new cases. However, only the number of days, daily laboratory tests, and new cases from Addis Ababa city significantly predicted COVID-19 new cases in the multiple linear regression model, at the 5%, 10%, and 1% levels of significance, respectively. Based on the model prediction, COVID-19 new cases will increase by 135, 503, and 881 when the number of days, daily laboratory tests, and new cases from Addis Ababa city increase by 100 days, 100,000 tests, and 10,000 cases, respectively. If strong preventive measures are not taken in the country, the predicted number of COVID-19 new cases will be 590 after 9 August 2020.

Recommendation
The researchers recommend that the Ethiopian Government, the Ministry of Health, and the regional governments (especially the Addis Ababa city administration) work together to raise public awareness and strengthen protective measures. They should also open more COVID-19 laboratory testing centers in different areas of the country, so that more people can be tested as the number of days increases and the number of new cases rises as predicted in this study. With these preventive and curative measures, the severity of COVID-19 can be limited compared with other countries, such as the USA, South Africa, and Egypt, which are now leading in the number of new cases in the world and in Africa. This research will be extended to follow the spread of the disease by comparing linear regression and time series models. This study will help the government and doctors in preparing their plans for the coming period. Based on the predictions for a short-term interval, these models can be tuned for forecasting over long-term intervals.

CONSENT AND ETHICAL APPROVAL
It is not applicable.