Implications of Delayed Reopening in Controlling the COVID-19 Surge in Southern and West-Central USA

In the wake of the rapid surge in the COVID-19-infected cases seen in Southern and West-Central USA in the period of June-July 2020, there is an urgent need to develop robust, data-driven models to quantify the effect which early reopening had on the infected case count increase. In particular, it is imperative to address the question: How many infected cases could have been prevented, had the worst affected states not reopened early? To address this question, we have developed a novel COVID-19 model by augmenting the classical SIR epidemiological model with a neural network module. The model decomposes the contribution of quarantine strength to the infection time series, allowing us to quantify the role of quarantine control and the associated reopening policies in the US states which showed a major surge in infections. We show that the upsurge in the infected cases seen in these states is strongly corelated with a drop in the quarantine/lockdown strength diagnosed by our model. Further, our results demonstrate that in the event of a stricter lockdown without early reopening, the number of active infected cases recorded on 14 July could have been reduced by more than 40% in all states considered, with the actual number of infections reduced being more than 100,000 for the states of Florida and Texas. As we continue our fight against COVID-19, our proposed model can be used as a valuable asset to simulate the effect of several reopening strategies on the infected count evolution, for any region under consideration.

shows the application of the model to the north-eastern states of New York, New Jersey and Illinois along with the diagnosed quarantine strength function Q(t) for these states. These states do not show a decline in Q(t). This corresponds well to the delayed reopening and generally stronger quarantine measures employed in the North-Eastern US states. Since Q(t) does not decrease, these states did not show a surge in infections starting June 2020, unlike their Southern and West-Central counterparts. The difference in these results between the North-Eastern and Southern, West-Central states indicates two things: (a) it strengthens the validity of our proposed model in capturing the real-time reopening scenario in different states through the evolution of the diagnosed Q(t), and, more importantly, (b) it further validates the role played by early reopening in reducing Q(t) and subsequently leading to a surge of new infected cases in the Southern and West-Central US states.
Impact of early reopening on the states of Louisiana, Florida, Oklahoma, Texas and Utah Figure S2, S3 implements a similar analysis to study the effect of early reopening for the states of Louisiana, Nevada, Oklahoma, Texas and Utah, as done for the states of Arizona, Nevada, South Carolina and Tennessee. Similar to the states considered in the main text, we see that all of these states show a decline in Q(t) starting around the time when these states were reopened. If these states were not reopened early, a large number of infections would have been reduced as demonstrated in Table 3 of the main text.

Equivalence between the ODE model and the Chemical Langevin SDE model
This analysis heavily borrows from the pioneering work done by Gillespie. 1 In this section, we will establish that the deterministic ODE model and the stochastic Chemical Langevin equation originate from a common expression: the chemical master equation, 2 and are closely linked to one another. Following is the notation we will use, in accordance with 1 We consider N compartments: S 1 , S 2 . . . S N and R reaction channels: R 1 , R 2 . . . R M in a fixed volume Ω. In our case, we have N = 4 (S, I, R, T ) compartments and R = 4 reaction channels. We denote the dynamical state of the system at any time t as X(t) = (X 1 (t), X 2 (t) . . . X N (t)) where • X i (t) : total number of S i molecules (in our case: individuals) in the system.
• Propensity function a j (x)dt : probability that a reaction R j will occur somewhere in Ω in the next time interval [t, t+dt] for j = 1, 2 . . . M .
• State change vector ν j whose ith component is defined by ν j,i : change in the number of S i molecules produced by one R j reaction for i = 1, 2 . . . N , j = 1, 2 . . . M . In our case ν j,i = ±1.
From the definition of a j (x)dt, we can write the probability of the system being in state x at time t + dt (we take the sum of all mutually exclusive ways either through one reaction or no reaction in [t, t+dt]): Taking the limit of (1) as dt -¿ 0 leads to the chemical master equation .
Macroscopic picture: Deterministic model relation to the chemical master equation: Multiplying the chemical master equation (2) by x i and summing over all x, we obtain for the mean of Thus, whenever fluctuations are not important, the species populations evolve deterministically according to the following set of ordinary differential equations (4) is the basis for the classical SIR epidemiological equations, and we see how they evolve from the chemical master equation (2).
Microscopic picture: Stochastic Simulation Algorithm and its relation to the master equation: Another consequence of the master equation (2) is the existence and form of the next-reaction density function p(τ, j x, t), which is defined as • p(τ, j x, t)dτ = probability that given X(t) = x, the next reaction in Ω will occur in [t + τ, t + τ + dτ ], and will be an R j reaction Since ∑ j a j (x)dt is the probability that some reaction occurs in the time interval dt, the probability that a time interval τ is spent without any reaction occuring is given by the exponential distribution: Exp(∑ j a j (x)τ ). Thus, we obtain for p(τ, j x, t) (9) is the basis for the stochastic simulation algorithm in which Monte-Carlo techniques are used to construct unbiased realizations of the process X(t). A typical algorithm for stochastic simulation of this kind, is the Gillespie Algorithm 3 which can be viewed as a discrete space continuous time Markov jump process, with exponentially distributed jump times.

Chemical Langevin Equation:
Bridging the gap between macroscopic and microscopic models: Let the state of the system X(t) at the current time t be x t . Let K j (x t , τ ) be the number of R j reactions that occur in the time interval [t, t+dt]. Thus, the number of S i molecules in the system at time t + τ will be Due to this condition, K j (x t , τ ) will be a statistically independent Poisson random variable P j (a j (x t ), τ ). Thus (10) simplifies to • Condition 2: Large number of reaction occurrences: This condition requires τ to be large enough so that the expected number of occurrences of each reaction channel R j in [t, t + τ ] is much larger than 1. Thus This condition enables us to approximate each Poisson variable P j (a j (x t ), τ ) by a normal random variable with the same mean and variance.
Thus, (12) further simplifies to where N (m, σ 2 ) denotes the normal random variable with mean m and variance σ 2 . Using N (m, σ 2 ) = m + σN (0, 1), denoting the time interval τ by dt and the unit normal random variable N j (0, 1) as N j (t), we obtain (15) can be written as a stochastic differential equation as where Γ j (t) are temporally uncorrelated, statistically independent Gaussian white noise processes.
(16) is the Langevin equation, and it derives from the master equation provided that Condition 1 and Condition 2 are satisfied.
The Langevin equation (16) form of the ODE system (5-8) leads to the stochastic differential equation used in the current study In (17), W i (t) ∼ N (0, t) is a normally distributed random variable with mean zero and variance t or dW i (t) ∼ N (0, dt). It should also be noted that each W i (t) represents an independent Brownian motion.
Comparison of the macroscopic, microscopic and Langevin SDE model for our study Figure S4a shows that the microscopic Stochastic Simulation Gillespie Algorithm and the ODE model presented in Equation (6-9) in the main text show a good agreement with each other. Figure S4b shows the comparison of the Chemical Langevin SDE model shown in (17) ran for 1000 trajectories and the ODE model; which also show a good agreement. Thus, we have shown the equivalence between the microscopic, macroscopic and the Chemical Langevin model for our study. This equivalence allows us to add fluctuating components to the standard deterministic SIR model as shown in (17) and quantify the uncertainty resulting from these fluctuations. Table S1 shows the Model Mean Absolute Percentage Error (MAPE), epochs needed for convergence and number of parameters optimized for the different states considered.

Parameter Inference: Gaussian Process Residue Model
In order to validate the robustness of the model and the uniqueness of the parameters recovered by the model, we consider a Gaussian Process residue model for uncertainty quantification. Gaussian Processes have emerged as a useful tool for regression, classification, clustering and uncertainty quantification.      In the present study, we fit a Gaussian Process regression model between the error resulting from the best fit model and the infected data. For the prior over the function space, we use a mean of zero and variance described by a Squared Exponential Kernel with a lengthscale of 1 and a significantly high signal standard deviation of O(10 4 ) which allows for noisy estimates of the posterior. Such a fitted model for the infected count for a region under consideration (Arizona), is shown below in figure S5. Subsequently, we sample 500 error residues from this model and superimpose them on the best fit predictions to simulate 500 samples of the infected case count data. Finally, we apply our model described on these 500 samples of data, and recover the parameters Q(t), β, γ, δ from each of them. Figures S6, S7 shows the inferred parameters for 500 realizations of the Gaussian process residue model superimposed on the best fit model prediction applied to all states considered, and shown for (a) the quarantine strength function Q(t), (b) the contact rate β and (c) the recovery rate γ +δ. It can be seen that for all realizations, Q(t) is seen to follow a similar behaviour, which lies close to the best fit model prediction. In addition, the inferred histograms for the contact rate β and the recovery rate γ + δ show a peak which is close to the best fit model prediction. This further validates the robustness of the model for other regions considered and strengthens the uniqueness of the parameters recovered by the model. A total of 12 million iterations (60000 iterations for each realization of the Gaussian process residue model × 500 realizations) were performed on the MIT Supercloud cluster to generate parameter histograms for each state considered.

Model validation: Calculation of the effective reproduction number
Following a previous study, 6 we define the Covid spread parameter as follows: where S(t) is the susceptible population and N is the total population. This definition of the Covid spread parameter C p (t) is equivalent to the effective reproduction number R ef f (t) in the context of the QSIR model. We included both γ, δ in the definition of C p (t) since both these parameters eventually contribute to the recovered population and we wanted to include effects of both. Another viable option to define C p (t) could be to just use γ in the denominator of C p (t). Figure S8 shows the comparison of the Covid spread parameter, as defined in Equation 21 with and without reopening for all US states considered in the present study. For all the states, we can see that without reopening, a diminished effective reproduction number is seen, indicating moving in the right direction of halting the infection spread.   Figure S7: [Parameter Inference for US states] Inferred parameters for 500 realizations of the Gaussian process residue model superimposed on the best fit model prediction and shown for the quarantine strength function Q(t) (left column), the contact rate β (middle column) and the recovery rate γ + δ (right column) for the US states considered in the present study. A total of 12 million iterations were performed on the MIT Supercloud cluster to generate parameter histograms for each region. To further validate the Covid spread parameter variation and its relation to the effective reproduction number, we compare the variation in C p to the R ef f obtained through a prominent Covid-19 forecasting model used by the CDC, USA. 7,8 For all of the 9 states which we considered, the time at which an upsurge is seen in C p due to early reopening corresponds very well to the exact time at which an upsurge is seen in R ef f . 7,8 In addition, we show the comparison between C p values estimated from our study and R ef f values obtained from 7 Table S2: Cp and R ef f value ranges from reopening till one month post that, for 6 states considered in our study; lie close to each other.