Structural Comparison and Drug Screening of Spike Proteins of Ten SARS-CoV-2 Variants

SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) has evolved many variants with stronger infectivity and immune evasion than the original strain, including Alpha, Beta, Gamma, Delta, Epsilon, Kappa, Iota, Lambda, and 21H strains. Amino acid mutations are enriched in the spike protein of SARS-CoV-2, which plays a crucial role in cell infection. However, the impact of these mutations on protein structure and function is unclear. Understanding the pathophysiology and pandemic features of these SARS-CoV-2 variants requires knowledge of the spike protein structures. Here, we obtained the spike protein structures of 10 main globally endemic SARS-CoV-2 strains using AlphaFold2. The clustering analysis based on structural similarity revealed the unique features of the mainly pandemic SARS-CoV-2 Delta variants, indicating that structural clusters can reflect the current characteristics of the epidemic more accurately than those based on the protein sequence. The analysis of the binding affinities of ACE2-RBD, antibody-NTD, and antibody-RBD complexes in the different variants revealed that the recognition of antibodies against S1 NTD and RBD was decreased in the variants, especially the Delta variant compared with the original strain, which may induce the immune evasion of SARS-CoV-2 variants. Furthermore, by virtual screening the ZINC database against a high-accuracy predicted structure of Delta spike protein and experimental validation, we identified multiple compounds that target S1 NTD and RBD, which might contribute towards the development of clinical anti-SARS-CoV-2 medicines. Our findings provided a basic foundation for future in vitro and in vivo investigations that might speed up the development of potential therapies for the SARS-CoV-2 variants.


Introduction
Coronavirus disease 2019 (COVID-19) outbreak began in December 2019 and has caused more than 4.8 million deaths, according to the statistics of the World Health Organization (WHO), as of October 15, 2021 (https://www.who .int/). COVID-19 is caused by severe acute respiratory syn-drome coronavirus 2 (SARS-CoV-2), a positive-sense RNA betacoronavirus belonging to the family Coronaviridae [1,2]. SARS-CoV-2 possesses a large genome of approximately 30 kb [3], which encodes for four structural proteins, spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins, and sixteen nonstructural proteins (Nsp 1-16) [4][5][6]. Among these proteins, the S protein plays an important role in binding the angiotensin-converting enzyme 2 (ACE2) of the host cell, which helps the virus to enter the host cell [7]. The S protein can be recognized by and bond with the cell surface toll-like receptor 4 (TLR4), as well as antibodies, so it is a target for immunological recognition [8,9].
All viruses, including SARS-CoV-2, change over time. Although the evolutionary rate of SARS-CoV-2 is low, which displays a change of 1 or 2 nucleotides per month per lineage in the 30 kb pairs [10], a long-time and extensive spread of SARS-CoV-2 have induced some unexpected mutations that can increase virus transmission and disease severity [11][12][13]. So far, the worldwide spreading variants of SARS-CoV-2 are Alpha, Beta, Gamma, Delta, Epsilon, Kappa, Iota, Lambda, and Mu (21H) named by the WHO. The WHO classifies the variants of Alpha, Beta, Gamma, and Delta to variants of concern (VOC) [14][15][16][17][18]. Previous studies demonstrated that the Delta variant decreased the effectiveness of vaccines and increased the breakthrough infection rates [19,20]. Many researchers have focused on developing anti-SARS-CoV-2 drugs and found some potential drugs, such as Azvudine [21], Molnupiravir [22], Paxlovid, and antibodies [23,24].
The mutations on the S proteins have been reported to affect both the binding affinity with ACE2 and the efficacy of antibodies [12,[25][26][27]. Moreover, the S protein and its parts are important for designing most approved vaccines, and thus, the mutations on the S protein raised much more concern about the vaccine effectiveness of SARS-CoV-2 [12,[28][29][30][31]. Many researchers have focused on exploring the S protein structures of different SARS-CoV-2 variants by experimental and modeling methods, which has contributed immensely to our understanding of how the mutations alter the structure and function of the S protein [32][33][34][35]. However, due to rapidly increasing variants, it is still a challenge to reveal the S protein structures of all SARS-CoV-2 variants. AlphaFold2 (AlphaFold) is a computational approach capable of predicting protein structures with high accuracy, which provides us with a new method to quickly predict protein structures according to their genetic sequences. Utilization of this powerful approach can help us to solve the challenge of revealing the S protein structures of different SARS-CoV-2 variants.
In this study, we used the AlphaFold to model the spike protein structures of ten SARS-CoV-2 variants. The highaccuracy structures were verified by the comparison with experimental structures and the pLDDT (the predicted local-distance difference test) of the AlphaFold built-in algorithm. To classify the SARS-CoV-2 strains, we performed phylogenetic analyses based on the genomic and protein sequences of all strains. Moreover, we analyzed the binding affinities of ACE2 and antibodies to S1 RBD and NTD, revealing that the mutations on the Delta variant S protein could affect the recognition of antibodies with S1 RBD and NTD, which might increase the immune evasion. Furthermore, we identified multiple compounds that target S1 NTD and RBD by virtual screening the ZINC database and experimental validation, which might contribute towards the development of clinical anti-SARS-CoV-2 medicines.
Our results provided abundant basic data for further research related to preventing and curing COVID-19.

Analyses of the Sequence Mutations on Different SARS-
CoV-2 Strains. SARS-CoV-2 has changed as it has spread across the globe and has evolved many variants, such as Alpha, Beta, Gamma, Delta, Kappa, Lambda, Gamma, Iota, and 21H variants. To comprehensively understand the characteristics of these SARS-CoV-2 variants, we analyzed nucleotide mutations on all SARS-CoV-2 variants. The results showed that the nucleotide mutations occur across the whole genome, including open reading frames (ORFs), spike (S) glycoprotein, nucleocapsid phosphoprotein (N), envelope (E) protein, and membrane (M) protein ( Figure S1). The frequency of mutations on S protein is higher than other parts in different SARS-CoV-2 variants.
The S protein of SARS-CoV-2 consists of 1273 amino acids containing subunits S1 and S2 ( Figure S2A). The subunit S1 is divided into N-terminal domain (NTD), Cterminal domain (CTD), and receptor-binding domain (RBD) that binds with angiotensin-converting enzyme 2 (ACE2). To analyze the similarities and mutations of S protein in different SARS-CoV-2 variants, we aligned the S proteins' sequences in eleven SARS-CoV-2 strains, including original, D614G, Alpha, Beta, Gamma, Delta, Kappa, Lambda, Gamma, Iota, and 21H variants. The results showed that the S proteins' sequences have a mean homology of 99.51%, and the common amino acid mutations among these SARS-CoV-2 strains are mainly enriched at S1 NTD and RBD ( Figure S2B). The D614G mutation is most common and occurs in ten SARS-CoV-2 variants ( Figure S2B), while the E484K and N501Y mutations appear 4 times.
2.2. The Spike Protein Structure Prediction of Ten SARS-CoV-2 Strains. The alterations of S protein in different SARS-CoV-2 variants have been reported to affect virulence, transmissibility, disease severity, and immune escape [12,13,18]. However, the influence of amino acid mutations on S protein structure is not yet clear. To acquire the spike structures of different SARS-CoV-2 strains, we used AlphaFold to predict their spike protein structures based on the mutated protein sequences. The spike proteins of ten major worldwide spread SARS-CoV-2 strains were successfully predicted, including original, alpha, beta, gamma, delta, epsilon, iota, kappa, lambda, and 21H strains. The fulllength spike protein monomers are presented, and the RBD and NTD of the S1 protein are marked by different colors (Figure 1). To compare the structural difference among the ten SARS-CoV-2 strains, we performed the pairwise structural alignment based on full-length S ( Figure S2), NTD ( Figure S3), and RBD ( Figure S4), respectively. The comparison of the full-length spike protein structures shows that the parts of S1 RBD and NTD are significantly changed whereas the S1 C-terminal domain and S2 domain are hardly affected ( Figure S3). The comparison of S1 NTD and RBD structures predicted based on their 2 Research sequences shows that NTD structures are diverse in different strains, and especially the N-terminal structures of NTD are significantly changed ( Figure S4). The N-terminal and receptor-binding motif (RBM) in RBD structure domains display significant changes ( Figure S5). These observations indicate that S1 RBD and NTD of spike proteins have significant structural changes between different SARS-CoV-2 strains. We further compared the electrical property on the structures of RBD and S1 NTD of ten major SARS-CoV-2 strains. The results showed that amino acid mutations could change the electrical property on the RBD and S1 NTD surfaces ( Figure 1). Particularly, compared with the original S protein, RBD and S1 NTD of SARS-CoV-2 Delta variants were notably different structures and electrostatic surfaces (Figure 1 red box and S3-S5).

Validation of the Predicted S Protein Structures of SARS-
CoV-2. To evaluate the predicted S protein structures of SARS-CoV-2 variants, we analyzed the values of the predicted Local Distance Difference Test (pLDDT) calculated by AlphaFold since this value represents the domain accuracy [36]. The pLDDT value above 70 indicates the struc-tures have been considered as confidently predicted structures [37]. Our results show that mean pLDDT values of full-length S, S1 NTD, and S1 RBD in different SARS-CoV-2 strains are all above 75 (Figures 2(a)-2(c)). We also found that percentages of amino acid residues whose pLDDT values are above 70 in all residues of different fulllength S are all higher than 76%, and the percentages of different S1 NTD, and S1 RBD are above 82%, and 89%, respectively ( Figure S6). These results of pLDDT indicate the predicted structures are highly accurate. Furthermore, we compared the AlphaFold-predicted S protein structure of the original strain with five experimental S proteins, including PDB ID: 7DDD, 7DDN, 7BNM, 6VSB, and 7BNN to evaluate the validity of the predicted structures. We calculated Template Modelling (TM) score [38], maximal subset (MaxSub) score [39], and Global Distance Test (GDT)-TS score [40] between the AlphaFoldpredicted and experimental S proteins (Figure 2(a)). The resulting TM scores were >0.88, the MaxSub scores > 0:5, and the GDT − TS score > 0:6 ( Figure 2(d)), which indicated the predicated S proteins were of high confidence. We further analyzed the 3D structure similarities between the   Figure 2(e)). We find that the S1 NTD and RBD structures are also similar to the experimental structure of 7DDN. Overall, these results validate the high accuracy and reliability of AlphaFold at predicting the S protein of SARS-CoV-2.

Classification of SARS-CoV-2 Strains
Based on the Genomic Sequences and Protein Structures. Successful classification of SARS-CoV-2 strains is important to explore the development of the virus and predict its evolution. To classify the SARS-CoV-2 strains, we performed phylogenetic analyses based on the genomic and protein sequences of all strains. The results showed that SARS-CoV-2 Delta and kappa variants are highly homologous in four sequencebased clusters (Figure 3(a)). Clusters based on the spike protein, S1 RBD, and S1 NTD showed SARS-CoV-2 gamma and beta variants are highly homologous (Figure 3(a)). The cluster trees of spike protein RBD and full-length spike pro-teins are highly similar, which indicates the spike variances could be mainly from mutations on the spike RBD domains. Next, we further performed clustering analyses based on the similarities of the full-length spike protein, S1 NTD, and S1 RBD structures. Notably, clusters based on S1 NTD and RBD revealed that the SARS-CoV-2 Delta variant is significantly different from other variants (Figure 3(b) and Table S1). The root means square deviation (RMSD) values of S1 RBD and NTD on the SARS-CoV-2 Delta variant are higher than that of other variants, indicating that the S1 and RBD structure of the Delta variant has changed significantly more when compared with the other strains. Overall, the clusters based on structures and sequences of full-length spike protein are different, and the sequence could not well reflect the structures of these SARS-CoV-2. These results suggest that the structure of Delta S1 RBD and NTD are unique compared to other variants. Currently, the SARS-CoV-2 Delta variant is spreading quickly across the world and possesses high virulence and resistance to available vaccines. Thus, the unique structures 4 Research of SARS-CoV-2 Delta S1 RBD and NTD could play important roles in increasing the detrimental change in COVID-19 epidemiology.
2.5. The Comparison of S1 NTD Structure between SARS-CoV-2 Delta and Original Strains. To explore the structural difference of S1 NTD between Delta and original (wild-type) strains, we analyzed the effects of amino acid (AA) mutations on the S1 NTD structure of the Delta strain. The result shows that six amino acids are mutated on the S1 NTD structure of Delta strain compared with the original strain, including T19R, T95I, G142D, E156G, and deletions of F157 and R158 (Figure 4(a)). These mutations have changed the molecular orientation and electrical properties of the protein, such as uncharged T19 is mutated to positively charged R19, aliphatic G142 is mutated to negatively charged D142, and negatively charged E156 is mutated to aliphatic G156 (Figures 1 and 4(a)). Meanwhile, we observed the effects of these AA mutations on S1 NTD structures. The results showed that the domains enriched AA mutations are significantly changed compared with the original strain ( Figure 4(b) black arrows), while the no mutated domains of Delta S1 NTD have high similarities with the original strain ( Figure 4(b) black box). The S1 NTD has been confirmed as an epitope that could be bound with several antibodies [34,41]. To further investigate the effects of structural changes on binding antibodies, we calculated the RMSD values of five loops of S1 NTD between Delta and original strains, including N1 loop (residues 14-26), N2 loop (residues 67-79), N3 loop (residues 141-156), N4 loop (residues 177-186), and N5 loop (residues 246-260). The results show that RMSD values of N3 and N5 are greater than 4.8, and RMSD values of N1, N2, and N4 are 0.406, 1.376, and 0.273, respectively ( Figure 4(c)). The comparison of structures also shows N3 and N5 are significantly different (Figure 4(d)). Given that the N3 and N5 loops mediate the interaction with antibody [41], we observed the interaction between S1 NTD and antibody 4A8 (PDB: 7C2L). Compared with the original strain, N3 and N5 of Delta S1 NTD are significantly different, and the pivotal hydrophilic interaction domain constructed with Delta NTD loops of N3 and N5 and three complementaritydetermining regions (CDRs) are much more open than that of the original strain ( Figure 4(e)).     I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I95 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 157 57 57 57 7  57 57 7 7  57 7 7 7 7  57 57 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 T T T T T T T T T T T T T T T T T T95 9 9 95 5 5 5 95 9 9 9 95 9 9 9 95 9 9 9 G G G G G G G G G G G142 142 14 14 142 14 14 14 4 4 4 4 4 4 4 4   . We evaluated the interactions between NTD and antibody 4A8 by HAD-DOCK and HDOCK prediction. The HADDOCK score, van der Waals energy, desolvation energy, and RMSD from the overall lowest-energy structure of the original-NTD-4A8 complexes are much lower than that of the Delta-NTD-4A8 complexes, while the electrostatic energy, Z -score, and HDOCK docking score of the complexes are hardly influenced (Table 1 and S2). These results indicate that AA mutations on Delta S1 NTD significantly affect the structure and alter the interactions between NTD and antibodies.
2.6. The Comparison of S1 RBD Structure between SARS-CoV-2 Delta and Original Strains. In addition to AA mutations on S1 NTD, two AA mutations-L452R and T478K occur on the Delta S1 RBD domain, which could change AA's electrical properties; the aliphatic L452 and polar uncharged T478 are mutated to positively charged R452 and K478, respectively ( Figure 1). To explore the structural difference of S1 RBD between Delta and original strains, we compared the S1 RBD domains. The S1 RBD could be divided into three parts, including N1 (residues 333 to 438), receptor-binding motif (RBM, residues 438 to 506), and N3 (residues 507 to 539), based on the structure. The RMSD values of N1, RBM, and N3 between Delta and original strains are 0.69, 8.28, and 0.61, respectively ( Figure 5(a)). Consistently, the structural comparisons show that RBM domains are significantly different between Delta and original strains, while domains of N1 and N3 are highly similar ( Figure 5(b)). Given that the RBM domain plays an important role in binding with ACE2, we explored the interactions between ACE2 and Delta S1 RBD domain by aligning structures based on their homologous sequence. Compared with the original strain, Delta S1 RBD displays a close interaction with ACE2 ( Figure 5(c)).
Moreover, we performed docking simulations to quantify the interactions between ACE2 and RBD structures of all mutants. The results show that AA mutations could affect the docking scores between ACE2 and RBD in all SARS-CoV-2 mutants, indicating that interactions between ACE2 and RBD are changed (Table 2 and S3). Comparison between original and Delta strains shows that electrostatic energy, desolvation energy, Z-score, and RMSD from the overall lowest-energy structure are low while van der Waals energy and HADDOCK score are high in Delta-RBD-ACE2 complexes ( Table 2). Most notably, significant differences in electrostatic energy between the original and Delta complexes are observed. The electrostatic energy of the original strain is −221:5 ± 11:0, whereas that of the Delta variant is −264:5 ± 23:9 (Table 2), which suggests that the mutated positively charged R452 and K478 on Delta RBD decrease electrostatic energy. Overall, these results indicate that AA mutations on the Delta RBD domain could significantly alter the RBD structure and its interactions with ACE2.
2.7. The Effects of AA Mutations on Interactions between S1 RBD and Antibodies. The structural comparison of S1 RBD between Delta and original strains shows that mutations on T478 and L452 induce significant skewing of AAs and alteration of the structures (Figure 6(a)). The structural difference occurs on RBM where AA mutations are enriched (Figure 6(a)), which is consistent with the results of RMSD values ( Figure 5(a)). Distances between T478 and L452 of the original and the corresponding K476 and R450 of the Delta are 29.8 and 5.3 angstrom, respectively (Figure 6(a)). Given that three epitopes of S1 RBD could be recognized and bound by antibodies, we analyzed interactions between S1 RBD and antibodies. The results show that the structures of two of three epitopes are slightly changed, and the bindings with S304 and S309 antibodies are almost not influenced ( Figure 6B2 and B3), whereas the epitope bound with S2H14 antibody, also involved in binding with ACE2, is significantly changed indicating that the capacity of this epitope bound with antibodies is decreased. To evaluate Notes: additional parameters, including cluster size, RMSD from the overall lowest-energy structure, van der Waals energy, electrostatic energy, desolvation energy, restraints violation energy, buried surface area, and Z-score, are presented. The scores for other mutant complexes are shown in Table S2. 7 Research the interactions between the antibody S2H14 and S1 RBD of original and Delta strains, we performed docking simulations based on the interactions between the heavy chain of the S2H14 antibody and S1 RBD. The results showed that the original-RBD-S2H14 complex has a higher binding affinity than the Delta-RBD-S2H14 complex (Table 3 and S4). The values of the HADDOCK score, electrostatic energy, desolvation energy, and RMSD from the overall lowest-energy structure in the original-RBD-S2H14 complex are much lower than those in the Delta-RBD-S2H14 complex, while Van der Waals energy in the original-RBD-S2H14 complex is higher than that in the Delta-RBD-S2H14 complex (Table 3). These results indicated that the structural changes of Delta S1 RBD induced by amino acid mutations altered the epitope and reduced antibody recognition. credible models for screening potential compounds targeting these proteins. Given that the SARS-CoV-2 Delta variant has been a major threat worldwide, we performed the virtual Notes: additional parameters, including cluster size, RMSD from the overall lowest-energy structure, van der Waals energy, electrostatic energy, desolvation energy, restraints violation energy, buried surface area, and Z-score, are presented. The scores for other mutant complexes are shown in Table S3.   9 Research screening technique to identify potential drugs. To identify chemicals that can be applied in the clinic, we selected a database consisting of 5903 approved drugs worldwide (Table S5). We utilized the screening approach to calculate the binding affinity of the compounds targeting Delta S1 NTD distinct from other variants. The results revealed 40 kinds of drugs targeting S1 NTD that display high binding affinity, whose binding energies are less than -9 kcal/mol (Table S5). The top 10 best docking drugs targeting S1 NTD are cepharanthine, midostaurin, targretin, zinc000014880001, dihydroergotoxine, trypan blue, vorapaxar, ergotamine, lomitapide, and lestaurtinib (Figure 7(a) and Table 4). These 10 drugs are enriched at two pharmacophores on the Delta NTD (Figure 7(b)). Pharmacophore 1 is a cavity displaying a positive charge, while pharmacophore 2 shows a negative charge (Figure 7(b)). Trypan blue among the ten drugs is targeted pharmacophore 2, and the other nine drugs are targeting pharmacophore 1 (Figure 7). The main interactions between trypan blue and pharmacophore 2 are hydrophobic, hydrogen bonds, and salt bridges through binding with PRO39, ASP40, LYS41, VAL42, PHE43, ARG44, LYS195, ASN196, ILE197, GLY199, TYR200, PHE201, LYS202, ASP228, and LEU229 ( Figure 8). The nine drugs interact with ASN188, ARG190, HIS207, THR208, and PRO209 on the pharmacophore 1 through hydrophobic interactions, hydrogen bonds, pi-cation, and salt bridge (Figure 8).

Virtual
Screening of Potential Drugs Targeting Delta S1 RBD and RBM. Given that the Delta S1 RBD and RBM play important roles in entering the host cells through binding with ACE2, Delta S1 RBD, and RBM have great potential as new drug targets. To further search potential drugs targeting the SARS-CoV-2 Delta strain, we performed the virtual screening methods on the Delta S1 RBD and RBM, with RBM used for cross-validation. The results showed that 14 and 16 kinds of drugs showed a high binding affinity with the Delta S1 RBD and RBM, respectively (Table S5). Nine of the top 10 best docking drugs are significantly overlapped in targeting S1 RBD and RBM while the binding energies are different, which include trypan blue, sn38 glucuronide, dihydroergotoxine, zinc000014880001, irinotecan, avodart, tubocurarin, dihydroergotamine, and tasosartan (Figures 9(a) and 10 and Table 4). ZINC95618827 uniquely interacts with ARG37, TYR78, PRO108, and GLU198 on Delta S1 RBM by hydrophobic interactions, hydrogen bonds, and salt bridge ( Figure 10A11). Naldemedine interacts with VAL23, ALA26, ALA30, ASN36, LYS38, ASN132, and THR152 on the Delta S1 RBM by hydrophobic interactions and hydrogen bonds ( Figure 10A10). The identified drugs are enriched at three pharmacophores on Delta RBD (Figure 9(b)). Sn38 glucuronide, naldemedine, and zinc95618827 are enriched at pharmacophore 1 (Figure 9 B1), while trypan blue, tubocurarin, and dihydroergotamine are enriched at pharmacophore 2 forming hydrophobic interactions, hydrogen bonds, and salt bridge ( Figures 9B2 and 10). Dihydroergotoxine, zinc000014880001, irinotecan, tasosartan, and avodart are enriched at pharmacophore 3 forming hydrophobic interactions, hydrogen bonds, salt bridge, and pi-stacking ( Figures 9B3 and 10).
To validate the binding affinity of the screened compounds to Delta S1 RBD, we used surface plasmon resonance technology (BIAcore 8K, GE Healthcare) to analyze the binding energy which is a frequently used instrument to detect the binding energy between proteins and small molecule compounds [42,43]. We detected the binding energy between S1 RBD of Delta variant and the 8 screened compounds, including dihydroergotoxine, trypan blue, irinotecan, biosone, cepharanthine, avodart, tasosartan, and conivaptan. Five of the compounds exhibited high binding affinity to S1 RBD of the Delta variant, and the estimated binding affinity values (KD) of dihydroergotoxine, trypan blue, irinotecan, biosone, and cepharanthine were 49.9, 10.8, 243.5, 77.3, and 271.4 μM, respectively ( Figure 11). The results indicated that these compounds could have antiviral activity and that the drugs predicted by bioinformatics methods were highly confident.

Discussion
AlphaFold v1 has been used to predict the structures of many SARS-CoV-2 proteins, such as SARS-CoV-2 ORF 6, 8, 10 and NSP 2, 3, 4, 6 [44,45], which with high accuracy. In this study, we focused on predicting the S protein  Table S4. The protein-protein docking is based on the interactions between the heavy chain of the S2H14 antibody and S1 RBD.
10 Research structure of ten SARS-CoV-2 strains, including original, Alpha, Beta, Gamma, Delta, Kappa, Lambda, Gamma, Iota, and 21H variants with high accuracy by applying the machine learning method, AlphaFold [37]. The reliability of the predicted S protein structures was validated in two ways, firstly by use of the pLDDT value calculated by Alpha-Fold and the comparison between AlphaFold-predicted and experimental S protein structures (Figure 2). To the best of our knowledge, this is the first study to present the S protein structures of so many SARS-CoV-2 strains, which helps lay a foundation for further SARS-COV-2 research and supple-ment experimental structural biology. Moreover, this effective utilization of AlphaFold may help the developers to update and optimize the algorithm. How to correctly classify the different SARS-CoV-2 strains is an important question; the main method used to classify the SARS-CoV-2 strains so far has been phylogenetic analysis, which is based on the genomic sequence [12,15,27,46,47]. While phylogenetic analyses do a good job of reflecting the evolutionary trajectory of the virus to some extent, it cannot account for differences in virulence of the strains and is not able to substantially differentiate between the Delta 11 Research and other strains. Trying to solve this problem, we attempted to classify the SARS-CoV-2 strains by using structural similarity. Interestingly, our classifications showed the unique cluster of the Delta strain based on the structural similarity of both S1 NTD and RBD, which was different from that based on the sequences (Figure 3). This result can be interpreted as that the structural difference can reflect the functional variation more directly. This study suggests that structural similarity can be a new way to classify SARS-CoV-2 strains under the novel conditions that Alpha-Fold can provide high accuracy protein structures.
SARS-CoV-2 Delta (B.1.617.2) strain was first identified in India in December 2020, which has become a major epidemic strain worldwide accounting for more than 80% of new cases (nextstrain/ncov) [47]. The SARS-CoV-2 Delta strain has been confirmed to be less sensitive to serum neutralizing antibodies and vaccine-elicited antibodies, compared with wild-type strain [48,49]. Here, we predicted the high-accuracy spike protein structure of SARS-CoV-2 Delta strain with powerful tool-AlphaFold and demonstrated that the structures of S1 NTD and RBD of SARS-CoV-2 Delta strain were significantly changed. They displayed lower binding affinities with the corresponding antibodies, 4A8 and S2H14 antibodies, compared with the original stain (Figures 4 and 6). The lower binding affinities with antibodies can reduce the neutralization ability to the spike protein of SARS-CoV-2 Delta strain and then induce immune evasion. Our results provided a bioinformatic and computational insight into explaining how the SARS-CoV-2 Delta variant causes immune evasion from spike protein structure, and it could provide a reference for the follow-up research. These methods presented in our research can also be used to continuously monitor the mutation and structural variation of SARS-CoV-2 strains.
Effective drugs against SARS-CoV-2, especially the Delta strain, have not yet been developed. Virtual screening combined with structural biology has been a useful assistant method for drug discovery [50][51][52]. The high-accuracy 12 Research predicted structures of spike proteins provide us a basis for the virtual screening. The S1 NTD and RBD are crucial parts for binding host cells [53,54], and thus, the binding of the drug with these parts could play important roles in inhibiting the infection of SARS-CoV-2. Given the unique structure and pandemic of the Delta strain, we selected 5903 worldwide approved drugs that can be used in the clinic rapidly from the ZINC database (https://zinc15.docking.org/) for virtual screening of targeting S1 NTD and RBD of Delta strain. After a strict filter at binding energy less than -9 kcal/mol, 40 and 14 kinds of drugs targeting S1 NTD and RBD were identified. These drugs can form hydropho-bic interactions, hydrogen bonds, salt bridges, Pi-cation, and Pi-stacking with key amino acids on S1 NTD and RBD. Some of these drugs, including Dihydroergotoxine, Avodart, Tubocurarine, and Irinotecan, have been evaluated to potentially affect SARS-CoV-2 towards proteases, and NSP9 [55][56][57][58]. Our results indicated that these drugs individually or in combinations may be used as potential inhibitors of S1 NTD and RBD of SARS-CoV-2 after verifying their in vivo and in vitro antiviral ability. A limitation of this study is that we only focused on the S protein structures of SARS-CoV-2. Besides the S protein, other parts, including nucleocapsid protein, envelope 14 Research protein, and RNA-dependent RNA polymerase of SARS-CoV-2, are also not well known, while they can play important roles in SARS-CoV-2 replication and infection. These proteins should be explored in future works following our methods. Moreover, the availability of screened drugs targeting the Delta strain S1 NTD and RBD should be verified by the strict experiments before applying to the clinic. We would like to highlight that the current study is mainly based on computation and simulation providing essential data; however, some structures and results should be evaluated by further experiments.
In conclusion, our study acquired the high-accuracy spike protein structures of ten SARS-CoV-2 variants by using the AlphaFold model. The complementary methods, containing the comparison with experimental structures and the validation of pLDDT of the AlphaFold built-in algorithm, verified the structures were highly accurate. We found that the clustering analysis based on structural similarity  ARG37,TYR78,PRO108,ASP110,GLU198   SER31,ASN130,ARG134,  THR152,GLU166,GLN175,PRO181   TRP35,ARG37,TYR105,LYS106,  PRO108,GLU198,LEU199   TYR78,LYS106,THR112,LEU143,  PRO145,ARG148, Figure 10: Chemical structures of drugs targeting Delta S1 RBD and RBM. Chemicals A1 to A9 are common drugs in S1 RBD and RBM. Chemicals A10 and A11 are unique for RBD and RBM, respectively. The drug interacts with the red amino acids on the Delta S1 NTD domain based on the blue interactions, including hydrophobic interaction, hydrogen bond, pi-stacking, pi-cation, and salt bridge related to Figure 9 and Table S5. 15 Research could reflect the current characteristics of the epidemic more accurately than those based on the protein sequence, such as the mainly pandemic SARS-CoV-2 Delta variants displayed the unique features in the cluster based on the structural similarity. Moreover, the analysis of the binding affinities of ACE2-RBD, antibody-NTD, and antibody-RBD complexes in the different variants revealed that the recognition of antibodies against S1 NTD and RBD was decreased in the variants, especially the Delta variant compared with the original strain, which may induce the immune evasion of SARS-CoV-2 variants. Furthermore, we identified multiple com-pounds that target S1 NTD and RBD by virtual screening the ZINC database against a high-accuracy predicted structure of Delta spike protein, which might contribute towards the development of clinical anti-SARS-CoV-2 medicines. Our study provided abundant basic data for further research related to curing COVID-19.     [39], and GDT-TS score [40] were calculated based on the online service of TM-align (https://zhanggroup.org/ ) [64,65]. The RMSD and pLDDT between experimental and predicted structures were calculated by the software PyMol and AlphaFold, respectively [36,66].

Cluster
Analyses of Different SARS-CoV-2 Strains. The cluster analyses were performed based on both protein sequences and structures. The software Molecular Evolutionary Genetics Analysis-X (MEGA-X) was used for phylogenetic analyses based on protein sequences. The ClustalW alignment algorithms were used to align the multiple sequences, and evolutionary tree was generated using the maximum likelihood method [67]. The structural cluster analyses were based on the matrix of RMSD values between different SARS-CoV-2 strains, and the method used to cluster these strains was based on the built-in algorithm of R packages "pheatmap." The data were visualized by the R packages of "pheatmap" and "ggplot2." 4.4. Consensus Docking of S1 NTD and Antibody 4A8. The relaxed-predicted S1 NTD structures of different SARS-CoV-2 strains were selected for the modeling of biomolecular complexes between S1 NTD and antibody 4A8 [41] based on high ambiguity-driven protein-protein docking (HAD-DOCK) [68,69]. The interface residues at 25 to 32, 51 to 58, 100 to 116 for antibody 4A8 M chain, and at 145 to 150 for S1 NTD were determined for restraint docking in an online HADDOCK server. The other docking parameters were set as default. The algorithm of HDOCK combined both template-based and free approaches were deployed for cross-validation [70,71].

Consensus
Docking of S1 RBD with ACE2 and Antibodies. Similarly, the docking analyses of ACE2 and antibodies with S1 RBD have followed the same method as described in Section 2.4. Briefly, for antibody S2H14 [9] docking against the S1 RBD, the interface residues at 52, 56, 107, and 108 on the heavy chain of the antibody S2H14, and 449, 498, 500, and 505 (131,180,182, and 187) on S1 RBD were determined for restraint docking. For ACE2 docking against S1 RBD, the interface residues at 21, 24, 27, 28, 30, 35, 38,  4.6. Virtual Screening of Potential Drugs Targeting S1 NTD and RBD. For fast use of the screened drugs in the clinic, the 5903 chemicals used for virtual screening were downloaded from the ZINC database under the filter of the world-approved drugs. The software of Autodock vina [72] was applied to identify potential drugs binding with S1 NTD and RBD. The S1 RBD and NTD were included in the searching grid boxes, respectively. The binding sites and pharmacophores were searched automatically by the software. Moreover, for cross-validation, the S1 RBM was set as a target in the searching grid box. Finally, based on the docking scores, the screened compounds with the threshold values of binding energy fixed less than -9.00 kcal/mol were identified as potential drugs for S1 NTD and RBD. The binding affinity was measured by surface plasmon resonance technology using a BIAcore 8K instrument (GE Healthcare, United States) with running buffer (PBST containing 5% DMSO) at 25°C. The S1 RBD of Delta (Beyotime Biotechnology Co., LTD, China, #P2341) was purchased from immobilized onto sensor CM5 chips by a standard amine-coupling procedure in 10 mM sodium acetate (pH 5.5). Compounds were serially diluted and injected into the chip at a flow rate of 30 μl/min for 120 s (contact phase), followed by 120 s of buffer flow (dissociation phase). The binding affinity of KD value was calculated by the software of Biacore Insight Evaluation V3.0 (GE Healthcare, United States).

Interaction Analysis between Potential Drugs and Protein
Structures. The top ten screened compounds ranked by the lowest binding energy were selected for interaction analysis with S1 NTD and RBD. The interaction analysis was performed using the online service of protein-ligand interaction profiler (PLIP) [73]. The algorithm of PLIP identified the amino acids on protein structures responsible for forming specific interactions with the chemicals. The 3D structures of compounds, S1 NTD, and RBD were visualized by the software PyMol, and the combination compounds with S1 NTD and RBD were generated by the software Autodocktools.  Figure S1: diversities of mutated nucleotides on SARS-CoV-2 genome. Figure Figure S3: the comparisons of the full-length spike protein of the SARS-CoV-2 strains. Figure S4: the comparisons of S1 NTD of ten SARS-CoV-2 strains. Figure S5: the comparisons of S1 RBD of ten SARS-CoV-2 strains. Figure S6: percentage of the amino acid residues whose pLDDT is more than 70 in groups of S1 NTD, S1 RBD, and spike protein. Table S1: the cluster by spike protein structures. Table S2: docking analyses of antibody 4A8 with S1 NTD of different SARS-CoV-2 variants. Table  S3: docking analyses of ACE2 with S1 RBD of different SARS-CoV-2 variants. Table S4: docking analyses of antibody S2H14 with S1 RBD of different SARS-CoV-2 variants. Table S5: virtual screening of potential drugs. Protein structural files of all SARS-CoV-2 variants were submitted as supplemental materials. (Supplementary Materials)