In Vitro Nanobody Library Construction by Using Gene Designated-Region Pan-Editing Technology

Camelid single-domain antibody fragments (nanobodies) have become an emerging force in therapeutic biopharmaceuticals and clinical diagnostic reagents in recent years. Nearly all nanobodies available to date have been obtained by animal immunization, a bottleneck that restricts their large-scale application. In this study, we developed three kinds of gene designated-region pan-editing (GDP) technologies to introduce multiple mutations into the complementarity-determining regions (CDRs) of nanobodies in vitro. The first integrates G-quadruplex fragments into the CDRs, which induces spontaneous multiple mutations there; however, the resulting mutant sequences are highly similar, so sequence diversity in the CDRs remains low. The second uses CDR-targeting traditional gRNA-guided base-editors, which effectively diversify the CDRs. The third, and most important, uses self-assembling gRNAs, which are generated when a reprogrammed tracrRNA hijacks endogenous mRNAs to serve as crRNAs. Base-editors guided by self-assembling gRNAs can diversify the CDRs iteratively. We believe this last GDP technology is highly promising for immunization-free nanobody library construction, and that full development of this nanobody discovery platform could enable the synthetic evolution of nanobodies in vitro.


Introduction
Nanobodies provide the remarkable specificity of antibodies within the 15 kDa single-variable domains (VHH) of the heavy-chain-only antibodies found in llamas and other camelids [1]. Their advantages include high-affinity binding to epitopes that are inaccessible to traditional antibodies (≥150 kDa), better stability, potentially lower immunogenicity, and Lego-like modularity; they can even increase the efficacy of chimeric antigen receptor (CAR) T cells [2,3]. Dozens of nanobody programs are in active clinical development, covering a wide range of targets and indications [4][5][6], including the diagnosis and treatment of COVID-19 [7,8]. However, nanobodies discovered to date have been derived from the immunization of camelids [9], which is time-consuming, expensive, and unreliable in quality. In this study, we therefore set out to develop a platform technology that expedites the development of nanobodies for therapeutic and diagnostic applications by avoiding the need for animal immunization.
Previously described strategies for creating antibody libraries in vitro, or for introducing variation into antibody complementarity-determining region (CDR) loops, fall into two main kinds: static antibody libraries and dynamic antibody libraries. In a static antibody library, sequence diversity remains constant after the library is generated. This construction scheme relies mainly on the artificial synthesis of antibody sequences and the artificial introduction of sequence diversity into the CDRs. Examples include libraries generated by amplifying CDR sequences from natural antibodies and grafting them into recombinant display systems for selection in vitro [10]; naive libraries created by trinucleotide assembly of mutagenic oligonucleotides on a single antibody framework (i.e., antibody constant region) or a collection of antibody frameworks [11][12][13]; synthetic libraries in which the antigen-binding sites are diversified with only four kinds of amino acids (Tyr, Ser, Ala, and Asp) in the CDRs [14]; and synthetic libraries whose CDRs contain only two kinds of amino acids, tyrosine and serine [15,16]. These methods can accurately limit mutations to specified regions, such as the CDRs, and can set positional amino acid frequencies within the CDRs. However, this scheme depends strongly on synthetic antibody sequences and artificially introduced diversity, and binders acquired from these libraries are inferior in quantity and affinity to those obtained from the natural immune system. In a dynamic antibody library, antibody diversity is generated continuously. One example uses the alphavirus Sindbis virus, a member of the most mutagenic viral class (RNA viruses), as a vector for heredity and diversity, which achieved 24-hour selection cycles with mutation rates surpassing $10^{-3}$ mutations/base [17].
Libraries can also be constructed by coexpressing, for example, the IgG heavy chain (HC) and light chain (LC) together with activation-induced cytidine deaminase (AID) in mammalian cells. The expression of AID alone is sufficient to reproduce the salient features of somatic hypermutation (SHM) in both B cells and other mammalian cells [18,19]. These dynamic library construction methods allow antibodies to be diversified, displayed, and screened iteratively and, more importantly, allow antibody affinity maturation through iterative cycles of antigen-coated magnetic bead sorting or fluorescence-activated cell sorting (FACS) under increasingly stringent sort conditions. However, their mutation efficiency is relatively low, the mutations generated by RNA viruses and AID can hardly be restricted to a small window of nucleotides within the CDRs, and mutations in the constant regions may cause antibody dysfunction.

3.1. Nanobody Library Construction by Using G-Quadruplex Fragments. Recently, the Liu group developed a series of base-editors that efficiently convert one base pair to a different base pair. Adenine base-editors (ABEs) [20] and cytosine base-editors (CBEs) [21] benefit point mutation correction, while editors such as the ABE8e base-editor [22] benefit the disruption of specific regions and multiplex base-editing applications. These base-editors do not induce double-stranded DNA breaks or extensive insertions and deletions (indels) at target sites, which would otherwise cause frameshift mutations and yield nonsense proteins. These advantages make the base-editor an excellent choice for creating mutations in antibody CDRs. However, base-editors operate only on single-stranded DNA and not on double-stranded DNA. This feature is nevertheless critical, because it restricts deaminase activity to a small window of nucleotides within single-stranded DNA structures. The G-quadruplex is a specialized DNA secondary structure formed by guanine-rich nucleic acid sequences. A G-quadruplex-forming sequence requires at least 4 groups of consecutive G bases, with not less than 2 G bases per group, joined by 1-7 arbitrary bases ($G_{\geq 2}N_{1\text{-}7}G_{\geq 2}N_{1\text{-}7}G_{\geq 2}N_{1\text{-}7}G_{\geq 2}$) [23,24]. Formation of the G-quadruplex leaves its complementary strand in a single-stranded state, so gene editing can be performed within this window by finding a protein that directs a deaminase to the complementary single strand of the G-quadruplex.
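As a minimal sketch of the sequence rule above, the motif can be written as a regular expression and used to scan a DNA sequence for candidate G-quadruplex-forming stretches. The function name `find_g4_motifs` and the example sequence are illustrative assumptions, not part of the study, and a pattern match only indicates that a sequence satisfies the rule, not that a G-quadruplex actually folds.

```python
import re

# Four runs of >=2 G separated by loops of 1-7 arbitrary bases:
# G(>=2) N(1-7) G(>=2) N(1-7) G(>=2) N(1-7) G(>=2)
G4_PATTERN = re.compile(
    r"G{2,}[ACGT]{1,7}G{2,}[ACGT]{1,7}G{2,}[ACGT]{1,7}G{2,}"
)


def find_g4_motifs(seq):
    """Return (start, end, matched_sequence) for non-overlapping candidate motifs.

    Coordinates are 0-based, with `end` exclusive.
    """
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical example: four GGG runs separated by TTA loops.
    example = "ATCGGGTTAGGGTTAGGGTTAGGGATC"
    for start, end, motif in find_g4_motifs(example):
        print(start, end, motif)
```

Only the strand given to the function is scanned; checking the reverse complement as well would be needed to find motifs on the opposite strand.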
Based on the above facts, the first nanobody library construction strategy we tried started from a consensus framework derived from the anti-GFP nanobody LaG-2 [25]. CDR1, CDR2, and CDR3 of LaG-2 were replaced by natural G-quadruplex fragments from the VEGF, c-myc, and BCL-2 genes, respectively [26]. The modified LaG-2 was named LaG-2/G4, and its sequence is shown in Figure S1A. Base-editors were to be directed to the single-stranded window by nucleolin, which promotes the formation of G-quadruplexes and stabilizes their structure when bound to them. It therefore stands to reason that base-editors tethered to nucleolin could be directed to the G-quadruplex fragments in the CDRs of LaG-2/G4, as shown schematically in Figure S1B. However, we found that nanobody sequences containing G-quadruplexes spontaneously generated mutations, without the use of nucleolin-tethered base-editors, when they were integrated into the genome of 293T cells by lentiviral infection. The mutations were mainly concentrated in the three CDRs, with only a few outside them, and the overall mutation rate was 161 per 1000 bp (Figure 1). The mutation sites and types, together with the ratio of the four nucleotides before and after mutation, are shown in Figure S2. We also noticed deletions of 3 or 6 amino acids in CDR2; these deletions did not cause frameshift mutations. Unfortunately, the mutations lack randomness: both the mutation sites and the mutated amino acids are preferential, which is detrimental to library construction. Moreover, we showed that LaG-2/G4 is stable in the Stbl3 E. coli strain (Figure S3), and that high-fidelity PCR introduces a small number of mutations, but far fewer than those introduced by amplification of LaG-2/G4 in 293T cells (Figure S4).
To further clarify the pattern of mutations induced by G-quadruplex fragments, we swapped the CDR2 and CDR3 regions of LaG-2/G4 (Figure S5A) or replaced the G-quadruplex fragment in CDR2 with another G-quadruplex fragment from the HIF-1α (Figure S5B) or RET gene (Figure S5C). These LaG-2/G4 variants were then amplified with Hieff Canace® High-Fidelity DNA Polymerase, and the mutations generated by PCR are shown. In addition, the variant whose CDR2 contains the G-quadruplex fragment from the RET gene was amplified with Pfu DNA Polymerase (Figure S5D). However, none of these attempts improved the diversity of G-quadruplex-induced mutations. More importantly, G-quadruplexes are difficult to amplify by PCR, so mutant sequences may be amplified preferentially, and the mutations generated in PCR reactions or in HEK293T cells may be far fewer than they appear. Error-prone PCR has the potential to increase G-quadruplex-induced mutations; however, this does not meet our initial purpose, which involved the G-quadruplex and a base-editor. Error-prone PCR would also accumulate mutations in the constant regions of the nanobody, which could lead to incorrect folding and loss of function. We therefore abandoned the plan to use G-quadruplexes to induce diversity in nanobody CDRs in this study.
3.2. Nanobody Library Construction by Using Traditional gRNA-Guided and nCas9-Tethered Base-Editors. Besides G-quadruplexes, Cas9/gRNA complexes are also well suited to creating single-stranded bubbles on nanobody CDRs. In this section, we therefore used gRNAs to direct Cas9-tethered base-editors to nanobody CDRs, as shown in Figure 2(a). The gRNA/Cas9 base-editor complex is recruited to the target sequence by base-pairing between the gRNA sequence and the complement of the target sequence in the genomic DNA, and multiple mutations are then generated. To create more mutations, we decided to diversify all three CDRs of the nanobody, which requires at least three gRNAs. We used a polycistronic-tRNA-gRNA (PTG) strategy [27] to quickly assemble three gRNAs in one construct, in which the three CDR-targeting gRNAs are flanked by glycine tRNAs to create a polycistronic glycine tRNA-gRNA construct. Taking advantage of the endogenous tRNA processing system in mammalian cells, we can efficiently transcribe three CDR-targeting gRNAs from a single Pol III promoter (Figure 2(b)). Three different nCas9-tethered base-editors were evaluated for nanobody library construction (Figure 2(c)). The first, the BE3 base-editor (APOBEC-XTEN-nCas9-UGI), was engineered in the lab of David R. Liu [21]. The second, the AIDmut1 base-editor (AID*Δ-nCas9), contains AID*Δ, a mutated wild-type AID (K10E + T82I + E156G) with a partial deletion (196-198AA) of its C-terminal nuclear export sequence (NES), which ablates nuclear export while increasing somatic hypermutation (SHM). AID*Δ was developed by Hess et al. [28], and we tethered this AID variant to nCas9 to form the AIDmut1 base-editor. The third, AIDmut2, was generated by removing the full-length NES (183-198AA) from the AIDmut1 base-editor; deletion of the full-length NES was shown to create more diversification at the target site than partial deletion, according to Ma et al. [29].
To test the resulting base-editors, we transfected HEK293T cells carrying the LaG-2 nanobody sequence preintegrated in the genome with a two-plasmid mixture, in which one plasmid expresses the three CDR-targeting gRNAs and the other expresses the base-editor. Transfection was conducted three times to facilitate the accumulation of mutations. The overall mutation rates of BE3, AIDmut1, and AIDmut2 were 2.92, 10.2, and 6.23 per 1000 bp, respectively. AIDmut1 and AIDmut2 substantially increased the base-editing efficiency compared with BE3. SHM introduces nucleotide alterations in the V regions of heavy and light chain genes at a rate of 1 per 1000 bp [30], which is enough to enable the selection of B cells producing high-affinity antibodies. The mutation rates of the tested base-editors, especially AIDmut1 and AIDmut2, therefore exceeded that of SHM by roughly 3- to 10-fold. The base-editing active window spans from −10 to −50 (counting the PAM as positions 1-3) at the three CDR loci (Figures 2(d)-2(f)). Moreover, BE3 and AIDmut1 caused mainly C-to-T conversions (60% and 55.81%, respectively), whereas AIDmut2 caused mainly G-to-A mutations (55.26%). Low levels of unexpected A/T conversions or deletions were observed for BE3 and AIDmut2. Notably, rare long-range fragment deletions were observed for AIDmut1 and AIDmut2 (Figure S6); these are unlikely to be a feature of base-editors, so we excluded these minor sequences from the analysis in Figures 2(e) and 2(f). There were no significant differences in the ratio of the four nucleotides before and after mutation for any of the base-editors (Figures 2(d)-2(f)). Taken together, we propose that expressing the AIDmut1 and AIDmut2 base-editors is a feasible strategy for nanobody library construction in vitro.
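The rates and substitution percentages reported above are simple ratios, and the sketch below shows one way such numbers could be computed from sequenced clones. It assumes reads that are already aligned to the reference at equal length, with "-" marking deleted bases; the function names and toy sequences are hypothetical and do not reproduce the authors' analysis pipeline.

```python
from collections import Counter


def mutation_stats(reference, reads):
    """Count mismatches between each aligned read and the reference.

    Returns (rate_per_1000_bp, Counter of changes such as 'C>T' or 'G>-').
    Assumes every read has the same length as the reference.
    """
    changes = Counter()
    for read in reads:
        if len(read) != len(reference):
            raise ValueError("reads must be aligned to the reference length")
        for ref_base, read_base in zip(reference, read):
            if ref_base != read_base:
                changes[f"{ref_base}>{read_base}"] += 1
    total_bases = len(reference) * len(reads)
    rate = 1000 * sum(changes.values()) / total_bases if total_bases else 0.0
    return rate, changes


def fold_over_shm(rate_per_1000_bp, shm_rate_per_1000_bp=1.0):
    """Fold change relative to the SHM rate of 1 per 1000 bp cited in the text."""
    return rate_per_1000_bp / shm_rate_per_1000_bp


if __name__ == "__main__":
    ref = "ACGTCCGGAT" * 10                  # toy 100 bp reference (assumption)
    reads = [
        ref,                                 # unmutated clone
        ref[:1] + "T" + ref[2:],             # C>T at position 1
        ref[:2] + "A" + ref[3:],             # G>A at position 2
    ]
    rate, changes = mutation_stats(ref, reads)
    print(f"{rate:.2f} mutations per 1000 bp", dict(changes))

    # Rates reported in the text, expressed as fold change over SHM.
    for name, reported in {"BE3": 2.92, "AIDmut1": 10.2, "AIDmut2": 6.23}.items():
        print(name, fold_over_shm(reported))
```

Dividing each entry of the returned counter by the total number of changes gives the substitution spectrum, which is how percentages such as the share of C-to-T conversions would be obtained under these assumptions.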
3.3. Nanobody Library Construction by Using Self-Assembling gRNA-Guided and nCas9-Tethered Base-Editors. The above findings show that expressing three gRNAs can effectively guide the base-editors to the three CDRs and cause mutations in the gRNA recognition regions and adjacent sequences (Figure S7). However, gRNA-target recognition relies mainly on the 20 bp seed region at the 5′ end, and a mutation of only 2 bp in the seed region results in a 96% loss of gRNA guidance function [31]. Our data also showed that mutations generated by the AID mutants accumulate very slowly: on average, each nanobody sequence carried only 4 mutation sites after three rounds of transfection and mutation (Figure S7). Therefore, when base-editors are guided by conventional gRNAs, mutations that emerge in the first round of library construction will prevent the generation of further mutations in the next round.
BioDesign Research

Figure 1: Sequences of LaG-2/G4 and spontaneously mutated LaG-2/G4. The CDRs are marked with red lines. Bases marked with colored backgrounds are the spontaneously mutated bases, and "-" marks the position where a base deletion mutation occurred.

Figure 2, panels (d-f): HEK293T cells containing LaG-2 were transfected with the indicated combinations of (d) BE3, (e) AIDmut1, or (f) AIDmut2 and 3×gRNAs for 3 times; then the LaG-2 loci were sequenced. Graphs of the enrichment of mutation at each base are shown; the type and number of mutations are indicated in the lower-left corner, and "-" means deletion mutation. The pie chart represents the change in the ratio of the four bases before and after the mutation. In (d-f), mutations from at least 10 LaG-2 sequences were analyzed.

To iteratively diversify the CDRs, we first reviewed how a conventional gRNA is generated [32]: the endogenous bacterial CRISPR RNA (crRNA) and the transactivating crRNA (tracrRNA) are combined into a single chimeric guide RNA (gRNA) transcript. The gRNA combines the targeting specificity of the crRNA with the scaffolding properties of the tracrRNA (Figure 3(a), upper panel). We speculate that formation of a functional CRISPR/Cas9 complex places no mandatory requirement on the sequence of the crRNA-tracrRNA hybridized region; this region only needs to form the stem-loop structure found between crRNA and tracrRNA in traditional gRNAs, and we can take advantage of this feature to construct self-assembling gRNAs. We designed a truncated gRNA (gRNAΔ) that does not contain the crRNA part of the conventional gRNA. In addition, the 14 bp sequence at the 5′ end of the tracrRNA, which normally pairs with the crRNA, was modified so that it pairs with the 3′ constant region adjacent to the CDRs of the nanobody mRNA (Figure 3(a), lower panel, and Figure 3(b)). After gRNAΔ pairs with the nanobody mRNA, endogenous mammalian RNases cleave the mRNA (generating a truncated mRNA, mRNAΔ) and assist maturation of the gRNAΔ : mRNAΔ duplex (the self-assembling gRNA), just as RNase III cleaves pre-crRNA base-paired with tracrRNA in the presence of Cas9 [33,34]. The mature self-assembling gRNAs then guide nCas9-tethered base-editors to create mutations at target DNA sites. Most importantly, the seed regions of our self-assembling gRNAs always come from the latest mutated mRNA and therefore retain their self-targeting ability, which enables continuous diversification of the CDRs.
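The contrast between a fixed conventional guide and a guide whose targeting part is re-derived from the current transcript can be illustrated with a toy simulation. The sketch below is a deliberately simplified model: the 20 nt sequence, the two-mismatch cutoff (a crude stand-in for the loss of guidance reported in [31]), and the function names are assumptions for illustration only, not a model of Cas9 binding.

```python
def mismatches(guide, target):
    """Number of positions at which guide and target differ (equal lengths)."""
    if len(guide) != len(target):
        raise ValueError("guide and target must have equal length")
    return sum(g != t for g, t in zip(guide, target))


def can_target(guide, target, max_mismatches=1):
    """Toy rule: targeting is treated as lost at 2 or more mismatches."""
    return mismatches(guide, target) <= max_mismatches


def apply_edits(site, edits):
    """Return the site with {position: new_base} substitutions applied."""
    bases = list(site)
    for pos, base in edits.items():
        bases[pos] = base
    return "".join(bases)


if __name__ == "__main__":
    original_site = "GACCTGAGCTGCAAGGCTTC"   # hypothetical 20 nt target
    fixed_guide = original_site              # conventional gRNA, designed once

    # Round 1: base editing changes two positions (C>T at 2, G>A at 7).
    mutated_site = apply_edits(original_site, {2: "T", 7: "A"})

    # A self-assembling guide takes its targeting part from the current
    # transcript, so here it is re-derived from the mutated site.
    self_assembled_guide = mutated_site

    print("fixed guide still targets:", can_target(fixed_guide, mutated_site))
    print("self-assembled guide targets:", can_target(self_assembled_guide, mutated_site))
```

In this toy model the fixed guide drops out after the first round of editing, while the re-derived guide keeps matching its own locus regardless of how many substitutions accumulate.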
Figure 3, panels (c-f): ... (f) AIDmut2 and 3×gRNAΔ21 for 3 times; then the LaG-2 loci were sequenced. Graphs of the enrichment of mutation at each base are shown; the type and number of mutations are indicated in the lower-left corner, and "-" means deletion mutation. The pie chart represents the change in the ratio of the four bases before and after the mutation. In (c-f), mutations from at least 10 LaG-2 sequences were analyzed.

The base-editing efficiency of self-assembling gRNA-guided base-editors was tested under the same conditions used for conventional gRNAs. The overall mutation rates of AIDmut1 and AIDmut2 were 6.82 per 1000 bp (Figure 3(c)) and 2.36 per 1000 bp (Figure 3(d)), respectively; both were substantially lower than when the editors were guided by conventional gRNAs. More importantly, most of the mutations generated by AIDmut1 and AIDmut2 lay outside the CDRs, which is unfavorable for building an antibody library. We speculate that the self-assembling gRNAs may not have been processed smoothly into three independently functioning self-assembling gRNAs. Excess mRNA sequence also has the potential to expand the single-stranded region, which in turn expands the editing window. As mentioned above, RNase III cleaves pre-crRNA base-paired with tracrRNA in the presence of Cas9. Although we do not know which endogenous mammalian RNases cleave the gRNAΔ : mRNA duplex, we speculate that mammalian RNases acting like RNase III cleave the double-stranded RNA after formation of the gRNAΔ : mRNA : Cas9 trimer. It stands to reason that prolonging the 5′ end of gRNAΔ could facilitate RNase binding and cleavage of the mRNA (Figure S8A). We therefore tested the base-editing ability of AIDmut1 and AIDmut2 guided by a 5′-end-prolonged 21 bp gRNAΔ (gRNAΔ21).
The results showed that the overall mutation rates of AIDmut1 and AIDmut2 were 9.97 per 1000 bp (Figure 3(e)) and 10.7 per 1000 bp (Figure 3(f)), respectively; both were substantially higher than when the editors were guided by the shorter gRNAΔ. Moreover, mutations were more concentrated in the CDRs for AIDmut2 guided by gRNAΔ21, with 62.22% of mutations located in CDRs (Figure 3(f)). Because nanobody mRNAs are consumed in generating self-assembling gRNAs, we tested whether this loss of mRNA would affect the subsequent expression and screening of nanobodies. We displayed nanobodies on the surface of 293T cells by adding an N-terminal signal peptide and a C-terminal transmembrane domain. 293T cells containing membrane-expressed LaG-2 (mLaG-2) were transiently electroporated with AIDmut1 or AIDmut2 and self-assembling gRNAs. mLaG-2 expression was then evaluated by flow cytometry, and we did not observe any loss of nanobody expression (Figures S8B and S8C). We speculate that this is because the U6 promoter is a weak promoter, so only a small amount of gRNAΔ21 is expressed and only a few endogenous mRNAs are hijacked, which has no significant impact on nanobody expression. We also observed a small number of stop codons (Figures S8D and S8E), which likewise did not impair nanobody expression, as shown in Figures S8B and S8C.
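A figure such as "62.22% of mutations are in CDRs" is the share of mutated positions that fall inside the CDR intervals. A minimal sketch of that calculation follows; the CDR coordinates and mutation positions are made-up placeholders, since the actual LaG-2 coordinates are not given in this section.

```python
def fraction_in_regions(mutation_positions, regions):
    """Fraction of mutation positions inside any half-open [start, end) region."""
    if not mutation_positions:
        return 0.0
    inside = sum(
        any(start <= pos < end for start, end in regions)
        for pos in mutation_positions
    )
    return inside / len(mutation_positions)


if __name__ == "__main__":
    # Hypothetical 0-based nucleotide intervals for CDR1, CDR2, and CDR3.
    cdr_regions = [(75, 105), (150, 180), (290, 340)]
    # Hypothetical mutated positions pooled over sequenced clones.
    positions = [80, 92, 160, 200, 300, 310, 20, 335]
    share = fraction_in_regions(positions, cdr_regions)
    print(f"{100 * share:.2f}% of mutations fall in CDRs")
```

Pooling positions over all sequenced clones, as done here, weights each mutation equally; computing the fraction per clone and averaging would be an alternative design choice.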
To further confirm the iterative evolutionary capacity of our self-assembling gRNAs, we mutated 2 bases in the 20 bp seed region recognition site. We then tested the base-editing ability of AIDmut1 and AIDmut2 guided by a 5′-end-prolonged 15 bp gRNAΔ (gRNAΔ15). As expected, mutation of the seed region recognition site did not affect the binding of truncated gRNAs. Base-editors directed by self-assembling gRNAs showed substantially higher base-editing efficiency than base-editors directed by conventional gRNAs; moreover, mutations were more concentrated in the CDRs for base-editors guided by truncated gRNAs (Figure 4).

Figure 4: Comparison of mutation tolerance for base-editors guided by conventional gRNAs and self-assembling gRNAs. (a-d) HEK293T cells containing LaG-2 with 2 mutated bases in the 20 bp seed region recognition site (mutation sites marked with two black triangles) were transfected with the indicated combinations of (a) AIDmut1 and 3×gRNA, (b) AIDmut2 and 3×gRNA, (c) AIDmut1 and 3×gRNAΔ15, and (d) AIDmut2 and 3×gRNAΔ15 for 3 times; then the LaG-2 loci were sequenced. Graphs of the enrichment of mutation at each base are shown; the type and number of mutations are indicated in the lower-left corner, and "-" means deletion mutation. The pie chart represents the change in the ratio of the four bases before and after the mutation. In (a-d), mutations from at least 10 LaG-2 sequences were analyzed.

Figure S1: G-quadruplex and nucleolin-tethered base-editor-mediated GDP technology.
Figure S2: enrichment of the LaG-2/G4 spontaneous mutation at each base in HEK293T cells.
Figure S3: enrichment of the LaG-2/G4 spontaneous mutation at each base in Stbl3.
Figure S4: mutations on LaG-2/G4 generated by Hieff Canace® High-Fidelity DNA Polymerase.
Figure S5: mutations on LaG-2/G4 variants generated by different High-Fidelity DNA Polymerases.
Figure S6: amino acid mutations on LaG-2 that were generated by conventional gRNA-guided AIDmut1 and AIDmut2.
Figure S7: mutations on LaG-2 DNA generated by conventional gRNA-guided AIDmut1.
Figure S8: the characteristics of 3×gRNAΔ21-guided base-editors.