Targets of Somatic Hypermutation Within Immunoglobulin Light Chain Genes in Zebrafish

IMMUNOLOGY
ORIGINAL ARTICLE
Alexis E. Marianes* and Anastasia M. Zimmerman

Department of Biology, College of Charleston, Charleston, SC, USA
Summary
In mammals, somatic hypermutation (SHM) of immunoglobulin (Ig) genes is critical for the generation of high-affinity antibodies and effective immune responses. Knowledge of sequence-specific biases in the targeting of somatic mutations can be useful for studies aimed at understanding antibody repertoires produced in response to infections, B-cell neoplasms, or autoimmune disease. To evaluate potential nucleotide targets of somatic mutation in zebrafish (Danio rerio), an enriched IgL cDNA library was constructed and > 250 randomly selected clones were sequenced and analysed. In total, 55 unique VJ-C sequences were identified, encoding a total of 125 mutations. Mutations were most prevalent in VL, with a bias towards single base transitions and increased mutation in the complementarity-determining regions (CDRs). Overall, mutations were overrepresented at WRCH/DGYW motifs, suggestive of activation-induced cytidine deaminase (AID) targeting, which is common in mice and humans. In contrast to mammalian models, N and P addition was not observed and mutations at AID hotspots were largely restricted to palindromic WRCH/DGYW motifs. Mutability indexes for di- and trinucleotide combinations confirmed C/G targets within WRCH/DGYW motifs to be statistically significant mutational hotspots and showed trinucleotides ATC and ATG to be mutation coldspots. Additive mutations in VJ-C sequences revealed patterns of clonal expansion consistent with affinity maturation responses seen in higher vertebrates. Taken together, the data reveal specific nucleotide targets of SHM in zebrafish and suggest that AID and affinity maturation contribute to antibody diversification in this emerging immunological model.
Keywords: immunoglobulin; somatic hypermutation; zebrafish
doi:10.1111/j.1365-2567.2010.03358.x Received 18 May 2010; revised 18 August 2010; accepted 20 August 2010. *Present address: Department of Molecular, Cell, and Developmental Biology, Johns Hopkins University, Baltimore, MD, USA. Correspondence: A. M. Zimmerman, Department of Biology, College of Charleston, 66 George Street, Charleston, SC 29424, USA. Email: zimmermana@cofc.edu Senior author: Anastasia M. Zimmerman
Introduction
A hallmark of the adaptive immune system is the capacity to mount a heightened memory response to pathogens encountered upon sustained or recurrent infection. Within mammalian models, an integral part of this protection stems from the ability of the immune system to fine-tune its diverse repertoire of antigen receptors over time through mutation and selection.1 In the case of B cells and antibody responses, initial receptor diversity is created by V(D)J rearrangement of immunoglobulin gene segments to form functional immunoglobulin genes. The modular nature of the immunoglobulin segments, imprecise joining, and addition of nucleotides at V(D)J junctions generate an initial repertoire of naive B cells with membrane-bound antigen receptors (B-cell receptors). When a pathogen triggers an immune response, B cells with specificity for antigen rapidly relocate to the T/B-cell interface of lymphoid tissues, where they are stimulated to proliferate through interactions with T helper (Th) cells and appropriate cytokines. The cellular outcome of the resultant daughter cells is thought to bifurcate down one of two pathways. Cells either terminally differentiate into post-mitotic, short-lived, antibody-producing plasma cells or infiltrate germinal centres of lymphoid follicles or other comparable structures and undergo massive and rapid clonal expansion. During this expansion, B cells can be subject to somatic hypermutation (SHM) in
© 2010 The Authors. Immunology © 2010 Blackwell Publishing Ltd, Immunology, 132, 240–255
which point mutations are introduced into variable (V) regions of rearranged immunoglobulin DNA. This targeted mutation is thought to be largely responsible for generating subsets of B cells with slightly different affinities for the antigen. In mice and humans, the mutation rate in V regions is estimated to be quite high, at 10⁻³ per base per generation, that is, 10⁶-fold higher than the rate of background mutations in DNA.2,3 In mammals, mutations in rearranged immunoglobulin gene segments have been found to cluster in CDRs of VH and VL, which together encode the regions of the antibody most likely to influence affinity for antigen. Following hypermutation, B cells encoding receptors with greater specificity for antigen can undergo positive selection4 through a process referred to as affinity maturation. Selected B cells may further differentiate into either long-lived plasma cells or memory B cells, which contribute to enhanced antibody responses upon reinfection by a pathogen.

At present, bony fishes represent one of the most ancestral groups of animals known to produce antibodies for which high-quality genome sequences have become available. The availability of genomic sequences has recently facilitated genome-wide annotations of the germline segments available for V(D)J rearrangements. Prior to such annotations, the repertoire of immunoglobulin segments available for rearrangements was often unknown, making alignment of expressed transcripts with concordant genomic segments difficult. Recently, using a full annotation of the zebrafish (Danio rerio) IgH locus,5 Weinstein et al.6 analysed an expressed repertoire of immunoglobulin H (IgH) gene segments in zebrafish to determine VH family usage. Previous work in our laboratory7 coupled genomic annotation with expression data to reveal VJ-C expression of IgL loci from five different chromosomes in zebrafish.
In the present study, we constructed an enriched cDNA library of zebrafish IgL in order to generate a sufficient quantity of VJ-C transcripts with which to ascertain potential nucleotide targets of somatic hypermutation. Collectively, the data presented herein reveal several patterns of mutation in the IgL of zebrafish and extend understanding of the processes underlying the generation of antibody diversity in this emerging immunological model.

[Figure 1 labels. BAC accession numbers: BX511206, BX649562, BX571825 (BAC zK158E13), BX914202, CT009671, CR384077, BX640456. IgL gene segments: V1, J1C1, V2, V3P, J2C2P, V4, V5, J3C3, V6P, J4C4, V7, J5C5, V8P.]
Materials and methods

Targeted IgL loci
Rearrangements from concordant immunoglobulin segments from BAC zK158E13, depicted in Fig. 1, were targeted for this study. Our decision to target exons that could be anchored to this BAC was based on the lack of gaps and the fact that these IgL loci were isolated in a single region on chromosome 19, far from other potential IgLs. Our previous per cent matrix and tree analyses7 showed that zebrafish IgLs confined to a single chromosomal region group together, indicating that loci of a single chromosomal region are more likely to be similar to one another than to those found in other parts of the genome. By targeting a single VL, it was predicted that sufficient numbers of unique transcripts could be obtained to facilitate identification both of potential nucleotides targeted for mutation and of groups of nucleotides comprising mutational hotspot motifs. In addition, by targeting a single VL, each VJ-C transcript could be aligned to both targeted and neighbouring VL to ensure that concordant assignments were made and that deviations ascribed to mutations were not alterations of closely related gene segments.
The zebrafish V7 locus

Initial polymerase chain reaction (PCR) studies (data not shown) revealed that primers in V1 and V7 leader sequences, in combination with primers to C1, C2, C3, C4 or C5 C regions, more readily amplified VJ-C sequences from a panel of cDNA derived from zebrafish of various ages. Because the V1 heptamer (CACAGTA) deviates at the seventh base while the V7 heptamer and nonamer are canonical, we selected V7 as the target locus in which to analyse potential mutational events in adult zebrafish. Our decision to target an IgL locus with canonical recombination signal sequences (RSS) was made on the assumption that such a locus would be likely to yield the highest number of transcripts. This assumption is supported by previous work by Ramsden and Wu,8 who found that synthetic substrates of mouse origin showed favoured expression of V segments flanked by RSS that more closely resemble canonical heptamer (CACAGTG) and nonamer (ACAAACC) sequences. By focusing on a single rearranged IgL gene (V7) and constructing an enriched cDNA library from a single individual, direct comparison of sequences could be utilized to discern mutational hotspots, and potential lineages of clonal expansions could be examined.

Figure 1. Targeted zebrafish immunoglobulin L (IgL) germline reference sequence. The existing genomic annotation of zebrafish IgL was extended by assembling overlapping BAC inserts. BACs with overlapping end reads were prioritized for sequencing at the Sanger Center (http://www.sanger.ac.uk/Projects/D_rerio). The 923-kb contiguous chromosomal tiling path was manually assembled using the Artemis Annotation Software package. The IgL gene segments targeted are clustered in a single region on zebrafish chromosome 19. Immunoglobulin segments at this locus are divergent, enabling alignment of cDNA sequences with germline sequences to determine patterns of somatic hypermutation (SHM). Each BAC is designated by its corresponding NCBI accession number and drawn approximately to scale, while the IgL locus is expanded with exon sizes exaggerated.
[Figure 1 labels: BX927234; 500 kb; 923 kb; 100 kb.]

Animals and RNA extraction

Zebrafish embryos (Tübingen line) were obtained from the Zebrafish International Resource Center (Eugene, OR) and raised under standard conditions9 to establish laboratory equilibrium of normal immune function. The Tübingen line of fish was chosen as reference genomic segments from the BAC zK158E13 clone were derived from fish of the Tübingen strain. A single adult zebrafish (2 years of age) was anaesthetized with MS222 (Sigma Chemicals, St. Louis, MO), from which organs were harvested. Upon removal, the haematopoietic tissues (pronephros, mesonephros and spleen) were pooled, snap-frozen in liquid N2, and held at −80°C prior to RNA extraction.

cDNA synthesis, library construction and cloning of VJ-C rearrangements

Total RNA was extracted with Trizol (Life Technologies, Carlsbad, CA) and poly(A)-enriched mRNA was isolated from total RNA using the Micro-Fast Track mRNA Isolation Kit (Invitrogen, Carlsbad, CA). Poly(A) mRNA was reverse-transcribed into cDNA according to the manufacturer's protocol using an RLM-RACE kit (Ambion, Carlsbad, CA). Briefly, first-strand cDNA synthesis was initiated with an oligo-dT adapter primer. cDNA was then subjected to PCR using forward primers that span the V7 leader/exon junction of BAC zK158E13 and reverse primers anchored to the adapter sequence added using the RACE kit. One-sided nested PCR was then carried out using V7 primers and reverse primers designed to amplify cDNA corresponding to conserved regions of C1, C2, C3, C4 or C5 identified on BAC clone zK158E13 (primers listed in Table 1). Products were run on agarose gels, bands of appropriate sizes were excised and purified using a QIAquick Gel Purification kit (Qiagen, Valencia, CA), and fragments were cloned into TOPO T/A pCR2.1 cloning vectors (Invitrogen) and transformed into TOP10 cells (Invitrogen). Colonies were picked by blue/white screening, expanded, and maintained in agar stabs. In total, plasmids from > 250 colonies were purified (Qiagen miniprep) and EcoRI (New England Biolabs, Beverly, MA) restriction digests were performed to identify clones with inserts.

DNA sequencing

Clones with inserts were sequenced bi-directionally using universal M13 forward and M13 reverse primers at the Clemson University Genomics Institute (Clemson, SC). Plasmid vectors and PCR primers were trimmed from sequences, and overall sequence quality and automated sequencing calls were verified by inspection of each sequence chromatogram. In total, 220 VJ-C clones were identified as having high-quality forward and reverse complement sequencing reads.
Table 1. Primers and polymerase chain reaction (PCR) or reverse transcription conditions

Targeted transcript | Primer
V7 | 5′-TGACTGTAGTGACTCAGAGTCC-3′
C1/C2/C3/C4/C5 | 5′-GCTCAGGCTGCTGCTCCAGC-3′
C5 | 5′-TGTACAGTCCATCCTC-3′
EF1α | FWD: 5′-CCTGGTGACAACGTTGGCTT-3′; RVS: 5′-GAACGGTGTGATTGAGGGAA-3′
Random hexamers | Invitrogen (product no. 48190-011)
Oligo-dT(20-VN)1 | Invitrogen (product no. 12577-011)
Oligo-dT adapter | 5′-GCGAGCACAGAATTAATACGACT-3′
3′ RACE adapter | 5′-GCGAGCACAGAATAATACGACTCACTATAGG(dT)-3′

Conditions (in the order given in the table):
5 min at 94°C; 30 cycles (30 seconds at 95°C, 30 seconds at 50–55°C and 60 seconds at 72°C); 10 min at 72°C
4 min at 94°C; 30 cycles (30 seconds at 95°C, 30 seconds at 56°C, 60 seconds at 72°C and 10 min at 72°C); 10 min at 72°C
10 min at 25°C and 30 min at 42°C
30 min at 42°C
1 hr at 42°C
3 min at 94°C; 35 cycles (30 seconds at 94°C, 30 seconds at 60°C and 30 seconds at 72°C); 7 min at 72°C

1 Oligo-dT(20-VN) is a mixture of 12 primers, each a string of 20 dT residues followed by two additional variable nucleotides (VNs). The VN anchor targets primer annealing at the 5′ end of the poly(A) tail. EF1α, elongation factor 1α; RACE, rapid amplification of cDNA ends.
Alignment of expressed VJ-C sequences to genomic regions
The 220 VJ-C sequences were compared against the non-redundant nucleotide database at the National Center for Biotechnology Information (NCBI) using the megaBLAST algorithm. Sequences were subsequently compared against VL, JL and CL identified7 in zebrafish using the Matrix Global Alignment Tool.10 All of the 220 VJ-C clones had highest identity to the targeted V7 gene segment, with at least 95% identity. The stringent 95% requirement of IMGT/V-QUEST11 was employed, as the existence of additional VL cannot be ruled out from the genome. Given that the per cent variability in nucleotide sequences of identified zebrafish VL ranges from 43 to 93% overall, with VL on Chr 19 (V1–V8) ranging from 49.5 to 83.6%, a 95% criterion is suitably rigorous. The resultant 220 VJ-C sequences were aligned to germline IgL segments using CLUSTALW,12 and CDRs and frameworks (FRs) were defined using the rules of Kabat.13 In total, from the 220 VJ-C clones, 55 unique sequences containing a total of 125 mutations from concordant germline immunoglobulin segments were identified. Because any somatic mutation could in theory be carried during the clonal expansion of a single B cell or reflect amplification of the same transcript by reverse transcriptase (RT)-PCR, identical VJ-C sequences were deemed to represent a single B-cell population and were therefore counted only once in the mutational analyses. In addition, as mutations can be additive if resultant B cells are derived from a common founder, the 55 unique VJ-C sequences were scored both for mutations exclusive to that clone and with all mutations included. Unique VJ-C sequences were submitted to NCBI (accession numbers in Table 2).

in the cDNA library construction. Products were run out on agarose gels, bands were purified, amplicons were cloned into pCR2.1 cloning vectors, and plasmids were transformed into TOP10 cells (Invitrogen). Twelve subclones were randomly selected for bi-directional sequencing and no mismatches were identified in 11 280 resultant bases. Thus, the PCR amplification error rate can be considered negligible and few, if any, of the base pair changes found in the VJ-C cDNA clones warrant being ascribed to PCR or sequencing errors.
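As a rough illustration of why zero mismatches in 11 280 bases supports a negligible error rate, the sketch below computes a one-sided upper confidence bound on the per-base error rate. This calculation is not part of the original methods; the 95% confidence level and the function name are assumptions made for the example.

```python
def zero_event_upper_bound(n_bases: int, confidence: float = 0.95) -> float:
    """Upper bound on a per-base error rate when 0 errors are seen in n_bases.

    Solves (1 - p) ** n_bases = 1 - confidence for p, i.e. the largest rate
    that would still produce zero errors with probability 1 - confidence.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_bases)


bases_checked = 11_280            # bases sequenced in the fidelity assay
upper = zero_event_upper_bound(bases_checked)

# The familiar "rule of three" approximation (3 / n) gives almost the same value.
rule_of_three = 3.0 / bases_checked

print(f"95% upper bound on error rate: {upper:.2e} per base")
print(f"rule-of-three approximation:   {rule_of_three:.2e} per base")
```

Under these assumptions the bound is below 3 errors per 10 000 bases, which is consistent with treating amplification and sequencing errors as a minor contributor to the observed changes.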
Calculation of mutability indexes

Mutability indexes were calculated as described by Shapiro et al.14 Briefly, a mutability index is a measure of observed/expected mutations. The observed number of mutations within a mono-, di- or trinucleotide (the target) is divided by the number of times the specific base or group of adjacent bases would be expected to be mutated by a mechanism without target bias. A mutability index score of 1.0 is assumed to represent random (unbiased) mutation, whereas higher scores imply that the specific bases are targeted by mutation or selected over other mutations during B-cell development. Mutability indexes were calculated for each of the mononucleotides (A, T, C and G), the 16 dinucleotide combinations, and the 64 possible trinucleotides of the germline VL region sequenced. The SMS DNA Software Suite15 and Microsoft Excel were used for calculations and database management. Initially, the number of times each target occurred in the germline VL region was determined. For dinucleotide analysis, one extra nucleotide from adjacent germline sequence at the end of each region was included, and two extra germline nucleotides were included for trinucleotides. The total number of each specific target present in the germline VL was divided by the number of all targets to generate relative target frequencies. The use of relative target frequencies was necessary to normalize the data, as each base or group of adjacent bases does not occur in equal proportions over the regions analysed. Finally, relative frequencies were multiplied by the total number of mutations to yield an expected mutation number. Observed numbers of mutations were divided by expected numbers for each target to obtain mutability index scores.
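The observed/expected calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' SMS/Excel workflow: the function names are hypothetical, the germline string and mutation counts are toy data, and the caller is assumed to pass the region together with the extra flanking nucleotide(s) mentioned in the text.

```python
from collections import Counter


def count_targets(germline: str, k: int) -> Counter:
    """Count every overlapping k-nucleotide target in the germline sequence."""
    return Counter(germline[i:i + k] for i in range(len(germline) - k + 1))


def mutability_indexes(germline: str, observed: dict, k: int) -> dict:
    """Return observed/expected mutation ratios for each k-nucleotide target.

    observed maps a target (e.g. "GC") to the number of mutations scored in it.
    """
    counts = count_targets(germline, k)
    total_targets = sum(counts.values())
    total_mutations = sum(observed.values())
    indexes = {}
    for target, n in counts.items():
        relative_frequency = n / total_targets           # normalizes for composition
        expected = relative_frequency * total_mutations  # unbiased expectation
        indexes[target] = observed.get(target, 0) / expected
    return indexes


# Toy data (hypothetical): a 12-base "germline" and 5 scored mutations.
germline = "AGCTAGCTTTGA"
observed = {"GC": 4, "TT": 1}
idx = mutability_indexes(germline, observed, k=2)

for target in sorted(idx):
    print(target, round(idx[target], 2))
```

In the toy data, GC accounts for 2 of 11 dinucleotide targets but 4 of 5 mutations, so its index is well above 1.0, whereas targets with no observed mutations score 0.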
Tests for genomic contamination

cDNA utilized to construct cDNA libraries was subjected to PCR-based amplification using primers corresponding to the zebrafish elongation factor 1α (EF1α) housekeeping gene (Table 1). These EF1α primers amplify segments separated by an intronic region, thus facilitating detection of contaminating genomic DNA in cDNA preparations through the presence of larger EF1α bands on agarose gels. In each of the cDNA preparations used to generate VJ-C cDNA libraries, DNA contamination was not detected.
Statistical analyses
Chi-squared analyses of mono-, di- and trinucleotide mutability indexes were carried out by contrasting observed mutational frequencies with their expected (non-biased) mutational frequencies. For mutability indexes, P values < 0.01 were considered statistically significant. In cases where mutations could be assigned to different di- or trinucleotide targets, Bonferroni corrections were applied. Chi-squared analyses were also performed to
Taq polymerase fidelity assay

To calculate potential error rates arising from amplification or sequencing, a set of experiments was undertaken to generate subclones from one of our resultant IgL rearranged VJ-C clones (accession no. EU795303). This isolated plasmid was nicked and used as the template in PCR using gene-specific primers identical to those used

Table 2. Distribution of immunoglobulin L (IgL) mutations in zebrafish VJ-C cDNA clones

Accession numbers listed with these rows: EU795304, EU795305, EU795306, EU795308, EU795309, EU795310, EU795311, EU795313, EU795314, EU795315, EU795316, EU795318, EU795319, EU795320, EU795321, EU795322, EU795323, EU795324, EU795326, EU795328, EU795329, EU795330, EU797178, EU797179, EU797180, EU797181, EU797183, EU797184, EU797185, EU797186, EU821496, EU821497, EU821498, EU821500, EU821501, EU821502, EU821503, EU821504, EU821505, EU821507, EU821508, EU821510, EU821511, EU821512, EU821513, EU821518, EU821519

Region1 | Mutation2,3 | Type4 | Codon5
VL-CDR1 (50) | AGT→AGC | T | Silent (Ser)
VL-CDR2 (118) | GGA→GAA | T | Gly→Glu
VL-CDR2 (158) | AGT→AGG | V | Ser→Arg
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
VL-CDR2 (137) | TCT→TCC | T | Silent (Ser)
VL-FR3 (186) | CTG→TTG | T | Silent (Leu)
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
CL (126) | GGC→AGC | T | Gly→Ser
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
CL (112) | C, insertion | NA | Frameshift
CL (154) | C, insertion | NA | Frameshift
VL-CDR1 (65) | GAC→GAT | T | Silent (Asp)
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
VL-FR3 (196) | CGT→CAT | T | Arg→His
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
CL (138) | C, insertion | NA | Frameshift
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
CL (17) | GGC→GC | NA | Frameshift
VL-FR2 (87,88) | CCT→CTG | T,V | Pro→Leu
VL-FR2 (92) | GGA→GAA | T | Gly→Glu
JL (18) | T, insertion | NA | Frameshift
CL (23) | GCG→GCA | T | Silent (Ala)
CL (34) | CCC→CTC | T | Pro→Leu
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
CL (22) | C, insertion | NA | Frameshift
VL-FR2 (80) | TTG→TTA | T | Silent (Leu)
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
VL-CDR2 (166) | GGA→GAA | T | Gly→Glu
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
VL-FR2 (82) | CAG→CGG | T | Gln→Arg
VL-FR2 (93) | AAA→GAA | T | Lys→Glu
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
CL (54) | AAG→GAG | T | Asp→Gly
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
CL (17) | GGC→GC | NA | ORF intact
CL (22) | C, insertion | NA | (same entry as previous row)
JL (14) | GGC→TGC | V | Gly→Cys
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
JL (24) | C, insertion | NA | Frameshift
VL-FR1 (14) | GCA→GCC | V* | Silent (Ala)
VL-FR1 (15) | GGG→AGG | T | Gly→Arg
VL-FR1 (19) | GAT→GTT | V | Asp→Val
VL-FR1 (21) | TCT→CCT | T | Ser→Pro
VL-FR1 (27) | TCT→CCT | T | Ser→Pro
VL-FR1 (30,31) | ATC→TCC | V,T | Ile→Ser
VL-FR1 (33) | AGC→GGC | T | Ser→Gly
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
CL (154) | C, insertion | NA | Frameshift
CL (63) | GCT→ACT | T | Ala→Thr
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
CL (95) | TTT→TAC | T | Silent (Phe)

Table 2. (Continued)

Region1 | Mutation2,3 | Type4 | Codon5
VL-FR3 (217) | GCA→GAA | V | Ala→Glu
VL-FR3 (181) | TTT→TAT | V | Phe→Tyr
VL-FR2 (98) | GCT→GCC | T* | Silent (Ala)
VL-CDR1 (41) | ACT→ACC | T* | Silent (Thr)
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
JL (30) | GTT→GAT | V | Val→Asp
VL-CDR1 (58) | GGT→GCT | V | Gly→Ala
VL-FR2 (103) | CCT→CCC | T* | Silent (Pro)
VL-CDR3 (261) | AGT→GGT | T | Ser→Gly
VL-FR2 (82) | CAG→CGG | T | Gln→Arg
JL (11,12) | GGA→AAA | T,T | Gly→Lys
VL-CDR2 (137) | TCT→TCC | T* | Silent (Ser)
VL-FR3 (208) | CCT→CAT | V | Pro→His
VL-FR3 (212) | GAA→GA | NA | Frameshift
VL-FR3 (214) | GAT→GCT | V | Asp→Ala
CL (122) | AAG→AAA | T | Silent (Lys)
VL-CDR2 (124) | AGT→AAT | T | Ser→Asn
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
JL (30) | GTT→GAT | V | Val→Asp
VL-FR3 (218) | GCA→GCG | T* | Silent (Ala)
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
VL-FR2 (97) | GCT→GTT | T | Ala→Val
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
VL-CDR1 (71) | AGC→AGT | T | Silent (Ser)
CL (78) | GTG→CTG | V | Val→Leu
VL-CDR1 (64) | GAC→GGC | T | Asp→Gly
VL-CDR2 (176) | CCT→CCC | T* | Silent (Pro)
VL-CDR2 (172) | GAA→GGA | T | Gln→Gly
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
JL (30) | GTT→GAT | V | Val→Asp
VL-CDR1 (65) | GAC→GAT | T | Silent (Asp)
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
CL (43) | GAT→GGT | T | Asp→Gly
VL-FR1 (34) | AGC→ACC | V | Ser→Thr
VL-CDR1 (56) | GTT→GTC | T* | Silent (Val)
VL-CDR1 (71) | AGC→AGT | T | Silent (Ser)
VL-CDR1 (44) | GGG→GGA | T* | Silent (Gly)
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
VL-FR3 (180) | TTT→GTT | V | Phe→Val
VL-FR1 (34) | AGC→ACC | V | Ser→Thr
VL-CDR1 (71) | AGC→AGT | T | Silent (Ser)
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
CL (78) | GTG→ATG | T | Val→Ser
VL-CDR3 (235) | ATG→ACG | T | Met→Thr
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
JL (30) | GTT→GAT | V | Val→Asp
VL-FR1 (23) | TCT→TCC | T* | Silent (Ser)
VL-FR2 (97) | GCT→GTT | T | Ala→Val
VL-CDR2 (129) | CTT→TTT | T | Leu→Phe
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
VL-FR1 (27) | TCT→CCT | T | Ser→Pro
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
CL (28) | CTT→CGT | V | Leu→Arg
Table 2. (Continued)

Accession number listed with these rows: EU821520

Region1 | Mutation2,3 | Type4 | Codon5
VL-FR1 (32) | ATC→ATT | T | Silent (Ile)
VL-FR3 (218) | GCA→GCG | T* | Silent (Ala)
VL-FR3 (222) | GTT→TTT | V | Val→Phe
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
VL-CDR1 (71) | AGC→AGT | T | Silent (Ser)
VL-FR3 (181) | TTT→TCT | T | Phe→Ser
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
JL (29) | GTT→ATT | T | Val→Ile
VL-CDR2 (124) | AGT→AAT | T | Ser→Asn
CL (95) | TTT→TT | NA | Frameshift
CL (152) | T, insertion | NA | Frameshift
VL-CDR2 (125) | AGT→AGC | T | Silent (Ser)
VL-FR3 (196,197) | CGT→CAC | T,T | Arg→His
VL-CDR3 (259) | TGC→TAC | T | Cys→Tyr
Table 2 (continued), accession numbers: EU821521, EU821522, EU821523, EU825202, EU825203, EU825204, EU821516***

1 VL, variable region; JL, joining region; FR, framework region; CDR, complementarity-determining region. 2 Mutated nucleotides are underlined, and depicted within triplet bases of codons. 3 Bases shaded correspond to targeting of the G and C nucleotides of DGYW/WRCH** AID hotspot motifs. 4 Transition (T) or transversion (V) base mutations. 5 Amino acids resulting in a change in side-chain polarity are shown in italics. *Designates mutation in the wobble position of a degenerate codon. For example, in the third position of glycine codons (GGA, GGC, GGG and GGT) all nucleotide substitutions are synonymous (do not change the amino acid). **DGYW (A/G/T, G, C/T, A/T); WRCH (A/T, G/A, C, T/A/C). ***Clone with no mutations from germline gene segments.

evaluate mutational frequencies and distributions of each WRCH/DGYW motif as reported within the results. Statistical analyses of antigen selection pressure on FR and CDR regions were carried out using the multinomial distribution model of Lossos et al.,16 which is presently available as a JAVA applet at http://www-stat.stanford.edu/immunoglobulin. For multinomial distributions, an excess of CDR replacements or scarcity of FR replacements was judged significant at P < 0.05.

Results

Somatic mutation occurs within VL, JL and CL encoded regions of zebrafish IgL

Alignments of 207 682 VJ-C encoded nucleotides with concordant germline gene segments revealed 125 mutations over 55 unique VJ-C cDNA sequences (Table 2). The majority of mutations were found in VL; however, the JL and CL regions were also found to have mutations. The percentage of total mutations in the VL, JL and CL regions was 75, 8 and 17%, respectively. When weighted as the number of mutations per base sequenced in each gene segment region, the percentages became 0.65, 0.51 and 0.22. These results show that mutations are concentrated in the VL regions in terms of both higher overall numbers and higher density when compared with the JL and CL regions.

Distribution of VL, JL and CL mutations

The distribution of somatic mutations in VL, JL and CL regions is depicted in Fig. 2. In this figure, mutation frequencies are plotted against position number, with position 1 being the initial base of the first framework region of the V7 gene segment. Mutation frequencies are illustrated for 20 base pair intervals along the regions sequenced. The relative locations of CDR regions are depicted by lines at the top of the diagram and the approximate locations of VL, JL and CL are shown as boxes under the position number. Mutations were found in all regions, with the highest incidence being in the VL CDR3 region. The JL region, although small, exhibited a slightly higher overall mutation frequency than the collective region spanned by CL. Within the CL region, the mutation frequency decreased slightly with distance from VL, with no mutations found in the last two segmental groupings of CL.

Absence of N and P addition

Interestingly, we did not find evidence of either N or P addition at the CDR3/JL junctions in any of the cDNA sequences analysed. The assignment of nucleotides to the CDR3 and JL regions at the CDR3/JL junction was straightforward as, for each of the 55 sequences, bases at this junction could be assigned to the 3′ and 5′ ends of concordant germline sequences. We did, however, find a single clone (accession no. EU797180) for which three bases from the 3′ end of the germline VL and three bases from the 5′ end of the germline JL were not present. This six-base difference can probably be attributed to imprecision in RAG-mediated double-strand breakage or exonuclease activity prior to joining. These six bases were therefore not counted in the somatic mutation frequency analysis, as they would have been similarly absent in the original B-cell rearrangement and thus not subject to SHM. The lack of N and P addition and the almost uniform size of the coding junctions were surprising and suggest that recombination site diversity might be somewhat limited in zebrafish IgL.
Figure 2. Distribution of mutation frequency and activation-induced cytidine deaminase (AID) WRCH/DGYW hotspot motifs in targeted zebrafish immunoglobulin L (IgL). The combined frequency of mutations (primary vertical axis) per base sequenced for 55 unique VJ-C cDNA sequences plotted against position number (horizontal axis) reveals variability in VL, whereas mutations in CL were infrequent. The additive number of WRCH/DGYW hotspot motifs (secondary vertical axis) in the targeted germline IgL when plotted against position number shows an apparent trend of increasing mutation at AID hotspots. Within mammals, WRCH/DGYW motifs have been found to be the principal hotspot for AID-induced G:U lesions in rearranged immunoglobulin genes during somatic hypermutation. The bi-directional arrows show locations of palindromic WRCH/DGYW hotspot motifs, which coincide with the highest mutation frequencies of the zebrafish IgL. The lower diagram depicts locations of VL, JL and CL regions with respect to overall distribution of the VJ-C sequence, and the complementarity-determining regions (CDRs) are depicted in the graph plot area.
[Figure 2 plot. Primary vertical axis: mutation frequency (%), 0.0–3.5; secondary vertical axis: number of AID hotspot motifs, 0–8; CDR1, CDR2 and CDR3 marked; legend: mutation frequency, WRCH/DGYW, palindromic AID hotspot.]

Mutational bias towards single base transitions in zebrafish VL

In total, 94 VL mutations were identified among the 55 unique VJ-C sequences (Table 2). Of these, 93 were base substitutions and one was a single base deletion (EU797185). As insertions or deletions in non-triplet increments result in frameshifts, it is not surprising that their occurrence was rare in the VJ-C cDNA clones. B cells with non-productive VL caused by frameshifts would inherently lack a functional B cell receptor (BCR) and would probably be selected against. Of the 93 VL single base substitutions, base transitions (79; 85%) were more abundant than transversions (14; 15%). The transition to transversion ratio of VL mutations was 5.64, which is considerably higher than the theoretical 1 : 2 ratio (P < 0.01) expected if random bases were incorporated during substitutions. These results are consistent with analyses in mammalian models demonstrating a strong tendency for transitions by SHM.17 Notably, 24 of the 94 VL mutations occurred in codon wobble positions (Table 2) and, of these, 96% (23/24) resulted in silent mutations. This ratio (23 : 1) is also higher than expected assuming random base substitution. These results indicate that transitions and silent mutation accumulation are prevalent in the zebrafish IgL analysed in this study.
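The comparison of the observed transition/transversion counts with the theoretical 1 : 2 ratio can be illustrated with a chi-squared goodness-of-fit test. The sketch below uses the counts reported above (79 transitions, 14 transversions); the choice of a one-degree-of-freedom test without continuity correction is an assumption, since the exact procedure behind the reported P value is not spelled out here. The 1 : 2 expectation follows from each base having one possible transition and two possible transversions.

```python
import math


def chi_square_gof(observed, expected):
    """Pearson chi-squared statistic for observed vs expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_one_df(chi2: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


transitions, transversions = 79, 14       # counts reported for VL substitutions
total = transitions + transversions

# Random substitution: 1 transition and 2 transversions are possible per base.
expected = [total / 3.0, 2.0 * total / 3.0]

ratio = transitions / transversions
chi2 = chi_square_gof([transitions, transversions], expected)
p = p_value_one_df(chi2)

print(f"transition:transversion ratio = {ratio:.2f}")
print(f"chi-squared = {chi2:.2f}, P = {p:.1e}")
```

With these counts the statistic is far above the 1% critical value for one degree of freedom, in line with the reported P < 0.01.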
Mutations are concentrated at WRCH/DGYW hotspot motifs

Overall, G-to-A exchanges were the most common mutation observed in the zebrafish VL (41; 44%). In addition, C-to-T (11; 12%) exchanges were also present, albeit in lower numbers (Table 2). Either of these types of mutations could occur through fixation by AID-targeted deamination of deoxycytidine to deoxyuracil. For example, a C deaminated to a U on the coding strand could prompt incorporation of an A in the daughter strand at the complementary position during replication. In a subsequent round of DNA replication, if left unrepaired, the A in the daughter strand results in a fixed C-to-T mutation in the coding strand. Similarly, G-to-A exchanges can result from C deamination on the non-coding strand and two rounds of replication. As illustrated in Fig. 3, the majority of the cytidine mutations observed in VL (8/14) were found at the widely accepted WRCY (refined WRCH18) AID mutation hotspot motif. Even more abundant was cytidine mutation (37/45) on the non-coding strand (reverse complement of the hotspot motif RGYW resulting in G mutations on the coding strand). These findings suggest that AID hotspot motifs are targets for mutations in zebrafish and resultant mutations appear biased to fixation in daughter strands by U to A base pairing during DNA replication.

Figure 3. Base exchanges are biased to replacement (R) mutations and nucleotide targeting at activation-induced cytidine deaminase (AID) hotspot motifs (WRCH/DGYW). The majority of the base exchanges observed resulted in amino acid changes [replacements (R); black areas of bars]. Silent mutations (S; white areas), while found in all four bases, were proportionally higher outside of WRCH/DGYW hotspot motifs. These data suggest that neutral mutations may be more prone to accumulate in bases outside of hotspots whereas replacement mutations are favoured at AID hotspot motifs.
[Figure 3 plot. Vertical axis: % R and S mutations, 0–45. Bars: cytidine mutations (AID hotspot (WRCH); outside AID motif), guanosine mutations (AID hotspot (DGYW); outside AID motif), adenosine mutations (outside AID motif), thymidine mutations (outside AID motif). Legend: silent, replacement.]

analysis showed that 89% (41/46) of the germline C/G targets in WRCH/DGYW motifs were in the first or second codon position. Given that this percentage is higher than the 66.6% expected if WRCH/DGYW targets were distributed equally among the three codon positions, it appears that there is a propensity for replacement mutations at these motifs. There was, however, no statistically significant difference between FR and CDR regions with respect to AID targets at wobble positions (P < 0.62), meaning that, unless they are subject to other selective pressures, the mutations induced at AID hotspots in both FR and CDR regions are more prone to be replacements. Of the 46 mutations in WRCH/DGYW hotspots, 89% (41/46) occurred at non-wobble positions and, of these, 100% (41/41) resulted in replacements. These numbers are considerably higher than the 57% (27/47) replacement rate outside AID hotspot motifs (Fig. 3). Thus, the data suggest that non-random mutations are favoured, with a strong bias towards replacements in WRCH/DGYW motifs.
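To make the silent/replacement distinction concrete, the sketch below classifies a single-codon change using the standard genetic code and reports whether the change falls in the third (wobble) codon position. It is an illustrative helper, not the authors' scoring procedure; the function name is hypothetical and the two example codon changes are taken from Table 2.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(germline: str, mutated: str):
    """Return (kind, germline_aa, mutated_aa, wobble_only) for a codon change.

    kind is "silent" when the amino acid is unchanged, otherwise "replacement".
    wobble_only is True when the only changed base is the third position.
    """
    aa_before = CODON_TABLE[germline.upper()]
    aa_after = CODON_TABLE[mutated.upper()]
    changed = [i for i in range(3) if germline[i].upper() != mutated[i].upper()]
    kind = "silent" if aa_before == aa_after else "replacement"
    return kind, aa_before, aa_after, changed == [2]


# Two codon changes listed in Table 2.
cdr3_change = classify_codon_change("TGC", "TAC")   # VL-CDR3 (259)
fr3_change = classify_codon_change("GCA", "GCG")    # VL-FR3 (218)

print(cdr3_change)
print(fr3_change)
```

The CDR3 change alters the second codon position and replaces Cys with Tyr, whereas the FR3 change is confined to the wobble position and leaves Ala unchanged.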
Uracil glycosylase DNA repair appears to be limited at AID hotspots

In total, 49% (46/93) of all VL base exchanges and 76% (45/59) of all C/G mutations were found at WRCH/ DGYW hotspot motifs. In addition, 89% (40/45) of the C/G mutations at these hotspots were G-to-A and C-to-T base exchanges. Both this hotspot targeting and the high frequency of these types of base exchanges suggest that the mutations observed at the WRCH/DGYW hotspot motifs positions may be attributable to AID activity. Moreover, given that AID induces C deamination to uracil in DNA, which, if left unrepaired, becomes xed as G-to-A or C-to-T mutations following replication, it appears that uracil DNA glycosylase repair may be limited at AID hotspots in zebrash. The remaining base exchanges (5/45) identied at these hotspots, however, suggest that base excision repair may not be entirely absent. For example, in these ve cases (one G-to-T, three G-to-C, and one C-to-A), these mutations could have arisen if, after C deamination, uracil DNA glycosylase initially removed the corresponding deoxyuracil, thus creating an abasic site. Subsequent endonuclease activity at the lesion followed by error-prone DNA polymerase repair could in theory have been responsible for these G-to-T, G-to-C, and C-to-A mutations. Although deoxyuracil base excision repair could in theory generate a wider spectrum of base changes than C-to-T and G-to-A enabled by AID alone, it appears that diversication by this system is somewhat limited at WRCH/DGYW hotspot motifs in the zebrash model.
WRCH/DGYW mutations are highly prevalent in CDR3 regions

Using a pattern search algorithm (SMS), a total of 37 WRCH/RGYW motifs were identied in the V7 germline DNA (Fig. 4a). This number of AID hotspots is an overrepresentation (P < 001) of this motif from a distribution of the motif expected in a random DNA sequence of this length. As shown in Fig. 4b, the proportion of bases present within WRCH/DGYW motifs in the different FRs and CDRs of VL ranged from 39% (CDR3) to 48% (FR3). Despite this relatively narrow and consistent distribution range, the majority of the mutations observed (38/ 46) at WRCH/DGYW motifs were found in CDR regions (Fig. 4c), most notably CDR3 (31/46). These ndings suggest the possibility that either WRCH/DGYW hotspot motifs in CDR1 and CDR3 regions are preferentially targeted or that mutations in these CDRs are disproportionately selected during B-cell maturation in zebrash.
Replacement mutations are favoured within and outside AID hotspot motifs
Overall, 74% (68/93) of the VL base exchanges resulted in replacement mutations. Replacement (R) over silent (S) mutations were favoured in both the FR (26R; 11S) and CDR (42R; 14S) regions. When viewed in the context of WRCH/DGYW motifs, 89% (41/46) of the mutations at AID hotspots were replacements. To discern whether C/G targets of the AID motifs were more likely to be situated in either the rst or the second codon position and hence by location be predisposed towards replacements, WRCH/DGYW positioning within the open reading frame of the targeted germline VL was determined. This
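The motif search and codon-position tally described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the SMS tool used in the study: the function names, the toy sequence, and the assumption that the reading frame starts at position 0 are all hypothetical. It expands the IUPAC codes (W = A/T, R = A/G, Y = C/T, H = A/C/T, D = A/G/T) into a regular expression, reports the targeted C of each WRCH and the targeted G of each DGYW, and gives the codon position of each target. If targets fell evenly across the three codon positions, two-thirds of them would be at the first or second position.

```python
import re

# IUPAC codes needed for the WRCH / DGYW hotspot motifs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "Y": "[CT]",
         "H": "[ACT]", "D": "[AGT]"}

def motif_regex(motif):
    """Compile an IUPAC motif; the lookahead lets overlapping hits be found."""
    body = "".join(IUPAC[ch] for ch in motif)
    return re.compile("(?=(" + body + "))")

def hotspot_targets(seq):
    """Return sorted (0-based position, base) pairs for hotspot C/G targets."""
    seq = seq.upper()
    targets = set()
    for m in motif_regex("WRCH").finditer(seq):
        targets.add((m.start() + 2, "C"))  # C is the third base of WRCH
    for m in motif_regex("DGYW").finditer(seq):
        targets.add((m.start() + 1, "G"))  # G is the second base of DGYW
    return sorted(targets)

def codon_position(pos, frame_start=0):
    """Codon position (1, 2 or 3) of a 0-based nucleotide position."""
    return (pos - frame_start) % 3 + 1

if __name__ == "__main__":
    toy = "AATGCATT"  # hypothetical sequence containing one TGCA palindrome
    for pos, base in hotspot_targets(toy):
        print(pos, base, "codon position", codon_position(pos))
```

Because the TGCA palindrome matches WRCH on one strand and DGYW on the other, the toy sequence yields two adjacent targets, one G and one C.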
Palindromic WRCH/DGYW motifs appear to be disproportionately targeted for mutation

To discern if specific WRCH/DGYW motifs were preferentially targeted, the distribution of mutations at each WRCH/DGYW motif was determined (Fig. 5). In total, 37 WRCH/DGYW sequence motifs were present in the targeted VL and, notably, at least one representative of each possible sequence combination was present (the number of each is shown in parentheses in Fig. 5). In addition, 12 of the motifs in this VL were AGCT or TGCA palindromes. Both AGCT and TGCA conform to WRCY and DGYW in both directions. Although WRCH/WRCH palindromes represented only 32% (12/37) of the AID hotspot motifs in the targeted VL, they accounted for 89% (41/46) of the mutations observed across all WRCH/DGYW motifs. AGCT/AGCT palindromes were found in FR1, CDR1 and FR2 (each region had one palindromic motif) with the number of mutations at each motif being 2, 4 and 2, respectively. The TGCA/TGCA palindromes were present in FR2, FR3 and CDR3 (also each having one palindromic motif) with respective mutation numbers at these motifs being 1, 1 and 31. These results suggest either preferential targeting of somatic mutation to palindromic AID hotspot motifs or an increased selection for resultant mutations at CDRs.

[Figure 4. Panels: (a) Distribution of WRCH/DGYW AID hotspot motifs; (b) Percentage of bases within each region contained in motif; (c) Distribution of observed WRCH/DGYW mutations. Regions: FR1, CDR1, FR2, CDR2, FR3, CDR3.]
Figure 4. Mutations at activation-induced cytidine deaminase (AID) hotspots are disproportionately concentrated at complementarity-determining region 3 (CDR3). (a) AID hotspot motifs (n = 37) are distributed across framework regions (FRs) and complementarity-determining regions (CDRs) of the targeted germline VL. (b) The proportion of bases contained in WRCH/DGYW motifs varies slightly between FRs and CDRs. (c) Despite different densities of WRCH/DGYW motifs (a, b), mutations within WRCH/DGYW motifs are highly biased to occurrence in CDR3.

WRCH/DGYW mutational targeting occurs on both DNA strands

Analyses of strand bias at AID hotspot motifs revealed that the majority (38/46) of the mutations at WRCH/DGYW motifs could be explained by C targeting in the template strand. The remaining (8/46) C mutations at WRCH/DGYW were within the coding strand. Although it cannot be determined if each mutation was an independent mutational event or a product of selection and clonal expansions of B-cell clones harbouring the mutation, it can be concluded that within the AID hotspot motifs both strands are targeted for mutation. Chi-squared analyses of the WRCH/DGYW mutations in both strands revealed that only TGCA and AGCT motifs were significant for G mutations in DGYW, and only TGCA and AGCT were significant for WRCH motifs (P < 0.01). These results suggest that, when compared with all AID hotspots, either palindromic AID hotspot motifs are disproportionately targeted for mutation or the initial mutations at these motifs undergo positive selection.

[Figure 5. Motif occurrences in the targeted VL and number of mutations observed at each motif.]
5'-WRCH-3'/5'-DGYW-3'    Occurrences in VL    Mutations
AACA/TGTT                2                    0
AACC/GGTT                3                    1
AACT/AGTT                5                    3
AGCA/TGCT                2                    0
AGCC/GGCT                1                    0
TACA/TGTA                3                    0
TACC/GGTA                3                    1
TACT/AGTA                3                    0
TGCC/GGCA                1                    0
TGCT/AGCA                2                    0
5'-WRCH-3'/5'-WRCH-3' (palindromic)
AGCT/AGCT                6                    8
TGCA/TGCA                6                    33
Palindromic AGCT and TGCA motifs conform to WRCY and RGYW in both directions. VL = 37 motifs; 41/46 mutations in palindromes.
Figure 5. Somatic mutations are highly concentrated in palindromic activation-induced cytidine deaminase (AID) hotspot motifs. Within mammals, WRCH/DGYW motifs are considered the principal hotspot for AID-induced cytidine deamination in rearranged immunoglobulin during somatic hypermutation (SHM) whereas palindromic WRCY/WRCY are often targeted during class switch recombination (CSR). In the zebrafish VL targeted, every WRCH/DGYW sequence was present (the number of occurrences is listed in parentheses and the mutation number within each is depicted in bold). Overall, VL mutations were disproportionately concentrated within palindromic AGCT/AGCT and TGCA/TGCA AID hotspot motifs.

Mutability indexes (MIs) reveal additional nonrandom nucleotide targeting
Mononucleotides
To determine whether additional nucleotides or combinations of adjacent nucleotides in the VL coding region were preferentially targeted for mutation, MIs were determined. Each index is a normalized measure that takes into account the fact that each base or group of bases does not occur at the same frequency over the VL region analysed. The MIs reported for mononucleotides (Table 3) show by index score that the nucleotides are preferentially mutated in the order G>C>T>A. Transversion to transition ratios at G, C, T and A were 0.10, 0.16, 0.23 and 0.44, indicating a strong preference for base exchanges resulting in transition mutations. Each of the four ratios is significantly different (P < 0.01) from the theoretical 2 : 1 transversion to transition ratio that would be expected if base exchanges were random and nondiscriminatory. Thus, mononucleotide analyses suggest that neither the nucleotides targeted nor the resultant patterns of substitution can be explained by assuming that somatic mutation events occur randomly in the zebrafish VL.

Table 3. Substitutions and mononucleotide mutability indexes in VL regions of zebrafish immunoglobulin L (IgL)

         Substitution                                         Mutability index¹
From     A           C           G           T           Observed    Expected    (observed/expected)
A        -           2 (15.4)²   9 (69.2)    2 (15.4)    13 (100)    24          0.54
C        2 (14.3)    -           0 (0)       12 (85.7)   14 (100)    16          0.87
G        41 (91.1)   3 (6.7)     -           1 (2.2)     45 (100)    26          1.73**
T        1 (4.8)     17 (81.0)   3 (14.2)    -           21 (100)    27          0.76*
Total    45          22          12          15          93          93

¹ The mutability index is the observed number of mutations in a specific nucleotide divided by the expected number of mutations given a mechanism without bias. Expected numbers were derived by multiplying the frequency of the nucleotide over the region sequenced by the total number of mutations observed. A mutability index score of 1.0 would be assumed to represent non-biased sequence-insensitive mutations. The number of VL nucleotides over the 55 VJ-C sequences was 14 355 (A = 3685; C = 2473; G = 4015; T = 4180) with 93 VL mutations.
² Numbers in parentheses are substitution percentages for the indicated nucleotides.
Chi-squared contrasts of observed and expected mutations with significant differences. *P = 0.005; **P = 0.01.
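The transversion to transition ratios quoted for each germline base can be recomputed directly from the substitution counts in Table 3. The sketch below is illustrative: the dictionary layout and function names are assumptions, and the counts are transcribed from the table. A transition is a purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) exchange, and every other exchange is a transversion.

```python
# Substitution counts transcribed from Table 3
# (outer key: germline base; inner key: substituted base).
SUBSTITUTIONS = {
    "A": {"C": 2, "G": 9, "T": 2},
    "C": {"A": 2, "G": 0, "T": 12},
    "G": {"A": 41, "C": 3, "T": 1},
    "T": {"A": 1, "C": 17, "G": 3},
}

# The single transition partner of each base; the other two are transversions.
TRANSITION_PARTNER = {"A": "G", "G": "A", "C": "T", "T": "C"}

def tv_ts_ratio(base, counts=SUBSTITUTIONS):
    """Transversion:transition ratio for mutations away from one base."""
    row = counts[base]
    transitions = row[TRANSITION_PARTNER[base]]
    transversions = sum(row.values()) - transitions
    return transversions / transitions

def overall_ts_tv(counts=SUBSTITUTIONS):
    """Transition:transversion ratio pooled over all four germline bases."""
    transitions = sum(row[TRANSITION_PARTNER[b]] for b, row in counts.items())
    total = sum(sum(row.values()) for row in counts.values())
    return transitions / (total - transitions)

if __name__ == "__main__":
    for base in "GCTA":
        print(base, round(tv_ts_ratio(base), 2))
    print("overall transition:transversion", round(overall_ts_tv(), 2))
```

The per-base values agree with the reported ratios to within rounding, and all lie far below the 2 : 1 ratio expected for random exchanges.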
Dinucleotides
Dinucleotide mutability indexes (Table 4) were calculated to determine if mutations were concentrated to specific pairs of adjacent bases. Overall, dinucleotide MIs varied considerably from the observed/expected score of 1.0, which is assumed to represent random sequence-insensitive mutation. Index scores greater than 1.0 are assumed to represent preferential targeting whereas scores less than 1.0 imply either an avoided target for mutation (a cold spot) or that mutations at these positions are selected against. Chi-squared analyses of mutations at dinucleotides showed statistically significant targeting at GC and TG combinations. GC dinucleotides have also been found to be significant targets of mutation in human19 and catfish20 VH regions. In the present study, 90% (9/10) of the GC dinucleotides in the zebrafish germline VL were encoded within WRCH/DGYW motifs. Moreover, 98% (40/41) of the mutations at GC positions were in the target C positions of AID hotspot motifs. In contrast to the high mutability score for GC (MI = 5.84), the MI for TA was 0.09 (Table 4), which would suggest that TA dinucleotides are avoided targets for mutation (cold spots) or mutations at TA are selected against. The presence of both statistically significant mutational hotspots and cold spots is also indicative of non-random nucleotide targeting in the zebrafish model.

Table 4. Dinucleotide mutability indexes in zebrafish VL

Dinucleotide    VL germline¹    No. of mutations    Expected    Mutability index² (observed/expected)
AA              17              6                   12          0.50
AC              14              6                   10          0.61
AG              19              10                  13          0.75
AT              17              7                   12          0.59
CA              12              11                  8           1.31
CC              9               2                   6           0.32
CG              2               2                   1           1.43
CT              23              17                  16          1.05
GA              22              7                   15          0.45
GC              10              41                  7           5.84** H
GG              22              6                   15          0.39
GT              19              9                   13          0.67
TA              16              1                   11          0.09** C
TC              13              8                   9           0.88
TG              30              42                  21          1.99** H
TT              17              9                   12          0.76

H, favoured somatic hypermutation (SHM) target or hotspot; C, avoided target or cold spot.
¹ Number of times the dinucleotide is present in the targeted germline VL.
² Mutability indexes were calculated as described in the legend of Table 3.
**Statistically significant by chi-squared test at P = 0.01.
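The observed/expected calculation behind Table 4 can be illustrated with the tabulated counts. This is a sketch under stated assumptions rather than the authors' script: the function name is hypothetical, and expected counts are taken as each dinucleotide's share of all germline dinucleotide occurrences multiplied by the total number of mutation tallies, following the legend of Table 3. Because the expected values here are not rounded first, results may differ from the table in the second decimal place.

```python
# (germline occurrences, number of mutations) transcribed from Table 4.
DINUCLEOTIDES = {
    "AA": (17, 6),  "AC": (14, 6),  "AG": (19, 10), "AT": (17, 7),
    "CA": (12, 11), "CC": (9, 2),   "CG": (2, 2),   "CT": (23, 17),
    "GA": (22, 7),  "GC": (10, 41), "GG": (22, 6),  "GT": (19, 9),
    "TA": (16, 1),  "TC": (13, 8),  "TG": (30, 42), "TT": (17, 9),
}

def mutability_indexes(table):
    """Observed/expected mutation ratio for each k-mer with occurrences > 0."""
    total_occurrences = sum(occ for occ, _ in table.values())
    total_mutations = sum(mut for _, mut in table.values())
    indexes = {}
    for kmer, (occ, mut) in table.items():
        if occ == 0:
            continue  # index is undefined (NA) when the k-mer never occurs
        expected = occ / total_occurrences * total_mutations
        indexes[kmer] = mut / expected
    return indexes

if __name__ == "__main__":
    mi = mutability_indexes(DINUCLEOTIDES)
    # List the dinucleotides from most to least mutable.
    for kmer, value in sorted(mi.items(), key=lambda kv: -kv[1]):
        print(kmer, round(value, 2))
```

GC and TG come out as the most mutable dinucleotides and TA as the least, matching the hotspot and cold spot labels in the table.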
Trinucleotides
When expanded to trinucleotides, MIs (Table 5) for GC targeting were highest and statistically significant in GCA and TGC combinations whereas reduced targeting at TA remained consistent in TAC and TAA trinucleotides. In addition, GTG was also found to be a statistically significant base combination for mutation and ATG was identified as a cold spot. The GCA and TGC trinucleotides are both contained within DGYW/WRCH motifs and one of the highly mutated DGYW hotspots was immediately preceded by a G, which accounts for GTG being statistically significant. Thus, GCA, TGC, and GTG appear to be targets for somatic mutation because of their inclusion within the larger WRCH/DGYW hotspot motifs, whereas ATC and ATG appear to be mutational cold spots or trinucleotides in which mutations are selected against.

Table 5. Trinucleotide mutability indexes in zebrafish VL regions

Trinucleotide    VL germline    No. of mutations    Mutability index¹ (observed/expected)
AAA              6              2                   0.32
AAC              4              1                   0.24
AAG              5              6                   1.14
AAT              2              2                   0.95
ACA              4              4                   0.95
ACC              4              2                   0.47
ACG              1              2                   1.90
ACT              5              2                   0.38
AGA              3              3                   0.95
AGC              5              10                  1.90* H
AGG              3              2                   0.63
AGT              8              5                   0.59
ATA              1              0                   0
ATC              5              0                   0* C
ATG              7              0                   0** C
ATT              4              2                   0.47
CAA              1              3                   2.85* H
CAC              2              2                   0.95
CAG              8              8                   0.95
CAT              2              3                   1.42
CCA              2              0                   0
CCC              0              0                   NA
CCG              0              0                   NA
CCT              7              8                   1.08
CGA              1              0                   0
CGC              0              0                   NA
CGG              0              0                   NA
CGT              1              3                   2.85* H
CTA              4              1                   0.24
CTC              4              5                   1.19
CTG              13             12                  0.88
CTT              2              1                   0.47
GAA              2              3                   1.42
GAC              2              4                   1.90
GAG              6              1                   0.16
GAT              6              2                   0.32
GCA              4              39                  9.25** H
GCC              2              1                   0.47
GCG              0              0                   NA
GCT              4              9                   2.13* H
GGA              11             5                   0.43
GGC              0              0                   NA
GGG              5              2                   0.38
GGT              6              1                   0.16
GTA              5              0                   0.00
GTC              2              0                   0.00
GTG              6              35                  5.53** H
GTT              6              6                   0.95
TAA              3              0                   0.00
TAC              6              1                   0.16
TAG              0              0                   NA
TAT              7              3                   0.41
TCA              3              3                   0.95
TCC              3              1                   0.32
TCG              1              0                   0.00
TCT              7              6                   0.81
TGA              7              3                   0.41
TGC              5              34                  6.45** H
TGG              14             11                  0.75
TGT              4              0                   0.00
TTA              6              3                   0.47
TTC              3              1                   0.32
TTG              3              2                   0.63
TTT              5              6                   1.14

H, favoured somatic hypermutation (SHM) target or hotspot; C, avoided target or cold spot.
¹ Mutability indexes were calculated as described in the legend of Table 3. Statistically significant by chi-squared test at *P = 0.05; **P = 0.01. NA, not applicable.
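The explanation for the GTG hotspot, a G immediately preceding a heavily mutated TGCA motif, can be seen by listing every trinucleotide window that covers a mutated position. The helper below is a hypothetical illustration; it assumes that a mutation is tallied once for each overlapping trinucleotide that contains it, which is an assumption about the counting scheme and not something stated in the text.

```python
def windows_containing(seq, pos, k=3):
    """All k-mers of seq whose window covers 0-based position pos."""
    first = max(0, pos - k + 1)          # left-most window that reaches pos
    last = min(pos, len(seq) - k)        # right-most window that starts at or before pos
    return [seq[start:start + k] for start in range(first, last + 1)]

if __name__ == "__main__":
    context = "GTGCA"  # a TGCA hotspot preceded by G
    target = 2         # the G targeted within DGYW (TGCA)
    print(windows_containing(context, target))
```

Under this counting scheme a single mutation at the targeted G contributes to GTG, TGC and GCA, which is consistent with all three trinucleotides scoring as hotspots in Table 5.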
The impact of selection on somatic mutations

Regional accumulation of V mutations is a defining feature of antibody genes. In mammals, this tendency is believed to be attributable in large part to antigenic selection and clonal expansion. Mutations in FRs typically appear to be less tolerated, whereas mutations in CDRs provide the basis for the amino acid changes favoured during antigenic selection and affinity maturation. To address the possibility of antigenic selection, the distribution of R and S mutations within productive VJ-C cDNA sequences was compared. If selection were widespread, it is predicted that S mutations would be more abundant in FRs and R mutations would be favoured in CDRs. Of the 80 VL mutations in productive VJ-C sequences, 31 were in FRs (21R; 10S) and 49 were in CDRs (35R; 14S). When weighted as the number of R and S mutations per base sequenced, the percentages become 0.41% for FRs and 0.72% for CDRs. These percentages would appear to suggest that either FRs are less targeted or mutations within FRs are selected against. When ratios of R to S mutations in FRs and CDRs were compared using chi-squared analyses, however, no significant differences were
found (P = 0.45). Multinomial distribution analyses of R and S mutations in productive VJ-C clones also did not reveal significant P values for a scarcity of FR replacements or an excess of CDR replacements. These results could be interpreted to mean that selection may be somewhat restricted in the zebrafish used in this study. Recent work by others20 has shown that, in addition to collective assessments of R and S mutations, analyses of productive rearrangements in the context of clonal lineages and subsequent radiations may be more relevant to understanding the potential impact of selection on somatic mutation.
Mutations at consensus positions: potential founders in lineage radiation

Alignment of productive VJ-C sequences with both each other and the concordant targeted germline VL revealed that the 80 VL mutations localized to 44 positions of the germline sequence. This finding can be attributed to the fact that certain mutations, while present in a unique overall sequence, were common to more than one VJ-C clone. For example, the VL(259) G-to-A transition within the DGYW motif in CDR3 appeared in several different unique VJ-C clones (Table 2). In total, 10 germline positions were found to be consensus positions for 60% (48/80) of the mutations among productive VJ-C clones. Moreover, 83% (40/48) of these mutations were at AID hotspot motifs with 92% (37/40) in CDRs. Collectively, these data suggest either repeated nucleotide targeting or the occurrence of clonal expansions in zebrafish B cells, with the majority of the progenitor clones initially acquiring founder mutations at AID hotspot motifs in CDRs.
Clonal lineages and affinity maturation

Shared mutations in productive VJ-C rearrangements were used as consensus positions to construct potential lineages of clonal radiation. As illustrated in Fig. 6, several potential lineages could be discerned for the VJ-C cDNA clones. Some clonal sets were found to harbour two or more sequentially additive mutations (Fig. 6a), while others appeared to represent radiations from a common founder mutation in CDR3 (Fig. 6b). When mutations within each of the lineages were counted, R mutations (45/65) were more prevalent than S mutations (20/65). In addition, it was evident that, in at least two cases, one or more depicted paths could have given rise to a particular clone. For example, the single mutations harboured in clones EU821523 and EU797183 (Fig. 6a) are intermediates to successive additive mutations in the EU821496 and EU795319 clones, respectively. The EU821496 and EU795319 clones could have also descended from EU795306. In several clones, mutations were not restricted to CDRs, suggesting a tolerance of R mutations in some FRs. For several VJ-C clones, R mutations in FRs appear to be early in the mutational lineage. The presence of founder mutations carried in the clonal descendants implies that, although the mutation may change the affinity for antigen, it does not ablate the structural integrity of the B-cell receptor. Identification of lineages in the zebrafish VJ-C clones is suggestive of both sustained and incremental mutational events and selection, both characteristics of affinity maturation responses seen in higher vertebrates.

[Figure 6. Lineage diagrams, panels (a) and (b). Clones shown as nodes, each with its mutations given as region, germline position in parentheses, and R or S:]
EU821516: progenitor VJ-C
EU821501: CDR1 (71) S
EU797184: CDR2 (137) S
EU821521: CDR1 (71) S; FR3 (181) R
EU821523: CDR2 (124) R
EU797183: FR2 (82) R
EU795305: CDR2 (137) S; FR3 (186) S
EU821511: FR1 (34) R; CDR1 (71) S
EU795306: CDR3 (259) R
EU795319: FR2 (82) R; FR2 (93) R; CDR3 (259) R
EU821500: FR2 (97) R; CDR3 (259) R
EU821496: CDR2 (124) R; CDR3 (259) R
EU821513: CDR3 (235) R; CDR3 (259) R
EU821507: FR1 (34) R; CDR1 (56) S; CDR1 (71) S
EU821498: FR3 (218) S; CDR3 (259) R
EU795304: CDR1 (50) S; CDR2 (118) R; CDR2 (158) R; CDR3 (259) R
EU795311: FR3 (196) R; CDR3 (259) R
EU821518: FR1 (23) S; FR2 (97) R; CDR2 (129) R; CDR3 (259) R
EU821505: CDR1 (65) S; CDR3 (259) R
EU821504: CDR2 (172) R; CDR3 (259) R
EU821519: FR1 (27) R; CDR3 (259) R
EU795310: CDR1 (64) S; CDR3 (259) R
EU821508: CDR1 (44) S; CDR3 (259) R
EU821520: FR1 (32) S; FR3 (218) S; FR3 (222) R; CDR3 (259) R
EU795323: FR1 (27) R; CDR3 (259) R; +7 additional mutations
EU825204: FR3 (196) R; FR3 (197) S; CDR3 (259) R
EU795318: CDR2 (166) R; CDR3 (259) R
EU795316: FR2 (80) S; CDR3 (259) R
EU797179: CDR1 (41) S; CDR3 (259) R
Figure 6. Lineage relationships of VJ-C cDNA clones are consistent with the possibility of clonal expansion and affinity maturation in zebrafish B cells. Individual VJ-C cDNA clones are depicted as circles by the accession number. A potential progenitor clone is depicted at the apex of each diagram. (a) Clones harbouring two or more additive mutations. (b) Clones radiating from a founder mutation in complementarity-determining region 3 (CDR3). In both diagrams, the progenitor VJ-C clone (accession no. EU821516) is identical in sequence to unmutated germline immunoglobulin L (IgL). Mutations listed for each VJ-C clone designate the location in framework regions (FRs) or CDRs and numbers in parentheses indicate the concordant germline position. Replacement mutations are indicated with an R while silent mutations are indicated with an S. Listings in italics are mutations at the C/G positions of WRCH/DGYW hotspot motifs. The directionality of arrow segments depicts sequential mutation accumulation in the radiation of clonal descendants.

Discussion
Zebrafish are rapidly emerging as an important immunological model for biomedical research. In contrast to mouse and human models, far less is known concerning potential nucleotide targets and mutational hotspots that may underpin immunoglobulin affinity maturation in this species. Previously, by aligning cDNA and expressed sequence tag (EST) sequences with genomic reference sequences, our laboratory had shown that somatic mutation can occur in the immunoglobulin light chains in zebrafish.7 In the present study, by focusing on a single VL locus, we were able to obtain a sufficient quantity of VJ-C cDNA transcripts to identify specific nucleotide targets and patterns of SHM in this animal model. In mammals, SHM is typically characterized by the presence of single base substitutions, with few insertions or deletions. Single base substitutions and limited insertion or deletion were also defining characteristics in the zebrafish VJ-C cDNA clones (Table 2). Given that insertions or deletions in non-triplet increments result in frameshifts, it is expected that B cells harbouring such mutations would be selected against, for without functional light chains the integrity of the BCR would be lost. Also similar to findings in mice and humans,21 transition (T) mutations were more prevalent than transversions (V) (T:V ratio in zebrafish VL = 5.64). The transition preference in zebrafish is even more pronounced if only mutations in wobble positions of degenerate codons are considered (T:V ratio = 10; data in Table 2). Mutations at degenerate wobble positions may escape selection at the protein level as, regardless of the base exchanged, an identical amino acid would be encoded. Thus, mutations at wobble positions might offer a window into targeting and base change preferences of a mutational mechanism active in zebrafish. However, selection based on nucleotides adjacent to wobble positions cannot be excluded.
Of the 11 mutations in wobble positions of degenerate codons, base exchanges were from T (seven), A (three), or G (one) and none of these 11 exchanges was in WRCH/DGYW motifs. Similarly, if all non-WRCH/DGYW VL mutations are included, A:T mutations represent 73% (34/47; Table 2) of mutations outside of AID
hotspot motifs. Collectively, these findings indicate that in zebrafish a substantial number of mutations outside of WRCH/DGYW motifs in VL may be attributable to mutational mechanisms that target A:T base pairs. AID-dependent deamination of cytidine to uracil, in addition to producing mutations at C/G nucleotides, has also been shown to activate mismatch repair at U:G mismatches in mouse models. The mismatch repair proteins MSH2-MSH6 have been found to bind U:G mismatches and in doing so can recruit a low-fidelity DNA polymerase called polymerase eta (η).22 Upon binding the MSH2-MSH6 heterodimer, the catalytic activity of η is stimulated, allowing the polymerase to move more rapidly along the template DNA. Being a low-fidelity polymerase, η is prone to incorporate base substitutions preferentially at A:T positions downstream of the original U:G
lesion.23,24 Thus, in theory, the G:C and A:T mutations observed in zebrafish VL could be largely dependent on the combined outcome of uracil-DNA glycosylase (UNG) and mismatch (MSH) repair pathways. Somatic mutations in mammalian immunoglobulin18 and the nurse shark antigen receptor (NAR)25 are also proportionally distributed among G:C and A:T base pairs. However, in other vertebrates, including frogs,26 and the VH of shark IgM,27 G:C mutations are favoured. These findings indicate that mutational targeting of G:C and A:T pairs and subsequent repair strategies therein may occur at different capacities in different organisms. When individual zebrafish VJ-C sequences are considered, 18 of the 55 clones contained mutations within a WRCH/DGYW and one or more mutations either upstream or downstream of the targeted AID hotspot. In
total, the 18 clones harboured 29 VL mutations outside of a targeted WRCH/DGYW (data in Table 2). Of these, 72% (21/29) were at A:T positions and 28% (8/29) were at G:C base pairs. Most intriguing, however, is that 93% (27/29) of these mutations were downstream of the targeted WRCH. Based on these findings, it is tempting to speculate that, similar to mammalian models, AID might target C:G at WRCH/DGYW in zebrafish, which in turn may recruit orthologous MSH2-MSH6 proteins to the resultant U:G mismatches. If recruitment of a low-fidelity polymerase similar to mouse polymerase η were also to ensue, this could in theory account for the increased propensity for A:T mutations downstream of targeted AID hotspots. Most experimental evidence in mice and humans suggests that AID initiates deamination of cytidines in actively transcribed immunoglobulin genes.28,29 Recently, it was shown that in vitro nucleosomes prevent AID access, unless the immunoglobulin segment is being transcribed.30 Transcription is required for SHM in vivo, presumably in part to loosen the contact of nucleosomes with the DNA.31 During transcription, the single-stranded DNA (ssDNA) is prone to AID-mediated C-to-U conversion, producing U:G mismatches in the DNA. U:G mismatches cause modest distortions in the DNA which may in turn activate a suite of DNA repair mechanisms involving DNA glycosylases, general mismatch repair factors, and a variety of error-prone polymerases.32 Alternatively, if left unrepaired, the U:G mismatches become fixed as C-to-T mutations in replicated DNA as a result of the lack of discrimination by DNA polymerases between U and T in the template strand during DNA replication. In the zebrafish VL, the majority (41/46) of the cytidine mutations found at WRCH/DGYW hotspot motifs were C-to-T transitions. This finding suggests that uracil glycosylase-mediated DNA repair may be somewhat limited in the zebrafish VL regions.
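The way an unrepaired deamination becomes a fixed transition can be sketched as a simple rule on the coding strand. The function below is a hypothetical illustration, not a model from the study: it assumes no repair takes place and that the daughter duplex inheriting the uracil is the one observed. A deaminated C on the coding strand is read as T, while a deaminated C on the template strand (opposite a coding-strand G) appears as a G-to-A change when written on the coding strand.

```python
def fix_unrepaired_deamination(coding, pos, strand):
    """Coding-strand sequence after an unrepaired C-to-U event is replicated.

    coding -- coding-strand sequence, 5' to 3'
    pos    -- 0-based position on the coding strand
    strand -- "coding" if the deaminated C is coding[pos];
              "template" if it is the C paired with the G at coding[pos]
    """
    if strand == "coding":
        required, fixed = "C", "T"   # U is copied as if it were T
    elif strand == "template":
        required, fixed = "G", "A"   # template U pairs with A on the new coding strand
    else:
        raise ValueError("strand must be 'coding' or 'template'")
    if coding[pos] != required:
        raise ValueError("expected %s at position %d" % (required, pos))
    return coding[:pos] + fixed + coding[pos + 1:]

if __name__ == "__main__":
    hotspot = "TGCA"  # palindromic motif: WRCH on one strand, DGYW on the other
    print(fix_unrepaired_deamination(hotspot, 2, "coding"))    # C-to-T on the coding strand
    print(fix_unrepaired_deamination(hotspot, 1, "template"))  # seen as G-to-A on the coding strand
```

Any other base change at the same position would require an additional step, such as uracil removal followed by error-prone synthesis.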
For example, if uracil residues in U:G mismatches were substrates for base excision repair, it would seem likely that hydrolysis of the glycosidic bond between U and deoxyribose and subsequent endonucleolytic cleavage of the sugar would result in an abasic site. Polymerases involved in base excision DNA repair are generally more error-prone and their activity over an abasic lesion brought about by U removal from U:G mismatches would be predicted to result in C-to-A/G/T mutations. The precise preference for each type of substitution would in large part be reliant on the polymerase involved. Given that two of the three possible substitutions are transversions, it seems that a predominance of transversions would be apparent if base excision repair was extensive at U:G mismatches created at AID motifs.33 The statistically significant preference (P < 0.01) for transition over transversion mutations both within and outside AID motifs in the zebrafish VL implies a tendency either for mutations at AID hotspots to escape DNA repair or for selective pressures to maintain transition mutations once they become fixed in the B-cell genome. In mammals, it has been suggested that AID-induced mutagenesis saturates the overall repair capacity of B cells.34 If AID mutagenesis were to saturate uracil glycosylase capacities in zebrafish B cells, this might in part explain why the majority of mutations at AID hotspots in zebrafish VL were C-to-T. Conversely, if mismatch repair pathways were not as saturated, this could also explain the increased capacity for A:T mutations downstream of targeted AID hotspots. Although the balance between saturation of repair mechanisms and toleration of mutation remains largely unknown, it appears that flexibility in this balance would result in an increased capacity to generate mutational diversity within immunoglobulin gene segments.
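The transversion argument can be made explicit with a short worked calculation. It assumes, purely for illustration, that a polymerase copying across an abasic site inserts each of the three non-original bases with equal probability. The observed counts are the 79 transitions and 14 transversions obtained by summing the substitution data in Table 3.

```latex
P(\text{transition}) = \tfrac{1}{3}, \qquad
P(\text{transversion}) = \tfrac{2}{3}
\quad\Longrightarrow\quad
\frac{\text{transversions}}{\text{transitions}} = 2
\\[6pt]
\text{observed in } V_L:\quad
\frac{\text{transversions}}{\text{transitions}} = \frac{14}{79} \approx 0.18,
\qquad
\frac{\text{transitions}}{\text{transversions}} = \frac{79}{14} \approx 5.64
```

The observed ratio is roughly an order of magnitude below the value expected under this assumption, which is the basis for inferring that base excision repair at U:G mismatches is limited.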
Uracil glycosylase base excision and mismatch repair systems are evolutionarily ancient mechanisms for DNA repair thought to exist in all prokaryotes and eukaryotes. The utilization of these repair mechanisms in vertebrates to generate additional diversity within immunoglobulin gene segments is an area of research that has only recently begun to be explored. The discovery just over a decade ago35 that AID is responsible for both SHM and class switch recombination (CSR) dramatically enhanced the possibility for obtaining an in-depth understanding of the mechanistic processes underlying adaptive immunity in vertebrates. It had long been thought that uracil in DNA was an adverse condition arising from inappropriate incorporation of dUTP during replication or spontaneous deamination of cytosine.36 It is becoming increasingly apparent, however, that nature incorporates uracil into DNA as a central mediator of adaptive immunity and as a strategy against certain viruses during innate responses.37,38 Thus, uracil incorporation, once thought to be solely a mutagenic burden, has been revealed as a mechanism to modify immunoglobulin DNA in B cells for diversity or even non-self DNA for degradation. An orthologue of AID has been identified in zebrafish and its expression in mammalian cells in vitro has been shown to induce both CSR and SHM.39,40 In the present study, the patterns revealed for in vivo mutations in zebrafish VL strongly suggest that AID and uracil incorporation are utilized as a means to generate immunoglobulin diversity in the zebrafish model. Despite drastically different outcomes for SHM and CSR (point mutations versus large-scale deletions) and functionally distinct target sequences (VH/L exons versus switch regions), SHM and CSR are both contingent upon the B-cell-specific AID enzyme and single-strand templates brought about by transcription.
Point mutations similar to those at VH WRCH sequences have also been found at the WRCH within switch regions in mice, suggesting a common AID targeting method for both SHM and CSR.41,42 Given that SHM has been found in all vertebrates including fish, whereas CSR appears limited to
amphibians, birds and mammals,43–47 it appears that SHM evolved earlier than CSR. In mammals, immunoglobulin switch regions appear to have further diverged to incorporate additional features, such as R-loop forming ability and cis-acting regulatory regions, which may serve to increase the overall efficiency of CSR.48 It is also plausible that gene conversion (GCV) could be implicated in the non-WRCH/DGYW mutations found in the zebrafish IgL. Given that diversification of immunoglobulin by gene conversion has only been shown to exist in chickens49,50 and rabbits,51 a comparative and evolutionary approach may prove necessary to elucidate additional currently unresolved aspects of SHM, CSR and GCV in vertebrates. The present study demonstrates a strong dependence of SHM on cytidine targeting in zebrafish and reveals several important parameters, including strand biases and the direction in which mutations spread from a C to downstream A:T base pairs. In addition, the results reveal asymmetry in the AID hotspot motifs targeted, with preferential targeting at palindromic WRCH sequences. These results, combined with findings of distinct mutational hotspots, cold spots, and additive mutations, are indications that SHM is not a random process in zebrafish. The presence of clonal lineages is also a strong indicator that AID deamination, mutation repair and affinity maturation may be crucial in shaping the somatic diversification of IgL in zebrafish. Based on these conserved and unique features it is probable that the zebrafish will prove to be an especially useful emerging vertebrate model for understanding the role of immunoglobulin diversity in immune system development, function and disease.
Acknowledgement
This work was supported in part by grants from the PhRMA Foundation and the National Science Foundation.
Disclosures
The authors have no conflicts of interest to disclose.
References
1 Weigert MG, Cesari IM, Yonkovich SJ, Cohn M. Variability in the lambda light chain sequences of mouse antibody. Nature 1970; 228:1045–7.
2 Harris RS, Kong Q, Maizels N. Somatic hypermutation and the three Rs: repair, replication and recombination. Mutat Res 1999; 436:157–78.
3 Maizels N, Scharff MD. Molecular mechanisms of hypermutation. In: Neuberger M, Honjo T, Alt FW, eds. Molecular Biology of B Cells. New York: Academic Press, 2004:327–38.
4 McKean D, Huppi K, Bell M, Staudt L, Gerhard W, Weigert M. Generation of antibody diversity in the immune response of BALB/c mice to influenza virus hemagglutinin. PNAS 1984; 81:3180–4.
5 Danilova N, Bussmann J, Jekosch K, Steiner LA. The immunoglobulin heavy-chain locus in zebrafish: identification and expression of a previously unknown isotype, immunoglobulin Z. Nat Immunol 2005; 6:295–302.
6 Weinstein JA, Jiang N, White RA, Fisher DS, Quake SR. High-throughput sequencing of the zebrafish antibody repertoire. Science 2009; 324:807–10.
7 Zimmerman AM, Yeo G, Howe K, Maddox BJ, Steiner LA. Immunoglobulin light chain (IgL) genes in zebrafish: genomic configurations and inversional rearrangements between (VL-JL-CL) gene clusters. Dev Comp Immunol 2008; 32:421–34.
8 Ramsden DA, Wu GE. Mouse kappa light-chain recombination signal sequences mediate recombination more frequently than do those of lambda light chain. PNAS 1991; 88:10721–5.
9 Westerfield MA. The Zebrafish Book: A Guide for the Laboratory Use of Zebrafish. Eugene, Oregon: University of Oregon Press, 1995.
10 Campanella JJ, Bitincka L, Smalley J. MatGAT: an application that generates similarity/identity matrices using protein or DNA sequences. BMC Bioinformatics 2003; 4:14.
11 Giudicelli V, Chaume D, Lefranc MP. IMGT/V-QUEST, an integrated software program for immunoglobulin and T cell receptor V-J and V-D-J rearrangement analysis. Nucleic Acids Res 2004; 32:435–40.
12 Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994; 22:4673–80.
13 Kabat ER, Wu TT, Perry HM, Gottesman KS, Foeller C. Sequences of Proteins of Immunological Interest. Bethesda MD: National Institutes of Health, 1991.
14 Shapiro GS, Wysocki LJ. DNA target motifs of somatic mutagenesis in antibody genes. Crit Rev Immunol 2002; 22:183–200.
15 Stothard P. The Sequence Manipulation Suite: JavaScript programs for analyzing and formatting protein and DNA sequences. BioTechniques 2000; 28:1102–4.
16 Lossos IS, Tibshirani R, Narasimhan B, Levy R. The inference of antigen selection on Ig genes. J Immunol 2000; 165:5122–6.
17 Zheng NY, Wilson K, Jared M, Wilson PC. Intricate targeting of immunoglobulin somatic hypermutation maximizes the efficiency of affinity maturation. J Exp Med 2005; 201:1467–78.
18 Rogozin IB, Diaz M. Cutting edge: DGYW/WRCH is a better predictor of mutability at G:C bases in Ig hypermutation than the widely accepted RGYW/WRCY motif and probably reflects a two-step activation-induced cytidine deaminase-triggered process. J Immunol 2004; 172:3382–4.
19 Shapiro GS, Aviszus K, Ikle D, Wysocki LJ. Predicting regional mutability in antibody V genes based solely on di- and trinucleotide sequence composition. J Immunol 1999; 163:259–68.
20 Yang F, Waldbieser G, Lobb C. The nucleotide targets of somatic mutation and the role of selection in immunoglobulin heavy chains. J Immunol 2006; 176:1655–67.
21 Messmer BT, Albesiano E, Messmer D, Chiorazzi N. The pattern and distribution of immunoglobulin VH gene mutations in chronic lymphocytic leukemia B cells are consistent with the canonical somatic hypermutation process. Blood 2004; 9:3490–5.
22 Wilson TM, Vaisman A, Martomo SA et al. MSH2-MSH6 stimulates DNA polymerase eta, suggesting a role for A:T mutations in antibody genes. J Exp Med 2005; 201:637–45.
23 Masuda K, Ouchida R, Takeuchi A et al. DNA polymerase theta contributes to the generation of C/G mutations during somatic hypermutation of Ig genes. PNAS 2005; 102:13986–91.
24 Schanz S, Castor D, Fischer F, Jiricny J. Interference of mismatch and base excision repair during the processing of adjacent U/G mispairs may play a key role in somatic hypermutation. PNAS 2009; 106:5593–8.
25 Diaz M, Velez J, Singh M, Cerny J, Flajnik MF. Mutational pattern of the nurse shark antigen receptor gene (NAR) is similar to that of mammalian Ig genes and to spontaneous mutations in evolution: the translesion synthesis model of somatic hypermutation. Int Immunol 1999; 11:825–33.
26 Wilson M, Marcuz A, Du Pasquier L. Somatic mutations during an immune response in Xenopus tadpoles. Dev Immunol 1995; 4:227–34.
27 Hinds-Frey KR, Nishikata H, Litman RT, Litman GW. Somatic variation precedes extensive diversification of germline sequences and combinatorial joining in the evolution of immunoglobulin heavy chain diversity. J Exp Med 1993; 178:815–24.
28 Chaudhuri J, Tian M, Khuong C, Chua K, Pinaud E, Alt FW. Transcription-targeted DNA deamination by the AID antibody diversification enzyme. Nature 2003; 422:726–30.
29 Bachl J, Ertongur I, Jungnickel B. Involvement of Rad18 in somatic hypermutation. PNAS 2006; 103:12081–6.
30 Shen HM, Poirier MG, Allen MJ, North J, Lal R, Widom J, Storb U. The activation-induced cytidine deaminase (AID) efficiently targets DNA in nucleosomes but only during transcription. J Exp Med 2009; 206:1057–71.
31 Storb U, Shen HM, Nicolae D. Somatic hypermutation: processivity of the cytosine deaminase AID and error-free repair of the resulting uracils. Cell Cycle 2009; 8:3097–101.
32 Martomo SA, Gearhart PJ. Somatic hypermutation: subverted DNA repair. Curr Opin Immunol 2006; 18:243–8.
33 Di Noia J, Neuberger M. Altering the pathway of immunoglobulin hypermutation by inhibiting uracil-DNA glycosylase. Nature 2002; 419:43–8.
34 Liu M, Duke JL, Richter DJ, Vinuesa CG, Goodnow CC, Kleinstein SH, Schatz DG. Two levels of protection for the B cell genome during somatic hypermutation. Nature 2008; 451:841–6.
35 Muramatsu M, Kinoshita K, Fagarasan S, Yamada S, Shinkai Y, Honjo T. Class switch recombination and hypermutation require activation-induced cytidine deaminase (AID), a potential RNA editing enzyme. Cell 2000; 102:553–63.
36 Visnes T, Doseth B, Pettersen HS et al. Uracil in DNA and its processing by different DNA glycosylases. Philos Trans R Soc Lond B Biol Sci 2009; 364:563–8.
37 Sousa MM, Krokan HE, Slupphaug G. DNA-uracil and human pathology. Mol Aspects Med 2007; 28:276–306.
38 Chelico L, Pham P, Petruska J, Goodman MF. Biochemical basis of immunological and retroviral responses to DNA-targeted cytosine deamination by activation-induced cytidine deaminase and APOBEC3G. J Biol Chem 2009; 41:27761–5.
39 Barreto BM, Pan-Hammarstrom Q, Zhao Y, Hammarstrom L, Misulovin Z, Nussenzweig MC. AID from bony fish catalyzes class switch recombination. J Exp Med 2005; 202:733–8.
40 Wakae K, Magor BG, Saunders H, Nagaoka H, Kawamura A, Kinoshita K, Honjo T, Muramatsu M. Evolution of class switch recombination function in fish activation-induced cytidine deaminase, AID. Int Immunol 2006; 18:41–7.
41 Nagaoka H, Muramatsu M, Yamamura N, Kinoshita K, Honjo T. Activation-induced deaminase (AID)-directed hypermutation in the immunoglobulin Sμ region: implication of AID involvement in a common step of class switch recombination and somatic hypermutation. J Exp Med 2002; 195:529–34.
42 Zeng Z, Negrete GA, Kasmer C, Yang WW, Gearhart PJ. Absence of DNA polymerase eta reveals targeting of C mutations on the non-transcribed strand in immunoglobulin switch regions. J Exp Med 2004; 199:917–24.
43 Lundqvist ML, Pilstrom L. Variability of the immunoglobulin light chain in the Siberian sturgeon, Acipenser baeri. Dev Comp Immunol 1999; 23:607–15.
44 Du Pasquier L. The immune system of invertebrates and vertebrates. Comp Biochem Physiol B Biochem Mol Biol 2001; 129:1–15.
45 Flajnik MF. Comparative analyses of immunoglobulin genes: surprises and portents. Nat Rev Immunol 2002; 2:688–98.
46 Cannon JP, Haire RN, Rast JP, Litman GW. The phylogenetic origins of the antigen-binding receptors and somatic diversification mechanisms. Immunol Rev 2004; 200:12–22.
47 Bengten E, Quiniou S, Hikima J, Waldbieser G, Warr GW, Miller NW, Wilson M. Structure of the catfish IGH locus: analysis of the region including the single functional IgHM gene. Immunogenetics 2006; 58:831–44.
48 Zarrin AA, Alt FW, Chaudhuri J, Stokes N, Kaushal D, Du Pasquier L, Tian M. An evolutionarily conserved target motif for immunoglobulin class-switch recombination. Nat Immunol 2004; 5:1275–81.
49 Reynaud CA, Anquez V, Grimal H, Weill J. A hyperconversion mechanism generates the chicken light chain pre-immune repertoire. Cell 1987; 48:379–88.
50 Kohzaki M, Nishihara K, Hirota K et al. DNA polymerases ν and θ are required for efficient immunoglobulin V gene diversification in chicken. J Cell Biol 2010; 189:1117–27.
51 Mage RG, Lanning D, Knight KL. B cell and antibody repertoire development in rabbits: the requirement of gut-associated lymphoid tissues. Dev Comp Immunol 2006; 30:137–53.


