ORIGINAL ARTICLE
Summary
In mammals, somatic hypermutation (SHM) of immunoglobulin (Ig) genes is critical for the generation of high-affinity antibodies and effective immune responses. Knowledge of sequence-specific biases in the targeting of somatic mutations can be useful for studies aimed at understanding antibody repertoires produced in response to infections, B-cell neoplasms, or autoimmune disease. To evaluate potential nucleotide targets of somatic mutation in zebrafish (Danio rerio), an enriched IgL cDNA library was constructed and > 250 randomly selected clones were sequenced and analysed. In total, 55 unique VJ-C sequences were identified encoding a total of 125 mutations. Mutations were most prevalent in VL with a bias towards single base transitions and increased mutation in the complementarity-determining regions (CDRs). Overall, mutations were overrepresented at WRCH/DGYW motifs suggestive of activation-induced cytidine deaminase (AID) targeting which is common in mice and humans. In contrast to mammalian models, N and P addition was not observed and mutations at AID hotspots were largely restricted to palindromic WRCH/DGYW motifs. Mutability indexes for di- and trinucleotide combinations confirmed C/G targets within WRCH/DGYW motifs to be statistically significant mutational hotspots and showed trinucleotides ATC and ATG to be mutation coldspots. Additive mutations in VJ-C sequences revealed patterns of clonal expansion consistent with affinity maturation responses seen in higher vertebrates. Taken together, the data reveal specific nucleotide targets of SHM in zebrafish and suggest that AID and affinity maturation contribute to antibody diversification in this emerging immunological model.
Keywords: immunoglobulin; somatic hypermutation; zebrafish
doi:10.1111/j.1365-2567.2010.03358.x Received 18 May 2010; revised 18 August 2010; accepted 20 August 2010. *Present address: Department of Molecular, Cell, and Developmental Biology, Johns Hopkins University, Baltimore, MD, USA. Correspondence: A. M. Zimmerman, Department of Biology, College of Charleston, 66 George Street, Charleston, SC 29424, USA. Email: zimmermana@cofc.edu Senior author: Anastasia M. Zimmerman
Introduction
A hallmark of the adaptive immune system is the capacity to mount a heightened memory response to pathogens encountered upon sustained or recurrent infection. Within mammalian models, an integral part of this protection stems from the ability of the immune system to fine-tune its diverse repertoire of antigen receptors over time through mutation and selection.1 In the case of B cells and antibody responses, initial receptor diversity is created by V(D)J rearrangement of immunoglobulin gene segments to form functional immunoglobulin genes. The modular nature of the immunoglobulin segments, imprecise joining, and addition of nucleotides at V(D)J junctions generate an initial repertoire of naive B cells with membrane-bound antigen receptors (B-cell receptors). When a pathogen triggers an immune response, B cells with specificity to antigen rapidly relocate to the T/B-cell interface of lymphoid tissues where they are stimulated to proliferate through interactions with T helper (Th) cells and appropriate cytokines. The cellular outcome of the resultant daughter cells is thought to bifurcate down one of two pathways. Cells either terminally differentiate into post-mitotic short-lived antibody-producing plasma cells or infiltrate germinal centres of lymphoid follicles or other comparable structures and undergo massive and rapid clonal expansion. During this expansion, B cells can be subject to somatic hypermutation (SHM) in
© 2010 The Authors. Immunology © 2010 Blackwell Publishing Ltd, Immunology, 132, 240–255
SHM in zebrafish
which point mutations are introduced into variable (V) regions of rearranged immunoglobulin DNA. This targeted mutation is thought to be largely responsible for generating subsets of B cells with slightly different affinities to the antigen. In mice and humans, the mutation rate in V regions is estimated to be quite high, at 10⁻³ per base per generation, that is, 10⁶-fold higher than the rate of background mutations in DNA.2,3 In mammals, mutations in rearranged immunoglobulin gene segments have been found to cluster in CDRs of VH and VL, which together encode the regions of the antibody most likely to influence affinity for antigen. Following hypermutation, B cells encoding receptors with greater specificity for antigen can undergo positive selection4 through a process referred to as affinity maturation. Selected B cells may further differentiate into either long-lived plasma cells or memory B cells which contribute to enhanced antibody responses upon reinfection by pathogen.
At present, bony fishes represent one of the most ancestral groups of animals known to produce antibodies for which high-quality genome sequences have become available. The availability of genomic sequences has recently facilitated genome-wide annotations of the germline segments available for V(D)J rearrangements. Prior to such annotations, the repertoire of immunoglobulin segments available for rearrangements was often unknown, making alignment of expressed transcripts with concordant genomic segments difficult to ascertain. Recently, using a full annotation of the zebrafish (Danio rerio) IgH locus,5 Weinstein et al.6 analysed an expressed repertoire of immunoglobulin H (IgH) gene segments in zebrafish to determine VH family usage. Previous work in our laboratory7 coupled genomic annotation with expression data to reveal VJ-C expression of IgL loci from five different chromosomes in zebrafish.
In the present study, we constructed an enriched cDNA library of zebrafish IgL in order to generate a sufficient quantity of VJ-C transcripts with which to ascertain potential nucleotide targets of somatic hypermutation. Collectively, the data presented herein reveal several patterns of mutation in the IgL of zebrafish and extend understanding of the processes underlying the generation of antibody diversity in this emerging immunological model.
[Figure 1 labels. BACs: BX511206, BX649562, BX571825 (BAC zK158E13), BX914202, CT009671, CR384077, BX640456. IgL segments: V1, J1C1, V2, V3P, J2C2P, V4, V5, J3C3, V6P, J4C4, V7, J5C5, V8P. Scale bar: 100 kb.]
Figure 1. Targeted zebrafish immunoglobulin L (IgL) germline reference sequence. The existing genomic annotation of zebrafish IgL was extended by assembling overlapping BAC inserts. BACs with overlapping end reads were prioritized for sequencing at the Sanger Center (http://www.sanger.ac.uk/Projects/D_rerio). The 923-kb contiguous chromosomal tiling path was manually assembled using the Artemis Annotation Software package. The IgL gene segments targeted are clustered in a single region on zebrafish chromosome 19. Immunoglobulin segments at this locus are divergent, enabling alignment of cDNA sequences with germline sequences to determine patterns of somatic hypermutation (SHM). Each BAC is designated by its corresponding NCBI accession number and drawn approximately to scale, while the IgL locus is expanded with exon sizes exaggerated.
DNA sequencing
Clones with inserts were sequenced bi-directionally using universal M13 forward and M13 reverse primers at the Clemson University Genomics Institute (Clemson, SC). Plasmid vectors and PCR primers were trimmed from sequences, and overall sequence quality and automated sequencing calls were verified by inspection of each sequence chromatogram. In total, 220 VJ-C clones were identified to have high-quality forward and reverse complement sequencing reads.
Table 1. Primers and polymerase chain reaction (PCR) or reverse transcription conditions
Targeted transcript: V7; C1/C2/C3/C4/C5; C5; EF1α
Primer: 5′-TGACTGTAGTGACTCAGAGTCC-3′; 5′-GCTCAGGCTGCTGCTCCAGC-3′; 5′-TGTACAGTCCATCCTC-3′; FWD: 5′-CCTGGTGACAACGTTGGCTT-3′, RVS: 5′-GAACGGTGTGATTGAGGGAA-3′; Invitrogen (product no. 48190-011); Invitrogen (product no. 12577-011); 5′-GCGAGCACAGAATTAATACGACT-3′; 5′-GCGAGCACAGAATAATACGACTCACTATAGG(dT)-3′
Conditions: 5 min at 94°; 30 cycles (30 seconds at 95°, 30 seconds at 50–55° and 60 seconds at 72°); 10 min at 72°. 4 min at 94°; 30 cycles (30 seconds at 95°, 30 seconds at 56°, 60 seconds at 72° and 10 min at 72°); 10 min at 72°. 10 min at 25° and 30 min at 42°. 30 min at 42°. 1 hr at 42°. 3 min at 94°; 35 cycles (30 seconds at 94°, 30 seconds at 60° and 30 seconds at 72°); 7 min at 72°.
Oligo-dT(20-VN) is a mixture of 12 primers, each a string of 20-dT residues followed by two additional variable nucleotides (VNs). The VN anchor targets primer annealing at the 5′ end of the poly(A) tail. EF1α, elongation factor 1α; RACE, rapid amplification of cDNA ends.
Alignment of expressed VJ-C sequences to genomic regions
The 220 VJ-C sequences were compared against the non-redundant nucleotide database at the National Center for Biotechnology Information (NCBI) using the megaBLAST algorithm. Sequences were subsequently compared against VL, JL, and CL identified7 in zebrafish using the Matrix Global Alignment Tool.10 All of the 220 VJ-C clones had highest identity to the targeted V7 gene segment with at least 95% identity. The stringent 95% requirement of IMGT/V-QUEST11 was employed, as the existence of additional VL cannot be ruled out from the genome. Given that the per cent variability in nucleotide sequences of identified zebrafish VL ranges from 43 to 93% overall, with VL on Chr 19 (V1–V8) ranging from 49.5 to 83.6%, a 95% criterion is suitably rigorous. The resultant 220 VJ-C sequences were aligned to germline IgL segments using CLUSTALW,12 and CDRs and frameworks (FRs) were defined using the rules of Kabat.13 In total, from the 220 VJ-C clones, 55 unique sequences containing a total of 125 mutations from concordant germline immunoglobulin segments were identified. Because any somatic mutation could in theory be carried during the clonal expansion of a single B cell or reflect amplification of the same transcript by reverse transcriptase (RT)-PCR, identical VJ-C sequences were deemed to represent a single B-cell population and therefore counted only once in the mutational analyses. In addition, as mutations can be additive if resultant B cells are derived from a common founder, the 55 unique VJ-C sequences were scored both for mutations exclusive to that clone and with all mutations included. Unique VJ-C sequences were submitted to NCBI (accession numbers in Table 2).

in the cDNA library construction. Products were run out on agarose gels, bands were purified, amplicons were cloned into pCR2.1 cloning vectors, and plasmids were transformed into TOP10 cells (Invitrogen). Twelve subclones were randomly selected for bi-directional sequencing and no mismatches were identified in 11 280 resultant bases. Thus, the PCR amplification error rate can be considered negligible and few, if any, of the base pair changes found in the VJ-C cDNA clones warrant being ascribed to PCR or sequencing errors.
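The claim that the error rate is negligible can be given a rough numerical bound. The sketch below is not part of the original analysis: it assumes independent, binomially distributed errors and a 95% confidence level (both assumptions) and asks how large the per-base error rate could be while still yielding zero mismatches in 11 280 bases.

```python
# Sketch only: the binomial model and the 95% level are assumptions,
# not methods reported in the article.

def zero_event_upper_bound(n_bases: int, confidence: float = 0.95) -> float:
    """Largest per-base error rate p for which observing zero errors in
    n_bases still has probability 1 - confidence: (1 - p) ** n_bases = alpha."""
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_bases)

n_bases = 11_280                      # control bases with no mismatches (from the text)
upper = zero_event_upper_bound(n_bases)
rule_of_three = 3 / n_bases           # common shortcut for the same 95% bound

print(f"exact 95% upper bound:  {upper:.2e} errors per base")
print(f"rule-of-three estimate: {rule_of_three:.2e} errors per base")
```

Under these assumptions the bound is below 3 errors per 10 000 bases, which is well under the VL mutation frequency of 93 mutations in roughly 14 355 bases reported in the Results.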
Statistical analyses
Chi-squared analyses of mono-, di- and trinucleotide mutability indexes were carried out by contrasting observed mutational frequencies to their expected (non-biased) mutational frequencies. For mutability indexes, P values < 0.01 were considered statistically significant. In cases where mutations could be assigned to different di- or tri-nucleotide targets, Bonferroni corrections were applied. Chi-squared analyses were also performed to
[Table 2. Column headings: Region1; Mutation2,3; Type4; Codon5. Accession numbers listed: EU795310, EU795311, EU795313, EU795314, EU821505, EU821507, EU795320, EU821518, EU821519.]
Table 2. (Continued)
Accession number: EU821520
Region1              Mutation2,3      Type4   Codon5
VL-FR1 (32)          ATC→ATT          T       Silent (Ile)
VL-FR3 (218)         GCA→GCG          T*      Silent (Ala)
VL-FR3 (222)         GTT→TTT          V       Val→Phe
VL-CDR3 (259)        TGC→TAC          T       Cys→Tyr
VL-CDR1 (71)         AGC→AGT          T       Silent (Ser)
VL-FR3 (181)         TTT→TCT          T       Phe→Ser
VL-CDR3 (259)        TGC→TAC          T       Cys→Tyr
JL (29)              GTT→ATT          T       Val→Ile
VL-CDR2 (124)        AGT→AAT          T       Ser→Asn
CL (95)              TTTTT            NA      Frameshift
CL (152)             T, insertion     NA      Frameshift
VL-CDR2 (125)        AGT→AGC          T       Silent (Ser)
VL-FR3 (196,197)     CGT→CAC          T, T    Arg→His
VL-CDR3 (259)        TGC→TAC          T       Cys→Tyr
as the number of mutations per base sequenced in each gene segment region, the percentages became 0.65, 0.51 and 0.22. These results show that mutations are concentrated in the VL regions in terms of both higher overall numbers and higher density when compared with the JL and CL regions.
1 VL, variable region; JL, joining region; FR, framework region; CDR, complementarity-determining region. 2 Mutated nucleotides are underlined, and depicted within triplet bases of codons. 3 Bases shaded correspond to targeting of the G and C nucleotides of DGYW/WRCH** AID hotspot motifs. 4 Transition (T) or transversion (V) base mutations. 5 Amino acids resulting in a change in side-chain polarity are shown in italics. *Designates mutation in the wobble position of a degenerate codon. For example, in the third position of glycine codons (GGA, GGC, GGG and GGT) all nucleotide substitutions are synonymous (do not change the amino acid). **DGYW (A/G/T, G, C/T, A/T); WRCH (A/T, G/A, C, T/A/C). ***Clone with no mutations from germline gene segments.
evaluate mutational frequencies and distributions of each WRCH/DGYW motif as reported within the results. Statistical analyses of antigen selection pressure on FR and CDR regions were carried out using the multinomial distribution model of Lossos et al.16 which is presently available as a JAVA applet at http://www-stat.stanford.edu/immunoglobulin. For multinomial distributions, an excess of CDR replacements or scarcity of FR replacements was judged significant at P < 0.05.
Results
Somatic mutation occurs within VL, JL and CL encoded regions of zebrafish IgL
Alignments of 207 682 VJ-C encoded nucleotides with concordant germline gene segments revealed 125 mutations over 55 unique VJ-C cDNA sequences (Table 2). The majority of mutations were found in VL; however, the JL and CL regions were also found to have mutations. The percentage of total mutations in the VL, JL and CL regions was 75, 8 and 17%, respectively. When weighted
[Figure 2 plot. Regions marked in the plot area: CDR1, CDR2, CDR3.]
Figure 2. Distribution of mutation frequency and activation-induced cytidine deaminase (AID) WRCH/DGYW hotspot motifs in targeted zebrafish immunoglobulin L (IgL). The combined frequency of mutations (primary vertical axis) per base sequenced for 55 unique VJ-C cDNA sequences plotted against position number (horizontal axis) reveals variability in VL, whereas mutations in CL were infrequent. The additive number of WRCH/DGYW hotspot motifs (secondary vertical axis) in the targeted germline IgL when plotted against position number shows an apparent trend of increasing mutation at AID hotspots. Within mammals, WRCH/DGYW motifs have been found to be the principal hotspot for AID-induced G:U lesions in rearranged immunoglobulin genes during somatic hypermutation. The bi-directional arrows show locations of palindromic WRCH/DGYW hotspot motifs which coincide with the highest mutation frequencies of the zebrafish IgL. The lower diagram depicts locations of VL, JL and CL regions with respect to overall distribution of the VJ-C sequence and the complementarity-determining regions (CDRs) are depicted in the graph plot area.
substitutions and one was a single base deletion (EU797185). As insertions or deletions in non-triplet increments result in frameshifts, it is not surprising that their occurrence was rare in the VJ-C cDNA clones. B cells with non-productive VL caused by frameshifts would inherently lack functional B cell receptor (BCR) and probably be selected against. Of the 93 VL single base substitutions, base transitions (79; 85%) were more abundant than transversions (14; 15%). The transition to transversion ratio of VL mutations was 5.64, which is considerably higher than the theoretical 1 : 2 ratio (P < 0.01) if random bases were incorporated during substitutions. These results are consistent with analyses in mammalian models demonstrating a strong tendency for transitions by SHM.17 Notably, 24 of the 94 VL mutations occurred in codon wobble positions (Table 2) and, of these, 96% (23/24) resulted in silent mutations. This ratio (23 : 1) is also higher than expected assuming random base substitution. These results indicate that transitions and silent mutation accumulation are prevalent in the zebrafish IgL analysed in this study.
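The transition bias can be checked with a one-degree-of-freedom chi-squared goodness-of-fit test, using the counts given above (79 transitions, 14 transversions) against the 1 : 2 expectation for random substitution. This is a minimal sketch; the helper name and the hard-coded critical value (6.635 for P = 0.01, 1 d.f.) are illustrative choices rather than details taken from the article.

```python
def chi_squared_gof(observed, expected_ratio):
    """Chi-squared goodness-of-fit statistic for counts against a ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected

transitions, transversions = 79, 14            # VL substitutions (from the text)
stat, expected = chi_squared_gof([transitions, transversions], [1, 2])

CRITICAL_P01_DF1 = 6.635                       # chi-squared critical value, P = 0.01, 1 d.f.

print(f"T:V ratio          = {transitions / transversions:.2f}")
print(f"expected (T, V)    = {expected}")
print(f"chi-squared        = {stat:.2f}")
print(f"significant at 1%  = {stat > CRITICAL_P01_DF1}")
```

With 31 transitions and 62 transversions expected under the random model, the statistic lies far above the 1% critical value, in line with the reported P < 0.01.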
[Figure 2 horizontal axis: position number across the VL, JL and CL regions. Figure 3 vertical axis: % R and S mutations.]
incorporation of an A in the daughter strand at the complementary position during replication. In a subsequent round of DNA replication, if left unrepaired, the A in the daughter strand results in a fixed C-to-T mutation in the coding strand. Similarly, G-to-A exchanges can result from C deamination on the non-coding strand and two rounds of replication. As illustrated in Fig. 3, the majority of the cytidine mutations observed in VL (8/14) were found at the widely accepted WRCY (refined WRCH18)
[Figure 3 bar categories. Cytidine mutations: AID hotspot (WRCH), outside AID motif. Guanosine mutations: AID hotspot (DGYW), outside AID motif. Adenosine mutations: outside AID motif. Thymidine mutations: outside AID motif.]
Figure 3. Base exchanges are biased to replacement (R) mutations and nucleotide targeting at activation-induced cytidine deaminase (AID) hotspot motifs (WRCH/DGYW). The majority of the base exchanges observed resulted in amino acid changes [replacements (R); black areas of bars]. Silent mutations (S; white areas), while found in all four bases, were proportionally higher outside of WRCH/DGYW hotspot motifs. These data suggest that neutral mutations may be more prone to accumulate in bases outside of hotspots whereas replacement mutations are favoured at AID hotspot motifs.
AID mutation hotspot motif. Even more abundant was cytidine mutation (37/45) on the non-coding strand (reverse complement of the hotspot motif RGYW resulting in G mutations on the coding strand). These findings suggest that AID hotspot motifs are targets for mutations in zebrafish and resultant mutations appear biased to fixation in daughter strands by U to A base pairing during DNA replication.

analysis showed that 89% (41/46) of the germline C/G targets in WRCH/DGYW motifs were in the first or second codon position. Given that this percentage is higher than the 66.6% expected if WRCH/DGYW targets were distributed equally among the three codon positions, it appears that there is a propensity for replacement mutations at these motifs. There was, however, no statistically significant difference between FR and CDR regions with respect to AID targets at wobble positions (P < 0.62), meaning that, unless they are subject to other selective pressures, the mutations induced at AID hotspots in both FR and CDR regions are more prone to be replacements. Of the 46 mutations in WRCH/DGYW hotspots, 89% (41/46) occurred at non-wobble positions and, of these, 100% (41/41) resulted in replacements. These numbers are considerably higher than the 57% (27/47) replacement rate outside AID hotspot motifs (Fig. 3). Thus, the data suggest that non-random mutations are favoured, with a strong bias towards replacements in WRCH/DGYW motifs.
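The comparison of 41/46 with the two-thirds expectation can be made concrete. The article does not state which test, if any, was applied to this particular contrast, so the one-sided exact binomial tail below is an assumption chosen for illustration; the function name is hypothetical.

```python
from math import comb

def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_targets = 46          # C/G targets in WRCH/DGYW motifs (from the text)
n_non_wobble = 41       # targets in the first or second codon position
p_null = 2 / 3          # expectation if targets were spread evenly over codon positions

observed_fraction = n_non_wobble / n_targets
p_value = binomial_upper_tail(n_non_wobble, n_targets, p_null)

print(f"observed fraction = {observed_fraction:.1%}, expected = {p_null:.1%}")
print(f"one-sided exact binomial P = {p_value:.1e}")
```

Under this assumed test the excess of first- and second-position targets would be unlikely to arise by chance, which is consistent with the propensity for replacement mutations described in the text.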
Replacement mutations are favoured within and outside AID hotspot motifs
Overall, 74% (68/93) of the VL base exchanges resulted in replacement mutations. Replacement (R) over silent (S) mutations were favoured in both the FR (26R; 11S) and CDR (42R; 14S) regions. When viewed in the context of WRCH/DGYW motifs, 89% (41/46) of the mutations at AID hotspots were replacements. To discern whether C/G targets of the AID motifs were more likely to be situated in either the first or the second codon position and hence by location be predisposed towards replacements, WRCH/DGYW positioning within the open reading frame of the targeted germline VL was determined. This
Figure 4. Mutations at activation-induced cytidine deaminase (AID) hotspots are disproportionately concentrated at complementarity-determining region 3 (CDR3). (a) AID hotspot motifs (n = 37) are distributed across framework regions (FRs) and complementarity-determining regions (CDRs) of the targeted germline VL. (b) The proportion of bases contained in WRCH/DGYW motifs varies slightly between FRs and CDRs. (c) Despite different densities of WRCH/DGYW motifs (a, b), mutations within WRCH/DGYW motifs are highly biased to occurrence in CDR3.
[Figure 5 panel content. Each entry gives the motif pair, the number of occurrences in VL in parentheses, and the number of mutations.
Non-palindromic 5′-WRCH-3′/5′-DGYW-3′ motifs: AACA/TGTT (VL=2); 0. AACC/GGTT (VL=3); 1. AACT/AGTT (VL=5); 3. AGCA/TGCT (VL=2); 0. AGCC/GGCT (VL=1); 0. TACA/TGTA (VL=3); 0. TACC/GGTA (VL=3); 1. TACT/AGTA (VL=3); 0. TGCC/GGCA (VL=1); 0. TGCT/AGCA (VL=2); 0.
Palindromic 5′-WRCH-3′/5′-WRCH-3′ motifs: AGCT/AGCT (VL=6); 8. TGCA/TGCA (VL=6); 33.
Palindromic AGCT and TGCA motifs conform to WRCY and RGYW in both directions. VL = 37 motifs; 41/46 mutations in palindromes.]
Figure 5. Somatic mutations are highly concentrated in palindromic activation-induced cytidine deaminase (AID) hotspot motifs. Within mammals, WRCH/DGYW motifs are considered the principal hotspot for AID-induced cytidine deamination in rearranged immunoglobulin during somatic hypermutation (SHM) whereas palindromic WRCY/WRCY are often targeted during class switch recombination (CSR). In the zebrafish VL targeted, every WRCH/DGYW sequence was present (the number of occurrences is listed in parentheses and the mutation number within each is depicted in bold). Overall, VL mutations were disproportionately concentrated within palindromic AGCT/AGCT and TGCA/TGCA AID hotspot motifs.
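The motif definitions used throughout (W = A/T, R = A/G, H = A/C/T, D = A/G/T, Y = C/T) can be expressed as a short sketch. It expands WRCH into its 12 concrete tetramers, confirms that only AGCT and TGCA are their own reverse complement, and locates the target C (WRCH) and target G (DGYW) in a sequence. The function names and the test sequence are hypothetical and are not taken from the article.

```python
import re
from itertools import product

# IUPAC ambiguity codes used in the motif names
IUPAC = {"W": "AT", "R": "AG", "H": "ACT", "D": "AGT", "Y": "CT",
         "A": "A", "C": "C", "G": "G", "T": "T"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def expand(motif: str) -> list[str]:
    """All concrete sequences matched by an IUPAC motif."""
    return ["".join(p) for p in product(*(IUPAC[ch] for ch in motif))]

def to_regex(motif: str) -> re.Pattern:
    # A lookahead is used so that overlapping motif occurrences are all reported.
    return re.compile("(?=(" + "".join(f"[{IUPAC[ch]}]" for ch in motif) + "))")

def find_targets(seq: str):
    """0-based positions of the C in WRCH and of the G in DGYW on the given strand."""
    c_targets = sorted(m.start() + 2 for m in to_regex("WRCH").finditer(seq))
    g_targets = sorted(m.start() + 1 for m in to_regex("DGYW").finditer(seq))
    return c_targets, g_targets

wrch = expand("WRCH")
palindromes = sorted(m for m in wrch if m == reverse_complement(m))

print(len(wrch), palindromes)
print(find_targets("TTAGCTAA"))   # hypothetical test sequence containing one AGCT
```

In a palindromic motif such as AGCT the same four bases present a WRCH target C on one strand and a DGYW target G on the other, which is why both positions are reported for the test sequence.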
Mononucleotides
To determine whether additional nucleotides or combinations of adjacent nucleotides in the VL coding region were preferentially targeted for mutation, mutability indexes (MIs) were determined. Each index is a normalized measure that takes into account the fact that each base or group of bases does not occur at the same frequency over the VL region analysed. The MIs reported for mononucleotides (Table 3) show by index score that the nucleotides are preferentially mutated in the order G>C>T>A. Transversion to transition ratios at G, C, T and A were 0.10, 0.16, 0.23 and 0.44, indicating a strong preference for base exchanges resulting in transition mutations. Each of the four ratios
AID hotspot motifs in the targeted VL, they accounted for 89% (41/46) of the mutations observed across all WRCH/DGYW motifs. AGCT/AGCT palindromes were found in FR1, CDR1 and FR2 (each region had one palindromic motif) with the number of mutations at each motif being 2, 4 and 2, respectively. The TGCA/TGCA palindromes were present in FR2, FR3 and CDR3 (also each having one palindromic motif) with respective mutation numbers at these motifs being 1, 1 and 31. These results suggest either preferential targeting of somatic mutation to palindromic AID hotspot motifs or an increased selection for resultant mutations at CDRs.
Table 3. Substitutions and mononucleotide mutability indexes in VL regions of zebrafish immunoglobulin L (IgL)
Rows (From): A, C, G, T, Total. Surviving column of values (A, C, G, T, Total): 24, 16, 26, 27, 93.
The mutability index is the observed number of mutations in a specific nucleotide divided by the expected number of mutations given a mechanism without bias. Expected numbers were derived by multiplying the frequency of the nucleotide over the region sequenced by the total number of mutations observed. A mutability index score of 1.0 would be assumed to represent non-biased sequence insensitive mutations. The number of VL nucleotides over the 55 VJ-C sequences was 14 355 (A = 3685; C = 2473; G = 4015; T = 4180) with 93 VL mutations. 2 Numbers in parentheses are substitution percentages for the indicated nucleotides. Chi-squared contrasts of observed and expected mutations with significant differences. *P = 0.005; **P = 0.01.
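The definition in this legend translates directly into a few lines of code. The sketch below uses the nucleotide counts from the legend and the observed cytidine (14) and guanosine (45) mutation counts quoted in the Results; the observed counts for A and T are not stated individually in this text and are therefore left out. Variable and function names are illustrative.

```python
# Nucleotide counts over the VL region of the 55 VJ-C sequences (Table 3 legend).
# Note: these four counts sum to 14 353, two fewer than the 14 355 quoted in the
# legend; the sum of the listed counts is used here.
base_counts = {"A": 3685, "C": 2473, "G": 4015, "T": 4180}
total_mutations = 93

# Observed VL mutations at C and G, taken from the Results text.
observed = {"C": 14, "G": 45}

def expected_mutations(counts: dict, n_mutations: int) -> dict:
    """Expected mutations per base if mutation were insensitive to sequence."""
    total_bases = sum(counts.values())
    return {base: n_mutations * n / total_bases for base, n in counts.items()}

expected = expected_mutations(base_counts, total_mutations)
mutability_index = {base: observed[base] / expected[base] for base in observed}

for base, value in expected.items():
    print(f"{base}: expected {value:.1f}")
for base, value in mutability_index.items():
    print(f"{base}: mutability index {value:.2f}")
```

Rounded to whole numbers, the expected values computed this way are 24, 16, 26 and 27 for A, C, G and T, and the index for G exceeds that for C, matching the G>C ordering reported in the text.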
is significantly different (P < 0.01) from the theoretical 2 : 1 transversion to transition ratio that would be expected if base exchanges were random and nondiscriminatory. Thus, mononucleotide analyses suggest that neither the nucleotides targeted nor the resultant patterns of substitution can be explained by assuming that somatic mutation events occur randomly in the zebrafish VL.
Table 4. Dinucleotide mutability indexes in zebrafish VL
Dinucleotide   VL germline1   No. of mutations   Expected   Mutability index2 (observed/expected)
AA             17             6                  12         0.50
AC             14             6                  10         0.61
AG             19             10                 13         0.75
AT             17             7                  12         0.59
CA             12             11                 8          1.31
CC             9              2                  6          0.32
CG             2              2                  1          1.43
CT             23             17                 16         1.05
GA             22             7                  15         0.45
GC             10             41                 7          5.84** H
GG             22             6                  15         0.39
GT             19             9                  13         0.67
TA             16             1                  11         0.09** C
TC             13             8                  9          0.88
TG             30             42                 21         1.99** H
TT             17             9                  12         0.76
Dinucleotides
Dinucleotide mutability indexes (Table 4) were calculated to determine if mutations were concentrated to specific pairs of adjacent bases. Overall, dinucleotide MIs varied considerably from the observed/expected score of 1.0 which is assumed to represent random sequence insensitive mutation. Index scores greater than 1.0 are assumed to represent preferential targeting whereas scores less than 1.0 imply either an avoided target for mutation (a cold spot) or that mutations at these positions are selected against. Chi-squared analyses of mutations at dinucleotides showed statistically significant targeting at GC and TG combinations. GC dinucleotides have also been found to be significant targets of mutation in human19 and catfish20 VH regions. In the present study, 90% (9/10) of the GC dinucleotides in the zebrafish germline VL were encoded within WRCH/DGYW motifs. Moreover, 98% (40/41) of the mutations at GC positions were in the target C positions of AID hotspot motifs. In contrast to the high mutability score for GC (MI = 5.84), the MI for TA was 0.09 (Table 4), which would suggest that TA dinucleotides are avoided targets for mutation (cold spots) or mutations at TA are selected against. The presence of both statistically significant mutational hotspots and cold spots is also indicative of non-random nucleotide targeting in the zebrafish model.
H, favoured somatic hypermutation (SHM) target or hotspot; C, avoided target or cold spot. 1 Number of times the dinucleotide is present in the targeted germline VL. 2 Mutability indexes were calculated as described in the legend of Table 3. **Statistically significant by chi-squared test at P = 0.01.
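The dinucleotide indexes in Table 4 can be recomputed from the germline and mutation counts alone. The sketch below assumes that the expected count for each dinucleotide is its share of all germline dinucleotides multiplied by the total of the mutation column; this normalisation is inferred from the printed values (it reproduces them to within rounding) and is not spelled out in the article.

```python
# (germline occurrences, observed mutations) per dinucleotide, copied from Table 4
table4 = {
    "AA": (17, 6),  "AC": (14, 6),  "AG": (19, 10), "AT": (17, 7),
    "CA": (12, 11), "CC": (9, 2),   "CG": (2, 2),   "CT": (23, 17),
    "GA": (22, 7),  "GC": (10, 41), "GG": (22, 6),  "GT": (19, 9),
    "TA": (16, 1),  "TC": (13, 8),  "TG": (30, 42), "TT": (17, 9),
}

def mutability_indexes(table: dict) -> dict:
    """Observed / expected mutations, with expected proportional to germline frequency."""
    total_sites = sum(germline for germline, _ in table.values())
    total_mutations = sum(mutations for _, mutations in table.values())
    return {
        motif: mutations / (total_mutations * germline / total_sites)
        for motif, (germline, mutations) in table.items()
    }

mi = mutability_indexes(table4)
for motif in sorted(mi, key=mi.get, reverse=True):
    print(f"{motif}: {mi[motif]:.2f}")
```

Sorting by index puts GC and TG at the top and TA at the bottom, the same hotspots and cold spot flagged in the table.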
Trinucleotides
When expanded to trinucleotides, MIs (Table 5) for GC targeting were highest and statistically significant in GCA and TGC combinations whereas reduced targeting at TA remained consistent in TAC and TAA trinucleotides. In addition, GTG was also found to be a statistically significant base combination for mutation and ATG was
Trinucleotide and VL germline count:
AAA 6    AAC 4    AAG 5    AAT 2
ACA 4    ACC 4    ACG 1    ACT 5
AGA 3    AGC 5    AGG 3    AGT 8
ATA 1    ATC 5    ATG 7    ATT 4
CAA 1    CAC 2    CAG 8    CAT 2
CCA 2    CCC 0    CCG 0    CCT 7
CGA 1    CGC 0    CGG 0    CGT 1
CTA 4    CTC 4    CTG 13   CTT 2
GAA 2    GAC 2    GAG 6    GAT 6
GCA 4    GCC 2    GCG 0    GCT 4
GGA 11   GGC 0    GGG 5    GGT 6
GTA 5    GTC 2    GTG 6    GTT 6
TAA 3    TAC 6    TAG 0    TAT 7
TCA 3    TCC 3
Final ten rows (trinucleotide names follow the alphabetical order of the list above):
Trinucleotide   VL germline   No. of mutations   Mutability index1
TCG             1             0                  0.00
TCT             7             6                  0.81
TGA             7             3                  0.41
TGC             5             34                 6.45** H
TGG             14            11                 0.75
TGT             4             0                  0.00
TTA             6             3                  0.47
TTC             3             1                  0.32
TTG             3             2                  0.63
TTT             5             6                  1.14
H, favoured somatic hypermutation (SHM) target or hotspot; C, avoided target or cold spot. 1 Mutability indexes were calculated as described in the legend of Table 3. Statistically significant by chi-squared test at *P = 0.05; **P = 0.01. NA, not applicable.
identified as a coldspot. The GCA and TGC trinucleotides are both contained within DGYW/WRCH motifs and one of the highly mutated DGYW hotspots was immediately preceded by a G which accounts for GTG being statistically significant. Thus, GCA, TGC, and GTG appear to be targets for somatic mutation because of their inclusion within the larger WRCH/DGYW hotspot motifs, whereas ATC and ATG appear to be mutational cold spots or trinucleotides in which mutations are selected against.
found (P = 0.45). Multinomial distribution analyses of R and S mutations in productive VJ-C clones also did not reveal significant P values for a scarcity of FR replacements or an excess of CDR replacements. These results could be interpreted to mean that selection may be somewhat restricted in the zebrafish used in this study. Recent work by others20 has shown that, in addition to collective assessments of R and S mutations, analyses of productive rearrangements in the context of clonal lineages and subsequent radiations may be more relevant to understanding the potential impact of selection on somatic mutation.

in some FRs. For several VJ-C clones, R mutations in FRs appear to be early in the mutational lineage. The presence of founder mutations carried in the clonal descendants implies that, although the mutation may change the affinity for antigen, it does not ablate the structural integrity of the B-cell receptor. Identification of lineages in the zebrafish VJ-C clones is suggestive of both sustained and incremental mutational events and selection, both characteristics of affinity maturation responses seen in higher vertebrates.
Discussion
Zebrafish are rapidly emerging as an important immunological model for biomedical research. In contrast to mice and human models, far less is known concerning potential nucleotide targets and mutational hotspots that may underpin immunoglobulin affinity maturation in this species. Previously, by aligning cDNA and expressed sequence tag (EST) sequences with genomic reference sequences, our laboratory had shown that somatic mutation can occur in the immunological light chains in zebrafish.7 In the present study, by focusing on a single VL locus, we were able to obtain a sufficient quantity of VJ-C cDNA transcripts to identify specific nucleotide targets and patterns of SHM in this animal model. In mammals, SHM is typically characterized by the presence of single base substitutions, with few insertions or deletions. Single base substitutions and limited insertion or deletion were also defining characteristics in the zebrafish VJ-C cDNA clones (Table 2). Given that insertions or deletions in non-triplet increments result in frameshifts, it is expected that B cells harbouring such mutations would be selected against, for without functional light chains the integrity of the BCR would be lost. Also similar to findings in mice and humans,21 transition (T) mutations were more prevalent than transversions (V) (T:V ratio in zebrafish VL = 5.64). The transition preference in zebrafish is even more pronounced if only mutations in wobble positions of degenerate codons are considered (T:V ratio = 10; data in Table 2). Mutations at degenerate wobble positions may escape selection at the protein level as, regardless of the base exchanged, an identical amino acid would be encoded. Thus, mutations at wobble positions might offer a window into targeting and base change preferences of a mutational mechanism active in zebrafish. However, selection based on nucleotides adjacent to wobble positions cannot be excluded.
Of the 11 mutations in wobble positions of degenerate codons, base exchanges were from T (seven), A (three), or G (one) and none of these 11 exchanges was in WRCH/DGYW motifs. Similarly, if all non-WRCH/DGYW VL mutations are included, A:T mutations represent 73% (34/47; Table 2) of mutations outside of AID
© 2010 The Authors. Immunology © 2010 Blackwell Publishing Ltd, Immunology, 132, 240–255
[Figure 6 lineage diagrams: node labels in extraction order; tree layout, arrows, italics and the panel (a) marker are not recoverable]
Progenitor VJ-C
EU795319: FR2 (82) R; FR2 (93) R; CDR3 (259) R
EU821500: FR2 (97) R; CDR3 (259) R
(b)
EU821516
Progenitor VJ-C
EU821507: FR1 (34) R; CDR1 (56) S; CDR1 (71) S
EU821498: FR3 (218) S; CDR3 (259) R
EU795304: CDR1 (50) S; CDR2 (118) R; CDR2 (158) R; CDR3 (259) R
EU795311: FR3 (196) R; CDR3 (259) R
EU821520: FR1 (32) S; FR3 (218) S; FR3 (222) R; CDR3 (259) R
EU795323: FR1 (27) R; CDR3 (259) R; +7 additional mutations
EU795318: CDR2 (166) R; CDR3 (259) R
EU795316: FR2 (80) S; CDR3 (259) R
Figure 6. Lineage relationships of VJ-C cDNA clones are consistent with the possibility of clonal expansion and affinity maturation in zebrafish B cells. Individual VJ-C cDNA clones are depicted as circles labelled by accession number. A potential progenitor clone is depicted at the apex of each diagram. (a) Clones harbouring two or more additive mutations. (b) Clones radiating from a founder mutation in complementarity-determining region 3 (CDR3). In both diagrams, the progenitor VJ-C clone (accession no. EU821516) is identical in sequence to unmutated germline immunoglobulin L (IgL). Mutations listed for each VJ-C clone designate the location in framework regions (FRs) or CDRs, and numbers in parentheses indicate the concordant germline position. Replacement mutations are indicated with an R while silent mutations are indicated with an S. Listings in italics are mutations at the C/G positions of WRCH/DGYW hotspot motifs. The directionality of arrow segments depicts sequential mutation accumulation in the radiation of clonal descendants.
hotspot motifs. Collectively, these findings indicate that in zebrafish a substantial number of mutations outside of WRCH/DGYW motifs in VL may be attributable to mutational mechanisms that target A:T base pairs. AID-dependent deamination of cytidine to uracil, in addition to producing mutations at C/G nucleotides, has also been shown to activate mismatch repair at U:G mismatches in mouse models. The mismatch repair proteins MSH2–MSH6 have been found to bind U:G mismatches and in doing so can recruit a low-fidelity DNA polymerase called polymerase eta (η).22 Upon binding the MSH2–MSH6 heterodimer, the catalytic activity of polymerase η is stimulated, allowing the polymerase to move more rapidly along the template DNA. Being a low-fidelity polymerase, polymerase η is prone to incorporate base substitutions preferentially at A:T positions downstream of the original U:G
lesion.23,24 Thus, in theory, the G:C and A:T mutations observed in zebrafish VL could be largely dependent on the combined outcome of uracil-DNA glycosylase (UNG) and mismatch (MSH) repair pathways. Somatic mutations in mammalian immunoglobulin18 and the nurse shark antigen receptor (NAR)25 are also proportionally distributed among G:C and A:T base pairs. However, in other vertebrates, including frogs,26 and in the VH of shark IgM,27 G:C mutations are favoured. These findings indicate that mutational targeting of G:C and A:T pairs, and the subsequent repair strategies, may operate to different extents in different organisms.
When individual zebrafish VJ-C sequences are considered, 18 of the 55 clones contained mutations within a WRCH/DGYW motif and one or more mutations either upstream or downstream of the targeted AID hotspot. In
total, the 18 clones harboured 29 VL mutations outside of a targeted WRCH/DGYW motif (data in Table 2). Of these, 72% (21/29) were at A:T positions and 28% (8/29) were at G:C base pairs. Most intriguing, however, is that 93% (27/29) of these mutations were downstream of the targeted WRCH. Based on these findings, it is tempting to speculate that, similar to mammalian models, AID might target C:G at WRCH/DGYW in zebrafish, which in turn may recruit orthologous MSH2–MSH6 proteins to the resultant U:G mismatches. If recruitment of a low-fidelity polymerase similar to mouse polymerase η were also to ensue, this could in theory account for the increased propensity for A:T mutations downstream of targeted AID hotspots.
Most experimental evidence in mice and humans suggests that AID initiates deamination of cytidines in actively transcribed immunoglobulin genes.28,29 Recently, it was shown in vitro that nucleosomes prevent AID access unless the immunoglobulin segment is being transcribed.30 Transcription is required for SHM in vivo, presumably in part to loosen the contact of nucleosomes with the DNA.31 During transcription, the single-stranded DNA (ssDNA) is prone to AID-mediated C to U conversion, producing U:G mismatches in the DNA. U:G mismatches cause modest distortions in the DNA, which may in turn activate a suite of DNA repair mechanisms involving DNA glycosylases, general mismatch repair factors, and a variety of error-prone polymerases.32 Alternatively, if left unrepaired, the U:G mismatches become fixed as C→T mutations in replicated DNA because DNA polymerases do not discriminate between U and T in the template strand during DNA replication. In the zebrafish VL, the majority (41/46) of the cytidine mutations found at WRCH/DGYW hotspot motifs were C→T transitions. This finding suggests that uracil glycosylase-mediated DNA repair may be somewhat limited in the zebrafish VL regions.
For example, if uracil residues in U:G mismatches were substrates for base excision repair, it would seem likely that hydrolysis of the glycosidic bond between U and deoxyribose and subsequent endonucleolytic cleavage of the sugar would result in an abasic site. Polymerases involved in base excision DNA repair are generally more error-prone, and their activity over an abasic lesion brought about by U removal from U:G mismatches would be predicted to result in C→A/G/T mutations. The precise preference for each type of substitution would in large part depend on the polymerase involved. Given that two of the three possible substitutions are transversions, it seems that a predominance of transversions would be apparent if base excision repair were extensive at U:G mismatches created at AID motifs.33 The statistically significant preference (P < 0.01) for transition over transversion mutations both within and outside AID motifs in the zebrafish VL implies a tendency either for mutations at AID hotspots to escape DNA repair or for selective pressures to maintain transition mutations once they become fixed in the B-cell genome. In mammals, it has been suggested that AID-induced mutagenesis saturates the overall repair capacity of B cells.34 If AID mutagenesis were to saturate uracil glycosylase capacities in zebrafish B cells, this might in part explain why the majority of mutations at AID hotspots in zebrafish VL were C→T. Conversely, if mismatch repair pathways were not as saturated, this could also explain the increased capacity for A:T mutations downstream of targeted AID hotspots. Although the balance between saturation of repair mechanisms and toleration of mutation remains largely unknown, it appears that flexibility in this balance would result in an increased capacity to generate mutational diversity within immunoglobulin gene segments.
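For readers who wish to locate candidate AID hotspots in their own sequences, the following Python sketch scans a DNA string for WRCH motifs and their reverse complement, DGYW, using the standard IUPAC ambiguity codes (W = A/T, R = A/G, H = A/C/T, D = A/G/T, Y = C/T). It is an illustration under stated assumptions rather than the method used in this study: the function name, the zero-based coordinates, and the use of regular expressions are choices made here for clarity.

```python
import re

# Overlapping matches are needed because hotspot motifs can overlap,
# so each pattern is wrapped in a zero-width lookahead.
WRCH = re.compile(r"(?=[AT][AG]C[ACT])")   # targeted C is the third base
DGYW = re.compile(r"(?=[AGT]G[CT][AT])")   # targeted G is the second base


def hotspot_positions(seq):
    """Return sorted zero-based positions of the C in WRCH and the G in DGYW.

    'hotspot_positions' is a hypothetical helper name; positions are
    indices into the input string, not germline coordinates.
    """
    seq = seq.upper()
    positions = set()
    for match in WRCH.finditer(seq):
        positions.add(match.start() + 2)
    for match in DGYW.finditer(seq):
        positions.add(match.start() + 1)
    return sorted(positions)


# AGCT satisfies both WRCH and DGYW, so both its G and its C are reported.
print(hotspot_positions("AGCT"))  # [1, 2]
```

Because AGCT reads the same on both strands, it is an example of a palindromic WRCH/DGYW motif. Mutations observed at the returned positions could then be tallied as C→T (or G→A on the opposite strand) versus other substitutions.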
Uracil glycosylase base excision and mismatch repair systems are evolutionarily ancient mechanisms for DNA repair thought to exist in all prokaryotes and eukaryotes. The utilization of these repair mechanisms in vertebrates to generate additional diversity within immunoglobulin gene segments is an area of research that has only recently begun to be explored. The discovery just over a decade ago35 that AID is responsible for both SHM and class switch recombination (CSR) dramatically enhanced the possibility of obtaining an in-depth understanding of the mechanistic processes underlying adaptive immunity in vertebrates. It had long been thought that uracil in DNA was an adverse condition arising from inappropriate incorporation of dUTP during replication or spontaneous deamination of cytosine.36 It is becoming increasingly apparent, however, that nature incorporates uracil into DNA as a central mediator of adaptive immunity and as a strategy against certain viruses during innate responses.37,38 Thus, uracil incorporation, once thought to be solely a mutagenic burden, has been revealed as a mechanism to modify immunoglobulin DNA in B cells for diversity or even non-self DNA for degradation.
An orthologue of AID has been identified in zebrafish and its expression in mammalian cells in vitro has been shown to induce both CSR and SHM.39,40 In the present study, the patterns revealed for in vivo mutations in zebrafish VL strongly suggest that AID and uracil incorporation are utilized as a means to generate immunoglobulin diversity in the zebrafish model. Despite drastically different outcomes for SHM and CSR (point mutations versus large-scale deletions) and functionally distinct target sequences (VH/L exons versus switch regions), SHM and CSR are both contingent upon the B-cell-specific AID enzyme and single-strand templates brought about by transcription.
Point mutations similar to those at VH WRCH sequences have also been found at the WRCH within switch regions in mice, suggesting a common AID targeting method for both SHM and CSR.41,42 Given that SHM has been found in all vertebrates including fish, whereas CSR appears limited to
Acknowledgement
This work was supported in part by grants from the PhRMA Foundation and the National Science Foundation.
Disclosures
The authors have no conflicts of interest to disclose.
References
1 Weigert MG, Cesari IM, Yonkovich SJ, Cohn M. Variability in the lambda light chain sequences of mouse antibody. Nature 1970; 228:1045–7.
2 Harris RS, Kong Q, Maizels N. Somatic hypermutation and the three Rs: repair, replication and recombination. Mutat Res 1999; 436:157–78.
3 Maizels N, Scharff MD. Molecular mechanisms of hypermutation. In: Neuberger M, Honjo T, Alt FW, eds. Molecular Biology of B Cells. New York: Academic Press, 2004:327–38.
4 McKean D, Huppi K, Bell M, Staudt L, Gerhard W, Weigert M. Generation of antibody diversity in the immune response of BALB/c mice to influenza virus hemagglutinin. PNAS 1984; 81:3180–4.
5 Danilova N, Bussmann J, Jekosch K, Steiner LA. The immunoglobulin heavy-chain locus in zebrafish: identification and expression of a previously unknown isotype, immunoglobulin Z. Nat Immunol 2005; 6:295–302.
33 Di Noia J, Neuberger M. Altering the pathway of immunoglobulin hypermutation by inhibiting uracil-DNA glycosylase. Nature 2002; 419:43–8.
34 Liu M, Duke JL, Richter DJ, Vinuesa CG, Goodnow CC, Kleinstein SH, Schatz DG. Two levels of protection for the B cell genome during somatic hypermutation. Nature 2008; 451:841–6.
35 Muramatsu M, Kinoshita K, Fagarasan S, Yamada S, Shinkai Y, Honjo T. Class switch recombination and hypermutation require activation-induced cytidine deaminase (AID), a potential RNA editing enzyme. Cell 2000; 102:553–63.
36 Visnes T, Doseth B, Pettersen HS et al. Uracil in DNA and its processing by different DNA glycosylases. Philos Trans R Soc Lond B Biol Sci 2009; 364:563–8.
37 Sousa MM, Krokan HE, Slupphaug G. DNA-uracil and human pathology. Mol Aspects Med 2007; 28:276–306.
38 Chelico L, Pham P, Petruska J, Goodman MF. Biochemical basis of immunological and retroviral responses to DNA-targeted cytosine deamination by activation-induced cytidine deaminase and APOBEC3G. J Biol Chem 2009; 41:27761–5.
39 Barreto BM, Pan-Hammarstrom Q, Zhao Y, Hammarstrom L, Misulovin Z, Nussenzweig MC. AID from bony fish catalyzes class switch recombination. J Exp Med 2005; 202:733–8.
40 Wakae K, Magor BG, Saunders H, Nagaoka H, Kawamura A, Kinoshita K, Honjo T, Muramatsu M. Evolution of class switch recombination function in fish activation-induced cytidine deaminase, AID. Int Immunol 2006; 18:41–7.
41 Nagaoka H, Muramatsu M, Yamamura N, Kinoshita K, Honjo T. Activation-induced deaminase (AID)-directed hypermutation in the immunoglobulin Sμ region: implication of AID involvement in a common step of class switch recombination and somatic hypermutation. J Exp Med 2002; 195:529–34.
42 Zeng Z, Negrete GA, Kasmer C, Yang WW, Gearhart PJ. Absence of DNA polymerase η reveals targeting of C mutations on the non-transcribed strand in immunoglobulin switch regions. J Exp Med 2004; 199:917–24.
43 Lundqvist ML, Pilstrom L. Variability of the immunoglobulin light chain in the Siberian sturgeon, Acipenser baeri. Dev Comp Immunol 1999; 23:607–15.
44 Du Pasquier L. The immune system of invertebrates and vertebrates. Comp Biochem Physiol B Biochem Mol Biol 2001; 129:1–15.
45 Flajnik MF. Comparative analyses of immunoglobulin genes: surprises and portents. Nat Rev Immunol 2002; 2:688–98.
46 Cannon JP, Haire RN, Rast JP, Litman GW. The phylogenetic origins of the antigen-binding receptors and somatic diversification mechanisms. Immunol Rev 2004; 200:12–22.
47 Bengten E, Quiniou S, Hikima J, Waldbieser G, Warr GW, Miller NW, Wilson M. Structure of the catfish IGH locus: analysis of the region including the single functional IgHM gene. Immunogenetics 2006; 58:831–44.
48 Zarrin AA, Alt FW, Chaudhuri J, Stokes N, Kaushal D, Du Pasquier L, Tian M. An evolutionarily conserved target motif for immunoglobulin class-switch recombination. Nat Immunol 2004; 5:1275–81.
49 Reynaud CA, Anquez V, Grimal H, Weill J. A hyperconversion mechanism generates the chicken light chain pre-immune repertoire. Cell 1987; 48:379–88.
50 Kohzaki M, Nishihara K, Hirota K et al. DNA polymerases ν and θ are required for efficient immunoglobulin V gene diversification in chicken. J Cell Biol 2010; 189:1117–27.
51 Mage RG, Lanning D, Knight KL. B cell and antibody repertoire development in rabbits: the requirement of gut-associated lymphoid tissues. Dev Comp Immunol 2006; 30:137–53.