Revealing biases inherent in recombination protocols
BMC Biotechnology 2007, 7:77 doi:10.1186/1472-6750-7-77
Javier F Chaparro-Riggers (email@example.com) Bernard LW Loo (firstname.lastname@example.org) Karen M Polizzi (email@example.com) Phillip R Gibbs (firstname.lastname@example.org) Xiao-Song Tang (email@example.com) Mark J Nelson (firstname.lastname@example.org) Andreas S Bommarius (email@example.com)
ISSN: 1472-6750
Article type: Methodology article
Submission date: 11 June 2007
Acceptance date: 14 November 2007
Publication date: 14 November 2007
Article URL: http://www.biomedcentral.com/1472-6750/7/77
Like all articles in BMC journals, this peer-reviewed article was published immediately upon acceptance. It can be downloaded, printed and distributed freely for any purposes (see copyright notice below). Articles in BMC journals are listed in PubMed and archived at PubMed Central. For information about publishing your research in BMC journals or any BioMed Central journal, go to http://www.biomedcentral.com/info/authors/
© 2007 Chaparro-Riggers et al., licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Javier F. Chaparro-Riggers1*, Bernard L.W. Loo1*, Karen M. Polizzi1, Phillip R. Gibbs1,4, Xiao-Song Tang3, Mark J. Nelson3, and Andreas S. Bommarius,1,2
1 School of Chemical and Biomolecular Engineering, Parker H. Petit Institute of Bioengineering and Bioscience, School of Chemistry and Biochemistry, Georgia Institute of Technology, 315 Ferst Drive, Atlanta, GA 30332-0363, USA
2 School of Chemistry and Biochemistry, Georgia Institute of Technology, 315 Ferst Drive, Atlanta, GA 30332-0363, USA
3 EI DuPont de Nemours & Company, PO Box 80328, Wilmington, DE 19880-0328, USA
4 Current address: Stheno Corporation, 311 Ferst Drive, Atlanta, 30332-0100, USA
* These authors contributed equally to this work
Corresponding author:
Email addresses: JCH: firstname.lastname@example.org BWL: email@example.com KMP: firstname.lastname@example.org PRG: email@example.com XST: firstname.lastname@example.org MJN: email@example.com ASB: firstname.lastname@example.org
Abstract
Background: The recombination of homologous genes is an effective tool for evolving proteins. DNA shuffling by gene fragmentation and reassembly has dominated the literature since its first publication, but this fragmentation-based method is labor-intensive. Recently, a fragmentation-free, PCR-based protocol termed recombination-dependent PCR has been published, which is easy to perform. However, a detailed comparison of the two methods is still missing.
Results: We developed different test systems to compare and reveal biases from DNA shuffling and recombination-dependent PCR (RD-PCR), a StEP-like recombination protocol. An assay based on the reactivation of β-lactamase was developed to simulate the recombination of point mutations. Both protocols performed similarly here, with slight advantages for RD-PCR. However, clear differences in performance were observed when the protocols were applied to homologous genes of varying DNA identities. Most importantly, RD-PCR showed a less pronounced bias of the crossovers toward regions with high sequence identity. We discovered that template variations, including engineered terminal truncations, have a significant influence on the position of the crossovers in RD-PCR. In comparison, DNA shuffling can produce higher crossover numbers, while RD-PCR frequently results in one crossover. Both protocols produce counter-productive variants: parental sequences in the case of DNA shuffling, and chimeras that are over-represented in the library in the case of RD-PCR. Lastly, only RD-PCR yielded chimeras in the low homology situation of GFP/mRFP (45% DNA identity level).
Conclusion: By comparing different recombination scenarios, this study expands on existing recombination knowledge and sheds new light on known biases, which should improve library-creation efforts. RD-PCR proved to be an easy-to-perform alternative to DNA shuffling.
Background
Directed evolution of proteins has become a widely adopted and accepted method for protein engineering. The process involves two basic iterative steps: the creation of diversity at the gene level and the screening or selection for improved variants. The quality of the diversity method is crucial, and the performance of the chosen protocol has a direct impact on the success rate of obtaining improved variants as well as on the time and cost effectiveness of the ensuing screening or selection process. Methods for creating molecular diversity fall into two main categories: random mutagenesis and recombination. A recent, in-depth comparison of random mutagenesis methods showed that the existing methods are limited and highly biased. On average they can only achieve between 3.15 and 7.4 amino acid substitutions per residue. Recombination methods, on the other hand, have not been compared in detail to date.

Since its introduction in 1994, Stemmer's DNA shuffling has become a widely adopted method for creating chimeric genes. As of the end of February 2007, the two original papers outlining the methodology (one in Proceedings of the National Academy of Sciences, the other in Nature) have been cited 517 and 760 times, respectively. DNA shuffling is the most common method with which to recombine genes, and it has become a powerful tool for protein evolution.

Despite the pervasiveness of DNA shuffling in protein engineering, there are several drawbacks to its implementation. The protocol is somewhat skill-intensive, involving the fragmentation of the genes to be shuffled with DNase I and a long, primerless reassembly PCR step (Figure 1). Because DNA shuffling utilizes annealing and extension steps during reassembly, crossover points are biased towards regions of high sequence identity. In addition, the yield of chimeras can be quite low, particularly when short genes are being shuffled. Parental background ranging from around 20% to almost 100% has been reported. Finally, there is a lower limit to the DNA identity level of the genes being recombined, with 56% being the lowest reported identity level that led to successful chimera generation.

An alternative group of methods to recombine genes are fragmentation-free, PCR-based protocols, which utilize a series of short annealing/extension steps to promote template switching, which in turn leads to recombination. The first such protocol was the Staggered Extension Process (StEP). Further modifications that use skew primers to amplify chimeras over parental background have been introduced recently (Recombination-Dependent Exponential Amplification, RDA-PCR, and Shuffling Using Unpaired Primers, SUUPER, collectively called recombination-dependent PCR; Figure 1). These recombination-dependent PCR (RD-PCR) protocols are much less skill-intensive than DNA shuffling, and the use of skew primers should, in principle, eliminate parental background.

The efficiency of diversity generation has a direct impact on the time and cost effectiveness of the screening or selection process and, ultimately, on the probability of identifying an improved variant. The optimal library generation method would be unbiased and would avoid duplication of chimeras. Reducing or completely eliminating parental background would minimize the effort spent screening these redundant variants. Additionally, the ability to control the crossover number via tunable parameters is desirable, as it enables access to different areas of sequence space. It is important to note that, to minimize severe disruption of chimeras, the crossovers should be located in regions of similar three-dimensional structure.

The purpose of this work is to systematically compare the libraries produced by DNA shuffling and RD-PCR using the same representative templates, in order to determine the suitability of RD-PCR as a less labor-intensive alternative to DNA shuffling for the recombination of genes. We were interested in the number and type of chimeras generated by each protocol: the location of crossover points, the number of crossovers obtained, and the percentage of unique sequences generated with each protocol in our three test systems. We focused on RD-PCR as opposed to StEP since the use of skew primers should eliminate most parental background. Our three test cases encompass the most common scenarios encountered in protein engineering: the recombination of point mutations, the recombination of closely related genes, and the recombination of low homology but structurally similar proteins (usually performed with iterative-truncation-type methods because of the limits of DNA shuffling). To our knowledge, this is the first detailed, head-to-head comparison of DNA shuffling and RD-PCR on the same systems.
Results and Discussion
Recombination of point mutations using the β-lactamase system
One common strategy in the directed evolution of proteins is to perform several cycles of error-prone PCR, followed by recombination of the point mutations in selected improved clones to enrich positive mutations and delete negative ones. The optimal recombination protocol in this case would result in a high number of crossovers, no additional point mutations, and no parental background. To estimate crossover frequency on a large scale, we created a phenotype-based screening system by introducing mutations into β-lactamase that disrupt activity and are not recoverable by a PCR mutation to the wild-type or a tolerated amino acid. In contrast to previous systems, this system allows easy selection for reactivation and does not show any genetic instability that could cause the observed recombination frequencies to deviate from the actual ones. Crossovers in certain areas are required for reactivation, so an estimated crossover number can be obtained directly from the observed reactivation rate (functional complementation), reducing the need to sequence large numbers of library members.

Template pairs were created requiring 1-5 crossovers for reactivation. The template pairs for DNA shuffling and RD-PCR were slightly different to allow for some extension of the genes before the first crossover in the RD-PCR pairs (see Figure S.1, additional file 1), but the nature of the point mutations and the number of crossovers required per 1000 bp were kept constant.

RD-PCR was optimized by varying DNA concentration, annealing/extension temperature, and time. A template concentration of 0.8 ng DNA/µL of PCR reaction gave sufficient yields of PCR product, while higher template concentrations reduced the crossover yield (data not shown). The results of the reactivation experiments are summarized in Table 1. Using the cycling conditions from Milano & Tang, the crossover rate decreases as the annealing/extension time increases, yielding a lower survival rate on ampicillin. This is logical, as longer extension times reduce template switching, providing less opportunity for crossovers to occur. Using the cycling conditions from Ikeuchi et al., increasing the annealing temperature increases the reactivation rate but decreases the yield of PCR product. Higher temperatures favor the annealing and extension of longer fragments, making it harder to begin synthesizing a recombinant gene but promoting annealing of partially extended products to different templates after the melting step. Using Pfu polymerase, which has higher fidelity than Taq polymerase, slightly decreased the reactivation rate. However, reactivation rates were still comparable to those under other conditions. In cases where avoiding the introduction of further point mutations is desirable, Pfu polymerase can therefore be used successfully in RD-PCR.

DNA shuffling can be optimized either by varying the size of fragments or by adjusting the annealing temperature. Larger fragments tend to yield fewer crossovers. Because of the high level of homology in our case, a fragment size of 50-120 bp produced sufficient product yield, though in many cases larger fragments are required to promote assembly.

In general, the reactivation rates for most RD-PCR conditions and DNA shuffling are very similar. The optimized RD-PCR conditions (60 °C, 5 s) showed almost 2-fold higher crossover rates than DNA shuffling. A further advantage of RD-PCR is its ease of implementation, since very low template concentrations are required (in contrast to the large amount of small DNA fragments needed for reassembly PCR) and no fragmentation is required.
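The logic of the functional-complementation assay can be sketched in a few lines of Python. This is only an illustrative model, not the analysis used for Table 1: it assumes that a chimera is fully described by the parent supplying its 5' end and a sorted list of crossover positions, and that any inherited inactivating mutation abolishes activity. The function names and all coordinates are hypothetical.

```python
from bisect import bisect_right


def parent_at(position, crossovers, first_parent="A"):
    """Return the parent ('A' or 'B') contributing the base at `position`.

    `crossovers` is a sorted list of positions at which the polymerase
    switched templates; `first_parent` supplies the 5' end.
    """
    switches = bisect_right(crossovers, position)
    if switches % 2 == 0:
        return first_parent
    return "B" if first_parent == "A" else "A"


def is_reactivated(crossovers, mutations, first_parent="A"):
    """True if the chimera inherits none of the inactivating mutations.

    `mutations` maps a position to the parent carrying a mutation there.
    """
    return all(parent_at(pos, crossovers, first_parent) != carrier
               for pos, carrier in mutations.items())


def min_crossovers(mutations):
    """Smallest number of crossovers that can remove every mutation."""
    carriers = [carrier for _, carrier in sorted(mutations.items())]
    return sum(a != b for a, b in zip(carriers, carriers[1:]))


# Hypothetical template pair: mutations alternate between the two parents.
mutations = {100: "A", 350: "B", 600: "A", 800: "B"}

print(min_crossovers(mutations))                                     # 3
print(is_reactivated([200, 450, 700], mutations, first_parent="B"))  # True
print(is_reactivated([450], mutations, first_parent="B"))            # False
```

In this toy model only chimeras with a crossover in every interval between neighbouring mutations survive on ampicillin, which is why the reactivation rate reports on crossover frequency without sequencing.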
Recombination of closely related genes
Most family shuffling experiments are performed using genes from closely related organisms with DNA identity levels greater than 75%, due in part to the homology limits of the DNA shuffling protocol. To represent this scenario we chose the sequences of the red fluorescent protein from Discosoma sp. (DsRed) and the monomeric red fluorescent variant (mRFP). Our version of mRFP had been codon optimized for expression in E. coli, giving the pair a DNA identity level of 75%. The chosen template pair is still a challenging test case, since the average length of identical regions in the alignment is only 3.9 bp. The optimal result when recombining closely related genes would be a diverse library that samples all possible crossover positions. To determine crossover number, crossover position, percentage of parental background, and percentage of duplicate sequences, and to estimate the point mutation rate, we sequenced 295 randomly chosen functional and non-functional variants from our libraries. We also estimated the number of useful sequences for screening purposes, which is the total number of chimeras minus the number of duplicates of any sequences that appear more than once.

We used a series of templates to generate the RD-PCR libraries. Figure 2 shows the different types of templates we used for the libraries RD-PCR 1 to RD-PCR 5. For RD-PCR 1, one-sided skew templates were used. RD-PCR 2 combines a one-sided skew template with another parental template carrying a truncation near the beginning of the gene. The RD-PCR 3 templates are similar to those used in RD-PCR 2 but with an increased truncation length. The RD-PCR 4 templates are one-sided skew templates with truncations on both templates. Lastly, RD-PCR 5 uses templates that are both two-sided skewed. The effects of using different templates on the library are discussed in the next few paragraphs.
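The definition of useful sequences given above can be written as a short calculation. The following is a minimal sketch, assuming each sequenced clone is reduced to a hashable label (here a tuple of crossover positions, or a parent name for unrecombined clones); the function name, the label scheme, and the positions of the eleven non-duplicated clones are hypothetical. The counts in the example are those reported for library RD-PCR 1 (50 clones, 39 of which share a single crossover at position 6).

```python
from collections import Counter


def library_summary(clones, parents=("parent_A", "parent_B")):
    """Summarise a list of sequenced clones.

    Each clone is a hashable label: a parent name for parental background,
    or e.g. a tuple of crossover positions for a chimera.
    """
    counts = Counter(clones)
    total = len(clones)
    parental = sum(counts[p] for p in parents)
    chimeras = total - parental
    # A chimera observed n times contributes (n - 1) duplicates.
    duplicates = sum(n - 1 for label, n in counts.items()
                     if label not in parents)
    useful = chimeras - duplicates
    return {
        "total": total,
        "parental": parental,
        "chimeras": chimeras,
        "duplicates": duplicates,
        "useful": useful,
        "useful_fraction": useful / total,
    }


# 39 clones share the crossover at position 6; the other 11 are taken
# to be distinct (their positions below are placeholders).
clones = [(6,)] * 39 + [(100 + i,) for i in range(11)]
summary = library_summary(clones)
print(summary["duplicates"], summary["useful"], summary["useful_fraction"])
# 38 12 0.24
```

Counting one representative of each duplicated chimera as useful reproduces the 12 out of 50 (24%) useful sequences quoted for this library.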
Following the procedure used to recombine β-lactamase, our first RD-PCR library (RD-PCR 1) was created using a single skew primer for each parent as shown in Figure 1. We sequenced 50 variants from this library, all of which contained at least one crossover. However, 39 out of 50 contained a single crossover at position 6 (of mRFP), meaning that 38/50, or 76%, were duplicate sequences, which we term chimera background. Consequently, in a screening scenario only 12 out of 50, or 24%, would be useful sequences to screen. The bias could not be removed by truncating the first 5 bases (see Figure 3) from the front of the DsRed gene before recombination (RD-PCR 2: 21 variants sequenced, 43% unique chimeras). Truncating the first 44 base pairs of the DsRed parental gene created a bias towards crossovers at the 3' end of the genes, although it was not localized to a single position (RD-PCR 3: 66 variants sequenced, 35% unique chimeras). When truncating both templates simultaneously, we could only obtain clonable products when DsRed was truncated at the 3' end (43 bp) and mRFP was truncated at the 5' end (40 bp). (Note that in this case, DsRed is the "top" gene.) Fifty variants from this library (RD-PCR 4) were sequenced, and the result was a localization of crossovers to position 50 (35% unique sequences). Statistics for the libraries are shown in Figure 3.

Strikingly, in the case of the truncated libraries RD-PCR 2 and RD-PCR 4, we obtained parental background of approximately 10% of sequences, despite the fact that this should not be possible when using skew primers. The parental background could arise either from contamination of the PCR reaction with full-length templates, or from the accidental elongation of the unpaired extension on a strand containing no crossovers (skew extension without recombination via template switching). One way to minimize such accidental elongation would be to use two different skew primers for each parental template. Even though the recombination PCR is performed with only one primer for each parent, the amplification PCR is performed with both, thereby creating unique extensions on both ends of the gene and blocking unproductive skew extension without recombination.

When we created the library using templates extended in both directions (RD-PCR 5), parental background was eliminated, and only chimeras were obtained. Of 39 colonies randomly sequenced, 72% contained unique sequences, predominantly with one crossover per gene. One sequence with three crossovers and one with five crossovers were obtained. Crossover points were also more evenly distributed than in the libraries made with one skew primer, which showed significant bias towards the ends of the genes (Figure 4). Further details on all of the sequences obtained can be found in the supplementary information (see additional file 1).

The bias toward crossovers at the ends of PCR products amplified with single skew primers has been noted previously in recombinations during normal (as opposed to StEP-like) PCR cycling conditions. By using the templates amplified with two skew primers we have demonstrated that this bias can be reduced significantly. Therefore, when performing RD-PCR, the use of two skew primers for each parental template is important to avoid skew extension without recombination, which leads to parental background and a bias toward crossovers at the ends of the genes. When such precautions are taken, RD-PCR libraries contain a higher ratio of unique chimeras with lower parental background than those produced by DNA shuffling (>70% versus 45%, Figure 3). However, it is important to note that the majority of chimeras produced by RD-PCR had a single crossover (mean of 1.1 crossovers), while DNA shuffling produced sequences with 2 or more crossovers nearly 25% of the time (mean of 1.6 crossovers). The DNA shuffling parental background could also be reduced by using a skewed primer strategy similar to RD-PCR. RD-PCR is also constrained to have an odd number of crossovers (unless more than 2 parental templates are used) because the skew primers require that different parents contribute the 5' and 3' sequences.

The shading in Figure 4 indicates regions of identity between mRFP and DsRed. The lines in supplementary Figures S.4 (a) to S.4 (f) represent rolling point averages of DNA identity between mRFP and DsRed. In many cases, crossovers are clustered in regions high in shading (or with high DNA identity levels, as shown in supplementary Figures S.4 (a) to S.4 (f)) for all protocols tested. In fact, the large level of identity at the 3' end of the gene may be partially responsible for the clustering of crossover points in this region for RD-PCR using one skew primer per parent.

The optimized DNA shuffling procedure applied to mRFP and DsRed produced approximately half parental genes (67 variants, 49% background, Figure 3). The percentage of parental background is consistent with published results for the shuffling of green fluorescent protein and yellow fluorescent protein, which have a similar DNA identity level. The 34 chimeras had a mean of 1.6 crossovers: 18 had a single crossover, 11 had two crossovers, four had three crossovers, and one had four crossovers. Characteristics of the library are summarized in Figure 3 (for further details, please see Supplementary Information in additional file 1).

Figure 5 shows the percentage distribution of the highest number of continuous identical base pairs on either side of the crossover region. Both protocols produced crossovers in regions with a low number of identical base pairs; however, DNA shuffling is biased towards crossovers in regions with a high level of identity (11 or more base pairs). The two distributions are significantly different as determined by the nonparametric Wilcoxon Rank-Sum test (p=0.028). In general, both protocols show a bias towards regions with high sequence identity, as already reported for DNA shuffling.

It is interesting to note that we obtained more than 50% functional chimeras of DsRed and mRFP. Table S5 shows the functional relationships of some of the chimeras we obtained through recombination. A high percentage of functional chimeras should be expected, as the mRFP protein was evolved from the DsRed protein. Because of the high homology between the two genes, most of the crossovers preserved the activity of the parents.
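The identity measures used in this section (overall DNA identity, the average length of identical stretches, and the number of continuous identical base pairs next to a crossover) can be illustrated with a small Python sketch. It assumes two pre-aligned sequences of equal length with "-" as the gap character and places a crossover immediately before a given index; this is one possible reading of the Figure 5 measure, and the function names and the 20-base toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of identical, non-gap positions in an alignment."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def identity_runs(a, b):
    """Lengths of all maximal runs of consecutive identical positions."""
    runs, current = [], 0
    for x, y in zip(a, b):
        if x == y and x != "-":
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def flanking_identity(a, b, crossover):
    """Longest identical run touching a crossover placed before `crossover`."""
    upstream, i = 0, crossover - 1
    while i >= 0 and a[i] == b[i] and a[i] != "-":
        upstream += 1
        i -= 1
    downstream, j = 0, crossover
    while j < len(a) and a[j] == b[j] and a[j] != "-":
        downstream += 1
        j += 1
    return max(upstream, downstream)


# Toy aligned pair (not taken from DsRed or mRFP).
a = "ATGGCCTCCTCCGAGAACGT"
b = "ATGGCGTCTTCTGAAGACGT"

runs = identity_runs(a, b)
print(percent_identity(a, b))     # 75.0
print(runs)                       # [5, 2, 2, 2, 4]
print(sum(runs) / len(runs))      # 3.0
print(flanking_identity(a, b, 5)) # 5
print(flanking_identity(a, b, 9)) # 2
```

Collecting `flanking_identity` values for every crossover in two libraries yields two lists that could then be compared with a rank-sum test, for example `scipy.stats.ranksums`.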
Recombination of distant homologs
In some cases it is desirable to recombine distant homologs with a low level of DNA sequence identity but a high level of structural similarity. In this case, the potential for diversity increase is very large, but the probability of obtaining nonfunctional variants is very high. Currently, very low homology recombination is accomplished by the iterative-truncation family of methods or by oligonucleotide-directed shuffling, because DNA shuffling cannot successfully recombine genes with nucleotide identity levels below about 50%. We were interested in determining the lower limit of homology that can be successfully recombined using RD-PCR. DNA shuffling experiments were carried out in parallel as a control.
We were able to successfully recombine DsRed with HcRed (Heteractis crispa) (65% DNA identity, near the current published lower limit for recombination using homology-based methods) with both DNA shuffling and RD-PCR. Library quality was similar to that of DsRed/mRFP: Figures 6 (a) and 6 (b) show that no parental background was obtained in the case of RD-PCR (23 sequences) and approximately 20% parental background was obtained for DNA shuffling (20 sequences). 56% of the crossovers for RD-PCR were localized near a 25 base pair stretch of DNA identity, whereas crossovers for DNA shuffling were more diffuse. In general, the DNA shuffling reaction appeared to yield about the same number of crossover positions, but yielded 40% more unique chimeras (14 versus 10) and many more clones with multiple crossovers than RD-PCR (7 of 14 different clones versus 1 of 10 different clones).

We then moved to a lower DNA identity level, recombining GFP and mRFP (45% DNA identity). Sequencing of 38 variants from the RD-PCR showed that all variants had one crossover (no parental background) and that 18% were useful chimeric sequences (Figure 3). Most of the sequences (30/38) contained a crossover point at the 5' end of the gene, with the remaining six unique crossover points distributed across the gene (Figure 7). Two variants had a crossover in regions with a single base pair of identity between the two sequences, highlighting the ability of PCR-based methods to produce diverse chimeras. We were unable to obtain any chimeras using DNA shuffling with this template set (14 variants sequenced, 100% parental background). We also found that most of the chimeras were nonfunctional (Table S5). In this case, the homology between GFP and mRFP could be too low for useful shuffling.
Conclusions
To streamline the process of screening large combinatorial libraries, it is highly important to have an efficient diversity generation method, one which produces unique, nonparental sequences and is easy to implement. The current gold standard for recombination of genes is DNA shuffling, although this protocol suffers from a high rate of parental background and can be technically difficult to perform. One recently developed alternative to DNA shuffling is RD-PCR, which is based on simple techniques and should, in theory, produce libraries with no parental background. We explored the use of RD-PCR as an alternative to DNA shuffling for three common laboratory scenarios: recombination of point mutations, closely related sequences, and distantly related homologs. We found that RD-PCR produces libraries of equal or greater quality than DNA shuffling in the first two scenarios, as determined by the percentage of unique sequences from each protocol in the case of the fluorescent proteins and by the reactivation rate in the case of β-lactamase. Depending on the number of inactivating mutations (1 ≤ n ≤ 5), n crossovers were observed for either protocol. In the moderate homology scenario, recombination experiments for DsRed/HcRed indicate that DNA shuffling performed better than RD-PCR, producing a higher quality library with multiple crossovers. In the low homology situation of GFP/mRFP (45% DNA identity level), only RD-PCR yielded chimeras.

Generally, the rate of introduction of inadvertent point mutations with RD-PCR is similar to the rate for DNA shuffling (Table 2) performed with Pfu polymerase as well as for normal PCR amplifications (Table 2, all less than 5%), even though RD-PCR employs Taq polymerase. Although Taq polymerase lacks 3' to 5' proofreading exonuclease activity, RD-PCR uses a short cycling protocol, which presumably limits the accumulation of errors. One caveat of RD-PCR for the shuffling of fluorescent protein genes is the dominant finding of only one crossover per gene, while DNA shuffling often resulted in multiple crossovers. These results imply that DNA shuffling should be the method of choice in cases where multiple crossovers are highly desired. DNA shuffling and RD-PCR appeared to produce distinct crossover positions; hence, in some situations the two could be complementary methods for generating diverse libraries. One can perform RD-PCR followed by DNA shuffling to improve the sequence diversity of the library.

Both recombination protocols share the bias that they preferentially produce crossovers in regions of high sequence identity in the alignment. This bias can be overcome by using homology-independent recombination protocols. A combined approach was used by Griswold et al. They divided the genes into five sections and performed RD-PCR on four of them to obtain multiple crossovers per gene. One section showed low DNA identity (59.7%), and for it they used a homology-independent recombination approach called enhanced crossover SCRATCHY.

To create high quality libraries with RD-PCR, two skew primers for each parental sequence must be used to minimize skew extension without recombination, which causes parental background and a bias toward crossovers at the termini of the genes. If care in library design is taken, RD-PCR represents a viable alternative to classical DNA shuffling that is easier to implement. Similarly, skew primers can also be used with DNA shuffling to create high quality libraries with reduced parental background. Such an application has been successfully tested on the estrogen receptor in yeast to generate chimeras.
Finally, to improve the success of recombining genes with a low level of identity, one can also increase the sequence identity between the two genes. With decreasing costs for the synthesis of whole genes, designer synthetic recombination libraries can be created. It is now straightforward to resynthesize genes with new codon choices to increase DNA sequence identity between two genes prior to recombination, because oligonucleotides are more economical to order than ten years ago: the price per base pair has dropped from US $4 to approximately US $0.30. Theoretically, one can re-optimize the DNA identity between two genes prior to applying recombination to improve the chances of success and reduce bias in the library.
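The idea of raising DNA identity through codon choice can be sketched as follows. This is an illustrative simplification rather than a published design procedure: it assumes two in-frame, gap-free coding sequences of equal length, uses the standard genetic code, ignores host codon usage, and simply picks, for every codon of the target gene, the synonymous codon sharing the most bases with the corresponding codon of the reference gene. All function names and the 15-base example sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split an in-frame sequence into complete codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def translate(seq):
    return "".join(CODON_TABLE[c] for c in codons(seq))


def identity(a, b):
    """Percent identical positions between equal-length sequences."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def harmonize(reference, target):
    """Recode `target` with synonymous codons closest to `reference`."""
    recoded = []
    for ref_codon, tgt_codon in zip(codons(reference), codons(target)):
        amino_acid = CODON_TABLE[tgt_codon]
        synonymous = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
        best = max(synonymous,
                   key=lambda c: sum(x == y for x, y in zip(c, ref_codon)))
        recoded.append(best)
    return "".join(recoded)


reference = "ATGGCCTCCTCCGAG"   # encodes M A S S E
target = "ATGGCGACGAGTGAT"      # encodes M A T S D
recoded = harmonize(reference, target)

print(translate(target), translate(recoded))   # MATSD MATSD
print(round(identity(reference, target), 1))   # 53.3
print(round(identity(reference, recoded), 1))  # 86.7
```

The encoded protein is unchanged while the shared DNA identity rises, which is the effect that resynthesis with new codon choices is meant to achieve before recombination.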
Methods
Reagents
All enzymes were purchased from New England Biolabs (Beverly, MA) except for Pfu polymerase, which was purchased from Stratagene (La Jolla, CA). Oligonucleotide primers were purchased from MWG Biotech (High Point, NC). Ampicillin, chloramphenicol and tetracycline were purchased from Sigma (St. Louis, MO). Autoclaved tetracycline was made by autoclaving a 250 mg/L solution of tetracycline adjusted to pH 3 for 45 min. Mass spectrometry confirmed approximately 60% conversion to anhydrotetracycline.
PCR Machine
For all PCRs, we used the Eppendorf Mastercycler Gradient, Model no. 950000015, which is capable of ramping the temperature at a rate of 3.0 °C/s.
Construction of the parental plasmids
The full-length TEM-1 β-lactamase was amplified from template pDrive (Qiagen, Valencia, CA) with the following primers, which add KpnI and NdeI sites at the 5' end and SalI and HindIII sites at the 3' end (restriction sites are in italic): 5'-CAA AGT TTT G GT ACC ATA TGA GTA TTC AAC ATT TCC GTG TCG CCC TTAT TCC C-3', 5'-TAA ATA ACA AAG CTT GTC GAC TTA CCA ATG CTT AAT CAG TGA GGC ACC TAT CTC AGC G-3'. The PCR product was cloned into the KpnI/HindIII site of vector pPROTet (BD Bioscience, Palo Alto, CA), resulting in pPROTet-β-lactamase.

The amino acid sequence of mRFP was obtained from NCBI, and E. coli codon-optimized primers (Table S5) were designed using DNAworks and synthesized. The mRFP gene was obtained via two PCR reactions, one to assemble the codon-optimized primers and the second to amplify the full-length product. The mRFP gene was cloned using the dovetail method. The gene was amplified using primers with Esp3I restriction sites (5'-TAC GTC TCG TCG ACA TGG CGT CTT CTG AAG ACG TTA TCA AAG AAT TCA TGC GT-3' and 5'-TAC GTC TCT GGC CTA TTA CGC ACC GGT AGA GTG ACG ACC TTC-3'), digested with Esp3I, and ligated using T4 DNA ligase into SalI and NotI digested pPROTet vector. Sequencing, expression and characterization consistent with the literature confirmed that the E. coli expression-optimized mRFP gene was successfully assembled.

The GFP gene was amplified from pQBIT7-GFP plasmid (QBIOgene, Carlsbad, CA) using the dovetail method and primers with Esp3I restriction sites (5'-TAC GGT TAC GTC TCG TCG ACA TGG CGT CTT CTG AAG ACG TTA TCA-3' and 5'-TAC GGT TAC GTC TCG TCG ACA TGG CTA GCA AAG GAG AAG AAC TCT TCA-3'), digested using Esp3I, and ligated into SalI and NotI digested pPROTet vector. Expression experiments indicated successful cloning.

The DsRed gene was amplified from DsRed2-1 plasmid (BD Biosciences Clontech, Palo Alto, CA) with primers containing Esp3I restriction sites (5'-TAC GGT TAC GTC TCG TCG ACA TGG CCT CCT CCG AGA ACG TCA-3' and 5'-CAT TAC TAC GTC TCT GGC CTA CTA CAG GAA CAG GTG GTG GCG G-3') and cloned as with mRFP and GFP. Expression experiments indicated successful cloning.

The HcRed gene was cloned from pHcRed1-N1/1 plasmid (BD Biosciences Clontech, Palo Alto, CA) with primers containing SalI and NotI restriction sites (5'-CGG GAT TCC ACA TAG TCT CAG GTA GTC GAC ATG GTG AGC GGC CTG CTG AAG GAG AGT ATG-3' and 5'-TTC CGA TAA GTT CAT AGG CCG TGG CGG CCG CTC AGT TGG CCT TCT CGG GCA GGT CGC T-3'). Expression and fluorescence characterization experiments confirmed that the cloning was successful.
Introduction of point mutations into β-lactamase
The β-lactamase mutants were constructed by overlap extension PCR using pPROTet-β-lactamase as template. The following external primers were used: 5'-CCT ATC AGT GAT AGA GAT ACT GAG C-3' (top strand, N-terminal) and 5'-GAT TCT GTG GAT AAC CGT ATT ACC-3' (bottom strand, C-terminal). Internal primers were used for the introduction of the following mutations: K30P (AAA to CCG: 5'-CTC ACC CAG AAA CGC TGG TGC CGG TAA AAG ATG CTG AAG ATC AG-3', 5'-GAG TGG GTC TTT GCG ACC ACG GCC ATT TTC TAC GAC TTC TAG TC-3'), P105G (CCA to GGC: 5'-GAA TGA CTT GGT TGA GTA CTC AGG CGT CAC AGA AAA GCA TCT TAC G-3', 5'-CTT ACT GAA CCA ACT CAT GAG TCC GCA GTG TCT TTT CGT AGA ATG-3'), D177P (GAC to CCG: 5'-CCA TAC CAA ACG ACG AGC GTC CGA CCA ACG ATG CCT GTA GCA ATG-3', 5'-GGT ATG GTT TGC TGC TCG CAG GCT GGT TGC TAC GGA CAT CGT TAC-3'), D231P (GAT to CCG: 5'-CTT CCG GCT GGC TGG TTT ATT GCT CCG AAA TCT GGA GCC GGT GAG CGT GG-3', 5'-GAA GGC CGA CCG ACC AAA TAA CGA GGC TTT AGA CCT CGG CCA CTC GCA CC-3') and I278P (ATA to CCG: 5'-GAA CGA AAT AGA CAG ATC GCT GAG CCG GGT GCC TCA CTG ATT AAG CAT TG-3', 5'-CTT GCT TTA TCT GTC TAG CGA CTC GGC CCA CGG AGT GAC TAA TTC GTA AC-3'). The DNA sequence of each mutant was confirmed by sequencing.
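The codon changes listed above can be checked with a short calculation of how many bases separate each mutant codon from its wild-type codon. The sketch below uses only the codons given in the text; the function and variable names are hypothetical.

```python
def base_changes(codon_from, codon_to):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(codon_from, codon_to))


# Wild-type and mutant codons as listed for the five inactivating mutations.
MUTATIONS = {
    "K30P": ("AAA", "CCG"),
    "P105G": ("CCA", "GGC"),
    "D177P": ("GAC", "CCG"),
    "D231P": ("GAT", "CCG"),
    "I278P": ("ATA", "CCG"),
}

changes = {name: base_changes(wt, mut) for name, (wt, mut) in MUTATIONS.items()}
print(changes)
# {'K30P': 3, 'P105G': 3, 'D177P': 3, 'D231P': 3, 'I278P': 3}
```

Every mutant codon differs from the wild-type codon at all three positions, so a single polymerase error cannot restore the wild-type codon, in keeping with the design goal that reactivation should arise from recombination rather than from a PCR mutation.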
β-lactamase template sets for the crossover determination
Four different sets of β-lactamase mutants were constructed for the estimation of the crossover number (Figure 2). The number of crossovers and the segment size in which each crossover must occur are summarized in Table 1. The following primers were used for the amplification of the templates:
Min-1: I278P (5'-CGG GAT TCC ACA TAG TCT CAG GTA GGT ACC ATA TGA GTA TTC AAC ATT TCC-3', 5'-CGA CTT ACC AAT GCT TAA TCA GTG AGG C-3'), K30P (5'-CCA TAT GAG TAT TCA ACA TTT CCG TGT CG-3', 5'-TTC CGA TAA GTT CAT AGG CCG TGG GGA TCC AAG CTT GTC GAC TTA CC-3');
Min-3a: K30P/I278P (5'-CGG GAT TCC ACA TAG TCT CAG GTA GCT TCC TTA GCT CCT GAA AAT CTC GAT AAC TC-3', 5'-CGA CTT ACC AAT GCT TAA TCA GTG AGG C-3'), D177P (5'-GCT TCC TTA GCT CCT GAA AAT CTC GAT AAC TC-3', 5'-TTC CGA TAA GTT CAT AGG CCG TGG GGA TCC AAG CTT GTC GAC TTA CC-3');
Min-3b: P105G/I278P (5'-CGG GAT TCC ACA TAG TCT CAG GTA GGT ACC ATA TGA GTA TTC AAC ATT TCC-3', 5'-CGA CTT ACC AAT GCT TAA TCA GTG AGG C-3'), K30P/D177P (5'-CCA TAT GAG TAT TCA ACA TTT CCG TGT CG-3', 5'-TTC CGA TAA GTT CAT AGG CCG TGG GGA TCC AAG CTT GTC GAC TTA CC-3');
Min-5: K30P/D177P/I278P (5'-CGG GAT TCC ACA TAG TCT CAG GTA CTT TCG TCT TCA CCT CGA GTC CCT ATC AGT G-3', 5'-CGA CTT ACC AAT GCT TAA TCA GTG AGG C-3'), P105G/D231P (5'-CTT TCG TCT TCA CCT CGA GTC C-3', 5'-TTC CGA TAA GTT CAT AGG CCG TGG GGA TCC AAG CTT GTC GAC TTA CC-3').
Template preparation for DNA shuffling
The mRFP, DsRed, HcRed, GFP and β-lactamase genes were amplified by Pfu polymerase using primers that anneal to the pPROTet vector (5'-CTT TCG TCT TCA CCT CGA GTC C-3', 5'-CCT ACT CAG GAG AGC GTT CAC C-3'), which added 122 bp to the 5'-terminus and 155 bp to the 3'-terminus. The PCR products were gel purified.
DNA shuffling
DNA shuffling was performed according to Joern, which uses a hybrid method derived from Stemmer et al. and Abécassis et al. After optimizing the DNase I concentration and digestion time, 2 µg of an equimolar mixture of the desired parental templates was digested. Fragments