Research Article
15 August 2010

Method for Designing and Optimizing Random-Search Libraries for Strain Improvement


Random searches have been the hallmark of directed evolution and have been extensively employed in the improvement of complex or poorly understood phenotypes such as tolerance to toxic compounds in the context of cellular engineering. While genome-wide mutagenesis followed by selection or screening has been a traditional means of phenotype improvement, the list of experimental methods for cellular engineering based on random searches is rapidly expanding. Adding to the confusion is the element of chance, which lengthens the process and most notably adds to the cost of phenotypic improvement programs. Here we present a method to systematize the effort of finding superior mutants by successively improving random libraries. The method, based on the quantification of phenotypic diversity, is then used to isolate more-robust strains.
Random searches for phenotypic improvement, similar to the iterations of directed evolution, comprise two steps: introducing genetic diversity and screening for variants with interesting traits. Because most protocols for introducing genetic diversity hinge on creating combinatorial arrangements of many nucleotides, the number of variants that can be thus constructed is virtually infinite. This implies that in most cases only a minute fraction of the search space can be covered experimentally (22), which becomes a particularly relevant problem when screening for phenotypes of interest fails to deliver improved variants. In this case, the result of one experiment rarely suggests how ensuing experiments should be conducted, as it is impossible to ascribe the failure to find improved mutants to any particular step of the random search protocol. As such, no useful information can be gathered from unsuccessful experiments.
On the other hand, if a metric of the quality of the library were available, its potential to deliver an improved mutant could be assessed, regardless of the outcome of the time-consuming screening step. A good library would be one that has a high probability of delivering mutants with an improved phenotype. The problem with this definition is that it is not a priori specified what traits are of interest because the same libraries can be screened for improvement of different and even distant phenotypes (2, 11, 19, 20). Therefore, to have a higher a priori probability of harboring a mutant with an improved trait, a library must be phenotypically diverse (12). This is in contrast to cases where the property of interest is known from the start (as with most protein engineering searches), for which a better library can be pragmatically regarded as one that delivers a better trait.
We recently reported a metric for the quantification of phenotypic diversity (12); in the present study, we illustrate how this concept can be used to obtain information about the quality of random libraries and to guide their construction. First, we adapted the reported metric for use with Escherichia coli by measuring the diversity in intracellular pH (pHi). Second, we demonstrated the library optimization method consisting of (i) building a mutagenesis library with a particular design (i.e., target DNA to be mutated, mutagenesis rate, etc.), (ii) evaluating it by using the phenotypic diversity metric, and (iii) combining the information gained from these steps to design a new and improved library. Finally, we used the optimized libraries to isolate more-robust strains. We show, for the first time, that the construction of random-search libraries can be directed using a measurable property of populations, even when extensive screening fails to deliver variants with improved traits.
To test our method, we chose a random strain improvement approach, global transcription machinery engineering (gTME), that is based on global alteration of the transcriptome and has delivered several improved mutants (1, 2, 12). We used the alpha subunit (rpoA gene) of the RNA polymerase (RNAP) as our target for cellular engineering and built three libraries with different mutation frequencies (low, medium, and high, designated rpoA*L, rpoA*M, and rpoA*H, respectively; the mutation frequencies of all of the libraries are reported in Materials and Methods). Although these libraries yielded strains with improved butanol tolerance, hyaluronic acid accumulation, and tyrosine production (11), they failed to deliver a butyrate-tolerant mutant of interest for butanol production. This failure provided an opportunity to test the library optimization strategy presented here.

MATERIALS AND METHODS
Strains and library construction.

The E. coli K-12 recA mutant was used throughout this study, except for transformation of the ligation reaction products, for which strain DH10B (Invitrogen) was used instead. The native rpoA gene was amplified from genomic DNA using Phusion DNA polymerase (Finnzymes) with primers A and B and cloned into the ApaLI and XmaI sites of the multicloning site of pHACm (2) using NEB restriction enzymes as described in reference 11. The correct insert was verified by sequencing, and strains transformed with this plasmid are referred to as wild-type strains throughout this report. For rpoA*L, rpoA*M, and rpoA*H, error-prone PCR (epPCR) was carried out with the same primers using the GeneMorph II kit (Stratagene), resulting in approximately four, seven, and nine mutations per kilobase, respectively. For αCTD*H and αCTD*L, a BsiWI restriction site was introduced by the point mutation T707C (slightly upstream of the C-terminal domain [CTD]) using a QuikChange Multi Site-Directed Mutagenesis kit (Stratagene). The CTD sequence was amplified by epPCR with primers B and C (resulting in about five or six and about one or two mutations per sequence for αCTD*H and αCTD*L, respectively) and cloned between the newly introduced BsiWI site and the ApaLI site present at the 3′ end. For the αCTD*t library, two oligonucleotides (D and E) spiked at the target positions with 6% non-wild-type bases were constructed and an artificial BglII site was introduced at the 5′ end of each primer to allow recircularization of the plasmid (the BglII site was introduced by a T835A mutation between amino acids E273 and E286). The residues targeted for mutagenesis in αCTD*t were D259, L262, R265, N268, C269, K271, E273, E286, L290, G296, K298, and S299. The entire plasmid was amplified with Phusion DNA polymerase using spiked oligonucleotides D and E and cut with BglII and DpnI to rid the mixture of the unmutated plasmid. 
Neither the newly introduced BsiWI nor the BglII site changed the amino acid sequence of rpoA.
The same protocol was used to amplify the gene expressed from the Pspc promoter (21), except that a pCL1920 vector was used (14); both of the vectors used in this study have a pSC101 origin of replication. The rpoA gene was first cloned using primers F and G, which include the Pspc promoter and T1 terminator, respectively, for efficient use in the pCL1920 vector. The primers are the following (restriction sites are set off by spaces, and an asterisk indicates that the preceding base is spiked): A, 5′-GCGCG CCCGGG ACGTTGTAAGCATTCGTGAGAAAGCG-3′; B, 5′-GCGCG GTGCAC TGGCGCATGACCTTATCCTTCTCAGTA-3′; C, 5′-ACGTGA CGTACG TCAGCCTGAAGTGAAAGAAGAGAAACC-3′; D, 5′-TATCGG AGATCT GGTACAGCGTACCG*A*G*GTTGAGCTCC*T*T*AAAACGCCTAACCTTG*G*T*AAAA*A*A*T*C*T*CTTACTGAGATTAAAGACGTGCTGGCTTCCCGT-3′; E, 5′-TGTACC AGATCT CCGATATAGTGGATACGT*T*C*TGCT*T*T*AAGG*C*A*G*T*T*AGCAGAG*C*G*GACAGTC*A*A*TTCCAGA*T*C*GTCAACAGGGCGCAGCAGGATCGGAT-3′; F, 5′-GCGAGCGA TCTAGA CTCAGAAATGAGCCGTTTATTTTTTCTACCCATATCCTTGAAGCGGTGTTATAATGCCGCGCCCTCGATATGGGGATTTTTGTGTATGCTGGCAAGATGGAAGGTACGTTTAAG-3′; G, 5′-CGGCGCG CCCGGG TTTATAAAACGAAAGGCCCAGTCTTTCGACTGAGCCTTTCGTTTTATGTGCACTGGCGCATGACCTTATCCTTCTC-3′.
All ligations were done using Fast-link ligase (Epicentre) and transformed into DH10B cells (Invitrogen), which were plated on LB agar and pooled together after overnight growth. The plasmids were recovered by miniprep (Qiagen) and used to retransform the K-12 recA mutant host strain. Each library was approximately 10^5 in size. K-12 recA mutant cells were grown in morpholinepropanesulfonic acid (MOPS; Teknova) or M9 (U.S. Biologicals) minimal medium with 0.5% glucose (unless noted otherwise), and the plasmid-borne rpoA gene was induced with 1 mM isopropyl-β-D-thiogalactopyranoside (IPTG) when measuring pHi or during selection in butyrate. Chloramphenicol (34 μg/ml) and streptomycin (50 μg/ml) were added as needed.

Diversity quantification using pHi.

Divergence is calculated by measuring the pairwise phenotypic distance between members of a library population, averaging it, and normalizing it to that of the control population (a wild-type clonal population). The divergence of each library can be calculated from the distance in several phenotypes; each constitutes an entry in the phenotypic distance vector used to calculate divergence (see appendix). This ensures that the result is not biased by a particular data set. In this study, the phenotypes entering the divergence metric were the pHi values of growing and nongrowing cells. For determination of pHi during growth, cells were stained with carboxyfluorescein diacetate succinimidyl ester (Invitrogen) as suggested by the product manual and grown in MOPS medium with 250 mg/liter each D-xylose, D-galactose, L-arabinose, and glycine. Several carbon sources were used to prevent favoring the growth of a subset of mutants. Variability introduced by the choice of carbon sources or other details in the protocol was accounted for by normalization to the control.
Samples were withdrawn at different time points from each library and control culture, put on ice, and measured by flow cytometry (using a BD FACScan). The pHi was calculated as the ratio of emissions at 585 and 530 nm upon excitation at 488 nm (23). Each time point was considered an entry in the distance vector for quantification of divergence (see appendix). Two more entries of the distance vector were composed of pHi values of nongrowing cells. These were stained with BCECF-AM (Invitrogen) and resuspended in 10 mM phosphate buffer at either pH 5.0 or 7.0 immediately before fluorescence-activated cell sorter analysis (pHi with this probe was calculated as the ratio of emissions at 650 and 530 nm upon excitation at 488 nm, in accordance with the recommendations in the manual). A subsample of 1,500 data points was taken at random from each library and control data set, and this subset was used to calculate the divergence as described in the appendix; the algorithm was run 50 times, and the divergence was averaged to smooth out the effects of subsampling. The exact divergence values varied somewhat with changes in the protocol, but the trends shown in Fig. 1 were maintained.
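The subsampling step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the inputs are assumed to be per-cell emission ratios already exported from the cytometer, and the simple library-to-control ratio of mean pairwise distances stands in for the full divergence calculation given in the appendix.

```python
import numpy as np

def mean_pairwise_distance(p):
    """Average |P_i - P_j| over all distinct pairs of cells."""
    p = np.sort(np.asarray(p, dtype=float))
    n = p.size
    # For sorted values, sum_{i<j} (p_j - p_i) = sum_k (2k - n + 1) * p_k.
    weights = 2 * np.arange(n) - n + 1
    return float(np.dot(weights, p)) / (n * (n - 1) / 2)

def subsampled_distance_ratio(library, control, n_sub=1500, n_runs=50, seed=0):
    """Mean library/control distance ratio over repeated random subsamples.

    n_sub and n_runs default to the values quoted in the text; the ratio is
    a simplified, hypothetical summary, not the appendix metric.
    """
    rng = np.random.default_rng(seed)
    library = np.asarray(library, dtype=float)
    control = np.asarray(control, dtype=float)
    ratios = []
    for _ in range(n_runs):
        lib = rng.choice(library, size=min(n_sub, library.size), replace=False)
        ctl = rng.choice(control, size=min(n_sub, control.size), replace=False)
        ratios.append(mean_pairwise_distance(lib) / mean_pairwise_distance(ctl))
    return float(np.mean(ratios))
```

Averaging over runs serves the purpose stated in the text: a single random subsample can over- or under-represent the tails of the distribution, and repeating the draw smooths that effect.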

Library selection in butyrate and growth assays.

MOPS medium with 15 g/liter butyrate was used for both selection and growth assays (initial pH adjusted to 7.0 with 6 N HCl), except when trying the conditions described in the legend to Fig. 2. For selection, 30 ml of medium was inoculated and cells were grown for about 20 to 24 h and then a sample was transferred to a fresh batch of medium. This procedure was repeated thrice, after which cells were spread on solid medium overnight and individual colonies were picked for further study. Clones 1 and 16 in αCTD*L were chosen for their faster growth in butyrate, and their plasmids were purified and retransformed into a clean K-12 recA background to confirm the phenotype (Fig. 3). For growth assays, cells were cultured overnight in 15 g/liter butyrate to avoid adaptation-related distortion of the measurements and then diluted in the same medium to obtain their growth curves. The mutant genes from clones 1 and 16 and wild-type rpoA were transferred to a pCL1920 plasmid (which has the same origin of replication as pHACm but confers streptomycin resistance [14]) and expressed from the Pspc promoter (21). Primers F and G were used as explained above.
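For readers who want to turn growth curves like those from the assays above into growth rates, one common approach is a log-linear fit. The text does not state how the authors computed growth rates, so the sketch below is an assumption: it fits a straight line to ln(OD600) versus time over points the user judges to be in exponential phase, and the function names are hypothetical.

```python
import numpy as np

def specific_growth_rate(times_h, od600):
    """Slope of ln(OD600) versus time (1/h) from a least-squares line fit.

    Assumes every supplied point lies in exponential phase.
    """
    t = np.asarray(times_h, dtype=float)
    log_od = np.log(np.asarray(od600, dtype=float))
    slope, _intercept = np.polyfit(t, log_od, 1)
    return float(slope)

def percent_improvement(rate_mutant, rate_wild_type):
    """Growth rate gain of a mutant over the wild type, in percent."""
    return 100.0 * (rate_mutant / rate_wild_type - 1.0)
```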

RESULTS
Development of a pHi-based divergence metric.

The design and construction of random libraries for strain improvement have been, for the most part, a blind effort. A few exceptions have been reported, in which the genetic changes responsible for a phenotypic improvement were tracked and analyzed (4, 15), opening the possibility for targeted efforts based on random approaches. Here, we define a metric of library quality to guide the construction of new ones, independently of how the libraries were created. To this end, we developed a method for quantifying phenotypic diversity based on pHi to replace the previously established method based on growth (12) that proved unreliable in the case of E. coli (this strain forms irregular and semiopaque colonies).
The diversity metric, called divergence, is based on how different, on average, members of a library population are from each other compared to how different members of a clonal wild-type population are from each other with respect to a complex trait (i.e., one that results from the interplay of many intracellular components). In some sense, variability in a complex phenotype is used as a proxy of how “reachable” novel phenotypes are in general. Implicit is the assumption that the mutagenesis protocol alters the physiological network globally (e.g., by targeting a central node of the biomolecular interaction network [16]), and thus, diversity in a measurable complex phenotype is tied to the diversity in other (nonmeasurable) phenotypes. As such, pHi can be used for quantification of diversity because it is affected by the relative levels of numerous proteins and metabolites in the cell, even when it is maintained in a narrow range (13).
To test the pHi-based metric, we quantified the diversity of the rpoA*L and rpoA*H libraries and used the metric to assess the effect of mutagenesis rate on the phenotypic diversity of the resulting populations. As shown in Fig. 1, the divergence increases when the rpoA sequence diversity is increased, in line with previous findings. However, since even the more diverse rpoA*H libraries did not deliver improved mutants (see below), we sought further increases in the divergence through different library designs, as discussed in the following section.

Optimization of alpha subunit libraries.

The library optimization strategy hinges on iteratively combining (i) informed guesses of what designs are likely to yield increased diversity and (ii) quantification of the effects of such modifications using the divergence metric. The second step is crucial, since any optimization strategy relies on quantification of the past and current “states” (the library designs, in this case) in order to make decisions about the direction to be pursued next.
Previous studies on the alpha subunit had produced three improved mutants, all of which had nucleotide changes in the CTD (αCTD) (11). To further optimize the existing rpoA*H library, we hypothesized that diversity could be increased by directing mutagenesis to the CTD region of the protein. A library with a high mutation frequency was thus constructed, as this seemed to correlate with the high phenotypic diversity previously obtained (see the preceding section and previous work [12]). However, quantification of the phenotypic diversity of the new library (designated αCTD*H) contradicted these predictions (Fig. 1). Not only was diversity not increased by focusing mutations on the αCTD, but it was actually decreased.
There are at least two possible explanations for the decrease in diversity in the αCTD*H library compared to rpoA*H: (i) that by focusing mutations on this domain, diversity is reduced because mutations in the N-terminal domain (NTD) that also confer novel phenotypes (e.g., by modulating the assembly of RNAP complexes [9] or by transcriptional regulation at class II promoters [18]) are eliminated and (ii) that the mutation frequency was too high, generating many detrimental mutations that possibly masked other useful mutations. In other words, high mutation frequencies may reduce library diversity because many clones display the same phenotype: that of expressing an alpha subunit with a nonfunctional αCTD.
As a next step, the information obtained from previous iterations was used to continue with the optimization strategy. In light of the aforementioned explanations, a new library was constructed in which mutations were still focused on the αCTD, but a lower mutagenesis rate was applied (designated αCTD*L). Quantifying the diversity of this library favored the second hypothesis (Fig. 1). This library, in fact, has higher diversity than the rpoA library with a high mutation frequency throughout the coding region (rpoA*H). The mutation frequency in the CTD of rpoA*H is comparable to that of αCTD*L, but the latter library has markedly greater diversity. This suggests that the diversity in rpoA*H likely arises from mutations in the CTD and that the NTD has transcriptional functions that either are not as important as those of the CTD or are sparsely found in sequence space.
The fact that the diversity of αCTD*L is greater than that of αCTD*H suggests a high sensitivity to mutations in this domain. Nonspecific amino acid changes may prevent the αCTD from folding properly so that it cannot attain the conformation necessary for proper interaction with promoters. This hypothesis suggested a fourth iteration in our optimization strategy. The new design consisted of a library in which mutations were restricted to only surface amino acids of this domain (designated αCTD*t), aiming at introducing diversity and at the same time preventing the formation of many nonfunctional, unfolded variants. As shown in Fig. 1, the αCTD*t library has a marked increase in diversity. The choice of amino acids was suggested by structural information (8) and previous studies (17) (see Materials and Methods). Besides the libraries considered here, the effect of simple modifications such as promoter changes and library size on the quality of the resulting populations was also studied (results not shown).

Isolation of a butyrate-tolerant mutant.

The above libraries were screened in different butyrate concentrations following diversity assessment using the pHi-based metric. First, the three rpoA libraries (rpoA*L, rpoA*M, and rpoA*H) were screened under four conditions involving butyrate stress, and none yielded butyrate-tolerant mutants (Fig. 2), consistent with the relatively low divergence of these libraries. Different conditions were used to test the possibility that the lack of success in finding improved mutants was due to the selection conditions instead of the quality of the library. Approximately 30 individual mutants of all of the libraries were tested under each of the four conditions before we concluded that a library was unlikely to contain an improved variant. To summarize the experiments carried out with each library without displaying data for all individual mutants, Fig. 2 shows the maximum recorded improvement in the growth of each library with respect to the control, as a measure of the potential advantage of a mutant in the library over the wild type. For example, the growth of the rpoA*L library under condition 1 was recorded as a function of time alongside that of the wild type; the y axis shows that at no point in time did the growth of rpoA*L exceed that of the wild-type control by more than 5%.
The same lack of success was observed for the αCTD*H library, which had lower diversity than the rpoA*H library. Upon screening the αCTD*L library under the same conditions, two improved mutants were isolated (Fig. 2) that showed 23% and 40% growth rate improvements in the presence of 15 g/liter butyrate (Fig. 3), respectively. Not coincidentally, the two mutants had the same amino acid sequence and only one amino acid change with respect to the wild type (S299T), consistent with the diversity assessment that small sequence changes in the αCTD result in large phenotypic changes. Amino acid S299 is a surface amino acid and is directly involved in interacting with UP promoter elements (7); hence, a mutation in this position should alter the affinity of the RNAP for several targets, resulting in the novel phenotype. The mutant with the lower improvement level (23%) differs from the mutant with the higher improvement level (40%) in a synonymous substitution that changes a codon that is frequently used in E. coli (GGT for glycine) to a rare one (GGA). This finding suggested the expression of the mutant and wild-type genes under the control of a stronger promoter (Pspc), which yielded further growth rate improvement, up to 60%. Our success with increasing the tolerance when increasing the promoter strength invited the use of a yet stronger promoter. No further increase was observed, however, when expression was placed under the control of the PN25 promoter (3; results not shown). In fact, a slight decrease in the overall growth rate was seen, suggesting that the cell may be experiencing a burden when expressing the protein from the stronger promoter.
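The summary statistic plotted in Fig. 2 can be illustrated with a short Python sketch. It assumes the library and control cultures were read at the same time points and expresses the advantage as a fraction of the control reading; the function name and this exact normalization are assumptions made for illustration.

```python
def max_relative_advantage(od_library, od_control):
    """Largest fractional OD600 excess of a library culture over the control.

    Both sequences are assumed to hold readings taken at the same time points.
    """
    if len(od_library) != len(od_control):
        raise ValueError("library and control must share time points")
    return max(lib / ctl - 1.0 for lib, ctl in zip(od_library, od_control))
```

Because the statistic is a maximum over time, a single noisy reading can produce a positive value even when no improved mutant is present, which is why small transient advantages need not indicate enrichment.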

Posterior probability analysis.

The isolated mutants were theoretically present in all of the libraries that had been constructed prior to the αCTD*L library, given the parameters used in their construction (i.e., targeted regions, mutagenesis rate, identity of mutations, etc.). In other words, all of the library designs could have delivered the improved variants. This fact raises the question of whether the divergence metric actually reflects a probabilistic difference of finding the mutants in the different populations. Ideally, the metric would identify the population most likely to deliver butyrate-tolerant mutants. It should be noted that this generalized correlation is strictly true only when analyzing the results across many selections for a variety of traits, since the divergence metric is a measure of library quality and not of the probability that a particular phenotype will be found in a population.
The posterior probability of finding the S299T mutant in the different libraries was evaluated using information about the length of the fragment and the average mutation frequency of each library and assuming that the mutations follow a Poisson distribution (6). Table 1 shows that the S299T mutant could be found in the αCTD*L library more than an order of magnitude more frequently than in any other library tested. (It is important to note that this is the frequency of amplified PCR products at the DNA level and not the frequency in the cell library. The distinction is vital, since variants will be amplified differently, depending on their effect on growth rate in the steps prior to purifying selection; only at that stage would the improved variants exhibit an advantage.)
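The structure of this calculation, which mirrors the three probability columns of Table 1, can be sketched as follows. The sketch assumes that the number of mutations per amplified fragment is Poisson distributed, that a single mutation is equally likely to fall on any base, and that each of the three possible substitutions is equally likely (error-prone PCR has known biases, so the last point is a simplification). The function name and all input values in the usage lines are hypothetical and are not the values behind Table 1.

```python
import math

def single_mutant_frequency(n_bases, mean_mutations, p_required_change=1.0 / 3.0):
    """Frequency of one specific single-base mutant among amplified fragments.

    n_bases: number of bases subject to mutagenesis.
    mean_mutations: average number of mutations per fragment (Poisson mean).
    p_required_change: chance that the substitution is the required one
        (1/3 assumes unbiased substitutions, a hypothetical simplification).
    """
    p_one = mean_mutations * math.exp(-mean_mutations)  # exactly one mutation
    p_right_base = 1.0 / n_bases                        # it hits the right base
    p_mutant = p_one * p_right_base * p_required_change
    return {
        "p_one_mutation": p_one,
        "p_right_base": p_right_base,
        "p_required_change": p_required_change,
        "one_in": 1.0 / p_mutant,
    }

# Hypothetical comparison: a short, lightly mutated fragment versus a
# longer, heavily mutated one.
short_low = single_mutant_frequency(n_bases=300, mean_mutations=1.5)
long_high = single_mutant_frequency(n_bases=1000, mean_mutations=9.0)
```

Two effects favor a short fragment with a low mutation rate: the probability of exactly one mutation peaks when the mean is one mutation per fragment, and the chance that the mutation falls on the right base scales inversely with fragment length.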
Table 1 shows that the population with the highest phenotypic diversity exhibited the highest probability for finding the improved mutant, implying that the improved mutant in the αCTD*L library was not discovered accidentally. Further support for the relevance of the presented metric is the fact that all of the mutants isolated to date have one or two mutations in the αCTD (present and previous work [11]).

DISCUSSION
We have described a method for systematizing evolutionary methods for strain improvement. To this end, a random approach to finding an improved mutant can be regarded as an iteration of two steps: building a library and screening it. Because screening is the resource- and labor-intensive step (5, 10), it makes sense to undertake it only if the expected outcome from screening is better than that of constructing a new library, that is, if the a priori probability of finding an improved mutant (quantified by divergence) is greater than that of the previous iteration. This process can continue until an improved mutant is found, constructing new libraries becomes expensive (e.g., for fully synthetic libraries), or it is no longer clear how an improved library may be obtained (e.g., by changing the mutation frequency, the localization of mutations, etc.).
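The decision rule in this paragraph can be written as a small loop. The sketch below is one interpretation, not the authors' procedure: it screens a library only when its divergence exceeds the highest divergence among libraries already screened, and the callables `measure_divergence` and `screen` are hypothetical stand-ins for the wet-lab steps.

```python
from typing import Callable, Iterable, List, Optional, Tuple

def optimize_library(
    designs: Iterable[str],
    measure_divergence: Callable[[str], float],
    screen: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], List[Tuple[str, float, bool]]]:
    """Build, measure, and conditionally screen a sequence of library designs.

    Returns the first improved mutant found (or None) and a log of
    (design, divergence, was_screened) entries.
    """
    best_screened = float("-inf")
    log: List[Tuple[str, float, bool]] = []
    for design in designs:
        divergence = measure_divergence(design)
        worth_screening = divergence > best_screened
        log.append((design, divergence, worth_screening))
        if not worth_screening:
            continue  # skip the costly screen; try the next design instead
        best_screened = divergence
        hit = screen(design)
        if hit is not None:
            return hit, log
    return None, log
```

The loop ends when a mutant is found or the designs run out; the other stopping conditions named in the text (cost of new libraries, no clear route to a better design) would be decided by whoever supplies the sequence of designs.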
Conceptually, this method relies on successively evaluating the search space prior to screening for a particular phenotype. It can be used not only to accelerate and economize strain improvement programs by eliminating screening steps with a low probability of success but also to guide the construction of new libraries. That is, one can probe the characteristics of the search space and potentially use this previously unobtainable information to design better populations. In addition, comparing the diversity of libraries can yield mechanistic insights into their differences. Though we believe our findings can be applied to correlate genetic and phenotypic information, this concept should be explored in much greater depth.
Ultimately, a key goal would be to gather enough information about a particular mutagenesis target and sequentially reduce the search space to the point where it can be widely covered experimentally. A nontrivial tradeoff of reducing the search space is that potentially useful mutations are forgone by restricting the nucleotide regions that are allowed to be changed. An ideal route toward the optimization of a library is then delimiting the search space by ignoring genetic determinants that, when altered, result in phenotypically redundant variants but keeping those that result in new phenotypes. This is analogous to the protein engineering effort of delimiting the search space by eliminating positions that are structurally intolerant (24), but in our case, the objective function is the a priori probability of finding new phenotypes. Both efforts could be used complementarily, since they can be expected to deliver synergistic improvements in the quality of a library.
FIG. 1. Divergence is a statistical measurement that describes the additional phenotypic distance of the libraries compared to that of the wild type and was calculated as described in the appendix. It uses pHi as the phenotype both in growing and nongrowing cells. Note that the divergence value is a relative measurement; it is used only for comparing different populations. rpoA*L and rpoA*H are epPCR libraries of the entire coding region of the alpha subunit with low and high mutation frequencies, αCTD*L and αCTD*H are epPCR libraries of the CTD of alpha with low and high mutation frequencies, and αCTD*t is a library in which amino acid changes are restricted to a few surface residues located in the CTD.
FIG. 2. Graph showing the maximum recorded advantage in optical density at 600 nm of cultures of the libraries relative to the control under different screening conditions, that is, the theoretical enrichment of improved clones. The conditions are as follows: 1, M9 medium, 15 g/liter butyrate throughout screening; 2, MOPS medium supplemented with amino acids (5%), decreasing butyrate concentration (18, 15, 12 g/liter); 3, MOPS medium, 15 g/liter butyrate throughout screening; 4, MOPS medium supplemented with amino acids, 15 g/liter butyrate throughout screening. For αCTD*L, two repeats of the last set of conditions are given by runs αCTD*L 5 and 6. For rpoA*L, rpoA*M, and rpoA*H, some conditions were tried more than once (not shown) to rule out experimental error as the reason for not obtaining improved mutants. Even though a positive theoretical enrichment is shown, no improved mutant was isolated in any library except αCTD*L, suggesting that transient advantages of up to ∼15% can be considered noise.
FIG. 3. Growth rate of the K-12 recA mutant transformed with wild-type (Wt) rpoA or mutant versions of rpoA, isolated from the αCTD*L library, under the control of two promoters (lac and spc). Mutants 16 and 1 have the same amino acid sequence, but an additional synonymous mutation in mutant 16 changes a common codon for glycine to a more uncommon one. As shown, increasing the expression level of the mutant (using Pspc) increases the growth advantage over the wild type by up to 60%.
TABLE 1. Posterior probability analysis
Library | No. of bases subject to mutagenesis | Probability of 1 mutation occurring | Probability of the mutation being in the right base | Probability of the change being the one required | Frequency of mutant (1 in:)

ACKNOWLEDGMENTS
We acknowledge Franz Hartner for his help with some of the experiments and Curt Fischer for insightful discussions.
Funding was provided by the Department of Energy (grant DE-FC36-07G017058), the National Science Foundation (grant CBET-0730238), and the MIT Energy Initiative.

APPENDIX
Calculation of divergence.

Divergence was calculated by using the ratio of emissions at the two different wavelengths as explained in Materials and Methods (equations are from reference 12). The phenotype for quantification is defined as follows:
\[ P = \frac{E_{\lambda 1}}{E_{\lambda 2}} \]
This equation is used to calculate the phenotypic distance under each condition (time points for growing cells or extracellular pH values for nongrowing cells) as follows:
\[ d = \langle d_{i,j} \rangle \quad \forall\, i,j \]
\[ d_{i,j} = \vert P_{i} - P_{j} \vert \]
Each phenotypic distance calculated this way has a variance associated with it, which is used to determine the certainty that the phenotypic distance value is statistically significant. The averages (i.e., d) and their variances are used to determine the divergence as the Bhattacharyya distance between a library and control populations as follows:
\[ BD = \frac{1}{8} (\mu_{l} - \mu_{c})^{T} \left( \frac{\Sigma_{l} + \Sigma_{c}}{2} \right)^{-1} (\mu_{l} - \mu_{c}) + \frac{1}{2} \ln \left( \frac{\left\vert \frac{\Sigma_{l} + \Sigma_{c}}{2} \right\vert}{\sqrt{\vert \Sigma_{l} \vert \, \vert \Sigma_{c} \vert}} \right) \]
where Σ is the covariance matrix, μ is the vector of averages, and the subscripts l and c are for the library and control populations, respectively. The Bhattacharyya distance was selected because it mathematically summarizes information about both the distance of the mean value of distributions and the spread of each. Other metrics for quantifying statistical distance could be used instead with similar results.
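As an illustration, the Bhattacharyya distance above can be evaluated numerically with NumPy. This is a minimal sketch, not the authors' code: the function name is hypothetical, and it assumes the mean vectors and covariance matrices of the distance entries have already been estimated for the library and control populations.

```python
import numpy as np

def bhattacharyya_distance(mu_l, cov_l, mu_c, cov_c):
    """Bhattacharyya distance between library (l) and control (c) populations.

    mu_*: mean vectors (one entry per condition); cov_*: covariance matrices.
    Scalars are accepted for the one-dimensional case.
    """
    mu_l = np.atleast_1d(np.asarray(mu_l, dtype=float))
    mu_c = np.atleast_1d(np.asarray(mu_c, dtype=float))
    cov_l = np.atleast_2d(np.asarray(cov_l, dtype=float))
    cov_c = np.atleast_2d(np.asarray(cov_c, dtype=float))

    cov_avg = (cov_l + cov_c) / 2.0
    diff = mu_l - mu_c
    # First term: separation of the means, scaled by the averaged covariance.
    mean_term = 0.125 * diff @ np.linalg.solve(cov_avg, diff)
    # Second term: mismatch in spread, computed with log-determinants
    # for numerical stability.
    _, logdet_avg = np.linalg.slogdet(cov_avg)
    _, logdet_l = np.linalg.slogdet(cov_l)
    _, logdet_c = np.linalg.slogdet(cov_c)
    spread_term = 0.5 * (logdet_avg - 0.5 * (logdet_l + logdet_c))
    return float(mean_term + spread_term)
```

The two terms correspond to the two properties named in the text: the first grows with the distance between the means, and the second grows when the spreads of the two populations differ, so the distance is zero only when both the means and the covariances match.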


References

1. Alper, H., J. Moxley, E. Nevoigt, G. R. Fink, and G. Stephanopoulos. 2006. Engineering yeast transcription machinery for improved ethanol tolerance and production. Science 314:1565-1568.
2. Alper, H., and G. Stephanopoulos. 2007. Global transcription machinery engineering: a new approach for improving cellular phenotype. Metab. Eng. 9:258-267.
3. Brunner, M., and H. Bujard. 1987. Promoter recognition and promoter strength in the Escherichia coli system. EMBO J. 6:3139-3144.
4. Conrad, T. M., A. R. Joyce, M. K. Applebee, C. L. Barrett, B. Xie, Y. Gao, and B. Ø. Palsson. 2009. Whole-genome resequencing of Escherichia coli K-12 MG1655 undergoing short-term laboratory evolution in lactate minimal media reveals flexible selection of adaptive mutations. Genome Biol. 10:R118.
5. Demain, A. L., J. Davies, R. M. Atlas, G. Cohen, C. Hershberger, W. Hu, D. Sherman, R. Willson, and J. D. Wu (ed.). 1999. Manual of industrial microbiology and biotechnology, 2nd ed. American Society for Microbiology, Washington, DC.
6. Firth, A. E., and W. M. Patrick. 2005. Statistics of protein library construction. Bioinformatics 21:3314-3315.
7. Gaal, T., W. Ross, E. E. Blatter, H. Tang, X. Jia, V. V. Krishnan, N. Assa-Munt, R. H. Ebright, and R. L. Gourse. 1996. DNA-binding determinants of the alpha subunit of RNA polymerase: novel DNA-binding domain architecture. Genes Dev. 10:16-26.
8. Jeon, Y. H., T. Negishi, M. Shirakawa, T. Yamazaki, N. Fujita, A. Ishihama, and Y. Kyogoku. 1995. Solution structure of the activator contact domain of the RNA polymerase alpha subunit. Science 270:1495-1497.
9. Kimura, M., and A. Ishihama. 1995. Functional map of the alpha subunit of Escherichia coli RNA polymerase: insertion analysis of the amino-terminal assembly domain. J. Mol. Biol. 248:756-767.
10. Kittell, J., B. Borup, R. Voladari, and K. Zahn. 2005. Parallel capillary electrophoresis for the quantitative screening of fermentation broths containing natural products. Metab. Eng. 7:53-58.
11. Klein-Marcuschamer, D., C. N. S. Santos, H. Yu, and G. Stephanopoulos. 2009. Mutagenesis of the bacterial RNA polymerase alpha subunit for improvement of complex phenotypes. Appl. Environ. Microbiol. 75:2705-2711.
12. Klein-Marcuschamer, D., and G. Stephanopoulos. 2008. Assessing the potential of mutational strategies to elicit new phenotypes in industrial strains. Proc. Natl. Acad. Sci. U. S. A. 105:2319-2324.
13. Kresnowati, M. T. A. P., C. Suarez-Mendez, M. K. Groothuizen, W. A. van Winden, and J. J. Heijnen. 2007. Measurement of fast dynamic intracellular pH in Saccharomyces cerevisiae using benzoic acid pulse. Biotechnol. Bioeng. 97:86-98.
14. Lerner, C. G., and M. Inouye. 1990. Low copy number plasmids for regulated low-level expression of cloned genes in Escherichia coli with blue/white insert screening capability. Nucleic Acids Res. 18:4631.
15. Lynch, M. D., T. Warnecke, and R. T. Gill. 2007. SCALEs: multiscale analysis of library enrichment. Nat. Methods 4:87-93.
16. Martínez-Antonio, A., S. C. Janga, and D. Thieffry. 2008. Functional organisation of Escherichia coli transcriptional regulatory network. J. Mol. Biol. 381:238-247.
17. Murakami, K., N. Fujita, and A. Ishihama. 1996. Transcription factor recognition surface on the RNA polymerase alpha subunit is involved in contact with the DNA enhancer element. EMBO J. 15:4358-4367.
18. Niu, W., Y. Kim, G. Tau, T. Heyduk, and R. H. Ebright. 1996. Transcription activation at class II CAP-dependent promoters: two interactions between CAP and RNA polymerase. Cell 87:1123-1134.
19. Park, K., D. Lee, H. Lee, Y. Lee, Y. Jang, Y. H. Kim, H. Yang, S. Lee, W. Seol, J. Kim, and S. Lee. 2003. Phenotypic alteration of eukaryotic cells using randomized libraries of artificial transcription factors. Nat. Biotechnol. 21:1208-1214.
20. Patnaik, R., S. Louie, V. Gavrilovic, K. Perry, W. P. C. Stemmer, C. M. Ryan, and S. del Cardayré. 2002. Genome shuffling of Lactobacillus for improved acid tolerance. Nat. Biotechnol. 20:707-712.
21. Post, L. E., A. E. Arfsten, F. Reusser, and M. Nomura. 1978. DNA sequences of promoter regions for the str and spc ribosomal protein operons in E. coli. Cell 15:215-229.
22. Sauer, U. 2001. Evolutionary engineering of industrially important microbial phenotypes. Adv. Biochem. Eng. Biotechnol. 73:129-169.
23. Spilimbergo, S., A. Bertucco, G. Basso, and G. Bertoloni. 2005. Determination of extracellular and intracellular pH of Bacillus subtilis suspension under CO2 treatment. Biotechnol. Bioeng. 92:447-451.
24. Voigt, C. A., S. L. Mayo, F. H. Arnold, and Z. G. Wang. 2001. Computational method to reduce the search space for directed protein evolution. Proc. Natl. Acad. Sci. U. S. A. 98:3778-3783.

Information & Contributors


Published In

Applied and Environmental Microbiology
Volume 76, Number 16, 15 August 2010
Pages: 5541 - 5546
PubMed: 20581192


Received: 5 April 2010
Accepted: 15 June 2010
Published online: 15 August 2010





Daniel Klein-Marcuschamer
Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts
Present address: Lawrence Berkeley National Laboratory, Joint BioEnergy Institute, 1 Cyclotron Rd. MS 978-4121, Berkeley, CA 94720.
Gregory Stephanopoulos
Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts
