EcoCyc is a bioinformatics database available at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene, metabolite, reaction, operon, and metabolic pathway. The database also includes information on E. coli gene essentiality and on nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. This review provides a detailed description of the data content of EcoCyc and of the procedures by which this content is generated.
EcoCyc (pronounced “eeko-sike,” as in “ecology” and “encyclopedia”) is a bioinformatics database that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The project’s long-term goal is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for researchers who work with related microorganisms. In addition to the database, a steady-state metabolic flux model is generated from each new version of EcoCyc.
This review provides an overview of EcoCyc’s data content and the procedures by which these data enter EcoCyc.
EcoCyc accelerates science. EcoCyc is designed for several different modes of interactive use, both via the EcoCyc.org website and in conjunction with the downloadable Pathway Tools (1) software (the resources available to assist users in learning the website and software are listed in “How to Learn More,” below):
EcoCyc is an encyclopedic reference providing information about the biological roles of E. coli genes, metabolites, and pathways. Visualization tools, such as a genome browser, metabolic map display, and regulatory network diagram, aid in the comprehension of these complex data.
EcoCyc facilitates the analysis of high-throughput data such as gene-expression and metabolomics data via tools for enrichment analysis, and for visualizing omics data on a metabolic map diagram, complete genome diagram, or regulatory network diagram.
The EcoCyc metabolic flux model can predict growth or no-growth of wild-type and knockout E. coli strains under different nutrient conditions.
Users of EcoCyc fall into several different groups. Experimental biologists use EcoCyc as an encyclopedic reference on genes, pathways, and regulation, and they use its omics-data analysis tools to analyze gene-expression and metabolomics data. Examples of papers citing EcoCyc in the analysis of functional genomics data include references 2, 3, 4, 5, and 6.
Because the EcoCyc data are structured within a sophisticated ontology that is amenable to computational analyses, EcoCyc enables scientists to ask computational questions spanning the entire genome of E. coli, its known metabolic network, its known transport complement, its known genetic regulatory network, and combinations thereof. Past work has used EcoCyc to develop methods for studying path lengths within metabolic networks (7, 8, 9), to relate protein structure to the metabolic network (10, 11), and to analyze the E. coli regulatory network (12, 13).
The development of many new bioinformatics methods requires high-quality, gold-standard data sets for the training and validation of those methods. EcoCyc has been used as a gold-standard data set for the development of genome-context methods for predicting gene function (14, 15), operon-prediction methods (16, 17), the prediction of promoters and transcription start sites (18, 19), regulatory network reconstruction (20), and the prediction of functional and direct protein-protein interactions (21, 22, 23). The EcoCyc metabolic data have been used for studies concerning predicted metabolic networks and growth prediction (24, 25), and for model checking of a symbiotic bacterium’s metabolic network (26).
Metabolic engineers alter microbes to produce biofuels, industrial chemicals, and pharmaceuticals; to degrade toxic pollutants; and to sequester carbon (27, 28, 29). Metabolic engineers who use E. coli as their host organism consult EcoCyc to aid in optimizing the production of an end product through a better understanding of the metabolic network and its regulation and to predict undesirable side effects of a metabolic alteration. Metabolic engineering studies using EcoCyc include references 30, 31, and 32.
According to the Thomson Reuters Web of Knowledge citation index, as of August 2013, the 23 EcoCyc and RegulonDB papers authored since 1997 had been cited by 2,395 publications from 1997 to 2013. According to Google Analytics, approximately 100,000 visitors query the EcoCyc website each year, generating 177,000 object page views per month on average in 2012.
EcoCyc data are available for download in multiple file formats (see http://biocyc.org/download.shtml) and can be queried programmatically via web services (see http://biocyc.org/web-services.shtml).
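As a rough illustration of programmatic access, the sketch below builds a query URL for a single database object and defines a helper that would fetch the response. The base URL, the `id` and `detail` parameter names, and the example object ID are assumptions made for illustration; the authoritative endpoints and parameters are the ones documented on the web-services page cited above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: this base URL and the "id"/"detail" parameters are illustrative only.
# Consult the web-services documentation for the actual endpoint and parameters.
BASE_URL = "https://websvc.biocyc.org/getxml"


def build_object_url(org_id: str, object_id: str, detail: str = "low") -> str:
    """Build a query URL for one database object, e.g. organism ECOLI, object TRP."""
    query = urlencode({"id": f"{org_id}:{object_id}", "detail": detail})
    return f"{BASE_URL}?{query}"


def fetch_object_xml(org_id: str, object_id: str, detail: str = "low", timeout: int = 30) -> bytes:
    """Fetch the raw response bytes. Requires network access and possibly an account."""
    with urlopen(build_object_url(org_id, object_id, detail), timeout=timeout) as response:
        return response.read()


# Only the URL is built here; no network request is made.
url = build_object_url("ECOLI", "TRP")
print(url)
```

Keeping URL construction separate from the network call makes the query logic easy to check offline before any request is sent.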
The Pathway Tools software that underlies EcoCyc (1) is not specific to E. coli, but rather has been applied to manage genomic and biochemical data for thousands of organisms.


EcoCyc covers a broad array of data types. Key to understanding the EcoCyc data and their presentation within the EcoCyc website and Pathway Tools is the notion of a database class, which describes a specific type of data. For example, the class Genes provides the database definition of a gene, including the attributes (e.g., starting nucleotide position within the genome) and relationships (e.g., the linkage between a gene and gene product) of the class. Each specific gene within EcoCyc is stored in a single database object, or frame, that is an instance of the class Genes.
No one-to-one mapping exists between EcoCyc classes and the data pages within the EcoCyc website, because one data page typically integrates information from multiple classes. For example, the pathway data page integrates information from objects in the classes Pathways, Reactions, Genes, Proteins, and Chemicals.
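The relationship between classes, frames, and data pages can be sketched in a few lines. This is a toy model, not the Pathway Tools schema: the frame IDs, slot names, and values below are hypothetical, and only the class names come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One database object: an instance of a class, with named slots."""
    frame_id: str
    frame_class: str
    slots: dict = field(default_factory=dict)


# Hypothetical frames; IDs, slot names, and values are made up for illustration.
frames = [
    Frame("GENE-X", "Genes", {"left-end-position": 1000, "product": ["PROT-X"]}),
    Frame("PROT-X", "Proteins", {"gene": ["GENE-X"], "catalyzes": ["RXN-X"]}),
    Frame("RXN-X", "Reactions", {"enzymes": ["PROT-X"], "substrates": ["CPD-X"]}),
    Frame("CPD-X", "Chemicals", {}),
    Frame("PWY-X", "Pathways", {"reaction-list": ["RXN-X"]}),
]
kb = {f.frame_id: f for f in frames}


def frames_for_page(kb, start_id, link_slots):
    """Follow the given relationship slots from a starting frame; return every frame reached."""
    seen = []
    stack = [start_id]
    while stack:
        frame_id = stack.pop()
        if frame_id in seen or frame_id not in kb:
            continue
        seen.append(frame_id)
        for slot in link_slots:
            stack.extend(kb[frame_id].slots.get(slot, []))
    return [kb[fid] for fid in seen]


# A pathway page draws on frames from several classes, not just Pathways.
page = frames_for_page(kb, "PWY-X", ["reaction-list", "enzymes", "substrates", "gene"])
classes_on_page = sorted({f.frame_class for f in page})
print(classes_on_page)
```

Starting from one pathway frame, the traversal reaches frames in five classes, which is why pages and classes do not map one to one.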


EcoCyc contains the complete genome sequence of E. coli and describes the nucleotide position and function of all known protein-coding and RNA-coding E. coli genes. Genome-related classes that are populated within EcoCyc include Genes, Pseudo-Genes, Promoters, DNA-Binding-Sites, and REP-Elements. Gene Ontology (GO) terms are assigned to genes both by EcoCyc curators and by import of GO terms from UniProt (33). EcoCyc data on the essentiality of E. coli genes are described in “Essential Gene Information” (see below).


EcoCyc describes all known monomers and multimeric protein complexes of E. coli. EcoCyc contains extensive annotation of the features of E. coli proteins, such as phosphorylation sites, metal ion binding sites, and enzyme active sites, assigned by EcoCyc curators and imported from UniProt. Relevant classes within EcoCyc include Polypeptides and Protein-Complexes.


EcoCyc describes all known RNAs and protein-RNA complexes of E. coli. Relevant classes within EcoCyc include RNAs, rRNAs, and regulatory RNAs. Note that EcoCyc does not explicitly represent messenger RNAs.


EcoCyc contains the most complete description of the regulatory network of any organism. It covers E. coli operons, promoters, transcription factors, transcription factor binding sites, attenuators, and small-RNA regulators, as well as substrate-level regulation of E. coli enzymes. Each molecular regulatory interaction is described as an instance of the class Regulation, whose subclasses describe different types of regulation.


EcoCyc describes all known metabolic and signal-transduction pathways of E. coli. It describes each metabolic enzyme of E. coli including its cofactors, activators, inhibitors, and subunit structure.

Membrane Transporters

EcoCyc annotates E. coli transport proteins and the associated transport reactions that they mediate.

Growth Observations

EcoCyc integrates data on the growth of E. coli under many different growth conditions, as described in “Conditions of E. coli Growth and Nongrowth.”

Database Links

EcoCyc is linked to other biological databases containing protein and nucleic acid sequence data, bibliographic data, protein structures, and descriptions of different E. coli strains.


Curation is the process of manually refining and updating a bioinformatics database. The EcoCyc project uses a literature-based curation approach in which database updates are based on evidence in the experimental literature. EcoCyc is largely up to date with respect to its curation activities. As of October 2013, EcoCyc encodes information from more than 25,000 publications. A staff of four full-time curators updates the annotation of the E. coli genome on an ongoing basis.
The transcriptional regulatory information in EcoCyc and RegulonDB is curated by the group of Dr. Julio Collado-Vides at the Universidad Nacional Autónoma de México (UNAM); therefore, both databases include the same data content on transcriptional regulation of gene expression. The actual data curation occurs within EcoCyc, and the information is periodically propagated to RegulonDB.
Curators collect gene, protein, pathway, and compound names and synonyms. They classify genes and gene products by using the Gene Ontology (34) and MultiFun (35) ontologies, and they classify pathways within the Pathway Tools pathway ontology. Protein complex components and the stoichiometry of these subunits are captured; cellular localization of polypeptides and protein complexes is entered, as are experimentally determined protein molecular weights; enzyme activities and any enzyme prosthetic groups, cofactors, activators, or inhibitors are captured. Operon structure and gene regulation information are encoded.
Curators author textual summaries with extensive citations. Within the summaries for proteins, RNAs, pathways, and operons, curators capture additional information not otherwise captured in the highly structured database fields of EcoCyc. For example, curators use the free-text summary sections to describe the overall function of a gene product; the phenotypes caused by mutation, depletion, or overproduction of each gene product; any known genetic interactions; protein domain architecture and structural studies; the similarity to other proteins; or any functional complementation experiments that have been described. Summaries can also be used to note cases in which the published reports present contradictory results. In such cases, both viewpoints are presented with proper attribution. This approach strives to ensure that no information is lost.
EcoCyc entries are generally updated when new literature becomes available. Regular PubMed searches are used to generate lists of potentially curatable publications, which are then evaluated and prioritized for curation. Papers containing newly identified functions of gene products, as well as substantial advances in understanding the functions of known gene products, are given the highest priority for curation. Because the Pathway Tools software continues to evolve and to enable the addition of new data types, older entries are also being updated in a systematic fashion (e.g., each enzyme in a metabolic pathway) as time allows.


Tables 1, 2, 3, and 4 present statistics on EcoCyc content. The listed numbers are current as of version 17.5, released in October 2013.
Table 1 Genes and gene products in EcoCyc
   Protein-coding genes                      4,284
   tRNA genes                                86
   rRNA genes                                22
   Regulatory RNA genes                      41
   Other RNA genes                           56
Protein complexes                            995
   Heteromultimeric protein complexes        290
   Homomultimeric protein complexes          705
Protein features                             23,114
Enzymes (excluding transporters)             1,245
Protein features are annotations of protein sites and regions such as enzyme active sites, metal ion binding sites, and transmembrane domains. A small number of IS elements are included in the count of genes but are not included in the subcategories of genes.
Table 2 Gene annotation status in EcoCyc
Gene annotation status                                  No.
Genes of known or predicted molecular function (a)      3,127
   Genes of known molecular function                    2,710
   Genes of predicted molecular function                417
Genes of unknown molecular function                     1,374
(a) Genes of known molecular function have experimental evidence for their assigned function, whereas genes of predicted molecular function have had their function predicted computationally.
Table 3 Reactions, compounds, and pathways in EcoCyc
Metabolic reactions                                                   1,443
Transport (including electron transfer) reactions                     379
   Small-molecule metabolism base pathways                            291
   Signaling pathways                                                 29
   Metabolites that are substrates of enzyme-catalyzed reactions      1,331
   Metabolites that are physiological enzyme regulators               121
   Metabolites that are cofactors or prosthetic groups                56
   Transported metabolites                                            274
Superpathways are connected sets of base metabolic pathways (connected via shared substrates).
Table 4 Regulation-related objects and interactions in EcoCyc
Transcriptional/translational regulation                        No.
Transcription units                                             4,510
Transcription factors                                           194
Transcription factor binding sites                              2,773
Instances of regulation of transcription initiation (a)         3,293
Instances of regulation by transcriptional attenuation          20
Instances of regulation of translation                          146
(a) Each member of “Instances of Regulation of Transcription Initiation” describes a single regulatory interaction between a transcription factor and its binding site.


As of 2011, EcoCyc incorporates media that have been shown experimentally to support or not support growth of both wild-type and knockout strains of E. coli K-12. This work has two goals. First, a comprehensive encyclopedia of E. coli growth conditions will be assembled for experimentalists. The spectrum of environmental conditions supporting the growth of a bacterium is among its most important phenotypic traits. We cannot expect to understand the functions of all genes in an organism unless we understand the full range of the environments in which the cell can grow. Second, a comprehensive collection of E. coli growth media will drive more accurate systems biology modeling of E. coli. The larger the set of growth media against which these computational models are validated, the more accurate and comprehensive the models will be.
EcoCyc captures approximately 20 media that are commonly used by E. coli laboratories; growth data are provided for some of these media. EcoCyc also records the results of high-throughput experiments using Biolog Phenotype Microarrays (PMs) that measure cell respiration as a sensitive indicator of microbial growth (36). The commercially available PM system for microorganisms provides a comprehensive set of phenotype tests including information on the ability to metabolize 190 carbon (C) compounds, 95 nitrogen (N) compounds, 59 phosphorus (P) compounds, and 35 sulfur (S) compounds. EcoCyc currently documents five sets of PM data from the following sources:
B. Bochner and X. Lei, personal communication, 2012.
Strain: E. coli K-12 BW30270 (rph+ [RNase PH] derivative of MG1655; the strains also show a PyrE deficiency. Found to be fnr+ as well, according to K. A. Datsenko and B. L. Wanner, unpublished results.)
This data set includes aerobic growth observations for the full complement of C, N, P, and S compounds that are included in the PM system plus growth observations for 95 C sources under anaerobic conditions.
“Genome Scale Reconstruction of a Salmonella Metabolic Model,” AbuOun et al., 2009 (37).
Strain: E. coli K-12 MG1655 (American Type Culture Collection 700926)
This data set includes growth observations for the full complement of C, N, P, and S compounds under aerobic conditions. Bacteria were pregrown on LB agar before the inoculation of Biolog plates and incubation at 37°C for 26 hours. The Omnilog instrument (a specialized incubator plus reader) was used for data collection and analysis.
“The Evolution of Metabolic Networks of E. coli,” Baumler et al., 2011 (38).
Strain: E. coli K-12 MG1655
This data set consists of growth observations for 95 C compounds under aerobic and anaerobic conditions. Bacteria were pregrown on Biolog Universal Growth Agar plus sheep blood (BUG-S) before the inoculation of Biolog plates and incubation at 37°C. Growth was monitored by measuring optical density at 600 nm with readings taken at 3, 6, 12, 24, and 48 h (D. Baumler, personal communication).
Mackie et al., 2013 (39).
Strain: E. coli K-12 MG1655 (Coli Genetic Stock Center 7740).
This data set consists of growth observations for the full complement of C, N, P, and S compounds under aerobic conditions. Bacteria were pregrown on either LB or R2A agar before inoculation of Biolog plates and incubation at 37°C for 48 h. The Omnilog instrument was used for data collection and analysis.
“Comparative Multi-Omics Systems Analysis of Escherichia coli strains B and K-12,” Yoon et al., 2012 (40).
Strain: E. coli K-12 MG1655
This data set consists of growth observations for the full complement of C, N, P, and S compounds under aerobic conditions. Bacteria were pregrown on BUG-S agar before the inoculation of Biolog plates and incubation at 37°C for 48 hours. The Omnilog instrument was used for data collection and analysis.
Data on growth conditions can be accessed from the EcoCyc website by invoking the menu command Search → Growth Media and then clicking on the button “All Growth Media for this Organism.” Individual media are shown in the initial table; PM data are shown in the following tables. The coloring of each box indicates the degree of growth observed under that condition. Three levels of growth are recorded: no growth, low growth, and growth (see legend that indicates the colors associated with each level of growth). Click on any growth medium to request a page describing its composition and to see genes that are essential or not essential for growth under that condition.


As of 2011, EcoCyc incorporates several large-scale data sets on gene essentiality in E. coli. Gene essentiality information is useful for:
Predicting antibiotic targets for pathogenic bacteria.
Guiding the design of minimal genomes.
Validating genome-scale metabolic flux models. Model predictions can be compared with the experimental data recorded in EcoCyc to assess model accuracy.
Providing clues regarding the functions of genes of unknown function, when essentiality varies depending on conditions of growth.
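The model-validation use listed above amounts to comparing two gene-to-essentiality mappings. The sketch below counts agreements and disagreements between experimental calls and model predictions; the gene names and calls are placeholders, not EcoCyc data.

```python
def confusion_counts(experimental, predicted):
    """Compare observed and predicted essentiality for genes present in both mappings.

    Both arguments map gene name -> True (essential) or False (not essential).
    """
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for gene in experimental.keys() & predicted.keys():
        observed, pred = experimental[gene], predicted[gene]
        if observed and pred:
            counts["TP"] += 1   # essential, predicted essential
        elif not observed and not pred:
            counts["TN"] += 1   # dispensable, predicted dispensable
        elif pred:
            counts["FP"] += 1   # dispensable, but predicted essential
        else:
            counts["FN"] += 1   # essential, but predicted dispensable
    return counts


def accuracy(counts):
    """Fraction of compared genes on which experiment and prediction agree."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total if total else float("nan")


# Placeholder data for illustration only.
experimental = {"geneA": True, "geneB": False, "geneC": True, "geneD": False, "geneE": True}
predicted = {"geneA": True, "geneB": False, "geneC": False, "geneD": True, "geneF": False}

counts = confusion_counts(experimental, predicted)
print(counts, accuracy(counts))
```

Genes that appear in only one of the two mappings (geneE and geneF here) are left out of the comparison, which mirrors the fact that experimental screens and models rarely cover exactly the same gene set.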
EcoCyc incorporates data on essentiality from the following publications:
“Experimental Determination and System Level Analysis of Essential Genes in Escherichia coli, MG1655,” Gerdes et al. (41).
Strain: E. coli K-12 MG1655 (F− λ− ilvG− rfb-50 rph-1)
This study used a genetic footprinting technique with a Tn5-based transposome system and reported unambiguous assessment of approximately 87% of E. coli open reading frames (ORFs) for essentiality. Six hundred twenty-six genes were identified as essential for aerobic growth in rich media, while 3,126 genes were dispensable. Note that the inability to obtain an insertion mutant by using this system may in some cases be a reflection of the nontargeted nature of transposon insertion rather than a reflection of gene essentiality. For this and other technical reasons, 327 genes were classified in this study as ambiguous with regard to essentiality.
“Construction of Escherichia coli K-12 In-Frame, Single-Gene Knockout Mutants: The Keio Collection,” Baba et al. (42) and corrections (43)
Strain: E. coli K-12 BW25113 [rpoS(Am) rph-1 λ rrnB3 ΔlacZ4787 hsdR514 Δ(araBAD)567 Δ(rhaBAD)568 rph-1]
This study created 3,985 in-frame, single-gene deletion mutants by using the lambda RED recombinase system. Three hundred three genes were unable to be disrupted and were predicted to be essential for growth in rich media at 37°C. Note that, in some cases, there were secondary impacts from single-gene deletions, such as compensating suppressor mutations. There were also errors in some of the mutants described in this paper, which were later corrected (43). This study also profiled the growth of the mutants in minimal glucose MOPS (morpholinepropanesulfonic acid) media to identify genes that are conditionally essential under these conditions.
“Experimental and Computational Assessment of Conditionally Essential Genes in Escherichia coli,” Joyce et al. (44)
Strain: E. coli K-12 BW25113 [rpoS(Am) rph-1 λ rrnB3 ΔlacZ4787 hsdR514 Δ(araBAD)567 Δ(rhaBAD)568 rph-1] (the same as in reference 42)
This study used the Keio collection of single-gene knockout mutants and profiled them for growth on glycerol-supplemented minimal medium. One hundred nineteen genes were identified as essential for growth on glycerol. They also combined these observations with those made by Baba et al. (42) regarding the conditional essentiality of the mutants when grown on glucose-supplemented minimal media and were thus able to identify a conserved conditionally essential core of 94 genes that are required for E. coli K-12 to grow under minimal nutritional supplementation but are not essential for growth under rich conditions.
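The "conserved conditionally essential core" described above is a set operation: genes essential on both minimal media, minus genes that are also essential in rich media. A minimal sketch with placeholder gene names (not the actual gene lists from these studies):

```python
def conditionally_essential_core(glucose_essential, glycerol_essential, rich_essential):
    """Genes required on both minimal media but not essential under rich conditions."""
    return (glucose_essential & glycerol_essential) - rich_essential


# Placeholder gene sets for illustration only.
glucose_essential = {"geneA", "geneB", "geneC", "geneD"}
glycerol_essential = {"geneB", "geneC", "geneD", "geneE"}
rich_essential = {"geneD"}

core = conditionally_essential_core(glucose_essential, glycerol_essential, rich_essential)
print(sorted(core))
```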
“A Genome-Scale Metabolic Reconstruction for Escherichia coli K-12 MG1655 that Accounts for 1260 ORFs and Thermodynamic Information,” Feist et al. (45)
This publication used the experimental data regarding conditional gene essentiality from Joyce et al. (44) and from Baba et al. (42) and compared these data with the computationally predicted essential genes in their genome-scale metabolic reconstruction of E. coli. This data set is included in EcoCyc to facilitate the benchmarking of computational predictions of essentiality from the EcoCyc model with computations from the model of Feist et al. (45).
“Multicopy Suppression Underpins Metabolic Evolvability,” Patrick et al. 2007 (46)
Strain: E. coli BW25113 [rpoS(Am) rph-1 λ rrnB3 ΔlacZ4787 hsdR514 Δ(araBAD)567 Δ(rhaBAD)568 rph-1]
This study used the conditionally essential gene sets identified by Baba et al. (42) and Joyce et al. (44) and tested them for their ability to form colonies on glucose M9 agar. They identified 107 genes that were conditionally essential under these conditions.
When essentiality data are available for a given gene, the EcoCyc gene page includes a table of the conditions under which that gene has been found to be either essential or not essential for growth. Clicking on the condition will navigate to a growth-medium page that lists all essentiality information under that growth condition.


A quantitative steady-state metabolic flux model has been derived from EcoCyc by using flux balance analysis (FBA) (47, 48). By running this model with different parameters, scientists can model the growth of E. coli under different nutrient conditions and for different gene knockouts. Every time the model is executed, it is freshly generated from EcoCyc, meaning that, as the reactions in EcoCyc are updated because of curation, the model automatically reflects those changes.
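For readers unfamiliar with FBA, the generic textbook formulation is a linear program. The symbols below are notation introduced here for illustration and are not taken from MetaFlux, whose exact formulation may differ: S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c selects the objective (typically the biomass reaction), and l and u are lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& l_j \le v_j \le u_j \quad \text{for every reaction } j .
\end{aligned}
```

The steady-state constraint S v = 0 requires that each internal metabolite be produced as fast as it is consumed. Nutrient conditions and gene knockouts enter only through the bounds: a limiting nutrient has a finite upper bound on its uptake flux, an unavailable nutrient has an upper bound of zero, and a reaction whose enzymes are all knocked out has l_j = u_j = 0.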
The EcoCyc FBA model is distinct from the E. coli FBA models derived by the Palsson group (45, 49, 50), but these models have much in common because EcoCyc and the iAF1260 model were partially unified in 2007 (45), and both groups consult the other’s work when updating their models.
The Supplementary Information provided separately details the E. coli biomass metabolite set used to model biomass production metabolite requirements in EcoCyc FBA. This metabolite set is derived from the iJO1366 model WT biomass reaction of Orth et al. (50). The Supplementary Information also contains a description of the nutrient and secretion metabolite sets that supply inputs and outputs to the FBA model, as well as a description of differences between the EcoCyc FBA biomass metabolite set and the iJO1366 WT biomass reaction.
To run the EcoCyc FBA model, download and install a Pathway Tools software configuration that includes EcoCyc, and invoke the MetaFlux modeling component of Pathway Tools (see Chapter 8 of the Pathway Tools User’s Guide).
EcoCyc provides several example files describing invocations of the FBA model under different nutrient conditions. Those files are found within the installed Pathway Tools directory tree at pathway-tools/aic-export/pgdbs/biocyc/ecocyc/VERSION/data/fba/. Output files produced as a result of successful FBA runs on the supplied .fba files are also included. The supplied input files (where CDW is cell dry weight) are:
GlucoseAer.fba : 10 mmol/g CDW/h glucose uptake, minimal media, aerobic conditions
GlucoseAnaer.fba : 10 mmol/g CDW/h glucose uptake, minimal media, anaerobic conditions
GlycerolAer.fba : 10 mmol/g CDW/h glycerol uptake, minimal media, aerobic conditions
GlycerolAnaer.fba : 10 mmol/g CDW/h glycerol uptake, minimal media, anaerobic conditions

External Flux Predictions

MetaFlux metabolic flux predictions from EcoCyc version 17.5 for aerobic growth on glucose and glycerol are given in Tables 5 and 6. Model predictions for anaerobic growth on glucose and glycerol are given in Tables 7 and 8. In all cases, the uptake rate of the carbon source is set to an upper bound reflecting experimental uptake rates in mmol/g CDW/h. O2 uptake rates are set to an upper bound of 0.00 mmol/g CDW/h under anaerobic conditions. All other nutrient sources are left free.
Table 5 Comparison of experimental aerobic glucose-limited chemostat growth data with EcoCyc and iJO1366 FBA model predictions (389 reactions active in EcoCyc)
Quantity              Experimental    EcoCyc
Glucose uptake        3.008           3.008
Growth rate           0.300           0.276
O2 uptake             7.413           4.472
NH4 uptake            2.367           3.026
Sulfate uptake        -               0.068
Phosphate uptake      -               0.288
CO2 production        7.38            5.480
H2O production        -               13.026
H+ production         -               2.582
Metabolite uptake and production rates are in millimoles/gram CDW/hour (where CDW is cell dry weight). Growth rate is per hour. Experimental data are from Kayser et al. (51).
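One simple way to read Table 5 is as a percent deviation of each model prediction from the corresponding measurement. The sketch below does this for the rows that have both values; the split of the run-together table cells into an experimental value and a predicted value is our reading of the table, and the choice of percent deviation as the metric is ours, not the authors'.

```python
# (experimental, predicted) pairs from Table 5; rows without an experimental value are omitted.
# ASSUMPTION: the two-column split of each row is our reading of the table.
table5 = {
    "Glucose uptake": (3.008, 3.008),
    "Growth rate": (0.300, 0.276),
    "O2 uptake": (7.413, 4.472),
    "NH4 uptake": (2.367, 3.026),
    "CO2 production": (7.38, 5.480),
}


def percent_deviation(experimental, predicted):
    """Signed deviation of the prediction from the measurement, in percent."""
    return 100.0 * (predicted - experimental) / experimental


deviations = {name: percent_deviation(e, p) for name, (e, p) in table5.items()}
for name, dev in deviations.items():
    print(f"{name:15s} {dev:+6.1f}%")
```

A negative value means the model predicts less than was measured, as for growth rate and O2 uptake here.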
Table 6 EcoCyc FBA model performance for aerobic glycerol-limited growth (385 reactions active in EcoCyc)
Quantity              Experimental    EcoCyc
Glycerol uptake       10              10.00
Growth rate           -               0.53
O2 uptake             11              8.96
NH4 uptake            -               5.80
Sulfate uptake        -               0.13
Phosphate uptake      -               0.55
CO2 production        -               5.74
H2O production        -               30.36
H+ production         -               4.95
Uptake and production rates are in millimoles/gram CDW/hour. Growth rate is per hour. Experimental OUR/Gl-UR is estimated graphically from Ibarra et al. (52).
Table 7 EcoCyc FBA model performance for anaerobic glucose-limited growth (383 reactions active in EcoCyc)
Quantity                Experimental    EcoCyc
Glucose uptake          10.0            10.00
Growth rate             0.30            0.25
O2 uptake               0.00            0.00
NH4 uptake              -               2.76
Sulfate uptake          -               0.06
Phosphate uptake        -               0.26
CO2 production          -               0.27
H2O production          -               0.57
H+ production           -               27.14
Acetate production      7.5             8.03
Formate production      11.3            16.76
Succinate production    1.2             0.00
Ethanol production      8.7             7.73
Uptake and production rates are in millimoles/gram CDW/hour. Growth rate is per hour. Experimental data are from Belaich and Belaich (53) via Varma et al. (54). FHL was set as inactive for the purpose of comparison.
Table 8 EcoCyc FBA model performance for anaerobic glycerol-limited growth (374 reactions active)
Quantity                EcoCyc
Glycerol uptake         10.00
Growth rate             0.08
O2 uptake               0.00
NH4 uptake              0.88
Sulfate uptake          0.02
Phosphate uptake        0.84
CO2 production          0.17
H2O production          3.13
H+ production           9.47
Acetate production      0.00
Formate production      8.72
Succinate production    0.00
Ethanol production      8.90
Uptake and production rates are in millimoles/gram CDW/hour. Growth rate is per hour. Quantitative experimental rates are not currently available; for a qualitative description of anaerobic glycerol fermentation, see Dharmadi et al. (55).

Improvement of the Metabolic Model

With each EcoCyc release, we plan to include an improved version of the EcoCyc metabolic flux model that reflects recent improvements to our knowledge of the E. coli metabolic network.
Model predictions can differ from experimental measurements for a number of reasons, including the operation of additional, unmodeled reactions and metabolites; existing reactions operating in a different fashion from the model (e.g., the model contains a “perfect” respiratory electron-transfer chain without the possibility of reactive oxygen-species generation); the presence of regulation or of product inhibition that deactivates reactions or limits their throughput; and differences in optimization objective functions depending on the specified feed source.


The EcoCyc.org and BioCyc.org websites and downloadable files are updated three to four times per year. A faster, more powerful version of EcoCyc that you can install locally on your computer (Macintosh, PC/Windows, PC/Linux) is released semiannually.


EcoCyc includes data imported from the following bioinformatics databases. In most cases, the data are reimported once or twice per year. We note that many literature references within EcoCyc were obtained from PubMed.

UniProt Features

UniProt protein features (the UniProt KB term is sequence annotations) from the complete proteome of E. coli K-12 MG1655 in SwissProt are imported into EcoCyc for every EcoCyc release. We import all protein features with experimental or nonexperimental evidence qualifiers except for the following types: turn, helix, beta strand, and coiled-coil. The chain type is only imported if it does not span the entire length of the protein. Examples of imported feature types include catalytic domains, phosphorylation sites, and metal ion binding sites. We import citations associated with UniProt protein features if they include an associated PubMed ID.
The import of protein features into EcoCyc is done via the UniProt Feature Importer tool within the Pathway Tools software.
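The import rules described above can be expressed as a small filter. This is a sketch of the stated rules only, not the UniProt Feature Importer itself: the dictionary layout, field names, and example features are hypothetical, and the feature-type strings follow the wording of the text rather than any particular UniProt file format.

```python
# Feature types that are never imported, as named in the text.
EXCLUDED_TYPES = {"turn", "helix", "beta strand", "coiled-coil"}


def should_import(feature, protein_length):
    """Apply the stated rules: skip excluded types and full-length chain features."""
    ftype = feature["type"].lower()
    if ftype in EXCLUDED_TYPES:
        return False
    spans_whole_protein = feature["start"] <= 1 and feature["end"] >= protein_length
    if ftype == "chain" and spans_whole_protein:
        return False
    return True


def citations_to_import(feature):
    """Keep only citations that carry a PubMed ID."""
    return [c for c in feature.get("citations", []) if c.get("pubmed_id")]


# Hypothetical features on a 300-residue protein; the PubMed ID is a placeholder.
features = [
    {"type": "helix", "start": 10, "end": 20},
    {"type": "chain", "start": 1, "end": 300},
    {"type": "chain", "start": 22, "end": 300},
    {"type": "metal ion binding site", "start": 57, "end": 57,
     "citations": [{"pubmed_id": "0000000"}, {"title": "no PubMed ID"}]},
]

kept = [f for f in features if should_import(f, protein_length=300)]
print([f["type"] for f in kept])
```

The second chain feature survives because it starts at residue 22, for example after a cleaved signal peptide, and so carries information that a full-length chain would not.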

Gene Ontology

For several years, EcoCyc and EcoliWiki/PortEco have been collaborating on improving and maintaining the GO annotations for E. coli. GO and its applications are described in more detail in reference 56. Since the summer of 2008, we have been periodically generating a file containing all E. coli K-12 GO term annotations, called gene_association.ecocyc, that may be obtained from the Gene Ontology Consortium.
GO annotation is a standard part of EcoCyc’s manual literature-based curation process. The GO annotations are added to the database objects that represent the functional gene products or multimers, not directly to the gene objects. This approach models the biology more accurately because it indicates exactly which form of the gene product has the specified GO function. In parallel, manual annotation of E. coli genes with GO is ongoing at EcoliWiki. On a regular basis, the GO annotations are merged. The latest UniProt and EcoliWiki annotations are imported into EcoCyc. Because the GO Consortium does not accept electronic annotations as part of the gene association file if the annotations are more than 1 year old, these UniProt annotations are reimported into EcoCyc on a regular basis.
EcoCyc incorporates many electronic and experimental GO term annotations of E. coli K-12 gene products obtained from the “UniProt [multispecies] GO Annotations @ EBI” file downloaded from the Gene Ontology Consortium. When this import was first performed in 2007, approximately 30,000 new IEA (“Inferred from Electronic Annotation”) GO term assignments were added to EcoCyc, along with approximately 1,000 assignments with experimental evidence codes including assignments from high-throughput protein-interaction studies. During the import of GO terms from UniProt into EcoCyc, a filtering operation is applied to prune GO term annotations based solely on computational (IEA) evidence if the EcoCyc gene product already has more specific GO annotations (in other words, GO terms that are children of the GO term being imported) that have experimental evidence available. For example, if a gene product already contained an experimental annotation of the term “galactose kinase,” the software would not add the computational annotation “carbohydrate kinase.” This filtering leads to the removal of approximately 1,000 of these less specific and redundant annotations.
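The filtering step described above can be sketched as an ancestor check over the GO hierarchy. The code below is an illustration under simplifying assumptions, not the import software: the hierarchy is a three-term toy (the two kinase terms come from the example in the text, the others are added here), and terms are identified by name rather than by GO ID.

```python
# Toy is-a hierarchy: term -> set of direct parent terms.
PARENTS = {
    "galactose kinase": {"carbohydrate kinase"},
    "carbohydrate kinase": {"kinase"},
    "kinase": set(),
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    found = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, ()))
    return found


def filter_iea(experimental_terms, incoming_iea_terms, parents):
    """Drop incoming IEA terms that are less specific than an existing experimental term.

    An IEA term identical to an experimental term is not handled here; the text
    does not say how that case is treated.
    """
    covered = set()
    for term in experimental_terms:
        covered |= ancestors(term, parents)
    return [t for t in incoming_iea_terms if t not in covered]


existing = {"galactose kinase"}                      # experimental evidence
incoming = ["carbohydrate kinase", "ATP binding"]    # computational (IEA) evidence
imported = filter_iea(existing, incoming, PARENTS)
print(imported)
```

Here "carbohydrate kinase" is pruned because the gene product already has the more specific experimental annotation, while the unrelated IEA term is imported.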
A gene association file is generated from the quarterly EcoCyc releases. This file is sent to the EcoliWiki team at Texas A&M for further processing. At EcoliWiki, annotations made in the wiki-based community annotation system since the last EcoCyc update are added to the file, along with annotations containing qualifiers (mainly contributes_to) not yet supported by EcoCyc. Only those annotations that are complete by GO Consortium standards are extracted from EcoliWiki; incomplete annotations are left in the wiki in the hope that community members will eventually complete them. EcoliWiki runs the GO Consortium validation scripts and deposits the file with the GO Consortium via their Concurrent Versions System (CVS).


The GenBank record U00096, produced by the Blattner laboratory in October 1997, was the source of the original E. coli MG1655 genome sequence and annotation incorporated by EcoCyc. A corrected nucleotide sequence was deposited in GenBank as U00096.2 in 2004, and the revised sequence was incorporated into EcoCyc as of version 8.6 (November 2004). The revised genome annotation published in reference 57 was incorporated into EcoCyc in version 10.0 (March 2006).

RefSeq Collaboration

EcoCyc is involved in a collaboration to update the genome annotation of the GenBank (U00096.2) and RefSeq (NC_000913.2) entries for E. coli K-12 MG1655 on an ongoing basis. The primary collaborators include EcoCyc, EcoGene, UniProtKB/Swiss-Prot, and NCBI. The collaborators routinely share their data and resolve data conflicts. The updates of gene names, gene positions, and gene product names are shared among all partners.


The EcoCyc and MetaCyc databases exchange data as part of the release processes for both databases. The updates that have occurred to enzymes, genes, pathways, reactions, and metabolites are exchanged between the databases based on automated comparisons of update dates to ensure that the latest information and corrections are propagated between the databases.
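A date-based exchange of this kind could look roughly like the following Python sketch. It assumes, purely for illustration, that each database is a dictionary mapping an object ID to a pair of (last update date, data) and that only objects present in both databases are compared; the actual release tooling is not described here.

```python
from datetime import date


def newer_in(source, target):
    """Return IDs of shared objects whose record in `source` is newer than in `target`."""
    return sorted(
        object_id
        for object_id, (updated, _) in source.items()
        if object_id in target and updated > target[object_id][0]
    )


def exchange(ecocyc, metacyc):
    """Copy the most recently updated version of each shared object in both directions."""
    # Compute both lists before copying so the comparison uses the original dates.
    to_metacyc = newer_in(ecocyc, metacyc)
    to_ecocyc = newer_in(metacyc, ecocyc)
    for object_id in to_metacyc:
        metacyc[object_id] = ecocyc[object_id]
    for object_id in to_ecocyc:
        ecocyc[object_id] = metacyc[object_id]
    return to_metacyc, to_ecocyc


# Hypothetical object IDs and contents.
ecocyc = {
    "RXN-1": (date(2013, 5, 1), "reaction, corrected"),
    "CPD-1": (date(2012, 1, 1), "metabolite, old"),
}
metacyc = {
    "RXN-1": (date(2012, 5, 1), "reaction, original"),
    "CPD-1": (date(2013, 3, 1), "metabolite, revised"),
    "PWY-1": (date(2013, 2, 1), "pathway absent from EcoCyc"),
}
sent, received = exchange(ecocyc, metacyc)
```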


Gene Accession Numbers

Three systems of accession numbers are typically available for genes within EcoCyc. Any of these accession numbers may be used when querying EcoCyc genes “by name,” and in the website Quick Search.
EcoCyc ID: The EcoCyc project assigns a unique identifier to each gene. For historical reasons the syntax varies; identifiers have the form “Gnnnn,” “EGnnnnn,” or “G0-nnnnn.” EcoCyc IDs are stored as the frame ID of the EcoCyc gene object.
B-numbers: Originally assigned by the Blattner laboratory as part of the E. coli genome project, the b-number identifiers are of the form “bnnnn.” B-numbers were originally assigned sequentially along the genome. When a gene object is removed from the genome because the evidence for the existence of that gene is judged insufficient, its b-number is retired and is not reused. When new genes are added to the genome, they are assigned the next highest available b-number. Thus, b-numbers are no longer purely sequential along the genome. B-numbers are stored in the EcoCyc slot Accession-1.
ECK numbers: ECK numbers were assigned to the E. coli K-12 MG1655 and W3110 genomes in 2005 in an attempt to provide shared accession numbers for genes common to the two genomes (57). ECK numbers are stored in the EcoCyc slot Accession-2. The b-number and ECK number carry the same digits for only approximately the first 18 genes in the E. coli K-12 MG1655 genome; for subsequent genes the numbers have diverged.
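A script that accepts any of the three accession systems could recognize them by syntax, as in the following Python sketch. The EcoCyc ID and b-number patterns follow the forms given above; the “ECKnnnn” pattern (ECK followed by four digits) is an assumption, since the form of ECK numbers is not spelled out in this section, and the function name is hypothetical.

```python
import re

PATTERNS = {
    # Forms stated in the text: Gnnnn, EGnnnnn, G0-nnnnn.
    "EcoCyc ID": re.compile(r"G\d{4}|EG\d{5}|G0-\d{5}"),
    # Form stated in the text: bnnnn.
    "b-number": re.compile(r"b\d{4}"),
    # Assumed form: ECK followed by four digits.
    "ECK number": re.compile(r"ECK\d{4}"),
}


def classify_accession(text):
    """Return the name of the accession system that `text` matches, or None."""
    token = text.strip()
    for system, pattern in PATTERNS.items():
        if pattern.fullmatch(token):
            return system
    return None
```

For example, `classify_accession("EG10001")` returns `"EcoCyc ID"`, while a gene name such as `"thrA"` returns `None` and would have to be resolved by a name lookup instead.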


EcoCyc is part of the larger BioCyc collection of Pathway/Genome Databases (PGDBs). BioCyc version 17.5 (2013) includes 160 E. coli and Shigella PGDBs. Most of these PGDBs were generated computationally and lack the extensive manual literature-based curation of the EcoCyc K-12 database. The E. coli PGDBs in BioCyc cover complete genomes only; draft genomes are not included.
Two of these PGDBs have undergone additional curation: those for E. coli W3110 and E. coli B str. REL606. Both underwent a computational annotation-normalization procedure in which gene names, product names, heteromultimeric protein complexes, and Gene Ontology terms were propagated from EcoCyc to the orthologous genes in these two strains (the orthologs were computed by SRI as bidirectional best-BLAST hits with additional manual review and curation). This procedure was performed under the assumption that genome-annotation pipelines typically introduce syntactically large but semantically insignificant variation in the naming of genes and gene products. In addition, E. coli B str. REL606 underwent literature-based curation by SRI to incorporate experimental information regarding the genes and pathways present in this strain but not in the EcoCyc strain MG1655. This curation is supported by the PortEco project.
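The bidirectional best-hit criterion can be sketched in a few lines of Python. This is a simplified illustration that assumes BLAST results have already been reduced to (query, subject, score) tuples; the gene identifiers and scores are invented, and the manual review step mentioned above is not modeled.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs in which each gene is the other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Invented example: genes a1..a3 in one strain, x1..x3 in the other.
a_vs_b = [("a1", "x1", 950.0), ("a1", "x2", 300.0), ("a2", "x2", 700.0), ("a3", "x2", 650.0)]
b_vs_a = [("x1", "a1", 940.0), ("x2", "a2", 710.0), ("x3", "a3", 120.0)]
orthologs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

Here `a3` is not paired with any gene because its best hit, `x2`, prefers `a2`; in a propagation step, only the reciprocal pairs would receive names and GO terms from the reference database.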
To select a given genome for querying in the BioCyc website, click on the words “change organism database” under the Quick Search and Gene Search buttons in the upper right corner of most EcoCyc web pages.


Feedback from the scientific community has proved invaluable to improving EcoCyc during its many years of development. We strongly encourage your comments and suggestions for improvements in all areas, including:
The database content of EcoCyc
The presentation of information within the EcoCyc website
The analysis tools provided in conjunction with EcoCyc
The performance of the EcoCyc website
If you see an error or omission within EcoCyc, please report it by using the “Report Errors or Provide Feedback” link at the bottom of every data page. Please email suggestions or questions to biocyc support at [email protected].
During every EcoCyc release, we email a summary of new developments to our biocyc users mailing list. To subscribe to this mailing list, please see http://biocyc.org/subscribe.shtml.



Please cite EcoCyc in publications that benefit from the use of the EcoCyc database or website. Please cite EcoCyc as the most recent Nucleic Acids Research Database issue article, currently: Keseler et al. 2013, Nucleic Acids Res 41:D605–D612.


Monica Riley led the curation of EcoCyc for many years, from its inception. Her efforts created the content for the first organism-scale metabolic database. John Ingraham was a valued advisor to EcoCyc for many years. We thank the scientists who have contributed corrections and suggestions to EcoCyc over the years, and we thank the scientists who have served on the EcoCyc Steering Committee. Many contributors to EcoCyc are listed on the EcoCyc credits page.
The development of EcoCyc is funded by grants GM77678 and GM71962 from the NIH National Institute of General Medical Sciences.
Conflicts of interest: We disclose no conflicts.


Karp PD, Paley SM, Krummenacker M, Latendresse M, Dale JM, Lee TJ, Kaipa P, Gilham F, Spaulding A, Popescu L, Altman T, Paulsen I, Keseler IM, Caspi R. 2010. Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology. Brief Bioinform 11:40–79.
Kim KS, Lee S, Ryu CM. 2013. Interspecific bacterial sensing through airborne signals modulates locomotion and drug resistance. Nat Commun 4:1809.
Bower JM, Gordon-Raagas HB, Mulvey MA. 2009. Conditioning of uropathogenic Escherichia coli for enhanced colonization of host. Infect Immun 77:2104–2112.
Rhodius V, Van Dyk TK, Gross C, LaRossa RA. 2002. Impact of genomic technologies on studies of bacterial gene expression. Annu Rev Microbiol 56:599–624.
Gonzalez R, Tao H, Purvis JE, York SW, Shanmugam KT, Ingram LO. 2003. Gene array-based identification of changes that contribute to ethanol tolerance in ethanologenic Escherichia coli: comparison of KO11 (parent) to LY01 (resistant mutant). Biotechnol Prog 19:612–623.
Taoka M, Yamauchi Y, Shinkawa T, Kaji H, Motohashi W, Nakayama H, Takahashi N, Isobe T. 2004. Only a small subset of the horizontally transferred chromosomal genes in Escherichia coli are translated into proteins. Mol Cell Proteomics 3:780–787.
Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL. 2002. Hierarchical organization of modularity in metabolic networks. Science 297:1551–1555.
Simeonidis E, Rison SC, Thornton JM, Bogle ID, Papageorgiou LG. 2003. Analysis of metabolic networks using a pathway distance metric through linear programming. Metab Eng 5:211–219.
Arita M. 2004. The metabolic world of Escherichia coli is not small. Proc Natl Acad Sci USA 101:1543–1547.
Jardine O, Gough J, Chothia C, Teichmann SA. 2002. Comparison of the small molecule metabolic enzymes of Escherichia coli and Saccharomyces cerevisiae. Genome Res 12:916–929.
Rison SC, Thornton JM. 2002. Pathway evolution, structurally speaking. Curr Opin Struct Biol 12:374–382.
Ma HW, Kumar B, Ditges U, Gunzer F, Buer J, Zeng AP. 2004. An extended transcriptional regulatory network of Escherichia coli and analysis of its hierarchical structure and network motifs. Nucleic Acids Res 32:6643–6649.
Shen-Orr SS, Milo R, Mangan S, Alon U. 2002. Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31:64–68.
Karimpour-Fard A, Leach SM, Gill RT, Hunter LE. 2008. Predicting protein linkages in bacteria: which method is best depends on task. BMC Bioinformatics 9:397.
Bowers PM, Pellegrini M, Thompson MJ, Fierro J, Yeates TO, Eisenberg D. 2004. Prolinks: a database of protein functional linkages derived from coevolution. Genome Biol 5:R35.
Price MN, Huang KH, Alm EJ, Arkin AP. 2005. A novel method for accurate operon predictions in all sequenced prokaryotes. Nucleic Acids Res 33:880–892.
Steinhauser D, Junker BH, Luedemann A, Selbig J, Kopka J. 2004. Hypothesis-driven approach to predict transcriptional units from gene expression data. Bioinformatics 20:1928–1939.
Burden S, Lin YX, Zhang R. 2005. Improving promoter prediction for the NNPP2.2 algorithm: a case study using Escherichia coli DNA sequences. Bioinformatics 21:601–607.
Gordon L, Chervonenkis AY, Gammerman AJ, Shahmuradov IA, Solovyev VV. 2003. Sequence alignment kernel for recognition of promoter regions. Bioinformatics 19:1964–1971.
Fu Y, Jarboe LR, Dickerson JA. 2011. Reconstructing genome-wide regulatory network of E. coli using transcriptome data and predicted transcription factor activities. BMC Bioinformatics 12:233.
Watanabe RL, Morett E, Vallejo EE. 2008. Inferring modules of functionally interacting proteins using the Bond Energy Algorithm. BMC Bioinformatics 9:285.
Muley VY, Ranjan A. 2012. Effect of reference genome selection on the performance of computational methods for genome-wide protein-protein interaction prediction. PLoS One 7:e42057.
Moreno-Hagelsieb G, Jokic P. 2012. The evolutionary dynamics of functional modules and the extraordinary plasticity of regulons: the Escherichia coli perspective. Nucleic Acids Res 40:7104–7112.
Kastenmuller G, Schenk ME, Gasteiger J, Mewes HW. 2009. Uncovering metabolic pathways relevant to phenotypic traits of microbial genomes. Genome Biol 10:R28.
Kumar VS, Maranas CD. 2009. GrowMatch: an automated method for reconciling in silico/in vivo growth predictions. PLoS Comput Biol 5:e1000308.
Thomas GH, Zucker J, Macdonald SJ, Sorokin A, Goryanin I, Douglas AE. 2009. A fragile metabolic network adapted for cooperation in the symbiotic bacterium Buchnera aphidicola. BMC Syst Biol 3:24.
Frazier ME, Johnson GM, Thomassen DG, Oliver CE, Patrinos A. 2003. Realizing the potential of the genome revolution: the Genomes to Life program. Science 300:290–293.
Bailey JE. 1991. Toward a science of metabolic engineering. Science 252:1668–1675.
Stephanopoulos G, Vallino JJ. 1991. Network rigidity and metabolic engineering in metabolite overproduction. Science 252:1675–1681.
Arense P, Bernal V, Charlier D, Iborra JL, Foulquie-Moreno MR, Canovas M. 2013. Metabolic engineering for high yielding L(-)-carnitine production in Escherichia coli. Microb Cell Fact 12:56.
Jantama K, Zhang X, Moore JC, Shanmugam KT, Svoronos SA, Ingram LO. 2008. Eliminating side products and increasing succinate yields in engineered strains of Escherichia coli C. Biotechnol Bioeng 101:881–893.
Weber J, Hoffmann F, Rinas U. 2002. Metabolic adaptation of Escherichia coli during temperature-induced recombinant protein production: 2. Redirection of metabolic fluxes. Biotechnol Bioeng 80:320–330.
UniProt Consortium. 2013. Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res 41:D43–D47.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. 2000. Gene ontology: tool for the unification of biology. Nat Genet 25:25–29.
Serres MH, Riley M. 2000. MultiFun, a multifunctional classification scheme for Escherichia coli K-12 gene products. Genome Biol 5:205–222.
Bochner BR, Gadzinski P, Panomitros E. 2001. Phenotype microarrays for high-throughput phenotypic testing and assay of gene function. Genome Res 11:1246–1255.
AbuOun M, Suthers PF, Jones GI, Carter BR, Saunders MP, Maranas CD, Woodward MJ, Anjum MF. 2009. Genome scale reconstruction of a Salmonella metabolic model: comparison of similarity and differences with a commensal Escherichia coli strain. J Biol Chem 284:29480–29488.
Baumler DJ, Peplinski RG, Reed JL, Glasner JD, Perna NT. 2011. The evolution of metabolic networks of E. coli. BMC Syst Biol 5:182.
Mackie A, Paley S, Keseler IM, Shearer A, Paulsen IT, Karp PD. 20 December 2013. Addition of Escherichia coli K–12 growth-observation and gene essentiality data to the EcoCyc database. J Bacteriol doi:10.1128/JB.01209-13.
Yoon SH, Han MJ, Jeong H, Lee CH, Xia XX, Lee DH, Shim JH, Lee SY, Oh TK, Kim JF. 2012. Comparative multi-omics systems analysis of Escherichia coli strains B and K–12. Genome Biol 13:R37.
Gerdes SY, Scholle MD, Campbell JW, Balazsi G, Ravasz E, Daugherty MD, Somera AL, Kyrpides NC, Anderson I, Gelfand MS, Bhattacharya A, Kapatral V, D'Souza M, Baev MV, Grechkin Y, Mseeh F, Fonstein MY, Overbeek R, Barabasi AL, Oltvai ZN, Osterman AL. 2003. Experimental determination and system level analysis of essential genes in Escherichia coli MG1655. J Bacteriol 185:5673–5684.
Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, Mori H. 2006. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol 2:2006.0008.
Yamamoto N, Nakahigashi K, Nakamichi T, Yoshino M, Takai Y, Touda Y, Furubayashi A, Kinjyo S, Dose H, Hasegawa M, Datsenko KA, Nakayashiki T, Tomita M, Wanner BL, Mori H. 2009. Update on the collection of Escherichia coli single-gene deletion mutants. Mol Syst Biol 5:335.
Joyce AR, Reed JL, White A, Edwards R, Osterman A, Baba T, Mori H, Lesely SA, Palsson BØ, Agarwalla S. 2006. Experimental and computational assessment of conditionally essential genes in Escherichia coli. J Bacteriol 188:8259–8271.
Feist AM, Henry CS, Reed JL, Krummenacker M, Joyce AR, Karp PD, Broadbelt LJ, Hatzimanikatis V, Palsson BØ. 2007. A genome-scale metabolic reconstruction for Escherichia coli K–12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol Syst Biol 3:121.
Patrick WM, Quandt EM, Swartzlander DB, Matsumura I. 2007. Multicopy suppression underpins metabolic evolvability. Mol Biol Evol 24:2716–2722.
Orth JD, Thiele I, Palsson BØ. 2010. What is flux balance analysis? Nat Biotechnol 28:245–248.
Thiele I, Palsson BØ. 2010. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protoc 5:93–121.
Reed JL, Vo TD, Schilling CH, Palsson BO. 2003. An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biol 4:R54.
Orth JD, Conrad TM, Na J, Lerman JA, Nam H, Feist AM, Palsson BØ. 2011. A comprehensive genome-scale reconstruction of Escherichia coli metabolism—2011. Mol Syst Biol 7:535.
Kayser A, Weber J, Hecht V, Rinas U. 2005. Metabolic flux analysis of Escherichia coli in glucose-limited continuous culture. I. Growth-rate-dependent metabolic efficiency at steady state. Microbiology 151:693–706.
Ibarra RU, Edwards JS, Palsson BO. 2002. Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature 420:186–189.
Belaich A, Belaich JP. 1976. Microcalorimetric study of the anaerobic growth of Escherichia coli: growth thermograms in a synthetic medium. J Bacteriol 72:497–499.
Varma A, Boesch BW, Palsson BO. 1993. Stoichiometric interpretation of Escherichia coli glucose catabolism under various oxygenation rates. Appl Environ Microbiol 59:2465–2473.
Dharmadi Y, Murarka A, Gonzalez R. 2006. Anaerobic fermentation of glycerol by Escherichia coli: a new platform for metabolic engineering. Biotechnol Bioeng 94:821–829.
Hu JC, Karp PD, Keseler IM, Krummenacker M, Siegele DA. 2009. What we can learn about Escherichia coli through application of Gene Ontology. Trends Microbiol 17:269–278.
Riley M, Abe T, Arnaud MB, Berlyn MK, Blattner FR, Chaudhuri RR, Glasner JD, Horiuchi T, Keseler IM, Kosuge T, Mori H, Perna NT, Plunkett G 3rd, Rudd KE, Serres MH, Thomas GH, Thomson NR, Wishart D, Wanner BL. 2006. Escherichia coli K-12: a cooperatively developed annotation snapshot–2005. Nucleic Acids Res 34:1–9.
Keseler IM, Collado-Vides J, Santos-Zavaleta A, Peralta-Gil M, Gama-Castro S, Muñiz-Rascado L, Bonavides-Martinez C, Paley S, Krummenacker M, Altman T, Kaipa P, Spaulding A, Pacheco J, Latendresse M, Fulcher C, Sarker M, Shearer AG, Mackie A, Paulsen I, Gunsalus RP, Karp PD. 2011. EcoCyc: a comprehensive database of Escherichia coli biology. Nucleic Acids Res 39:D583–D590.
Keseler IM, Bonavides-Martinez C, Collado-Vides J, Gama-Castro S, Gunsalus RP, Johnson DA, Krummenacker M, Nolan LM, Paley S, Paulsen IT, Peralta-Gil M, Santos-Zavaleta A, Shearer AG, Karp PD. 2009. EcoCyc: a comprehensive view of Escherichia coli biology. Nucleic Acids Res 37:D464–D470.
Karp PD, Keseler IM, Shearer A, Latendresse M, Krummenacker M, Paley SM, Paulsen I, Collado-Vides J, Gama-Castro S, Peralta-Gil M, Santos-Zavaleta A, Peñaloza-Spinola MI, Bonavides-Martinez C, Ingraham J. 2007. Multidimensional annotation of the Escherichia coli K-12 genome. Nucleic Acids Res 35:7577–7590.
Keseler IM, Collado-Vides J, Gama-Castro S, Ingraham J, Paley S, Paulsen IT, Peralta-Gil M, Karp PD. 2005. EcoCyc: a comprehensive database resource for Escherichia coli. Nucleic Acids Res 33:D334–D337.
Karp PD, Arnaud M, Collado-Vides J, Ingraham J, Paulsen IT, Saier MHJ. 2004. The E. coli EcoCyc database: no longer just a metabolic pathway database. ASM News 70:25–30.
Karp PD, Riley M, Saier M, Paulsen IT, Paley S, Pellegrini-Toole A. 2002. The EcoCyc Database. Nucleic Acids Res 30:56–58.
Karp PD, Riley M, Saier M, Paulsen IT, Paley S, Pellegrini-Toole A. 2000. The EcoCyc and MetaCyc databases. Nucleic Acids Res 28:56–59.
Karp PD. 1999. Using the EcoCyc Database, p 269–280. In: Bishop M (ed), Nucleic Acid and Protein Databases and How To Use Them. Academic Press, London, UK.
Karp PD, Riley M. 1999. EcoCyc: the resource and the lessons learned, p 47–62. In: Letovsky S (ed), Bioinformatics Databases and Systems. Norwell, MA, Kluwer Academic Publishers.
Karp PD, Riley M, Paley SM, Pellegrini-Toole A, Krummenacker M. 1999. EcoCyc: encyclopedia of Escherichia coli genes and metabolism. Nucleic Acids Res 27:55–58.
Karp PD, Riley M, Paley SM, Pellegrini-Toole A, Krummenacker M. 1998. EcoCyc: encyclopedia of Escherichia coli genes and metabolism. Nucleic Acids Res 26:50–53.
Karp PD, Riley M, Paley SM, Pellegrini-Toole A, Krummenacker M. 1997. EcoCyc: encyclopedia of Escherichia coli genes and metabolism. Nucleic Acids Res 25:43–51.
Karp PD, Riley M, Paley SM, Pellegrini-Toole A. 1996. EcoCyc: encyclopedia of Escherichia coli genes and metabolism. Nucleic Acids Res 24:32–39.

Published In

EcoSal Plus
Volume 6, Number 1, 31 December 2014
eLocator: 10.1128/ecosalplus.ESP-0009-2013
Editor: James Kaper, University of Maryland School of Medicine, Baltimore, MD


Received: 16 January 2014
Returned for modification: 17 January 2014
Published online: 21 March 2014



Peter D. Karp
Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
Daniel Weaver
Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
Suzanne Paley
Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
Carol Fulcher
Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
Aya Kubo
Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
Anamika Kothari
Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
Markus Krummenacker
Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
Pallavi Subhraveti
Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
Deepika Weerasinghe
Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
Socorro Gama-Castro
Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
Araceli M. Huerta
Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
Luis Muñiz-Rascado
Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
César Bonavides-Martinez
Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
Verena Weiss
Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
Martin Peralta-Gil
Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
Alberto Santos-Zavaleta
Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
Amanda Mackie
Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
Julio Collado-Vides
Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México
Ingrid M. Keseler
Bioinformatics Research Group, SRI International, Menlo Park, CA 94025
Ian Paulsen
Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095
UCLA Institute of Genomics and Proteomics, University of California, Los Angeles, CA 90095
Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095
UCLA Institute of Genomics and Proteomics, University of California, Los Angeles, CA 90095




Address correspondence to Peter Karp, [email protected]
