Data Releases & Updates

PATRIC Data Release and Website Update. (11 June 2008)


Bacterial Genome Data

A new Brucella genome, Brucella melitensis ATCC 23457, is being released to the public for the first time on the PATRIC website. The genomic sequence was supplied to PATRIC by LANL and has received its primary annotation from PATRIC. Annotation includes protein coding genes, pseudogenes, RNA features, and riboswitches.

A total of 144 new pseudogenes were identified and annotated in Brucella genomes. As a result, 291 previously annotated CDS have been removed. Brucella ortholog groups, MSA, and trees have been updated to include the new genome.

A total of 511 new pseudogenes were identified and annotated in Rickettsia genomes. As a result, 1107 previously annotated CDS have been removed. Rickettsia ortholog groups, MSA, and trees have been updated to reflect annotation changes.

A new Coxiella plasmid, Coxiella burnetii 'MSU Goat Q177', has been loaded into the database and is released on the PATRIC website with primary RefSeq annotation. A new Coxiella unclosed/draft genome, Coxiella burnetii RSA 334, is now available on the PATRIC FTP site.

Viral Genome Data

Nineteen new Calicivirus genomes have been added to the PATRIC database. PATRIC genome classification has been updated to include the new genomes.

Four new Coronavirus genomes have been added to the PATRIC database. Seven new genomes have received manual curation. In addition, gene and product names have been standardized for all of the features in previously annotated genomes. Ortholog groups, MSA, and trees have been updated to include newly curated genomes. PATRIC genome classification has also been updated to include all new genomes.

Sixteen new Lyssavirus genomes have been added to the PATRIC database and fourteen of them have received standardized annotations. Lyssavirus ortholog groups, MSA, and trees have been updated to include newly curated genomes. PATRIC genome classification has also been updated to include all new genomes.

Web Site Enhancements
  • Partial Genomes for Viruses

    A total of 24,266 partial genome sequences related to PATRIC viral pathosystems have been obtained from GenBank and loaded into the PATRIC database. All of the partial genome sequences are listed under the Partial Genome Sequence Tab under the Genomes Tab for each of the viral pathosystems (Example: Lyssavirus Partial Genome Sequences). From the partial genome list, one can navigate to the Genome Summary Page, Genome Browser, or Feature Table. One can also download the list of partial genomes as an Excel or text file. The Genome Finder Tool has been enhanced to allow one to quickly search for partial genome sequences using keyword, taxonomy id, sequence size, country of origin, or host species. The Genomic Features Search Tool now allows one to quickly gather genomic features of interest from partial and/or complete genome sequences. Partial genome sequences and corresponding gene and protein sequences can also be searched using the BLAST Search Tool.

  • Integration of Swiss-Prot (SIB) Annotations

    A total of 4,621 PATRIC bacterial proteins have been mapped to corresponding Swiss-Prot (SIB) annotation records. The number of proteins with SIB annotations is presented on the Genome Overview Page (Example: Brucella suis 1330). The number is hyperlinked to the Feature Table, allowing one to quickly review the list of features with SIB annotations. SIB annotations, when available, are displayed on the Feature Overview Page (Example). Any protein features annotated by SIB are displayed on the AA Evidence Page (Example).

  • Integration and Visualization of PDB Structures

    PATRIC-annotated proteins are mapped to corresponding PDB structures using BLASTP searches. Mappings are organized into three distinct categories: exact match, partial match, and similar. A total of 804 proteins have been mapped to 36 PDB structures as exact or partial matches, while 4,580 more proteins have been mapped to a total of 1,477 PDB structures based on sequence similarity. For each genome, the number of proteins with PDB structures is summarized on the Genome Overview Page. The number is hyperlinked to the Feature Table, allowing easy access to features with PDB structures. At the organism, genome, and feature levels, a new 3D Structure Tab has been added that lists the available PDB structures (Example). From here, one can navigate to the 3D Structure Visualization Page (Example). On this page, the 3D structure of the protein is displayed using the JMOL viewer. Areas of interest, such as pre-computed IEDB epitopes, InterPro domains, and Swiss-Prot annotated protein features, can be highlighted in the 3D structure view.

  • Literature Data

    Literature references parsed from the GenBank submission files and Swiss-Prot annotations are displayed under the Literature Tab at the Organism, Genome, and Feature levels. The available literature set for any organism can also be searched by keyword using the Literature Search Tool.


PATRIC Data Release and Website Update. (18 April 2008)


Bacterial Genome Data

A new mass spectrometry dataset has been added for the Brucella abortus S19 genome. Results of this experiment are available as direct evidence under the Experiment Data tab for the genome.

A new Rickettsia genome, Rickettsia rickettsii str. Iowa (NC_010263), has been added to our database. It is included in this release with primary RefSeq annotations. Four of the whole-genome-shotgun sequence records annotated at PATRIC have been superseded by new assemblies of the genomes at NCBI. Since the original genome sequences have not changed, the annotations at PATRIC remain the same.

A microarray dataset has been added for the Rickettsia conorii Malish 7 genome. Experiment results are available as direct evidence under the Experiment Data tab for the genome. Results from the same microarray experiment are also available as indirect evidence (via ortholog groups) for all other Rickettsia genomes.

All of the bacterial genomes have received complete and improved RNA annotations.

Viral Genome Data

One new Calicivirus genome has been added to the PATRIC database and has received standardized annotations. PATRIC genome classification has been updated to include the new genome.

One new Coronavirus genome has been added to the PATRIC database and thirty-five additional genomes have received standardized annotations. Coronavirus ortholog groups, MSA, and trees have been updated to include newly annotated genomes. PATRIC genome classification has also been updated. Nine additional genomes have been designated as Reference Genomes.

One new Hepatitis A genome has been added to the PATRIC database and has received standardized annotations. This genome has also been added to the PATRIC genome classification.

One new Hepatitis E genome has been added to the PATRIC database and has received manual curation. New annotations for specific protein domains have been provided for all Hepatitis A virus genomes.

Two new Lyssavirus genomes have been added to the PATRIC database and have received standardized annotations.

Web Site Enhancements
  • Integration of Rickettsia Microarray Data

    Microarray data from the Rickettsia conorii Malish 7 genome were integrated with PATRIC. Experiment results are available as direct evidence under the Experiment Data tab for this genome. Results from the same experiment are also mapped to proteins in other Rickettsia genomes via ortholog groups and presented as indirect evidence. The Experiment Data Search Tool has been enhanced to allow querying on microarray data.

  • New Genomic Feature Search Tool

    The Gene/Protein Search Tool has been replaced by a new Genomic Feature Search Tool. This tool allows searching not only on protein coding genes and CDSs, but also on all other DNA feature types.

  • BLAST Improvements

    New organism-specific BLAST libraries containing full genome, gene, and protein sequences have been created. They are available through the BLAST Search page.
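
    The direct and indirect evidence mapping described under "Integration of Rickettsia Microarray Data" above can be illustrated with a minimal Python sketch. The function name, the data layout, the locus tags, and the rule of taking the first measured member of a group as the source are all assumptions made for illustration; they do not describe PATRIC's actual pipeline.

```python
def assign_evidence(measured, ortholog_groups):
    """Label each feature with direct or indirect evidence.

    measured: dict of locus tag -> value for the genome that was profiled.
    ortholog_groups: list of lists of locus tags (one list per group).
    Returns a dict of locus tag -> (kind, source_tag, value).
    All names and the data layout are hypothetical.
    """
    # Features measured in the experiment carry direct evidence.
    evidence = {tag: ("direct", tag, value) for tag, value in measured.items()}
    for group in ortholog_groups:
        sources = [tag for tag in group if tag in measured]
        if not sources:
            continue  # no measured member, so nothing to propagate
        source = sources[0]  # assumption: use the first measured member
        for tag in group:
            if tag not in evidence:
                # Unmeasured orthologs inherit the value as indirect evidence.
                evidence[tag] = ("indirect", source, measured[source])
    return evidence


# Hypothetical locus tags and expression values.
measured = {"RC0001": 2.5, "RC0002": -1.2}
ortholog_groups = [
    ["RC0001", "RP001", "RT0001"],
    ["RC0002", "RP002"],
    ["RP003", "RT0003"],  # no measured member
]
evidence = assign_evidence(measured, ortholog_groups)
```

    In this sketch, features in a group with no measured member receive no evidence at all, which mirrors the idea that indirect evidence exists only where an ortholog was actually measured.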


PATRIC Data Release and Website Update. (31 January 2008)


Bacterial Genome Data

The Brucella abortus S19 genome has been sequenced and the annotated genome is being released to the public for the first time on the PATRIC website. Its annotation includes RNA species and identification of pseudogenes. With regard to pseudogene identification, the boundaries of each pseudogene have been identified; however, the precise locations of gene disruptions have not been curated. Brucella ortholog groups, multiple sequence alignments (MSAs), and trees have been recalculated to include this new genome.

A new mass spectrometry experiment has been released under the experiment data tab.

Two new Rickettsia genomes, Rickettsia africae and Rickettsia massiliae, have been added to the public database and received PATRIC annotations, which include annotation of RNA species and pseudogenes. Rickettsia ortholog groups, multiple sequence alignments (MSAs), and trees have been updated to include these two genomes.

A new Coxiella genome, Coxiella burnetii RSA 331, has been added to our database. Also, the genome that was previously described as Coxiella burnetii Dugway 7E9-12 has been renamed as Coxiella burnetii Dugway 5J108-111 to reflect the naming correction made by the sequencing center. Coxiella burnetii RSA 331 genome and Coxiella burnetii Dugway 5J108-111 are currently presented with their primary annotations. The proteins on the Coxiella plasmids have received manual curation.

Viral Genome Data

Sixteen new Coronavirus genomes have been added to the PATRIC database, and nine additional genomes have received standardized annotations. The classification of Coronavirus genomes has been updated. Coronavirus ortholog groups, multiple sequence alignments (MSAs), and trees have been recalculated.

Five genomes have been added to the PATRIC database.

Three new Hepatitis E genomes have been added to the PATRIC database, and the genome classification scheme has been updated. A new Hepatitis E species tree based on the full genome nucleotide sequence has been built and is available with this release. The proposed genotype, Genotype 5, is not represented in this tree since its level of divergence from the other sequences prevented a good tree from being constructed.

Two new Lyssavirus genomes have been added to the PATRIC database, and these have received standardized annotations.

Web Site Enhancements
  • Experiment Data

    Mass spectrometry data from Brucella were integrated into the PATRIC system in the October 2007 website update. In this release, refinements have been made to both the user interface and the database. The database now distinguishes between data generated using the same genomic strain (direct evidence) and data mapped from a different strain (indirect evidence). This distinction is clearly presented in the Experiment Data tab for individual genomes and individual genes. The tables that display experimental data have been reformatted to list each gene just once, rather than listing the data for each experimental condition for each gene in a single table. Users can now drill down to the data on experimental conditions for the genes in which they are interested. The data are now summarized in a scatter plot with supporting data for each peptide that contributes to the data for that protein. A new query for experimental data has been implemented with a look and feel that is consistent with PATRIC's other queries. Users can specify the genomes in which they are interested, whether they are interested in direct or indirect data, and which keywords or annotations they would like to use in their query. (Link to Brucella Experiment Page)


PATRIC Data Release and Website Update. (15 October 2007)


Bacterial Genome Data

A new Brucella genome is included in this release, Brucella suis ATCC23445. The Brucella suis ATCC23445 sequence was supplied to PATRIC by Los Alamos National Labs (LANL) and received its primary annotation from PATRIC. The Brucella ovis genome has also received a PATRIC annotation in this release for a total of seven Brucella genomes with PATRIC provided/updated annotations. Ortholog groups have been calculated to include the seven PATRIC-curated Brucella genomes. The seven PATRIC-curated Brucella genomes have been evaluated for split genes resulting from frame-shifts or nonsense codons using the GenVar program (download GenVar). The segments of these split genes have been joined and curated as pseudogenes, though some of these potential pseudogenes may be the result of sequencing errors. The resulting proteins have been analyzed by our Protein Annotation Pipeline (PAP), and are included as members of ortholog groups to permit comparison across genomes. Multiple sequence alignments and trees have been created for the new ortholog groups.

For Rickettsia, we have added an analysis of microarray data evaluating the transcriptional effects of nutrient-limiting conditions. The analysis and data are presented under the "Collaborative Research" tab for Rickettsia.

Viral Genome Data

This data release includes 33 new Calicivirus genomes, 3 new Coronavirus genomes, 2 new Hepatitis E virus genomes, and 1 new Lyssavirus genome. In addition, 13 Hepatitis A genomes, 9 Lyssavirus genomes, and many Coronavirus genomes have received manual curation. For Coronaviruses, we've also updated all of our locus tags. Ortholog groups, multiple sequence alignments, and trees have been recalculated to include these newly curated genomes. A mapping of old and new locus tags is available under the download section.

Enhancements to functionality of the website:
  • New Search Tools

    A Genome Finder tool allows one to find genomic sequences of interest by keyword (organism name, genome name, accession, etc.), NCBI Taxon ID, or GI number. Also, a Genomic Pattern Search allows one to search for any pattern (defined as a regular expression) in the complete sequences of one or more selected genomes. This version searches only against genome sequences, but the next version will allow users to search protein sequences as well.

  • BLAST Search Improvements

    We have enabled users to BLAST against complete genome sequences in this release. Sequences in the BLAST report can now be added directly to the feature cart.

  • Improved Visibility of Reference Genomes

    Reference genomes are now indicated with a visual cue in a number of lists and tables throughout the site, including the ortholog group filtering mechanism.

  • Improved Representation of Multi-segment Features in Table Layouts

    The feature table now represents features with multiple segments on a single row. An added visual cue distinguishes these joined features. These features include the CDSs stitched together based on GenVar data, as well as known frame shifts and splice sites.

  • Genome Classification on Viral Organism Landing Pages

    The main page for each viral organism now displays the PATRIC-defined classification of available genomes. These classification schemes have been derived from literature and in collaboration with our organism experts. This should enable users to more easily find their genomes of interest directly from each viral organism's home page.
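
The Genomic Pattern Search described under "New Search Tools" above searches genome sequences for a pattern defined as a regular expression. The following minimal Python sketch shows the general idea using the standard `re` module. The function name, the sample sequences, and the choice to search only the forward strand with non-overlapping matches are assumptions made for illustration, not a description of the PATRIC implementation.

```python
import re


def find_pattern(genomes, pattern):
    """Return (genome_name, start, end, matched_text) for every match.

    genomes: dict of genome name -> nucleotide sequence (hypothetical layout).
    pattern: a regular expression, matched case-insensitively.
    Only the given strand is searched, and matches do not overlap.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for name, sequence in genomes.items():
        for match in regex.finditer(sequence):
            # start is 0-based; end is exclusive, as in Python slicing
            hits.append((name, match.start(), match.end(), match.group()))
    return hits


# Toy sequences, not real genome data.
genomes = {
    "genome_A": "ATGCGTATAATGCCGTTGACA",
    "genome_B": "GGGTATAATCC",
}
hits = find_pattern(genomes, r"TATA[AT]T")
for hit in hits:
    print(hit)
```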

Enhancements to the Ortholog Group Page:

The mechanism to filter and search for ortholog groups has been laid out more intuitively, allowing users to find ortholog groups with specific genome memberships, specific keywords, or of specific sizes. Also, histograms at the top of the ortholog pages are now clickable. Clicking on a bar within the histogram will display the ortholog groups consisting of the number of members indicated in the histogram.
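
The clickable histogram can be thought of as an index from group size to the groups of that size. The Python sketch below illustrates that idea under stated assumptions: the function name, the group identifiers, and the data layout are hypothetical and are not taken from the PATRIC code base.

```python
from collections import defaultdict


def groups_by_size(ortholog_groups):
    """Index ortholog groups by their number of members.

    ortholog_groups: dict of group id -> list of member locus tags
    (hypothetical layout). Returns a dict of size -> list of group ids.
    """
    by_size = defaultdict(list)
    for group_id, members in ortholog_groups.items():
        by_size[len(members)].append(group_id)
    return dict(by_size)


# Hypothetical group ids and members.
ortholog_groups = {
    "OG1": ["a1", "b1", "c1"],
    "OG2": ["a2", "b2"],
    "OG3": ["a3", "b3", "c3"],
}
index = groups_by_size(ortholog_groups)

# Bar heights of the histogram: number of groups per size.
histogram = {size: len(ids) for size, ids in index.items()}

# "Clicking" the bar for size 3 corresponds to looking up that size.
selected = index.get(3, [])
```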


PATRIC Data Release and Website Update. (17 August 2007)


Bacterial Genome Data

Two new Brucella genomes are included in this release, Brucella ovis ATCC 25840 and Brucella canis. Brucella canis was supplied to PATRIC by Los Alamos National Labs (LANL) and received its primary annotation from PATRIC. Brucella ovis ATCC 25840 maintains the annotation provided by its sequencing center (JCVI, formerly known as TIGR) in this data release. The species tree for Brucella genomes has been recalculated to include these new genomes. Ortholog groups have been calculated to include the five PATRIC-curated Brucella genomes. The five PATRIC-curated Brucella genomes have been evaluated for split genes resulting from frame-shifts or nonsense codons using the GenVar program (download GenVar). The segments of these split genes have been joined and curated as pseudogenes, though some of these potential pseudogenes may be the result of sequencing errors. The resulting proteins have been analyzed by our Protein Annotation Pipeline (PAP), and are included as members of ortholog groups to permit comparison across genomes.

Rickettsia protein annotations have been manually reviewed by the PATRIC curation team. One member of each ortholog group has been manually curated. In this release we are also removing 1580 previously released gene predictions, which may be the result of gene overprediction by GeneMark or Glimmer. All of them have no orthologs in other PATRIC Rickettsia genomes, are shorter than 100 amino acids, have no identified homology in NCBI's BLAST databases, and have no predicted protein domains.
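
The four removal criteria listed above combine as a simple conjunction. The Python sketch below expresses them as a filter, assuming a hypothetical record type in which each criterion has already been computed as a field; the class, field, and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class GenePrediction:
    # Hypothetical record; each field is assumed to be precomputed.
    locus_tag: str
    protein_length: int       # length in amino acids
    has_ortholog: bool        # ortholog in another Rickettsia genome
    has_blast_homolog: bool   # homology found in NCBI's BLAST databases
    has_protein_domain: bool  # any predicted protein domain


def is_likely_overprediction(gene, max_length=100):
    """True only when all four criteria from the release note hold."""
    return (
        not gene.has_ortholog
        and gene.protein_length < max_length
        and not gene.has_blast_homolog
        and not gene.has_protein_domain
    )


# Hypothetical predictions.
predictions = [
    GenePrediction("X0001", 80, False, False, False),   # meets all criteria
    GenePrediction("X0002", 100, False, False, False),  # not shorter than 100
    GenePrediction("X0003", 60, True, False, False),    # has an ortholog
]
kept = [g.locus_tag for g in predictions if not is_likely_overprediction(g)]
```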

Viral Genome Data

This data release contains 26 new genomes for Caliciviruses, 24 for Coronaviruses, 12 for Hepatitis A, 2 for Hepatitis E, and 7 for Lyssaviruses. These numbers include both complete and nearly complete genomes. Forty-five Coronavirus genomes have received standardized PATRIC annotations, and their genes and mature peptides are now included in Coronavirus ortholog groups.

SARS Coronavirus 3D Structures

Solved 3D structure data for SARS Coronavirus proteins are now available from the PATRIC website. Users can gain access to this information through the 3D Structure tab on the Coronavirus page. These structures are provided through the Resource Center for Biodefense Proteomics Research.

Brucella abortus 2308 Mass Spectrometry Data

Mass spectrometry data from the outer membrane fraction of Brucella abortus 2308 and related mutant strains are now available from the PATRIC website. Users can gain access to this information through the Experimental Data tab on the Brucella page, as well as Experimental Data tabs for Brucella abortus 2308 and individual genes. These data are provided through the Resource Center for Biodefense Proteomics Research.

Brucella suis Genome Prioritization Data

To facilitate downstream research on the validation and development of countermeasures, we are using bioinformatics methods to prioritize the pathogen genomes. As a first step, we have evaluated several types of data for Brucella suis. This analysis takes into account information on pathways, druggable protein domains, protein localization, and literature on essential and virulence genes. Methods and results of this analysis are available through the Collaborative Research tab on the Brucella page.


PATRIC Website Update. (01 June 2007)

This update contains significant changes to the PATRIC website. The changes were designed to make the website easier to use and to help you find the information you are looking for. Some of the highlights are detailed below. We encourage you to use the site's Feedback capability to send us input on the features provided, requests to consider for future work, or questions about the current site.

Additional Support for Comparative Genomics

Enhanced comparative genomics capabilities allow you to make time-saving comparisons across genomes, such as finding features of interest that are unique to specific genomes, or conversely, features that are common to all but one of many related genomes. At the feature level, you can perform a multiple sequence alignment across members of an ortholog group, and visually arrange members (and their sequences) based on phylogeny. At the pathway level, you can now perform comparative analysis examining relationships between PATRIC's bacterial reference genomes and the human host genome.

A Suite of New Searches and Sophisticated Context-Sensitive Filtering

We have added nine new specialized searches to provide rich querying of PATRIC's data. You can query against PATRIC's annotation data, RefSeq/GenBank data or both. You can search for epitopes, GO Terms, EC number, InterProScan, TIGRFam, PFam, BLOCKS, COGS, and more. In addition to the searches, you can narrow your search results by using the new context-sensitive filtering tools. For example, when presented with a list of ortholog groups, you can filter based on genome membership, size of group or keyword (such as protein function).

Enhanced Support for Collecting Related Sequences of Interest

An improved feature cart (much like a shopping cart) has been added throughout the site, allowing you to gather sequences of interest while you work. The feature cart allows you to collect large sets of features from organism feature tables; sets of related features from the ortholog group pages; sets of related features from search results tables; and from individual feature pages. Once collected, these features can be exported as FASTA DNA or FASTA Protein sequences.

New Website Organization, Navigation, and Look-and-Feel

We have re-designed the website to provide more flexible navigation between and among website areas. Specifically, we provide support for "organism-centric" task flows for those interested in specific properties and features of specific genomes, as well as support for "search-centric" task flows for those interested in locating resources both within and across PATRIC organisms. The look and feel of the new website has been upgraded as well, leveraging the latest web-technology to provide application-like productivity.


PATRIC Data Release and Website Update. (16 April 2007)

This data release contains new genomes for Coronaviruses (18 genomes), Hepatitis A (2 genomes), and Lyssaviruses (14 genomes). The new Hepatitis A and Lyssavirus genomes have had their DNA annotations and gene product naming standardized. Nineteen Coronavirus genomes have received this standardization in this release as well.

The functional annotation of the gene products for Coxiella burnetii RSA493 has been manually reviewed by the PATRIC curation team to ensure proper naming, G.O., and E.C. number assignments. Based on this updated functional annotation, pathways have been reconstructed computationally for Coxiella burnetii.

To accommodate the new genomes, phylogenetic trees have been reconstructed for Hepatitis A, Lyssaviruses, and Coronaviruses. Also a new tree has been constructed for the Caliciviruses, which is based on the amino acid sequence of the capsid protein rather than ORF1. Additional phylogenetic trees for Lyssaviruses are now available, and provide broader Lyssavirus coverage than the whole genomes available through PATRIC.

Ortholog groups have been rebuilt to incorporate the proteins of the new genomes added to the database. For the first time ortholog groups are also available for Caliciviruses and Coronaviruses.


PATRIC Data Release and Website Update. (15 January 2007)

This release contains newly curated genomes and website enhancements. Updated curation includes standardization of protein product names and names of mature peptides for all Calicivirus genomes and curation of 10 additional Hepatitis A genomes. Additionally, all bacterial protein products have received updated functional curation through our Automated Protein Curation Pipeline (APCP), which adopts functional annotation from TIGRfams, SwissProt, and BLAST hits to the NCBI non-redundant database, in decreasing order of preference. The SOP for this pipeline is posted on the Standard Operating Procedures page.
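
The "decreasing order of preference" rule can be sketched in a few lines of Python. This is only an illustration of the selection order stated above: the function name, the dictionary layout, the example product names, and the behavior when no source has an annotation are assumptions, and the actual APCP is defined by its SOP.

```python
# Sources in decreasing order of preference, as stated in the release note.
SOURCE_PREFERENCE = ("TIGRfams", "SwissProt", "BLAST")


def choose_annotation(candidates):
    """Pick the annotation from the most preferred source that has one.

    candidates: dict of source name -> product name (hypothetical layout).
    Returns (source, product), or (None, None) when no source has an
    annotation; that fallback is an assumption of this sketch.
    """
    for source in SOURCE_PREFERENCE:
        product = candidates.get(source)
        if product:
            return source, product
    return None, None


# Hypothetical candidate annotations for two proteins.
first = choose_annotation({
    "BLAST": "putative ABC transporter",
    "SwissProt": "ABC transporter ATP-binding protein",
})
second = choose_annotation({"BLAST": "conserved protein"})
```

Here `first` takes the SwissProt name because no TIGRfams annotation is present, and `second` falls back to the BLAST hit.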

The reference genomes for the bacterial pathogens have had RNAs curated. We improve upon traditional tRNAscan-SE prediction of tRNA genes by coupling it with a second tRNA-finding program, Aragorn. We determine the endpoints of small subunit (16S) and large subunit (23S and 5S) rRNAs using secondary-structure-based multiple sequence alignments, with trimming to match the endpoints of the E. coli RNAs. Profiles and tools from Rfam are applied to identify many small RNA genes in genomic sequences. Among the RNA genes that can be detected this way are those for trans-acting regulatory RNAs, riboswitches, RNase P RNA, SRP RNA, 6S RNA, plasmid replication RNAs, retron RNAs, self-splicing introns and other ribozymes, the rRNA modification guide RNAs of Archaea, and other microbial RNAs. The gene table has been updated with a filter that allows the user to selectively display CDS, RNA, and/or mature peptides.

We have integrated the Immune Epitope Database Data with our sequence data. The first iteration of the interface for this data can be reached from the Epitope Search page. Additionally, each protein information page contains a list of any epitopes present in the protein sequence within the evidence section of the page.

Two bioinformatic analysis "special projects" have been carried out as use cases for building new pipelines. The rationale, workflow, and results of these special projects are available on the Coxiella Collaborative Research page and the Lyssavirus Collaborative Research page. The first of these projects aims to identify the secreted or membrane-attached proteins for Rickettsia and Coxiella genomes as one approach to identifying potential vaccine candidates. The second project aims to design PCR primers that broadly amplify Lyssavirus genomes. We anticipate that the results of additional projects will become available as projects are completed.

Only minor changes have been made to website functionality with this release. In order to better organize genomes, particularly for pathogens with a large number of genomes, we have implemented a "Group Name" column that specifies the groups under which the genomes are classified. Generally, this is one reference genome and all of its associated genomes. We hope that this facilitates easier access to genome data. The other enhancement, as described above, is the ability to filter the gene table for any genome for CDS, mature peptides, and RNAs.