Existing software for gene-order comparison requires heavy pre-computation to identify homologous genes, done through computationally intensive genome-wide comparisons. Since most molecular biologists are interested in only a small set of genes (a cluster), such pre-computation is usually a waste of time and resources. Our web-based software system (CGCV) lets users submit their genes, run a dynamic search against the genomes of their choice, and interactively visualize gene conservation in our novel multi-genome browser, thus bypassing the "unwanted" pre-computing step.

This system has been tested with the microbial sequence dataset (source: NCBI GenBank FTP site) and 6 eukaryotic sequence datasets: Homo sapiens, Mus musculus, Rattus norvegicus, Danio rerio, Drosophila melanogaster and Caenorhabditis elegans (source: Ensembl FTP site).

This system consists of 2 integral components:

  1. The web-interface (front-end)
  2. The MySQL relational tables (back-end)

The web interface consists of a BLAST server (which performs the dynamic search) and a multi-genome browser. The user submits the query sequences as input (either by uploading a file or by pasting the sequence(s) into the submission box) and must choose a BLAST program, a reference database (both appropriate to the type of query sequence) and an e-value cutoff for the BLAST search. If necessary, the user may also alter the parameters of the BLAST program to suit their requirements.
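The requirement that the BLAST program and database match the type of query sequence can be sketched as a small lookup. This is only an illustration, not CGCV's actual validation code: the function names and the 90% nucleotide heuristic are assumptions, while the query/database types listed for each program are the standard BLAST ones.

```python
# Query and database molecule types expected by each classic BLAST program.
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("nucleotide", "protein"),      # translated query vs protein db
    "tblastn": ("protein", "nucleotide"),     # protein query vs translated db
    "tblastx": ("nucleotide", "nucleotide"),  # both sides translated
}


def guess_molecule_type(sequence):
    """Hypothetical heuristic: call a sequence nucleotide if at least
    90% of its letters are A, C, G, T, U or N; otherwise call it protein."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    nucleotide_like = sum(c in "ACGTUN" for c in letters)
    return "nucleotide" if nucleotide_like / len(letters) >= 0.9 else "protein"


def compatible_programs(sequence, database_type):
    """List the BLAST programs valid for this query and database type."""
    query_type = guess_molecule_type(sequence)
    return sorted(
        name
        for name, (q_type, db_type) in BLAST_PROGRAMS.items()
        if q_type == query_type and db_type == database_type
    )
```

For example, a DNA query against a protein database leaves blastx as the only choice, whereas a DNA query against a nucleotide database allows both blastn and tblastx.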

On completion of the search, the user is presented with a Phylogenetic Profiling table, which shows the presence or absence of each query sequence in each chosen reference genome (based on the specified e-value cutoff). From this table, users select the genomes against which they would like to visualize the query sequences.
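The presence/absence logic behind such a table can be sketched as follows. This is a minimal illustration under assumptions, not the system's own code: hits are assumed to arrive as (query, genome, e-value) tuples, and a query counts as present in a genome when its best hit there has an e-value at or below the cutoff.

```python
def phylogenetic_profile(hits, queries, genomes, evalue_cutoff):
    """Build a presence/absence table from BLAST hits.

    hits: iterable of (query_id, genome_id, evalue) tuples (assumed format).
    Returns {query_id: {genome_id: bool}}.
    """
    # Keep only the best (lowest) e-value per query/genome pair.
    best = {}
    for query_id, genome_id, evalue in hits:
        key = (query_id, genome_id)
        if key not in best or evalue < best[key]:
            best[key] = evalue

    # A pair with no hit at all is treated as absent.
    return {
        q: {g: best.get((q, g), float("inf")) <= evalue_cutoff for g in genomes}
        for q in queries
    }


# Made-up example data, for illustration only.
example_hits = [
    ("geneA", "genome1", 1e-30),
    ("geneA", "genome2", 0.5),
    ("geneB", "genome2", 1e-8),
]
profile = phylogenetic_profile(
    example_hits, ["geneA", "geneB"], ["genome1", "genome2"], 1e-5
)
```

With a cutoff of 1e-5, geneA is marked present only in genome1 and geneB only in genome2; raising the cutoff to 1 would also mark geneA as present in genome2.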

The trademark feature of this system is the novel multi-genome browser, which utilizes the Perl GD Library for the visualization.

For each genome, the image is equipped with navigation controls and 3 tracks. The controls let the user zoom in or out, pan left or right, zoom in to a particular range, filter the hits by an e-value cutoff, or rearrange the order of the images. The first track corresponds to the genome (displayed as a black bar), the second to the annotated genes (shown in green) and the third to the query sequences.
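Zooming and panning of this kind usually come down to mapping a visible range of genome coordinates onto a fixed image width. The sketch below illustrates that idea only; the Viewport class and its methods are hypothetical names, it does not reflect how CGCV's Perl GD code is written, and it does not clamp the range to the ends of the genome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewport:
    """Hypothetical visible window onto a genome."""
    start: int     # first visible base (1-based, inclusive)
    end: int       # last visible base (inclusive)
    width_px: int  # width of the rendered image in pixels

    def to_pixel(self, position):
        """Map a genome coordinate to an x offset in the image."""
        span = self.end - self.start + 1
        return int((position - self.start) * self.width_px / span)

    def zoom(self, factor):
        """Zoom about the center; factor > 1 zooms in, < 1 zooms out."""
        center = (self.start + self.end) // 2
        half = max(1, int((self.end - self.start + 1) / (2 * factor)))
        return Viewport(center - half, center + half, self.width_px)

    def pan(self, bases):
        """Shift the window right (positive) or left (negative)."""
        return Viewport(self.start + bases, self.end + bases, self.width_px)


view = Viewport(start=1, end=10000, width_px=800)
zoomed = view.zoom(2)     # roughly half the visible range
shifted = view.pan(1000)  # move 1000 bases to the right
```

Each control returns a new window, so a redraw only needs to call to_pixel for the start and end of every gene and hit that overlaps the current range.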

The MySQL relational tables store the parsed annotation information (derived from standard GFF and GTF files) along with other relevant information that enables the web interface to retrieve the necessary data dynamically. Apart from the relational tables, the back-end comprises the genome, gene and amino acid sequences (stored as flat files on the filesystem) for all the microbial and eukaryotic organisms in our database. The system synchronizes nightly with the NCBI GenBank and Ensembl FTP servers, so users always have access to the latest data as soon as these sequence repositories make it public.
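Loading annotation lines into a relational table can be sketched as below. This is an illustration under stated assumptions rather than the actual CGCV schema: the table and column names are invented, and Python's built-in sqlite3 module stands in for MySQL so that the example is self-contained. The nine tab-separated columns are those defined by the GFF/GTF formats.

```python
import sqlite3

# The nine tab-separated columns shared by GFF and GTF feature lines.
GFF_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff_line(line):
    """Return a dict for a feature line, or None for comments and blanks."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    record = dict(zip(GFF_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    return record


def load_features(conn, lines):
    """Insert parsed features into a hypothetical 'feature' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature ("
        "seqid TEXT, type TEXT, start_pos INTEGER, end_pos INTEGER, "
        "strand TEXT, attributes TEXT)"
    )
    rows = [
        (r["seqid"], r["type"], r["start"], r["end"], r["strand"], r["attributes"])
        for r in map(parse_gff_line, lines)
        if r is not None
    ]
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Storing start and end as integers lets the browser fetch only the genes that overlap the visible range, for example with a query such as SELECT * FROM feature WHERE seqid = ? AND end_pos >= ? AND start_pos <= ?.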