Step-by-Step Tutorial

Synteny File

1. Format

- Line 1 must begin with the pound symbol '#'.
- Line 1 must contain column headers separated by tabs.
- A column header can be any term of your choice, provided it contains alphabetic characters.
- Columns 1 to 6 (in purple below) are mandatory and column headers should be called 'org1', 'org1_start', 'org1_end', 'org2', 'org2_start' and 'org2_end'.
- Columns 7 to N (in orange below) are provided as additional features and are not required to be filled out for the program to work.
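
The header rules above can be sketched in Python as follows. The helper name synteny_header and the optional column names 'score' and 'evalue' are illustrative assumptions, not part of mGSV; only the six mandatory header names come from the list above. The sketch also assumes the '#' is written directly before the first header, with no space in between.

```python
MANDATORY_HEADERS = ["org1", "org1_start", "org1_end",
                     "org2", "org2_start", "org2_end"]


def synteny_header(extra_headers=()):
    """Return line 1 of a synteny file: '#' followed by tab-separated headers."""
    for name in extra_headers:
        # Optional headers are free-form, but they must contain alphabetic
        # characters and cannot contain a tab (the column separator).
        if "\t" in name or not any(ch.isalpha() for ch in name):
            raise ValueError(f"invalid column header: {name!r}")
    return "#" + "\t".join(MANDATORY_HEADERS + list(extra_headers))


# Hypothetical optional columns 7 and 8.
print(synteny_header(["score", "evalue"]))
```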

2. Column 1-6 description (mandatory fields)

Column 1: Genome1_ID
- This is where the ID of genome 1 must be entered.
- It must be alphanumeric; underscores and spaces are allowed, but tabs are not.
- This field can be the name or ID of an organism, chromosome, scaffold, contig, etc.
- This field is limited to 255 characters.
Examples of acceptable formats: Org_A, Chromosome 3, HS19

Column 2: Genome1_Start
- This is where the Start coordinate of the conserved region on genome 1 is entered.
- It must be numeric.
Example of acceptable format: 3001768

Column 3: Genome1_End
- This is where the End coordinate of the conserved region on genome 1 is entered.
- It must be numeric.
Example of acceptable format: 3020367

Column 4: Genome2_ID
- This is where the ID of genome 2 is entered.
- It must be alphanumeric; underscores and spaces are allowed, but tabs are not.
- This field can be the name or ID of an organism, chromosome, scaffold, contig, etc.
- This field is limited to 255 characters.
Examples of acceptable formats: Org_B, Chromosome 5, CE8

Column 5: Genome2_Start
- This is where the Start coordinate of the conserved region on genome 2 is entered.
- It must be numeric.
Example of acceptable format: 2983692

Column 6: Genome2_End
- This is where the End coordinate of the conserved region on genome 2 is entered.
- It must be numeric.
Example of acceptable format: 32991467
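
The rules for columns 1 to 6 can be checked with a short Python sketch like the one below. The function name parse_mandatory is hypothetical, and the sketch assumes that "numeric" means a non-negative whole number written with ASCII digits; mGSV itself may be more or less strict.

```python
import re

# Letters, digits, underscores and spaces only; 1 to 255 characters.
ID_PATTERN = re.compile(r"[A-Za-z0-9_ ]{1,255}")


def parse_mandatory(line):
    """Check columns 1-6 of one data line; return (id1, s1, e1, id2, s2, e2)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("expected at least 6 tab-separated columns")
    id1, start1, end1, id2, start2, end2 = fields[:6]
    for genome_id in (id1, id2):
        if not ID_PATTERN.fullmatch(genome_id):
            raise ValueError(f"invalid genome ID: {genome_id!r}")
    coords = []
    for value in (start1, end1, start2, end2):
        # Assumption: coordinates are non-negative integers.
        if not (value.isascii() and value.isdigit()):
            raise ValueError(f"coordinate is not numeric: {value!r}")
        coords.append(int(value))
    return id1, coords[0], coords[1], id2, coords[2], coords[3]
```

For example, parse_mandatory("Chromosome 3\t3001768\t3020367\tCE8\t2983692\t32991467") returns the two IDs unchanged and the four coordinates as integers, while a line with a non-numeric coordinate raises ValueError.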

3. Column 7 onwards

- The following fields are optional and are provided as additional features.
- To use these fields all entries must be numeric values.

Column 7: Score
- In this example column 7 contains the score generated for the alignment.
- e.g., 255, 86, 198

Column 8: E-value
- In this example column 8 contains the e-value generated for the alignment.
- e.g., 1e-40, 1.7e-8, 1
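
As a rough illustration of the "numeric values" requirement for columns 7 onwards, the Python sketch below reads every optional column as a floating-point number, which also covers scientific notation such as 1e-40. The function name parse_optional is hypothetical. The sketch treats an empty field as an error, which is an assumption; the text above does not say how mGSV handles blank optional fields.

```python
def parse_optional(line):
    """Return columns 7..N of a data line as floats (empty list if none)."""
    fields = line.rstrip("\n").split("\t")
    values = []
    for position, text in enumerate(fields[6:], start=7):
        try:
            values.append(float(text))
        except ValueError:
            raise ValueError(
                f"column {position} is not numeric: {text!r}") from None
    return values
```

Note that Python's float() also accepts strings such as "inf" and "nan", so a stricter check may be needed if those should be rejected.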

4. Screen shot of synteny file

[Screenshot: example synteny file]

5. Generate synteny_file.txt

Disclaimer: We hope the example provided in this script is a valuable resource for you. Your use of the information contained in this script is at your own risk. The mGSV team
makes no warranty, express or implied, as to the accuracy, completeness, or appropriateness for a particular purpose of this information, and does not authorize, recommend,
support, or guarantee it.

- mGSV allows users to visually identify conserved regions among their genomes of interest. However, mGSV does not generate the synteny data for the user. Synteny data must be
generated using a third-party program such as BLAST, BLASTZ or other software. If you have any questions or concerns about generating synteny data to use with mGSV,
please feel free to contact our team.
- mGSV provides the following scripts to help convert the BLAST or BLASTZ output into the proper mGSV format.
- BLASTparser.pl (download): This script will convert standard BLAST output into synteny_file.txt for mGSV.

Usage: BLASTparser.pl -r <query_genome> -t <hit_genome> -i <blout_file> -o <output_file>
	where -
	query_genome: name or ID of the query genome.
	hit_genome: name or ID of the hit genome.
	blout_file: name or path to BLAST output file, where query_genome is blasted against hit_genome
	output_file: name of the output file, e.g., synteny_file.txt
	

- BLASTZparser.pl (download): This script will convert standard BLASTZ output into synteny_file.txt for mGSV.

Usage: BLASTZparser.pl -r <query_genome> -t <hit_genome> -i <blastz_output_file> -o <output_file>
	where -
	query_genome: name or ID of the query genome.
	hit_genome: name or ID of the hit genome.
	blastz_output_file: name or path to BLASTZ output file, where query_genome is aligned against hit_genome
	output_file: name of the output file, e.g., synteny_file.txt
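
If your BLAST run was saved in the 12-column tabular format (-outfmt 6 in BLAST+), the conversion to synteny rows can also be sketched directly in Python. This is not BLASTparser.pl, whose internals are not described here; the function name tabular_hit_to_synteny is hypothetical, and writing the bit score and e-value into columns 7 and 8 is an assumption based on the optional columns described earlier.

```python
def tabular_hit_to_synteny(line, query_genome, hit_genome):
    """Convert one BLAST+ '-outfmt 6' line into a synteny data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError("expected the 12 default tabular columns")
    # Default column order:
    #   qseqid sseqid pident length mismatch gapopen
    #   qstart qend sstart send evalue bitscore
    qstart, qend, sstart, send = fields[6:10]
    evalue, bitscore = fields[10], fields[11]
    # Columns 1-6 are mandatory; bit score and e-value are optional extras.
    return "\t".join([query_genome, qstart, qend,
                      hit_genome, sstart, send, bitscore, evalue])
```

Coordinates are copied as BLAST reports them. For hits on the reverse strand, BLAST lists the subject start as larger than the subject end, and this sketch does not reorder them.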
	

- You can combine more than one synteny_file.txt to generate a single synteny_file.txt
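
One way to combine synteny files is sketched below in Python. The function name merge_synteny is hypothetical, and the sketch assumes that every input file uses the same header line, so that the columns line up in the merged file.

```python
def merge_synteny(texts):
    """Merge the contents of several synteny files, keeping one header line."""
    header = None
    rows = []
    for text in texts:
        for line in text.splitlines():
            if not line.strip():
                continue  # skip blank lines
            if line.startswith("#"):
                if header is None:
                    header = line  # keep the first header seen
                elif line != header:
                    raise ValueError("header lines differ between files")
            else:
                rows.append(line)
    if header is None:
        raise ValueError("no header line found")
    return "\n".join([header] + rows) + "\n"
```

To use it, read each file into a string (for example with pathlib.Path(name).read_text()), pass the list of strings to merge_synteny, and write the result to the new synteny_file.txt.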

Annotation File

1. Format

- The annotation file must be prepared as illustrated in the example file below.
- It must be a 9-column tab delimited file.

2. Column descriptions

Column 1: ID
- This is where the ID is entered. It must correspond to the ID provided in Column 1 or Column 4 of the Synteny file.
- It must be alphanumeric; underscores and spaces are allowed, but tabs are not.
- This field is limited to 255 characters.
Examples of acceptable formats: Org_A, Chromosome 3, HS19

Column 2: Start
- This is where the Start coordinate of the annotated feature is entered.
- It must be numeric.
Example of acceptable format: 11315

Column 3: End
- This is where the End coordinate of the annotated feature is entered.
- It must be numeric.
Example of acceptable format: 15093

Column 4: Strand
- This is where the Strand orientation of the annotated feature is entered.
- It must be either + which denotes a forward strand or - which denotes a reverse strand.
- If not applicable, enter a period (.).
Examples of acceptable formats: +, -, .

Column 5: Name
- This is where the name of the annotated feature is entered.
- It can be alphanumeric.
- If not applicable, enter a period (.).
- The name can be an HTML link. When the user clicks on that feature in the synteny browser, it will open the link in another tab.
Examples of acceptable formats: PAU_0022, 123.09, <a href="http://www.ncbi.nlm.nih.gov/gene/945006">lacZ</a>

Column 6: Value
- This is where a Value associated with the feature can be entered.
- This is usually an expression value to generate an XY plot.
- It must be numeric or in scientific notation.
- If not applicable, enter a period (.).
Examples of acceptable formats: 85, 35, 1e-4

Column 7: Track
- This is where you assign the feature to a named Track.
- It can be alphanumeric.
- Each unique entry in this column will result in a track.
Examples of acceptable formats: gene, transposon, expression

Column 8: Shape
- This is where you can enter and define the Shape of the annotated feature.
- We support the following 7 shapes: arrow, dashline, xyplot, pentagram, christmasarrow, box and ellipse.
- It is not case-sensitive.
Examples of acceptable formats: arrow, box, xyplot

Column 9: Color
- This is where you can enter and define the Color display of the Annotated feature.
- We support the following 32 colors: purple, pink, crimson, olive, sandybrown, firebrick, darkgray, tomato, seagreen, peru, lightsteelblue, salmon, khaki, skyblue, maroon, silver,
gold, darkgreen, orange, chocolate, darkcyan, tan, darkviolet, indianred, gainsboro, brown, gray, red, thistle, darkslategray, magenta, and black.
- If the color name is misspelled, the shape is displayed in black.
- It is not case-sensitive.
Examples of acceptable formats: red, chocolate, pink
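
The column rules above can be checked before upload with a short Python sketch such as the following. The function name check_annotation_line is hypothetical, and the sketch assumes coordinates are non-negative whole numbers. The color column is not checked, because a misspelled color falls back to black rather than causing an error.

```python
SHAPES = {"arrow", "dashline", "xyplot", "pentagram",
          "christmasarrow", "box", "ellipse"}


def check_annotation_line(line):
    """Return a list of problems found in one annotation line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return [f"expected 9 columns, found {len(fields)}"]
    _id, start, end, strand, _name, value, _track, shape, _color = fields
    problems = []
    for label, text in (("start", start), ("end", end)):
        # Assumption: coordinates are non-negative integers.
        if not (text.isascii() and text.isdigit()):
            problems.append(f"{label} is not numeric: {text!r}")
    if strand not in {"+", "-", "."}:
        problems.append(f"strand must be +, - or .: {strand!r}")
    if value != ".":
        try:
            float(value)  # accepts plain and scientific notation
        except ValueError:
            problems.append(f"value is not numeric: {value!r}")
    if shape.lower() not in SHAPES:  # shape names are not case-sensitive
        problems.append(f"unsupported shape: {shape!r}")
    return problems
```

Returning a list of problems, rather than stopping at the first one, makes it easier to fix a whole line in one pass.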

3. Screen shot of annotation file

[Screenshot: example annotation file]

4. Generate annotation_file.txt

Disclaimer: We hope the example provided in this script is a valuable resource for you. Your use of the information contained in this script is at your own risk. The mGSV team
makes no warranty, express or implied, as to the accuracy, completeness, or appropriateness for a particular purpose of this information, and does not authorize, recommend,
support, or guarantee it.

- The most common format for genome annotation information is GFF3. GFF3 files can be downloaded from NCBI (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/)
and other sources, or produced by running genome annotation programs on your sequence of interest.
- To convert GFF3 format into mGSV format, you need a text file called conf.txt and a Perl script called GFF3parser.pl.
- conf.txt (download): This is a tab-delimited file used to extract the required information from a GFF3 file, where Column 1 is the feature in the GFF3 file, Column 2 is the desired shape and
Column 3 is the desired color.

   
	gene	arrow	red
	CDS	box	blue
	misc_feature	box	green
	microarray_oligo	xyplot	sky
	where gene, CDS, misc_feature and microarray_oligo are the features you want to extract from the GFF3 file.
	

- GFF3parser.pl (download)

Usage: GFF3parser.pl -n <genome> -i <gff3_file> -c <conf.txt> -o <output_file>
	where -
	genome: name or ID of the genome
	gff3_file: name or path to GFF3 file
	conf.txt: file as described above
	output_file: name of the output file, e.g., annotation_file.txt
	

- You can combine more than one annotation_file.txt to generate a single annotation_file.txt
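
To make the role of conf.txt concrete, the Python sketch below shows one plausible way a GFF3 feature line could be mapped onto the nine mGSV annotation columns. It is not GFF3parser.pl and may differ from what that script does. The function names load_conf and gff3_to_annotation are hypothetical, and the mapping itself is an assumption: the genome name becomes the ID, the GFF3 feature type becomes the Track, the GFF3 score becomes the Value, and the Name (or ID) attribute becomes the Name.

```python
from urllib.parse import unquote


def load_conf(text):
    """Parse conf.txt content into {feature: (shape, color)}."""
    conf = {}
    for line in text.splitlines():
        parts = line.split()  # tolerates tabs or runs of spaces
        if len(parts) == 3:
            feature, shape, color = parts
            conf[feature] = (shape, color)
    return conf


def gff3_to_annotation(line, genome, conf):
    """Return an mGSV annotation line for one GFF3 feature line, or None."""
    if line.startswith("#"):
        return None  # GFF3 comment or directive
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return None
    _seqid, _source, ftype, start, end, score, strand, _phase, attrs = fields
    if ftype not in conf:
        return None  # feature type not requested in conf.txt
    shape, color = conf[ftype]
    # GFF3 attributes are 'key=value' pairs separated by ';'.
    pairs = dict(p.split("=", 1) for p in attrs.split(";") if "=" in p)
    name = unquote(pairs.get("Name") or pairs.get("ID") or ".")
    if strand not in ("+", "-"):
        strand = "."  # GFF3 also allows '?', which mGSV does not list
    # Assumed mapping: ID, Start, End, Strand, Name, Value, Track, Shape, Color.
    return "\t".join([genome, start, end, strand, name, score,
                      ftype, shape, color])
```

A GFF3 score of "." passes through unchanged, which matches the period that mGSV uses for a Value that is not applicable.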

Upload Synteny

On the home page, upload the synteny and annotation files, or specify a URL where they can be accessed.


mGSV Web service Client

1. Overview

mGSV comes with an optional Web Service component. When it is active, remote clients can upload synteny and annotation files using the Web Service protocol. An example client application that uses this protocol to upload local files and public URLs is provided. This documentation gives a high-level overview of the Web Service from the client's point of view and shows how to use the example client application, either from its binary file or by building the binary from the sources. The final section guides Java programmers through the build using simple Java development tools.

The Web Service is available to communicate with clients at http://cas-bioinfo.cas.unt.edu:8081/MGSVService

2. Execute client application

2.A. Required
- Java 1.6 or higher

2.B. Working directory
- We assume /opt/mgsv as the working directory.
- Create the directory if it does not exist. All commands below are executed in this directory unless specified otherwise.

	shell> mkdir /opt/mgsv
	shell> cd /opt/mgsv
2.C. Download Jar file
	shell> wget http://cas-bioinfo.cas.unt.edu/ws/ws-client-1.0-RC3-jar-with-dependencies.jar
2.D. Download sample synteny and annotation file.
	shell> wget http://cas-bioinfo.cas.unt.edu/sample_synteny.txt
	shell> wget http://cas-bioinfo.cas.unt.edu/sample_annotation.txt
2.E. Create configuration file
- This file specifies the address of the mGSV server.
	shell> echo "remote=http\://cas-genome.cas.unt.edu:8081/MGSVService" > config.properties
2.F. Initiate web service
- Use the downloaded jar file, along with sample synteny and annotation files
- You can also specify the path to these files
- The synteny file is required; the annotation file and email address are optional.
- The command below prints a <session_id>.
	shell> java -jar ws-client-1.0-RC3-jar-with-dependencies.jar sample_synteny.txt sample_annotation.txt email@example.com
	Example output: 306120220123200000023417
- To access mGSV, open the URL below after replacing <session_id> with the value printed by the command.
url: http://cas-bioinfo.cas.unt.edu/mgsv/summary.php?session_id=<session_id>
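
The substitution can also be scripted. The Python sketch below builds the summary URL from the text printed by the client; the function name summary_url is hypothetical, and the base address is the one shown in the line above.

```python
from urllib.parse import urlencode

SUMMARY_PAGE = "http://cas-bioinfo.cas.unt.edu/mgsv/summary.php"


def summary_url(session_id):
    """Return the mGSV summary URL for a session ID printed by the client."""
    session_id = session_id.strip()  # drop the trailing newline, if any
    if not session_id:
        raise ValueError("empty session ID")
    return SUMMARY_PAGE + "?" + urlencode({"session_id": session_id})


print(summary_url("306120220123200000023417"))
```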

3. Running the client service from source

3.A. Required
- JDK 1.6 or higher
- Maven 3.0 or higher

3.B. Download client source.

	shell> cd /opt/mgsv 
	shell> wget http://cas-bioinfo.cas.unt.edu/ws/mgsv-ws-client.tar.gz 
	shell> tar xzf mgsv-ws-client.tar.gz
	shell> cd mgsv-ws-client
3.C. Build the sources
	shell> mvn package 
	shell> cp target/ws-client-1.0-RC3-jar-with-dependencies.jar /opt/mgsv/.
3.D. Return to section 2.D above to continue running the client application.


The video can be downloaded here
