pharmavengers

Friday, 12 June 2015

BLAST :)

BLAST is the Basic Local Alignment Search Tool. It is a set of search programs designed to explore all available sequence databases, whether the query is protein or DNA. The software is built for speed while keeping a well-defined statistical interpretation of its results.

Setup

No setup is needed to run BLAST.

Usage

BLAST provides a variety of commands including:

    blastall
    • performs protein-protein (blastp) searches,
    • nucleotide-nucleotide (blastn) searches,
    • translated nucleotide query against protein database (blastx) searches,
    • protein query against translated nucleotide database (tblastn) searches,
    • translated nucleotide query against translated nucleotide database (tblastx) searches,
    • or position-specific iterated (psitblastn) searches.
    megablast
    • performs nucleotide-nucleotide searches using an optimized greedy algorithm that concatenates queries to save time spent scanning the database.
    blastpgp
    • performs gapped blastp searches and can be used to perform iterative searches in psi-blast and phi-blast mode.
    bl2seq
    • performs a comparison between two sequences using either the blastn or blastp algorithm. Both sequences must be proteins or both sequences must be nucleotides.
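
As a rough sketch of how the blastall program is chosen, the snippet below builds a legacy blastall command line from the query and database types. The helper blastall_command, the file name query.fasta and the database name nr are placeholders made up for illustration; only the -p, -d, -i and -e options belong to blastall itself.

```python
# Map (query type, database type) to a blastall program name.
# tblastx is left out on purpose: it takes a nucleotide query and a
# nucleotide database, just like blastn, so the type pair alone
# cannot select it.
PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def blastall_command(query_type, db_type, query_file, database, evalue=10.0):
    """Return the argument list for a legacy blastall run (hypothetical helper)."""
    program = PROGRAMS[(query_type, db_type)]
    return [
        "blastall",
        "-p", program,      # which search program to run
        "-d", database,     # database to search
        "-i", query_file,   # file holding the query sequence(s)
        "-e", str(evalue),  # expectation value cutoff
    ]


# Placeholder file and database names, for illustration only.
print(" ".join(blastall_command("nucleotide", "protein", "query.fasta", "nr")))
```

The list form can be passed straight to subprocess.run on a machine where legacy BLAST is installed.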

Here is an example of BLAST in use: the HIV BLAST tool of the LANL HIV Database.

HIV BLAST

Purpose: Find the HIV database sequences most similar to your query(s).

Input

  • Paste your sequence(s), upload a file, or enter accession number(s).

Options

  • Output style
  • Number of BLAST matches to display
  • Run BLAST against: a database selection, or a background set of sequences you upload
  • E-mail: always email results
  • Show location of match in genome: only matters for nucleotide input; uncheck to speed up the job


Details: The LANL HIV DNA database contains most of the same HIV sequences found in GenBank, but a BLAST search there gives more informative output. The results contain some of the fields the database annotates, such as subtype, sampling country and isolation year.

Input: One nucleotide or amino acid sequence, or a bulk set of sequences. A single sequence can be in FastA format or raw sequence.
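
To make the two accepted input styles concrete, here is a minimal sketch of a reader that takes either FastA text or a raw sequence. The function name parse_query and the default name "query" are my own choices, not part of HIV BLAST.

```python
def parse_query(text):
    """Return a list of (name, sequence) pairs from FastA or raw input."""
    records = []              # finished (name, sequence) pairs
    name, chunks = None, []   # record currently being read
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line closes the previous record, if there was one.
            if name is not None or chunks:
                records.append((name or "query", "".join(chunks)))
            header = line[1:].split()
            name = header[0] if header else "query"
            chunks = []
        else:
            # Sequence line: drop any internal whitespace.
            chunks.append("".join(line.split()))
    if name is not None or chunks:
        records.append((name or "query", "".join(chunks)))
    return records


print(parse_query("gtaattagatcc\ngccaatttcac"))          # raw sequence
print(parse_query(">seq1 example\ngtaatt\n>seq2\nagat"))  # FastA with two records
```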

Run BLAST against: The default BLAST background is all sequences in the LANL HIV Database. You can also search only the sequences with assigned subtypes, or sequences of one pure subtype. If you want to BLAST against your own submitted background set, browse for a file that contains those sequences.

Subsequent analyses: From the BLAST results page, you can: 

  • Download and align all or a selection of your output sequences,
  • Use the Geography search to examine the origin of your BLAST results,
  • Run NCBI BLAST

HIV BLAST Examples

Output

All BLAST results begin with a table of the best matches to your query sequence. Matches are excluded if the %Identity is <50% or if the length of the match is <20% of the length of the query sequence.
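
The exclusion rule can be written as a small check. This is only a sketch under two assumptions of mine: that %Identity means identical positions divided by the aligned length, and that "length of the match" means the aligned length. The function name keep_match is hypothetical.

```python
def keep_match(identical, aligned_length, query_length):
    """Return True if a match passes both thresholds described above."""
    if aligned_length <= 0 or query_length <= 0:
        return False
    percent_identity = 100.0 * identical / aligned_length
    coverage = aligned_length / query_length
    # Excluded when identity is below 50% or the match covers
    # less than 20% of the query.
    return percent_identity >= 50.0 and coverage >= 0.20


print(keep_match(273, 273, 273))  # full-length identical match
print(keep_match(40, 100, 273))   # 40% identity
print(keep_match(50, 50, 273))    # covers under 20% of the query
```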

[Figure: sample BLAST output]

Output columns:

  • Download. Use these boxes to select sequences for download. Check/uncheck the top box to select all or unselect all.
  • Accession. This will link you to the accession record for each sequence indicated.
  • Name. The common name of the sequence in the LANL HIV Database.
  • Subtype. The sequence subtype, if defined.
  • Country. The sampling country.
  • Year. The sampling year.
  • Description. The GenBank sequence description.
  • Score. The score is calculated by the BLAST algorithm. It takes into account both the length of alignment and percent of matching bases. Click score to jump to the alignment of this sequence, below (pairwise output only).

    Note that results are listed by score, and score is not always correlated with percent identity. For example, if you BLAST a full-length sequence, the top scores will be other full-length sequences; shorter sequences with higher identity may rank lower or be left out of the list.

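  • E value. The number of matches with a score at least this high that would be expected to occur by chance in a search of this size. The lower the E value, the more significant the match.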
  • Identities. The number and percent identity between your query and this subject, across the longest continuous alignment.
  • Location of match. Shows the location and length of the subject sequence (yellow) and the matched region between query and subject (red).


Pairwise versus Master-Slave output

Following the list of best matches there appears an alignment of your query sequence to its matches. There are two different styles of alignment to choose from in the pop-up menu on the BLAST search submission page.


Pairwise:

In pairwise output, the query is matched against each single subject sequence and the identities are shown by the vertical bar ( | ) character.


Score = 541 bits (273), Expect = e-154
Identities = 273/273 (100%), Positives = 273/273 (100%)
Query: 1 gtaattagatccgccaatttcacagacaatactaaaatcataatagtacagctgaatgaa  60
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1 gtaattagatccgccaatttcacagacaatactaaaatcataatagtacagctgaatgaa  60
 
Query: 61 tctgtacaaattaattgtacaagacccaacaacaatacaagaaaaagtataaatatagga 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61 tctgtacaaattaattgtacaagacccaacaacaatacaagaaaaagtataaatatagga 120
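
If you want to pull the numbers out of a pairwise header like the one above, a sketch along these lines may help. The regular expressions are written against this one sample only, and the function name parse_hit_header is hypothetical. Note that the value is printed as "e-154", which Python cannot read as a number until a leading 1 is added.

```python
import re

# Patterns written against the sample header shown above.
SCORE_RE = re.compile(
    r"Score\s*=\s*(?P<bits>[\d.]+)\s*bits\s*\((?P<raw>\d+)\),\s*"
    r"Expect\s*=\s*(?P<expect>\S+)"
)
IDENT_RE = re.compile(r"Identities\s*=\s*(?P<same>\d+)/(?P<total>\d+)")


def parse_hit_header(text):
    """Extract score, E value and identities from a pairwise hit header."""
    score = SCORE_RE.search(text)
    ident = IDENT_RE.search(text)
    if score is None or ident is None:
        raise ValueError("not a pairwise BLAST hit header")
    expect = score.group("expect")
    if expect.startswith("e"):
        expect = "1" + expect  # "e-154" is shorthand for 1e-154
    same, total = int(ident.group("same")), int(ident.group("total"))
    return {
        "bit_score": float(score.group("bits")),
        "raw_score": int(score.group("raw")),  # the number in parentheses
        "expect": float(expect),
        "identical": same,
        "aligned": total,
        "percent_identity": 100.0 * same / total,
    }


header = (
    "Score = 541 bits (273), Expect = e-154\n"
    "Identities = 273/273 (100%), Positives = 273/273 (100%)"
)
print(parse_hit_header(header))
```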
 

Master-Slave with identities:

Query seq 1 gtaattagatccgccaatttcacagacaatactaaaatcataatagtacagctgaatgaa  60
Z29296    1 ............................................................  60
U95417  433 .............a.........g......g............................. 492
U95414  433 .....c.................g......g............................. 492
U95413  433 .............a.........g......g............................. 492
U95411  433 .............a.........g......g............................. 492
U95410  433 .............a.........g......g............................. 492
L21486   49 .............a.........g......g............................. 108
L21468   49 .............a.........g......g............................. 108
U95419  433 .............a.........g......g............................. 492
U95400  430 ............aat........g.............g............c......... 489
U95392  430 ............aat........g.............g............c......... 489
L21480   49 .............a.........g......g............................. 108
Z67943    4 ....................................g...................g...  63

Here the query is aligned against ALL sequences producing a BLAST match and the identities are shown by the dot character.
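
The dot convention is easy to reproduce. Below is a minimal sketch, with function names of my own choosing, that turns an aligned subject row into dot notation and back again. It assumes both rows are already aligned and of equal length.

```python
def to_dot_notation(query, subject):
    """Show subject bases only where they differ from the query."""
    if len(query) != len(subject):
        raise ValueError("rows must be aligned to the same length")
    return "".join("." if q == s else s for q, s in zip(query, subject))


def from_dot_notation(query, dotted):
    """Rebuild the full subject row from the query and a dotted row."""
    if len(query) != len(dotted):
        raise ValueError("rows must be aligned to the same length")
    return "".join(q if d == "." else d for q, d in zip(query, dotted))


query = "gtaattagatccgcc"              # first 15 bases of the query above
dotted = "." * 13 + "a" + "."          # first 15 columns of the U95417 row
subject = from_dot_notation(query, dotted)
print(subject)
print(to_dot_notation(query, subject))
```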

Occasionally you may see lines in the alignment that look like those below. 


QUERY    121  ccaggcagagcattttatacaacaggagaaataataggagatataagtcaagcacattgt 180
AF105870 121  ............................................................ 180
                                                   \                       
                                                   |                        
                                                   a                    

This means that sequence AF105870 has an extra "a" nucleotide, relative to the query, at position 157. The sequence of AF105870 in the region of this insertion reads tagAgag, where the "A" marks the inserted "a". If you choose to download a file of all or part of this alignment, the insertions are handled as follows: the insertion is placed into its sequence, and gaps are opened in all other sequences at that point. In the example above, the alignment in the region of the "a" insertion would look like:

QUERY    tag-gag 
AF105870 tagagag
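
The gap-opening step can be sketched in a few lines. The function apply_insertion and its arguments are hypothetical; the code only reproduces the behaviour described above, on the same tag/gag region.

```python
def apply_insertion(alignment, seq_name, column, inserted):
    """Insert bases into one row and open gaps in every other row.

    alignment: dict mapping sequence name to its aligned row
    column:    number of alignment columns that come before the insertion
    """
    result = {}
    for name, row in alignment.items():
        # The named sequence gets the bases; all others get gap characters.
        filler = inserted if name == seq_name else "-" * len(inserted)
        result[name] = row[:column] + filler + row[column:]
    return result


rows = {"QUERY": "taggag", "AF105870": "taggag"}
for name, row in apply_insertion(rows, "AF105870", 3, "a").items():
    print(name, row)
```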


Thank you. That is all for this post. I hope you understand and enjoy it.



