Sequence Similarity Search (BLAST/FASTA) Tutorial
blastp compares an amino acid query sequence against a protein sequence database
blastn compares a nucleotide query sequence against a nucleotide sequence database
blastx compares a nucleotide query sequence translated in all reading frames against a protein sequence database
tblastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database
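To make "translated in all reading frames" concrete, the following is a minimal Python sketch of a six-frame translation: three frames on the given strand and three on its reverse complement. It assumes the standard genetic code and an unambiguous DNA sequence (A, C, G, T only); the function names are illustrative and are not part of BLAST itself.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


def six_frame_translations(seq):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to translated peptides."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, peptide in six_frame_translations("ATGGCC").items():
        print(frame, peptide)
```

In this picture, blastx applies such a translation to the query, tblastn to the database sequences, and tblastx to both.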
BLAST databases:
| Nucleotide Sequence Databases | |
| nr | All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). No longer "non-redundant" |
| month | All new or revised GenBank+EMBL+DDBJ+PDB sequences released in the last 30 days |
| Drosophila genome | Drosophila genome provided by Celera and Berkeley Drosophila Genome Project (BDGP) |
| dbest | Database of GenBank+EMBL+DDBJ sequences from EST Divisions |
| dbsts | Database of GenBank+EMBL+DDBJ sequences from STS Divisions |
| htgs | Unfinished High Throughput Genomic Sequences: phases 0, 1 and 2 (finished, phase 3 HTG sequences are in nr) |
| gss | Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences |
| yeast | Yeast (Saccharomyces cerevisiae) genomic nucleotide sequences |
| pdb | Sequences derived from the 3-dimensional structure from Brookhaven Protein Data Bank |
| kabat [kabatnuc] | Kabat's database of sequences of immunological interest |
| vector | Vector subset of GenBank (R), NCBI, in ftp://ncbi.nlm.nih.gov/blast/db/ |
| mito | Database of mitochondrial sequences |
| alu | Select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences |
| epd | Eukaryotic Promoter Database |
| Peptide Sequence Databases | |
| nr | All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF |
| month | All new or revised GenBank CDS translation +PDB+SwissProt+PIR+PRF released in the last 30 days |
| swissprot | Last major release of the SWISS-PROT protein sequence database (no updates) |
| Drosophila genome | Drosophila genome proteins provided by Celera and Berkeley Drosophila Genome Project (BDGP) |
| yeast | Yeast (Saccharomyces cerevisiae) genomic CDS translations |
| ecoli | Escherichia coli genomic CDS translations |
| pdb | Sequences derived from the 3-dimensional structure from Brookhaven Protein Data Bank |
| kabat [kabatpro] | Kabat's database of sequences of immunological interest |
| alu | Translations of select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences |
Sequence Input
The preferred query sequence format for the BLAST program is the FASTA format. Advanced BLAST tolerates both spaces and numbers and is case insensitive.
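As an illustration of what "tolerates spaces and numbers and is case insensitive" means in practice, here is a small Python sketch that normalizes pasted query text the same way: it keeps the first FASTA definition line, drops whitespace and position numbers, and uppercases the residues. The function name and the sample input are hypothetical, and this is not the parser that BLAST itself uses.

```python
def normalize_query(text):
    """Return (definition_line, sequence) from FASTA-like pasted text.

    Spaces, digits and other non-letter characters are discarded,
    and the sequence is uppercased.
    """
    header = ""
    residues = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first definition line.
            if not header and not residues:
                header = line[1:].strip()
            continue
        residues.append("".join(ch for ch in line if ch.isalpha()))
    return header, "".join(residues).upper()


if __name__ == "__main__":
    pasted = """>example_query hypothetical sequence
         1 atggcc attg 10
        11 taatgg
    """
    print(normalize_query(pasted))
```

A GenBank-style block with position numbers and grouped residues therefore reduces to the same query as a plain FASTA record.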
Parameter Settings
Parameter settings can be modified to optimize your BLAST search:
sequence filtering: masks regions of low compositional complexity in the query sequence, such as repetitive elements, so that they are excluded from the search; the masked residues are replaced with "X"s (turned ON as the default setting)
introduction of gaps (insertions and deletions): the blastn and blastp search tools offer fully gapped alignments, blastx and tblastn have "in-frame" gapped alignments, and the tblastx search tool provides only ungapped alignments (turned ON as the default setting)
Statistical matrices: used both to identify sequences in a database and to predict the biological significance of a match. There are two main types of matrices; you can select the preferred matrix for your Advanced BLAST search:
PAM (Percent Accepted Mutation) matrices: predicted matrices, most sensitive for alignments of evolutionarily related homologs. The greater the number in the matrix name, the greater the expected evolutionary (mutational) distance; for example, PAM30 would be used for sequences expected to be more closely related than those aligned with the PAM250 matrix
BLOSUM (Blocks Substitution Matrix): calculated matrices, most sensitive for local alignment of related sequences, ideal when trying to identify an unknown nucleotide sequence. BLOSUM62 is the default matrix set by the BLAST search tool
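The sequence filtering option listed above can be pictured with a simplified Python sketch. It slides a window along the query, measures the Shannon entropy of the residues in each window, and replaces low-entropy windows with "X". This is only an illustration of the idea: BLAST uses dedicated filters (SEG for proteins, DUST for nucleotides), and the window size and threshold below are arbitrary assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a window."""
    total = len(window)
    counts = Counter(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_complexity(seq, window=12, threshold=2.2, mask_char="X"):
    """Replace every window whose entropy is below the threshold with mask_char.

    window and threshold are illustrative values, not BLAST defaults.
    """
    seq = seq.upper()
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    # Hypothetical query with a poly-alanine stretch in the middle.
    query = "MKTLVDERWGHNCY" + "A" * 15 + "MKTLVDERWGHNCY"
    print(mask_low_complexity(query))
```

Because whole windows are masked, a few residues flanking the repetitive stretch are replaced as well; real filters handle the boundaries more carefully.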
Results
Format
Results are returned in either text format (default) or HTML format (you must supply an e-mail address and select the HTML results format option)
A Request ID number is given so that the results can be obtained at a later time; if you want the results immediately, click on the "Format Results" button
Formatting items, such as the results format option and the number of descriptions and alignments in the results output, are needed only for formatting; these items may be specified from the BLAST query form or at the time you request your results
Most results will be held for up to 24 hours; very large result files will be deleted after 30 minutes
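For readers who retrieve results programmatically, the sketch below builds a retrieval URL from a Request ID. It assumes the present-day NCBI BLAST URL API (the `CMD=Get`, `RID` and `FORMAT_TYPE` parameters of `Blast.cgi`), which postdates the web form described in this tutorial; the Request ID shown is a made-up placeholder, and the code only constructs the URL without contacting the server.

```python
from urllib.parse import urlencode

# Assumed endpoint of the current NCBI BLAST URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def results_url(request_id, format_type="Text"):
    """Build a URL that asks the server for the results of a finished search."""
    params = {
        "CMD": "Get",                # retrieve results rather than submit a search
        "RID": request_id,           # the Request ID issued at submission time
        "FORMAT_TYPE": format_type,  # e.g. "Text" or "HTML"
    }
    return BLAST_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # "ABC123XYZ" is a hypothetical Request ID used only for illustration.
    print(results_url("ABC123XYZ"))
```

Since results are held only for a limited time, a stored Request ID should be used promptly.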
BLAST Output
All BLAST programs produce a similar output consisting of:
program introduction
a schematic distribution of the ordered alignments of the query sequence to those in the databases
coloured bars are distributed in a way to reflect the region of alignment onto the query sequence. The colour legend represents the significance of the alignment scores
holding the mouse over a given bar will display a description of that specific alignment sequence in the above window; clicking on a specific bar will cause the browser to jump down to that particular alignment
an ordered set of biological definition lines of the database sequences that have been significantly aligned to the query sequence
E value: the expect value is the number of matches with an equal or better score that would be expected to occur by chance in a search of this database; the lower the E value, the more significant the match
a list of statistics specific to the particular BLAST search, displayed at the bottom of the output, including the BLAST version number, the database and the matrices used for the search
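The relationship between an alignment score and its E value can be sketched with the standard BLAST statistics: for a normalized (bit) score S', a query of length m and a database of total length n, E = m · n · 2^(−S'), and the probability of seeing at least one such chance match is P = 1 − e^(−E). The Python sketch below applies these formulas; it ignores the edge-effect corrections to m and n that BLAST applies, and the lengths used in the example are hypothetical.

```python
import math


def expect_value(bit_score, query_length, database_length):
    """Expected number of chance matches scoring at least bit_score."""
    return query_length * database_length * 2.0 ** (-bit_score)


def p_value(e_value):
    """Probability of at least one chance match, given the E value."""
    return -math.expm1(-e_value)


if __name__ == "__main__":
    # Hypothetical search: 250-residue query, 50 million residue database.
    for bits in (30, 40, 50, 60):
        e = expect_value(bits, 250, 50_000_000)
        print(f"bit score {bits}: E = {e:.3g}, P = {p_value(e):.3g}")
```

For small E values, P and E are nearly equal, which is why the E value is often read loosely as a probability; for E values above about 1 the two diverge.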
For more information on how to use BLAST see the following BLAST tutorials:
BLAST now comes in several flavors, including the following (these programs can also be accessed from the BLAST home page at NCBI):
MEGABLAST search: like a Basic BLAST search, but allows you to change certain parameters in order to perform a more specified BLAST search
GENERAL GUIDELINES for analysis CUT-OFFS of BLAST OUTPUT
2. Global Alignment: FASTA
Copyright ©2001 eBioinfogen Innovatives. All Rights Reserved
editor@ebioinfogen.com