The databases to run the sequence similarity search against. Multiple databases can be used at the same time.
The query sequence can be entered directly into this form. The sequence can be in GCG, FASTA, EMBL (Nucleotide only), GenBank, PIR, NBRF, PHYLIP or UniProtKB/Swiss-Prot (Protein only) format.
FASTA: protein:protein or DNA:DNA. FASTX: DNA vs protein (frameshifts allowed between codons). FASTY: DNA vs protein (frameshifts also allowed within codons). SSEARCH: local protein:protein. GGSEARCH: global protein:protein. GLSEARCH: global/local protein:protein. TFASTX: protein vs DNA. TFASTY: protein vs DNA.
(Protein searches) The substitution matrix used for scoring alignments when searching the database.
Score for the first residue in a gap.
Score for each additional residue in a gap.
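Taken literally, the two gap descriptions above mean that a gap of n residues scores the gap-open value once plus the gap-extend value for each of the remaining n - 1 residues. A minimal sketch of that arithmetic follows. The function name and the default penalties are illustrative assumptions, not the form's actual defaults.

```python
def gap_score(length: int, gap_open: int = -10, gap_extend: int = -2) -> int:
    """Total score for a gap spanning `length` residues.

    gap_open and gap_extend defaults are hypothetical values chosen
    for illustration only.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if length == 0:
        return 0
    # The first residue in the gap gets gap_open; each of the
    # remaining (length - 1) residues gets gap_extend.
    return gap_open + (length - 1) * gap_extend


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, gap_score(n))
```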
FASTA uses a rapid word-based lookup strategy to speed the initial phase of the similarity search.
The KTUP is used to control the sensitivity of the search. Lower values lead to more sensitive, but slower searches.
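The effect of KTUP can be illustrated with a simplified word lookup. The sketch below is not the FASTA implementation; it only shows the general idea of indexing every word of length ktup in the query and looking up the words of a library sequence. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_ktup_index(sequence: str, ktup: int) -> Dict[str, List[int]]:
    """Map every word of length `ktup` in `sequence` to its start positions."""
    if ktup < 1:
        raise ValueError("ktup must be at least 1")
    index: Dict[str, List[int]] = defaultdict(list)
    for start in range(len(sequence) - ktup + 1):
        index[sequence[start:start + ktup]].append(start)
    return dict(index)


def shared_words(query: str, subject: str, ktup: int) -> List[Tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where an identical word starts."""
    index = build_ktup_index(query, ktup)
    hits: List[Tuple[int, int]] = []
    for s_pos in range(len(subject) - ktup + 1):
        word = subject[s_pos:s_pos + ktup]
        for q_pos in index.get(word, []):
            hits.append((q_pos, s_pos))
    return hits


if __name__ == "__main__":
    # A smaller ktup finds more word matches, so more candidates are examined.
    print(len(shared_words("ACGT", "ACTT", 1)))
    print(len(shared_words("ACGT", "ACTT", 2)))
```

With a smaller ktup, more word matches survive the lookup, so more candidate regions must be examined. This is the trade-off between sensitivity and speed described above.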
Limit the number of scores and alignments reported by setting an upper expectation value (E-value) threshold.
This is the maximum number of times the match is expected to occur by chance.
Limit the number of scores and alignments reported by setting a lower expectation value (E-value) threshold.
This is the minimum number of times the match is expected to occur by chance.
This allows closely related matches to be excluded from the result in favor of more distant relationships.
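Together, the upper and lower expectation value limits define a window of E-values that are reported. A rough sketch of that filtering is shown below. The function name, the (name, E-value) tuple layout, and the inclusive boundaries are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Hit = Tuple[str, float]  # (hit identifier, E-value); hypothetical layout


def filter_by_evalue(
    hits: List[Hit], upper: float = 10.0, lower: Optional[float] = None
) -> List[Hit]:
    """Keep hits whose E-value lies within [lower, upper].

    Treating both boundaries as inclusive is an assumption.
    """
    kept: List[Hit] = []
    for name, evalue in hits:
        if evalue > upper:
            continue  # too likely to occur by chance
        if lower is not None and evalue < lower:
            continue  # very close match, excluded to surface distant ones
        kept.append((name, evalue))
    return kept


if __name__ == "__main__":
    hits = [("close_homolog", 1e-50), ("distant_homolog", 1e-5), ("noise", 50.0)]
    print(filter_by_evalue(hits, upper=10.0, lower=1e-10))
```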
For nucleotide sequences specify the sequence strand to be used for the search.
By default both the upper (provided) and lower (reverse complement of provided) strands are used.
For single-stranded sequences, searching with only the upper or lower strand may provide better results.
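To make the strand terminology concrete, the sketch below derives the lower strand as the reverse complement of the provided sequence. The option values "both", "upper" and "lower" are hypothetical labels, and only unambiguous A/C/G/T bases are complemented; any other symbol is left unchanged.

```python
from typing import List

# Complement table for unambiguous DNA bases only (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def strands_to_search(seq: str, strand: str = "both") -> List[str]:
    """Return the query strand(s) used for the selected option."""
    if strand == "upper":
        return [seq]
    if strand == "lower":
        return [reverse_complement(seq)]
    if strand == "both":
        return [seq, reverse_complement(seq)]
    raise ValueError(f"unknown strand option: {strand!r}")


if __name__ == "__main__":
    print(strands_to_search("ATGC"))
```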
Turn on/off the histogram in the FASTA result.
The histogram gives a qualitative view of how well the statistical theory fits the similarity scores calculated by the program.
Filter regions of low sequence complexity.
This can avoid issues with low-complexity sequences, where matches are found due to composition rather than meaningful sequence similarity.
However, in some cases filtering also masks regions of interest, so it should be used with caution.
The statistical routines assume that the library contains a large sample of unrelated sequences.
The available methods for estimating the statistical parameters include regression,
maximum likelihood estimates, shuffles, or combinations of these.
Maximum number of match score summaries reported in the result output.
Maximum number of match alignments reported in the result output.
Specify a range or section of the input sequence to use in the search.
Specify the length range of the database sequences to search against.
For example: 100-250 will search all sequences in a database with length between 100 and 250 residues, inclusive.
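The inclusive range in the example can be expressed as a simple length check. The sketch below assumes the range is always written as two integers joined by a hyphen, as in the example; whether the form accepts other notations is not stated here. The function names are hypothetical.

```python
from typing import Tuple


def parse_length_range(spec: str) -> Tuple[int, int]:
    """Parse a 'min-max' string such as '100-250' into (min, max)."""
    low_text, sep, high_text = spec.partition("-")
    if not sep:
        raise ValueError(f"expected 'min-max', got {spec!r}")
    low, high = int(low_text), int(high_text)
    if low > high:
        raise ValueError("minimum length exceeds maximum length")
    return low, high


def in_length_range(sequence: str, spec: str) -> bool:
    """True if the sequence length falls within the range, inclusive."""
    low, high = parse_length_range(spec)
    return low <= len(sequence) <= high


if __name__ == "__main__":
    print(parse_length_range("100-250"))
    print(in_length_range("A" * 250, "100-250"))
```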
Turn on/off the display of all significant alignments between the query and library sequences.
Turn on/off annotation features. Annotation features show features from UniProtKB,
such as variants, active sites, phospho-sites and binding sites, that have been found in the aligned region of the database hit.
To see the annotation features in the results after this has been enabled, select sequences of interest and click 'Show' Alignments.
This option also enables a new result tab (Domain Diagrams) that highlights domain regions.
Different score report formats.
Genetic code to use when translating the query sequence.