BLAST E-MAIL SERVER

Last change: February 24, 1995

1. Recent changes to BLAST e-mail server

* Section (13) of the Help file, which covers a new GenBank sequence submission tool called BankIt (now available through the NCBI's home page on the World Wide Web), has been updated.

* IMPORTANT: Section (9) on the recommended rate for submitting requests has been updated. Because abusive use of the server in this respect has increased, we urge you to respect these rates for the benefit of all.

* The BLAST FAQ is now also sent in response to the "help" request.

* The BLAST programs in use on the NCBI E-mail server have been updated to version 1.4, for better sensitivity and selectivity. Usage of the e-mail service is generally unchanged by this update, except for the availability of a new program called TBLASTX.

* TBLASTX compares a nucleotide query sequence translated in all six reading frames against a nucleotide sequence database dynamically translated in all six reading frames (6 x 6 = 36 possible reading frame combinations). Due to the computational resource requirements of TBLASTX, this program is restricted to searching only the "dbest", "dbsts", and "alu" databases on the NCBI BLAST servers.

* Due to its size, the "BLAST Manual" is sent as a separate message in response to "help" requests. The BLAST Manual has also been updated and includes a description of program output.

* A new section (13) of the help message provides the postal and e-mail addresses for submitting new sequence data and updates to GenBank(R).

* The QOFFSET directive is available to permit long query sequences to be broken into shorter, overlapping segments to be searched individually while retaining proper coordinate numbering in the blast output.

2. BLAST E-Mail Server Instructions

The NCBI BLAST E-mail server allows a similarity search to be performed against a standard sequence database by sending it a properly composed mail message that, among other elements, contains the user's nucleotide or protein query sequence. The query sequence is compared against the specified database using the BLAST algorithm along with several layered enhancements and statistics; the results are then returned to the user in a mail message.

From the results, the user may be interested in retrieving specific database sequences and any accompanying annotation. The unique sequence identifiers reported in the search results can be used as sequence retrieval keys, or the user may retrieve sequences based on keywords of their own choosing, via the NCBI RETRIEVE E-mail server (see Section 10 below).

The BLAST algorithm is a heuristic for finding ungapped, locally optimal sequence alignments that was developed by the National Center for Biotechnology Information at the National Library of Medicine. The BLAST family of programs employs this algorithm to compare an amino acid query sequence against a protein sequence database or a nucleotide query sequence against a nucleotide sequence database, as well as other combinations of protein and nucleic acid.

If you use BLAST as a tool in your published research, we ask that the following reference be cited:

    S. F. Altschul, W. Gish, W. Miller, E. W. Myers and D. J. Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-410.

Table of Contents

     1. Accessing the BLAST E-mail Server
     2. Obtaining Help
     3. Example of a BLAST E-mail Search Request
     4. Formatting a Search Request
     5. Table of Server Directives
     6. The Query Sequence Format
     7. Databases Available for Use with BLAST
     8. Interpreting Sequence Identifiers
     9. Submittal and Queueing of Requests
    10. Retrieving Database Sequences from the RETRIEVE E-Mail Server
    11. Obtaining BLAST Software
    12. Subscribing to _NCBI News_
    13. Submitting New Data or Updates to the GenBank(R) Database

1. Accessing the BLAST E-Mail Server

To access the server, send an electronic mail message containing a properly formatted request (as described below) to the following Internet address:

    blast@ncbi.nlm.nih.gov

(Note: If you are not on the Internet, you may need to change the format of the address. Consult your local system manager to determine the appropriate address format.)

2. Obtaining Help

To receive the current set of instructions on using the BLAST E-mail server, send a help message to the normal BLAST E-mail server address:

    blast@ncbi.nlm.nih.gov

Put the word HELP on a single line in the body of the mail message. (Any Subject line will be ignored and need not be specified.)

The BLAST Manual, which is appended to this help text, describes many program features and parameters. Only a subset of the parameters is supported by the BLAST E-mail service, as outlined in the table of Directives below.

For answers to further questions not adequately explained in the help text or in the BLAST Manual, send a mail message with your question to NCBI staff at the address blast-help@ncbi.nlm.nih.gov. Do not send HELP requests for the documentation to this address.

To receive instructions on using the Retrieve E-mail Service, send a 'HELP' message to the following address (see also Section 10 below):

    retrieve@ncbi.nlm.nih.gov

A frequent problem encountered by BLAST E-mail users is that the results from their requests are too voluminous for their computer to receive. Sometimes their computer has insufficient disk storage to hold the BLAST results; other times the mail software on their computer may be configured to reject large messages. The situation can be exacerbated by users who submit many requests within a short period of time. On computers that are shared, search requests submitted by one or by a few of its users can thus have a deleterious effect on all other users' ability to receive their results.
Ways of minimizing the chance of this happening are to specify small values for the ALIGNMENTS and DESCRIPTIONS directives; to indicate that a FILTER be applied to the query sequence; to issue a SPLIT directive to have voluminous results broken down into several shorter messages (see the table of Directives below); and to pace search requests to no more than a few each hour, harvesting the results as they are returned.

On occasion, a user may be able to send e-mail messages to the NCBI just fine, but a configuration problem with their computer -- or perhaps with a mail relay computer handling mail between them and the NCBI -- prevents them from receiving outside e-mail replies no matter how short. This case may apply to you if you do not obtain responses to search requests within a day of submitting them and also receive no response to questions sent to blast-help@ncbi.nlm.nih.gov. You may then wish to include your telephone number in questions sent to the blast-help address, so that NCBI staff can respond by telephone if necessary. If poor response occurs over a weekend or holiday, during which time the BLAST servers run unattended and may crash with no one knowing it, this may not be the sign of a problem that requires the user's attention at all.

3. Example of a BLAST E-Mail Search Request

The first four lines in the example below comprise a mail message header that is automatically created by a mail program and bundled with the message when it is sent. Nothing needs to be entered for the Subject of a BLAST E-mail request; the Subject is ignored by the BLAST E-mail server. The actual search request begins with the mandatory parameter 'PROGRAM' in the first column followed by the value 'blastn' (the name of the program) for searching nucleic acids. The next line contains the mandatory search parameter 'DATALIB' with the value 'nr' for the combined nucleic acids database. The third line contains an optional EXPECT parameter and the value desired for it.
The fourth line contains the mandatory 'BEGIN' directive, followed by the query sequence in FASTA/Pearson format. Each line of information must be less than 80 characters in length.

    From: bluegene@someaddress.somewhere.edu Tue Jul 28 21:36:38 1992
    Date: 28 Jul 1992 21:29:02-EDT
    To: blast@ncbi.nlm.nih.gov
    Subject:

    PROGRAM blastn
    DATALIB nr
    EXPECT 0.75
    BEGIN
    >XYZ012 mygene XYZ
    tgcttggctgaggagccataggacgagagcttcctggtgaagtgtgtttcttgaaatcat
    caccaccatggacagcaaa

We advise the use of the 'nr' (non-redundant) database for both protein and nucleic acid database searching, because it provides for comprehensive searches with more concise reports. See Section 7 for a complete list of database choices, which includes the individual component databases of 'nr'.

4. Formatting a Search Request

A search request consists of a mail message with a set of search parameters identifying the program (e.g., 'blastp' for proteins or 'blastn' for nucleic acids), the database to be searched, values related to the search parameters, and the query sequence to be used in the search.

NOTE: Not all command line options described for the BLAST application programs in the BLAST Manual Page below have corresponding search parameters in the BLAST E-mail server. The E-mail server only understands the commands listed in the Table of Server Directives (see Section 5).

Components of the mail message must be provided in this order: two mandatory directives (PROGRAM and DATALIB); any optional parameters or directives described below; another mandatory directive (BEGIN); and finally the query sequence on the remaining lines. Each directive must be specified on a separate line.

The programs BLASTX and TBLASTN consume substantial computational resources. Therefore, please control the number of BLASTX and TBLASTN jobs you submit and do not submit simultaneous or duplicate requests.
We ask for your cooperation in this matter, to avoid having to establish quotas on the number of BLAST E-mail jobs submitted per user and so that these two programs can continue to be made available on the server.

5. Server Directives

Below is a table of directives for controlling the NCBI BLAST E-mail server. Some of the directives are required to be present in every search request, while others can alter the default behavior of the server. Some directives correspond to program parameters described in the accompanying BLAST Manual, but many program parameters have no corresponding server directive and, thus, cannot be used through the NCBI BLAST E-mail Service.

Attribute definitions:

    Mandatory        = the directive must appear in every search request.
                       (Currently there are only 3 mandatory directives:
                       PROGRAM, DATALIB, and BEGIN)
    Numerical        = the directive uses a numerical type of argument
                       (e.g., "EXPECT 100").
    Text             = the directive uses a textual type of argument
                       (e.g., "DATALIB nr").
    Boolean          = the directive uses a Boolean type of argument
                       ("yes", "no", "true", "false", "1", or "0")
    ArgumentRequired = the directive requires an argument.

Server
Directive      Attributes
---------      ----------

PROGRAM        Mandatory, Text, ArgumentRequired

    The PROGRAM directive is used to specify the particular BLAST program to execute (blastp, blastn, blastx, or tblastn). All message lines following the PROGRAM directive are checked for validity by the server. Any erroneous directives appearing prior to the PROGRAM directive line will be ignored. For this reason, it is advised that the PROGRAM directive be the very first line in every request, so that the complete search request will be validity-checked.

DATALIB        Mandatory, Text, ArgumentRequired

    The DATALIB directive is used to indicate which database should be searched (see the list of databases in Section 7). Only one database can be searched per mail message. Databases can only be searched in their entirety--subsets are not available to be searched.

HISTOGRAM      Boolean

    Display a histogram of scores for each search; default is yes. Not applicable to BLASTX. (See parameter H in the BLAST Manual).

DESCRIPTIONS   Numerical, ArgumentRequired

    Restricts the number of short descriptions of matching sequences reported to the number specified; default limit is 100 descriptions. (See parameter V in the manual page). See also EXPECT and CUTOFF.

ALIGNMENTS     Numerical, ArgumentRequired

    Restricts database sequences to the number specified for which high-scoring segment pairs (HSPs) are reported; the default limit is 50. If more database sequences than this happen to satisfy the statistical significance threshold for reporting (see EXPECT and CUTOFF below), only the matches ascribed the greatest statistical significance are reported. (See parameter B in the BLAST Manual).

EXPECT         Numerical, ArgumentRequired

    The statistical significance threshold for reporting matches against database sequences; the default value is 10, such that 10 matches are expected to be found merely by chance, according to the stochastic model of Karlin and Altschul (1990). If the statistical significance ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported. Fractional values are acceptable. (See parameter E in the BLAST Manual).

CUTOFF         Numerical, ArgumentRequired

    Cutoff score for reporting high-scoring segment pairs. The default value is calculated from the EXPECT value (see above). HSPs are reported for a database sequence only if the statistical significance ascribed to them is at least as high as would be ascribed to a lone HSP having a score equal to the CUTOFF value. Higher CUTOFF values are more stringent, leading to fewer chance matches being reported. (See parameter S in the BLAST Manual). Typically, significance thresholds can be more intuitively managed using EXPECT.

MATRIX         Text, ArgumentRequired

    Specify an alternate substitution scoring matrix for BLASTP, BLASTX and TBLASTN. The default matrix is BLOSUM62 (Henikoff & Henikoff, 1992). The valid alternatives that may be specified in a MATRIX directive include PAM40, PAM120 and PAM250. BLASTN does not use a scoring matrix; specifying the MATRIX directive in BLASTN requests is an error. (See parameter M in the BLAST Manual).

STRAND         Text, ArgumentRequired

    Restrict a BLASTN or TBLASTN search to just the top or bottom strand of the database sequences; or restrict a BLASTX search to just the reading frames on the top or bottom strand of the query sequence. The required argument to this directive should be chosen from the vocabulary "top", "bottom", "plus", "minus", "+", "-", "complementary", "single", "double" or "both". Specifying "STRAND single" is equivalent to "STRAND top" or "STRAND plus", while "STRAND double" is equivalent to searching both strands (or not restricting the search to any one strand, which is the default).

FILTER         Text

    Mask off segments of the query sequence that have low compositional complexity, as determined by the SEG program of Wootton & Federhen (Computers and Chemistry, in press), or segments consisting of short-periodicity internal repeats, as determined by the XNU program of Claverie & States (Computers and Chemistry, in press). Filtering can eliminate statistically significant but biologically uninteresting reports from the blast output (e.g., hits against common acidic-, basic- or proline-rich regions), leaving the more biologically interesting regions of the query sequence available for matching against database sequences.

    Filtering is only applied to the query sequence (or its translation products), not to database sequences. FILTERING IS NOT AVAILABLE FOR USE WITH BLASTN.

    An optional argument to the FILTER directive indicates which filter to use, or a combination thereof. For example, SEG+XNU and XNU+SEG each indicate that both filters should be applied in succession to the query sequence and in the order indicated.

    When filtering is applied, the filtered query sequence is automatically displayed in the output with any masked regions indicated by runs of Xs. For BLASTX, the filter is applied separately to the full-length translation products from each reading frame, not to the nucleotide query sequence itself.

    It is not unusual for nothing at all to be masked by SEG, XNU, or both, when applied to sequences in SWISS-PROT, so filtering should not be expected to always yield an effect. Furthermore, in some cases, sequences are masked in their entirety, indicating that the statistical significance of any matches reported against the unfiltered query sequence should be suspect.

QOFFSET        Numerical, ArgumentRequired

    Adds the value of the numerical argument to every coordinate number reported for the query sequence. This helps when a long query sequence must be broken into shorter, overlapping segments in order to complete individual searches within execution time limits.

GCODE          Numerical, ArgumentRequired

    Select an alternate genetic code for translation by the programs BLASTX and TBLASTN. The standard or universal code is 0. See the description of the C parameter in the BLAST Manual for a complete list of the alternate genetic codes that are available.

PATH           Text, ArgumentRequired

    Provide your return Internet e-mail address if the BLAST E-mail server seems unable to correctly parse your address from the message header of your search requests. If the server is not responding to your requests, try explicitly telling it your return e-mail address with this directive.

SPLIT          Numerical

    Split voluminous output into individual files that are each no more than 1000 lines in length. Some mailers won't accept messages that are longer than this; and BLAST output can be quite voluminous, depending upon the query and the selected parameter values. Avoid using this directive, however, unless you are unable to receive long messages. An optional numeric argument can be used to specify a different number than the default number of 1000 lines per message. Users with return addresses that appear to be on BITNET will have their output automatically SPLIT.

BEGIN          Mandatory

    This mandatory directive is not paired with any value. It must appear after all other parameters and immediately before the query sequence.

No other directives or parameters besides those described above are selectable through the BLAST E-mail server. Only those directives marked Mandatory in the Attributes column are required to be present in a BLAST E-mail request.

Lowering the CUTOFF score from its default value (calculated from an EXPECT of 10) by even a few units often produces a profound increase in the number of HSPs reported. The number reported tends to increase exponentially with decreasing CUTOFF score. However, the lowest scoring alignments are expected to not only be statistically insignificant but biologically uninteresting as well. Hence, the DESCRIPTIONS and ALIGNMENTS parameters are used to govern the volume of output produced.

Even when a high CUTOFF score (low EXPECT value) is used, the actual number of HSPs reported may be great, depending on the number of true homologs present in the database and on the prevalence of low compositional complexity regions in the query and database sequences. The latter characteristic can yield estimates of statistical significance that are not in accordance with the biological interest in such alignments; matches between low compositional complexity regions should often be discounted heavily. Use the FILTER directive if you believe low-compositional complexity may be yielding an inordinate number of matches.
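
The ordering and validity rules described above (PROGRAM and DATALIB first, optional directives next, BEGIN last, one directive per line, every line under 80 characters) can be sketched as a small Python helper. This is only an illustration under stated assumptions: `build_request` and the constant names are hypothetical and are not part of any NCBI software, and the checks cover only the rules stated in this help text (no MATRIX or FILTER with BLASTN, TBLASTX limited to "dbest", "dbsts" and "alu").

```python
# Hypothetical helper for composing the body of a BLAST e-mail request.
# All names here are illustrative; the server itself is the final judge
# of whether a request is valid.

PROGRAMS = {"blastp", "blastn", "blastx", "tblastn", "tblastx"}
OPTIONAL_DIRECTIVES = {
    "HISTOGRAM", "DESCRIPTIONS", "ALIGNMENTS", "EXPECT", "CUTOFF", "MATRIX",
    "STRAND", "FILTER", "QOFFSET", "GCODE", "PATH", "SPLIT",
}
TBLASTX_DATABASES = {"dbest", "dbsts", "alu"}
MAX_LINE = 79  # each line must be less than 80 characters


def build_request(program, datalib, fasta_lines, **directives):
    """Return the message body as a single string.

    fasta_lines is a list whose first item is the ">" description line.
    A directive passed with the value None is written without an argument
    (for example FILTER or SPLIT).
    """
    program = program.lower()
    if program not in PROGRAMS:
        raise ValueError(f"unknown program: {program}")
    if program == "tblastx" and datalib not in TBLASTX_DATABASES:
        raise ValueError("tblastx is restricted to dbest, dbsts and alu")

    # Mandatory directives come first, in this order.
    lines = [f"PROGRAM {program}", f"DATALIB {datalib}"]

    # Optional directives follow, one per line.
    for name, value in directives.items():
        directive = name.upper()
        if directive not in OPTIONAL_DIRECTIVES:
            raise ValueError(f"unsupported directive: {directive}")
        if program == "blastn" and directive in {"MATRIX", "FILTER"}:
            raise ValueError(f"{directive} cannot be used with blastn")
        lines.append(directive if value is None else f"{directive} {value}")

    # BEGIN comes last, immediately before the query sequence.
    lines.append("BEGIN")
    if not fasta_lines or not fasta_lines[0].startswith(">"):
        raise ValueError("query must start with a '>' description line")
    lines.extend(fasta_lines)

    for line in lines:
        if len(line) > MAX_LINE:
            raise ValueError(f"line too long ({len(line)} characters)")

    # End with a blank line so an appended signature is kept apart.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    print(build_request("blastn", "nr",
                        [">XYZ012 mygene XYZ", "tgcttggctgaggagccatagg"],
                        expect=0.75))
```

The returned string is meant to be pasted into (or sent as) the body of a message addressed to blast@ncbi.nlm.nih.gov; the mail header itself is left to the mail program.
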

BLASTN was written to achieve high-speed nucleotide versus nucleotide sequence comparisons, but at the expense of decreased sensitivity (particularly for low-scoring or even moderately diverged homologs). The number of alignments observed with this program is frequently less than the number predicted from the value of the EXPECT parameter. Except for closely related homologs, far better sensitivity in detecting homology between coding regions is to be expected from protein level comparisons using BLASTX or TBLASTN, due to the combination of degeneracy in the genetic code and functional constraints on the encoded protein.

6. The Query Sequence Format

The last server directive that must appear in every request is the BEGIN directive. The query sequence should immediately follow the BEGIN directive and must appear in FASTA/Pearson format.

A sequence in FASTA/Pearson format begins with a single-line description. The description line, which is required, is distinguished from the lines of sequence data that follow it by having a greater-than (">") symbol in the first column. For the purposes of the BLAST E-mail server, the text of the description is arbitrary. In order to successfully pass through all computers that may need to relay the search request to the NCBI, all lines of the sequence (including the description line) should be kept to 80 characters or less in length. Only one query sequence is accepted per mail message.

Example sequence in FASTA format:

    >MNKSV40 Monkey DNA fragment of unknown function, acquired by Simian virus
    ggttaaaatggtgatttttatgctttgtgtattttaccacttttttttttttaaggcaga
    ttcctttcaatcatctgagtgagcccagtgcgatctgaagggtccctacaggtggaagag
    gcagtggccaggatcgcggt

Mail programs typically allow the user to import a file containing a sequence into the mail message. Assuming the sequence is already in FASTA/Pearson format, the sequence file should be imported into the mail message on the line after the 'BEGIN' directive.
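
If a sequence is not yet in FASTA/Pearson format, the layout described above (one ">" description line, then sequence lines kept short) can be produced with a few lines of Python. This is a minimal sketch: `to_fasta` is a hypothetical name, and the default width of 60 residues per line is simply the width used in the examples in this help text, not a server requirement.

```python
# Hypothetical helper that lays out a raw sequence in FASTA/Pearson format.

def to_fasta(description, sequence, width=60):
    """Return a list of lines: the '>' description, then wrapped sequence."""
    header = ">" + description.strip()
    if len(header) >= 80:
        raise ValueError("description line must be less than 80 characters")
    # Drop any whitespace or line breaks already present in the sequence.
    residues = "".join(sequence.split())
    if not residues:
        raise ValueError("empty sequence")
    # Cut the sequence into fixed-width lines.
    body = [residues[i:i + width] for i in range(0, len(residues), width)]
    return [header] + body


if __name__ == "__main__":
    for line in to_fasta("XYZ012 mygene XYZ", "tgcttggctgaggagccatagg" * 5):
        print(line)
```

The resulting lines go directly after the 'BEGIN' directive of a request.
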
Please follow the complete message format shown in the example in Section 3. A blank line after the last line of the query sequence is recommended for inclusion with each request, since some mailer programs automatically append a signature block to the messages they send; without an intervening blank line, the BLAST E-mail server will treat the signature block as being part of the sequence itself.

Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters (see below). Before submitting a request, any numerical digits in the query sequence should either be removed or replaced by appropriate letter codes (e.g., N for unknown nucleic acid residue or X for unknown amino acid residue).

The nucleic acid codes supported are:

    A --> adenosine           M --> A C (amino)
    C --> cytidine            S --> G C (strong)
    G --> guanine             W --> A T (weak)
    T --> thymidine           B --> G T C
    U --> uridine             D --> G A T
    R --> G A (purine)        H --> A C T
    Y --> T C (pyrimidine)    V --> G C A
    K --> G T (keto)          N --> A G C T (any)
                              -  gap of indeterminate length

Although the IUB/IUPAC standards include several nucleotide ambiguity codes (e.g., N), the current version of BLASTN will not consider as "matching" any ambiguity code paired with any other code -- even when paired with itself.

For those programs that use amino acid query sequences (BLASTP and TBLASTN), the accepted amino acid codes are:

    A  alanine                     P  proline
    B  aspartate or asparagine     Q  glutamine
    C  cysteine                    R  arginine
    D  aspartate                   S  serine
    E  glutamate                   T  threonine
    F  phenylalanine               U  selenocysteine
    G  glycine                     V  valine
    H  histidine                   W  tryptophan
    I  isoleucine                  Y  tyrosine
    K  lysine                      Z  glutamate or glutamine
    L  leucine                     X  any
    M  methionine                  *  translation stop
    N  asparagine                  -  gap of indeterminate length

7. Databases Available for Use with BLAST

The following databases are available for BLAST searching on the NCBI server. The names shown are for use in DATALIB directives, to select which database to search. The short strings in square brackets are abbreviations that appear in the definition lines of BLAST output; they indicate the database from which each sequence originated; however, these abbreviations are _not_ for use in DATALIB directives. More abbreviations will appear in BLAST server output in the future as additional databases are incorporated into GenBank(R) and as the procedures for obtaining these data are improved. For instance, it will be apparent from these strings when a sequence in GenBank originated from the EMBL Data Library or from the DNA Database of Japan (DDBJ).

The "nr" databases are composites of multiple databases, produced by merging the annotations of identical sequences from the different sources into single entries. Only when two (or more) sequences are absolutely identical both in length and in overall sequence are their annotations concatenated to form one entry in the nr databases. Sequences that are perfect substrings are _not_ merged. The nr databases are also constructed in such a way as to suppress those sequences in the major release of GenBank, EMBL and SWISS-PROT for which a revision appears in the respective update. The source for sequences having EMBL-owned accessions is the EMBL flat files; the GenBank flat files are taken as the source for all other accessions (GenBank, LANL, and DDBJ).

Not all of the databases available for searching through the NCBI BLAST server are accessible for sequence retrieval through the NCBI Retrieve server; the list of INaccessible databases currently includes "epd", "acr" and "alu". Each of these databases can however be obtained via anonymous FTP (see the FTP locations in the table below).

Full dbEST and dbSTS reports can be obtained from the NCBI Retrieve server.
To get more information on these servers, send the following message to retrieve@ncbi.nlm.nih.gov:

    datalib dbest    {or dbsts}
    help

**Peptide Sequence Databases**

Name [abbreviation]   Description
===================   ===========
nr                    "Non-redundant" protein database; it includes
                      sequences from the PDB, SWISS-PROT, PIR(R),
                      GenPept, and GenPept updates
swissprot [sp]        the last major release of the SWISS-PROT protein
                      sequence database (no updates)
pir [pir]             the last major release of the NBRF PIR(R) protein
                      sequence database
spupdate [sp]         Cumulative update to the SWISS-PROT major release
genpept [gp]          GenPept (translated coding sequence features from
                      the last major release of GenBank(R))
gpupdate [gp]         GenPept update (cumulative daily updates)
pdb [pdb]             sequences derived from the 3-dimensional structures
                      in the Brookhaven Protein Data Bank
kabatpro [kabat]      Kabat's database of sequences of immunological
                      interest
tfd [tfd]             Transcription Factors Protein Database
acr [acr]             Ancient Conserved Region subset of SWISS-PROT
                      (See the contents of the directory /pub/jmc/acr on
                      the ncbi.nlm.nih.gov anonymous FTP server)
alu [alu]             Translations of select Alu repeats from REPBASE.
                      This qualified database, suitable for accurately
                      masking Alu repeats from query sequences, is
                      available for anonymous FTP downloading from
                      ncbi.nlm.nih.gov beneath the /pub/jmc/alu directory.

**Nucleotide Sequence Databases**

Name [abbreviation]   Description
===================   ===========
nr                    Non-redundant nucleotide sequence database; it
                      includes sequences from PDB, GenBank(R), GenBank(R)
                      updates, EMBL, and EMBL updates
genbank [gb]          the last major release of the GenBank(R) nucleotide
                      sequence database (does not include updates)
gbupdate [gb]         GenBank(R) update (cumulative daily updates to the
                      last major release)
embl [emb]            the last major release of the EMBL Data Library
                      nucleotide sequence database (does not include
                      updates)
emblu [emb]           the latest EMBL Data Library cumulative weekly
                      update to the current major release
pdb [pdb]             sequences derived from the 3-dimensional structures
                      in the Brookhaven Protein Data Bank
alu [alu]             Select Alu repeats from REPBASE. This qualified
                      subset of REPBASE is suitable for accurately
                      masking Alu repeats from query sequences and is
                      available for anonymous FTP downloading from
                      ncbi.nlm.nih.gov beneath the /pub/jmc/alu directory.
                      See "Alu alert" by Claverie and Makalowski, Nature
                      vol. 371, page 752 (1994).
vector [vector]       Vector subset of GenBank(R), NCBI, (anonymous FTP to
                      ncbi.nlm.nih.gov beneath the /pub/vector directory)
kabatnuc [kabat]      Kabat's database of sequences of immunological
                      interest
dbest [dbest]         Database of Expressed Sequence Tags (ESTs)
dbsts [dbsts]         Database of Sequence Tagged Sites (STSs)
epd [epd]             Eukaryotic Promoter Database

8. Interpreting Sequence Identifiers

The syntax of sequence identifiers used by the NCBI BLAST server depends on the database from which each sequence was obtained. The table below outlines the identifier syntax for several databases. For reliable long-term reference to a particular sequence (especially for publication purposes), it is recommended that both the accession number and the name of the database be noted.

    Database Name                      Identifier Syntax
    ============================       ========================
    GenBank                            gb|accession|locus
    "GenPept" derivative of GenBank    gp|accession|locus
    EMBL Data Library                  emb|accession|name
    NBRF PIR                           pir|accession|entry
    SWISS-PROT                         sp|accession|name
    Brookhaven Protein Data Bank       pdb|name|chain
    Kabat's Sequences of Immuno...     gnl|kabat|name
    TFD                                gnl|tfd|name
    Eukaryotic Promoter Database       gnl|epd|name

For example, an identifier might be "gb|M73307|AGMA13GT", where the "gb" tag indicates that the identifier refers to a GenBank sequence, "M73307" is its GenBank ACCESSION, and "AGMA13GT" is the GenBank LOCUS; a publication that cites this sequence should state that it was GenBank accession M73307. Individual database identifiers may be concatenated, separated from one another by a vertical bar (``|''), producing compound identifiers as in "gi|176485|gb|M73307|AGMA13GT".

gi (pronounced "jee-aye") identifiers are coming into increasing use at the NCBI. These identifiers provide a uniform and stable naming convention across all of the supported databases (currently GenBank, EMBL, DDBJ, SWISS-PROT, PIR, PDB and PRF) within the NCBI GenInfo Integrated Database. If a sequence (nucleotide or peptide) changes in any way, the NCBI assigns it a new, unique gi identifier -- even if the accession number remains unchanged. Thus, gi identifiers provide stable names by which precise sequences can be referenced. This is of utmost importance to people who compute information from sequences. gi identifiers have been available for use within Entrez, where they are called "NCBI Seq IDs," since release 9.

GenPept is not a database in its own right, but is a derivative work, obtained by parsing amino acid sequences out of the CDS (coding region) features annotated in GenBank.
A GenPept "accession" number is the primary ACCESSION number of the GenBank nucleotide sequence record from which the CDS was obtained; and a GenPept locus is derived from the LOCUS of the same GenBank record, followed by an ordinal number pointing to the CDS feature from which the amino acid sequence was obtained. For example, the GenPept identifier "gp|M59434|AGMHSV1A_3" would be assigned to the amino acid sequence obtained from the third CDS feature annotated in the GenBank record whose primary accession number and locus name were, respectively, M59434 and AGMHSV1A. Unfortunately, GenPept identifiers are unstable, as the accession quoted from GenBank does not refer to a specific CDS and the locus_cds# may change when the record is updated either by changing the locus or adding/removing CDS features.

Caution is advised in the use of PIR identifiers, too, as PIR accessions are not guaranteed to be unique; and neither the PIR entry names nor the PIR accessions provided in the NCBI BLAST server output are guaranteed by the NBRF to be stable from release to release of the database.

9. Submittal and Queueing of Requests

The BLAST E-mail server cannot process requests received at arbitrarily high rates. For this reason, it is asked that individual users submit slow jobs (blastx, tblastn, and tblastx) no faster than 1 per minute and fast jobs (blastn and blastp) no faster than 1 per 2 minutes. Moreover, over a period of 7 minutes we expect to receive only three fast jobs, or one fast job and one slow job, from any one user.

For users sending multiple BLAST jobs in batch: after some four years of operating the BLAST e-mail server, it appears that the best time to send such batches is during the night (EST).

THIS IS IMPORTANT because a consequence of overloading the server with rapid-fire requests is that other users of this shared facility would essentially be blocked from using it.
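
One way to respect these rates when sending a batch is to compute a send time for each job in advance. The Python sketch below is only an illustration under stated assumptions: `send_offsets` is a hypothetical name, the per-program intervals are copied from the paragraph above (60 seconds for blastx, tblastn and tblastx; 120 seconds for blastn and blastp), and when a fast job and a slow job follow one another the longer of the two intervals is used as a conservative choice. The additional 7-minute guideline is not modeled here.

```python
# Hypothetical pacing helper for batch submissions.

# Minimum spacing in seconds, taken from the rates stated above.
MIN_INTERVAL = {
    "blastn": 120, "blastp": 120,              # fast jobs: 1 per 2 minutes
    "blastx": 60, "tblastn": 60, "tblastx": 60,  # slow jobs: 1 per minute
}


def send_offsets(programs):
    """Return, for each job, the number of seconds after the first job
    at which it may be sent."""
    offsets = []
    elapsed = 0
    previous = None
    for program in programs:
        current = program.lower()
        if current not in MIN_INTERVAL:
            raise ValueError(f"unknown program: {program}")
        if previous is not None:
            # Assumption: wait for the longer of the two adjacent intervals.
            elapsed += max(MIN_INTERVAL[previous], MIN_INTERVAL[current])
        offsets.append(elapsed)
        previous = current
    return offsets


if __name__ == "__main__":
    print(send_offsets(["blastn", "blastn", "blastx", "blastx"]))
```

A mail script could sleep until each offset before sending the corresponding request; spacing jobs further apart than this, or sending batches at night (EST), is always acceptable.
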
More statistics are given in the BLAST Notebook [URL http://www.ncbi.nlm.nih.gov/].

Requests sent to the BLAST E-mail server are partitioned into two queues depending on the particular BLAST program requested. BLASTN and BLASTP requests are entered in a "Fast" queue. BLASTX and TBLASTN requests are placed in a "Slow" queue. Both queues operate concurrently, but Fast queue requests often complete faster and the Fast queue is typically shorter than the Slow queue, so individual BLASTN and BLASTP requests are often responded to considerably sooner than BLASTX and TBLASTN requests.

Both queues occasionally get backed up with requests, and re-submissions of the same queries can only exacerbate the problem.

*** So that access to the compute-intensive BLASTX and TBLASTN programs can continue to be provided, please refrain from re-submitting any search--not just BLASTX or TBLASTN--within any given 24 hour period. That should provide ample opportunity for the queues to clear of other requests or for any underlying problem with the queues to be resolved. ***

10. Retrieving Database Sequences from the RETRIEVE E-Mail Server

Complete sequence records can be retrieved either by locus name or by accession number from the NCBI RETRIEVE server. To obtain full instructions on using the RETRIEVE server, send a help message to:

    retrieve@ncbi.nlm.nih.gov

Put the word 'HELP' on a single line in the body of the mail message. No subject line is needed.

A sample retrieval message and brief instructions follow below. The first four lines of the message are filled in automatically by your mail program.

    From: bluegene@someaddress.somewhere.edu Tue Jul 28 21:36:38 1992
    Date: 28 Jul 1992 21:29:02-EDT
    To: retrieve@ncbi.nlm.nih.gov
    Subject:

    DATALIB genbank
    BEGIN
    BOVPRL
    CHKTUBA
    J02852

In the above Retrieve request, the 'DATALIB' directive designates the database to search.
(BLAST output from searches against the 'nr' database contains acronymic database identifier(s) in the description line for each database sequence. For the RETRIEVE server, 'nr' is not a valid database. Instead, the component database indicated by the acronym must be searched.) Then the 'BEGIN' directive appears on a line by itself and indicates the beginning of the query. The query can consist of locus names, accession numbers, or any other text words used in records. Lines of the query are automatically joined by the Boolean connector OR, so this query will retrieve records containing any of the three values BOVPRL, CHKTUBA, or J02852. Several words can be combined on one line; these are also combined by default using the Boolean OR.

11. Obtaining BLAST Software

Public domain, UNIX-compatible source code for the BLAST programs, written in the C language, is available via anonymous ftp from ncbi.nlm.nih.gov [130.14.20.1] beneath the /pub/blast directory.

12. Subscribing to _NCBI News_

A free subscription to _NCBI News_, a periodical describing the services and resources provided by the NCBI, may be obtained by sending your name and postal mailing address to info@ncbi.nlm.nih.gov and stating that you wish to be placed on the _NCBI News_ mailing list.

The NCBI postal mail address and telephone numbers are:

    National Center for Biotechnology Information
    National Library of Medicine
    Building 38A, Room 8N-806
    8600 Rockville Pike
    Bethesda, MD 20894-0001

    Voice:  (301) 496-2475
    FAX:    (301) 480-9241
    E-mail: info@ncbi.nlm.nih.gov

13. Submitting New Data or Updates to the GenBank(R) Database

Authors now have two ways to directly submit new data or updates for the GenBank(R) database, including submissions composed using the AuthorIn software package.

* First, new submissions and updates can be sent to the NCBI at the above-mentioned postal mail address or to one of the following e-mail addresses.
  If data is submitted on diskette, please indicate whether it is a PC/DOS or Macintosh diskette.

    E-mail submissions of new sequences: gb-sub@ncbi.nlm.nih.gov
    E-mail submissions of updates:       update@ncbi.nlm.nih.gov

  A free copy of AuthorIn version 3.0 may be obtained by sending your name and address to authorin@ncbi.nlm.nih.gov. Please specify whether you would like to receive the PC/DOS or Macintosh version.

  Note: an update to any record found in GenBank(R) may be sent to the above update address. If the record is actually owned by EMBL or DDBJ, the update will be forwarded for you to the respective data management facility for its processing. In this manner, it is not necessary to remember which database received your original submission.

* Second, new submissions can be submitted using BankIt, available through the NCBI's home page on the World Wide Web. We discourage the use of BankIt for submitting sequence updates. The URL is

    http://www.ncbi.nlm.nih.gov/

For more information, point your browser to the NCBI URL or contact GenBank User Services at info@ncbi.nlm.nih.gov

GenBank is a registered trademark of the National Institutes of Health. The GenBank database is managed and distributed by the National Center for Biotechnology Information, National Library of Medicine.