Simplified Instructions
RETRIEVE email server
The unmodified instructions are here.
The RETRIEVE e-mail server allows users to retrieve records by keyword
searching from sequence databases at the National Center for Biotechnology
Information (NCBI), National Library of Medicine, NIH, in Bethesda, MD.
To access the server, send an electronic mail message containing the query
formatted as described below to the following Internet address:
retrieve@ncbi.nlm.nih.gov
Leave the Subject: line empty.
Commands to the server are put, one command per line, in the body of the
message.
In the simplest case, to retrieve the sequence whose accession number is
U02356 from GenBank, your message would be:
DATALIB genbank
BEGIN
U02356
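The message above can be assembled programmatically. The following is a minimal sketch using Python's standard email library; the function name build_retrieve_message and the sender address user@example.org are illustrative assumptions, not part of the server's documentation. It only builds the message and does not send it.

```python
from email.message import EmailMessage

RETRIEVE_ADDRESS = "retrieve@ncbi.nlm.nih.gov"


def build_retrieve_message(sender: str, body_lines: list[str]) -> EmailMessage:
    """Build a mail message whose body holds one server command per line."""
    msg = EmailMessage()
    msg["From"] = sender  # hypothetical sender address supplied by the caller
    msg["To"] = RETRIEVE_ADDRESS
    # No Subject header is set: the instructions leave the Subject: line empty.
    msg.set_content("\n".join(body_lines) + "\n")
    return msg


message = build_retrieve_message(
    "user@example.org",
    ["DATALIB genbank", "BEGIN", "U02356"],
)
print(message.get_content())
# Sending is left out of this sketch; with access to a mail host it could be
# done with smtplib.SMTP(...).send_message(message).
```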
More generally, the format of a request is:
DATALIB <name of the library to search>
[OPTIONAL PARAMETERS]
BEGIN
searchterm1 searchterm2 searchterm3
searchterm4 searchterm5
.
.
.
searchtermN
All of the sequence files containing any of the search terms will be emailed
back to you.
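As an illustration of the general format, the sketch below lays out a request body from a library name and a list of search terms. The function name build_request and the terms_per_line parameter are assumptions made for this example; terms are written exactly as given, with no quoting applied.

```python
def build_request(datalib: str, terms: list[str], terms_per_line: int = 3) -> str:
    """Return a request body: DATALIB line, BEGIN, then the search terms."""
    if not terms:
        raise ValueError("at least one search term is required")
    lines = [f"DATALIB {datalib}", "BEGIN"]
    # Search terms may be spread over several lines after BEGIN.
    for start in range(0, len(terms), terms_per_line):
        lines.append(" ".join(terms[start:start + terms_per_line]))
    return "\n".join(lines) + "\n"


print(build_request("genbank", ["U02356"]))
```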
Each searchterm can be put between quotes, and should be if it contains any
"special" characters, e.g. an underscore (_).
Search terms can include the following wildcards:
- #  match any single character
- $  match zero or one character
- *  match zero or more characters
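To make the wildcard meanings concrete, the sketch below translates a wildcard term into a Python regular expression and tests it against candidate words locally. This only illustrates the three definitions above; it is not the server's implementation. Whole-word, case-sensitive matching is an assumption, since the instructions do not say how the server anchors matches or handles case. The names wildcard_to_regex and matches are hypothetical.

```python
import re

# Regex equivalents of the three documented wildcards.
WILDCARDS = {
    "#": ".",    # any single character
    "$": ".?",   # zero or one character
    "*": ".*",   # zero or more characters
}


def wildcard_to_regex(term: str) -> re.Pattern:
    """Translate a wildcard search term into a compiled regular expression."""
    parts = [WILDCARDS.get(ch, re.escape(ch)) for ch in term]
    return re.compile("".join(parts))


def matches(term: str, word: str) -> bool:
    # Assumption: the term must match the whole word, case-sensitively.
    return wildcard_to_regex(term).fullmatch(word) is not None


print(matches("kinas#", "kinase"))   # '#' stands for the final 'e'
print(matches("colo$r", "colour"))   # '$' stands for the optional 'u'
print(matches("hemo*", "hemoglobin"))
```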
To limit your search, you can include AND, OR or NOT between searchterms on
one line. (OR is the default, but can be explicitly entered.) For more
details, see the Full Instructions.
The exception to the above is that if you wish to retrieve several entries
from Swiss-Prot, you must use an explicit OR operator, e.g.:
DATALIB swissprot
BEGIN
"CYB_POMSU" OR
"POLG_SVDVH" OR
"YS1_FOAMV"
Your choices for DATALIB are:
- gb or genbank: GenBank DNA sequence database (this choice searches both
  the current public release and the updates)
- gbu or gbupdate: GenBank update (ONLY daily updates since the last
  public release)
- gbonly: GenBank Full Release (ONLY last full release; no search of the
  updates)
- embl or emb: EMBL DNA sequence database (this choice searches both the
  current public release and the updates)
- emblupdate or emblu: EMBL update (ONLY daily updates since the last
  public release)
- emblonly: EMBL Full Release (ONLY last full release; no search of the
  updates)
- sp or swiss or swissprot: SWISS-PROT protein database
- spu or swissprotupdate: SWISS-PROT updates (cumulative weekly)
- pir: PIR protein database
- vector: Vector sequence subset of GenBank (LANL)
- vecbase: Vecbase (1987 version)
- gp or genpept: GenPept (translated GenBank)
- gpu or gpupdate: GenPept update (cumulative daily updates)
- kabatnuc: Kabat's database of sequences of immunological interest,
  nucleotide sequences
- kabatpro: Kabat's database of sequences of immunological interest,
  protein sequences
- epd: Eukaryotic Promoter Database (Philip Bucher)
- pdb: Protein Data Bank (3-D protein structure database)
- tfd: Transcription Factors Database
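A script that prepares requests could check the library name against the list above before mailing anything. The sketch below is one way to do that; the names DATALIB_NAMES and check_datalib are hypothetical, and treating names as case-insensitive is an assumption rather than documented server behavior.

```python
# Library names accepted on the DATALIB line, copied from the list above.
DATALIB_NAMES = {
    "gb", "genbank", "gbu", "gbupdate", "gbonly",
    "embl", "emb", "emblupdate", "emblu", "emblonly",
    "sp", "swiss", "swissprot", "spu", "swissprotupdate",
    "pir", "vector", "vecbase",
    "gp", "genpept", "gpu", "gpupdate",
    "kabatnuc", "kabatpro", "epd", "pdb", "tfd",
}


def check_datalib(name: str) -> str:
    """Return the lowercased library name, or raise if it is not in the list."""
    normalized = name.strip().lower()  # assumption: case does not matter
    if normalized not in DATALIB_NAMES:
        raise ValueError(f"unknown DATALIB name: {name!r}")
    return normalized


print(check_datalib("GenBank"))
```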
Records in 'dbEST' (Database of Expressed Sequence Tags) or 'dbSTS'
(Database of Sequence Tagged Sites) are retrieved with a different query
format. To obtain the help documentation for these two databases, send a
mail message to:
retrieve@ncbi.nlm.nih.gov
with the following in the body of the message:
datalib dbest (or dbsts)
help
Optional parameters are:
- MAXDOCS: Maximum number of documents to transmit; default is 20. (Note
  that there is a total document limit of 2400.)
- MAXLINES: Maximum number of lines of output; default is 1000 (limit is
  50000).
- TITLES: Display only the titles of records. No value is necessary; simply
  include the parameter TITLES and you will retrieve titles only.
- STARTDOC: Starting document number; default is 1. Use it in successive
  mail messages to retrieve different blocks of documents when a query
  returns more data than the values of MAXDOCS and MAXLINES allow in a
  single response. (Also, some mail systems limit the number of lines or
  bytes in a single message.)
- PATH: Put your e-mail return address here if the e-mail server has
  problems returning responses to the proper address.
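The sketch below shows how the optional parameters might be placed between the DATALIB and BEGIN lines, and how STARTDOC could step through a large result set in blocks. Several points are assumptions, since these instructions do not show the exact syntax: each parameter is written as "KEYWORD value" on its own line (mirroring the DATALIB line), the 2400-document limit is treated as an upper bound on MAXDOCS, and the names build_paged_request and page_starts are hypothetical.

```python
from typing import Optional

MAXDOCS_LIMIT = 2400    # total document limit stated in the instructions
MAXLINES_LIMIT = 50000  # maximum value for MAXLINES


def build_paged_request(datalib: str, terms: list[str], maxdocs: int = 20,
                        maxlines: int = 1000, startdoc: int = 1,
                        titles_only: bool = False,
                        path: Optional[str] = None) -> str:
    """Build a request body with optional parameters before BEGIN."""
    if not 1 <= maxdocs <= MAXDOCS_LIMIT:
        raise ValueError("MAXDOCS must be between 1 and 2400")
    if not 1 <= maxlines <= MAXLINES_LIMIT:
        raise ValueError("MAXLINES must be between 1 and 50000")
    if startdoc < 1:
        raise ValueError("STARTDOC must be 1 or greater")
    # Assumed syntax: "KEYWORD value", one parameter per line.
    lines = [f"DATALIB {datalib}", f"MAXDOCS {maxdocs}",
             f"MAXLINES {maxlines}", f"STARTDOC {startdoc}"]
    if titles_only:
        lines.append("TITLES")  # TITLES takes no value
    if path:
        lines.append(f"PATH {path}")
    lines.append("BEGIN")
    lines.append(" ".join(terms))
    return "\n".join(lines) + "\n"


def page_starts(total_docs: int, maxdocs: int) -> list[int]:
    """STARTDOC values needed to cover total_docs in blocks of maxdocs."""
    return list(range(1, total_docs + 1, maxdocs))


# One request per block of 50 documents, for a query expected to hit 120.
for start in page_starts(120, 50):
    print(build_paged_request("genbank", ["kinase"], maxdocs=50, startdoc=start))
```

Each printed body would go into its own mail message, which also keeps individual responses under any size limit imposed by the receiving mail system.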