Simplified Instructions
RETRIEVE email server

The unmodified instructions are here.

The RETRIEVE e-mail server allows users to retrieve records by keyword searching from sequence databases at the National Center for Biotechnology Information (NCBI), National Library of Medicine, NIH, in Bethesda, MD.

To access the server, send an electronic mail message containing the query formatted as described below to the following Internet address:

retrieve@ncbi.nlm.nih.gov

Leave the Subject: line empty.

Put the commands to the server in the body of the message, one command per line.

In the simplest case, to retrieve a sequence whose accession number is U02356 from GenBank, your message would be:

DATALIB genbank
BEGIN
U02356

More generally, the format of a request is:

DATALIB <name of the library to search>
[OPTIONAL PARAMETERS]
BEGIN
searchterm1 searchterm2 searchterm3
searchterm4 searchterm5
.
.
.
searchtermN
All of the sequence files containing any of the search terms will be emailed back to you.
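The request layout above can also be assembled by a short script. The following is a minimal sketch in Python; the helper name `build_request` is hypothetical, and the code only builds the message without sending it.

```python
from email.message import EmailMessage


def build_request(datalib, search_lines, options=None):
    """Return the body of a RETRIEVE request as one string.

    datalib      -- library name, e.g. "genbank"
    search_lines -- list of strings, each one line of search terms
    options      -- optional list of already formatted parameter lines
    """
    lines = ["DATALIB " + datalib]
    lines.extend(options or [])      # optional parameters go before BEGIN
    lines.append("BEGIN")
    lines.extend(search_lines)       # search terms follow BEGIN
    return "\n".join(lines) + "\n"


# The simplest case from the text: one accession number from GenBank.
body = build_request("genbank", ["U02356"])

# Wrap the body in a mail message; the Subject header is deliberately not set.
msg = EmailMessage()
msg["To"] = "retrieve@ncbi.nlm.nih.gov"
msg.set_content(body)

print(body)
```

Handing `msg` to a mail client or SMTP library is left out on purpose, since that step depends on your own mail setup.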

Each search term can be enclosed in quotes, and should be if it contains any "special" characters, e.g. an underscore (_). Search terms can include the following wildcards:

#
match any single character
$
match zero or one character
*
match zero or more characters
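To make the three wildcards concrete, the sketch below translates a search term into a Python regular expression. This only illustrates the matching rules as described above and is not the server's own code; the function names are hypothetical. Case-sensitive matching against a whole word is an assumption, since the text does not say how the server handles either.

```python
import re

# Server wildcard -> equivalent regular-expression fragment.
WILDCARDS = {
    "#": ".",    # any single character
    "$": ".?",   # zero or one character
    "*": ".*",   # zero or more characters
}


def wildcard_to_regex(term):
    """Compile a search term with #, $ and * wildcards into a regex."""
    parts = [WILDCARDS.get(ch, re.escape(ch)) for ch in term]
    return re.compile("".join(parts))


def matches(term, word):
    """Return True if the whole word matches the wildcard term."""
    return wildcard_to_regex(term).fullmatch(word) is not None


print(matches("kinas#", "kinase"))          # '#' needs exactly one character
print(matches("color$", "colors"))          # '$' allows one optional character
print(matches("immun*", "immunoglobulin"))  # '*' allows any number of characters
```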
To limit your search, you can include AND, OR or NOT between searchterms on one line. (OR is the default, but can be explicitly entered.) For more details, see the Full Instructions.

The exception to the above is that if you wish to retrieve several entries from Swiss-Prot, you must use an explicit OR operator, e.g.:

DATALIB swissprot
BEGIN
"CYB_POMSU" OR
"POLG_SVDVH" OR
"YS1_FOAMV"
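A Swiss-Prot request of this shape can be generated from a list of entry names. This is a small sketch; the function name `swissprot_request` is hypothetical. It quotes every name, because the names contain underscores, and ends every line except the last with an explicit OR.

```python
def swissprot_request(entry_names):
    """Build a request body that fetches several Swiss-Prot entries."""
    quoted = ['"{}"'.format(name) for name in entry_names]
    # Every line except the last one carries the explicit OR operator.
    term_lines = [q + " OR" for q in quoted[:-1]] + quoted[-1:]
    return "\n".join(["DATALIB swissprot", "BEGIN"] + term_lines) + "\n"


print(swissprot_request(["CYB_POMSU", "POLG_SVDVH", "YS1_FOAMV"]))
```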
Your choices for DATALIB are:

gb or genbank
GenBank DNA sequence database (this choice searches both the current public release and the updates)
gbu or gbupdate
GenBank update (ONLY daily updates since the last public release)
gbonly
GenBank Full Release (ONLY last full release; no search of the updates)
embl or emb
EMBL DNA sequence database (this choice searches both the current public release and the updates)
emblupdate or emblu
EMBL update (ONLY daily updates since the last public release)
emblonly
EMBL Full Release (ONLY last full release; no search of the updates)
sp or swiss or swissprot
SWISS-PROT protein database
spu or swissprotupdate
SWISS-PROT updates (cumulative weekly)
pir
PIR protein database
vector
Vector sequence subset of GenBank (LANL)
vecbase
Vecbase (1987 version)
gp or genpept
GenPept (translated GenBank)
gpu or gpupdate
GenPept update (cumulative daily updates)
kabatnuc
Kabat's database of sequences of immunological interest -- nucleotide sequences
kabatpro
Kabat's database of sequences of immunological interest -- protein sequences
epd
Eukaryotic Promoter Database (Philip Bucher)
pdb
Protein Data Bank (3-D protein structure database)
tfd
Transcription Factors Database
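A script that builds requests may want to check a DATALIB name before sending anything. The sketch below copies the names from the list above into a lookup table. Treating the first name in each group as "canonical" is a choice made for this example only, and accepting upper-case input is an assumption.

```python
# Alias groups taken from the list above; the first name in each group is
# used as the canonical one here (a choice for this sketch only).
DATALIB_GROUPS = [
    ("genbank", "gb"),
    ("gbupdate", "gbu"),
    ("gbonly",),
    ("embl", "emb"),
    ("emblupdate", "emblu"),
    ("emblonly",),
    ("swissprot", "sp", "swiss"),
    ("swissprotupdate", "spu"),
    ("pir",),
    ("vector",),
    ("vecbase",),
    ("genpept", "gp"),
    ("gpupdate", "gpu"),
    ("kabatnuc",),
    ("kabatpro",),
    ("epd",),
    ("pdb",),
    ("tfd",),
]

ALIAS_TO_CANONICAL = {
    alias: group[0] for group in DATALIB_GROUPS for alias in group
}


def normalize_datalib(name):
    """Map any listed alias to its canonical name, or raise ValueError."""
    key = name.strip().lower()  # assumption: names are case-insensitive
    if key not in ALIAS_TO_CANONICAL:
        raise ValueError("unknown DATALIB name: " + name)
    return ALIAS_TO_CANONICAL[key]


print(normalize_datalib("gb"))
print(normalize_datalib("swiss"))
```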
To retrieve records from 'dbEST' (Database of Expressed Sequence Tags) or 'dbSTS' (Database of Sequence Tagged Sites), use a different query format. To obtain the help documentation for these two databases, send a mail message to:

retrieve@ncbi.nlm.nih.gov

with the following in the body of the message:

datalib dbest (or dbsts)
help

Optional parameters are:
MAXDOCS
Maximum number of documents to transmit; default is 20. (Note that there is a total document limit of 2400.)
MAXLINES
Maximum number of lines of output (limit is 50000); default is 1000.
TITLES
Display only the titles of records. No value is necessary; simply include the parameter TITLES and only titles will be returned.
STARTDOC
Starting document number; default is 1. Use this in successive mail messages to retrieve different blocks of documents when a query retrieves more data than the values of MAXDOCS and MAXLINES allow to be transmitted. Note also that some mail systems limit the number of lines or bytes in a single message.
PATH
Put your e-mail return address here if there are problems in having the e-mail server return responses to the proper address.
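The optional parameters, and the use of STARTDOC to fetch a large result in blocks, can be sketched as follows. This page does not show how a parameter line is written, so the `NAME value` form used here is an assumption by analogy with the `DATALIB genbank` line; check it against the Full Instructions. The function names are hypothetical.

```python
MAXLINES_LIMIT = 50000  # upper limit for MAXLINES stated above


def option_lines(maxdocs=None, maxlines=None, titles=False,
                 startdoc=None, path=None):
    """Format optional parameters, one per line (assumed 'NAME value' form)."""
    lines = []
    if maxdocs is not None:
        lines.append("MAXDOCS {}".format(maxdocs))
    if maxlines is not None:
        if maxlines > MAXLINES_LIMIT:
            raise ValueError("MAXLINES may not exceed {}".format(MAXLINES_LIMIT))
        lines.append("MAXLINES {}".format(maxlines))
    if titles:
        lines.append("TITLES")  # takes no value
    if startdoc is not None:
        lines.append("STARTDOC {}".format(startdoc))
    if path:
        lines.append("PATH " + path)
    return lines


def paged_requests(datalib, search_line, total_docs, maxdocs=20):
    """Yield one request body per block of maxdocs documents."""
    for start in range(1, total_docs + 1, maxdocs):
        opts = option_lines(maxdocs=maxdocs, startdoc=start)
        body = ["DATALIB " + datalib] + opts + ["BEGIN", search_line]
        yield "\n".join(body) + "\n"


# Three messages cover 45 documents, 20 at a time: STARTDOC 1, 21 and 41.
for body in paged_requests("genbank", "kinase", total_docs=45):
    print(body)
```

Each yielded body would be sent as a separate mail message, which also keeps every reply under whatever size limit your mail system imposes.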