See the REVISION HISTORY for recent bugfixes and enhancements.
use Bio::UnivAln;
my $aln = Bio::UnivAln->new('t/alnfile.fasta');
$aln = Bio::UnivAln->new(-file=>'t/alnfile.aa',
-desc=>'Sample alignment',
-type=>'amino',
-ffmt=>'raw' # 1 line in file -> 1 sequence
);
$aln = Bio::UnivAln->new(-seqs=>"TCCCGCGTCAACTG\nTGGTGCTTCAACCG\nACTTG--TCAACTG");
$aln = Bio::UnivAln->new(-seqs=>[$sequence_strg,\@character_list,$bioSeqObject]);
$aln = Bio::UnivAln->new(-seqs=> ['ACCCGCGTCAACTG',
['A','G','G','G','G','C','T','T','C','A','A','C','C','G'],
Bio::PreSeq->new(-seq=>'ACTTG--TCAACTG')
]);
$aln = Bio::UnivAln->new($file,$seqs,$id,$desc,$names,$row_ids,$col_ids,
$row_descs,$col_descs,$numbering,$type,$ffmt,$descffmt,$inplace);
OUTPUT
$aln->ffmt('fasta'); # set default output format
print "\n aln in default format:\n", $aln->layout();
print "\n aln in raw format:\n", $aln->layout("raw");
print "\n aln in fasta format:\n", $aln->layout("fasta"), "\n";
print "\n aln in MSF format (via readseq):\n", $aln->layout("MSF"), "\n";
SLICING
my $alnSlice = $aln->seqs(1,3,1,2); # multiline string of rows 1-3, columns 1-2
print $alnSlice, "\n";
my @alnSlice = $aln->seqs([1..3], [1,4]); # multidimensional array
# of rows 1-3, columns 1+4
for $aref ( @alnSlice ) { print @$aref, "\n"; }
$alnSlice = $aln->seqs([3,2,3,1], [1,3..5]); # rows 3,2,3,1, cols 1,3..5
ADVANCED SLICING
sub has_purine {
my $str = join "", @{ $_[0] };
if ($str =~ /[AaGgRr]+/) {return 1;} else {return 0;}
}
@alnSlice = $aln->seqs([1,2,3], \&has_purine); # rows 1-3, and from these
# only the entries of columns for which has_purine returns 1
$alnSlice = $aln->seqs({ids=>'SeqA SeqB'},{ids=>'ColA ColB ColC'});
$aln->inplace(1); $aln->seqs({ids=>'A B'},[1..6]); $aln->inplace(0);
# manipulates the object itself, assigning the slice internally
MAPPING
@resSlice = $aln->map_r(\&has_purine, [1..3]); # 1,0,1 if row 1+3 has purine @resSlice = $aln->map_c(\&has_purine, [1,4]); # 1,0 if column 1 has purine
UTILITIES
$resSlice = $aln->consensus(); # 75% majority needed for consensus letter $resSlice = $aln->consensus(0.6, [1,3]); # 60% majority, columns 1+3 only $resSlice = $aln->var_sites(); # no columns that are invariable ... $indices = $aln->var_inds(); # ... and their indices $resSlice = $aln->invar_sites(); # only columns that are invariable $indices = $aln->invar_inds(); # (..inds() is available for most utilities) $resSlice = $aln->var_sites(0.6); # no columns with >= 60% maj. of 1 letter $resSlice = $aln->invar_sites(0.6); # only columns with >= 60% majority $resSlice = $aln->gap_free_sites(); $resSlice = $aln->no_allgap_sites(); # exclude sites that only have gaps $resSlice = $aln->reverse([1,3]); # reverse of rows 1+3 only $resSlice = $aln->complement([1,3]); # dna/rna complement of rows 1+3 only $resSlice = $aln->revcom([1,3]); # reverse complement, rows 1+3 only $resSlice = $aln->remove_gaps(); # original sequences without gaps
% perl Makefile.PL
% make
% make test
% make install
For intensive testing, see t/univaln.t2. Run that script via
% perl t/univaln.t2 > t/my_univaln
and compare the output with the file t/univaln.o . Expected error messages can be found in t/univaln2_expected_errors .
If you do not have superuser installation privileges, you can do one of two things:
1) Either copy the UnivAln.pm module into the ``Bio'' subdirectory of an accessible Perl module directory (e.g. /my/perl/lib/Bio/). (One possible ``accessible Perl module directory'' is your current working directory; there you can create a subdirectory named ``Bio'' and place UnivAln.pm in that subdirectory. Then, you can start scripts using Bio::UnivAln from your current working directory.)
2) Or, run the above commands but specify an alternate location for the module by supplying a PREFIX argument to the first command:
% perl Makefile.PL PREFIX=/my/perl/lib
The make install command may report problems with creating documentation files (pod2man/perllocal.pod); please find the documentation in UnivAln.pm.html instead.
If you have Bio::PreSeq installed and want to test Bio::PreSeq usage by Bio::UnivAln, you need enable the line ``use Bio::PreSeq;'' at the beginning of t/univaln.t2; Bio::PreSeq can be found via http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/ . Note that the test script will also test error handling; you can expect the error messages included in the file univaln.t2_expected_errors -- these are OK.
If you wish that the module uses Don Gilbert's readseq package for sequence format conversion (Version 1 Feb 1993), you can set the environment variable `READSEQ_DIR'`` appropriately. (Currently, only ``fasta'' and ``raw'' format are supported directly by UnivAln.pm.) Then, the program detects and uses `readseq' automatically, if it is in the specified directory (the default directory is ``./''). Modifying the environment variable `READSEQ' changes the expected name of the executable. For example, $ENV{READSEQ_DIR} may be ``/vol/biotools/bin/'' and $ENV{READSEQ} ``readseq2.0''. Readseq will give you support for PIR/CODATA, MSF/GCG and PAUP/NEXUS formats; ASN.1 does not seem to work reliably. (URLs: http://iubio.bio.indiana.edu/IUBio-Software+Data/molbio/readseq/ http://dot.imgen.bcm.tmc.edu:9331/seq-util/Help/readseq.html http://bimas.dcrt.nih.gov/molbio/readseq/formats.html )
Similar support for conversion from Clustal format, using Clustal as a converter, is implemented, but not properly tested and documented. The relevant environment variables are `CLUSTAL_DIR' and `CLUSTAL'. (URL: http://www-igbmc.u-strasbg.fr/BioInfo/ClustalW/Top.html )
(Thanks to Steve A. Chervitz for his help with bundling the module !)
(Most examples below are taken from the test script(s) that
can be found in directory ``t'' of the Bio::UnivAln distribution. There you
will also find a CGI script producing some graphics, which is currently in
alpha status: I suspect it needs some refitting to run on a different
server. If you'd like to know more about multiple alignments, in theory and
practice, check out the tutorial at http://www.techfak.uni-bielefeld.de/bcd/Curric/MulAli/mulali.html
)
my $aln = Bio::UnivAln->new('t/alnfile.fasta');
The first parameter is regarded as a file name; if you pass additional
parameters, they will overwrite the parameters read in from the file. You
can use named parameters; take a look at the documentation on the
new() method in the appendix for a list of all parameters, and
their names. In the following example, description, sequence type, and file
format are provided. The file format will relieve Bio::UnivAln from
guessing it; however, there are no guarantees if you bypass Bio::UnivAln's
guessing _and_ provide an incorrect file format.
$aln = Bio::UnivAln->new(-file=>'t/alnfile.aa',
-desc=>'Sample alignment',
-type=>'amino',
-ffmt=>'raw' # 1 line in file -> 1 sequence
);
If no description (``-desc'') is given, a default one will be based on the
file name. The format type is also the default format for output; if both
differ, you need to specify the input format when you construct the
$aln object, and then use the accessor ffmt() to
set the default output format. Bio::UnivAln can be passed the aligned
sequences directly, using the named parameter ``-seqs''. It takes a
multi-line string, or any mix of strings, array references, and Bio::PreSeq
objects:
$aln = Bio::UnivAln->new(-seqs=>"TCCCGCGTCAACTG\nTGGTGCTTCAACCG\nACTTG--TCAACTG");
$aln = Bio::UnivAln->new(-seqs=>[$sequence_strg,\@character_list,$bioSeqObject]);
$aln = Bio::UnivAln->new(-seqs=> ['ACCCGCGTCAACTG',
['A','G','G','G','G','C','T','T','C','A','A','C','C','G'],
Bio::PreSeq->new(-seq=>'ACTTG--TCAACTG')
]);
layout() method returns the sequence in a specified
format; supported formats are listed under the header Alignment Formats.
$aln->ffmt('fasta'); # set default output format
print "\n aln in default format:\n", $aln->layout();
print "\n aln in raw format:\n", $aln->layout("raw");
print "\n aln in fasta format:\n", $aln->layout("fasta"), "\n";
print "\n aln in MSF format (via readseq):\n", $aln->layout("MSF"), "\n";
seqs(). Here, intersection means that those elements are
returned that are both in rows 1-3 and in columns 1-2.
$alnSlice = $aln->seqs(1,3,1,2); # rows 1-3, columns 1-2 $alnSlice = $aln->seqs(); # returns the whole alignment
Here's a diagram illustrating the general case, intersecting rows
$y_lo to $y_hi, and columns $x_lo to $x_hi.
$x_lo $x_hi
: :
.. $y_lo ...:...........:.................
:::::::::::::
:: SELECTED :
::: PART ::::
:::::::::::::
.. $y_hi ...:::::::::::::.................
: :
Fig.1
Maximal intervals will be assumed if no parameters are provided. Per
default, the first row (sequence) has index 1 (not 0), and the first column
has index 1 (not 0). The latter can be modified using
numbering().
$alnSlice = $aln->seqs([1..3], [1,4]); # rows 1-3, columns 1+4
Here, letters in columns 2+3 will not be returned. Another example:
$alnSlice = $aln->seqs([3,2,3,1], [1,3..5]); # rows 3,2,3,1, cols 1,3..5
If you specify the empty list (``[]''), all rows/columns will be returned. The following diagram shows the case where the list of row indices is [$r1,$r2,$r3,$r4], and the list of column indices is [$c1,$c2,$c3].
: : :
: : :
............*....*..........*............. $r1
: : :
............*....*..........*............. $r2
............*....*..........*............. $r3
: : :
............*....*..........*............. $r4
: : :
: : :
$c1 $c2 $c3 Fig.2
Again, an element is selected for the slice if and only if it lies in the intersection of a row and a column which are both desired according to the index lists.
seqs() always returns a (multi-line) string in a
scalar context. In a list context, it returns an array; each element of
such an array is a reference to another array holding the letters of one
sequence, i.e. one single row.
@alnSlice = $aln->seqs([1..3], [1,4]); # rows 1-3, columns 1+4
for $aref ( @alnSlice ) { print @$aref, "\n"; }
If you use the result of an accessor or a utility function in the ``-seqs''
slot of new(), you may need to force that result into a scalar
context, because the accessor, etc, returns a list in a list context, and
the constructor naturally provides such a list context since it expects a
list of parameters.
$aln = new Bio::UnivAln(-seqs=>scalar($aln2->seqs()));
In the example above, the list-context return value of $aln2->seqs(),
i.e. the list of rows of $aln2, would be fed one by one as additional
parameters into the constructor, if you didn't ``protect'' it by
scalar(). You will be warned about the problem because
Bio::UnivAln detects any named parameters that it can't use.
Any list of desired row/column indices can be replaced by a hash of desired ids, which are recognized if they are in the object's own list of row or column ids. You need to pass a reference to a hash that has one key, ``ids'', and one value, which is a string containing the ids seperated by `` ''(blank) :
$alnSlice = $aln->seqs({ids=>'SeqA SeqB'},{ids=>'ColA ColB ColC'});
Row (sequence) ids are automatically extracted when reading fasta files and
Bio::PreSeq objects. Otherwise, they are set to the default numerical index
list, like (1..20) if the alignment has 20 rows. Since there's currently no
way to extract column (site) ids (none of the supported formats has this
feature), these always hold the default numerical indices. However, both
lists may be set using the accessors row_ids() and
col_ids(). Note that arbitrary numbering schemes can be
supported this way.
seqs() returns
exactly those letters which lie in a row (or column) for which your
function returns true. E.g. has_purine() returns true if the
row/column contains A, a, T, t, R, or r; the following
$alnSlice will contain only columns that have one of these
letters in them.
sub has_purine {
my $str = join "", @{ $_[0] };
if ($str =~ /[AaGgRr]+/) {return 1;} else {return 0;}
}
$alnSlice = $aln->seqs([1..3], \&has_purine); # rows 1-3, and from these
# only the columns for which has_purine returns 1
Similarly, the list of row indices, [1..3], could be replaced by a function, which is then used to designate the desired rows. (It has the same role that the expression EXPR has in the Perl code template grep EXPR, LIST, see the perlfunc manpage. Bio::UnivAln also provides the equivalent to map EXPR, LIST; this ``Mapping'' will be discussed soon.)
In other words, any list of indices can be replaced by the reference to a function r or c. The function then selects those elements (*) that lie in a row or column which meets a criterion, returning true if the criterion is met, and false otherwise. If both lists of indices are replaced by functions, the picture is like this:
: : :
: : :
............*....*..........*............. r(row) = true
: : :
............*....*..........*............. r(row) = true
............*....*..........*............. r(row) = true
: : :
............*....*..........*............. r(row) = true
: : :
: : :
c(column) c(column) c(column)
= true = true = true Fig.3
The function out_fasta observes all types of selectors that
seqs() does, and returns the result in fasta format :
print "\n aln in fasta format:\n", $aln->out_fasta([1..3],\&has_purine), "\n";
@resSlice = $aln->map_r(\&has_purine, [1..3]); # 1,0,1 if row 1+3 has purine @resSlice = $aln->map_c(\&has_purine, [1,4]); # 1,0 if column 1 has purine
As you may expect, maximal index lists (i.e, all rows / all columns) will
be assumed if no second parameter is provided. In the same way as before,
any one if the index lists may be replaced by a hash of ids, or a selector
function. In the section on User-defined Utility Functions, map_r() and map_c() are used to implement
user-defined consensus, reverse, complement, etc.
By the way, the following map is the same as @alnSlice =
$aln->seqs([1..3]) since mapping the identity function ``sub id { return
@_ }'' to desired row/column subsets and collecting the result
is just like slicing.
@alnSlice = $aln->map_r(\&id, [ 1..3 ]);
The number of rows/columns of an alignment can be obtained by using
height() and width() respectively. Ids and
descriptions of rows (sequences) and columns (sites) can be manipulated
using row_ids(), col_ids(),
row_descs() and col_descs(). Be warned that these accessors just return a reference to the array of
ids/descriptions; if you'd like to process the array without changing the
object's data, you need to create your own copy. This decision was taken
because especially the result of col_ids() can be huge, and
for a lot of applications a deep copy is unnecessary.
row_ids() and col_ids().
The following code sets the ``inplace'' flag, and overwrites the current alignment with the rows named A and B, and columns 1-6:
$aln->inplace(1);
$aln->seqs({ids=>'A B'},[1..6]);
$aln->inplace(0);
'inplace' manipulation is also available for most utility functions below,
with the notable exception of consensus(). A full list is
given in the description of inplace(), see the Appendix.
$resSlice = $aln->consensus();
calculates the consensus of the columns, i.e. those columns for which there exists a letter which has an absolute majority of 75% or more. The letter '!' designates the case that no consensus is given. To bypass the default threshold, write
$resSlice = $aln->consensus(0.6);
Note that for values smaller or equal to 0.5, two letters may have an equal (relative) majority, and the tie is currently broken arbitrarily. Thresholds larger than or equal to 1 imply that the site that has a consensus letter must be invariant.
$resSlice = $aln->consensus(1, [1..10]);
Just like seqs() and map_c()/map_r(),
consensus() can be passed a reference to a hash of desired
column ids, a function that selects the columns, or the desired column
indices themselves. (Here, it's columns 1 to 10. Internally,
map_c() is used to implement consensus().)
The methods var_sites() and invar_sites() are
using consensus(), just checking whether there is a consensus
letter ('invariable'), or not ('variable'). Naturally, the default
threshold is 1, i.e. sites must be truly invariable (100% majority of 1
letter), or truly variable (strictly less than 100% majority). Per default,
var_sites() and invar_sites() return a multiline string of rows (sequences), with elements from the desired columns (sites) only. They do not
return the alignment in a column-by-column fashion ! Also, don't be
confused by the possibilty to specify a list of desired row indices / ids,
or a selector function: It just constrains the output further, without
influencing which columns are selected. (Internally, seqs() is
used to implement these functions; it is passed a row selector, and a
function that returns true if there is a consensus letter.)
$resSlice = $aln->var_sites();
$resSlice = $aln->invar_sites();
$resSlice = $aln->var_sites(0.6, [1,3]); # no columns with >= 60% majority
# of one letter; also, print rows 1+3 only
$resSlice = $aln->invar_sites(0.6, [1,3]); # only columns with >= 60% maj.
$resSlice = $aln->no_allgap_sites([1,3]); # exclude sites that only have gaps
$resSlice = $aln->gap_free_sites([1,3]); # exclude sites that have >= 1 gaps
In a similar fashion, the last example selects gap-free columns, and from those only prints the elements that happen to be in rows 1+3. All these utilities support Inplace Manipulation:
$aln->inplace(1);
$aln->var_sites(0.6, {ids=>'1 2'});
$aln->inplace(0);
The utilities reverse(), complement(), and
revcom() also allow for Inplace Manipulation, and again you can specify the rows which shall form the new alignment by
overwriting the old, or be returned as reverse'd, complement'ed, or reverse
complement'ed :
$resSlice = $aln->reverse([1,3]); $resSlice = $aln->complement([1,3]); $resSlice = $aln->revcom([1,3]);
If an inplace manipulation reverses column order, e.g. in the case of
reverse(), this will be reflected in the column ids available
via col_ids(). Note that complement is defined according to
the IUPAC code, using the same substitutions that Bio::PreSeq uses, such
that results obtained for e.g. Amino Acid sequences are probably
nonsensical.
For all functions that have ``sites'' in their name, a corresponding ``inds'' function is available that returns the relevant array of indices instead of the sites themselves:
$indices = $aln->var_inds(); # array of indices of the variable sites $indices = $aln->no_allgap_inds([1,3]);
Finally, the original sequences (without gaps) are available via
remove_gaps().
$some_original_sequences = $aln->remove_gaps([1,3]);
In a list context, all these utility functions return an array of array
references, just like seqs(); the exception is
consensus(), which then returns a simple array of consensus
letters.
copy(), and compared for
equality modulo gaps with equal_nogaps().
Bio::UnivAln doesn't really care whether the sequences passed in are indeed
from an alignment (i.e. have the same length); sequence bags (i.e.
multisets of sequences where the same element may occur more than once) are
therefore handled, too. However, you need to be careful with some methods
(e.g. the accessor seqs(), and width()) because
their default behavior may depend on inspecting the first sequence only (or
the first requested sequence, if this information is available -- new
feature in 1.006) : If you use seqs() on a sequence bag, and
don't provide the number of columns explicitly, you may be surprised to
find out that the length of the first sequence is used as the default
length, and not the length of the longest sequence. width()
uses the same heuristic.
You can add gap symbols to the elements of a multiset so that each element
takes the length of the longest sequence by calling
equalize_length(); you can even pad more gap characters by
specifying the new width yourself, as an argument to
equalize_length().
If you need to have the alignment in an intermediate form, i.e. neither an array of array references, nor a single multiline string, but an array of strings, just request the output as a multiline string, and split it on ``\n''. For example, to just extract a single column, you can do
@col = split "\n", $aln->seqs([],[$colindex]);
Checking an alignment for characters that should not be there according to
the Alignment Type is currently not well-supported; see
alphabet_check() for a _preliminary_ way of doing manual
checkup, though.
The hash referenced by $names stores {loc,name} pairs of other
database locations and corresponding names where the alignment is located.
Currently, loc and name must both be set as text, and must consist entirely
of a string matching the regexp /^[A-Za-z]\w*$/. That is, they must be a
single ``word'' of at least one character, without a leading underscore.
This restriction is not enforced, but code which deviates is subject to
break in future releases. Note also that the object may place any other
sort of items in the name string, so users of this hash should not rely on
accessing entries conforming the requirements above.
maj_dominance returns false if all letters in a column/row are the same.
sub maj_dominance {
my $str = join "", @{ $_[0] };
my $first_res = @{ $_[0] }[0];
$str =~ s/$first_res//g;
if ($str) {return 0;} else {return 1;}
}
consensus returns the letter that has the (relative) majority among all residues, if it exceeds a certain threshold. (For simplicity, the threshold is hard-wired.)
sub consensus {
my @chars = @{ $_[0] };
my %temp = ();
my $threshold = 0.34; # more than 1/3
my @list = sort { $temp{$a}<=>$temp{$b} }
grep ++$temp{$_} >= $threshold * ($#chars+1), @chars;
# In case of a tie, it's not specified which residue is in $list[-1]
return (defined($list[-1]) ? $list[-1] : '!');
}
reverse_ and complement implement reversing a sequence (note the underscore; I'm looking for better ways to implement function passing), and complementing it according to IUPAC conventions.
sub reverse_ {
return [ reverse @{ $_[0] } ];
}
sub complement {
my $str = join "", @{ $_[0] };
$str =~ tr/acgtrymkswhbvdnxACGTRYMKSWHBVDNX/tgcayrkmswdvbhnxTGCAYRKMSWDVBHNX/;
return [ split "", $str ];
}
Here, the functions above are applied:
@resSlice = $aln->map_c(\&maj_dominance); print "\nDominated sites:\n", @resSlice, "\n";
@resSlice = $aln->map_c(\&consensus); @resSlice = $aln->map_c(\&consensus, \&has_purine); print "\nConsensus of the columns that have purine\n", @resSlice, "\n";
@resSlice = $aln->map_r(\&reverse_);
@resSlice = $aln->map_r(\&complement);
You can also parametrize your functions by using ``closures'' (See ``Programming Perl'', 2nd Ed., p.253). Basically, you set up a function that has the parameter built into it, and pass around the reference to that function. Here's a function that does the set-up:
sub _setup_consensus_with_threshold {
my $threshold = shift;
return sub {
my @chars = @{ $_[0] };
my %temp = ();
my @list = sort { $temp{$a}<=>$temp{$b} }
grep ++$temp{$_} >= $threshold * ($#chars+1), @chars;
# In case of a tie, it's not specified which residue is in $list[-1]
return (defined($list[-1]) ? $list[-1] : 'N');
}
}
Here, you create an instance of the function, incl. the parameter.
$consensus holds a reference to the new instance.
my $consensus = _setup_consensus_with_threshold(0.75);
and finally, you pass the reference to the function around, e.g. to
map_c().
@resSlice = $aln->map_c(\&$consensus);
(You can do pretty cute tricks using closures, e.g. you can set a counter to 1 in the setup function, and increment it in the ``real'' function.)
$self->{seqs} : An array of array references, each of which holds
one sequence of the alignment
$self->{id} : String specifying the ID; shall be in \w+ (i.e. composed
of characters in [a-zA-Z_0-9]; only \S+ is enforced, though)
$self->{desc} : String giving a description, (later) to be formatted
according to $descffmt
$self->{names} : A reference to a hash which stores {loc,name} pairs of
other database locations and corresponding names where
the alignment is located.
$self->{row_ids}: A reference to an array which stores row (sequence) ids
$self->{col_ids}: Same as $self->{row_ids}, for the columns (sites)
$self->{row_descs}: A reference to an array which stores row (sequence)
descriptions
$self->{col_descs}: Same as $self->{row_descs}, for the columns (sites)
$self->{numbering}: The offset of the first column, an integer
$self->{type} : The type of the alignment, concatenated from the molecule
type (see L<Alignment Types>) and a flag that is not
currently used, but intended to flag sequence bags
$self->{ffmt} : alignment format, see L<Alignment Formats>.
$self->{descffmt}: format of $desc; right now this should be ``raw''
or ``fasta'' which just implies that no specific
format is being followed, any text is allowed
excluding ``\n''(newline). More support is planned.
$self->{inplace}: Flag which is set to true if accessors (and utility
functions) should make the modification to the object
itself, and just return true on success. (See inplace().)
Some more helpful comments... In Perl, false is 0 or ``'', true is everything else. Not all internal functions have a POD/HTML documentation. If you read the code of this module, you need to be familiar with map and grep...
Better handling of access, copying and slicing of $self->{row_ids}, etc.
Add pointer to a UnivAln demo page based on the draft at http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/Docs/phylosnapshot.html.
Fix alphabet_check() (current workaround cannot be
generalized), Fix UnivAlnAlph{UnivAlnType{``unknown''}} (current setting is
not so nice; what's the easiest way to set it to all word characters ?),
Support more formats, especially Nexus, Improved sequence alphabet support;
not just providing a function alphabet_check() for manual
checking. Then, maybe use strings to represent alphabets. Test and document
aln() for returning a slice as an alignment.
Later:
Validity marker for correctly initialized/manipulated objects. Assign
_unique_ IDs if none are provided. Functions like has_seqs(),
etc. Using `undef' during initialization, and functions for regaining such
a state. Use Perl Data Language ?!
IN NO EVENT SHALL THE GLOBEWIDE NETWORK ACADEMY, THE VIRTUAL SCHOOL OF NATURAL SCIENCES, THE AUTHOR OR THE UNIVERSITY OF BIELEFELD BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS CODE, EVEN IF THE GLOBEWIDE NETWORK ACADEMY, THE VIRTUAL SCHOOL OF NATURAL SCIENCES, THE AUTHOR OR THE UNIVERSITY OF BIELEFELD HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
THE GLOBEWIDE NETWORK ACADEMY, THE VIRTUAL SCHOOL OF NATURAL SCIENCES, THE AUTHOR AND THE UNIVERSITY OF BIELEFELD SPECIFICALLY DISCLAIM ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE CODE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THERE IS NO OBLIGATION WHATSOEVER TO PROVIDE MAINTENANCEE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
Copyright (c) 1996, 1997, 1998 Georg Fuellen. All Rights Reserved. Some pieces of the code were contributed by Steven E. Brenner, Richard Resnick and Chris Dagdigian. Thanks !!!!
Technische Fakultaet - AG Praktische Informatik, Universitaet Bielefeld, D-33501 Bielefeld, Germany, georg.fuellen@uni-bielefeld.de
http://www.techfak.uni-bielefeld.de/~fuellen/
Fuellen, G. (1997). Bio::UnivAln - bioperl alignment object [WWW-Document]. URL http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/welcome.html
And please drop me a note :-) An article for the Perl Journal is planned. The page is mirrored at
http://merlin.mbcr.bcm.tmc.edu:8001/bcd/Perl/Bio/welcome.html http://www.biotech.ist.unige.it/bcd/Perl/Bio/welcome.html
Version 1.001 on 19 Feb 1997. Fixed a bug that triggered
_rowbounds() and _colbounds() to use maximal
index lists whenever the first index in an index list was 0. New example in
POD: $aln = new
Bio::UnivAln(-seqs=>scalar($aln2->var_sites())); Internal: Now
avoiding any global parameter passing by using closures, for the utility
functions.
Version 1.002 on 21 Feb 1997. Renamed the module to UnivAln, see the
discussion (Feb 1997) in the vsns-bcd-perl mailing list archive. Fixed
hopefully all problems in out_graph(), all of them triggered
by bugs/features of PGPLOT.
Version 1.003 on 25 Feb 1997. Added POD on using closures. Moved
out_graph() into the cgi-script t/univaln.cgi (this is
alpha-code !) so that ``use PGPLOT'' is no longer required.
Version 1.004 on 13 Mar 1997. Fixed bug with reading fasta files. Changed
file format identifiers ``Fasta'' and ``Raw'' to ``fasta'' and ``raw''.
Added support for arbitrary numeration schemes by providing access by
name/id for rows (sequences) and columns (sites). Made _arys()
and _strs() accept same row/column designations like
seqs(). Added 'inplace' manipulation for slicing and most
utilities. Improved speed a little. Deriving default description string
from file name, if available.
Version 1.005 on 18 Jun 1997. Using simple filename (``basename'') as default decription. Changed type identifiers ``Unknown'', ``Dna'', ``Rna'', ``Amino'' to ``unknown'', ``dna'', ``rna'', ``amino''. The old acronyms should still be supported for the forseeable future.
Version 1.006 on 15 Dec 1997. Added function
no_allgap_sites(); out_fasta() now accepts same
row/column designations like seqs(). For advanced users: the
width() of sequence bags can now be controlled by supplying
the index of the row whose width is taken -- by default the width of the
first row is used.
Version 1.007 on 15 Mar 1998. * Added functions that return indices instead
of actual sequence data: var_inds(),
invar_inds(), gap_free_inds(),
unknown_free_inds(), special_free_inds() and
no_allgap_inds() return the indices of (in)variable, gap-free,
unknown-free, gap+unknown-free and non-gap-only columns; they are the
counterparts of var_sites(), etc. * Added
remove_gaps() to return ungapped (original) sequence data *
Added equal_no_gaps() to check for equality ignoring gaps *
Added equalize_lengths() for padding a sequence bag with gaps
such that all rows have equal length (current procedure is slow) * The
default id is now ``_'' (was: ``No_Id_Given'') * Internal functions
_rowbounds() and _colbounds() now return indices
from the user's perspective, but WITHOUT substracting offset. Instead, all
functions subtract offset after applying
_rowbounds()/_colbounds()
Version 1.008 on 13 May 1998. * Added readseq conversion support, making it possible to read and write the following formats: MSF, Paup, PIR/Codata, ASN.1. (ASN.1 was not parsed successfully: readseq seems to be unable to read in its own ASN.1 output). Technically, readseq is now used to parse files that have been processed as ``raw'' before; now ``raw'' format is recognized using the expression /^[A-Z_0-9$_GAP_SYMBOL$_UNKN_SYMBOL\s]+$/im, i.e. the file may only have alphanumerical characters, gap and unknown-symbol, and whitespace. If commata, etc, are detected, readseq is used for parsing. Readseq itself seems to be unable to detect ``raw'' format in some cases, causing weird results. * Added Clustal support for parsing; still to be tested and documented.
Version 1.009 on 25 May 1998. * The module can now be ``built'' in the standard way (perl Makefile.PL, etc).
Usage : n/a (internal function)
Function : Assigns initial parameters to a blessed object.
Returns : 1 on success
Argument : As Bio::UnivAln->new, allows for named or listed parameters.
See ->new for the legal types of these values.
Usage : n/a (internal function)
Function : Rearranges named parameters to requested order.
Returns : @params - an array of parameters in the requested order.
Argument : $order : a reference to an array which describes the desired
order of the named parameters.
@param : an array of parameters, either as a list (in
which case the function simply returns the list),
or as an associative array (in which case the
function sorts the values according to @{$order}
and returns that new array.
Usage : $corrected_bounds = $aln->_rowbounds($uncorrected_bounds);
Function : create default row index list if necessary,
create row index list if specified by a hash of ids or
by a selector function that acts on rows and returns true/false,
check row index list for bounds errors,
NO LONGER DONE IN VERSION 1.007 AND HIGHER: substract offset of 1.
Returns : reference to corrected row index list
Argument : reference to uncorrected row index list
Usage : $corrected_bounds = $aln->_colbounds($uncorrected_bounds);
Function : create default column index list if necessary,
create column index list if specified by a hash of ids or
by a selector function that acts on columns and returns true/false,
check column index list for bounds errors,
NOT IN 1.007 and higher: substract offset (according to numbering scheme).
Returns : reference to corrected column index list
Argument : reference to uncorrected column index list
reference to row index list: its first row is used to
provide the width of the alignment considered, which matters
if the alignment is really a sequence bag
Usage : ($corrected_rowbounds,$corrected_colbounds) =
$aln->_fixbounds($uncorrected_rowbounds,$uncorrected_colbounds);
OR
($corrected_rowbounds,$corrected_colbounds) =
$aln->_fixbounds($firstpos1,$lastpos1,$firstpos2,$lastpos2)
Function : Convert unfixed index list information into the standard internal
one, allowing as input either max. 2 references or max. 4 boundary
coordinates (2 coord. for row indices and 2 for column indices).
Call functions to create maximal default index lists if needed,
to create index lists if specified by a hash of ids or by a selector
function that acts on rows/columns and returns true/false,
to check index lists for bounds errors, and to substract offsets.
Returns : 2 references to corrected index lists
Argument : EITHER max. 2 references to uncorrected index lists,
OR max. 4 boundary coordinates (integers)
Usage : @alnSlice = $aln->_select($rrowsel2,$rcolsel2);
Function : select elements from the 2-dimensional array $self->{'seqs'}
using index lists
Returns : array of references to array of characters
Argument : 2 index lists, one for the rows and one for the columns,
Comment : Here's a diagram of dependencies of some methods relying on _select:
seqs()
/ |
|/_ \|/
_arys() <-- _strs()
|
\|/
_select()
Usage : @alnSlice = $aln->_arys([$row1,$row2,$rowx],[col1,$col2,$colx])
(other usages see seqs())
Function : Same as seqs(), except that an array is returned always
Returns : array of references to array of characters
Argument : Same as seqs()
Usage : $alnSlice = $aln->_strs([$row1,$row2,$rowx],[col1,$col2,$colx])
(other usages see seqs())
Function : Same as seqs(), except that a multiline string is returned always
Returns : multiline string (including newline characters)
Argument : Same as seqs()
Usage : (1) $alnSlice = $aln->seqs($firstpos1,$lastpos1,$firstpos2,$lastpos2)
(2) $alnSlice = $aln->seqs([$row1,$row2,$rowx],[col1,$col2,$colx])
(3) $alnSlice = $aln->seqs(\&function_of_row,\&function_of_column)
(4) $alnSlice = $aln->seqs({ids=>'r1 r2 r3'},{ids=>'c1 c2 c3'})
[ (2),(3) and (4) can be intermixed ]
Function : (1) Returns part of an alignment, from row $firstpos1 to row
$lastpos1, and from column $firstpos2 to column $lastpos2
Missing parameters are replaced by default (maximal possible)
values, where the length of the first row of the _returned_ (*)
`alignment' determines the default length in case the alignment is
really a sequence bag. ((*) is a new feature in 1.006.)
(2) Returns part of an alignment, i.e. those elements that lie
in a row designated by an index from the first list
([$row1,$row2,$rowx]), and at the same time lie in a column
designated in the second list ([col1,$col2,$colx]).
The empty list (``[]'') is replaced by default (maximal possible)
values, where the length of the first row of the _returned_
`alignment' determines the default length in case the alignment is
really a sequence bag. (Note that the first element
of rows/columns has index 1 per default.)
(3) Instead of an index list, a function acting on a row / column
may be supplied; whenever the function returns true, the
row / column is designated.
(4) Instead of an index list, a hash of ids may be supplied;
the ids are looked up in the alignment's list of row (sequence)
ids / list of column (site) ids. The former list may be set
during the construction of the alignment (e.g. it may be read
from the fasta file, or the Bio::PreSeq objects), or it may be
manipulated using row_ids(). col_ids() sets the the latter list.
Returns : in a scalar context: multiline string (including newline characters)
in an array context: array of references to arrays of characters
Argument : (1) $firstpos1,$lastpos1,$firstpos2,$lastpos2 (all integers; note
that the first element of rows/columns has index 1 per default.)
(2-4) 2 selectors, one for the rows and one for the columns,
each of which may be a reference to a list of indices, or a
reference to a hash that has the format ``{ids=>'id1 id2 idx'}''
where ids is the mandatory key, and the value is a string
containing the desired ids, seperated by `` '' (space), or a
function that acts on a list and returns true/false.
Usage : n/a (internal function)
Function : apply a row function to selected rows,
then return the list of all return values
Returns : list of all return values
Argument : $rowf: Reference to the function that is applied to selected rows,
its results are put into a list and returned.
$rrowsel: Reference to the selector designating the rows
to which rowf is applied, as in cases (2)-(4) in seqs().
Usage : n/a (internal function)
Function : apply a column function to selected columns,
then return the list of all return values
Returns : list of all return values
Argument : analogous to _map_r()
Usage : @resSlice = $aln->map_r($rowf,$rrowsel);
Function : apply a function to selected rows,
then return the list of all return values
Returns : list of all return values
Argument : $rowf: Reference to the function that is applied to selected rows,
its results are put into a list and returned.
$rrowsel: Reference to the selector designating the rows
to which rowf is applied, as in cases (2)-(4) in seqs().
Usage : @resSlice = $aln->map_c($colf,$rcolsel,$rrowsel);
Function : apply a function to selected columns,
then return the list of all return values
Returns : list of all return values
Argument : $colf: Reference to the function that is applied to selected
columns, its results are put into a list and returned.
$rcolsel: Reference to the selector designating the columns
to which colf is applied, as in cases (2)-(4) in seqs().
$rrowsel: NOT used as a selector, but as a hint for
determining the width of a sequence bag if $rcolsel is undef:
The last index of the first row specified by $rrowsel is
taken as the maximum column index.
Usage : $cons_letters = $aln->consensus($threshold,$rcolsel);
Function : return the consensus of a (subset of) the columns; the letter '!'
($_NO_CONSENSUS_SYMBOL) indicates that no consensus letter exists
Returns : in a scalar context: string of consensus letters
in an array context: array of consensus letters
Argument : $threshold: A letter is considered consensus of a column
if the fraction of the letters in the column that form a
(relative) majority is >= $threshold. Ties between 2 letters with
an equal relative majority are broken arbitrarily.
The default value is 0.75.
$rcolsel: Reference to the selector designating the columns
of which the consensus is calculated, as in cases (2)-(4) in seqs().
Usage : $resSlice = $aln->var_sites($threshold,$rrowsel);
Function : return the variable sites of an alignment
Returns : in a scalar context: multiline string (including newline characters)
in an array context: array of references to arrays of characters
Argument : $threshold: A column is considered variable
if the fraction of the letters in the column that form a
(relative) majority is NOT >= $threshold.
The default $threshold is 1, i.e. only constant, INvariable columns
are excluded.
$rrowsel: Reference to the selector designating the rows
of which in turn the letters in the variable columns are printed,
as in cases (2)-(4) in seqs(). $rrowsel DOES NOT influence the
calculation of the variable sites, it just constrains the output
further !
Usage : $indices = $aln->var_inds($threshold,$rrowsel);
Function : return the _indices_ of the variable sites of an alignment
Returns : reference to array of indices
Argument : $threshold: A column is considered variable
if the fraction of the letters in the column that form a
(relative) majority is NOT >= $threshold.
The default $threshold is 1, i.e. only constant, INvariable columns
are excluded.
$rrowsel: the first row in $rrowsel is used for
determining the width of a sequence bag. In other words,
the last index of the first row specified by $rrowsel is
taken as the maximum column index.
Usage : $resSlice = $aln->invar_sites($threshold,$rrowsel);
Function : return the INvariable columns of an alignment
Returns : in a scalar context: multiline string (including newline characters)
in an array context: array of references to arrays of characters
Argument : $threshold: A column is considered INvariable
if the fraction of the letters in the column that form a
(relative) majority is >= $threshold.
The default $threshold is 1, i.e. no variability is allowed.
$rrowsel: see var_sites()
Comment : This is the indices-returning version of invar_sites(),
cf. var_sites() and var_inds().
Usage : $resSlice = $aln->gap_free_sites($rrowsel);
Function : return the gap-free columns of an alignment.
Returns : in a scalar context: multiline string (including newline characters)
in an array context: array of references to arrays of characters
Argument : $rrowsel: see var_sites()
Comment : This is the indices-returning version of gap_free_sites(),
cf. var_sites() and var_inds().
Usage : $resSlice = $aln->unknown_free_sites($rrowsel);
Function : return the unknown-free columns of an alignment.
Returns : in a scalar context: multiline string (including newline characters)
in an array context: array of references to arrays of characters
Argument : $rrowsel: see var_sites()
Comment : This is the indices-returning version of unknown_free_sites(),
cf. var_sites() and var_inds().
Usage : $resSlice = $aln->no_allgap_sites($rrowsel);
Function : return the columns which do not have gaps only, of an alignment.
Returns : in a scalar context: multiline string (including newline characters)
in an array context: array of references to arrays of characters
Argument : $rrowsel: see var_sites()
Comment : This is the indices-returning version of no_allgap_sites(),
cf. var_sites() and var_inds().
Usage : $resSlice = $aln->special_free_sites($rrowsel);
Function : return the special-free (neither gap nor unknown-symbols) columns
of an alignment.
Returns : in a scalar context: multiline string (including newline characters)
in an array context: array of references to arrays of characters
Argument : $rrowsel: see var_sites()
Comment : This is the indices-returning version of special_free_sites(),
cf. var_sites() and var_inds().
Usage : $resSlice = $aln->reverse($rrowsel);
Function : return the rows of an alignment, in reversed form (right->left)
Returns : in a scalar context: multiline string (including newline characters)
in an array context: array of references to arrays of characters
Argument : $rrowsel: Reference to the selector designating the rows
which are returned in reversed form, as in cases (2)-(4) in seqs().
Usage : $resSlice = $aln->remove_gaps($rrowsel);
Function : return the rows of an alignment, gaps removed (i.e. the original
sequences)
Returns : in a scalar context: multiline string (including newline characters)
in an array context: array of references to arrays of characters
Argument : $rrowsel: Reference to the selector designating the rows
which are returned without gaps, as in cases (2)-(4) in seqs().
Usage : $resSlice = $aln->complement($rrowsel);
Function : return the rows of an alignment, in complemented form
In the case of dna/rna, the complement is given according to the
IUPAC code; in other cases the result is currently calculated in the
same way and probably meaningless
Returns : in a scalar context: multiline string (including newline characters)
in an array context: array of references to arrays of characters
Argument : $rrowsel: see reverse()
Usage : $aln->revcom($rrowsel);
Function : return the rows of an alignment, in reversed complement form
In the case of dna/rna, the complement is given according to the
IUPAC code; otherwise the result is currently calculated in the
same way and probably nonsensical.
Returns : in a scalar context: multiline string (including newline characters)
in an array context: array of references to arrays of characters
Argument : $rrowsel: see reverse()
Usage : $aln->equal_nogaps($other_aln);
Function : checks whether two alignments have the same original sequences
(i.e. gaps are removed and the rows are compared for equality)
Returns : 1 if alignments are equal modulo gaps, 0 otherwise
Argument : $other_aln: the other alignment
Usage : $aln->equalize_lengths($width);
Function : modifies the alignment / sequence bag such that all rows
have length $width; in the case of sequence bags, this
is a primitive way to obtain a true alignment with
rows of equal length
* The current procedure is straightforward but slow *
Returns : 1 if alignments are equal modulo gaps, 0 otherwise
Argument : $width: the length until which sequences should be padded;
per default the length of the longest row is taken
Usage : @oldSeqs = $aln->_seqs(@sequences,$start)
@oldSeqs = $aln->_seqs($sequences,$start)
Function : to APPEND/OVERWRITE sequences to an alignment
Returns : old list of sequences, i.e. ($self->{'seqs'})
Argument : 1. EITHER a reference to a list of either Bio::PreSeq objects, or
arrays of letters, or strings, or any mix of these,
OR a single (multi-line) string
2. $rrowsel: Reference to the selector designating the rows
which shall be appended/overwritten (feature in alpha status)
Usage : $aln_id = $aln->id();
$aln->id($id_string);
Function : Accessor, also sets field if an ID is passed in.
Returns : (original) ID value
Argument : sequence string with no whitespace
Usage : $aln_desc = $aln->desc();
$aln->desc($desc_string);
Function : Accessor, also sets field if a description string is passed in.
Returns : (original) description value
Argument : sequence string
Usage : %names = $aln->names;
$aln->names($hash_ref)
Function : Accessor, also sets field if names hash is passed in.
The names hash is 'human-readable' data; each key is a
location (whether it be URL, database, database query, etc.)
and each value is the id at that location.
Returns : (original) names hash value
Argument : reference to a hash
Usage : $row_ids = $aln->row_ids();
$aln->row_ids($row_ids);
Function : Accessor, also sets field.
Returns : A reference to the (original) array of row (sequence) ids
Argument : A reference to an array of row (sequence) ids
Usage : $col_ids = $aln->col_ids();
$aln->col_ids($col_ids);
Function : Accessor, also sets field.
Returns : A reference to the (original) array of column (site) ids
Argument : A reference to an array of column (site) ids
Usage : $row_descs = $aln->row_descs();
$aln->row_descs($row_descs);
Function : Accessor, also sets field.
Returns : A reference to the (original) array of row (sequence) descriptions
Argument : A reference to an array of row (sequence) descriptions
Usage : $col_descs = $aln->col_descs();
$aln->col_descs($col_descs);
Function : Accessor, also sets field.
Returns : A reference to the (original) array of column (site) descriptions
Argument : A reference to an array of column (site) descriptions
Usage : $num_start = $aln->numbering;
$aln->numbering($value);
Function : Accessor, also sets field if a new numbering scheme is passed in.
Returns : (original) numbering value
Argument : number that is used as the offset of the first column
Usage : $aln_type = $aln->type;
$aln->type($value); # May be dangerous !
Function : Accessor, also sets field if a new type is passed in.
The latter is considered dangerous !
Returns : (original) type value
Argument : new type, see the list %UnivAlnTypes in the code
(currently, ``dna'', ``rna'', ``amino'')
Usage : $ffmt = $aln->ffmt;
$aln->ffmt($value);
Function : Accessor, also sets field if a new format acronym is passed in.
This can be done before reading from a file, so that
the presumably correct parsing routine is called, or
the value can be set before writing the object using
layout(), so that layout uses a specified default format.
Returns : (original) format acronym
Argument : string describing the format, see ``Alignment Formats''
Usage : $inplace = $aln->inplace();
$aln->inplace($value);
Function : Accessor, also sets field if a value is passed in.
'inplace' is a flag which is set to true if accessors and
functions should make the modification to the object itself,
and just return true on success. Currently, inplace manipulation
is supported for seqs(), _arys(), _strs(), remove_gaps(),
var_sites(), invar_sites(), gap_free_sites(), no_allgap_sites(),
reverse(), complement(), and revcom() (reverse complement).
Returns : (original) inplace value
Argument : currently 0 or 1
Usage : $width = $aln->width(); Function : number of columns (of the first row per default) Returns : number of columns (of the first row per default) Argument : row
Usage : $height = $aln->height(); Function : number of rows (sequences) Returns : number of rows (sequences) Argument : ./.
Usage : @offendig_characters = $aln->alphabet_check($rowsel);
Function : Check rows of the alignment for ``offending characters'', i.e.
characters that are not (currently) expected to be found in the
aligned sequences, because they're not in the default alphabet
that belongs to the specified L<Alignment Type>.
Currently, the Alignment Type can only be set explicitly
via the constructor, or accessor. Since the default alphabet
is not the IUPAC code (e.g. just A,C,G,T,-,? in case of DNA),
this is just a proof of concept.
Returns : an array containing the offending characters, one string per row
Argument : $rrowsel: Reference to the selector designating the rows
to which the check is applied, as in cases (2)-(4) in seqs().
Usage : n/a (internal function)
Function : Read data from file, and call the parsing system to store data
into the object fields
Returns : 1 on success
Argument : filename, and (optionally) the format if known.
Usage : $aln->_parse($ent,[$ffmt]);
Function : Parses $ent into the object fields, according to
$ffmt or $self->{ffmt}.
Returns : 1 on success
Argument : the prospective alignment to be parsed,
and optionally its format so that it doesn't need to be estimated
Note that ``raw'' is not estimated in a reliable way by readseq
if readseq is installed; I presume it is then estimated only if
the sequence(s) are longer than 111 basepairs
Usage : $aln->_parse_unknown($ent);
Function : tries to figure out the format of $ent and then
calls the appropriate function to parse it into $self->{'seqs'}.
Returns : 1 on success
Argument : $ent : the rough multi-line string to be parsed
Usage : $aln->_parse_bad; Function : Carp on the bad data that the user gave us. Returns : undef Argument : (multiline) string that cannot be parsed
Usage : $aln->_parse_readseq;
Function : Try readseq to parse data.
Returns : 1 on success
Argument : $ent : the rough multi-line string to be parsed
Usage : $aln->_parse_raw;
Function : parses $ent into the $self->{'seqs'} field, using raw
file format.
Returns : 1 on success
Argument : (multiline) string to be parsed
Usage : $aln->_parse_fasta;
Function : parses $ent into the 'seqs' field, using fasta
file format.
Returns : 1 on success
Argument : (multiline) string to be parsed
Usage : $copyOfObj = $myUnivAln->copy; Function : Returns an identical copy of the object. Returns : Bio::UnivAln Argument : n/a
Usage : $aln->layout($format);
Function : Returns the alignment in whichever format the user specifies,
or according to the "ffmt" field if the user does not specify
a format.
Returns : varies; "" if unsuccessful
Argument : $format (one of the formats as defined in $UnivAlnForm).
Usage : $aln->out_bad; Function : Carp if we don't know the output format. Returns : undef Argument : n/a
Usage : $aln->out_raw; Function : Returns the alignment in raw format. Returns : multiline string Argument : n/a
Usage : $aln->out_fasta;
Function : Returns the alignment as a string in fasta format.
Returns : multiline string
Argument : Same as seqs()
Comment : The old ``out_fasta()'' function with no arguments
is slightly faster and simpler; still available as out_fasta2().