PANDIT
Protein and Associated Nucleotide Domains with Inferred Trees
PANDIT release notes
Introduction
PANDIT is a collection of multiple sequence alignments and phylogenetic
trees. It contains corresponding amino acid and nucleotide sequence
alignments, with trees inferred from each alignment. The alignments in
PANDIT are based on those in the Pfam database, and each tree has been
inferred using automated, but reasonably advanced, methods.
See http://www.ebi.ac.uk/research/goldman/software/pandit
Recent changes
- PANDIT now includes an assessment of the quality of the protein alignments taken
from Pfam. This is done by building a profile hidden Markov model (HMM) from the
alignment using the hmmbuild program of Sean Eddy's HMMER package. The
alignment itself is then assessed using this profile HMM: hmmbuild
indicates whether each alignment column is considered to have been generated by a
match state (taken to mean that it represents homologous sites with common
ancestry) or by insert or delete states (taken to mean that its sites are
evolutionarily unrelated).
In PANDIT, this information is presented as a 'mask': for each alignment, a
string of 'x' and '.' characters indicating whether each alignment column is
considered reliable, i.e. contains homologous residues or nucleotides (indicated
by 'x'), or not (indicated by '.'). From PANDIT version 17.0, the phylogenies
inferred from both DNA and protein sequences are based only on those alignment
columns deemed reliable by this procedure.
- The methods used to find nucleotide sequences corresponding to the amino acid
sequences in Pfam were improved for the release of version 12.0. These methods
remain effective in version 17.0, despite the increasing size of sequence databases
and the correspondingly greater scope for errors. The 'hit rate' (the percentage of
amino acid sequences for which nucleotide sequences exist in PANDIT) is 96.3% in
version 17.0, compared with 96.1% in version 12.0.
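The masking procedure described above can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not PANDIT's own code: `derive_mask`, `apply_mask` and `expand_mask_to_codons` are hypothetical names; `derive_mask` replaces the hmmbuild step with a simple, unweighted residue-fraction threshold (0.5 by default, an assumed value), so its output will not in general match PANDIT's masks; and `expand_mask_to_codons` assumes that each amino acid column corresponds to one codon (three nucleotide columns) in the DNA alignment.

```python
# Hypothetical sketch: deriving and applying an 'x'/'.' column mask.
# Assumption: a column is reliable ('x') when at least min_residue_fraction
# of the sequences have a residue (not a gap) there. This is a simplified,
# unweighted stand-in for HMM match-column assignment, not PANDIT's method.

GAP_CHARS = set("-.")


def derive_mask(aligned_seqs, min_residue_fraction=0.5):
    """Return a mask with 'x' for reliable columns and '.' otherwise."""
    if not aligned_seqs:
        raise ValueError("need at least one aligned sequence")
    n_cols = len(aligned_seqs[0])
    if any(len(s) != n_cols for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    mask = []
    for col in range(n_cols):
        residues = sum(1 for s in aligned_seqs if s[col] not in GAP_CHARS)
        frac = residues / len(aligned_seqs)
        mask.append("x" if frac >= min_residue_fraction else ".")
    return "".join(mask)


def apply_mask(aligned_seqs, mask):
    """Keep only the alignment columns marked 'x' in the mask."""
    if set(mask) - {"x", "."}:
        raise ValueError("mask may contain only 'x' and '.' characters")
    if any(len(s) != len(mask) for s in aligned_seqs):
        raise ValueError("each aligned sequence must be as long as the mask")
    keep = [i for i, c in enumerate(mask) if c == "x"]
    return ["".join(s[i] for i in keep) for s in aligned_seqs]


def expand_mask_to_codons(protein_mask):
    """Assumption: one amino acid column maps to three nucleotide columns."""
    return "".join(c * 3 for c in protein_mask)


if __name__ == "__main__":
    protein = ["MK-LV", "MRALV", "MK-LI"]
    mask = derive_mask(protein)  # third column is mostly gaps
    print(mask)
    print(apply_mask(protein, mask))
    dna = ["ATGAAA---CTGGTT", "ATGCGTGCTCTGGTT", "ATGAAA---CTGATT"]
    print(apply_mask(dna, expand_mask_to_codons(mask)))
```

In this toy example the mostly-gapped third column is marked '.', so it is dropped from the protein alignment and, via the expanded codon mask, the corresponding three columns are dropped from the DNA alignment.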
Statistics
PANDIT came into being in November 2001 with release 6.2; PANDIT release numbers match those of Pfam (see the Pfam release notes).
Release | Date | PANDIT families | PANDIT DNA seqs | PANDIT total bases | PANDIT AA seqs | PANDIT total residues | Pfam-A.full families | Pfam-A.full AA seqs | Pfam-A.full total residues
6.2 | Nov 2001 | 2730 | 57863 | 28620216 | 64574 | 10527905 | 2773 | 260570 | 59749821
7.6 | Nov 2002 | 4341 | 82552 | 40579593 | 101230 | 16369705 | 4463 | 428237 | 97335159
12.0 | May 2004 | 7226 | 143926 | 66822105 | 149737 | 23122611 | 7316 | 898590 | 205356517
17.0 | Sep 2005 | 7738 | 174760 | 78644556 | 181448 | 27180232 | 7868 | 1321755 | 297821068
(Pfam-A.full: full alignments of curated Pfam families.)
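As a consistency check, the 'hit rate' quoted under Recent changes can be recomputed from the statistics table, assuming it is simply the PANDIT DNA seqs count divided by the PANDIT AA seqs count. The Python sketch below does this; `COUNTS` and `hit_rate` are illustrative names, and the counts are copied directly from the table.

```python
# PANDIT sequence counts copied from the statistics table.
# Each entry maps a release to (DNA seqs, AA seqs).
COUNTS = {
    "6.2": (57863, 64574),
    "7.6": (82552, 101230),
    "12.0": (143926, 149737),
    "17.0": (174760, 181448),
}


def hit_rate(dna_seqs: int, aa_seqs: int) -> float:
    """Percentage of amino acid sequences with a corresponding DNA sequence."""
    return 100.0 * dna_seqs / aa_seqs


if __name__ == "__main__":
    for release, (dna, aa) in COUNTS.items():
        print(f"{release}: {hit_rate(dna, aa):.1f}%")
```

For releases 12.0 and 17.0 this gives 96.1% and 96.3%, matching the figures quoted in the release notes, which supports reading the hit rate as this simple ratio.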
Description of files
Aside from the alignments and phylogenies that can be downloaded via the browser interface to PANDIT, the following files available for
download may be of interest:
PANDIT archive
Past releases of the PANDIT flatfile (containing all information in the database) are available for download from the archive site.
Current PANDIT team
Nick Goldman, Simon Whelan, Nicolas Rodriguez: EMBL-European Bioinformatics Institute, UK
Paul de Bakker: Department of Molecular Biology, Massachusetts General Hospital, Boston MA, USA
Copyright notice
PANDIT - A database of protein and associated nucleotide domains
with inferred trees
Copyright (C) 2001-2005 The PANDIT Team
This database is free; you can redistribute it and/or modify it under
the terms of the GNU Library General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version.
In summary, you are free to redistribute *verbatim* copies of PANDIT or
any PANDIT files in any way you like, including packaging PANDIT in
proprietary software, so long as your copy of PANDIT retains our
copyright notice and the GNU license. You may also make *modified*
copies of PANDIT and distribute them, but your derivative database must
be freely distributed under the GNU LGPL. Many academic freeware
licenses prohibit any form of commercial use. In contrast, the intent
of our license is that PANDIT should be freely available to both
industrial and academic researchers, including the use of the PANDIT
database in commercial software; however, proprietary modifications of
the PANDIT database itself are prohibited. Proprietary modification of
the PANDIT database is possible only by a separate formal licensing
agreement from the PANDIT team and our host institutions. See the file
GNULICENSE for the full text of the GNU Library General Public License.
This database is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Library General Public License for more details.
You may also obtain a copy of the GNU LGPL by writing to the Free
Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
02111-1307, USA.
PANDIT is maintained by a team of researchers. You can contact the
PANDIT team at:
pandit@ebi.ac.uk
The PANDIT Team, Sep 05