PANDIT Homepage
Protein and Associated Nucleotide Domains with Inferred Trees

PANDIT release notes
PANDIT is a collection of multiple sequence alignments and phylogenetic trees. It contains corresponding amino acid and nucleotide sequence alignments, with trees inferred from each alignment. The alignments in PANDIT are based on the Pfam database. For each alignment, a phylogenetic tree has been inferred using automated, but reasonably advanced, methods. See http://www.ebi.ac.uk/research/goldman/software/pandit

Recent changes
  • PANDIT now includes an assessment of the quality of the protein alignments taken
    from Pfam. A profile hidden Markov model (HMM) is built from each alignment using
    the hmmbuild program of Sean Eddy's HMMER package. The alignment is then itself
    assessed using this profile HMM, and hmmbuild indicates whether each alignment
    column is considered to have been generated by a match state (taken to mean that
    it represents homologous sites with common ancestry) or by insert or delete
    states (taken to mean that the sites are evolutionarily unrelated).
    In PANDIT, this information is presented as a 'mask' which, for each alignment,
    is a sequence of 'x' and '.' characters indicating whether each alignment column
    is considered reliable, i.e. contains homologous residues or nucleotides
    (indicated by 'x'), or not (indicated by '.'). From PANDIT version 17.0,
    inferred phylogenies from both DNA and protein sequences are based only on those
    alignment columns deemed reliable according to this procedure.
  • The methods used to find nucleotide sequences corresponding to amino acid sequences
    in Pfam were improved for the release of version 12.0. These methods remain
    effective in version 17.0, despite the increasing size of sequence databases and the
    corresponding increase in possibilities for errors. The 'hit rate' (the percentage
    of amino acid sequences for which nucleotide sequences exist in PANDIT) in version
    17.0 is 96.3% (compared with 96.1% in version 12.0).
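
The mask described in the first item above lends itself to a simple column filter. The following is a minimal Python sketch, not PANDIT's own code: the function name apply_mask, the dictionary representation of an alignment and the toy sequences are all assumptions made for illustration. Only the 'x' / '.' mask convention comes from the notes above.

```python
def apply_mask(alignment, mask):
    """Keep only the alignment columns marked 'x' in the mask.

    alignment: dict mapping sequence name -> aligned sequence string
               (hypothetical representation, chosen for this sketch)
    mask:      string of 'x' (reliable column) and '.' (unreliable column)
    """
    # Indices of the columns flagged as reliable.
    keep = [i for i, flag in enumerate(mask) if flag == "x"]
    masked = {}
    for name, seq in alignment.items():
        # Every aligned sequence must be exactly as long as the mask.
        if len(seq) != len(mask):
            raise ValueError(
                f"{name}: length {len(seq)} does not match mask length {len(mask)}"
            )
        masked[name] = "".join(seq[i] for i in keep)
    return masked


# Toy alignment invented for illustration; it is not taken from PANDIT.
toy_alignment = {
    "seq1": "MK-LVA",
    "seq2": "MRGLVA",
    "seq3": "MK-IV-",
}
toy_mask = "xx.xx."

masked = apply_mask(toy_alignment, toy_mask)
for name, seq in masked.items():
    print(name, seq)
```

The same function could be applied to a nucleotide alignment, provided the mask supplied has one character per nucleotide column.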

PANDIT came into being in November 2001 with release 6.2. Release numbers are chosen to match Pfam release numbers (see the Pfam release notes).
                 | PANDIT                                                   | Pfam-A.full (full alignments of curated families)
Release  Date    | Families  DNA seqs  Total bases  AA seqs  Total residues | Families  AA seqs  Total residues

6.2      Nov 01  | 2730      57863     28620216     64574    10527905       | 2773      260570   59749821
7.6      Nov 02  | 4341      82552     40579593     101230   16369705       | 4463      428237   97335159
12.0     May 04  | 7226      143926    66822105     149737   23122611       | 7316      898590   205356517
17.0     Sep 05  | 7738      174760    78644556     181448   27180232       | 7868      1321755  297821068
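
The 'hit rate' quoted in the recent changes can be checked against the PANDIT columns of the table above. The sketch below assumes that the hit rate is the number of PANDIT DNA sequences divided by the number of PANDIT amino acid sequences. That reading is not stated explicitly in these notes, but it reproduces the quoted figures for versions 12.0 and 17.0. The variable and function names are illustrative only.

```python
# (DNA seqs, AA seqs) for each release, copied from the PANDIT columns above.
releases = {
    "6.2": (57863, 64574),
    "7.6": (82552, 101230),
    "12.0": (143926, 149737),
    "17.0": (174760, 181448),
}


def hit_rate(dna_seqs, aa_seqs):
    """Percentage of amino acid sequences that have a nucleotide sequence.

    Assumed definition: 100 * DNA seqs / AA seqs.
    """
    return 100.0 * dna_seqs / aa_seqs


# Round to one decimal place, matching the precision quoted in the notes.
rates = {rel: round(hit_rate(dna, aa), 1) for rel, (dna, aa) in releases.items()}

for rel, rate in rates.items():
    print(f"release {rel}: {rate}%")
```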

Description of files
Aside from downloading alignments and phylogenies via the browser interface to PANDIT, the following files available for
download may be of interest:

Pandit archive
Past releases of the PANDIT flatfile (containing all information in the database) are available for download from the archive site.

Current PANDIT team
Nick Goldman, Simon Whelan, Nicolas Rodriguez: EMBL-European Bioinformatics Institute, UK
Paul de Bakker: Department of Molecular Biology, Massachusetts General Hospital, Boston MA, USA

Copyright notice
PANDIT - A database of protein and associated nucleotide domains
         with inferred trees
Copyright (C) 2001-2005 The PANDIT Team

This database is free; you can redistribute it and/or modify it under
the terms of the GNU Library General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version.

In summary, you are free to redistribute *verbatim* copies of PANDIT or
any PANDIT files in any way you like, including packaging PANDIT in
proprietary software, so long as your copy of PANDIT retains our
copyright notice and the GNU license.  You may also make *modified*
copies of PANDIT and distribute them, but your derivative database must
be freely distributed under the GNU LGPL.  Many academic freeware
licenses prohibit any form of commercial use.  In contrast, the intent
of our license is that PANDIT should be freely available to both
industrial and academic researchers, including the use of the PANDIT
database in commercial software; however, proprietary modifications of
the PANDIT database itself are prohibited.  Proprietary modification of
the PANDIT database is possible only by a separate formal licensing
agreement from the PANDIT team and our host institutions.  See the file
GNULICENSE for the full text of the GNU Library General Public License.

This database is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Library General Public License for more details.

You may also obtain a copy of the GNU LGPL by writing to the Free
Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
02111-1307, USA.

PANDIT is maintained by a team of researchers.  You can contact the
PANDIT team at:

The PANDIT Team, Sep 05