The PDBe search database documentation
Terminology
The PDBe search database documentation is organised based on standard notions and terminology from relational database theory.
- Marts: These are sections of the database whose contents are logically closely coupled and which may be considered separate data sets.
The idea behind marts is that they can be optionally included in or excluded from the database, based on user needs, without affecting the use of
the remaining ones.
Defining the borders of a mart is not always straightforward; organising the database into marts takes into account the application area and
the information management needs. Information belonging to different marts may be interrelated, but such relations are generally
loose. Marts are also organised in hierarchies based on these interrelations.
The PDBe search database documentation provides an overview diagram that reflects this hierarchy, as well as a navigation tree that may be used as a starting point for exploring marts.
The "Database Sections (Marts)" option has two buttons: one previews the overview diagram of the marts, while the other previews a listing of the marts together with a brief explanation of the contents and purpose of each mart.
Similarly, each mart has two buttons: one previews the overview entity-relationship diagram for that mart, while the other previews a detailed listing of the mart's entities, attributes and relations.
- Entities: The information inside a mart is organised in entities. An entity is a structured information item that is composed of attributes and has relations. From an information management point of view, an "Entity" is equivalent to a "Class", a "Structured Type" or a "Record type", while an instance of an Entity ("Entity instance") is equivalent to an object or a record.
- Attributes: Attributes are data items of a simple type (such as numbers or strings) that belong to an entity. An attribute is equivalent to a member variable, or to a column or field of a record.
A specific attribute (or list of attributes) is the "reference" attribute of an entity and always uniquely identifies an instance (primary key). The PDBe search database uses abstract primary keys that are simply arbitrary numbers and carry no significant information (similar to pointers). Relations between entities are implemented using these abstract keys.
An attribute (or list of attributes) may constitute a name for an instance (naming attribute). This is a natural way to identify instances that carries significant meaning (unique key).
An attribute may also be "not visible" for the purposes of reporting and querying, usually because it has no significant meaning (it is internal). Alternatively, it may be a "summary" attribute, which holds short, descriptive summary information and should be included by default in listings of instances.
- Relations: Relations are meaningful references between instances of entities. The idea is equivalent to references between objects, or to pointers. Relations are normally implemented by sharing common values in attributes of two different entities.
A relation may be the containment relation for an entity, which is used to define a tree hierarchy of contained entities that is useful for cloning and exporting information.
A relation always has a reverse relation, which belongs to the entity that the relation refers to (relations come in pairs).
A relation may refer to an entity in the same mart, or it may be external, referring to an entity in a different mart. A relation may also be "not visible", usually because it does not have any significant meaning.
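The terminology above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The tables `entry` and `chain`, their columns and the sample values are hypothetical and chosen only for illustration; they are not taken from the actual PDBe search database schema. The sketch shows an abstract primary key, a naming attribute (unique key), a relation implemented by sharing a key value between two entities, and the corresponding reverse relation.

```python
import sqlite3

# Hypothetical schema for illustration only; table and column names are
# assumptions, not the real PDBe search database schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (
    entry_key INTEGER PRIMARY KEY,   -- abstract primary key (arbitrary number)
    code      TEXT NOT NULL UNIQUE   -- naming attribute (unique key)
);
CREATE TABLE chain (
    chain_key INTEGER PRIMARY KEY,   -- abstract primary key
    entry_key INTEGER NOT NULL REFERENCES entry(entry_key),  -- relation to entry
    label     TEXT NOT NULL,
    UNIQUE (entry_key, label)        -- a chain is named within its entry
);
""")
conn.execute("INSERT INTO entry VALUES (101, 'E1')")
conn.executemany("INSERT INTO chain VALUES (?, ?, ?)",
                 [(5001, 101, "A"), (5002, 101, "B")])


def entry_of_chain(chain_key):
    """Follow the relation chain -> entry (each chain refers to one entry)."""
    row = conn.execute(
        "SELECT e.code FROM chain c "
        "JOIN entry e ON e.entry_key = c.entry_key "
        "WHERE c.chain_key = ?", (chain_key,)).fetchone()
    return row[0] if row else None


def chains_of_entry(code):
    """Follow the reverse relation entry -> chain (cardinality: many)."""
    rows = conn.execute(
        "SELECT c.label FROM entry e "
        "JOIN chain c ON c.entry_key = e.entry_key "
        "WHERE e.code = ? ORDER BY c.label", (code,)).fetchall()
    return [r[0] for r in rows]
```

In this sketch the numeric keys carry no meaning of their own, so a user would normally look an instance up by its naming attribute (`code`) and let the shared `entry_key` value do the linking.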
By clicking on an entry of a mart in the "mart navigation tree" one may preview the "entity tree", which allows navigation through the attributes and relations of an entity as well as
access to the detailed documentation of an entity and its attributes and relations.
Another way to perform the same task is to click on a "box" representing an entity in the entity-relationship diagram of a mart.
The detailed documentation for entities, attributes and relations includes, apart from descriptions and names, the following:
Entity Details:
- A=Number of attributes of the Entity
- R=Number of relations of the Entity
- T=Name of the database table
- I=Approximation of the number of instances of the entity
Attribute Details:
- Type of the attribute: String, Integer, Number, Date or Unknown
- C=Name of the database column
- S=Maximum size of the attribute
- A=Actual average size used for the attribute
- The attribute is part of the name of an instance
- The attribute is part of the reference key of an instance
- The attribute is not supposed to be visible or used for queries
- The attribute is supposed to be used in summary reports (lists) for the entity
Relation Details:
- Cardinality of the relation: Optional, Many
- Reverse relation of the entity that the relation refers to
- Entity with which this relation establishes an association (reverse entity)
- The relation is the containment relation of the entity
- The relation is associated with an external entity from a different mart
- The relation is not supposed to be visible or used for queries
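As a rough illustration of how the details listed above fit together, the following sketch models them as Python data classes. All class names, field names and sample values are hypothetical; the documentation does not publish such an API, and this is only one possible way to represent the same information.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttributeDoc:
    name: str
    type: str                       # "String", "Integer", "Number", "Date" or "Unknown"
    column: str                     # C: name of the database column
    max_size: int                   # S: maximum size of the attribute
    avg_size: float                 # A: actual average size used
    in_name: bool = False           # part of the name of an instance
    in_reference_key: bool = False  # part of the reference key of an instance
    visible: bool = True            # False: not meant to be shown or queried
    summary: bool = False           # used in summary reports (lists)


@dataclass
class RelationDoc:
    name: str
    target_entity: str              # entity the relation refers to
    reverse_relation: str           # name of the paired reverse relation
    optional: bool = False          # cardinality flag: Optional
    many: bool = False              # cardinality flag: Many
    containment: bool = False       # containment relation of the entity
    external: bool = False          # target entity lives in a different mart
    visible: bool = True


@dataclass
class EntityDoc:
    name: str
    table: str                      # T: name of the database table
    approx_instances: int           # I: approximate number of instances
    attributes: List[AttributeDoc] = field(default_factory=list)
    relations: List[RelationDoc] = field(default_factory=list)

    def header(self) -> str:
        """Format the entity details as A=, R=, T= and I= values."""
        return (f"A={len(self.attributes)} R={len(self.relations)} "
                f"T={self.table} I={self.approx_instances}")

    def summary_attributes(self) -> List[str]:
        """Names of visible attributes flagged for summary listings."""
        return [a.name for a in self.attributes if a.summary and a.visible]


# Hypothetical example entity, for illustration only.
example = EntityDoc(
    name="Entry", table="ENTRY", approx_instances=1000,
    attributes=[
        AttributeDoc("entry_key", "Integer", "ENTRY_KEY", 10, 6.0,
                     in_reference_key=True, visible=False),
        AttributeDoc("code", "String", "CODE", 4, 4.0,
                     in_name=True, summary=True),
    ],
    relations=[RelationDoc("chains", "Chain", "entry", many=True)],
)
print(example.header())
```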
Data warehousing
The PDBe search database follows the ideas of data warehousing: a user-friendly, denormalised database that includes precalculated and summarised information in order to simplify searching and improve performance for data analysis and data mining.
This means that it deliberately breaks the rules of data normalisation: selected attributes are replicated across different entities in order to avoid complex, deep joins.
It avoids the problems of data inconsistency because it is simply a replica of the PDBe internal normalised database, which is designed to guarantee data consistency and referential integrity.
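The trade-off described above can be sketched with Python's built-in sqlite3 module. The table names (`src_entry`, `src_chain`, `search_chain`, `search_entry`), their columns and the sample rows are assumptions made for illustration and do not reflect the real PDBe schemas. A normalised source is copied into search tables that replicate selected attributes and store precalculated summaries, so that typical searches need no join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalised source (hypothetical): each fact is stored exactly once.
CREATE TABLE src_entry (
    entry_key INTEGER PRIMARY KEY,
    code      TEXT,
    method    TEXT
);
CREATE TABLE src_chain (
    chain_key INTEGER PRIMARY KEY,
    entry_key INTEGER REFERENCES src_entry(entry_key),
    label     TEXT,
    length    INTEGER
);
INSERT INTO src_entry VALUES (1, 'E1', 'X-ray');
INSERT INTO src_entry VALUES (2, 'E2', 'NMR');
INSERT INTO src_chain VALUES (10, 1, 'A', 120);
INSERT INTO src_chain VALUES (11, 1, 'B', 80);
INSERT INTO src_chain VALUES (12, 2, 'A', 60);

-- Denormalised replica: entry attributes are copied onto every chain row.
CREATE TABLE search_chain AS
SELECT c.chain_key, c.label, c.length,
       e.code AS entry_code, e.method AS entry_method
FROM src_chain c JOIN src_entry e ON e.entry_key = c.entry_key;

-- Precalculated summary attributes stored alongside each entry.
CREATE TABLE search_entry AS
SELECT e.entry_key, e.code, e.method,
       COUNT(c.chain_key) AS chain_count,
       SUM(c.length)      AS total_length
FROM src_entry e LEFT JOIN src_chain c ON c.entry_key = e.entry_key
GROUP BY e.entry_key, e.code, e.method;
""")


def chains_by_method_joined(method):
    """Query the normalised source: requires a join."""
    return conn.execute(
        "SELECT e.code, c.label FROM src_chain c "
        "JOIN src_entry e ON e.entry_key = c.entry_key "
        "WHERE e.method = ? ORDER BY e.code, c.label", (method,)).fetchall()


def chains_by_method_flat(method):
    """Query the denormalised replica: a single-table lookup."""
    return conn.execute(
        "SELECT entry_code, label FROM search_chain "
        "WHERE entry_method = ? ORDER BY entry_code, label",
        (method,)).fetchall()


def entry_summary(code):
    """Read the precalculated chain count and total length for one entry."""
    return conn.execute(
        "SELECT chain_count, total_length FROM search_entry WHERE code = ?",
        (code,)).fetchone()
```

Because the search tables in this sketch are rebuilt from the source rather than edited directly, the replicated values cannot drift apart from the originals, which mirrors the consistency argument made above.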