Overlapping homologous superfamilies
The Overlapping entries section represents the relationship between Homologous superfamilies and Family or Domain InterPro entries. It is calculated by analysing the overlap between matched sequence sets.
An InterPro entry (IPR with type family, domain, repeat or site) is considered related to a homologous superfamily if:
- their sequence matches overlap (i.e. the match positions fall within the homologous superfamily boundaries)
- the Jaccard index (for equivalent entries) or the containment index (for parent/child entries) of the matching sequence sets is at least 0.75
What are the union and the intersection of two datasets?
- The union (IPR1 ∪ IPR2) is the number of unique proteins found in the two datasets
- The intersection (IPR1 ∩ IPR2 or IPR2 ∩ IPR1) is the number of overlapping domains in the proteins common to both datasets
How do we know if the protein domains are intersecting?
For the proteins common to two entries of interest, the domains are considered to intersect if their matches overlap. This is checked by testing whether the midpoint of the match from one entry lies between the boundaries of the match from the other entry.
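The midpoint test and the resulting union and intersection counts can be sketched as follows. This is a minimal illustration, not InterPro's actual implementation. The names `midpoint_within`, `matches_overlap` and `union_and_intersection`, the data layout, and the accessions `P1` to `P3` are hypothetical. It also assumes that the test is applied in both directions, and that a common protein is counted once if at least one pair of matches overlaps.

```python
from typing import Dict, List, Tuple

Match = Tuple[int, int]           # (start, end), inclusive positions
Matches = Dict[str, List[Match]]  # protein accession -> matches of one entry


def midpoint_within(a: Match, b: Match) -> bool:
    """True if the midpoint of match `a` lies within the boundaries of `b`."""
    midpoint = (a[0] + a[1]) / 2
    return b[0] <= midpoint <= b[1]


def matches_overlap(a: Match, b: Match) -> bool:
    # Assumption: the midpoint test is tried in both directions.
    return midpoint_within(a, b) or midpoint_within(b, a)


def union_and_intersection(entry1: Matches, entry2: Matches) -> Tuple[int, int]:
    # Union: unique proteins found in either dataset.
    union = len(entry1.keys() | entry2.keys())
    # Intersection (assumption): common proteins in which at least one
    # pair of matches passes the midpoint test.
    intersection = sum(
        1
        for protein in entry1.keys() & entry2.keys()
        if any(
            matches_overlap(m1, m2)
            for m1 in entry1[protein]
            for m2 in entry2[protein]
        )
    )
    return union, intersection


# Hypothetical data: P1 overlaps, P2 does not, P3 is only in entry2.
entry1 = {"P1": [(10, 100)], "P2": [(5, 50)]}
entry2 = {"P1": [(20, 90)], "P2": [(200, 300)], "P3": [(1, 40)]}
print(union_and_intersection(entry1, entry2))  # (3, 1)
```

In this example the midpoint of the `P1` match in the first entry (55) falls inside the match from the second entry (20 to 90), so `P1` contributes to the intersection, while the two `P2` matches are far apart and do not.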
There are three possible scenarios:

Determining overlapping entries
Once the intersection and union of the two datasets have been determined, the Jaccard and containment indexes can be calculated.
- Two entries are equivalent if the Jaccard index is equal to or higher than 0.75
- Two entries have a parent/child relationship if the containment index is equal to or higher than 0.75
Both equivalent and parent/child entries are shown in the Overlapping entries section.
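The calculation above can be illustrated with a short sketch. It uses the standard definitions: the Jaccard index is the intersection divided by the union, and the containment index is the intersection divided by the size of one set. The function names, and the choices to measure containment against the smaller dataset and to test for equivalence first, are assumptions made for illustration, not a description of InterPro's code.

```python
from typing import Optional

THRESHOLD = 0.75  # cut-off used for both indexes


def jaccard_index(intersection: int, union: int) -> float:
    """Intersection over union; 0.0 when the union is empty."""
    return intersection / union if union else 0.0


def containment_index(intersection: int, size: int) -> float:
    """Intersection over the size of one dataset; 0.0 when it is empty."""
    return intersection / size if size else 0.0


def classify(intersection: int, union: int, size1: int, size2: int,
             threshold: float = THRESHOLD) -> Optional[str]:
    """Return 'equivalent', 'parent/child', or None if the entries are unrelated."""
    # Equivalence is tested first (assumption): a high Jaccard index
    # also implies a high containment index.
    if jaccard_index(intersection, union) >= threshold:
        return "equivalent"
    # Assumption: containment is measured against the smaller dataset,
    # i.e. how much of the candidate child falls inside the parent.
    if containment_index(intersection, min(size1, size2)) >= threshold:
        return "parent/child"
    return None


print(classify(80, 100, 90, 90))   # equivalent   (Jaccard 0.8)
print(classify(40, 200, 50, 190))  # parent/child (Jaccard 0.2, containment 0.8)
print(classify(10, 100, 50, 60))   # None         (both below 0.75)
```

Here `size1` and `size2` are the numbers of proteins matched by each entry, and `intersection` and `union` are the counts described in the previous sections.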