
A guided example

This guided example walks through the calculations used to identify overlapping entries.

Let’s consider the following sets:

  • IPR1 = 19539 proteins
  • IPR2 = 37052 proteins

Number of unique proteins (Union):

  • IPR1 ∪ IPR2 = 38081

Intersecting proteins:

  • IPR1 ∩ IPR2 = 17534
  • IPR2 ∩ IPR1 = 16950

Figure 9: Example input data.

Jaccard index score calculation

Using the Jaccard index formula, we can calculate:

  • Jaccard index (IPR1 ∩ IPR2) = 17534/38081 = 0.46
  • Jaccard index (IPR2 ∩ IPR1) = 16950/38081 = 0.45

For consistency, the average of the two Jaccard indices is used: JI avg = (0.46 + 0.45) / 2 = 0.455
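The same arithmetic can be reproduced in a few lines of Python. This is a minimal sketch that simply plugs in the counts listed above; the variable names and the `jaccard` helper are hypothetical and are not part of any InterPro tool or API.

```python
# Counts taken from the guided example (not computed from real protein sets).
union = 38081        # |IPR1 ∪ IPR2|: unique proteins across both entries
overlap_1_2 = 17534  # IPR1 ∩ IPR2, as reported in the example
overlap_2_1 = 16950  # IPR2 ∩ IPR1, as reported in the example


def jaccard(intersection: int, union_size: int) -> float:
    """Jaccard index: intersection size divided by union size."""
    return intersection / union_size


ji_1_2 = jaccard(overlap_1_2, union)
ji_2_1 = jaccard(overlap_2_1, union)
ji_avg = (ji_1_2 + ji_2_1) / 2  # average of the two directional scores

print(f"{ji_1_2:.2f} {ji_2_1:.2f} {ji_avg:.2f}")
```

Averaging the unrounded indices gives about 0.453, while averaging the rounded values 0.46 and 0.45, as done in the text, gives 0.455. Either way the average stays well below 0.75.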

Containment score calculation

Applying the containment index formula gives:

  • Containment (IPR1, IPR2) = 17534/19539 = 0.90
  • Containment (IPR2, IPR1) = 16950/37052 = 0.46
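A matching sketch for the containment scores, again using only the counts from the example. It assumes, as the numbers above do, that Containment (A, B) divides the A-side intersection count by the number of proteins in A; the `containment` helper is a hypothetical name used for illustration.

```python
# Counts taken from the guided example.
size_ipr1 = 19539    # proteins in IPR1
size_ipr2 = 37052    # proteins in IPR2
overlap_1_2 = 17534  # IPR1 ∩ IPR2, as reported
overlap_2_1 = 16950  # IPR2 ∩ IPR1, as reported


def containment(intersection: int, size_of_first: int) -> float:
    """Containment of A in B: shared proteins divided by the size of A."""
    return intersection / size_of_first


c_1_2 = containment(overlap_1_2, size_ipr1)  # Containment (IPR1, IPR2)
c_2_1 = containment(overlap_2_1, size_ipr2)  # Containment (IPR2, IPR1)

print(f"{c_1_2:.2f} {c_2_1:.2f}")
```

Unlike the Jaccard index, containment is directional: the denominator is the size of the first entry, which is why the two scores differ so much here.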

Conclusion

In summary, the calculations show:

  • Jaccard index (average 0.455) < 0.75
  • Containment (IPR1, IPR2) = 0.90 > 0.75
  • Containment (IPR2, IPR1) = 0.46 < 0.75
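The three comparisons listed above can be checked directly. This sketch only reproduces those comparisons with the example counts and a 0.75 cut-off; it is not InterPro's actual overlap-classification code, and `THRESHOLD` and `checks` are illustrative names.

```python
THRESHOLD = 0.75  # cut-off used in the summary above

# Scores recomputed from the example counts.
ji_avg = (17534 / 38081 + 16950 / 38081) / 2  # average Jaccard index
c_1_2 = 17534 / 19539                         # Containment (IPR1, IPR2)
c_2_1 = 16950 / 37052                         # Containment (IPR2, IPR1)

# Each entry pairs a human-readable condition with whether it holds.
checks = {
    f"Jaccard index < {THRESHOLD}": ji_avg < THRESHOLD,
    f"Containment (IPR1, IPR2) > {THRESHOLD}": c_1_2 > THRESHOLD,
    f"Containment (IPR2, IPR1) < {THRESHOLD}": c_2_1 < THRESHOLD,
}

for label, holds in checks.items():
    print(f"{label}: {holds}")
```

All three conditions hold for this example, matching the summary list.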

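From these results, IPR1 and IPR2 are considered overlapping entries; more precisely, IPR1 is largely contained within IPR2, since Containment (IPR1, IPR2) = 0.90 exceeds 0.75 while Containment (IPR2, IPR1) = 0.46 does not.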