Clustering analysis methods II


Newman-Girvan fast greedy algorithm

  • Developed for the study of networks in general, with a special focus on social and biological networks (19).
  • Identifies communities by using the edge betweenness centrality measure. Edges that connect different communities have higher centrality values, since a larger proportion of shortest paths will pass through them.
  • To define communities it uses the edge betweenness centrality scores to rank the edges of the network, then removes the most central edges and then re-calculates the betweenness scores until no edges are left. Edges affected by the removal are deemed to be part of the same community.
  • Can be considered a ‘naïve’ approach that will define communities even when they are only marginally more connected than the rest of the network.

Communities defined using Newman-Girvan and MCODE

Figure 31 Communities defined using Newman-Girvan and MCODE.

MCODE algorithm

  • Developed to find protein complexes in PPI networks (20).
  • Can be considered to be more stringent than the Newman-Girvan algorithm, since it aims to find only those sub networks that are very highly interconnected, representing relatively stable, multi-protein complexes that function as a single entity in time and space.
  • The parameters of the algorithm can be adjusted to make it less stringent, so that a looser definition of a community is used.
  • The algorithm uses a three-stage process:

1. Weighting: a higher score is given to those nodes whose neighbours are more interconnected.

2. Molecular complex prediction: starting with the highest-weighted node (seed), recursively move out, adding nodes to the complex that are above a given threshold.

3. Post-processing: applies filters to improve the cluster quality (haircut and fluff).

It is important to note that when we speak about ‘stringency’ we are talking about how interconnected the nodes within a sub-network must be in order to be considered a separate community. This changes depending on the biological question underlying the analysis. It is not the same to look for stable protein complexes, such as the proteasome, as it is to look for functional sub-modules representing a specific step in a signalling pathway.