Network representation and analysis tools
There are a variety of tools that can be used to obtain, integrate and analyse PPI data to understand its biological context. Let’s have a brief look at some of them.
Cytoscape is one of the most popular network analysis tools. It is an open-source, Java-based, multi-platform desktop application that is widely used for network representation, integration and analysis. It was originally designed for the analysis of biological networks, which remains its main application, but it can also be used for general-purpose network analysis.
Figure 21 Cytoscape is a popular tool for network analysis.
- One of the main reasons for its popularity is the large variety of apps (almost 300 at the time we wrote this course) that provide further, specific functionality to the core distribution of Cytoscape. This provides great flexibility, making the tool adaptable to multiple types of analysis in various domains of knowledge.
- For PPI network analysis there are specific apps for community search (for example MCODE, clusterMaker2, JActiveModules) and for Gene Set Enrichment Analysis (for example BiNGO, ClueGO, EnrichmentMap).
- Some Cytoscape apps will only work with a specific version of the core distribution of Cytoscape. It is important to check that you have the right version for the type of analysis you need to run.
- Cytoscape tasks can be automated through command-line arguments, although the number of features you can access this way is still limited.
- Cytoscape is quite demanding in terms of computing resources when working with large-scale networks, and it can no longer cope once networks become too large (hundreds of thousands of nodes and edges).
Non-programmatic options for large networks
A non-programmatic option for handling large networks is Gephi. Gephi can deal with hundreds of thousands of nodes and millions of edges, although processing and especially drawing such networks requires a lot of computing power.
Figure 22 Gephi is a non-programmatic tool for analysing large networks.
The benefits of Gephi are that it is open source and multi-platform, and that it offers a wide range of advanced network-related algorithms (often not found anywhere else) in the form of plugins. Its main disadvantage is that it has no capability for processing specifically biological information. It is a general network tool and should be treated as such: use it for enumeration, statistics and visualisation.
Programmatic solutions for large-scale network analysis include packages such as igraph (for R, Python and C) or NetworkX (for Python). These are scripting packages that place much lower demands on your computer resources and are more amenable to automation. This means they can easily be incorporated into larger bioinformatics analysis pipelines. For example, the R implementation of igraph is often used hand-in-hand with other biostatistics packages available through this language.
Figure 23 igraph and NetworkX are programmatic solutions for large-scale network analysis.
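To give a feel for what a programmatic analysis looks like, here is a minimal sketch using NetworkX. It assumes the NetworkX package is installed. The interaction list, the protein names (P1 to P7) and the helper function summarise_network are hypothetical placeholders made up for illustration, not real PPI data or part of any library. The sketch builds an undirected network from a list of interacting pairs and reports a few basic properties: its size, the most connected protein, the sizes of its connected components and its density.

```python
import networkx as nx

# Hypothetical toy interactions: placeholder names, not real PPI data.
interactions = [
    ("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
    ("P3", "P4"), ("P4", "P5"), ("P6", "P7"),
]


def summarise_network(edges):
    """Build an undirected graph from edge pairs and return basic statistics."""
    graph = nx.Graph()
    graph.add_edges_from(edges)

    # Degree = number of interaction partners of each protein.
    degrees = dict(graph.degree())
    # Sort names first so that ties are broken alphabetically.
    hub = max(sorted(degrees), key=degrees.get)

    # Connected components, largest first.
    components = sorted(nx.connected_components(graph), key=len, reverse=True)

    return {
        "nodes": graph.number_of_nodes(),
        "edges": graph.number_of_edges(),
        "hub": hub,
        "hub_degree": degrees[hub],
        "component_sizes": [len(c) for c in components],
        "density": nx.density(graph),
    }


summary = summarise_network(interactions)
print(summary)
```

In a real pipeline the edge list would typically be read from a file exported from an interaction database rather than typed in by hand, and the same few lines could then be run unchanged on much larger networks.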