GO Layout User Manual

Quickstart

Overview

The GO Layout Plugin uses annotation from the Gene Ontology (www.geneontology.org) to arrange networks into pathway-style subnetworks corresponding to biological processes, adding a new perspective to large networks and serving as a starting point for pathway curation.

The GO Layout algorithm works by first partitioning the network into a series of subnetworks based on the Biological Process annotation of nodes. For example, nodes annotated as "cell differentiation" will be assigned to one subnetwork and nodes annotated as "transcription" to another. Next, for each subnetwork, a cell-based layout is applied based on the Cellular Component annotation of each node, using a graphical template to define regions of the cell (plasma membrane, nucleus, mitochondrion, etc.). Finally, nodes are colored based on Molecular Function annotation. For details on the algorithm, see Running GO Layout.

The figure below shows a subnetwork created by GO Layout, representing the GO Biological Process term "response to stress".


Installation

Currently, the GO Layout Plugin is available for manual installation:

Loading a network and node attributes

To run GO Layout, you must have a network loaded in Cytoscape with GO annotation stored as node attributes. Unless you are working with a session file that already contains GO annotation, you will need to load the annotation as node attributes for your network of interest.

Loading a network

  1. Under File>Import>Network... select a network for import. If you load a session file under File>Open, proceed to Running GO Layout below.
Note: Due to a bug in Cytoscape 2.6.3 and earlier, gene symbols used as the key for ontology annotation (next step) must be all uppercase. To convert them, press Ctrl-A to select all nodes in the network view, then click the ‘batch editor’ icon, which is the third icon from the right in the Data Panel toolbar. The ‘Node Attribute Batch Editor’ dialog that opens lets you set all canonical names to uppercase.

Loading GO annotation as node attributes

  1. Select File>Import>Ontology and Annotation.
Note: GO Layout is designed to work with GO Slim annotations specifically, as opposed to full GO annotations. Running GO Layout with full GO annotation is not recommended, because the partitioning algorithm will generate a very large number of subnetworks due to the number of terms in GO.
  2. Under Advanced, check the box for Show Mapping Options. This will allow you to select which column in the gene association file to link to your network. Select the appropriate IDs under Key Column in Annotation File and Key Attribute for Network (e.g., canonicalName). In the bottom left corner of the window, check the number of matches (Key Matched) to ensure you have chosen the correct mapping. Note that Key Matched represents the number of matches in the first 100 records only.
  3. Finish the import by clicking the Import button.
  4. After import, check that the GO annotation has loaded as node attributes by selecting a few nodes and clicking the Select Attributes button in the Data Panel. Select the GO annotations in the list of attributes; these are prefixed by GO.Annotation. Check that the selected nodes have values for the GO attributes.

For detailed instructions on how to use the Ontology and Annotation Importer, see the Cytoscape user manual.

Running GO Layout

To run GO Layout with the default settings, select Layout>GO Layout>GO Layout. With the default settings, this will partition the network into subnetworks based on biological process, apply a cell-based layout to each subnetwork ("floorplan") and color nodes based on molecular function. Below is a detailed description of each part of this algorithm.

Partitioning

The network is partitioned into a set of subnetworks based on the biological process annotation of the nodes. For example, all nodes annotated as "translation" are placed in a new network, nodes annotated as "cell cycle" are placed in a different network, and so on. By default, subnetworks with more than 5 nodes but fewer than 200 nodes are displayed. Networks outside these thresholds are still created and can be displayed by right-clicking on the network in the Network tab of the Control Panel and selecting Create View. If the total number of subnetworks exceeds 100, a warning message appears where you can choose to continue or quit.

Some nodes may be annotated as being part of more than one biological process; such nodes will be replicated between subnetworks. For example, if node A is annotated as both "cell cycle" and "cell growth", there will be one replicate of node A in each of those subnetworks.
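
The partitioning step, including the replication of nodes that carry several annotations, can be pictured with the following minimal Python sketch. It is an illustration only, not the plugin's implementation; the function name partition_by_process and the dictionary-based input format are assumptions made for this example.

```python
from collections import defaultdict


def partition_by_process(annotations):
    """Group node IDs by biological process term.

    `annotations` maps a node ID to a list of process terms (a hypothetical
    input format). A node with several terms ends up in several groups,
    mirroring the replication between subnetworks described above.
    """
    subnetworks = defaultdict(list)
    for node, terms in annotations.items():
        for term in terms:
            subnetworks[term].append(node)
    return dict(subnetworks)


annotations = {
    "A": ["cell cycle", "cell growth"],
    "B": ["cell cycle"],
    "C": ["translation"],
}
parts = partition_by_process(annotations)
# Node A is present in both the "cell cycle" and the "cell growth" group.
```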

During partitioning, an overview network is also created, with each node representing a biological process subnetwork. See the Overview network section for details.

Floorplan (Cell-based layout)

GO Layout performs cell-based layout using a hard-coded cell template. This template defines regions corresponding to cellular compartments and locations, as well as region colors. The cellular template currently encodes six regions: extracellular, plasma membrane, cytoplasm, nucleus, endoplasmic reticulum and mitochondrion. Nodes annotated with a given cellular component are positioned in the region corresponding to that component. If no nodes match the template, GO Layout skips the cellular layout for the subnetwork in question. Once nodes have been assigned to regions and positioned, a force-directed layout is applied within each region, with the exception of the plasma membrane region, which has a linear layout.

Some nodes may be annotated as being located in more than one cellular component; such nodes will be replicated between regions. For example, a node annotated as "nucleus" and "cytoplasm" will be replicated for each of these regions. A suffix is added to the node IDs of replicated nodes (for example, replicates of SNX1 become SNX1__1, SNX1__2 and so on), while the "canonicalName" of the node retains the original node ID and is used as the node label. In addition, all replicated nodes are given a red border to make them easier to find.

Because of node replication, relevant edges are also copied. For example, if SNX1 is annotated as both "endoplasmic reticulum" and "cytoplasm", and SNX1 has an edge to SNX2 in the original network, then SNX1 will be replicated in both the ER and cytoplasm regions, and each replicate will have an edge to SNX2.
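
The replication of nodes and edges described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the plugin's code: the function name replicate_nodes, the input formats, and the choice to place nodes without any region in "unassigned" are all hypothetical.

```python
def replicate_nodes(regions_by_node, edges):
    """Replicate multi-region nodes and copy their edges.

    `regions_by_node` maps a node ID to its list of regions and `edges` is a
    list of (source, target) pairs; both formats are assumptions.
    Returns (copies, new_edges), where `copies` maps each original ID to a
    list of (new_id, region) pairs.
    """
    copies = {}
    for node, regions in regions_by_node.items():
        regions = regions or ["unassigned"]  # assumed fallback region
        if len(regions) > 1:
            # Replicated nodes get a numbered suffix: SNX1__1, SNX1__2, ...
            copies[node] = [(f"{node}__{i}", r) for i, r in enumerate(regions, 1)]
        else:
            copies[node] = [(node, regions[0])]

    new_edges = []
    for source, target in edges:
        # Every replicate of the source is linked to every replicate of the target.
        for s_id, _ in copies.get(source, [(source, "unassigned")]):
            for t_id, _ in copies.get(target, [(target, "unassigned")]):
                new_edges.append((s_id, t_id))
    return copies, new_edges


regions = {"SNX1": ["endoplasmic reticulum", "cytoplasm"], "SNX2": ["cytoplasm"]}
copies, new_edges = replicate_nodes(regions, [("SNX1", "SNX2")])
# new_edges holds one SNX1 replicate -> SNX2 edge per region of SNX1.
```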

The floorplan algorithm stores two additional node attributes, and one edge attribute:

_cellularLayoutRegion: The region assignment of the node in the cellular layout.

_isInMultipleRegions: Indicates if the node is replicated and therefore present in more than one region.

_isEdgeToUnassigned: Indicates if the edge connects to a node assigned to the "unassigned" region, that is, a node whose GO Cellular Component annotation is "unassigned".

Node coloring

The last step in the GO Layout algorithm is to apply color to nodes based on their molecular function annotation. For example, all nodes annotated as "ion transport" may be colored green, all nodes annotated as "protein binding" may be colored yellow, and so on. If a node has multiple molecular function annotations, the first annotation in the list determines the node color.
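
The "first annotation wins" rule can be illustrated with a short Python sketch. The palette, the default color and the function name node_color are hypothetical and chosen only for this example; the actual colors used by the plugin are not specified here.

```python
def node_color(functions, palette, default="white"):
    """Return the color associated with the first annotation in `functions`.

    `palette` maps annotation terms to colors; `default` is an assumed
    fallback for nodes without a matching annotation.
    """
    if not functions:
        return default
    return palette.get(functions[0], default)


# Hypothetical palette following the examples in the text.
palette = {"ion transport": "green", "protein binding": "yellow"}
color = node_color(["protein binding", "ion transport"], palette)
# Only the first entry in the list is consulted.
```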

Settings

GO Layout can be customized to change which attribute is used for each part of the algorithm, and to define the details of the partitioning and floorplan algorithms. To change any of the attributes used, go to Layout>Settings and select GO Layout from the Layout Algorithm drop-down.

Partitioning attribute

The attribute by which the nodes will initially be partitioned into subnetworks. The default is GO biological process annotation.

Layout attribute

The attribute used for node layout within each subnetwork. The default is GO cellular component annotation.

Node coloring attribute

The attribute used for node coloring. The default is GO molecular function annotation.

Partition settings

For the partitioning algorithm, the user can define the lower and upper node count thresholds that determine whether a subnetwork is shown. The default thresholds are 5 and 200 nodes, meaning that subnetworks with fewer than 5 nodes or more than 200 nodes will not be shown. Although networks outside the thresholds are not automatically visualized, they are created and can be visualized by selecting the network in the Network tab of the Control Panel and right-clicking to select Create View.
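
As a rough illustration of how the thresholds act, the Python sketch below filters subnetworks by node count. The function name subnetworks_to_show is hypothetical, and treating both bounds as inclusive is an assumption, since the behavior for subnetworks with exactly 5 or exactly 200 nodes is not spelled out.

```python
def subnetworks_to_show(sizes, lower=5, upper=200):
    """Return the names of subnetworks whose node count is within the thresholds.

    `sizes` maps a subnetwork name to its node count. Both bounds are
    treated as inclusive here, which is an assumption.
    """
    return [name for name, count in sizes.items() if lower <= count <= upper]


sizes = {"translation": 3, "cell cycle": 42, "metabolism": 350}
shown = subnetworks_to_show(sizes)
# "translation" and "metabolism" still exist as networks; they just get no view.
```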

Floorplan settings

For the layout of subnetworks, there are two additional user-defined settings:

Advanced options

GO Layout is flexible in terms of which attributes are used for each part of the algorithm. Which attribute to use is defined in the Settings menu, under Layout>Settings>GO Layout. The Settings menu also allows a particular step in the algorithm to be skipped altogether, by selecting "none" as the attribute.

To run partitioning only, set the "layout" and "node color" attributes to "none". Likewise, to skip partitioning and run only the floorplan and coloring, set the "partitioning" attribute to "none".

Note that the selected "layout" attribute must have values that match the regions defined in the layout template. The regions are currently hard-coded as GO cellular components, but will be customizable through a template in future versions. When running floorplan only (no partitioning), GO Layout will abort if no nodes are found to match the template.

Navigating the results

When GO Layout is complete, subnetworks will be displayed as tiles as shown below. Included in this tiled view is the overview network (top left tile) and the original network (bottom right tile). All networks are also listed in the Network tab of the Control Panel.

Subnetwork view

Each subnetwork representing a Biological Process is laid out according to a cellular template defining the position, size and color of a set of cellular regions corresponding to GO Cellular Components. Nodes that have the annotation "unassigned" for Cellular Component are placed on the far right, separated from the rest of the network with a dashed line.

The network can be manipulated like any other Cytoscape network:

Mapping data on subnetworks

Because of the node replication strategy used by GO Layout (explained under Floorplan), replicated nodes are assigned new node IDs. For example, if a node previously identified as CDC5 is replicated twice, the three resulting nodes will be identified as CDC5__1, CDC5__2 and CDC5__3. Consequently, data cannot be mapped to subnetworks using the node ID. Since each node retains its original identifier as the "canonicalName" attribute, data can be mapped using the table importer under Import>Attribute From Table. During import, be sure to designate "canonicalName" as the network key to map to.
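
The reason "canonicalName" works as a key can be seen in the following Python sketch, which maps values keyed by original identifier onto replicated node IDs. It only mimics the idea of the mapping and does not use the table importer; the function name map_by_canonical_name, the data values and the input formats are hypothetical.

```python
def map_by_canonical_name(canonical_names, data):
    """Attach data values to node IDs via the canonicalName attribute.

    `canonical_names` maps each node ID to its canonicalName; `data` maps
    original identifiers to values. Nodes without a data entry are skipped.
    """
    return {
        node_id: data[name]
        for node_id, name in canonical_names.items()
        if name in data
    }


canonical_names = {
    "CDC5__1": "CDC5",
    "CDC5__2": "CDC5",
    "CDC5__3": "CDC5",
    "SNX2": "SNX2",
}
expression = {"CDC5": 1.8}  # made-up example value
mapped = map_by_canonical_name(canonical_names, expression)
# All three CDC5 replicates receive the same value; SNX2 has no data entry.
```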

Overview network

In addition to the subnetworks, an overview network is also created in which each subnetwork is represented as a node. The node size corresponds to the number of nodes in each subnetwork, and the weight of the edge between two nodes corresponds to the number of connections between nodes in the two subnetworks. The overview network uses a degree-sorted circle layout.
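
One way to picture how the overview network summarizes the subnetworks is the Python sketch below. It counts an original edge toward the weight between two subnetworks when one endpoint lies in each; this counting rule, the function name build_overview and the input formats are assumptions, since the exact definition of "connections" is not given here.

```python
from itertools import combinations


def build_overview(subnetworks, edges):
    """Summarize subnetworks as overview node sizes and edge weights.

    `subnetworks` maps a process term to a set of node IDs; `edges` is a
    list of (source, target) pairs from the original network.
    """
    sizes = {term: len(nodes) for term, nodes in subnetworks.items()}
    weights = {}
    for a, b in combinations(sorted(subnetworks), 2):
        # Count edges with one endpoint in each of the two subnetworks.
        count = sum(
            1
            for s, t in edges
            if (s in subnetworks[a] and t in subnetworks[b])
            or (s in subnetworks[b] and t in subnetworks[a])
        )
        if count:
            weights[(a, b)] = count
    return sizes, weights


subnetworks = {"cell cycle": {"A", "B"}, "translation": {"C", "D"}}
edges = [("A", "C"), ("B", "D"), ("A", "B")]
sizes, weights = build_overview(subnetworks, edges)
# The A-B edge lies inside "cell cycle" and adds no weight between subnetworks.
```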