
MAPPFinder Analysis of Clustering Results

MAPPFinder over-representation analysis can be used to evaluate any subset of genes in your dataset. Typically, a set of genes is defined as significant or interesting by a combination of criteria based on common metrics such as fold change and p-value, but MAPPFinder can evaluate any subset of genes. For example, the subset can consist of genes that share a similar expression profile across the experiment, as determined by a clustering algorithm. This approach asks which processes are over-represented in a set of co-regulated genes.
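To make "over-represented" concrete, here is a minimal Python sketch of a hypergeometric-style z-score of the kind MAPPFinder reports. The exact formula used by your MAPPFinder version is an assumption here, and the function name and the example counts are hypothetical. For a cluster analysis, R would be the number of genes assigned to the cluster in question.

```python
import math

def over_representation_z(N, R, n, r):
    """Z-score sketch for one GO term (assumed hypergeometric form).

    N: total genes measured
    R: total genes meeting the criterion (e.g. genes in one cluster)
    n: genes measured in this GO term
    r: genes in this GO term that meet the criterion
    """
    expected = n * R / N
    # Variance of the hypergeometric count, with finite-population correction.
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)

# Made-up counts for illustration only.
z = over_representation_z(N=1000, R=100, n=50, r=15)
print(round(z, 2))
```

A positive value means the term contains more genes from the subset than expected by chance; a value near zero means the term is represented roughly as expected.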

These instructions describe the process of running MAPPFinder analysis on cluster results, and visualizing the results in a graphical display.

Preparing the data

In addition to the typical pre-processing steps for using array data in GenMAPP, using cluster information requires that the data be clustered before GenMAPP analysis, so that the cluster results can be incorporated into the master spreadsheet that is imported into GenMAPP, as illustrated in the graphic below. The strategy used here is to include the cluster assignment as an additional column in the master spreadsheet, so that it can be used to define Color Set criteria in GenMAPP.

Example Data

For the purpose of these instructions, an example dataset is used. The data is from Nathan Salomonis in the Conklin lab at the Gladstone Institutes (San Francisco) and examines the murine myometrium during pregnancy. There are multiple time points throughout pregnancy, at term and postpartum, all compared to tissue from non-pregnant mice.

 

Background adjustment, normalization and probe-level summarization

Prior to cluster analysis, background adjustment, normalization and probe-level summarization should be performed. For more information on this, click here.

Example data: Background adjustment, normalization and probe-level summarization were performed using the RMA algorithm in Bioconductor.

Filtering

Depending on the clustering algorithm you use, it may be necessary to filter the data before cluster analysis, due to restrictions on the number of genes that can be clustered. For example, you might choose to only cluster genes that are significantly changed in your dataset, based on some metric.

Example data: To filter data prior to clustering, the multtest package in Bioconductor was used. An F-test p-value < 0.05 and a fold change > 2 were used as cutoffs, which resulted in approximately 4000 genes as input for clustering.
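The cutoff logic can be sketched in Python as follows. This is not the multtest procedure itself, only an illustration of applying the two thresholds to a table that already contains p-values and fold changes. The column names "ProbeID", "F_test_pvalue" and "Max_fold" are hypothetical, the rows are made up, and the fold change is assumed to be signed, so its absolute value is compared to the cutoff.

```python
import csv
import io

# Hypothetical column names; adjust to match your own spreadsheet.
P_COL = "F_test_pvalue"
FOLD_COL = "Max_fold"

def filter_rows(rows, p_cutoff=0.05, fold_cutoff=2.0):
    """Keep rows whose p-value is below p_cutoff and whose
    absolute fold change is above fold_cutoff."""
    kept = []
    for row in rows:
        p = float(row[P_COL])
        fold = abs(float(row[FOLD_COL]))  # assumes signed fold changes
        if p < p_cutoff and fold > fold_cutoff:
            kept.append(row)
    return kept

# Made-up tab-delimited data standing in for an exported spreadsheet.
example = io.StringIO(
    "ProbeID\tF_test_pvalue\tMax_fold\n"
    "probe_A\t0.001\t3.2\n"
    "probe_B\t0.20\t4.0\n"
    "probe_C\t0.01\t-2.5\n"
    "probe_D\t0.03\t1.4\n"
)
rows = list(csv.DictReader(example, delimiter="\t"))
filtered = filter_rows(rows)
print([r["ProbeID"] for r in filtered])
```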

Cluster analysis

Cluster analysis groups genes with similar patterns throughout the dataset. There are numerous algorithms for clustering array data.

Example data: The filtered dataset was clustered using the HOPACH clustering algorithm.

Combining all data in one spreadsheet

All data, including the cluster assignments, need to be incorporated into the same spreadsheet before import into GenMAPP. For more information on how to combine data from multiple spreadsheets, click here.

Example data: The output from HOPACH was copied into the original filtered spreadsheet to incorporate the cluster assignments (specifically the "Cluster_Label" parameter from HOPACH).
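If you prefer to script this step rather than copy and paste, the merge can be sketched in Python as a lookup by gene or probe identifier. The "Cluster_Label" column comes from the HOPACH output described above; the "ProbeID" column name, the function name, and the example rows are hypothetical.

```python
def add_cluster_labels(master_rows, cluster_rows,
                       id_col="ProbeID", label_col="Cluster_Label"):
    """Return copies of master_rows with the cluster label added.

    Rows without a cluster assignment get an empty label.
    """
    labels = {row[id_col]: row[label_col] for row in cluster_rows}
    merged = []
    for row in master_rows:
        new_row = dict(row)  # leave the input rows unchanged
        new_row[label_col] = labels.get(row[id_col], "")
        merged.append(new_row)
    return merged

# Made-up rows for illustration.
master = [
    {"ProbeID": "probe_A", "Fold_day14": "3.2"},
    {"ProbeID": "probe_C", "Fold_day14": "-2.5"},
    {"ProbeID": "probe_E", "Fold_day14": "2.1"},
]
clusters = [
    {"ProbeID": "probe_A", "Cluster_Label": "1"},
    {"ProbeID": "probe_C", "Cluster_Label": "2"},
]
combined = add_cluster_labels(master, clusters)
print(combined)
```

Matching on the identifier rather than on row position avoids misassigned labels when the two files are sorted differently.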

Format data

The final master spreadsheet needs to be correctly formatted before import into GenMAPP. For more information on this, click here.

Example data: A System Code column was inserted as the second column, and filled with the System Code for Affymetrix (X).
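The same column insertion can be sketched in Python for a table held as a list of rows. The header text "SystemCode" is an assumption; check the Expression Dataset Manager help for the exact label GenMAPP expects. The function name and the example rows are hypothetical.

```python
def insert_system_code(table, code="X", header_text="SystemCode"):
    """Insert a system code column as the second column.

    table: list of rows, where the first row is the header and the
    first column holds the gene or probe identifier.
    """
    header, *body = table
    new_header = [header[0], header_text] + header[1:]
    new_body = [[row[0], code] + row[1:] for row in body]
    return [new_header] + new_body

# Made-up rows for illustration; "X" is the Affymetrix code noted above.
table = [
    ["ProbeID", "Fold_day14", "Cluster_Label"],
    ["probe_A", "3.2", "1"],
    ["probe_C", "-2.5", "2"],
]
formatted = insert_system_code(table)
for row in formatted:
    print("\t".join(row))
```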

Creating a GenMAPP dataset

Importing the data

Once the data is properly formatted, it can be imported to GenMAPP via the Expression Dataset Manager:

  1. Download and load the appropriate database in GenMAPP.
  2. In the Expression Dataset Manager, select File>New to begin the data import process. For details on data import, please refer to the Expression Dataset Manager.

Creating Color Sets

To create Color Sets for your dataset, use the Criteria Builder in the Expression Dataset Manager. Since the goal is to perform MAPPFinder analysis for each group of clustered genes, create a Color Set that contains a separate criterion for each cluster group.

Example data: A Color Set specific to the cluster results was created, with a separate criterion for each cluster group. The criteria are based on the "Cluster_Label" parameter from HOPACH, which represents a higher-level cutoff in terms of cluster assignment than the "Cluster_Number" parameter.

MAPPFinder analysis

Setting up the MAPPFinder analysis

When the dataset is ready, the next step is to run MAPPFinder analysis on each of the subsets of genes represented by the various cluster groups.

When MAPPFinder completes, the MAPPFinder browser will open with the results for the last criterion selected. Results for the other criteria are calculated as well and can be accessed in Excel. For more details on how to use the MAPPFinder application, click here.

Example data: MAPPFinder analysis was set up for the Color Set specific to the cluster results, including all criteria (all clusters).

Visualization

Graphical cluster and GO results display

The graphical display below combines the results of the MAPPFinder analysis with the original cluster heatmap. This type of display can be created in any image processing application, such as Illustrator.

Cluster heatmap

A graphical display of the cluster results (heatmap) is available through most cluster applications. To combine this with the MAPPFinder results, either export the figure to a graphical format (JPG, BMP, etc.) or take a screenshot of the figure. Once the results exist in a graphical format, open the image in Illustrator or a similar program.

Example data: The HOPACH cluster output was reorganized and opened in the TreeView application (as a cdt file) to create a visual display. The figure was then captured with a screenshot and pasted into Illustrator.

MAPPFinder results

The MAPPFinder results will be available as tab-delimited text files directly from MAPPFinder. Each file will contain all parameters reported by the MAPPFinder program, such as the number of genes changed and the z-score. At this point you should decide which parameter to display in the final figure.

If you selected multiple cluster groups for analysis, results for each will be represented as a separate text file. To transfer the results to a graphical format, perform the following steps:

  1. Open one of the results files in Excel.
  2. Rearrange the columns to match what you want to keep in the display. For example, if you only want to show the Z-score and GO description, copy and paste the Z-score field next to the GO description field.
  3. Copy the rows of information you want to include for the cluster in question. For example, if the data is sorted based on descending Z-score, copy the GO description and Z-score for terms with a Z-score > 2.5.
  4. Paste the table into the file with the cluster graphic and align it with the appropriate cluster.
  5. Repeat these steps with each results file for all clusters.
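Steps 2 and 3 above can also be sketched in Python for one results table. The column names "GO Name" and "Z Score" are hypothetical stand-ins for the headers in your MAPPFinder results file, the function name is hypothetical, and the rows are made up for illustration.

```python
def top_terms(rows, name_col="GO Name", z_col="Z Score", cutoff=2.5):
    """Return (description, z-score) pairs with z-score above cutoff,
    sorted by descending z-score."""
    selected = [
        (row[name_col], float(row[z_col]))
        for row in rows
        if float(row[z_col]) > cutoff
    ]
    selected.sort(key=lambda pair: pair[1], reverse=True)
    return selected

# Made-up results rows for one cluster.
results = [
    {"GO Name": "muscle contraction", "Z Score": "3.1"},
    {"GO Name": "cell adhesion", "Z Score": "1.2"},
    {"GO Name": "collagen catabolism", "Z Score": "5.4"},
]
for name, z in top_terms(results):
    print(f"{name}\t{z}")
```

Running the same function on each results file gives one short table per cluster, ready to paste next to the matching cluster in the graphic.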

Example data: In Excel, the column "Total/Changed" (shown in the graphic above) was created by inserting a new column containing a concatenation of "Number of genes changed", a "/" sign, and "Number of genes measured". This column was positioned next to the GO description column, and the relevant rows of data for each cluster were copied from Excel to Illustrator and aligned with the appropriate cluster.
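The concatenation described above can be sketched in Python as follows. The column names "Number Changed" and "Number Measured" are hypothetical stand-ins for the corresponding MAPPFinder result columns, the function name is hypothetical, and the row is made up for illustration.

```python
def changed_over_measured(row, changed_col="Number Changed",
                          measured_col="Number Measured"):
    """Join the changed and measured counts with a "/" sign,
    in the order described in the example above."""
    return f"{row[changed_col]}/{row[measured_col]}"

# Made-up row for illustration.
row = {"GO Name": "muscle contraction",
       "Number Changed": "12", "Number Measured": "40"}
label = changed_over_measured(row)
print(row["GO Name"], label)
```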