MAPPFinder over-representation analysis can be used to evaluate any subset of genes in your dataset. Typically, a set of genes is defined as significant or interesting by combining criteria based on common metrics such as fold change and p-value, but other ways of defining the subset work equally well. For example, the subset can consist of genes that share a similar expression profile across the experiment, as determined by a clustering algorithm. This approach asks which processes are over-represented in a set of co-regulated genes.
These instructions describe the process of running MAPPFinder analysis on cluster results, with two visualization modes: a graphical display and an interactive table display.
In addition to the typical pre-processing steps involved in using array data in GenMAPP, using cluster information requires that the data be clustered before GenMAPP analysis, so that the cluster results can be incorporated into the master spreadsheet that is imported into GenMAPP, as illustrated in the graphic below. The strategy used here is to include the cluster assignment as an additional metric in the master spreadsheet, so that it can be used to create GenMAPP Color Set criteria.
For the purpose of these instructions, an example dataset is used. The data is from Nathan Salomonis in the Conklin lab at the Gladstone Institutes (San Francisco) and examines the murine myometrium during pregnancy. There are multiple time points throughout pregnancy, at term and postpartum, all compared to tissue from non-pregnant mice.

Prior to cluster analysis, background adjustment, normalization and probe-level summarization should be performed. For more information on this, click here.
Example data: Background adjustment, normalization and probe-level summarization were done using the RMA algorithm in Bioconductor.
Depending on the clustering algorithm you use, it may be necessary to filter the data before cluster analysis, due to restrictions on the number of genes that can be clustered. For example, you might choose to only cluster genes that are significantly changed in your dataset, based on some metric.
Example data: To filter data prior to clustering, the multtest package in Bioconductor was used. An F-test p-value < 0.05 and a fold change > 2 were used as cutoffs, which resulted in ~4000 genes as input for clustering.
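The filtering step can be sketched in a few lines of Python. This is a minimal illustration of applying the two cutoffs together, not the multtest workflow used for the example data. The field names (`probe_id`, `p_value`, `fold`) and all values are hypothetical, and the fold change is assumed to be on a linear scale.

```python
# Hypothetical rows: probe ID, F-test p-value, and fold change (linear scale).
# All identifiers and values are made up for illustration.
rows = [
    {"probe_id": "probe_a", "p_value": 0.001, "fold": 3.2},
    {"probe_id": "probe_b", "p_value": 0.200, "fold": 4.0},
    {"probe_id": "probe_c", "p_value": 0.010, "fold": 1.4},
    {"probe_id": "probe_d", "p_value": 0.030, "fold": 2.5},
]

P_CUTOFF = 0.05    # cutoff from the example data
FOLD_CUTOFF = 2.0  # cutoff from the example data


def passes_filter(row, p_cutoff=P_CUTOFF, fold_cutoff=FOLD_CUTOFF):
    """Keep a gene only if it meets both cutoffs."""
    return row["p_value"] < p_cutoff and row["fold"] > fold_cutoff


filtered = [row for row in rows if passes_filter(row)]
filtered_ids = [row["probe_id"] for row in filtered]
print(filtered_ids)
```

Requiring both conditions keeps genes that are statistically significant and also change by a meaningful amount. If your fold changes are log-transformed, the fold cutoff has to be converted to the same scale first.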
Cluster analysis groups genes with similar patterns throughout the dataset. There are numerous algorithms for clustering array data.
Example data: The filtered dataset was clustered using the HOPACH clustering algorithm.
All data, including the cluster assignments, need to be incorporated into the same spreadsheet before import into GenMAPP. For more information on how to combine data from multiple spreadsheets, click here.
Example data: The output from HOPACH was copied into the original filtered spreadsheet to incorporate the cluster assignments (specifically the "Cluster_Label" parameter from HOPACH).
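If copying columns by hand is error-prone for your dataset, the same merge can be scripted. The sketch below joins cluster labels onto the expression table by gene ID. The column names `ProbeID`, `day_a` and `day_b`, the label values, and the inline tab-delimited text are all hypothetical stand-ins for your own files; only `Cluster_Label` comes from the example above.

```python
import csv
import io

# Hypothetical tab-delimited inputs; in practice these would be read from files.
expression_tsv = (
    "ProbeID\tday_a\tday_b\n"
    "probe_a\t1.5\t2.1\n"
    "probe_b\t-0.3\t0.2\n"
    "probe_d\t0.9\t1.8\n"
)
cluster_tsv = (
    "ProbeID\tCluster_Label\n"
    "probe_a\t1\n"
    "probe_d\t2\n"
)


def read_tsv(text):
    """Parse tab-delimited text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Look-up table from gene ID to cluster label.
labels = {row["ProbeID"]: row["Cluster_Label"] for row in read_tsv(cluster_tsv)}

# Append the label to every expression row; genes without a cluster stay blank.
master = []
for row in read_tsv(expression_tsv):
    merged = dict(row)
    merged["Cluster_Label"] = labels.get(row["ProbeID"], "")
    master.append(merged)

for row in master:
    print(row)
```

Matching on the ID rather than on row position guards against the two spreadsheets being sorted differently.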
The final master spreadsheet needs to be correctly formatted before import into GenMAPP. For more information on this, click here.
Example data: A System Code column was inserted as the second column, and filled with the System Code for Affymetrix (X).
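Inserting the System Code column can also be done programmatically. The following sketch treats the spreadsheet as a list of rows with the header first and inserts the new column in second position. The table contents are hypothetical, and the exact header text GenMAPP expects is an assumption here; check it against the formatting instructions.

```python
# Hypothetical master table: header row first, then one row per gene.
table = [
    ["ProbeID", "day_a", "Cluster_Label"],
    ["probe_a", "1.5", "1"],
    ["probe_d", "0.9", "2"],
]


def add_system_code(rows, code="X", header="System Code"):
    """Return a copy of the table with a System Code column in second position.

    'X' is the Affymetrix code used in the example data; the header text
    is an assumption and may need to be adjusted.
    """
    formatted = []
    for index, row in enumerate(rows):
        value = header if index == 0 else code
        formatted.append([row[0], value] + row[1:])
    return formatted


formatted = add_system_code(table)
for row in formatted:
    print("\t".join(row))
```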
Once the data is properly formatted, it can be imported into GenMAPP via the Expression Dataset Manager.
To create Color Sets for your dataset, use the Criteria Builder in the Expression Dataset Manager. Since the goal is to perform MAPPFinder analysis for each group of clustered genes, create a Color Set containing a separate criterion for each cluster group.
Example data: A Color Set specific to the cluster results was created, with a separate criterion for each cluster group. The criteria are based on the "Cluster_Label" parameter from HOPACH, which represents a higher-level cutoff in terms of cluster assignment than the "Cluster_Number" parameter.
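Before building the criteria, it can help to list the distinct cluster labels and the number of genes in each, so you know how many criteria the Color Set needs. A minimal sketch, assuming the master spreadsheet has been loaded as a list of rows with a `Cluster_Label` field; the IDs and labels shown are made up.

```python
from collections import Counter

# Hypothetical master rows; only the cluster label matters for this step.
master = [
    {"ProbeID": "probe_a", "Cluster_Label": "1"},
    {"ProbeID": "probe_b", "Cluster_Label": ""},   # not clustered
    {"ProbeID": "probe_c", "Cluster_Label": "1"},
    {"ProbeID": "probe_d", "Cluster_Label": "2"},
]

# Count genes per cluster label, skipping rows without an assignment.
genes_per_cluster = Counter(
    row["Cluster_Label"] for row in master if row["Cluster_Label"]
)

# One Color Set criterion is needed per distinct label.
for label in sorted(genes_per_cluster):
    print(f"Cluster_Label {label}: {genes_per_cluster[label]} genes")
```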

When the dataset is ready, the next step is to run MAPPFinder analysis on each of the subsets of genes represented by the various cluster groups.
When MAPPFinder completes, the MAPPFinder browser will open with the results for the last criterion selected. Results for the other criteria are calculated as well and can be accessed in Excel. For more details on how to use the MAPPFinder application, click here.
Example data: MAPPFinder analysis was set up for the Color Set specific to the cluster results, including all criteria (all clusters).
The graphical display below combines the results of the MAPPFinder analysis with the original cluster heatmap. This type of display can be created in any image processing application, such as Illustrator.

A graphical display of the cluster results (heatmap) is available through most cluster applications. To combine this with the MAPPFinder results, either export the figure to a graphical format (JPG, BMP, etc.) or take a screenshot of the figure. Once the figure exists in a graphical format, open it in Illustrator or a similar program.
Example data: The HOPACH cluster output was reorganized and opened in the TreeView application (as a cdt file) to create a visual display. The figure was then captured with a screenshot and pasted into Illustrator.
The MAPPFinder results are available as tab-delimited text files directly from MAPPFinder. Each file contains all parameters reported by the MAPPFinder program, such as the number of genes changed and the z-score. At this point you should decide which parameter to display in the final figure.
If you selected multiple cluster groups for analysis, results for each will be represented as a separate text file. To transfer the results to a graphical format, perform the following steps:
Example data: In Excel, the column "Total/Changed" (shown in the graphic above) was created by inserting a new column containing a concatenation of the "Number of genes changed" column, a "/" sign, and the "Number of genes measured" column. This column was positioned next to the GO description column, and the relevant rows of data for each cluster were copied from Excel to Illustrator and aligned with the appropriate cluster.
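The concatenated column can also be built outside Excel. The sketch below assumes the MAPPFinder results have been read into a list of rows; the column names follow the description above and may differ from the headers in your results file, and the GO terms and counts are placeholders.

```python
# Hypothetical MAPPFinder result rows; terms and counts are placeholders.
results = [
    {"GO Name": "term A", "Number of genes changed": 12, "Number of genes measured": 40},
    {"GO Name": "term B", "Number of genes changed": 5, "Number of genes measured": 18},
]

# Build the combined "changed/measured" text shown next to each GO term.
for row in results:
    changed = row["Number of genes changed"]
    measured = row["Number of genes measured"]
    row["Total/Changed"] = f"{changed}/{measured}"

for row in results:
    print(row["GO Name"], row["Total/Changed"], sep="\t")
```

If the combined values are pasted back into Excel, entries such as 3/10 may be converted to dates automatically; formatting the target cells as text beforehand avoids this.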
Under construction