The Expression Dataset Manager is the interface within the GenMAPP program that is used to import gene expression data and customize the presentation of the data on MAPP files. An Expression Dataset is a GenMAPP-created file with the extension .gex; for example MyExperiment.gex. Functions of the Expression Dataset Manager include:
You may wish to view the Interactive Tutorial first.
To reach the Expression Dataset Manager from the Drafting Board, left-click the Data menu and choose Expression Dataset Manager. If your MAPP currently has an Expression Dataset applied to it, that Expression Dataset opens up in the Manager. Similarly, if a Color Set from that Expression Dataset was applied, the Expression Dataset opens with that Color Set exhibited.
If the GenMAPP program is not already open on your computer, you may double-left-click on an Expression Dataset (.gex file). This launches GenMAPP and opens the Expression Dataset Manager. The Expression Dataset you double-clicked on is displayed in the window, active and ready for manipulation.
Whichever way you enter the Expression Dataset Manager, you cannot open an Expression Dataset that is being used in another instance of GenMAPP.
Once in the Manager, you may change Expression Datasets by left-clicking the Expression Datasets menu and choosing Open.
To exit the Expression Dataset Manager, either left-click the close button in the upper-right corner of the window or left-click on the Expression Datasets menu and choose Exit. The Manager prompts you if you have made changes to your Expression Dataset and have not saved them. When you leave the Manager, the Expression Dataset and Color Set with which you were working become the ones controlling the display on the MAPP for the current session.
The Expression Dataset Manager window is organized so that general information about your dataset appears at the top, specific information about the Color Set appears in the middle, and information about individual Criteria within a Color Set appears at the bottom of the window.

The Expression Datasets Menu provides the general functions needed to manage the entire Expression Dataset.
Clears any Expression Dataset currently in the Manager, offering you the option of saving it if changes have been made, and starts the process of importing new expression data.
Clears any Expression Dataset currently in the Manager, offering you the option of saving it if changes have been made, asks you for an exception file to process, and opens the Expression Dataset you built when the exception file was produced.
Clears any Expression Dataset currently in the Manager, offering you the option of saving it if changes have been made, and opens a different Expression Dataset. If an Expression Dataset is being used to color MAPPs in another instance of the GenMAPP program, you may not open that Expression Dataset for editing in the Manager.
| Note: The GenMAPP Expression Dataset Manager will not open an Expression Dataset that has been made "read-only" through the Windows operating system. |
Permanently saves any changes to the current Expression Dataset, including changes made to Color Sets.
| Note: Saving an Expression Dataset using the Save option from the Expression Datasets menu takes some time depending on the size of your dataset. This is because GenMAPP is modifying the already created .gex file. During the Save process, you will see a progress bar at the bottom of the Expression Dataset Manager. |
Allows you to choose the Gene Database that you want to use to link your MAPP to your Expression Dataset.
Permanently saves the Expression Dataset under a new name that you choose. This is one way to create a new dataset based on a previous one. However, Save As reproduces all the information, including the gene expression data, which may be quite extensive. It may be more efficient to create new Color Sets for the current dataset.
Closes the Expression Dataset Manager, offering you the option of saving your Expression Dataset if changes have been made, returns you to the Drafting Board, and applies the Expression Dataset to the MAPP.
To visualize your gene expression data on a MAPP, the Expression Dataset Manager must convert your raw data into the GenMAPP Expression Dataset format. In the conversion process, the Manager verifies that each gene identifier in your data exists in the GenMAPP Gene Database, checks for errors in your raw data format, and creates the Expression Dataset (.gex) file. If the Manager finds any formatting errors or cannot locate a gene identifier in the chosen Gene Database, it also creates an exception file during the conversion process. Entries from the exception file may later be added to the Expression Dataset when you Process Exceptions.
The steps in a typical process for creating a new Expression Dataset are listed below and described in more detail in subsequent sections.
GenMAPP accepts gene expression data from both custom arrays and commercial microarrays such as Affymetrix GeneChips. Although GenMAPP was designed specifically to visualize microarray data, GenMAPP can be used to view data from any large gene or protein dataset such as those generated from protein chips or other large-scale assays for protein function or interactions.
In order for the Expression Dataset Manager to accurately convert your raw data file into a GenMAPP Expression Dataset, your raw data file must be in the proper format. The Expression Dataset Manager imports files that are in a comma-separated-values format (.csv) or tab-delimited format (.txt or .tab). Both formats are typically exported by spreadsheet or database programs (e.g., Microsoft Excel).
If you choose tab-delimited format from Excel, be sure to choose Text (Tab delimited) (*.txt) from the Save As dialog. Do not choose Unicode Text (*.txt).
It is very unlikely that data coming directly from other microarray analysis software will already be in a format acceptable to GenMAPP. You will need to format your data using the instructions below with a spreadsheet or database program, and save it as a comma-separated values (.csv) or a tab-delimited (.txt or .tab) file. Furthermore, any calculations or statistical analysis of your data must be prepared outside of GenMAPP and imported with your raw data file into an Expression Dataset. The columns and rows of the raw data file must be organized such that they contain specific information in the proper place.
The first column of each line of data must contain a valid gene identifier. The gene identifier links your data to a particular gene object on a MAPP for use in coloring the gene object. The second column of each line must contain the system code for that type of gene identifier. Version 2.0 of GenMAPP supports several types of gene identifiers, all listed in the table below. The applicable species for each gene ID system are shown. Different gene ID types can be used in the same Expression Dataset, but they must all be from the same species.
While all the below ID systems are valid for import into GenMAPP, some are not appropriate for use as gene identifiers. Those systems include InterPro, Pfam and GO. Since IDs from these systems do not actually identify genes, using them as such is not recommended. However, GenMAPP does allow the use of these IDs for import to allow for advanced use of the GenMAPP program.
| Gene ID System | System Code | Applicable Species | Example |
| Ensembl | En | B.taurus, C.elegans, C.familiaris, D.melanogaster, D.rerio, G.gallus, H.sapiens, M.musculus, R.norvegicus, S.cerevisiae | ENSMUSG00000027793 |
| UniProt/TrEMBL | S | B.taurus, C.elegans, C.familiaris, D.melanogaster, D.rerio, G.gallus, H.sapiens, M.musculus, R.norvegicus, S.cerevisiae | A2A2_HUMAN or O94973 |
| Entrez Gene | L | B.taurus, C.elegans, C.familiaris, D.melanogaster, D.rerio, G.gallus, H.sapiens, M.musculus, R.norvegicus, S.cerevisiae | 68377 |
| RefSeq (NM_xxxxxx only) | Q | B.taurus, C.elegans, C.familiaris, D.melanogaster, D.rerio, G.gallus, H.sapiens, M.musculus, R.norvegicus, S.cerevisiae | NP_031407, NM_016749 |
| Unigene | U | B.taurus, C.elegans, C.familiaris, D.melanogaster, D.rerio, G.gallus, H.sapiens, M.musculus, R.norvegicus | Hs.451376 |
| Affymetrix Probe Set ID (Affy) | X | B.taurus, C.elegans, C.familiaris, D.melanogaster, D.rerio, G.gallus, H.sapiens, M.musculus, R.norvegicus, S.cerevisiae | 100014_at |
| Agilent | Ag | D.rerio, H.sapiens, M.musculus, R.norvegicus | A_23_P99657 |
| Codelink | Ge | H.sapiens, M.musculus, R.norvegicus | GE470237 |
| Illumina | Il | H.sapiens, M.musculus | ILMN_97836 |
| HUGO | H | H.sapiens, B.taurus, G.gallus, C.familiaris | 16129 |
| WormBase | W | C.elegans | CE00005 |
| FlyBase | F | D.melanogaster | FBgn0000043 |
| ZFIN | Z | D.rerio | ZDB-GENE-000329-1 |
| Mouse Genome Informatics (MGI) | M | M.musculus | MGI:1194500 |
| Rat Genome Database (RGD) | R | R.norvegicus | RGD:70907 |
| Saccharomyces Genome Database (SGD) | D | S.cerevisiae | S0000157 |
| PDB | Pd | B.taurus, C.elegans, C.familiaris, D.melanogaster, D.rerio, G.gallus, H.sapiens, M.musculus, R.norvegicus, S.cerevisiae | 15C8 |
| EMBL | Em | B.taurus, C.elegans, C.familiaris, D.melanogaster, D.rerio, G.gallus, H.sapiens, M.musculus, R.norvegicus, S.cerevisiae | AA000715 |
| Other | O | - | 12345 |
The “Other” gene ID system contains gene identifiers added by the user to a local copy of the Gene Database. A gene identifier can be added to the Other table in the local copy of the Gene Database by using the Gene Finder, by processing exceptions to an Expression Dataset, or by editing the Gene Database directly in the Gene Database Manager. Situations where a user might add a gene identifier to the Other category might include adding genes with accession numbers from some other gene classification such as commercial cDNA clone sets. A gene identifier of type Other must be 30 characters or fewer. Commas, quotes (single or double), the dollar sign or the apostrophe (') may not be used in an Other gene identifier. Other non-alphanumeric characters are allowed.
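The rules above for the first two columns can be checked before import. The following is a minimal sketch in Python, not part of GenMAPP; the function name `check_gene_id` is hypothetical, the system codes are copied from the table above, and treating the codes as case-sensitive is an assumption. It cannot tell you whether an identifier actually exists in your Gene Database.

```python
# System codes copied from the table above; "O" is the user-defined Other system.
SYSTEM_CODES = {
    "En", "S", "L", "Q", "U", "X", "Ag", "Ge", "Il", "H",
    "W", "F", "Z", "M", "R", "D", "Pd", "Em", "O",
}

# Characters that may not appear in an Other gene identifier.
OTHER_FORBIDDEN = set(",\"'$")
OTHER_MAX_LENGTH = 30


def check_gene_id(gene_id: str, system_code: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not gene_id:
        problems.append("gene identifier is empty")
    # Assumption: system codes are compared case-sensitively.
    if system_code not in SYSTEM_CODES:
        problems.append(f"unknown system code: {system_code!r}")
    if system_code == "O":
        if len(gene_id) > OTHER_MAX_LENGTH:
            problems.append("Other identifier is longer than 30 characters")
        bad = sorted(OTHER_FORBIDDEN.intersection(gene_id))
        if bad:
            problems.append(f"Other identifier contains forbidden characters: {bad}")
    return problems
```

For example, `check_gene_id("ENSMUSG00000027793", "En")` returns an empty list, while an Other identifier that contains a dollar sign is reported as a problem.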
The gene identifiers must occur in the first column of your raw data file, and the system code must occur in the second column. Each of the remaining columns contains the data associated with each gene. You may have an unlimited number of data columns. Do not use completely blank columns as spacers in your raw data file. The number of rows in your raw data file is unlimited.
| Gene ID | System Code | Heading | Heading | Heading | Heading | Heading | Remarks |
| Gene ID 1 | En | DATA | DATA | DATA | DATA | DATA | hyperlink |
| Gene ID 2 | En | DATA | DATA | DATA | DATA | DATA | hyperlink |
| Gene ID 3 | En | DATA | DATA | DATA | DATA | DATA | hyperlink |
| Gene ID 4 | En | DATA | DATA | DATA | DATA | DATA | Remark |
| Gene ID 5 | En | DATA | DATA | DATA | DATA | DATA | Remark |
| Gene ID 6 | En | DATA | DATA | DATA | DATA | DATA | Remark |
The values in each data column of your raw data file may be one of two different data types, numeric or character (short text), which you designate in the Data Type Specification window after you begin the data import process. However, the data types of the values within each column must be consistent. They must be all numeric values or all character values. You may also include a single column named Remarks that can contain an unlimited number of characters. Your raw data file may also have missing values. Be sure to keep the delimiter (comma or tab) for each column even when the data for that column is missing. For example, the second column below contains no data:
111,,333,444
A sample raw data file might look like this if viewed in a spreadsheet program:
| Gene ID | System Code | Control Average | Treated Average | Fold Change | p-value | Quality | Remarks |
| ENSMUSG00000000001 | En | 250 | 200 | 0.8 | 0.02 | high | www.mywebsite.org |
| ENSMUSG00000000253 | En | 20 | 30 | 1.5 | 0.05 | low | http://www.mywebsite.org/ |
| ENSMUSG00000000384 | En | 20 | 30 | 1.5 | 0.37 | high | http://www.mywebsite.org/ |
| ENSMUSG00000000579 | En | 50 | 300 | 6 | 0.1 | high | |
| ENSMUSG00000000901 | En | 20 | 30 | 1.5 | 0.46 | low | |
| ENSMUSG00000001225 | En | 20 | 30 | 1.5 | 0.81 | high | |
The headings of the first two columns are irrelevant. They will always be interpreted as the gene identifier and that identifier's system code. The column headings for the subsequent data columns are limited to 50 characters. Certain non-alphanumeric characters may not be used in column headings: the accent (`), exclamation point (!), dollar sign ($), open or close brackets ([]), period (.), comma (,), bar (|), or double quotes ("). Furthermore, the word "notes" may not be used as a column heading; it is reserved for use by the GenMAPP program. One column may be named Remarks (see below for explanation). You may not use the same column heading for two different columns.
| Note to Excel Users: If you try to load a tab-delimited text or comma-separated values file with the first column headed "ID" into Microsoft Excel 2000, it will crash with a "SYLK" error. This is an acknowledged Microsoft bug and, as of this writing, has no fix. |
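The column heading rules can be checked before you start an import. The sketch below is illustrative Python, not GenMAPP code; the function name `check_headings` is hypothetical, and comparing headings without regard to case (for duplicates and for the reserved word "notes") is an assumption.

```python
# Characters that may not be used in column headings.
HEADING_FORBIDDEN = set('`!$[].,|"')
HEADING_MAX_LENGTH = 50
RESERVED_HEADINGS = {"notes"}


def check_headings(headings: list[str]) -> list[str]:
    """Check the data-column headings (third column onward) of a raw data file."""
    problems = []
    seen = set()
    # The first two headings are skipped because they are irrelevant.
    for position, heading in enumerate(headings[2:], start=3):
        if not heading:
            problems.append(f"column {position}: heading is missing")
            continue
        if len(heading) > HEADING_MAX_LENGTH:
            problems.append(f"column {position}: heading is longer than 50 characters")
        bad = sorted(HEADING_FORBIDDEN.intersection(heading))
        if bad:
            problems.append(f"column {position}: forbidden characters {bad}")
        key = heading.lower()  # assumption: comparison ignores case
        if key in RESERVED_HEADINGS:
            problems.append(f"column {position}: {heading!r} is reserved")
        if key in seen:
            problems.append(f"column {position}: duplicate heading {heading!r}")
        seen.add(key)
    return problems
```

A heading such as `Fold.Change` is reported because it contains a period, whereas `Fold Change` passes.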
The Expression Dataset Manager stores numerical values as real, floating-point numbers. Do not use commas, quotes, or any other symbol in your numeric fields. Your raw data file may express extremely large or small numbers in standard or scientific notation (E notation, for example 1.234E-12). On gene Backpages or as Gene Values, they display in their shortest form, standard or E notation. An Expression Dataset stores numeric data up to six significant digits. The columns Control Average, Treated Average, Fold Change, and p-value contain the numeric data type in the example above.
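To get a feel for which numeric fields are acceptable and what six significant digits means in practice, consider the following Python sketch. It is an illustration only: the function names are hypothetical, and Python's `float()` is used as a stand-in for the Manager's own parser. Note that `float()` is somewhat more permissive (it accepts values such as `nan` and `inf`, which the Manager may well reject).

```python
def parse_numeric(field: str):
    """Parse one numeric field; return None for a missing (empty) value."""
    text = field.strip()
    if text == "":
        return None
    # float() accepts standard and E notation and raises ValueError on
    # values that contain commas or symbols, such as "1,234" or "$5".
    return float(text)


def six_significant(value: float) -> float:
    """Round to six significant digits using the 'g' format specifier."""
    return float(format(value, ".6g"))
```

With this sketch, `parse_numeric("1.234E-12")` succeeds, `parse_numeric("1,234")` raises `ValueError`, and `six_significant(3.14159265)` gives `3.14159`.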
The Expression Dataset Manager accepts character (text) values in your raw data file. This data type is limited to 30 characters, including spaces. Numbers in a column designated as containing character data are treated as character. You may use all non-alphanumeric characters in character fields except for commas, double quotes, and the dollar sign. (There are additional limitations on column headings.) This data type is intended to be used for short strings of text, not for long annotations. It can be used to build criteria. The column Quality contains the character data type in the example above.
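The limits on character values can be expressed as a small check. This is a hedged Python sketch with a hypothetical function name, not the Manager's own validation.

```python
# Characters that may not appear in a character (short text) field.
CHARACTER_FORBIDDEN = set(',"$')
CHARACTER_MAX_LENGTH = 30


def check_character_value(value: str) -> list[str]:
    """Return a list of problems with one character field (empty if none)."""
    problems = []
    if len(value) > CHARACTER_MAX_LENGTH:
        problems.append("longer than 30 characters (spaces count)")
    bad = sorted(CHARACTER_FORBIDDEN.intersection(value))
    if bad:
        problems.append(f"forbidden characters: {bad}")
    return problems
```

A value such as `high` or `42` passes; in a character column, `42` is simply treated as text.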
You may also include a single column called Remarks that can contain an unlimited number of characters. (Technical note: it is a memo field.) Anything entered in the Remarks column will be displayed under the Remarks heading on the Backpage for each gene. You may use all non-alphanumeric characters in a Remarks field except for commas, double quotes, and the dollar sign. The Remarks column cannot be used when building criteria for a Color Set. Since Remarks are only displayed within web browsers, even in GenMAPP, you may use HTML, such as web links, in your Remarks.
If values are missing from your raw data file, you may still import the data into an Expression Dataset. The missing values are treated as NULL values by the Expression Dataset Manager. You may create Criteria that check for NULL values in your Expression Dataset as part of a Color Set. Instructions are found here. Note that a missing value is one which is still surrounded by the delimiters for that type of file; otherwise it is a missing column. For example in a comma-separated-values file, the second column in the following has a missing value:
111,,333,444
To accommodate certain spreadsheet programs, notably Microsoft Excel, GenMAPP allows a row to have one missing column before it raises an exception.
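The difference between a missing value and a missing column can be sketched in Python as follows. The function `parse_row` is hypothetical, and it assumes that the one tolerated missing column is a trailing one (as when a spreadsheet drops an empty last cell). It uses a plain split, so it does not handle quoted fields.

```python
def parse_row(line: str, n_columns: int, delimiter: str = ","):
    """Split one data line; empty fields become None (a NULL value)."""
    fields = line.rstrip("\r\n").split(delimiter)
    if len(fields) == n_columns - 1:
        # Assumption: a single missing trailing column is tolerated
        # and treated like a missing value.
        fields.append("")
    if len(fields) != n_columns:
        raise ValueError(f"expected {n_columns} columns, found {len(fields)}")
    return [field if field != "" else None for field in fields]
```

Calling `parse_row("111,,333,444", 4)` returns `["111", None, "333", "444"]`, while a line that is two or more columns short raises `ValueError`.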
In order for the Expression Dataset Manager to accurately convert your raw data file into a GenMAPP Expression Dataset, your raw data file must be in the proper file format. The Expression Dataset Manager imports files that are in a comma-separated values format (.csv), or tab-delimited text format (.txt or .tab), which are both typical export choices for spreadsheet or database programs (e.g., Microsoft Excel or Access).
A comma-separated values file maintains each row of data as a separate line and delimits values in each field with commas. Character values may or may not be in quotes. For example, a raw data file that looks like this in a spreadsheet:

might look like this:
Gene ID,System Code,Control Average,Treated Average,Fold Change,p-value,Quality,Remarks
ENSMUSG00000000001,En,250,200,0.8,0.02,high,www.mywebsite.org
ENSMUSG00000000253,En,20,30,1.5,0.05,low,http://www.mywebsite.org/
ENSMUSG00000000384,En,20,30,1.5,0.37,high,http://www.mywebsite.org/
ENSMUSG00000000579,En,50,300,6,0.1,high,
ENSMUSG00000000901,En,20,30,1.5,0.46,low,
ENSMUSG00000001225,En,20,30,1.5,0.81,high,
Or like this:
"Gene ID","System Code","Control Average","Treated Average","Fold Change","p-value","Quality","Remarks"
"ENSMUSG00000000001","En","250","200","0.8","0.02","high","<a href="http://www.mywebsite.org">My website</a>"
"ENSMUSG00000000253","En","20","30","1.5","0.05","low","<a href="http://www.mywebsite.org">My website</a>"
"ENSMUSG00000000384","En","20","30","1.5","0.37","high","<a href="http://www.mywebsite.org">My website</a>"
"ENSMUSG00000000579","En","50","300","6","0.1","high",""
"ENSMUSG00000000901","En","20","30","1.5","0.46","low",""
"ENSMUSG00000001225","En","20","30","1.5","0.81","low",""
when exported as a comma-separated values (.csv) file and viewed in a text editor.
Certain non-alphanumeric characters may not be used anywhere in your comma-separated values (.csv) file. No fields may include commas because commas separate fields. Also, you may not use the dollar sign. There are additional limitations on column headings.
Most database and spreadsheet programs can export comma-separated values files. Be sure to direct the program to include the column headings as the first line if you are given the option. (Microsoft Excel, for example, includes them without asking.) Be aware that many spreadsheet and database programs will not correctly handle double quotes (") within a field that is also within quotes, such as the http link in the example above.
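If you prepare your data with a script rather than a spreadsheet, a comma-separated values file can be produced with Python's standard `csv` module. The sketch below is one possible approach; the function name `to_csv_text` is hypothetical. It refuses fields that contain a comma or a dollar sign and writes the column headings as the first line.

```python
import csv
import io

# Characters that may not appear anywhere in the .csv file.
FORBIDDEN_IN_CSV = set(",$")


def to_csv_text(headings, rows) -> str:
    """Serialize headings and data rows as comma-separated text."""
    for row in [headings, *rows]:
        for field in row:
            if FORBIDDEN_IN_CSV.intersection(str(field)):
                raise ValueError(f"field {field!r} contains a comma or dollar sign")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(headings)   # column headings go on the first line
    writer.writerows(rows)      # an empty string becomes a missing value
    return buffer.getvalue()
```

To save the result, write the returned text to a file with a .csv extension, for example `open("MyExperiment.csv", "w").write(text)`.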
In order for the Expression Dataset Manager to accurately convert your raw data file into a GenMAPP Expression Dataset, your raw data file must be in the proper file format. The Expression Dataset Manager imports files that are in a comma-separated values format (.csv), or tab-delimited format (.txt or .tab), which are both typical export choices for spreadsheet or database programs (e.g., Microsoft Excel or Access).
A tab-delimited file maintains each row of data as a separate line and delimits values in each field with tabs. Character values may or may not be in quotes. For example, a raw data file that looks like this in a spreadsheet:

might look like this (where » indicates a tab):
Gene ID»System Code»Control Average»Treated Average»Fold Change»p-value»Quality»Remarks
ENSMUSG00000000001»En»250»200»0.8»0.02»high»<a href="http://www.mywebsite.org">My website</a>
ENSMUSG00000000253»En»20»30»1.5»0.05»low»<a href="http://www.mywebsite.org">My website</a>
ENSMUSG00000000384»En»20»30»1.5»0.37»high»<a href="http://www.mywebsite.org">My website</a>
ENSMUSG00000000579»En»50»300»6»0.1»high»
ENSMUSG00000000901»En»20»30»1.5»0.46»low»
ENSMUSG00000001225»En»20»30»1.5»0.81»low»
Or like this:
"Gene ID"»"System Code"»"Control Average"»"Treated Average"»"Fold Change"»"p-value"»"Quality"»"Remarks"
"ENSMUSG00000000001"»"En"»"250"»"200"»"0.8"»"0.02"»"high"»"<a href="http://www.mywebsite.org">My website</a>"
"ENSMUSG00000000253"»"En"»"20"»"30"»"1.5"»"0.05"»"low"»"<a href="http://www.mywebsite.org">My website</a>"
"ENSMUSG00000000384"»"En"»"20"»"30"»"1.5"»"0.37"»"high"»"<a href="http://www.mywebsite.org">My website</a>"
"ENSMUSG00000000579"»"En"»"50"»"300"»"6"»"0.1"»"high"»""
"ENSMUSG00000000901"»"En"»"20"»"30"»"1.5"»"0.46"»"low"»""
"ENSMUSG00000001225"»"En"»"20"»"30"»"1.5"»"0.81"»"low"»""
when exported as a tab-delimited (.txt) file and viewed in a text editor.
No fields may include tabs because tabs separate fields. Also, you may not use the dollar sign or double quotes. There are additional limitations on column headings.
Most database and spreadsheet programs can export tab-delimited files. Be sure to direct the program to include the column headings as the first line if you are given the option. (Microsoft Excel, for example, includes them without asking.) Be aware that many spreadsheet and database programs will not correctly handle double quotes (") within a field that is also within quotes, such as the http link in the example above.
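To inspect a raw data file in either format before importing it, you can read it with Python's `csv` module. In this sketch the delimiter is chosen from the file extension, which mirrors the formats listed above but is an assumption about how you name your files, not a statement about how GenMAPP detects the format. The function names are hypothetical.

```python
import csv
import io
from pathlib import PurePath

# Extensions of the supported raw data formats and their delimiters.
DELIMITERS = {".csv": ",", ".txt": "\t", ".tab": "\t"}


def delimiter_for(filename: str) -> str:
    """Pick the field delimiter from the file extension."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in DELIMITERS:
        raise ValueError(f"unsupported raw data file type: {suffix!r}")
    return DELIMITERS[suffix]


def read_rows(text: str, filename: str):
    """Return the file contents as a list of rows (lists of strings)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter_for(filename))
    return [row for row in reader]
```

The first row returned is the heading line; the remaining rows are the data lines, with missing values appearing as empty strings.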
After you have prepared your raw data file according to the guidelines in the Raw Data File Format section, begin the conversion process in the Expression Dataset Manager by choosing the Expression Datasets > New Dataset menu item. Select your raw data file (e.g., MyExperiment.csv or MyExperiment.txt) from the file dialog box that appears and click OK. If an Expression Dataset (.gex file) already exists with that name in the same folder, you will be asked if you want to overwrite the existing file.
The Expression Dataset Manager checks to see if the second column of your raw data file contains valid system codes for the gene IDs you have used. If the system codes you have used are not found in the Gene Database or if you have left out the system code column entirely, you will see the following error window:

You should cancel the data import process by clicking No and go back and fix your raw data file before attempting to import again.
The Expression Dataset Manager checks to see if the column headings in your raw data file exist and are formatted correctly. The following are examples of error windows that may appear at this step in the data import process.
If a column heading is missing, you will be asked to supply it at this time. Enter a heading that is 50 characters or fewer and click OK to proceed. Clicking Cancel will abort the Expression Dataset conversion.

Certain non-alphanumeric characters may not be used in column headings: the accent(`), exclamation point(!), dollar sign($), open or close brackets([]), period(.), comma(,), bar(|), or double quotes("). If your raw data file contains them, the Invalid Column Heading window allows you to change the headings in question.
Enter a column heading that is 50 characters or fewer and click OK to proceed. Clicking Cancel aborts the Expression Dataset conversion.
You may not use the same column heading for two columns. If the Expression Dataset Manager encounters duplicate column headings, a warning displays.

Enter a column heading that is 50 characters or fewer and click OK to proceed. Clicking Cancel aborts the Expression Dataset conversion.
If any column heading in your raw data file is longer than 50 characters, a warning displays.

Enter a column heading that is 50 characters or fewer and click OK to proceed. Clicking Cancel aborts the Expression Dataset conversion.
There are also other restrictions on the format of the raw data file.
If any character data field has more than 30 characters, GenMAPP will truncate the field to 30 characters and display a message at the end of the conversion process.
The "~Error~" column in the exception file (MyExperiment.EX.txt) will show which data fields were truncated.
If any field in the second row of your raw data file is empty, GenMAPP will display a warning message.

Check the Data Type Specification window to confirm that GenMAPP was correct in guessing the data type of the column containing the missing field.
When you begin the conversion from raw data to a GenMAPP Expression Dataset, the Expression Dataset Manager asks you to specify the data type in each column as either numeric or character. The Expression Dataset Manager recognizes both numeric and character data, and accepts missing values for either. The data type must be consistent within each column; a column designated as numeric must contain only numeric values, and a character column can contain values consisting of up to 30 characters, including spaces. You may also have a single column called Remarks that can contain an unlimited number of characters. (Technical note: it is a memo field.)
The Expression Dataset Manager presents you with a Data Type Specification window that lists the headings from your raw data file. The Manager allows you to specify each column as either numeric or character but defaults to the type assumed from the first row of data. Look at the checkboxes, change any that are incorrect, and click OK. Alternatively, you may click the Cancel button to abort the data importing process. After this point, you cannot cancel the conversion process. You can, of course, allow it to complete, edit your raw data, and start the conversion process again.
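The default type shown for each column is assumed from the first row of data. The Python sketch below imitates that kind of guess so you can anticipate what the window will show. It is not GenMAPP's actual logic: the function name is hypothetical, and skipping the Remarks column and labeling empty fields "unknown" are both assumptions.

```python
def guess_column_types(headings, first_data_row):
    """Map each data-column heading to 'numeric', 'character', or 'unknown'."""
    guesses = {}
    # Skip the first two columns (gene identifier and system code).
    for heading, value in zip(headings[2:], first_data_row[2:]):
        if heading == "Remarks":
            continue  # assumption: the memo field is not typed here
        text = value.strip()
        if text == "":
            guesses[heading] = "unknown"  # nothing to infer from an empty field
            continue
        try:
            float(text)
            guesses[heading] = "numeric"
        except ValueError:
            guesses[heading] = "character"
    return guesses
```

Any column reported as "unknown", or guessed differently from what you intended, is one to check by hand in the Data Type Specification window.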

After you have designated the data type in the Data Type Specification window, the conversion of your raw data file to a GenMAPP Expression Dataset begins. The Expression Dataset Manager displays the progress of the data import process in a progress bar at the bottom of the window. It also displays the number of errors encountered in the raw data file. The conversion may take a few minutes depending on the size of your dataset and your computer’s memory and processor speed. When the process is complete, your converted dataset is displayed in the Expression Dataset Manager window, active and ready to be manipulated. The file is saved in the same folder as your raw data file, with a .gex extension (e.g., MyExperiment.gex).
As your raw data file is converted row by row, the Expression Dataset Manager searches for a match of the gene identifier and system code (the values in the first two columns) in your Gene Database. Finding no match produces an entry in the exception file.
If the Expression Dataset Manager finds no serious problems with your raw data file, it produces an Expression Dataset and saves it in the same folder and with the same name as your raw data file except that the extension is .gex. Your Expression Dataset will be open in the Expression Dataset Manager and ready for Customizing Your Expression Dataset with Color Sets. For example, if you convert a tab-delimited text file named MyExperiment.txt, the Expression Dataset Manager produces MyExperiment.gex.
If the Expression Dataset Manager finds lines it cannot convert or genes it cannot find in your Gene Database, it returns an error for each of those lines. This error can be found in the exception file, which will be of the same type as your input data (e.g., .txt or .csv). For example, errors from your file MyExperiment.csv will be stored in a file called MyExperiment.EX.csv, in the same folder as your raw data file and Expression Dataset. The error will be in an "~Error~" column appended to each line. This column contains either error messages or, if the Expression Dataset Manager finds no errors, a single space character.
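The file naming convention described above can be summarized in a short Python sketch. The function names are hypothetical; the code only derives the expected names and does not create any files.

```python
from pathlib import PurePath


def expression_dataset_name(raw_data_file: str) -> str:
    """Same name as the raw data file, with the extension replaced by .gex."""
    return PurePath(raw_data_file).with_suffix(".gex").name


def exception_file_name(raw_data_file: str) -> str:
    """Same name and type as the raw data file, with .EX before the extension."""
    path = PurePath(raw_data_file)
    return f"{path.stem}.EX{path.suffix}"
```

For MyExperiment.txt these give MyExperiment.gex and MyExperiment.EX.txt, matching the examples in the text.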
Some examples of conversion errors are:
You may make corrections in this file either with a text editor or by importing it into a spreadsheet or database program (the preferred method). Correct the lines of data with errors and save the file with the same name (e.g., MyExperiment.EX.csv or MyExperiment.EX.txt). For example, in Excel, you can filter to see only the lines with errors by using "NonBlanks" as your filter criterion. You need not remove the "~Error~" column; the Expression Dataset Manager will ignore it when you process exceptions.
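As an alternative to filtering in a spreadsheet, the lines that need attention can be listed with a few lines of Python. This sketch relies only on what is described above: the "~Error~" column holds either a message or a single space. The function name is hypothetical, and the error text in the usage note is a placeholder, not an actual GenMAPP message.

```python
import csv
import io

ERROR_COLUMN = "~Error~"


def rows_with_errors(exception_text: str, delimiter: str = ","):
    """Return the rows of an exception file whose ~Error~ column is not blank."""
    reader = csv.DictReader(io.StringIO(exception_text), delimiter=delimiter)
    # A single space (or nothing at all) in the column means no error.
    return [row for row in reader if (row.get(ERROR_COLUMN) or "").strip()]
```

Given an exception file in which one line carries a placeholder message such as `some error message` and another carries only a space, only the first line is returned. Pass `delimiter="\t"` for a tab-delimited exception file.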
Once you have made the corrections, you can add these lines of data to your Expression Dataset. In the Expression Dataset Manager, choose Expression Datasets > Process Exceptions.
A MAPP is a GenMAPP-produced file format that shows a biological relationship between genes or gene products. Each MAPP contains gene objects that represent biological genes or gene products. Each gene is identified by a code for a particular gene ID system (for example, En for Ensembl) and the identifier for that system (for example ENSMUSG00000001225). That gene identifier links information from your Gene Database to the gene object for annotation shown on the Backpage. The gene identifier also links that gene object to data in an Expression Dataset for coloring the gene object.
As your raw data file is converted row by row, the Expression Dataset Manager searches for a match of the gene identifier and system code (the values in the first two columns) in your Gene Database. Finding no match produces an entry in the exception file.
Use the Process Exceptions function in the Expression Dataset Manager to add the data in an exception file to the Expression Dataset (.gex file) from which it was created. This may be done immediately after importing the Expression Dataset, or it may be done at any time later. To begin, choose the Expression Datasets > Process Exceptions menu item and pick your exception file from the dialog window that appears (e.g., MyExperiment.EX.csv). If the Expression Dataset active in the Manager is not the one the exception file came from, the Manager will open the correct Expression Dataset as long as the two files are kept in the same folder on your system.
When the Expression Dataset Manager processes the exception file, it performs a similar import process as when you originally imported your data. As your exception file is converted row by row, the Expression Dataset Manager searches for a match of the gene identifier and system code (the values in the first two columns) in your Gene Database.
On the first conversion attempt of your raw data file, any genes that are not found in your Gene Database are classified as errors. When you process exceptions, you will have the option to add all unidentified gene IDs to the Other table of your Gene Database.

Click Yes to do so. Click No if you do not want the gene IDs added to the Other table. The Expression Dataset Manager will create another exception file showing the same error message as before. Remember that if you add IDs to the Other category, they will not color existing MAPPs in GenMAPP. In order to color MAPPs using Expression Datasets with IDs added to the Other category, you will need to create your own MAPPs containing the newly added IDs.
Remember that any gene identifications you add through this process are stored in your local copy of the Gene Database (the one residing on your computer, not the version of the Gene Database distributed through the Downloader). For example, you may be aware of a gene recently submitted to a public resource that is not yet in the Gene Database and want to add that to your local database. However, be careful when adding gene identifications to the local Gene Database; they become a permanent part of your local database.
An Other gene identification must be 10 characters or fewer. Commas, quotes (single or double), or the dollar sign may not be used as an Other gene identification. Other non-alphanumeric characters are allowed.
Once your Expression Dataset has been imported into GenMAPP, you can begin to customize it in the Expression Dataset Manager. You may include some general information about your Expression Dataset in the general information fields at the top of the Expression Dataset Manager. Add Color Sets that contain the Criteria that instruct GenMAPP how to display your data on MAPPs.
This field shows the name of the Expression Dataset file currently open in the Manager (e.g., MyExperiment). The Manager assigns this name when a new Expression Dataset is imported. It is the same name as the raw data file that is converted to the GenMAPP .gex format and cannot be changed from within the Manager. If you change the name of the .gex file in Windows, the next time you open that Expression Dataset in the Manager, the new name appears.
You may enter a brief description of the Expression Dataset or other information in the Remarks field. Remarks are limited to 50 characters. The Remarks may be displayed on the Legend on the MAPP, if this option is checked in the Options window. This is a good place to put a reference for the Expression Dataset.
Information in this field can only be viewed in the Expression Dataset Manager. The Notes field is unlimited in size. If you run out of room in the Remarks field, this is a good place to put the full information. This is a good place to put an explanation of the column headings for the Expression Dataset.

Color Sets contain the instructions to GenMAPP for displaying data from an Expression Dataset on MAPPs. To create a Color Set you must fill in several different fields in the Color Set area of the Expression Dataset Manager:
You may have an unlimited number of different Color Sets associated with each Expression Dataset. When you view a MAPP on the Drafting Board, you can switch between Color Sets in the drop-down list in the Drafting Board Toolbar.
Whenever you leave a Color Set in any way (by switching Color Sets or exiting the Expression Dataset Manager), the Expression Dataset Manager gives you the chance to save the Color Set if it has been changed.
| Note: Saving an expression dataset using the Save option from the Expression Datasets menu takes some time depending on the size of your dataset. This is because GenMAPP is modifying the already created .gex file. During the Save process, you will see a progress bar at the bottom of the Expression Dataset Manager. |
The Color Sets menu contains the functions needed to manage the Color Sets associated with a particular Expression Dataset.
Clears all the Color Set and Criteria fields so that you can fill in the information for a new Color Set. If the Color Set currently being displayed was changed, it will automatically be added to the collection of Color Sets.
Adds the current Color Set to the collection of Color Sets in your Expression Dataset. However, the addition does not become permanent unless you save the Expression Dataset of which it is a part. Typically, you use this function when you have just made a new Color Set. One easy way to create a new Color Set is to display a current Color Set, change the name, modify its components, and Add it to the Expression Dataset. The original under its original name also remains.
Removes the Color Set from the Expression Dataset. The deletion only becomes permanent when you save the Expression Dataset of which it was a part.
Copies the Color Set from another Expression Dataset (.gex). For this function to work, the two Expression Datasets must have the same number of columns, with exactly the same column headers and the same data types.
The entry in the Color Set name field is the name for the Color Set that is currently active in the Expression Dataset Manager. When you first import a new Expression Dataset or when you select New from the Color Sets menu, this field will be blank. To begin creating a Color Set, enter a name that is 20 characters or fewer. Double quotes or the dollar sign may not be used in a Color Set name. Each Color Set in an Expression Dataset must have a unique name. You may change the name of a Color Set by typing in a new name over the old one and selecting Save from the Expression Datasets menu.
To change the Color Set active in the Expression Dataset Manager, click on the down arrow to open the drop-down list and click one of the other Color Set names.
The data value that is displayed next to gene rectangles on your MAPPs must come from a specific column in your Expression Dataset. To have a complete Color Set, you must designate that column by clicking on the down arrow next to the Gene Value box and choosing a column from your dataset. If you do not want Gene Values to display on your MAPP, choose [None] from the drop-down list. If [None] is selected, no Gene Value displays on the MAPP even though the gene rectangles are colored with expression data. The Gene Value displayed is truncated to six characters, including a decimal point for numerical values (e.g., 37.524 or –98763), or six characters for character data (e.g., kinase). Very large or small numbers are displayed in exponential notation.
The main component of a Color Set is the Criteria. Criteria are instructions to GenMAPP on how to color gene rectangles on a MAPP based on your expression data. In other words, if the data for a particular gene on a MAPP satisfies a criterion statement, the gene rectangle is filled with the color associated with that criterion.
Each Color Set has two criteria that are automatically filled in by the program and are always present. They are No criteria met and Not found. A gene rectangle is filled with the No criteria met color if the gene identifier associated with that gene object occurs in the Expression Dataset but the data do not satisfy any of the other criteria in a Color Set. If the gene identifiers associated with a gene object are not present in an Expression Dataset, the Not found criterion is met and the gene rectangle remains white.
Criteria are created and modified in the Criteria Builder section of the Expression Dataset Manager window. All the Criteria for the Color Set active in the Expression Dataset Manager are displayed in a list at the bottom of the window.
After you have given your Color Set a name and have designated the Gene Value column, the next step in creating a Color Set is to build the Criteria used to color gene rectangles on MAPPs. The Criteria Builder is the center section of the Expression Dataset Manager window. Each criterion must have three main components: a label, a color, and the criterion itself.
When you open an Expression Dataset in the Expression Dataset Manager or when an Expression Dataset is first imported into the Manager, the Criteria Builder is inactive, indicated by gray shading of the fields. Activate the Criteria Builder by clicking the New button.

Each criterion in a Color Set must have a unique label. The label is limited to 40 characters. Double quotes ("), the bar symbol (|), or the dollar sign ($) may not be used in a label, although other non-alphanumeric characters are allowed. The label appears in the Legend area of a MAPP next to the color of that criterion if that option has been checked in the Options window.
Each criterion in a Color Set must have a unique color. The color associated with a criterion is displayed in the Color field of the Criteria Builder and next to the criterion in the list at the bottom of the Expression Dataset Manager window. When you create a new criterion, its color is initially white, an unacceptable color since white is associated with the Not found criterion, indicating that the gene object was not represented in the Expression Dataset. You must change it to something else.
You may alter the color for the selected criterion by left-clicking on the Color box and choosing or creating different hues from the Color window that appears. Choose a color from the palette displayed and click OK. You can create more color choices by left-clicking on the Define Custom Colors button. Create a new color by changing the values in the hue, saturation, and luminosity, or red, green, and blue fields. Alternatively, drag the sliders in the color gradients to a new color. To add the new color to the palette on the left side of the window, click the Add to Custom Colors button. The new color may then be chosen for the criterion. Be sure to pick a light color. The gene label within the gene rectangle will always be black. Black text on a dark background is not very readable.

The No criteria met color defaults to gray but it can be changed to anything you choose, except for white.
The label you enter and the color you choose appear in the Legend area on the Drafting Board.
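The label and color rules can be summarized as a short checklist. The sketch below is illustrative only and is not GenMAPP code: the function name is hypothetical, and representing colors as hexadecimal strings such as #FFFFFF is an assumption made for the example.

```python
# Characters the text says may not appear in a criterion label.
FORBIDDEN_IN_LABEL = set('"|$')
WHITE = "#FFFFFF"  # reserved for the Not found criterion


def check_criteria(criteria):
    """Return a list of problems found in (label, color) pairs.

    Colors are hex strings here; that representation is an assumption.
    """
    problems = []
    seen_labels = set()
    seen_colors = set()
    for label, color in criteria:
        color = color.upper()
        if len(label) > 40:
            problems.append(f"{label!r}: label longer than 40 characters")
        if set(label) & FORBIDDEN_IN_LABEL:
            problems.append(f"{label!r}: label contains a forbidden character")
        if label in seen_labels:
            problems.append(f"{label!r}: label is not unique")
        if color == WHITE:
            problems.append(f"{label!r}: white is reserved for Not found")
        elif color in seen_colors:
            problems.append(f"{label!r}: color {color} is not unique")
        seen_labels.add(label)
        seen_colors.add(color)
    return problems


if __name__ == "__main__":
    print(check_criteria([("Up", "#FF9999"), ("Down", "#9999FF")]))
    print(check_criteria([("Up", "#ffffff"), ("Up", "#FF9999")]))
```

An empty list means every label and color passed the checks described above.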
In the Criterion field of the Criteria Builder, you can create your criteria by stating them with relationships such as "this column greater than this value" or "that column less than or equal to that value". These relationships may be as complicated as you wish to make them. You may combine individual relationships using as many ANDs and ORs as you need.
A typical relationship is
[ColumnName] RelationalOperator Value
with the column name always enclosed in brackets and character values enclosed in single quotes. For example:
[Fold Change] >= 2
[p value] < 0.05
[Quality] = 'high'
The easiest and safest way to create criteria is by choosing items from the Columns and Ops (operators) lists shown in the Criteria Builder. The Columns list contains all of the column headings from your Expression Dataset. To choose a column from the list, click on the column heading. Whatever you choose will appear at the location of the insertion bar (cursor) in the Criterion box. If you have highlighted something in the Criterion box, whatever you choose replaces it. The Criteria Builder surrounds the column names with brackets.
The Ops (operators) list contains the relational operators you may use in your criteria:
= equals
> greater than
< less than
>= greater than or equal to
<= less than or equal to
<> is not equal to
To choose an operator from the list, click on the symbol. Whatever you choose will appear at the location of the insertion bar (cursor) in the Criterion box. If you have highlighted something in the Criterion box, whatever you choose replaces it. The Criteria Builder automatically surrounds the operators with spaces when you choose them from the list.
The Ops list also contains the conjunctions AND and OR, which you may use to make compound criteria. For example:
[Fold Change] > 1.2 AND [p value] <= 0.05
Parentheses control the order of evaluation. Anything in parentheses is evaluated first. Parentheses may be nested ((parentheses) within parentheses). For example:
[Control Average] = 70 AND ([Exp1 Average] > 70 OR [Exp2 Average] > 70)
Column names may be used anywhere a value can be used, for example:
[Control Average] < [Experiment Average]
You can even use arithmetic operators to make calculations. GenMAPP, like most computer applications, uses the plus (+) for addition, the asterisk (*) for multiplication, the minus sign (-) for subtraction, and the slash (/) for division. Multiplication and division operations are performed before addition and subtraction. For example:
[Raw Intensity] * 2.5 >= 100
The previous examples demonstrate criteria written for numeric data. GenMAPP also recognizes character data, and criteria may be written for that kind of data. Character values are always enclosed in single quotes (apostrophes). For example:
[Name] = 'Ras'
[Fold Change] > 2 AND [Quality] = 'high'
GenMAPP applies criteria with character values in a case-insensitive manner. For example:
[Name] = 'Ras'
[Name] = 'ras'
[Name] = 'RAS'
are equivalent.
Your Expression Dataset may have missing values. When the raw data file is imported, the missing values are interpreted as NULL (no data exists) by the Manager. Both numeric and character data may contain NULL values. When writing criteria, NULL values are tested for slightly differently: instead of = or <>, you use the operators IS NULL and IS NOT NULL. For example:
[Fold Change] IS NULL
[Control Average] IS NOT NULL AND [Fold Change] >= 1.5
[Quality] IS NOT NULL
For readability, it is traditional to surround operators with spaces but, with the exception of AND, OR, IS NULL, and IS NOT NULL, it is not necessary.
Criteria cannot be built using the Remarks column; that header will not even appear in the Columns list.
| Technical Note: GenMAPP uses the Microsoft Jet database engine to check and implement criteria. Microsoft Access uses the same engine. Anything you can do using SQL in Microsoft Access, you can use here. |
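Because criteria are checked by a SQL engine, a criterion behaves much like the condition in a SQL WHERE clause. The sketch below illustrates the idea with Python's built-in sqlite3 module standing in for Jet. This substitution is an assumption made purely for illustration: the table name, columns, and data are hypothetical, and SQLite differs from Jet in some details (for instance, SQLite compares text case-sensitively by default, unlike the case-insensitive behavior described above).

```python
import sqlite3

# SQLite is only a stand-in for the Jet engine; table and data are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Expression '
    '("ID" TEXT, "Fold Change" REAL, "p value" REAL, "Quality" TEXT)'
)
conn.executemany(
    "INSERT INTO Expression VALUES (?, ?, ?, ?)",
    [
        ("g1", 2.5, 0.01, "high"),
        ("g2", 1.1, 0.20, "low"),
        ("g3", None, 0.03, "high"),  # missing Fold Change is stored as NULL
    ],
)


def genes_meeting(criterion):
    """Return the IDs of rows for which the criterion text is true."""
    # The criterion is placed in a WHERE clause unchanged; SQLite accepts
    # the same [bracketed] column names used in the Criteria Builder.
    sql = f"SELECT ID FROM Expression WHERE {criterion} ORDER BY ID"
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    print(genes_meeting("[Fold Change] >= 2"))
    print(genes_meeting("[Fold Change] > 1 AND [p value] <= 0.05"))
    print(genes_meeting("[Fold Change] IS NULL"))
```

Note that a row with a NULL Fold Change satisfies neither [Fold Change] >= 2 nor its opposite, which is why the separate IS NULL test is needed.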
The Criteria Builder uses three buttons (New, Save, Add) to manage individual criteria. Clicking the New button activates the Criteria Builder after a new Expression Dataset has been imported. If you have already been working with a criterion, the New button clears all the fields in the Criteria Builder, prompting you if changes have not been saved.
After completing a new criterion, you can add the criterion entry (label, criterion, and color) to the Criteria List by clicking the Add button.
After modifying an existing criterion, you can save the criterion entry to the Criteria List by clicking the Save button. An easy way to make a new criterion is to modify an existing criterion, give it a different label, and Add it to the list leaving the original intact.
All of the Criteria (label, color, criterion) for a particular Color Set are listed at the bottom of the Expression Dataset Manager window. A single Color Set may contain up to 30 Criteria. If there are more Criteria than can be viewed at the bottom of the window, scroll bars are activated so that you can scroll through the entire list.
The Expression Dataset Manager always supplies the last two criteria: No criteria met and Not found. A gene rectangle is filled with the No criteria met color if the gene identifier associated with that gene object occurs in the Expression Dataset but the data do not satisfy any of the other criteria in a Color Set. If the gene identifiers associated with a gene object are not present in an Expression Dataset, the Not found criterion is met and the gene rectangle remains white. The No criteria met color can be changed; Not found must remain white.
The buttons to the right of the list represent actions you may perform on individual criteria. To modify a criterion label, color, or the criterion itself, first select the criterion in the list by left-clicking on it, and then click the Edit button. This puts the selected criterion into the Criteria Builder to be modified.
To remove a criterion from the list, left-click on the criterion to select it, and then click on the Delete button.
The order of Criteria in the list has significance to GenMAPP, since criteria nearer the top of the list are considered first. To change the order of the criteria in the list, left-click on the criterion to select it and then click the Move Up or Move Down buttons. No criteria met and Not found always occupy the last two positions in the list.

GenMAPP colors a Gene Rectangle and displays the Gene Value on a MAPP according to the criterion met in a Color Set for that gene. An unfilled (white) gene rectangle indicates that the gene identifiers associated with a gene object are not present in an Expression Dataset. A gene rectangle filled with the No criteria met color (gray unless you change it) indicates that a gene identifier associated with that gene object occurs in the Expression Dataset, but the data do not satisfy any of the criteria in a Color Set.
The order of Criteria in the list has significance to GenMAPP. When applying an Expression Dataset and Color Set to a MAPP, GenMAPP examines the expression data for a particular gene object and applies the color for the first criterion in the list that is true. Therefore, when criteria overlap, it is imperative to put the most important or least inclusive criteria first in the list. For example, a Color Set with the following Criteria List:
1. [Fold Change] > 5
2. [Fold Change] > 10
3. No criteria met
4. Not found
would never color a gene rectangle with the second criterion’s color, because any Fold Change greater than 10 is also greater than 5 and therefore always satisfies the first criterion. Reversing the order of the two criteria is one way to solve this problem.
1. [Fold Change] > 10
2. [Fold Change] > 5
3. No criteria met
4. Not found
Another solution is to make the individual criteria mutually exclusive. For example:
1. [Fold Change] > 5 AND [Fold Change] <= 10
2. [Fold Change] > 10
3. No criteria met
4. Not found
To move a criterion up or down in the criteria list, click the Move Up or Move Down buttons next to the list.
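The first-match rule described above can be sketched in a few lines of Python. This is an illustration of the ordering behavior only, not GenMAPP's implementation; the function name and the use of Python functions in place of criterion text are assumptions.

```python
def first_matching_criterion(row, criteria):
    """Return the label of the first criterion in the list that is true.

    criteria is an ordered list of (label, test) pairs, where test is a
    function of the data row. Hypothetical representation for this sketch.
    """
    for label, test in criteria:
        if test(row):
            return label
    return "No criteria met"


# The overlapping list from the text: "> 10" can never be reached.
misordered = [
    ("[Fold Change] > 5", lambda r: r["Fold Change"] > 5),
    ("[Fold Change] > 10", lambda r: r["Fold Change"] > 10),
]

# Least inclusive criterion first: both criteria can now be met.
reordered = [
    ("[Fold Change] > 10", lambda r: r["Fold Change"] > 10),
    ("[Fold Change] > 5", lambda r: r["Fold Change"] > 5),
]

if __name__ == "__main__":
    row = {"Fold Change": 12}
    print(first_matching_criterion(row, misordered))
    print(first_matching_criterion(row, reordered))
```

With a Fold Change of 12, the misordered list reports the "> 5" criterion, while the reordered list reports "> 10".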
Each MAPP contains gene objects that represent biological genes or gene products. Genes are identified by any of the gene identifier types that GenMAPP accepts; some ID types are preferred over others. The gene identifier links a gene object to data in an Expression Dataset for coloring the gene.
Your Expression Dataset may contain multiple rows of data that correspond to the same gene object. This may occur under two conditions: First, the same gene identifier may occur in more than one line of data, as is the case when multiple spots on an array contain DNA from the same gene. Second, the gene identifier may be related to another one as determined by the relationships established in your Gene Database. Each of these rows may have different data values that satisfy different Color Set criteria. When applying a color to a gene object, GenMAPP first assembles all the relevant rows of data in the Expression Dataset, and then determines which criterion is satisfied by each row. GenMAPP calculates the mode criterion, that is, the criterion satisfied by the greatest number of records, and uses that criterion to color the gene. The gene value displayed next to the gene rectangle is from the first row of data satisfying that criterion.
Gene objects that have more than one line of data in an Expression Dataset associated with them are represented on a MAPP by having a dashed line around the gene rectangle. If the gene rectangle with a dashed line is filled with a solid color, each of the lines of data for that gene satisfied the same criterion.
If the gene rectangle with a dashed line is filled with two different colors (a central color and a rim color) that means that two different lines of data associated with that gene object meet different criteria in the Color Set. The rim color is the second most frequently met criterion. Left-click on a gene rectangle on a MAPP to open the Backpage and view all the data along with the corresponding gene identifications.
The order of the data in your Expression Dataset was assigned when the raw data file was imported and has significance to the Expression Dataset Manager. It is used to break ties. If two criteria are tied (e.g., each is met by four occurrences in the data) the criterion that was met by the earlier-occurring row of data is used to color the gene. In addition, the value displayed next to the gene is that from the row of data occurring first for that coloring criterion.
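The mode and tie-breaking rules can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not GenMAPP's code: the function names, the dictionary representation of a data row, and the example criteria are hypothetical, and applying the same earliest-row tie-break to the rim color is an assumption the text does not confirm.

```python
from collections import Counter


def classify_fold_change(row):
    """Hypothetical two-criterion Color Set used only for this example."""
    if row["Fold Change"] >= 2:
        return "up"
    if row["Fold Change"] <= -2:
        return "down"
    return "No criteria met"


def choose_display(rows, classify, value_column):
    """Return (center criterion, rim criterion or None, gene value).

    rows must be in Expression Dataset order, since order breaks ties.
    """
    labels = [classify(row) for row in rows]
    counts = Counter(labels)
    # Index of the earliest row that met each criterion.
    first_seen = {}
    for index, label in enumerate(labels):
        first_seen.setdefault(label, index)
    # Most frequent criterion first; ties go to the earlier-occurring row.
    ranked = sorted(counts, key=lambda lab: (-counts[lab], first_seen[lab]))
    center = ranked[0]
    rim = ranked[1] if len(ranked) > 1 else None
    value = rows[first_seen[center]][value_column]
    return center, rim, value


if __name__ == "__main__":
    rows = [{"Fold Change": 2.5}, {"Fold Change": 3.0}, {"Fold Change": -2.5}]
    print(choose_display(rows, classify_fold_change, "Fold Change"))
```

In this example two of the three rows meet the "up" criterion, so it colors the center, "down" colors the rim, and the displayed value comes from the first "up" row.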
The dashed line around the gene box assigned Entrez Gene ID 11464 indicates that more than one gene identifier is linked to it from separate rows of data in the raw data file. The first row of data satisfies the criterion [FoldChange]>=2, coloring the center of the gene box pink. The second row fulfills the same criterion. However, the third row of data meets the second criterion, coloring the rim of the gene box blue. The full expression data are shown in a table on the Backpage.

It is advisable to place the expression data in your raw data file so that the multiple data lines representing the same gene object are ordered according to some measure of interest. For example, you might put the data for the probes in which you have the most confidence first. You may change the display priority of duplicate data by changing the order of the rows in the raw data file and importing the file again.