
Analysis Details

Sequence analysis

To enable comparisons across platforms, we perform sequence analysis and gene assignment based on current genome annotations, as described in Barnes et al. (2005).

Differential Expression Analysis Details

Differential expression evidence is computed using linear modelling approaches combined with multiple-test correction, essentially as described in:

  1. Pavlidis, P., Using ANOVA for gene selection from microarray studies of the nervous system. Methods, 2003. 31(4): p. 282-9.
  2. Pavlidis, P. and W.S. Noble, Analysis of strain and regional variation in gene expression in mouse brain. Genome Biol, 2001. 2(10): p. RESEARCH0042.
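As a rough illustration of the multiple-test correction step, the sketch below adjusts a list of per-gene p-values using the Benjamini-Hochberg false discovery rate procedure. The text above does not say which correction is used, so the choice of Benjamini-Hochberg, the function name `benjamini_hochberg`, and the example p-values are all assumptions made for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values).

    Assumption: this procedure is only one common form of multiple-test
    correction; the source text does not name the method it uses.
    """
    n = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical p-values for four genes, e.g. from a per-gene linear model.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Genes whose adjusted value falls below a chosen threshold would then be reported as differentially expressed.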

Coexpression Analysis Details

Coexpression evidence is computed using methods described in:
Lee, H.K., et al., Coexpression analysis of human genes across many microarray data sets. Genome Research, 2004. 14: p. 1085-1094.

The following figure shows a schematic of the process.

Node degree (“coexpression specificity”) computation

In several contexts, Gemma exposes information about how many coexpression partners a gene has. For example, in the network visualization, nodes and edges are “faded” to make genes with fewer coexpression partners more obvious. The concept of specificity is motivated by the importance of down-weighting genes by node degree in gene function prediction methods, and by related issues discussed in Gillis and Pavlidis (2011).

The values for node degree or “specificity” are based on a “global” analysis, and are intended to inform users about how meaningful any given coexpression relationship will be for a gene. Gemma estimates node degree by counting how many coexpression pairs (“links”) the gene is involved in, across all experiments. This value is then re-expressed as a relative rank among the genes of that organism. Thus the gene with the largest number of coexpression partners has a score of 1.0, and the gene with the fewest has a score of 0.0 (there can be ties).

This method has the benefit of simplicity, and it is intuitive because it is tied directly to the data users can access in the system. However, users should be aware that the measure is potentially sensitive to sources of variance that are not biological in nature, most importantly the number of data sets in which the gene is tested. Genes that are tested more often will tend to have more links. A more sophisticated measure that takes this source of variance into account is highly correlated with the simpler measure (Spearman's rho of approximately 0.7), but is not as readily understandable. In addition, the fact that a gene has a high node degree, whether or not this is due to frequent testing, is still relevant to interpretation.
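The counting and ranking steps described above can be sketched as follows. This is a minimal illustration, not Gemma's implementation: the function name `node_degree_ranks`, the decision to count each distinct partner once, and the tie handling (genes with equal degree share a score, with scores spaced evenly over the distinct degree values) are all assumptions.

```python
from collections import defaultdict


def node_degree_ranks(links):
    """Map each gene to a relative node-degree rank in [0.0, 1.0].

    links: iterable of (gene_a, gene_b) pairs pooled across experiments.
    Assumptions: a partner seen in several experiments counts once, and
    genes with no links do not appear in the result.
    """
    partners = defaultdict(set)
    for gene_a, gene_b in links:
        if gene_a == gene_b:
            continue  # ignore self-links
        partners[gene_a].add(gene_b)
        partners[gene_b].add(gene_a)

    degree = {gene: len(p) for gene, p in partners.items()}
    distinct = sorted(set(degree.values()))
    if len(distinct) < 2:
        # Every gene has the same degree, so no ranking is possible.
        return {gene: 0.0 for gene in degree}

    # Position of each degree value among the distinct values, so that
    # tied genes share a score, the lowest degree maps to 0.0 and the
    # highest to 1.0.
    position = {d: i for i, d in enumerate(distinct)}
    top = len(distinct) - 1
    return {gene: position[d] / top for gene, d in degree.items()}


# Hypothetical links; ("B", "A") repeats ("A", "B") from another experiment.
links = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A")]
ranks = node_degree_ranks(links)
```

In this toy example, gene A has three partners and receives the top score, D has one partner and receives the bottom score, and B and C tie in between.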

References

Pavlidis, P., Using ANOVA for gene selection from microarray studies of the nervous system. Methods, 2003. 31(4): p. 282-9.

Pavlidis, P. and W.S. Noble, Analysis of strain and regional variation in gene expression in mouse brain. Genome Biol, 2001. 2(10): p. RESEARCH0042.

Barnes, M., et al., Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms. Nucleic Acids Research, 2005. 33: p. 5914-5923.

Lee, H.K., et al., Coexpression analysis of human genes across many microarray data sets. Genome Research, 2004. 14: p. 1085-1094.

Gillis, J. and P. Pavlidis, The impact of multifunctional genes on “guilt by association” analysis. PLoS ONE, 2011. 6(2): p. e17258.
