Gemma’s API is fairly large and complex. However, a few key entry points will get you a long way.
In command line tools (extending AbstractSpringAwareCLI), you are required to look up a service by its name. For example:
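A minimal sketch of such a lookup, using stand-in classes rather than Gemma's real ones. `BeanRegistry` and the empty `ExpressionExperimentService` below are hypothetical; only the method name `getBean` comes from this page. The bean name is assumed to be the type name with a lower-case first letter (as in expressionExperimentService), which is a simplification of the convention Spring applies by default.

```java
import java.util.HashMap;
import java.util.Map;

public class ServiceLookupSketch {

    /** Hypothetical stand-in for a Gemma service; it declares no real methods. */
    public static class ExpressionExperimentService {
    }

    /** Hypothetical stand-in for the bean registry behind getBean(); not Spring's actual context. */
    public static class BeanRegistry {
        private final Map<String, Object> beans = new HashMap<>();

        /** Registers a bean under its type name with the first letter lower-cased. */
        public void register(Object bean) {
            beans.put(beanNameFor(bean.getClass()), bean);
        }

        /** Looks a bean up by name, failing loudly if the name is unknown. */
        public Object getBean(String name) {
            Object bean = beans.get(name);
            if (bean == null) {
                throw new IllegalArgumentException("No bean named '" + name + "'");
            }
            return bean;
        }
    }

    /** Simplified naming rule (an assumption): simple type name with a lower-case initial. */
    public static String beanNameFor(Class<?> type) {
        String simple = type.getSimpleName();
        return Character.toLowerCase(simple.charAt(0)) + simple.substring(1);
    }

    public static void main(String[] args) {
        BeanRegistry registry = new BeanRegistry();
        registry.register(new ExpressionExperimentService());

        // The cast is needed because a name-based lookup returns Object.
        ExpressionExperimentService ees =
                (ExpressionExperimentService) registry.getBean("expressionExperimentService");
        System.out.println("Found bean of type " + ees.getClass().getSimpleName());
    }
}
```

In a real CLI class the registry is supplied by the Spring context, so only the `getBean` call and the cast would appear in your code.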
Groovy scripting requires similar code.
- The name of the service matches the type name but with a lower-case initial letter (for example, expressionExperimentService).
- AbstractSpringAwareCLI has predefined references to a few services (currently persisterHelper, auditTrailService).
- getBean() is a method exposed by AbstractSpringAwareCLI.
- There are specialized CLI subclasses for handling common situations: ExpressionExperimentManipulatingCLI and ArrayDesignManipulatingCLI.
The Web Services docs cover many of the most common requests, so they serve as another source of information. Most of the web services resolve to a single API call.
- Getting a list of expression experiments: expressionExperimentService.loadAll()
- Getting probes for a gene: geneService.getCompositeSequences (also compositeSequenceService.findByGene)
- Getting all genes for a taxon: geneService.getGenesByTaxon
- Getting genes for a probe: compositeSequenceService.findByGene
- Getting data matrix for an expression experiment: processedExpressionDataVectorService.getProcessedDataArrays
- Getting data vectors for a gene: processedExpressionDataVectorService.getProcessedDataArrays
- Getting coexpression for one or more genes: geneCoexpressionService.coexpressionSearch
A method like ‘expressionExperimentService.load’ sounds straightforward – it loads one or more requested expression data sets. However, if you really only want some basic information about the experiment, like the name or even just the ID (primary key) in Gemma, you don’t want to load the entire data set. The full data set includes the raw expression data and a great deal of associated information.
To avoid costly fetching of data when it isn’t needed, many of Gemma’s key entity services provide ‘load’ and ‘find’ methods that return only proxies or partially-populated objects, and then provide ‘thaw’ methods. These methods take an ‘unthawed’ entity (proxy) and retrieve “the rest of the data”. What is meant by “the rest of the data” depends on the entity – for very complex entities like expression data sets, the thaw method doesn’t retrieve all the associated information; additional ‘thaw’ calls are needed.
In some cases, a basic ‘load’ call does thaw the entity. This is done when 1) thawing isn’t costly and 2) the entity isn’t useful without being ‘thawed’.
An example might help make this clearer.
Very often in Gemma we need the IDs of some expression experiments – but no other data. Therefore, the ‘load’ method is written to retrieve the bare minimum information about the experiment.
If we want to access information about the samples an experiment used, we must ‘thaw’ it. This is a common use case.
Somewhat more rarely, we need to retrieve the full data set. Instead of providing some kind of ‘thawFully’ method (which would have very unclear semantics), expression data is retrieved as a set of DataVectors, which are then thawed.
This structure allows us to access commonly-used data with minimal cost, and only perform expensive operations when we need to. But there is a downside. If you try to access an association of an entity that isn’t thawed, you will get a nasty Hibernate exception (typically a LazyInitializationException). Luckily, if this happens during testing, the fix is usually obvious – call ‘thaw’ first.
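The load-then-thaw pattern can be illustrated with a small self-contained sketch. Everything in it is a stand-in: `Experiment`, `ExperimentService`, and the hard-coded sample names are hypothetical, the real services talk to the database through Hibernate, and the exception thrown here is a plain IllegalStateException rather than Hibernate's own. Whether a real ‘thaw’ method returns the entity or only fills it in place is also an assumption.

```java
import java.util.Arrays;
import java.util.List;

public class ThawSketch {

    /** Hypothetical stand-in for an entity; 'samples' plays the role of a lazy association. */
    public static class Experiment {
        private final long id;
        private List<String> samples; // stays null until the entity is thawed

        Experiment(long id) {
            this.id = id;
        }

        /** Always available, even on an unthawed entity. */
        public long getId() {
            return id;
        }

        /** Fails on an unthawed entity, mimicking access to an uninitialized association. */
        public List<String> getSamples() {
            if (samples == null) {
                throw new IllegalStateException("samples not loaded; call thaw() first");
            }
            return samples;
        }
    }

    /** Hypothetical stand-in for an entity service with separate load and thaw steps. */
    public static class ExperimentService {

        /** Cheap: returns an object that carries only the primary key. */
        public Experiment load(long id) {
            return new Experiment(id);
        }

        /** Expensive in reality (a database fetch); hard-coded here. */
        public Experiment thaw(Experiment experiment) {
            experiment.samples = Arrays.asList("sample-1", "sample-2");
            return experiment;
        }
    }

    public static void main(String[] args) {
        ExperimentService service = new ExperimentService();
        Experiment ee = service.load(42L);
        System.out.println("ID available without thawing: " + ee.getId());

        try {
            ee.getSamples();
        } catch (IllegalStateException ex) {
            System.out.println("Before thaw: " + ex.getMessage());
        }

        service.thaw(ee);
        System.out.println("After thaw: " + ee.getSamples());
    }
}
```

The design point is that the cheap call is the default and the caller opts in to the expensive one, at the price of a runtime failure when the thaw is forgotten.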
In general, ‘load’ methods in Gemma retrieve only proxies, which are really only useful for obtaining a primary key (the ID). Of course, having the ID is often all that is needed. For example, in order to construct a query involving some genes, you don’t need the list of each gene’s transcripts. You only need the primary key.
OpenSessionInView keeps the Hibernate session open during data access by a client. By default it is enabled in Gemma for JSPs. Thus many accesses that wouldn’t work in ‘normal’ code might work in a JSP or even a controller. This is something to watch out for.
The implementation of OpenSessionInView is a Filter on the application server.
By design or inadvertently, Gemma may have multiple routes to the same data. When this is by design, the reason is performance and/or convenience.
The desire to avoid fetching unnecessary data results in methods that take primary keys (IDs) as arguments rather than entities.
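As a rough illustration of an ID-based method signature, the sketch below looks up probes from gene IDs alone. `ProbeLookup`, `findProbesByGeneIds`, and the probe names are hypothetical and are not part of Gemma's API; the in-memory map stands in for a database table keyed by gene ID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IdArgumentSketch {

    /** Hypothetical in-memory stand-in for a probe table keyed by gene ID. */
    public static class ProbeLookup {
        private final Map<Long, List<String>> probesByGeneId = new HashMap<>();

        /** Records the probes associated with one gene ID. */
        public void add(long geneId, String... probes) {
            probesByGeneId.put(geneId, Arrays.asList(probes));
        }

        /** Takes gene IDs, so callers never have to load or thaw gene entities. */
        public List<String> findProbesByGeneIds(Collection<Long> geneIds) {
            List<String> result = new ArrayList<>();
            for (Long id : geneIds) {
                // Unknown IDs contribute nothing rather than causing an error.
                result.addAll(probesByGeneId.getOrDefault(id, Collections.emptyList()));
            }
            return result;
        }
    }

    public static void main(String[] args) {
        ProbeLookup lookup = new ProbeLookup();
        lookup.add(1L, "probe_a", "probe_b");
        lookup.add(2L, "probe_c");

        // Only primary keys cross the method boundary.
        System.out.println(lookup.findProbesByGeneIds(Arrays.asList(1L, 2L)));
    }
}
```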