GeWorkbench Development Plan 2012

Development Plan

This document describes the development plan for the geWorkbench project in the second half of 2012. The focus is on developing the web version of geWorkbench, which is expected to be released at the end of this year.

Architecture Enhancement

The key to developing geWorkbench web is coordinating with the code base of the desktop version. This not only avoids duplicated effort but also forces more disciplined development: every logically reusable module must stay independent of the specific context in which it is used. In particular, much of the current code (including many of the relevant modules listed below) is unnecessarily tangled with GUI code written for the Swing platform. We need to refactor or re-implement such code so that its behavior is self-contained and depends on neither the Swing package nor Vaadin or any other web platform.
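
As an illustration of what "self-contained" could mean here, the following is a minimal sketch, not existing geWorkbench code. The names `ProgressListener` and `ExpressionFileParser`, and the tab-delimited line layout, are assumptions made for the example. The point is only that the core class imports nothing from Swing or Vaadin and reports progress through a plain callback that either front end can implement.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toolkit-neutral callback: the core reports progress without knowing who listens. */
interface ProgressListener {
    void onProgress(int percent);
}

/** Hypothetical core module: plain Java only, no javax.swing or com.vaadin imports. */
final class ExpressionFileParser {

    /**
     * Parses lines assumed to look like "probeId<TAB>v1<TAB>v2..." and returns
     * the numeric values of each line, reporting progress after every line.
     */
    List<double[]> parse(List<String> lines, ProgressListener listener) {
        List<double[]> rows = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String[] fields = lines.get(i).split("\t");
            // The first field is the probe identifier; the rest are values.
            double[] values = new double[fields.length - 1];
            for (int j = 1; j < fields.length; j++) {
                values[j - 1] = Double.parseDouble(fields[j]);
            }
            rows.add(values);
            listener.onProgress((i + 1) * 100 / lines.size());
        }
        return rows;
    }
}
```

Under this arrangement the Swing client might forward `onProgress` to a progress bar and the Vaadin client to a server-side component, while both share the same parser jar.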

In practice, geWorkbench web should always be kept compatible with the desktop code base. We already have a build script that creates a jar file from the desktop code to be placed in the Vaadin project. Whenever we change geWorkbench, we need to keep in mind the possibility of breaking the Vaadin project; conversely, the Vaadin project needs to be kept up to date with the current desktop code.

  • workspace files. The web version needs to support the same workspace files as the desktop version. Currently the web version does not support multiple workspaces, so this has two parts: (1) support multiple workspaces so that the user can switch between them; (2) be able to read workspace files saved by the desktop version, and vice versa. See 3084
  • data file parsing. Coordinated development between the desktop version and the Vaadin version.
  • annotation file. Coordinated development between the desktop version and the Vaadin version. This is not only a matter of separating out the Swing code. We need to design a proper way to manage the annotation information after it is parsed and loaded. The current mechanism in the desktop version does not fit the needs of the web version, and it is not an ideal design in itself either.
  • gene ontology. Coordinated development between the desktop version and the Vaadin version. The current approach does not fit the web version's needs.
  • proper workflow for long-running jobs. Vaadin version development. We need to develop a simple and flexible mechanism for jobs that do not return immediately. This will be the default workflow for most analytical components. Basically: (1) when you submit an analytical job, the GUI should be responsive right away; (2) there should be some GUI hint or representation of the pending job, say a pending result node or a progress bar (I prefer the former); (3) when the job finishes, the pending node becomes a complete result node, similar to how caGrid jobs are handled in the desktop version. The specific design here is an important decision. See 3127
  • transition of analytic components to web services. So far we have analytical components (ARACNe, hierarchical clustering) running as part of the Vaadin application. That is not feasible under a more realistic load (in both memory and CPU). We need to convert computation-intensive components to web services. This also fits our longer-term strategy of converting more analytical components to web services. (Ideally we will change the desktop version's analysis invocation mechanism to web service requests, but that is a separate effort outside the scope of this plan.) For new web services, we will use Axis2 for its wide acceptance and light overhead (an example has been done for the t-test). For existing caGrid services, the current implementation has many problems, especially its extensive memory demand. The memory issues come from (1) the dependency on the current bison types, which are often unnecessarily large and have complex dependencies and hierarchy; (2) inefficient use of the bison data types (there was a major enhancement for ARACNe recently, but not for other analyses). They should be reviewed and cleaned up case by case, but for the goal of this plan we will depend more on simpler, re-implemented services.
  • capability of visualization components to (dynamically) show partial data sets. Visualization components need to be able to handle a partial data set and, ideally, to interactively retrieve and visualize more data. This is an indispensable requirement, considering that some result data sets could be so large that (1) transmission to the browser is costly; (2) it may not be useful to show everything at once; (3) the browser's visualization constraints would make it impossible to handle a result that the server-side computational limits would otherwise allow. A very large table is enough to highlight the issue, but similar concerns may exist for other visualization components, e.g. networks, clusters, plots, etc.
  • improve the CNKB component, particularly (1) review the awkward data structure before and after "creating network" (why do we need to 'create network' explicitly?); (2) refactor it into smaller classes.
  • ongoing review and improvement of the geWorkbench web architecture, especially to make sure it is extensible both for new analytic and visualization components and for large data sets.
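
The long-running job workflow in the list above (submit, show a pending result node, then turn it into a complete result node) could be sketched roughly as follows. This is only an illustration under stated assumptions: `ResultNode` and `JobSubmitter` are hypothetical names, not existing geWorkbench or Vaadin classes, and how the browser learns about the state change (polling, a refresher, etc.) is deliberately left out because that is part of the design decision still to be made.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical placeholder that the workspace can display while a job is running. */
final class ResultNode<T> {
    enum State { PENDING, COMPLETE, FAILED }

    private State state = State.PENDING;
    private T result;
    private Exception error;

    synchronized State getState() { return state; }
    synchronized T getResult() { return result; }
    synchronized Exception getError() { return error; }

    synchronized void complete(T value) {
        result = value;
        state = State.COMPLETE;
        notifyAll();
    }

    synchronized void fail(Exception e) {
        error = e;
        state = State.FAILED;
        notifyAll();
    }

    /** Waits up to timeoutMillis for the node to leave PENDING; returns true if it did. */
    synchronized boolean awaitDone(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (state == State.PENDING) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                wait(remaining);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

/** Hypothetical submitter: hands back a pending node at once and fills it in later. */
final class JobSubmitter {
    // Daemon worker threads so that this sketch never keeps the JVM alive on its own.
    private final ExecutorService pool = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "analysis-worker");
        t.setDaemon(true);
        return t;
    });

    <T> ResultNode<T> submit(Callable<T> job) {
        ResultNode<T> node = new ResultNode<>();
        pool.execute(() -> {
            try {
                node.complete(job.call());
            } catch (Exception e) {
                node.fail(e);
            }
        });
        return node; // the caller can attach this node to the GUI right away
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

In this sketch the request thread only calls `submit` and returns, so the GUI stays responsive; the `awaitDone` method is meant for tests and batch callers rather than for UI code. The same node could equally be completed by the response of a remote web service instead of a local thread pool.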

New Analytic Components on Web Version

Much of the discussion in the previous section also applies here. Each of these analytic components has corresponding visualization components. The goal (especially since we don't have a formal specification document) is to duplicate as much as we can of the features and functionality of the Swing version. We need to consult Aris and Ken about which features are more crucial than others. The same approach applies to the existing components in the Vaadin version.

  • ANOVA 3122 3123
  • MarkUs 3130
  • MARINA (caGrid service version of master regulator analysis)

Over sixty Mantis issues have been created under the "geWorkbench Web" project to document the items to be worked out.

Besides the review and refactoring at the architecture level, there are many issues to be reviewed and corrected at the implementation level too. In general, the system still lacks adequate tests.

High Priority Tasks besides Web Version

  • ARACNe
  • LINCS project (\\floratos20\Fileshare\For Michael\LINCS supplement)