open science: doop
Posted by razor | Filed under Bioinformatika, English
The Database of Orthologous Promoters needs some debugging, cleanup, new features and updated data. I decided that I will publish all notes, problems, developments, data, graphs and results here, as an Open Science experiment. First, the main lines of future development are the following, in no particular order.
- The web search interface needs some debugging, as some of the features that used to work in the oooold version [or before the server crash] are not working now. In particular, the GeneOntology analysis fails in one very special case, and so does the BLAT server used for the promoter sequence search.
- Maybe we should integrate the DoOP and DoOPSearch services. Basically it’s just a pain in the ass to use two separate domains. A simple redirect, and that’s all, problem solved.
- The underlying MySQL database needs some refactoring. I suspect what we have now is not the optimal design, and it won’t be easy to extend it with new features like known transcription factor binding sites, alternative promoters, or multiple reference species for the plant section.
- The Bio::DOOP API also needs some debugging and cleanup, maybe some more features. EMBOSS version checking, more flexible queries, some integration with BioPerl, and maybe the TFBS module [not sure about this]. Adding documentation.
- Adding much, much more documentation, describing the database, motifs, promoter analysis methods, etc.
- Adding a simple query-hit alignment view for the motifs found during a search.
- Making it work as a web service, registering it in Biocatalogue, making it usable for Taverna/myExperiment, and also for Galaxy. Possibly a DAS server. I’m a little unsure about these for now and need to read some stuff first, like the Semantic Web by O’Reilly and a bunch of tutorials.
- Developing a more complex plant section, with more reference species [rice, grape, poplar, etc], as plants are not nearly as nice as chordates when it comes to genome evolution, polyploidy, orthologous groups, etc.
- Integrating data from other databases, particularly JASPAR, PLACE, PAZAR, PlantCARE, ORegAnno, ABS, AGRIS. That’s a lot! Some clustering and cleaning of these data is also needed.
- And last, but not least, the database needs some more up-to-date data both in the plant and chordate section.
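To make the query-hit alignment view from the list a bit more concrete, here is a minimal sketch in Python. It assumes the two sequences are already aligned to the same length, with `-` for gaps; the function name `alignment_view` and the output layout are made up for illustration and are not part of DoOP or Bio::DOOP.

```python
def alignment_view(query: str, hit: str, width: int = 60) -> str:
    """Render two pre-aligned, equal-length sequences with a match line.

    Hypothetical helper for illustration only; not part of DoOP.
    """
    if len(query) != len(hit):
        raise ValueError("query and hit must be pre-aligned to the same length")

    blocks = []
    for start in range(0, len(query), width):
        q = query[start:start + width]
        h = hit[start:start + width]
        # '|' marks identical non-gap positions; mismatches and gaps get a space.
        marks = "".join(
            "|" if a == b and a != "-" else " "
            for a, b in zip(q.upper(), h.upper())
        )
        blocks.append(f"query  {q}\n       {marks}\nhit    {h}")
    return "\n\n".join(blocks)


print(alignment_view("ACGT-A", "ACCTTA"))
```

Long alignments are wrapped into blocks of `width` columns, so the same function could feed a plain-text or `<pre>` view on the search results page.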
Nice list. I think I’ll go with the MySQL redesign first, and try to create some normalized tables and stuff.
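As a rough idea of what "normalized" could mean here, below is a small sketch of one possible layout. Everything in it is an assumption: the table and column names (`species`, `cluster`, `promoter`, `motif`) are hypothetical and do not reflect the current DoOP schema, and SQLite stands in for MySQL only so that the example runs without a server. The one design point it tries to show is storing the reference species per cluster, which would leave room for multiple reference species in the plant section.

```python
import sqlite3

# Hypothetical normalized layout; NOT the actual DoOP schema.
# SQLite is used as a stand-in for MySQL so the sketch is self-contained.
SCHEMA = """
CREATE TABLE species (
    species_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE cluster (
    cluster_id           INTEGER PRIMARY KEY,
    reference_species_id INTEGER NOT NULL REFERENCES species(species_id)
);
CREATE TABLE promoter (
    promoter_id INTEGER PRIMARY KEY,
    cluster_id  INTEGER NOT NULL REFERENCES cluster(cluster_id),
    species_id  INTEGER NOT NULL REFERENCES species(species_id),
    sequence    TEXT NOT NULL
);
CREATE TABLE motif (
    motif_id    INTEGER PRIMARY KEY,
    promoter_id INTEGER NOT NULL REFERENCES promoter(promoter_id),
    start_pos   INTEGER NOT NULL,
    end_pos     INTEGER NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def promoters_per_species(conn: sqlite3.Connection) -> dict:
    """Count promoters for each species, including species with none."""
    rows = conn.execute(
        "SELECT s.name, COUNT(p.promoter_id) "
        "FROM species s LEFT JOIN promoter p ON p.species_id = s.species_id "
        "GROUP BY s.species_id ORDER BY s.name"
    )
    return dict(rows.fetchall())


conn = build_demo_db()
conn.executemany(
    "INSERT INTO species (species_id, name) VALUES (?, ?)",
    [(1, "Arabidopsis thaliana"), (2, "Oryza sativa")],
)
conn.execute("INSERT INTO cluster (cluster_id, reference_species_id) VALUES (1, 1)")
conn.executemany(
    "INSERT INTO promoter (cluster_id, species_id, sequence) VALUES (?, ?, ?)",
    [(1, 1, "ACGT"), (1, 2, "ACGA")],
)
print(promoters_per_species(conn))
```

With the species names kept in one table and referenced by key everywhere else, adding a new reference species becomes a single row insert rather than a schema change.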
Update: I also started to upload the various DoOP-related code and scripts to GitHub. See here.
Tags: database, doop, open science, promóter, redesign