A FLEXIBLE DATA STORE FOR MANAGING BIOINFORMATICS DATA
With the abundance of scientific data produced in recent years, managing them effectively has become a challenging problem. Data produced by scientific activities often evolve quickly and are too dynamic to have broadly agreed metadata structures. Bioinformatics data belong to this category. Advances in high-throughput genome sequencing and gene expression profiling technologies produce huge amounts of data. These data are open to new interpretations, and the interpretations may change when new discoveries are made. The research community uses data annotation heavily to record these interpretations. Within a given time frame, there can be a burst of annotations on particular data. Traditional database systems do not provide enough flexibility for managing such data, and there are efforts to build new data management systems. A promising approach is to allow data to be stored freely in any format and to index them using small pieces of structured information, so that the data can be retrieved through these indexes. This technique has been attempted on social networking sites such as Flickr and del.icio.us, which allow users to annotate pictures and bookmarks flexibly. The structures of these annotations are as simple as tags and key-value pairs. This simplicity is an advantage for usability but raises challenges for the accuracy of data search. ...
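The indexing idea described above can be illustrated with a minimal sketch. It assumes an in-memory store in which data objects are opaque byte strings and annotations are tags or key-value pairs. The class name AnnotatedStore and its methods are hypothetical and chosen only for illustration; they are not the system described in this paper.

```python
from collections import defaultdict


class AnnotatedStore:
    """Toy in-memory store (hypothetical): opaque blobs indexed by annotations."""

    def __init__(self):
        self._blobs = {}                 # object id -> raw bytes in any format
        self._by_tag = defaultdict(set)  # tag -> ids carrying that tag
        self._by_kv = defaultdict(set)   # (key, value) -> ids carrying that pair

    def put(self, obj_id, blob):
        """Store a data object without imposing any schema on its content."""
        self._blobs[obj_id] = blob

    def annotate(self, obj_id, tags=(), **pairs):
        """Attach tags and key-value pairs to an already stored object."""
        if obj_id not in self._blobs:
            raise KeyError(obj_id)
        for tag in tags:
            self._by_tag[tag].add(obj_id)
        for key, value in pairs.items():
            self._by_kv[(key, value)].add(obj_id)

    def find(self, tags=(), **pairs):
        """Return the sorted ids that match every given tag and pair exactly."""
        hits = set(self._blobs)
        for tag in tags:
            hits &= self._by_tag.get(tag, set())
        for key, value in pairs.items():
            hits &= self._by_kv.get((key, value), set())
        return sorted(hits)

    def get(self, obj_id):
        """Return the raw bytes of a stored object."""
        return self._blobs[obj_id]


store = AnnotatedStore()
store.put("seq1", b">seq1\nACGT\n")             # a FASTA-like record
store.put("expr7", b"gene,level\nBRCA1,2.4\n")  # a CSV-like record
store.annotate("seq1", tags=["genome"], organism="human")
store.annotate("expr7", tags=["expression"], organism="human")
store.annotate("seq1", tags=["reviewed"])       # a later annotation on the same data

print(store.find(organism="human"))
print(store.find(tags=["genome", "reviewed"]))
print(store.find(organism="Human"))             # exact matching misses this spelling
```

Because annotations are free-form and matched exactly in this sketch, a query for organism="Human" returns nothing even though two objects are annotated with organism="human", which is one small instance of the search-accuracy challenge noted above.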