Next week, from May 7 to 11, we are welcoming Prof. Alison Murray from the Desert Research Institute for our Microbial Antarctic Resource System (MARS) workshop. Its main objective is to delineate an approach for integrating Antarctic microbial diversity data (DNA sequence data) into existing biodiversity information systems, namely the Antarctic Biodiversity Information Facility (ANTABIF). To reach this objective, a multidisciplinary team of microbiologists, IT developers, biodiversity informaticians and information network specialists will build on the recent GBIF genomic workshop held in Oxford.
ANTABIF is a primary biodiversity information resource for scientists studying Antarctic continental and marine systems. Its data sets and storage architecture are not currently optimized to handle DNA-based sequence data sets; rather, the system is organized around geo-referenced, hierarchically organized taxonomic data, and it currently hosts data sets ranging from large birds and mammals down to some phytoplankton and bacterial surveys. As ANTABIF grows to include Antarctic terrestrial ecosystems, there is an urgent need to extend the resource to DNA sequence data sets, given that the majority of terrestrial biodiversity is microbial and that DNA sequencing surveys are the primary means of assessing microbial diversity. Beyond terrestrial ecosystems, studies of marine systems (sea ice, water column, sediments) also stand to benefit tremendously from this centralized resource.
The benefits of providing an Antarctic microbial diversity data resource:
- It would facilitate circumpolar oceanic, limnological and continental analyses of microbial diversity, supporting the current interest in conservation and in the impact of humans and climate change in Antarctica.
- No existing resource lists all of the Antarctic data sets; this information is buried in the literature, in GenBank and in other data repositories. The community of Antarctic scientists working on microbial diversity is a relatively small network, well placed to come together and build a resource that is manageable and highly informative for future studies of biogeography, temporal variation and related questions.
- Although many people register their projects in the GCMD, it is a metadata repository and does not provide a registry architecture for DNA sequence data sets.
There are a number of challenges to consider in order to accommodate past, present and future DNA sequence-based data sets:
- Different levels of data processing on different data sets
- Different regions of the same gene (16S rRNA or 18S rRNA) that cannot be compared or aligned with one another, versus the nearly full-length gene sequences generated in the past
- Different levels of sequencing effort
- Different sequencing technologies (Sanger sequencing, pyrosequencing, Illumina)
- Providing the raw or semi-processed data as well as the processed data that publications are built on
- It may be worth considering how ICoMM dealt with this for short-read pyrosequencing data; about 20 Southern Ocean data sets are available there in FASTA format and in processed format
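To make these challenges more concrete, here is a minimal sketch of how a registry entry for a DNA sequence data set might record the properties listed above, so that data sets can be grouped by whether their sequences cover the same gene region. This is only an illustration under our own assumptions, not a proposed ANTABIF schema: the class names, field names and example records are all hypothetical, and the values are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Technology(Enum):
    SANGER = "Sanger sequencing"
    PYROSEQUENCING = "pyrosequencing"
    ILLUMINA = "Illumina"


class ProcessingLevel(Enum):
    RAW = "raw"
    SEMI_PROCESSED = "semi-processed"
    PROCESSED = "processed"


@dataclass(frozen=True)
class SequenceDataset:
    """Hypothetical registry entry for one DNA sequence data set."""
    name: str
    marker_gene: str          # e.g. "16S rRNA" or "18S rRNA"
    gene_region: str          # e.g. "V6" or "near full length"
    technology: Technology
    processing_level: ProcessingLevel
    read_count: int           # rough measure of sequencing effort
    latitude: float           # geo-reference of the sample
    longitude: float


def group_by_comparability(datasets):
    """Group data set names by (marker gene, gene region).

    Assumption for this sketch: only data sets covering the same
    region of the same gene are treated as directly comparable.
    """
    groups = defaultdict(list)
    for ds in datasets:
        groups[(ds.marker_gene, ds.gene_region)].append(ds.name)
    return dict(groups)


# Made-up records, for illustration only.
examples = [
    SequenceDataset("pyro_sea_ice", "16S rRNA", "V6",
                    Technology.PYROSEQUENCING, ProcessingLevel.RAW,
                    20000, -70.0, 10.0),
    SequenceDataset("sanger_clone_library", "16S rRNA", "near full length",
                    Technology.SANGER, ProcessingLevel.PROCESSED,
                    300, -77.0, 160.0),
    SequenceDataset("pyro_water_column", "16S rRNA", "V6",
                    Technology.PYROSEQUENCING, ProcessingLevel.PROCESSED,
                    15000, -65.0, -64.0),
]

groups = group_by_comparability(examples)
for key, names in groups.items():
    print(key, names)
```

Recording the processing level and technology alongside each entry would let the raw and processed versions of the same survey be registered side by side, which is one possible way to address the last two points in the list.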
The workshop will be hosted at the Belgian Science Policy Office (BELSPO), in Brussels.