OBIS GitHub libraries for data cleaning

Biodiversity data is messy. How can we transform messy data into reusable data?

Data cleaning is essential to facilitate data sharing.

We attended the OBIS node training course in November 2017 to learn about the methods and tools available for cleaning biodiversity data in the Darwin Core standard.

The teaching materials are freely available online. The Ocean Biodiversity Information System (OBIS) and the World Register of Marine Species (WoRMS) have put a lot of effort into facilitating biodiversity data cleaning.

One of the most user-friendly R packages we learned about is obistools. It lets users achieve each of the following objectives with a single command:

  • Perform a taxon match using the WoRMS API
    names <- c("Abra alva", "Buccinum fusiforme", "Buccinum fusiforme", "Buccinum fusiforme")
    match_taxa(names)  # obistools function; looks the names up in WoRMS
  • Plot points on a map
  • Check required fields before publishing data through the IPT.
    • Remark: The IPT (Integrated Publishing Toolkit) was developed by GBIF. OBIS also uses the IPT to harvest data into its database. The required Darwin Core fields differ slightly between OBIS and GBIF.
  • obistools has a lot more to offer. A more comprehensive tutorial is available on its GitHub page.
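
To make the idea of a required-fields check more concrete, here is a minimal sketch in base R of what such a check does conceptually. It is not the obistools implementation (the package ships its own check, check_fields()). The helper name find_field_problems, the sample data, and the list of required terms are all assumptions chosen for illustration, so please consult the OBIS and GBIF documentation for the authoritative lists.

```r
# Hypothetical helper (not part of obistools): report required Darwin Core
# fields that are absent from an occurrence table or contain empty values.
# ASSUMPTION: this term list is illustrative only, not the official OBIS list.
required_terms <- c("occurrenceID", "eventDate", "decimalLongitude",
                    "decimalLatitude", "scientificName", "basisOfRecord")

find_field_problems <- function(data, required = required_terms) {
  # Required columns that are missing altogether
  missing_cols <- setdiff(required, names(data))
  # Required columns that exist but hold NA or empty strings in some rows
  present <- intersect(required, names(data))
  has_empty <- vapply(present, function(col) {
    values <- data[[col]]
    any(is.na(values) | trimws(as.character(values)) == "")
  }, logical(1))
  list(missing = missing_cols, incomplete = present[has_empty])
}

# Made-up sample records, purely for demonstration
occurrences <- data.frame(
  occurrenceID = c("occ-1", "occ-2"),
  scientificName = c("Abra alba", ""),
  decimalLongitude = c(2.9, 3.1),
  decimalLatitude = c(51.2, NA),
  stringsAsFactors = FALSE
)

problems <- find_field_problems(occurrences)
print(problems)
```

Running a check like this before uploading to the IPT shows at a glance which columns still need to be added and which ones have gaps to fill.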

