OBIS GitHub libraries for data cleaning

Biodiversity data is messy. How do we turn messy data into reusable data?

Data cleaning is essential for data sharing.

In November 2017 we attended an OBIS node training course to learn about the methods and tools available for cleaning biodiversity data in the Darwin Core standard.

The teaching materials are freely available online. The Ocean Biodiversity Information System (OBIS) and the World Register of Marine Species (WoRMS) have put a lot of effort into making biodiversity data cleaning easier.

One of the most user-friendly R packages we learned about is obistools. It lets users accomplish each of the following tasks with a single function call:

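  • Match taxon names against WoRMS through the WoRMS API
    library(obistools)
    # note the misspelled "Abra alva" among the names to be matched
    names <- c("Abra alva", "Buccinum fusiforme", "Buccinum fusiforme", "Buccinum fusiforme")
    match_taxa(names)
  • Plot occurrence points on an interactive Leaflet map
    # abra is an example occurrence dataset that ships with obistools
    plot_map_leaflet(abra)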


  • Check required fields with check_fields() before publishing data through the IPT.
    • Note: the IPT (Integrated Publishing Toolkit) was developed by GBIF. OBIS also uses the IPT to harvest data into its database. The required Darwin Core fields differ slightly between OBIS and GBIF.
  • obistools offers much more. A more comprehensive tutorial is available on its GitHub page.
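To make the idea behind a required-field check more concrete, here is a minimal base-R sketch. It is not obistools code. `check_required_terms()` is a hypothetical helper, and the list of required terms is our assumption based on the OBIS manual. Check the current OBIS documentation, or simply run obistools' own `check_fields()`, before relying on it. The sketch reports which required columns are absent and which rows have empty values in the columns that are present.

```r
# Assumed list of OBIS-required Darwin Core terms (verify against the OBIS manual)
required_terms <- c("occurrenceID", "eventDate", "decimalLatitude",
                    "decimalLongitude", "scientificName", "scientificNameID",
                    "occurrenceStatus", "basisOfRecord")

# Hypothetical helper, not part of obistools
check_required_terms <- function(data, required = required_terms) {
  # required terms with no column at all in the table
  missing_cols <- setdiff(required, names(data))
  # for each required term that is present, find rows that are NA or empty
  present <- intersect(required, names(data))
  empty_rows <- lapply(present, function(term) {
    values <- data[[term]]
    which(is.na(values) | values == "")
  })
  names(empty_rows) <- present
  # keep only the terms that actually have empty values
  empty_rows <- empty_rows[lengths(empty_rows) > 0]
  list(missing_columns = missing_cols, empty_values = empty_rows)
}

# Small made-up occurrence table for illustration
occ <- data.frame(
  occurrenceID = c("occ-1", "occ-2", "occ-3"),
  eventDate = c("2017-11-20", NA, "2016-09-04"),
  decimalLatitude = c(51.5, 52.3, 50.2),
  decimalLongitude = c(3.5, 2.9, 3.1),
  scientificName = c("Abra alba", "", "Buccinum fusiforme"),
  stringsAsFactors = FALSE
)

result <- check_required_terms(occ)
print(result$missing_columns)
print(result$empty_values)
```

Here `result$missing_columns` lists the three assumed-required terms that the table lacks (`scientificNameID`, `occurrenceStatus`, `basisOfRecord`). `result$empty_values` flags row 2 under both `eventDate` and `scientificName`.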
