FAIR metadata management is one way to ensure that data can be understood and utilised. Metadata is the source material that allows us to know how and when data was collected, what information was collected, who collected it, and how to find and use it.
Without metadata, you have no view of what is going on with your unstructured data, context as to what the data is or what it is for, who created the file and when, what does the file relate to, security clearance and much more.
Metadata allows organisations to capture contextual information about a file or piece of data allowing information governance, compliance, and descriptive information, so you know what the file contains, how long it should be kept for and how it relates to other files in your possession. This allows you to govern, search and put context to your data, adding a huge amount of value to your data.
Types of Metadata
When establishing a metadata standard, one of the most important considerations is determining which fields are essential for the project’s success and ensuring reusability. Setting an appropriate bar for mandatory information from submitters is crucial for ensuring data richness while not making requirements so time-consuming or cumbersome that they deter scientists from engaging with the project and providing data.
- Information about management and organisation. Examples include data owners,, collectors, collaborators, funders and project scope. This is usually assigned to the data automatically and set up before it is created.
- Data about a dataset or resource that allow people to discover and identify it. This includes basic fields such as title, keywords, IDs and abstracts.
- Structural metadata describes measurements and methods. Examples include sample data, categories, variables, measurement units and collection procedures. Descriptive and structural metadata are often added continuously through a project.
Data Quality Metadata
- Data quality is a critical issue in organisations because of the huge data volumes available in their systems. Therefore, the literature suggests that communicating the data quality level of a specific data set to decision-makers in the form of data quality metadata is essential. However, the additional work may be too time-consuming or complicated, adversely impacting decision outcomes.
FAIR Metadata Management
In 2019, the European Commission did a cost-benefit analysis of fair research data. They found that the estimated cost of not having the research data far outweighed the cost of implementation.
Without FAIR data, we’re losing some opportunities to improve and innovate. In the diagram below, the yellow boxes are all linked to metadata. Metadata has been identified as crucial for all four critical FAIR data principles.
One of the most important aspects of metadata is that it makes finding or discovering data and relevant connections easier. It does not matter how informative a dataset is if it’s difficult for humans and machines to find it.
Metadata should be future-proofed as much as possible for long-term storage and accessibility. Companies should have contingency plans to ensure that metadata remains accessible even if the source is moved or made unavailable.
Repository meta-data schema maps can be used to ensure that data is interoperable for use by humans or machine learning software. Ideally, metadata can be used to help automatically or semi-automatically link to relevant data.
Metadata can help ensure that data has appropriate tags and descriptions, which, in turn, improves reusability and future value.
Benefits & Applications of FAIR Metadata Management
The BioSamples database at EMBL-EBI is the central institutional repository for sample metadata storage and connection to EMBL-EBI archives and other resources.
The EMBL-EBI BioSamples database provides a unique reference for samples metadata across EMBL-EBI archives, the International Nucleotide Sequence Database Collaboration (INSDC) and beyond. It enables the centralised representation of samples and their description, relationships between samples and assays across repositories and linkage back to samples’ donors such as patients. The BioSamples database now has valuable features and processes to improve data quality in BioSamples, in particular enriching metadata content and following FAIR principles.
Final Thoughts & Conclusions
Metadata is often overlooked or misunderstood. Its importance for FAIR Data principles and implementation are slowly changing, and new methods are in development to automate and simplify metadata standardisation. For more on FAIR Data, consider joining us this September in London for Pharma Data UK: In-Person.