Mining the Lab for (meta)data

How automation can help researchers document experiments

During the first half of 2020 alone, the phrase “shake gently” was used over 3000 times in peer-reviewed research articles.

Without reliable metadata, researchers who want to replicate a study cannot do so. This challenge is a key driver of the reproducibility crisis, which casts doubt on large parts of the published literature. The concern is particularly troubling in basic biomedical research, which forms the foundation of drug discovery efforts. Current estimates for the percentage of irreproducible preclinical research stand at 80 to 90%, costing as much as $28.2 billion per year in the US alone [3]. Key contributors to this problem include well-known issues concerning the validity of biological reagents, protocol adherence, the rigour and impartiality of data analysis, and the reliability of data storage and retrieval. All of these can be improved by employing standardised scientific processes and complete data capture, including all metadata.
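To make "complete data capture" concrete, here is a minimal sketch of how an instruction such as "shake gently" could be recorded as structured metadata instead of free text. The class name `ShakeStep`, its fields and the example values are all hypothetical and chosen for illustration only; they do not describe any particular platform or standard.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ShakeStep:
    """Hypothetical machine-readable record of a single shaking step."""
    mode: str             # e.g. "orbital" or "linear" (illustrative values)
    speed_rpm: int        # shaking speed in revolutions per minute
    duration_s: int       # duration in seconds
    temperature_c: float  # temperature in degrees Celsius

    def __post_init__(self):
        # Reject values that cannot describe a real shaking step.
        if self.speed_rpm <= 0 or self.duration_s <= 0:
            raise ValueError("speed_rpm and duration_s must be positive")


# Illustrative values only, not taken from any real protocol.
step = ShakeStep(mode="orbital", speed_rpm=300, duration_s=60, temperature_c=25.0)

# Serialise the step so it can be stored alongside the experimental data.
record = json.dumps(asdict(step), sort_keys=True)
print(record)
```

A record like this can be stored, searched and compared across experiments, which a phrase such as "shake gently" cannot.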

A recent study evaluated the quality of metadata for a collection of more than 11 million biological samples and revealed serious anomalies [4]. The majority of metadata fields were neither standardised nor controlled, and non-compliant values were found even in simple binary fields. This issue has become even more concerning during the ongoing SARS-CoV-2 pandemic, in which the haste to publish new findings has led to metadata being reported in insufficient depth and detail at an even larger scale [5]. These omissions are a severe obstacle to reproducing, and thus validating, new findings. They also make it impossible to analyse datasets successfully with computational tools or machine learning techniques. Such techniques could discover patterns that evade scientists, but they rely heavily on the quality and annotation of the input data.
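The kind of problem described above, values that do not comply with a controlled vocabulary, can be caught automatically at the point of data capture. The following is a minimal sketch under stated assumptions: the field names, allowed values and sample records are invented for illustration and are not taken from the cited study or from any real repository.

```python
# Hypothetical controlled vocabularies: field name -> set of allowed values.
CONTROLLED_FIELDS = {
    "is_tumour": {"yes", "no"},                    # a simple binary field
    "organism": {"Homo sapiens", "Mus musculus"},  # a controlled term list
}


def find_noncompliant(record):
    """Return {field: value} for each controlled field that is missing or not allowed."""
    problems = {}
    for field, allowed in CONTROLLED_FIELDS.items():
        value = record.get(field)  # None if the field is absent
        if value not in allowed:
            problems[field] = value
    return problems


# Invented sample records for illustration.
samples = [
    {"id": "S1", "is_tumour": "yes", "organism": "Homo sapiens"},
    {"id": "S2", "is_tumour": "Y", "organism": "human"},
    {"id": "S3", "organism": "Mus musculus"},
]

for sample in samples:
    print(sample["id"], find_noncompliant(sample))
```

Here the first record passes, the second is flagged for both fields, and the third is flagged for a missing binary field. Running such a check when metadata is entered, rather than after publication, keeps non-compliant values out of the dataset in the first place.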

Figure 1.


References

  1. May, M. The Many Ways to Shake Samples. Lab Manager (2020).
  2. Best practice data life cycle approaches for the life sciences.
  3. Freedman, L. P., Cockburn, I. M. & Simcoe, T. S. The Economics of Reproducibility in Preclinical Research. PLOS Biol. 13, e1002165 (2015).
  4. Raphael, M. P., Sheehan, P. E. & Vora, G. J. A controlled trial for reproducibility. Nature 579, 190–192 (2020).
  5. Gonçalves, R. S. & Musen, M. A. The variable quality of metadata about biological samples used in biomedical experiments. Sci. Data 6, 190021 (2019).
  6. Kiermer, V. Reporting Checklist. (2015).
  7. Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).

About Arctoris Ltd

Arctoris Ltd is an Oxford-based research company that is revolutionising drug discovery for virtual and traditional biotechnology companies, pharmaceutical corporations and academia. Arctoris has established the world’s first fully automated drug discovery platform, offering pre-optimised and fully validated processes for its partners and customers globally. Accessible remotely, the platform provides on-demand access to a wide range of biochemical, cell biology and molecular biology assays conducted by robotics, enabling rapid, informed decision-making in basic biology, target validation, toxicology and phenotypic screening. These assay capabilities are accessed using a powerful online portal that streamlines experiment planning, ordering, tracking and data analysis. Thanks to the Arctoris platform, customers can rapidly, accurately and cost-effectively perform their research and advance their drug discovery programmes.


