Van Leeuwenhoek Lecture on BioScience - Carole Goble (University of Manchester): 'Reproducibility, research objects and FAIR data realities'



16:00 hrs


Gorlaeus laboratories, 04-28 LUMY (was Cell Observatory)


Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research - that is, the "assets" of data, models, codes, SOPs and workflows. The "FAIR" (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship have proved to be an effective rallying cry. Funding agencies expect data (and increasingly software) management, retention and access plans. Journals are raising their expectations of the availability of data and codes both pre- and post-publication. It all sounds very laudable and straightforward. But ...

Reproducibility is a R* minefield, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transferring between researchers (reuse). Different forms of "R" make different demands on the completeness, depth and portability of research. Sharing is another minefield raising concerns of credit and protection from sharp practices.

In practice the exchange, reuse and reproduction of scientific experiments depend on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not "finished": the codes fork, data is updated, algorithms are revised, workflows break, service updates are released. The Research Object approach is an effort to systematically support more portable and reproducible research exchange.

In this talk I will explore these issues in data-driven computational life sciences through the examples and stories from initiatives I am involved in (and Leiden is involved in too) including:

* FAIRDOM, which has built a Commons for Systems and Synthetic Biology projects, with an emphasis on standards smuggled in by stealth and efforts to affect sharing practices using behavioural interventions.

* ELIXIR, the EU Research Data Infrastructure, and its efforts to exchange workflows.

* Bioschemas, an ELIXIR-NIH-Google effort to support the finding of bio-assets through exploiting web-strength search infrastructure.