A catalogue for the crown jewels
Christine Staiger, Research Engineer
There are no analyses, results or publications without data. At a time when the quantity of data and its cross-connections are growing as a result of developments in science and technology, data management is crucial. At WUR, we have therefore invested in iRODS, a cutting-edge data management system.
Christine Staiger has first-hand experience of how complex data management can be. The research engineer, who advises academics on data management in large research programmes at the WUR IT department, previously worked as an academic at the Dutch Cancer Institute, researching the probability of recovery in cancer patients. “I carried out an analysis using machine learning to distinguish between cancer patients. The result was a nice curve. My PhD supervisor asked me to change one parameter. I knew that if I changed the parameter, I would get a new result. Yet I presented my supervisor with the original analysis. I had lost my way in my research data.”
Photos: Anne Reinke
For the past two years, Staiger has been working on the implementation of iRODS together with her colleague Floris Jan Zwaan and his team. Thanks to iRODS (Integrated Rule-Oriented Data System), researchers no longer risk getting lost in their data. Like other universities in the Netherlands, WUR acquires iRODS through SURF (a partnership of universities). However, WUR itself has the expertise to develop iRODS further and to support and advise researchers. As a result, researchers are in close contact with experts and can receive proper guidance on how to handle research data and metadata on the platform.
Meaningful post
What exactly does iRODS do? Staiger compares it to the postal service. The data is the text contained in the letters; the metadata is made up of the number and thickness of the letters, as well as who sends which letters to whom and the networks this creates, nationally and internationally. The text of a letter only becomes truly meaningful once you know, for example, who it is intended for. iRODS automatically links data with metadata.
‘Researchers need an infrastructure that can adapt to the research’
Once the data has been entered properly, researchers no longer need to think about what data is there, and they retain an overview of and control over their research data. This is an enticing prospect for academics who want to focus on research, experiments and knowledge acquisition without being hampered by IT. “Our goal is to make data management easy for academics,” confirms Staiger. For iRODS to work well, it is important to categorise the data properly as soon as it is generated. Staiger advises researchers on this. “I sit down with researchers, ask them about their research data and write everything down.” She draws parallels with digitally archiving holiday photos: “What do you want to do with these pictures? Are they all equally important? Do you want to group the photos by the people in them? Do you want to sort them by date, or by the places you visited? Do you want to publish all of them, or just a select few?”
Crown jewels become worthless
According to Staiger, the problem researchers face is that, over time, they can no longer understand their data. Data are the crown jewels of research, but they are worthless if you do not understand them. “If you have not annotated your data to record, for example, which statistical test led to a particular outcome, that data will be worthless. iRODS offers academics a platform in which all data and metadata can be stored in a FAIR way.” FAIR stands for Findable, Accessible, Interoperable and Reusable.
iRODS hosts a storage environment totalling 10 petabytes (10 million gigabytes)
FB-IT employs 9 people who deal with servers and storage 24/7
WUR has 250 physical servers and hosts almost 1,000 virtual servers
WUR has a high-performance computer (called Anunna) for scientific calculations
Anunna’s computing capacity is comparable to 500 regular computers
WUR keeps backups physically separate from the data centres. The backups are on 18-terabyte tapes (18,000 gigabytes)
Staiger asks researchers how much data they will produce during their research phase and where and how they will store the data. “As research progresses, the amount of knowledge grows and this ultimately has an impact on the data that needs to be collected and interpreted. Researchers therefore need a data management infrastructure that can adapt to developments that occur during the research. I am currently creating that infrastructure.”
Are you looking for the best IT solution for your research data? Use the Data Storage Finder to choose the best storage location. Would you rather talk to someone about storage and management of your research data? Send an e-mail to data@wur.nl.