LibGuides: Research Data Management (out of date). Please visit rdm.vu.nl: Selecting an Archive

This LibGuide is being phased out and the information in it is no longer up to date. The new RDM Handbook is now available at https://rdm.vu.nl/

Data archiving: mid- or long-term?

In the Data Management Plan, the researcher describes whether the data will be stored for the mid or the long term.

Mid-term archive: according to the VU RDM Policy, all publication-related data should be archived for at least ten years for verification and replication of research. For this purpose, Vrije Universiteit Amsterdam offers researchers two organisational repositories in which to archive their data: DataverseNL and Yoda. Other archival options may be used depending on the discipline, as described in faculty data management policy documents.

Long-term archive: data relevant for future research should be archived for the long term. A dataset is relevant for future research when at least one of the following general criteria applies:

1. The data have a scientific or historical value
2. The data are unique
3. Others may want to reuse the data
4. The data cannot be reproduced
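
The "at least one criterion applies" rule above can be sketched in a few lines of Python. This is only an illustration: the class and field names are hypothetical and are not part of any VU tool or policy document.

```python
from dataclasses import dataclass


@dataclass
class DatasetAssessment:
    """Hypothetical self-assessment of a dataset against the four criteria."""
    has_scientific_or_historical_value: bool = False
    is_unique: bool = False
    others_may_reuse: bool = False
    cannot_be_reproduced: bool = False


def is_relevant_for_future_research(assessment: DatasetAssessment) -> bool:
    # One criterion is enough; they do not all have to apply.
    return any((
        assessment.has_scientific_or_historical_value,
        assessment.is_unique,
        assessment.others_may_reuse,
        assessment.cannot_be_reproduced,
    ))


survey = DatasetAssessment(cannot_be_reproduced=True)
print(is_relevant_for_future_research(survey))  # True
```

A dataset for which none of the criteria applies would return False and would fall under mid-term archiving only.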

Researchers should bear in mind that repositories can charge for archiving data. These costs can vary according to the data volume and the archive used. It is important that you consider in advance how you will budget for these costs. Whatever archiving option is used, describing the dataset(s) properly and adding metadata are important.

Deposit your data @VU Amsterdam

VU Amsterdam requests that researchers archive the data used in a publication in a repository for at least ten years after the release of the publication (see also VU Policies & Regulations). Many digital archives exist, and new ones keep appearing.
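
As a small worked example of the ten-year minimum, the sketch below computes the earliest date on which the retention period could end. It assumes the period is counted from the publication date; the function name is hypothetical, and the treatment of 29 February is our own choice, not something the policy specifies.

```python
from datetime import date

RETENTION_YEARS = 10  # minimum stated in the text above


def earliest_retention_end(publication_date: date,
                           years: int = RETENTION_YEARS) -> date:
    """Return the first date on which the minimum retention period has passed."""
    try:
        return publication_date.replace(year=publication_date.year + years)
    except ValueError:
        # 29 February with a non-leap target year: move to 1 March
        # (an assumption made for this sketch).
        return date(publication_date.year + years, 3, 1)


print(earliest_retention_end(date(2021, 5, 17)))  # 2031-05-17
```

Because ten years is a minimum, a repository or faculty policy may require a longer period.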

The right archival option depends on the nature of the data and the field of science, as described in faculty or departmental data management policy documents. The university offers two general repositories for data archiving.

The RDM Support Desk and faculty data stewards can help researchers with the selection of a repository that meets all the relevant criteria of privacy (sensitivity), dataset size, etc.

DataverseNL - an online platform for the publication of citable research data in a semi-open environment. DataverseNL allows users to link publications to datasets directly, and to share the data through online archives such as DANS.

Specifications:

For publishing research data on the internet
The researcher publishing the data decides whether access to the data is public or restricted
Not suitable for privacy-sensitive or otherwise sensitive information
Enables researchers to publish open data according to grant providers' regulations
Generates a link (persistent identifier), e.g. for data citations in publications
Retention period is at least 10 years

Yoda - besides active storage, Yoda also has an archive function: the vault. You can use the vault in two ways:

For archiving data securely; data are only available for verification purposes and may be accessed only by special request. A special procedure will be followed if anyone requests access to the data in order to verify them.
For publishing data; data can be available for anyone, or on request. The data will get a persistent identifier as well.

Before sending data to the vault, you will need to add metadata. A data steward, metadata specialist or functional manager can help you with the metadata and the entire process of sending data to the vault. Please get in touch with the RDM Support Desk to find this help.

Archiving vs. Publishing Data

There is a difference between archiving and publishing data. When we talk about archiving data, we mean that data are deposited securely, in a fixed state, in a location that is not accessible to the public or even to colleagues at the VU. Archiving is often used for data that are confidential, for privacy or other reasons, and that should not be publicly accessible. Archiving is usually done for verification purposes or, in the case of medical research, to comply with the preservation requirements of the WMO (the Dutch Medical Research Involving Human Subjects Act).

Publishing refers to depositing data in a public repository that allows others to view, access and download your data. You can set certain restrictions, but as a rule of thumb, publishing should only happen for data that are not confidential at all. That includes data that have been anonymised, or were not personal to begin with, and data that were never otherwise confidential. If you cannot publish any data at all, we usually recommend trying to publish some documentation, such as data collection protocols, scripts and codebooks. In this way, others can see how the research was carried out, even if they cannot access the data themselves.

Use the image below to remind yourself of the difference between archiving and publishing, and read the data publication page to find out what aspects are important when you decide to publish your data.

This illustration is created by Scriberia with The Turing Way community. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807

Choosing a different repository

Besides the repositories offered by the VU, there are many others. Unless you are working with personal or otherwise confidential data and you need to archive them in Yoda, you are, in principle, free to choose a different repository from the ones hosted by the VU.

There can be various reasons to decide to use a different repository, including funder requirements, preferences of research partners, and a repository being a common choice in your field. For example, Dutch archaeologists mostly use DANS Data Stations to deposit and publish their data. Using a repository that is a common choice in your field will make your data more findable for your colleagues and increase the visibility of your work as a researcher. Some of the data repositories most commonly used in the Netherlands include:

DANS Data Stations: a domain-agnostic research data repository hosted by Data Archiving and Networked Services (DANS), an institute of NWO and KNAW. DANS also develops policies, services and new infrastructures for research data and provides researchers with advice on how to preserve their data. VU researchers are also welcome to deposit their data at DANS-EASY;
4TU.ResearchData: a repository for science, engineering and design data hosted by the 4TU Federation. This is a consortium of the four Dutch technical universities: TU Delft, TU Eindhoven, University of Twente and Wageningen University and Research. VU researchers are also welcome to deposit their data at 4TU;
Zenodo: a domain-agnostic research data repository hosted by CERN in Switzerland and funded by the European Commission. Zenodo hosts not only data, but also presentations, conference proceedings and policy documents. It is also possible to archive GitHub repositories directly in Zenodo; this contributes to Open Science by making a snapshot of your code available in its current form and for the long term;
OSF (Open Science Framework): a data management and research dissemination platform. The VU is an institutional member of the OSF, which means that you can sign up (and in) using your VU account by clicking on the Institution Button on the sign in/up pages. You can use the OSF to create registrations and preregistrations for your research, to publish preprints, and publish and share data and documentation. You can also link other repositories such as DataverseNL to your OSF project. The same goes for GitHub and storage options such as Research Drive and Surfdrive. Do be careful about what you connect! A full guide for VU OSF users, including instructions about connecting external storage can be found here.

You can also find repositories via the Registry of Research Data Repositories. When you are choosing a repository, it is important to check that it provides all the services you need. A good way to find out is to check whether a repository has a Core Trust Seal, which is a form of certification for quality repositories. However, a repository without the Core Trust Seal is not necessarily a poor repository. As a minimum, you should check that:

The repository provides a persistent identifier, such as a DOI;
The repository enables you to add rich metadata to your dataset and ideally follows an internationally recognised metadata standard, such as Dublin Core or DataCite;
The repository offers functionality to publish data with an embargo or under restrictions, if you need that;
The repository allows you to add a licence to the dataset;
The repository is funded sustainably for at least the next 50 years;
And, in some cases, that the repository's servers are located in the EU.
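
The checklist above can be turned into a simple self-check. The Python sketch below is an illustration only: the class, field names and example repository are hypothetical, and the answers have to come from your own reading of the repository's documentation. The embargo and EU-hosting checks are only applied when you indicate that you need them, mirroring the "if you need that" and "in some cases" wording of the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RepositoryProfile:
    """Hypothetical record of what you found out about a repository."""
    name: str
    persistent_identifier: bool      # e.g. the repository issues a DOI
    rich_metadata: bool              # ideally Dublin Core or DataCite
    licence_selectable: bool         # you can attach a licence to the dataset
    sustainable_funding: bool
    embargo_or_restrictions: bool = False
    servers_in_eu: bool = False


def unmet_criteria(repo: RepositoryProfile,
                   need_embargo: bool = False,
                   need_eu_hosting: bool = False) -> List[str]:
    """Return the checklist items the repository does not satisfy."""
    checks = [
        ("persistent identifier", repo.persistent_identifier),
        ("rich metadata", repo.rich_metadata),
        ("licence can be added", repo.licence_selectable),
        ("sustainable funding", repo.sustainable_funding),
    ]
    # Conditional items are only checked when they matter for your data.
    if need_embargo:
        checks.append(("embargo or restricted access", repo.embargo_or_restrictions))
    if need_eu_hosting:
        checks.append(("servers located in the EU", repo.servers_in_eu))
    return [label for label, ok in checks if not ok]


example = RepositoryProfile(
    name="Example Repository",  # hypothetical, not a real service
    persistent_identifier=True,
    rich_metadata=True,
    licence_selectable=True,
    sustainable_funding=True,
    embargo_or_restrictions=False,
    servers_in_eu=True,
)
print(unmet_criteria(example, need_embargo=True))
# ['embargo or restricted access']
```

An empty list means the repository meets the minimum criteria for your situation; it does not replace advice from a data steward.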

More recommendations for choosing a data repository can be found on the websites of OpenAire or CESSDA.

If you would like advice about what would be a good place for you to archive your research data, you can always reach out to the RDM Support Desk.