Research Data Management

When you are doing research, good data management practices and transparency are essential. This toolbox provides practical information and guidelines for both PhD students and researchers when working with research data.

Selecting data for archiving

There are various reasons to archive your data: replication, longitudinal research, the uniqueness of the data or the cost of collecting them, reusability, and acceleration of research inside or outside your own discipline. It is VU policy to archive your data for at least 10 years after the last publication based on the dataset. Part of preparing your dataset for archiving is appraising and selecting your data.
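The retention rule above can be expressed as simple date arithmetic. The following is a minimal sketch, assuming the period is counted from the most recent publication date; the function name and the sample dates are hypothetical and only illustrate the calculation.

```python
from datetime import date

RETENTION_YEARS = 10  # minimum retention period stated in the policy text


def earliest_disposal_date(publication_dates, years=RETENTION_YEARS):
    """Return the date on which the minimum retention period ends.

    The period is counted from the last publication based on the dataset.
    """
    last = max(publication_dates)
    try:
        return last.replace(year=last.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year: use 1 March.
        return date(last.year + years, 3, 1)


# Hypothetical publication dates for one dataset.
publications = [date(2019, 5, 14), date(2021, 11, 2)]
print(earliest_disposal_date(publications))
```

Because the policy speaks of at least 10 years, the computed date is a lower bound, not a deletion deadline.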

Make a selection before archiving your data

During your research you may accumulate a lot of data, only some of which will be eligible for archiving. It is impossible to preserve all data indefinitely. Archiving all digital data brings high costs, both for the storage itself and for maintaining and managing an ever-growing volume of data and metadata; it may also reduce discoverability (see the website of the Digital Curation Centre). For those reasons, it is crucial that you make a selection.

Remove redundant and sensitive data

Selecting data means deciding what to keep for the long term, which data to archive securely, and which data to publish openly. You therefore have to decide whether your dataset contains data that need to be removed or separated. Reasons to exclude data from publishing include (but are not limited to):

  • data are redundant
  • data concern temporary byproducts which are irrelevant for future use
  • data contain sensitive material, for example personal data in the sense of the GDPR (such as consent forms, voice recordings or DNA data), state secrets, or commercially sensitive data. These data need to be separated from other data and archived securely
  • preserving data for the long term is in breach of contractual arrangements with your consortium partners or other parties involved

In preparing your dataset for archiving, the first step is to determine which parts of your data are sensitive, which can then be separated from the other data. Redundant data can be removed altogether.
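The criteria listed above can be sketched as a small triage routine. This is an illustration under stated assumptions, not an official procedure: the flag names, the `Action` labels and the order of the checks are hypothetical, and temporary byproducts are treated like redundant data here. In practice each flag would come from a researcher's judgment, not from an automatic check.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REMOVE = "remove"
    REVIEW_CONTRACT = "check contractual arrangements first"
    ARCHIVE_SECURELY = "separate and archive securely"
    KEEP = "keep for archiving or publication"


@dataclass
class DataFile:
    name: str
    redundant: bool = False
    temporary_byproduct: bool = False
    sensitive: bool = False
    contract_restricts_retention: bool = False


def triage(item: DataFile) -> Action:
    """Map one file to an action, following the exclusion reasons above."""
    # Assumption: redundant copies and temporary byproducts are removed,
    # even if they also contain sensitive material.
    if item.redundant or item.temporary_byproduct:
        return Action.REMOVE
    if item.contract_restricts_retention:
        return Action.REVIEW_CONTRACT
    if item.sensitive:
        return Action.ARCHIVE_SECURELY
    return Action.KEEP


# Hypothetical file names, for illustration only.
files = [
    DataFile("survey_clean.csv"),
    DataFile("survey_clean_copy.csv", redundant=True),
    DataFile("consent_forms.pdf", sensitive=True),
]
for item in files:
    print(f"{item.name}: {triage(item).value}")
```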

Different forms of datasets for different purposes

Once you have separated the sensitive data from the rest of your dataset, you have to decide what to do with these sensitive materials. In some cases they may be destroyed, but you may also opt to archive multiple datasets, each in a form that suits its purpose. For example:

  1. One dataset that can be shared for reuse, and
  2. A second one that contains the sensitive data and needs to be handled differently.

For the first, the non-sensitive data can be stored in an archive under restricted or open access conditions, so that you can share it and link it to publications. For the second, you need to make a separate selection, so the sensitive part can be stored safely in a secure archive (a so-called offline or dark archive). In the metadata of both archives you can create stable links between the two datasets using persistent identifiers.
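The cross-linking described above can be pictured with two metadata records that point at each other. This is a minimal sketch: the field names (`identifier`, `related`, `relation`) and the identifier values are hypothetical placeholders, not the schema of any particular archive.

```python
import copy


def cross_link(open_record, secure_record):
    """Return copies of both records, each referring to the other's identifier."""
    open_linked = copy.deepcopy(open_record)
    secure_linked = copy.deepcopy(secure_record)
    open_linked.setdefault("related", []).append(
        {"relation": "has_sensitive_part", "identifier": secure_record["identifier"]}
    )
    secure_linked.setdefault("related", []).append(
        {"relation": "is_sensitive_part_of", "identifier": open_record["identifier"]}
    )
    return open_linked, secure_linked


# Placeholder identifiers; real ones are issued by the archives.
shareable = {"title": "Interview study (shareable data)", "identifier": "PID-OPEN-PLACEHOLDER"}
sensitive = {"title": "Interview study (sensitive data)", "identifier": "PID-SECURE-PLACEHOLDER"}

shareable_linked, sensitive_linked = cross_link(shareable, sensitive)
print(shareable_linked["related"])
print(sensitive_linked["related"])
```

Working on copies leaves the original records unchanged, so the links can be reviewed before they are written to either archive.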

What to appraise for archiving

The flowchart below helps you determine which data to select for archiving. It may also help you or your department to develop a standard policy or procedure for what needs to be kept and what is vital for reproducing research or for reuse in future research projects.

Flowchart describing the process of selecting and deselecting data for long-term archiving

More information on selecting data:

Dataset packaging: Which files should be part of my dataset?

A dataset consists of the following documents:

See the section Metadata for more information about documenting your data.

Depending on the research project, more than one dataset may be stored in more than one repository. Make sure that each consortium partner that collects data also stores all data required for transparency and verification. The Consortium Agreement and the Data Management Plan should state who is responsible for archiving the data.

Deposit your data @VU Amsterdam

VU Amsterdam requests that researchers archive the data used in a publication in a repository for at least ten years after the release of the publication (see also VU Policies & Regulations). Many digital archives exist, and new ones keep appearing.

The right archival option depends on the nature of the data and the field of science, as described in faculty or departmental data management policy documents. The university offers three general repositories for data archiving.

On the VU intranet researchers can find all three research data archives, including a wizard that allows for easy selection of a repository that meets all the relevant criteria of privacy (sensitivity), dataset size, etc.

DataverseNL - an online platform for the publication of citable research data in a semi-open environment. DataverseNL allows users to link publications to datasets directly, and to share the data through online archives such as DANS.


  • For publishing research data on the internet
  • The researcher publishing the data decides whether access to the data is public or restricted
  • Not suitable for privacy-sensitive or otherwise sensitive information
  • Enables researchers to publish open data according to grant providers' regulations
  • Generates a link (persistent identifier), e.g. for data citations in publications
  • Retention period is at least 10 years

ArchStor - a research data archive with a 10-year retention period. Data stored in ArchStor can only be accessed for verification purposes.


  • For archiving research data at the VU archive
  • Researchers can only access their data through an email request
  • Enables researchers to adhere to the VSNU Code of Conduct for Research Integrity with respect to verifiability of research
  • Not suitable for privacy-sensitive information
  • No link (persistent identifier) available
  • Retention period is 10 years

DarkStor - an offline archive for storing sensitive information/data. Information is considered sensitive when it involves matters like privacy or copyright. DarkStor is only suitable for datasets that require additional security. Once archived, the data can only be requested by authorized individuals, i.e. the original researcher or a research coordinator.


  • For archiving data in a secured archive
  • Suitable for privacy-sensitive information
  • Researchers can only access their data through an email request
  • Conforms to the security norm for data protection
  • No link (persistent identifier) available
  • Retention period is at least 10 years
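The properties listed for the three repositories can be turned into a simple filter. This is a sketch based only on the bullet points above, not the selection wizard on the VU intranet; the class, field and function names are hypothetical, and criteria such as dataset size are left out because no limits are given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    name: str
    suitable_for_sensitive: bool
    persistent_identifier: bool
    supports_publication: bool


# Properties taken from the descriptions above.
REPOSITORIES = [
    Repository("DataverseNL", suitable_for_sensitive=False,
               persistent_identifier=True, supports_publication=True),
    Repository("ArchStor", suitable_for_sensitive=False,
               persistent_identifier=False, supports_publication=False),
    Repository("DarkStor", suitable_for_sensitive=True,
               persistent_identifier=False, supports_publication=False),
]


def candidates(sensitive, needs_persistent_identifier=False, publish=False):
    """Return the names of repositories that match the stated needs."""
    result = []
    for repo in REPOSITORIES:
        # DarkStor is described as only for data needing additional security,
        # so sensitivity must match in both directions.
        if repo.suitable_for_sensitive != sensitive:
            continue
        if needs_persistent_identifier and not repo.persistent_identifier:
            continue
        if publish and not repo.supports_publication:
            continue
        result.append(repo.name)
    return result


print(candidates(sensitive=False, needs_persistent_identifier=True))
print(candidates(sensitive=True))
```

An empty result, for example for sensitive data that also needs a persistent identifier, signals that none of the three options fits as described and that the data may need to be split or data management support consulted.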