Storing of data


Suggested pre-reading	What this web page adds
Data collection	This web page describes what to think about when deciding what will happen to the material collected in your research project. Reading it will give you a better understanding of what you need to consider and how to do it.

In many countries, original hard copies and digital data files must be stored for a number of years after the project is completed. Furthermore, many journals now require that data files be made publicly available so that any reader can re-analyse the data.


Important definitions

Identifiable: Data can be directly linked to an individual. The data set may contain names, addresses, dates of birth or social security numbers.
Re-identifiable: The data set does not contain enough information to identify individual persons and link data to them. However, it contains some kind of identifier (for example a set of numbers or other random characters) that, combined with a “key” (which can be another data file), can link each record to an identifiable individual.
Non-identifiable = anonymous: The data set does not contain information that allows individual persons to be identified, directly or indirectly. This means there is no key that, combined with other information, can re-identify individual persons.
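
The difference between an identifiable and a re-identifiable data set can be illustrated with a minimal sketch. It assumes the records are held as Python dictionaries; the function name `pseudonymise`, the field names and the example values are all hypothetical and chosen only for illustration. The sketch splits each record into a coded row and a key row, so that the key can be stored separately from the data.

```python
import secrets


def pseudonymise(records, identifying_fields):
    """Split identifiable records into a coded data set and a separate key.

    Hypothetical helper for illustration only.
    """
    coded_rows, key_rows = [], []
    used_codes = set()
    for record in records:
        # Draw a random code that carries no information about the person.
        code = secrets.token_hex(4)
        while code in used_codes:
            code = secrets.token_hex(4)
        used_codes.add(code)

        # The key links the code to the identifying fields.
        key_row = {"code": code}
        for field in identifying_fields:
            key_row[field] = record[field]
        key_rows.append(key_row)

        # The coded data set keeps only the remaining fields.
        coded_row = {"code": code}
        for field, value in record.items():
            if field not in identifying_fields:
                coded_row[field] = value
        coded_rows.append(coded_row)
    return coded_rows, key_rows


# Made-up example records (not real data).
records = [
    {"name": "Person A", "date_of_birth": "1970-01-01", "systolic_bp": 128},
    {"name": "Person B", "date_of_birth": "1985-06-15", "systolic_bp": 141},
]
coded, key = pseudonymise(records, identifying_fields=("name", "date_of_birth"))
```

Here `coded` is re-identifiable as long as `key` exists. If the key is destroyed and no other information can single out a person, the coded data set may be regarded as anonymous.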

Please note that a data set intended to contain only anonymous data can accidentally include information that makes it possible to identify some individuals. An example is hospital visitor statistics where the patients' postal codes are kept in the data set and some postal codes cover only a few households.
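
One way to look for this kind of accidental identifiability is to count how many records share each value of a field such as the postal code. The sketch below is only a rough screening aid: the function name `small_groups`, the field name and the threshold of 5 are assumptions, and counting records is a proxy for, not the same as, counting households.

```python
from collections import Counter


def small_groups(rows, field, minimum=5):
    """Return the values of `field` shared by fewer than `minimum` records."""
    counts = Counter(row[field] for row in rows)
    return {value: n for value, n in counts.items() if n < minimum}


# Made-up visit records: six from one postal code, two from another.
visits = [{"postal_code": "11111"}] * 6 + [{"postal_code": "22222"}] * 2
risky = small_groups(visits, "postal_code", minimum=5)
print(risky)  # {'22222': 2}
```

Values flagged in this way could be merged into larger areas or removed before the data set is treated as anonymous.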

Main problems with storing of research material

During and after the research project, ensure that sensitive information, such as the health of individual persons, is not leaked outside the research group.
During and after the research project, ensure that valuable information is not lost. This requires a backup strategy that is checked on a regular basis.
Ensure that data are stored when the project is completed. Ideally, sensitive information is kept secure, while non-sensitive information, such as a coded data file with no link to individuals, is made publicly available so that other researchers can validate the results.
Ensure that data are stored in a way that makes it possible to keep track of when research material that can be linked to individuals is due for destruction, and where to find it when that time comes.
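
Checking a backup on a regular basis can be partly automated. The following is a minimal sketch, assuming the original files and the backup are two folders that can be read from the same computer; the function names and the demonstration files are hypothetical. It compares SHA-256 checksums and reports files that are missing from the backup or differ from the original.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to `folder`) to its checksum."""
    folder = Path(folder)
    return {
        path.relative_to(folder).as_posix(): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def verify_backup(original_folder, backup_folder):
    """Return (missing, changed) file lists for the backup."""
    original = build_manifest(original_folder)
    backup = build_manifest(backup_folder)
    missing = sorted(set(original) - set(backup))
    changed = sorted(
        name for name in original
        if name in backup and original[name] != backup[name]
    )
    return missing, changed


# Demonstration with temporary folders and made-up file contents.
with tempfile.TemporaryDirectory() as tmp:
    original_dir = Path(tmp, "original")
    backup_dir = Path(tmp, "backup")
    original_dir.mkdir()
    backup_dir.mkdir()
    (original_dir / "data.csv").write_text("code,value\nA1,5\n")
    (original_dir / "notes.txt").write_text("protocol v1\n")
    (backup_dir / "data.csv").write_text("code,value\nA1,6\n")  # differs
    missing, changed = verify_backup(original_dir, backup_dir)

print(missing, changed)  # ['notes.txt'] ['data.csv']
```

A check like this only shows that the backup matches the current originals. It does not replace occasionally restoring files from the backup to confirm that they can actually be opened.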

A few practical tips:

It may not be legal to transfer data between different organisations. Rules differ between countries and regions, so you may need to check what is acceptable in your circumstances. The rules are also likely to differ depending on whether data are stored as identifiable, re-identifiable or non-identifiable.
CD and DVD discs may not be readable in the future. USB memory sticks can become corrupted, and data may not be retrievable after a couple of years. Hence, avoid these if possible.
When data are due for destruction after a number of years, please note that anonymous data should not be destroyed. For re-identifiable data, destroy only the key linking the data set to individuals. If you have identifiable data, try to make them anonymous and keep the anonymous data set.
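
Keeping track of what is due for destruction, where it is stored and what should happen to it can be done with a simple register. The sketch below is one possible layout, not a prescribed one: the entry fields, file names, locations and dates are all hypothetical. The action per data type follows the tip above.

```python
from datetime import date

# What to do when the retention period ends, per type of data set.
ACTIONS = {
    "anonymous": "keep the data set",
    "re-identifiable": "destroy only the key and keep the coded data set",
    "identifiable": "anonymise if possible and keep the anonymous data set",
}


def due_actions(register, today):
    """List (name, location, action) for entries whose retention has ended."""
    due = []
    for entry in register:
        if entry["destroy_after"] <= today:
            due.append((entry["name"], entry["location"], ACTIONS[entry["kind"]]))
    return due


# Hypothetical register entries.
register = [
    {"name": "survey_coded.csv", "kind": "re-identifiable",
     "location": "department file server", "destroy_after": date(2030, 1, 1)},
    {"name": "consent_forms", "kind": "identifiable",
     "location": "locked cabinet, room 12", "destroy_after": date(2035, 1, 1)},
    {"name": "visit_counts.csv", "kind": "anonymous",
     "location": "public repository", "destroy_after": date(2029, 1, 1)},
]

due = due_actions(register, today=date(2031, 1, 1))
for name, location, action in due:
    print(f"{name} ({location}): {action}")
```

Reviewing such a register on a fixed schedule makes it less likely that a key or an identifiable file is kept longer than allowed.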

Public storage for digital research data:

DataCite (a non-profit organisation that provides persistent identifiers, DOIs, for research data)
Dryad (a non-profit organisation where researchers can store research data)
OSF (Open Science Framework, run by the non-profit Center for Open Science)
Swedish National Data Service (for Swedish and International research projects)
UK Data Archive (stores research data from UK researchers)
Mendeley Data (owned by Elsevier)
