Data management (storage and organisation)
To keep research data secure, findable and readable, you as a researcher need to carry out two kinds of activities: storing the data and organising the data.
Data Storage and Access
When storing your data, you need to pay attention to backups, access, data transfer and keeping the data readable. During your research, all research data can be stored safely using the Research Drive tool. With Research Drive, backup and access are well organised from the start.
Research Drive
Research Drive is the tool for storing all research data during your research. A project folder is created for each research project, in which you can set up your own folder structure or use a recommended folder structure. Backups are handled by our supplier SURFSara, with whom we also have a support agreement and a processor agreement for the secure storage of personal data.
A project folder in Research Drive can be requested via researchsupport@hhs.nl. You will be invited for an interview with the data steward, who will also create the project folder. You will then receive an invitation to add your account details to Research Drive. Once you have accepted the invitation, the functional administrator can make you the owner of the project folder. As owner, you can then invite (research) staff to access Research Drive. After accepting the invitation, these people can be given rights to the whole project folder or only to specific folders within it. This is up to you, the owner, to decide.
The folder 'Test Project' in Research Drive contains some useful documents such as the Quick Guide Research Drive and the guide documentation and folder structure.
The tool can be reached via this link, but it is only accessible once you have registered your research with the functional administrator.
Access
When organising access to your data during your research, take the nature of the data into account. Personal or sensitive data require a higher level of security than anonymised or non-confidential data. When using data from an external party, you must comply with any specific restrictions that apply to the data (e.g. intellectual property protection). Research Drive lets you configure settings and authorisations in a way that complies with these conditions and with applicable legislation. Check periodically that no unauthorised access has taken place and verify who has access to which folders and files.
Questions related to data access:
- Who has access to the data?
- Who owns the data?
- How do you deal with possible terms of use of the data?
- Who is allowed to edit the data?
- Who controls the data?
- How do you ensure that the data remains accessible when you or other people leave the research?
Security Measures
Keep in mind that you can access your data securely via Research Drive from all the locations where you work. A decent firewall and reliable antivirus software are a must. Avoid using unsafe internet connections. Always lock your device when you walk away and never leave your device unattended/unprotected for long periods of time.
Handle passwords wisely and apply encryption as an additional security measure. The Research Drive storage system uses encryption. If you need to use your data outside Research Drive, encrypt the data yourself using software such as VeraCrypt or Cryptomator. Sending your files by email is not safe. Use the secure tool SURFfilesender instead, which also uses encryption. You log in with your THUAS account.
Consortium
When cooperating with other institutions or organisations, it will be necessary to consider jointly which institutions store which data in which format and who has access to which data. These agreements must be included in the (joint) data management plan but also laid down in writing in a consortium agreement. Periodically check that all parties continue to observe the procedures that have been agreed upon.
Organising Data
The time you invest in thinking about how to organise your research data and the associated data and project documentation will pay off in the long run. Good organisation makes the data easier to find and understand for yourself, for the researchers you work with and, later, for others who reuse your data. It is therefore important to store the data in a consistent manner and to provide accurate documentation and metadata. Make sure your folders and files are clearly structured and use informative and meaningful file names.
File Names
As a researcher, you decide which strategy to follow when naming files. Different approaches are possible, but it is important to weigh them carefully, because the file name is the most important element for identifying a file.
The following elements can be used as a basis for file names: project name, project number, research team name, measurement type, subject, creation date, version number. This list may be supplemented by other variables.
However, there are points and rules you should keep in mind when making your choice:
- Take into account the possibilities and limitations of the (storage) system you are working with. Sometimes, for example, the system determines the length of the file name.
- Choose one naming convention and apply it consistently by including the same information in the same order in the file names.
- Make file names specific, detailed and unique. This way there is no conflict when the files are moved to another folder and you avoid working in the wrong file without realising it.
- Observe the following fixed rules: use the same number of digits in numbering (001...100...), use a fixed notation for dates (YYYY-MM-DD, YYYY-MM or YYYY-YYYY), use underscores or hyphens instead of spaces, use standard terms (get inspiration from bartoc.org), avoid special characters and leave file extensions unchanged.
- Keep file names as short and relevant as possible. Generally, about 25 characters is enough to capture sufficient descriptive information. If necessary, you can use codes or abbreviations for file name elements.
- File names may be generated automatically by the software or devices you use (e.g. the file names a camera assigns to photos). Change these file names to match your chosen naming convention. Software such as Ant Renamer and NameChanger can rename multiple files at once.
In summary, file names should contain useful clues about the contents, status and version of the file. The file name helps to distinguish files from each other and helps with classifying and sorting them.
Document your entire file naming strategy. This documentation helps you to remain consistent and to understand the strategy long after you have completed your research. It is especially useful when several researchers are working on the same data.
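The rules above can be captured in a small helper, so that every file name is assembled in the same order. The following is a minimal sketch only: the convention (project, subject, date, version) and the function name build_file_name are assumptions chosen for illustration, not a prescribed THUAS standard.

```python
import re
from datetime import date

# Assumed convention for this sketch: project_subject_YYYY-MM-DD_vNNN.ext
# Only letters, digits and hyphens are accepted inside an element,
# so there are no spaces or special characters in the result.
SAFE_PART = re.compile(r"[A-Za-z0-9-]+")


def build_file_name(project: str, subject: str, created: date,
                    version: int, extension: str) -> str:
    """Assemble a file name with the same elements in the same order."""
    for part in (project, subject):
        if not SAFE_PART.fullmatch(part):
            raise ValueError(f"Use only letters, digits and hyphens: {part!r}")
    # The ISO date and the zero-padded version number keep names sortable.
    stem = "_".join([project, subject, created.isoformat(), f"v{version:03d}"])
    return f"{stem}.{extension.lstrip('.')}"


print(build_file_name("RD01", "survey", date(2024, 3, 5), 2, ".csv"))
# RD01_survey_2024-03-05_v002.csv
```

Generating names through one function, rather than typing them by hand, is one way to make the documented convention hard to deviate from.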
Folder Structure
The above guidelines for file names also apply to folder names. To keep an overview, it is best to let your folder structure reflect the different phases of your research. Name your folders after these phases, such as preparation (administration and documentation of the research project, including your data management plan), raw data, manipulated data, reports of analyses and final products such as publications. These folders form the top level of your folder structure.
By reflecting the research phases in your folder structure, the structure also reflects the different versions of your research data. Always save the raw data file and ensure that no further changes can be made to it (e.g. save read-only or configure access rights). It is also wise to have a separate folder for the most advanced version of your data. This way, you can be sure that you are always working with the right version.
The hierarchy of folders must remain simple and clear. It is therefore advisable not to have too many levels in the folder structure.
Document the choices you make in terms of folder names and folder structure, including all changes in the folder structure and the associated arguments.
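A phase-based folder structure with a protected raw data file could be set up as follows. This is a sketch under assumptions: the folder names in PHASE_FOLDERS and both function names are hypothetical examples, and within Research Drive itself you would normally protect raw data by configuring access rights rather than local file permissions.

```python
import stat
from pathlib import Path

# Hypothetical folder names, one per research phase.
# The numbered prefixes keep the folders sorted in phase order.
PHASE_FOLDERS = [
    "01_preparation",
    "02_raw-data",
    "03_processed-data",
    "04_analysis-reports",
    "05_final-products",
]


def create_project_folders(root: Path) -> list[Path]:
    """Create one folder per research phase under root (a single, flat level)."""
    created = []
    for name in PHASE_FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def make_read_only(file_path: Path) -> None:
    """Remove write permission for everyone so the file is not changed by accident."""
    mode = file_path.stat().st_mode
    file_path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```

For example, create_project_folders(Path("my-project")) creates the five folders, after which make_read_only can be applied to each file placed in 02_raw-data.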
Documentation
Documenting both your research process (in the form of protocols, methodology descriptions, etc.) and your data (in the form of inventories, descriptions of relationships and manipulations, etc.) is important to avoid errors and to interpret the data correctly, both during your research and after it is completed (validation). A README.txt file gives an overview of the data set. In it, you describe the contents of each file in your data set. Guidelines for writing such a README.txt file have been established by the 4TU Centre for Research Data.
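A per-file inventory for a README.txt can be kept in sync with the data set using a short script. The sketch below is illustrative only: the layout it produces is an assumption and not the 4TU template, and the function names build_readme and undocumented_files are hypothetical.

```python
from pathlib import Path


def build_readme(title: str, descriptions: dict[str, str]) -> str:
    """Return README text with one description line per file, sorted by name."""
    lines = [title, "=" * len(title), "", "Files in this data set:"]
    for file_name in sorted(descriptions):
        lines.append(f"- {file_name}: {descriptions[file_name]}")
    return "\n".join(lines) + "\n"


def undocumented_files(folder: Path, descriptions: dict[str, str]) -> list[str]:
    """List files in folder that have no description yet (README.txt itself is skipped)."""
    present = {p.name for p in folder.iterdir() if p.is_file()}
    return sorted(present - set(descriptions) - {"README.txt"})
```

Running undocumented_files before depositing or sharing a data set gives a quick check that every file is described.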
Version Management
Name the different versions of a file consistently, for example by adding the date (YYYY-MM-DD) or the version number to the file name. In addition, record the differences between versions. You can do this in a simple table with the following columns: version number, a brief description of what was done with the data, who did it and the date on which it was done. Instead of managing versions manually, you can also use version control software such as Subversion.
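The version table described above can be kept as a CSV file that gains one row per change. This is a minimal sketch; the column names and the function name log_version are assumptions made for illustration.

```python
import csv
from datetime import date
from pathlib import Path

# The four columns of the version table; the exact labels are an assumption.
COLUMNS = ["version", "description", "changed_by", "date"]


def log_version(log_file: Path, version: str, description: str,
                changed_by: str, changed_on: date) -> None:
    """Append one row to the version log, writing the header row first if the file is new."""
    is_new = not log_file.exists()
    with log_file.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(COLUMNS)
        writer.writerow([version, description, changed_by, changed_on.isoformat()])
```

A call such as log_version(Path("version-log.csv"), "v002", "Removed duplicate records", "J. Jansen", date(2024, 3, 5)) adds one row. The names in this call are made-up examples.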
More Info
- Guidelines folder structure in Research Drive
- Best practices in file naming from Stanford University Libraries
- Follow the 10 rules for best practice in naming files and folders
- Managing and sharing data from UK Data Archive with, among other things, an excellent explanation of why and how version control is used, especially when collaborating with others
Support by a Data Steward
Researchers can receive support with research data management. The research data steward(s) of THUAS can be contacted at researchsupport@hhs.nl.