Managing data

If you are creating lots of data as part of your project, it can quickly become disorganised. It is important to manage your data effectively throughout the project lifecycle so that you and your colleagues can easily find and use it when required.

Structuring data
How should I organise and manage my files?

When developing an electronic filing system:

  • Adhere to existing procedures - check for established approaches in your team you can adopt
  • Use folders - group files within folders so information on a particular topic is located in one place
  • Name folders appropriately - name folders after the areas of work to which they relate and not after individual researchers or students. This avoids confusion in shared workspaces if a member of staff leaves, and makes the file system easier for new staff or subsequent projects to navigate
  • Structure folders hierarchically - start with a limited number of folders for the broader topics and then create more specific folders within these
  • Review records - assess materials regularly or at the end of the project to ensure files aren't kept needlessly.
What's the best way to manage different versions of files?

By controlling versions, you can avoid working on outdated files or mistakenly including deleted content in a final version. Be consistent so everyone uses the same approach. Many content management systems have built-in version control, so use this if it is available. Alternatively, you can:

  • Use sequential numbers, with point changes denoting small updates and whole numbers denoting released versions, such as 1.1, 1.2, 2.0
  • Include dates in file names so you can order versions chronologically
  • Use a version control table in documents so changes are noted and dated alongside version numbers
  • Use watermarks to identify files as draft, confidential or final and agree who will complete files and mark them as 'final'.
How can I organise and manage my emails effectively?

As with all recorded information held by the University, emails are covered by Data Protection and Freedom of Information legislation and so can be requested under these laws. This means they need to be managed effectively.

Here are some general tips for good email management:

  • Delete emails you don't need - remove any trivial or old messages from your inbox and sent items on a regular (ideally daily) basis
  • Use folders to organise messages - establish a structured file directory by subject, activity or project
  • Limit the use of attachments - use more secure methods to exchange data where possible and save important attachments elsewhere, such as on a University network drive. Right-click or control-click on an attachment to remove it from the message
  • Store important emails elsewhere - email inboxes are not appropriate places for long-term storage. Save emails which need to be kept to a University network drive or similar, in a folder with other documents relating to the same topic or project.
Further guidance

The University's records management team provide general guidance on document and version control, email management, and record retention and disposal.

Naming files
What do I need to consider when creating file names?

Develop some guidelines within your group or team to ensure that files are named consistently:

  • If you're creating content that may be for the web, avoid upper case characters and spaces in file names
  • Numbers – specify the number of digits that will be used so that files are listed numerically, for example 001, 002, 003, etc.
  • Dates – agree on a logical use of dates so that they display chronologically, such as YYYYMMDD
  • Spaces – remove spaces or use punctuation such as underscores and hyphens to separate words, for example 'dept_seminars.pdf' or 'dept-seminars.pdf'
  • Vocabulary – choose a standard vocabulary so everyone uses agreed language and abbreviations.
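
The guidelines above can be sketched in a few lines of Python. This is only an illustration: the function name build_file_name, the order of the components and the default of three digits are assumptions made for the example, not part of any agreed convention.

```python
import re
from datetime import date


def build_file_name(project: str, description: str, when: date,
                    sequence: int, extension: str, digits: int = 3) -> str:
    """Assemble a file name from agreed components (hypothetical convention)."""
    # Spaces: replace each run of spaces with a single underscore.
    parts = [re.sub(r"\s+", "_", part.strip()) for part in (project, description)]
    # Dates: YYYYMMDD sorts chronologically when names are sorted as text.
    parts.append(when.strftime("%Y%m%d"))
    # Numbers: zero-pad to a fixed width so 002 is listed before 010.
    parts.append(f"{sequence:0{digits}d}")
    return "_".join(parts) + "." + extension.lstrip(".").lower()


print(build_file_name("dept seminars", "spring term", date(2019, 3, 8), 2, "PDF"))
```

Keeping the rules in one small function means everyone in the team produces names in exactly the same way.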
What File Naming Convention should I use?

A File Naming Convention (FNC) is a framework for naming your files in a way that describes what they contain and how they relate to other files. Developing an FNC can be valuable when your project is likely to create large volumes of data. It is achieved by identifying the key elements of the project and the important differences and commonalities between your files.

If you want your files to be compatible across all operating systems they should be named according to the '8.3' convention, which limits names to eight characters followed by a three-character extension. MS-DOS and older versions of Windows support only this convention, and it also has the advantage of keeping your file names short and tidy.

If you are only using systems which operate newer versions of Windows or Mac OS then you can create a more complex File Naming Convention up to 255 characters long. In this case you should consider the following:

  • Find the right balance of components for your FNC. Too few components create ambiguity; too many limit discovery and understanding
  • Use meaningful abbreviations. File names that contain too many characters can be unwieldy and cause problems when transferring files
  • Document your decisions including: what components you will use (the "project name" for example); what are the appropriate entities ("DOEProject"); what acronyms mean (DOE stands for the Department of Energy)
  • Remember your files will be grouped together based on the first few components of your FNC so begin your file names with the more general components and move to the more specific ones later on
  • A File Naming Convention breaks down if not followed consistently. Be sure that everyone who needs to use the FNC is aware of it and knows how to apply it.
Example file names

These file names use a convention which clearly identifies the nature of the content, organises the files in a logical way, and conveys the work history.

  • AlmaProject_Documentation_20181202_POL_v1.docx
  • AlmaProject_Documentation_20181212_POL_v2.docx
  • AlmaProject_Documentation_20190308_RS.pdf
  • AlmaProject_MigrationData_20190201_Invoices_v1.xml
  • AlmaProject_MigrationData_20190216_Invoices_v2.xml
  • PrimoProject_Documentation_20181219_Profiles.docx
  • PrimoProject_MeetingMinutes_20181205.docx
  • PrimoProject_MeetingMinutes_20181212.docx
  • PrimoProject_MeetingMinutes_20181219.docx
  • PrimoProject_MeetingMinutes_20190107.docx
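
A convention like this can also be checked automatically. The sketch below assumes the pattern implied by the examples above (project, category, date as YYYYMMDD, an optional label and an optional version number); the names FNC_PATTERN and parse_file_name are hypothetical and the pattern would need adjusting for your own convention.

```python
import re
from datetime import datetime

# Project_Category_YYYYMMDD[_Label][_vN].ext, as inferred from the examples.
FNC_PATTERN = re.compile(
    r"(?P<project>[A-Za-z0-9]+)_"
    r"(?P<category>[A-Za-z0-9]+)_"
    r"(?P<date>\d{8})"
    r"(?:_(?!v\d+\.)(?P<label>[A-Za-z0-9]+))?"  # optional label, not a version
    r"(?:_v(?P<version>\d+))?"                  # optional version such as _v2
    r"\.(?P<ext>[A-Za-z0-9]+)"
)


def parse_file_name(name):
    """Return the components of a conforming file name, or None if it does not conform."""
    match = FNC_PATTERN.fullmatch(name)
    if match is None:
        return None
    fields = match.groupdict()
    try:
        # Reject impossible dates such as month 13.
        datetime.strptime(fields["date"], "%Y%m%d")
    except ValueError:
        return None
    if fields["version"] is not None:
        fields["version"] = int(fields["version"])
    return fields


print(parse_file_name("AlmaProject_Documentation_20181212_POL_v2.docx"))
print(parse_file_name("final report.docx"))
```

Names that return None can be listed and corrected before they spread through a shared workspace.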
Can I rename large numbers of files at once?

You can easily batch rename files using either of these tools:
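
Batch renaming can also be scripted. The following Python sketch is one possible approach rather than a recommended tool: the function names tidy_name and batch_rename are hypothetical, and the only rule applied is replacing spaces with underscores. It defaults to a dry run so the planned changes can be reviewed before anything is renamed.

```python
from pathlib import Path


def tidy_name(name: str) -> str:
    """Replace each run of spaces in a file name with a single underscore."""
    return "_".join(name.split())


def batch_rename(folder, dry_run=True):
    """Return (old, new) name pairs; rename on disk only when dry_run is False."""
    changes = []
    planned = set()
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        target = path.with_name(tidy_name(path.name))
        if target == path:
            continue  # name already conforms
        # Check every rename before doing any, so nothing is overwritten.
        if target.exists() or target in planned:
            raise FileExistsError(f"{target.name} already exists; nothing was renamed")
        planned.add(target)
        changes.append((path, target))
    if not dry_run:
        for path, target in changes:
            path.rename(target)
    return [(old.name, new.name) for old, new in changes]
```

Call batch_rename(folder) first to see what would change, then batch_rename(folder, dry_run=False) to apply it. Test on a copy of the files before running it on original data.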

Documentation and Metadata
What is metadata and documentation?

Metadata is data about data. It's part of broader contextual information or 'documentation' that accompanies data to ensure it can be discovered and understood over time.

The information you choose to record can range from a detailed description of the data to explanatory material about why the data was created and how it has been used.

Why should I document my data?

Good documentation ensures your data can be:

  • Searched for and retrieved
  • Understood by others, both now and in the future
  • Properly interpreted, if relevant context is provided
What documentation do I need to make?

There are various pieces of information that it is useful to record. Ask yourself, "What information would I need to understand and use this data ten years from now?"

You may wish to include:

  • Some basic descriptive information: title, date (creation and/or last updated), creator, format, subject, rights, access information
  • A list of variables or field names, coverage and their values
  • An explanation of codes, classification schema and abbreviations
  • Details about how the data was created, analysed and anonymised (especially data which may have been redacted)
  • Information about the project, data creators and methodologies used for collection and analysis
  • Tips on usage, such as exceptions, quirks, questionable results

MIT Libraries provide guidelines for describing data which suggest ten elements to include.

How do I create useful documentation?

Documentation is best created alongside the data, as it is easier to capture it then, rather than trying to remember things at a later date. Decide at the outset what you want to record and build this into your data creation processes. There are a number of ways you can add documentation to your data but always ensure there are strong links between your data and the associated documentation:

  • Store a readme.txt file alongside the data which provides basic explanatory details. This is the simplest form of documentation but often the most effective
  • Include information within the data or document itself, for example in the document properties function of a file or the file header
  • Keep a database of metadata with links to files
  • Provide standardised metadata descriptions for your data, like data centre entries or bibliographic records
  • Record relevant context in lab notebooks or associated papers and reports
  • Link to websites or web pages which explain the context of the research
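
As an illustration of the first option, a readme.txt can be generated from a fixed list of fields so that nothing is forgotten. This is a minimal sketch: the field list is taken from the basic descriptive information suggested earlier, while the names README_FIELDS, build_readme and write_readme and the 'NOT RECORDED' placeholder are assumptions made for the example.

```python
from pathlib import Path

# Basic descriptive fields; adapt this list to your own project.
README_FIELDS = ["Title", "Creator", "Date created", "Last updated",
                 "Format", "Subject", "Rights", "Access"]


def build_readme(details: dict, notes: str = "") -> str:
    """Return readme text with one 'Field: value' line per expected field."""
    lines = []
    for field in README_FIELDS:
        # Flag gaps explicitly rather than silently omitting the field.
        lines.append(f"{field}: {details.get(field, 'NOT RECORDED')}")
    if notes:
        lines += ["", "Notes", notes]
    return "\n".join(lines) + "\n"


def write_readme(data_folder, details: dict, notes: str = "") -> Path:
    """Write readme.txt alongside the data and return its path."""
    path = Path(data_folder) / "readme.txt"
    path.write_text(build_readme(details, notes), encoding="utf-8")
    return path
```

Storing the file in the same folder as the data keeps the link between the two as strong as possible.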

If you are sharing your data, use descriptive standards such as the Dublin Core Metadata Initiative (DCMI) or the Data Documentation Initiative (DDI). These are used by libraries, archives and data centres so their collections are discoverable and interoperable.
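
To give a sense of what a standardised description looks like, the sketch below builds a simple Dublin Core record in XML using Python's standard library. The fifteen element names and the namespace come from the Dublin Core element set; the wrapping metadata element, the function name dublin_core_record and the sample values are assumptions for illustration, and a repository or data centre may require a different wrapper or additional fields.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core element set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_record(fields: dict) -> str:
    """Return an XML string with one dc: element per supplied field."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


print(dublin_core_record({
    "title": "Survey responses",       # sample values only
    "creator": "Example Researcher",
    "date": "2019-03-08",
}))
```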

A detailed introduction to the different levels of documentation which can be used, with practical examples, is available on the UK Data Archive website.

Backing up data
What is backing up?

Backing up is the process of producing a secondary copy of files to store elsewhere, so you can restore the originals in the event of data loss.

The more important the data and the more often it is used, the more regularly it needs to be backed up.

Such loss can be caused by malicious hacking and virus infection, hardware, software or storage media failure, accidental changes or deletion of files by colleagues, or unexpected events such as power failure, fires and floods.

To help you decide what to back up and when, think about what you'd need to restore in the case of loss. Which data is crucial for your work? You may choose to only back up certain data, or to back up files you use more regularly than others.

Who will back up my data?

That depends on where you store your data. You can find details in Annex 2 of the University's Information Classification and Handling Policy.

Information Services perform a regular backup of the University's centrally managed network file servers and shared areas. Data stored on OneDrive is automatically backed up.

You are responsible for backing up:

  • Anything on your computer hard drive, in other words the C: drive
  • Your laptop
  • Your home computer (if used for work)
  • Lab or office computers not on the University network
  • External storage media (such as hard drives, memory sticks and disks)

If you want to double-check whether you have to back up your files, raise a call with the IT Helpdesk.

Information Services make no charge for reasonable amounts of file storage and backup. If your research requires a very large amount of storage and backup space, raise a call with the IT Helpdesk to discuss your requirements when writing your research proposal. When your research is likely to produce large volumes of data it is vital to prepare for this when compiling your Data Management Plan.

How should I back up my data?

If your data is stored on a centrally managed server you don't need to make backups. If data is stored elsewhere, it is best practice to make two or three backups of all important documents and data, and to store one of them in a different physical location from the others. Choosing different types of storage media, or media from different manufacturers, is also preferable.

Your choice of storage media for backup depends on the quantity and type of data you have. Options include:

  • OneDrive can be used to back up the data stored locally on your PC
  • Memory sticks, CDs or DVDs and remote online backup services may be convenient for small volumes of data
  • Hard drives or magnetic tapes may be more appropriate for large volumes or when you need to store data offline for security reasons
  • External hard drives are often the quickest, cheapest and most convenient method
  • Backup software or online services can help manage this process for you. Wikipedia provides a list of free and proprietary backup software as well as details of online backup services. Windows has a backup and restore feature and, if you're a Mac user, Apple Time Machine can be set to back up your data incrementally.
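
If you manage backups yourself, the copying step can be scripted. The following is a minimal sketch, not a replacement for dedicated backup software: the function name back_up_folder and the folder_YYYYMMDD naming of each copy are assumptions, and the destinations would be your own drives or mounted storage.

```python
import shutil
from datetime import date
from pathlib import Path


def back_up_folder(source, destinations, today=None):
    """Copy a folder to each destination as '<name>_<YYYYMMDD>'; return the copies."""
    stamp = (today or date.today()).strftime("%Y%m%d")
    source = Path(source)
    copies = []
    for destination in destinations:
        target = Path(destination) / f"{source.name}_{stamp}"
        # copytree raises FileExistsError if the target already exists,
        # so an earlier backup from the same day is never overwritten.
        shutil.copytree(source, target)
        copies.append(target)
    return copies
```

Passing more than one destination, ideally on different media and in different locations, gives the multiple copies described above.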
How do I know my backup is working?

You should test your backup at regular intervals to validate its completeness and integrity.

You can test your backup by comparing the copies against one another to make sure they correspond, for example by matching file sizes, dates and checksums (also called hash sums).

A checksum is a short fingerprint calculated from a file's contents. If the file is changed during transfer or storage, its checksum will almost certainly change too, so matching checksums indicate that the copies are identical.
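
A checksum comparison can be sketched with Python's standard hashlib module. SHA-256 is used here as one common choice of algorithm; the function names sha256_of and compare_folders are hypothetical, and the sketch only checks that every file in the original folder has an identical copy in the backup.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_folders(original, backup):
    """Return (relative path, problem) pairs for files missing from or different in the backup."""
    original, backup = Path(original), Path(backup)
    problems = []
    for path in sorted(original.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(original)
        copy = backup / relative
        if not copy.is_file():
            problems.append((relative.as_posix(), "missing"))
        elif sha256_of(path) != sha256_of(copy):
            problems.append((relative.as_posix(), "different"))
    return problems
```

An empty list from compare_folders means every original file was found in the backup with a matching checksum.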

 
Electronic lab notebooks

An electronic lab notebook (ELN) is a piece of software which allows you to document research, experiments and procedures, and is designed to replace paper laboratory notebooks. Many different ELNs exist and most are proprietary products, but some good-quality open source options are available.

ELNs are relatively new tools and many are still being developed. If you are interested in using an ELN for your project, make sure you trial the product first, as many are discipline-specific and may not meet your requirements.

Jisc are running a project with the University of Glasgow throughout 2019 to trial different ELN tools with the aim of gathering evidence of requirements to inform the larger Research Data Shared Service (RDSS). If you have an interest in ELNs then visit the project blog for details of upcoming workshops.