Properly organizing your data can help you to understand, manipulate, visualise, and analyse the information you have available. There are many ways to structure your data, and the best option depends on how you plan to use it. For example, a data table presented in a paper will likely have a different structure from the data in your Excel worksheet. In this section, we will focus on one such structure: a structure where each variable has its own column, each observation has its own row, and each value has its own cell. This is what Grolemund and Wickham (2017, ch. 12) call ‘tidy data’. After briefly explaining why this data structure is useful, we will show some examples of ‘un-tidy’ or ‘messy’ data, and how a tidy version of the same data would look. Although we will not explain in depth how to make a dataset tidy using different software packages, we will point you to some useful functions in Excel and Stata.
Much of the information in this LibGuide is taken from Grolemund and Wickham's work. For more background information, you can check their website, or Wickham (2014). Grolemund and Wickham use the R software package to illustrate their work, but their examples and explanations are useful independent of the software you use.
An important practical benefit of the tidy data structure is that in most cases, statistical software will expect your data to have this structure, and will not work as expected otherwise. Other benefits are that you can easily filter your data to make subsets (for example, to select only one year from a sample of five years), or to combine different datasets (see also this section).
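The filtering benefit can be sketched in a few lines of Python. This is only an illustration under assumptions: the firm names, years, employment figures, and the helper `select_year` are all hypothetical and not taken from the guide.

```python
# Hypothetical tidy data: one row per firm-year, one key per variable.
# All names and numbers below are invented for illustration.
firms = {"A": 100, "B": 250}
rows = [
    {"firm": firm, "year": year, "employment": base + 10 * (year - 2015)}
    for firm, base in firms.items()
    for year in range(2015, 2020)  # a sample of five years
]


def select_year(data, year):
    """Return only the observations for the given year."""
    return [row for row in data if row["year"] == year]


subset = select_year(rows, 2017)
for row in subset:
    print(row)
```

Because every observation is a row and `year` is a column of its own, selecting one year is a single condition on that column.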
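Data structures are best understood by example. The table below presents a basic example of a tidy data set. If you check the requirements for tidy data, you will see that each variable has its own column, each observation has its own row, and each value has its own cell.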
You might also notice that there are five rows, but only four observations. The first row gives variable names. Although you can refer to the variables using these names in some software packages, the names are not part of the data. Something else you might have noticed is that there are some duplicate values. For example, Firm A and Firm B are in the data twice. This is not a problem for your data structure (although it might indicate a substantial problem with your data).
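The distinction between rows and observations can be illustrated with a small Python sketch. The data below is hypothetical (the variable names `firm`, `year`, and `employment` and all values are assumptions), chosen so that two firms each appear twice.

```python
import csv
import io

# Hypothetical data set: five lines of text, the first holding variable names.
raw = """firm,year,employment
A,2018,120
B,2018,270
A,2019,130
B,2019,280
"""

lines = raw.strip().splitlines()

# DictReader treats the first line as variable names, not as data.
reader = csv.DictReader(io.StringIO(raw))
observations = list(reader)
variable_names = reader.fieldnames

print(len(lines), "rows,", len(observations), "observations")

# Firm A occurs twice, but each (firm, year) combination occurs only once,
# so the repeated firm names do not break the tidy structure.
keys = {(obs["firm"], obs["year"]) for obs in observations}
print(len(keys) == len(observations))
```

If the number of unique firm-year combinations were smaller than the number of observations, that would point to a problem in the data itself rather than in its structure.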
Data structures can be messy in many different ways; we cannot give an exhaustive list here. However, we can present some examples of messy data that we come across regularly.
A first common example is given in the table below. The difference from the tidy data structure above should be clear: the first column holds values for two variables. In this case, splitting the two variables to get back to a tidy structure is not difficult, because both variables have a fixed length. However, this is not always the case, and splitting into separate columns needs careful consideration.
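A fixed-length split of this kind can be sketched as follows. The combined column name `firm_year`, the one-character firm code, and the values are hypothetical; the guide's own table may combine different variables.

```python
# Hypothetical messy data: the first column mixes firm and year.
messy = [
    {"firm_year": "A2018", "employment": 120},
    {"firm_year": "B2018", "employment": 270},
    {"firm_year": "A2019", "employment": 130},
    {"firm_year": "B2019", "employment": 280},
]


def split_firm_year(row, firm_width=1):
    """Split the combined value at a fixed position (assumed firm code width)."""
    combined = row["firm_year"]
    return {
        "firm": combined[:firm_width],
        "year": int(combined[firm_width:]),
        "employment": row["employment"],
    }


tidy = [split_firm_year(row) for row in messy]
for row in tidy:
    print(row)
```

The split position is hard-coded here, which only works while every firm code has the same width. With variable-length values, a separator or a pattern match would be needed instead.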
As a second example, the table above on the left takes the data on Employment from the tidy data example from the introduction, but represents it differently. This structure is very compact, which makes it an attractive option for presenting data in a paper. It is less useful for analysis, as the structure is not tidy. Inspecting the data, you will notice a few things.
Real-world data can be much messier than this, and will often combine many types of untidiness. Figuring out where the problems are can be a challenge. However, making your data tidy will make your data easier to manipulate and analyse. A good example of a task that is much easier with tidy data is combining data from different sources. We discuss that topic here.
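As an illustration of turning a compact presentation table into a tidy one, the sketch below assumes a layout with one row per firm and one column per year. That layout, the helper `to_long`, and all values are assumptions made for the example, not a description of the guide's own table.

```python
# Hypothetical compact ("wide") table: years are column headers,
# so the variable "year" is hidden in the column names.
wide = [
    {"firm": "A", "2018": 120, "2019": 130},
    {"firm": "B", "2018": 270, "2019": 280},
]


def to_long(rows, id_column, value_name):
    """Turn every non-identifier column into its own observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            tidy.append(
                {id_column: row[id_column], "year": int(column), value_name: value}
            )
    return tidy


tidy = to_long(wide, "firm", "employment")
for row in tidy:
    print(row)
```

Statistical packages usually provide a built-in command for this kind of reshaping, so in practice a hand-written loop like this one is rarely needed.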
National Academic Research and Collaborations Information System
NARCIS is the main national portal for those looking for information about researchers and their work. Besides researchers, NARCIS is also used by students, journalists and people working in educational and government institutions as well as the business sector.
NARCIS provides access to scientific information, including (open access) publications from the repositories of all the Dutch universities, KNAW, NWO and a number of research institutes, datasets from some data archives as well as descriptions of research projects, researchers and research institutes.
This means that NARCIS cannot yet be used as an entry point to complete overviews of researchers' publications. However, a growing number of institutions make all their scientific publications accessible via NARCIS. In this way, researchers' publication lists can be made as complete as possible.
In 2004, the development of NARCIS started as a cooperation project of KNAW Research Information, NWO, VSNU and METIS, as part of the development of services within the DARE programme of SURFfoundation. This project resulted in the NARCIS portal, in which the DAREnet service was incorporated in January 2007. NARCIS has been part of DANS since 2011.