When searching for research data, you will often find that the data you need is spread over different databases. Let's say you are interested in the relationship between natural disasters and economic growth, as in the article by Skidmore and Toya (2002). Data on long-run economic growth is available from the . The International Disaster Database has data on natural disasters. However, if you want to quantify the relationship between disasters and growth, you will probably need to have information from both sources combined in a single dataset. In this section, we will give some guidelines to achieve that goal.
In what follows, we will assume that your data is "tidy". If you don't know what that means, it's a good idea to read this section first.
There are two ways to combine data: adding more observations to existing variables, or adding variables to existing observations. The first type is sometimes called a ‘vertical merge’ or ‘appending data’; the second is called a ‘horizontal merge’, or simply a ‘merge’.
A vertical merge typically is not very complicated, and mostly a matter of "copy-and-paste", and possibly re-ordering your data as needed. Below is a simple example.
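As a rough illustration, a vertical merge can be sketched in a few lines of plain Python. The tables, variable names and values below are invented for this sketch, and `append_tables` is a hypothetical helper, not a function from any particular package.

```python
# Hypothetical firm-year records; all values are invented for illustration.
table_2005 = [
    {"firm": "A", "year": 2005, "employees": 10},
    {"firm": "B", "year": 2005, "employees": 25},
]
table_2006 = [
    {"firm": "A", "year": 2006, "employees": 12},
    {"firm": "B", "year": 2006, "employees": 27},
]


def append_tables(top, bottom):
    """Vertical merge: stack the rows of two tables with the same variables."""
    # Compare the variable names of the first row of each table.
    if top and bottom and set(top[0]) != set(bottom[0]):
        raise ValueError("both tables must contain the same variables")
    return top + bottom


appended = append_tables(table_2005, table_2006)

# Optional re-ordering: sort by firm first, then by year.
appended.sort(key=lambda row: (row["firm"], row["year"]))

for row in appended:
    print(row)
```

No observations are linked here: the rows of the second table are simply placed below those of the first.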
A horizontal merge can be more complicated. When merging data, we want to make sure that the observations are linked correctly: Employment for firm A in Year 2005 should be linked to Turnover for firm A in Year 2005. Any other combination will mess up your data. In vertical merges, this is never an issue, as no observations need to be 'linked'.
With horizontal merges, you need to be specific about the variables that will form the basis for the data merge. We will refer to these variables as the key variables. In the example below, 'Firm' and 'Year' are the key variables. Each combination of these key variables identifies one observation in the data.
For the final table, you would look at the combinations of Firm and Year in Table A; look up those same combinations in Table B; and then merge the values for Turnover with the values for Employees to create a final merged table. You can do this by hand for this tiny example, but for larger datasets you will want to automate this. Most tools offer some kind of merge functionality. Have a look at the Software tab for resources that can help you with Excel, Stata, and SPSS.
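The look-up procedure described above could be automated along the following lines. This is a minimal, tool-independent sketch in plain Python: the numbers are invented, and `merge_on_keys` is a hypothetical helper written for this illustration only.

```python
# Hypothetical tables; all values are invented for illustration.
table_a = [
    {"firm": "A", "year": 2005, "employees": 10},
    {"firm": "A", "year": 2006, "employees": 12},
    {"firm": "B", "year": 2005, "employees": 25},
    {"firm": "B", "year": 2006, "employees": 27},
]
# Table B is deliberately in a different row order than Table A.
table_b = [
    {"firm": "B", "year": 2006, "turnover": 540},
    {"firm": "A", "year": 2005, "turnover": 200},
    {"firm": "B", "year": 2005, "turnover": 500},
    {"firm": "A", "year": 2006, "turnover": 230},
]

KEYS = ("firm", "year")  # the key variables


def merge_on_keys(left, right, keys):
    """Horizontal merge: add the variables of `right` to the rows of `left`."""
    # Index the right-hand table by its combination of key values.
    lookup = {tuple(row[k] for k in keys): row for row in right}
    merged = []
    for row in left:
        key = tuple(row[k] for k in keys)
        # Rows without a match in `right` are kept without the extra variables.
        match = lookup.get(key, {})
        merged.append({**row, **match})
    return merged


final = merge_on_keys(table_a, table_b, KEYS)
for row in final:
    print(row)
```

Because rows are matched on the values of the key variables rather than on their position, the row order of the two tables does not matter.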
The number of key variables is not limited to two. Sometimes, a single key variable will suffice. In other cases, you will need more. The only conditions are that i) the key variables exist in both datasets; and ii) the key variables uniquely identify the observations in at least one of the two data sets you are combining. That is, the key variables cannot contain duplicates in both data sets.
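Both conditions can be checked before merging. The sketch below is one possible way to do so in plain Python; the helper names `is_unique` and `check_keys`, and the small tables, are assumptions made for this illustration.

```python
from collections import Counter


def is_unique(table, keys):
    """True if no combination of key values occurs more than once."""
    counts = Counter(tuple(row[k] for k in keys) for row in table)
    return all(n == 1 for n in counts.values())


def check_keys(left, right, keys):
    """Check conditions i) and ii) for a set of key variables."""
    # Condition i): every key variable must exist in both tables.
    for label, table in (("left", left), ("right", right)):
        for row in table:
            missing = [k for k in keys if k not in row]
            if missing:
                raise KeyError(f"{label} table lacks key variable(s): {missing}")
    # Condition ii): the keys must be unique in at least one of the tables.
    report = {
        "left_unique": is_unique(left, keys),
        "right_unique": is_unique(right, keys),
    }
    if not (report["left_unique"] or report["right_unique"]):
        raise ValueError("key variables have duplicates in both tables")
    return report


# Hypothetical tables; all values are invented for illustration.
employment = [
    {"firm": "A", "year": 2005, "employees": 10},
    {"firm": "A", "year": 2006, "employees": 12},
]
locations = [{"firm": "A", "location": "Amsterdam"}]

# 'firm' repeats in `employment` but is unique in `locations`.
report = check_keys(employment, locations, ("firm",))
print(report)
```

Running such a check first makes a failed merge easier to diagnose than inspecting a wrongly merged table afterwards.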
Most datasets include some kind of identifier for the units in your data. For example, if you have firm-level financial data, the data source will most likely include a firm identifier code such as SEDOL or CUSIP. Data at the national level will probably have some kind of country code. Unless there is a very good reason not to, using identification codes as key variables is preferable to using names. Different databases often spell names slightly differently, which makes matching on names across sources difficult. Identification codes, by contrast, are designed to be the same across databases. There is more information on this topic in the Working with identifiers tab.
In the example above, the key variables uniquely identify observations in both data sets: Firm A in Year 2005 is in the data only once in each of the two data sets, and there are no duplicates. (Each key variable separately does have duplicates, though: firm A is in the data more than once.) If we merge two data sets where the key variables are unique in both, we call this a "one-to-one merge": one observation in Table A is always linked to exactly one observation in Table B.
We mentioned earlier that key variables should be unique in at least one data set, but not necessarily both. Have a look at the example below.
Table A is the final table from the previous example; we would now like to add a location for each firm, as given in Table B. None of the firms changed location over time, and there is no Year variable in Table B, so it is not possible to have Firm and Year as key variables. That is not a problem: the Firm variable alone can act as the key. Although Firm does not uniquely identify observations in Table A, it does pinpoint observations in Table B. This means that the conditions for key variables are fulfilled: it exists in both Table A and B, and is unique for observations in Table B. In the final table on the right, you will notice that Location is linked to Firm, but not to Firm-Year: it does not change over time.
We call this type a "one-to-many merge": one observation in Table B is linked to multiple observations in Table A. Of course, depending on which table you start from, a one-to-many merge can also be a many-to-one merge.
All software that can handle one-to-one merges can handle many-to-one merges as well. Some software packages (like Stata) demand that you are specific about the type of merge you are trying to do, while others (like Excel) are agnostic.
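To make the many-to-one case concrete, here is a small sketch in plain Python that adds a location to every firm-year row. It also shows, in a simplified way, what being specific about the merge type buys you: the hypothetical helper `merge_many_to_one` refuses to run if the key is not unique in the table that is supposed to be the "one" side. All names and values are invented for illustration.

```python
# Hypothetical tables; all values are invented for illustration.
firm_years = [
    {"firm": "A", "year": 2005, "employees": 10, "turnover": 200},
    {"firm": "A", "year": 2006, "employees": 12, "turnover": 230},
    {"firm": "B", "year": 2005, "employees": 25, "turnover": 500},
    {"firm": "B", "year": 2006, "employees": 27, "turnover": 540},
]
locations = [
    {"firm": "A", "location": "Amsterdam"},
    {"firm": "B", "location": "Rotterdam"},
]


def merge_many_to_one(many, one, keys):
    """Add the variables of `one` to every matching row of `many`."""
    lookup = {}
    for row in one:
        key = tuple(row[k] for k in keys)
        # Being explicit about the merge type: the 'one' side must be unique.
        if key in lookup:
            raise ValueError(f"key {key} is duplicated in the 'one' table")
        lookup[key] = row
    # Rows of `many` without a match are kept without the extra variables.
    return [{**row, **lookup.get(tuple(row[k] for k in keys), {})} for row in many]


merged = merge_many_to_one(firm_years, locations, ("firm",))
for row in merged:
    print(row)
```

The location is repeated for every year of the same firm, which reflects that it is linked to Firm and not to the Firm-Year combination.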
National Academic Research and Collaborations Information System
NARCIS is the main national portal for those looking for information about researchers and their work. Besides researchers, NARCIS is also used by students, journalists and people working in educational and government institutions as well as the business sector.
NARCIS provides access to scientific information, including (open access) publications from the repositories of all the Dutch universities, KNAW, NWO and a number of research institutes, datasets from some data archives as well as descriptions of research projects, researchers and research institutes.
This means that NARCIS cannot (yet) be used as an entry point to complete overviews of researchers' publications. However, a growing number of institutions make all their scientific publications accessible via NARCIS. In this way, researchers' publication lists can be made as complete as possible.
In 2004, the development of NARCIS started as a cooperation project of KNAW Research Information, NWO, VSNU and METIS, as part of the development of services within the DARE programme of SURFfoundation. This project resulted in the NARCIS portal, in which the DAREnet service was incorporated in January 2007. NARCIS has been part of DANS since 2011.