Authority files: Breaking out of the library silo to become signposts for research information

Meeting today's stakeholder demands


Authors:
Maurits van der Graaf; orcid.org/0000-0002-2296-7568; m.vdgraaf@pleiade.nl
Leo Waaijers; orcid.org/0000-0003-1433-2543; leowaa@xs4all.nl

Contributors:
Brigitte Wiechmann; Deutsche Nationalbibliothek
Maurice Vanderfeesten; 3TU Data Center
Adrian Price; University of Copenhagen
Najko Jahn; Universität Bielefeld
Paul Vierkant; Humboldt-Universität zu Berlin; orcid.org/0000-0003-4448-3844

Acknowledgements for sharing their insights: Magchiel Bijsterbosch, Saskia Windhouwer, Rachel Bruce, Verena Weigert, Bas Cordewener.

February 2014

This work is made available under a Creative Commons Attribution 4.0 licence. For details see: http://creativecommons.org/licenses/by/4.0

Contents
Management summary
1. A new life for authority files in the Web of Documents and the Web of Data
2. Four use cases
  2.1 Towards a new authority file for research data repositories: re3data.org
  2.2 Developing authority files for Open Access journals
  2.3 Present usage and further development of authority files in current research information systems
  2.4 The GND, Culturegraph, the research community, and the Wikipedia approach
3. A brief overview of global authority files in research information: authors, organisations, publications, and some new ones
4. Discussion, conclusions and recommendations
  4.1 Benefits of authority files, and the tasks at hand
  4.2 Where stakeholders stand
  4.3 Possible actions
Appendix A. Authority files and the Web of Data
Appendix B. A thought experiment describing current research information in Open Linked Data
Information sources


Management summary
Authority files, crosswalks and Web of Data
Authority files serve to uniquely identify real-world things or entities, such as documents, persons and organisations, and their properties, such as relations and features. Already important in the classical library world, authority files are indispensable for adequate information retrieval and analysis in the computer age, because computers are even poorer than humans at handling ambiguity. Through authority files, people tell computers which terms, names or numbers refer to the same thing or have the same meaning, by giving equivalent notions the same identifier. Authority files thus signpost the internet, where these identifiers are interlinked on the basis of relevance. When executing a query, computers can navigate from identifier to identifier by following these links, collecting the queried information along these so-called crosswalks. In this context, identifiers are also known as controlled access points. Identifiers become even more crucial now that massive data collections, such as library catalogues and research datasets, are releasing their previously contained data directly to the internet. This development has been termed Open Linked Data, and the resulting internet is called the Web of Data, as opposed to the classical Web of Documents.

Objective of this paper


The objective of this paper is to raise understanding among research-performing organisations, infrastructure providers and policy makers of the critical significance of authority files in meeting present-day demands in research information. We describe the revival of authority files (Chapter 1), highlight some current developments and needs (Chapter 2), give an overview of the classic and recent global authority files (Chapter 3), and signal what must be done, proposing roles and tasks for the various stakeholders (Chapter 4).

Case studies from the Knowledge Exchange network
Four case studies from the Knowledge Exchange network [1] are highlighted (Chapter 2):
- A new authority file for research data repositories, the re3data.org registry, will enable researchers to find the best match between their to-be-deposited research dataset and a research data repository
- A call for the creation of new journal authority files and/or the adjustment of existing journal authority files to answer queries on Open Access publishing
- The opportunities that authority files offer to Current Research Information Systems (CRIS) in answering management information and business intelligence questions
- The GND (Gemeinsame Normdatei) authority file collection, originating in the German library cataloguing world, is expanding its usage to other national cultural institutes, the research community and the wider internet

[1] The Knowledge Exchange partners are: CSC - IT Center for Science in Finland; Denmark's Electronic Research Library (DEFF) in Denmark; German Research Foundation (DFG) in Germany; Jisc in the United Kingdom; SURF in the Netherlands.

Main authority files at the global level


Currently, in the domain of research information, the following authority files are predominantly in use (Chapter 3):
- For copyright holders like publishers: ISNI
- For book authors: ISNI
- For journal article authors: ORCID (synced with ISNI)
- For scholarly publications: ISSN for periodicals, ISBN for books, DOI for journal articles and research datasets
- For research funding organisations: FundRef

Why?
Authority files are beneficial to:
- Discovery: authority files are essential in the discovery of research information
- Trust and reliability: authority files support reliable and trusted identification of key elements in research information
- Accountability: research-performing organisations and funding organisations must account for their budgets and activities to the taxpayer. Authority files support the ability to track, report and measure aspects like funding, research output and impact. For example, FundRef helps research funders to track the research output resulting from their funding
- Transactional efficiency: avoiding the need to re-key data many times and supporting data exchange are important effects of authority files that benefit all stakeholders
- New knowledge: authority files greatly facilitate the ability to draw correlations across data and support analytics and decision making in the management of research

The CRIS systems of the research-performing organisations will play a key role here. Since they need to meet current managerial demands for information and business intelligence, they have the most to gain from accurate and reliable authority files. They should therefore play an essential role in creating new authority files or extending existing ones.

What?
The following developments with regard to the changing landscape of authority files in research information imply roles and tasks for members of the research information community:
- New applications of existing authority files: existing authority files are adapted for new applications, as illustrated by the GND
- New authority files need to answer different information needs: the case report on Open Access publishing (Chapter 2.2) calls for new journal authority files to answer management information needs
- Coordination and harmonisation of authority file systems: two authority file systems with regard to authors (ISNI and ORCID) are now coordinating and harmonising their systems. The Digital Author Identifier Summit, organised by Knowledge Exchange in 2013, demonstrated the urgent demand by all stakeholders in the research community for ORCID and ISNI to converge rather than diverge. It can be expected that in the fast-changing landscape of research information, similar coordination and harmonisation issues will regularly surface
- Spreading the application of authority files: the more widely an authority file is applied, the more efficient and effective its usage
- Need for openness and adjustment of authority files to the Web of Data: the evolving Web of Data calls for openness of authority files and their migration to the Open Linked Data format
- Sustainability and governance of authority file systems: business models and governance of authority files are crucial with an eye to their sustainability and their adaptability to new demands, such as the call for openness by the Web of Data. In these processes, trust and transparency are necessary conditions for reliable information and understanding across stakeholders

Who and how?


Institutes (research-performing organisations)
Research-performing organisations are a major stakeholder in improving authority files in research information because of the importance of management information, accountability reports and business intelligence in the increasingly complex and competitive landscape of research. Institutes will gain the most by crosswalking authority files, yet at the same time might suffer the most if the authority files have insufficient coverage, lack interoperability, or are expensive to use. What could institutes do to further authority files? We see the following areas for action:
- Spreading the use of authority files among their researchers: ORCID is an obvious candidate for researcher identifiers, as it is gaining traction. Another important area for spreading the use of authority files is organisational IDs
- Inventory of issues in crosswalking authority files: institutes will be the first to notice when existing authority files are deficient. Their experiences should be bundled and brought to the national and/or supranational level in order to study possible solutions
- Tool development: tools using authority files for tracking, analysing and reporting research information should be developed, used, refined and shared with other institutes. This activity could be supported by funding and coordinated by national stakeholders

National stakeholders
There are quite a few stakeholders at the national level, such as national libraries, funding organisations, infrastructure providers, statistical agencies and science policy makers. We propose a number of areas for action:
- Integration and connection of authority files: national libraries could play an essential role in integrating and connecting various authority files
- Sustainability of authority files: national libraries could also play an important role in the business model of authority files, together with other actors at the national level
- Coordinating crosswalking issues and the development of analysis and reporting tools: national umbrella organisations for research institutes should take up the task of coordinating issues in crosswalking authority files, arranging studies of possible solutions and funding the development of tools for analysis and reporting
- Awareness campaigns: national actors could play a pivotal role in setting up awareness campaigns for researchers to spread usage of certain authority files, and in developing pathways to adoption by coaxing consensus, buy-in and agreements

Knowledge Exchange
We propose four roles for Knowledge Exchange (KE) as a supranational actor for the five KE countries:
- Advocacy aimed at the national stakeholders in the five KE countries: despite their importance for all actors in the research information world, authority files are relatively unknown and unappreciated
- Coordinating national issues with crosswalking authority files: KE could act as a supranational coordinator by studying common issues and proposing common solutions
- Channelling issues with regard to international authority files to the international level: in its role as supranational coordinator, KE might encounter issues in crosswalking authority files that must be solved at the international level, and could channel those issues to the relevant international player(s)
- Encouraging transition of information on research to Open Linked Data: stimulate the transition of research-related metadata and authority files to the Open Linked Data environment. The openness of the Web of Data is congruent with Open Access. Its transparency is the hallmark of science and a basis for trust. At the same time, it offers appealing and extremely valuable possibilities for meeting the growing number of use cases for research information effectively and efficiently

1. A new life for authority files in the Web of Documents and the Web of Data
Authority files (thesauri or lists of controlled terms) were typical topics for librarians: start talking about a thesaurus and a traditional librarian in your audience might perk up, but the others would politely suppress a yawn. For a while, it looked as if authority files would fade out, along with the traditional libraries. However, with the rise of the internet and the explosion of digital information, authority files have gained a new life, becoming signposts for the internet in general, and for digital research information in particular.

The landscape of research information
Consider the landscape of research information: over 2.2 million journal articles are published annually in more than 25,000 peer-reviewed scholarly journals. Worldwide, over 2 million books are published yearly, and probably several hundred thousand of those are scholarly books. On top of that, ever more research datasets are openly circulated as an integral part of research information.

Who is producing all this material? Seven million full-time research employees, according to the UNESCO Institute for Statistics (UNESCO), with a sharp growth in researchers from Asia, whose names are often transcribed into the Roman alphabet but not always in the same way. Disambiguation (is this the same author or not?) can be a problem, but it is exactly the type of problem that authority files can solve.

Who is interested in it? Obviously, research performers and society at large in the first place. Their interest is to reliably find the relevant information. For that, authority files are needed. Research-performing organisations and research enablers follow in their footsteps. They have a rising interest in demonstrating impact and driving innovation, and need meta-information when asking "how many", "who did what", "what is connected to what", "who and what might be relevant", etc. For this, enabling links between authority files becomes very important. And in prospect, the Semantic Web is beckoning. Although not sufficient, authority files are certainly a necessary part of it. (See also Appendix A. Authority files and the Web of Data.)

Defining authority files
What are authority files? Let us look at three descriptions. The first is from the Functional Requirements of Authority Data (IFLA 2008): "Bibliographic entities are known by names and/or identifiers. In the cataloguing process (whether it happens in libraries, museums, or archives), those names and identifiers are used as the basis for constructing controlled access points. Controlled access points are created or modified by agencies and governed by rules."


The second explains the creation of controlled access points: "For bibliographic resources, important aspects of vocabulary control include determining the authoritative forms for author names, uniform titles of works, and the set of terms by which a particular subject will be known. In library science, the process of creating and maintaining the standard names and terms is known as authority control" (Glusko 2013).

A third definition emphasises the role of authority files in classification (CASRAI 2013): "An Authoritative List (aka codes table, pick list, authority file, controlled vocabulary, etc.) is defined as: any set of information that is organised into a uniquely identified and coded list of approved terms that is used to constrain data entry on certain fields in order to enable unambiguous classification."

In other words, authority files are created by authority control, a process that is governed by transparent rules. Authority files create controlled access points with the purpose of organising information by disambiguation, in support of the search and retrieval of that information.

The role of authority files in the Web of Documents and the Web of Data
The World Wide Web is not very organised; some organising principles would greatly help the retrieval of information. That is why authority files have moved from controlled term lists within library cataloguing systems to identifiers that are widely used on the World Wide Web and are increasingly important for research information published on the internet.

The World Wide Web is evolving from the present Web of Documents into the Web of Data. Two central tools for retrieving information determine the Web of Documents: searching and linking. The Web of Data applies these two mechanisms to datasets by proposing a uniform data format: Open Linked Data. Data from datasets published as separate entities on the web are assigned uniform resource identifiers (URIs). As such, this would only extend the Web. However, the vision of Tim Berners-Lee and others went further. All things on the web, that is, URIs, could be divided into classes related to the nature of the real-world entities they represent, such as persons, books and cities. Links between things could be typed to express their meaning in real life, like "is written by" or "is friend of", and also be given a URI. Thus the basis is laid for the Semantic Web (for further reading, see Appendix A).

In a rudimentary form, this structure was present in library catalogues. A catalogue card implied that this book (with identifier: ISBN) was written by that person (with author number n) and was published by P (with a unique name) in discipline d (with Dewey Decimal Code) on subject s (with a term from a controlled list). The codes and controlled term lists from the library world were carefully maintained and some had great authority. The classes of URIs mentioned above stand in this tradition and are often referred to as authority files.


They give the Web of Data its machine-readable format. The Web of Data enables new types of applications. Browsers allow users to start browsing in one data source and navigate or crosswalk along links to related data sources. Search engines can query aggregated data from sources that today have to be queried separately. In short, linked data applications operate on top of an unbound, global data space. This enables them to deliver more complete answers as new data sources appear on the web.
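
The catalogue-card analogy can be made concrete with a minimal sketch in Python. It is an illustration under stated assumptions, not a prescribed implementation: the identifiers (`ex:book/1`, `ex:person/n42` and so on), the link names and the helper functions are all hypothetical, and plain tuples stand in for a real Linked Data store.

```python
# Each statement is a (subject, link type, object) triple. All identifiers
# below are made-up placeholders, not real ISBNs, author numbers or URIs.
TRIPLES = [
    ("ex:book/1", "ex:isWrittenBy", "ex:person/n42"),
    ("ex:book/1", "ex:isPublishedBy", "ex:publisher/P"),
    ("ex:book/1", "ex:hasSubject", "ex:subject/s"),
    ("ex:book/2", "ex:isWrittenBy", "ex:person/n42"),
    ("ex:person/n42", "ex:isAffiliatedWith", "ex:org/u1"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return all objects linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj, triples=TRIPLES):
    """Return all subjects linked to `obj` by `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def other_books_by_same_author(book):
    """Crosswalk: book -> author identifier -> that author's other books."""
    found = []
    for author in objects(book, "ex:isWrittenBy"):
        for other in subjects("ex:isWrittenBy", author):
            if other != book and other not in found:
                found.append(other)
    return found


print(other_books_by_same_author("ex:book/1"))  # ['ex:book/2']
```

The crosswalk only works because both books carry the same author identifier; had the author been recorded as two differently spelled name strings, the second book would not have been found.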


2. Four use cases

2.1 Towards a new authority file for research data repositories: re3data.org
Research datasets are increasingly seen as important research output that has to be published for purposes of validation and reuse. This has led to a surge in often specialised research data repositories. Worldwide, the number of research data repositories is now estimated at around 1,000. How do you find the right repository for your research dataset? In answer, the re3data.org registry was set up. The next question is how to describe a research dataset bibliographically. If the dataset is seen as equivalent to the article, the research data repository is the equivalent of the journal. Hence a possible evolution of re3data.org towards a full-blown authority file.

The need for describing and identifying research data repositories
Creating a culture of sharing research data is on the political agenda of many national governments (e.g. USA, UK) and international political institutions (e.g. European Commission). Echoing this development, researchers need infrastructures that ensure maximum accessibility, stability and reliability to facilitate working with and sharing research data. In response, an increasing number of universities and research organisations are starting to build research data repositories that allow permanent access, in a trustworthy environment, to datasets resulting from the research at their institutions. Due to varying disciplinary requirements, the landscape of research data repositories is heterogeneous. This makes it hard for researchers, funding bodies, publishers and scholarly institutions to select an appropriate storage repository or to search for data. The solution is to establish a central registry where research data repositories are indexed and listed in detail.

A schema to describe research data repositories
The re3data.org project developed a schema to describe research data repositories (Pampel et al. 2013). The main goal was to offer researchers, as data producers and/or users, orientation in the heterogeneous landscape of research data repositories. The schema was the basis of a registry of research data repositories. It lists metadata properties of research data repositories, including general scope, content and infrastructure, as well as compliance with technical, metadata and quality standards. The schema includes required metadata properties and optional properties providing additional information, using mostly controlled vocabularies. The schema is designed to:
- Recommend an international standard for describing a research data repository
- Provide the basis for interoperability between research data repositories and re3data.org
- Be a first step towards the goal of a certificate for research data repositories


How the re3data.org registry came about
When the re3data.org project started, there were only a few lists of research data repositories, and these included only basic information, such as the name of the repository, its operator and its disciplinary focus. This was a rare greenfield situation and an opportunity to showcase a pure example of building an ideal authority file, designed both functionally and technologically to fit the new web environment. The project team first collected and recorded information on approximately 400 infrastructures storing research data by December 2012. All three project partners independently examined a subset of 20 randomly selected research data repositories. This analysis confirmed the existence of an extremely heterogeneous research data repository landscape and served as a basis for the initial draft of the first descriptive schema. The second step was to align this schema with similar metadata schemes, including modifying vocabulary elements and introducing basic requirements for research data repositories. A set of vocabulary-based icons shows the main characteristics of a repository, and this icon system helps users identify a suitable repository for the storage of their data (see Figure 1).

Figure 1. Icons in re3data.org

Re3data.org as an authority file
It is planned to provide re3data.org metadata via interfaces (e.g. OAI-PMH) or export (e.g. RDF). The schema will map to Dublin Core, fostering the dissemination of a standard vocabulary. Furthermore, the assignment of a unique identifier to each research data repository description will also make re3data.org usable as an authority file for the bibliographic descriptions of research data repositories.
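
To illustrate what a schema with required properties, optional properties and controlled vocabularies means in practice, here is a minimal sketch in Python. The property names, the vocabulary terms and the `validate` helper are hypothetical choices made for this example; they are not the actual re3data.org schema.

```python
# Hypothetical property names and vocabulary, loosely following the
# description in the text; they are NOT the actual re3data.org schema.
REQUIRED = {"identifier", "name", "url", "subjects"}
OPTIONAL = {"operator", "data_access", "certificates"}
CONTROLLED = {
    "data_access": {"open", "restricted", "closed"},
}


def validate(record):
    """Return a list of problems found in a repository description."""
    problems = []
    present = set(record)
    for prop in sorted(REQUIRED - present):
        problems.append(f"missing required property: {prop}")
    for prop in sorted(present - REQUIRED - OPTIONAL):
        problems.append(f"unknown property: {prop}")
    for prop, allowed in CONTROLLED.items():
        if prop in record and record[prop] not in allowed:
            problems.append(f"{prop}: {record[prop]!r} is not a controlled term")
    return problems


record = {
    "identifier": "ex-repo-0001",  # placeholder, not a real registry ID
    "name": "Example Data Archive",
    "url": "https://example.org/data",
    "data_access": "public",  # free-text value outside the vocabulary
}
for problem in validate(record):
    print(problem)
```

The unique `identifier` is what would let other systems cite the repository description unambiguously, which is the step that turns a registry into an authority file.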

2.2 Developing authority files for Open Access journals


Of some 25,000 academic journals, about 20 to 25% are Open Access (OA) only, and many subscription journals offer an option to publish OA articles (hybrid journals). Many universities encourage OA publishing. How successful is this policy? What are the costs? Adjusting the existing authority files related to journal publishing will help to answer these questions and could stimulate the transition process.

Deutsche Forschungsgemeinschaft and Open Access
In its funding programme "Open Access Publizieren", launched in January 2010, the German Research Foundation awards grants to higher education and research institutions to support submissions to OA journals that charge authors for processing articles (DFG). In this programme, the German Research Foundation asks applicants to track the number of quality-controlled scholarly publications in OA journals in comparison to the overall publication distribution at a given institution on an annual basis. The idea is to find a way to monitor the number of OA publications in relation to the total number of publications of an institute. This could be achieved by crosswalking several authority files on journals, which calls for adjustments to those files.

Journal authority files
The Directory of Open Access Journals (DOAJ) offers a comprehensive list of OA journals. Its mission is to cover all Open Access scientific and scholarly journals that use a quality control system to guarantee the content. Currently more than 9,800 journals are registered. Some disciplines, especially the life sciences, maintain their own well-established journal authority files covering information about OA status (e.g. PMC journal list). Journal authority files also play an important role in reporting and quality assessment. Prominent (proprietary) examples are Journal Citation Reports of Thomson Reuters and SCImago Journal & Country Rank of Elsevier.

The institutional bibliographic record
Information on the publication output of academic institutions is commonly tracked by their CRIS and/or Open Access repositories (see next paragraph). If neither is available, or if the data are deficient, either journal application programming interfaces (APIs) or proprietary multidisciplinary databases such as Web of Science or Scopus provide retrieval facilities and downloads of bibliographic data for affiliations.

Crosswalking journal authority files and the institutional bibliographic record
The ISSN is part of every journal authority file described above.
This unique journal identification allows easy crosswalks between the sources in question:
- The share of publications in OA journals in comparison to the overall publication output of an institution can be determined by matching an OA journal authority file (e.g. DOAJ) against the aggregated bibliographic data by ISSN
- Analysis may combine accompanying data already stored in one source, i.e. the journal authority file or the bibliographic database. Possible DOAJ aggregation levels include, for instance, an indication of whether a publication fee is required, the discipline of the journal or its publisher. For institutional services, information about persons, affiliation or funding may be summarised

What else will be needed?
As academic journal publishing makes the transition towards OA, new services and adjusted journal authority files will be needed. With the advent of OA journals, new registries such as the DOAJ have been launched to raise awareness of OA publishing opportunities. Now that the amount of OA publishing is increasing, the number of use cases increases accordingly. New services are springing up to address new needs. Three recent examples:
- In December 2013, the international ISSN agency launched the beta version of ROAD, the Directory of Open Access scholarly Resources. ROAD provides free access to a subset of the ISSN registry describing OA serial works that have been assigned an ISSN
- Quality Open Access Market (QOAM) gives members of the academic community the opportunity to evaluate critical aspects of the quality of OA journals (Journal Score Card)
- The V4OA project is in the process of defining a standard vocabulary (hence an authority file!) to describe the key values of OA publishing, such as embargoes, author rights, publication fees, and access and reuse rights

To address this increasing need for management information on OA publishing, journal authority files will have to be extended with information on financial aspects (e.g. average article-processing charges, waivers and memberships, and payment options) and quality aspects (e.g. coverage in other services, such as indexing and abstracting databases or registries). The ability to support this increasing need for management information will also have an impact on the sustainability of OA services like the DOAJ (Swan 2012).
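
The ISSN crosswalk described in this section, matching an OA journal authority file against an institution's bibliographic record, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the ISSNs and article titles are made up, the record layout (a list of dictionaries with an `issn` key) is hypothetical, and real input would come from exports of an OA journal list such as DOAJ and of the institutional record.

```python
def normalise_issn(issn):
    """Normalise an ISSN to the form NNNN-NNNC with an upper-case check character."""
    chars = issn.replace("-", "").replace(" ", "").upper()
    return f"{chars[:4]}-{chars[4:]}"


def oa_share(publications, oa_issns):
    """Fraction of publications whose journal ISSN appears in the OA journal list."""
    oa = {normalise_issn(i) for i in oa_issns}
    if not publications:
        return 0.0
    hits = sum(1 for pub in publications if normalise_issn(pub["issn"]) in oa)
    return hits / len(publications)


# Made-up ISSNs and titles, for illustration only.
oa_journal_issns = ["1111-1111", "2222-222X"]
institutional_record = [
    {"title": "Article A", "issn": "1111-1111"},
    {"title": "Article B", "issn": "3333-3333"},
    {"title": "Article C", "issn": "2222222x"},  # same journal, different notation
    {"title": "Article D", "issn": "4444-4444"},
]

print(oa_share(institutional_record, oa_journal_issns))  # 0.5
```

A production crosswalk would need more care than this sketch shows. Journals can carry separate print and electronic ISSNs, so both would have to be matched, and articles published OA in hybrid journals are not captured by a journal-level list at all.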

2.3 Present usage and further development of authority files in current research information systems
Current Research Information Systems (CRIS) fulfil the management information needs for research information at the institutional level. Which research units score best in terms of scientific impact or in terms of societal impact? How successful is our university in attracting funds from national and international funders? Below, a number of use cases describe an important role for (improved) authority files. The section ends with a thought experiment describing how a use case could be handled in the (near?) future, when research information is expressed in Open Linked Data.

CRIS in Denmark fulfilling the need for management information
All universities in Denmark share a common infrastructure: repository software, a (global) authority file for journals, which is enhanced with local bibliometric data, and a national database for all researchers at Danish universities. The first steps have been taken to implement ORCID nationwide. The University of Copenhagen has Thomson Reuters Impact Factors integrated in its repository for use in back-end reporting on research output. These integrated authority files improve data quality, both on the import and input side and on the output side, where reliable reporting is crucial. Based on this, the CRIS of the University of Copenhagen can handle several straightforward use cases (e.g. the publication list of a researcher or research department) quite easily. But the CRIS reporting module is also often used for management information and business intelligence-like questions, or for bibliometric analyses for research departments. Because of the complexity of the reporting module, requests are handled by the staff of the CRIS. Its report functionality is used daily: a clear indication of the importance of the CRIS for the university's policies.

A weakness with regard to authority files for organisations
Organisations are connected not only to research activities such as publications, projects, patents, data and funding, but also to meta-research activities such as bibliometric analyses, development of performance indicators and evaluations of research cooperation, as well as to research documentation, such as repositories. The CRIS of the University of Copenhagen uses a local system with an authority file of data concerning only public organisations. Presently, data on other organisations are entered in a free-text field of the records, which leads to different names in different languages, different abbreviations, or organisation names at different levels. This makes the data nearly useless and requires much manual work to answer questions like "how many publications have been published together with private organisations?". What is needed now, amongst other things, is the full integration of ORCID in the workflow and the integration of an authority file covering organisations at the international level, using ISNIs consistently.

CRIS in the Netherlands
The performance and quality of Dutch universities are measured by the Standard Evaluation Protocol (SEP). [2] One form that university research groups have to fill in is the self-evaluation report.
This includes indicators such as research output (number of publications), academic reputation (awards) and societal relevance (valorisation). A universitys CRIS system is pivotal in producing these reports. Weaknesses in the Dutch CRIS systems Current weaknesses in the Dutch CRIS situation are of organisational, legal and technical nature:3 Organisational: some authority files are missing. For example, the National University Association has proclaimed the existence of so called A-class journals (= top journals in a discipline), leaving it to the researchers to define the A-class status of a journal. An authority file identifying A-class journals would be extremely helpful
2 Only available in Dutch.
3 At the moment most universities in the Netherlands are migrating from their common Metis system to new proprietary systems like PURE and Converis. It is too early to say what new developments this will imply.


- Legal: not all databases are (completely) open due to legal and privacy issues: the most relevant examples are the human resources databases of Dutch universities
- Technical: existing databases are not expressed as Open Linked Data. This can be solved by using relational-database to semantic-database converters, which exist as Open Source as well as in commercial packages

To address these weaknesses, enhancing existing relevant authority files and possibly creating some new ones will be crucial for the CRIS workflows. Adequate authority files/identifier systems will facilitate the collection of metadata records for persons, institutions, documents, journals and funders. The management information reports can then be generated by defining and executing smart queries across these metadata files.

A thought experiment describing current research information in Open Linked Data
Earlier in this report, the transition of the internet from a Web of Documents to a Web of Data using the Open Linked Data format was described as extremely important for authority files and research information. If CRIS information were available in Open Linked Data format, many use cases could be answered by query statements. Appendix B describes a thought experiment in detail to highlight the possibilities of Open Linked Data for a use case.

What does it entail? If research information were available in Open Linked Data, the function of institutional CRIS systems would change. Instead of collecting information in advance, CRIS systems could combine internal authority files on their researchers and departments with external data sources. Use cases such as producing a self-evaluation report on the publications of two researchers could be solved by querying sources in the Open Linked Data format. The thought experiment compiles this report with 14 query statements using (crosswalking) 11 authority files and databases.
This thought experiment illustrates the advantages of linking research information and authority files in Open Linked Data: increased efficiency and flexibility as a result of crosswalks across the Semantic Web.
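
The crosswalking idea can be illustrated with a minimal Python sketch. It is not the thought experiment of Appendix B: the four lookup tables stand in for an internal staff authority file and three external sources, and every identifier and value in them is invented for illustration.

```python
from collections import defaultdict

# Internal CRIS authority file: local staff id -> external author identifier.
# All identifiers in this sketch are hypothetical placeholders.
STAFF_TO_AUTHOR_ID = {
    "staff:001": "orcid:EXAMPLE-A",
    "staff:002": "orcid:EXAMPLE-B",
}

# External source 1: author identifier -> publication identifiers.
AUTHOR_TO_DOIS = {
    "orcid:EXAMPLE-A": ["doi:10.0000/a1", "doi:10.0000/shared"],
    "orcid:EXAMPLE-B": ["doi:10.0000/b1", "doi:10.0000/shared"],
}

# External source 2: publication identifier -> journal identifier.
DOI_TO_ISSN = {
    "doi:10.0000/a1": "issn:0000-0001",
    "doi:10.0000/b1": "issn:0000-0002",
    "doi:10.0000/shared": "issn:0000-0001",
}

# External source 3: journal identifier -> Open Access status.
ISSN_IS_OA = {"issn:0000-0001": True, "issn:0000-0002": False}


def self_evaluation_report(staff_ids):
    """Crosswalk staff -> author id -> DOI -> ISSN -> OA status and count results."""
    dois = set()  # a set, so co-authored publications are counted once
    for staff_id in staff_ids:
        author_id = STAFF_TO_AUTHOR_ID[staff_id]
        dois.update(AUTHOR_TO_DOIS.get(author_id, []))

    per_journal = defaultdict(int)
    open_access = 0
    for doi in dois:
        issn = DOI_TO_ISSN[doi]
        per_journal[issn] += 1
        if ISSN_IS_OA.get(issn, False):
            open_access += 1

    return {
        "publications": len(dois),
        "open_access": open_access,
        "per_journal": dict(per_journal),
    }


report = self_evaluation_report(["staff:001", "staff:002"])
print(report)
```

Each step works only because the identifier produced by one source is the key of the next; a free-text name at any point in the chain would break the crosswalk.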

2.4 The GND, Culturegraph, the research community, and the Wikipedia approach
The GND is an authority file that originated in the library catalogue world. The German National Library is now expanding its usage to other cultural institutes (museums, archives) and to the research community. The German National Library has also initiated an innovative project that uses the Wikipedia approach to facilitate the authority control of the GND.

Introducing the GND
GND stands for Gemeinsame Normdatei (Integrated Authority File). It started in 2013 as a combination of the former national authority files for persons (PND), corporate bodies (GKD) and subject headings (SWD) and aims to solve the name ambiguity problem in the library world. The need for name disambiguation and authoritative entries concerns many more communities than the library world alone. GND authority records not only standardise the preferred names for persons, conferences or events, corporate bodies, places or geographic names, subject headings and works, but also include alternative names and relations to other authority records. GND is also provided as Open Linked Data. This makes GND a network of related data records, which is well suited for internet use, permits navigation within the authority file and thereby improves search, browse and query options for users. The GND is used nationwide in libraries and increasingly also in archives, museums and other cultural institutions. Partners such as Europeana and the German Digital Library are already using the GND.

Culturegraph: using Open Linked Data to create new uses for GND data
The German National Library has developed a platform called Culturegraph, which uses Open Linked Data. With regard to the GND, the aim is to use this platform to create cross-domain search possibilities for cultural databases. One such cross-domain option is the creation of machine-produced fact sheets. Using GND authority files, the computer can compile an on-the-fly fact sheet on Johann Wolfgang von Goethe, for instance, if an end-user searches on that name. The fact sheet will contain basic facts on Goethe and links to relevant information in other databases, such as the national catalogue or Wikipedia. The fact sheet will appear on the right-hand side of the screen of the interface of the German Digital Library (Deutsche Digitale Bibliothek, DDB). The aim is to have this up and running by summer 2014.

Using ORCID to benefit researchers and the library community
Journal publishers are increasingly using ORCIDs for authors of research articles (see Chapter 3). However, ORCIDs have not so far been included in authority records made by librarians. Bringing together the two communities would benefit both. Authority records would become available in the GND at an earlier stage of the publishing career. Researchers would benefit by linking all library-held publications via the GND. At the time of writing, the DNB is actively looking into options to create a connection between these two authority systems.

Authority control via the Wikipedia approach
Another possible step for the GND authority file outside the library world is a project using a Wikipedia approach for authority control. Because of the greater exposure of the GND, GND staff receive more feedback with corrections and updates to the information: over 5,000 emails per year! To spare its resources, the DNB has started an internal project to create a web-based end-user interface to maintain data on persons. Libraries, other cultural institutions and the academic community will be invited to participate in expanding and updating the GND. The proposed procedure will enable the target groups, after registration, to submit proposals via the internet for changes to a GND authority record and/or to submit a new author or creator of an intellectual property. You could call this "authority by crowd control", and of course the results of this project will be closely monitored. However, Wikipedia follows a similar procedure with good results regarding the quality of its entries.

3. A brief overview of global authority files in research information: authors, organisations, publications, and some new ones
What is the situation now regarding important authority files in research information available worldwide? What are the important developments? First, let us look at classical global authority files on authors, organisations, publications and subjects and then at some recent developments.

ISSN, ISBN and DOI for publications
Several authority files for scientific publications are well established. The International Standard Serial Number (ISSN) for journals and serials has existed since the 1970s and is managed by a network of national agencies coordinated by the ISSN International Centre. The International Standard Book Number (ISBN) also started in the 1970s. ISBNs are given out by a network of agencies coordinated by the international ISBN agency. The DOI (digital object identifier) was created more recently. The DOI system assigns persistent identifiers to electronic publications, mainly journal articles (eg. through CrossRef) and datasets (through DataCite) and, outside the field of research information, to video content and official EU publications.

ISNI for organisations and authors
The International Standard Name Identifier (ISNI) brings together information from national authority files related to intellectual rights attribution. For example, in Germany the Deutsche Nationalbibliothek produces an authority file (GND) of authors and organisations such as publishers (see Section 2.4). However, the German national copyright organisation VG Wort maintains an authority file on copyright holders too. ISNI combines these two German authority files with each other and with other national authority files. Each public entity is given a unique ISNI identifier (16-digit code) that can be used across many applications, reconciling alternate or disparate spellings of the same name and eliminating confusion when names are alike. According to an ISNI press release (December 2013), there are now more than 7 million names with an ISNI (either organisations or persons).
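
As a small illustration of what such a 16-character identifier looks like to software, the following Python sketch checks whether an identifier is well formed. It assumes that the last character is a check character computed with ISO 7064 MOD 11-2 over the first 15 digits, as described in the ISNI and ORCID documentation; that detail does not come from this report. The function names are hypothetical, and the sample values are the ORCID iDs printed on the title page of this report, which share the same 16-character format.

```python
DIGITS = set("0123456789")


def mod11_2_check_character(first15: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for 15 digits (assumed scheme)."""
    total = 0
    for ch in first15:
        total = (total + int(ch)) * 2
    value = (12 - total % 11) % 11
    return "X" if value == 10 else str(value)


def is_well_formed(identifier: str) -> bool:
    """Check length, digit-only body and check character; says nothing about registration."""
    compact = identifier.replace("-", "").replace(" ", "").upper()
    if len(compact) != 16 or not set(compact[:15]) <= DIGITS:
        return False
    return mod11_2_check_character(compact[:15]) == compact[15]


print(is_well_formed("0000-0002-2296-7568"))  # iD from the title page
print(is_well_formed("0000-0002-2296-7569"))  # same iD with the last character altered
```

A check of this kind only catches typing errors; whether an identifier actually belongs to a given person or organisation still has to be resolved against the authority file itself.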

Recent authority files relevant to research information: ORCID, FundRef and the FRBR entity work
In the field of research information, three recent developments in authority files are relevant:


- ORCID (Open Researcher and Contributor Identifier): authors of scientific or scholarly works have usually signed away their copyrights to publishers. Therefore, they had practically no affinity with the ISNI world of contributors and distributors of creative works. However, for search and retrieval, as well as research assessment purposes (citation counts!), the need for unambiguous author identification has led to the recent implementation of ORCID: a (self-)registration system for researchers to receive a unique identifier. According to its website, ORCID grew very quickly in 2013 and now has over 460,000 registered scientists. ORCID and ISNI will coordinate their efforts, meaning that ORCID will use a range of ISNI numbers and self-registering scientists will be able to consult the ISNI database during the registration procedure to see if they already have an ISNI
- FundRef: introduced mid-2013 by CrossRef, FundRef has now given some 5,000 research funding organisations a standard name. The workflow is as follows: publishers will include an option to register the research funder(s) during the submission process of a journal article, using the FundRef name authority file. This information will then be published in the article itself, but is also deposited at the CrossRef site. A FundRef API (application programming interface) at CrossRef gives the option to look up all the article DOIs that are linked to a specific research funder
- FRBR entity work: a new set of authority files for publications might be in the making. The FRBR (Functional Requirements for Bibliographic Records) definition from the library world makes a distinction between the entities work, expression and manifestation (see Figure 2). National libraries are developing authority files for the entities work and expression, which will help end-users when searching and make cataloguing more efficient. These authority files on works will need a unique international identifier to become interoperable.4
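
The funder lookup mentioned in the FundRef item can be sketched in Python as follows. This is an illustration under assumptions, not the 2014 FundRef interface: it assumes the route `/funders/{id}/works` of the present-day public Crossref REST API and the `message.items[].DOI` layout of its JSON responses. The helper names, the funder id and the sample response are hypothetical, and no network call is made.

```python
import json
from urllib.parse import quote, urlencode

# Assumption: base URL of today's public Crossref REST API.
CROSSREF_API = "https://api.crossref.org"


def funder_works_url(funder_id: str, rows: int = 20) -> str:
    """Build the URL listing works linked to one funder (assumed route)."""
    query = urlencode({"rows": rows})
    return f"{CROSSREF_API}/funders/{quote(funder_id, safe='')}/works?{query}"


def extract_dois(response_text: str) -> list:
    """Pull the DOIs out of a response with the assumed JSON layout."""
    payload = json.loads(response_text)
    return [item["DOI"] for item in payload["message"]["items"]]


# Hand-written sample that only mimics the assumed response shape; not real data.
SAMPLE_RESPONSE = json.dumps(
    {
        "status": "ok",
        "message": {
            "items": [{"DOI": "10.0000/example.1"}, {"DOI": "10.0000/example.2"}]
        },
    }
)

print(funder_works_url("100000001", rows=5))  # illustrative funder id
print(extract_dois(SAMPLE_RESPONSE))
```

In practice the URL would be fetched with an HTTP client and the response passed to `extract_dois`; the point of the sketch is that a standard funder identifier turns "which articles did this funder support?" into a single query.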

Figure 2 FRBR scheme


Harmonising efforts by VIAF
VIAF (Virtual International Authority File) started as a joint project of the US Library of Congress, the German National Library and OCLC (Online Computer Library Center). Later the French National Library joined the project, and since 2012 VIAF has been an OCLC service. VIAF's main purpose is to integrate the world's authority files and make them more usable. VIAF now has over 30 national libraries participating, plus several other organisations such as ISNI. VIAF creates cluster records of the various authority records, focusing on:
- Persons
- Organisations
- Geographical names

Recently, VIAF also started creating cluster records of the FRBR entities work and expression, using extractions of multilingual bibliographic records. There are now several hundred thousand FRBR cluster records (with the FRBR entity expression mostly consisting of translations). The project expects to obtain extensive results at the end of 2014. VIAF now contains 36 million records, of which 23 million are cluster records. An interesting cooperation exists with (the English) Wikipedia: VIAF uses factual data from Wikipedia, such as dates of birth and death, and adds the appropriate Wikipedia URL to the VIAF cluster records. The other way around, VIAF adds VIAF identifiers to appropriate Wikipedia articles using an automated process called VIAFbot. VIAF continues to add an estimated four to six authority files to cluster records per year. OCLC uses VIAF to improve its WorldCat service, but the VIAF records are also published as Open Linked Data files and thus can be used by everyone.5
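
The idea of a cluster record can be made concrete with a short Python sketch. It is a deliberate simplification and not VIAF's matching method, which this report does not describe: records from different (hypothetical) source files are grouped when they share a normalised name and a birth year. The record identifiers and the `match_key` rule are invented for illustration.

```python
import unicodedata
from collections import defaultdict


def match_key(name: str, birth_year: int) -> tuple:
    """Hypothetical matching rule: accent-free, lower-cased, order-independent name plus birth year."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    parts = sorted(without_accents.lower().replace(",", " ").split())
    return (" ".join(parts), birth_year)


def cluster(records):
    """Group (source, record_id, name, birth_year) tuples into cluster records."""
    clusters = defaultdict(list)
    for _source, record_id, name, birth_year in records:
        clusters[match_key(name, birth_year)].append(record_id)
    return dict(clusters)


# Invented authority records from three imaginary national files.
RECORDS = [
    ("file-de", "de:EX1", "Goethe, Johann Wolfgang von", 1749),
    ("file-us", "us:EX2", "Johann Wolfgang von Goethe", 1749),
    ("file-fr", "fr:EX3", "Molière", 1622),
    ("file-us", "us:EX4", "Moliere", 1622),
]

clusters = cluster(RECORDS)
for key, members in clusters.items():
    print(key, members)
```

A rule this crude would wrongly merge two different people who share a name and a birth year, which is one reason why real clustering draws on additional evidence such as titles of works or dates of death.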

4 In the world of publishers, a similar concept called INDECS has been developed. INDECS only defines two levels: its definition of work is similar to the FRBR definition for expression. The related unique identifier system already exists: ISTC.
5 Based on an interview with Thom Hickey and Eric Childress.

4. Discussion, conclusions and recommendations


4.1 Benefits of authority files, and the tasks at hand
Benefits
Authority files in research information are increasingly useful for supporting discovery and information retrieval on the one hand and management information and business intelligence questions on the other. With the promise of the Web of Data, authority files will bring even more benefits through their more widespread use and the interoperability that they bring. The benefits can be summarised as:
- Discovery: just as in the classical library world, authority files on the Web of Documents are the basis for precision and recall in information retrieval
- Trust and reliability: authority files support reliable, trusted identification of key elements in research information. This is nicely illustrated by ORCID, which aims to support unambiguous identification of authors of journal articles
- Accountability: research-performing organisations and funding organisations need to account for their budgets and activities to the taxpayer. Authority files support the ability to track, report and measure funding aspects, research output and impact. The role of authority files in accountability is illustrated by the recently implemented FundRef, and also by the increasing need for management information on the costs of OA publishing, which calls for the creation or extension of journal authority files
- Transactional efficiency: avoiding the need to re-key data many times and supporting the exchange of data across stakeholders in research information are important benefits of authority files. All stakeholders will gain here: a unique identifier for individuals will facilitate reputation management for the researcher, support analysis of the output and performance of organisations, and streamline submission procedures for research funders and publishers
- New knowledge: the ability to make correlations across data and to support analytics and decision making in the management of research is greatly facilitated by authority files. The CRIS of research-performing organisations will play a key role here. As organisations need to meet the demand for management information and business intelligence, they have the most to gain from accurate, reliable authority files and should play an essential role in the creation of new authority files or the extension of existing ones


Tasks at hand
The following developments in the changing landscape of authority files in research information imply tasks for the research information community:
- New applications of existing authority files: existing authority files are adapted for new applications. This paper highlighted the GND, which is moving from an authority file for library catalogues towards an authority file for all cultural institutes
- New authority files that need to be created to answer different information needs: FundRef and the re3data.org registry, as presented in this paper, are good examples of authority files that answer new demands in research information
- Coordination and harmonisation of authority file systems: the ISNI authority file was set up for legal entities from the perspective of intellectual property rights, while the ORCID authority file tries to meet the need for unambiguous identification of authors at academic institutions. However, there is a clear overlap between the two systems that might complicate the purposes of both files. Knowledge Exchange has been instrumental in aligning these two authority file systems. In this paper, we observe a possible new overlap between two authority file systems: re3data.org might use DOIs as identifiers for research data repositories, while ROAD claims to assign ISSN identifiers to repositories. In the fast-changing landscape of research information, we can expect that similar coordination and harmonisation issues will regularly come to the surface and must be addressed
- Spreading the application of authority files: the more widely an authority file is applied, the more efficient and effective its usage becomes. Here ORCID might serve as an illustration: within a year, it had registered nearly half a million authors of research papers, but the total number of authors is estimated to be much higher
- Need for openness and adjusting authority files to the Web of Data: the evolving Web of Data holds important promises for a stronger role for authority files and much greater adaptability to meet new demands in research information. However, this calls for openness of authority files and their migration to the Open Linked Data format
- Sustainability and governance of authority file systems: business models and governance are crucial to the sustainability of authority files and their adaptability to new demands. The call for openness by the Web of Data will set new demands in this area


4.2 Where stakeholders stand


Researchers
"For a researcher, one of the major irritations of working life is being asked for the same information in slightly different forms or formats, over and over again." This quote comes from a senior lecturer interviewed about the benefits of ORCID (Ferguson 2013), when asked for the most important benefits of well-established authority files for researchers at the individual level. Other important issues for researchers are reputation management, which is essential for attracting new funding, and retrieval and discovery of information, as we have seen with regard to the research data repository registry. With an eye on possible action, we would like to emphasise that plans for new authority files, or for expanding existing ones, should lessen the (administrative) burden on researchers. This will hold the key to success or failure in many cases.

Institutes (research-performing organisations)
In the increasingly complex and competitive landscape of research, institutes are a major stakeholder in improving authority files in research information because of the importance of management information, accountability reports and business intelligence. Institutes will gain the most by crosswalking authority files and at the same time might suffer the most if the authority files have insufficient coverage, lack interoperability or are expensive to use. The more authority files fail, the more manual work must be done or, even worse, institutes might simply lack adequate business intelligence supporting their decisions on their research activities.

National stakeholders
National libraries, funding organisations and science policy makers are important stakeholders at the national level. They have their own needs for research information, often of the same type that institutions need, albeit at a national scale. As such they have a direct interest in furthering authority files under their own steam. However, they also have a responsibility with respect to a well-functioning national research information infrastructure, addressing issues of maintenance, architecture, trust and data protection. This encompasses coordination and (co-)funding of institutional activities. Moreover, they may act as national registration and licensing agents for identifier assignments.

Knowledge Exchange as a supranational stakeholder
The KE partners commonly pave the way for new initiatives with respect to the ICT infrastructure for higher education and research. As authority files are a critical component of this infrastructure, this study is a natural KE initiative. Next steps lie in advocacy and creating awareness, coordinating national efforts and sharing best practices. As a supranational organisation, KE is also in a position to observe and assess new developments, to stimulate welcome steps and to address less welcome ones. Individually, the KE partners have roles to play in their own countries: working towards consensus, managing projects, understanding barriers and catalysing new developments.

Other international stakeholders
One could distinguish between two categories of other international stakeholders. The first category consists of the organisations that maintain and develop the international standards and authority files, such as the ISO, ISSN, ISBN and ISNI agencies. In addition, one could mention international representatives of stakeholders who apply these standards, such as the International Association of Scientific, Technical and Medical Publishers (STM) and the International Federation of Library Associations (IFLA), and worldwide aggregators such as OCLC. The second category consists of international organisations that have an interest in the scientific infrastructure, such as the European Commission, the European Research Council and UNESCO.

Outlook
The Web of Documents, the Web of Data and the ever-increasing possibilities created by web technologies mean that authority files can be applied at scale. Research content, with an emphasis on research data, will become available for re-use, analysis and mining on a global scale. The creation, implementation and maintenance of authority files will have wider system benefits in research information, in terms of both efficiency and effectiveness. Common agreement in the area of authority files and their wide deployment will improve research and its impact and as such support the intentions of the European Research Area and Horizon 2020.

4.3 Possible actions


Institutional level
What could institutes do to spread the use of authority files?
- Towards researchers: Request your researchers to register for an ORCID iD and explain the benefits; more specifically, explain that this will reduce their administrative burden and increase their profiling potential. See for example the introduction on Jisc's ORCID website. And above all, make it easy. The head of a Research IT Application Department at a university in the UK states: "We plan to generate ORCID iDs for everybody who hasn't yet claimed one" (Ferguson 2013). Other interviewees plan to match internal iDs for researchers with ORCID iDs


- Towards your library and CRIS: At the institutional level, crosswalking authority files is essential. Institutes will be the first to notice when use cases cannot be answered by the existing authority files. Tools using authority files for tracking, analyses and reporting on research information will be developed, used, refined and shared with other institutes

Short-term actions at institutional level


1. Ask your library and CRIS to make an inventory of the position of your institute vis-a-vis Open Linked Data. Specifically:
- Which authority files are in use and/or maintained
- What is the (envisioned) role of Open Linked Data
- What strategic decisions are foreseen in the near future with respect to the CRIS, the repository and the library system
2. Pilot the adoption of key identifiers like ORCIDs for researchers, DOIs for research datasets or ISNIs for funders on a small scale in some widely used software systems for CRISs, repositories or library systems, to better understand the issuing processes and models, to share lessons and to learn collectively

- Towards your institute: Another important area is organisational iDs, as highlighted by the CRIS of the University of Copenhagen in this paper and underlined by a recent report on this issue (Hammond & Curtis 2013). Consequently, make sure that your institution and affiliated legal entities (eg. university foundations) have an ISNI iD assigned

National level

National libraries
- Integration and connection of authority files. National libraries could play an essential role in integrating and connecting various authority files: this is their expertise and most often fits in their mission. In this paper, we have seen the example of the German National Library, which strives to expand the significance of its GND authority file to the research community by connecting the GND to ORCID. In another example, the National Library of Finland recently became an ISNI member and aims initially to align Finnish data with the ISNI requirements and, in a next step, to transfer Finnish name data to the ISNI database. It is the library's intention to become an agency of ISNI
- Sustainability of authority files. National libraries could also play an important role in organising support for sustainability of authority files, together with other actors at the national level. Maintenance of authority files does indeed cost money, and the spread of usage of an authority file often depends on its business model. A business model that restricts usage in an Open Linked Data world will limit the file's usefulness in research information. National actors should be aware of this and therefore be willing to support relevant authority files financially or to establish a workable business model

National research umbrellas
- Coordinating crosswalking issues. National umbrella organisations for research institutes should take up the task of coordinating issues in crosswalking authority files, arrange studies of possible solutions and act as an intermediary between the institutional level and the international level with its coordinating agencies for authority files
- Coordinating development of analysis and reporting tools. National umbrella organisations for research institutes should take up the task of funding the development and institutional deployment of tools for crosswalking, caching, analysing and reporting research data
- Awareness campaigns. National actors could play a pivotal role in setting up awareness campaigns for researchers to spread usage of certain authority files. Possible areas suggested by the cases presented in this paper could be research data repositories and ORCID

Knowledge Exchange
- Advocacy aimed at the national stakeholders in the five KE countries. Despite their importance for all actors in the research information world, authority files are relatively unknown and therefore unappreciated. KE could expand its advocacy role on this topic, highlighting the growing role of authority files and the possibilities that they bring for national stakeholders. We hope that this paper might serve as a catalyst in these advocacy efforts
- Coordinating national issues with crosswalking authority files. The KE partners could take on the role of coordinating issues in crosswalking authority files experienced at the institutional level in each country. KE could act as a supranational coordinator by studying common issues and proposing common solutions.


Short-term action for KE


Start a debate (discussion list, webinar, workshop) on the two paradigms for CRISs: CRIS as a database vs CRIS as a crosswalk organiser.

- Channelling issues with regard to international authority files to the international level. As supranational coordinator, KE might encounter issues in crosswalking authority files that must be solved at the international level. KE could channel these issues to the organisations that maintain and develop authority files (such as ISNI or ISSN), bring them to the attention of international representatives of publishers or libraries (eg. STM or IFLA) or involve international actors that are focused on research infrastructure (European Commission or European Research Council)
- Encouraging Open Linked Data. KE could support and encourage the migration of (metadata of and authority files on) research information to the Open Linked Data environment. The openness of the Web of Data is congruent with OA and with the transparency that is the hallmark of science, while its possibilities for solving the increasing number of use cases in research information in an effective and efficient manner are appealing and extremely valuable.

Short-term actions for KE


1. Coordinate the establishment of authority files for documentary repositories (ROAD initiative, using ISSNs as identifiers) and data repositories (re3data.org, using DOIs as identifiers)
2. Stimulate FundRef to add ISNI identifiers to their name list and make it available in Open Linked Data or, alternatively, ask Jisc to do so via RIOXX


Appendix A. Authority files and the Web of Data


With the upswing of Open Linked Data, the world wide web will develop increasingly into the Web of Data. In the Web of Data, authority files will play an even greater role. Why is that?

The current Web of Documents is structured by hyperlinks; the coming Web of Data is structured by RDF triples. RDF triples are short sentences starting with a subject ("this publication") followed by a predicate ("is translated by") and an object ("that person"). The third part of the sentence may also be text describing a feature ("This publication is translated into French"). These features may be free text but are preferably taken from standardised vocabularies (thesauruses, taxonomies), traditionally the domain of libraries. An object of one triple can be a subject for another triple. This structure can be presented as a graph with nodes (subjects and objects) connected by lines (predicates). Thus, all RDF triples together constitute a global information graph. Google has named this the "knowledge graph". Humans could follow these links and nodes step by step ("following one's nose", also called "toURIsm").
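
The triple structure and the "following one's nose" walk can be sketched in a few lines of Python. This is only an illustration: the `ex:` identifiers and predicate names are invented, and a real application would use an RDF library and real URIs rather than plain tuples.

```python
from collections import defaultdict, deque

# Each fact is a (subject, predicate, object) triple; all names are hypothetical.
TRIPLES = [
    ("ex:publication1", "ex:translatedBy", "ex:person1"),
    ("ex:publication1", "ex:translatedInto", "French"),  # object as plain text
    ("ex:person1", "ex:affiliatedWith", "ex:university1"),
    ("ex:university1", "ex:locatedIn", "ex:city1"),
]


def build_index(triples):
    """Index outgoing (predicate, object) pairs by subject."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def follow_your_nose(start, triples):
    """Breadth-first walk: the object of one triple becomes the subject of the next."""
    index = build_index(triples)
    visited = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _predicate, obj in index.get(node, []):
            if obj not in visited:
                visited.append(obj)
                queue.append(obj)
    return visited


path = follow_your_nose("ex:publication1", TRIPLES)
print(path)
```

Starting from the publication, the walk reaches the translator, the translator's university and its city without any of these links having been stored with the publication itself.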


To enable computers to do the same in a split second, this whole structure must be machine processable. Therefore, subjects, objects and predicates are each assigned a unique identity or URI (Uniform Resource Identifier) and divided into classes that follow certain rules. So the triple 'This person is written by that book' is forbidden. Classes of predicates may obey laws of elementary logic (symmetric, transitive, reflexive). The division into classes and the rules governing their interaction go under the name Web Ontology Language (OWL). RDF triples together with OWL define the fabric of the Semantic Web. Controlled access points are human terms for URIs. Well-defined, unambiguous and robust sets of URIs are called authority files.

If we, humans, are the astronauts in the information universe called the web, authority files are the collections of coded stars, planets, clouds etc., and the predicates are light beams between the stars, conveying all sorts of information about their condition. Our organisations are galaxies and their stars our controlled access points, indispensable for our orientation. Our rules for efficient and secure voyaging are laid down in the ontology. In any case, without authority files and an ontology we can only grope our way through the Web of Data. They are the signposts in an otherwise uncharted web.
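The class rule in the example above ('This person is written by that book' is forbidden) can be sketched as a simple check. This is a simplified reading: it assumes each predicate declares one required class for its subject and one for its object, and the names `CLASS_OF`, `RULES` and `is_allowed` are hypothetical. Real OWL reasoners treat such declarations as axioms for inference rather than as a validation filter.

```python
# Hypothetical classes and predicate rules, for illustration only.
CLASS_OF = {"person:7": "Person", "book:2": "Book"}

# predicate -> (required class of subject, required class of object)
RULES = {"is_written_by": ("Book", "Person")}


def is_allowed(subject, predicate, obj):
    """Check a triple against the class rule declared for its predicate."""
    subject_class, object_class = RULES[predicate]
    return CLASS_OF[subject] == subject_class and CLASS_OF[obj] == object_class


ok = is_allowed("book:2", "is_written_by", "person:7")
bad = is_allowed("person:7", "is_written_by", "book:2")
```

Here `ok` is true because a book may be written by a person, while `bad` is false because the reversed triple breaks the class rule.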

Figure: Semantic Web technology stack. Source: http://bnode.org/media/2009/07/08/semantic_web_technology_stack.png

At the end of the day, Linked Data is the result of all these efforts and Open Linked Data is for the web what Open Source is for software and Open Access for knowledge. In other words, Open Linked Data is the part of the Semantic Web with a CC0 licence.


Inference on the Semantic Web is defined as discovering new relationships between resources. It is a process that can generate new relationships based on additional information in the form of a vocabulary or rule set, and it can also reveal possible inconsistencies in the integrated data. An inference engine is needed to process the knowledge encoded in the Semantic Web language OWL (Web Ontology Language). It uses Open Linked Data to generate meaningful answers to web queries like 'Get all A-class publications of the department of Popular Science at Exampleton University'. Or, one step further, 'What is the number of quality-controlled scholarly contributions in OA journals in comparison to the overall publication distribution at our institution on an annual basis?' or 'What is the total amount spent by our institution on OA publishing, broken down by departments, the library, external project funders and others?'

Open Linked Data also enable the merging of existing authority files for the same domain. For example, an authority file on geographical names in English can be linked to an authority file on geographical names in French. The linked files can play a role in translation and can also draw on each other's strong points: the English authority file is probably more detailed for geographical names in the UK, the French authority file for geographical names in France.

Finally, how reliable is this information? This brings us back to the idea of trust. At the end of the day, trust, more than technology, constitutes the basis of the authority of files in the Semantic Web, just as it did in the age of classical libraries (O'Hara et al. 2004). Nevertheless, organisation of data remains crucial, as always, and authority files, par excellence, identify data.
Vocabulary of Open Linked Data

entity                    discrete thing (including concepts) in the real world
URI                       unique code assigned to entity; also 'identity'
authority file            trusted set of identities/URIs
RDF triple                subject-predicate-object expression
subject                   a resource; is URI
predicate                 defines relation or feature of the resource; is coded with predicate-URI
object                    value of predicate; is URI when relation, is text when feature
controlled access point   URI from authority file
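The linking of an English and a French geographical authority file, and the inference of new relationships from a symmetric predicate, can be sketched as follows. The URIs, the labels and the `same_as` link name are assumptions made for this illustration; no real authority file is used, and a real inference engine would work from OWL declarations rather than hand-written functions.

```python
# Hypothetical URIs and labels for two small geographical authority files.
english = {"en:london": "London", "en:paris": "Paris"}
french = {"fr:londres": "Londres", "fr:paris": "Paris"}

# Asserted links between the two files (the predicate name is an assumption).
same_as = [("en:london", "fr:londres"), ("en:paris", "fr:paris")]


def infer_symmetric(pairs):
    """Derive the reverse of each link: if A same_as B, then B same_as A."""
    return set(pairs) | {(b, a) for a, b in pairs}


def translate(label, source, target, links):
    """Look up a label in source and return the linked label in target."""
    for uri, name in source.items():
        if name == label:
            for a, b in links:
                if a == uri and b in target:
                    return target[b]
    return None


links = infer_symmetric(same_as)
to_french = translate("London", english, french, links)
to_english = translate("Londres", french, english, links)
```

Only the English-to-French links were asserted; the French-to-English lookup succeeds because the reverse links were inferred from the symmetry of the predicate.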


Appendix B. A thought experiment describing current research information in Open Linked Data
University ABC has a research department called Popular Science where two researchers work: Robbert Dijkgraaf and André Leeuwenhoek. According to the performance agreements of their university, they have to produce a self-evaluation report for their research department. Both wrote 43 publications last year, of which ten appeared in so-called A-class journals. If the relevant databases and authority files are available in Open Linked Data format, this report can be generated by defining SPARQL queries over semantic data in external databases. The query statements would look as follows:

That Robbert and André are authors with International Standard Name Identifiers (ISNI) x and y (ISNI database, incl. ORCID, at global level):
  ISNI: <ISNI:x is_a Author>
That University ABC is a University with ISNI:z:
  ISNI: <ISNI:z is_a Organisation>
That Robbert (ISNI:x) and André (ISNI:y) have a working relation to University ABC (ISNI:z) (HR database at the university):
  HR@UniversityABC: <ISNI:z has_employee ISNI:x>
That University ABC has different departments:
  HR@UniversityABC: <department:n is_a department>
What the departments of University ABC are in a specific Research Area:
  HR@UniversityABC: <department:n is_research_area NOD:n>
Who works in these departments:
  HR@UniversityABC: <ISNI:n works_in department:n>
What authors (ISNI) have contributed to which publications (DOI) (DOI database at global level):
  DOI: <DOI:n has_author ISNI:n>


What journals there are (ISSN database at global level):
  ISSN: <ISSN:n is_a ScientificJournal>
What publications (DOI) belong to what journals (ISSN) (DOI database at global level):
  DOI: <DOI:n published_in ISSN:n>
When a publication (DOI) was published (DOI database at global level):
  DOI: <DOI:n published_on w3c:date:n>
What the Journal Impact Factor (JIF) of each journal (ISSN) is (CWTS database at global level):
  CWTS: <ISSN:n has_a JIF:CWTS:n>
What the defined Research Areas are, at national level since the reporting in this case is a national matter (NOD, Nederlandse Onderzoeksdatabase, at national level); in our case:
  NOD: <NOD:PopularScience is_a ResearchArea>
What the definition of an A-class journal is per Research Area, given by a minimal Journal Impact Factor for each Research Area, in this case Popular Science (VSNU at national level):
  VSNU:A-ClassDefinition: <NOD:PopularScience has_minimal JIF:CWTS:n>

All of this could be entered manually into a CRIS. However, the data needed for reporting are already stored in existing databases, which are authorities in their own domain. In principle, this report can be generated from the following databases: ISNI, ISSN, DOI, CWTS (a database with citation and impact scores; see footnote 6), NOD (a national database with research information), VSNU (the A-class journal definition), and the human resource database of University ABC. Some call this crosswalking. In the field of knowledge representation, an inference engine is used to crosswalk all databases.

One of the advantages of querying linked databases is that new reports can be produced simply by writing new queries, such as 'What is the total publication volume of all Popular Science departments in the Netherlands?' or 'What PhD students have published in A-class journals?'

The information needed is often already present in databases, generally in the form of metadata. What is needed is a restructuring of the stored information. This can be done with converters from relational databases to semantic databases, which exist both as Open Source and as commercial packages. For example, SKOS has been developed to migrate structured vocabularies like thesauri or classification schemes to RDF triples. In this way, more and more datasets and authority files are made available in Open Linked Data.

Yet some elaborate human tasks remain to be carried out as well, such as URI definitions for new entities (e.g. repositories, funders), manual disambiguation and adding missing author affiliations. If these are undertaken, we expect that within the next five years many of the questions in the realm of research information can be answered by crosswalks over the Semantic Web. The incubator project VIVO ushers in this approach. It is an open network of researcher profiles that has been implemented by over 100 institutions in a dozen countries and will soon cover over 1 million researcher profiles.
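The thought experiment in this appendix can be sketched in a few lines. The sketch assumes that the statements from the different databases have already been gathered into one in-memory list of triples; all identifiers, publication records and impact-factor numbers are invented, and plain numbers stand in for the JIF:CWTS:n values. A real implementation would more likely run SPARQL queries against the separate sources.

```python
# All identifiers and values below are invented for illustration.
triples = [
    # HR database of University ABC
    ("ISNI:z", "has_employee", "ISNI:x"),
    ("ISNI:z", "has_employee", "ISNI:y"),
    ("ISNI:x", "works_in", "department:1"),
    ("ISNI:y", "works_in", "department:1"),
    ("department:1", "is_research_area", "NOD:PopularScience"),
    # DOI database
    ("DOI:1", "has_author", "ISNI:x"),
    ("DOI:1", "published_in", "ISSN:a"),
    ("DOI:2", "has_author", "ISNI:y"),
    ("DOI:2", "published_in", "ISSN:b"),
    ("DOI:3", "has_author", "ISNI:x"),
    ("DOI:3", "published_in", "ISSN:b"),
    # CWTS database: impact factor per journal (numbers are made up)
    ("ISSN:a", "has_a", 5.0),
    ("ISSN:b", "has_a", 1.2),
    # VSNU definition: minimal impact factor for an A-class journal
    ("NOD:PopularScience", "has_minimal", 3.0),
]


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects of triples with the given predicate and object."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def a_class_publications(department):
    """Crosswalk HR, DOI, CWTS and VSNU statements for one department."""
    area = objects(department, "is_research_area")[0]
    threshold = objects(area, "has_minimal")[0]
    result = set()
    for person in subjects("works_in", department):
        for doi in subjects("has_author", person):
            journal = objects(doi, "published_in")[0]
            if objects(journal, "has_a")[0] >= threshold:
                result.add(doi)
    return sorted(result)


report = a_class_publications("department:1")
```

With these made-up numbers only one of the three publications appears in a journal above the threshold, so the report lists a single DOI. A new question, such as the publication volume of a department, would need only another small function over the same triples.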

Footnote 6: The CWTS database is proprietary. Freely accessible alternatives like Jifactor or Global Impact Factor currently cover far fewer journals.


Information sources
CASRAI. Management of authoritative lists in CASRAI (internal JISC document, draft October 2). 2013.
DFG. n.d. http://www.dfg.de/en/research_funding/programmes/infrastructure/lis/funding_opportunities/open_access_publishing/index.html
Ferguson, Nicky. JISC ORCID implementation group: use cases and views on the future use of ORCID in UK Higher Education. 2013.
Glushko, Robert J. The discipline of organising. MIT Press, 2013.
Hammond, M. and Curtis, G. Landscape study for CASRAI-UK organisational ID. Curtis and Cartwright, 2013.
IFLA. IFLA working group on the functional requirements and numbering of authority records. Functional Requirements for Authority Data. IFLA, 2008.
Pampel, H., Vierkant, P., Scholze, F. et al. Making Research Data Repositories Visible: the re3data.org Registry. PLOS ONE, 2013: 8,11,e78080.
Swan, A. Sustainability of Open Access Services. Knowledge Exchange; Danish Agency for Culture, 2012.
UNESCO. n.d. UNESCO Institute for Statistics (2009) Regional Totals for R&D Expenditures (GERD) and Researchers, 2002, 2007 and 2009 [data file]. (When you get to the list of tables, choose the Excel file that has the cited title; when you have it open, scroll to the right to see the estimates of number of researchers worldwide.) http://stats.uis.unesco.org/unesco/ReportFolders/ReportFolders.aspx?IF_ActivePath=P,54
