Integrating Collector and Author Roles Across Specimen and Publication Datasets.
Creator
Nicolson, Nicky
Paton, Alan
Phillips, Sarah
Tucker, Allan
2019
Abstract
This work builds on the outputs of a collector data-mining exercise applied to GBIF-mobilised herbarium specimen metadata, which uses unsupervised learning (clustering) to identify collectors from the minimal metadata associated with field-collected specimens (the DarwinCore terms , and ). Here, we outline methods to integrate these data-mined collector entities (a large-scale dataset, aggregated from multiple sources and created programmatically) with a dataset of author entities from the International Plant Names Index (a smaller-scale, single-source dataset, created via editorial management). The integration process asserts a generic "scientist" entity with activities in different stages of the species description process: collecting and name publication. We present techniques to investigate specialisations, including content (taxa of study) and activity stage (whether individuals focus on collecting, name publication, or both). Finally, we discuss how this initially herbarium-focussed data mining and record linkage process can be generalised to enable applications in a wider context, particularly in zoological datasets.
This article is part of: ST08 - More than Names: Identifying and Crediting People in Biodiversity Data. Edited by Simon Chagnoux, David Shorthouse, Anne Thessen.