Selecting the right search term in query-based systems for deduplication

Bibliographic Details
Main Author:	Jele, Harald
Format:	Article
Language:	English
Published:	2021
Subjects:	Library and information science
Online Access:	https://hdl.handle.net/10356/154222
Institution:	Nanyang Technological University

Description
Summary:	Essentially, three approaches can be identified for choosing a suitable search term to detect bibliographic duplicates. All of them exclude stop words. Then either (1) just the first term of an entry is selected, or (2) the term that produces the smallest number of hits is selected, or (3) a term whose number of hits falls below a defined threshold is used. These three procedures are compared with each other here. The results derive from a series of measurements made with bibliographic data from the Austrian Central Catalog.
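
The three selection procedures named in the summary can be illustrated with a minimal Python sketch. It is not the author's implementation: the stop-word list, the function names, the `hit_count` callback (standing in for a query against the catalog) and the toy hit numbers are all invented for illustration. For procedure (3) the sketch assumes that the first term, in entry order, with fewer hits than the threshold is taken; the summary does not spell out this detail.

```python
import re
from typing import Callable, List, Optional

# Hypothetical stop-word list; the list used in the article is not given here.
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "in", "for"})


def candidate_terms(entry: str) -> List[str]:
    """Split an entry into lowercase word tokens and drop stop words."""
    return [t for t in re.findall(r"\w+", entry.lower()) if t not in STOP_WORDS]


def first_term(terms: List[str]) -> Optional[str]:
    """Procedure 1: take the first remaining term of the entry."""
    return terms[0] if terms else None


def rarest_term(terms: List[str], hit_count: Callable[[str], int]) -> Optional[str]:
    """Procedure 2: take the term that produces the smallest number of hits."""
    return min(terms, key=hit_count, default=None)


def first_term_below(
    terms: List[str], hit_count: Callable[[str], int], threshold: int
) -> Optional[str]:
    """Procedure 3 (assumed reading): first term with fewer hits than threshold."""
    for term in terms:
        if hit_count(term) < threshold:
            return term
    return None  # assumption: no term qualifies, so the caller must fall back


# Toy hit counts, invented for this example; a real system would query the catalog.
toy_hits = {"selecting": 1200, "right": 5000, "search": 800,
            "term": 950, "deduplication": 12}


def toy_hit_count(term: str) -> int:
    return toy_hits.get(term, 0)


terms = candidate_terms("Selecting the right search term for deduplication")
print(first_term(terms))                             # selecting
print(rarest_term(terms, toy_hit_count))             # deduplication
print(first_term_below(terms, toy_hit_count, 1000))  # search
```

With these toy numbers each procedure picks a different term, and the number of catalog queries differs as well: procedure (1) needs no hit counts at all, procedure (2) needs one count per term, and procedure (3) can stop as soon as a term falls below the threshold.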

Similar Items