Selecting the right search term in query-based systems for deduplication

Essentially, three approaches can be identified for choosing a suitable search term to detect bibliographic duplicates. Stop words are excluded in all of them; then (1) just the first term of an entry is selected, or (2) the term that produces the smallest number of hits is selected, or finally (...

Bibliographic Details
Main Author:	Jele, Harald
Format:	Article
Language:	English
Published:	2021
Subjects:	Library and information science
Online Access:	https://hdl.handle.net/10356/154222
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-154222
record_format	dspace
spelling	sg-ntu-dr.10356-154222 2021-12-22T20:11:18Z Selecting the right search term in query-based systems for deduplication Jele, Harald Library and information science Essentially, three approaches can be identified for choosing a suitable search term to detect bibliographic duplicates. Stop words are excluded in all of them; then (1) just the first term of an entry is selected, or (2) the term that produces the smallest number of hits is selected, or finally (3) a term is used whose number of hits lies below a defined threshold. These three procedures are compared with each other here. The results derive from a series of measurements done with bibliographic data from the Austrian Central Catalog. Published version 2021-12-16T03:44:18Z 2021-12-16T03:44:18Z 2013 Journal Article Jele, H. (2013). Selecting the right search term in query-based systems for deduplication. Library and Information Science Research E-Journal, 23(2), 1-13. https://dx.doi.org/10.32655/LIBRES.2013.2.1 1058-6768 https://hdl.handle.net/10356/154222 10.32655/LIBRES.2013.2.1 2 23 1 13 en Library and Information Science Research E-Journal © 2013 Harald Jele. All rights reserved. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Library and information science
spellingShingle	Library and information science Jele, Harald Selecting the right search term in query-based systems for deduplication
description	Essentially, three approaches can be identified for choosing a suitable search term to detect bibliographic duplicates. Stop words are excluded in all of them; then (1) just the first term of an entry is selected, or (2) the term that produces the smallest number of hits is selected, or finally (3) a term is used whose number of hits lies below a defined threshold. These three procedures are compared with each other here. The results derive from a series of measurements done with bibliographic data from the Austrian Central Catalog.
format	Article
author	Jele, Harald
author_facet	Jele, Harald
author_sort	Jele, Harald
title	Selecting the right search term in query-based systems for deduplication
title_short	Selecting the right search term in query-based systems for deduplication
title_full	Selecting the right search term in query-based systems for deduplication
title_fullStr	Selecting the right search term in query-based systems for deduplication
title_full_unstemmed	Selecting the right search term in query-based systems for deduplication
title_sort	selecting the right search term in query-based systems for deduplication
publishDate	2021
url	https://hdl.handle.net/10356/154222
_version_	1720447158048522240
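
The three term-selection approaches named in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure measured in the article: the stop-word list, the whitespace tokenization, the function names and the hit counts are all hypothetical, and approach (3) is read here as "the first term whose hit count lies below the threshold", which the abstract does not spell out.

```python
from typing import Callable, List, Optional

# Hypothetical stop-word list; a real catalog would use its own.
STOP_WORDS = frozenset({"the", "a", "an", "of", "in", "for", "and"})


def candidate_terms(entry: str) -> List[str]:
    """Lowercase the entry, split on whitespace, and drop stop words."""
    return [t for t in entry.lower().split() if t not in STOP_WORDS]


def first_term(terms: List[str]) -> Optional[str]:
    """Approach (1): take the first remaining term of the entry."""
    return terms[0] if terms else None


def rarest_term(terms: List[str], hit_count: Callable[[str], int]) -> Optional[str]:
    """Approach (2): take the term that produces the fewest hits."""
    return min(terms, key=hit_count) if terms else None


def first_term_below(
    terms: List[str], hit_count: Callable[[str], int], threshold: int
) -> Optional[str]:
    """Approach (3), as assumed here: the first term with fewer hits than the threshold."""
    for term in terms:
        if hit_count(term) < threshold:
            return term
    return None


# Invented hit counts standing in for queries against a catalog index.
HITS = {
    "selecting": 120, "right": 900, "search": 4000, "term": 35,
    "query-based": 60, "systems": 2500, "deduplication": 12,
}


def lookup(term: str) -> int:
    """Stand-in for a catalog query; unknown terms count as zero hits."""
    return HITS.get(term, 0)


title = "Selecting the right search term in query-based systems for deduplication"
terms = candidate_terms(title)
print(first_term(terms))                    # approach (1)
print(rarest_term(terms, lookup))           # approach (2)
print(first_term_below(terms, lookup, 50))  # approach (3)
```

In this sketch approach (2) needs a hit count for every candidate term, whereas approach (3) can stop at the first term under the threshold, so the two differ in how many lookups they issue per entry.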

Similar Items