Selecting the right search term in query-based systems for deduplication

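Essentially three approaches could be identified when choosing a proper search term to detect bibliographic duplicates. Stop words are excluded in all of them, then (1) just the first term of an entry will be selected or (2) that term is selected, which produces the smallest number of hits or finally (3) that term will be used, which has a certain number of hits below a defined threshold. These three procedures are compared with each other here. The results derive from series of measurements done with bibliographic data from the Austrian Central Catalog.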
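
The three term-selection procedures named in the abstract can be sketched roughly as follows. This is a minimal illustration and not the article's implementation: the stop-word list, the tokenizer, the function names, and the `hit_count` callback (standing in for a query against the catalog that returns the number of hits for a term) are all assumptions. For procedure (3) the sketch picks the first remaining term whose hit count is below the threshold, which is only one possible reading of the abstract. What to do when no term qualifies is not stated in the record, so the sketch returns `None`.

```python
from typing import Callable, List, Optional

# Illustrative stop-word list (assumption; the record does not give the real one).
STOP_WORDS = frozenset({"the", "a", "an", "of", "in", "for", "and", "to", "on"})


def candidate_terms(entry: str) -> List[str]:
    """Lower-case the entry, strip surrounding punctuation, drop stop words."""
    terms = [t.strip(".,;:!?()[]\"'").lower() for t in entry.split()]
    return [t for t in terms if t and t not in STOP_WORDS]


def first_term(entry: str) -> Optional[str]:
    """Procedure (1): take the first term that is not a stop word."""
    terms = candidate_terms(entry)
    return terms[0] if terms else None


def rarest_term(entry: str, hit_count: Callable[[str], int]) -> Optional[str]:
    """Procedure (2): take the term that produces the smallest number of hits."""
    terms = candidate_terms(entry)
    return min(terms, key=hit_count) if terms else None


def first_term_below_threshold(
    entry: str, hit_count: Callable[[str], int], threshold: int
) -> Optional[str]:
    """Procedure (3), one reading: first term with fewer hits than the threshold."""
    for term in candidate_terms(entry):
        if hit_count(term) < threshold:
            return term
    return None  # fallback behaviour is not specified in the record


# Usage with made-up hit counts (hypothetical numbers, not measured data).
hits = {"selecting": 900, "right": 5000, "search": 12000, "term": 8000,
        "query-based": 40, "systems": 30000, "deduplication": 12}
title = "Selecting the right search term in query-based systems for deduplication"
lookup = lambda t: hits.get(t, 0)

print(first_term(title))
print(rarest_term(title, lookup))
print(first_term_below_threshold(title, lookup, threshold=100))
```

The sketch makes the trade-off visible: procedure (1) needs no catalog query at all, procedure (2) needs one hit-count query per remaining term, and procedure (3) can stop querying as soon as one term falls below the threshold.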

Bibliographic Details
Main Author: Jele, Harald
Format: Article
Language: English
Published: 2013 (repository record dated 2021)
Subjects: Library and information science
Online Access: https://hdl.handle.net/10356/154222
Institution: Nanyang Technological University
id sg-ntu-dr.10356-154222
record_format dspace
spelling sg-ntu-dr.10356-154222 2021-12-22T20:11:18Z
Title: Selecting the right search term in query-based systems for deduplication
Author: Jele, Harald
Subject: Library and information science
Version: Published version
Record dates: 2021-12-16T03:44:18Z, 2021-12-16T03:44:18Z
Year: 2013
Type: Journal Article
Citation: Jele, H. (2013). Selecting the right search term in query-based systems for deduplication. Library and Information Science Research E-Journal, 23(2), 1-13. https://dx.doi.org/10.32655/LIBRES.2013.2.1
ISSN: 1058-6768
Handle: https://hdl.handle.net/10356/154222
DOI: 10.32655/LIBRES.2013.2.1
Volume 23, issue 2, pages 1-13
Language: en
Journal: Library and Information Science Research E-Journal
Rights: © 2013 Harald Jele. All rights reserved.
File format: application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Library and information science
description Essentially three approaches could be identified when choosing a proper search term to detect bibliographic duplicates. Stop words are excluded in all of them, then (1) just the first term of an entry will be selected or (2) that term is selected, which produces the smallest number of hits or finally (3) that term will be used, which has a certain number of hits below a defined threshold. These three procedures are compared with each other here. The results derive from series of measurements done with bibliographic data from the Austrian Central Catalog.
format Article
author Jele, Harald
title Selecting the right search term in query-based systems for deduplication
publishDate 2021
url https://hdl.handle.net/10356/154222