Selecting the right search term in query-based systems for deduplication
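Essentially, three approaches can be identified for choosing a suitable search term to detect bibliographic duplicates. All of them exclude stop words; then (1) simply the first term of an entry is selected, (2) the term that produces the smallest number of hits is selected, or (3) a term is used whose number of hits falls below a defined threshold. These three procedures are compared here. The results derive from a series of measurements carried out on bibliographic data from the Austrian Central Catalog.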
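The three selection strategies named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the stop-word list, the function names, the `hits` callback, and the hit counts are all hypothetical, and strategy (3) is read here as "the first term whose hit count is below the threshold", which is only one possible interpretation of the abstract.

```python
from typing import Callable, List, Optional

# Hypothetical stop-word list; a real catalog would supply its own.
STOP_WORDS = frozenset({"the", "a", "an", "in", "of", "for", "and"})


def candidate_terms(entry: str) -> List[str]:
    """Split an entry into lowercase terms and drop stop words."""
    terms = [t.strip(".,;:()").lower() for t in entry.split()]
    return [t for t in terms if t and t not in STOP_WORDS]


def first_term(entry: str) -> Optional[str]:
    """Strategy (1): take the first non-stop-word term."""
    terms = candidate_terms(entry)
    return terms[0] if terms else None


def rarest_term(entry: str, hits: Callable[[str], int]) -> Optional[str]:
    """Strategy (2): take the term with the smallest number of hits."""
    terms = candidate_terms(entry)
    return min(terms, key=hits) if terms else None


def first_term_below(entry: str, hits: Callable[[str], int],
                     threshold: int) -> Optional[str]:
    """Strategy (3), as assumed here: first term with hits below threshold."""
    for term in candidate_terms(entry):
        if hits(term) < threshold:
            return term
    return None


# Invented hit counts standing in for queries against a catalog index.
COUNTS = {"selecting": 900, "right": 5000, "search": 3000, "term": 2500,
          "query-based": 40, "systems": 8000, "deduplication": 12}


def hits(term: str) -> int:
    return COUNTS.get(term, 0)


title = ("Selecting the right search term in query-based systems "
         "for deduplication")

print(first_term(title))                   # strategy (1)
print(rarest_term(title, hits))            # strategy (2)
print(first_term_below(title, hits, 100))  # strategy (3)
```

In this sketch, strategies (1) and (3) can stop early, whereas strategy (2) needs a hit count for every candidate term before it can choose, which is the kind of cost difference a comparison of the three procedures would have to weigh.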
Main Author: | Jele, Harald |
---|---|
Format: | Article |
Language: | English |
Published: | 2021 |
Subjects: | Library and information science |
Online Access: | https://hdl.handle.net/10356/154222 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-154222 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-1542222021-12-22T20:11:18Z Selecting the right search term in query-based systems for deduplication Jele, Harald Library and information science Essentially three approaches could be identified when choosing a proper search term to detect bibliographic duplicates. Stop words are excluded in all of them, then (1) just the first term of an entry will be selected or (2) that term is selected, which produces the smallest number of hits or finally (3) that term will be used, which has a certain number of hits below a defined threshold. These three procedures are compared with each other here. The results derive from series of measurements done with bibliographic data from the Austrian Central Catalog. Published version 2021-12-16T03:44:18Z 2021-12-16T03:44:18Z 2013 Journal Article Jele, H. (2013). Selecting the right search term in query-based systems for deduplication. Library and Information Science Research E-Journal, 23(2), 1-13. https://dx.doi.org/10.32655/LIBRES.2013.2.1 1058-6768 https://hdl.handle.net/10356/154222 10.32655/LIBRES.2013.2.1 2 23 1 13 en Library and Information Science Research E-Journal © 2013 Harald Jele. All rights reserved. application/pdf |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | Library and information science |
spellingShingle | Library and information science Jele, Harald Selecting the right search term in query-based systems for deduplication |
description | Essentially, three approaches can be identified for choosing a suitable search term to detect bibliographic duplicates. All of them exclude stop words; then (1) simply the first term of an entry is selected, (2) the term that produces the smallest number of hits is selected, or (3) a term is used whose number of hits falls below a defined threshold. These three procedures are compared here. The results derive from a series of measurements carried out on bibliographic data from the Austrian Central Catalog. |
format | Article |
author | Jele, Harald |
author_facet | Jele, Harald |
author_sort | Jele, Harald |
title | Selecting the right search term in query-based systems for deduplication |
title_short | Selecting the right search term in query-based systems for deduplication |
title_full | Selecting the right search term in query-based systems for deduplication |
title_fullStr | Selecting the right search term in query-based systems for deduplication |
title_full_unstemmed | Selecting the right search term in query-based systems for deduplication |
title_sort | selecting the right search term in query-based systems for deduplication |
publishDate | 2021 |
url | https://hdl.handle.net/10356/154222 |
_version_ | 1720447158048522240 |