Improving Natural Language Person Description Search from Videos with Language Model Fine-Tuning and Approximate Nearest Neighbor

Due to the ubiquitous nature of CCTV cameras that record continuously, there is a large amount of video data that are unstructured. Often, when these recordings have to be reviewed, it is to look for a specific person that fits a certain description. Currently, this is achieved by manual inspection...

Full description

Saved in:
Bibliographic Details
Main Author: Yuenyong S.
Other Authors: Mahidol University
Format: Article
Published: 2023
Subjects:
Online Access:https://repository.li.mahidol.ac.th/handle/123456789/83958
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Mahidol University
id th-mahidol.83958
record_format dspace
spelling th-mahidol.839582023-06-18T23:51:44Z Improving Natural Language Person Description Search from Videos with Language Model Fine-Tuning and Approximate Nearest Neighbor Yuenyong S. Mahidol University Business, Management and Accounting Due to the ubiquitous nature of CCTV cameras that record continuously, there is a large amount of video data that are unstructured. Often, when these recordings have to be reviewed, it is to look for a specific person that fits a certain description. Currently, this is achieved by manual inspection of the videos, which is both time-consuming and labor-intensive. While person description search is not a new topic, in this work, we made two contributions. First, we improve upon the existing state-of-the-art by proposing unsupervised finetuning on the language model that forms a main part of the text branch of person description search models. This led to higher recall values on the standard dataset. The second contribution is that we engineered a complete pipeline from video files to fast searchable objects. Due to the use of an approximate nearest neighbor search and some model optimizations, a person description search can be performed such that the result is available immediately when deployed on a standard PC with no GPU, allowing an interactive search. We demonstrated the effectiveness of the system on new data and showed that most people in the videos can be successfully discovered by the search. 2023-06-18T16:51:44Z 2023-06-18T16:51:44Z 2022-12-01 Article Big Data and Cognitive Computing Vol.6 No.4 (2022) 10.3390/bdcc6040136 25042289 2-s2.0-85144601088 https://repository.li.mahidol.ac.th/handle/123456789/83958 SCOPUS
institution Mahidol University
building Mahidol University Library
continent Asia
country Thailand
Thailand
content_provider Mahidol University Library
collection Mahidol University Institutional Repository
topic Business, Management and Accounting
spellingShingle Business, Management and Accounting
Yuenyong S.
Improving Natural Language Person Description Search from Videos with Language Model Fine-Tuning and Approximate Nearest Neighbor
description Due to the ubiquitous nature of CCTV cameras that record continuously, there is a large amount of video data that are unstructured. Often, when these recordings have to be reviewed, it is to look for a specific person that fits a certain description. Currently, this is achieved by manual inspection of the videos, which is both time-consuming and labor-intensive. While person description search is not a new topic, in this work, we made two contributions. First, we improve upon the existing state-of-the-art by proposing unsupervised finetuning on the language model that forms a main part of the text branch of person description search models. This led to higher recall values on the standard dataset. The second contribution is that we engineered a complete pipeline from video files to fast searchable objects. Due to the use of an approximate nearest neighbor search and some model optimizations, a person description search can be performed such that the result is available immediately when deployed on a standard PC with no GPU, allowing an interactive search. We demonstrated the effectiveness of the system on new data and showed that most people in the videos can be successfully discovered by the search.
author2 Mahidol University
author_facet Mahidol University
Yuenyong S.
format Article
author Yuenyong S.
author_sort Yuenyong S.
title Improving Natural Language Person Description Search from Videos with Language Model Fine-Tuning and Approximate Nearest Neighbor
title_short Improving Natural Language Person Description Search from Videos with Language Model Fine-Tuning and Approximate Nearest Neighbor
title_full Improving Natural Language Person Description Search from Videos with Language Model Fine-Tuning and Approximate Nearest Neighbor
title_fullStr Improving Natural Language Person Description Search from Videos with Language Model Fine-Tuning and Approximate Nearest Neighbor
title_full_unstemmed Improving Natural Language Person Description Search from Videos with Language Model Fine-Tuning and Approximate Nearest Neighbor
title_sort improving natural language person description search from videos with language model fine-tuning and approximate nearest neighbor
publishDate 2023
url https://repository.li.mahidol.ac.th/handle/123456789/83958
_version_ 1781414978059763712