Evidence aware neural pornographic text identification for child protection

Identifying pornographic text online is practically useful to protect children from access to such adult content. However, some authors may intentionally avoid using sensitive words in their pornographic texts to take advantage of the lack of human audits. Without prior knowledge guidance, real sema...

Bibliographic Details
Main Authors: SONG, Kaisong, KANG, Yangyang, GAO, Wei, GAO, Zhe, SUN, Changlong, LIU, Xiaozhong
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2021
Subjects: Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access: https://ink.library.smu.edu.sg/sis_research/6616
https://ink.library.smu.edu.sg/context/sis_research/article/7619/viewcontent/17753_Article_Text_21247_1_2_20210518.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-7619
record_format dspace
spelling sg-smu-ink.sis_research-7619 2022-01-14T03:46:58Z Evidence aware neural pornographic text identification for child protection SONG, Kaisong KANG, Yangyang GAO, Wei GAO, Zhe SUN, Changlong LIU, Xiaozhong
2021-02-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6616 https://ink.library.smu.edu.sg/context/sis_research/article/7619/viewcontent/17753_Article_Text_21247_1_2_20210518.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems Numerical Analysis and Scientific Computing
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
Numerical Analysis and Scientific Computing
description Identifying pornographic text online is practically useful for protecting children from access to such adult content. However, some authors may intentionally avoid using sensitive words in their pornographic texts to take advantage of the lack of human audits. Without the guidance of prior knowledge, the real semantics of such pornographic text are difficult for existing methods to understand because of its high context sensitivity and heavy use of figurative language, which poses major challenges to the porn detection systems used on social media platforms. In this paper, we approach the problem as a document-level porn identification task by locating and integrating sentence-level evidence, and propose a novel Evidence-Aware Neural Porn Classification (eNPC) model. Specifically, we first propose a basic model that locates porn-indicative sentences in the document with a multiple instance learning model and then aggregates the sentence-level evidence to induce the document label with a self-attention mechanism. Moreover, we consider label dependencies within the local context. Finally, we further enhance the sentence representation with prior knowledge produced by an automatic porn lexicon construction strategy. Extensive experimental results show that our model consistently outperforms competitors on two real-world Chinese novel datasets and an English story dataset.
format text
author SONG, Kaisong
KANG, Yangyang
GAO, Wei
GAO, Zhe
SUN, Changlong
LIU, Xiaozhong
author_sort SONG, Kaisong
title Evidence aware neural pornographic text identification for child protection
publisher Institutional Knowledge at Singapore Management University
publishDate 2021
url https://ink.library.smu.edu.sg/sis_research/6616
https://ink.library.smu.edu.sg/context/sis_research/article/7619/viewcontent/17753_Article_Text_21247_1_2_20210518.pdf
_version_ 1770576010591338496
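
Illustrative note: the description in this record outlines a two-step idea, scoring individual sentences and then aggregating that sentence-level evidence into a document label with attention. The sketch below is only a minimal, hypothetical illustration of attention-weighted aggregation in a multiple instance setting. It is not the eNPC model, and the function names, the input scores, and the attention logits are all assumptions made for the example; in a real system both the per-sentence probabilities and the attention logits would be produced by a trained network.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_document(sentence_probs, attention_logits):
    """Combine per-sentence probabilities into one document probability.

    Hypothetical aggregation: an attention-weighted average of the
    sentence scores. An illustrative stand-in, not the paper's model.
    """
    if not sentence_probs or len(sentence_probs) != len(attention_logits):
        raise ValueError("need at least one sentence and one logit per sentence")
    weights = softmax(attention_logits)
    doc_prob = sum(w * p for w, p in zip(weights, sentence_probs))
    return doc_prob, weights


def top_evidence(weights, k=1):
    """Indices of the k sentences that received the largest attention weight."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Made-up scores for a three-sentence document (assumed, not real data).
    probs = [0.1, 0.9, 0.2]
    logits = [0.0, 2.0, 0.0]
    doc_prob, weights = aggregate_document(probs, logits)
    print("document probability:", doc_prob)
    print("strongest evidence sentence index:", top_evidence(weights, k=1))
```

With this kind of aggregation the attention weights double as an explanation: the sentences with the largest weights are the ones that contributed most to the document-level decision.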