Evidence aware neural pornographic text identification for child protection
Identifying pornographic text online is practically useful to protect children from access to such adult content. However, some authors may intentionally avoid using sensitive words in their pornographic texts to take advantage of the lack of human audits. Without prior knowledge guidance, real sema...
Main Authors: | SONG, Kaisong; KANG, Yangyang; GAO, Wei; GAO, Zhe; SUN, Changlong; LIU, Xiaozhong |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2021 |
Subjects: | Databases and Information Systems; Numerical Analysis and Scientific Computing |
Online Access: | https://ink.library.smu.edu.sg/sis_research/6616 https://ink.library.smu.edu.sg/context/sis_research/article/7619/viewcontent/17753_Article_Text_21247_1_2_20210518.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-7619 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-7619; 2022-01-14T03:46:58Z; Evidence aware neural pornographic text identification for child protection; SONG, Kaisong; KANG, Yangyang; GAO, Wei; GAO, Zhe; SUN, Changlong; LIU, Xiaozhong; Identifying pornographic text online is practically useful for protecting children from access to such adult content. However, some authors may intentionally avoid using sensitive words in their pornographic texts to take advantage of the lack of human audits. Without the guidance of prior knowledge, the real semantics of such text are difficult for existing methods to understand because of its high context sensitivity and heavy use of figurative language, which poses major challenges to the porn detection systems used on social media platforms. In this paper, we approach the problem as a document-level porn identification task by locating and integrating sentence-level evidence, and propose a novel Evidence-Aware Neural Porn Classification (eNPC) model. Specifically, we first propose a basic model that locates porn-indicative sentences in the document with a multiple instance learning model and then aggregates the sentence-level evidence to induce the document label with a self-attention mechanism. Moreover, we consider label dependencies within the local context. Finally, we further enhance the sentence representation with prior knowledge produced by an automatic porn lexicon construction strategy. Extensive experimental results show that our model consistently outperforms competitors on two real-world Chinese novel datasets and an English story dataset.
2021-02-01T08:00:00Z; text; application/pdf; https://ink.library.smu.edu.sg/sis_research/6616; https://ink.library.smu.edu.sg/context/sis_research/article/7619/viewcontent/17753_Article_Text_21247_1_2_20210518.pdf; http://creativecommons.org/licenses/by-nc-nd/4.0/; Research Collection School Of Computing and Information Systems; eng; Institutional Knowledge at Singapore Management University; Databases and Information Systems; Numerical Analysis and Scientific Computing |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Databases and Information Systems; Numerical Analysis and Scientific Computing |
description |
Identifying pornographic text online is practically useful for protecting children from access to such adult content. However, some authors may intentionally avoid using sensitive words in their pornographic texts to take advantage of the lack of human audits. Without the guidance of prior knowledge, the real semantics of such text are difficult for existing methods to understand because of its high context sensitivity and heavy use of figurative language, which poses major challenges to the porn detection systems used on social media platforms. In this paper, we approach the problem as a document-level porn identification task by locating and integrating sentence-level evidence, and propose a novel Evidence-Aware Neural Porn Classification (eNPC) model. Specifically, we first propose a basic model that locates porn-indicative sentences in the document with a multiple instance learning model and then aggregates the sentence-level evidence to induce the document label with a self-attention mechanism. Moreover, we consider label dependencies within the local context. Finally, we further enhance the sentence representation with prior knowledge produced by an automatic porn lexicon construction strategy. Extensive experimental results show that our model consistently outperforms competitors on two real-world Chinese novel datasets and an English story dataset. |
format |
text |
author |
SONG, Kaisong; KANG, Yangyang; GAO, Wei; GAO, Zhe; SUN, Changlong; LIU, Xiaozhong |
author_facet |
SONG, Kaisong; KANG, Yangyang; GAO, Wei; GAO, Zhe; SUN, Changlong; LIU, Xiaozhong |
author_sort |
SONG, Kaisong |
title |
Evidence aware neural pornographic text identification for child protection |
title_sort |
evidence aware neural pornographic text identification for child protection |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2021 |
url |
https://ink.library.smu.edu.sg/sis_research/6616 https://ink.library.smu.edu.sg/context/sis_research/article/7619/viewcontent/17753_Article_Text_21247_1_2_20210518.pdf |
_version_ |
1770576010591338496 |
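
The description above outlines scoring individual sentences and then aggregating that sentence-level evidence into a document label with attention. The following is a minimal sketch of that general idea only, not the authors' eNPC model: it uses a simple attention-pooling stand-in rather than full self-attention, and every name, vector, and parameter in it (`document_score`, `w_cls`, `b_cls`, `w_att`, the toy sentence vectors) is a hypothetical assumption chosen for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def document_score(sent_vecs, w_cls, b_cls, w_att):
    """Aggregate sentence-level evidence into one document-level probability.

    sent_vecs: list of sentence representation vectors (hypothetical inputs).
    w_cls, b_cls: hypothetical sentence-classifier weights and bias.
    w_att: hypothetical attention query vector.
    """
    # Instance level: one probability per sentence.
    sent_probs = [sigmoid(dot(v, w_cls) + b_cls) for v in sent_vecs]
    # Attention weights over sentences; they sum to 1.
    att = softmax([dot(v, w_att) for v in sent_vecs])
    # Document level: attention-weighted average of sentence probabilities.
    doc_prob = sum(a * p for a, p in zip(att, sent_probs))
    return doc_prob, sent_probs, att

# Toy example with three two-dimensional "sentence vectors".
sent_vecs = [[0.1, -0.2], [2.0, 1.5], [-0.5, 0.3]]
w_cls = [1.0, 1.0]
w_att = [0.5, 0.5]
doc_prob, sent_probs, att = document_score(sent_vecs, w_cls, -1.0, w_att)

# Index of the sentence contributing most to the document score.
top = max(range(len(sent_vecs)), key=lambda i: att[i] * sent_probs[i])
print(f"document probability: {doc_prob:.3f}, strongest evidence: sentence {top}")
```

Because the document score is a weighted average of sentence scores, the per-sentence contributions `att[i] * sent_probs[i]` can be inspected to see which sentences drove the decision, which is the sense in which such a model is evidence aware.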