Author Identification of English Tweets for Social Media Forensics
Main Author: | Nursyahirah, Tarmizi |
---|---|
Format: | Thesis |
Language: | English |
Published: | Universiti Malaysia Sarawak, 2023 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://ir.unimas.my/id/eprint/43079/7/Nursyahirah%20Tarmizi_dsva.pdf http://ir.unimas.my/id/eprint/43079/8/Thesis%20Master_Nursyahirah%20Binti%20Tarmizi%20-%2024%20pages.pdf http://ir.unimas.my/id/eprint/43079/11/Nursyahirah%20ft.pdf http://ir.unimas.my/id/eprint/43079/ |
Institution: | Universiti Malaysia Sarawak |
id | my.unimas.ir.43079 |
---|---|
record_format | eprints |
citation | Nursyahirah, Tarmizi (2023) Author Identification of English Tweets for Social Media Forensics. Masters thesis, Universiti Malaysia Sarawak. Thesis, NonPeerReviewed, dated 2023-08-24; record datestamp 2024-02-20T04:57:23Z. |
institution | Universiti Malaysia Sarawak |
building | Centre for Academic Information Services (CAIS) |
collection | Institutional Repository |
continent | Asia |
country | Malaysia |
content_provider | Universiti Malaysia Sarawak |
content_source | UNIMAS Institutional Repository |
url_provider | http://ir.unimas.my/ |
language | English |
topic | QA75 Electronic computers. Computer science |
description | Authorship Identification (AI) is the process of determining the most likely author of a given text by analysing its writing-style characteristics and linguistic patterns. Identifying the author of online social network (OSN) text has become a pressing issue as cyberbullying cases among social media users increase. AI plays a vital role in social media forensics (SMF), where it can help reveal the true identity of a cyberbullying perpetrator from OSN text. However, OSN text remains an open problem for AI because its limited length and heavy use of Internet jargon degrade the performance of AI systems. In this research, AI is performed to support SMF by analysing the writing style of tweets from Twitter to identify the most plausible author of an anonymized tweet. The author's writing style, captured as stylometric features including character n-grams, word n-grams and Part-of-Speech (POS) n-grams, is extracted from the text. These features are widely used to identify the authors of short texts because they are language independent and tolerant of grammatical errors. The features are represented using two text representation models, TF-IDF and an embedding model, which are compared to determine which best represents OSN text. For classification, machine learning and deep learning classifiers are evaluated to find the configuration that gives the best performance of the AI system. The findings show that Twitter-native features are very useful in boosting the performance of the AI system. The embedding-based model performed better at representing n-grams, with both fixed and distributed representations. The best result was achieved by combining a CNN with the embedding-based model, which reached an accuracy of 95.02% for English and 94% for KadazanDusun, with 95% precision for both languages. |
format | Thesis |
author | Nursyahirah, Tarmizi |
title | Author Identification of English Tweets for Social Media Forensics |
publisher | Universiti Malaysia Sarawak |
publishDate | 2023 |
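The abstract describes extracting character and word n-grams from tweets, weighting them with TF-IDF, and then classifying the result. The sketch below illustrates only that TF-IDF path, under simplifying assumptions: it uses character 3-grams plus word 1- and 2-grams (POS n-grams are omitted because they need a tagger), builds one TF-IDF profile per author, and swaps in a plain cosine-similarity nearest-profile rule in place of the thesis's machine learning and CNN classifiers. All function names and the toy tweets are hypothetical; this is not the thesis's implementation.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Overlapping character n-grams of a lower-cased, whitespace-normalised text."""
    text = " ".join(text.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text: str, n: int = 1) -> list[str]:
    """Word n-grams over whitespace tokens, so #hashtags and @mentions stay intact."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text: str) -> Counter:
    """Bag of prefixed features: character 3-grams, word unigrams and word bigrams."""
    feats = Counter("c:" + g for g in char_ngrams(text, 3))
    feats.update("w1:" + g for g in word_ngrams(text, 1))
    feats.update("w2:" + g for g in word_ngrams(text, 2))
    return feats


def idf_weights(docs: list[Counter]) -> dict[str, float]:
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, so no weight is ever zero."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in doc)
    return {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(feats: Counter, idf: dict[str, float]) -> dict[str, float]:
    """L2-normalised TF-IDF vector; features never seen in training are dropped."""
    vec = {t: c * idf[t] for t, c in feats.items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors that are already L2-normalised."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def train_profiles(tweets_by_author: dict[str, list[str]]):
    """One profile per author: summed features of that author's tweets, TF-IDF weighted."""
    authors = list(tweets_by_author)
    docs = [sum((features(t) for t in tweets_by_author[a]), Counter()) for a in authors]
    idf = idf_weights(docs)
    profiles = {a: tfidf(doc, idf) for a, doc in zip(authors, docs)}
    return idf, profiles


def attribute(tweet: str, idf: dict[str, float], profiles: dict) -> str:
    """Return the candidate author whose profile is most similar to the anonymous tweet."""
    query = tfidf(features(tweet), idf)
    return max(profiles, key=lambda a: cosine(query, profiles[a]))


# Hypothetical toy data: two candidate authors with very different styles.
training = {
    "user_a": ["omg this game is sooo good lol #gaming", "lol cant stop playing #gaming"],
    "user_b": ["Reminder: the council meeting starts at 9am.",
               "Please read the attached report before the meeting."],
}
idf, profiles = train_profiles(training)
print(attribute("sooo good lol cant wait #gaming", idf, profiles))
```

Summing per-tweet feature counts, rather than concatenating tweets into one string, avoids creating spurious n-grams that span two tweets. The smoothed IDF keeps n-grams shared by every author from being zeroed out, which matters when there are only a handful of candidate authors.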