Automatic occupation identification of Twitter users using graph neural network

With the advent of the digital age, the Internet has become an integral part of people's lives. Its emergence and development have significantly improved daily life, and information can now spread to all parts of the world in a very short time through...

Bibliographic Details
Main Author: Li, Jiaheng
Other Authors: Na Jin Cheon
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2023
Subjects: Social sciences::Communication
Online Access: https://hdl.handle.net/10356/166247
Institution: Nanyang Technological University
id sg-ntu-dr.10356-166247
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Social sciences::Communication
description With the advent of the digital age, the Internet has become an integral part of people's lives. Its emergence and development have significantly improved daily life, and information can now spread to all parts of the world in a very short time. People can communicate across distances and follow world events from home. As one of the most popular online social platforms in the world, Twitter makes it easy to transmit information: people can tweet about what is happening around them, find out what is happening elsewhere, and find the articles and information they want. Twitter thereby also facilitates the democratization of scholarly articles. More and more users now post academic articles on Twitter because they want to share their research results, including cutting-edge findings, and to discuss them with practitioners in related fields. Both researchers and the public have come to see Twitter as a search platform for scholarly articles, and a growing number of researchers use Twitter to investigate which academic fields most interest the public. In this thesis, I build a classifier to support such analytical studies. My purpose is to automatically classify the user types of Twitter accounts that post academic articles, so that future researchers can use the data provided by this project to analyze the audience of their research articles. This will help in better understanding the public's needs for scientific information and can shed light on future research directions. I used an existing dataset collected from Altmetric.com.
The data contain account information for Twitter users who posted academic papers, including account names, personal descriptions, tweets, collections, and forwarding. The dataset covers eight user occupations: Academic publishers; Academic researchers & institutions; Health science professionals & institutions; Mass Media; Non-academic researchers & institutions; Research feeds; Topic feeds & news alerts; and Others. Next, the collected data were represented as a graph for each Twitter account, in which each vertex represents a Twitter account and each edge represents a relationship between accounts. A variety of graph neural network algorithms were then applied to build classifiers that automatically classify the accounts that post academic articles. I used existing graph neural network algorithms, including GATv1, GATv2, GraphSAGE, GIN, TransformerConv, and BERT&GAT, and created novel algorithms, including BERT&GraphSAGE, BERT&TransformerConv, and TransformerConv&Linear. I built nine classifiers using these algorithms, then compared and tuned them to find the one with the best performance. The experimental comparisons show that BERT&TransformerConv performs best among the nine algorithms; the classifier built with it reached a test accuracy of 86%. The study also has limitations: the amount of experimental data is not particularly large, and the IP addresses of all Twitter accounts are local to Singapore, so the accuracy of this classifier in other regions remains to be tested.
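The graph representation described in the abstract (accounts as vertices, relationships as edges) can be illustrated with a minimal sketch. This is not the thesis code: it shows a single round of neighbor mean aggregation, the kind of message-passing step that GraphSAGE-style layers build on, with no learned weights and standard-library Python only. The account names, feature values, the helper names `build_adjacency` and `mean_aggregate`, and the choice of undirected edges are all assumptions made for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency list from (account_a, account_b) pairs.

    Treating relationships as undirected is an assumption of this sketch.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def mean_aggregate(features, edges):
    """One round of message passing without learned weights.

    For each account, concatenate its own feature vector with the
    element-wise mean of its neighbors' feature vectors.
    """
    neighbors = build_adjacency(edges)
    dim = len(next(iter(features.values())))
    out = {}
    for node, own in features.items():
        nbrs = sorted(neighbors.get(node, ()))
        if nbrs:
            mean = [sum(features[n][i] for n in nbrs) / len(nbrs)
                    for i in range(dim)]
        else:
            # Isolated accounts receive a zero vector as neighbor summary.
            mean = [0.0] * dim
        out[node] = list(own) + mean
    return out


# Hypothetical toy data: three accounts with 2-dimensional features.
features = {
    "alice": [1.0, 0.0],
    "bob": [0.0, 1.0],
    "carol": [1.0, 1.0],
}
edges = [("alice", "bob"), ("alice", "carol")]

embeddings = mean_aggregate(features, edges)
```

In a trained graph neural network, the concatenated vector would pass through learned linear layers and a nonlinearity, and the final layer would output scores over the eight occupation classes. In the BERT-based variants named in the abstract, the initial features would presumably come from a text encoder applied to account text, but that detail is also an assumption here.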
author2 Na Jin Cheon
author_facet Na Jin Cheon
Li, Jiaheng
format Thesis-Master by Coursework
author Li, Jiaheng
author_sort Li, Jiaheng
title Automatic occupation identification of Twitter users using graph neural network
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/166247
_version_ 1764208022387687424
spelling sg-ntu-dr.10356-166247 2023-04-23T15:40:28Z Automatic occupation identification of Twitter users using graph neural network Li, Jiaheng Na Jin Cheon Wee Kim Wee School of Communication and Information TJCNa@ntu.edu.sg Social sciences::Communication Master of Science (Information Systems) 2023-04-19T00:20:20Z 2023-04-19T00:20:20Z 2023 Thesis-Master by Coursework Li, J. (2023). Automatic occupation identification of Twitter users using graph neural network. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/166247 https://hdl.handle.net/10356/166247 en application/pdf Nanyang Technological University