Automatic occupation identification of Twitter users using graph neural network

Bibliographic Details
Main Author: Li, Jiaheng
Other Authors: Na Jin Cheon
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/166247
Institution: Nanyang Technological University
Description
Summary: With the advent of the digital age, the Internet has become an integral part of people's lives. Information can now spread around the world in a very short time, so people can communicate across distances and follow world events from home. As one of the most popular online social platforms, Twitter has made it much easier to share information. People can tweet about what is happening around them, learn what is happening elsewhere, find the articles and information they want, and more. In this way, Twitter also facilitates the democratization of scholarly articles. A growing number of users now post academic articles on Twitter because they want to share their research results, discuss them with practitioners in related fields, and circulate cutting-edge findings. Both researchers and the public have come to see Twitter as a search platform for scholarly articles, and more researchers have found that they can use Twitter to investigate which academic fields interest the public most. In this study, I build a classifier to support such analyses. My purpose is to automatically classify the user types of Twitter accounts that post academic articles, so that future researchers can use the data provided by this project to analyze the audience of their research articles. This will help to better understand the public's need for scientific information and can shed light on future research directions. I used an existing dataset that was collected from Almetrics.com.
These data include account information for Twitter users who posted academic papers, including account names, personal descriptions, tweets, collections, retweets, etc. The dataset covers eight user occupations: Academic publishers, Academic researchers & institutions, Health science professionals & institutions, Mass Media, Non-academic researchers & institutions, Research feeds, Topic feeds & news alerts, and Others. Next, the collected data were represented as a graph for each Twitter account, in which each vertex represents a Twitter account and each edge represents a relationship with another account. Then, a variety of graph neural network algorithms were applied to build a classifier that automatically classifies the accounts that publish academic articles. I used several existing graph neural network algorithms, including GATv1, GATv2, GraphSAGE, GIN, TransformerConv, and BERT&GAT, and created several novel algorithms, including BERT&GraphSAGE, BERT&TransformerConv, and TransformerConv&Linear. I then built nine classifiers using these algorithms and compared and tuned them to find the one with the best performance, yielding a novel graph neural network classifier. After several experimental comparisons, the results show that the BERT&TransformerConv algorithm performs best among the nine, and the classifier built with it reached a test accuracy of 86%. The study also has limitations. The amount of experimental data I used is not particularly large, and the IP addresses of all Twitter accounts are local to Singapore. Therefore, the accuracy of this classifier in other regions remains to be tested.
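
The graph representation described in the summary (accounts as vertices, relationships as edges) can be illustrated with a minimal pure-Python sketch. It is not the thesis implementation: the account names, feature vectors, and the helper functions `build_adjacency` and `mean_aggregate` are hypothetical, the edges are assumed to be undirected, and the single mean-aggregation step only mimics the neighbourhood averaging used in GraphSAGE-style layers, without learned weights or a non-linearity.

```python
from collections import defaultdict

# Hypothetical toy data: account names and feature vectors are invented
# purely for illustration and do not come from the thesis dataset.
features = {
    "acct_a": [1.0, 0.0],
    "acct_b": [0.0, 1.0],
    "acct_c": [1.0, 1.0],
}
# Each edge links two accounts that share some relationship
# (assumed undirected in this sketch).
edges = [("acct_a", "acct_b"), ("acct_b", "acct_c")]


def build_adjacency(edges):
    """Map each vertex to the set of its neighbours."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def mean_aggregate(features, adjacency):
    """Run one round of mean neighbourhood aggregation.

    Each vertex's new vector is the element-wise mean of its own vector
    and its neighbours' vectors. Real GNN layers also apply learned
    weights and a non-linearity, which are omitted here.
    """
    updated = {}
    for node, own in features.items():
        group = [own] + [features[n] for n in sorted(adjacency[node])]
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated


adjacency = build_adjacency(edges)
hidden = mean_aggregate(features, adjacency)
```

In a full classifier, several such aggregation rounds with trainable parameters would be stacked, and the resulting vertex vectors passed to an output layer that predicts one of the eight occupation classes.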