Disentangling transformer language models as superposed topic models

Topic Modelling is an established research area where the quality of a given topic is measured using coherence metrics. Often, we infer topics from Neural Topic Models (NTM) by interpreting their decoder weights, consisting of top-activated words projected from individual neurons. Transformer-based Language Models (TLM) similarly consist of decoder weights. However, due to their hypothesised superposition properties, the final logits originating from the residual path are considered uninterpretable. Therefore, we posit that we can interpret TLM as superposed NTM by proposing a novel weight-based, model-agnostic and corpus-agnostic approach to search and disentangle decoder-only TLM, potentially mapping individual neurons to multiple coherent topics. Our results show that it is empirically feasible to disentangle coherent topics from GPT-2 models using the Wikipedia corpus. We validate this approach for GPT-2 models using Zero-Shot Topic Modelling. Finally, we extend the proposed approach to disentangle and analyse LLaMA models.
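As context for the abstract above, the following is a minimal sketch of the decoder-weight reading it refers to: projecting a single hidden-space direction through a decoder-only model's unembedding (decoder) weights and treating the top-scoring vocabulary words as a candidate topic. It assumes the Hugging Face transformers library and the public gpt2 checkpoint, and the neuron index is hypothetical; this illustrates the general idea only, not the authors' actual search-and-disentanglement procedure.

```python
# Sketch only (not the paper's method): read which vocabulary words a single
# residual-stream direction in GPT-2 promotes most strongly, and treat the
# top-k words as a candidate "topic".
# Assumes: transformers + torch installed, "gpt2" checkpoint available.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# lm_head.weight has shape (vocab_size, hidden_size); column j gives each
# word's logit contribution from hidden dimension j.
W = model.lm_head.weight.detach()  # (50257, 768) for gpt2

def top_words_for_direction(direction: torch.Tensor, k: int = 10):
    """Project a hidden-space direction onto the vocabulary and return the
    k words with the largest logit contribution."""
    scores = W @ direction                 # (vocab_size,)
    top = torch.topk(scores, k).indices
    return [tokenizer.decode([i]).strip() for i in top.tolist()]

# Example: inspect one basis direction (a single "neuron" of the residual stream).
basis = torch.zeros(W.shape[1])
basis[42] = 1.0                            # hypothetical neuron index
print(top_words_for_direction(basis))
```

A superposed view, as the abstract describes, would instead search for combinations of such directions whose top words form multiple coherent topics per neuron, rather than reading basis directions one at a time.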

Bibliographic Details
Main Authors: LIM, Jia Peng; LAUW, Hady Wirawan
Format: text (application/pdf)
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
Subjects: Databases and Information Systems; Programming Languages and Compilers
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems, InK@SMU
Online Access: https://ink.library.smu.edu.sg/sis_research/8470
https://ink.library.smu.edu.sg/context/sis_research/article/9473/viewcontent/2023.emnlp_main.534__1_.pdf
Institution: Singapore Management University