Event detection for biomedical text
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/156520 |
Institution: | Nanyang Technological University |
Summary: | In the last decade, text mining in the biomedical domain has received significant research attention, and many studies have been devoted to adapting state-of-the-art natural language processing (NLP) techniques to biomedical text. Event detection is the primary step in the event extraction task. Its objective is to detect events via trigger mentions that signify the occurrence of events of a particular type. Essentially, event detection requires constructing a semantic relationship between input text representations and the set of predefined event type labels. However, existing methods tend to focus on learning the input text representations and simply use one-hot vectors for event type labels, which overlooks the importance of understanding what the type labels mean. In this research, we propose a novel Label-Pivoting Biomedical Event Detection model (LPBED), which builds on the pretrained PubMedBERT language model and exploits the semantic meaning of the type label set. More specifically, our proposed model uses the underlying semantic meaning of type labels to pivot event types as clues for detecting trigger candidates. Our model benefits significantly from the domain-specific knowledge that the pretrained PubMedBERT model acquired from widely used biomedical data sources. We conduct experiments on the benchmark GENIA Event 2011 (GE11) dataset. Without using any external knowledge bases or syntactic tools, the experimental results show that our model performs robustly when limited data is available. In addition, our proposed LPBED model outperforms the baseline BERT-CRF model on the general-domain MAVEN dataset. These results demonstrate that our proposed model achieves competitive performance for event detection in biomedical text, which opens the way for further investigation of the event extraction task. |
---|
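
The summary contrasts one-hot event type labels with labels that carry semantic meaning. The record does not describe how LPBED is implemented, so the following is only a toy sketch of the general idea, not the author's model: each event type is given a dense vector, and a token is flagged as a trigger when its vector is close enough to some type vector. The hand-made 3-dimensional vectors, the cosine scoring, the `threshold` value, and the names `cosine` and `detect_triggers` are all assumptions made for illustration; in a real system the vectors would presumably come from a pretrained encoder such as PubMedBERT.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def detect_triggers(token_vecs, label_vecs, threshold=0.8):
    """Pair each token with its most similar event type, or None below threshold.

    Hypothetical helper: the scoring rule and threshold are assumptions,
    not details taken from the catalogued project.
    """
    results = []
    for token, vec in token_vecs.items():
        best_label, best_score = max(
            ((label, cosine(vec, lvec)) for label, lvec in label_vecs.items()),
            key=lambda pair: pair[1],
        )
        results.append((token, best_label if best_score >= threshold else None))
    return results


# Toy vectors, hand-made for illustration only (not learned embeddings).
label_vecs = {
    "Phosphorylation": [0.9, 0.2, 0.1],
    "Binding": [0.1, 0.9, 0.2],
}
token_vecs = {
    "phosphorylates": [0.9, 0.1, 0.0],
    "binds": [0.1, 0.95, 0.0],
    "the": [0.3, 0.3, 0.9],
}

triggers = detect_triggers(token_vecs, label_vecs)
for token, event_type in triggers:
    print(token, "->", event_type)
```

With one-hot labels, every pair of event types is equally dissimilar and the label name contributes nothing. With dense label vectors, the comparison between a token and a type can reflect what the type means, which is the intuition the summary attributes to label pivoting.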