Single Concatenated Input is Better than Indenpendent Multiple-input for CNNs to Predict Chemical-induced Disease Relation from Literature

Chemical compounds (drugs) and diseases are among top searched keywords on the PubMed database of biomedical literature by biomedical researchers all over the world (according to a study in 2009). Working with PubMed is essential for researchers to get insights into drugs’ side effects (chemical-ind...

全面介紹

Saved in:
書目詳細資料
Main Authors: Pham, Thi Quynh Trang, Bui, Manh Thang, Dang, Thanh Hai
格式: Article
語言:English
出版: H. : ĐHQGHN 2020
主題:
在線閱讀:http://repository.vnu.edu.vn/handle/VNU_123/89096
https://doi.org/10.25073/2588-1086/vnucsce.237
標簽: 添加標簽
沒有標簽, 成為第一個標記此記錄!
實物特徵
總結:Chemical compounds (drugs) and diseases are among top searched keywords on the PubMed database of biomedical literature by biomedical researchers all over the world (according to a study in 2009). Working with PubMed is essential for researchers to get insights into drugs’ side effects (chemical-induced disease relations (CDR), which is essential for drug safety and toxicity. It is, however, a catastrophic burden for them as PubMed is a huge database of unstructured texts, growing steadily very fast (~28 millions scientific articles currently, approximately two deposited per minute). As a result, biomedical text mining has been empirically demonstrated its great implications in biomedical research communities. Biomedical text has its own distinct challenging properties, attracting much attetion from natural language processing communities. A large-scale study recently in 2018 showed that incorporating information into indenpendent multiple-input layers outperforms concatenating them into a single input layer (for biLSTM), producing better performance when compared to state-of-the-art CDR classifying models. This paper demonstrates that for a CNN it is vice-versa, in which concatenation is better for CDR classification. To this end, we develop a CNN based model with multiple input concatenated for CDR classification. Experimental results on the benchmark dataset demonstrate its outperformance over other recent state-of-the-art CDR classification models.