Single Concatenated Input is Better than Indenpendent Multiple-input for CNNs to Predict Chemical-induced Disease Relation from Literature

Chemical compounds (drugs) and diseases are among top searched keywords on the PubMed database of biomedical literature by biomedical researchers all over the world (according to a study in 2009). Working with PubMed is essential for researchers to get insights into drugs’ side effects (chemical-ind...

وصف كامل

محفوظ في:
التفاصيل البيبلوغرافية
المؤلفون الرئيسيون: Pham, Thi Quynh Trang, Bui, Manh Thang, Dang, Thanh Hai
التنسيق: مقال
اللغة:English
منشور في: H. : ĐHQGHN 2020
الموضوعات:
الوصول للمادة أونلاين:http://repository.vnu.edu.vn/handle/VNU_123/89096
https://doi.org/10.25073/2588-1086/vnucsce.237
الوسوم: إضافة وسم
لا توجد وسوم, كن أول من يضع وسما على هذه التسجيلة!
الوصف
الملخص:Chemical compounds (drugs) and diseases are among top searched keywords on the PubMed database of biomedical literature by biomedical researchers all over the world (according to a study in 2009). Working with PubMed is essential for researchers to get insights into drugs’ side effects (chemical-induced disease relations (CDR), which is essential for drug safety and toxicity. It is, however, a catastrophic burden for them as PubMed is a huge database of unstructured texts, growing steadily very fast (~28 millions scientific articles currently, approximately two deposited per minute). As a result, biomedical text mining has been empirically demonstrated its great implications in biomedical research communities. Biomedical text has its own distinct challenging properties, attracting much attetion from natural language processing communities. A large-scale study recently in 2018 showed that incorporating information into indenpendent multiple-input layers outperforms concatenating them into a single input layer (for biLSTM), producing better performance when compared to state-of-the-art CDR classifying models. This paper demonstrates that for a CNN it is vice-versa, in which concatenation is better for CDR classification. To this end, we develop a CNN based model with multiple input concatenated for CDR classification. Experimental results on the benchmark dataset demonstrate its outperformance over other recent state-of-the-art CDR classification models.