Automatically Generating Gene Summaries from Biomedical Literature

Bibliographic Details
Main Authors: LING, Xu; JIANG, Jing; HE, Xin; MEI, Qiaozhu; ZHAI, ChengXiang; SCHATZ, Bruce
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2006
Online Access: https://ink.library.smu.edu.sg/sis_research/1256
DOI: http://dx.doi.org/10.1142/9789812701626_0005
Institution: Singapore Management University
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems
Subjects: Databases and Information Systems; Numerical Analysis and Scientific Computing

Description
Biologists often need to find information about genes whose function is not described in the genome databases. Currently they must search disparate biomedical literature to locate relevant articles and spend considerable effort reading the retrieved articles to find the most relevant knowledge about the gene. We describe our software, the first to automatically generate gene summaries from biomedical literature. We present a two-stage summarization method that first retrieves relevant articles and then extracts the most informative sentences from them to generate a structured gene summary. The generated summary explicitly covers multiple aspects of a gene, such as sequence information, mutant phenotypes, and molecular interactions with other genes. We propose several heuristic approaches to improve accuracy in both stages. The proposed methods were evaluated using 10 randomly chosen genes from FlyBase and a subset of Medline abstracts about Drosophila. The results show that the precision of the top selected sentences in the six aspects is typically about 50–70%, and that the generated summaries are quite informative, indicating that our approaches are effective at automatically summarizing literature information about genes. The generated summaries are not only directly useful to biologists but also serve as entry points that let them quickly digest the retrieved literature.
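
The record describes the method only at a high level. As a rough illustration of the two-stage idea (first retrieve articles about a gene, then pick the most informative sentences per aspect), here is a minimal Python sketch. It is not the authors' implementation: it assumes single-word, alphanumeric gene names and uses plain keyword overlap as the sentence score. The aspect names, keyword lists, and the functions retrieve, split_sentences, tokenize and summarize are all hypothetical, and only three illustrative aspects are shown rather than the six evaluated in the paper.

```python
import re

# Hypothetical keyword lists for three illustrative aspects; the paper's
# six aspects and its actual scoring heuristics are not reproduced here.
ASPECT_KEYWORDS = {
    "sequence": {"sequence", "encodes", "amino", "homolog", "domain"},
    "mutant_phenotype": {"mutant", "mutation", "phenotype", "lethal", "defect"},
    "interaction": {"interacts", "binds", "interaction", "complex", "regulates"},
}


def retrieve(abstracts, gene_names):
    """Stage 1: keep abstracts that mention the gene or any of its synonyms."""
    patterns = [re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
                for name in gene_names]
    return [text for text in abstracts if any(p.search(text) for p in patterns)]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lower-cased alphanumeric tokens of a sentence (assumes simple gene names)."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def summarize(abstracts, gene_names, top_k=2):
    """Stage 2: for each aspect, keep the top_k sentences that mention the gene
    and share the most keywords with that aspect's keyword list."""
    names = {name.lower() for name in gene_names}
    candidates = [s for text in retrieve(abstracts, gene_names)
                  for s in split_sentences(text)]
    summary = {}
    for aspect, keywords in ASPECT_KEYWORDS.items():
        scored = []
        for sentence in candidates:
            tokens = tokenize(sentence)
            hits = len(tokens & keywords)
            if hits and tokens & names:
                scored.append((hits, sentence))
        # list.sort() is stable even with reverse=True, so ties keep document order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        summary[aspect] = [sentence for _, sentence in scored[:top_k]]
    return summary


if __name__ == "__main__":
    docs = [
        "Adh encodes alcohol dehydrogenase. "
        "Adh mutant flies show a lethal defect on ethanol.",
        "The wingless protein binds Frizzled receptors.",
    ]
    print(summarize(docs, ["Adh"], top_k=1))
```

With the toy abstracts above, only the first abstract is retrieved for Adh; the sketch returns one sequence sentence, one mutant-phenotype sentence, and an empty interaction list. A practical system would replace keyword overlap with a stronger sentence scorer and handle gene names containing punctuation or spaces.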