A literature review framework for multi-document summarization of research papers

Bibliographic Details
Main Author: Kokil Jaidka
Other Authors: Jin Cheon Na
Format: Theses and Dissertations
Language: English
Published: 2014
Online Access: http://hdl.handle.net/10356/61137
Institution: Nanyang Technological University
Description
Summary: This study is in the area of multi-document summarization of research papers. It addresses the gap identified between the structure and readability of human-written summaries and those of automatic multi-document summaries, which focus only on selecting the more important information from the set of documents and neglect its readability. In the context of this overall goal, the first part of this study develops a literature review framework which specifies the structural, rhetorical and content characteristics of human-written literature reviews. In the second part of the study, an automatic method is developed which partially implements this framework, to generate multi-document summaries of research papers emulating some characteristics of human-written literature reviews. The framework is based on extensive discourse and information analyses of literature reviews in the domain of information science. The corpus for analysis comprised 120 literature review sections published as part of research papers in top international peer-reviewed information science journals, namely the Journal of the American Society for Information Science and Technology (JASIST), Journal of Information Science (JIS) and Journal of Documentation (JDoc), over the years 2000-2008. The macro-level analysis identifies the document structure within a literature review, which comprises 9 types of discourse elements. The sentence-level analysis identifies 22 rhetorical functions employed in literature reviews and 153 linguistic devices which frame information within sentences. The information analysis identifies significant associations between the source sections of selected sentences and the transformations performed on them. Results show that literature reviews are written in two main styles: integrative literature reviews and descriptive literature reviews.
Integrative literature reviews present information from several studies in a condensed form as a critical summary, possibly complemented with a comparison, evaluation or comment on the research gap. They focus on highlighting relationships amongst concepts or comparing studies against each other. Descriptive reviews present more experimental detail about previous studies, such as their approach, results and evaluation. These findings are incorporated into the multi-level literature review framework, which comprises the macro-level structure and rhetorical functions of literature reviews as well as their information summarization strategies. Based on this framework, the second part of the study develops a multi-document summarization method that emulates characteristics of human literature reviews to generate an integrative summary, one that combines information across the papers and highlights the agreements and disagreements among them. It extracts information concepts from research papers by imitating researchers’ preferences, integrates them across the set of related papers and organizes them as a topic tree; finally, it presents them using sentence templates which realize rhetorical functions. The method presented here focuses only on summarizing and comparing research objective information across papers, and hence it applies only those components of the framework which are appropriate for choosing and synthesizing research objective information. Automatic content evaluation shows no significant difference between the summaries generated by the automatic method and those of the baseline sentence extraction system, MEAD.
However, the quality characteristics of the automatic summaries are a significant improvement over MEAD summaries: about two-thirds of all assessors (35 PhD students and professors in Library and Information Science) preferred to use them over MEAD summaries, and they are also perceived as significantly more useful for obtaining a research overview or seeing comparisons across studies. The automatic summaries are also considered more readable in the way they relate topics and sentences to each other. However, they still have grammatical errors and repetitions; to resolve those, it is recommended to include some post-processing steps in the automatic method. Assessors with different levels of research experience are found to hold different expectations of the final summary. Those with less experience look for more details about individual studies, from which it can be inferred that they prefer a more descriptive literature review. More experienced assessors want to understand the bigger picture and the main themes of the research, which suggests that they want a more integrative literature review. These insights can help in customizing the automatic method for its users.