A literature review framework for multi-document summarization of research papers

This study is in the area of multi-document summarization of research papers. It addresses the gap identified between the structure and readability of human-written summaries and other automatic multi-document summaries, which only focus on selecting the more important information from the set of do...

Bibliographic Details
Main Author: Kokil Jaidka
Other Authors: Jin Cheon Na
Format: Theses and Dissertations
Language: English
Published: 2014
Subjects:
Online Access: http://hdl.handle.net/10356/61137
Institution: Nanyang Technological University
id sg-ntu-dr.10356-61137
record_format dspace
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic DRNTU::Library and information science::General
DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications
DRNTU::Humanities::Linguistics
DRNTU::Humanities::Linguistics::Sociolinguistics::Computational linguistics
description This study is in the area of multi-document summarization of research papers. It addresses the gap in structure and readability between human-written summaries and automatic multi-document summaries, which focus only on selecting the more important information from the set of documents and neglect its readability. In the context of this overall goal, the first part of this study develops a literature review framework which specifies the structural, rhetorical and content characteristics of human-written literature reviews. In the second part of the study, an automatic method is developed which partially implements this framework, to generate multi-document summaries of research papers emulating some characteristics of human-written literature reviews. The framework is based on extensive discourse and information analyses of literature reviews in the domain of information science. The corpus for analysis comprised 120 literature review sections published as part of research papers in top international peer-reviewed information science journals: Journal of the American Society for Information Science and Technology (JASIST), Journal of Information Science (JIS) and Journal of Documentation (JDoc), over the years 2000-2008. The macro-level analysis identifies the document structure within a literature review, which comprises 9 types of discourse elements. The sentence-level analysis identifies 22 rhetorical functions employed in literature reviews and 153 linguistic devices which frame information within sentences. The information analysis identifies significant associations between the source sections of selected sentences and the transformations performed on them. Results show that literature reviews are written in two main styles: integrative literature reviews and descriptive literature reviews.
Integrative literature reviews present information from several studies in a condensed form as a critical summary, possibly complemented with a comparison, evaluation or comment on the research gap. They focus on highlighting relationships amongst concepts or comparing studies against each other. Descriptive reviews present more experimental detail about previous studies, such as their approach, results and evaluation. These findings are incorporated into the multi-level literature review framework, comprising the macro-level structure and rhetorical functions of literature reviews, as well as their information summarization strategies. Based on this framework, the second part of the study develops a multi-document summarization method emulating characteristics of human literature reviews, to generate an integrative summary that combines information across the papers and highlights the agreements and disagreements among them. It extracts information concepts from research papers by imitating researchers’ preferences, integrates them across the set of related papers and organizes them as a topic tree; finally, it presents them using sentence templates which realize rhetorical functions. The method presented here focuses only on summarizing and comparing research objective information across papers, and hence it applies only those components of the framework which are appropriate for selecting and synthesizing research objective information. Automatic content evaluation shows no significant difference between the summaries generated by the automatic method and those of the baseline sentence extraction system, MEAD.
However, the quality characteristics of the automatic summaries are a significant improvement over MEAD summaries: about two-thirds of all assessors (35 PhD students and professors in Library and Information Science) preferred them over MEAD summaries, and they are also perceived as significantly more useful for obtaining a research overview or seeing comparisons across studies. The automatic summaries are also considered more readable in the way they relate topics and sentences to each other. However, they still have grammatical errors and repetitions; to resolve those, it is recommended to include some post-processing steps in the automatic method. Assessors with different levels of research experience are found to hold different expectations of the final summary: those with less experience look for more details about individual studies, from which it can be inferred that they prefer a more descriptive literature review. More experienced assessors want to understand the bigger picture and the main themes of the research; evidently, they want a more integrative literature review. These insights can help in customizing the automatic method for its users.
author2 Jin Cheon Na
author_facet Jin Cheon Na
Kokil Jaidka
format Theses and Dissertations
author Kokil Jaidka
author_sort Kokil Jaidka
title A literature review framework for multi-document summarization of research papers
publishDate 2014
url http://hdl.handle.net/10356/61137
_version_ 1681046353146281984
spelling sg-ntu-dr.10356-61137 2019-12-10T12:51:53Z A literature review framework for multi-document summarization of research papers Kokil Jaidka Jin Cheon Na Khoo Soo Guan, Christopher Wee Kim Wee School of Communication and Information
This study is in the area of multi-document summarization of research papers. It addresses the gap identified between the quality of human-written summaries and automatic multi-document summaries, which focus only on selecting the more important information from the set of documents and neglect its readability. In the context of this overall goal, the first part of this study develops a literature review framework which specifies the structural, rhetorical and content characteristics of human-written literature reviews. The framework is based on extensive discourse and content analysis of literature reviews, which identified the macro-level structure, sentence-level rhetorical functions and the authors' selection and transformation strategies which constitute literature reviews. The second part of the study develops an automatic method to partially implement this framework and generate multi-document summaries of research papers emulating some characteristics of human-written literature reviews in selecting, integrating, organizing and framing information. Assessors perceive this automatic summary as significantly more useful and readable than the summaries of the baseline system, MEAD, which employs a sentence extraction method.
Doctor of Philosophy (WKWSCI) 2014-06-05T06:43:07Z 2014-06-05T06:43:07Z 2014 2014 Thesis http://hdl.handle.net/10356/61137 en Nanyang Technological University 258 p. application/pdf