An approach for clone detection in documentation reuse

The paper presents a method for finding repetitions in DocBook/DRL or plain-text documents. The algorithm is based on software clone detection. It supports filtering of results: clones are rejected if clone length in the group is less than 5 symbols, intersection of clone...

Bibliographic Details
Main Authors: LUTSIV, Dmitry V., KOZNOV, Dmitry, BASIT, Hamid A., OUH, Eng Lieh, SMIRNOV, Mikhail N., ROMANOVSKY, Konstantin Y.
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2014
Subjects:
DRL
Online Access: https://ink.library.smu.edu.sg/sis_research/3984
https://ink.library.smu.edu.sg/context/sis_research/article/4986/viewcontent/an_approach.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-4986
date 2014-01-01T08:00:00Z
format text, application/pdf
license http://creativecommons.org/licenses/by-nc-nd/4.0/
series Research Collection School Of Computing and Information Systems
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
topic software documentation
documentation reuse
software clone detection
adaptive reuse
refactoring
DocBook
DocLine
DRL
Programming Languages and Compilers
Software Engineering
description The paper presents a method for finding repetitions in DocBook/DRL or plain-text documents. The algorithm is based on software clone detection. It supports filtering of results: clone groups are rejected if the clone length in the group is less than 5 symbols, intersections of clone groups are eliminated, meaningless clones are removed, and groups whose clones consist only of XML are eliminated. A search over the remaining text is supported: found clones are extracted from the documentation, and the clone search is repeated. One such step has proved to be enough. The adaptive reuse technique of Paul Bassett and Stan Jarzabek has been implemented. A software tool has been developed on the basis of the algorithm. The tool supports setting parameters for repetition detection and visualizes the results. It is integrated into the DocLine document development environment and supports refactoring of documents using the found clones. The Clone Miner clone detection utility is used for the clone search. The method has been evaluated on the Linux Kernel Documentation (29 documents, 25,000 lines). Five semantic kinds of clones have been identified: terms (abbreviations, one-word and two-word terms), hyperlinks, license agreements, functionality descriptions, and code examples. In total, 451 meaningful clone groups have been found; the average clone length is 4.43 tokens, and the average number of clones in a group is 3.56.
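The filtering rules named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use Clone Miner or DocLine. The `Clone` and `CloneGroup` types, the regular expression used to recognise XML markup, the reading of "meaningless" as "contains no word character", and the choice to keep the longer fragment when two groups intersect are all assumptions made for illustration; only the 5-symbol threshold comes from the description.

```python
import re
from dataclasses import dataclass

MIN_CLONE_LENGTH = 5  # symbols; the threshold stated in the description

# Assumption: any "<...>" run counts as XML markup.
XML_TAG = re.compile(r"<[^>]*>")


@dataclass(frozen=True)
class Clone:
    """Hypothetical record of one occurrence of a repeated fragment."""
    start: int  # offset of the first symbol of this occurrence
    end: int    # offset one past its last symbol


@dataclass(frozen=True)
class CloneGroup:
    """Hypothetical record of a repeated fragment and its occurrences."""
    text: str      # the repeated fragment
    clones: tuple  # Clone occurrences of that fragment


def is_xml_only(text):
    """True if only whitespace is left after stripping XML tags."""
    return XML_TAG.sub("", text).strip() == ""


def is_meaningful(text):
    """Assumption: a fragment is meaningful if it has a word character
    (letter, digit or underscore) outside XML tags."""
    return re.search(r"\w", XML_TAG.sub("", text)) is not None


def overlaps(a, b):
    """True if two occurrences share at least one symbol."""
    return a.start < b.end and b.start < a.end


def filter_groups(groups):
    """Apply the four filters; longer fragments win on intersection."""
    kept = []
    for group in sorted(groups, key=lambda g: -len(g.text)):
        if len(group.text) < MIN_CLONE_LENGTH:
            continue  # too short
        if is_xml_only(group.text):
            continue  # markup only
        if not is_meaningful(group.text):
            continue  # e.g. a row of dashes
        if any(overlaps(c, k)
               for kept_group in kept
               for k in kept_group.clones
               for c in group.clones):
            continue  # intersects a group that was already kept
        kept.append(group)
    return kept


example_groups = [
    CloneGroup("See the GNU GPL.", (Clone(6, 22), Clone(36, 52))),
    CloneGroup("GNU", (Clone(14, 17), Clone(44, 47))),
    CloneGroup("</para>", (Clone(22, 29), Clone(52, 59))),
    CloneGroup("-----", (Clone(100, 105), Clone(200, 205))),
    CloneGroup("the GNU GPL", (Clone(10, 21), Clone(40, 51))),
]
print([g.text for g in filter_groups(example_groups)])
```

In this example only the longest fragment survives: the three-symbol term is below the threshold, the closing tag is XML only, the row of dashes has no word character, and the shorter fragment intersects the one already kept.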