WebArc: Website Archival using a structured approach

Website archival refers to the task of monitoring and storing snapshots of website(s) for future retrieval and analysis. This task is particularly important for websites whose content changes over time, with older information constantly overwritten by newer information. In this paper, we propose WEBARC...

Bibliographic Details
Main Authors: LIM, Ee Peng, MARISSA, Maria
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2005
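Subjects: Scheduling; Downloading; World wide web; Internet; Classification; Software tool; Surveillance; Monitoring; Electronic library; Databases and Information Systems; Numerical Analysis and Scientific Computing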
Online Access: https://ink.library.smu.edu.sg/sis_research/891
https://ink.library.smu.edu.sg/context/sis_research/article/1890/viewcontent/Lim_Marissa2005_Chapter_WebArcWebsiteArchivalUsingAStr.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-1890
record_format dspace
spelling sg-smu-ink.sis_research-1890
2018-06-25T07:20:35Z
WebArc: Website Archival using a structured approach
LIM, Ee Peng
MARISSA, Maria
Website archival refers to the task of monitoring and storing snapshots of website(s) for future retrieval and analysis. This task is particularly important for websites whose content changes over time, with older information constantly overwritten by newer information. In this paper, we propose WEBARC as a set of software tools to allow users to construct a logical structure for a website to be archived. Classifiers are trained to determine relevant web pages and their categories, and are subsequently used in website downloading. The archival schedule can be specified and executed by a scheduler. A website viewer is also developed to browse one or more versions of archived web pages.
2005-12-01T08:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/891
info:doi/10.1007/11599517_49
https://ink.library.smu.edu.sg/context/sis_research/article/1890/viewcontent/Lim_Marissa2005_Chapter_WebArcWebsiteArchivalUsingAStr.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
Scheduling
Downloading
World wide web
Internet
Classification
Software tool
Surveillance
Monitoring
Electronic library
Databases and Information Systems
Numerical Analysis and Scientific Computing
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Scheduling
Downloading
World wide web
Internet
Classification
Software tool
Surveillance
Monitoring
Electronic library
Databases and Information Systems
Numerical Analysis and Scientific Computing
description Website archival refers to the task of monitoring and storing snapshots of website(s) for future retrieval and analysis. This task is particularly important for websites whose content changes over time, with older information constantly overwritten by newer information. In this paper, we propose WEBARC as a set of software tools to allow users to construct a logical structure for a website to be archived. Classifiers are trained to determine relevant web pages and their categories, and are subsequently used in website downloading. The archival schedule can be specified and executed by a scheduler. A website viewer is also developed to browse one or more versions of archived web pages.
format text
author LIM, Ee Peng
MARISSA, Maria
author_facet LIM, Ee Peng
MARISSA, Maria
author_sort LIM, Ee Peng
title WebArc: Website Archival using a structured approach
title_short WebArc: Website Archival using a structured approach
title_full WebArc: Website Archival using a structured approach
title_fullStr WebArc: Website Archival using a structured approach
title_full_unstemmed WebArc: Website Archival using a structured approach
title_sort webarc: website archival using a structured approach
publisher Institutional Knowledge at Singapore Management University
publishDate 2005
url https://ink.library.smu.edu.sg/sis_research/891
https://ink.library.smu.edu.sg/context/sis_research/article/1890/viewcontent/Lim_Marissa2005_Chapter_WebArcWebsiteArchivalUsingAStr.pdf
_version_ 1770570759952924672