WebArc: Website Archival using a structured approach

Bibliographic Details
Main Authors: LIM, Ee Peng, MARISSA, Maria
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2005
Online Access: https://ink.library.smu.edu.sg/sis_research/891
https://ink.library.smu.edu.sg/context/sis_research/article/1890/viewcontent/Lim_Marissa2005_Chapter_WebArcWebsiteArchivalUsingAStr.pdf
Institution: Singapore Management University
Description
Summary: Website archival refers to the task of monitoring and storing snapshots of website(s) for future retrieval and analysis. This task is particularly important for websites whose content changes over time, with older information constantly overwritten by newer content. In this paper, we propose WEBARC as a set of software tools that allow users to construct a logical structure for a website to be archived. Classifiers are trained to determine relevant web pages and their categories, and are subsequently used in website downloading. The archival schedule can be specified and executed by a scheduler. A website viewer is also developed to browse one or more versions of archived web pages.
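The summary does not describe WebArc's implementation. As a rough illustration only, the Python sketch below shows the general idea of classifier-filtered, versioned archiving: each scheduled run keeps only pages judged relevant, tags them with a category, and stores them as a new version so that older content is not overwritten. It assumes a simple keyword rule in place of the trained classifiers and an in-memory store; the names Snapshot, classify, SnapshotArchive and archive_run are hypothetical and are not WebArc's API.

```python
from __future__ import annotations

from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Snapshot:
    timestamp: int  # crawl time, e.g. seconds since epoch (assumed unit)
    category: str   # category assigned by the (stand-in) classifier
    content: str    # raw page content as downloaded


def classify(content: str) -> str | None:
    """Hypothetical stand-in for a trained classifier.

    Returns a category name, or None if the page is judged irrelevant.
    A real system would use a learned model; this uses keywords only.
    """
    text = content.lower()
    if "news" in text:
        return "news"
    if "event" in text:
        return "event"
    return None


class SnapshotArchive:
    """Hypothetical versioned store that keeps every snapshot of each URL."""

    def __init__(self) -> None:
        self._store: dict[str, list[Snapshot]] = defaultdict(list)

    def archive_run(self, pages: dict[str, str], timestamp: int) -> int:
        """Archive one scheduled crawl and return the number of pages stored.

        Assumes runs are called in increasing timestamp order, so each
        URL's snapshot list stays sorted by time.
        """
        stored = 0
        for url, content in pages.items():
            category = classify(content)
            if category is None:
                continue  # irrelevant page: skipped, not archived
            self._store[url].append(Snapshot(timestamp, category, content))
            stored += 1
        return stored

    def versions(self, url: str) -> list[int]:
        """List the crawl times at which this URL was archived."""
        return [s.timestamp for s in self._store.get(url, [])]

    def get(self, url: str, timestamp: int) -> Snapshot | None:
        """Return the latest snapshot taken at or before `timestamp`."""
        snaps = self._store.get(url, [])
        idx = bisect_right([s.timestamp for s in snaps], timestamp)
        return snaps[idx - 1] if idx else None


if __name__ == "__main__":
    archive = SnapshotArchive()
    archive.archive_run({"/a": "Latest news", "/b": "About us"}, timestamp=100)
    archive.archive_run({"/a": "Updated news"}, timestamp=200)
    print(archive.versions("/a"))          # [100, 200]
    print(archive.get("/a", 150).content)  # Latest news
```

Appending a new snapshot on every run, rather than replacing the stored page, is what lets a viewer browse several versions of the same page; the lookup returns whichever version was current at the requested time.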