WebArc: Website Archival using a structured approach
Website archival refers to the task of monitoring and storing snapshots of website(s) for future retrieval and analysis. This task is particularly important for websites whose content changes over time, with older information constantly overwritten by newer information. In this paper, we propose WEBARC...
Main Authors: | LIM, Ee Peng; MARISSA, Maria |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2005 |
Subjects: | |
Online Access: | https://ink.library.smu.edu.sg/sis_research/891 https://ink.library.smu.edu.sg/context/sis_research/article/1890/viewcontent/Lim_Marissa2005_Chapter_WebArcWebsiteArchivalUsingAStr.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-1890 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-1890 2018-06-25T07:20:35Z WebArc: Website Archival using a structured approach LIM, Ee Peng MARISSA, Maria Website archival refers to the task of monitoring and storing snapshots of website(s) for future retrieval and analysis. This task is particularly important for websites whose content changes over time, with older information constantly overwritten by newer information. In this paper, we propose WEBARC as a set of software tools to allow users to construct a logical structure for a website to be archived. Classifiers are trained to determine relevant web pages and their categories, and subsequently used in website downloading. The archival schedule can be specified and executed by a scheduler. A website viewer is also developed to browse one or more versions of archived web pages. 2005-12-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/891 info:doi/10.1007/11599517_49 https://ink.library.smu.edu.sg/context/sis_research/article/1890/viewcontent/Lim_Marissa2005_Chapter_WebArcWebsiteArchivalUsingAStr.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Scheduling Downloading World wide web Internet Classification Software tool Surveillance Monitoring Electronic library Databases and Information Systems Numerical Analysis and Scientific Computing |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Scheduling Downloading World wide web Internet Classification Software tool Surveillance Monitoring Electronic library Databases and Information Systems Numerical Analysis and Scientific Computing |
spellingShingle |
Scheduling Downloading World wide web Internet Classification Software tool Surveillance Monitoring Electronic library Databases and Information Systems Numerical Analysis and Scientific Computing LIM, Ee Peng MARISSA, Maria WebArc: Website Archival using a structured approach |
description |
Website archival refers to the task of monitoring and storing snapshots of website(s) for future retrieval and analysis. This task is particularly important for websites whose content changes over time, with older information constantly overwritten by newer information. In this paper, we propose WEBARC as a set of software tools to allow users to construct a logical structure for a website to be archived. Classifiers are trained to determine relevant web pages and their categories, and subsequently used in website downloading. The archival schedule can be specified and executed by a scheduler. A website viewer is also developed to browse one or more versions of archived web pages. |
format |
text |
author |
LIM, Ee Peng MARISSA, Maria |
author_facet |
LIM, Ee Peng MARISSA, Maria |
author_sort |
LIM, Ee Peng |
title |
WebArc: Website Archival using a structured approach |
title_short |
WebArc: Website Archival using a structured approach |
title_full |
WebArc: Website Archival using a structured approach |
title_fullStr |
WebArc: Website Archival using a structured approach |
title_full_unstemmed |
WebArc: Website Archival using a structured approach |
title_sort |
webarc: website archival using a structured approach |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2005 |
url |
https://ink.library.smu.edu.sg/sis_research/891 https://ink.library.smu.edu.sg/context/sis_research/article/1890/viewcontent/Lim_Marissa2005_Chapter_WebArcWebsiteArchivalUsingAStr.pdf |
_version_ |
1770570759952924672 |