Using Collaborative Filtering Algorithm to Estimate the Predictive Power of a Functional Requirement

Bibliographic Details
Main Authors: Hidalgo, Reynald Jay F.; Fernandez, Proceso L., Jr.
Format: text
Published: Archīum Ateneo 2020
Online Access: https://archium.ateneo.edu/discs-faculty-pubs/276
https://dl.acm.org/doi/10.1145/3377571.3377605
Institution: Ateneo De Manila University
Description
Summary: The collaborative filtering (CF) algorithm uses the preferences expressed by previous users of the items being studied and is widely applied to build recommender systems. A collaborative filter predicts the items that a user will like based on the votes that similar users gave to those items. In this study, we use CF to estimate how much knowing the presence or absence of one software feature contributes to correctly predicting the presence or absence of each of the remaining features. Completed software project documentation from the Master in Information Technology programs of selected Northern Luzon higher education institutions was first collected. An analysis of these documents revealed 26 unique software features and yielded a binary matrix indicating the presence or absence of each feature in each project. Leave-one-out cross-validation was performed to estimate the predictive power of each element of a given holdout vector, using the 26x26 cosine similarity matrix generated from the remaining vectors. The results show that, on average, correctly knowing the presence or absence of only 1 feature predicts the presence or absence of the remaining features with an accuracy of about 58%. This is 8 percentage points better than a naïve 50-50 random binary guessing algorithm, and gives a rough indication of the amount of information contributed by one feature value under the CF algorithm.
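The procedure described in the summary can be illustrated with a minimal Python sketch. The summary does not state how a prediction is derived from the similarity matrix, so the decision rule here is an assumption made only for illustration: a feature is predicted to share the known feature's value when their cosine similarity is at least a threshold, and to take the opposite value otherwise. The function names, the `threshold` parameter, and the synthetic random matrix are all hypothetical and are not taken from the study.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Cosine similarity between the columns (features) of X."""
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0  # all-zero columns get similarity 0 instead of NaN
    Xn = X / norms
    return Xn.T @ Xn


def loo_single_feature_accuracy(X, threshold=0.5):
    """Leave-one-out accuracy of predicting all other features from one.

    X is a binary (projects x features) matrix. The decision rule is an
    ASSUMPTION, not the study's documented method: copy the known value to
    features whose similarity is >= threshold, flip it for the rest.
    """
    n_projects, n_features = X.shape
    accuracy = np.zeros(n_features)
    for p in range(n_projects):
        held_out = X[p]
        # Similarity matrix is built from the remaining projects only.
        S = cosine_similarity_matrix(np.delete(X, p, axis=0))
        for i in range(n_features):
            predicted = np.where(S[i] >= threshold, held_out[i], 1 - held_out[i])
            others = np.arange(n_features) != i  # score every feature except i
            accuracy[i] += np.mean(predicted[others] == held_out[others])
    return accuracy / n_projects


if __name__ == "__main__":
    # Synthetic stand-in data; the study's real project matrix is not reproduced.
    rng = np.random.default_rng(0)
    toy = rng.integers(0, 2, size=(30, 26))
    per_feature = loo_single_feature_accuracy(toy)
    print("mean accuracy over features:", per_feature.mean())
```

Each entry of the returned vector is the average accuracy obtained when that feature alone is known; its mean over all features corresponds to the kind of overall figure the summary reports, although the numbers from this sketch on random data say nothing about the study's results.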