Automated abuse detection of privacy policy for Android apps

Bibliographic Details
Main Author: Lim, Yuan Jun
Other Authors: Liu Yang
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Subjects:
Online Access: https://hdl.handle.net/10356/137955
Institution: Nanyang Technological University
Description
Summary: Regulators such as Apple and Google Play require every smartphone application (APP) to publish a privacy policy (PP) that specifies what information is collected and how it is collected, stored and processed, and each PP should be unique to its APP. However, most users do not spend adequate time reading PPs, so they remain unaware of how their information is handled. Furthermore, because writing a unique PP is seen as an excruciating and time-consuming task, most APP developers take shortcuts, either copying an existing PP or relying on online PP generators. PPs produced this way are referred to as online generated privacy policies (OGPPs), and they are considered abused PPs because their content does not uniquely describe the services the APP provides. In addition, OGPPs may omit specific compliance requirements established by legislation and can lead to legal repercussions, since their content is similar even though the APPs provide different services. In short, PPs are important but often overlooked by both users and APP developers. In this report, the Term Frequency – Inverse Document Frequency – Cosine Similarity (TF-IDF-CS) algorithm and the Knuth–Morris–Pratt (KMP) algorithm were used to identify similarities between OGPPs and a database of existing PPs, and the similarity results were analysed to establish a benchmark for distinguishing APPs that publish abused PPs.
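The following is a minimal Python sketch of the two techniques named in the summary. It is not the report's implementation. The tokenizer, the smoothed IDF formula, and the helper names (`tokenize`, `tfidf_vectors`, `cosine_similarity`, `max_similarity`, `kmp_find_all`) are illustrative assumptions. The summary also does not say how KMP was applied. This sketch assumes it is used to locate verbatim generator template phrases inside a policy.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and keep only alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build one sparse TF-IDF vector per document (raw term count x smoothed IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumed smoothed IDF variant: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def max_similarity(policy: str, corpus: list[str]) -> float:
    # Fit IDF over the candidate plus the reference corpus, then return the
    # highest cosine similarity between the candidate and any reference PP.
    vectors = tfidf_vectors([policy] + corpus)
    return max((cosine_similarity(vectors[0], v) for v in vectors[1:]), default=0.0)


def kmp_find_all(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text (KMP)."""
    if not pattern:
        return []
    # failure[i]: length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    # Scan the text once, never moving backwards in it.
    matches = []
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # allow overlapping matches
    return matches
```

For example, `kmp_find_all("ababa", "aba")` returns `[0, 2]`. A benchmark in the report's sense would then be a cut-off on scores such as `max_similarity`. Any specific threshold would have to be calibrated on real policy data, as the report describes, rather than chosen in advance.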