Mobile app marketplace mining : methods and applications
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2015 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/65354 |
Institution: | Nanyang Technological University |
Summary: | With the rapid growth of mobile applications, app marketplaces have drawn increasing attention from researchers in several areas, including data mining, machine learning, software engineering, and security. An app marketplace is a new form of software repository that contains a wealth of multi-modal, heterogeneous data associated with apps, e.g., description text, screenshot images, and user reviews. Such app market data is (i) large in volume; (ii) growing and changing rapidly; (iii) complex in its variety; and (iv) potentially valuable for various stakeholders in the mobile app ecosystem, e.g., developers, users, and app platform providers. However, there is still a lack of mining approaches that help app ecosystem stakeholders exploit this data effectively and efficiently. In this thesis, we present three novel data mining/machine learning schemes for mobile app marketplaces and apply them to three crucial applications for app ecosystem stakeholders, drawing on a specific and increasingly important data source: app market data. First, to help app developers find the most "informative" user reviews in a large and rapidly growing pool of reviews in app markets, we present a novel framework named "AR-Miner" (App Review Miner), which consists of four main steps: (i) it filters out noisy and irrelevant reviews; (ii) it groups the remaining informative reviews by applying topic modeling; (iii) it prioritizes the informative reviews using our proposed ranking model; and (iv) it presents an intuitive visual summary to app developers. We conduct an extensive set of empirical studies on four popular Android apps (with hundreds of thousands of user reviews) to evaluate AR-Miner; the encouraging results show that it is effective, efficient, and promising. 
Second, to model high-level app similarity, we present "SimApp", a novel framework that consists of two stages: (i) we define a set of kernel (similarity) functions, one for each modality of data, to measure app similarity; (ii) we assume the target app similarity function is a linear combination of these kernels, and develop a new online kernel learning algorithm to learn the optimal combination weights from streams of training data. We conduct extensive experiments on a real-world dataset crawled from Google Play to evaluate SimApp; the encouraging results validate its efficacy in app similarity modeling. Finally, we address automatic app annotation, which is potentially useful for different app ecosystem stakeholders. Most mainstream app markets, e.g., Google Play and the Apple App Store, currently do not explicitly support automatic annotation of apps. To address this problem, we propose a novel retrieval-based framework for automatically annotating apps. Given a query app (without any tags), the framework (i) first retrieves from a large app database the N apps that are most semantically similar to the query app; and (ii) then mines the "Description" and "Update" text of both the query app and its top-N similar apps to discover relevant tags for the query app. To evaluate the efficacy of the framework, we conduct a series of qualitative and quantitative experiments. The encouraging results demonstrate that the technique is both effective and promising. |
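The second stage of SimApp described in the summary, a similarity function formed as a linear combination of per-modality kernels whose weights are learned online, can be illustrated with a minimal sketch. The record does not specify the thesis's actual learning algorithm, so the update below is a generic Hedge-style multiplicative-weights rule used purely as a stand-in. The function names, the triplet-style training signal (an anchor app with one similar and one dissimilar app), and the discount factor `beta` are all assumptions made for illustration.

```python
def combined_similarity(weights, kernel_values):
    """Linear combination of per-modality kernel values for one app pair.

    kernel_values[i] is the similarity under modality i (e.g. description
    text, screenshots); which modalities are used is an assumption here.
    """
    return sum(w * k for w, k in zip(weights, kernel_values))


def hedge_update(weights, k_pos, k_neg, beta=0.5):
    """One online step on a training triplet (anchor, similar, dissimilar).

    k_pos[i] = kernel i's similarity between the anchor and the similar app.
    k_neg[i] = kernel i's similarity between the anchor and the dissimilar app.
    A kernel that fails to rank the similar app strictly higher has its
    weight multiplied by beta; weights are then renormalized to sum to 1.
    This is a stand-in rule, not the algorithm from the thesis.
    """
    updated = [
        w * (beta if kp <= kn else 1.0)
        for w, kp, kn in zip(weights, k_pos, k_neg)
    ]
    total = sum(updated)
    return [w / total for w in updated]


# Two hypothetical kernels, starting from uniform weights.
weights = [0.5, 0.5]
# Kernel 0 ranks the similar app higher; kernel 1 gets the order wrong.
weights = hedge_update(weights, k_pos=[0.9, 0.2], k_neg=[0.1, 0.8])
score = combined_similarity(weights, [0.9, 0.2])
```

After this single step the weight of the kernel that ranked the pair correctly grows relative to the other, so later similarity scores lean on the more reliable modality. Processing one triplet at a time is what makes the scheme suitable for training data that arrives as a stream.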