Android authorship attribution using source code-based features
Abstract
With the widespread use of mobile devices, Android has become the most popular operating system, and new applications are uploaded to the Android market every day. However, because Android binaries are easy to modify and repackage, applications can readily be altered and imitated by other developers and released in third-party Android markets. Determining the original developer of an Android application is therefore a challenging problem, known as authorship attribution. This study explores the distinctive features of Android applications in order to identify their authors. Software developers generally leave a footprint in their applications that reflects their writing style. This footprint, which can be extracted from either source code or binary code, can help identify the authors of software applications. Because obtaining the source code of applications in the wild can be impractical, especially when dealing with malware, researchers prefer to focus on application binaries. Accordingly, this study proposes an approach that identifies Android developers by deriving a wide range of features from different parts of Android applications, such as smali files, libraries, manifest files, and metadata. Moreover, other features, such as configuration, dex code, resource-based, and string-related features, are adopted from other studies on Android authorship attribution and fused with the proposed feature set. The proposed approach was evaluated on benign and malware datasets and compared with approaches from other studies. The results show that the proposed features improve accuracy, which reaches 82.5% and 95.6% on the market and malware datasets, respectively, demonstrating their positive effect on Android authorship attribution.