AndroOBFS: Time-tagged Obfuscated Android Malware Dataset with Family Information

Abstract

With the large-scale adoption of Android OS and ever-increasing contributions in the Android application space, Android has become the number one target of malware writers. In recent years, a large number of automatic malware detection and classification systems have evolved to tackle the dynamic nature of malware growth using either static or dynamic analysis techniques. The performance of static malware detection methods degrades under obfuscation attacks. Although many benchmark datasets are available to measure the performance of malware detection and classification systems, only a single obfuscated malware dataset (PRAGuard) is available to showcase the efficacy of existing malware detection systems against obfuscation attacks. PRAGuard's samples are outdated, extending only to March 2013, and do not represent the latest application categories. Moreover, PRAGuard does not provide family information for its malware, so it cannot be used to evaluate the efficacy of malware family classification systems.

In this work, we create and release AndroOBFS, a time-tagged (at month granularity) obfuscated malware dataset with familial information spanning three years, from 2018 to 2020. We create this dataset by obfuscating 16279 unique real-world malware samples in six different obfuscation categories. Out of the 16279 obfuscated malware samples, 14579 samples are distributed across 158 families with at least two unique malware samples in each family. We release this dataset to facilitate Android malware research aimed at designing robust and obfuscation-resilient malware detection and classification systems.
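
The two structural properties described above, month-granularity time tags and families with at least two unique samples, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout (hash, family label, date), the sample values, and the helper names `month_tag` and `families_with_min_samples` are hypothetical and are not taken from the released dataset or its tooling. Note also that the abstract's figures imply 16279 - 14579 = 1700 samples fall outside the 158 families; the abstract does not say how those samples are labeled.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (sample hash, family label, date). Illustrative only,
# not real AndroOBFS entries; the actual dataset layout may differ.
samples = [
    ("hash_a", "family_x", date(2018, 3, 14)),
    ("hash_b", "family_x", date(2019, 7, 2)),
    ("hash_c", "family_y", date(2020, 11, 30)),
    ("hash_a", "family_x", date(2018, 3, 14)),  # duplicate of the first record
]


def month_tag(d):
    """Reduce a date to a month-granularity tag such as '2018-03'."""
    return f"{d.year:04d}-{d.month:02d}"


def families_with_min_samples(records, min_unique=2):
    """Group unique hashes by family and keep families with enough samples."""
    by_family = defaultdict(set)
    for sha, family, _ in records:
        by_family[family].add(sha)  # a set ignores duplicate hashes
    return {
        family: hashes
        for family, hashes in by_family.items()
        if len(hashes) >= min_unique
    }


kept = families_with_min_samples(samples)
tags = {sha: month_tag(seen) for sha, _, seen in samples}
```

In this toy input, `family_y` has a single unique sample and is therefore dropped, while the duplicate `hash_a` record counts only once toward `family_x`.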

Publication
2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR-2022). [Core: A Ranked]
Saurabh Kumar
Postdoctoral Scholar

My research interests include cybersecurity, Android security, malware analysis, and cyber forensics.