Expertini Research Research
Computer Science PDF Available Non-peer-reviewed Preprint

Data Augmentation for Opcode Sequence Based Malware Detection

Niall McLaughlin, Jesus Martinez del Rincon  ·  Published 2021-06-22

Abstract

In this paper we study data augmentation for opcode sequence based Android malware detection. Data augmentation has been successfully used in many areas of deep-learning to significantly improve model performance. Typically, data augmentation simulates realistic variations in data to increase the apparent diversity of the training-set. However, for opcode-based malware analysis it is not immediately clear how to apply data augmentation. Hence we first study the use of fixed transformations, then progress to adaptive methods. We propose a novel data augmentation method -- Self-Embedding Language Model Augmentation -- that uses a malware detection network's own opcode embedding layer to measure opcode similarity for adaptive augmentation. To the best of our knowledge this is the first paper to carry out a systematic study of different augmentation methods for opcode sequence based Android malware classification.
📄 Full Paper Available as PDF
This paper is available as a downloadable PDF.
📄 Download PDF

✨ AI Plain-English Summary

Get a plain-English summary of this paper generated by AI (5 free per day).

Comments (0)

No comments yet. Be the first to comment.