Malware, Information Retrieval, Semantic Space, Deep Learning, Multilabel

Abstract
This research presents an approach to improving the performance of a Malware Retrieval (MR) system by incorporating semantic-aware metric learning techniques. The study uses labeled datasets obtained from VirusTotal that combine expert-verified labels with labels assigned automatically by antivirus engines. The MR system is trained with several models, including single-label and multi-label baselines, and the study introduces center-loss models with semantic components. Extensive quantitative and qualitative evaluations show that the center-loss models outperform the baselines, especially in precision. In addition, a class variance analysis confirms that center loss improves the discriminative power of the representation vectors. These results show that MR systems can incorporate semantic understanding to achieve better performance in malware retrieval tasks.
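The abstract does not give the exact loss used by the center-loss models, so the following is only a minimal sketch, assuming the common single-label center loss (Wen et al., 2016): each representation vector is pulled toward a learned center of its class, typically added to a classification loss as L = L_S + λ·L_C. It also shows one plausible form of the class variance analysis (mean squared distance to the class mean). All function names are hypothetical, and the sketch models neither the multi-label setting nor the semantic components described above.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two representation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_centers(embeddings: Sequence[Vector], labels: Sequence[int]) -> Dict[int, Vector]:
    """Mean representation vector of each class (hypothetical helper)."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] += 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def center_loss(embeddings, labels, centers) -> float:
    """L_C = (1/2) * mean over the batch of ||x_i - c_{y_i}||^2."""
    total = sum(squared_distance(v, centers[y]) for v, y in zip(embeddings, labels))
    return 0.5 * total / len(embeddings)


def update_centers(embeddings, labels, centers, alpha: float = 0.5) -> Dict[int, Vector]:
    """One center update step: c_j <- c_j - alpha * delta_j, where
    delta_j = sum_{i: y_i = j} (c_j - x_i) / (1 + n_j)."""
    new_centers = {}
    for j, c in centers.items():
        members = [v for v, y in zip(embeddings, labels) if y == j]
        delta = [
            sum(c[d] - v[d] for v in members) / (1 + len(members))
            for d in range(len(c))
        ]
        new_centers[j] = [c[d] - alpha * delta[d] for d in range(len(c))]
    return new_centers


def mean_intra_class_variance(embeddings, labels) -> float:
    """Average over classes of the mean squared distance to the class mean.
    Lower values indicate more compact, more discriminative classes."""
    centers = class_centers(embeddings, labels)
    per_class = defaultdict(list)
    for v, y in zip(embeddings, labels):
        per_class[y].append(squared_distance(v, centers[y]))
    return sum(sum(d) / len(d) for d in per_class.values()) / len(per_class)


if __name__ == "__main__":
    # Toy 2-D "representation vectors" for two malware families (made-up data).
    emb = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
    lab = [0, 0, 1, 1]
    centers = class_centers(emb, lab)             # {0: [1.0, 0.0], 1: [10.0, 11.0]}
    print(center_loss(emb, lab, centers))         # 0.5
    print(mean_intra_class_variance(emb, lab))    # 1.0
```

In training, the centers would usually be updated per mini-batch with `update_centers` (or learned by gradient descent), while λ trades off class separation from L_S against class compactness from L_C.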
