
Marcus Botacin
Texas A&M University

Thursday, March 14, 10 am
Harold Frank Hall 1132
Zoom link: https://ucsb.zoom.us/j/85891905365

Abstract: Recent research advances have introduced large textual models that enable many applications, such as text and code generation. While these models' capabilities can be used for good, they can also cause harm: their code generation capabilities might be used by attackers to assist in malware creation, a phenomenon that must be understood. In this talk, I answer the question: Can current large textual models already be used by attackers to generate malware, and how? I introduce multiple coding strategies that attackers can explore, ranging from describing the entire code at once to describing individual malicious functions that serve as building blocks. I present experimental results showing that models like GPT-3 still struggle to generate entire malware samples from complete descriptions, but that they can easily construct malware from building-block descriptions. The models also have limitations in understanding the described contexts, but once a context is understood, they can generate multiple versions with the same semantics (malware variants), some of which are able to evade antivirus detection.

Biography: Marcus Botacin is an Assistant Professor in the Computer Science and Engineering (CSE) Department at Texas A&M University (TAMU). Marcus holds a Ph.D. in Computer Science (Federal University of Paraná, Brazil, 2021), a Master's in Computer Science (University of Campinas, Brazil, 2017), and a Bachelor's in Computer Engineering (University of Campinas, Brazil, 2015). His main research interests are malware analysis, reverse engineering, and the science of security.

Host: Giovanni Vigna
