Utilizing Large Programming Language Models on Software Vulnerability Detection

Aslan, Mert Kaan; Alkan, Yunus Emre; Alican, Muhammed Burak; Ozdemir, Ozgur

Date

2025

Authors

Aslan, Mert Kaan

Alkan, Yunus Emre

Alican, Muhammed Burak

Ozdemir, Ozgur

Publisher

Institute of Electrical and Electronics Engineers Inc.

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Following the success of large language models, pre-trained programming language models (PLMs) have achieved notable results in software engineering. This paper examines the performance of pre-trained PLMs in detecting software vulnerabilities in source code. Two distinct transformer-based approaches are used: the encoder-only CodeBERT and the decoder-only Qwen-2.5-Coder. The selected models are evaluated on two benchmark datasets, PrimeVul and BigVul, which differ significantly in data duplication and label quality. The experiments reveal that while Qwen-2.5-Coder outperforms CodeBERT on the BigVul benchmark, both models suffer a substantial performance drop on the realistic, deduplicated PrimeVul dataset. Notably, Qwen-2.5-Coder shows extreme sensitivity to high-quality samples, achieving only 2.37% recall, which suggests that decoder-only models may overfit on noisy or redundant data. In contrast, CodeBERT behaves more stably, which the authors attribute to the suitability of its encoder architecture for classification tasks. These findings highlight not only the critical role of dataset design, such as duplication control and label accuracy, but also the impact of architectural choices on generalization. By leveraging these findings, this paper aims to contribute to the development of more effective tools that detect software vulnerabilities automatically. © 2025 IEEE.

Description

2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025 -- 10 September 2025 through 12 September 2025 -- Bursa -- 214381

Keywords

Programming Language Models, Software Analysis, Software Security, Software Vulnerability Detection

Source

2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025

Scopus Quartile

N/A

Link

https://doi.org/10.1109/ASYU67174.2025.11208282
https://hdl.handle.net/11411/10230

Collection

Scopus Indexed Publications
