Utilizing Large Programming Language Models on Software Vulnerability Detection

dc.contributor.author: Aslan, Mert Kaan
dc.contributor.author: Alkan, Yunus Emre
dc.contributor.author: Alican, Muhammed Burak
dc.contributor.author: Ozdemir, Ozgur
dc.date.accessioned: 2026-04-04T18:48:34Z
dc.date.available: 2026-04-04T18:48:34Z
dc.date.issued: 2025
dc.description: 2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025 -- 10 September 2025 through 12 September 2025 -- Bursa -- 214381
dc.description.abstract: Following the success of large language models, pre-trained programming language models (PLMs) have achieved strong results in software engineering. This paper examines the performance of pre-trained PLMs in detecting software vulnerabilities in source code. Two transformer-based approaches are used: the encoder-only CodeBERT and the decoder-only Qwen-2.5-Coder. The models are evaluated on two benchmark datasets, PrimeVul and BigVul, which differ significantly in data duplication and label quality. Experiments reveal that while Qwen-2.5-Coder outperforms CodeBERT on the BigVul benchmark, both models suffer a substantial performance drop on the realistic and deduplicated PrimeVul dataset. Notably, Qwen-2.5-Coder shows extreme sensitivity to high-quality samples, achieving only 2.37% recall, suggesting that decoder-only models may overfit on noisy or redundant data. In contrast, CodeBERT behaves more stably, consistent with the suitability of its encoder architecture for classification tasks. These findings highlight not only the critical role of dataset design, such as duplication control and label accuracy, but also the impact of architectural choices on generalization. This paper aims to contribute to the development of more effective tools that automatically detect software vulnerabilities by leveraging these findings. © 2025 IEEE.
dc.description.sponsorship: Türkiye Bilimsel ve Teknolojik Araştırma Kurumu, TUBITAK, (1919B012426590); Türkiye Bilimsel ve Teknolojik Araştırma Kurumu, TUBITAK
dc.identifier.doi: 10.1109/ASYU67174.2025.11208282
dc.identifier.isbn: 979-833159727-6
dc.identifier.scopus: 2-s2.0-105022451593
dc.identifier.scopusquality: N/A
dc.identifier.uri: https://doi.org/10.1109/ASYU67174.2025.11208282
dc.identifier.uri: https://hdl.handle.net/11411/10230
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartof: 2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025
dc.relation.publicationcategory: Conference Item - International - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: KA_Scopus_20260402
dc.subject: Programming Language Models
dc.subject: Software Analysis
dc.subject: Software Security
dc.subject: Software Vulnerability Detection
dc.title: Utilizing Large Programming Language Models on Software Vulnerability Detection
dc.type: Conference Paper
