Utilizing Large Programming Language Models on Software Vulnerability Detection
| dc.contributor.author | Aslan, Mert Kaan | |
| dc.contributor.author | Alkan, Yunus Emre | |
| dc.contributor.author | Alican, Muhammed Burak | |
| dc.contributor.author | Ozdemir, Ozgur | |
| dc.date.accessioned | 2026-04-04T18:48:34Z | |
| dc.date.available | 2026-04-04T18:48:34Z | |
| dc.date.issued | 2025 | |
| dc.description | 2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025 -- 10 September 2025 through 12 September 2025 -- Bursa -- 214381 | |
| dc.description.abstract | Following the success of large language models, pre-trained programming language models (PLMs) have shown prominent achievements in the software engineering field. This paper focuses on examining the performance of pre-trained PLMs in detecting software vulnerabilities in source code. In this study, two distinct transformer-based approaches are utilized: the encoder-only CodeBERT and the decoder-only Qwen-2.5-Coder. The selected models are evaluated on two benchmark datasets, namely PrimeVul and BigVul, which differ significantly in terms of data duplication and label quality. Experimental studies reveal that while Qwen-2.5-Coder outperforms CodeBERT on the BigVul benchmark, both models suffer a substantial performance drop on the realistic and deduplicated PrimeVul dataset. Notably, Qwen-2.5-Coder shows extreme sensitivity to high-quality samples, achieving only 2.37% recall, suggesting that decoder-only models may overfit to noisy or redundant data. In contrast, CodeBERT demonstrates relatively more stable behavior, owing to its encoder architecture's suitability for classification tasks. These findings highlight not only the critical role of dataset design, such as duplication control and label accuracy, but also the impact of architectural choices on generalization. By leveraging these findings, this paper aims to contribute to the development of more effective tools that can automatically detect software vulnerabilities. © 2025 IEEE. | |
| dc.description.sponsorship | Türkiye Bilimsel ve Teknolojik Araştırma Kurumu, TUBITAK, (1919B012426590); Türkiye Bilimsel ve Teknolojik Araştırma Kurumu, TUBITAK | |
| dc.identifier.doi | 10.1109/ASYU67174.2025.11208282 | |
| dc.identifier.isbn | 979-833159727-6 | |
| dc.identifier.scopus | 2-s2.0-105022451593 | |
| dc.identifier.scopusquality | N/A | |
| dc.identifier.uri | https://doi.org/10.1109/ASYU67174.2025.11208282 | |
| dc.identifier.uri | https://hdl.handle.net/11411/10230 | |
| dc.indekslendigikaynak | Scopus | |
| dc.language.iso | en | |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | |
| dc.relation.ispartof | 2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025 | |
| dc.relation.publicationcategory | Konferans Öğesi - Uluslararası - Kurum Öğretim Elemanı | |
| dc.rights | info:eu-repo/semantics/closedAccess | |
| dc.snmz | KA_Scopus_20260402 | |
| dc.subject | Programming Language Models | |
| dc.subject | Software Analysis | |
| dc.subject | Software Security | |
| dc.subject | Software Vulnerability Detection | |
| dc.title | Utilizing Large Programming Language Models on Software Vulnerability Detection | |
| dc.type | Conference Paper |