Utilizing Large Programming Language Models on Software Vulnerability Detection

Aslan, Mert Kaan; Alkan, Yunus Emre; Alican, Muhammed Burak; Ozdemir, Ozgur

dc.contributor.author	Aslan, Mert Kaan
dc.contributor.author	Alkan, Yunus Emre
dc.contributor.author	Alican, Muhammed Burak
dc.contributor.author	Ozdemir, Ozgur
dc.date.accessioned	2026-04-04T18:48:34Z
dc.date.available	2026-04-04T18:48:34Z
dc.date.issued	2025
dc.description	2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025 -- 10 September 2025 through 12 September 2025 -- Bursa -- 214381
dc.description.abstract	Following the success of large language models, pre-trained programming language models (PLMs) have achieved notable results in software engineering. This paper examines the performance of pre-trained PLMs in detecting software vulnerabilities in source code. Two distinct transformer-based approaches are used: the encoder-only CodeBERT and the decoder-only Qwen-2.5-Coder. The selected models are evaluated on two benchmark datasets, PrimeVul and BigVul, which differ significantly in data duplication and label quality. Experiments show that while Qwen-2.5-Coder outperforms CodeBERT on the BigVul benchmark, both models suffer a substantial performance drop on the realistic and deduplicated PrimeVul dataset. Notably, Qwen-2.5-Coder shows extreme sensitivity to high-quality samples, achieving only 2.37% recall, which suggests that decoder-only models may overfit on noisy or redundant data. In contrast, CodeBERT behaves relatively more stably, consistent with the suitability of its encoder architecture for classification tasks. These findings highlight not only the critical role of dataset design, such as duplication control and label accuracy, but also the impact of architectural choices on generalization. By leveraging these findings, this paper aims to contribute to the development of more effective tools for automatically detecting software vulnerabilities. © 2025 IEEE.
dc.description.sponsorship	Türkiye Bilimsel ve Teknolojik Araştırma Kurumu, TUBITAK (1919B012426590)
dc.identifier.doi	10.1109/ASYU67174.2025.11208282
dc.identifier.isbn	979-833159727-6
dc.identifier.scopus	2-s2.0-105022451593
dc.identifier.scopusquality	N/A
dc.identifier.uri	https://doi.org/10.1109/ASYU67174.2025.11208282
dc.identifier.uri	https://hdl.handle.net/11411/10230
dc.indekslendigikaynak	Scopus
dc.language.iso	en
dc.publisher	Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartof	2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025
dc.relation.publicationcategory	Conference Item - International - Institutional Faculty Member
dc.rights	info:eu-repo/semantics/closedAccess
dc.snmz	KA_Scopus_20260402
dc.subject	Programming Language Models
dc.subject	Software Analysis
dc.subject	Software Security
dc.subject	Software Vulnerability Detection
dc.title	Utilizing Large Programming Language Models on Software Vulnerability Detection
dc.type	Conference Paper

Collection

Scopus Indexed Publications
