A Knowledge-Poor Approach to Turkish Text Categorization

dc.authorid yildirim, savas/0000-0002-7764-2891
dc.authorwosid yildirim, savas/AAG-4639-2019
dc.contributor.author Yildirim, Savas
dc.date.accessioned 2024-07-18T20:51:00Z
dc.date.available 2024-07-18T20:51:00Z
dc.date.issued 2014
dc.department İstanbul Bilgi Üniversitesi en_US
dc.description 15th Annual Conference on Intelligent Text Processing and Computational Linguistics (CICLing) -- APR 06-12, 2014 -- Ctr Commun & Dev, Kathmandu, NEPAL en_US
dc.description.abstract Document categorization is the task of assigning a category to a given document. Supervised methods mostly rely on training data and rich linguistic resources that are either language-specific or generic. This study proposes a knowledge-poor approach to text categorization that uses no rule sets or language-specific resources such as a part-of-speech tagger or shallow parser. Knowledge-poor here refers to the lack of a reasonable amount of background knowledge. The proposed system architecture takes the data as-is and simply splits text into tokens on spaces. Documents represented in a vector space model are used as training data for a range of machine learning algorithms. We empirically examine and compare several factors, from similarity metrics to learning algorithms, in a variety of experimental setups. Although researchers often believe that particular classifiers or metrics are better than others for text categorization, recent studies show that the ranking of models depends entirely on the class, the experimental setup, and the domain. The study features extensive evaluation and comparison across a variety of experiments. We evaluate models and similarity metrics for Turkish, an agglutinative language, within the knowledge-poor framework. The results should be useful to other studies. en_US
dc.description.sponsorship Inst Politecnico Nacl Centro Invest Computac Nat Language & Text Proc Lab, Mexican Soc Artificial Intelligence en_US
dc.identifier.endpage 440 en_US
dc.identifier.isbn 978-3-642-54902-1
dc.identifier.isbn 978-3-642-54903-8
dc.identifier.issn 0302-9743
dc.identifier.issn 1611-3349
dc.identifier.scopus 2-s2.0-84958521342 en_US
dc.identifier.scopusquality Q3 en_US
dc.identifier.startpage 428 en_US
dc.identifier.uri https://hdl.handle.net/11411/8344
dc.identifier.volume 8404 en_US
dc.identifier.wos WOS:000342990000036 en_US
dc.identifier.wosquality N/A en_US
dc.indekslendigikaynak Web of Science en_US
dc.indekslendigikaynak Scopus en_US
dc.language.iso en en_US
dc.publisher Springer-Verlag Berlin en_US
dc.relation.ispartof Computational Linguistics and Intelligent Text Processing, CICLing 2014, Part II en_US
dc.relation.publicationcategory Conference Item - International - Institutional Faculty Member en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Text Categorization en_US
dc.subject Vector Space Model en_US
dc.subject Machine Learning en_US
dc.title A Knowledge-Poor Approach to Turkish Text Categorization en_US
dc.type Conference Object en_US
