Browse by Author "Yildiz, Tugba"
Now showing 1 - 11 of 11
Item: Acquisition of Turkish meronym based on classification of patterns (Springer, 2016)
Authors: Yildiz, Tugba; Diri, Banu; Yildirim, Savas
The identification of semantic relations from raw text is an important problem in Natural Language Processing. This paper provides semi-automatic, pattern-based extraction of part-whole relations. We utilized and adapted lexico-syntactic patterns to disclose the meronymy relation from a Turkish corpus. We applied two different approaches to prepare patterns: the first is based on pre-defined patterns taken from the literature; the second produces patterns automatically by means of a bootstrapping method. While the pre-defined patterns are applied directly to the corpus, the other patterns must first be discovered from manually prepared, unambiguous seeds. Word pairs are then extracted by their occurrence in those patterns. In addition, we used statistical selection on global data obtained from the results of all patterns: a whole-by-part matrix to which several association metrics, such as information gain and t-score, are applied. We examined how all of these approaches improve system accuracy, especially within a corpus-based approach and with the distributional features of words. Finally, we conducted a variety of experiments with a comparative analysis and showed the advantages and disadvantages of the approaches, with promising results.

Item: A cascaded framework for identification and extraction of antonym for Turkish language (Springer, 2019)
Authors: Yildiz, Tugba; Yildirim, Savas
Identification and extraction of semantic relations are challenging tasks in Natural Language Processing. In this paper, we design and propose three different models for the two separate tasks of identifying and extracting antonyms.
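As an aside, the pattern-based extraction used in the meronym study above can be sketched minimally. The patterns and sentences below are invented English stand-ins for illustration only, not the paper's actual Turkish lexico-syntactic patterns:

```python
import re

# Hypothetical part-whole patterns; real systems use language-specific
# lexico-syntactic patterns (these English stand-ins are illustrative only).
PATTERNS = [
    re.compile(r"(?P<part>\w+) of (?:a|an|the) (?P<whole>\w+)"),
    re.compile(r"(?P<whole>\w+)'s (?P<part>\w+)"),
]

def extract_pairs(corpus):
    """Count (whole, part) candidates by their occurrences in the patterns."""
    counts = {}
    for sentence in corpus:
        for pattern in PATTERNS:
            for m in pattern.finditer(sentence):
                pair = (m.group("whole"), m.group("part"))
                counts[pair] = counts.get(pair, 0) + 1
    return counts

corpus = ["the engine of the car failed", "the car's engine was loud"]
print(extract_pairs(corpus))  # {('car', 'engine'): 2}
```

The co-occurrence counts collected this way are what downstream statistical selection (e.g. the whole-by-part matrix) would operate on.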
In the first model, we develop two methods to identify antonyms. The first method is a probabilistic approach that calculates the probability of a given target/candidate pair being an antonym; two distinct scoring functions are proposed to decide on the correct candidate for each target word. The second method learns word embeddings and measures embedding similarity to identify antonym pairs. In the second proposed model, we represent target/candidate pairs by a set of features used by a supervised machine learning algorithm. The first and second models are both especially well-suited to the identification of antonymy. In the third and last model, we adopt a minimally supervised bootstrapping approach, which starts with a few antonym pairs and thereafter produces both seeds and patterns in an iterative fashion. Our study is a significant contribution toward enriching the lexicon of the Turkish language.

Item: Deep Neural Network Architecture for Part-of-Speech Tagging for Turkish Language (IEEE, 2018)
Authors: Bahcevan, Cenk Anil; Kutlu, Emirhan; Yildiz, Tugba
Part-of-speech (POS) tagging is one of the most well-studied problems in the field of Natural Language Processing (NLP). In this paper, neural network language models (NNLMs), namely a Recurrent Neural Network (RNN) and a Long Short-Term Memory (LSTM) network, have been trained and assessed to address the POS tagging problem for the Turkish language. The performance is compared to state-of-the-art methods. The results show that the LSTM outperforms the RNN with an 88.7% F1-score.
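To make the POS tagging task concrete, here is a classical most-frequent-tag baseline. This is not the paper's RNN/LSTM model, only a sketch of the task's input/output shape, and the toy Turkish data is invented:

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_sentences):
    """Train a most-frequent-tag baseline: sequences of (word, tag) pairs
    in, a word -> most frequent tag lookup out. Illustrative only; the
    study's actual models are neural sequence taggers."""
    tag_counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            tag_counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in tag_counts.items()}

data = [[("kedi", "NOUN"), ("uyuyor", "VERB")],
        [("kedi", "NOUN"), ("geldi", "VERB")]]
model = train_most_frequent_tag(data)
print(model["kedi"])  # NOUN
```

Neural taggers improve on this baseline chiefly by using context; the lookup above tags every occurrence of a word identically.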
This is the first study in the literature to utilize word embeddings and NNLMs for the Turkish language.

Item: An Empirical Investigation of Performances of Different Word Embedding Algorithms in Comment Clustering (IEEE, 2019)
Authors: Dorani, Eimal; Duru, Nevcihan; Yildiz, Tugba
With the rapid growth of usage of and interest in social network services, comment clustering has become increasingly important for various commercial and scientific applications. Analyzing, organizing, and ascertaining the overall theme of a large volume of comments is a challenging and time-consuming task that has attracted much attention recently. In this study, we propose a method to address the comment clustering problem. Extensive experiments have been conducted on seven different comment datasets using TF-IDF and different word embedding algorithms, namely Word2vec, GloVe, and fastText; internal clustering validation has been conducted to evaluate the performance of each method in clustering the comments. We observed that word embeddings produced significantly better results in comment clustering than TF-IDF. In addition, Word2vec showed the best performance overall; however, we found that GloVe was the most stable and consistent across all datasets, with performance improving as dataset size increased.

Item: A Hybrid Method for Extracting Turkish Part-Whole Relation Pairs from Corpus (IEEE, 2016)
Authors: Sahin, Gurkan; Diri, Banu; Yildiz, Tugba
Extraction of various semantic relation pairs from different sources (dictionary definitions, corpora, etc.) with high accuracy is one of the most popular topics in natural language processing (NLP). In this study, a hybrid method is proposed to extract Turkish part-whole pairs from a corpus. Corpus statistics, WordNet similarities, and Word2vec word vector similarities are used together. First, initial part-whole seeds are prepared, and part-whole patterns are extracted from the corpus using these seeds.
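The seed-driven pattern discovery just described can be sketched in a toy form, assuming whitespace tokenization. The function name and logic are illustrative, not the paper's exact method:

```python
def harvest_patterns(corpus, seeds):
    """For every sentence containing both words of a known (whole, part)
    seed pair, keep the text between them as a candidate pattern."""
    patterns = set()
    for sentence in corpus:
        words = sentence.split()
        for whole, part in seeds:
            if whole in words and part in words:
                i, j = words.index(whole), words.index(part)
                lo, hi = min(i, j), max(i, j)
                patterns.add(" ".join(words[lo + 1:hi]))
    return patterns

print(harvest_patterns(["the car has an engine"], {("car", "engine")}))
# {'has an'}
```

In a full bootstrapping loop, the harvested patterns would then be scored for reliability and applied back to the corpus to extract new pairs, which in turn become new seeds.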
For each pattern, a reliability score is calculated, and reliable patterns are selected to produce new pairs from the corpus. Various reliability scores are used for the new pairs. To measure the success of the method, 19 target whole words were selected, and average precisions of 83% (first 10 pairs), 74% (first 20 pairs), and 68% (first 30 pairs) were obtained.

Item: Image Captioning in Turkish Language (IEEE, 2019)
Authors: Yilmaz, Berk Dursun; Demir, Ali Emre; Sonmez, Elena Battini; Yildiz, Tugba
Image captioning is one of the everlasting challenging tasks in the field of artificial intelligence, requiring both computer vision and natural language processing. Many salient works have been proposed over time for the English language; however, the number of studies for the Turkish language is still very limited. This paper couples an encoder CNN (the component responsible for extracting the features of the given images) with a decoder RNN (the component responsible for generating captions from the given inputs) to generate Turkish captions within human gold standards. We conducted experiments using the most common evaluation metrics, such as BLEU, METEOR, ROUGE, and CIDEr. Results show that the performance of the proposed model is satisfactory in both qualitative and quantitative evaluations. A web app has already been deployed to allow volunteers to contribute to improving the Turkish captioned dataset.

Item: An Integrated Approach to Automatic Synonym Detection in Turkish Corpus (Springer International Publishing AG, 2014)
Authors: Yildiz, Tugba; Yildirim, Savas; Diri, Banu
In this study, we designed a model to determine synonymy. Our main assumption is that synonym pairs, by definition, exhibit similar semantic and dependency relations: they share the same meronym/holonym and hypernym/hyponym relations. Unlike synonymy, hypernymy and meronymy relations can readily be acquired by applying lexico-syntactic patterns to a big corpus.
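The assumption that synonyms share the same meronym and hypernym relations can be made concrete with a simple set-overlap score. This is a toy Jaccard measure; the Turkish words and relation sets below are invented for illustration:

```python
def relation_overlap(rels_a, rels_b):
    """Jaccard overlap between two words' harvested relation sets
    (e.g. hypernyms and meronyms found by lexico-syntactic patterns).
    A high overlap suggests the words may be synonym candidates."""
    a, b = set(rels_a), set(rels_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# "araba" (car) vs. "otomobil" (automobile): invented relation sets
print(relation_overlap({"tasit", "motor", "teker"},
                       {"tasit", "motor", "kapi"}))  # 0.5
```

In the study itself, such relation features are combined with dependency features and fed to machine learning algorithms rather than thresholded directly.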
Such acquired relations can be utilized to ease the detection of synonymy. Likewise, we utilized particular dependency relations, such as being the object or subject of a verb. Machine learning algorithms were applied to all of these acquired features. The first aim is to find out which dependency and semantic features are the most informative and contribute most to the model. The performance of each feature is evaluated individually with cross-validation. The model that combines all features shows promising results and successfully detects the synonymy relation. The main contribution of the study is the integration of both semantic and dependency relations within a distributional framework. The second contribution is that this is the first major attempt at corpus-driven synonym identification for Turkish.

Item: Pattern and Semantic Similarity Based Automatic Extraction of Hyponym-Hypernym Relation from Turkish Corpus (IEEE, 2015)
Authors: Sahin, Gurkan; Diri, Banu; Yildiz, Tugba
Extraction of semantic relations from various resources (Wikipedia, the Web, corpora, etc.) is an important issue in natural language processing. This paper aims at automatic extraction of hyponym-hypernym pairs from a Turkish corpus. Pattern-based and semantic-similarity-based methods are used together. Patterns are extracted from initial hyponym-hypernym pairs, and using these patterns, hyponyms are extracted for various hypernyms. Incorrect candidate hyponyms are removed using document-frequency-based and semantic-similarity-based elimination methods. In experiments with 14 hypernyms, an average accuracy of 77% was obtained.

Item: Pronoun Resolution in Turkish Using Decision Tree and Rule-Based Learning Algorithms (Springer-Verlag Berlin, 2009)
Authors: Yildirim, Savas; Kilicaslan, Yilmaz; Yildiz, Tugba
This paper reports the results of pronoun resolution experiments performed by applying a decision tree and a rule-based algorithm to an annotated Turkish text.
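A minimal rule-based resolver in the spirit of the pronoun-resolution study can be sketched as follows. The feature names (number, animacy) are illustrative assumptions, not the paper's actual nine features:

```python
def resolve_pronoun(candidates, pronoun):
    """Return the first candidate antecedent (most recent mention first)
    that agrees with the pronoun in number and animacy; None otherwise.
    A toy rule; real resolvers learn over many such agreement features."""
    for cand in candidates:
        if (cand["number"] == pronoun["number"]
                and cand["animate"] == pronoun["animate"]):
            return cand["word"]
    return None

candidates = [  # most recent mention first
    {"word": "kitaplar", "number": "plural", "animate": False},
    {"word": "Ayse", "number": "singular", "animate": True},
]
print(resolve_pronoun(candidates, {"number": "singular", "animate": True}))
# Ayse
```

A decision-tree learner, as in the study, would induce rules of this shape automatically from annotated examples instead of hand-coding them.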
The text has been compiled, mostly from various popular children's stories, in a semi-automatic way. A knowledge-lean learning model has been devised using only the nine most commonly employed features. An evaluation and comparison of the performance of the two algorithms is offered in terms of the recall, precision, and F-measure metrics.

Item: Sentiment Analysis Using Learning Approaches over Emojis for Turkish Tweets (IEEE, 2018)
Authors: Velioglu, Riza; Yildiz, Tugba; Yildirim, Savas
With the rise of usage of and interest in social media platforms, emojis have become an increasingly important part of written language and one of the most important signals for micro-blog sentiment analysis. In this paper, we employed and evaluated classification models using two different representations, based on bag-of-words and fastText, to address the problem of sentiment analysis over emojis/emoticons for Turkish positive, negative, and neutral tweets. First, the bag-of-words approach is used as a simple and efficient baseline for tweet representation, with classifiers such as Naive Bayes, Logistic Regression, Support Vector Machines, and Decision Trees applied to these tweets. Second, we utilized fastText to represent tweets as word n-grams for the sentiment analysis problem. The results show that there is no significant difference between the two models: fastText achieves a 79% F1-score and the Logistic Regression classifier 77% for binary classification, while fastText achieves 62% and Logistic Regression 58% for multi-class classification. This is considered the first study in the literature to apply different vector representations, such as bag-of-words and fastText, to predict the sentiment of Turkish tweets over emojis.
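The bag-of-words baseline used in the sentiment study above can be sketched as a toy vectorizer. Whitespace tokenization is assumed; real pipelines normalize, filter, and weight tokens:

```python
def bag_of_words(tweets):
    """Map each tweet to a count vector over the sorted corpus vocabulary.
    Illustrative baseline representation; classifiers such as Logistic
    Regression are then trained on these vectors."""
    vocab = sorted({w for t in tweets for w in t.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in tweets:
        vec = [0] * len(vocab)
        for w in t.split():
            vec[index[w]] += 1
        vectors.append(vec)
    return vocab, vectors

vocab, vectors = bag_of_words(["good day", "bad day day"])
print(vocab)    # ['bad', 'day', 'good']
print(vectors)  # [[0, 1, 1], [1, 2, 0]]
```

fastText differs from this representation chiefly by learning dense embeddings over word and character n-grams rather than sparse counts.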
This study can also be utilized to predict emojis in a social media context in the future.

Item: A Study on Turkish Meronym Extraction Using a Variety of Lexico-Syntactic Patterns (Springer International Publishing AG, 2016)
Authors: Yildiz, Tugba; Yildirim, Savas; Diri, Banu
In this paper, we applied lexico-syntactic patterns to disclose the meronymy relation from a huge Turkish raw text. Once the system takes a huge raw corpus and extracts matched cases for a given pattern, it proposes a list of whole-part pairs based on their co-occurrence frequencies. For this purpose, we exploited and compared a list of pattern clusters. The clusters examined fall into three types: general patterns, dictionary-based patterns, and bootstrapped patterns. We evaluated how these patterns improve system performance, especially within a corpus-based approach and with the distributional features of words. Finally, we discuss all the experiments with a comparative analysis and show the advantages and disadvantages of the approaches, with promising results.
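Several of the meronym studies listed above rank candidate pairs with association metrics such as the t-score. In its usual count form it can be computed as below; the counts are invented for illustration:

```python
import math

def t_score(pair_count, x_count, y_count, total_tokens):
    """t-score in count form: t = (c_xy - c_x * c_y / N) / sqrt(c_xy),
    where c_xy is the pair's co-occurrence count, c_x and c_y the
    individual word counts, and N the corpus size in tokens."""
    expected = x_count * y_count / total_tokens
    return (pair_count - expected) / math.sqrt(pair_count)

# e.g. two words co-occurring 50 times in a 100,000-token sample
print(round(t_score(50, 300, 200, 100_000), 3))  # 6.986
```

Higher scores indicate co-occurrence well above chance, so ranking whole-part candidates by t-score pushes reliable pairs to the top of the proposed list.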