Grand National Assembly of Turkish Parliament's Transcripts

A processable, digitized collection of the transcripts of the Grand National Assembly of Turkish Parliament (TBMM) from 1920 to 2015. The texts are represented with universal character coding. The corpus, the source code, and related resources can be accessed at:

Neural NER Tagger for Morphologically Rich Languages

This repository contains a Bi-LSTM based sequence tagger, implemented in both Tensorflow and Dynet. To build the representation of each word unit, it can combine several sources of information about that word: word embeddings, character-based embeddings, and morphological tags from an FST. The details of the methods are explained in:

Gungor, O., Yildiz, E., Uskudarli, S., & Gungor, T. (2017). Morphological Embeddings for Named Entity Recognition in Morphologically Rich Languages. arXiv preprint arXiv:1706.00506.
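
As a rough illustration of combining several information sources into one word representation, the sketch below concatenates a word vector, a character-based vector, and a morphological-tag vector. It is a minimal sketch under stated assumptions, not the repository's code: the helper names, the toy vectors, and the tag labels are hypothetical, and simple averaging stands in for the learned Bi-LSTM encoders.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector], dim: int) -> Vector:
    """Average a list of vectors; return a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def word_unit_representation(
    word: str,
    morph_tags: List[str],
    word_vectors: Dict[str, Vector],
    char_vectors: Dict[str, Vector],
    tag_vectors: Dict[str, Vector],
    word_dim: int,
    char_dim: int,
    tag_dim: int,
) -> Vector:
    """Concatenate word, character-based, and morphological-tag vectors."""
    # Unknown words fall back to a zero vector (an assumption of this sketch).
    word_part = word_vectors.get(word, [0.0] * word_dim)
    # Averaging is a stand-in for a character-level Bi-LSTM encoder.
    char_part = mean_vector([char_vectors[c] for c in word if c in char_vectors], char_dim)
    # Averaging is a stand-in for an encoder over the morphological tag sequence.
    tag_part = mean_vector([tag_vectors[t] for t in morph_tags if t in tag_vectors], tag_dim)
    return word_part + char_part + tag_part


# Toy lookup tables (hypothetical values, not trained embeddings).
word_vectors = {"evde": [0.1, 0.2]}
char_vectors = {"e": [1.0, 0.0], "v": [0.0, 1.0], "d": [1.0, 1.0]}
tag_vectors = {"Noun": [0.5], "Loc": [1.5]}

rep = word_unit_representation(
    "evde", ["Noun", "Loc"], word_vectors, char_vectors, tag_vectors, 2, 2, 1
)
print(rep)
```

In a tagger of this kind, such a vector would be computed for every word in the sentence and fed to the sentence-level Bi-LSTM.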


Neural Turkish morphological disambiguator

This tool disambiguates the candidate morphological analyses of a Turkish word given its context. Each word is represented by combining its word embedding with a character-based embedding of its surface form. Each candidate analysis is modeled as a composition of two embeddings: one based on the character sequence of its root and one based on the morpheme sequence of the analysis. This is a re-implementation of:
Shen, Q., Clothiaux, D., Tagtow, E., Littell, P., & Dyer, C. (2016). The Role of Context in Neural Morphological Disambiguation. In COLING (pp. 181-191).
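
The selection step can be sketched as scoring each candidate analysis against a context vector and picking the highest-scoring one. This is a minimal sketch under stated assumptions, not the tool's implementation: the function names and toy vectors are hypothetical, composition is done by element-wise addition, and scoring uses a dot product followed by a softmax.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def compose(root_vec: Vector, morph_vec: Vector) -> Vector:
    """Compose root and morpheme-sequence embeddings (element-wise sum, an assumption)."""
    return [r + m for r, m in zip(root_vec, morph_vec)]


def disambiguate(
    context_vec: Vector, candidates: Dict[str, Tuple[Vector, Vector]]
) -> Tuple[str, Dict[str, float]]:
    """Return the most probable analysis and the softmax distribution over candidates."""
    scores = {}
    for analysis, (root_vec, morph_vec) in candidates.items():
        analysis_vec = compose(root_vec, morph_vec)
        scores[analysis] = sum(c * a for c, a in zip(context_vec, analysis_vec))
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {a: math.exp(s - top) for a, s in scores.items()}
    total = sum(exps.values())
    probs = {a: e / total for a, e in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs


# "evin" can mean "of the house" (genitive) or "your house" (2nd person possessive).
# The vectors below are toy values chosen for illustration only.
candidates = {
    "ev+Noun+A3sg+Pnon+Gen": ([0.5, 0.5], [0.5, 0.0]),
    "ev+Noun+A3sg+P2sg+Nom": ([0.5, 0.5], [-0.5, 1.0]),
}
context_vec = [1.0, 0.0]

best, probs = disambiguate(context_vec, candidates)
print(best, probs)
```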


S-BounTI: Extracting semantic topics from microblogs

See topic browser and query interface

Topico ontology owl file
Topics published in the semantic Web
Evaluation annotations of topics

S-BounTI is an approach for identifying the topics discussed by a crowd of microblog users. It represents topics using the Topico ontology, which is designed to express microblog topics.

S-BounTI and Topico are products of Ahmet Yildirim's PhD work under the supervision of Suzan Uskudarli, both members of SoSLab in the Department of Computer Engineering, Bogazici University, Istanbul, Turkey.

If you have a set of tweets and want to produce semantic topics, you can contact us.

Boun-TI: Identifying Topics in Microblogs Using Wikipedia

This dataset contains tf values, the word frequency values needed to compute idf values, and the evaluation data from our article.

See details

Download dataset


This dataset provides the topics identified by our approach, BOUN-TI, from data collected from Twitter while the 2012 U.S. presidential debates were being held. It also provides tf values of words in a Wikipedia snapshot and the values required to compute idf values of words. The word frequency distribution over an interval of English tweets from Twitter's public stream is also provided.
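
For readers unfamiliar with how tf and idf values combine, the sketch below computes tf-idf scores from term counts and document frequencies. It is a minimal sketch using the textbook formula idf = log(N / df), which may differ from the exact weighting used in the article; the function names, counts, and tokens are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def inverse_document_frequency(doc_freq: Dict[str, int], num_docs: int) -> Dict[str, float]:
    """idf(w) = log(N / df(w)) for every word with a known document frequency."""
    return {word: math.log(num_docs / df) for word, df in doc_freq.items() if df > 0}


def tf_idf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Weight each token's raw count (tf) by its idf; words without an idf are skipped."""
    tf = Counter(tokens)
    return {word: count * idf[word] for word, count in tf.items() if word in idf}


# Toy numbers: 1000 documents; "the" occurs in all of them, "debate" in only 10.
doc_freq = {"the": 1000, "debate": 10}
idf = inverse_document_frequency(doc_freq, num_docs=1000)

scores = tf_idf(["the", "debate", "the", "debate", "debate"], idf)
print(scores)
```

A word that appears in every document gets an idf of zero, so only the distinctive word contributes to the score.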

If you have a set of tweets and want to produce human readable topics represented by Wikipedia page titles, you can contact us.

If you use this dataset or our approach, you can cite this article as follows:

  • Ahmet Yıldırım, Suzan Üsküdarlı, Arzucan Özgür, Identifying Topics in Microblogs Using Wikipedia, PLOS ONE, 11, 3, pp.1-20, 2016, Public Library of Science, doi: 10.1371/journal.pone.0151885

Accelerometer Based Calculator For Visually-Impaired People Using Mobile Devices

See details

  • Data Set (which is explained below) (.plist): Download Data Set
  • Training results (which is explained below) (.plist): Download Training Result
  • Gesture Recognition Framework (made with Objective-C and Xcode) used in both training and classification parts (.framework): Download Framework
  • Gesture Recognition Project : Download Xcode Project
  • Thesis Document (.pdf) : Thesis in PDF

    With the growing popularity of new interface devices such as accelerometer-based game controllers and touch-screen smartphones, the need for new accessibility options for these interfaces has become pressing. Previous studies proposed using an accelerometer-based gesture recognition system on touch-screen smartphones as a new interface that lets visually-impaired people use touch-screen keyboards. However, almost all studies that report high accuracy use user-dependent classification or very limited gesture sets.

    Thanks to Dogukan Erenel

    Reuters News Co-occurrence Network


    We investigate the social network of co-occurrence in Reuters-21578 corpus, which consists of news articles that appeared in the Reuters newswire in 1987. People are represented as vertices and two persons are connected if they co-occur in the same article.
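
The construction described above can be sketched as follows. This is a minimal sketch, not the code used to build the data set: it assumes the person names in each article have already been extracted, and the function name and placeholder names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set


def build_cooccurrence_network(articles: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """Return an undirected graph as an adjacency mapping: person -> co-occurring persons."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for people in articles:
        unique_people = sorted(set(people))
        # Register every person as a vertex, even if they never co-occur with anyone.
        for person in unique_people:
            graph[person]
        # Connect every pair of persons mentioned in the same article.
        for a, b in combinations(unique_people, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Each inner list holds the persons mentioned in one article (placeholder names).
articles = [["A", "B", "C"], ["B", "D"], ["E"]]
network = build_cooccurrence_network(articles)
print(network)
```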

    Thanks to Arzucan Özgür.

    If you use this data set please cite as follows:

    Effects Of Obstruction On The Social Influence Model

    Download code

    We extend Axelrod’s social influence model to incorporate an obstruction into the network in the form of a wall with a door in it. To investigate the effects of different types of obstructions on the emergence of cultural regions, we examine various wall and door settings (in terms of their locations and the width of the door) in a fixed-size network, for agents with a fixed-length cultural vector and for various heterogeneity levels (ranges of cultural possibilities).
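
One way to picture the setup is a grid version of Axelrod's model in which a wall blocks interaction between two adjacent columns except at the door rows. The sketch below is a minimal illustration under stated assumptions, not the downloadable code: the grid layout, the vertical wall placement, and all names and parameter values are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]


def blocked(a: Cell, b: Cell, wall_col: int, door_rows: Set[int]) -> bool:
    """True if the wall between columns wall_col and wall_col + 1 separates a and b."""
    (r1, c1), (r2, c2) = a, b
    crosses_wall = r1 == r2 and {c1, c2} == {wall_col, wall_col + 1}
    return crosses_wall and r1 not in door_rows


def neighbors(cell: Cell, height: int, width: int, wall_col: int, door_rows: Set[int]) -> List[Cell]:
    """Von Neumann neighbors inside the grid that are not cut off by the wall."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [
        n for n in candidates
        if 0 <= n[0] < height and 0 <= n[1] < width
        and not blocked(cell, n, wall_col, door_rows)
    ]


def step(grid: Dict[Cell, List[int]], height: int, width: int,
         wall_col: int, door_rows: Set[int], rng: random.Random) -> None:
    """One Axelrod update: interact with probability equal to cultural similarity."""
    cell = (rng.randrange(height), rng.randrange(width))
    nbrs = neighbors(cell, height, width, wall_col, door_rows)
    if not nbrs:
        return
    mine, theirs = grid[cell], grid[rng.choice(nbrs)]
    differing = [i for i in range(len(mine)) if mine[i] != theirs[i]]
    similarity = 1 - len(differing) / len(mine)
    if differing and rng.random() < similarity:
        # Copy one differing feature from the neighbor.
        i = rng.choice(differing)
        mine[i] = theirs[i]


rng = random.Random(0)
height, width, features, traits = 5, 6, 3, 4  # traits = heterogeneity level
grid = {(r, c): [rng.randrange(traits) for _ in range(features)]
        for r in range(height) for c in range(width)}
for _ in range(2000):
    step(grid, height, width, wall_col=2, door_rows={2}, rng=rng)
```

Widening the door corresponds to adding more rows to `door_rows`, and moving the wall corresponds to changing `wall_col`.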

    Thanks to Neval Polat Eden