Complex Systems Research Laboratory (SoSLab), CMPE, Boğaziçi University

Grand National Assembly of Turkish Parliament's Transcripts

A processable, digitized collection of the transcripts of the Grand National Assembly of Turkey (TBMM) from 1920 to 2015. The texts are encoded in Unicode. The corpus and the source code, along with related resources, can be accessed at:

Turkish Parliament Texts (v0.4b) (released corpus)
Turkish Parliament Texts (Source code on GitHub)

Neural NER Tagger for Morphologically Rich Languages

This repository contains a Bi-LSTM-based sequence tagger, implemented in both TensorFlow and DyNet, that can combine several sources of information about each word unit (word embeddings, character-based embeddings, and morphological tags from a finite-state transducer (FST)) to obtain a representation of that word unit. The details of the method are explained in:

Gungor, O., Yildiz, E., Uskudarli, S., & Gungor, T. (2017). Morphological Embeddings for Named Entity Recognition in Morphologically Rich Languages. arXiv preprint arXiv:1706.00506.

Repository: https://github.com/onurgu/ner-tagger-dynet
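To illustrate how several information sources can be merged into one vector per word before the sentence-level Bi-LSTM, here is a minimal sketch that simply concatenates a word embedding, a character-based vector, and a morphological-tag vector. It is not the repository's code: the lookup tables are random toy vectors, the helper names (`make_table`, `word_representation`) are hypothetical, and averaging stands in for the learned character and tag encoders the real tagger trains. The tag names are illustrative Turkish morphological tags.

```python
import random

DIM = 4  # toy embedding size (hypothetical)


def make_table(keys, dim=DIM, seed=0):
    """Toy lookup table of random vectors, a stand-in for learned embeddings."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-1, 1) for _ in range(dim)] for k in keys}


def mean(vectors, dim=DIM):
    """Element-wise average of a list of vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def word_representation(word, tags, word_emb, char_emb, tag_emb):
    """Concatenate word-level, character-level and morphological-tag features."""
    key = word.lower()
    w = word_emb.get(key, [0.0] * DIM)                      # unknown word -> zero vector
    c = mean([char_emb[ch] for ch in key if ch in char_emb])  # placeholder for a char encoder
    m = mean([tag_emb[t] for t in tags if t in tag_emb])      # placeholder for a tag encoder
    return w + c + m                                         # list concatenation: 3 * DIM values


# Usage with toy tables: "Ankara'ya" ("to Ankara") with an illustrative FST analysis.
word_emb = make_table(["ankara", "gitti"], seed=1)
char_emb = make_table("abcçdefgğhıijklmnoöprsştuüvyz", seed=2)
tag_emb = make_table(["Noun", "Prop", "A3sg", "Pnon", "Dat"], seed=3)
tags = ["Noun", "Prop", "A3sg", "Pnon", "Dat"]
vec = word_representation("Ankara'ya", tags, word_emb, char_emb, tag_emb)
```

Concatenation keeps each source in its own slice of the vector, so a word missing from the word-embedding table (here the inflected form "ankara'ya") can still be represented through its characters and morphological tags.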

Neural Turkish morphological disambiguator

This tool disambiguates the potential morphological analyses of a Turkish word given its context. It represents each word by combining character-based embeddings of its surface form with word embeddings. Each candidate morphological analysis is modeled as a composition of two embeddings: one computed from the character sequence of its root and one computed from the morpheme sequence of the analysis. This is a re-implementation of:
Shen, Q., Clothiaux, D., Tagtow, E., Littell, P., & Dyer, C. (2016). The Role of Context in Neural Morphological Disambiguation. In COLING (pp. 181-191).

Repository: https://github.com/onurgu/neural-turkish-morphological-disambiguator
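The scoring idea can be sketched as follows: each candidate analysis becomes the concatenation of a root vector (built from the root's characters) and a morpheme-sequence vector, and the candidate that best matches a context vector is selected. This is only an illustration under stated assumptions, not the repository's model: the embeddings are random toy vectors, averaging replaces the learned sequence encoders, the dot-product score is a simple stand-in, and the context vector is written by hand where the real system would compute it from the sentence. All helper names are hypothetical.

```python
import random

DIM = 3  # toy embedding size (hypothetical)


def table(keys, seed):
    """Toy lookup table of random vectors, a stand-in for learned embeddings."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-1, 1) for _ in range(DIM)] for k in keys}


def mean(vectors):
    """Element-wise average; zeros if the list is empty."""
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def analysis_embedding(root, morphemes, char_emb, morph_emb):
    """Compose an analysis from a root-character vector and a morpheme vector."""
    root_vec = mean([char_emb[ch] for ch in root if ch in char_emb])
    morph_vec = mean([morph_emb[m] for m in morphemes if m in morph_emb])
    return root_vec + morph_vec  # concatenation: 2 * DIM values


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def score(context_vec, candidate, char_emb, morph_emb):
    root, morphemes = candidate
    return dot(context_vec, analysis_embedding(root, morphemes, char_emb, morph_emb))


def disambiguate(context_vec, candidates, char_emb, morph_emb):
    """Return the (root, morphemes) candidate with the highest score."""
    return max(candidates, key=lambda cand: score(context_vec, cand, char_emb, morph_emb))


# Usage: illustrative candidate analyses for the surface form "masalı".
char_emb = table("abcçdefgğhıijklmnoöprsştuüvyz", seed=1)
morph_emb = table(["Noun", "A3sg", "Pnon", "P3sg", "Nom", "Acc", "Adj", "With"], seed=2)
candidates = [
    ("masal", ["Noun", "A3sg", "Pnon", "Acc"]),                # "the tale" (accusative)
    ("masal", ["Noun", "A3sg", "P3sg", "Nom"]),                # "his/her tale"
    ("masa", ["Noun", "A3sg", "Pnon", "Nom", "Adj", "With"]),  # "with a table"
]
context_vec = [0.5, -0.2, 0.1, 0.3, 0.4, -0.1]  # hand-written stand-in for a context encoder
best = disambiguate(context_vec, candidates, char_emb, morph_emb)
```

Because the root and the morpheme sequence are embedded separately, two analyses that share a root (the first two candidates) still receive different vectors through their morpheme parts.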

S-BounTI: Extracting semantic topics from microblogs

See topic browser and query interface

Topico ontology OWL file
Topics published in the semantic Web
Evaluation annotations of topics

S-BounTI is a topic identification approach that identifies the topics discussed by a crowd of microblog users. It represents topics using the Topico ontology, which is designed to express microblog topics.

S-BounTI and Topico are products of Ahmet Yildirim's PhD work under the supervision of Suzan Uskudarli, both members of SoSLab in the Department of Computer Engineering, Bogazici University, Istanbul, Turkey.

If you have a set of tweets and want to produce semantic topics, you can contact us.

Boun-TI: Identifying Topics in Microblogs Using Wikipedia

Includes tf values, word frequency values for computing idf values, and the evaluation data of our article.

See details

Download dataset

This dataset provides the topics identified by our approach, BOUN-TI, on data collected from Twitter during the 2012 U.S. presidential debates. It also provides the tf values of words in a Wikipedia snapshot and the values required to compute the idf values of words, as well as the word frequency distribution of English tweets from the Twitter public stream over a time interval.
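As a reminder of how tf and idf values are typically combined, the sketch below uses the textbook weighting tf × log(N / df). This is a generic illustration, not necessarily the exact formula used in BOUN-TI (the article describes the actual method), and the document frequencies and tokens in the usage example are made up for illustration.

```python
import math
from collections import Counter


def idf(doc_freq, num_docs):
    """Standard inverse document frequency: log(N / df) for every word with df > 0."""
    return {w: math.log(num_docs / df) for w, df in doc_freq.items() if df > 0}


def tf_idf(tokens, idf_values):
    """Weight each token by its raw term frequency times its idf."""
    tf = Counter(tokens)
    return {w: tf[w] * idf_values[w] for w in tf if w in idf_values}


# Usage with made-up counts: 1000 documents in the reference collection.
doc_freq = {"debate": 10, "the": 1000, "obama": 50}
weights = tf_idf(["the", "debate", "the", "obama", "debate"], idf(doc_freq, 1000))
```

A word that appears in every document ("the" here) gets an idf of log(1) = 0, so it contributes nothing, while rarer words such as "debate" dominate the weighting.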

If you have a set of tweets and want to produce human readable topics represented by Wikipedia page titles, you can contact us.

If you use this dataset or our approach, please cite this article as follows:

Ahmet Yıldırım, Suzan Üsküdarlı, Arzucan Özgür, Identifying Topics in Microblogs Using Wikipedia, PLOS ONE, 11, 3, pp.1-20, 2016, Public Library of Science, doi: 10.1371/journal.pone.0151885

Accelerometer Based Calculator For Visually-Impaired People Using Mobile Devices

See details

Data Set (which is explained below) (.plist): Download Data Set

Training results (which are explained below) (.plist): Download Training Result

Gesture Recognition Framework (made with Objective-C and Xcode) used in both the training and classification parts (.framework): Download Framework

Gesture Recognition Project: Download Xcode Project

Thesis Document (.pdf) : Thesis in PDF

With the growing popularity of new interface devices such as accelerometer-based game controllers and touch-screen smartphones, the need for new accessibility options for these interfaces has become pressing. Previous studies suggested using accelerometer-based gesture recognition on touch-screen smartphones as a new interface that lets visually impaired people use touch-screen keyboards. However, almost all studies that report high accuracy rely on user-dependent classification or very limited gesture sets.

Thanks to Dogukan Erenel

Reuters News Co-occurrence Network

Download

We investigate the social network of co-occurrence in the Reuters-21578 corpus, which consists of news articles that appeared on the Reuters newswire in 1987. People are represented as vertices, and two people are connected if they co-occur in the same article.
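The construction rule above (people are vertices, and an edge links two people who appear in the same article) can be sketched in a few lines. This is an illustrative sketch, not the code used to build the released data set: it assumes person names have already been extracted from each article, it additionally counts how many articles each pair shares (the unweighted network is just the set of keys), and the names in the usage example are placeholders.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(articles):
    """Build {(a, b): shared_article_count} with a < b from lists of person names."""
    edges = defaultdict(int)
    for people in articles:
        # Deduplicate within an article, then link every unordered pair once.
        for a, b in combinations(sorted(set(people)), 2):
            edges[(a, b)] += 1
    return dict(edges)


def degree(edges):
    """Number of distinct co-occurring partners for each person."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


# Usage with placeholder names.
articles = [["A", "B", "C"], ["B", "C"], ["D"]]
edges = cooccurrence_network(articles)
deg = degree(edges)
```

Note that a person who never shares an article with anyone ("D" above) produces no edge, so isolated vertices have to be tracked separately if they are needed.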

Thanks to Arzucan Özgür.

If you use this data set, please cite it as follows:

Arzucan Özgür, Haluk Bingol, Social Network of Co-occurrence in News Articles, LNCS 3280, pp. 688, 2004.
Arzucan Özgür, Burak Cetin, Haluk Bingol, Co-occurrence Network of Reuters News, IJMPC, 19, 5, pp. 689-702, 2008. arXiv:0712.2491

Effects Of Obstruction On The Social Influence Model

Download code

We extend Axelrod’s social influence model to incorporate an obstruction into the network in the form of a wall with a door in it. To investigate the effects of different types of obstruction on the emergence of cultural regions, we examine various wall and door settings (in terms of their locations and the width of the door) in a fixed-size network, for agents with a fixed cultural vector length and for various heterogeneity levels (ranges of cultural possibilities).

Thanks to Neval Polat Eden

Downloads