A curated list of Turkish AI models, datasets, and papers
The purpose of this repo is to share and spread information about Turkish AI models, datasets, and papers. Turkish resources of this kind are few and scattered across the web. This repo aims to bring a curated selection of them together. It is a selection, not a list of all Turkish NLP/LLM models or datasets, so not every BERT- or LLaMA-based model will make it here. The same applies to low-quality Google Translate translations of datasets. We aim for each entry to have some unique element of its own. This can be model performance, uniqueness of the task, highlighting the groups/companies that publish their work (not everyone shares their stuff, so why not appreciate it!), etc. If you want to add anything, you are welcome 😏. Please check out the contributing section.
- ytu-ce-cosmos/Turkish-Llama
- Trendyol/Llama-3-Trendyol-LLM-8b-chat-v2.0
- TURKCELL/Turkcell-LLM-7b-v1
- KOCDIGITAL/Kocdigital-LLM-8b-v0.1
- WiroAI/OpenR1-Qwen-7B-Turkish Reasoning model
- WiroAI/wiroai-turkish-llm-9b
- Trendyol/tybert
- Trendyol/tyroberta
- ytu-ce-cosmos/turkish-base-bert-uncased
- ytu-ce-cosmos/turkish-colbert
- ytu-ce-cosmos/turkish-gpt2-large
- dbmdz/bert-base-turkish-128k-uncased
- TURKCELL/bert-offensive-lang-detection-tr
- asafaya/kanarya-2b
- boun-tabi-LMG/TURNA
- Helsinki-NLP group Lots of translation models for Turkish
- VRLLab/TurkishBERTweet Tweet sentiment analysis
- akdeniz27/bert-base-turkish-cased-ner
To be added
- kesimeg/lora-turkish-clip CLIP model fine-tuned on a Turkish dataset
- merve/turkish_instructions Instruction tuning dataset
- BrewInteractive/alpaca-tr Instruction tuning dataset
- AYueksel/TurkishMMLU
- Metin/WikiRAG-TR
- MBZUAI/Bactrian-X
- alibayram/turkish_mmlu
- Helsinki-NLP group Lots of translation datasets for Turkish
- ytu-ce-cosmos/gsm8k_tr
- turkish-nlp-suite/turkish-wikiNER
- turkish-nlp-suite/InstrucTurca
- WiroAI/dolphin-r1-turkish Reasoning dataset
- allenai/c4 Web scrape
- HPLT/HPLT2.0_cleaned Web scrape
- unimelb-nlp/wikiann NER
- ytu-ce-cosmos/Turkish-LLaVA-Finetune
- ytu-ce-cosmos/Turkish-LLaVA-Pretrain
- ytu-ce-cosmos/turkce-kitap
- 99eren99/LLaVA1.5-Data-Turkish
- TasvirEt
- mozilla-foundation/common_voice_17_0 Older versions (v16, v15, etc.) are also available
- Cosmos-LLaVA: Chatting with the Visual
- Introducing cosmosGPT: Monolingual Training for Turkish Language Models
- TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish
To be added
- Glosbe
- Wiktionary
- Zemberek Some Turkish NLP tools
- 3rt4nm4n/turkish-apis A list of Turkish APIs
- KUIS-AI YouTube channel
- TR-AI YouTube channel
- Trendyol Tech YouTube channel Has videos related to their AI products
- Mukayese: Turkish NLP Strikes Back
- Mukayese GitHub repo
- Wikipedia dumps Can be used as a dataset
If you have anything to add here, just make a pull request! Before doing so, please consider whether the model/dataset/etc. has enough quality or uniqueness. Hugging Face is crowded with fine-tunes of LLaMA and BERT, and the same applies to datasets: many have multiple machine-translated versions. This makes it hard to find good-quality sources. We want to keep this list as curated as possible while still covering enough sources.