kesimeg/awesome-turkish-language-models

awesome-turkish-language-models

A curated list of Turkish AI models, datasets, and papers

The purpose of this repo is to share and spread information about Turkish AI models, datasets, and papers. Such resources are few and scattered across the web, and this repo aims to bring a curated selection of them together. This is a selection, not a list of every Turkish NLP/LLM model or dataset, so not every BERT- or LLaMA-based model will make it here. The same applies to low-quality Google Translate versions of datasets. We aim for each entry to have something unique about it. This can be model performance, uniqueness of the task, highlighting the groups/companies that share their work (not everyone shares their stuff, so why not appreciate it!), etc. If you want to add anything, you are welcome 😏; please check out the Contributing section.

Table of Contents

Models

LLMs

  1. ytu-ce-cosmos/Turkish-Llama
  2. Trendyol/Llama-3-Trendyol-LLM-8b-chat-v2.0
  3. TURKCELL/Turkcell-LLM-7b-v1
  4. KOCDIGITAL/Kocdigital-LLM-8b-v0.1
  5. WiroAI/OpenR1-Qwen-7B-Turkish Reasoning model
  6. WiroAI/wiroai-turkish-llm-9b

VLMs

  1. ytu-ce-cosmos/Turkish-LLaVA

NLP

  1. Trendyol/tybert
  2. Trendyol/tyroberta
  3. ytu-ce-cosmos/turkish-base-bert-uncased
  4. ytu-ce-cosmos/turkish-colbert
  5. ytu-ce-cosmos/turkish-gpt2-large
  6. dbmdz/bert-base-turkish-128k-uncased
  7. TURKCELL/bert-offensive-lang-detection-tr
  8. asafaya/kanarya-2b
  9. boun-tabi-LMG/TURNA
  10. Helsinki-NLP group Lots of translation models for Turkish
  11. VRLLab/TurkishBERTweet Tweet sentiment analysis
  12. akdeniz27/bert-base-turkish-cased-ner

Speech models

To be added

Multi-modal models

  1. kesimeg/lora-turkish-clip CLIP model fine-tuned on a Turkish dataset

Datasets

Text only

  1. merve/turkish_instructions Instruction tuning dataset
  2. BrewInteractive/alpaca-tr Instruction tuning dataset
  3. AYueksel/TurkishMMLU
  4. Metin/WikiRAG-TR
  5. MBZUAI/Bactrian-X
  6. alibayram/turkish_mmlu
  7. Helsinki-NLP group Lots of translation datasets for Turkish
  8. ytu-ce-cosmos/gsm8k_tr
  9. turkish-nlp-suite/turkish-wikiNER
  10. turkish-nlp-suite/InstrucTurca
  11. WiroAI/dolphin-r1-turkish Reasoning dataset
  12. allenai/c4 Web scrape
  13. HPLT/HPLT2.0_cleaned Web scrape
  14. unimelb-nlp/wikiann NER

Text & Images

  1. ytu-ce-cosmos/Turkish-LLaVA-Finetune
  2. ytu-ce-cosmos/Turkish-LLaVA-Pretrain
  3. ytu-ce-cosmos/turkce-kitap
  4. 99eren99/LLaVA1.5-Data-Turkish
  5. TasvirEt

Text & Speech

  1. mozilla-foundation/common_voice_17_0 Older versions (v16, v15, etc.) are also available

Papers

  1. Cosmos-LLaVA: Chatting with the Visual
  2. Introducing cosmosGPT: Monolingual Training for Turkish Language Models
  3. TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish

Benchmarks

  1. malhajar/OpenLLMTurkishLeaderboard_v0.2
  2. KUIS-AI/Cetvel

Tutorials and Codes

To be added

Tools and APIs

  1. Glosbe
  2. Wiktionary
  3. Zemberek Some Turkish NLP tools
  4. 3rt4nm4n/turkish-apis A list of Turkish APIs

State of AI in Türkiye

  1. KUIS-AI YouTube channel
  2. TR-AI YouTube channel
  3. Trendyol Tech YouTube channel Has videos related to their AI products

Miscellaneous

  1. Mukayese: Turkish NLP Strikes Back
  2. Mukayese GitHub repo
  3. Wikipedia dumps Can be used as a dataset

Contributing

If you have anything to add here, just make a pull request! Before making one, please consider whether the model/dataset/etc. has enough quality/uniqueness. Hugging Face is crowded with fine-tunes of LLaMA and BERT, and the same applies to datasets: many have multiple machine-translated versions. This makes it hard to find good-quality sources. We want to keep this list as curated as possible while still covering enough sources.
