This repository aggregates datasets for developing conversational AI techniques. It covers the research tasks of open-domain conversation, conversational recommendation, and conversational search.

Dataset | #dialogues | collection | year | download |
---|---|---|---|---|
QuAC | 13,569 | Crowdsourcing | 2018 | Download |
MANtIS | 80,324 | Stack Exchange | 2019 | Download |
CoQA | 8,399 | Crowdsourcing | 2019 | Download |
ShARC | 948 | Crowdsourcing | 2018 | Download |
MSDialog | 2,199 | Microsoft Community | 2018 | Download |

Dataset | #dialogues | corpus size | collection | year | download |
---|---|---|---|---|---|
CAsT-19,20,21,22 | 30 - 50 | 38,426,252 | Crowdsourcing | 2019 | Download |
OR-QuAC | 5,644 | 11,377,951 | Update QuAC for self-containment | 2020 | Download |

Dataset | #dialogues | #utterances | domain | collection | language | year | download |
---|---|---|---|---|---|---|---|
ReDial | 10,006 | 182,150 | Movie | Amazon Mechanical Turk (AMT) | ENG | 2018 | Download |
OpenDialKG | 12,320 | 71,873 | Movies & Books | KG-walk Crowdsourcing | ENG | 2019 | Download |
INSPIRED | 1,001 | 35,811 | Movie | Social-encouraged crowdsourcing (AMT) | ENG | 2020 | Download |
TG-ReDial | 10,000 | 129,392 | Movie | Topic-driven generation, crowdsourcing | CHN | 2020 | Download |
DuRecDial2.0 | 16,482 | 255,346 | Movie, music, star, food, restaurant, weather | translation from DuRecDial (crowdsourced) | ENG, CHN | 2021 | Download |
INSPIRED2 | 1,001 | 35,811 | Movie | clean & augment INSPIRED | ENG | 2022 | Download |
U-NEED | 7,698 | 53,712 | e-commerce | pre-sale dialogues from Taobao | CHN | 2023 | Download |
PEARL | 57,277 | 548,061 | Movie | review-based synthetic dialogues | ENG | 2024 | Download |

Dataset | #dialogues | #utterances | #domains | collection | language | year | download |
---|---|---|---|---|---|---|---|
MultiWoZ | 8,438 | 113,556 | 7 | Wizard-of-Oz | ENG | 2018 | Download |
SGD | 16,142 | 329,964 | 16 | outline simulation then crowdsourced paraphrasing | ENG | 2020 | Download |

Dataset | Paper | Link |
---|---|---|
MG-ShopDial | MG-ShopDial: A Multi-Goal Conversational Dataset for e-Commerce | link |

Dataset | Paper | Link |
---|---|---|
DialogStudio | DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI | link |