Project: Information Retrieval-Based Question Answering System
The project was done while studying the "Introduction to Natural Language Processing" unit.
The project includes the following sections:
- Read question-answer data: Question-answer pairs were collected from classmates and categorized by topic.
- Extract text data features: The TF-IDF technique was used to extract features.
- Train a model to predict the topic of the input question.
- Find answers for input questions:
  - The input question is fed into the model to predict its topic.
  - Cosine similarity is used to search for similar questions with the same topic in the available question-answer dataset.
  - Display the answer of the question with the highest similarity in the dataset.
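
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy `qa_pairs` data, the `answer_question` helper, and the choice of `LogisticRegression` as the topic classifier are all assumptions made for the example, since the list above does not say which model was trained.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data in the form (topic, question, answer).
# The real project reads these pairs from text files grouped by topic.
qa_pairs = [
    ("weather", "Will it rain tomorrow?", "Check the forecast for tomorrow."),
    ("weather", "How hot is it today?", "It is about 30 degrees today."),
    ("food", "What is a good pizza topping?", "Mushrooms are a popular topping."),
    ("food", "How do I cook rice?", "Boil the rice in twice its volume of water."),
]
topics = [topic for topic, _, _ in qa_pairs]
questions = [question for _, question, _ in qa_pairs]
answers = [answer for _, _, answer in qa_pairs]

# Extract TF-IDF features from the stored questions.
vectorizer = TfidfVectorizer()
question_vectors = vectorizer.fit_transform(questions)

# Train a topic classifier (assumption: logistic regression as a stand-in).
classifier = LogisticRegression()
classifier.fit(question_vectors, topics)


def answer_question(query):
    """Predict the topic of `query`, then return (topic, best answer)."""
    query_vector = vectorizer.transform([query])
    predicted_topic = classifier.predict(query_vector)[0]

    # Restrict the search to stored questions with the predicted topic.
    candidates = [i for i, t in enumerate(topics) if t == predicted_topic]

    # Cosine similarity between the query and each candidate question.
    scores = cosine_similarity(query_vector, question_vectors[candidates])[0]
    best = candidates[int(np.argmax(scores))]
    return predicted_topic, answers[best]


topic, answer = answer_question("How do I cook rice?")
print(topic, "->", answer)
```

Restricting the similarity search to the predicted topic keeps the candidate set small, at the cost of returning a wrong answer whenever the classifier picks the wrong topic.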
Library | Version |
---|---|
numpy | 1.19.5 |
scipy | 1.5.4 |
keras_preprocessing | 1.1.2 |
sklearn | 0.19.0 |
- retrieval_based_qa.ipynb file: the main Jupyter Notebook file of the project.
- chatbot folder: text files containing the question-answer pairs, categorized by topic.
Refer to the section 'Information Retrieval based chatbots (IR-based)' at: Tìm hiểu và xây dựng hệ thống chatbot trong thực tế (Understanding and building a chatbot system in practice)
The project was done by a group of 3 members: