haram082/Duolingo_Spaced_Repetition_Models

Math of Big Data Project

This project uses the Duolingo learning dataset to analyze how exposure to words and phrases affects the forgetting curve, a mathematical model of how recall of learned information declines over time.

Methodology

The methodology for this project is based on Duolingo's half-life regression model. This model estimates how long a word or phrase stays in a learner's memory from the number of times the user has been exposed to it and from their performance on related exercises. Analyzing this data shows how exposure relates to the rate of forgetting.
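As a rough illustration of the idea behind half-life regression, the sketch below predicts recall as p = 2^(-Δ/h), where Δ is the time since the last practice and the half-life h = 2^(θ·x) is computed from a weight vector θ and a feature vector x. The function names, weights, and feature values are hypothetical and chosen only for illustration; they are not taken from this repository.

```python
import math

def half_life(weights, features):
    """Estimated half-life in days: h = 2 ** (weights . features)."""
    dot = sum(w * x for w, x in zip(weights, features))
    return 2.0 ** dot

def recall_probability(delta_days, h):
    """Predicted recall after delta_days: p = 2 ** (-delta_days / h)."""
    return 2.0 ** (-delta_days / h)

# Hypothetical weights and features, for illustration only.
# Feature order assumed here: [times correct, times wrong, bias].
weights = [0.5, -0.25, 1.0]
features = [4.0, 2.0, 1.0]

h = half_life(weights, features)   # 2 ** (2.0 - 0.5 + 1.0) = 2 ** 2.5 days
p = recall_probability(h, h)       # exactly one half-life elapsed, so p = 0.5
```

When the elapsed time equals the half-life, the predicted recall is one half by construction, which is what makes h easy to interpret.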

Goals

The main goals of this project are:

  1. Analyze the relationship between exposure to words/phrases and the rate of forgetting.
  2. Develop a predictive model for the forgetting curve based on the dataset.
  3. Explore potential applications of the forgetting curve model in language learning and education.

Steps

To achieve these goals, the following steps will be taken:

  1. Data preprocessing: Clean and prepare the Duolingo dataset for analysis.
  2. Exploratory data analysis: Explore the dataset to gain insights into the distribution of exposure and forgetting rates.
  3. Model development: Build a predictive model for the forgetting curve using machine learning techniques.
  4. Model evaluation: Assess the performance of the model and validate its predictive capabilities.
  5. Interpretation and application: Analyze the results and discuss potential implications for language learning.
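The model development and evaluation steps above could look roughly like the following sketch. It fits half-life regression weights by plain gradient descent on the squared error between observed and predicted recall, then reports the mean absolute error. This is a simplified loss chosen for brevity, the data is synthetic, and every name here (`predict`, `fit`, `mean_absolute_error`, `true_weights`) is hypothetical rather than taken from this repository.

```python
import math

LN2 = math.log(2.0)

def predict(weights, features, delta_days):
    """Return (predicted recall, half-life) for one practice record."""
    h = 2.0 ** sum(w * x for w, x in zip(weights, features))
    return 2.0 ** (-delta_days / h), h

def fit(samples, n_features, lr=0.1, epochs=500):
    """Full-batch gradient descent on the mean of (p_hat - p) ** 2."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for features, delta_days, p in samples:
            p_hat, h = predict(weights, features, delta_days)
            # d/d(theta_k) of (p_hat - p)^2 is
            # 2 (p_hat - p) * p_hat * ln(2)^2 * (delta / h) * x_k
            common = 2.0 * (p_hat - p) * p_hat * LN2 ** 2 * (delta_days / h)
            for k, x in enumerate(features):
                grad[k] += common * x
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad)]
    return weights

def mean_absolute_error(weights, samples):
    """Average absolute gap between predicted and observed recall."""
    total = sum(abs(predict(weights, f, d)[0] - p) for f, d, p in samples)
    return total / len(samples)

# Synthetic, noise-free data generated from made-up weights (illustration only).
true_weights = [0.5, 1.0]
inputs = [([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0), ([3.0, 1.0], 5.0), ([2.0, 1.0], 1.0)]
samples = [(f, d, predict(true_weights, f, d)[0]) for f, d in inputs]

mae_before = mean_absolute_error([0.0, 0.0], samples)
weights = fit(samples, n_features=2)
mae_after = mean_absolute_error(weights, samples)
```

Comparing `mae_before` with `mae_after` gives a quick check that training helped. On real data the same comparison would be made on a held-out split rather than on the training samples.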

Following these steps, with Duolingo's half-life regression model as the starting point, should clarify how the forgetting curve behaves and what that implies for language learning.

Authors

This project is authored by Haram Yoon and Aimee Co.

Dataset

This project uses the Duolingo learning dataset, which records the learning progress of users on the Duolingo platform. It includes the number of times a user has been exposed to specific words and phrases, as well as their performance on related exercises.
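A minimal sketch of turning such records into model inputs is shown below. It assumes column names `p_recall`, `delta`, `history_seen`, and `history_correct`, with `delta` measured in seconds, as in the publicly released Duolingo spaced repetition data; check these against the actual file before relying on them. The two sample rows are invented, and the square-root features are one common choice, not necessarily the one used in this project.

```python
import csv
import io
import math

# Invented sample rows; the column names are an assumption about the dataset.
SAMPLE = """p_recall,delta,history_seen,history_correct
1.0,86400,4,3
0.5,172800,9,4
"""

SECONDS_PER_DAY = 86400.0

def load_rows(text):
    """Parse CSV text into dicts with recall, lag in days, and features."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        seen = int(rec["history_seen"])
        correct = int(rec["history_correct"])
        wrong = seen - correct
        rows.append({
            "p_recall": float(rec["p_recall"]),
            "delta_days": float(rec["delta"]) / SECONDS_PER_DAY,
            # Features: sqrt(1 + correct), sqrt(1 + wrong), and a bias term.
            "features": [math.sqrt(1 + correct), math.sqrt(1 + wrong), 1.0],
        })
    return rows

rows = load_rows(SAMPLE)
```

For a real file, the same function could be fed the file's contents, or the `io.StringIO` wrapper could be swapped for an open file handle.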
