Math of Big Data Project

This project analyzes how exposure to words and phrases affects the forgetting curve, using the Duolingo learning dataset. The forgetting curve is a mathematical model of how memory of learned material decays over time.

Methodology

The methodology for this project is based on Duolingo's half-life regression (HLR) model. HLR estimates how long a user will remember a word or phrase (its half-life) from features such as the number of times the user has been exposed to it and their performance on related exercises. By analyzing these estimates, we can gain insight into the relationship between exposure and the rate of forgetting.
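A minimal sketch of how an HLR-style model turns features into a prediction, following Duolingo's published formulation (Settles and Meeder, 2016): the half-life is h = 2^(θ·x) for a weight vector θ and feature vector x, and the predicted probability of recall after a lag of Δ days is p = 2^(-Δ/h). The feature names (`bias`, `right`, `wrong`) and the weights below are hypothetical and chosen only for illustration; this is not this project's own code.

```python
def half_life(theta, features):
    """HLR half-life estimate h = 2 ** (theta . x), in days."""
    dot = sum(weight * features.get(name, 0.0) for name, weight in theta.items())
    return 2.0 ** dot


def recall_probability(lag_days, h):
    """HLR recall prediction p = 2 ** (-lag / h)."""
    return 2.0 ** (-lag_days / h)


# Hypothetical weights and features, for illustration only.
theta = {"bias": 1.0, "right": 0.5, "wrong": -0.5}
x = {"bias": 1.0, "right": 2.0, "wrong": 0.0}

h = half_life(theta, x)  # 2 ** (1.0 + 1.0) = 4.0 days
print(h, recall_probability(4.0, h))  # prints: 4.0 0.5
```

At a lag equal to the half-life, the predicted recall is exactly 0.5, which is what makes h interpretable as a "half-life" of memory.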

Goals

The main goals of this project are:

Analyze the relationship between exposure to words/phrases and the rate of forgetting.
Develop a predictive model for the forgetting curve based on the dataset.
Explore potential applications of the forgetting curve model in language learning and education.

Steps

To achieve these goals, the following steps will be taken:

Data preprocessing: Clean and prepare the Duolingo dataset for analysis.
Exploratory data analysis: Explore the dataset to gain insights into the distribution of exposure and forgetting rates.
Model development: Build a predictive model for the forgetting curve using machine learning techniques.
Model evaluation: Assess the performance of the model and validate its predictive capabilities.
Interpretation and application: Analyze the results and discuss potential implications for language learning.
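The model development and evaluation steps might look like the following minimal sketch, which fits HLR weights by full-batch gradient descent on squared recall error and reports mean absolute error (MAE). This is a simplified assumption, not the project's implementation: the published HLR loss also includes a half-life term and L2 regularization, both omitted here, and the training data is synthetic, generated from hypothetical `true_theta` weights.

```python
import math

LN2 = math.log(2.0)


def predict(theta, features, lag_days):
    """HLR prediction: h = 2 ** (theta . x), p = 2 ** (-lag / h)."""
    h = 2.0 ** sum(w * features.get(k, 0.0) for k, w in theta.items())
    return 2.0 ** (-lag_days / h), h


def train_hlr(data, feature_names, lr=0.05, epochs=200):
    """Fit theta by full-batch gradient descent on (p_hat - p) ** 2 only."""
    theta = {k: 0.0 for k in feature_names}
    for _ in range(epochs):
        grad = {k: 0.0 for k in feature_names}
        for lag, x, p in data:
            p_hat, h = predict(theta, x, lag)
            # d p_hat / d theta_k = p_hat * ln(2)^2 * (lag / h) * x_k
            common = 2.0 * (p_hat - p) * p_hat * LN2 ** 2 * lag / h
            for k in feature_names:
                grad[k] += common * x.get(k, 0.0)
        for k in feature_names:
            theta[k] -= lr * grad[k] / len(data)
    return theta


def mean_absolute_error(theta, data):
    """Average |p_hat - p| over (lag, features, p) triples."""
    return sum(abs(predict(theta, x, lag)[0] - p) for lag, x, p in data) / len(data)


# Synthetic data from hypothetical "true" weights, on a deterministic grid.
FEATURES = ["bias", "right", "wrong"]
true_theta = {"bias": 1.0, "right": 0.3, "wrong": -0.4}
data = []
for right in range(6):
    for wrong in range(4):
        x = {"bias": 1.0, "right": float(right), "wrong": float(wrong)}
        for lag in (0.5, 1.0, 2.0, 4.0, 8.0):
            data.append((lag, x, predict(true_theta, x, lag)[0]))

theta = train_hlr(data, FEATURES)
print("MAE before training:", mean_absolute_error({k: 0.0 for k in FEATURES}, data))
print("MAE after training:", mean_absolute_error(theta, data))
```

On real data, the evaluation would be run on a held-out split rather than on the training set used here.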

By following this methodology and building on Duolingo's half-life regression model, the project aims to characterize the forgetting curve and its implications for language learning.

Authors

This project is authored by Haram Yoon and Aimee Co.

Dataset

The dataset used for this project is the Duolingo learning dataset, which contains information about the learning progress of users on the Duolingo platform. It includes data on the number of times a user has been exposed to specific words and phrases, as well as their performance on related exercises.
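Preprocessing and exploratory analysis could start along these lines. This sketch assumes the column names of Duolingo's public HLR learning-traces CSV (`p_recall`, `delta` in seconds since the word was last seen, `history_seen`, `history_correct`); the helper names are hypothetical, and the inline sample uses only those four columns with made-up values.

```python
import csv
import io
from collections import defaultdict

SECONDS_PER_DAY = 86400.0


def parse_trace(row):
    """Turn one learning-trace row into (lag_days, features, p_recall)."""
    seen = int(row["history_seen"])
    correct = int(row["history_correct"])
    features = {
        "bias": 1.0,
        "right": float(correct),         # prior correct recalls of this word
        "wrong": float(seen - correct),  # prior incorrect recalls
    }
    lag_days = float(row["delta"]) / SECONDS_PER_DAY
    return lag_days, features, float(row["p_recall"])


def mean_recall_by_exposure(rows):
    """EDA helper: average observed recall grouped by prior exposure count."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        bucket = totals[int(row["history_seen"])]
        bucket[0] += float(row["p_recall"])
        bucket[1] += 1
    return {seen: s / n for seen, (s, n) in sorted(totals.items())}


# Made-up sample rows, for illustration only.
SAMPLE = """p_recall,delta,history_seen,history_correct
1.0,86400,3,3
0.5,172800,3,2
0.0,604800,1,0
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(parse_trace(rows[1]))
print(mean_recall_by_exposure(rows))
```

Grouping recall by `history_seen` gives a first look at the exposure/forgetting relationship before any model is fit.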


Repository contents

results
.gitignore
Half_life_Regression___An_Investigation_of_the_Forgetting_Curve (2).pdf
README.md
autoencoder.py
evaluation.r
experiment.py
hierarchical.py
kmeans.py
model.ipynb
Repository: haram082/Duolingo_Spaced_Repetition_Models