This dataset was queried from stackexchange.com.
It consists of two CSV files:
- One containing the question posts
- One containing the answer posts
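The two files are linked because every answer belongs to a question. A minimal sketch of how they could be joined to derive a per-question feature such as MeanAnswerTime_Days is shown below. It assumes column names modelled on the Stack Exchange "Posts" schema (`Id`, `ParentId`, `CreationDate`) and a `YYYY-MM-DD HH:MM:SS` date format. The function name `mean_answer_time_days` is hypothetical, and the project's actual feature engineering is not documented here.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Assumption: dates are exported as "YYYY-MM-DD HH:MM:SS".
DATE_FMT = "%Y-%m-%d %H:%M:%S"

def mean_answer_time_days(questions_csv, answers_csv):
    """Return {question Id: mean days between the question and each of its answers}.

    Hypothetical columns: questions have Id, CreationDate;
    answers have Id, ParentId (the question's Id), CreationDate.
    Questions without answers are left out of the result.
    """
    asked = {
        row["Id"]: datetime.strptime(row["CreationDate"], DATE_FMT)
        for row in csv.DictReader(questions_csv)
    }
    delays = defaultdict(list)
    for row in csv.DictReader(answers_csv):
        qid = row["ParentId"]  # an answer points at its question via ParentId
        if qid in asked:
            answered = datetime.strptime(row["CreationDate"], DATE_FMT)
            delays[qid].append((answered - asked[qid]).total_seconds() / 86400)
    return {qid: mean(d) for qid, d in delays.items()}

# Usage with two tiny in-memory CSVs standing in for the real files.
questions = io.StringIO("Id,CreationDate\n1,2020-01-01 00:00:00\n2,2020-01-05 12:00:00\n")
answers = io.StringIO(
    "Id,ParentId,CreationDate\n"
    "10,1,2020-01-02 00:00:00\n"
    "11,1,2020-01-04 00:00:00\n"
    "12,2,2020-01-06 00:00:00\n"
)
print(mean_answer_time_days(questions, answers))  # {'1': 2.0, '2': 0.5}
```

Real files would be opened with `open(path, newline="")` and passed in the same way.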
This project was carried out by the following four team members:
- Siddharth Suresh
- Sneha Choudhary
- Suchetha Sharma
- Vidhi Gupta
The intent of the project was to use regression modeling to:
- Determine the parameters that contributed to a higher probability of getting an accepted answer for a question
- Predict the score of a posted Answer
Findings for the first objective, the parameters that contribute to a higher probability of getting an accepted answer:
- An increase in Score, ViewCount, PostLength_Words or InactiveSince_Days increases the odds of getting an accepted answer.
- An increase in AnswerCount, CommentCount or MeanAnswerTime_Days decreases the odds of getting an accepted answer.
- If a question has ever been marked as a favorite, its odds of getting an accepted answer increase.
- Questions belonging to the , and tags have higher odds of getting an accepted answer.
- Questions belonging to the and tags have lower odds of getting an accepted answer.
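Statements of the form "increases the odds" usually come from a logistic regression, where each coefficient β is read through its odds ratio exp(β). A value above 1 means a one-unit increase in the feature raises the odds of an accepted answer, and a value below 1 means it lowers them. The sketch below assumes such a model. It fits it with iteratively reweighted least squares (Newton's method) in NumPy as a self-contained stand-in for a library routine. The functions `fit_logistic_irls` and `odds_ratios` are hypothetical, and the toy data is synthetic, not the project's data.

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25, tol=1e-8):
    """Fit a logistic regression by Newton's method (IRLS).

    X: (n, k) feature matrix without an intercept column; y: (n,) array of 0/1.
    Returns k + 1 coefficients, intercept first.
    """
    X = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted probabilities
        w = p * (1.0 - p)                      # IRLS weights
        grad = X.T @ (y - p)                   # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])          # X^T W X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def odds_ratios(beta, names):
    """Map each feature to exp(coefficient); the intercept is skipped."""
    return {name: float(np.exp(b)) for name, b in zip(names, beta[1:])}

# Synthetic example: feature_1 raises the log-odds, feature_2 lowers them.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
true_logit = 0.5 + 1.0 * X[:, 0] - 0.8 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

beta = fit_logistic_irls(X, y)
print(odds_ratios(beta, ["feature_1", "feature_2"]))
```

With this setup the odds ratio for `feature_1` should come out near e¹ ≈ 2.7 and the one for `feature_2` near e⁻⁰·⁸ ≈ 0.45. A binary feature such as "ever marked as a favorite" would enter the model as a 0/1 column and be read the same way.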
Findings for the second objective, predicting the score of a posted answer:
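The entire dataset was split into a training set (70%) and a test set (30%). The model was evaluated with min-max accuracy: for each test row, the smaller of the actual and predicted score is divided by the larger, and these ratios are averaged. A value of 1.0 means every prediction matched the actual score exactly.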
The model achieved a min-max accuracy of 91.6%, indicating that its predicted scores were generally close to the actual scores.
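The 70/30 split and the min-max accuracy metric can be sketched in a few lines of NumPy. This follows the common definition of min-max accuracy, mean(min(actual, predicted) / max(actual, predicted)). The project's own implementation and the regression model used for the score are not given, so no model is fitted here. The helper names `train_test_split_70_30` and `min_max_accuracy` are hypothetical.

```python
import numpy as np

def train_test_split_70_30(n_rows, seed=42):
    """Shuffle row indices and return (train_idx, test_idx) in a 70/30 ratio."""
    idx = np.random.default_rng(seed).permutation(n_rows)
    cut = int(round(0.7 * n_rows))
    return idx[:cut], idx[cut:]

def min_max_accuracy(actual, predicted):
    """Mean over rows of min(actual, pred) / max(actual, pred); 1.0 is a perfect fit.

    The ratio is only well defined when both values are positive. Answer scores
    can be zero or negative, so they would need shifting first (assumption).
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.minimum(actual, predicted) / np.maximum(actual, predicted)))

# Usage with made-up numbers: 8/10 and 4/5 both give 0.8.
train_idx, test_idx = train_test_split_70_30(10)
print(len(train_idx), len(test_idx))         # 7 3
print(min_max_accuracy([10, 4], [8, 5]))     # 0.8
```

Under this definition, a result of 0.916 means that, on average, the smaller of each actual/predicted pair was about 91.6% of the larger one.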