Update testing practice #6

Open
wants to merge 2 commits into base: master

d-alex-hughes
Contributor

This PR creates an activity that gives students a chance to run four different hypothesis tests against some real, interesting data.

If these tests look good, then

  • Save this file into a new file called *_answers.Rmd
  • Remove the answers from this document, and then save.
  • Update the README link to create a nbgitpuller link that will bring this data into the ischool.datahub.berkeley.edu environment.

Contributor

@paul-laskowski paul-laskowski left a comment

Some really great practice here! My only big comment is that this is a LOT of work. We could call it a homework, but to help the most students I think it's worth thinking of how to make it go a lot faster.

---

# Hypothesis Test Practice Activity

In this short activity, you're going to write, and execute a short series of hypothesis tests using the `R` estimating framework.

not sure what the R estimating framework is. just use R? I don't think short is accurate at the moment :D

Suggested change
- In this short activity, you're going to write, and execute a short series of hypothesis tests using the `R` estimating framework.
+ In this activity, you're going to execute a series of hypothesis tests using R.

- `salary_potential`
- `diversity_school`

We are going to ask as series of questions that can be answered with the constrained set of tests that we have available to use from the course.

just trying to clarify a bit

Suggested change
- We are going to ask as series of questions that can be answered with the constrained set of tests that we have available to use from the course.
+ We are going to ask a series of questions that can be answered with one of the hypothesis tests presented in the course.


For each of the questions, the data *is* available, but you might have to join a table or two, recode a variable or two, or otherwise do a little bit of data work to get the data ready to run the test.
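
As a rough illustration of the kind of data work described above, here is a minimal base-R sketch of joining two tables and recoding a variable. The data frames below are small hypothetical stand-ins for `salary_potential` and `diversity_school`; the column names (`name`, `mid_career_pay`, `total_enrollment`) and the 5000-student cutoff are assumptions for illustration, not taken from the PR.

```r
# Hypothetical stand-ins for the two tables; all columns are assumed.
salary_potential <- data.frame(
  name           = c("A College", "B University", "C Institute"),
  mid_career_pay = c(90000, 105000, 120000)
)
diversity_school <- data.frame(
  name             = c("A College", "B University", "D School"),
  total_enrollment = c(1200, 30000, 800)
)

# Inner join on the shared key: only schools present in both tables survive.
joined <- merge(salary_potential, diversity_school, by = "name")

# Recode enrollment into a two-level grouping variable (arbitrary cutoff).
joined$size <- ifelse(joined$total_enrollment >= 5000, "large", "small")

print(joined)
```

An inner join silently drops schools that appear in only one table, so it is worth comparing `nrow(joined)` against the inputs before running any test.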

For each test that you conduct, please (a) evaluate the assumptions of the test to see if the data satisfy these assumptions; (b) state the null hypothesis that is being evaluated; (c) state the criteria that would lead you to reject the null hypothesis; (d) conduct and interpret the test; and (e) tell us whether the difference you observe between the groups is an *important* one.

"state the criteria that would lead you to reject the null hypothesis" is it valid for a student to write "p<.05" for every test? perhaps that's a reason to remove this component and test it more directly another way.

Now then, using data that is available to you, please test whether public school tuition has changed from 1985 to the present. In order to do so, you will have to select the appropriate rows and columns of data, and conduct the appropriate test for this data.

The big issue I see is that you don't know the 1985 tuition level - you only have an estimate, but your one-sample test below treats this as a true value.

To fix this, you could instead create a single variable to be the difference in tuition from 1985 to the present, then test if it's zero. Students might not initially recognize this as a place to use a one-sample test, but it could be good learning for them.
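
A minimal sketch of that difference-variable approach, assuming the data can be arranged with one row per unit and tuition observed in both years. The data here are simulated, and the column names `tuition_1985` and `tuition_now` are hypothetical, so this shows the shape of the test rather than the activity's actual answer.

```r
# Simulated data: one row per school, tuition observed in both years.
# Column names and values are assumptions for illustration only.
set.seed(1)
n <- 40
tuition <- data.frame(
  school       = paste0("school_", seq_len(n)),
  tuition_1985 = rnorm(n, mean = 8000, sd = 1500)
)
tuition$tuition_now <- tuition$tuition_1985 + rnorm(n, mean = 3000, sd = 1200)

# Single variable: the within-school change in tuition.
tuition$change <- tuition$tuition_now - tuition$tuition_1985

# One-sample t-test of H0: the mean change is zero.
one_sample <- t.test(tuition$change, mu = 0)

# The paired t-test on the two columns is the same test.
paired <- t.test(tuition$tuition_now, tuition$tuition_1985, paired = TRUE)

print(one_sample)
```

Because the test is run on the differences, neither year's mean is treated as a known constant, which is the concern raised above.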


1. What is the appropriate test?
2. What are the assumptions for this test?
3. Are these assumptions for the test satisfied in this case?

I worry that this phrasing makes it sound like we're really interested in a final yes or no answer. I prefer the way we did it on the lab, something like "evaluate each assumption, based on your background knowledge, data visualizations, and numerical summaries."
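
For instance, an assumption check built from numerical summaries and visualizations might look roughly like the sketch below. The vector `tuition_change` is simulated and deliberately right-skewed; it is a hypothetical stand-in, not the activity's data.

```r
# Simulated, right-skewed stand-in for the variable under test.
set.seed(2)
tuition_change <- rexp(60, rate = 1 / 2000) - 500

# Numerical summaries that bear on the t-test's assumptions:
# sample size, center, spread, and a simple moment-based skewness.
assumption_summary <- c(
  n        = length(tuition_change),
  mean     = mean(tuition_change),
  median   = median(tuition_change),
  sd       = sd(tuition_change),
  skewness = mean((tuition_change - mean(tuition_change))^3) /
    sd(tuition_change)^3
)
print(round(assumption_summary, 2))

# Visual checks: overall shape and departure from normality.
hist(tuition_change, breaks = 20,
     main = "Change in tuition (simulated)", xlab = "Dollars")
qqnorm(tuition_change)
qqline(tuition_change)
```

None of these outputs gives a yes or no answer; they are inputs to a written judgment about how much skew the sample size can tolerate and whether the observations are plausibly independent.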

1. Are the tuition costs different?

Is this a gotcha question? if students answer yes or no, I would call that wrong - you just have evidence, you haven't proved the hypothesis. perhaps something like the following?

Suggested change
- 1. Are the tuition costs different?
+ 1. Have you found evidence that the tuition costs are different?

1. Are the tuition costs different?
2. Is this a big difference or a little difference? What makes you think this?

importance feels more clear to me.

Suggested change
- 2. Is this a big difference or a little difference? What makes you think this?
+ 2. Is this an important difference or an unimportant difference? What makes you think this?
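
One way students could ground an "importance" answer is to report an effect size and a confidence interval in the original units alongside the p-value. The sketch below uses simulated data and Cohen's d for a one-sample setting; the choice of d as the measure is an assumption here, not something the PR specifies.

```r
# Simulated stand-in for the within-unit change in tuition (dollars).
set.seed(3)
n <- 50
tuition_change <- rnorm(n, mean = 3000, sd = 1200)

# Statistical significance: one-sample t-test against zero.
result <- t.test(tuition_change, mu = 0)

# Practical significance, two complementary views:
# 1. Standardized effect size (Cohen's d for one sample).
cohens_d <- mean(tuition_change) / sd(tuition_change)

# 2. Confidence interval for the mean change, in dollars.
ci <- result$conf.int

cat("Cohen's d:", round(cohens_d, 2), "\n")
cat("95% CI for mean change: [", round(ci[1]), ",", round(ci[2]), "]\n")
```

The dollar-scale interval is often the easier one to argue from, since a reader can compare it directly against what a student or family would consider a meaningful change in cost.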

```
# (earlier lines of this chunk are not shown in the diff context)
geom_point(alpha = 0.3) +
facet_grid(cols = vars(type))
```
Given what you have seen, how would you recommend proceeding with your test? Proceed in the way that you think is most reasonable. State the assumptions for your test, evaluate whether they are satisfied, and conduct the test, describing what you have learned about the statistics, and also what the practical meaning of these statistics is.

the learning is about the effect or the model parameters or the difference in tuition, not the statistics, right?
