Unjustified NA output? #98
The main issue here is that covar.1 is on a wildly different scale (never exceeding +/- 0.00001) than your other covariates (which have variance close to 1). plink2's current implementation of logistic regression isn't designed to handle this scale difference. The simplest way to address this is to add the --covar-variance-standardize flag, which linearly rescales all covariates to have mean 0, variance 1. This is sufficient to solve your immediate problem.

(I'm strongly considering replacing the current single-precision implementation with a slower double-precision one, but I haven't done so yet since, when used properly, single precision should be enough; in fact, one recent trend in machine learning is to use even lower precision to achieve higher throughput. Maybe the best temporary solution is for plink2 to error out when this large a scale difference is present, unless a 'yes-really' modifier is given. I'll try to add this tonight.)

Another issue to keep in mind is that this variant has a rather low minor allele frequency (0.73%). Basic logistic regression becomes increasingly unstable as MAF decreases. The Firth logistic regression built into plink2, which you can invoke by replacing --logistic with "--glm firth" or "--glm firth-fallback" (the latter only requests Firth regression when basic logistic regression yields "NA"), is often a good solution here: in addition to almost always reporting an actual result instead of "NA", it's designed to counter basic logistic regression's tendency to yield overconfident p-values like 2.6e-57 when there are very few meaningful observations. In this particular case it doesn't make much of a difference when --covar-variance-standardize is already present (in fact, it reports a slightly more extreme p-value of 2.8e-59), since even 0.73% of 40000 allele observations isn't that small.

But if you're analyzing more variants like this with plink2, it's practically always a good idea to use at least firth-fallback.
The temporary solution was implemented on 22 Jan 2019. A better solution is for plink2 to automatically perform variance standardization, and then convert beta/OR/SE/CI back to the original units in the final report; I plan to implement this within the next two weeks.
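For intuition, the linear rescaling that --covar-variance-standardize performs can be sketched in a few lines of Python (a minimal illustration, not plink2's actual code; the tiny-scale covariate values here are made up to mimic covar.1):

```python
import random
from statistics import fmean, pstdev

# Hypothetical covariate on a tiny scale, like covar.1 in this issue
# (values never exceeding roughly +/- 0.00001).
random.seed(0)
covar = [random.gauss(0.0, 1e-6) for _ in range(1000)]

# --covar-variance-standardize linearly rescales each covariate
# to mean 0, variance 1 before it enters the regression:
mean = fmean(covar)
sd = pstdev(covar)
standardized = [(x - mean) / sd for x in covar]
```

After this rescaling, all covariates sit on comparable scales, which keeps the single-precision regression numerically well-behaved.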
Hello and thanks for the work!
We've been looking at various association tools. Based on the code, it looks like PLINK2 fits the logistic regression model using a Cholesky decomposition method? If that is the case, we were wondering how potential numerical stability issues are handled.
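As background on the approach the question refers to: each Newton/IRLS step for logistic regression solves a symmetric positive-definite linear system, which can be factored with a Cholesky decomposition. The sketch below is a generic pure-Python illustration of that technique on invented toy data; it is not plink2's actual implementation:

```python
import math

def cholesky_solve(A, b):
    """Solve A x = b for symmetric positive-definite A via A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)  # fails if A is not SPD
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    # Forward substitution: L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def logistic_irls(X, yvec, iters=25):
    """Fit logistic regression by iteratively reweighted least squares."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        eta = [sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        mu = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        w = [m * (1.0 - m) for m in mu]
        # Newton step: solve (X^T W X) delta = X^T (y - mu) via Cholesky.
        A = [[sum(w[i] * X[i][a] * X[i][c] for i in range(n))
              for c in range(p)] for a in range(p)]
        g = [sum(X[i][a] * (yvec[i] - mu[i]) for i in range(n)) for a in range(p)]
        delta = cholesky_solve(A, g)
        beta = [beta[j] + delta[j] for j in range(p)]
    return beta
```

Note the stability hazard this thread is about: if one column of X is on a vastly smaller scale than the others, X^T W X becomes badly ill-conditioned, and in single precision the Cholesky step can break down, which is consistent with the NA output observed here.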
Here is an example where PLINK2 appears to return NA. However, if we load the same data into R and fit a logistic model using `glm()`, we successfully get a result with a significant p-value. The example data files are attached. The example is run from `~/plink_example`.
1. This is the PLINK version we used.
2. Run the example in PLINK
Observe that plink returns NA; there doesn't appear to be a warning.
3. Use PLINK export to convert to a text file
4. Run that text file through R using standard glm()
For this case, R computes successfully and returns a significant p-value of 2.6e-57
Thanks for your time! Please let us know if we missed anything.
The data we used is attached:
plink_example.tar.gz