As it is not possible for an organization to star/follow repositories/organizations, here is a list of potentially interesting GitHub repositories/organizations/users and links to various other things that might be of interest (e.g. guidelines, papers, ...).
- markdoc - literate programming for Stata
- weaver - more dynamic docs from Stata (more like a log file though)
- github - install Stata packages from GitHub
- rcall - call R from within Stata
- ftools - faster variant of Stata's collapse/by/egen (particularly useful for larger datasets)
- gtools - even faster variant of Stata's collapse/by/egen (particularly useful for larger datasets)
- fastreshape - a faster variant of reshape (syntax is almost identical to reshape)
- World Bank's iefieldkit - commands for data collection
- World Bank's ietoolkit - commands for impact evaluation
- World Bank's stata-linter - for tidying code/highlighting poor coding practices
- CodeMap - Mac application for examining dependencies in R or Stata analyses
- Power calculation
- equivalent of R's here package for Stata: here
- Packages
- `Gmisc` - package that includes a nice approach to creating flowcharts
- flowchart - Alan's flowchart package - create flowcharts without needing to play around with the layout (much)
- Predictions from mixed models, with SEs, are available in the `AICcmodavg` package
- `atable` - package for baseline tables. Very flexible.
- `gtsummary` - also for baseline tables. Also flexible.
- `metamisc` - 'Facilitate meta-analysis of diagnosis and prognosis research studies. It includes functions to summarize multiple estimates of prediction model discrimination and calibration performance, as described by Debray et al. (2019) doi:10.1177/0962280218785504. It also includes functions to evaluate funnel plot asymmetry, as described by Debray et al. (2018) doi:10.1002/jrsm.1266. Finally, the package provides functions for developing multivariable prediction models from datasets with clustering.'
- `ipcwswitch` - 'Inverse Probability of Censoring Weights to Deal with Treatment Switch in Randomized Clinical Trials'
- `margins` - ports much of Stata's margins command to R
- Power calculation
- `ggsurvfit` - for KM plots
- CodeMap - Mac application for examining dependencies in R or Stata analyses
- Scottish health/social care - Various stuff including funnel plots
- R package template (based on secuTrialR). Includes e.g. continuous integration testing with appveyor and TravisCI. Click the "use this template" button to copy it to a new repo (under a new name) then edit it as required.
- "Books" etc
- Free online resources for learning R
- Big Book of R - listing of a lot of online R books
- Flexible Imputation of Missing Data - book from the author of the `mice` package.
- Forecasting: Principles and Practice - uses R for time series analysis and forecasting
- Data visualization lots on ggplot and R
- Bayesian inference with INLA covers a LOT of INLA stuff.
- Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA
- Michael Clark's documents - loads of tips and how-tos for R
- Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R
- An Introduction to Statistical Learning includes a link to the PDF of the 1st edition
- Statistical rethinking with brms, ggplot2, and the tidyverse is based on "Statistical rethinking" and fits models with the brms package
- Elements of Statistical Learning includes a link to the 2nd edition
- Outstanding User Interfaces with Shiny
- The Art of Data Science by Roger Peng and Elizabeth Matsui
- RStudio::conf materials contains (some) talks and workshops from the different conferences
- tidymodels resources (mostly from here)
- A Gentle Introduction to tidymodels
- Introduction to Machine Learning with the Tidyverse
- Choose your own tidymodels adventure
- Tidy Modeling with R (a tidymodels pagedown)
- Get started with tidymodels and #TidyTuesday Palmer penguins
- and generally other stuff from Julia Silge
- tidymodels labs
- the tidymodels site, of course
- notes on a book club from the R4DS people
- tidymodels labs based on An Introduction to Statistical Learning
- RStudio::conf 2022 workshop on tidymodels by Julia Silge + Max Kuhn + David Robinson
- not tidymodels, but ML... code for ML by the ML in neuroscience lab in Zurich
- and their paper on ML for clinicians which refers to the code
- also not strictly tidymodels, but ML a comic book on ML in R
- marginal means from models with interactions
- mapping with R - gallery of maps from the #30DayMapChallenge with links to code
- Discussions on .Rproj and here - Jenny Bryan is for them: Project-oriented workflow, while Tarak Shah seems to be against them: .Rproj considered harmful
Validating R itself is a big task. The R Foundation Board has a statement on the issue which covers a few points that need not be covered by a validation of R itself (as they should be covered by the host system, e.g. Windows, macOS, ...), but puts the onus for other issues in the hands of the organisation using the software.
The R validation hub has various info on the topic, including a white paper. They have also created an R package, `riskmetric`, for estimating the risk posed by a given package based on its documentation, bugs/issues, downloads etc.
The `valtools` package may be useful for documenting validation of R packages (see a presentation on the package here). There is also a white paper on the topic from the PHUSE group.
Information from RStudio on environment validation
Stata's internal approach to testing https://journals.sagepub.com/doi/10.1177/1536867X0100100102
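Verification and validation: difference and definitions (German-language article, original title "Verifizierung und Validierung: Unterschied & Definitionen")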
- StatTag - a method to make dynamic Word documents, supports Stata, R, and others
- Use Notepad++ as a Stata editor
- Use Notepad++ as an R code editor (see also sourceforge for the exe)
- Datamethods reference collection on "common myths"
- e.g. has sections on post-hoc power (see also this paper from Feb. 2022), p-values in baseline tables and much more. A very useful resource.
- BBR - Biostats for biomedical research
- Frank Harrell and co's course
- PDF of course material
- ebook website
- YouTube videos of sessions/"lectures"
- Ben Bolker's mixed models 'FAQ'
- if you have a question about mixed models (particularly in R, but also in general), there's a chance it's covered there
- Intro to statistical learning book website (includes PDF)
- Common statistical tests are linear models (even non-parametric ones)
- Statistical Problems to Document and to Avoid. Checklist for Authors
- git/github notes
- C++ by example
- presize
- Stepped wedge
- Sample size for prediction models
- A note on estimating the Cox-Snell R2 from a reported C statistic (AUROC) to inform sample size calculations for developing a prediction model with a binary outcome
- Sample size for binary logistic prediction models: Beyond events per variable criteria
- Calculating the sample size required for developing a clinical prediction model
- No rationale for 1 variable per 10 events criterion for binary logistic regression analysis
- Minimum sample size for developing a multivariable prediction model: Part I – Continuous outcomes
- Minimum sample size for external validation of a clinical prediction model with a continuous outcome
- Minimum sample size for developing a multivariable prediction model: Part II - binary and time-to-event outcomes (pmsampsize for Stata and R)
- Minimum sample size for external validation of a clinical prediction model with a binary outcome
- Sample size considerations and predictive performance of multinomial logistic prediction models
- Adaptive sample size determination for the development of clinical prediction models
- Some suggestions for measuring predictive performance
- WebPower - pretty powerful looking website/R package with a wide range of methods/tests.
- A simple, step-by-step guide to interpreting decision curve analysis
- Table 2 Fallacy on interpretation of model parameters in the presence of confounding/effect modifiers
- Standardized classification and framework for reporting, interpreting, and analysing medication non-adherence in cardiovascular clinical trials: a consensus report from the Non-adherence Academic Research Consortium (NARC) - a method for handling levels of adherence
- Introduction to computational causal inference using reproducible Stata, R, and Python code: A tutorial - We illustrate the implementation of different methods [G formula/methods, IPTW, MSM, double-robust, targeted maximum likelihood estimation (TMLE)] using an empirical example from the Connors study based on intensive care medicine, and most importantly, we provide reproducible and commented code in Stata, R, and Python for researchers to adapt in their own observational study.
- R, Stata and SAS code is available here
- Covariate Adjustment in Randomized Trials
- RMST
- Other possible CIs - Some new confidence intervals for Kaplan-Meier based estimators from one and two sample survival data
- Are restricted mean survival time methods especially useful for noninferiority trials?
- Why restricted mean survival time methods are especially useful for non-inferiority trials
- Adding a new analytical procedure with clinical interpretation in the tool box of survival analysis
- Utility of Restricted Mean Survival Time Analysis for Heart Failure Clinical Trial Evaluation and Interpretation
- A comparison of different population-level summary measures for randomised trials with time-to-event outcomes, with a focus on non-inferiority trials
- When and how should multiple imputation be used for handling missing data in randomised clinical trials – a practical guide with flowcharts - useful as a reference for the choice of 5% missing to start using MI and >40% missing as a reason to stop
- Win odds: An adaptation of the win ratio to include ties
- Lessons learnt when accounting for competing events in the external validation of time-to-event prognostic models - long story short... it's important...
- FDA guidance on multiple testing in clinical trials
- A roadmap to using randomization in clinical trials - a comparison of different randomization methods
- Statistical code for clinical research papers in a high-impact specialist medical journal
- Ten simple rules for initial data analysis
- TRIPOD - guidelines for reporting of predictive/prognostic models (validation or derivation)
- Latent trajectory studies (GRoLTS)
- STROBE - guidelines for reporting of observational studies
- STROBE website
- STROBE checklists
- The STROBE statement itself has been published in many journals (see here)
- CONSORT - guidelines for publishing RCTs
- Checklist
- Statement
- Flow diagram
- DISCOURAGES USE OF P-VALUES/CONFIDENCE INTERVALS/STANDARD ERRORS IN DESCRIPTIVE TABLES (e.g. Table 1) - POSSIBLY USEFUL TO REFUTE REVIEWER REQUESTS FOR THEM (see also the Datamethods reference collection on "common myths")
- Extension for adaptive designs
- (checklist in the supplementary materials)
- CONSORT-AI site, including checklist, and the publication
- PRISMA - guidelines for transparent reporting of systematic reviews
- Homepage
- Checklist
- Extensions to the original statement
- Link registration details page where registrations are actually made on PROSPERO
- Note that systematic reviews/meta-analyses are often rejected by journals for not having been registered
The EQUATOR Network also has links to many other guidelines (SPIRIT, CARE, AGREE, ...)
Not a reporting guideline per se, but a method of assessing risk of bias and applicability of prediction model studies - PROBAST
- PROBAST paper
- further explanations and elaboration of PROBAST
- useful alongside TRIPOD perhaps?
- use of PROBAST to assess ML models in oncology. Long story short, most models are at high risk of bias
123 (81%, 95% CI: 73.8 to 86.4) developed models and 19 (51%, 95% CI: 35.1 to 67.3) validated models were at high risk of bias due to their analysis, mostly due to shortcomings in the analysis including insufficient sample size and split-sample internal validation
The COMET Initiative has a searchable list of standardised outcome sets for diseases, conditions etc.
SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials)
- SPIRIT-AI website and publication
ICEMAN provides an approach for assessing the credibility of subgroup analyses.
Feel free to add to this list.