links

Because a GitHub organization cannot star or follow repositories or other organizations, this page lists potentially interesting GitHub repositories, organizations and users, together with links to other material that might be of interest (e.g. guidelines, papers, ...).

SCTO Organisation

Stata

markdoc - literate programming for Stata
weaver - more dynamic docs from Stata (more like a log file though)
github - install Stata packages from github
rcall - call R from within Stata
ftools - faster variant of Stata's collapse/by/egen (particularly useful for larger datasets)
gtools - even faster variant of Stata's collapse/by/egen (particularly useful for larger datasets)
fastreshape - a faster variant of reshape (syntax is almost identical to reshape)
World Bank's iefieldkit - commands for data collection
World Bank's ietoolkit - commands for impact evaluation
World Bank's stata-linter for tidying code/highlighting poor coding practices
CodeMap - Mac application for examining dependencies in R or Stata analyses
Power calculation
- Simon's two-stage design
equivalent of R's here package for Stata: here

Validation

R

Packages
- Gmisc - package that includes a nice approach to creating flowcharts
  - Link to vignette
- flowchart - Alan's flowchart package - create flowcharts without needing to play around with the layout (much)
- Predictions from mixed models, with SEs, are available in the AICcmodavg package
- atable package for baseline tables. Very flexible.
- gtsummary also for baseline tables. Also flexible.
- metamisc 'Facilitate meta-analysis of diagnosis and prognosis research studies. It includes functions to summarize multiple estimates of prediction model discrimination and calibration performance, as described by Debray et al. (2019) doi:10.1177/0962280218785504. It also includes functions to evaluate funnel plot asymmetry, as described by Debray et al. (2018) doi:10.1002/jrsm.1266. Finally, the package provides functions for developing multivariable prediction models from datasets with clustering. '
- ipcwswitch 'Inverse Probability of Censoring Weights to Deal with Treatment Switch in Randomized Clinical Trials'
- margins ports much of Stata's margins function to R
- Power calculation:
- ggsurvfit for KM plots
CodeMap - Mac application for examining dependencies in R or Stata analyses
Scottish health/social care - Various stuff including funnel plots
R package template (based on secuTrialR). Includes e.g. continuous integration testing with AppVeyor and Travis CI. Click the "use this template" button to copy it to a new repo (under a new name), then edit it as required.
"Books" etc
- Free online resources for learning R
- Big Book of R - listing of a lot of online R books
- Flexible Imputation of Missing Data book from the author of the mice package.
- Forecasting: Principles and Practice Uses R for time series analysis and forecasting
- Data visualization lots on ggplot and R
- Bayesian inference with INLA covers a LOT of INLA stuff.
- Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA
- Michael Clark's documents - loads of tips and how-tos for R
- Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R
- An Introduction to Statistical Learning includes a link to the PDF of the 1st edition
- Statistical rethinking with brms, ggplot2, and the tidyverse is based on "Statistical rethinking" and fits models with the brms package
- Elements of Statistical Learning includes a link to the 2nd edition
- Outstanding User Interfaces with Shiny
- The Art of Data Science by Roger Peng and Elizabeth Matsui
- RStudio::conf materials contains (some) talks and workshops from the different conferences
tidymodels resources (mostly from here)
- A Gentle Introduction to tidymodels
- Introduction to Machine Learning with the Tidyverse
- Choose your own tidymodels adventure
- Tidy Modeling with R (a tidymodels pagedown)
- Get started with tidymodels and #TidyTuesday Palmer penguins
  - and generally other stuff from Julia Silge
- tidymodels labs
- the tidymodels site, of course
- notes on a book club from the R4DS people
- tidymodels labs based on An Intro to Statistical Learning
- RStudio::conf 2022 workshop on tidymodels by Julia Silge + Max Kuhn + David Robinson
- not tidymodels, but ML... code for ML by the ML in neuroscience lab in Zurich
  - and their paper on ML for clinicians which refers to the code
- also not strictly tidymodels, but ML a comic book on ML in R
marginal means from models with interactions
mapping with R - gallery of maps from the #30DayMapChallenge with links to code
Discussions on .Rproj and here - Jenny Bryan is for them: Project-oriented workflow, while Tarak Shah seems to be against them: .Rproj considered harmful

Validation

Validating R itself is a big task. The R Foundation Board has a statement on the issue which covers a few points that need not be covered by a validation of R itself (as they should be covered by the host system, e.g. Windows, MacOS, ...), but puts the onus for other issues in the hands of the organisation using the software.

The R validation hub has various info on the topic, including a white paper. They have also created an R package, riskmetric, for estimating the risk posed by a given package based on its documentation, bugs/issues, downloads etc.

The valtools package may be useful for documenting validation of R packages (see a presentation on the package here). There is also a white paper on the topic from the PHUSE group.

Information from RStudio on environment validation

Stata's internal approach to testing https://journals.sagepub.com/doi/10.1177/1536867X0100100102

Verifizierung und Validierung: Unterschied & Definitionen (in German; "Verification and validation: difference and definitions")

Misc

StatTag - a method to make dynamic Word documents, supports Stata, R, and others
Use Notepad++ as a Stata editor
Use Notepad++ as an R code editor (see also sourceforge for the exe)
Datamethods reference collection on "common myths"
- e.g. has sections on post-hoc power (see also this paper from Feb. 2022), p-values in baseline tables and much more. A very useful resource.
BBR - Biostats for biomedical research
- Frank Harrell and co's course
- PDF of course material
- ebook website
- YouTube videos of sessions/"lectures"
Ben Bolker's mixed models 'FAQ'
- if you have a question about mixed models (particularly in R, but also in general), there's a chance it's covered there
Intro to statistical learning book website (includes PDF)
Common statistical tests are linear models (even non-parametric ones)
Statistical Problems to Document and to Avoid. Checklist for Authors
git/github notes
C++ by example

Sample size

Useful literature

A simple, step-by-step guide to interpreting decision curve analysis
Table 2 Fallacy on interpretation of model parameters in the presence of confounding/effect modifiers
Standardized classification and framework for reporting, interpreting, and analysing medication non-adherence in cardiovascular clinical trials: a consensus report from the Non-adherence Academic Research Consortium (NARC) - a method for handling levels of adherence
Introduction to computational causal inference using reproducible Stata, R, and Python code: A tutorial - We illustrate the implementation of different methods [G formula/methods, IPTW, MSM, double-robust, targeted maximum likelihood estimation (TMLE)] using an empirical example from the Connors study based on intensive care medicine, and most importantly, we provide reproducible and commented code in Stata, R, and Python for researchers to adapt in their own observational study.
- R, Stata and SAS code is available here
- Covariate Adjustment in Randomized Trials
RMST
When and how should multiple imputation be used for handling missing data in randomised clinical trials – a practical guide with flowcharts - useful as a reference for the choice of 5% missing to start using MI and >40% missing as a reason to stop
Win odds: An adaptation of the win ratio to include ties
Lessons learnt when accounting for competing events in the external validation of time-to-event prognostic models - long story short... it's important...
FDA guidance on multiple testing in clinical trials
A roadmap to using randomization in clinical trials - a comparison of different randomization methods
Statistical code for clinical research papers in a high-impact specialist medical journal
Ten simple rules for initial data analysis

Reporting guidelines

TRIPOD - guidelines for reporting of predictive/prognostic models (validation or derivation)
Latent trajectory studies (GRoLTS)
- The GRoLTS-Checklist: Guidelines for Reporting on Latent Trajectory Studies
STROBE - guidelines for reporting of observational studies
- STROBE website
- STROBE checklists
- The STROBE statement itself has been published in many journals (see here)
CONSORT - guidelines for publishing RCTs
- Checklist
- Statement
- Flow diagram
- DISCOURAGES USE OF P-VALUES/CONFIDENCE INTERVALS/STANDARD ERRORS IN DESCRIPTIVE TABLES (e.g. Table 1) - POSSIBLY USEFUL TO REFUTE REVIEWER REQUESTS FOR THEM (see also the Datamethods reference collection on "common myths")
- Extension for adaptive designs
  - (checklist in the supplementary materials)
- CONSORT-AI site, including checklist, and the publication
PRISMA - guidelines for transparent reporting of systematic reviews
- Homepage
- Checklist
- Extensions to the original statement
- Link to the registration details page (registrations are actually made on PROSPERO)
- Note that systematic reviews/meta-analyses are sometimes/often rejected by journals for not having been registered

The EQUATOR Network also has links to many other guidelines (SPIRIT, CARE, AGREE, ...)

Not a reporting guideline per se, but a method of assessing risk of bias and applicability of prediction model studies - PROBAST

PROBAST paper
further explanations and elaboration of PROBAST
useful alongside TRIPOD perhaps?
use of PROBAST to assess ML models in oncology. Long story short, most models are high risk

123 (81%, 95% CI: 73.8 to 86.4) developed models and 19 (51%, 95% CI: 35.1 to 67.3) validated models were at high risk of bias due to their analysis, mostly due to shortcomings in the analysis including insufficient sample size and split-sample internal validation

The COMET Initiative has a searchable list of standardised outcome sets for diseases, conditions etc.

SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials)

SPIRIT-AI website and publication

ICEMAN provides an approach for assessing the credibility of subgroup analyses.

Feel free to add to this list.
