CFDE ICC Evaluation Core

Overview

This repo supports the activities described in this repo.

Development

Requirements

Linux or macOS system
Node v22+
Bun (for package management only, as a faster/smaller replacement for Yarn)

Pipeline

The automated steps in this repo are roughly as follows:

Gather
1. Get raw data from an external resource, e.g. by scraping an HTML page, downloading and parsing a PDF/CSV, or making a request to an API.
2. Save raw data exactly as-is for provenance and caching.
3. Collate the most important information from the raw data into a common, high-level output format suited to building the desired dashboard pages and PDF reports.
4. Repeat previous steps in order of dependency (e.g. opportunity number -> grant numbers) until all needed info is gathered.
Print
1. Run dashboard webapp.
2. Import output data from gather step, and do some minimal final processing (e.g. combine journal info with each publication listing).
3. Render select dashboard pages (e.g. /core-project/abc123) to PDF reports.
Deploy dashboard and PDFs to private web addresses.
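
The Gather steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repo's actual code: `RawGrant`, `Grant`, `RawStore`, `getRaw`, `collateGrant`, and `gatherGrants` are hypothetical names, the record fields are invented for the example, and an in-memory map stands in for the files saved under /data/raw.

```typescript
/** Hypothetical raw record, kept exactly as the external source returned it. */
type RawGrant = { project_num: string; award_amount: string };

/** Hypothetical high-level output record used by the dashboard and reports. */
type Grant = { id: string; amount: number };

/** Assumption: a simple key/value store standing in for files in /data/raw. */
type RawStore = Map<string, string>;

/**
 * Steps 1 and 2: return the cached raw data if present; otherwise fetch it
 * and save it verbatim before returning it.
 */
async function getRaw(
  key: string,
  fetchRaw: () => Promise<string>,
  store: RawStore,
): Promise<string> {
  const cached = store.get(key);
  if (cached !== undefined) return cached;
  const raw = await fetchRaw();
  store.set(key, raw);
  return raw;
}

/** Step 3: keep only the important fields, in the common output format. */
function collateGrant(raw: RawGrant): Grant {
  return { id: raw.project_num, amount: Number(raw.award_amount) };
}

/** Steps 1 to 3 for one resource; step 4 would repeat this per dependency. */
async function gatherGrants(
  key: string,
  fetchRaw: () => Promise<string>,
  store: RawStore,
): Promise<Grant[]> {
  const raw = await getRaw(key, fetchRaw, store);
  const parsed = JSON.parse(raw) as RawGrant[];
  return parsed.map(collateGrant);
}
```

Passing the fetcher in as a function keeps the caching logic independent of how the raw data is obtained (scraping, file download, or API request), so a second run with the same key never calls the external resource again.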

Repo content

/app - Dashboard webapp made with Vue. Also used for generating PDF reports.
- /public/pdfs - Outputted PDF reports.
/data - All other functionality involving data.
- /api - Types and functions for getting raw data from external APIs.
- /raw - Raw data gathered from external sources, for provenance.
- /gather - Functions for gathering data and putting it in a common format.
- /output - Gathered data in format for making desired reports.
- /print - Functions specific to making printed reports.
- /util - Small-scope general purpose functions.

Technology

TypeScript - Language used to provide type-safety from beginning to end of pipeline.
Playwright - Tool used for scraping public web pages and rendering dashboard pages to PDF reports.
Netlify - Service used for privately hosting dashboard webapp (and PR previews).
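
As a rough illustration of what end-to-end type-safety can look like, the sketch below types the "combine journal info with each publication listing" processing mentioned in the Print step. The type names, field names, and the `withJournals` function are hypothetical and chosen only for this example; the repo's real data shapes may differ.

```typescript
/** Hypothetical output record for a journal. */
type Journal = { id: string; name: string };

/** Hypothetical output record for a publication, referencing a journal by id. */
type Publication = { id: string; title: string; journalId: string };

/** Publication with its journal attached, or null if no journal matched. */
type PublicationWithJournal = Publication & { journal: Journal | null };

/** Attach journal info to each publication via a lookup keyed on journal id. */
function withJournals(
  publications: readonly Publication[],
  journals: readonly Journal[],
): PublicationWithJournal[] {
  const byId = new Map<string, Journal>(
    journals.map((journal): [string, Journal] => [journal.id, journal]),
  );
  return publications.map((publication) => ({
    ...publication,
    journal: byId.get(publication.journalId) ?? null,
  }));
}
```

Because the return type makes `journal` explicitly nullable, any dashboard code that reads it has to handle the unmatched case before the compiler will accept it.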

The pipeline is optimized wherever possible and appropriate. Things like network requests and rendering are parallelized (e.g. PDF reports are printed simultaneously in separate tabs of the same Playwright browser instance). External resources are cached in their raw format to speed up subsequent runs, and to avoid being rate-limited or blocked by those providers.
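
One common way to parallelize work like this while keeping it bounded is a small worker-pool helper. The sketch below is an assumption about how such a helper could look, not the repo's implementation: `mapParallel` and its `limit` parameter are hypothetical, and the source does not say whether the pipeline caps concurrency at all.

```typescript
/**
 * Run `task` over `items` with at most `limit` tasks in flight at once.
 * Results are returned in the same order as the input items.
 */
async function mapParallel<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next item that has not been claimed yet

  // Each worker repeatedly claims the next unclaimed item until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index] as T, index);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

For PDF printing, `task` could open a tab with Playwright's `browser.newPage()`, navigate with `page.goto(url)`, write the report with `page.pdf({ path })`, and then close the tab, so that several reports render at once in a single browser instance.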

Commands

Use ./run.sh with a --flag to conveniently run a script of the same name in /data/package.json and /app/package.json (if it exists) from the root of this repo.

Most important scripts:

| Flag | Description |
| --- | --- |
| `--install` | Install packages and dependencies |
| `--install-playwright` | Install Playwright |
| no flag | Run main pipeline steps in order |
| `--test` | Run all tests (type-checking, linting/formatting checks, etc.) |
| `--lint` | Auto-fix linting/formatting |
| `--dev` | Run dashboard webapp in dev mode |

See the readmes in subfolders for all commands.
