Skip to content

nih-cfde/icc-eval-core

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CFDE ICC Evaluation Core

Run pipeline Run tests

Overview

This repo supports the activities described in this repo.

Development

Requirements

  • Linux or MacOS system
  • Node v22+
  • Bun (for package management only, as faster/smaller replacement for Yarn)

Pipeline

The automated steps in this repo are roughly as follows:

  1. Gather
    1. Get raw data from an external resource, e.g. scraping an HTML page, downloading/parsing a PDF/CSV, making a request to an API, etc.
    2. Save raw data exactly as-is for provenance and caching.
    3. Collate most important information from raw data into common high-level output data format suited to making desired dashboard pages and PDF reports.
    4. Repeat previous steps in order of dependency (e.g. opportunity number -> grant numbers) until all needed info is gathered.
  2. Print
    1. Run dashboard webapp.
    2. Import output data from gather step, and do some minimal final processing (e.g. combine journal info with each publication listing).
    3. Render select dashboard pages (e.g. /core-project/abc123) to PDF reports.
  3. Deploy dashboard and PDFs to private web addresses.

Repo content

  • /app - Dashboard webapp made with Vue. Also used for generating PDF reports.
    • /public/pdfs - Outputted PDF reports.
  • /data - All other functionality involving data.
    • /api - Types and functions for getting raw data from external APIs.
    • /raw - Raw data gathered from external sources, for provenance.
    • /gather - Functions for gathering data and putting it in a common format.
    • /output - Gathered data in format for making desired reports.
    • /print - Functions specific to making printed reports.
    • /util - Small-scope general purpose functions.

Technology

  • TypeScript - Language used to provide type-safety from beginning to end of pipeline.
  • Playwright - Tool used for scraping public web pages and rendering dashboard pages to PDF reports.
  • Netlify - Service used for privately hosting dashboard webapp (and PR previews).

The pipeline is optimized wherever possible and appropriate. Things like network requests and rendering are parallelized (e.g. PDF reports are printed simultaneously in separate tabs of the same Playwright browser instance). External resources are cached in their raw format to speed up subsequent runs, and to avoid being rate-limited or blocked by those providers.

Commands

Use ./run.sh with a --flag to conveniently run a script of the same name in /data/package.json and /app/package.json (if it exists) from the root of this repo.

Most important scripts:

Flag Description
--install Install packages and dependencies
--install-playwright Install Playwright
no flag Run main pipeline steps in order
--test Run all tests (type-checking, linting/formatting checks, etc.)
--lint Auto-fix linting/formatting
--dev Run dashboard webapp in dev mode

See readmes in sub folders for all commands.