Livelike: Vivid Synthetic Populations

This package provides a high-level wrapper for generating synthetic populations via Census APIs based on the American Community Survey (ACS) 5-Year Estimates. Synthetic populations are virtual representations of people and households produced for small census areas (block groups, tracts) and can be attributed by a variety of demographic, economic, social, worker, student, mobility, housing, health, and communication characteristics found in the ACS.

Specifying a P-MEDM Problem

Synthetic populations are generated by allocating records from the ACS Public Use Microdata Sample (PUMS) from their native spatial resolution of Public Use Microdata Areas (PUMAs; 100,000+ people) to small census areas (typically fewer than 8,000 people) such that the aggregate characteristics of people and households align closely with the population profiles of those small areas in the ACS Summary File (SF). This is accomplished using Penalized Maximum-Entropy Dasymetric Modeling (P-MEDM), which seeks to recreate the error variances on each small-area variable estimate in the ACS SF. LiveLike makes it simple to design and solve P-MEDM problems by fetching all of the necessary P-MEDM inputs for a given PUMA via Census APIs.

The bulk of P-MEDM setup is handled automatically by the acs module via the Census Microdata API.

In a basic use-case, inputs are simply:

The 2010 or 2020 PUMA ID (<State FIPS> + <PUMA FIPS>).
A Census API key (optional).

Examples are provided in the notebooks directory.
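
As a minimal sketch of how a PUMA ID is put together, the helper below joins a 2-digit state FIPS code and a 5-digit PUMA code into a single string. The function name `make_puma_id` is hypothetical and not part of LiveLike, and the example values are illustrative only.

```python
def make_puma_id(state_fips, puma_code):
    """Join a state FIPS code and a PUMA code into a 7-character PUMA ID.

    Hypothetical helper for illustration; not part of the LiveLike API.
    """
    state = str(state_fips).zfill(2)  # state FIPS codes are 2 digits
    puma = str(puma_code).zfill(5)    # PUMA codes are 5 digits
    if not (state.isdigit() and puma.isdigit()):
        raise ValueError("state FIPS and PUMA code must be numeric")
    if len(state) != 2 or len(puma) != 5:
        raise ValueError("expected a 2-digit state FIPS and a 5-digit PUMA code")
    return state + puma


# Illustrative values only: state FIPS 47 with PUMA code 01603.
puma_id = make_puma_id(47, 1603)
print(puma_id)  # 4701603
```

Passing the codes as strings (for example `"06"` and `"03701"`) preserves leading zeros, which are easy to lose when the codes are stored as integers.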

Supported Geographies

P-MEDM requires a target geography and an aggregate geography to account for error variances. The selected target geography determines the aggregate geography:

Level	Code	Population (approx.)	Aggregate
Block group	`bg`	600 - 3000	Tract
Tract	`trt`	1200 - 8000	Supertract

LiveLike handles tracts, which have no sub-county aggregation level, using a regionalization approach to generate custom "supertracts" (see notebooks/tract_supertract_2019.ipynb for an example).
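
The pairing of target and aggregate geographies in the table above can be written as a small lookup. This is only a sketch: the names `AGGREGATE_LEVEL` and `aggregate_for` are assumptions for illustration and do not come from LiveLike.

```python
# Target geography code -> aggregate geography, as listed in the table above.
AGGREGATE_LEVEL = {
    "bg": "tract",        # block groups aggregate to tracts
    "trt": "supertract",  # tracts aggregate to custom supertracts
}


def aggregate_for(target):
    """Return the aggregate geography implied by a target geography code."""
    try:
        return AGGREGATE_LEVEL[target]
    except KeyError:
        valid = ", ".join(sorted(AGGREGATE_LEVEL))
        raise ValueError(
            f"unsupported target geography {target!r}; expected one of: {valid}"
        ) from None


print(aggregate_for("bg"))   # tract
print(aggregate_for("trt"))  # supertract
```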

Supported ACS Years

The ACS 5-Year Estimates are a rolling 5% sample of the United States population weighted to be representative of the release year (vintage), with additional adjustments for factors like income. LiveLike uses the ACS 2019 5-Year Estimates as its default vintage.

Year	Vintage	Available
2016	ACS 2012 - 2016 5-Year Estimates	✅
2017	ACS 2013 - 2017 5-Year Estimates	✅
2018	ACS 2014 - 2018 5-Year Estimates	✅
2019	ACS 2015 - 2019 5-Year Estimates	✅
2020	ACS 2016 - 2020 5-Year Estimates	❌
2021	ACS 2017 - 2021 5-Year Estimates	❌
2022	ACS 2018 - 2022 5-Year Estimates	❌
2023	ACS 2019 - 2023 5-Year Estimates	✅

Currently, the 2016 through 2019 vintages and the 2023 vintage are supported. The 2020 - 2022 gap is due to mixed geography problems that P-MEDM cannot directly handle (2010 PUMAs with 2020 small areas for 2020 and 2021; a mixture of 2010 and 2020 PUMAs with 2020 small areas for 2022).
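
The availability table can be expressed as a simple pre-flight check. The sketch below is not LiveLike code; `check_year` and `vintage_label` are hypothetical names, and the year sets are copied from the table above.

```python
# Years taken from the availability table above.
SUPPORTED_YEARS = frozenset({2016, 2017, 2018, 2019, 2023})
MIXED_GEOGRAPHY_YEARS = frozenset({2020, 2021, 2022})


def check_year(year):
    """Return `year` if it is a supported vintage, otherwise raise ValueError."""
    if year in SUPPORTED_YEARS:
        return year
    if year in MIXED_GEOGRAPHY_YEARS:
        raise ValueError(
            f"{year} mixes 2010 and 2020 geographies, which P-MEDM cannot handle directly"
        )
    raise ValueError(f"{year} is outside the documented range of vintages")


def vintage_label(year):
    """Build the 5-year vintage label; each vintage spans the release year and the four before it."""
    return f"ACS {year - 4} - {year} 5-Year Estimates"


print(vintage_label(check_year(2019)))  # ACS 2015 - 2019 5-Year Estimates
```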

P-MEDM Constraints

P-MEDM constraints are sets of residential and population characteristics common between the ACS SF and PUMS that can be used to design a P-MEDM model and attribute the synthetic population. LiveLike provides several configurations of prebuilt constraints:

Base (default): Baseline modeling constraints representing population totals, routine daily activities (workers, students), and mobility characteristics, available in config.up_base_constraints_selection.
Expanded: Baseline modeling constraints with a selection of demographic, social, economic, and housing characteristics, available in config.up_expanded_constraints_selection. The Base constraints can be overwritten by the Expanded ones using:
```
from config import up_expanded_constraints_selection

acs.puma(..., constraints_selection=up_expanded_constraints_selection)
```

Several additional constraint themes (health, communications) are available outside the prebuilt configurations and can be added onto a custom constraints selection.

Theme	Description	Base	Expanded	Notes
universe	Sampling universe totals (population, civilian noninstitutionalized population, group quarters population, housing units, occupied housing units).	x	x
worker	Worker characteristics (employment, class of worker, industry, occupation, hours worked per week).	x	x
student	Student characteristics (grade level attending, public/private school).	x	x
mobility	Mobility characteristics (commute time/mode, vehicles available).	x	x
demographic	Basic demographics (sex, age) and living arrangement characteristics.		x	Expanded: Sex by age and household type only
social	Social characteristics (race/ethnicity, language, place of birth, veteran status).		x	Expanded: Race/ethnicity only
economic	Economic characteristics (household income, poverty, educational attainment).		x	Expanded: Household income and income to poverty ratio only
housing	Housing characteristics (tenure, dwelling type, year built, number of rooms, house heating fuel).		x	Expanded: Dwelling type and year built only
health	Health insurance coverage type.
communications	Household internet access.

Custom Constraint Selection

Constraint selections are passed to acs.puma(constraints_selection=...) as a dict whose keys are ACS variable themes and whose values identify specific subjects (tables). If the value is a bool, True includes the variables for all subjects in the theme, while False bypasses the theme (the same as omitting it from the selection). If the value is a list, only the listed subjects are included.

Example:

```
custom_constraints_selection = {
    "universe" : True,
    "worker" : True,
    "student" : True,
    "mobility" : True,
    "demographic" : [
        "sex_age",
        "hhtype",
    ],
    "economic" : [
        "hhinc",
        "ipr",
    ],
    "health" : True,
    "communications" : True,
}
```

Use all variables listed under the universe, worker, student, mobility, health, and communications themes.
Use only sex by age (sex_age) and household type (hhtype) from the demographic theme.
Use only household income (hhinc) and income to poverty ratio (ipr) from the economic theme.
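
The bool/list rules described in this section can be sketched as a small resolver. This is an illustration of the selection semantics only, not LiveLike's implementation: `resolve_selection` is a hypothetical name, and the catalog below is made up apart from the subject names `sex_age`, `hhtype`, `hhinc`, and `ipr`, which appear in the example above.

```python
def resolve_selection(selection, catalog):
    """Expand a constraints selection into {theme: [subjects]}.

    `catalog` maps each theme to every subject it offers. Hypothetical helper
    that mirrors the documented rules: True = all, False = skip, list = listed.
    """
    resolved = {}
    for theme, choice in selection.items():
        if theme not in catalog:
            raise KeyError(f"unknown theme: {theme!r}")
        if choice is True:
            resolved[theme] = list(catalog[theme])  # every subject in the theme
        elif choice is False:
            continue  # same effect as omitting the theme
        elif isinstance(choice, list):
            unknown = [s for s in choice if s not in catalog[theme]]
            if unknown:
                raise ValueError(f"unknown subjects for {theme!r}: {unknown}")
            resolved[theme] = list(choice)  # only the listed subjects
        else:
            raise TypeError(f"{theme!r} must map to a bool or a list")
    return resolved


# Made-up catalog for illustration; "other_subject" and "health_subject" are placeholders.
catalog = {
    "demographic": ["sex_age", "hhtype"],
    "economic": ["hhinc", "ipr", "other_subject"],
    "health": ["health_subject"],
}
selection = {"demographic": True, "economic": ["hhinc", "ipr"], "health": False}
resolved = resolve_selection(selection, catalog)
print(resolved)  # {'demographic': ['sex_age', 'hhtype'], 'economic': ['hhinc', 'ipr']}
```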

The Constraints File

The constraints file (livelike/data/constraints.csv) underlies the constraint selection process, describing relationships between available PUMS variables, P-MEDM constraints, and ACS Summary File (SF) variables, as well as year of availability for constraints. It is used to generate individual-level representations of ACS SF tables/variables based on PUMS data.

level: PUMS file level (person or household).
geo_base_level: Baseline geography for which the constraint is available (bg: block group; trt: tract).
theme: Constraint topic/theme. Each theme points to a PUMS/SF crosswalking function in livelike.pums.
subject: The subject of the ACS SF table to be represented at the individual level using PUMS data. This column references the function in the pums module used to produce a P-MEDM constraint.
constraint: P-MEDM constraining variable name.
pums[1...n]: One or more columns listing the PUMS variables associated with each P-MEDM constraint table. These are parsed using a regex search for any columns in the file beginning with pums.
code: ACS SF variable codes matching each P-MEDM constraint.
desc: P-MEDM constraining variable longform description.
begin_year: The initial year in which the constraint was available.
end_year: The final year in which the constraint was available.
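
To make the column layout concrete, the sketch below parses a tiny stand-in for the constraints file using only the standard library. The rows are invented placeholders that mimic the columns described above; they are not real entries from livelike/data/constraints.csv, and `load_constraints` is a hypothetical helper rather than LiveLike's parser.

```python
import csv
import io
import re

# Made-up rows that only mimic the column layout described above.
SAMPLE = """level,geo_base_level,theme,subject,constraint,pums1,pums2,code,desc,begin_year,end_year
person,bg,theme_a,subject_a,constraint_a,PUMS_X,PUMS_Y,CODE_001,Example constraint A,2016,2019
household,trt,theme_b,subject_b,constraint_b,PUMS_Z,,CODE_002,Example constraint B,2016,2023
"""


def load_constraints(text, year):
    """Return the rows available in `year`, each with its PUMS variables collected."""
    reader = csv.DictReader(io.StringIO(text))
    # Find every column whose name begins with "pums" (pums1, pums2, ...).
    pums_cols = [c for c in reader.fieldnames if re.match(r"pums", c)]
    rows = []
    for row in reader:
        if int(row["begin_year"]) <= year <= int(row["end_year"]):
            # Skip empty cells: not every constraint uses every pums column.
            row["pums_vars"] = [row[c] for c in pums_cols if row[c]]
            rows.append(row)
    return rows


rows_2023 = load_constraints(SAMPLE, 2023)
for row in rows_2023:
    print(row["constraint"], row["pums_vars"])
```

Filtering on begin_year and end_year first keeps constraints that were unavailable in a given vintage out of the model.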

Census API Key

Using a Census API Key is optional but is recommended to avoid hitting request limits.

Register for a Census API Key.
Activate your key via the confirmation email link you receive.
In the top directory of livelike, run:

```
echo YOUR_CENSUS_API_KEY > censusapikey.txt
```

The file that is created, censusapikey.txt, is not tracked by git. This ensures that your personal API key is never exposed on a remote branch.
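
If you want to read the key back in your own scripts, something like the following would work. It is a sketch under the assumption that the file holds the key on a single line; `read_census_api_key` is a hypothetical helper and says nothing about how LiveLike itself loads the key.

```python
from pathlib import Path


def read_census_api_key(path="censusapikey.txt"):
    """Return the key stored in `path`, or None if the file is missing or empty."""
    key_file = Path(path)
    if not key_file.is_file():
        return None  # the key is optional, so a missing file is not an error
    key = key_file.read_text().strip()  # drop the trailing newline written by echo
    return key or None


api_key = read_census_api_key()
print("key found" if api_key else "no key; requests may be rate limited")
```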

Population Synthesis

Utilities for population synthesis can be found in the homesim module. Our current approach is to sample from the P-MEDM allocation matrix ($i = 1 \dots n$ PUMS records by $j = 1 \dots m$ areas) for a given area based on family status/household size, group quarters, and vacant housing, such that the area's total population is approximately preserved.
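
The core idea of sampling from one column of the allocation matrix can be illustrated with a much-simplified sketch. It draws a fixed number of records with probability proportional to their allocation weights and ignores the stratification by family status/household size, group quarters, and vacant housing that homesim applies. The function `sample_area` and the toy matrix are assumptions for illustration, not the homesim API.

```python
import random


def sample_area(allocation, area, size, seed=None):
    """Draw `size` PUMS record indices for one area.

    `allocation` is a list of rows (one per PUMS record), each holding one
    weight per area. Records are drawn with replacement, with probability
    proportional to their weight in column `area`.
    """
    weights = [row[area] for row in allocation]
    if sum(weights) <= 0:
        raise ValueError("area has no positive allocation weight")
    rng = random.Random(seed)  # local generator so results are reproducible
    return rng.choices(range(len(allocation)), weights=weights, k=size)


# Toy allocation matrix: 3 PUMS records (rows) by 2 areas (columns).
allocation = [
    [2.0, 0.0],
    [1.0, 3.0],
    [0.0, 1.0],
]
draws = sample_area(allocation, area=0, size=5, seed=0)
print(draws)  # indices of the sampled records; record 2 has zero weight in area 0
```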

Batch operations

The multi module provides utilities for population synthesis across multiple PUMAs, including:

Making PUMA instances across multiple geographies or replicates (alternative PUMS weights)
Population synthesis
Querying and extracting PUMS descriptors from the Census Microdata API

Testing

Rebuilding Test Data

The scripts to rebuild test data are stored in the utilities directory. Execute them from the main directory, for example:

```
python utilities/prep_test_build_puma.py
python utilities/prep_test_notebook_solutions.py
```

Running Testing Suite Locally

To run the testing suite locally, enter:

```
bash run_tests.sh
```

Rough edges

Constraint order matters

The default P-MEDM solver, pymedm, gives different solutions when constraint order varies. This appears to be tied to floating-point underflow errors in jax, a core dependency of pymedm, which seem to be triggered by differing positions of the model input variables. LiveLike accounts for this for both prebuilt and custom constraints by implementing a method in the puma constructor that consistently sorts constraints by theme and code.
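
The effect of a deterministic sort can be shown in a few lines. This sketch assumes constraints are represented as dicts with `theme` and `code` keys, which is an illustrative choice rather than LiveLike's internal representation, and the codes are placeholders.

```python
def sort_constraints(constraints):
    """Return constraints in a fixed order: by theme first, then by code."""
    return sorted(constraints, key=lambda c: (c["theme"], c["code"]))


# Placeholder constraints supplied in two different orders.
first = [
    {"theme": "worker", "code": "CODE_002"},
    {"theme": "mobility", "code": "CODE_009"},
    {"theme": "worker", "code": "CODE_001"},
]
second = list(reversed(first))

# Both inputs produce the same ordering, so the solver sees identical inputs.
print(sort_constraints(first) == sort_constraints(second))  # True
```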

Negative replicate weights

In rare cases, the values of PUMS replicate household weights can be negative. For compatibility with P-MEDM, we zero out these negative values. See this thread for further details.

The P-MEDM population constraint is approximated as a sum of the ratio of each household member's person weight (PWGTP) to the head of household's weight (which itself roughly matches the household weight). When the head of household's replicate person weight is less than one, we use a placeholder value of 1 so that each additional household member still contributes to the population constraint for the household. We welcome community contributions for more robust improvements to this approach.
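
Both adjustments described in this section can be sketched as follows. The function names are hypothetical, the weights are toy values, and the assumption that the head of household is the first member in the list is made purely for illustration.

```python
def zero_negative(weights):
    """Replace negative replicate household weights with zero."""
    return [max(w, 0) for w in weights]


def household_population(person_weights, head_index=0):
    """Approximate a household's population constraint.

    Sums the ratio of each member's person weight to the head of household's
    weight. If the head's weight is below 1, a placeholder of 1 is used so the
    other members still contribute. Assumes (for illustration) that
    `head_index` identifies the head of household.
    """
    head = person_weights[head_index]
    denominator = head if head >= 1 else 1
    return sum(w / denominator for w in person_weights)


print(zero_negative([12, -3, 0, 7]))      # [12, 0, 0, 7]
print(household_population([10, 20, 5]))  # 3.5
print(household_population([0, 4, 6]))    # 10.0
```

In the last call the head's weight is 0, so the placeholder denominator of 1 keeps the other two members in the total instead of dividing by zero.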
