Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-46562: Add bps templates for LSSTCam/LSSTComCam/LATISS calibration creation. #292

Merged
merged 17 commits into from
Jan 29, 2025

Conversation

erykoff
Copy link
Contributor

@erykoff erykoff commented Jan 22, 2025

No description provided.

output: ${USER_CALIB_PREFIX}${INSTRUMENT}/calib/${TICKET}/${TAG}/flatGen-${FLAT_BAND}.${RERUN}
butlerConfig: ${REPO}
inCollection: ${USER_CALIB_PREFIX}${INSTRUMENT}/calib/${TICKET}/${TAG},${RAW_COLLECTION},${CALIB_COLLECTIONS}
dataQuery: ${!SELECTION_FLAT_BAND}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this exclamation point is a typo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is bash "indirection" which should allow one template to be used for all the flats with different selection variables. That's the idea, at least. I haven't gotten to testing this part yet!

```
export FLAT_BAND=g
export SELECTION_FLAT_BAND=SELECTION_FLAT_g
export SELECTION_FLAT_g="instrument='LSSTComCam' and selection-string"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses selection-string instead of selection_string used elsewhere. Not really a "bug" as this is documentation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actual comment: If we have a band variable, we should constrain the query to just that band. This will prevent future accidents where we generate a 15 input g flat and a 1 input r flat in the same collection because an endpoint was misread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True but not trivial because we need to know the physical filter in the constraint, so I can't easily template that generally for LATISS/LSSTComCam/LSSTCam. But we can put strong recommendation that SELECTION_FLAT_g and friends include a physical filter constraint!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to know the physical_filter. I assume pipetask and butler query-dimension-records use the same resolver, so something like this should work:

butler query-dimension-records /repo/main exposure --where "instrument='LSSTComCam' AND physical_filter.band='g'" --limit 1
instrument       id      day_obs         group         physical_filter        obs_id        exposure_time    dark_time    observation_type observation_reason seq_num seq_start seq_end target_name science_program tracking_ra tracking_dec sky_angle azimuth zenith_angle has_simulated can_see_sky               timespan (TAI)              
---------- ------------- -------- -------------------- --------------- -------------------- ------------- --------------- ---------------- ------------------ ------- --------- ------- ----------- --------------- ----------- ------------ --------- ------- ------------ ------------- ----------- ------------------------------------------
LSSTComCam 2021040100003 20210401 CALSET_20210401_1601            g_07 CC_O_20210401_000003           1.0 1.5343017578125             flat               flat       3         3       3     UNKNOWN         unknown        None         None      None    None         None         False        None [2021-04-01T16:24:36, 2021-04-01T16:24:37)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"instrument='LSSTComCam' AND band='g'" should work and is shorter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When getting a flat from a calib repo that doesn't work, so I guess I had assumed we needed it in the constraint as well.

output: ${USER_CALIB_PREFIX}${INSTRUMENT}/calib/${TICKET}/${TAG}/linearizerGen.${RERUN}
butlerConfig: ${REPO}
inCollection: ${USER_CALIB_PREFIX}${INSTRUMENT}/calib/${TICKET}/${TAG},${RAW_COLLECTION},${CALIB_COLLECTIONS}
dataQuery: ${SELECTION_PTC_LINEARIZER-$SELECTION_PTC}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has a typo as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More bash magic that is supposed to use $SELECTION_PTC_LINEARIZER if it is set and otherwise use $SELECTION_PTC. Also not yet tested if the bps vars support this...

Environment Variables
---------------------

These are the environment variables that must be used. Examples are substituted in.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also note that selection_string is a dummy variable here, and that it needs to be filled with something explicit. I was going to suggest putting in the information being used on DM-46562 (the ticket used below), but I see that's this ticket adding the BPS templates, not a generation ticket.

@erykoff erykoff force-pushed the tickets/DM-46562 branch 7 times, most recently from 87341d4 to e8253b5 Compare January 28, 2025 16:27
Copy link
Contributor

@czwa czwa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most of my comments are me not fully understanding how the variables work here.

* `export SELECTION_PTC_LINEARIZER=$SELECTION_PTC`: The selection of raws to generate the linearizer; usually will be the same as the PTC selection.
* `export SELECTION_PTC_BFK=$SELECTION_PTC`: The selection of raws to generate the brighter-fatter kernel; usually will be the same as the PTC selection.
* `export SELECTION_PTC_CTI=$SELECTION_PTC`: The selection of raws to generate the charge-transfer-inefficiency dataset; usually will be the same as the PTC selection.
* `export SELECTION_FLAT_g="instrument='LSSTComCam and selection_string": The selection of raws to generate the g-band flat. See below for additional info.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing trailing "`".


These are the environment variables that must be used. Examples are substituted in.

* `export USER_CALIB_PREFIX=""` or `export USER_CALIB_PREFIX=u/erykoff/`: Set to a blank string for official calibrations, or the user collection prefix (**with trailing slash**).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "an empty string"?

* `export CALIB_COLLECTIONS=LSSTComCam/calib/DM-46825`: Comma-separated list of curated or previously generated calibration collections to use as a starting point.
* `export TAG=newCalibs`: A human-readable tag to help indicate why a set of calibs were built (should also be findable from the ticket name).
* `export RERUN=20250122a`: The rerun name to indicate when the calibrations were generated.
* `export BOOTSTRAP_RUN_NUMBER=1`: The bootstrap run number ensures that the bootstrap run collections are unique in case of an initial mistake. (Because the run name for bootstraps are deterministic without a timestamp they cannot be rerun into the same collection).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit fragile, as the biasBootstrap could have run successfully, but the darkBootstrap needed N iterations. Maybe a comment here to be aware of that?


Overview
--------

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this section also have a pointer to the "how to run BPS" documentation? We have the "check variables" example below, but no "here's how to run this after you've set up the variables."

uses python `os.path.expandvars()` which is not as sophisticated. The most
relevant limitation is that `envsubst` will replace an unset env var with a
blank string, while `expandvars()` will leave that raw
(e.g. `${USER_CALIB_PREFIX}` can be left verbatim). Nevertheless, this is a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alkjsdaisoghg.


Due to limitations in BPS environment variable expansion, we have one template
file for each band. There are template files for each of ugrizy. Note that it is
**strongly** recommended that the relevant physical filter be used in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have per-filter flat files, can't be bake the band into the dataQuery?

campaign: "${TICKET}"

biasBootstrapCollection: "${USER_CALIB_PREFIX}$INSTRUMENT/calib/$TICKET/$TAG/biasBootstrapGen.$RERUN"
biasBootstrapRun: "{biasBootstrapCollection}/run${BOOTSTRAP_RUN_NUMBER}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this missing a $ for biasBootstrapCollection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bps variable, and must be quoted because yaml gets confused otherwise.


darkBootstrapCollection: "${USER_CALIB_PREFIX}$INSTRUMENT/calib/$TICKET/$TAG/darkBootstrapGen.$RERUN"
biasBootstrapRun: "${USER_CALIB_PREFIX}$INSTRUMENT/calib/$TICKET/$TAG/biasBootstrapGen.$RERUN/run${BOOTSTRAP_RUN_NUMBER}"
darkBootstrapRun: "{darkBootstrapCollection}/run${BOOTSTRAP_RUN_NUMBER}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as on biasBootstrap re $.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see inCollection is the same, so maybe I'm just confused about variable scope. Those defined within the yaml only need braces, but those that are inherited from the environment need ${}?

campaign: "${TICKET}"

payload:
payloadName: "${INSTRUMENT}_${TICKET}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the payloadName need to be unique? Will it complain if you launch flats in all filters simultaneously? I supposed this could apply to any of the steps, but flats feel more parallelizable than the others.

@erykoff erykoff merged commit 036654c into main Jan 29, 2025
2 checks passed
@erykoff erykoff deleted the tickets/DM-46562 branch January 29, 2025 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants