(January 18, 2025)
The system I am setting up here attempts to do a few things:
- List the datasets published by Cal HHS. The discoverability there leaves something to be desired.
- Put all the different file types (CSV files, XLSX files, and so on) into MySQL tables.
- Straighten out the data types. Everything is loaded into varchar columns first, and can then be converted to proper types.
- Publish front-ends for as much of the data as possible. See the "App?" column. This part is just getting started.
- Generally make the data more obviously useful. We will see.
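As a rough illustration of the "varchar first, proper types later" idea, here is a minimal sketch. It is not the code checked in here. The function names (`clean_name`, `staging_ddl`, `infer_type`), the table name, and the sample rows are all made up for the example, and the type guessing is a simple heuristic.

```python
import csv
import io
import re
from datetime import datetime

INT_RE = re.compile(r"[+-]?\d+")
DEC_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+)")


def clean_name(raw):
    """Turn a CSV header cell into a safe MySQL column name."""
    name = re.sub(r"\W+", "_", raw.strip().lower()).strip("_")
    return name or "col"


def staging_ddl(table, header, width=255):
    """Build a CREATE TABLE statement where every column is a varchar."""
    cols = ",\n".join(f"  `{clean_name(h)}` VARCHAR({width})" for h in header)
    return f"CREATE TABLE `{table}` (\n{cols}\n);"


def is_iso_date(value):
    """True if the string parses as YYYY-MM-DD."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def infer_type(values):
    """Guess a stricter MySQL type from the strings in one column."""
    # Empty cells are ignored; they would become NULL after conversion.
    present = [v.strip() for v in values if v and v.strip()]
    if not present:
        return "VARCHAR(255)"
    if all(INT_RE.fullmatch(v) for v in present):
        return "BIGINT"
    if all(INT_RE.fullmatch(v) or DEC_RE.fullmatch(v) for v in present):
        return "DOUBLE"
    if all(is_iso_date(v) for v in present):
        return "DATE"
    return f"VARCHAR({max(len(v) for v in present)})"


# Hypothetical sample data, standing in for a downloaded CSV file.
sample = io.StringIO(
    "County Name,Year,Rate (%),Report Date\n"
    "Alameda,2020,4.5,2021-01-15\n"
    "Yolo,2021,,2022-01-15\n"
)
rows = list(csv.reader(sample))
header, data = rows[0], rows[1:]

print(staging_ddl("example_dataset", header))
for i, h in enumerate(header):
    print(clean_name(h), infer_type([row[i] for row in data]))
```

One caveat with a heuristic like this: codes with leading zeros (ZIP or FIPS codes, for example) look like integers, so a column like that probably needs to stay a varchar. Duplicate header names are also not handled here.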
I am not publishing the data I have unless there is a front-end for it. Right now there is too much data, and I am not sure how useful any of it is. Much of it is old and might not be getting updates. There may have been laws or regulations that required collecting and publishing it, and those may have since changed.
The code I am checking in can be used by anyone else to download the data and construct the tables for whichever datasets I have working. I have been working on the csv files first and then the xlsx files. There are other Excel file types that my current Python code cannot deal with, such as ".xls" and ".xlsm" files. The issues with these may be small; I just have not bothered with them yet.
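To make the file-type situation concrete, here is a small sketch of sorting downloaded files into ones that can be loaded and ones to skip for now. The `SUPPORTED` set and the `split_by_support` function are hypothetical names for this example, not part of the checked-in code.

```python
from pathlib import Path

# Assumption: only these two extensions are handled so far.
SUPPORTED = {".csv", ".xlsx"}


def split_by_support(filenames):
    """Split file names into (supported, skipped) by extension."""
    supported, skipped = [], []
    for name in filenames:
        ext = Path(name).suffix.lower()  # compare case-insensitively
        (supported if ext in SUPPORTED else skipped).append(name)
    return supported, skipped


ok, later = split_by_support(["a.csv", "b.XLSX", "c.xls", "d.xlsm"])
print("load now:", ok)
print("skip for now:", later)
```

Keeping a list of the skipped files makes it easy to come back to the ".xls" and ".xlsm" cases later.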
More explanation to come.