chore: notes for changes [2025-02-02]
CHRISCARLON committed Feb 2, 2025
1 parent 4400dd1 commit fb5790f
Showing 3 changed files with 55 additions and 2 deletions.
1 change: 0 additions & 1 deletion .gitignore
@@ -197,4 +197,3 @@ src/os_open_usrn_functions/explore_and_prep.py
 .test_venv
 settings.json*
 main.py
-notes.md
2 changes: 1 addition & 1 deletion HerdingCats/explorer/cat_explore.py
@@ -1499,7 +1499,7 @@ def get_dataset_resource_meta(self, data: dict) -> List[Dict[str, Any]] | None:
 Fetches metadata for a specific resource within a dataset.
 Args:
-Dict with meta info
+Dict with dataset meta info
 Returns:
 dict: Resource details or empty dict if not found
54 changes: 54 additions & 0 deletions notes.md
@@ -0,0 +1,54 @@
# Notes for thinking through problems

## Breadcrumbs for next time...

Reduce code duplication across the catalogues' data loaders. To do that:

- Implement a shared DataUploader trait/protocol (MotherDuck and AWS S3) for all catalogues
- Implement a shared DataFrameLoader trait/protocol for all catalogues
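
A rough sketch of what the two shared protocols could look like, using `typing.Protocol` for structural typing. The method names (`load`, `upload`), their parameters, and the `InMemoryUploader` stand-in are assumptions made for illustration; they are not taken from the existing loaders.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class DataFrameLoader(Protocol):
    """Hypothetical shared interface: read a remote resource into a DataFrame-like object."""

    def load(self, url: str, file_format: str) -> Any: ...


@runtime_checkable
class DataUploader(Protocol):
    """Hypothetical shared interface: push a remote resource to a storage backend."""

    def upload(self, url: str, file_format: str, destination: str) -> str: ...


class InMemoryUploader:
    """Stand-in backend to show structural typing; not a MotherDuck or S3 client."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str, str]] = []

    def upload(self, url: str, file_format: str, destination: str) -> str:
        # Record the call and return a fake object key built from the file name.
        self.sent.append((url, file_format, destination))
        return f"{destination}/{url.rsplit('/', 1)[-1]}"


def upload_all(
    uploader: DataUploader, resources: list[tuple[str, str]], destination: str
) -> list[str]:
    """Works with any object that satisfies DataUploader, whatever the catalogue."""
    return [uploader.upload(url, fmt, destination) for url, fmt in resources]
```

With this shape, each catalogue only has to hand over `(url, format)` pairs, and the MotherDuck and S3 backends would each be one class that satisfies `DataUploader` without inheriting from it.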

## Notes for implementing shared loader behaviours

Need to understand the structure of what we pass to the data loaders of each catalogue first

## SSEN (Scottish and Southern Electricity Networks) Data
**Return Type:** `List[List]`

| Field | Data Structure |
|-------|---------------|
| Index 0 (Resource Name) | String |
| Index 1 (Timestamp) | ISO 8601 DateTime String |
| Index 2 (Format) | String |
| Index 3 (URL) | String URL |

## UK Power Networks Data
**Return Type:** `List[Dict]`

| Field | Data Structure |
|-------|---------------|
| download_url | String URL |
| format | String |

## Data.gouv.fr (French Government Data)
**Return Type:** `List[Dict]`

| Field | Data Structure |
|-------|---------------|
| dataset_id | String |
| resource_created_at | ISO 8601 DateTime String with Timezone |
| resource_extras | Dictionary/Object |
| resource_format | String |
| resource_frequency | Nullable |
| resource_id | String |
| resource_last_modified | ISO 8601 DateTime String with Timezone |
| resource_latest | String URL |
| resource_title | String |
| resource_url | String URL |
| slug | String |

### Summary:
- SSEN: Each list element contains 4 fields
- UK Power Networks: Each dictionary contains 2 key-value pairs
- Data.gouv.fr: Each dictionary contains 11 key-value pairs
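
One way to bridge the three shapes is to normalise each into a single record type before it reaches a shared loader. The sketch below is only an illustration: the `Resource` dataclass and the `from_*` helpers are hypothetical names, and picking `resource_url` (rather than `resource_latest`) for Data.gouv.fr is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical common record: the minimum every loader needs."""

    url: str
    file_format: str
    name: Optional[str] = None


def from_ssen(rows: List[List[Any]]) -> List[Resource]:
    # SSEN rows are positional: [resource name, timestamp, format, url]
    return [Resource(url=row[3], file_format=row[2], name=row[0]) for row in rows]


def from_ukpn(items: List[Dict[str, Any]]) -> List[Resource]:
    # UK Power Networks dicts carry only download_url and format
    return [Resource(url=i["download_url"], file_format=i["format"]) for i in items]


def from_datagouv(items: List[Dict[str, Any]]) -> List[Resource]:
    # Assumption: resource_url is the link to load; resource_latest is ignored here
    return [
        Resource(
            url=i["resource_url"],
            file_format=i["resource_format"],
            name=i["resource_title"],
        )
        for i in items
    ]
```

Every loader or uploader can then accept `List[Resource]` and stay unaware of which catalogue produced it.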
