Fix/gfs preprocessing #83

Open: wants to merge 5 commits into main

@AoufNihed commented Apr 1, 2025

GFS Preprocessing Fix: Longitude Range and Dimension Naming

Problem

The GFS data preprocessing had several issues:

  1. Longitude range mismatch (-180° to 180° vs 0° to 360°)
  2. Dimension naming conflict between 'variable' and 'channel'
  3. Inefficient chunking strategy affecting performance

Solution

Implemented fixes in gfs_preprocessing.py and gfs_dataset.py:

  1. Longitude Range Standardization:

    • Converted all longitudes to the [0°, 360°) range
    • Added sorting so longitudes are always in ascending order
    • Updated UK region selection to handle the wrap-around window (350° to 2°)
  2. Dimension Structure:

    • Standardized on 'channel' as the dimension name
    • Stacked variables into the 'channel' dimension
    • Added validation checks for required dimensions
  3. Performance Optimization:

    • Implemented a more efficient chunking strategy
    • Tuned chunk sizes separately for each dimension
    • Cleared existing chunk encoding before re-chunking
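A minimal sketch of the longitude and channel fixes, assuming the data is held as NumPy arrays with longitude on the last axis. The PR's actual code in gfs_preprocessing.py and gfs_dataset.py works on its own data structures and may differ; the helper names to_0_360, normalize_longitude_axis, wraparound_mask and stack_channels, and the UK_LON_MIN/UK_LON_MAX constants, are hypothetical. Only the 350° to 2° window comes from the description above.

```python
import numpy as np

# Hypothetical UK longitude window in [0, 360) degrees. The 350° to 2° range
# comes from the PR description; the real selection may also bound latitude.
UK_LON_MIN = 350.0
UK_LON_MAX = 2.0


def to_0_360(lon):
    """Map longitudes in degrees onto the half-open range [0, 360)."""
    wrapped = np.mod(np.asarray(lon, dtype=float), 360.0)
    # Rounding can make np.mod return exactly 360.0 for tiny negative inputs.
    return np.where(wrapped >= 360.0, 0.0, wrapped)


def normalize_longitude_axis(lon, data, axis=-1):
    """Convert lon to [0, 360), sort it, and reorder data along axis to match."""
    new_lon = to_0_360(lon)
    order = np.argsort(new_lon, kind="stable")
    return new_lon[order], np.take(data, order, axis=axis)


def wraparound_mask(lon, lon_min, lon_max):
    """Boolean mask for a longitude window that may cross the 0°/360° seam."""
    if lon_min <= lon_max:
        return (lon >= lon_min) & (lon <= lon_max)
    # e.g. 350° to 2°: keep both the top and the bottom of the sorted axis.
    return (lon >= lon_min) | (lon <= lon_max)


def stack_channels(variables):
    """Stack same-shaped per-variable arrays along a new leading channel axis.

    Returns the stacked array and the channel names in stacking order
    (sorted alphabetically here so the order is reproducible).
    """
    if not variables:
        raise ValueError("no variables to stack")
    names = sorted(variables)
    shapes = {variables[name].shape for name in names}
    if len(shapes) != 1:
        raise ValueError(f"variables have mismatched shapes: {shapes}")
    return np.stack([variables[name] for name in names], axis=0), names
```

With xarray, the longitude step would typically be written as ds.assign_coords(longitude=ds.longitude % 360).sortby("longitude"), assuming the coordinate is named longitude. Note also that xarray's Dataset.to_array() names its new dimension 'variable' unless dim="channel" is passed. After sorting, a wrap-around selection returns the 0° to 2° columns before the 350° to 360° ones, so the selected block has to be rolled if it must be geographically contiguous.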

Testing

The changes can be verified by:

  1. Loading preprocessed data and checking longitude range
  2. Confirming dimension names and structure
  3. Verifying UK region selection
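The three checks above could be bundled into a small helper such as the hypothetical verify_gfs_layout below. It assumes the caller passes the preprocessed array's dimension names and its 1-D longitude coordinate (for an xarray object, da.dims and da.longitude.values, assuming the coordinate is named longitude). The function name, the problem strings and the default window are illustrative and are not the PR's own verification function.

```python
import numpy as np


def verify_gfs_layout(dims, lon, uk_window=(350.0, 2.0)):
    """Check a preprocessed GFS array's layout; return a list of problems.

    dims:      dimension names of the preprocessed array
    lon:       1-D longitude coordinate values in degrees
    uk_window: (min, max) longitude of the UK region; min > max means the
               window crosses the 0°/360° seam
    An empty list means every check passed.
    """
    problems = []
    lon = np.asarray(lon, dtype=float)

    # 1. Dimension naming: variables must be stacked into 'channel'.
    if "channel" not in dims:
        problems.append("missing 'channel' dimension")
    if "variable" in dims:
        problems.append("legacy 'variable' dimension still present")

    # 2. Longitude range and ordering.
    if lon.size == 0 or lon.min() < 0.0 or lon.max() >= 360.0:
        problems.append("longitude not within [0, 360)")
    elif np.any(np.diff(lon) <= 0.0):
        problems.append("longitude not strictly increasing")

    # 3. UK region selection must pick up at least one longitude.
    lo, hi = uk_window
    if lo <= hi:
        in_uk = (lon >= lo) & (lon <= hi)
    else:
        in_uk = (lon >= lo) | (lon <= hi)
    if not in_uk.any():
        problems.append("UK window selects no longitudes")

    return problems
```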

- Add dedicated GFS processing module
- Fix longitude range to [0, 360)
- Use 'channel' dimension consistently
- Add verification function
- Update CLI and documentation

@AoufNihed (Author) commented:

Hey @peterdudfield
GFS Preprocessing: Fix Longitude Range and Dimension Structure

  • Fixed longitude range to [0°, 360°) for consistent UK region selection
  • Standardized dimension naming to use 'channel' instead of 'variable'
  • Optimized chunking for better performance
  • Added data validation checks

Tested with sample GFS data and verified that UK region selection works correctly.
