Speed up cost calculation tools in gdp.py and regional_differentiation.py #334

Open
wants to merge 4 commits into
base: main
from

Conversation

Wegatriespython

@Wegatriespython Wegatriespython commented Apr 10, 2025

This PR optimizes two core cost calculation functions, adjust_cost_ratios_with_gdp in tools/costs/gdp.py and get_weo_data in tools/costs/regional_differentiation.py, by vectorizing operations and removing repeated I/O, which reduces function execution time by approximately 75%.

In the updated adjust_cost_ratios_with_gdp function, slopes and intercepts are now computed with direct DataFrame operations rather than by iterating over groups. Base-year values are extracted and merged in a single step, and constraints are applied via boolean indexing. This significantly reduces execution time while maintaining output parity with the original approach.
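As a rough illustration of the pattern described above (not the code in this PR), the sketch below merges base-year values once, computes slope and intercept column-wise, and applies a constraint with a boolean mask. The function name, the column names (`region`, `technology`, `year`, `gdp_ratio`, `cost_ratio`), the line through the point (1, 1), and the specific constraint are all assumptions made for the example.

```python
import pandas as pd


def adjust_ratios_vectorized(df: pd.DataFrame, base_year: int) -> pd.DataFrame:
    """Toy vectorized adjustment; names and rules are hypothetical."""
    keys = ["region", "technology"]

    # Extract the base-year values once and merge them onto every row.
    base = df.loc[df["year"] == base_year, keys + ["gdp_ratio", "cost_ratio"]].rename(
        columns={"gdp_ratio": "gdp_base", "cost_ratio": "cost_base"}
    )
    out = df.merge(base, on=keys, how="left")

    # Assumed model: a line through (gdp_base, cost_base) and (1, 1).
    # Where gdp_base is exactly 1 the slope is undefined; use 0 so the
    # ratio stays at its base-year value.
    denom = out["gdp_base"] - 1.0
    slope = ((out["cost_base"] - 1.0) / denom.where(denom != 0)).fillna(0.0)
    intercept = out["cost_base"] - slope * out["gdp_base"]
    out["adjusted_ratio"] = slope * out["gdp_ratio"] + intercept

    # Assumed constraint, applied with a boolean mask instead of a loop:
    # keep the ratio constant where base GDP ratio < 1 and base cost ratio > 1.
    keep_constant = (out["gdp_base"] < 1) & (out["cost_base"] > 1)
    out.loc[keep_constant, "adjusted_ratio"] = out.loc[keep_constant, "cost_base"]

    return out.drop(columns=["gdp_base", "cost_base"])
```

The single merge replaces the per-group lookup of base-year values, which is typically where a group-by-group loop spends most of its time.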

Similarly, the new get_weo_data_fast function reduces I/O overhead by opening the Excel file just once rather than repeatedly for each technology and cost type combination.
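The idea of opening the workbook once can be sketched as follows. This is a minimal example rather than the get_weo_data_fast implementation: `read_sheets_once` and its `opener` parameter are hypothetical names, and only `pandas.ExcelFile` and its `parse` method are real library calls.

```python
import pandas as pd


def read_sheets_once(path, sheet_names, opener=pd.ExcelFile):
    """Open the workbook a single time and parse each requested sheet.

    `opener` is a hypothetical hook so the pattern can be exercised
    without a real Excel file; by default it is pandas.ExcelFile.
    """
    with opener(path) as workbook:
        # Every sheet is parsed from the same open handle, so the file
        # is not reopened for each technology / cost type combination.
        return {name: workbook.parse(sheet_name=name) for name in sheet_names}
```

Callers can then slice the returned DataFrames in memory for each combination they need instead of calling `pd.read_excel` repeatedly on the same path.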

Using the new tests_gdp_parity.py and test_regional_differentiation_parity.py, I verified parity with the older versions. Averaged over 5 runs of the test suite on my local Windows laptop, the performance changes were:
For adjust_cost_ratios_with_gdp, execution time decreased from ~11 seconds to ~1.6 seconds; for get_weo_data, from ~4.3 seconds to ~1.9 seconds.
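For anyone who wants to reproduce this kind of measurement, a minimal timing sketch is shown below. It is not the timing code used for the numbers above; `average_runtime` and `percent_reduction` are hypothetical helper names.

```python
import time
from statistics import mean


def average_runtime(fn, *args, repeats=5, **kwargs):
    """Return the mean wall-clock time of `fn` over `repeats` calls."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return mean(durations)


def percent_reduction(before, after):
    """Relative reduction in runtime, in percent."""
    return 100.0 * (before - after) / before
```

Applied to the figures reported above, `percent_reduction(11, 1.6)` is about 85 and `percent_reduction(4.3, 1.9)` is about 56.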

How to review

  1. Run the added parity tests to verify the optimized functions produce identical outputs to their legacy versions:

    • Execute test_gdp_parity to verify the vectorized adjust_cost_ratios_with_gdp function
    • Execute test_regional_differentiation_parity to verify the new get_weo_data_fast function
  2. Review the implementation changes:

    • In gdp.py: The vectorized adjust_cost_ratios_with_gdp function replaces the iterative group-by-group processing with direct vectorized operations
    • In regional_differentiation.py: The new get_weo_data_fast function reduces I/O overhead by opening the Excel file once for all sheets
  3. Examine the performance benchmarks in the timing files to confirm the speed improvements.

  4. After review, the parity tests can be removed as they serve only verification purposes during the PR review.
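The parity check in step 1 could look roughly like the sketch below. It is an assumption about the shape of such a test, not the content of the test files in this PR; `assert_parity` is a hypothetical helper, while `pandas.testing.assert_frame_equal` is the real pandas function.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def assert_parity(legacy_fn, fast_fn, *args, sort_by=None, **kwargs):
    """Assert that two implementations return equivalent DataFrames."""
    expected = legacy_fn(*args, **kwargs)
    actual = fast_fn(*args, **kwargs)

    if sort_by:
        # Row order may differ between a loop and a merge; normalize it.
        expected = expected.sort_values(sort_by).reset_index(drop=True)
        actual = actual.sort_values(sort_by).reset_index(drop=True)

    # Compare on the legacy column order, with a small float tolerance.
    assert_frame_equal(
        actual[expected.columns.tolist()], expected, check_exact=False, rtol=1e-9
    )
```

A test would call it with the legacy and optimized functions and the same inputs, for example `assert_parity(legacy_version, fast_version, inputs, sort_by=["region", "year"])`, where all of those argument names are placeholders.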

PR checklist

  • Continuous integration checks all ✅
  • Add or expand tests; coverage checks both ✅
  • Add, expand, or update documentation.
  • Update doc/whatsnew.
