Speed up cost calculation tools in gdp.py and regional_differentiation.py #334
This PR optimizes two core cost calculation functions, adjust_cost_ratios_with_gdp in tools/costs/gdp.py and get_weo_data in tools/costs/regional_differentiation.py, by using vectorized operations and removing repeated I/O operations, reducing function execution times by approximately 75%.

In the updated adjust_cost_ratios_with_gdp function, we now compute slopes and intercepts using direct DataFrame operations rather than iterating over groups. Base-year values are extracted and merged in a single step and constraints are applied via boolean indexing, which reduces execution time significantly while maintaining output parity with the original approach.

Similarly, the new get_weo_data_fast function reduces I/O overhead by opening the Excel file just once rather than once for each technology and cost type combination.

I used the new tests_gdp_parity.py and test_regional_differentiation_parity.py to ensure parity with the older versions. Averaged over 5 runs of the test suite on my local Windows laptop, the performance changes were:

- adjust_cost_ratios_with_gdp: execution time decreased from ~11 seconds to ~1.6 seconds
- get_weo_data: execution time decreased from ~4.3 seconds to ~1.9 seconds

How to review
1. Run the added parity tests to verify that the optimized functions produce identical outputs to their legacy versions:
   - test_gdp_parity to verify the vectorized adjust_cost_ratios_with_gdp function
   - test_regional_differentiation_parity to verify the new get_weo_data_fast function
2. Review the implementation changes:
   - gdp.py: the vectorized adjust_cost_ratios_with_gdp function replaces the iterative group-by-group processing with direct vectorized operations
   - regional_differentiation.py: the new get_weo_data_fast function reduces I/O overhead by opening the Excel file once for all sheets
3. Examine the performance benchmarks in the timing files to confirm the speed improvements.
After review, the parity tests can be removed as they serve only verification purposes during the PR review.
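To re-check the averaged timings on another machine, a small harness along the following lines could be used. It is a sketch using only the standard library; average_runtime and percent_reduction are hypothetical helper names, not part of this PR or of the timing files mentioned above. The last lines apply percent_reduction to the timings quoted in this description.

```python
import time
from statistics import mean
from typing import Callable


def average_runtime(func: Callable[[], object], runs: int = 5) -> float:
    """Return the mean wall-clock time in seconds over `runs` calls of `func`."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return mean(durations)


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction in runtime, as a percentage of `before`."""
    return 100.0 * (before - after) / before


# Timings quoted in this PR description, in seconds.
print(round(percent_reduction(11.0, 1.6), 1))  # adjust_cost_ratios_with_gdp -> 85.5
print(round(percent_reduction(4.3, 1.9), 1))   # get_weo_data -> 55.8
print(round(percent_reduction(11.0 + 4.3, 1.6 + 1.9), 1))  # both combined -> 77.1
```

To time a real function, wrap the call in a zero-argument callable, for example average_runtime(lambda: some_function(some_input)), where some_function and some_input stand for whatever is being measured.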
PR checklist