Remove `Column` class entirely #157

glatterf42 · 2025-01-31T14:51:38Z

As discovered in #151, using sqlalchemy's associationproxies is both (a bit) more performant and (considerably) less confusing than having a DB table called Column. Thus, this PR completes the removal of this construct.

TODO

DB migrations, though ideally, Remove Column class for Parameter #156 gets its own migrations first

On top of this, the PR includes a few minor fixes:

Make type hints of .data in the core layer more accurate
Fix some type: ignores by casting appropriately
Fix some docstrings wrt the errors they raise
Make Variable.indexsets None when appropriate
Fix missing .set_creation_info() calls
Skip data validation iif appropriate (we didn't skip checks for e.g. variable.data with only "levels" and "marginals" after Introduce run.optimization.remove_solution() #139 and never skipped for tables, even when {} was added)

glatterf42 · 2025-01-31T14:53:54Z

ixmp4/server/rest/optimization/variable.py

@@ -25,7 +23,7 @@ class VariableCreateInput(BaseModel):


 class DataInput(BaseModel):
-    data: dict[str, Any]
+    data: dict[str, list[float] | list[int] | list[str]]


I changed this to reflect the other changes, then run into the thing that the tests on the REST API platforms always converted data to dict[str, list[float]] for some reason. I don't understand why, but it doesn't seem to be this change because the same happens for equations. Still, to show that, I didn't also change the other type hints in this layer, though I suppose I still could.

codecov · 2025-01-31T15:01:39Z

Codecov Report

Attention: Patch coverage is 98.87640% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.0%. Comparing base (0249e7a) to head (8335b93).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
ixmp4/data/db/optimization/parameter/model.py	91.6%	1 Missing ⚠️
ixmp4/data/db/optimization/table/model.py	95.2%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##            main    #157     +/-   ##
=======================================
- Coverage   89.0%   89.0%   -0.1%     
=======================================
  Files        236     228      -8     
  Lines       8723    8646     -77     
=======================================
- Hits        7770    7698     -72     
+ Misses       953     948      -5

Files with missing lines	Coverage Δ
ixmp4/core/optimization/equation.py	`93.3% <100.0%> (-0.3%)`	⬇️
ixmp4/core/optimization/parameter.py	`93.0% <100.0%> (-0.2%)`	⬇️
ixmp4/core/optimization/table.py	`92.4% <100.0%> (-0.2%)`	⬇️
ixmp4/core/optimization/variable.py	`93.4% <100.0%> (-0.3%)`	⬇️
ixmp4/data/abstract/optimization/__init__.py	`100.0% <ø> (ø)`
ixmp4/data/abstract/optimization/equation.py	`97.0% <100.0%> (+<0.1%)`	⬆️
ixmp4/data/abstract/optimization/table.py	`96.9% <100.0%> (+<0.1%)`	⬆️
ixmp4/data/abstract/optimization/variable.py	`97.0% <100.0%> (+<0.1%)`	⬆️
ixmp4/data/api/optimization/equation.py	`94.2% <100.0%> (ø)`
ixmp4/data/api/optimization/table.py	`94.0% <100.0%> (ø)`
... and 18 more

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

meksor · 2025-04-17T13:05:30Z

ixmp4/data/db/optimization/equation/model.py

+    @property
+    def indexsets(self) -> list[str]:
+        return [indexset.name for indexset in self._indexsets]
+
+    @property
+    def column_names(self) -> list[str] | None:
+        return cast(list[str], self._column_names) if any(self._column_names) else None


What is the rationale behind the one list being None if empty?

list(self._column_names.all_()) could work for the list conversion as well?

I don't think ._column_names has a .all_() attribute. There is an all_() function in sqlalchemy, but that seems to produce an SQL ALL expression, so I'm not sure how that's useful here.

https://docs.sqlalchemy.org/en/20/orm/extensions/associationproxy.html#sqlalchemy.ext.associationproxy.AssociationProxyInstance.all_

It should, but looking at this again i have to retract my statement since it's kind of unnecessary to use this.
I'm just wondering why we are casting this to a list instead of converting it. I'm not sure how AssociationProxyInstance works exactly. Is this a generator that uses a cursor to yield items from the database?
In any case it doesn't have an append function, nor will it work with len() (educated guess) so essentially we are misleading anyone who tries to use these functions on this so called "list" ;)

And I have to reiterate my concern about the possibility of this being None, it seems spurious to me that one has to constantly check if X is None so the type checker doesn't complain instead of always treating it like list...

In any case it doesn't have an append function, nor will it work with len() (educated guess) so essentially we are misleading anyone who tries to use these functions on this so called "list" ;)

Sorry, just for clarification, this is wrong. I think I am getting mixed up between the functionalities of the django version of this type of mapping and misreading the sqla docs.

However, being confused by the new version I have to admit that my feedback was uneducated, since if I now understand correctly:
column_names is optionally provided when making a new equation/parameter/... object. If column_names is supplied it must be of the same length as constrained_to_indexsets and will act as a "key" for the columns instead of the "indexset name". From this perspective returning the value supplied to the creation function kind of makes sense, however pulling this info back out of the association table like this was confusing me and I wasnt able to make the connection.
I am wondering if it is necessary to be able to distinguish if an equation has been supplied with overridden "column_names" later. Depending on if thats the case it might be beneficial to /always/ store something in column_name on the association table. For one, this eliminates the need for two properties, and it will also address another potential problem:
It is unclear to me how the association_proxy orders list items. By default postgres will return items in a practically random ordering, which /could mean/ that when cloning via:

self.backend.optimization.equations.create( run_id=run.id, name=equation.name, constrained_to_indexsets=equation.indexset_names, column_names=equation.column_names, )

...the indexsets and column names are suddenly switched! I can unfortunately not find out if this is a problem in practice since there does not seem to be a test case that clones a run with an equation like that. Trying to adjust test_run_clone to cover the case reveals that adding data to an equation with column_names throws an error and in turn that there is no test covering that either:

ixmp4/data/db/optimization/equation/repository.py:148: in add_data data.set_index(index_list).combine_first(existing_data).reset_index() TypeError: The parameter "keys" may be a column key, one-dimensional array, or a list containing only valid column keys and one-dimensional arrays.. Received column of type <class 'sqlalchemy.ext.associationproxy._AssociationList'>

Which brings me back to a complaint from earlier in this thread: This cast() call is not enough for any run-time type checks and pandas seems to not like that very much.

Okay, thanks for the thorough feedback :)

column_names is optionally provided when making a new equation/parameter/... object. If column_names is supplied it must be of the same length as constrained_to_indexsets and will act as a "key" for the columns instead of the "indexset name".

This understanding is correct :)

I agree that the cast() call is not ideal, so I adapted that now to read

@property def column_names(self) -> list[str]: return ( [] if self._column_names[0] is None else list(map(str, self._column_names)) )

instead. This always returns a true list object :)

This change also already enables a test like this:

def test_order_column_names(self, platform: ixmp4.Platform) -> None: run = platform.backend.runs.create("Model", "Scenario") # Test normal creation indexsets = create_indexsets_for_run(platform=platform, run_id=run.id, amount=5) indexset_names = [iset.name for iset in indexsets] column_names = [f"Column {i}" for i in range(len(indexset_names))] equation = platform.backend.optimization.equations.create( run_id=run.id, name="Equation", constrained_to_indexsets=indexset_names, column_names=column_names, ) assert equation.column_names == column_names assert equation.indexset_names == indexset_names equation_2 = platform.backend.optimization.equations.create( run_id=run.id, name="Equation 2", constrained_to_indexsets=equation.indexset_names, column_names=equation.column_names, ) assert equation_2.column_names == column_names assert equation_2.indexset_names == indexset_names

(Note that cloning with run_id=run.id, name=equation.name could not work because of our UniqueConstraint on the run.id/equation.id columns.)
This test passes and I don't think this is a coincidence. Unfortunately, I can't find an explicit statement in sqlalchemy's docs for this, but here's what I gather:

association_proxy() is applied to the User class to produce a “view” of the kw relationship, which exposes the string value of .keyword associated with each Keyword object.

from here

Details such as if the locally proxied attribute is a collection (as is typical) or a scalar reference, as well as if the collection acts like a set, list, or dictionary is taken into account, so that the proxy should act just like the underlying collection or attribute does.

from the same page

the type of collection used for the relationship() is derived from the collection type passed to the Mapped container type

from here

and finally, there's this page on collection details

As I said, no explicit statement, but the there are two default collection styles for relationships: set and list. This alone indicates to me that one of that is ordered, while the other one isn't.
For all our (optimization) relationships, we use list, so I expect the order to remain the same.
I'm not sure if you think the above test is sufficient to confirm this, but it is passing. I'm happy to write the same test for the other data-layer optimization items, too.

Okay, thanks for the thorough feedback :)

column_names is optionally provided when making a new equation/parameter/... object. If column_names is supplied it must be of the same length as constrained_to_indexsets and will act as a "key" for the columns instead of the "indexset name".

This understanding is correct :)

Good, it would be helpful to get more context about stuff like this during reviews even if I'm questioning something you (in this case correctly) think is obvious.

I agree that the cast() call is not ideal, so I adapted that now to read

@property def column_names(self) -> list[str]: return ( [] if self._column_names[0] is None else list(map(str, self._column_names)) )

instead. This always returns a true list object :)

Ok, so now when creating an equation like this:

equation = platform.backend.optimization.equations.create( run_id=run.id, name="Equation", constrained_to_indexsets=[], )

(Which I did not expect to work, but it does) equation.column_names throws an IndexError

This change also already enables a test like this:

def test_order_column_names(self, platform: ixmp4.Platform) -> None: run = platform.backend.runs.create("Model", "Scenario") # Test normal creation indexsets = create_indexsets_for_run(platform=platform, run_id=run.id, amount=5) indexset_names = [iset.name for iset in indexsets] column_names = [f"Column {i}" for i in range(len(indexset_names))] equation = platform.backend.optimization.equations.create( run_id=run.id, name="Equation", constrained_to_indexsets=indexset_names, column_names=column_names, ) assert equation.column_names == column_names assert equation.indexset_names == indexset_names equation_2 = platform.backend.optimization.equations.create( run_id=run.id, name="Equation 2", constrained_to_indexsets=equation.indexset_names, column_names=equation.column_names, ) assert equation_2.column_names == column_names assert equation_2.indexset_names == indexset_names

(Note that cloning with run_id=run.id, name=equation.name could not work because of our UniqueConstraint on the run.id/equation.id columns.)

run in this case refers to a new, cloned run, so it should work. The RunRepository.clone() method does exactly that.

This test passes and I don't think this is a coincidence. Unfortunately, I can't find an explicit statement in sqlalchemy's docs for this, but here's what I gather:

* > [association_proxy()](https://docs.sqlalchemy.org/en/20/orm/extensions/associationproxy.html#sqlalchemy.ext.associationproxy.association_proxy) is applied to the User class to produce a “view” of the kw relationship, which exposes the string value of .keyword associated with each Keyword object. from [here](https://docs.sqlalchemy.org/en/20/orm/extensions/associationproxy.html#simplifying-scalar-collections) * > Details such as if the locally proxied attribute is a collection (as is typical) or a scalar reference, as well as if the collection acts like a set, list, or dictionary is taken into account, so that the proxy should act just like the underlying collection or attribute does. from the same page * > the type of collection used for the [relationship()](https://docs.sqlalchemy.org/en/20/orm/relationship_api.html#sqlalchemy.orm.relationship) is derived from the collection type passed to the [Mapped](https://docs.sqlalchemy.org/en/20/orm/internals.html#sqlalchemy.orm.Mapped) container type from [here](https://docs.sqlalchemy.org/en/20/orm/basic_relationships.html#relationship-patterns-o2m-collection) * and finally, there's [this page on collection details](https://docs.sqlalchemy.org/en/20/orm/collection_api.html#custom-collections)

As I said, no explicit statement, but the there are two default collection styles for relationships: set and list. This alone indicates to me that one of that is ordered, while the other one isn't. For all our (optimization) relationships, we use list, so I expect the order to remain the same. I'm not sure if you think the above test is sufficient to confirm this, but it is passing. I'm happy to write the same test for the other data-layer optimization items, too.

This sounds to me more like statements referring to the fact that the _AssociationList class (which is /actually/ used for Equation._column_names, not AssociationProxy(Instance) or whatever) is supposed to behave exactly like a list would. I'm pretty sure sqlalchemy would not mention the order in which items are returned from the database since this is database-specific unless explicitly counteracted. For this reason we have these lines in the codebase for example:

ixmp4/data/db/base.py:

class Lister(QueryMixin[ModelType], Selecter[ModelType]): def list(self, *args: Any, **kwargs: Any) -> list[ModelType]: _exc = self.select(*args, **kwargs) _exc = _exc.order_by(self.model_class.id.asc()) # << !!! return self.list_query(_exc) class Tabulator(QueryMixin[ModelType], Selecter[ModelType]): def tabulate(self, *args: Any, _raw: bool = False, **kwargs: Any) -> pd.DataFrame: _exc = self.select(*args, **kwargs) _exc = _exc.order_by(self.model_class.id.asc()) # << !!! return self.tabulate_query(_exc)

Additionally, sqlalchemy provides an ordering_list which can be used to go even further and save an index/key column to have finer control over the positioning of items. It can also be used in conjunction with association_proxy to get a list of values ordered by another value on the association table:

https://docs.sqlalchemy.org/en/20/orm/extensions/orderinglist.html

https://stackoverflow.com/questions/21292726/how-to-properly-use-association-proxy-and-ordering-list-together-with-sqlalchemy

Reading through the docs I suspect that the association proxies always use the data loaded by the relationship field. Maybe the relationship field orders items in a consistent way and it is not documented, maybe it depends on the loading technique (join, select, etc. (more info)[https://docs.sqlalchemy.org/en/20/orm/relationship_api.html#sqlalchemy.orm.relationship.params.lazy]), maybe something else is going on. I wouldn't rely on this behavior though and there seems to be an easy way to make sure it is consistent explicitly:
The relationship field has an order_by parameter so that's what I would use just to be safe since ordering problems with the postgres cluster happen everywhere across our infrastructure.

I also just noticed that there is a relatively old PR already addressing this problem! #162

Okay, I've opened a new issue about the test case you reported, and I'll push a commit that adds order_by to the *_indexset_associations relationship. Please check to see if that's all we need for now :)

Not sure why the tests are not running for 7512142, but the checks for 818b782 and 098f586 are passing (and 7512142 fixes the one code style test that was broken somehow).

meksor · 2025-04-17T13:55:46Z

Looks alright to me, does this depend on #156 ?

glatterf42 · 2025-04-29T11:25:39Z

It doesn't depend on #156, but it's just as well that that was merged first :)

EDIT: From the last time you reviewed, I essentially only added the migrations and renamed the .indexsets attributes as you suggested for Parameter.

…rity

ixmp4/data/db/optimization/equation/model.py

ixmp4/data/db/optimization/table/model.py

…rder

glatterf42 added the enhancement New feature or request label Jan 31, 2025

glatterf42 requested a review from meksor January 31, 2025 14:51

glatterf42 self-assigned this Jan 31, 2025

glatterf42 commented Jan 31, 2025

View reviewed changes

glatterf42 force-pushed the remove-column-class-entirely branch from aec6d3a to 9c36f7a Compare January 31, 2025 14:54

This was referenced Feb 4, 2025

Fix not enough column names used for same IndexSet #161

Merged

Fix actual return type of column_names property #162

Merged

meksor reviewed Apr 17, 2025

View reviewed changes

glatterf42 force-pushed the enh/introduce-has-solution branch from 0733b76 to b35225d Compare April 29, 2025 10:27

Base automatically changed from enh/introduce-has-solution to main April 29, 2025 10:50

✨ feat: Remove Column class completely

412e1e8

glatterf42 force-pushed the remove-column-class-entirely branch from 9c36f7a to 6ce3ef1 Compare April 29, 2025 11:25

glatterf42 added 2 commits April 29, 2025 13:30

♻️ refactor: Rename .indexsets to .indexset_names for consistency/cla…

6350d43

…rity

✨ feat: Add DB migration

eeb0f81

glatterf42 force-pushed the remove-column-class-entirely branch from 6ce3ef1 to eeb0f81 Compare April 29, 2025 11:30

meksor requested changes Apr 29, 2025

View reviewed changes

ixmp4/data/db/optimization/equation/model.py Outdated Show resolved Hide resolved

meksor requested changes Apr 29, 2025

View reviewed changes

ixmp4/data/db/optimization/table/model.py Outdated Show resolved Hide resolved

♻️ refactor: Rename Association *__id columns for consistency

818b782

glatterf42 requested a review from meksor April 30, 2025 09:56

glatterf42 force-pushed the remove-column-class-entirely branch from aff692d to 818b782 Compare May 6, 2025 10:08

glatterf42 and others added 2 commits May 6, 2025 12:10

🐛 fix: Fix actual return type of column_names property (#162)

098f586

✅ test: Fix undefined name

7512142

glatterf42 mentioned this pull request May 6, 2025

How should constrained_to_indexsets=[] be handled? #175

Open

✨ feat: Ensure *_indexset_associations are returned with consistent o…

8335b93

…rder

meksor approved these changes May 6, 2025

View reviewed changes

glatterf42 merged commit b4e10c9 into main May 6, 2025
13 checks passed

glatterf42 deleted the remove-column-class-entirely branch May 6, 2025 12:24

glatterf42 mentioned this pull request May 12, 2025

Remove data from linked items when removing data from indexset items #171

Merged

Remove Column class entirely #157

Remove Column class entirely #157

Uh oh!

Conversation

glatterf42 commented Jan 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Jan 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

meksor May 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

meksor May 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

meksor commented Apr 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

glatterf42 commented Apr 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Remove `Column` class entirely #157

Remove `Column` class entirely #157

glatterf42 commented Jan 31, 2025 •

edited

Loading

codecov bot commented Jan 31, 2025 •

edited

Loading

meksor May 5, 2025 •

edited

Loading

meksor May 6, 2025 •

edited

Loading

meksor commented Apr 17, 2025 •

edited

Loading

glatterf42 commented Apr 29, 2025 •

edited

Loading