Skip to content

[DB-40533] Add support for new type VECTOR(<dim>,DOUBLE) #182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

alampe3ds
Copy link

Add encode/decode support for new VECTOR datatype.

A result column of VECTOR(, DOUBLE) shows as type datatype.VECTOR_DOUBLE in the metadata description.

It is returned as datatype.Vector with subtype
datatype.Vector.DOUBLE which can be used as a list.

For a parameter to be detected as VECTOR(,DOUBLE) it also needs to use the datatype.Vector datatype. Alternatively, binding a string is also possible - which currently includes binding a list of numbers which will then be converted to string.

This commit also adds tests for the described behavior.

Ansgar Lampe added 3 commits June 17, 2025 17:18
Add encode/decode support for new VECTOR datatype.

A result column of VECTOR(<dim>, DOUBLE) shows as type
datatype.VECTOR_DOUBLE in the metadata description.

It is returned as datatype.Vector with subtype
datatype.Vector.DOUBLE which can be used as a list.

For a parameter to be detected as VECTOR(<dim>,DOUBLE)
it also needs to use the datatype.Vector datatype.
Alternatively, binding a string is also possible -
which currently includes binding a list of numbers which
will then be converted to string.

This commit also adds tests for the described behavior.
- remove whitespace line
- add documentation
Copy link
Contributor

@madscientist madscientist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider my comments on detecting the server version in the test.

Otherwise it looks OK to me.

@@ -149,10 +151,37 @@ def __cmp__(self, other):
return -1


class Vector(list):
"""A specific type for SQL VECTOR(<dim>, DOUBLE)
Copy link
Contributor

@madscientist madscientist Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not quite right, is it? A Vector element doesn't have to be a DOUBLE (I mean, it does now but it might not in the future)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. It is supposed to be usable with other element types as well. Will fix that.

Comment on lines 135 to 141
# only activate this tests if tested against version 8 or above
cursor.execute("select cast(substring_index(release_ver, '.', 1) as int)"
" from system.nodes limit 1")
row = cursor.fetchone()
database_major_version = row[0]
if database_major_version < 8:
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a problem with this, per se.

I think it would be cleaner to obtain this information once somewhere in conftest.py and make it available there, for example maybe in the DATABASE_FIXTURE since that already creates a connection.

The current DATABASE_FIXTURE is the connection arguments but you could add this there and then extract it again in the nuodb_base.NuoBase._setup().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into that. If it isn't extremely hard (which I doubt) I'll do that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed that.

Comment on lines 136 to 137
cursor.execute("select cast(substring_index(release_ver, '.', 1) as int)"
" from system.nodes limit 1")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be safer and easier to use geteffectiveversion() and compare based on that integer value, rather than parsing the version string?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just didn't know we have that function. Much better idea to use that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to that function now. The version numbering makes it harder to see this is about version 8, but appart from that it is more readable and less brittle that way.

Ansgar Lampe and others added 2 commits June 30, 2025 14:13
- correct documentation for VECTOR type
- check system version using GETEFFECTIVEPLATFORMVERSION()
- add system version information to separate field in test class
Copy link
Contributor

@madscientist madscientist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few small cleanups you can implement, or not, as you like. Thanks for this work!!

Comment on lines 312 to 313
effective_version = row[0]
system_information['effective_version'] = effective_version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little funny to me to assign a new local variable effective_version, just to then re-assign it again... I spent some seconds looking for where else this new variable was used. But it works.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that style of coding with names for stuff. But this time I might have gone too far. Will shorten it a bit :-)


# Verify the database is up and has a running TE
dbname = database['database']
dbname = database['connect_args']['database']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use self.conect_args['database'] here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to make this a simple substitution for review purposes. Will change that, there is no benefit to doing it this way.

Copy link
Contributor

@butson butson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datatype.__all__ should include Vector.

As user of pynuodb would need/want to use pynuodb.Vector not pynuodb.datatype.Vector

@alampe3ds
Copy link
Author

datatype.__all__ should include Vector.

As user of pynuodb would need/want to use pynuodb.Vector not pynuodb.datatype.Vector

Good point.
I changed that and modified the test to import pynuodb instead of the datatype to reflect customer usage there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants