Decimal results incorrect depending on locale #753

gv-collibris · 2020-04-30T09:42:11Z

Environment

Python: 3.6.3
pyodbc: 4.0.30
OS: WSL, Windows 10, Debian 10
DB: SQL Server
driver: ODBC Driver 17 for SQL Server

Issue

Numbers (e.g. 3656.880000) from the database are turned to different numbers in Python, depending on the locale, e.g. 3656.88 for an English locale, and 3656880000 for a French one.

Expected behavior

I would expect to always obtain the correct number in the Python code.

gordthompson · 2020-04-30T16:33:57Z

Please provide a minimal reproducible example.

gv-collibris · 2020-05-04T09:48:39Z

Hi,
Thanks for your reply.
Here is a minimal reproducible example.
Cheers

gordthompson · 2020-05-04T12:25:02Z

Okay, that's weird. I was able to reproduce the issue using your example, but it works fine when I use a simple stand-alone script:

import locale

import pyodbc

if input("English or French? (e/f): ").startswith("f"):
    locale.setlocale(locale.LC_ALL, 'fr_FR.UTF-8')
print(locale.getlocale())

connection_string = (
    "DRIVER=ODBC Driver 17 for SQL Server;"
    "SERVER=192.168.0.179,49242;"
    "DATABASE=myDb;"
    "UID=sa;PWD=_whatever_"
)
cnxn = pyodbc.connect(connection_string)
crsr = cnxn.cursor()
print(crsr.execute("SELECT 1890 * 1.0 / 100").fetchone())

""" console output
English or French? (e/f): e
('en_CA', 'UTF-8')
(Decimal('18.900000'), )

English or French? (e/f): f
('fr_FR', 'UTF-8')
(Decimal('18.900000'), )
"""

gv-collibris · 2020-05-04T12:45:25Z

yeah, I've seen that too
I struggled a bit to reproduced the issue
I have no idea how the use of a package can be different from a script...

gordthompson · 2020-05-04T13:56:52Z

Even more mysterious: With your MRE and my script the ODBC driver returns the same string value ...

[ODBC][6457][1588599816.070747][SQLGetData.c][237]
        Entry:
            Statement = 0x2467540
            Column Number = 1
            Target Type = -8 SQL_WCHAR
            Buffer Length = 4096
            Target Value = 0x24593a0
            StrLen Or Ind = 0x7ffe67f0bd10
[ODBC][6457][1588599816.070863][SQLGetData.c][545]
        Exit:[SQL_SUCCESS]                
            Buffer = [18.900000](unicode)                
            Strlen Or Ind = 0x7ffe67f0bd10 -> 18

... but when it pops out of fetchall your MRE has [(Decimal('18900000'), )] while my script has [(Decimal('18.900000'), )], and that holds true whether we use

SELECT 1890 * 1.0 / 100

or

SELECT CAST(1890 * 1.0 / 100 AS decimal(10,6))

However, if I change your MRE to use

SELECT CAST(1890 * 1.0 / 100 AS float)

then the ODBC driver returns ...

[ODBC][6501][1588600241.966768][SQLGetData.c][237]
        Entry:
            Statement = 0x15a7d90
            Column Number = 1
            Target Type = 8 SQL_DOUBLE
            Buffer Length = 8
            Target Value = 0x7ffc7e61ca58
            StrLen Or Ind = 0x7ffc7e61ca60
[ODBC][6501][1588600241.966842][SQLGetData.c][545]
        Exit:[SQL_SUCCESS]                
            Buffer = [18,9]                
            Strlen Or Ind = 0x7ffc7e61ca60 -> 8

... and the (correct) resulting value is [(18.9, )]

gv-collibris · 2020-05-04T14:23:40Z

My (uneducated) guess is that the creation of a Decimal uses somehow the Python locale.
As we receive 18.900000 from the DB, and as , is the decimal delimiter in French, the dot is ignored altogether.
The use of a float do not present this issue, as it's a different data structure, which do not use the locale (I guess).

Initially, I thought I made a mistake in my own code when formatting the numbers, that's where this guess comes from...

keitherskine · 2020-05-04T14:55:54Z

Just for clarity, because it might be significant here. In T-SQL the "1.0" literal is not a float, somewhat counter-intuitively, it's a decimal, so multiplying by 1.0 does not automatically generate a float, it typically generates a decimal:

SELECT
SQL_VARIANT_PROPERTY(1.0,'BaseType') AS 'BaseType',
SQL_VARIANT_PROPERTY(1.0,'Precision') AS 'Precision',
SQL_VARIANT_PROPERTY(1.0,'Scale') AS 'Scale'
UNION
SELECT
SQL_VARIANT_PROPERTY(1890 * 1.0,'BaseType') AS 'BaseType',
SQL_VARIANT_PROPERTY(1890 * 1.0,'Precision') AS 'Precision',
SQL_VARIANT_PROPERTY(1890 * 1.0,'Scale') AS 'Scale';

results in:

BaseType    Precision    Scale
numeric        2            1
numeric        7            1

Apologies if you knew this already, but I know it was kind of a surprise to me.

gordthompson · 2020-05-04T15:00:02Z

@gv-collibris - You're probably right; we're just trying to give @v-makouz as much information as we can.

I also tried adding ;Regional=Yes to the end of the connection string in the MRE but that resulted in

pyodbc.Error: ('07006', '[07006] [Microsoft][ODBC Driver 17 for SQL Server]Restricted data type attribute violation (0) (SQLGetData)')

keitherskine · 2020-05-04T15:34:11Z

I think the difference in behavior between Gord's script and the package might be explained by exactly when pyodbc is being imported. In pyodbc, it looks like the numeric locale information is read from the environment when pyodbc is initialized (i.e. imported):
https://github.com/mkleehammer/pyodbc/blob/master/src/pyodbcmodule.cpp#L1226
and appears to be fixed after that.

Hence, if you add an import pyodbc to the very beginning of test.py, i.e. before setting the locale, which then emulates Gord's script, then you appear to get a different result. Not sure if this solves anything, but I thought I'd mention it.

keitherskine · 2020-05-04T15:41:59Z

One other thing, if you pyodbc.setDecimalSeparator('.') at the start of test_function() (i.e. set the decimal point as English-style, despite the French locale), it appears you get the right result (18.9).

keitherskine · 2020-05-04T16:06:37Z

@gv-collibris Just for the record, what is the database instance language you are using, and the database collation? If you run the following:

USE <your database>;
SELECT @@language AS instance_lang;
SELECT CONVERT(varchar(256),SERVERPROPERTY('collation')) AS database_collation;

...what do you get?

gv-collibris · 2020-05-04T16:09:05Z

Instance language: us_english
Database collation: SQL_Latin1_General_CP1_CI_AS

keitherskine · 2020-05-04T16:50:26Z

Thanks, @gv-collibris . I thought perhaps you might be using a French-collated database instance but that does not appear to be the case.

Currently, it appears pyodbc reads the decimal point character from the current locale when it is imported (and only once). Personally, I'm not sure whether it should be doing that at all because the decimal point character in the result set is probably added by the SQL Server database engine rather than any C libraries in the server itself (although I'm speculating there).

In the scenario you describe, the workarounds appears to be either:

import pyodbc before setting the locale, causing pyodbc to use the default locale (i.e. US English).
use pyodbc.setDecimalSeparator('.') to explicitly set the decimal point character to a period before running SQL queries.

I hope that helps.

gv-collibris · 2020-05-05T13:09:04Z

I used an other workaround: I round and cast to integer, then divide by 100 in the Python code.
It's not particularly an issue, but it is a weird surprise I wanted to avoid to other people ;)

keitherskine · 2020-05-05T13:40:36Z

Glad to hear you're not being held up by this. This curious locale behavior was not something I was aware of, so it's good you raised this. Many thanks.

v-makouz · 2020-05-05T22:15:52Z

The reason for this is in getdata.cpp GetDataDecimal function. There it loops through all the characters trying to detect the decimal mark, but it uses the one based on locale for comparison, while ODBC driver always uses "."

One solution I can think of is to replace
if (*pch == chDecimal)
with
if (*pch == chDecimal || *pch == '.')

It looks a little hacky, but it should work for all drivers, whether they use the locale based one or default one.

What do you guys think?

keitherskine · 2020-05-05T23:47:53Z

I can kind of understand why the decimal separator from the Python locale was chosen to be used by pyodbc when parsing decimal values, but it still seems something of an odd choice. First off, I'm making a big assumption that decimal values are generated on the database server as essentially strings, and are sent over the wire more-or-less untouched. So a German-language database might generate a decimal value like "12.345.678,99". This is my big assumption and I'm happy to be corrected on that.

On SQL Server, changing the language of a database (e.g. SET LANGUAGE German) doesn't appear to make any difference to the decimal values in the ODBC trace, they are for example "12345678.99" regardless of the language. Perhaps German-style decimals can be generated by setting the language on the server itself, or by using a different RDBMS like Oracle.

My concern about using the Python locale is that there's no guarantee the app server (running Python) is going to have the same locale as the database server. The database server might be "German" but the app server could be "British". The locales might match, but it's not guaranteed. Hence, it seems something of a stretch to use the app server's locale to figure out what the decimal point character is going to be. By the way, @v-makouz , that's also an argument not to use '.' all the time as per your suggestion. In my German example above, we wouldn't want to use the period character when parsing "12.345.678,99". The trouble is, it's difficult to figure out the decimal point character any other way, hence I can understand why the Python locale is being used.

I appreciate I'm not coming up with any answers here. The big question seems to be whether decimal values can be received in a whole variety of formats (including monetary values), as per the comment on the GetDataDecimal function. If so, then this is not an easy problem to solve. If decimal values can truly be "12,345.678000", "12.345.678,99", "$876.55", or "-345678.77€", then this is very tricky. For example, how would you distinguish "1,234"? Is that one thousand two hundred and thirty four, or one point two three four? Short of making a test query like SELECT CAST(1.5 AS DECIMAL(2,1)) and grabbing the middle character, or perhaps parsing from the right and assuming the first period or comma is the decimal separator, I don't know how the decimal point character can be determined in a watertight manner.

Ref: https://docs.microsoft.com/en-us/globalization/locale/number-formatting

mkleehammer · 2023-04-11T15:37:58Z

The comments indicated that the "proper" way to read using a binary format weren't working for all database. Maybe the 5.0 version should default to binary but have a function to go back to strings and configuring the separator.

I'd love to be able to write a script that would print out the configuration you should set. The difficulty is you'd have to be tricky getting the information into the DB to read back. That is, if you don't know whether to use "." or "," you have to not use any in your inputs. That is, instead of "select 123.45" you need something like "select (cast 12345 as decimal(19,2)) / 100".

This might be a good idea even if just provided as a script alongside pyodbc.

This was referenced Dec 27, 2022

Execution of select query returns wrong values (floats become very large integers) sqlalchemy/sqlalchemy#9030

Closed

Execution of select query returns wrong values (floats become very large integers) #1142

Closed

mkleehammer added this to 5.0 Release Apr 11, 2023

mkleehammer added this to the 6.0 Stable ABI milestone Apr 11, 2023

gordthompson mentioned this issue Jun 27, 2023

Regression from 4.0.35 to 4.0.39 - fetch fails for decimal fields when system locale decimal separator is not '.' #1236

Closed

gordthompson mentioned this issue Apr 24, 2024

Changing decimal separator to comma using locale.setlocale causes crash on MSSQL columns of decimal type sqlalchemy/sqlalchemy#11314

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decimal results incorrect depending on locale #753

Decimal results incorrect depending on locale #753

gv-collibris commented Apr 30, 2020 •

edited

Loading

gordthompson commented Apr 30, 2020

gv-collibris commented May 4, 2020

gordthompson commented May 4, 2020

gv-collibris commented May 4, 2020 •

edited

Loading

gordthompson commented May 4, 2020 •

edited

Loading

gv-collibris commented May 4, 2020 •

edited

Loading

keitherskine commented May 4, 2020

gordthompson commented May 4, 2020

keitherskine commented May 4, 2020

keitherskine commented May 4, 2020

keitherskine commented May 4, 2020

gv-collibris commented May 4, 2020

keitherskine commented May 4, 2020

gv-collibris commented May 5, 2020

keitherskine commented May 5, 2020

v-makouz commented May 5, 2020

keitherskine commented May 5, 2020

mkleehammer commented Apr 11, 2023

Decimal results incorrect depending on locale #753

Decimal results incorrect depending on locale #753

Comments

gv-collibris commented Apr 30, 2020 • edited Loading

Environment

Issue

Expected behavior

gordthompson commented Apr 30, 2020

gv-collibris commented May 4, 2020

gordthompson commented May 4, 2020

gv-collibris commented May 4, 2020 • edited Loading

gordthompson commented May 4, 2020 • edited Loading

gv-collibris commented May 4, 2020 • edited Loading

keitherskine commented May 4, 2020

gordthompson commented May 4, 2020

keitherskine commented May 4, 2020

keitherskine commented May 4, 2020

keitherskine commented May 4, 2020

gv-collibris commented May 4, 2020

keitherskine commented May 4, 2020

gv-collibris commented May 5, 2020

keitherskine commented May 5, 2020

v-makouz commented May 5, 2020

keitherskine commented May 5, 2020

mkleehammer commented Apr 11, 2023

gv-collibris commented Apr 30, 2020 •

edited

Loading

gv-collibris commented May 4, 2020 •

edited

Loading

gordthompson commented May 4, 2020 •

edited

Loading

gv-collibris commented May 4, 2020 •

edited

Loading