Decimal results incorrect depending on locale #753
Comments
Please provide a minimal reproducible example. |
Hi, |
Okay, that's weird. I was able to reproduce the issue using your example, but it works fine when I use a simple stand-alone script:
import locale
import pyodbc
if input("English or French? (e/f): ").startswith("f"):
    locale.setlocale(locale.LC_ALL, 'fr_FR.UTF-8')
print(locale.getlocale())
connection_string = (
"DRIVER=ODBC Driver 17 for SQL Server;"
"SERVER=192.168.0.179,49242;"
"DATABASE=myDb;"
"UID=sa;PWD=_whatever_"
)
cnxn = pyodbc.connect(connection_string)
crsr = cnxn.cursor()
print(crsr.execute("SELECT 1890 * 1.0 / 100").fetchone())
""" console output
English or French? (e/f): e
('en_CA', 'UTF-8')
(Decimal('18.900000'), )
English or French? (e/f): f
('fr_FR', 'UTF-8')
(Decimal('18.900000'), )
""" |
yeah, I've seen that too |
Even more mysterious: With your MRE and my script the ODBC driver returns the same string value ...
... but when it pops out of SELECT 1890 * 1.0 / 100 or SELECT CAST(1890 * 1.0 / 100 AS decimal(10,6)) However, if I change your MRE to use SELECT CAST(1890 * 1.0 / 100 AS float) then the ODBC driver returns ...
... and the (correct) resulting value is |
My (uneducated) guess is that the creation of a Initially, I thought I made a mistake in my own code when formatting the numbers, that's where this guess comes from... |
Just for clarity, because it might be significant here. In T-SQL the "1.0" literal is not a float, somewhat counter-intuitively, it's a decimal, so multiplying by 1.0 does not automatically generate a float, it typically generates a decimal: SELECT
SQL_VARIANT_PROPERTY(1.0,'BaseType') AS 'BaseType',
SQL_VARIANT_PROPERTY(1.0,'Precision') AS 'Precision',
SQL_VARIANT_PROPERTY(1.0,'Scale') AS 'Scale'
UNION
SELECT
SQL_VARIANT_PROPERTY(1890 * 1.0,'BaseType') AS 'BaseType',
SQL_VARIANT_PROPERTY(1890 * 1.0,'Precision') AS 'Precision',
SQL_VARIANT_PROPERTY(1890 * 1.0,'Scale') AS 'Scale'; results in:
Apologies if you knew this already, but I know it was kind of a surprise to me. |
@gv-collibris - You're probably right; we're just trying to give @v-makouz as much information as we can. I also tried adding
|
I think the difference in behavior between Gord's script and the package might be explained by exactly when pyodbc is being imported. In pyodbc, it looks like the numeric locale information is read from the environment when pyodbc is initialized (i.e. imported): Hence, if you add an |
One other thing, if you |
@gv-collibris Just for the record, what is the database instance language you are using, and the database collation? If you run the following: USE <your database>;
SELECT @@language AS instance_lang;
SELECT CONVERT(varchar(256),SERVERPROPERTY('collation')) AS database_collation; ...what do you get? |
Instance language: us_english |
Thanks, @gv-collibris . I thought perhaps you might be using a French-collated database instance but that does not appear to be the case. Currently, it appears pyodbc reads the decimal point character from the current locale when it is imported (and only once). Personally, I'm not sure whether it should be doing that at all because the decimal point character in the result set is probably added by the SQL Server database engine rather than any C libraries in the server itself (although I'm speculating there). In the scenario you describe, the workarounds appear to be either:
I hope that helps. |
I used another workaround: I round and cast to integer, then divide by 100 in the Python code. |
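A minimal sketch of that workaround on the Python side, assuming the query already returns the value scaled to an integer (for example 1890 for 18.90). The helper name `cents_to_decimal` is made up for illustration and is not part of pyodbc. Because an integer carries no decimal mark, the locale cannot affect how it is parsed.

```python
from decimal import Decimal


def cents_to_decimal(cents: int) -> Decimal:
    """Turn an integer number of hundredths into an exact Decimal.

    Hypothetical helper: the integer is assumed to come from a query that
    rounds and casts the value to an integer on the server.
    """
    # scaleb(-2) shifts the exponent by two places, i.e. an exact divide by 100.
    return Decimal(cents).scaleb(-2)


value = cents_to_decimal(1890)
print(value)
```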
Glad to hear you're not being held up by this. This curious locale behavior was not something I was aware of, so it's good you raised this. Many thanks. |
The reason for this is in the GetDataDecimal function in getdata.cpp. There it loops through all the characters trying to detect the decimal mark, but it uses the one based on locale for comparison, while the ODBC driver always uses ".". One solution I can think of is to replace It looks a little hacky, but it should work for all drivers, whether they use the locale based one or default one. What do you guys think? |
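To make the failure mode concrete, here is a rough Python simulation of the behavior described above. It is not the actual C++ code from getdata.cpp; the function `parse_decimal_text` and its exact filtering rules are assumptions chosen to reproduce the numbers reported in this issue.

```python
from decimal import Decimal


def parse_decimal_text(text: str, decimal_point: str) -> Decimal:
    """Hypothetical re-creation of locale-based decimal parsing.

    Keeps digits and a minus sign, maps the assumed decimal mark to '.',
    and silently drops every other character.
    """
    kept = []
    for ch in text:
        if ch in "0123456789-":
            kept.append(ch)
        elif ch == decimal_point:
            kept.append(".")
        # Anything else is discarded, including a '.' sent by the driver
        # when the locale says the decimal mark is ','.
    return Decimal("".join(kept))


raw = "3656.880000"  # the driver always sends '.' as the decimal mark
english = parse_decimal_text(raw, ".")  # decimal mark taken from an English locale
french = parse_decimal_text(raw, ",")   # decimal mark taken from a French locale
print(english, french)
```

With "," as the assumed mark, the "." in the driver's string is treated as noise and dropped, which would turn 3656.880000 into 3656880000.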
I can kind of understand why the decimal separator from the Python locale was chosen to be used by pyodbc when parsing decimal values, but it still seems something of an odd choice.

First off, I'm making a big assumption that decimal values are generated on the database server as essentially strings, and are sent over the wire more-or-less untouched. So a German-language database might generate a decimal value like "12.345.678,99". This is my big assumption and I'm happy to be corrected on that. On SQL Server, changing the language of a database (e.g.

My concern about using the Python locale is that there's no guarantee the app server (running Python) is going to have the same locale as the database server. The database server might be "German" but the app server could be "British". The locales might match, but it's not guaranteed. Hence, it seems something of a stretch to use the app server's locale to figure out what the decimal point character is going to be. By the way, @v-makouz , that's also an argument not to use

I appreciate I'm not coming up with any answers here. The big question seems to be whether decimal values can be received in a whole variety of formats (including monetary values), as per the comment on the GetDataDecimal function. If so, then this is not an easy problem to solve. If decimal values can truly be "12,345.678000", "12.345.678,99", "$876.55", or "-345678.77€", then this is very tricky. For example, how would you distinguish "1,234"? Is that one thousand two hundred and thirty four, or one point two three four? Short of making a test query like

Ref: https://docs.microsoft.com/en-us/globalization/locale/number-formatting |
The comments indicated that the "proper" way to read, using a binary format, wasn't working for all databases. Maybe the 5.0 version should default to binary but have a function to go back to strings and configure the separator. I'd love to be able to write a script that would print out the configuration you should set. The difficulty is that you'd have to be tricky about getting the information into the DB to read back. That is, if you don't know whether to use "." or ",", you can't use either in your inputs. So instead of "select 123.45" you need something like "select cast(12345 as decimal(19,2)) / 100". This might be a good idea even if just provided as a script alongside pyodbc. |
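A small sketch of the detection half of such a script, assuming the raw text of the probe value (12345 / 100) is already in hand. How to obtain that raw text through a given driver is left out, and `infer_decimal_separator` is a hypothetical name, not an existing pyodbc function. The probe stays below 1000 on purpose, so no grouping separator should appear and the only non-digit character left is the decimal mark.

```python
def infer_decimal_separator(raw: str) -> str:
    """Guess the decimal mark from the text of a probe value such as 123.45.

    Hypothetical helper: expects exactly one character that is neither a
    digit nor a sign, and treats that character as the decimal separator.
    """
    candidates = [ch for ch in raw.strip() if ch not in "0123456789+-"]
    if len(candidates) != 1:
        raise ValueError(f"cannot infer a separator from {raw!r}")
    return candidates[0]


print(infer_decimal_separator("123.450000"))
print(infer_decimal_separator("123,450000"))
```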
Environment
Issue
Numbers (e.g. 3656.880000) from the database are turned to different numbers in Python, depending on the locale, e.g. 3656.88 for an English locale, and 3656880000 for a French one.
Expected behavior
I would expect to always obtain the correct number in the Python code.