-
Notifications
You must be signed in to change notification settings - Fork 477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-1859974: Issue with SnowflakeCursor.executemany() ? #2123
Comments
Hello @fcremer-nl , Thanks for raising the issue. I tried to reproduce it in Windows 11 using Python 3.10 and Snowpark 1.26.0, and I am not able to reproduce it. Could you please share the full code snippet? `load_schema = StructType([ rows = [ df = session.create_dataframe(rows, load_schema) target_table = "example_table" df.write.mode('append').save_as_table(target_table) result_df = session.table(target_table) Output: |
Hi Sujan,
Thank you for looking into this issue.
I have tried your example, but it does not trigger the
SnowflakeCursor.executemany() function. If I use the sample code to try to
insert the same data that triggered the bug, it indeed triggers the
executemany() function; apparently snowpark determines which function to
use based on the amount of data(?). However, the sample code with my data
surprising works.
I have traced the issue to the following line SnowflakeCursor.executemany()
function:
if self._connection.is_pyformat:
The function SnowflakeConnection.is_pyformat() is a single line:
return self._paramstyle in ("pyformat", "format")
It turns out that _paramstyle = 'qmark' when the session is created from a
config file. However, if the connection is created from a existing
connection [ session = Session.builder.configs({"connection":
existing_snowflake_connection}).create() ] then paramstyle='pyformat'.In
that case overwriting the paramstyle [ session.connection._paramstyle =
'qmark') ] alleviates the problem.
I think the assignment of paramstyle is definitely an issue with snowpark
and I am not sure if this is correct behaviour.
Still I am wondering if even when paramstyle='pyformat' whether the code in
SnowflakeCursor.executemany() is correct. I cannot see how the [ fmt %
self._connection._process_params_pyformat(param, self) ] can work when
format is a string like '(?,?,?,?)' and the argument after the % is a tuple.
Looking forward to hear from you.
Best regards,
Frank
…--
Frank Cremer
On Fri, Dec 20, 2024 at 4:36 AM Sujan Ghosh ***@***.***> wrote:
Hello @fcremer-nl <https://github.com/fcremer-nl> ,
Thanks for raising the issue.
I tried to reproduce it in Windows 11 using Python 3.10 and Snowpark
1.26.0, and I am not able to reproduce it. Could you please share the full
code snippet?
`load_schema = StructType([
StructField("ID", IntegerType()),
StructField("NAME", StringType())
])
rows = [
(1, "Alice"),
(2, "Bob"),
(3, "Charlie")
]
df = session.create_dataframe(rows, load_schema)
target_table = "example_table"
df.write.mode('append').save_as_table(target_table)
result_df = session.table(target_table)
result_df.show()
------------------------------
|"ID" |"NAME" |
|1 |Alice |
|2 |Bob |
|3 |Charlie |`
—
Reply to this email directly, view it on GitHub
<#2123 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/BNYU4BG3C46R6FR73FNEYX32GOGFXAVCNFSM6AAAAABTYEVSY2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKNJWGIZDCNRRHA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Hello @fcremer-nl , Snowpark internally decides whether to batch writes or use executemany() based on the size of the data or other internal heuristics. In terms of performance and feature, there is no difference whether you use qmark or pyformat or format or numeric. You can not use pyformat with positional arguments ( ?,?) etc because If you would like to use using positional placeholders (?,?,?,?) with tuples, please use "qmark" style. Regards, |
Hi Sujan,
Thank you for your reply. I am now convinced that the issue is not with the
Snowflake connector, but with Snowpark. It works fine when you create a
connection from scratch, but if you use an existing connection using
''' self.session = Session.builder.configs({"connection":
my_existing_python_connection}).create() ''' then it takes over the
paramstyle as shown below in snowpark/session.py:
# Set paramstyle to qmark by default to be consistent with
previous behavior
if "paramstyle" not in self._options:
self._options["paramstyle"] = "qmark"
Which fails if paramstyle!="qmark" when you try to write a dataframe '''
df.write.mode('append').save_as_table(target_table) ''' as the dataframe is
written under the assumption that the paramstyle is "qmark". So I am
wondering if the paramstyle for the Snowpark session should not be
permanently fixed to "qmark" in the code above.
Best regards,
Frank
…--
Frank Cremer
+31 6 3891 8872
On Thu, Jan 2, 2025 at 10:10 AM Sujan Ghosh ***@***.***> wrote:
Hello @fcremer-nl <https://github.com/fcremer-nl> ,
Snowpark internally decides whether to batch writes or use executemany()
based on the size of the data or other internal heuristics.
In terms of performance and feature, there is no difference whether you
use qmark or pyformat or format or numeric.
Its mentioned in the documentation as well
https://docs.snowflake.com/en/developer-guide/python-connector/python-connector-api
You can not use pyformat with positional arguments ( ?,?) etc because
The format string and parameter substitution are mismatched.
*The pyformat style explicitly requires named placeholders and
dictionary-based parameters.*
For pyformat, the parameters must be dictionaries.
If you would like to use using positional placeholders (?,?,?,?) with
tuples, please use "qmark" style.
Regards,
Sujan
—
Reply to this email directly, view it on GitHub
<#2123 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/BNYU4BFOTB44W36HDIIQULL2IT67VAVCNFSM6AAAAABTYEVSY2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKNRXGQ3DGMZZGU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Hello @fcremer-nl , Thanks for the update.
At present, there is no plan to change the bindings for Snowpark. For python connector, you can change the bindings by specifying at connection level. from snowflake import connector Snowflake connection configurationssnowflake_config = { } conn = snowflake.connector.connect(**snowflake_config) print(f"Paramstyle: {conn._paramstyle}") I hope all clear now. Regards, |
Hi Sujan,
I am sorry for the confusion. I am not trying to set the paramstyle for any
performance improvements etc within Snowpark. My issue is the following:
- If I open a session directly in Snowpark, I see paramstyle = "qmark",
see the first part of the code below and writing a dataframe works. This is
correct, but not how I am using it in my application.
- In my application I am opening a session directly in Python (see
second part of the code below).
- When I also use that connection for initiating a snowpark session, I
get paramstyle = "pyformat"
- Now if I write a snowpark dataframe to this session, it fails, because
internally snowpark assumes paramstyle="qmark" but the snowflake code
handles it using paramstyle="pyformat".
- My workaround is to set session2.connection._paramstyle = "qmark",
after I have initiated the session
So my question is: shouldn't the snowpark session.connection._paramstyle
always be set tot "qmark" whether this parameter is set or not?
Best regards,
Frank
…---
from snowflake.snowpark import Session
import snowflake.connector
# Open session directly in snowpark
session = Session.builder.config("connection_name",
"snowflake_ae_de_dev").create()
print(f'paramstyle = {session.connection._paramstyle} (expected "qmark")')
# OUTPUT: paramstyle = qmark (expected "qmark")
# Open session first and then use it in snowpark
with
snowflake.connector.connect(connection_name="snowflake_ae_de_dev",paramstyle='pyformat')
as conn:
session2 = Session.builder.configs({"connection": conn}).create()
print(f'paramstyle = {session2.connection._paramstyle} (expected
"qmark")')
# OUPUT: paramstyle = pyformat (expected "qmark")
# The following will fail, because paramstyle != "qmark"
df = session.create_dataframe(rows, schema)
df.write.mode('append').save_as_table(target_table)
# Forcing the paramstyle and it works
session2.connection._paramstyle = "qmark"
df = session.create_dataframe(rows, schema)
df.write.mode('append').save_as_table(target_table)
--
Frank Cremer
+31 6 3891 8872
On Fri, Jan 3, 2025 at 7:42 AM Sujan Ghosh ***@***.***> wrote:
Hello @fcremer-nl <https://github.com/fcremer-nl> ,
Thanks for the update.
1. qmark and numeric bindings are server side binding which helps to
increase performance for batch inserts via streaming and thats the reason
why its kept as default when creating a session in snowpark.
2. Snowpark only supports qmark binding and its default for snowpark.
its mentioned in the documentation
https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/latest/snowpark/api/snowflake.snowpark.Session.sql
At present, there is no plan to change the bindings for Snowpark.
For python connector, you can change the bindings by specifying at
connection level.
from snowflake import connector
Example:
Snowflake connection configurations
snowflake_config = {
"account": "", # Add your Snowflake account
"user": "", # Add your Snowflake user
"password": "", # Add your Snowflake password (consider using a secure
method)
"role": "", # Add your Snowflake role
"warehouse": "", # Add your Snowflake warehouse
"database": "", # Add your Snowflake database
"schema": "", # Add your Snowflake schema
"paramstyle": "pyformat"
}
conn = snowflake.connector.connect(**snowflake_config)
cs = conn.cursor()
print(f"Paramstyle: {conn._paramstyle}")
I hope all clear now.
Regards,
Sujan
—
Reply to this email directly, view it on GitHub
<#2123 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/BNYU4BA7VTSRFYRZCTUPHMD2IYWN7AVCNFSM6AAAAABTYEVSY2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKNRYG42TQMRQGY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Python version
Python 3.10.10 (tags/v3.10.10:aad5f6a, Feb 7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)]
Operating system and processor architecture
Windows-10-10.0.19045-SP0
Installed packages
What did you do?
What did you expect to see?
I expected the data to be written to the table, but got the error "not all arguments converted during string formatting" at
Where the format (fmt) is a string, something like '(?,?,?,?,?)' and the right side the tuple with the data for the row.
If I change the line of code above to:
fmt.replace('?','%s') % self._connection._process_params_pyformat(param, self)
Then it works correctly. Similarly if I change the code original code to write row by row:
It also works!
Is this an issue with Snowpark or am I doing something stupid?
Can you set logging to DEBUG and collect the logs?
The text was updated successfully, but these errors were encountered: