
Postgres BIT is a string but converted to NUMERIC in BigQuery #740

Open
AlexStansfield opened this issue Jul 14, 2021 · 5 comments

@jmriego, we ran into issues with our bit fields.

As described in the Postgres docs, a bit field is a string made up of 1s and 0s (https://www.postgresql.org/docs/8.2/datatype-bit.html).

If you have a bit(x) where x > 1, it's possible to have a value that starts with a 0. However, the fastsync to BigQuery tries to put it into a NUMERIC field.

That results in issues like this:

ERROR: Could not parse '011111111111111111111111111111111111111111111111111111111111111111111111111111' as NUMERIC for field provinces (position 65) starting at location 0  with message 'Invalid NUMERIC value: 011111111111111111111111111111111111111111111111111111111111111111111111111111'

The offending line appears to be here:

When I changed NUMERIC to STRING and ran it again, all was well.
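A minimal way to reproduce this outside the pipeline, assuming the casts are run directly in BigQuery (the wide pattern below is just a stand-in for the real value from the error above):

-- Hypothetical standalone BigQuery queries, not taken from the pipeline.
-- A wide bit pattern, here 0 followed by sixty 1s:
SELECT CAST(CONCAT('0', REPEAT('1', 60)) AS NUMERIC);   -- fails: BigQuery rejects this as a NUMERIC value
-- The same pattern loads fine when the column is a STRING:
SELECT CONCAT('0', REPEAT('1', 60)) AS provinces;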

jmriego commented Jul 14, 2021

Thanks @AlexStansfield, I'm having a look now. I have to compare this to what the non-fastsync replication is doing and try to get the same result. It should be easy enough with all this detail.

jmriego commented Jul 14, 2021

That's strange. I tested tap-postgres and it seems like it's not reading the BIT column at all:

create table jmtest (
    id INTEGER,
    name VARCHAR,
    somebytes BIT(8),
    etl_updated_timestamp TIMESTAMP
);

@koszti when I run the tap in discovery mode it is not detecting the type of somebytes.
I'm going to prepare an MR to fix this issue and load it correctly as a NUMERIC (in fastsync), to be as similar to the Snowflake target as possible. Does that make sense?
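As a quick sanity check of what discovery has to work with, a sketch run directly against the source Postgres (not part of the tap):

-- How the somebytes column of the jmtest table above appears in the Postgres catalog
SELECT column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE table_name = 'jmtest' AND column_name = 'somebytes';
-- Expected: data_type 'bit' with character_maximum_length 8,
-- the type that discovery apparently leaves unmapped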

jmriego commented Jul 14, 2021

Sorry for all these messages, @AlexStansfield @koszti.

So what's really happening is that this is trying to load too many bits into a NUMERIC field; the failure is not because of the leading zeroes. From what I can see this is affecting all targets, not just BigQuery:

  • BigQuery has a maximum of 38 digits
  • Snowflake also has a maximum of 38 digits
  • Redshift also has a maximum of 38 digits
  • The Postgres target records it into a DOUBLE PRECISION column

I think if we are going to fix this, it makes sense to make the same change in all targets. What do you think?
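A quick way to see that the leading zero by itself is not the problem (a sketch; the cast behaves the same in BigQuery and Postgres):

-- A short value with a leading zero parses fine; the zero is simply dropped
SELECT CAST('01111111' AS NUMERIC);   -- 1111111, no error
-- So the failure above comes from the number of digits exceeding the NUMERIC precision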

AlexStansfield commented Jul 15, 2021

@jmriego

I see what you mean about what's causing the error.

However, I was wondering how a bit value like 00011001 would be recorded in a NUMERIC field. My understanding is it would translate to 11001, which isn't what we'd really want.
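For illustration (a sketch; the behaviour is the same in Postgres and BigQuery):

-- Parsing the bit pattern as a number silently drops the leading zeros
SELECT CAST('00011001' AS NUMERIC) AS as_numeric;   -- 11001
-- The original 8-bit width, and with it the leading zeros, is no longer recoverable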

jmriego commented Jul 15, 2021

@AlexStansfield I'm afraid that's the case. I am collaborating on adding the BigQuery capabilities to PipelineWise, but that exact same replication is done in the other targets.
I understand what you mean, though. You could always LPAD it with 0s to the length of the source data, but the problem is that you don't know the length of the source column once you are already in the target table.
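For what it's worth, the LPAD idea would look roughly like this in the target, but only under the assumption that the original bit length (8 here) is known from somewhere else; the table name is a placeholder:

-- Hypothetical repair in the target, assuming the source column is known to be BIT(8)
SELECT LPAD(CAST(somebytes AS STRING), 8, '0') AS somebytes_bits
FROM some_dataset.some_table;
-- Without the declared bit length from the source, the pad width is a guess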
