Support AWS S3 Simple Authentication (Access/Secret Key) #410

Closed
spiegela opened this issue Sep 12, 2024 · 4 comments

@spiegela

What went wrong?

When creating a new table with qbeast-spark on an S3 bucket configured with Access Key/Secret Key credentials, Spark incorrectly reports that the table already exists.

How to reproduce?

1. Code that triggered the bug, or steps to reproduce:

Configure Spark to use S3 with simple credentials and Qbeast:

spark.packages io.delta:delta-spark_2.12:3.2.0,org.apache.hadoop:hadoop-aws:3.3.1,io.qbeast:qbeast-spark_2.12:0.6.0

spark.sql.extensions io.qbeast.spark.internal.QbeastSparkSessionExtension
spark.sql.catalog.spark_catalog io.qbeast.spark.internal.sources.catalog.QbeastCatalog

spark.hadoop.fs.s3a.access.key			<ACCESS KEY>
spark.hadoop.fs.s3a.secret.key			<SECRET KEY>
spark.hadoop.fs.s3.impl				org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.hive.metastore.warehouse.dir	s3a://<BUCKET NAME>/

Attempt to create the table:

spark-sql (default)> CREATE TABLE delta.`s3a://spieg-qbeast/qbeast-table` (id INT, name VARCHAR(255)) USING qbeast;
[TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view `delta`.`s3a://spieg-qbeast/qbeast-table` because it already exists.
Choose a different name, drop or replace the existing object, or add the IF NOT EXISTS clause to tolerate pre-existing objects.

2. Branch and commit id:

0.6.0

3. Spark version:

3.5.1

4. Hadoop version:

3.3.4

5. How are you running Spark?

Local computer. Reproduced in Qbeast cloud

6. Stack trace:

[TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view `delta`.`s3a://spieg-qbeast/qbeast-table` because it already exists.
Choose a different name, drop or replace the existing object, or add the IF NOT EXISTS clause to tolerate pre-existing objects.
@fpj
Contributor

fpj commented Sep 12, 2024

Thanks, @spiegela. It sounds like the table is already registered in the catalog you're using. Out of curiosity, not that I particularly think it is going to make a difference, have you tried using qbeast-spark 0.7.0?

cugni self-assigned this Sep 13, 2024
@cugni
Member

cugni commented Sep 13, 2024

Hi Aaron!
The proper way to create that table is to issue the following command:

CREATE TABLE  my_table (id INT, name STRING)
USING qbeast
LOCATION 's3a://spieg-qbeast/qbeast-table'
OPTIONS( 'columnsToIndex'='id,name')

I had to change the type to STRING and add the LOCATION and OPTIONS settings.
As far as I know, the syntax delta.`s3://somewhere` is used to read a Delta table directly from storage without registering it as a table in the catalog. For instance, you can run this:

SELECT * FROM delta.`s3a://spieg-qbeast/qbeast-table`

However, this syntax only works if you are using the DeltaCatalog. For some reason it doesn't work with our catalog, and we have to investigate why (the qbeast.s3/ syntax doesn't work either). I'll open an issue about this.

By the way, be careful: the correct configuration key for adding packages is spark.jars.packages, not spark.packages.
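
For anyone landing here later, this is a minimal PySpark sketch of the settings from the original report with the package key corrected to spark.jars.packages. The helper names `qbeast_s3a_conf` and `build_session` and the app name are made up for illustration, the package versions are simply the ones quoted in the report, and it assumes pyspark is installed and that no Spark JVM is already running in the process (otherwise the package setting may not take effect).

```python
from typing import Dict


def qbeast_s3a_conf(access_key: str, secret_key: str, bucket: str) -> Dict[str, str]:
    """Collect the settings from the report into a dict (hypothetical helper)."""
    return {
        # Corrected key: the report used spark.packages.
        "spark.jars.packages": ",".join([
            "io.delta:delta-spark_2.12:3.2.0",
            "org.apache.hadoop:hadoop-aws:3.3.1",
            "io.qbeast:qbeast-spark_2.12:0.6.0",
        ]),
        "spark.sql.extensions": "io.qbeast.spark.internal.QbeastSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "io.qbeast.spark.internal.sources.catalog.QbeastCatalog",
        # Simple access/secret key credentials for the s3a connector.
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        "spark.hadoop.hive.metastore.warehouse.dir": f"s3a://{bucket}/",
    }


def build_session(conf: Dict[str, str]):
    """Create a SparkSession from the dict; pyspark is imported lazily."""
    from pyspark.sql import SparkSession  # assumption: pyspark is installed

    builder = SparkSession.builder.appName("qbeast-s3a-sketch")  # name is arbitrary
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With a session built this way, the CREATE TABLE ... USING qbeast LOCATION ... statement above can be passed to `spark.sql(...)` as a string.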

@cugni
Member

cugni commented Sep 13, 2024

I've created issue #412 to take care of the lack of support for the delta.`path` and qbeast.`path` syntax in QbeastCatalog.

@spiegela
Author

Thanks @cugni. Now that I've figured out that I had an obsolete jar in my include path, the USING qbeast LOCATION ... syntax is working fine. I'm GTG.
