Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix IDENTITY_INSERT to work cross-db #3402

Conversation

thephantomthief
Copy link
Contributor

@thephantomthief thephantomthief commented Jan 14, 2025

Description

Previously, passing three part object names to SET IDENTITY_INSERT was
throwing an error. In this commit, if the relation name is a three part
name, we will parse the catalog name and set the IDENTITY_INSERT
property for the relation in that catalog.

Also with this commit, we will check if current user has permission to SET
IDENTITY_INSERT for the relation.

Task: BABEL-3212
Signed-off-by: Sharu Goel [email protected]

Test Scenarios Covered

  • Use case based - Yes

  • Boundary conditions - N/A

  • Arbitrary inputs - Yes

  • Negative test cases - Yes

  • Minor version upgrade tests - N/A

  • Major version upgrade tests - N/A

  • Performance tests - N/A

  • Tooling impact - N/A

  • Client tests - N/A

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Sharu Goel added 2 commits January 14, 2025 06:42
Previously, passing three part object names to SET IDENTITY_INSERT was
throwing an error. In this commit, if the relation name is a three part
name, we will parse the catalog name and set the IDENTITY_INSERT
property for the relation in that catalog.

Also with this commit, we will check if current user has permission to SET
IDENTITY_INSERT for the relation.

Task: BABEL-3212
Signed-off-by: Sharu Goel <[email protected]>
@coveralls
Copy link
Collaborator

coveralls commented Jan 14, 2025

Pull Request Test Coverage Report for Build 12909229214

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 34 of 42 (80.95%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.006%) to 74.97%

Changes Missing Coverage Covered Lines Changed/Added Lines %
contrib/babelfishpg_tsql/src/pl_handler.c 34 42 80.95%
Totals Coverage Status
Change from base Build 12891231321: 0.006%
Covered Lines: 47027
Relevant Lines: 62728

💛 - Coveralls

@thephantomthief thephantomthief requested review from HarshLunagariya and shalinilohia50 and removed request for HarshLunagariya January 14, 2025 14:15
@@ -1208,7 +1208,7 @@ Index Only Scan using babel_3384_test_pkey on babel_3384_test

~~START~~
text
Babelfish T-SQL Batch Parsing Time: 0.159 ms
Babelfish T-SQL Batch Parsing Time: 0.154 ms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't these values change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +328 to +329
schema ? schema : "",
schema ? "." : "",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will produce db1.obj1 instead of db1..obj1 when there is no schema name supplied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually SET IDENTITY_INSERT db1..obj1 ON/OFF will throw syntax error at or near ".."

Comment on lines 412 to +416
schema_name = get_physical_schema_name(cur_db_name,
schema_name);
logical_schema_name);

/* If no schema name is provided, we should use default schema */
if (!schema_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By this logic, We will be using default schema when user supplies wrong schema.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any testcase covering it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 1 to 4
-- tsql
CREATE DATABASE identity_insert_db
GO

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid making this new file IDENTITY-INSERT-CROSS-DB? , instead I suggest moving this to BABEL-IDENTITY.

Comment on lines 2464 to 2480
Oid
get_pg_class_obj_owner_id(Oid relation_oid)
{
HeapTuple tuple;
Oid ownerId;

tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relation_oid));
if (!HeapTupleIsValid(tuple))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_TABLE),
errmsg("relation with OID %u does not exist", relation_oid)));
ownerId = ((Form_pg_class) GETSTRUCT(tuple))->relowner;
ReleaseSysCache(tuple);

return ownerId;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use existing engine function object_ownercheck for checking if user is owner of table.

@@ -335,6 +348,10 @@ assign_identity_insert(const char *newval, void *extra)
char *id_insert_rel_name = NULL;
char *id_insert_schema_name = NULL;
char *cur_db_name;
char *catalog_name = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we naming this as catalog_name ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

database name is also commonly referred as catalog name

Comment on lines 467 to 470
if (!(has_privs_of_role(curr_user_id, get_pg_class_obj_owner_id(rel_oid)) ||
has_privs_of_role(curr_user_id, get_db_ddladmin_oid(cur_db_name, false)) ||
has_privs_of_role(curr_user_id, get_db_owner_oid(cur_db_name, false))))
throw_error_for_identity_insert(catalog_name, logical_schema_name, rel_name);
Copy link
Contributor

@tanscorpio7 tanscorpio7 Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the ideal way to check permission is to figure out the logical db_id from the relation schema OID.
Relation schema OID --> belongs to which logical database --> now check permissions using this db_id.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to force scenarios like

  1. Current db is some_db_1 but search path contains schema from some_other_db_2
  2. If we create database db1 & db1_db1, I can refer db1_db1 database object using db1.db1_schema_name.object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how I can force (1). For (2) if there exists database db1 and db1_db1, you cannot create schema db1_dbo in db1 database. It will throw error schema "db1_db1_dbo" already exists

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CREATE DATABASE db1

CREATE DATABASE db1_db1

USE db1_db1

Changed database context to 'db1_db1'.
CREATE SCHEMA s1

CREATE VIEW s1.v1 AS SELECT 'I AM IN DB1_DB1'

USE db1

Changed database context to 'db1'.
SELECT * FROM db1.db1_s1.v1

                                                                                                                                                                                                                                                                
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I AM IN DB1_DB1                                                                                                                                                                                                                                                 

(1 rows affected)

Copy link
Contributor

@tanscorpio7 tanscorpio7 Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 1, easiest way is to set the search path yourself using set_config() and we internally do some set_config("Search_path", ..) that will have the same effect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 2

1> use master
2> go
Changed database context to 'master'.
1> create procedure sp_proc_in_master_dbo as select * from v1
2> go
1> create view v1 as select 'I am in master_dbo'
2> go
1> use tempdb
2> go
Changed database context to 'tempdb'.
1> sp_proc_in_master_dbo
2> go
                                                                                                                                                                                                                                                                
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I am in master_dbo                                                                                                                                                                                                                                              

(1 rows affected)

This is just one case. I can force many other such cases where we pick an object which does not exists in current logical database when object is not schema qualified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually thinking on lines of incorrectly resolving from object A to object B where both A and B exist. But you are talking about the case where A does not exist and we resolve to B. Will fix this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was talking about following scenarios in our discussion, but it seems like INSERT statement is resolving it properly. We are good here:

1> create database [db1.sch1];
2> go
1> create database [db1_sch1];
2> go
1> use [db1_sch1];
2>
3> go
Changed database context to 'db1_sch1'.
1> create schema [sch2];
2> go
1> create table [sch2].t1(a int identity not null);
2> go
1> set identity_insert [db1.sch1].[sch2].t1 ON;  --> It should have thrown error since there is no such table in [db1.sch1_sch2]
2>
3> go
1> insert into [db1.sch1].[sch2].t1 values (1);
2> go
Msg 33557097, Level 16, State 1, Server BABELFISH, Line 1
relation "db1.sch1_sch2.t1" does not exist
1>

Comment on lines 333 to 335
static void
assign_identity_insert(const char *newval, void *extra)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to move this to check hook. A GUC assign hook must never fail.
see src/backend/utils/misc/README

Note that there is no provision for a failure result code.  assign_hooks
should never fail except under the most dire circumstances, since a failure
may for example result in GUC settings not being rolled back properly during
transaction abort.  In general, try to do anything that could conceivably
fail in a check_hook instead, and pass along the results in an "extra"
struct, so that the assign hook has little to do beyond copying the data to
someplace.  This applies particularly to catalog lookups: any required
lookups must be done in the check_hook, since the assign_hook may be
executed during transaction rollback when lookups will be unsafe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this one might not be a problem in this case since we are not actually using the GUC value anywhere and instead maintaining are own static variables.

@@ -362,40 +379,61 @@ assign_identity_insert(const char *newval, void *extra)
option_flag = (char *) linitial(elemlist);
rel_name = (char *) lsecond(elemlist);

/* Use catalog name if provided */
if (list_length(elemlist) == 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add test cases where schema name/table name has a full stop. Using !SplitGUCList(input_string, '.', &elemlist) does not make me feel good.

Copy link
Contributor Author

@thephantomthief thephantomthief Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is wrong. We need to remove the grammar from bison and set the GUC from ANTLR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is problematic because customer can do [schema.name].table for set identity insert and we will incorrectly change identity insert value of some other logical databases table.

/* Check the user provided schema value */
if (list_length(elemlist) >= 3)
{
schema_name = (char *) lthird(elemlist);
logical_schema_name = (char *) lthird(elemlist);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let use add a test case with schema name missing set identity_insert db1..t5 on

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had added them before but everything threw syntax error. I'll add them back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let keep the failing tests even if we are not fixing it right now. This will ensure any future changed does not crash here because of NULL logical_schema_name

Sharu Goel added 2 commits January 22, 2025 10:21
@shardgupta shardgupta merged commit f60d3c0 into babelfish-for-postgresql:BABEL_5_X_DEV Jan 22, 2025
46 checks passed
@shardgupta shardgupta deleted the sharu-dev-cross-db-identity branch January 22, 2025 14:17
thephantomthief added a commit to amazon-aurora/babelfish_extensions that referenced this pull request Jan 22, 2025
Previously, passing three part object names to SET IDENTITY_INSERT was
throwing an error. In this commit, if the relation name is a three part
name, we will parse the catalog name and set the IDENTITY_INSERT
property for the relation in that catalog.

Also with this commit, we will check if current user has permission to SET
IDENTITY_INSERT for the relation.

Task: BABEL-3212

Signed-off-by: Sharu Goel <[email protected]>
shardgupta pushed a commit that referenced this pull request Jan 22, 2025
Previously, passing three part object names to SET IDENTITY_INSERT was
throwing an error. In this commit, if the relation name is a three part
name, we will parse the catalog name and set the IDENTITY_INSERT
property for the relation in that catalog.

Also with this commit, we will check if current user has permission to SET
IDENTITY_INSERT for the relation.

Task: BABEL-3212

Signed-off-by: Sharu Goel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants