-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL stored procs for labels to replace python procs #591
base: main
Are you sure you want to change the base?
Conversation
joshelser
commented
Mar 20, 2024
- Split up label tests into separate files
Non-scientific results show create label latency on the low end around 500ms and on the upper end around 2.5s. I would imagine we can chop some of that off by consolidating the number of "validation rows". I wasn't focused on that in this pass (more focused on correctness and maintainable tests). |
85733dc
to
bbf646c
Compare
Jacques mentioned that he wanted to take a look at this change, so I've tagged him for review. |
FWIW, there wasn't much in terms of python tests we added that weren't already encapsulated in the label test cases (largely, because we didn't write pytests that required Snowflake -- just exercising the pydantic validations). |
Hey, can you separate this into two PRs? PR1: Moving runtime validation logic from python to SQL. |
bootstrap/004_labels.sql
Outdated
$$ | ||
begin | ||
-- Ungrouped labels | ||
insert into internal.validate_labels (sql, message, simple, create_only) values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a few thoughts here.
- Let's just make this a view
- Let's use convention over configuration for optional arguments
- Let's use Snowflake object literals to make this cleaner/clearer as opposed to positional
- There is a bunch of duplicate logic. Let's factor out into common checks. Some are direct copy and paste. Some are simple modifications to avoid nearly complete duplication
I suggest we iterate on a couple tests looking good in this code before moving all of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand this feedback either (maybe my english parser is broken) Maybe a short call would make it easier?
bootstrap/004_labels.sql
Outdated
-- label name must be unique across all group_names | ||
('(select count(*) = 0 from internal.labels where group_name = f:name and group_name is not null)', 'A label group already exists with this name', true, true), | ||
-- Condition must compile | ||
('with result as procedure (input varchar) returns boolean language sql as \$\$ begin let c varchar := (select parse_json(:input):condition);execute immediate \'select case when \' || :c || \' then true else false end from reporting.enriched_query_history limit 1\';return true;end;\$\$ call result(?);', 'Label condition failed to compile', false, false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not like the others. Should come up with different pattern.
bbf646c
to
ebe3a1b
Compare
test/unit/test_label_procs.py
Outdated
# Create a label with this name | ||
old_label = generate_unique_name("old_label", timestamp_string) | ||
sql = f"call ADMIN.CREATE_LABEL('{old_label}', NULL, NULL, 'rows_produced > 100');" | ||
assert run_proc(conn, sql) is None, "ADMIN.CREATE_LABEL did not return NULL value!" | ||
|
||
# Create a dynamic label | ||
old_dyn_label = generate_unique_name("old_dyn_label", timestamp_string) | ||
sql = f"call ADMIN.CREATE_LABEL(NULL, '{old_dyn_label}', NULL, 'QUERY_TYPE', TRUE);" | ||
assert run_proc(conn, sql) is None, "ADMIN.CREATE_LABEL did not return NULL value!" | ||
|
||
label = generate_unique_name("label", timestamp_string) | ||
sql = statement.format(label=label) | ||
sql = statement.format( | ||
label=label, old_label=old_label, old_dyn_label=old_dyn_label | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The python label implementation validated the conditions first and then verified existing labels against the input (e.g. cannot create a label with the same name as one that exists, a label has to exist with the name being updated).
I can separate this change out to its own PR, but hopefully this is clear enough.
ebe3a1b
to
b28e4c9
Compare
Pushed a few commits:
I didn't migrate the grouped labels and dynamic labels validations (so those tests will fail). I wanted to see what you two thought. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can split this into a few PRs. Just having trouble reviewin geveyrhting,
bootstrap/004_labels.sql
Outdated
-- Ungrouped labels | ||
-- simple=true and create_only=false as the defaults | ||
CREATE OR REPLACE VIEW INTERNAL.VALIDATE_LABELS AS | ||
select $1 as obj from ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems a bit weird. Could we use VALUES
isntead of a union all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frustratingly, not with the UDFs as the code currently has (to my knowledge). You can only use values when each value evaluates to a constant expression at compilation time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, yes, Snowflake can collapse the UDF to a constant expression so I could switch this back to using VALUES
but you can't use object_construct()
in a VALUES statement.
bootstrap/004_labels.sql
Outdated
CREATE OR REPLACE VIEW INTERNAL.VALIDATE_GROUPED_LABELS AS | ||
select $1 as sql, $2 as message, $3 as simple, $4 as create_only from ( | ||
-- Basic null checks, converting variant null to sql null | ||
SELECT VALIDATION.NOT_NULL('group_name'), 'Group name must not be null', true, false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Urgh, its annoying we have these 3 types and so many duplicate checks across them
bootstrap/004_labels.sql
Outdated
SELECT '(select count(*) = 0 from internal.labels where group_name = f:group_name and group_rank = f:group_rank)', 'A label already already exists with this rank', true, true | ||
UNION ALL | ||
-- Condition must compile | ||
SELECT 'with result as procedure (input varchar) returns boolean language sql as \$\$ begin let c varchar := (select parse_json(:input):condition);execute immediate \'select substring(\' || :c || \', 0, 0) from reporting.enriched_query_history where false\';return true;end;\$\$ call result(?);', 'Invalid label condition', false, false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have to do something about these complex checks. Lets discuss in our sync up
@rymurr feedback from friday. Notably, I dropped grouped and dynamic label stuff from the current PR to reduce the size of it. One addition that we didn't discuss was that I also dropped the overloaded CREATE_LABEL and UPDATE_LABEL procedures (making The UPDATE_LABEL procedure is still complicated -- I feel like the clean way to address complexity there is to make separate procedures for other labels, rather than overloading into singular procedures. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this! Looking goood. Can we get some accurate timings for this?
bootstrap/0036_sql_tools.sql
Outdated
$$ | ||
begin | ||
-- Select the validation rows. In a create, choose all. In an update, omit rows which are for create_only. | ||
let validation_query text := (select tools.templatejs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the interest of fewer calls to js I am not sure if we need this. I would maybe do smoething like:
let validation_query text := 'select * from identifier('internal.validate_' || ?) where iff(?, true, NOT COALESCE(obj[\'create_only\'], FALSE))'
let rs resultset := (execute immediate :validation_query using (:validation_table, :is_create));
bootstrap/004_labels.sql
Outdated
AS | ||
BEGIN | ||
let c varchar := (select parse_json(:input):condition); | ||
let stmt varchar := (select 'select case when ' || :c || ' then true else false end from reporting.enriched_query_history limit 1'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this coul duse execute immediate stmt using (c)
which would simplify the concat above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we can use execute immediate
in this case. c
is just a string, but the case
statement in the query needs to be a condition which evaluates to a boolean.
I can't think of any tricks to your suggestion work (I thought string concat to build the query is the only solution). Have I missed one?
bootstrap/004_labels.sql
Outdated
let stmt varchar := (select 'select ' || :n || ' from reporting.enriched_query_history where false'); | ||
execute immediate stmt; | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
bootstrap/004_labels.sql
Outdated
let c varchar := (select parse_json(:input):condition); | ||
let stmt varchar := (select 'select case when ' || :c || ' then true else false end from reporting.enriched_query_history limit 1'); | ||
execute immediate stmt; | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this return the output of execute immediate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we rarely hit the else branch (often raise an exception), but yes that would be correct to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, no, that's not right. We are just testing the compilation here. The actual row we get from query history may evaluate to true or false..
bootstrap/0036_sql_tools.sql
Outdated
let rs resultset := (select false); | ||
return table(rs); | ||
EXCEPTION | ||
WHEN STATEMENT_ERROR THEN | ||
let rs resultset := (select true); | ||
return table(rs); | ||
WHEN OTHER then | ||
let rs resultset := (select false); | ||
return table(rs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to change this proc and the previous one to returning a table so https://github.com/sundeck-io/OpsCenter/pull/591/files#diff-4d8db5ec382ac980083e32c616facb93597373fbc86135753e973a6fb38493a3R30 works w/o special handling.
I could capture the boolean
and have a single table(rs)
at the very end, but I really wish I didn't have to.
Short of wrapping a anonymous procedure around the real procedure, I couldn't come up with a way to "rename" the output column from the proc (the column name of the result is always the name of the procedure itself).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 minor question and seems tests are still failing. I think it looks great though.
Any comments on time and improvements?
let c varchar := (select parse_json(:input):condition); | ||
let stmt varchar := (select 'select ' || :c || ' from reporting.enriched_query_history limit 1'); | ||
execute immediate :stmt; | ||
-- if the condition is invalid, we let the exception propagate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we allow the exception to propagate here but below we capture it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't managed the exceptions below so we can distinguish statement_error
from other
exceptions. In this procedure, any exception indicates a failed validation (and we can let the caller handle it since we have the caller doing this as a fail-safe).
SELECT VALUES must give constant values. Switch the view validation statements to select and union all since the UDF is not constant expression.
* Remove templatejs from validation UDFs as they cannot be used in VALUES statements. * Extract anonymous procedures into standalone procs, use those in label validation * Fiddle with the validation view. Also can't use OBJECT_CONSTRUCT in a VALUES expression.
…ouped and dynamic label stuff now to trim the PR size
Yeesh, we took a step backwards for ungrouped label creation
The sum of queries executed by I'll re-evaluate with a hybrid table in a momen.t |
Making |
('CALL VALIDATION.VALID_LABEL_CONDITION(?)', '{"message": "Invalid label condition", "complex": true}'), | ||
-- make sure label name doesn't exist as query history column | ||
('CALL VALIDATION.IS_VALID_LABEL_NAME(?)', '{"message": "Label name cannot be the same as a column in REPORTING.ENRICHED_QUERY_HISTORY", "complex": true}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two checks as standalone procedures (versus inline blocks) cost ~1s.
I think we should be able to make all validation a single SQL select query. Prove me wrong :D |
704d76e
to
c69445f
Compare
I just pushed the latest (which has grouped and ungrouped labels passing their tests, but not dynamic labels). I did a quick prototype with just the simple validations in one I'll see if I can get all of the checks into a single query. The one problem I can immediately think of is that we rely on an exception to be raised to verify that a label name doesn't duplicate a name in |
Maybe an unpopular opinion: should we give this up and just go back to V1 of these? V1 beign specialized procedures for each crud type? I seem to recall those being faster than this. My memory: v1: First we had standalone hand coded procs for each entity type. Reasonably fast but hard to extend and had to be called as stored procs from streamlit My suggestion, in the interest of expediency is to either defer this whole problem or to stop trying to make generic validation. Thoughts? |
I had the same thought yesterday. I almost installed an old version just to confirm that they were actually faster before. |