Commit
[#23785] YSQL: Incrementally refresh PG backend catcaches (part 4)
Summary:
In part 3, I extended the tserver-to-master heartbeat response message to include the contents of `pg_yb_invalidation_messages` along with the contents of `pg_yb_catalog_version` whenever `pg_yb_catalog_version` changes (detected via the existing fingerprint mechanism). The invalidation messages are set in the `db_catalog_inval_messages_data` proto field of the heartbeat response message.

This diff reads `db_catalog_inval_messages_data` from the heartbeat response message and stores it in tserver private memory. A new map, `ysql_db_invalidation_messages_map_`, is added: for each database, it stores a double-ended queue. Each element of the queue is a pair: (current_version, messages). The current_version is the catalog version, and messages is a blob representing the list of catalog cache invalidation messages generated by PG for that current_version.

The maximum size of the queue is controlled by --ysql_max_invalidation_message_queue_size, which defaults to 1024. This means that for each database, we can store a history of up to 1024 catalog versions. Logically, this map simply extends the history of `pg_yb_invalidation_messages`. Because by default we only store a history of 10 seconds of invalidation messages for each database, the heartbeat response will likely contain far fewer than 1024 versions. In that case, the new versions and their associated messages are merged into the in-memory map `ysql_db_invalidation_messages_map_`, while old entries are removed from the front of the queue.

In this way, each tserver keeps a more extended history of catalog versions and their invalidation messages, so that we can tolerate a PG transaction block that runs longer: a PG transaction block cannot do a catalog cache refresh (whether incremental or full) until the transaction completes. As a result, by the time the PG transaction block completes, many DDL statements may already have been executed.
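The per-database queue described above can be illustrated with a minimal Python sketch. This is not the tserver code (which is C++): the function name `merge_invalidation_messages`, the dictionary layout, and the rule of skipping versions that are already stored are all assumptions made for illustration. Only the (current_version, messages) pair shape, the front-of-queue eviction, and the 1024 default come from the summary.

```python
from collections import deque

# Mirrors the documented default of --ysql_max_invalidation_message_queue_size.
MAX_QUEUE_SIZE = 1024


def merge_invalidation_messages(messages_map, db_oid, new_entries,
                                max_size=MAX_QUEUE_SIZE):
    """Merge (current_version, messages) pairs for one database into its queue.

    Hypothetical helper: messages_map maps db_oid -> deque of pairs.
    """
    queue = messages_map.setdefault(db_oid, deque())
    for version, messages in sorted(new_entries, key=lambda entry: entry[0]):
        # Assumption: a heartbeat may repeat recent history, so versions that
        # are not newer than the newest stored version are skipped.
        if queue and version <= queue[-1][0]:
            continue
        queue.append((version, messages))
    # Remove the oldest versions from the front once the limit is exceeded.
    while len(queue) > max_size:
        queue.popleft()
    return queue
```

With `max_size=3`, merging versions 2 and 3 and then versions 3, 4, and 5 for the same database leaves versions 3, 4, and 5 in the queue: the repeated version 3 is skipped and version 2 is evicted from the front.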
For example, suppose the PG local catalog version is 1 when it starts a transaction block, and by the time the transaction block completes, 100 DDLs have been executed and 50 of them have incremented the catalog version. The latest catalog version that PG reads from shared memory is now 51. PG needs to read the entire sequence of versions 2, 3, ..., 51 and their invalidation messages in order to do a valid incremental cache refresh. By keeping a history of up to 1024 catalog versions, we allow a longer-running PG transaction block to still do an incremental refresh.

Note that even though `pg_yb_catalog_version` and `pg_yb_invalidation_messages` are written transactionally, they are not read transactionally on the master side when it prepares the heartbeat response. Therefore we do not try to process `db_catalog_version_data` and `db_catalog_inval_messages_data` atomically on the tserver side, since they can be out of sync in rare cases when the following sequence of events happens on the master side:
(1) pg_yb_catalog_version is read and put in the response
(2) pg_yb_catalog_version and pg_yb_invalidation_messages are updated transactionally (current_version = current_version + 1, with the messages)
(3) pg_yb_invalidation_messages is read and put in the response

Test Plan:
(1) Run

YB_EXTRA_MASTER_FLAGS="--TEST_yb_enable_invalidation_messages=true --log_ysql_catalog_versions=true --vmodule=catalog_manager=2,heartbeater=2,master_heartbeat_service=2,pg_catversions=2 --TEST_simulate_catalog_message_read_failure=0.5" YB_EXTRA_TSERVER_FLAGS="--TEST_yb_enable_invalidation_messages=true --log_ysql_catalog_versions=true --vmodule=heartbeater=2,tablet_server=2,pg_catversions=2 --ysql_max_invalidation_message_queue_size=15" ./yb_build.sh --cxx-test pg_catalog_version-test

Also check the test logs, which indicate that the following code paths were covered:

```
[m-1] W0228 00:36:14.667902 4069245 master_heartbeat_service.cc:358] Could not get YSQL invalidation messages for heartbeat response: Internal error (yb/master/sys_catalog.cc:1695): Injected pg_yb_invalidation_messages read failure for testing.
```

```
[ts-3] I0228 00:36:54.320386 4069668 tablet_server.cc:1204] reset catalog_versions_fingerprint_
```

```
[ts-2] W0228 00:29:16.253911 4061495 tablet_server.cc:1265] db_oid 16384 not found in ysql_db_invalidation_messages_map_
```

```
[ts-3] I0228 00:32:06.696022 4064671 tablet_server.cc:1234] vlog2: db_oid 1 message queue size: 4
```

(2) Manual test:

```
yugabyte=# \i /tmp/t1.sql
create table foo (id int);
CREATE TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
yugabyte=# alter table foo add column id2 text;
ALTER TABLE
yugabyte=# alter table foo drop column id2;
ALTER TABLE
```

The last 2 alter table commands were typed manually (not run from /tmp/t1.sql).
Look at the yb-tserver log:

```
I0228 02:52:11.464120 4124142 tablet_server.cc:1297] vlog2: db_oid 13515 message queue size: 2
I0228 02:52:12.469295 4124142 tablet_server.cc:1297] vlog2: db_oid 13515 message queue size: 7
I0228 02:52:13.474404 4124142 tablet_server.cc:1297] vlog2: db_oid 13515 message queue size: 11
I0228 02:52:14.480101 4124142 tablet_server.cc:1297] vlog2: db_oid 13515 message queue size: 15
I0228 02:52:15.485495 4124142 tablet_server.cc:1297] vlog2: db_oid 13515 message queue size: 20
I0228 02:52:49.621912 4124142 tablet_server.cc:1297] vlog2: db_oid 13515 message queue size: 16
I0228 02:52:52.636390 4124142 tablet_server.cc:1297] vlog2: db_oid 13515 message queue size: 16
```

We can see that about 5 back-to-back alter DDLs were executed per heartbeat interval. When the statements were executed manually one at a time, the message queue size remained at 16 (as logged before pop_front is called). So the maximum message queue size of 15 is checked and respected.

Reviewers: kfranz, sanketh, mihnea
Reviewed By: kfranz
Subscribers: yql
Differential Revision: https://phorge.dev.yugabyte.com/D42226
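The incremental-refresh requirement from the summary (local version 1, latest version 51, so every version 2 through 51 must be available) can be sketched as a contiguity check over one database's queue. This is an illustrative Python sketch, not the actual implementation: the function name `messages_for_incremental_refresh` is hypothetical, and returning `None` to signal that an incremental refresh is not possible is an assumption.

```python
def messages_for_incremental_refresh(queue, local_version, latest_version):
    """Return the messages for versions local_version+1..latest_version.

    queue is an iterable of (current_version, messages) pairs for one
    database. Returns None if any needed version is missing from the
    stored history (assumed to mean an incremental refresh is not possible).
    """
    by_version = dict(queue)
    needed = range(local_version + 1, latest_version + 1)
    if any(version not in by_version for version in needed):
        return None
    # Messages are returned in catalog version order.
    return [by_version[version] for version in needed]
```

With a queue holding versions 2 through 51, a backend at local version 1 gets all 50 message blobs in order. If the queue had already dropped versions below 10, the same call would return `None`.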