From c169105879c505ab7d116756245b98f936f7feb2 Mon Sep 17 00:00:00 2001 From: Daniel Black Date: Tue, 22 Oct 2024 12:27:40 +0530 Subject: [PATCH] more OSI --- ... Open Source India MariaDB Workshop .ipynb | 288 ++++++++++++++++-- 1 file changed, 261 insertions(+), 27 deletions(-) diff --git a/notebooks/2024 Open Source India MariaDB Workshop .ipynb b/notebooks/2024 Open Source India MariaDB Workshop .ipynb index f59e8bf..1755d6c 100644 --- a/notebooks/2024 Open Source India MariaDB Workshop .ipynb +++ b/notebooks/2024 Open Source India MariaDB Workshop .ipynb @@ -85,6 +85,94 @@ "(Source: Wikipedia: https://en.wikipedia.org/wiki/MariaDB and conversion by chatGPT)" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "47eae534-316d-4c29-b1ab-17c6f3d4960f", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "62fd9e24-f611-4190-b207-076bbcdb7bb1", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bc18fb6b-cbd2-491c-9a82-8195808d23f5", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eee7e34a-ae49-4c87-8a07-28fc414ba156", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13669255-7193-47dd-96b5-fee3f710495e", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "40c4c39b-b4e7-4cf6-aac8-4ae7a12c7724", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6f7cb5d9-0d11-4861-b267-9e490b343e88", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "961a0742-db90-4a7c-831a-4d8db7aa842e", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"1381e994-d81c-4850-ae9e-a5cbf21a92f3", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c4cdaba8-2c29-40dc-91bb-e9b636e1cedb", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "647763f9-ae71-4673-aaf8-2b483f5aeff2", + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "markdown", "id": "af51619f-2990-4c15-8265-8d1e52978192", @@ -981,7 +1069,7 @@ "* If one argument is decimal and the other argument is a floating point, they are compared as floating point values.\n", "* If one argument is string and the other argument is integer, they are compared as decimals. This conversion was added in MariaDB 10.3.36. Prior to 10.3.36, this combination was compared as floating point values, which did not always work well for huge 64-bit integers because of a possible precision loss on conversion to double.\n", "* If a hexadecimal argument is not compared to a number, it is treated as a binary string.\n", - "* If a constant is compared to a TIMESTAMP or DATETIME, the constant is converted to a timestamp, unless used as an argument to the [[in|IN]] function.\n", + "* If a constant is compared to a TIMESTAMP or DATETIME, the constant is converted to a timestamp, unless used as an argument to the IN function.\n", "* In other cases, arguments are compared as floating point, or real, numbers.\n", "\n", "Note that if a string column is being compared with a numeric value, MariaDB will not use the index on the column, as there are numerous alternatives that may evaluate as equal (see examples below).\n", @@ -1083,7 +1171,7 @@ "Can you explain the results observed?\n", "\n", "references:\n", - "* NULLIF - https://mariadb.com/kb/en/nulif\n", + "* NULLIF - https://mariadb.com/kb/en/nullif\n", "* FIELD - https://mariadb.com/kb/en/field\n", "* operators - https://mariadb.com/kb/en/function-and-operator-reference/\n", "* case operator - 
https://mariadb.com/kb/en/case-operator/" @@ -1193,7 +1281,7 @@ "3rd Normal Form (3NF) etc may be gold standard.\n", "\n", "For the practictioners guide:\n", - "1. designed some tables,\n", + "1. design some tables,\n", "2. write some queries around them that match to what you are doing.\n", "3. If #2 was hard to do, or messy/complicated, your table structure should be different\n", "4. if performance matters a lot - prototype and test\n", @@ -1263,9 +1351,17 @@ " quantity_vailable INT UNSIGNED,\n", " PRIMARY KEY (supplier_id, product_id));\n", "\n", - "INSERT INTO Supplier(supplier_name, supplier_address) VALUES ('Joe', '12 Polloc st'), ('Suresh', '55 Oloc Rd'), ('Hanna', '1 Old Northern Rd');\n", - " INSERT INTO Product(product_name, product_description) VALUES ('Bananas', 'big yellow'), ('Apples', 'granny smith');\n", - "INSERT INTO Supplier_Product VALUES(1,1,0), (1,2,5), (2, 1, 4);" + "INSERT INTO Supplier(supplier_name, supplier_address) VALUES\n", + " ('Joe', '12 Polloc st'),\n", + " ('Suresh', '55 Oloc Rd'),\n", + " ('Hanna', '1 Old Northern Rd');\n", + " INSERT INTO Product(product_name, product_description) VALUES\n", + " ('Bananas', 'big yellow'),\n", + " ('Apples', 'granny smith');\n", + "INSERT INTO Supplier_Product VALUES\n", + " (1, 1, 0),\n", + " (1, 2, 5),\n", + " (2, 1, 4);" ] }, { @@ -1422,9 +1518,11 @@ ] }, { - "cell_type": "markdown", - "id": "3c77e14e-7d86-4352-8326-623fcb3f7578", + "cell_type": "code", + "execution_count": null, + "id": "f4231075-3f5f-4e76-b3c9-f5fae5344c9b", "metadata": {}, + "outputs": [], "source": [ "SELECT name, test, score, AVG(score) OVER (PARTITION BY name) \n", " AS average_by_name FROM student;" @@ -1936,7 +2034,7 @@ "metadata": {}, "source": [ "This shows:\n", - "* Innodb_buffer_pool_pages_data - number of pages (of 16k bytes in size by default), out of Innodb_buffer_pool_pages_total are used.\n", + "* **Innodb_buffer_pool_pages_data** - number of pages (of 16k bytes in size by default), out of 
**Innodb_buffer_pool_pages_total** are used.\n", "\n", "If very small %, then you have overallocated innodb buffer pool, (or this is idle and you aren't looking at anything meaningful).\n", "\n", @@ -1958,18 +2056,20 @@ "id": "78caf5b9-a7ef-40e5-8eb0-5f210f6b8425", "metadata": {}, "source": [ - "* Innodb_buffer_pool_read_requests - number of times SQL requests a page of data from innodb\n", - "* Innodb_buffer_pool_reads - number of time those requests made it to being a storage read.\n", + "* **Innodb_buffer_pool_read_requests** - number of times SQL requests a page of data from InnoDB\n", + "* **Innodb_buffer_pool_reads** - number of those requests that had to be read from storage.\n", "\n", "note: At startup they are all going to be read from storage. Looking at this 10 minutes after start is pretty biased.\n", "\n", - "Rough metric, if Innodb_buffer_pool_reads/ Innodb_buffer_pool_read_requests < 1%, then that's the amount of storage reads.\n", + "Rough metric: **Innodb_buffer_pool_reads** / **Innodb_buffer_pool_read_requests** is the fraction of page requests that needed a storage read (< 1% is a good rough target).\n", "\n", "If its 10+ % and its a production workload that has been running for a while, then your buffer pool is too small.\n", "\n", "If its < 0.01% then its probably too big.\n", "\n", - "Changing the value \"SET GLOBAL innodb_buffer_pool = \"(size in bytes) - can be expression like \"12 * 1024 *1024 *1024\" (12GB);" + "Change the value with \"SET GLOBAL innodb_buffer_pool_size = \" (size in bytes); this can be an expression like \"12 * 1024 * 1024 * 1024\" (12GB).\n", + "\n", + "Alternatively, on Linux, let the OS page cache act as an extra cache of buffered pages: from 11.0 set **[innodb_data_file_buffering=ON](https://mariadb.com/kb/en/innodb-system-variables/#innodb_data_file_buffering)**; before 11.0 set **[innodb_flush_method=fsync](https://mariadb.com/kb/en/innodb-system-variables/#innodb_flush_method)**." ] }, { @@ -1989,48 +2089,182 @@ "source": [ "### Innodb Log File Size\n", "\n", - "This is the Redo log.
 This records UPDATE/DELETE/INSERT data quickly to disk so its durable, and can return success quickly to application.\n", + "This is the redo log, stored in the ib_logfile* file(s). It ensures that changes survive a power outage. Never delete these files.\n", "\n", - "A post process called flushing moves the Redo log to the tablespaces of each table.\n", + "It records UPDATE/DELETE/INSERT changes to disk quickly so they are durable, allowing success to be returned to the application quickly.\n", + "\n", + "Because changes are held in both the redo log and the InnoDB buffer pool, it can be handy to size them similarly, so that large bulk changes are limited by the buffer pool rather than by the log file.\n", "\n", "ref: https://mariadb.com/kb/en/innodb-system-variables/#innodb_log_file_size\n", "\n", - "Dynamic (since 10.11+ (ignoring EOL releases)).\n", + "Dynamic (since 10.11, ignoring end-of-life releases).\n", "\n", "Default value 128M. If you are doing bulk updates or lots of them, this may not be enough. A too big value will mean crash recovery is slow.\n", "\n", - "ref: https://mariadb.com/kb/en/innodb-page-flushing\n", + "ref: https://mariadb.com/kb/en/innodb-page-flushing\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0e74b2cb-d47c-40b3-b198-022dd5146008", + "metadata": {}, + "outputs": [], + "source": [ + "show global status like \"Innodb_checkpoint_%\"" + ] + }, + { + "cell_type": "markdown", + "id": "c64c43f3-987b-4500-9e5f-9ce6ae9430ab", + "metadata": {}, + "source": [ + "These values are volumes of data in bytes.\n", + "\n", + "If `Innodb_checkpoint_max_age` / `@@innodb_log_file_size` is close to 1, then innodb_log_file_size should be increased.\n", + "\n", + "Also check that Innodb_checkpoint_age approaches Innodb_checkpoint_max_age semi-frequently.\n", + "\n", + "Live example: https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&from=now-24h&to=now\n", + "\n", + "ref:\n", + "* 
https://mariadb.com/kb/en/innodb-redo-log/#redo-log-group-capacity" + ] + }, + { + "cell_type": "markdown", + "id": "1b5e2375-8df1-4c7a-b921-89edc2636e6c", + "metadata": {}, + "source": [ + "### InnoDB Flushing\n", "\n", - "Set innodb_io_capacity to the IOPs capacity of the storage. This ensure that background flushing can occur without affecting reads from storage.\n", + "A background process called flushing writes the changed (dirty) pages in the buffer pool to the tablespaces of each table.\n", "\n", - "ref: https://mariadb.com/kb/en/innodb-system-variables/#innodb_io_capacity\n", + "Triggered by:\n", + "* Adaptive flushing (when enabled): [innodb_adaptive_flushing_lwm](https://mariadb.com/kb/en/innodb-system-variables/#innodb_adaptive_flushing_lwm) is the percentage of **innodb_log_file_size** at which flushing starts.\n", + "\n", + "Set **[innodb_io_capacity](https://mariadb.com/kb/en/innodb-system-variables/#innodb_io_capacity)** to the IOPS capacity of the storage. This ensures that background flushing can occur without affecting reads from storage.\n", "\n", + "ref:\n", "\n", - "https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&from=now-24h&to=now\n", + "* https://mariadb.com/kb/en/innodb-page-flushing/#configuring-the-innodb-io-capacity" + ] + }, + { + "cell_type": "markdown", + "id": "9081f291-cad1-4623-9e6f-4a312d9ad031", + "metadata": {}, + "source": [ + "## Tuning Connections\n", + "\n", + "* MariaDB maintains a pool of threads ready to process connections.\n", + "* **max_connections** sets an upper limit on the number of connections (it doesn't apply to users granted CONNECTION ADMIN).\n", + "\n", + "Slow queries put pressure on the number of connections:\n", + "* A slow query ties up a connection.\n", + "* If it finished sooner, another client could be using that connection.\n", + "* Because a concurrent web service or application issues multiple queries at once, faster queries overall will reduce **max_connections** 
requirements.\n", "\n", - "ref: https://mariadb.com/kb/en/innodb-redo-log/#redo-log-group-capacity" + "After (or, if really pushed, while) solving slow queries, look at connection usage:\n", + "\n", + "ref: https://mariadb.com/kb/en/grant/#connection-admin" ] }, { "cell_type": "code", "execution_count": null, - "id": "0e74b2cb-d47c-40b3-b198-022dd5146008", + "id": "89c3de00-3608-46cc-bbb8-b34f6994b81d", "metadata": {}, "outputs": [], "source": [ - "show global status like \"Innodb_checkpoint_%\"" + "SHOW PROCESSLIST" ] }, { "cell_type": "markdown", - "id": "c64c43f3-987b-4500-9e5f-9ce6ae9430ab", + "id": "3e303df5-168f-4707-b9a2-1ce8b3204c68", "metadata": {}, "source": [ - "This is a volume of data in bytes.\n", + "Run this a few times to get an idea of how many active connections there are." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cef8d062-42d4-40c8-8ef7-6a9636c58b1a", + "metadata": {}, + "outputs": [], + "source": [ + "SHOW GLOBAL STATUS LIKE 'Max_used_connections'" + ] + }, + { + "cell_type": "markdown", + "id": "b58a6f62-cc8d-4ed8-b0a6-64658b5570f6", + "metadata": {}, + "source": [ + "This shows the highest number of connections in use at the same time since the server started.\n", + "\n", + "If this is the same as the **max_connections** system variable, queries have possibly stalled.\n", + "\n", + "Potential causes:\n", + "* insufficient **innodb_buffer_pool_size** or **innodb_log_file_size** (slow flushing being one cause),\n", + "* storage speed slowing flushing (check **innodb_io_capacity**),\n", + "* large bulk analytic queries (all the buffer pool pages they use are needed to preserve REPEATABLE READ),\n", + "* insufficient indexes on long-open transactions, causing REPEATABLE READ pages to be held in the buffer pool,\n", + "* a hot, locked index delaying other queries.\n", + "\n", + "Resolve these before choosing a **max_connections** value. The value needs to be supported by the hardware (CPU/RAM), and what that hardware can support depends on your workload."
+ ] + }, + { + "cell_type": "markdown", + "id": "f003b977-ca58-4e71-a2e7-2819a6eccb97", + "metadata": {}, + "source": [ + "## Before Query Optimization\n", + "\n", + "The optimizer chooses a query execution plan based on how much of the table it estimates it will read.\n", + "\n", + "For example, if using an index means reading 30% of the index and then also reading the main table, it will ignore the index.\n", + "\n", + "Also, after large changes to a table, the index statistics may no longer be current, which can affect the plan chosen.\n", + "\n", + "ANALYZE TABLE analyzes and stores the key distribution for a table.\n", + "\n", + "Quick version:\n", + "\n", + " ANALYZE TABLE tbl PERSISTENT FOR ALL\n", + "\n", + "ref: https://mariadb.com/kb/en/analyze-table/" + ] + }, + { + "cell_type": "markdown", + "id": "eda009ce-046d-4e94-a5ef-36e509892f25", + "metadata": {}, + "source": [ + "## Optimizer switch\n", + "\n", + "There are a large number of ways MariaDB can consider executing your query.\n", + "\n", + "Some are not enabled by default.\n", + "\n", + "Reasons an optimization may not be enabled by default:\n", + "* It was added after the MariaDB version was GA, e.g. [optimizer_switch=cset_narrowing](https://mariadb.com/kb/en/charset-narrowing-optimization/).\n", + "* It works really well in some circumstances, but can be a suboptimal choice in others.\n", + "\n", + "ref: https://mariadb.com/kb/en/optimizer-switch/\n", + "\n", + "Recommended use: enable it for specific queries.\n", + "\n", + "As a session:\n", + "\n", + "    SET SESSION optimizer_switch='{optimization}=on'\n", "\n", - "If `Innodb_checkpoint_max_age` / `@@innodb_log_file_size `is close to 1, then innodb_log_file_size should be increased.\n", + "Or for a single statement:\n", "\n", - "Also check that Innodb_checkpoint_max_age is something that Innodb_checkpoint_age approaches semi frequently." + "    SET STATEMENT optimizer_switch='{optimization}=on' FOR SELECT ...\n" ] } ],
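The buffer pool and redo log rules of thumb in this patch reduce to two ratios. Below is a minimal Python sketch of that arithmetic. It assumes the status values have already been fetched (for example with `SHOW GLOBAL STATUS`) into a plain dict. The function names and the 0.9 cut-off used for "close to 1" are hypothetical choices for illustration; they are not MariaDB settings.

```python
from typing import Mapping

# Thresholds from the notebook's rough guidance; heuristics, not hard limits.
TOO_SMALL_RATIO = 0.10   # 10%+ of page requests went to storage
TOO_BIG_RATIO = 0.0001   # < 0.01% of page requests went to storage
# Hypothetical cut-off for "close to 1" (an assumption, not a MariaDB setting).
LOG_PRESSURE_CUTOFF = 0.9


def buffer_pool_storage_read_ratio(status: Mapping[str, int]) -> float:
    """Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests."""
    requests = status["Innodb_buffer_pool_read_requests"]
    if requests == 0:
        return 0.0  # idle server: nothing meaningful to report
    return status["Innodb_buffer_pool_reads"] / requests


def assess_buffer_pool(ratio: float) -> str:
    """Map the storage-read ratio onto the notebook's rough guidance."""
    if ratio >= TOO_SMALL_RATIO:
        return "too small"
    if ratio < TOO_BIG_RATIO:
        return "probably too big"
    return "ok"


def log_file_too_small(checkpoint_max_age: int, log_file_size: int) -> bool:
    """True when Innodb_checkpoint_max_age / @@innodb_log_file_size is near 1."""
    return checkpoint_max_age / log_file_size >= LOG_PRESSURE_CUTOFF


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from a real server.
    status = {"Innodb_buffer_pool_reads": 50,
              "Innodb_buffer_pool_read_requests": 1000}
    ratio = buffer_pool_storage_read_ratio(status)
    print(f"{ratio:.2%} of page requests read from storage: "
          f"{assess_buffer_pool(ratio)}")
```

The counters are cumulative since startup. As the notebook notes, sample them only after a production workload has been running for a while, because right after startup every page has to come from storage.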