Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: GREATEST() function call returns results in wrong datatype (compared to MySQL) #17750

Open
mcrauwel opened this issue Feb 12, 2025 · 3 comments · May be fixed by #17826
Open

Bug Report: GREATEST() function call returns results in wrong datatype (compared to MySQL) #17750

mcrauwel opened this issue Feb 12, 2025 · 3 comments · May be fixed by #17826

Comments

@mcrauwel
Copy link
Contributor

Overview of the Issue

Given the following table definitions

CREATE TABLE `table1` (
    `id` int unsigned NOT NULL,
    `join_id` bigint unsigned NOT NULL,
    `created_at` timestamp NOT NULL,
    PRIMARY KEY (`id`),
    KEY `idx_join_id` (`join_id`)
) ENGINE InnoDB,
  CHARSET utf8mb4,
  COLLATE utf8mb4_0900_ai_ci;

CREATE TABLE `table2` (
    `id` bigint unsigned NOT NULL,
    `join_id` int unsigned NOT NULL,
    `created_at` timestamp NOT NULL,
    `updated_at` timestamp NOT NULL,
    PRIMARY KEY (`id`),
    KEY `idx_join_id` (`join_id`)
) ENGINE InnoDB,
  CHARSET utf8mb4,
  COLLATE utf8mb4_0900_ai_ci;

Running a query like

SELECT
  GREATEST(`table1`.`created_at`, `table2`.`updated_at`) AS `created_at`
FROM `table1`
STRAIGHT_JOIN `table2`
  ON `table1`.`join_id` = `table2`.`join_id`
ORDER BY 1;

results in a different data-type returned by Vitess than in plain MySQL

for column created_at field types do not match
Not equal:
MySQL: TIMESTAMP
Vitess: VARCHAR
Query (SELECT
GREATEST(`table1`.`created_at`, `table2`.`updated_at`) AS `created_at`
FROM `table1`
STRAIGHT_JOIN `table2`
ON `table1`.`join_id` = `table2`.`join_id`
ORDER BY 1;) results mismatched.
Vitess Results:
[VARCHAR("2024-02-12 10:30:00")]
[VARCHAR("2024-02-12 10:31:00")]
[VARCHAR("2024-02-12 10:32:00")]
[VARCHAR("2024-02-12 10:33:00")]
[VARCHAR("2024-02-12 10:34:00")]
[VARCHAR("2024-02-12 10:35:00")]
[VARCHAR("2024-02-12 10:36:00")]
[VARCHAR("2024-02-12 10:37:00")]
[VARCHAR("2024-02-12 10:38:00")]
[VARCHAR("2024-02-12 10:39:00")]
[VARCHAR("2024-02-12 10:40:00")]
[VARCHAR("2024-02-12 10:41:00")]
[VARCHAR("2024-02-12 10:42:00")]
[VARCHAR("2024-02-12 10:43:00")]
[VARCHAR("2024-02-12 10:44:00")]
[VARCHAR("2024-02-12 10:45:00")]
[VARCHAR("2024-02-12 10:46:00")]
[VARCHAR("2024-02-12 10:47:00")]
[VARCHAR("2024-02-12 10:48:00")]
[VARCHAR("2024-02-12 10:49:00")]
Vitess RowsAffected: 0
MySQL Results:
[TIMESTAMP("2024-02-12 10:30:00")]
[TIMESTAMP("2024-02-12 10:31:00")]
[TIMESTAMP("2024-02-12 10:32:00")]
[TIMESTAMP("2024-02-12 10:33:00")]
[TIMESTAMP("2024-02-12 10:34:00")]
[TIMESTAMP("2024-02-12 10:35:00")]
[TIMESTAMP("2024-02-12 10:36:00")]
[TIMESTAMP("2024-02-12 10:37:00")]
[TIMESTAMP("2024-02-12 10:38:00")]
[TIMESTAMP("2024-02-12 10:39:00")]
[TIMESTAMP("2024-02-12 10:40:00")]
[TIMESTAMP("2024-02-12 10:41:00")]
[TIMESTAMP("2024-02-12 10:42:00")]
[TIMESTAMP("2024-02-12 10:43:00")]
[TIMESTAMP("2024-02-12 10:44:00")]
[TIMESTAMP("2024-02-12 10:45:00")]
[TIMESTAMP("2024-02-12 10:46:00")]
[TIMESTAMP("2024-02-12 10:47:00")]
[TIMESTAMP("2024-02-12 10:48:00")]
[TIMESTAMP("2024-02-12 10:49:00")]
MySQL RowsAffected: 0

As a work-around you could use a CAST() function to cast the result-type to DATETIME, for example:

SELECT
  CAST(GREATEST(`table1`.`created_at`, `table2`.`updated_at`) as DATETIME) AS `created_at`
FROM `table1`
STRAIGHT_JOIN `table2`
  ON `table1`.`join_id` = `table2`.`join_id`
ORDER BY 1;

Reproduction Steps

providing a vt test testcase

Testcase output:

➜  vitess-tester git:(main) ✗ ./vt test --sharded --number-of-shards 2 t/oss-bug-greatest-wrong-datatype.t.txt
starting sharded keyspace: 'mysqltest' with shards [-80 80-]
E0212 11:11:41.517472   15414 vtorc_process.go:105] configuration - {
	"instance-poll-time": "10h"
}
t/oss-bug-greatest-wrong-datatype.t.txt: ok! Ran 6 queries, 5 successfully and 1 failures take time 2.208355125 s

Error: some tests failed 😭
see errors in errors

Test case definition:

cat t/oss-bug-greatest-wrong-datatype.t.txt
CREATE TABLE `table1` (
    `id` int unsigned NOT NULL,
    `join_id` bigint unsigned NOT NULL,
    `created_at` timestamp NOT NULL,
    PRIMARY KEY (`id`),
    KEY `idx_join_id` (`join_id`)
) ENGINE InnoDB,
  CHARSET utf8mb4,
  COLLATE utf8mb4_0900_ai_ci;

CREATE TABLE `table2` (
    `id` bigint unsigned NOT NULL,
    `join_id` int unsigned NOT NULL,
    `created_at` timestamp NOT NULL,
    `updated_at` timestamp NOT NULL,
    PRIMARY KEY (`id`),
    KEY `idx_join_id` (`join_id`)
) ENGINE InnoDB,
  CHARSET utf8mb4,
  COLLATE utf8mb4_0900_ai_ci;


INSERT INTO `table1` (`id`, `join_id`, `created_at`) VALUES
(1, 1, '2024-02-12 10:00:00'),
(2, 2, '2024-02-12 10:01:00'),
(3, 3, '2024-02-12 10:02:00'),
(4, 4, '2024-02-12 10:03:00'),
(5, 5, '2024-02-12 10:04:00'),
(6, 6, '2024-02-12 10:05:00'),
(7, 7, '2024-02-12 10:06:00'),
(8, 8, '2024-02-12 10:07:00'),
(9, 9, '2024-02-12 10:08:00'),
(10, 10, '2024-02-12 10:09:00'),
(11, 11, '2024-02-12 10:10:00'),
(12, 12, '2024-02-12 10:11:00'),
(13, 13, '2024-02-12 10:12:00'),
(14, 14, '2024-02-12 10:13:00'),
(15, 15, '2024-02-12 10:14:00'),
(16, 16, '2024-02-12 10:15:00'),
(17, 17, '2024-02-12 10:16:00'),
(18, 18, '2024-02-12 10:17:00'),
(19, 19, '2024-02-12 10:18:00'),
(20, 20, '2024-02-12 10:19:00');

INSERT INTO `table2` (`id`, `join_id`, `created_at`, `updated_at`) VALUES
(101, 1, '2024-02-12 10:20:00', '2024-02-12 10:30:00'),
(102, 2, '2024-02-12 10:21:00', '2024-02-12 10:31:00'),
(103, 3, '2024-02-12 10:22:00', '2024-02-12 10:32:00'),
(104, 4, '2024-02-12 10:23:00', '2024-02-12 10:33:00'),
(105, 5, '2024-02-12 10:24:00', '2024-02-12 10:34:00'),
(106, 6, '2024-02-12 10:25:00', '2024-02-12 10:35:00'),
(107, 7, '2024-02-12 10:26:00', '2024-02-12 10:36:00'),
(108, 8, '2024-02-12 10:27:00', '2024-02-12 10:37:00'),
(109, 9, '2024-02-12 10:28:00', '2024-02-12 10:38:00'),
(110, 10, '2024-02-12 10:29:00', '2024-02-12 10:39:00'),
(111, 11, '2024-02-12 10:30:00', '2024-02-12 10:40:00'),
(112, 12, '2024-02-12 10:31:00', '2024-02-12 10:41:00'),
(113, 13, '2024-02-12 10:32:00', '2024-02-12 10:42:00'),
(114, 14, '2024-02-12 10:33:00', '2024-02-12 10:43:00'),
(115, 15, '2024-02-12 10:34:00', '2024-02-12 10:44:00'),
(116, 16, '2024-02-12 10:35:00', '2024-02-12 10:45:00'),
(117, 17, '2024-02-12 10:36:00', '2024-02-12 10:46:00'),
(118, 18, '2024-02-12 10:37:00', '2024-02-12 10:47:00'),
(119, 19, '2024-02-12 10:38:00', '2024-02-12 10:48:00'),
(120, 20, '2024-02-12 10:39:00', '2024-02-12 10:49:00');


SELECT
  GREATEST(`table1`.`created_at`, `table2`.`updated_at`) AS `created_at`
FROM `table1`
STRAIGHT_JOIN `table2`
  ON `table1`.`join_id` = `table2`.`join_id`
ORDER BY 1;

SELECT
  CAST(GREATEST(`table1`.`created_at`, `table2`.`updated_at`) as DATETIME) AS `created_at`
FROM `table1`
STRAIGHT_JOIN `table2`
  ON `table1`.`join_id` = `table2`.`join_id`
ORDER BY 1;

Binary Version

verified on v20


vttablet version Version: 20.0.6-SNAPSHOT (Git revision b71e9ac397f40807574534ce00ee50378c819d4d branch 'release-20.0') built on Wed Jan 22 15:41:39 CET 2025 by [email protected] using go1.22.11 darwin/arm64

Operating System and Environment details

MacOS Sequoia 15.3

Log Fragments

@mcrauwel mcrauwel added Needs Triage This issue needs to be correctly labelled and triaged Type: Bug labels Feb 12, 2025
@mattlord mattlord added Component: Query Serving and removed Needs Triage This issue needs to be correctly labelled and triaged labels Feb 12, 2025
@anshikavashistha
Copy link
Collaborator

@mattlord If you finds this issue suitable to me,I will be happy to work on this issue.
Thank you

@dbussink dbussink self-assigned this Feb 18, 2025
@dbussink dbussink linked a pull request Feb 19, 2025 that will close this issue
5 tasks
@dbussink
Copy link
Contributor

So there are essentially two problems here in one. The first is that we don't correctly handle temporal types (datetime, timestamp, date & time) correctly when they are compared in GREATEST and LEAST. We either end up doing varchar comparisons or can even panic in cases.

This problem is fixed in #17826.

Secondly, there's a separate problem here. For sharded environments, we end up sending down queries to the different shards and in cases like here we need to use a bind parameter to combine the values from different shards. This is visible in the query plan for this query:

select GREATEST(CAST(:table1_created_at AS DATETIME), table2.updated_at) as created_at from table2 where table2.join_id = CAST(:table1_join_id AS UNSIGNED)

What you can see here, is that we explicitly cast the bind variable as datetime and not as timestamp, which means the values here even after #17826 come out as datetime and not as timestamp. So why does it cast as datetime? First of all, we need to cast here since by default if we interpolate a bind variable, it would interpreted as varchar. That's the wrong type then.

But MySQL has no support for CAST(... AS TIMESTAMP). In fact, there is no possible way to create a timestamp literal or value in any way except for values directly loaded from a table. This means it's impossible here to return the timestamp type and returning a datetime is the best we can do here and the only option.

@dbussink
Copy link
Contributor

Also opened mysql/mysql-server#602 on MySQL because of a weird collation bug I uncovered while fixing the temporal comparison logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants