[#26192] YSQL: flaky test: TestPgRegressMisc.testPgRegressMiscSerial3
Summary:
The test `TestPgRegressMisc.testPgRegressMiscSerial3` is flaky (particularly in tsan and asan builds). When it fails, we see an error like
```
21:26:55.191 (main) [ERROR - org.yb.BaseYBTest$1$2.logEventDetails(BaseYBTest.java:243)] YB Java test failed: class="org.yb.pgsql.TestPgRegressMisc", method="testPgRegressMiscSerial3"
org.junit.internal.runners.model.MultipleFailureException: There were 2 errors:
  java.lang.AssertionError(pg_regress exited with error code: 1, failed tests: [yb_create_table_like])
  com.yugabyte.util.PSQLException(ERROR: Cannot delete non-empty tablegroup, table 000034cb0000300080000000000040a5 is not deleted)
21:26:55.194 (main) [INFO - org.yb.BaseYBTest$1$2.logEventDetails(BaseYBTest.java:250)] YB Java test class="org.yb.pgsql.TestPgRegressMisc", method="testPgRegressMiscSerial3" took 331.45 seconds
```

After debugging, I found that the table `000034cb0000300080000000000040a5` is created as an index of a base table in a tablegroup. When it is deleted, the relevant code is
```
  auto colocated_tablet = table.table_info_with_write_lock->GetColocatedUserTablet();
  if (colocated_tablet) {
    // TryRemoveFromTablegroup only affects tables that are part of some tablegroup.
    // We directly remove it from tablegroup no matter if it is retained by snapshot schedules.
    RETURN_NOT_OK(TryRemoveFromTablegroup(table.table_info_with_write_lock->id()));
    // Send a RemoveTableFromTablet() request to each
    // colocated parent tablet replica in the table.
```

This code removes the table from its containing tablegroup (if any) only when the table still has a tablet. Because the table is an index of a base table, deleting the base table also deletes the index table. In a race condition, another thread has already invoked `table->ClearTabletMaps` by the time this code runs, so `colocated_tablet` is nullptr. As a result, `TryRemoveFromTablegroup` isn't invoked, and `000034cb0000300080000000000040a5` is left in the tablegroup's in-memory data structure.

Later, when we try to delete the tablegroup, we hit the error `Cannot delete non-empty tablegroup, table 000034cb0000300080000000000040a5 is not deleted`.

To fix this bug, I changed the condition that guards the call
```
RETURN_NOT_OK(TryRemoveFromTablegroup(table.table_info_with_write_lock->id()));
```
from `colocated_tablet` being non-null to `IsColocatedUserTable()`. The table is now removed from the tablegroup even if its tablet maps have already been cleared, which avoids the above error.

Test Plan: ./yb_build.sh tsan --java-test org.yb.pgsql.TestPgRegressMisc#testPgRegressMiscSerial3 -n 100 --tp 1

Reviewers: hsunder, zdrudi

Reviewed By: zdrudi

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D42167
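
For illustration, the race and the changed condition can be modeled with a small standalone sketch. It is not the catalog manager code: `TableInfo`, `Tablet`, `Tablegroup`, `DeleteTableOld`, `DeleteTableNew`, and `TablegroupEmptyAfterRace` are hypothetical stand-ins. It assumes that colocation is a property of the table that survives `ClearTabletMaps`, and it replays the problematic interleaving deterministically on one thread instead of using real concurrency.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>

// Hypothetical stand-ins for the real catalog types (assumptions, not YB code).
struct Tablet {};

struct TableInfo {
  std::string id;
  bool colocated = false;          // assumed to stay set for the table's lifetime
  std::shared_ptr<Tablet> tablet;  // reset by ClearTabletMaps()

  bool IsColocatedUserTable() const { return colocated; }
  std::shared_ptr<Tablet> GetColocatedUserTablet() const {
    return colocated ? tablet : nullptr;
  }
  void ClearTabletMaps() { tablet.reset(); }
};

struct Tablegroup {
  std::set<std::string> table_ids;  // in-memory membership of the tablegroup
  bool CanDelete() const { return table_ids.empty(); }
};

// Old condition: membership is cleaned up only if the tablet is still present.
void DeleteTableOld(const TableInfo& table, Tablegroup* group) {
  if (table.GetColocatedUserTablet()) {
    group->table_ids.erase(table.id);
  }
}

// New condition: membership is cleaned up whenever the table is colocated.
// Tablet-level cleanup is omitted from this sketch.
void DeleteTableNew(const TableInfo& table, Tablegroup* group) {
  if (table.IsColocatedUserTable()) {
    group->table_ids.erase(table.id);
  }
}

// Replays the interleaving: the tablet maps are cleared first (as the other
// thread would do), then the index table itself is deleted.
bool TablegroupEmptyAfterRace(bool use_fix) {
  TableInfo index;
  index.id = "000034cb0000300080000000000040a5";
  index.colocated = true;
  index.tablet = std::make_shared<Tablet>();

  Tablegroup group;
  group.table_ids.insert(index.id);

  index.ClearTabletMaps();  // the "other thread" wins the race

  if (use_fix) {
    DeleteTableNew(index, &group);
  } else {
    DeleteTableOld(index, &group);
  }
  return group.CanDelete();
}
```

In this model, `TablegroupEmptyAfterRace(false)` leaves the index id in the tablegroup, which corresponds to the "Cannot delete non-empty tablegroup" failure, while `TablegroupEmptyAfterRace(true)` leaves the tablegroup empty.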