MDEV-39261 MariaDB crash on startup in presence of indexed virtual columns by Thirunarayanan · Pull Request #4914 · MariaDB/server

Thirunarayanan · 2026-04-08T07:00:31Z

Problem:

A single InnoDB purge worker thread can process undo logs from different tables within the same batch. But get_purge_table() and open_purge_table() incorrectly assume a 1:1 relationship between a purge worker thread and a table within a single batch. Based on this wrong assumption, InnoDB attempts to reuse TABLE objects cached in thd->open_tables for virtual column computation.

1) Purge worker opens Table A and caches the TABLE pointer in thd->open_tables.
2) The same purge worker moves to Table B in the same batch; get_purge_table() retrieves the cached pointer for Table A instead of opening Table B.
3) Because ha_innobase::open() is skipped for Table B, the virtual column template is never initialized.
4) Virtual column computation for Table B aborts the server.

Solution:

get_purge_table(): Accept the specific db_name and table_name associated with the current undo log record. Compare the db_name and table_name against the existing cached TABLE objects. If one of them matches, return that cached TABLE object.
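The name-based lookup described above can be illustrated with a minimal standalone sketch. It is not the patch itself: CachedTable and find_cached_table() are hypothetical stand-ins for TABLE and the revised get_purge_table(), and the real code walks thd->open_tables rather than a std::vector.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a server-layer TABLE handle.
struct CachedTable {
  std::string db;
  std::string name;
};

// Return the cached handle for db.name, or nullptr if no cached handle
// matches, in which case the caller has to open the table itself.
inline const CachedTable *find_cached_table(
    const std::vector<CachedTable> &open_tables,
    const std::string &db, const std::string &name)
{
  for (const CachedTable &t : open_tables)
    if (t.db == db && t.name == name)
      return &t;
  return nullptr;
}
```

With only test.a cached, a lookup for test.b returns nullptr instead of the handle for test.a, which is the case that the original code got wrong.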

CLAassistant · 2026-04-08T07:01:56Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

dr-m

Thank you, great work in reproducing the error. I found some ways to simplify this logic further. I think that we should invoke open_purge_table() for each dict_table_t::id only once per purge batch, in trx_purge_table_acquire(), provided that indexed virtual columns exist. In that way, when we are purging individual undo log records, we are guaranteed to have a TABLE* handle available.

dr-m · 2026-04-13T05:04:35Z

+	trx_purge() after all workers complete. */
+       THD* coordinator_thd= nullptr;


This is mixing TAB and space indentation.

I wonder if we truly need this field. Can we get the necessary information via MDL_ticket::get_ctx()?

When innodb_purge_threads=1, the coordinator acts as both coordinator and
worker, calling srv_purge_worker_task_low() directly. In this case, it
processes purge_node_t entries and would normally call
innobase_reset_background_thd() in purge_node_t::end(), which would
prematurely close all open TABLE* objects via close_thread_tables().
This would break the batch processing since tables need to remain open
until all purge_node_t entries are processed.

Worker threads (when innodb_purge_threads > 1) have their own THD
lifecycle managed by acquire_thd()/release_thd() and must call
innobase_reset_background_thd() in purge_node_t::end() to clean up
their thread-local resources.

By comparing the current THD against coordinator_thd, purge_node_t::end()
can skip the premature cleanup for the coordinator while still allowing
workers to properly clean up their resources. The coordinator's cleanup
happens centrally in trx_purge() after all nodes complete.
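A minimal sketch of the comparison described above, assuming simplified stand-in types. Thd, PurgeSys and end_of_node_cleanup() are hypothetical names; the real code compares THD pointers in purge_node_t::end() and calls innobase_reset_background_thd().

```cpp
#include <cassert>

// Hypothetical stand-in for THD; open_tables counts open TABLE handles.
struct Thd {
  int open_tables;
};

// Hypothetical stand-in for the relevant part of purge_sys.
struct PurgeSys {
  Thd *coordinator_thd;
};

// Models the end of one purge node: a worker releases its tables right
// away, while the coordinator keeps them open until the batch is done.
// Returns true if the cleanup was performed.
inline bool end_of_node_cleanup(const PurgeSys &sys, Thd &thd)
{
  if (&thd == sys.coordinator_thd)
    return false;            // deferred to the end of the batch
  thd.open_tables = 0;       // models closing the thread's tables
  return true;
}
```

The coordinator's handles survive the call, and a worker's handles do not.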

dr-m · 2026-04-13T05:14:01Z

+  if (thd != purge_sys.coordinator_thd)
+    innobase_reset_background_thd(thd);


Do we really need this call here, or could we avoid adding purge_sys.coordinator_thd and invoke this only in trx_purge_close_tables()? Note: The comment of that function incorrectly hints that it could be called by purge workers. That needs to be corrected; it is only called by the purge coordinator task.

Yes, we need this here. I mentioned the reason in the comment above.

dr-m · 2026-04-13T05:57:51Z

-    if (!table)
-      return nullptr;
-    /* At this point, the freshly loaded table may already have been evicted.
-    We must look it up again while holding a shared dict_sys.latch.  We keep
-    trying this until the table is found in the cache or it cannot be found
-    in the dictionary (because the table has been dropped or rebuilt). */


It is a little hard to review this change, because some code was shuffled around. Can you make sure that a minimal diff will be displayed for the function trx_purge_table_open()? I can see that we removed the comment and added a dereferencing of table after it. I will post a separate comment on the moved snippet where this comment had been omitted.

I realized that the comment is at the end of a for (;;) loop body. So, the next iteration of the loop should look up the freshly loaded table.

dr-m · 2026-04-13T06:03:13Z

+    dict_sys.lock(SRW_LOCK_CALL);
+    table= dict_load_table_on_id(table_id, DICT_ERR_IGNORE_FK_NOKEY);
+    dict_sys.unlock();
+    if (!table)
+      return nullptr;
+  }

  if (!table->is_readable() || table->corrupted)


In this function trx_purge_table_open(), which the patch is moving elsewhere, so that all of the code will be indicated as new here, there used to be a comment after the !table check, saying that the freshly loaded table could already be evicted at that point, and therefore it is not safe to access the table without looking it up first. Now the comment is gone, and we are dereferencing table in an unsafe way.

Can dict_load_table_on_id() ever return an unreadable or corrupted table, or would it return nullptr in that case? I wonder if the condition could be relocated.

However, for the rest of the function, the same problem exists. Apparently, the subsequent code is assuming that we are holding dict_sys.freeze(). That is not the case if we had invoked dict_load_table_on_id().

I realized that the table would be latched in the next iteration of this for (;;) loop. So, there is no correctness issue. Only the comment was being removed, making it harder to understand the logic.
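The loop structure discussed here can be modelled in isolation. The following is only a sketch under simplifying assumptions: DictCache is a hypothetical class, a single std::mutex stands in for the shared and exclusive modes of dict_sys.latch, and the reference count stands in for pinning the table.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <unordered_set>

struct Table {
  std::uint64_t id;
  unsigned n_ref;
};

// Hypothetical miniature of a dictionary cache.
class DictCache {
  std::mutex latch_;
  std::unordered_map<std::uint64_t, Table> cache_;
  std::unordered_set<std::uint64_t> on_disk_;
public:
  unsigned loads = 0;

  void create(std::uint64_t id) {
    std::lock_guard<std::mutex> g(latch_);
    on_disk_.insert(id);
  }

  Table *acquire(std::uint64_t id) {
    for (;;) {
      {
        // Look up and pin the table while holding the latch.
        std::lock_guard<std::mutex> g(latch_);
        auto it = cache_.find(id);
        if (it != cache_.end()) {
          ++it->second.n_ref;
          return &it->second;
        }
      }
      // Not cached: load it. The entry is deliberately not returned from
      // here, because once the latch is released it could be evicted.
      // The next loop iteration looks it up again and pins it.
      std::lock_guard<std::mutex> g(latch_);
      if (on_disk_.count(id) == 0)
        return nullptr;      // dropped or rebuilt
      cache_.emplace(id, Table{id, 0});
      ++loads;
    }
  }
};
```

The point of the removed comment is visible in the structure: the only return of a non-null pointer happens in the branch that found the table while holding the latch.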

dr-m

Please include the following changes in your next iteration:

diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index f01ff8f3dcb..7f95cbbdaea 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -5855,6 +5855,9 @@ static void initialize_auto_increment(dict_table_t *table, const Field& field,
 @retval	0	on success */
 int
 ha_innobase::open(const char* name, int, uint)
+{ return open(name, false); }
+
+int ha_innobase::open(const char *name, bool for_vc_purge)
 {
 	char			norm_name[FN_REFLEN];
 
@@ -6075,8 +6078,6 @@ ha_innobase::open(const char* name, int, uint)
 	/* Index block size in InnoDB: used by MySQL in query optimization */
 	stats.block_size = static_cast<uint>(srv_page_size);
 
-	const my_bool for_vc_purge = THDVAR(thd, background_thread);
-
 	if (for_vc_purge || !m_prebuilt->table
 	    || m_prebuilt->table->is_temporary()
 	    || m_prebuilt->table->persistent_autoinc
@@ -19921,7 +19922,6 @@ static struct st_mysql_sys_var* innobase_system_variables[]= {
   MYSQL_SYSVAR(default_encryption_key_id),
   MYSQL_SYSVAR(immediate_scrub_data_uncompressed),
   MYSQL_SYSVAR(buf_dump_status_frequency),
-  MYSQL_SYSVAR(background_thread),
   MYSQL_SYSVAR(encrypt_temporary_tables),
 
   NULL

The new function ha_innobase::open(name, true) would only be invoked from open_purge_table(). I wonder what would happen if the table is partitioned. It could be simpler to introduce an uint test_if_locked flag that would indicate that we want to skip the initialize_auto_increment() and info() calls. In 10.6, the last used flag is

#define HA_OPEN_GLOBAL_TMP_TABLE	(1U << 14) /* TMP table used by repliction */

In main the last one seems to be the following:

#define HA_OPEN_DATA_READONLY           (1U << 17) /* Use readonly for data */

My revised idea is that open_purge_table() would be the only place to set a new flag HA_OPEN_FOR_INNODB_PURGE, which would only be checked by ha_innobase::open(const char*,int,uint).
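A rough sketch of the flag idea, under stated assumptions: the bit value below is a placeholder chosen only for illustration and is not taken from any MariaDB header, and plan_open() is a hypothetical helper that models just the decision to skip the initialize_auto_increment() and info() calls, ignoring the other conditions in ha_innobase::open().

```cpp
#include <cassert>

// Placeholder bit position, an assumption for this sketch only.
constexpr unsigned HA_OPEN_FOR_INNODB_PURGE = 1U << 18;

// Which optional steps an open call would perform.
struct OpenPlan {
  bool init_auto_increment;
  bool call_info;
};

// Decide from the test_if_locked flags whether the open is on behalf of
// purge, in which case both optional steps are skipped.
inline OpenPlan plan_open(unsigned test_if_locked)
{
  const bool for_vc_purge =
      (test_if_locked & HA_OPEN_FOR_INNODB_PURGE) != 0;
  return OpenPlan{!for_vc_purge, !for_vc_purge};
}
```

Because the flag travels in the existing test_if_locked argument, no second open() overload would be needed, and a wrapping handler would only have to pass the flags through unchanged.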

dr-m · 2026-04-14T09:52:24Z

+  /* Open MariaDB TABLE for tables with indexed virtual columns */
+  if (*mdl && table->has_virtual_index())


This condition at the end of trx_purge_table_open() is what we actually have to add. GitHub is displaying it in two parts, because you had merged trx_purge_table_acquire() into that function. I will quote it in full to make my comment more readable:

  /* Open MariaDB TABLE for tables with indexed virtual columns */
  if (*mdl && table->has_virtual_index())
  {
    *maria_table= open_purge_table(current_thd, db_buf, db_len,
                                   tbl_buf, tbl_len, *mdl);
    if (*maria_table && table->vc_templ)
      table->vc_templ->mysql_table= *maria_table;
  }
  return table;
}

There is a mismatch of concepts, but it existed before this change. table is the InnoDB table metadata dict_table_t, while maria_table includes a pointer maria_table->s to the MariaDB table metadata TABLE_SHARE. As far as I can tell, the dict_table_t::vc_templ will continue to exist until the table is removed from the cache or some table-rebuilding DDL is executed. Also, unless I missed something, the table->vc_templ->mysql_table that we are assigning would remain cached for the remaining lifetime of table->vc_templ (until the table is evicted or rebuilt).

Is it possible to access table->vc_templ->mysql_table from multiple threads? What is the lifetime of that field? What happens to it after the table handle is closed? How would the build fail if we removed that field?

diff --git a/storage/innobase/include/dict0mem.h b/storage/innobase/include/dict0mem.h
index 0ab4e4d1fda..29dbb1185af 100644
--- a/storage/innobase/include/dict0mem.h
+++ b/storage/innobase/include/dict0mem.h
@@ -1865,9 +1865,6 @@ struct dict_vcol_templ_t {
 	/** default column value if any */
 	byte* default_rec;
 
-	/** cached MySQL TABLE object */
-	TABLE* mysql_table;
-
 	/** when mysql_table was cached */
 	uint64_t mysql_table_query_id;

The build would fail for the assignment here as well as for the read and write in innodb_find_table_for_vc(). The mysql_table_query_id is what attempts to protect us from unsafe access.

There appears to be a race condition in innodb_find_table_for_vc(). I think that we must protect the contents of table->vc_templ with table->lock_mutex_lock() in both places to avoid a race between reading and assigning the fields.
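To make the suggested locking concrete, here is a simplified sketch in which both fields are only read or written while holding one mutex. VcolTemplate and its member functions are hypothetical names; the real fields live in dict_vcol_templ_t and the suggestion is to use the table's own lock mutex.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

struct Table {};  // hypothetical stand-in for the server-layer TABLE

class VcolTemplate {
  std::mutex mutex_;
  Table *mysql_table_ = nullptr;
  std::uint64_t query_id_ = 0;
public:
  // Writer: update the pointer and the query id as one unit.
  void cache(Table *table, std::uint64_t query_id) {
    std::lock_guard<std::mutex> g(mutex_);
    mysql_table_ = table;
    query_id_ = query_id;
  }

  // Reader: the pointer is only trusted if it was cached by this query.
  Table *find(std::uint64_t query_id) {
    std::lock_guard<std::mutex> g(mutex_);
    return query_id_ == query_id ? mysql_table_ : nullptr;
  }

  // Called before the handle is closed, so that no stale pointer remains.
  void clear() {
    std::lock_guard<std::mutex> g(mutex_);
    mysql_table_ = nullptr;
    query_id_ = 0;
  }
};
```

Without the mutex, a reader could observe a new query id paired with an old pointer, or the other way around.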

Thirunarayanan · 2026-04-15T09:55:01Z

Raised MDEV-39340 to remove the usage of the background_thread variable from ha_innodb.cc.

dr-m

Before my next review, it would be useful to run some stress tests on this, with a DDL-heavy workload that frequently rebuilds tables, that is, TRUNCATE TABLE or OPTIMIZE TABLE.

dr-m · 2026-04-16T09:22:37Z

-    for (auto &t : node->tables)
+    for (auto it = node->tables.begin(); it != node->tables.end(); )
    {
-      if (t.second.first)
+      it->second= trx_purge_table_open(it->first, mdl_context);
+      if (!it->second.table)
      {
-        t.second.first= trx_purge_table_open(t.first, mdl_context,
-                                             &t.second.second);
-        if (t.second.first == reinterpret_cast<dict_table_t*>(-1))
-        {
-          if (table)
-            dict_table_close(table, false, thd, *mdl);
-          goto retry;
-        }
+        it = node->tables.erase(it);
+        continue;


I spent quite a bit of time trying to understand why we’d want to erase the element. I suggest a comment before that (and fixing the white space):

if (!it->second.table)
{
  /* The table had been dropped or rebuilt. Skip the purge. */
  it= node->tables.erase(it);
  continue;
}

However, see my other comment about trx_purge_attach_undo_recs(). I think that we do want to retain entries with table==nullptr in order to avoid repeated invocations of trx_purge_table_open() for the same table_id that refers to a dropped or relocated table.
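The benefit of retaining table==nullptr entries can be shown with a small model. BatchTables is a hypothetical class written for this sketch; the opener callback stands in for trx_purge_table_open().

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

struct Table {
  std::uint64_t id;
};

// Hypothetical per-batch map from table id to an opened table.
class BatchTables {
  std::unordered_map<std::uint64_t, Table*> tables_;
public:
  unsigned open_calls = 0;

  // Open each table id at most once per batch. A nullptr result (table
  // dropped or rebuilt) is remembered too, so it is not retried.
  template <typename Opener>
  Table *get(std::uint64_t id, Opener open) {
    auto it = tables_.find(id);
    if (it == tables_.end()) {
      ++open_calls;
      it = tables_.emplace(id, open(id)).first;
    }
    return it->second;
  }
};
```

If the nullptr entry were erased instead, every further undo log record for the same dropped table would trigger another open attempt.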

Because there is no good reason for using an iterator, I’d request reverting the loop to the "range for" form.

Furthermore, it seems that we are losing the dict_table_close() call. I think that to prevent that, we must save the old value before it is being overwritten by the must_wait: return from trx_purge_table_acquire(), something like this:

for (auto &t : node->tables)
{
  if (dict_table_t *table= t.second.table)
  {
    t.second= trx_purge_table_open(t.first, mdl_context);
    if (t.second.must_wait())
    {
      dict_table_close(table, false, thd, *mdl);
      goto retry;
    }
  }
}

This needs to be stress-tested with a workload that is frequently rebuilding or dropping tables.

The current implementation does handle it correctly by:

1) Calling trx_purge_close_tables(thd), which closes all dict_table_t* from all nodes
2) Then attempting to reopen the tables
3) If must_wait is returned, the old tables are already closed, so no dict_table_close(table) call is missed

retry:
  trx_purge_close_tables(thd); /* Closes ALL dict_table_t*, sets pt.table=nullptr */
  /* At this point, all node->tables entries have pt.table = nullptr */
  purge_table pt= trx_purge_table_open(id, mdl_context);
  if (pt.must_wait())
    goto retry; /* Safe - no dict_table_t* to leak */
  for (auto it = node->tables.begin(); it != node->tables.end(); )
  {
    /* it->second.table is nullptr here (already closed above) */
    it->second= trx_purge_table_open(it->first, mdl_context);
    if (it->second.must_wait())
      /* Safe - old table was already closed and new table will be closed
      again in trx_purge_close_tables() */
      goto retry;
  }

trx_purge_close_tables() sets pt.table = nullptr, so when we overwrite it->second, we are only overwriting a purge_table with table=nullptr, not losing a valid pointer.
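The close-everything-then-retry argument can be checked on a toy model. This is only a sketch with hypothetical names (Batch, OpenResult, open_all()); opening and closing are reduced to counters so that a leaked handle would show up as a mismatch.

```cpp
#include <cassert>
#include <map>

enum class OpenResult { OK, MISSING, MUST_WAIT };

// Hypothetical model of one batch: table id -> "is currently open".
struct Batch {
  std::map<int, bool> is_open;
  int opened = 0;
  int closed = 0;

  void close_all() {
    for (auto &e : is_open)
      if (e.second) { e.second = false; ++closed; }
  }

  // Open every table of the batch. If any open has to wait, close
  // everything that is open and start over, so nothing is overwritten
  // while it is still open.
  template <typename Opener>
  void open_all(Opener try_open) {
  retry:
    close_all();
    for (auto &e : is_open) {
      const OpenResult r = try_open(e.first);
      if (r == OpenResult::MUST_WAIT)
        goto retry;
      if (r == OpenResult::OK) { e.second = true; ++opened; }
    }
  }
};
```

In this model, the number of handles still open always equals opened minus closed, which is the invariant that the reply above relies on.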

…lumns

Problem:
========
A single InnoDB purge worker thread can process undo logs from different tables within the same batch. But get_purge_table() and open_purge_table() incorrectly assume a 1:1 relationship between a purge worker thread and a table within a single batch. Based on this wrong assumption, InnoDB attempts to reuse TABLE objects cached in thd->open_tables for virtual column computation.

1) Purge worker opens Table A and caches the TABLE pointer in thd->open_tables.
2) The same purge worker moves to Table B in the same batch; get_purge_table() retrieves the cached pointer for Table A instead of opening Table B.
3) Because ha_innobase::open() is skipped for Table B, the virtual column template is never initialized.
4) Virtual column computation for Table B aborts the server.

Solution:
========
- Introduced the purge_table class: stores either TABLE* (for tables with indexed virtual columns) or MDL_ticket* (for tables without) in a single union, using the LSB as a flag. For tables with indexed virtual columns, it opens TABLE* and accesses MDL_ticket* via TABLE->mdl_ticket. For tables without indexed virtual columns, it stores only MDL_ticket*.
- trx_purge_attach_undo_recs(): The coordinator opens both dict_table_t* and TABLE* with proper MDL protection. Workers access cached table pointers from purge_node_t->tables without opening their own handles.
- purge_sys.coordinator_thd: Distinguishes the coordinator from workers in cleanup logic. Skip innobase_reset_background_thd() for the coordinator thread to prevent premature table closure during batch processing. Workers still call cleanup to release their thread-local resources.
- trx_purge_close_tables(): Rewritten for the purge coordinator thread:
  1) Close all dict_table_t* objects first
  2) Call close_thread_tables() once for all TABLE* objects
  3) Release MDL tickets last, after tables are closed
- Added table->lock_mutex protection when reading or writing vc_templ->mysql_table and mysql_table_query_id.
- Clear cached TABLE* pointers before closing tables to prevent stale pointer access.
- Declared open_purge_table() and close_thread_tables() in trx0purge.cc.
- Declared reset_thd() in row0purge.cc and dict0stats_bg.cc.
- Removed innobase_reset_background_thd().
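For readers unfamiliar with the tagged-pointer technique that the purge_table description refers to, here is a minimal sketch. It is not the class from the patch: PurgeHandle, the stand-in types and the choice that a set bit means "TABLE*" are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins. The alignment guarantees that the least
// significant bit of any valid pointer to them is zero.
struct alignas(8) MdlTicket {};
struct alignas(8) Table {
  MdlTicket *mdl_ticket;
};

constexpr std::uintptr_t TABLE_FLAG = 1;

// One word that holds either a Table* (LSB set) or an MdlTicket* (LSB clear).
class PurgeHandle {
  std::uintptr_t bits_ = 0;
public:
  PurgeHandle() = default;
  explicit PurgeHandle(Table *t)
    : bits_(reinterpret_cast<std::uintptr_t>(t) | TABLE_FLAG) {}
  explicit PurgeHandle(MdlTicket *m)
    : bits_(reinterpret_cast<std::uintptr_t>(m)) {}

  bool has_table() const { return (bits_ & TABLE_FLAG) != 0; }

  Table *table() const {
    return has_table()
      ? reinterpret_cast<Table*>(bits_ & ~TABLE_FLAG) : nullptr;
  }

  // With a table, the ticket is reached through the table; otherwise
  // the stored word is the ticket pointer itself.
  MdlTicket *ticket() const {
    if (Table *t = table())
      return t->mdl_ticket;
    return reinterpret_cast<MdlTicket*>(bits_);
  }
};
```

Either way the caller can reach the MDL ticket, and only tables with indexed virtual columns pay for an open TABLE handle.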

Thirunarayanan requested a review from dr-m April 8, 2026 07:00

Thirunarayanan added the MariaDB Corporation label Apr 8, 2026

dr-m reviewed Apr 8, 2026

Thirunarayanan force-pushed the MDEV-39261 branch 6 times, most recently from 60d1668 to 227427a Compare April 9, 2026 11:15

Thirunarayanan requested a review from dr-m April 9, 2026 11:58

dr-m reviewed Apr 13, 2026

Thirunarayanan force-pushed the MDEV-39261 branch from 227427a to c69ba83 Compare April 13, 2026 19:55

Thirunarayanan requested a review from dr-m April 13, 2026 19:57

Thirunarayanan force-pushed the MDEV-39261 branch 2 times, most recently from 11a7390 to 3ef5939 Compare April 14, 2026 03:33

dr-m reviewed Apr 14, 2026

Thirunarayanan force-pushed the MDEV-39261 branch from 3ef5939 to d6ae8ce Compare April 15, 2026 07:43

Thirunarayanan requested a review from dr-m April 15, 2026 09:46

dr-m reviewed Apr 15, 2026

Thirunarayanan force-pushed the MDEV-39261 branch from d6ae8ce to 9613ad4 Compare April 15, 2026 17:12

Thirunarayanan requested a review from dr-m April 15, 2026 17:17

dr-m reviewed Apr 16, 2026

Thirunarayanan force-pushed the MDEV-39261 branch from 9613ad4 to 5277b0b Compare April 16, 2026 14:40
