MDEV-39263 innodb_snapshot_isolation fails to prevent lost updates under contention by dr-m · Pull Request #4936 · MariaDB/server

dr-m · 2026-04-14T06:49:57Z

The Jira issue number for this PR is: MDEV-39263

Description

An UPDATE could fail to return the error ER_CHECKREAD when innodb_snapshot_isolation=ON and the transaction that had last modified the record was committed during the lock acquisition.

Thanks to @vadimtk of Percona for reporting this and @temeo for writing a determining DEBUG_SYNC test case, which was included in innodb.lock_isolation.

lock_clust_rec_read_check_and_lock(): Refine the check whether the transaction that last modified the record is still active. The only thing that should matter is whether we are allowed to see the record. If the implicit lock holder transaction was active and we succeeded in acquiring the lock, this means that the transaction had been committed. We must return DB_RECORD_CHANGED (ER_CHECKREAD) in that case. We want to avoid returning it before the lock wait, in case the other transaction will be rolled back.

How can this PR be tested?

mysql-test/mtr innodb.lock_isolation
mysql-test/mtr --parallel=auto --repeat=1000 innodb.snapshot_isolation_race{,,,}{,,,,,}

The failure of the latter test case, as simplified by me, was analyzed with https://rr-project.org and an additional patch of a build with cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DPLUGIN_PERFSCHEMA=NO that allows the race conditions to occur during the rr record tracing. Most likely, the patch to trx_t::commit_persist() was the essential one:

diff --git a/storage/innobase/btr/btr0pcur.cc b/storage/innobase/btr/btr0pcur.cc index c7fcfd205fb..b549b6e12df 100644
--- a/storage/innobase/btr/btr0pcur.cc
+++ b/storage/innobase/btr/btr0pcur.cc
@@ -317,7 +317,7 @@ btr_pcur_t::restore_position(btr_latch_mode restore_latch_mode, mtr_t *mtr)
 	page_cur_mode_t	mode;
 	page_cur_mode_t	old_mode;
 	mem_heap_t*	heap;
-
+	std::this_thread::yield(); ut_ad(mtr->is_active()); ut_ad(pos_state == BTR_PCUR_WAS_POSITIONED || pos_state == BTR_PCUR_IS_POSITIONED); diff --git a/storage/innobase/trx/trx0rec.cc b/storage/innobase/trx/trx0rec.cc index 5d854ebd5e5..d0c442a0f1f 100644
--- a/storage/innobase/trx/trx0rec.cc
+++ b/storage/innobase/trx/trx0rec.cc
@@ -1841,7 +1841,7 @@ trx_undo_report_row_operation(
 #ifdef UNIV_DEBUG
 	int		loop_count	= 0;
 #endif /* UNIV_DEBUG */
-
+	std::this_thread::yield(); ut_a(dict_index_is_clust(index)); ut_ad(!update || rec); ut_ad(!rec || rec_offs_validate(rec, index, offsets)); diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc index e2adc085c53..adc807e5943 100644
--- a/storage/innobase/trx/trx0trx.cc
+++ b/storage/innobase/trx/trx0trx.cc
@@ -1578,7 +1578,7 @@ void trx_t::commit_persist() noexcept
 {
   mtr_t *mtr= nullptr;
   mtr_t local_mtr;
-
+  std::this_thread::yield(); if (has_logged_persistent()) { mtr= &local_mtr;

Basing the PR against the correct MariaDB version

This is a new feature or a refactoring, and the PR is based against the main branch.
This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

CLAassistant · 2026-04-14T06:50:03Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

dr-m · 2026-04-14T07:05:44Z

+	if (UNIV_UNLIKELY(trx_id != 0)
+	    && !trx_sys.is_registered(trx, trx_id)) {
+		err = DB_RECORD_CHANGED;
+	}


It turns out that we will need an additional condition err <= DB_SUCCESS_LOCKED_REC here. Otherwise, the mtr test in MDEV-39264 would crash a CMAKE_BUILD_TYPE=RelWithDebInfo build on shutdown:

2026-04-14 10:01:23 0 [Note] InnoDB: Buffer pool(s) dump completed at 260414 10:01:23 2026-04-14 10:01:23 0x7fc6340517c0 InnoDB: Assertion failure in file /mariadb/10.6/storage/innobase/trx/trx0trx.cc line 219 InnoDB: Failing assertion: trx->lock.wait_lock == NULL

I will revise this shortly.

mariadb-TeemuOllakka · 2026-04-15T12:51:54Z


+	if (UNIV_UNLIKELY(trx_id != 0)
+	    && err <= DB_SUCCESS_LOCKED_REC
+	    && !trx_sys.is_registered(trx, trx_id)) {


Based on my testing with MDEV-38977 nondeterministic tests the trx_sys.is_registered() may here sometimes return true after the transaction trx_id commits and cause this function to return false negative (success). I don't quite understand how this could happen as the locks are released after the transaction trx_id deregisters itself in commit_in_memory(). Would it be possible for trx_sys.is_registered() to return a false positive with unlatched access?

Also, if the transaction trx_id rolls back, wouldn't this return DB_RECORD_CHANGED also in that case?

First of all, this is the second check, which we are executing after we have either acquired a lock or enqueued a waiting lock request on the record on the index page on which we are holding a latch.

Let me answer your last question first. When a transaction is rolled back, its undo log will be applied backwards and an empty transaction committed, with undo_no=0. Because at this point we are holding a latch on this index page, a concurrent rollback of a transaction can’t modify this index page. Furthermore, if a rollback of a conflicting transaction were in progress, we would have err == DB_LOCK_WAIT here. We would only re-lookup the DB_TRX_ID if we successfully acquired a lock on the record. If we did, we know that the rollback would have completed. Any rollback would also have rolled back the DB_TRX_ID to an older value, already before we executed the first check. Hence, the trx_sys.is_registered(trx, trx_id) in our first check would have returned nullptr and we’d have either trx_id==0 here, or we would already have returned DB_RECORD_CHANGED in the first check.

Last, the first question. I assume that by "return true" you mean "return a non-null pointer". In this comment I wrote:

Here, the trx_sys_t::is_registered() is returning a pointer to the transaction, even though it had been committed and the trx->id zeroed out after element->mutex.wr_unlock() (inside the trx_sys.is_registered()) had released the latch. Because we had not asked for a reference count to be acquired, the transaction may very well be stale (committed) at that point.

Again, the code that you quoted is the second check. This second check will only executed (trx_id!=0) if trx_sys.is_registered(trx, trx_id) indicated that the trx_id was active before we attempted to acquire a lock on the record. The condition err <= DB_SUCCESS_LOCKED_REC only holds if we succeeded in acquiring the lock. We can only succeed if the trx_id transaction had been committed by this point. The reason is that if DB_TRX_ID points to an active transaction, that transaction must hold an exclusive lock on the record. An exclusive lock would conflict with any lock (shared or exclusive). If we reach this second trx_sys.is_registered(trx, trx_id) call, we know that the other transaction had been committed (not rolled back; our page latch would block that!).

The answer to your first question appears to be yes: we may miss to assign err = DB_RECORD_CHANGED if we find a transaction that has recently been committed. To prevent a false positive, we must check the state:

if (UNIV_LIKELY(trx_id == 0) || err > DB_SUCCESS_LOCKED_REC) { } else if (trx_t *impl_trx = trx_sys.find(trx, trx_id, false) { /* impl_trx could have been committed before we acquire its mutex, but not thereafter. */ impl_trx->mutex_lock(); const trx_state_t state{impl_trx->state}; impl_trx->mutex_unlock(); if (state != TRX_STATE_COMMITTED_IN_MEMORY) { err = DB_RECORD_CHANGED; } } else { err = DB_RECORD_CHANGED; }

I modelled this after lock_rec_queue_validate(). It doesn’t seem to be necessary to acquire a reference on the transaction object after all, because we never shrink trx_pools.

Would this prevent the anomaly that you are observing?

I was thinking through this once more. I couldn’t explain to myself why the second lookup for trx_id would really help things. I was trying to polish the above suggested code change like this:

if (err > DB_SUCCESS_LOCKED_REC) { /* We failed to acquire a lock, either indefinitely or temporarily. In the latter case, after the DB_LOCK_WAIT is resolved, this function will be re-executed. */ } else if (UNIV_LIKELY(trx_id == 0)) { /* No need to check for snapshot isolation violation */ } else if (ut_d(trx_t *impl_trx =) trx_sys.find(trx, trx_id, ut_d(true ||) false)) { /* Throughout this function, we are holding block->page.lock on the index page of rec. A concurrent rollback of the transaction that last modified rec is impossible, because it would involve changing rec. Because we successfully acquired a lock on rec, the transaction of trx_id must have been committed. Because our lookup of impl_trx succeeded, trx_sys.deregister_rw() must not have been completed in trx_t::commit_in_memory() yet. For our lock_rec_lock() to be successful, impl_trx must only have held an implicit lock on rec, which must have been released by a change of the state. The function trx_t::release_locks() should not have started executing either. Let us check all this. */ ut_d(impl_trx->mutex_lock()); ut_ad(impl_trx->id == trx_id); ut_ad(impl_trx->state == TRX_STATE_COMMITTED_IN_MEMORY); ut_d(impl_trx->mutex_unlock()); ut_d(impl_trx->release_reference()); /* FIXME: Should we not recheck trx->read_view.changes_visible(trx_id)? */ } else { err = DB_RECORD_CHANGED; }

This did not make sense to me. Why would we need the first trx_sys lookup either? Why should the exact commit time of the trx_id holder matter? The only thing that should matter is whether we are allowed to see the record in the first place:

diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc index c5b83b5be6c..d915fe4896d 100644 --- a/storage/innobase/lock/lock0lock.cc +++ b/storage/innobase/lock/lock0lock.cc @@ -6326,8 +6326,7 @@ lock_clust_rec_read_check_and_lock( && trx->read_view.is_open()) { trx_id_t trx_id= trx_read_trx_id(rec + row_trx_id_offset(rec, index)); - if (!trx_sys.is_registered(trx, trx_id) - && !trx->read_view.changes_visible(trx_id) + if (!trx->read_view.changes_visible(trx_id) && IF_WSREP(!(trx->is_wsrep() && wsrep_thd_skip_locking(trx->mysql_thd)), true)) { return DB_RECORD_CHANGED;

In fact, the included test is passing for me if I apply only this code change.

This simple fix would make the test innodb.lock_isolation fail in the following test scenario:

--echo # Case 2: Transaction A modifies a record, transaction B with snapshot --echo # isolation level is blocked by A, then A is rolled back. --echo # Expected behaviour: B continues execution.

…der contention lock_clust_rec_read_check_and_lock(): Refine the check whether the transaction that last modified the record is still active. The only thing that should matter is whether we are allowed to see the record. If the implicit lock holder transaction was active and we succeeded in acquiring the lock, this means that the transaction had been committed. We must return DB_RECORD_CHANGED (ER_CHECKREAD) in that case. We want to avoid returning it before the lock wait, in case the other transaction will be rolled back. Thanks to Vadim Tkachenko of Percona for reporting this bug, as well as Kyle Kingsbury for the broader testing that led to this finding. Vadim's test case was simplified by me and the root cause analyzed with https://rr-project.org and an additional patch that added std::this_thread::yield() at the start of trx_t::commit_persist(). An even better spot should have been right after the call to trx_t::commit_state(). The addition to the test innodb.lock_isolation is based on the work by Teemu Ollakka, which was essential for refining this fix.

dr-m requested review from Thirunarayanan and temeo April 14, 2026 06:49

dr-m self-assigned this Apr 14, 2026

dr-m added the MariaDB Corporation label Apr 14, 2026

dr-m commented Apr 14, 2026

View reviewed changes

dr-m force-pushed the MDEV-39263 branch from 56a05f7 to 174930c Compare April 14, 2026 07:08

mariadb-TeemuOllakka reviewed Apr 15, 2026

View reviewed changes

dr-m force-pushed the MDEV-39263 branch from 174930c to 99f6173 Compare April 16, 2026 06:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

MDEV-39263 innodb_snapshot_isolation fails to prevent lost updates under contention#4936

MDEV-39263 innodb_snapshot_isolation fails to prevent lost updates under contention#4936
dr-m wants to merge 1 commit into10.6from
MDEV-39263

dr-m commented Apr 14, 2026 •

edited

Loading

Uh oh!

CLAassistant commented Apr 14, 2026

Uh oh!

dr-m Apr 14, 2026

Uh oh!

mariadb-TeemuOllakka Apr 15, 2026 •

edited

Loading

Uh oh!

dr-m Apr 15, 2026

Uh oh!

dr-m Apr 16, 2026

Uh oh!

dr-m Apr 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

dr-m commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

How can this PR be tested?

Basing the PR against the correct MariaDB version

Uh oh!

CLAassistant commented Apr 14, 2026

Uh oh!

dr-m Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

mariadb-TeemuOllakka Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dr-m Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

dr-m Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

dr-m Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

dr-m commented Apr 14, 2026 •

edited

Loading

mariadb-TeemuOllakka Apr 15, 2026 •

edited

Loading