Part 1 – Setting the Foundation

Introduction
Understanding the InnoDB Architecture
Meet Our Transaction

Part 2 – The Write Path

The Buffer Pool – Where Every Transaction Begins
Redo Log – Protecting Transactions Before the Data Reaches Disk
Undo Log – Preserving the Past Before Changing the Present
COMMIT – When Does a Transaction Actually Become Successful?

Part 3 – After COMMIT

Life After COMMIT – The Transaction Isn’t Finished Yet
Background Threads – The Invisible Workers Inside InnoDB
Checkpoint – Keeping Redo Logs Reusable and Crash Recovery Efficient
Crash Recovery – How InnoDB Survives Unexpected Failures

Part 4 – The Read Path

The Read Path – Why Can Two Identical SELECT Statements Perform Differently?
Multi-Version Concurrency Control (MVCC) – Reading Without Blocking Writes
History List Length – When Can InnoDB Finally Forget the Past?

Part 5 – Applying the Knowledge

Connecting Production Symptoms to InnoDB Internals
Following One Transaction from Beginning to End

Author’s Note

This article is based on our recent webinar, “InnoDB Internals: Following One Transaction from INSERT to Crash Recovery.”

It covers the complete transaction journey inside InnoDB, along with additional explanations, practical DBA insights, and production-oriented examples that could not be explored in detail during the live session.

Although this article is longer than a typical blog post, it is intended to serve as a lasting reference that you can revisit whenever you want to understand how InnoDB works internally.

We hope you find it valuable and thank you for taking the time to read it.

Every MySQL DBA knows that InnoDB is the default storage engine and that it provides full ACID compliance. We also encounter terms such as Buffer Pool, Redo Log, Undo Log, MVCC, Checkpoint, Dirty Pages, and Crash Recovery almost every day.

But here’s an interesting question:

How do all these components actually work together when a transaction is executed?

A much better approach is to follow one transaction.

From the application’s perspective, it is just another SQL statement. The application sends the request, receives COMMIT successful, and moves on.

Behind that seemingly simple operation, however, InnoDB performs a remarkable sequence of internal activities.

In this article, we will follow a single INSERT statement from the moment it enters the InnoDB storage engine until it safely reaches persistent storage. Along the way, we’ll understand how the core internal components work together to provide the ACID guarantees that modern applications depend on every day.

By the end of this article, terms like Buffer Pool, Redo Log, Undo Log, MVCC, Checkpoint, and Crash Recovery will no longer feel like independent concepts. Instead, you’ll see them as different checkpoints in the journey of a single transaction—and that perspective makes understanding InnoDB significantly easier.

Understanding the InnoDB Architecture

Before we follow our transaction, let’s first understand the environment in which it operates.

Every SQL statement issued by an application passes through multiple layers inside MySQL before it eventually reaches persistent storage. Although these layers work together seamlessly, each one has a distinct responsibility.

At the highest level, the InnoDB storage engine can be viewed as three major layers:

SQL Processing Layer
In-Memory Components
Persistent Storage

Understanding these layers makes it much easier to visualize the journey of a transaction.

The SQL Processing Layer

Every request begins here.

When an application executes an SQL statement, MySQL first parses the statement to validate its syntax. The optimizer then evaluates multiple execution strategies and chooses the most efficient execution plan. Finally, the executor hands the request over to the InnoDB storage engine.

From this point onward, transaction processing becomes InnoDB’s responsibility.

The In-Memory Layer

This is where almost every transaction spends most of its life.

The most important memory structure is the Buffer Pool, which caches data and index pages to minimize physical disk I/O. Rather than reading and writing directly to storage, InnoDB attempts to perform almost all operations within this memory area.

Alongside the Buffer Pool are several other internal structures that support transaction processing.

The Redo Log Buffer temporarily stores redo records before they are persisted to the Redo Log Files.

The Undo Log preserves previous row versions, enabling transaction rollback and Multi-Version Concurrency Control (MVCC).

InnoDB also includes intelligent optimization mechanisms such as the Change Buffer, which can defer eligible secondary index updates, and the Adaptive Hash Index, which accelerates repeated index lookups for suitable workloads.

Behind these memory structures, multiple background threads continuously perform maintenance activities including page flushing, redo writing, purge processing, and checkpoint management. These activities happen transparently while user transactions continue executing.

The Persistent Storage Layer

Eventually, every committed transaction must become durable.

Persistent storage consists of several components that work together to achieve this.

User data is stored inside InnoDB Tablespaces.

Redo information is persisted inside the Redo Log Files, protecting committed transactions against server crashes.

Historical row versions are maintained within Undo Tablespaces, allowing rollback operations and consistent reads until they are no longer required.

Although these storage structures ultimately hold durable data, it is important to remember that transactions rarely interact with them directly. Most processing occurs in memory first, with background threads gradually transferring changes to persistent storage.

A Layered Design with a Purpose

This layered architecture is one of the primary reasons InnoDB performs so well under modern OLTP workloads.

Memory provides speed.

Redo Logging provides durability.

Undo enables rollback and consistent reads.

Background threads move expensive maintenance work away from foreground transactions.

Together, these components allow InnoDB to deliver high performance, high concurrency, and reliable crash recovery without sacrificing transactional consistency.

Now that we understand the architecture, let’s create a simple transaction and follow its journey through each of these components.

Meet Our Transaction

Now that we’ve explored the overall architecture, it’s time to follow a real transaction through the InnoDB storage engine.

For the rest of this article, we’ll use a simple INSERT statement as our example.

INSERT INTO orders VALUES (42, 7, 199.00);

From the application’s perspective, this transaction is extremely simple.

The SQL statement is sent to MySQL.

A few milliseconds later, the application receives:

COMMIT successful

The transaction is complete.

Or is it?

Behind those two words, “COMMIT successful,” InnoDB has already performed a remarkable amount of work.

Before the transaction can be acknowledged, it passes through multiple internal components, each with a specific responsibility.

Some components improve performance by reducing disk I/O.

Some ensure that committed transactions survive server crashes.

Some preserve previous row versions to support rollback and concurrent reads.

Others continue working even after the application has finished, quietly flushing pages, cleaning obsolete versions, and preparing the system for future recovery.

Every production issue we investigate as DBAs—whether it is high disk IOPS, slow COMMIT operations, purge lag, extended crash recovery, or inconsistent query performance—is simply pressure at one stage of this journey.

Instead of studying each InnoDB component independently, we’ll follow this single transaction from beginning to end and observe how every internal mechanism contributes to performance, concurrency, durability, and recoverability.

The journey begins inside InnoDB’s most important memory structure—the Buffer Pool.

The Buffer Pool – Where Every Transaction Begins

Our INSERT statement has now entered the InnoDB storage engine.

The very first destination is the Buffer Pool, which is the primary working area of InnoDB.

In simple terms, the Buffer Pool is InnoDB’s cache for data and index pages. Rather than reading from or writing directly to disk for every SQL statement, InnoDB attempts to perform most operations in memory, where access is significantly faster.

This design is one of the primary reasons why InnoDB delivers excellent OLTP performance.

Why Does InnoDB Work with Pages Instead of Rows?

One common misconception is that InnoDB reads and writes individual rows.

In reality, InnoDB manages data in fixed-size pages.

By default, each page is 16 KB in size, and both data and indexes are stored within these pages.

When our INSERT statement modifies a row, InnoDB doesn’t load only that row into memory. Instead, it loads the entire page containing that row into the Buffer Pool.

This page-oriented architecture allows InnoDB to optimize I/O operations and efficiently cache related data that is likely to be accessed again.

InnoDB supports different page sizes when a MySQL instance is initialized. However, 16 KB remains the default and the most commonly deployed page size in production environments.

Buffer Pool Hit vs Buffer Pool Miss

Before modifying the row, InnoDB first checks whether the required page is already available in the Buffer Pool.

Buffer Pool Hit

If the page is already present in memory, InnoDB can immediately modify it without performing any disk I/O.

This is known as a Buffer Pool Hit.

Memory access is extremely fast, so transactions can continue with minimal latency.

Buffer Pool Miss

If the required page is not currently cached, InnoDB must first read the entire 16 KB page from persistent storage into the Buffer Pool before applying the modification.

This is called a Buffer Pool Miss.

The SQL statement is identical.

The execution plan may also be identical.

Yet the response time can differ dramatically simply because one transaction accessed memory while the other had to wait for physical I/O.

This is one of the reasons two identical SQL statements can exhibit completely different execution times.

Dirty Pages

Once the required page is available in memory, our INSERT modifies the page.

At this point, the copy stored inside the Buffer Pool no longer matches the copy stored in the InnoDB tablespace.

InnoDB marks the page as a Dirty Page.

A Dirty Page does not indicate corruption.

It simply means that the page has been modified in memory but has not yet been flushed back to persistent storage.

Thousands—or even millions—of Dirty Pages may exist simultaneously in a healthy production system. This is completely normal and is an intentional part of InnoDB’s design.

Rather than writing every modification immediately to disk, InnoDB accumulates changes in memory and writes them back gradually through background flushing.

This significantly improves transaction throughput by reducing random disk writes.

How Does InnoDB Manage Buffer Pool Memory?

Memory is finite.

As more pages are loaded into the Buffer Pool, free space gradually decreases.

When the Buffer Pool becomes full, InnoDB must decide which pages can be removed to make room for new ones.

Instead of removing pages randomly, InnoDB maintains an LRU (Least Recently Used) list.

Pages that are accessed frequently remain near the “young” end of the list, while pages that have not been accessed for a long time gradually move toward the “old” end.

When space is needed, InnoDB preferentially evicts pages that are least likely to be accessed again.

This intelligent page replacement strategy significantly improves cache efficiency for mixed OLTP workloads.

Two Intelligent Buffer Pool Optimizations

The Buffer Pool also works together with two additional optimization mechanisms that improve write and read performance under suitable workloads.

Change Buffer

Suppose our INSERT statement updates a secondary index, but the corresponding secondary index page is not currently present in the Buffer Pool.

Reading that page immediately would require additional random disk I/O.

Instead, InnoDB can temporarily buffer eligible secondary index modifications and merge them later when the actual index page is loaded into memory.

This mechanism is done by the sub-component the Change Buffer.

By postponing eligible secondary index page updates until they become necessary, InnoDB avoids unnecessary immediate page reads and improves performance for many write-intensive workloads.

Important

The Change Buffer applies only to eligible secondary index modifications. Changes to the clustered (primary) index are applied directly because the clustered index stores the actual table rows.

Adaptive Hash Index

Repeated index lookups can also become expensive.

To optimize such workloads, InnoDB may automatically build an Adaptive Hash Index (AHI) on frequently accessed B+Tree pages.

Future lookups may be able to use this in-memory hash structure instead of repeatedly traversing the B+Tree hierarchy.

The Adaptive Hash Index is created automatically based on workload characteristics and can significantly accelerate certain repetitive lookup patterns.

Our Transaction Continues…

Our transaction has now successfully modified the required page inside the Buffer Pool.

However, one important problem still remains.

Memory is volatile.

If the MySQL server crashes at this exact moment, every modification that exists only in memory would be lost.

So how does InnoDB protect committed transactions against unexpected failures?

The answer lies in one of the most important innovations of modern database systems:

Write-Ahead Logging, implemented through the Redo Log.

Redo Log – Protecting Transactions Before the Data Reaches Disk

Our transaction has now modified a page inside the Buffer Pool.

At this point, the latest version of the page exists only in memory. The corresponding page in the InnoDB tablespace still contains the older version because it has not yet been flushed back to disk.

Now imagine a busy e-commerce application processing thousands of orders every second.

Would it be practical for InnoDB to immediately write every modified 16 KB page to persistent storage before allowing the transaction to continue?

The answer is no.

Writing full data pages to storage for every transaction would generate enormous random I/O and significantly reduce throughput.

Instead, InnoDB uses one of the most important techniques in modern database systems: Write-Ahead Logging (WAL).

What is Write-Ahead Logging?

The principle behind Write-Ahead Logging is surprisingly simple.

Before a modified data page is written back to the tablespace, InnoDB first records what changed.

Instead of writing the entire 16 KB page, InnoDB generates a compact Redo Record describing only the modification.

For example, conceptually, instead of rewriting an entire page, the Redo information may represent an operation such as:

“Insert a new record into this page.”

The important point is not the exact internal format of the redo record, but the fact that it contains enough information for InnoDB to reproduce the change during crash recovery.

Because redo records are much smaller than full data pages, sequential log writes are significantly more efficient than repeatedly writing random data pages.

This simple idea is the foundation of InnoDB’s durability.

The Redo Log Buffer

The generated redo records are not immediately persisted to storage.

They are first placed into another in-memory structure called the Redo Log Buffer.

The Log Buffer temporarily accumulates redo generated by active transactions before it is written to the persistent redo log.

This additional memory layer allows InnoDB to process a continuous stream of transactions efficiently without forcing every small modification to interact directly with storage.

It is important to understand that the Buffer Pool and the Redo Log Buffer serve completely different purposes.

The Buffer Pool caches data and index pages.

The Redo Log Buffer temporarily stores redo information describing modifications made to those pages.

From Memory to Durable Storage

The Redo Log Buffer itself is also memory.

If the server crashes before the redo information is safely persisted, those changes would be lost.

To prevent this, InnoDB writes the redo information into the persistent Redo Log Files.

With the default durability setting (innodb_flush_log_at_trx_commit = 1), a transaction is acknowledged only after the redo records required for that transaction have been safely persisted to durable storage.

This is the durability guarantee provided by InnoDB.

COMMIT Does Not Wait for Page Flushing

One of the most misunderstood concepts in InnoDB is the meaning of COMMIT.

Many people assume that once COMMIT succeeds, the modified data page has already been written into the tablespace.

That is not correct.

The modified page usually remains inside the Buffer Pool as a Dirty Page.

What COMMIT guarantees is that the transaction has achieved the required level of durability according to the configured durability setting.

The actual data page can be written back to the tablespace later by background flushing.

Separating transaction completion from page flushing allows InnoDB to achieve excellent write performance while still maintaining crash recovery guarantees.

Group Commit

Busy production systems rarely execute one transaction at a time.

Multiple transactions often reach COMMIT almost simultaneously.

If every transaction required its own independent storage synchronization, overall throughput would decrease significantly.

Instead, InnoDB uses Group Commit.

Transactions that reach COMMIT around the same time can share a single synchronization operation.

Each transaction still waits until its required durability guarantee is satisfied, but multiple transactions can achieve that guarantee together through one shared synchronization.

This significantly reduces storage overhead while preserving transactional correctness.

Our Transaction Continues…

At this point, our transaction is protected against server failures.

However, another important question remains.

What happens if this transaction must be rolled back before COMMIT?

Or what if another transaction needs to read the previous version of the same row while this modification is still in progress?

Before changing the row, InnoDB has already prepared for those situations by preserving the previous version of the data.

That brings us to the next major component of the transaction lifecycle:

The Undo Log.

Undo Log – Preserving the Past Before Changing the Present

Our transaction has now modified the required page in the Buffer Pool, and the corresponding redo information has been generated to protect it against unexpected failures.

The next step is less obvious, but equally important.

Before InnoDB changes the actual row, it first preserves the previous version of that row.

This preserved copy is stored as an Undo Record, and the collection of these records is commonly referred to as the Undo Log.

Unlike the Redo Log, which records how to replay a change, the Undo Log preserves enough information to reverse a change or reconstruct an earlier version of the row.

Although both are called “logs,” they serve entirely different purposes.

Why Does InnoDB Need an Undo Log?

Imagine the following transaction:

			
UPDATE orders
SET amount = 149.00
WHERE id = 42;

Suppose the original value of the row is:

Amount = 99.00

Before updating the value to 149.00, InnoDB first records the previous value.

Only then is the actual row modified.

Later, another transaction updates the same row again:

99.00  → 149.00  → 199.00

The newest value becomes the current version of the row.

However, the older versions do not disappear immediately.

Instead, they remain available through a chain of Undo Records until InnoDB determines that they are no longer required.

This chain of historical versions becomes one of the most powerful features inside InnoDB.

Purpose 1 – Transaction Rollback

The first responsibility of the Undo Log is transaction rollback.

Suppose an application updates thousands of rows inside a transaction but encounters an error before issuing COMMIT.

Without Undo information, InnoDB would have no reliable way to restore the database to its previous state.

Instead, InnoDB walks backward through the corresponding Undo Records and restores each modified row to its earlier value.

This allows incomplete transactions to be rolled back safely and efficiently.

Rollback is one of the mechanisms that enables the Atomicity property of ACID transactions.

Either every modification made by the transaction succeeds, or none of them remain visible.

Purpose 2 – Multi-Version Concurrency Control (MVCC)

Rollback is only part of the story.

The second responsibility of the Undo Log is enabling Multi-Version Concurrency Control (MVCC).

Consider two concurrent transactions.

Transaction A updates a row but has not yet committed.

At the same time, Transaction B executes a SELECT statement.

Should Transaction B wait?

Should it read the uncommitted value?

Neither.

Instead, InnoDB reconstructs the appropriate historical version of the row by following the Undo Records.

Readers continue reading a consistent snapshot while writers continue modifying the latest version.

This is one of the key reasons InnoDB can support high concurrency without forcing readers and writers to block each other.

We’ll explore MVCC in much greater detail later in this article. For now, it is enough to remember that Undo preserves the history that MVCC depends upon.

Undo Records Cannot Live Forever

Every modification creates additional historical versions.

If those versions were never removed, the database would continue growing indefinitely.

Fortunately, InnoDB includes another background mechanism that eventually removes obsolete Undo Records once no active transaction requires them.

This cleanup process is known as Purge, which we’ll examine later in the transaction lifecycle.

Until then, the historical versions remain available whenever rollback or consistent reads require them.

Redo and Undo – Two Complementary Mechanisms

At first glance, Redo Log and Undo Log appear similar because both record information about data modifications.

In reality, they solve completely different problems.

Redo Log	Undo Log
Protects committed changes against crashes	Preserves previous row versions
Used during crash recovery	Used for rollback and MVCC
Moves the database forward	Moves the database backward
Supports Durability	Supports Atomicity and consistent reads

Together, these two mechanisms form the foundation of InnoDB’s transactional behavior.

Our Transaction Continues…

At this point, our transaction has:

Modified the required page in the Buffer Pool.
Generated Redo information for crash recovery.
Preserved the previous row version through the Undo Log.

The application now issues one final statement:

COMMIT;

What actually happens inside InnoDB before the application receives “COMMIT successful”?

Let’s follow our transaction one step further.

COMMIT – When Does a Transaction Actually Become Successful?

What Does COMMIT Actually Mean?

A common misconception is that COMMIT means the modified data page has been written into the tablespace.

That is not what COMMIT guarantees.

Instead, COMMIT guarantees that the transaction has reached the required level of durability based on the configured durability settings.

With the default configuration (innodb_flush_log_at_trx_commit = 1), InnoDB acknowledges COMMIT only after the redo records required for that transaction have been safely persisted to durable storage.

At this point, the application receives a successful response, even though the modified data page may still exist only inside the Buffer Pool.

Why Doesn’t COMMIT Wait for the Data Page?

Suppose every COMMIT waited until the modified 16 KB page was written back to the tablespace.

A busy OLTP database processing thousands of transactions per second would spend most of its time waiting for random disk writes.

Instead, InnoDB separates transaction durability from page flushing.

The transaction becomes durable through the Redo Log.

The modified page itself can be written later by background flushing.

This separation is one of the key architectural decisions that allows InnoDB to achieve both high throughput and reliable crash recovery.

What Happens to the Modified Page?

After COMMIT, the modified page usually remains inside the Buffer Pool as a Dirty Page.

Subsequent transactions do not wait for the page to be written to the tablespace.

If another session needs to read that row, it typically reads the updated version directly from the Buffer Pool.

Eventually, InnoDB’s background threads flush the Dirty Page to the tablespace at an appropriate time.

Until then, the Redo Log guarantees that the committed transaction can always be recovered if an unexpected failure occurs.

Our Transaction Continues…

The application has now received COMMIT successful.

From the application’s perspective, the work is finished.

From InnoDB’s perspective, however, several important background activities are only just beginning.

The modified page still exists in memory.

It must eventually be written back to the tablespace.

Historical versions must eventually be cleaned up.

Checkpoint information must continue advancing.

Let’s now follow what happens after COMMIT, when the invisible background workers take over.

Life After COMMIT – The Transaction Isn’t Finished Yet

From the application’s perspective, our transaction has now completed successfully.

The application has received:

COMMIT successful

The database client closes the transaction and continues processing the next request.

But inside InnoDB, the work is far from over.

Although the transaction is durable, the modified data page still resides inside the Buffer Pool as a Dirty Page.

The corresponding page stored in the InnoDB tablespace still contains the older version.

At some point, these two copies must become synchronized.

Why Doesn’t InnoDB Flush the Page Immediately?

This is one of the most important architectural decisions in InnoDB.

Suppose every successful COMMIT immediately forced the modified 16 KB page to be written into the tablespace.

A production database processing thousands of transactions per second would spend most of its time waiting for random disk writes.

Transaction throughput would drop dramatically.

Instead, InnoDB deliberately separates:

Transaction completion
Data page flushing

Once the transaction satisfies its durability requirements, user processing can continue immediately.

The actual page write is postponed and performed later by background threads.

This design allows foreground transactions to remain fast while background workers optimize storage I/O independently.

Dirty Pages Continue to Accumulate

Every INSERT, UPDATE, or DELETE modifies additional pages inside the Buffer Pool.

As more transactions execute, the number of Dirty Pages gradually increases.

This is completely expected.

In fact, a production server with active write workloads will almost always contain Dirty Pages.

Their existence is not a problem.

The challenge is ensuring that they are flushed at an appropriate rate so that memory remains available and recovery remains efficient.

Who Flushes Dirty Pages?

Foreground user sessions do not normally write Dirty Pages to the tablespace.

Instead, this responsibility belongs to InnoDB’s background threads.

These threads continuously monitor:

Dirty Page percentage
Available free pages
Redo generation
Checkpoint progress
Overall system activity

Based on these conditions, InnoDB decides when pages should be flushed to persistent storage.

This process is known as page flushing.

Rather than flushing pages immediately after every transaction, InnoDB performs page flushing gradually and continuously.

This produces much more efficient I/O patterns than writing every modified page individually.

Why Is Page Flushing Important?

Page flushing serves several purposes simultaneously.

It synchronizes the Buffer Pool with the tablespace.

It frees reusable memory inside the Buffer Pool.

It allows checkpoints to continue advancing.

It also reduces the amount of work required during crash recovery because more modified pages have already reached persistent storage.

Without continuous page flushing, Dirty Pages would continue accumulating indefinitely, eventually creating memory pressure and forcing aggressive flushing.

Our Transaction Continues…

Our transaction has now entered the background phase of its lifecycle.

While the application has already moved on, several internal workers continue maintaining the storage engine.

Some flush Dirty Pages.

Some write redo information.

Others remove obsolete row versions.

Together, these background workers keep InnoDB healthy while new transactions continue executing.

Let’s meet the invisible workers that make all of this possible.

Background Threads – The Invisible Workers Inside InnoDB

Our transaction has now completed successfully.

The application has already received COMMIT successful and moved on to the next request.

However, several important tasks still remain.

The modified page must eventually be written to the tablespace.

Redo information must continue to be managed efficiently.

Obsolete Undo records must eventually be removed.

Future crash recovery must remain efficient.

Rather than forcing user transactions to perform all these activities themselves, InnoDB delegates them to a collection of background threads.

These threads work continuously behind the scenes, allowing foreground transactions to remain fast while the storage engine quietly performs maintenance activities.

Why Does InnoDB Need Background Threads?

Imagine an OLTP system processing thousands of transactions every second.

If every user transaction had to:

Flush Dirty Pages
Remove obsolete Undo records
Advance Checkpoints
Manage storage maintenance

before returning to the application, transaction latency would increase dramatically.

Instead, InnoDB separates user work from maintenance work.

Foreground threads process SQL statements.

Background threads maintain the storage engine.

This separation is one of the reasons InnoDB delivers both high throughput and excellent concurrency.

The Page Cleaner

One of the most important background workers is the Page Cleaner.

Its responsibility is to gradually flush Dirty Pages from the Buffer Pool into the InnoDB tablespaces.

Rather than waiting until memory becomes full, the Page Cleaner continuously flushes pages based on system activity.

This helps:

Keep enough free pages available in the Buffer Pool.
Reduce sudden bursts of disk I/O.
Allow checkpoints to continue advancing.
Shorten future crash recovery.

The Page Cleaner therefore plays a critical role in maintaining stable write performance.

The Log Writer

Another important worker is the Log Writer.

As transactions generate redo information, the Log Writer is responsible for moving redo information from the Log Buffer toward the persistent redo log.

This allows foreground transactions to focus on transaction processing while redo management continues efficiently in the background.

The Purge Thread

Earlier we learned that the Undo Log preserves historical row versions.

Those historical versions cannot remain forever.

Once no active transaction requires them, the Purge Thread removes the obsolete versions and reclaims them for future reuse.

Without Purge, History List Length would continue growing indefinitely, eventually affecting storage usage and overall performance.

We’ll examine Purge and History List Length in much greater detail later in this article.

I/O Threads

InnoDB also uses several I/O Threads to coordinate asynchronous read and write operations between memory and persistent storage.

These threads allow storage operations to proceed efficiently without forcing user sessions to wait unnecessarily.

Rather than performing all storage activity synchronously, InnoDB overlaps computation and I/O wherever possible.

This contributes significantly to overall throughput under concurrent workloads.

Working Together

Although each background thread has a different responsibility, they all work toward a common objective:

Keeping the storage engine healthy while foreground transactions continue executing.

One thread flushes Dirty Pages.

Another manages redo.

Another removes obsolete Undo versions.

Others coordinate storage I/O.

Together they ensure that the storage engine remains responsive even under heavy production workloads.

Our Transaction Continues…

Our transaction has now been committed, protected, and handed over to the background workers.

The next question naturally follows:

How does InnoDB know which portions of the Redo Log are no longer required?

The answer lies in another critical mechanism:

Checkpoint processing.

Checkpoint – Keeping Redo Logs Reusable and Crash Recovery Efficient

Our transaction has now been committed successfully.

The application has received COMMIT successful, background threads are flushing Dirty Pages to the tablespaces, and new transactions continue entering the storage engine.

At this point, another important question arises.

If every transaction continuously generates Redo information, what happens when the Redo Log eventually becomes full?

Can InnoDB simply overwrite the older Redo records?

Not immediately.

Before any portion of the Redo Log can be reused, InnoDB must first ensure that the corresponding modified pages have already been written safely to the tablespaces.

This is the problem that Checkpoint solves.

What is a Checkpoint?

A Checkpoint is a marker within the Redo Log that identifies a point where InnoDB knows:

“Every modification represented by redo records before this point has already been written safely to the tablespaces.”

Once this condition is satisfied, the corresponding portion of the Redo Log is no longer required for crash recovery and can be reused for future transactions.

Without Checkpoints, the Redo Log would eventually fill up and no additional transactions could generate new redo records.

Checkpointing therefore serves two critical purposes:

It allows Redo Log space to be reused.
It limits the amount of redo that must be replayed during crash recovery.

How Does the Checkpoint Move Forward?

Checkpoint advancement is closely related to page flushing.

As the Page Cleaner gradually flushes Dirty Pages from the Buffer Pool into the tablespaces, more and more modifications become permanently stored on disk.

Once InnoDB determines that the pages associated with earlier redo records have been safely flushed, the Checkpoint can move forward.

This process happens continuously in the background while user transactions continue executing.

Rather than jumping forward in large intervals, the Checkpoint advances gradually as flushing progresses.

Why is Checkpoint Important?

Imagine a server processing millions of transactions.

If the server crashes, InnoDB does not need to replay the entire Redo Log.

Instead, recovery begins from the last Checkpoint and replays only the redo generated after that point.

The more efficiently Checkpoints advance during normal operation, the less work Crash Recovery has to perform after an unexpected failure.

Checkpoint management therefore has a direct impact on recovery time.

Checkpoint and Redo Log Capacity

Checkpoint advancement is also closely related to Redo Log capacity.

If Dirty Pages cannot be flushed quickly enough, the Checkpoint cannot advance.

As more redo continues to accumulate, available Redo Log space gradually decreases.

Eventually, InnoDB may increase page flushing activity to free reusable redo space.

This is one of the reasons storage performance and page flushing behavior influence transaction throughput under heavy write workloads.

Where Does the Doublewrite Buffer Fit In?

Crash recovery protects transactions.

However, another problem can occur while writing data pages to storage.

Suppose InnoDB is flushing a 16 KB page from the Buffer Pool to the tablespace.

If the server loses power halfway through that write, only part of the page may reach storage.

This situation is known as a torn page.

A partially written page cannot be trusted because some portions contain old data while others contain new data.

To protect against this possibility, InnoDB uses the Doublewrite Buffer.

Before writing a page to its final location inside the tablespace, InnoDB first writes it to the Doublewrite Buffer.

Only after that write succeeds is the page written to its final destination.

If a crash interrupts the second write, InnoDB can recover the complete page from the Doublewrite Buffer during startup.

This mechanism protects page integrity even if power is lost while a page write is in progress.

It is important to understand that the Doublewrite Buffer and the Redo Log solve different problems.

The Redo Log protects committed transactions by allowing modifications to be replayed.
The Doublewrite Buffer protects individual data pages from partial writes.

Both mechanisms work together during crash recovery to ensure that committed data remains both recoverable and physically consistent.

Our Transaction Continues…

Our transaction has now become durable.

The modified page is gradually making its way from the Buffer Pool to the tablespace.

But what happens if the server suddenly loses power before every Dirty Page has been flushed?

Can our committed transaction still survive?

Let’s follow our transaction through the final and most important test:

Crash Recovery.

Crash Recovery – How InnoDB Survives Unexpected Failures.

Now imagine the worst possible scenario.

The database server suddenly loses power.

The operating system stops immediately.

Memory disappears.

Every Dirty Page inside the Buffer Pool is lost.

At first glance, it appears that our committed transaction has disappeared as well.

Fortunately, it has not.

Why Doesn’t a Crash Lose Committed Transactions?

The answer lies in the design of Write-Ahead Logging.

Before COMMIT was acknowledged, InnoDB ensured that the required redo information had already been safely persisted.

Although the modified data page had not yet reached the tablespace, the Redo Log still contains enough information to reproduce that modification.

When MySQL starts again, InnoDB automatically performs Crash Recovery.

How Does Crash Recovery Work?

Crash Recovery begins from the last Checkpoint, not from the beginning of the Redo Log.

From that point onward, InnoDB scans the redo generated after the Checkpoint and reapplies any committed modifications whose corresponding data pages had not yet been written to the tablespaces.

This process reconstructs the latest committed version of the affected pages.

Once redo replay has completed, InnoDB performs another important step.

Any transactions that were still active at the time of the crash are identified.

Their incomplete changes are rolled back using the corresponding Undo Records.

The result is a consistent database state:

Committed transactions remain.
Incomplete transactions disappear.

This entire process happens automatically during server startup without requiring manual intervention.

Crash Recovery in Action

Let’s revisit the journey of our transaction.

The row was modified inside the Buffer Pool.
Redo information was generated.
COMMIT was acknowledged after the required durability guarantees were satisfied.
Before the Dirty Page could be flushed, the server unexpectedly crashed.

After restart:

InnoDB begins recovery from the last Checkpoint.
Redo records reconstruct the missing page modifications.
Any incomplete transactions are rolled back using Undo.
If a page write was interrupted, the Doublewrite Buffer restores the page safely.

Although the application experienced an unexpected failure, the committed transaction survives exactly as expected.

This is one of the defining characteristics of a transactional storage engine.

Our Transaction Continues…

Our transaction has now survived a server crash successfully.

So far, we have focused entirely on the write path of a transaction.

But databases perform far more reads than writes.

What happens when another session executes a SELECT statement while transactions continue modifying the same data?

To answer that question, we now switch from the write path to the read path of InnoDB.

The Read Path – Why Can Two Identical SELECT Statements Perform Differently?

So far, we have followed the journey of a write transaction.

But databases perform far more reads than writes.

Let’s now follow a simple SELECT statement.

			
SELECT *
FROM orders
WHERE id = 42;

At first glance, this appears to be a straightforward query.

However, its execution time depends heavily on one simple question:

Is the required page already available inside the Buffer Pool?

Reading from the Buffer Pool

If the page containing the requested row is already present in the Buffer Pool, InnoDB can satisfy the request directly from memory.

No physical disk I/O is required.

This is known as a Buffer Pool Hit.

Since memory access is extremely fast, the query usually completes with very low latency.

In busy production systems, the vast majority of frequently executed queries are expected to be Buffer Pool hits.

Reading from Persistent Storage

If the required page is not currently cached, InnoDB must first read the corresponding 16 KB page from the tablespace into the Buffer Pool.

Only after the page has been loaded can the requested row be returned to the application.

This is known as a Buffer Pool Miss.

The SQL statement has not changed.

The execution plan has not changed.

Yet the response time may be dramatically different because the second query had to wait for physical storage.

This illustrates an important principle:

The same SQL statement can exhibit completely different performance characteristics depending on where the required page resides.

Our Journey Continues…

Our SELECT statement has now successfully located the required page.

But another important question remains.

What happens if another transaction modifies the same row while this SELECT is still running?

Should the reader wait?

Should it read the new value?

Should it read the old value?

The answer lies in one of InnoDB’s most powerful concurrency mechanisms:

Multi-Version Concurrency Control (MVCC).

Multi-Version Concurrency Control (MVCC) – Reading Without Blocking Writes

Our journey now shifts from transaction durability to transaction concurrency.

Imagine the following situation.

Transaction A updates a customer’s order.

At exactly the same time, Transaction B executes a report that needs to read the same row.

Should the report wait until the update finishes?

Should it read uncommitted data?

Or should the update wait for the report to complete?

If every read operation waited for every write operation, modern OLTP systems would quickly become bottle necked by locking.

Instead, InnoDB uses Multi-Version Concurrency Control (MVCC) to allow readers and writers to operate concurrently while still maintaining transactional consistency.

Understanding the Problem

Suppose the current value of a row is:

Amount = 99.00

Transaction A updates the row:

99.00 → 149.00

but has not yet committed.

Meanwhile, Transaction B executes:

			
SELECT amount
FROM orders
WHERE id = 42;

What should Transaction B see?

The answer depends on transaction isolation, but under InnoDB’s default REPEATABLE READ isolation level, Transaction B performs a consistent read.

Instead of reading the uncommitted value, InnoDB reconstructs the appropriate earlier version of the row using the Undo Records we discussed earlier.

Transaction B therefore continues reading a consistent version of the data without waiting for Transaction A to finish.

MVCC Depends on Undo

Earlier we learned that every modification preserves the previous version of the row inside the Undo Log.

Those historical versions now become extremely valuable.

Rather than blocking readers whenever a row is modified, InnoDB follows the chain of Undo Records and reconstructs the version that should be visible to the current transaction.

In other words:

Writers modify the newest version.
Readers can continue accessing an earlier version when necessary.

This is the fundamental idea behind MVCC.

Without Undo Records, MVCC would not be possible.

Consistent Reads

When a normal SELECT statement executes, InnoDB does not necessarily read the newest physical version of every row.

Instead, it determines which version should be visible according to the transaction’s consistency rules.

If the latest version is already visible, InnoDB returns it immediately.

If not, InnoDB reconstructs the appropriate historical version using the available Undo Records.

This process happens automatically and transparently.

Applications do not need to request older versions explicitly.

Why Doesn’t SELECT Wait?

One of the biggest advantages of MVCC is that readers and writers rarely block one another during normal query processing.

While one transaction updates a row, another transaction can often continue reading a consistent version of that same row.

This significantly improves concurrency in busy OLTP systems where thousands of transactions may be executing simultaneously.

It is one of the key reasons InnoDB performs so well under mixed read-write workloads.

Important

MVCC applies to consistent reads.

Locking reads such as:

SELECT ... FOR UPDATE

SELECT ... FOR SHARE

intentionally acquire locks because they are designed to protect rows that will participate in subsequent updates.

One common production question is:

“Why am I seeing old data immediately after another session updated the row?”

In many cases, the answer is not replication lag or stale caching.

It is simply MVCC working exactly as designed.

The transaction is reading a consistent snapshot that was established according to the transaction’s isolation level.

Understanding this behavior prevents many unnecessary troubleshooting efforts.

Our Journey Continues…

Every row modification creates additional historical versions.

Those versions are essential while active transactions continue reading consistent snapshots.

But they cannot remain forever.

Eventually, InnoDB must determine when those older versions are no longer needed and safely remove them.

That leads us to another important concept:

History List Length and Purge Processing.

History List Length – When Can InnoDB Finally Forget the Past?

In the previous section, we learned that MVCC allows readers to access consistent snapshots without blocking writers.

To make this possible, InnoDB preserves previous row versions in the Undo Log.

However, these historical versions cannot remain in the database forever.

At some point, they are no longer needed and must be removed.

This cleanup process is managed by the Purge Thread, and one of the most important indicators of its progress is the History List Length.

What is History List Length?

Every time a transaction modifies a row, InnoDB creates an Undo Record that preserves the previous version of that row.

As transactions continue executing, these Undo Records accumulate.

The History List Length (HLL) represents the amount of transactional history that is still waiting to be purged.

It is important to understand that History List Length is not a count of Undo pages or Undo tablespace size.

Instead, it reflects the amount of committed update history that cannot yet be removed because it may still be required by active transactions.

A growing History List Length simply means that InnoDB is retaining more historical versions.

The next question is:

Why can’t InnoDB remove them immediately after COMMIT?

Why Can’t Purge Remove Them Immediately?

Consider the following situation.

Transaction A updates a row and commits successfully.

At first glance, it might appear that the corresponding Undo Record is no longer needed.

However, another transaction may have started earlier and could still be reading a consistent snapshot that depends on that older version.

If InnoDB removed the Undo Record too soon, that transaction would no longer be able to reconstruct the correct historical version.

For this reason, Purge must wait until it is certain that no active transaction requires those historical versions anymore.

Only then can the Undo Records be safely removed.

The Role of the Purge Thread

This cleanup is performed automatically by the Purge Thread, one of InnoDB’s background workers.

Its responsibilities include:

Removing obsolete Undo Records.
Reclaiming space occupied by outdated row versions.
Preventing unnecessary growth of transactional history.
Helping maintain long-term storage efficiency.

The Purge Thread operates continuously in the background while user transactions continue executing.

In healthy systems, it quietly keeps up with the workload without requiring administrator intervention.

The Impact of Long-Running Transactions

One of the most common causes of History List Length growth is long-running transactions.

Imagine a reporting query that begins in the morning and remains active for several hours.

During that time, thousands of UPDATE statements may modify the same tables.

Although those updates have already committed, Purge cannot remove their historical versions because the reporting transaction may still need to reconstruct its original snapshot.

As a result:

History List Length continues increasing.
Undo Records accumulate.
Undo Tablespaces grow.
Purge appears to fall behind.

In many production environments, the Purge Thread is not the actual problem.

It is simply waiting for long-running transactions to finish.

Why Should DBAs Monitor History List Length?

History List Length is one of the most useful indicators of transaction health.

A steadily increasing value often suggests that old row versions are not being removed as quickly as they are being generated.

This may indicate:

Long-running transactions.
Applications holding transactions open unnecessarily.
Heavy update workloads producing Undo faster than it can be purged.

Monitoring this metric allows DBAs to identify transaction management issues before they begin affecting storage usage and overall system performance.

Our Transaction Journey is Almost Complete…

At this point, our transaction has completed its entire lifecycle.

It entered the Buffer Pool.

Redo protected it against crashes.

Undo preserved previous versions.

COMMIT made it durable.

Background threads flushed modified pages.

Checkpoint advanced.

Crash Recovery ensured committed work survived failures.

MVCC allowed readers to continue without blocking writers.

Finally, once no active transaction required its historical versions, the Purge Thread removed the obsolete Undo Records.

The journey of a single transaction is now complete.

The final step is to connect everything we’ve learned to the production incidents that DBAs encounter every day.

Connecting Production Symptoms to InnoDB Internals

Throughout this article, we’ve followed a single transaction as it moved through the InnoDB storage engine.

Along the way, we’ve explored every major component involved in transaction processing:

Buffer Pool
Redo Log
Undo Log
COMMIT
Background Threads
Checkpoint
Crash Recovery
MVCC
Purge

At first glance, these may appear to be independent concepts.

In reality, they are closely connected.

One component influences another, and understanding those relationships is one of the most valuable skills a MySQL DBA can develop.

When production issues occur, monitoring systems rarely report:

“The Buffer Pool hit ratio has decreased.”

“Checkpoint advancement has slowed.”

Instead, alerts usually look much simpler:

“MySQL is slow.”
“Disk IOPS suddenly increased.”
“Transactions are taking longer to commit.”
“Recovery took much longer than expected.”

The responsibility of a DBA is to translate these symptoms into the internal engine components that are most likely responsible.

From Symptoms to Root Cause

The following table provides a practical way to connect common production symptoms with the corresponding InnoDB components.

Production Symptom	First Areas to Investigate
Slow SELECT statements	Buffer Pool efficiency, cache misses, physical page reads
Slow COMMIT operations	Redo Log activity, durability settings, storage latency
Increasing disk IOPS	Buffer Pool pressure, Dirty Page flushing, page reads
Growing History List Length	Long-running transactions, Purge progress
Extended crash recovery	Checkpoint advancement, Redo generation
Increasing Undo Tablespace usage	Long-running transactions, MVCC history retention
Frequent page flushing	Dirty Page accumulation, flushing behavior
High write latency	Storage subsystem, Redo generation, flushing activity

Notice that none of these symptoms should immediately lead to configuration changes.

Instead, they should guide your investigation toward the corresponding internal component.

Five Practical Principles Every DBA Should Remember

As we conclude, here are five practical principles that summarize the entire transaction journey.

1. Memory is Faster Than Storage

The Buffer Pool is one of the most important contributors to InnoDB performance.

Maintaining an effective cache reduces physical I/O and improves overall query response time.

2. COMMIT Does Not Mean the Data Page Has Been Written

A successful COMMIT means the transaction has satisfied its configured durability requirements.

The modified page itself may still reside inside the Buffer Pool as a Dirty Page and will be flushed later by background threads.

3. Long Transactions Have Hidden Costs

Long-running transactions prevent Purge from removing historical row versions.

As a result, History List Length grows, Undo history accumulates, and storage usage may increase over time.

Keeping transactions short benefits both concurrency and long-term system health.

4. Checkpoint Influences Recovery

Efficient Checkpoint advancement reduces the amount of redo that must be replayed after an unexpected shutdown.

Checkpoint behavior is therefore closely connected to both normal write performance and recovery time.

5. Follow the Transaction

Perhaps the most important lesson from this article is this:

Don’t study Buffer Pool, Redo Log, Undo Log, MVCC, or Purge as isolated topics.

Follow the transaction.

Once you understand where the transaction is inside InnoDB, identifying production bottlenecks becomes significantly easier.

Following One Transaction from Beginning to End

Let’s return to the simple INSERT statement that started our journey.

INSERT INTO orders VALUES (42, 7, 199.00);

At first glance, it looked like an ordinary SQL statement.

Yet behind that single statement, InnoDB performed an extraordinary amount of work.

Our transaction entered the Buffer Pool, where the required page was modified in memory.

Redo information was generated to ensure that committed changes could survive unexpected failures.

Undo Records preserved previous row versions, enabling both rollback and Multi-Version Concurrency Control.

When the application issued COMMIT, InnoDB satisfied the configured durability requirements and acknowledged the transaction, even though the modified page had not yet been written to the tablespace.

Background threads later flushed Dirty Pages, advanced Checkpoints, and maintained the overall health of the storage engine.

If a server crash occurred before page flushing completed, Crash Recovery automatically replayed the required Redo information, rolled back incomplete transactions using Undo, and relied on the Doublewrite Buffer to protect against torn page writes.

Meanwhile, readers continued executing consistent reads through MVCC, while the Purge Thread eventually removed historical row versions that were no longer required.

Every component we explored participated in the lifecycle of this single transaction.

This is perhaps the most important takeaway from the entire article.

Buffer Pool, Redo Log, Undo Log, COMMIT, Checkpoint, Crash Recovery, MVCC, and Purge are not isolated concepts.

They are different stages in the lifecycle of every transaction processed by InnoDB.

Understanding those stages changes the way we troubleshoot production systems.

Instead of asking:

“Which parameter should I change?”

we begin asking:

“Where is my transaction inside InnoDB?”

That single question often leads directly to the root cause.

Looking Ahead

This article focused on the complete lifecycle of a transaction to build a strong conceptual foundation.

However, InnoDB Internals is a much broader subject.

Topics such as page flushing algorithms, checkpoint tuning, purge architecture, locking internals, B+Tree indexes, adaptive flushing, change buffering, memory management, and storage engine monitoring deserve dedicated deep dives of their own.

In future articles, we’ll explore each of these areas individually from a Production DBA’s perspective, combining architectural concepts with practical monitoring techniques, configuration guidance, and real-world troubleshooting scenarios.

I hope this journey has helped you visualize how a transaction moves through InnoDB and provided a stronger foundation for understanding MySQL Internals.

Happy Learning!

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Inside MySQL InnoDB: The Complete Journey of One Transaction

Table of Contents

Part 1 – Setting the Foundation

Part 2 – The Write Path

Part 3 – After COMMIT

Part 4 – The Read Path

Part 5 – Applying the Knowledge

Understanding the InnoDB Architecture

The SQL Processing Layer

The In-Memory Layer

The Persistent Storage Layer

A Layered Design with a Purpose

Meet Our Transaction

The Buffer Pool – Where Every Transaction Begins

Why Does InnoDB Work with Pages Instead of Rows?

Buffer Pool Hit vs Buffer Pool Miss

Buffer Pool Hit

Buffer Pool Miss

Dirty Pages

How Does InnoDB Manage Buffer Pool Memory?

Two Intelligent Buffer Pool Optimizations

Change Buffer

Adaptive Hash Index

Our Transaction Continues…

Redo Log – Protecting Transactions Before the Data Reaches Disk

What is Write-Ahead Logging?

The Redo Log Buffer

From Memory to Durable Storage

COMMIT Does Not Wait for Page Flushing

Group Commit

Our Transaction Continues…

Undo Log – Preserving the Past Before Changing the Present

Why Does InnoDB Need an Undo Log?

Purpose 1 – Transaction Rollback

Purpose 2 – Multi-Version Concurrency Control (MVCC)

Undo Records Cannot Live Forever

Redo and Undo – Two Complementary Mechanisms

Our Transaction Continues…

COMMIT – When Does a Transaction Actually Become Successful?

What Does COMMIT Actually Mean?

Why Doesn’t COMMIT Wait for the Data Page?

What Happens to the Modified Page?

Our Transaction Continues…

Life After COMMIT – The Transaction Isn’t Finished Yet

Why Doesn’t InnoDB Flush the Page Immediately?

Dirty Pages Continue to Accumulate

Who Flushes Dirty Pages?

Why Is Page Flushing Important?

Our Transaction Continues…

Background Threads – The Invisible Workers Inside InnoDB

Why Does InnoDB Need Background Threads?

The Page Cleaner

The Log Writer

The Purge Thread

I/O Threads

Working Together

Our Transaction Continues…

Checkpoint – Keeping Redo Logs Reusable and Crash Recovery Efficient

What is a Checkpoint?

How Does the Checkpoint Move Forward?

Why is Checkpoint Important?

Checkpoint and Redo Log Capacity

Where Does the Doublewrite Buffer Fit In?

Our Transaction Continues…

Crash Recovery – How InnoDB Survives Unexpected Failures.

Why Doesn’t a Crash Lose Committed Transactions?

How Does Crash Recovery Work?

Crash Recovery in Action

Our Transaction Continues…

The Read Path – Why Can Two Identical SELECT Statements Perform Differently?

Reading from the Buffer Pool

Reading from Persistent Storage

Our Journey Continues…

Multi-Version Concurrency Control (MVCC) – Reading Without Blocking Writes

Understanding the Problem

MVCC Depends on Undo

Consistent Reads

Why Doesn’t SELECT Wait?

Our Journey Continues…

History List Length – When Can InnoDB Finally Forget the Past?