Delta Lake Concurrency Deep Dive: Isolation Levels, Writer Conflicts, and What Actually Blocks
Delta Lake's ACID guarantees work well for standard write patterns — appends, overwrites, MERGEs. The concurrency model gets more interesting when you have multiple writers, or when you're running operations like DELETE and UPDATE simultaneously with ongoing streaming appends. Understanding the isolation model tells you what's safe and what needs explicit sequencing.
How Delta's Optimistic Locking Works
Delta uses optimistic concurrency: transactions don't hold locks while running. Instead, each transaction records what version of the table it started from, and at commit time, Delta checks whether any concurrent transaction has modified data that the current transaction read or wrote.
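The transaction log is the arbiter. Each commit is a JSON file in _delta_log, and creating that file is atomic: only one writer can create version N. A writer that loses the race checks the winning commit against what it read. If there is no logical conflict, it retries the commit as version N+1 without re-running the job; if there is one, the transaction fails with a concurrent modification exception.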
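The commit race can be illustrated with a toy model. This is a minimal sketch, not Delta's implementation: the helper names `try_commit` and `commit` are hypothetical, the log directory is a local temp folder, and the conflict check only looks at removed files, whereas the real check also considers files added under a predicate the transaction read.

```python
import json
import os
import tempfile


def try_commit(log_dir, version, actions):
    """Try to create the commit file for `version`; return False if it already exists."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # O_EXCL makes creation fail if the file exists, so only one writer can claim a version.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        json.dump(actions, f)
    return True


def commit(log_dir, read_version, files_read, actions):
    """Commit on top of `read_version`, rechecking against commits that landed since."""
    version = read_version + 1
    while not try_commit(log_dir, version, actions):
        # Another writer won this version: inspect what it removed.
        with open(os.path.join(log_dir, f"{version:020d}.json")) as f:
            winner = json.load(f)
        removed = {a["remove"] for a in winner if "remove" in a}
        if removed & set(files_read):
            # A file this transaction read is gone, so its result is stale.
            raise RuntimeError(f"conflict with version {version}")
        version += 1  # no logical conflict: try the next version number
    return version


log_dir = tempfile.mkdtemp()
commit(log_dir, -1, [], [{"add": "a.parquet"}])          # creates version 0
# Two blind appends that both started from version 0:
print(commit(log_dir, 0, [], [{"add": "b.parquet"}]))    # 1
print(commit(log_dir, 0, [], [{"add": "c.parquet"}]))    # 2
```

The second append loses the race for version 1, finds that the winner removed nothing it read, and lands as version 2.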
What Counts as a Conflict
# These two operations can run concurrently without conflict
# Job A: append new West region orders
west_df.write.format("delta").mode("append").save("/mnt/orders")
# Job B: append new East region orders (concurrent with Job A)
east_df.write.format("delta").mode("append").save("/mnt/orders")
Two appends to the same table don't conflict in Delta: neither reads the other's data, and both are just adding rows. Delta detects this and allows both commits to succeed.
# These WILL conflict if they run simultaneously and touch the same data files
# Job A: MERGE updates rows where region = 'West'
# Job B: DELETE removes rows where customer_id IN (select ...)
# If any West rows have those customer IDs, both operations read and rewrite the same files
MERGE and DELETE are read-modify-write operations: they read data, compute changes, and write results. Delta detects conflicts at the level of data files by default, so two such operations can collide even when their rows only share a file. The second one to commit finds that files it read were changed after its read version. That commit fails with a concurrent modification exception, and the operation has to be re-run against the current state.
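When occasional collisions are acceptable, one common approach is to re-run the whole read-modify-write operation after a conflict. The helper below is a generic sketch: `run_with_retry` and its parameters are hypothetical names, not part of Delta. In a real job you would pass the conflict exception classes to `retry_on`; this assumes the delta-spark Python package, which exposes them in `delta.exceptions` (for example `ConcurrentAppendException`).

```python
import random
import time


def run_with_retry(operation, retry_on, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run `operation()`, re-running it when it raises one of `retry_on`.

    `operation` must redo the full read-modify-write (e.g. the whole MERGE),
    so each attempt starts from the table's current version.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the conflict to the caller
            # Exponential backoff with jitter so competing jobs don't retry in lockstep.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))
```

Retrying hides contention rather than removing it, so it fits rare collisions better than two pipelines that overlap on every run.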
Streaming and Batch Concurrency
One pattern that works cleanly in Delta: streaming appends running simultaneously with batch reads and batch OPTIMIZE. The streaming writer only appends new rows. OPTIMIZE compacts existing files and creates new ones, but it doesn't change the logical data — the transaction log records old files as removed and new compacted files as added, atomically. A concurrent reader sees a consistent snapshot throughout.
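To see why readers stay consistent, it helps to replay a log by hand. The sketch below uses a simplified, hypothetical log format (plain dicts with `add` and `remove` lists, made-up file names) rather than Delta's actual JSON actions; the point is only that a snapshot is whatever set of files results from replaying commits up to a given version.

```python
def live_files(log, version):
    """Replay commits 0..version and return the set of files in that snapshot."""
    files = set()
    for commit in log[: version + 1]:
        files -= set(commit.get("remove", []))
        files |= set(commit.get("add", []))
    return files


# Hypothetical log: two streaming appends, an OPTIMIZE, then another append.
log = [
    {"add": ["part-0.parquet"]},                        # v0: streaming append
    {"add": ["part-1.parquet"]},                        # v1: streaming append
    {"remove": ["part-0.parquet", "part-1.parquet"],    # v2: OPTIMIZE swaps both files
     "add": ["compacted-0.parquet"]},                   #     for one compacted file
    {"add": ["part-2.parquet"]},                        # v3: streaming append
]

print(sorted(live_files(log, 1)))  # ['part-0.parquet', 'part-1.parquet']
print(sorted(live_files(log, 3)))  # ['compacted-0.parquet', 'part-2.parquet']
```

A reader pinned at version 1 keeps seeing the two original files, while a reader at version 3 sees the compacted file plus the new append. Neither ever sees a half-applied OPTIMIZE, because the remove and add actions arrive in a single commit.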
-- This is safe: concurrent streaming appends and OPTIMIZE are designed to work together
-- Job 1: stream continuously appending new events
-- Job 2: scheduled OPTIMIZE run on the same table
OPTIMIZE orders ZORDER BY (customer_id, order_date)
Checking for Conflict History
If you're seeing ConcurrentModificationException in production and want to understand what collided:
DESCRIBE HISTORY orders LIMIT 20
Look at the readVersion column alongside version. DESCRIBE HISTORY lists only successful commits, so failed attempts don't appear there; what you can see is how far each commit's readVersion lags behind its version. A commit at version N normally has readVersion N-1. A larger gap means other writers committed while that transaction was running. If the same pipelines keep showing gaps around the time of the exceptions, they are regularly contending on the same data. The fix is usually scheduling changes to avoid overlap or redesigning the write pattern to touch different partitions.
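The gap check is easy to script. The function below is a sketch that works on plain dicts with `version`, `operation`, and `readVersion` keys; the function name and the sample rows are made up for illustration. With an active SparkSession named `spark` (an assumption here), rows in that shape could come from `[r.asDict() for r in spark.sql("DESCRIBE HISTORY orders LIMIT 20").collect()]`.

```python
def commits_with_concurrent_writers(history):
    """Return (version, operation, overlapped) for commits that overlapped other commits.

    `overlapped` counts the commits that landed between the version a
    transaction read and the version it finally committed as.
    """
    flagged = []
    for row in history:
        read_version = row.get("readVersion")
        if read_version is None:  # may be missing, e.g. on the commit that created the table
            continue
        overlapped = row["version"] - read_version - 1
        if overlapped > 0:
            flagged.append((row["version"], row["operation"], overlapped))
    return flagged


# Made-up history rows, newest first as DESCRIBE HISTORY returns them.
sample = [
    {"version": 12, "operation": "MERGE", "readVersion": 9},
    {"version": 11, "operation": "STREAMING UPDATE", "readVersion": 10},
    {"version": 10, "operation": "DELETE", "readVersion": 9},
]
print(commits_with_concurrent_writers(sample))  # [(12, 'MERGE', 2)]
```

Here the MERGE read version 9 but committed as version 12, so two other commits landed while it was running.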