Understanding Distributed Transactions

Distributed transactions are one of the trickiest problems in distributed systems. This post explores the challenges and practical solutions.

The Problem

In a single database, ACID transactions are straightforward. But what happens when your operation spans multiple databases, services, or geographic regions?

Consider a payment system:

1. User sends money from Bank A
2. Funds are received at Bank B

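If step 1 succeeds but step 2 fails, the money has disappeared: Bank A has debited it and Bank B never credited it. If both steps fail, no money is lost, but the user still has to be told that the transfer did not go through.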

CAP Theorem

The CAP theorem tells us that a distributed system cannot guarantee all three of the following at the same time:

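Consistency: Every read receives the most recent write or an error
Availability: Every request gets a non-error response, though not necessarily the latest data
Partition tolerance: The system keeps working even when the network drops or delays messages between nodes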

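In real systems, network partitions will happen sooner or later, so partition tolerance is not optional. The real choice is what to do during a partition: refuse some requests to stay consistent, or keep answering and risk returning stale data.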
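To make that choice concrete, here is a minimal sketch of a single replica answering a read. The `Replica`, `Mode`, and `read` names are hypothetical and only illustrate the trade-off; a real system detects partitions through timeouts and quorum checks rather than a boolean flag.

```typescript
type Mode = 'CP' | 'AP';

// Hypothetical model of one replica in a replicated store.
interface Replica {
  value: string;        // last value this replica has seen
  partitioned: boolean; // true when it cannot reach the other replicas
}

type ReadResult =
  | { ok: true; value: string; maybeStale: boolean }
  | { ok: false; error: string };

function read(replica: Replica, mode: Mode): ReadResult {
  if (!replica.partitioned) {
    // No partition: the replica can confirm it has the latest write.
    return { ok: true, value: replica.value, maybeStale: false };
  }
  if (mode === 'CP') {
    // Consistency first: refuse to answer rather than risk stale data.
    return { ok: false, error: 'unavailable: cannot confirm latest write' };
  }
  // Availability first: answer with what we have and flag it as possibly stale.
  return { ok: true, value: replica.value, maybeStale: true };
}
```

With a partitioned replica, `read(replica, 'CP')` returns an error while `read(replica, 'AP')` returns the local value marked as possibly stale.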

Practical Patterns

Two-Phase Commit (2PC)

The traditional approach:

Prepare phase: The coordinator asks all participants if they can commit
Commit phase: If every participant votes yes, all commit; otherwise all roll back

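Pros: Strong consistency
Cons: Blocks resources while participants wait for the decision; if the coordinator fails after the prepare phase, participants can be left holding locks until it recovers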
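The two phases can be sketched roughly as follows. The `Participant` interface, the `twoPhaseCommit` function, and the `InMemoryParticipant` class are hypothetical names made up for illustration; real resource managers expose comparable operations over the network, and a real coordinator also writes its decision to durable storage before phase two.

```typescript
// Hypothetical participant interface, for illustration only.
interface Participant {
  name: string;
  prepare(txId: string): Promise<boolean>; // vote: true means "can commit"
  commit(txId: string): Promise<void>;
  rollback(txId: string): Promise<void>;
}

type Outcome = 'committed' | 'rolled_back';

async function twoPhaseCommit(txId: string, participants: Participant[]): Promise<Outcome> {
  // Phase 1: collect votes. A thrown error counts as a "no" vote.
  const votes = await Promise.all(
    participants.map((p) => p.prepare(txId).catch(() => false)),
  );

  if (votes.every((v) => v)) {
    // Phase 2, all voted yes: tell everyone to commit.
    await Promise.all(participants.map((p) => p.commit(txId)));
    return 'committed';
  }

  // Phase 2, at least one "no": tell everyone to roll back.
  await Promise.all(participants.map((p) => p.rollback(txId)));
  return 'rolled_back';
}

// In-memory stand-in for a remote participant, used to try the sketch out.
class InMemoryParticipant implements Participant {
  state = 'idle';
  name: string;
  canCommit: boolean;

  constructor(name: string, canCommit: boolean) {
    this.name = name;
    this.canCommit = canCommit;
  }

  async prepare(): Promise<boolean> {
    this.state = this.canCommit ? 'prepared' : 'refused';
    return this.canCommit;
  }

  async commit(): Promise<void> {
    this.state = 'committed';
  }

  async rollback(): Promise<void> {
    this.state = 'rolled_back';
  }
}
```

The sketch leaves out the hard parts: it does not retry a `commit` that fails in phase two, and it keeps no log, so a coordinator crash between the phases would leave prepared participants waiting. Those gaps are exactly where the blocking behavior of 2PC comes from.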

Saga Pattern

Break the transaction into smaller, compensatable steps:

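// debitAccount and creditAccount are assumed to be provided by the account service.
declare function debitAccount(account: string, amount: number): Promise<void>;
declare function creditAccount(account: string, amount: number): Promise<void>;

async function transferMoney(from: string, to: string, amount: number) {
  // Step 1: Debit from source. If this throws, nothing has changed yet,
  // so there is nothing to compensate.
  await debitAccount(from, amount);

  try {
    // Step 2: Credit to destination (might fail)
    await creditAccount(to, amount);
  } catch (error) {
    // Compensating transaction: refund the debit.
    // In production the refund itself must be retried until it succeeds.
    await creditAccount(from, amount);
    throw error;
  }
}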

Pros: No blocking, can handle long-running transactions
Cons: Potential for inconsistency window, complex error handling
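With more than two steps, the same idea is usually generalized: remember which steps have completed and, on failure, run their compensations in reverse order. The sketch below shows one way to do that; `SagaStep` and `runSaga` are hypothetical names, not part of any particular library.

```typescript
// Hypothetical step shape: each forward action is paired with a
// compensation that undoes it.
interface SagaStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>;
}

async function runSaga(steps: SagaStep[]): Promise<void> {
  const completed: SagaStep[] = [];
  try {
    for (const step of steps) {
      await step.action();
      completed.push(step); // only completed steps need compensation
    }
  } catch (error) {
    // Undo completed steps in reverse order, then surface the original error.
    for (const step of completed.reverse()) {
      await step.compensate();
    }
    throw error;
  }
}
```

This version keeps saga progress in memory and assumes compensations succeed. A production saga would typically persist its progress and retry failed compensations, since a crash halfway through would otherwise leave the transfer half-done.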

Event Sourcing

Instead of storing only the current state, store every change as an immutable event and derive the state by replaying those events:

interface TransferEvent {
  type: 'transfer_initiated' | 'transfer_completed' | 'transfer_failed';
  fromAccount: string;
  toAccount: string;
  amount: number;
  timestamp: Date;
}

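// Assumed shapes: Transfer is the incoming request, eventLog is an append-only store.
interface Transfer {
  from: string;
  to: string;
  amount: number;
}

declare const eventLog: { append(events: TransferEvent[]): Promise<void> };

async function applyTransfer(transfer: Transfer) {
  const events: TransferEvent[] = [];

  // Record the intent as an event instead of mutating balances directly
  events.push({
    type: 'transfer_initiated',
    fromAccount: transfer.from,
    toAccount: transfer.to,
    amount: transfer.amount,
    timestamp: new Date(),
  });

  await eventLog.append(events);

  // Async processing can replay these events
}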
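Replaying is what turns the log back into state. As a rough sketch, the function below folds a list of events into account balances. It repeats the `TransferEvent` shape so the block stands alone, and it assumes that only `transfer_completed` events move money; `Balances` and `replay` are hypothetical names.

```typescript
interface TransferEvent {
  type: 'transfer_initiated' | 'transfer_completed' | 'transfer_failed';
  fromAccount: string;
  toAccount: string;
  amount: number;
  timestamp: Date;
}

type Balances = Record<string, number>;

// Fold the event log into current balances.
// Assumption: only completed transfers change a balance.
function replay(events: TransferEvent[], initial: Balances = {}): Balances {
  const balances: Balances = { ...initial };
  for (const e of events) {
    if (e.type !== 'transfer_completed') continue;
    balances[e.fromAccount] = (balances[e.fromAccount] ?? 0) - e.amount;
    balances[e.toAccount] = (balances[e.toAccount] ?? 0) + e.amount;
  }
  return balances;
}
```

Because the log is never rewritten, the same function can rebuild balances as of any point in time by replaying only the events up to that timestamp.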

Choosing the Right Approach

Same database? Use traditional ACID transactions
Few services, strong consistency needed? Two-phase commit
Microservices, eventual consistency OK? Saga or event sourcing
Need audit trail? Event sourcing

Conclusion

There's no one-size-fits-all solution. The key is understanding the trade-offs and choosing what's right for your specific constraints.