Writing on systems that need to stay up.

Jul 4, 2026 · 25 min read

Caching in Production: Strategies, Pitfalls, and What Companies Actually Do

Why Caching Exists: A single database query takes 5ms. A cache lookup takes 1ms. The difference seems small, but at 10,000 requests per second, that 4ms gap is 40 seconds of cumulative latency pe...

Jul 21, 2026 · 15 min read

TCP for Backend Engineers: From Zero to Practical Mastery

Section 1, What TCP Actually Is, and Why It Exists: Start with the network underneath TCP, because that's the whole reason TCP exists. The internet moves data in packets, and the protocol that handles tha...

May 6, 2026 · 15 min read

Building an LSM-Tree Storage Engine in Go, Part 2: The Memtable

Where We Are: In Part 1 we built a Write-Ahead Log. Every mutation (write or delete) is recorded to disk and fsynced before we acknowledge it to the caller. We have durability. What we do not h...

May 6, 2026 · 15 min read

Building an LSM-Tree Storage Engine in Go, Part 3: The SSTable

Where We Are: Part 1 built the WAL: every write is fsynced to disk before being acknowledged. Part 2 built the memtable: an in-memory skip list that keeps keys sorted and serves reads in O(log n...

May 4, 2026 · 15 min read

Implementing a Write-Ahead Log in Go

A Write-Ahead Log is the mechanism behind the D in ACID. Before any mutation reaches your data files, a record describing that mutation is written and fsynced...

May 3, 2026 · 15 min read

Building a Load Balancer in Go, Part 1: Reverse Proxy and Round Robin

Introduction: You have three backend servers. Each one can handle 1,000 requests per second. Together they can handle 3,000, but only if requests are spread evenly across all three. Something ha...

May 3, 2026 · 15 min read

Building a Load Balancer in Go, Part 2: Health Checks

Where We Left Off: In Part 1 we built a reverse proxy that distributes requests across three backends using round robin. It works, but it is blind. Kill one of the backends and the load balancer ...

May 2, 2026 · 15 min read

Building a Production Job Queue in Go: Concurrency, Tradeoffs, and Getting It Right

Introduction: Your API receives a request. You need to send a confirmation email, resize an image, call a third-party webhook, and update an analytics counter. You could do all of that synchronou...

Mar 25, 2026 · 13 min read

Write-Ahead Logging: How PostgreSQL Survives Crashes, Powers Replication, and Never Loses Your Data

Introduction: Your application writes a row. PostgreSQL says "committed." The server loses power one millisecond later. When it comes back, is your row there? The answer is yes, and the reason ...

Mar 14, 2026 · 10 min read

Concurrency Control in Databases: How to Handle Many Things Happening at Once

Introduction: Your application has one user. Concurrency is not a problem. That user reads data and writes data, and nothing conflicts with anything. Then you have ten thousand users. Two of them update...

Mar 12, 2026 · 10 min read

Data Partitioning in Distributed Databases: How to Split Your Data Without Breaking Everything

Introduction: Your single database server handled your first 100,000 users just fine. Then you hit a million. Queries slowed down. Disk filled up. You threw more RAM at it. It helped, for a while...

Mar 10, 2026 · 10 min read

Implementing a B+ Tree in Go: What Databases Actually Use

Why B+ Trees Exist: Your database has 50 million rows. You query by user ID. Without an index, the database reads every row until it finds yours. That's a full table scan. It gets slower with ever...

Nov 28, 2025 · 12 min read

Distributed Transactions: The Hard Truth About Keeping Multiple Systems in Sync

Why Single-Database Transactions Don't Scale: A transaction in a single database is simple. You start a transaction, make changes, commit. Either everything commits or nothing does. The database h...

Nov 26, 2025 · 10 min read

Replication Strategies in Distributed Databases: What Actually Happens When You Copy Your Data

Why Replication Exists: Your database will crash. Not maybe. It will. Hard drives fail, memory corrupts, someone runs the wrong command, power goes out. When that happens, you need another copy of...

Oct 20, 2025 · 17 min read

How Databases Actually Store Data: Pages, Tuples, and the Architecture That Matters

Introduction: Most developers understand SQL. Fewer understand what happens after the query planner finishes: how rows are physically laid out on disk, why certain writes cost more than others, w...

Oct 17, 2025 · 22 min read

Consistency and Data Models in Real Life: The Art of Building Systems That Never Lie to You

Introduction: At some point you ship something that works perfectly in staging. The code is clean. The tests pass. You deploy it and feel good about it. Six months later you're debugging a produc...

Oct 16, 2025 · 17 min read

ACID Properties and Isolation Levels: Deep Dive into Production Database Behavior

Why This Matters in Production: Every backend engineer has a story about a production incident that made ACID properties real. Maybe it was a race condition in a payment system that took hours to ...

Oct 16, 2025 · 17 min read

Idempotency in Production: Making Your System Safe to Retry

The Real Cost - What happens when you don't have idempotency: Every backend engineer has experienced this: a user clicks a button, the request times out, they panic and click it again. Now somethi...