Building a Production Job Queue in Go: Concurrency, Tradeoffs, and Getting It Right
Introduction
Your API receives a request. You need to send a confirmation email, resize an image, call a third-party webhook, and update an analytics counter.
You could do all of that synchronously — inside the request handler, blocking the client until every side effect finishes. That works until the email provider goes down for 30 seconds, the image resizer hangs on a malformed file, or the webhook endpoint times out.
Now your request latency is hostage to your slowest dependency, and under load those blocked handlers pile up until the whole API degrades.
A job queue decouples work from the request that triggers it. The handler enqueues a job and returns immediately. Workers pick up jobs in the background and execute them independently. The API stays fast. The work still gets done.
The problem is that "job queue" sounds simple until you start thinking about what it actually needs to guarantee: jobs must not be lost if a worker crashes mid-execution, backpressure must prevent the queue from growing unboundedly, panicking workers must not take down the entire process, and you need to know what is happening inside the system at any point in time.
This article builds a production-grade job queue in Go from scratch. Every design decision comes with an explanation of the tradeoff — because knowing what to build matters less than knowing why.
What We Are Building
A concurrent job queue with the following properties:
- Fixed worker pool — bounded concurrency, not unbounded goroutine spawning
- Buffered job channel — bounded queue depth with backpressure semantics
- Per-worker panic recovery — one bad job cannot crash the process
- Graceful shutdown — in-flight jobs finish before the process exits
- Retry logic with exponential backoff — transient failures are retried, permanent ones are not
- Structured metrics — visibility into queue depth, throughput, and failure rates
We will build this incrementally, explaining the concurrency concept at each step before writing the code.
The Foundation: Goroutines Are Not Free
The naive approach to concurrent job processing is to spawn a goroutine per job:
func process(job Job) {
go func() {
job.Execute()
}()
}
This works under low load. Under high load it is a disaster.
Each goroutine costs memory — roughly 2–8 KB of stack space at creation, growing as needed. With 100,000 pending jobs, you have 100,000 goroutines, potentially gigabytes of stack space, and a scheduler that spends more time context-switching between goroutines than executing them. The Go runtime's M:N scheduler maps goroutines onto OS threads efficiently, but there is a limit to how much multiplexing helps when every goroutine is actively trying to do CPU or I/O work simultaneously.
The fix is a worker pool: a fixed number of goroutines that each pull work from a shared queue. You decide upfront how many workers you can afford — say, 10 — and that number never changes, regardless of how many jobs arrive.
Without a worker pool, 100,000 jobs means 100,000 goroutines. With a worker pool, 100,000 jobs still means 10 goroutines: the jobs wait in the channel, and the workers drain it at their own pace.
This is the central tradeoff of a worker pool: throughput is bounded, but so is resource usage. You trade unlimited parallelism for predictable memory and CPU behavior.
The Job
Before the queue, define what a job is:
package queue
// Imports cover the full queue package built in this article.
import (
    "context"
    "errors"
    "fmt"
    "log"
    "math/rand"
    "sync"
    "sync/atomic"
    "time"
)
// JobID uniquely identifies a job for logging and deduplication.
type JobID string
// Job is the unit of work the queue processes.
// Every job knows how to execute itself and how to identify itself.
type Job struct {
ID JobID
Name string // human-readable label for logging
Payload any // arbitrary data the handler needs
Handler JobHandler // the function that does the actual work
MaxRetry int // 0 means no retries
Timeout time.Duration // 0 means no timeout
}
// JobHandler is the function signature every job must implement.
// It receives a context (for timeout/cancellation) and the job's payload.
// Return a non-nil error to signal failure. A permanent failure should
// wrap ErrPermanent so the queue knows not to retry.
type JobHandler func(ctx context.Context, payload any) error
// ErrPermanent wraps an error to signal that retrying will not help.
// The queue will not retry a job that returns a wrapped ErrPermanent.
var ErrPermanent = fmt.Errorf("permanent failure")
The Timeout field deserves attention. Without it, a job that hangs forever — say, an HTTP call to a server that never responds — holds a worker indefinitely. With Timeout, we derive a context.WithTimeout before calling the handler, and the context's cancellation propagates into any context-aware operation inside the handler. This is Go's standard pattern for deadline propagation, and it is why passing ctx through every call in your handler matters.
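To make that concrete, here is a hedged example of a handler written against the JobHandler signature above. The emailAPIURL constant is hypothetical and the HTTP details are illustrative; the point is that the ctx the queue passes in must be forwarded into every blocking call, here via http.NewRequestWithContext.

// sendEmail is a hypothetical handler; emailAPIURL is illustrative and
// net/http is assumed to be imported.
func sendEmail(ctx context.Context, payload any) error {
    req, err := http.NewRequestWithContext(ctx, http.MethodPost, emailAPIURL, nil)
    if err != nil {
        return err
    }
    resp, err := http.DefaultClient.Do(req) // returns early if ctx times out or is cancelled
    if err != nil {
        return err // includes context.DeadlineExceeded: transient, so the queue may retry
    }
    defer resp.Body.Close()
    switch {
    case resp.StatusCode >= 500:
        return fmt.Errorf("email provider returned %d", resp.StatusCode) // transient
    case resp.StatusCode >= 400:
        return fmt.Errorf("%w: email provider returned %d", ErrPermanent, resp.StatusCode)
    }
    return nil
}

Because the handler simply returns whatever error the HTTP client produces, a timeout surfaces as a transient failure and goes through the retry path, while a 4xx response is wrapped in ErrPermanent and is never retried.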
The Queue: Channels as the Synchronization Primitive
Go channels are typed, goroutine-safe conduits for passing values. They are also the natural primitive for a job queue: the producer side sends jobs in, the consumer side receives jobs out, and the channel itself handles the synchronization between them without a single mutex.
There are two channel variants relevant here.
An unbuffered channel (make(chan Job)) blocks the sender until a receiver is ready and vice versa. This is a strict rendezvous — sender and receiver must meet simultaneously.
A buffered channel (make(chan Job, N)) has internal capacity for N items. The sender proceeds without blocking as long as the buffer is not full. The receiver proceeds as long as the buffer is not empty. They only synchronize at the boundaries: sender blocks when full, receiver blocks when empty.
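A tiny standalone illustration of the difference (not part of the queue code):

package main

import "fmt"

func main() {
    buffered := make(chan int, 2)
    buffered <- 1 // proceeds immediately: buffer has space
    buffered <- 2 // proceeds immediately: buffer is now full
    // buffered <- 3 would block here until a receiver drains the buffer.

    unbuffered := make(chan int)
    go func() { unbuffered <- 3 }() // this send blocks until the receive below
    fmt.Println(<-buffered, <-buffered, <-unbuffered) // prints: 1 2 3
}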
We want a buffered channel. The buffer depth is the maximum number of jobs that can be waiting for a worker. When the buffer is full, Enqueue blocks — this is backpressure, the mechanism by which a queue signals to producers that it cannot accept more work.
// Queue is the core type. It owns the job channel and the worker pool.
type Queue struct {
jobs chan Job
workerCount int
wg sync.WaitGroup
ctx context.Context
cancel context.CancelFunc
metrics *Metrics
}
// New creates a Queue with the given number of workers and buffer depth.
//
// workerCount: how many goroutines process jobs concurrently.
// bufferSize: how many jobs can wait before Enqueue blocks.
//
// Choosing workerCount: for CPU-bound work, set this to runtime.NumCPU().
// For I/O-bound work (HTTP calls, database queries), set it higher — workers
// spend most of their time waiting, so more workers means more throughput.
//
// Choosing bufferSize: a larger buffer absorbs traffic spikes but uses more
// memory and hides backpressure from producers longer. Start with 10x your
// worker count and tune from there.
func New(workerCount, bufferSize int) *Queue {
ctx, cancel := context.WithCancel(context.Background())
return &Queue{
jobs: make(chan Job, bufferSize),
workerCount: workerCount,
ctx: ctx,
cancel: cancel,
metrics: newMetrics(),
}
}
The context.WithCancel at the queue level is how we will signal shutdown to everything at once. When the queue needs to stop, it calls cancel(), which cancels the context; every in-flight handler and every retry backoff that watches ctx.Done() sees the cancellation and can return early.
Workers: The Fan-Out Pattern
The fan-out pattern describes a single producer distributing work to multiple consumers. In our case, Enqueue is the producer and each worker goroutine is a consumer. All workers receive from the same channel, and channel semantics guarantee that each job is delivered to exactly one worker.
// Start launches the worker pool. Call this once, before enqueuing jobs.
func (q *Queue) Start() {
for i := 0; i < q.workerCount; i++ {
q.wg.Add(1)
go q.worker(i)
}
}
// worker is the goroutine that each member of the pool runs.
// It loops forever, pulling jobs from the channel and executing them.
// It exits when the jobs channel is closed (during shutdown).
func (q *Queue) worker(id int) {
defer q.wg.Done()
for job := range q.jobs {
q.execute(id, job)
}
}
for job := range q.jobs is the idiomatic Go pattern for draining a channel. It blocks when the channel is empty (waiting for work), processes each job that arrives, and exits the loop when the channel is closed. This is the shutdown mechanism: when we close q.jobs, all workers finish their current job and exit naturally.
The sync.WaitGroup tracks how many workers are still running. wg.Add(1) before launching each goroutine, defer wg.Done() inside each worker. Shutdown calls wg.Wait() to block until all workers have exited.
Execution: Panic Recovery and Timeouts
This is where most naive implementations fall short. Two things can go wrong during job execution that the queue must handle gracefully.
A job can panic. A nil pointer dereference, an out-of-bounds slice access, or any unrecovered panic in a goroutine will crash the entire process — not just the current job. In a net/http server, the server recovers per-request panics in handlers for you. In a worker pool, nothing catches them unless you add it explicitly.
A job can hang. Without a timeout, a job that blocks indefinitely holds the worker forever. If enough jobs do this, all workers are occupied and the queue stops processing entirely.
// execute runs a single job with panic recovery, timeout enforcement,
// and retry logic. It is the most important method in the queue.
func (q *Queue) execute(workerID int, job Job) {
attempts := job.MaxRetry + 1
for attempt := 1; attempt <= attempts; attempt++ {
err := q.runOnce(workerID, job, attempt)
if err == nil {
q.metrics.recordSuccess()
return
}
// A permanent error signals that retrying will not help.
// Skip remaining attempts immediately.
if errors.Is(err, ErrPermanent) {
q.metrics.recordFailure()
log.Printf("[worker %d] job %s failed permanently after %d attempt(s): %v",
workerID, job.ID, attempt, err)
return
}
if attempt < attempts {
backoff := exponentialBackoff(attempt)
log.Printf("[worker %d] job %s failed (attempt %d/%d), retrying in %s: %v",
workerID, job.ID, attempt, attempts, backoff, err)
// Sleep with context awareness — if the queue shuts down
// during a backoff sleep, we exit rather than retrying.
select {
case <-time.After(backoff):
case <-q.ctx.Done():
log.Printf("[worker %d] job %s retry cancelled (shutdown)", workerID, job.ID)
return
}
}
}
q.metrics.recordFailure()
log.Printf("[worker %d] job %s exhausted all %d attempt(s)", workerID, job.ID, attempts)
}
// runOnce executes the job handler exactly once, with panic recovery and timeout.
func (q *Queue) runOnce(workerID int, job Job, attempt int) (err error) {
// Recover from panics. A panicking handler must not crash the worker.
// We convert the panic value into an error and return it as a permanent
// failure — panics are almost always programming errors, not transient
// conditions, so retrying them is pointless.
defer func() {
if r := recover(); r != nil {
err = fmt.Errorf("%w: panic in job %s: %v", ErrPermanent, job.ID, r)
log.Printf("[worker %d] recovered panic in job %s (attempt %d): %v",
workerID, job.ID, attempt, r)
}
}()
// Derive the execution context.
// If the job has a timeout, wrap the queue context with a deadline.
// If not, use the queue context directly — this means the job will
// be cancelled when the queue shuts down.
ctx := q.ctx
if job.Timeout > 0 {
var cancelFn context.CancelFunc
ctx, cancelFn = context.WithTimeout(q.ctx, job.Timeout)
defer cancelFn()
}
q.metrics.recordStart()
defer q.metrics.recordEnd()
return job.Handler(ctx, job.Payload)
}
// exponentialBackoff returns a duration that doubles with each attempt,
// capped at 30 seconds, with random jitter to prevent thundering herd.
// If 100 jobs all fail and retry at the same time, they all hit the
// downstream system simultaneously, causing another wave of failures.
// Jitter spreads them out.
func exponentialBackoff(attempt int) time.Duration {
base := time.Duration(1<<uint(attempt-1)) * time.Second // 1s, 2s, 4s, 8s...
if base > 30*time.Second {
base = 30 * time.Second
}
// Add up to 20% jitter
jitter := time.Duration(rand.Int63n(int64(base / 5)))
return base + jitter
}
The defer recover() pattern is standard Go for converting panics into errors. The important subtlety here is that we wrap the panic as ErrPermanent. A panic in a job handler is almost always a programming bug — retrying it will produce the same panic. Marking it permanent prevents the queue from uselessly retrying a broken handler.
The context hierarchy is also intentional: every job's context is derived from the queue's root context. When the queue shuts down and calls cancel(), every in-progress job's context is cancelled simultaneously. Any context-aware operation inside the handler — http.NewRequestWithContext, database/sql queries, time.After in a select — will see the cancellation and return early.
Enqueue and Backpressure
// Enqueue submits a job to the queue.
//
// It returns immediately if the buffer has space.
// It blocks if the buffer is full — this is backpressure.
// It returns an error if the queue is shut down or the context is cancelled.
//
// The ctx parameter is the *caller's* context, not the job's execution context.
// It lets the caller bound how long they are willing to wait for buffer space.
// Pass context.Background() if you want to block indefinitely.
func (q *Queue) Enqueue(ctx context.Context, job Job) error {
select {
case q.jobs <- job:
q.metrics.recordEnqueue()
return nil
case <-ctx.Done():
return fmt.Errorf("enqueue cancelled: %w", ctx.Err())
case <-q.ctx.Done():
return fmt.Errorf("queue is shut down")
}
}
The select statement here is doing real work. It simultaneously watches three channels:
- q.jobs <- job: succeeds if the buffer has space
- ctx.Done(): the caller's context expired (e.g., the HTTP request was cancelled)
- q.ctx.Done(): the queue itself is shutting down
Whichever case is ready first wins. If the buffer is full and neither context is cancelled, Enqueue blocks. This is backpressure working correctly — the producer is forced to wait when the queue cannot keep up.
This is the tradeoff: a blocking Enqueue gives you natural backpressure but forces callers to park a goroutine while they wait. The alternative is a non-blocking enqueue that returns an error immediately when the buffer is full, pushing the retry-or-drop decision to the caller (sketched below). Which is right depends on your use case: blocking is appropriate for background producers; non-blocking is better for request handlers that cannot afford to wait.
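Here is a minimal sketch of that non-blocking variant, using a select with a default case. TryEnqueue and ErrQueueFull are not part of the queue as built above; they are one way to expose the fail-fast behavior.

// ErrQueueFull signals that the buffer had no space at the moment of the call.
var ErrQueueFull = errors.New("queue buffer is full")

// TryEnqueue submits a job without blocking. It returns ErrQueueFull
// immediately if the buffer is full.
func (q *Queue) TryEnqueue(job Job) error {
    select {
    case q.jobs <- job:
        q.metrics.recordEnqueue()
        return nil
    case <-q.ctx.Done():
        return fmt.Errorf("queue is shut down")
    default:
        // No buffer space and no shutdown signal: fail fast and let the
        // caller decide whether to drop, retry later, or degrade.
        return ErrQueueFull
    }
}

An API handler might call TryEnqueue and return 503 to the client when the queue is full, which is usually better than letting request latency grow silently.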
Graceful Shutdown
Shutdown has to satisfy two requirements that pull in opposite directions:
- Stop accepting new jobs immediately
- Let in-flight jobs finish before returning
// Shutdown stops the queue gracefully.
//
// It stops accepting new jobs, waits for all in-flight jobs to complete,
// then returns. Any jobs still in the buffer but not yet picked up by a
// worker are drained and processed before shutdown completes.
//
// After Shutdown returns, the queue must not be used again.
func (q *Queue) Shutdown() {
// Signal all workers that the queue is going down.
// This cancels the root context, which propagates into every
// in-progress job's context. Jobs that respect context cancellation
// will exit early.
q.cancel()
// Closing the channel signals workers to exit once the buffer is drained.
// Workers use `for job := range q.jobs`, which exits when the channel
// is closed AND empty. Closing it now means workers will finish all
// buffered jobs, then exit.
close(q.jobs)
// Block until all workers have exited.
q.wg.Wait()
}
There is a subtlety here: we close the channel after calling cancel(). The cancel propagates into in-flight handlers immediately. The close tells workers to stop taking new jobs once the buffer is empty. Combined, this means: finish what you have, stop early if you can (via context), then exit.
One important constraint: sending on a closed channel panics. After close(q.jobs), an Enqueue that races with Shutdown can therefore still panic: select chooses randomly among its ready cases, so even when q.ctx.Done() is ready, the send case may be picked. The q.ctx.Done() case shrinks the window but does not eliminate it. The reliable rule is that all producers must stop enqueuing before Shutdown is called (for example, stop the HTTP server first, then shut down the queue); guarding the channel with a mutex and a closed flag is an alternative, at the cost of taking a lock on every enqueue.
Metrics: Knowing What Is Happening
A queue without observability is a black box. You need to know how many jobs are waiting, how many are in progress, what the failure rate is, and whether the queue is keeping up with the arrival rate.
// Metrics holds atomic counters for queue health.
// All fields use atomic operations — no mutex needed,
// and no risk of a slow metrics read blocking job processing.
type Metrics struct {
enqueued atomic.Int64
succeeded atomic.Int64
failed atomic.Int64
inFlight atomic.Int64
}
func newMetrics() *Metrics { return &Metrics{} }
func (m *Metrics) recordEnqueue() { m.enqueued.Add(1) }
func (m *Metrics) recordSuccess() { m.succeeded.Add(1) }
func (m *Metrics) recordFailure() { m.failed.Add(1) }
func (m *Metrics) recordStart() { m.inFlight.Add(1) }
func (m *Metrics) recordEnd() { m.inFlight.Add(-1) }
// Snapshot returns a consistent view of current metrics.
type MetricsSnapshot struct {
Enqueued int64
Succeeded int64
Failed int64
InFlight int64
Pending int64 // jobs in buffer, not yet picked up
}
func (q *Queue) Metrics() MetricsSnapshot {
return MetricsSnapshot{
Enqueued: q.metrics.enqueued.Load(),
Succeeded: q.metrics.succeeded.Load(),
Failed: q.metrics.failed.Load(),
InFlight: q.metrics.inFlight.Load(),
Pending: int64(len(q.jobs)),
}
}
atomic.Int64 (from sync/atomic) performs reads and writes as a single uninterruptible CPU instruction. No mutex, no blocking, no goroutine coordination — just a fast counter that any number of goroutines can update simultaneously. The tradeoff: you can read each counter individually but cannot take a consistent snapshot of all counters atomically. MetricsSnapshot accepts this — the numbers are always slightly stale, which is fine for observability. You do not need microsecond-perfect metrics; you need a reasonable view of queue health.
len(q.jobs) returns the number of items currently in the channel buffer. This is the Pending count — jobs that have been enqueued but not yet picked up by a worker. If this number is consistently at buffer capacity, you need more workers or a larger buffer.
Putting It All Together
package main
import (
"context"
"fmt"
"log"
"net/http"
"time"
"yourmodule/queue"
)
func main() {
// 10 workers, buffer for 100 pending jobs.
// For I/O-heavy workloads (HTTP calls, DB writes), more workers
// than CPU cores is fine — they spend most of their time waiting.
q := queue.New(10, 100)
q.Start()
// Expose metrics over HTTP so you can watch the queue in production.
http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
m := q.Metrics()
fmt.Fprintf(w,
"enqueued=%d succeeded=%d failed=%d in_flight=%d pending=%d\n",
m.Enqueued, m.Succeeded, m.Failed, m.InFlight, m.Pending,
)
})
go http.ListenAndServe(":9090", nil)
// Simulate job submission from an API handler.
for i := 0; i < 50; i++ {
jobID := queue.JobID(fmt.Sprintf("email-%d", i))
userID := i
err := q.Enqueue(context.Background(), queue.Job{
ID: jobID,
Name: "send-welcome-email",
Payload: userID,
MaxRetry: 3,
Timeout: 5 * time.Second,
Handler: func(ctx context.Context, payload any) error {
id := payload.(int)
// Simulate sending an email.
log.Printf("sending email to user %d", id)
select {
case <-time.After(100 * time.Millisecond):
return nil
case <-ctx.Done():
return ctx.Err()
}
},
})
if err != nil {
log.Printf("enqueue failed: %v", err)
}
}
// In production, you would block on a signal here:
// sig := make(chan os.Signal, 1)
// signal.Notify(sig, syscall.SIGTERM, syscall.SIGINT)
// <-sig
time.Sleep(3 * time.Second)
log.Println("shutting down...")
q.Shutdown()
m := q.Metrics()
log.Printf("final metrics: enqueued=%d succeeded=%d failed=%d",
m.Enqueued, m.Succeeded, m.Failed)
}
The Tradeoffs, Summarized
Every design decision in this queue involved a tradeoff. Here is each one made explicit.
Fixed worker count vs. dynamic scaling. We chose fixed. Dynamic scaling — spinning up extra workers during traffic spikes — adds complexity: you need a high-water mark, a shrink timer, and synchronization around changing the worker count. For most workloads, choosing the right fixed size upfront and tuning it is simpler and more predictable. If you genuinely need elasticity, a semaphore-based concurrency limit is usually simpler than a resizable pool (a sketch follows below).
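For reference, a minimal sketch of that semaphore approach, using a buffered channel as a counting semaphore. It spawns a goroutine per job but bounds how many run at once; the function name and parameters are illustrative, not part of the queue above.

func runWithSemaphore(ctx context.Context, jobs <-chan Job, maxConcurrent int) {
    sem := make(chan struct{}, maxConcurrent) // counting semaphore
    var wg sync.WaitGroup
    for job := range jobs {
        sem <- struct{}{} // blocks while maxConcurrent jobs are in flight
        wg.Add(1)
        go func(j Job) {
            defer wg.Done()
            defer func() { <-sem }() // release the slot
            _ = j.Handler(ctx, j.Payload)
        }(job)
    }
    wg.Wait()
}

The semaphore bounds concurrency the same way the pool does, but goroutines are created and destroyed per job instead of living for the queue's lifetime.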
Buffered channel vs. explicit queue struct. A buffered channel gives you a thread-safe FIFO queue backed by the Go runtime, for free. The cost: you cannot inspect or reorder the queue, you cannot persist it across process restarts, and you cannot implement priority without multiple channels. If you need persistence or priority, a channel is the wrong primitive — use a database-backed queue (Redis lists, PostgreSQL SKIP LOCKED) instead.
Blocking Enqueue vs. non-blocking. Blocking Enqueue propagates backpressure to the caller, which is correct behavior: if the queue is full, the caller should slow down. Non-blocking Enqueue returns an error immediately and lets the caller decide what to do. Blocking is the right default for internal background workers. Non-blocking is the right default for request handlers where you cannot afford to wait.
Panic → permanent failure. We convert panics into permanent failures rather than letting the worker crash. This is correct because panics are almost always programming errors, not transient conditions. Retrying a panic will produce the same panic. The right response is to log it, discard the job, and let the worker continue with the next job.
Exponential backoff with jitter. Plain exponential backoff without jitter causes the thundering herd problem: if 1,000 jobs all fail at the same moment and retry after exactly 2 seconds, they all hit the downstream system at the same moment, causing another wave of failures, which causes another synchronized retry, and so on. Jitter desynchronizes retries. Always add jitter to exponential backoff.
What This Queue Does Not Do
This implementation is complete for many production workloads, but it has known limitations worth being explicit about.
No persistence. If the process crashes, all buffered jobs are lost. For jobs that cannot be lost — financial transactions, critical notifications — you need a persistent backend. Redis Streams, PostgreSQL with SKIP LOCKED, or a dedicated message broker are the standard answers.
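As a rough illustration of the SKIP LOCKED approach (the jobs table and its columns are assumptions for illustration; this requires database/sql and a Postgres driver, and is not part of this article's code), a worker claims one pending row like this:

func claimJob(ctx context.Context, db *sql.DB) (id int64, payload []byte, err error) {
    // Atomically pick one pending job and mark it running. SKIP LOCKED lets
    // concurrent workers claim different rows without blocking each other.
    err = db.QueryRowContext(ctx, `
        UPDATE jobs SET status = 'running'
        WHERE id = (
            SELECT id FROM jobs
            WHERE status = 'pending'
            ORDER BY id
            FOR UPDATE SKIP LOCKED
            LIMIT 1
        )
        RETURNING id, payload`).Scan(&id, &payload)
    return id, payload, err
}

Jobs survive process restarts because they live in the table, not in process memory.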
No priority. All jobs are treated equally. If you need high-priority jobs to jump the queue, the standard Go approach is two channels — a high-priority and a low-priority — with a select that always drains the high-priority channel first.
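A minimal sketch of that two-channel pattern follows. The names are illustrative, and the loop assumes both channels stay open for the worker's lifetime: the first select is a non-blocking check of the high-priority channel, and only when it is empty does the worker block on either channel.

func priorityWorker(highPriority, lowPriority <-chan Job, process func(Job)) {
    for {
        // Check the high-priority channel first without blocking.
        select {
        case job := <-highPriority:
            process(job)
            continue
        default:
        }
        // Nothing urgent waiting: block until either channel has a job.
        select {
        case job := <-highPriority:
            process(job)
        case job := <-lowPriority:
            process(job)
        }
    }
}

This gives high-priority jobs a strong preference, not a strict ordering guarantee; a low-priority job that is already being processed is never preempted.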
No distributed coordination. This queue lives in one process. For distributed job processing across multiple nodes, you need a shared broker — the queue becomes a client library that reads from and writes to the external system, not an in-process channel.
These are not bugs. They are the scope boundary. An in-process queue with a channel backend is the right tool for decoupling work within a single service. When you need to cross process or machine boundaries, the right tool is different.
Conclusion
A job queue is a concurrency problem. The channel handles the synchronization. The worker pool bounds the parallelism. The WaitGroup coordinates the shutdown. The context propagates cancellation. Each Go primitive does one job, and the queue emerges from composing them correctly.
The concurrency concepts here — fan-out, backpressure, graceful shutdown, atomic metrics — show up in almost every backend system at scale. A job queue is one of the cleaner contexts to study them in because the requirements are concrete and the failure modes are visible.
The full implementation is available on GitHub.