NATS Messaging: Complete Guide to JetStream and Distributed Systems

The cloud-native messaging system that powers microservices, IoT, and event-driven architectures. Learn NATS Core, JetStream persistence, and production deployment patterns.

Table of Contents

1. What Is NATS?
2. NATS Core vs JetStream
3. Architecture and Topology
4. Getting Started with NATS
5. JetStream Deep Dive
6. Real-World Use Cases
7. NATS vs Kafka vs RabbitMQ
8. Production Deployment
9. Monitoring and Observability

1. What Is NATS?

NATS is a high-performance messaging system designed for cloud-native applications, microservices, and distributed systems. Originally created by Derek Collison at VMware in 2010, NATS has evolved from a simple pub/sub system into a comprehensive messaging platform that powers some of the world's largest distributed architectures.

The name "NATS" originally stood for Neural Autonomic Transport System, though today it is treated simply as a name rather than an acronym. What started as an internal project at VMware became a Cloud Native Computing Foundation (CNCF) incubating project in 2018, cementing its place in the cloud-native ecosystem.

Core Design Principles

NATS was built with specific principles that differentiate it from traditional message brokers:

Simplicity: NATS has a simple, text-based protocol that's easy to understand and implement. The server is a single binary with no external dependencies. You can be up and running in minutes, not hours.

Performance: Written in Go, NATS is designed for speed. It can handle millions of messages per second with minimal latency. The server uses efficient memory management and lock-free data structures to maximize throughput.

Availability: NATS follows an "always on" philosophy. Messages are delivered to subscribers that are currently connected; if no subscriber exists for a subject, the message is simply dropped. This prevents resource exhaustion and keeps the system responsive.

Scalability: NATS scales horizontally through clustering. You can add nodes to handle more connections and throughput. The protocol is designed to minimize chatter between nodes, keeping cluster overhead low.

Security: Modern security features including TLS, authentication (token, username/password, NKEYS, JWT), and authorization are built-in. NATS supports decentralized security through JWTs and account-based isolation.

Messaging Patterns

NATS supports multiple messaging patterns:

Publish-Subscribe (Pub/Sub): Publishers send messages to subjects; all subscribers receive a copy. This enables broadcast patterns and event notification systems.

Request-Reply: A synchronous pattern where a client sends a request and waits for a response. NATS handles the reply addressing automatically through inbox subjects.

Queue Groups: Multiple subscribers form a queue group; messages are distributed among them. This provides load balancing and horizontal scaling of consumers.

Subject Hierarchies: Subjects use dot notation (e.g., orders.us.east) with wildcards for flexible routing. * matches exactly one token; > matches one or more tokens and may only appear as the final token of a subscription.

Subject Examples

orders.*.created matches orders.us.created and orders.eu.created but not orders.us.east.created.

orders.> matches orders.us, orders.us.east, and orders.us.east.nyc.
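The matching rules can be made concrete with a small stand-alone sketch. This is an illustrative re-implementation of the wildcard semantics, not code from the NATS server or client:

```go
package main

import (
	"fmt"
	"strings"
)

// matchSubject reports whether a literal subject matches a subscription
// pattern using NATS wildcard rules: "*" matches exactly one token and
// ">" matches one or more trailing tokens.
func matchSubject(pattern, subject string) bool {
	pTokens := strings.Split(pattern, ".")
	sTokens := strings.Split(subject, ".")

	for i, pt := range pTokens {
		if pt == ">" {
			// ">" must be the final pattern token and needs at
			// least one subject token left to consume.
			return i == len(pTokens)-1 && len(sTokens) > i
		}
		if i >= len(sTokens) {
			return false // subject ran out of tokens
		}
		if pt != "*" && pt != sTokens[i] {
			return false // literal token mismatch
		}
	}
	return len(pTokens) == len(sTokens)
}

func main() {
	fmt.Println(matchSubject("orders.*.created", "orders.us.created"))      // true
	fmt.Println(matchSubject("orders.*.created", "orders.us.east.created")) // false
	fmt.Println(matchSubject("orders.>", "orders.us.east.nyc"))             // true
	fmt.Println(matchSubject("orders.>", "orders"))                         // false
}
```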

2. NATS Core vs JetStream

NATS comes in two flavors: Core NATS and JetStream. Understanding the difference is crucial for choosing the right solution.

NATS Core

Core NATS is the original, in-memory messaging system, designed for speed and simplicity: messages are routed entirely in memory, delivery is at-most-once, and nothing is persisted.

Core NATS is perfect for real-time data, telemetry, and scenarios where losing occasional messages is acceptable.

JetStream

JetStream is NATS's built-in persistence layer, added to support enterprise messaging patterns: streams persist messages to disk (or memory) with optional replication, consumers track their own position, and delivery is at-least-once.

JetStream adds the durability and reliability needed for event sourcing, CQRS, and mission-critical messaging.

Feature | Core NATS | JetStream
Persistence | None (in-memory) | Disk-based with replication
Delivery guarantee | At-most-once | At-least-once
Message replay | No | Yes (by time or sequence)
Consumer durability | Ephemeral only | Durable with state
Retention policies | N/A | Limits, interest, work queue
Use case | Real-time, telemetry | Events, logs, commands

3. Architecture and Topology

Single Server

The simplest NATS deployment is a single server:

# Start NATS server
nats-server

# With JetStream enabled
nats-server -js

This is suitable for development and small deployments. The server handles all connections, routing, and (with JetStream) persistence.

Cluster Mode

For production, run NATS in cluster mode:

# Server 1
nats-server --cluster_name my_cluster \
    --cluster nats://0.0.0.0:6222 \
    --routes nats://nats-2:6222,nats://nats-3:6222

# Server 2
nats-server --cluster_name my_cluster \
    --cluster nats://0.0.0.0:6222 \
    --routes nats://nats-1:6222,nats://nats-3:6222

# Server 3
nats-server --cluster_name my_cluster \
    --cluster nats://0.0.0.0:6222 \
    --routes nats://nats-1:6222,nats://nats-2:6222

In cluster mode, the servers form a full mesh over their route connections, subject interest is propagated between nodes so messages are forwarded only where subscribers exist, and clients can connect to any node and automatically discover (and fail over to) the rest of the cluster.

Superclusters

For geographically distributed deployments, NATS supports superclusters—federated clusters connected through gateway connections:

┌──────────────┐      Gateway      ┌──────────────┐
│   Cluster    │ <────────────────> │   Cluster    │
│   (US-East)  │      (TLS)        │   (EU-West)  │
└──────────────┘                   └──────────────┘
       │                                  │
       │ Gateway                          │ Gateway
       ▼                                  ▼
┌──────────────┐                   ┌──────────────┐
│   Cluster    │ <────────────────> │   Cluster    │
│   (US-West)  │                   │   (APAC)     │
└──────────────┘                   └──────────────┘

Superclusters enable global deployments with local latency, cross-region disaster recovery, and interest-based routing: messages only cross a gateway when the remote cluster has a matching subscriber.
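Each server in a supercluster declares its own gateway and the remote gateways it should reach. The sketch below shows the general shape of a gateway block in the server configuration; the cluster names, hostnames, and port are placeholders:

```
# nats.conf on a server in the US-East cluster
gateway {
  name: "US-East"
  port: 7222
  gateways: [
    {name: "EU-West", url: "nats://eu-west.example.com:7222"},
    {name: "US-West", url: "nats://us-west.example.com:7222"},
  ]
}
```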

Leaf Nodes

Leaf nodes extend NATS to edge locations. They're lightweight NATS servers that connect to a cluster or supercluster:

┌─────────────────────────────────────┐
│           NATS Cluster              │
│  ┌─────────┐      ┌─────────┐       │
│  │ Server  │<────>│ Server  │       │
│  │    1    │      │    2    │       │
│  └────┬────┘      └────┬────┘       │
│       │                │             │
│       └────────────────┘             │
│              │                       │
└──────────────┼───────────────────────┘
               │ Leaf Node Connection
       ┌───────┴───────┐
       ▼               ▼
┌─────────────┐ ┌─────────────┐
│  Leaf Node  │ │  Leaf Node  │
│  (Factory)  │ │   (Store)   │
└─────────────┘ └─────────────┘

Leaf nodes are ideal for edge and IoT deployments, sites with intermittent connectivity, and keeping local traffic local while transparently bridging selected subjects to the central cluster.
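A leaf node setup has two halves: the hub accepts leaf connections, and the edge server dials in. A minimal sketch of both configurations (hostname and port are placeholders):

```
# nats.conf on the hub cluster: accept leaf node connections
leafnodes {
  port: 7422
}

# nats.conf on the edge server: connect up to the hub
leafnodes {
  remotes: [
    {url: "nats://hub.example.com:7422"}
  ]
}
```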

4. Getting Started with NATS

Installation

# macOS
brew install nats-server nats

# Linux (using official binaries)
curl -sf https://get-nats.io | sh

# Docker
docker run -p 4222:4222 nats:latest -js

Basic Pub/Sub in Go

package main

import (
    "fmt"
    "log"
    "time"

    "github.com/nats-io/nats.go"
)

func main() {
    // Connect to NATS
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Close()

    // Subscribe to subject
    sub, err := nc.Subscribe("updates", func(m *nats.Msg) {
        fmt.Printf("Received: %s\n", string(m.Data))
    })
    if err != nil {
        log.Fatal(err)
    }
    defer sub.Unsubscribe()

    // Publish messages
    for i := 0; i < 5; i++ {
        msg := fmt.Sprintf("Message %d", i)
        nc.Publish("updates", []byte(msg))
        time.Sleep(time.Second)
    }

    time.Sleep(2 * time.Second)
}

Request-Reply Pattern

// Server (responder)
nc.Subscribe("help.request", func(m *nats.Msg) {
    // Process request
    response := fmt.Sprintf("Help for: %s", string(m.Data))
    
    // Reply to the reply subject (inbox)
    m.Respond([]byte(response))
})

// Client (requestor)
msg, err := nc.Request("help.request", []byte("How do I..."), 5*time.Second)
if err != nil {
    log.Fatal("Request failed:", err)
}
fmt.Printf("Response: %s\n", string(msg.Data))

Queue Groups

// Multiple subscribers in the same queue group
// Messages are distributed among them (load balancing)

// Worker 1
nc.QueueSubscribe("tasks", "workers", func(m *nats.Msg) {
    fmt.Printf("Worker 1 processing: %s\n", string(m.Data))
})

// Worker 2
nc.QueueSubscribe("tasks", "workers", func(m *nats.Msg) {
    fmt.Printf("Worker 2 processing: %s\n", string(m.Data))
})

// Worker 3
nc.QueueSubscribe("tasks", "workers", func(m *nats.Msg) {
    fmt.Printf("Worker 3 processing: %s\n", string(m.Data))
})

// Messages sent to "tasks" are distributed round-robin

Wildcards

// Subscribe to all US orders
nc.Subscribe("orders.us.*", func(m *nats.Msg) {
    // Matches: orders.us.east, orders.us.west, orders.us.central
    // Does NOT match: orders.us.east.nyc
})

// Subscribe to all orders recursively
nc.Subscribe("orders.>", func(m *nats.Msg) {
    // Matches: orders.us, orders.us.east, orders.us.east.nyc
    // Matches: orders.eu, orders.eu.germany, etc.
})

// Combine wildcards
nc.Subscribe("*.orders.>", func(m *nats.Msg) {
    // Matches: us.orders.created, eu.orders.updated.processed
})

5. JetStream Deep Dive

Streams

A stream is a persistent message log. Messages are appended and can be replayed:

// Create a stream
js, err := nc.JetStream()
if err != nil {
    log.Fatal(err)
}

_, err = js.AddStream(&nats.StreamConfig{
    Name:      "ORDERS",
    Subjects:  []string{"orders.*"},
    Storage:   nats.FileStorage,        // or nats.MemoryStorage
    Replicas:  3,                       // Replicate across 3 servers
    Retention: nats.LimitsPolicy,       // Discard old messages at the limits
    MaxMsgs:   1000000,                 // Keep the last 1M messages
    MaxBytes:  10 * 1024 * 1024 * 1024, // 10GB
})
if err != nil {
    log.Fatal(err)
}

// Publish to stream
ack, err := js.Publish("orders.us", []byte(`{"id": "123", "total": 99.99}`))
if err != nil {
    log.Fatal(err)
}
fmt.Printf("Published to stream: %s, sequence: %d\n", ack.Stream, ack.Sequence)

Consumers

Consumers read from streams. They can be ephemeral or durable:

// Create a durable push consumer (subscribe by subject, not stream name)
sub, err := js.Subscribe("orders.*", func(m *nats.Msg) {
    // Process message
    fmt.Printf("Order: %s\n", string(m.Data))

    // Acknowledge successful processing
    m.Ack()
}, nats.Durable("order-processor"), // Durable name
    nats.ManualAck(),               // Manual acknowledgment
    nats.MaxDeliver(3),             // Deliver at most 3 times
    nats.AckWait(30*time.Second))   // Redeliver if not acked within 30s
if err != nil {
    log.Fatal(err)
}

// Create a pull consumer (batch processing)
sub, err = js.PullSubscribe("orders.*", "batch-processor")
if err != nil {
    log.Fatal(err)
}

// Fetch a batch of messages
msgs, err := sub.Fetch(100)
if err != nil {
    log.Fatal(err)
}
for _, msg := range msgs {
    // Process
    msg.Ack()
}

Acknowledgment Policies

Policy | Behavior | Use case
AckExplicit | Client must explicitly ack each message | Reliable processing
AckAll | Acking message N acks all up to N | Batch processing
AckNone | No acknowledgment needed | Fire-and-forget
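The AckAll floor semantics can be illustrated with a tiny stand-alone simulation (plain Go, not JetStream code): acknowledging sequence N implicitly acknowledges every pending message with a lower sequence.

```go
package main

import "fmt"

// ackAll simulates JetStream's AckAll policy on a set of pending
// sequence numbers: acking sequence n removes every pending
// message with a sequence <= n.
func ackAll(pending map[uint64]bool, n uint64) {
	for seq := range pending {
		if seq <= n {
			delete(pending, seq)
		}
	}
}

func main() {
	pending := map[uint64]bool{1: true, 2: true, 3: true, 4: true, 5: true}
	ackAll(pending, 3)        // one ack covers sequences 1, 2, and 3
	fmt.Println(len(pending)) // 2 (sequences 4 and 5 remain)
}
```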

Replay Policies

// Replay from the beginning
sub, _ := js.Subscribe("orders.*", handler,
    nats.ReplayInstant(), // Default: deliver as fast as possible
    nats.DeliverAll())    // Start from the first message

// Replay from a specific time
sub, _ = js.Subscribe("orders.*", handler,
    nats.StartTime(time.Now().Add(-24*time.Hour)))

// Replay from a specific sequence
sub, _ = js.Subscribe("orders.*", handler,
    nats.StartSequence(1000))

// Only new messages
sub, _ = js.Subscribe("orders.*", handler,
    nats.DeliverNew())

Retention Policies

// LimitsPolicy: Keep messages until limits reached (default)
js.AddStream(&nats.StreamConfig{
    Name:      "EVENTS",
    Retention: nats.LimitsPolicy,
    MaxMsgs:   1000000,
    MaxBytes:  1 * 1024 * 1024 * 1024, // 1GB
})

// InterestPolicy: Delete when all consumers have acked
js.AddStream(&nats.StreamConfig{
    Name:      "COMMANDS",
    Retention: nats.InterestPolicy,
})

// WorkQueuePolicy: Delete after successful processing
// Only one consumer can process each message
js.AddStream(&nats.StreamConfig{
    Name:      "JOBS",
    Retention: nats.WorkQueuePolicy,
})

6. Real-World Use Cases

Microservices Communication

NATS serves as the nervous system for microservices:

// Service Discovery
nc.Subscribe("discovery.register", func(m *nats.Msg) {
    var svc ServiceInfo
    json.Unmarshal(m.Data, &svc)
    registry.Register(svc)
})

// Load Balanced RPC
nc.QueueSubscribe("payments.process", "payment-workers", func(m *nats.Msg) {
    var payment PaymentRequest
    json.Unmarshal(m.Data, &payment)
    
    result := processPayment(payment)
    response, _ := json.Marshal(result)
    m.Respond(response)
})

// Event Broadcasting
nc.Publish("inventory.updated", eventData)
// Multiple services react: cache invalidation, search index update, analytics

IoT and Telemetry

NATS handles millions of IoT devices:

// Device telemetry (Core NATS - fire and forget)
deviceID := "sensor-001"
nc.Publish(fmt.Sprintf("telemetry.%s", deviceID), []byte(`{
    "temperature": 23.5,
    "humidity": 65,
    "timestamp": "2026-03-17T10:00:00Z"
}`))

// Device commands (Request-Reply)
response, err := nc.Request(
    fmt.Sprintf("commands.%s", deviceID),
    []byte(`{"action": "reboot"}`),
    5*time.Second)

// Alert processing (JetStream for durability)
js.Publish("alerts.critical", []byte(`{
    "device": "sensor-001",
    "alert": "temperature_exceeded",
    "value": 85.2
}`))

Event Sourcing and CQRS

// Command side: Store events
func (s *OrderService) CreateOrder(cmd CreateOrderCommand) error {
    event := OrderCreatedEvent{
        OrderID:   uuid.New(),
        CustomerID: cmd.CustomerID,
        Items:     cmd.Items,
        Timestamp: time.Now(),
    }
    
    data, _ := json.Marshal(event)
    _, err := s.js.Publish("events.order.created", data)
    return err
}

// Query side: Project read models
sub, _ := s.js.Subscribe("events.order.>", func(m *nats.Msg) {
    var event Event
    json.Unmarshal(m.Data, &event)
    
    switch event.Type {
    case "OrderCreated":
        projection.HandleOrderCreated(event)
    case "OrderShipped":
        projection.HandleOrderShipped(event)
    }
    
    m.Ack()
}, nats.Durable("order-projection"))

// Replay to rebuild projections
sub, _ = s.js.Subscribe("events.order.>", handler,
    nats.DeliverAll(),      // Replay all events
    nats.ReplayOriginal()) // Maintain original timing

Distributed Transactions (Saga Pattern)

// Saga orchestrator
type OrderSaga struct {
    steps []SagaStep
}

func (s *OrderSaga) Execute(order Order) error {
    // Step 1: Reserve inventory
    err := s.reserveInventory(order)
    if err != nil {
        return s.compensate(order, 0)
    }
    
    // Step 2: Process payment
    err = s.processPayment(order)
    if err != nil {
        return s.compensate(order, 1)
    }
    
    // Step 3: Ship order
    err = s.shipOrder(order)
    if err != nil {
        return s.compensate(order, 2)
    }
    
    return nil
}

func (s *OrderSaga) compensate(order Order, failedStep int) error {
    // Undo the steps that completed before the failure, in reverse order
    for i := failedStep - 1; i >= 0; i-- {
        s.steps[i].Compensate(order)
    }
    return fmt.Errorf("saga failed at step %d", failedStep+1)
}

// Services communicate via NATS for saga coordination
nc.Subscribe("saga.inventory.reserve", handleReserveInventory)
nc.Subscribe("saga.inventory.release", handleReleaseInventory) // Compensation
nc.Subscribe("saga.payment.process", handleProcessPayment)
nc.Subscribe("saga.payment.refund", handleRefundPayment)     // Compensation

7. NATS vs Kafka vs RabbitMQ

Aspect | NATS | Kafka | RabbitMQ
Protocol | Text-based, simple | Binary (TCP) | AMQP
Throughput | Millions msg/sec | Millions msg/sec | Hundreds of thousands
Latency | Sub-millisecond | Low (ms) | Low (ms)
Persistence | Optional (JetStream) | Always (disk) | Optional (queues)
Deployment | Single binary | ZooKeeper/KRaft + brokers | Single binary or cluster
Resource usage | Low | High (JVM) | Medium
Multi-tenancy | Accounts (JWT) | Limited | Virtual hosts
Geo-replication | Superclusters | MirrorMaker | Federation/Shovel
Best for | Cloud-native, IoT, control plane | Big data, event sourcing | Enterprise messaging

When to Choose Each

Choose NATS when: you need low-latency messaging for microservices, IoT, or control-plane traffic; operational simplicity matters (a single binary with no external dependencies); or you want persistence to be optional (JetStream) rather than mandatory.

Choose Kafka when: you need long-term retention and high-volume replay of event data; partitioned, ordered logs are central to your design; or you already depend on the Kafka ecosystem (Connect, Streams, big-data tooling).

Choose RabbitMQ when: you need AMQP compatibility or sophisticated routing through exchanges and bindings; you integrate with enterprise systems that already speak AMQP; or per-message routing matters more than raw throughput.

8. Production Deployment

Docker Compose Setup

# docker-compose.yml
version: '3.8'
services:
  nats-1:
    image: nats:2.10-alpine
    command: >
      --name nats-1
      --cluster_name my_cluster
      --cluster nats://0.0.0.0:6222
      --routes nats://nats-2:6222,nats://nats-3:6222
      --js
      --store_dir /data
      --http_port 8222
    volumes:
      - nats-1-data:/data
    ports:
      - "4222:4222"
      - "8222:8222"
    networks:
      - nats

  nats-2:
    image: nats:2.10-alpine
    command: >
      --name nats-2
      --cluster_name my_cluster
      --cluster nats://0.0.0.0:6222
      --routes nats://nats-1:6222,nats://nats-3:6222
      --js
      --store_dir /data
    volumes:
      - nats-2-data:/data
    networks:
      - nats

  nats-3:
    image: nats:2.10-alpine
    command: >
      --name nats-3
      --cluster_name my_cluster
      --cluster nats://0.0.0.0:6222
      --routes nats://nats-1:6222,nats://nats-2:6222
      --js
      --store_dir /data
    volumes:
      - nats-3-data:/data
    networks:
      - nats

volumes:
  nats-1-data:
  nats-2-data:
  nats-3-data:

networks:
  nats:
    driver: bridge

Kubernetes Deployment

# nats-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nats
spec:
  serviceName: nats
  replicas: 3
  selector:
    matchLabels:
      app: nats
  template:
    metadata:
      labels:
        app: nats
    spec:
      containers:
      - name: nats
        image: nats:2.10-alpine
        command:
        - nats-server
        - --config
        - /etc/nats/nats.conf
        ports:
        - containerPort: 4222
          name: client
        - containerPort: 6222
          name: cluster
        - containerPort: 8222
          name: monitor
        volumeMounts:
        - name: config
          mountPath: /etc/nats
        - name: data
          mountPath: /data
      volumes:
      - name: config
        configMap:
          name: nats-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nats-config
data:
  nats.conf: |
    listen: 0.0.0.0:4222
    http: 0.0.0.0:8222
    
    jetstream {
      store_dir: /data
      max_memory_store: 1GB
      max_file_store: 10GB
    }
    
    cluster {
      name: my_cluster
      listen: 0.0.0.0:6222
      routes: [
        nats://nats-0.nats:6222
        nats://nats-1.nats:6222
        nats://nats-2.nats:6222
      ]
    }
    
    tls {
      # requires a certificate volume mounted at /etc/nats/certs
      cert_file: /etc/nats/certs/server.crt
      key_file: /etc/nats/certs/server.key
      ca_file: /etc/nats/certs/ca.crt
    }

---
apiVersion: v1
kind: Service
metadata:
  name: nats
spec:
  selector:
    app: nats
  ports:
  - name: client
    port: 4222
    targetPort: 4222
  - name: cluster
    port: 6222
    targetPort: 6222
  clusterIP: None  # Headless for StatefulSet

TLS Configuration

// nats.conf with TLS
tls {
    cert_file: "/etc/nats/certs/server.crt"
    key_file: "/etc/nats/certs/server.key"
    ca_file: "/etc/nats/certs/ca.crt"
    verify: true
    verify_and_map: true  # Map certificate to user
}

// Client connection with TLS and a client certificate (mTLS)
nc, err := nats.Connect("tls://nats.example.com:4222",
    nats.ClientCert("/certs/client.crt", "/certs/client.key"),
    nats.RootCAs("/certs/ca.crt"))
if err != nil {
    log.Fatal(err)
}

// mTLS plus JWT credentials (decentralized authentication)
nc, err = nats.Connect("tls://nats.example.com:4222",
    nats.ClientCert("/certs/client.crt", "/certs/client.key"),
    nats.RootCAs("/certs/ca.crt"),
    nats.UserCredentials("/creds/user.creds"))
if err != nil {
    log.Fatal(err)
}

9. Monitoring and Observability

Built-in Monitoring

NATS exposes metrics via HTTP endpoint:

# Enable monitoring
nats-server -m 8222

# Access metrics
curl http://localhost:8222/varz
curl http://localhost:8222/connz
curl http://localhost:8222/routez
curl http://localhost:8222/subsz

# JetStream specific
curl http://localhost:8222/jsz

Prometheus Integration

# prometheus.yml
scrape_configs:
  - job_name: 'nats'
    static_configs:
      - targets: ['nats:8222']
    metrics_path: /metrics

# Using NATS Prometheus exporter
# https://github.com/nats-io/prometheus-nats-exporter
prometheus-nats-exporter -varz -connz -routez -subz -jsz http://nats:8222

Key Metrics to Monitor

Metric | Description | Alert threshold
nats_connections | Active client connections | > 10000
nats_subscriptions | Total subscriptions | > 100000
nats_in_msgs | Messages received | Baseline + 50%
nats_slow_consumers | Dropped messages (slow consumers) | > 0
jetstream_storage_used_bytes | JetStream disk usage | > 80%
jetstream_consumer_pending | Pending messages per consumer | > 10000
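The thresholds above can be encoded as Prometheus alerting rules. The sketch below uses the metric name from the slow-consumers row of the table; actual metric names vary by exporter version, so verify them against what your exporter exposes before deploying:

```yaml
# alert-rules.yml
groups:
- name: nats
  rules:
  - alert: NATSSlowConsumers
    expr: nats_slow_consumers > 0   # metric name as listed in the table above
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "NATS is dropping messages to slow consumers"
```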

Distributed Tracing

// Instrument NATS with OpenTelemetry
import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/propagation"
)

// Publish with trace context
func publishWithTrace(ctx context.Context, nc *nats.Conn, subject string, data []byte) error {
    tracer := otel.Tracer("nats-publisher")
    _, span := tracer.Start(ctx, "nats.publish")
    defer span.End()
    
    // Inject trace context into message headers
    carrier := propagation.MapCarrier{}
    otel.GetTextMapPropagator().Inject(ctx, carrier)
    
    msg := &nats.Msg{
        Subject: subject,
        Data:    data,
        Header:  nats.Header(carrier),
    }
    
    return nc.PublishMsg(msg)
}

// Subscribe and extract trace context
nc.Subscribe("orders.>", func(m *nats.Msg) {
    carrier := propagation.MapCarrier(m.Header)
    ctx := otel.GetTextMapPropagator().Extract(context.Background(), carrier)
    
    tracer := otel.Tracer("nats-subscriber")
    _, span := tracer.Start(ctx, "nats.process")
    defer span.End()
    
    // Process message with trace context
    processOrder(m.Data)
})

NATS: The Cloud-Native Choice

NATS combines simplicity, performance, and flexibility in a way that makes it ideal for modern distributed systems. Whether you need fire-and-forget messaging for IoT or durable streams for event sourcing, NATS delivers with minimal operational overhead. Its cloud-native design, from single binary deployment to supercluster federation, makes it a future-proof choice for your messaging infrastructure.

Resources