Temporal.io: Durable Execution for Reliable Distributed Workflows

Build resilient distributed systems that survive process crashes, network failures, and data center outages. Complete guide to Temporal workflows, activities, and production patterns.

The Problem: Building Reliable Distributed Systems

Imagine you're building an e-commerce platform. When a customer places an order, you need to:

  1. Validate the order and check inventory
  2. Reserve payment from the customer's credit card
  3. Create a shipment with the logistics provider
  4. Send confirmation emails
  5. Update analytics and inventory systems

This sounds straightforward. But what happens when:

⚠️ The Fallacies of Distributed Computing

The network is reliable. Latency is zero. Bandwidth is infinite. The network is secure. Topology doesn't change. There is one administrator. Transport cost is zero. The network is homogeneous. All of these are false.

Traditional approaches—saga orchestration with message queues, explicit state machines, or hand-rolled retry logic—quickly become unmaintainable. You write more code handling failure than business logic.

What is Temporal?

Temporal is an open-source, stateful orchestration platform for building reliable distributed systems. It was created by the team behind AWS Simple Workflow Service and open-sourced in 2019. Companies like Stripe, Netflix, Snap, Datadog, and Coinbase use Temporal in production.

The Core Promise: Durable Execution

Temporal's key innovation is durable execution. Your workflow code runs in a way that automatically survives:

💡 Durable Execution Explained

Temporal automatically records every step of your workflow in an event log. If your worker crashes after completing step 3 of a 10-step workflow, Temporal resumes execution from step 4 when a new worker comes online. From your code's perspective, it's as if nothing happened.

Architecture Overview

Your Application
Temporal Client
Temporal Server
↓ distributes work to
Worker 1
Worker 2
Worker N
Workers execute Activities (external operations). Temporal Server manages state.

Core Concepts

Workflows

Workflows are the orchestration logic. They define what needs to happen, not how. Workflows are:

Activities

Activities are the operations workflows coordinate. They're where the actual work happens:

📋 The Golden Rule

Workflows must be deterministic. They can only call Activities, not external services directly.

Activities can do anything—call APIs, use randomness, get timestamps, etc.

Your First Workflow

Setup

Terminal - Install Temporal CLI
# macOS/Linux
curl -sSf https://temporal.download/cli.sh | sh

# Verify
temporal --version
# temporal version 1.2.0
Terminal - Start Local Development Server
# Starts Temporal Server, Web UI, and Elasticsearch
temporal server start-dev --ui-port 8080

# Access:
# - Temporal Server: localhost:7233
# - Web UI: http://localhost:8080
# - gRPC: localhost:7233

Go Example: Order Processing Workflow

workflows/order_workflow.go
package workflows

import (
    "time"
    
    "go.temporal.io/sdk/workflow"
)

// OrderWorkflowInput defines the workflow input
type OrderWorkflowInput struct {
    OrderID    string
    CustomerID string
    Items      []OrderItem
}

type OrderItem struct {
    SKU      string
    Quantity int
    Price    float64
}

// OrderWorkflowResult contains the workflow output
type OrderWorkflowResult struct {
    OrderID       string
    Status        string
    Total         float64
    ShipmentID    string
    CompletedAt   time.Time
}

// OrderWorkflow is the main workflow definition
func OrderWorkflow(ctx workflow.Context, input OrderWorkflowInput) (*OrderWorkflowResult, error) {
    // Set activity options - timeout and retry policy
    ao := workflow.ActivityOptions{
        StartToCloseTimeout: time.Minute,
        RetryPolicy: &temporal.RetryPolicy{
            InitialInterval:    time.Second,
            BackoffCoefficient: 2.0,
            MaximumInterval:    time.Minute,
            MaximumAttempts:    5,
        },
    }
    ctx = workflow.WithActivityOptions(ctx, ao)
    
    // Step 1: Validate order and check inventory
    var validationResult ValidationResult
    err := workflow.ExecuteActivity(ctx, ValidateOrder, input).Get(ctx, &validationResult)
    if err != nil {
        return nil, err
    }
    
    if !validationResult.Valid {
        return &OrderWorkflowResult{
            OrderID: input.OrderID,
            Status:  "REJECTED",
        }, nil
    }
    
    // Step 2: Process payment
    paymentInput := PaymentInput{
        CustomerID: input.CustomerID,
        Amount:     validationResult.Total,
    }
    
    var paymentResult PaymentResult
    err = workflow.ExecuteActivity(ctx, ProcessPayment, paymentInput).Get(ctx, &paymentResult)
    if err != nil {
        // Payment failed - no need to compensate yet, nothing was charged
        return &OrderWorkflowResult{
            OrderID: input.OrderID,
            Status:  "PAYMENT_FAILED",
        }, nil
    }
    
    // Step 3: Create shipment
    shipmentInput := ShipmentInput{
        OrderID: input.OrderID,
        Items:   input.Items,
    }
    
    var shipmentResult ShipmentResult
    err = workflow.ExecuteActivity(ctx, CreateShipment, shipmentInput).Get(ctx, &shipmentResult)
    if err != nil {
        // Shipment failed - need to refund payment (compensation)
        _ = workflow.ExecuteActivity(ctx, RefundPayment, paymentResult.TransactionID).Get(ctx, nil)
        return &OrderWorkflowResult{
            OrderID: input.OrderID,
            Status:  "SHIPMENT_FAILED",
        }, nil
    }
    
    // Step 4: Send confirmation (fire-and-forget, can fail without affecting order)
    workflow.ExecuteActivity(ctx, SendConfirmation, input.OrderID)
    
    // Return success
    return &OrderWorkflowResult{
        OrderID:     input.OrderID,
        Status:      "COMPLETED",
        Total:       validationResult.Total,
        ShipmentID:  shipmentResult.ShipmentID,
        CompletedAt: workflow.Now(ctx),
    }, nil
}
activities/order_activities.go
package activities

import (
    "context"
    "fmt"
    "time"
)

// ValidateOrder checks inventory and calculates total
func ValidateOrder(ctx context.Context, input OrderWorkflowInput) (*ValidationResult, error) {
    // This is where you'd call your inventory service
    // Can fail, will be retried according to retry policy
    
    total := 0.0
    for _, item := range input.Items {
        // Check inventory via API
        available, err := inventoryClient.CheckStock(item.SKU)
        if err != nil {
            return nil, fmt.Errorf("inventory check failed: %w", err)
        }
        if available < item.Quantity {
            return &ValidationResult{Valid: false}, nil
        }
        total += float64(item.Quantity) * item.Price
    }
    
    return &ValidationResult{
        Valid: true,
        Total: total,
    }, nil
}

// ProcessPayment charges the customer's payment method
func ProcessPayment(ctx context.Context, input PaymentInput) (*PaymentResult, error) {
    // Call payment processor (Stripe, etc.)
    // This activity is non-idempotent by default - Temporal tracks completion
    
    result, err := paymentClient.Charge(input.CustomerID, input.Amount)
    if err != nil {
        return nil, fmt.Errorf("payment failed: %w", err)
    }
    
    return &PaymentResult{
        TransactionID: result.ID,
        Status:        result.Status,
    }, nil
}

// CreateShipment reserves shipping with logistics provider
func CreateShipment(ctx context.Context, input ShipmentInput) (*ShipmentResult, error) {
    // Call shipping API (FedEx, UPS, etc.)
    shipment, err := shippingClient.CreateShipment(input.Items)
    if err != nil {
        return nil, fmt.Errorf("shipment creation failed: %w", err)
    }
    
    return &ShipmentResult{
        ShipmentID: shipment.ID,
        LabelURL:   shipment.LabelURL,
    }, nil
}

// RefundPayment compensates failed transactions
func RefundPayment(ctx context.Context, transactionID string) error {
    return paymentClient.Refund(transactionID)
}

// SendConfirmation sends email/SMS confirmation
func SendConfirmation(ctx context.Context, orderID string) error {
    // Fire-and-forget activity - failures don't block workflow
    return emailClient.SendOrderConfirmation(orderID)
}
main.go - Worker and Client Setup
package main

import (
    "log"
    
    "go.temporal.io/sdk/client"
    "go.temporal.io/sdk/worker"
    
    "myapp/activities"
    "myapp/workflows"
)

func main() {
    // Create Temporal client
    c, err := client.New(client.Options{
        HostPort: "localhost:7233",
    })
    if err != nil {
        log.Fatalln("Unable to create Temporal client:", err)
    }
    defer c.Close()
    
    // Start workflow
    workflowOptions := client.StartWorkflowOptions{
        ID:        "order-12345",
        TaskQueue: "order-queue",
    }
    
    input := workflows.OrderWorkflowInput{
        OrderID:    "order-12345",
        CustomerID: "cust-98765",
        Items: []workflows.OrderItem{
            {SKU: "WIDGET-001", Quantity: 2, Price: 29.99},
        },
    }
    
    we, err := c.ExecuteWorkflow(context.Background(), workflowOptions, workflows.OrderWorkflow, input)
    if err != nil {
        log.Fatalln("Unable to execute workflow:", err)
    }
    
    log.Println("Started workflow:", we.GetID(), "RunID:", we.GetRunID())
    
    // Block until complete (in production, you'd get result asynchronously)
    var result workflows.OrderWorkflowResult
    if err := we.Get(context.Background(), &result); err != nil {
        log.Fatalln("Workflow failed:", err)
    }
    
    log.Printf("Workflow completed: %+v\n", result)
}

// Worker setup (typically runs in separate process)
func startWorker() {
    c, err := client.New(client.Options{
        HostPort: "localhost:7233",
    })
    if err != nil {
        log.Fatalln("Unable to create Temporal client:", err)
    }
    defer c.Close()
    
    // Create worker
    w := worker.New(c, "order-queue", worker.Options{})
    
    // Register workflow and activities
    w.RegisterWorkflow(workflows.OrderWorkflow)
    w.RegisterActivity(activities.ValidateOrder)
    w.RegisterActivity(activities.ProcessPayment)
    w.RegisterActivity(activities.CreateShipment)
    w.RegisterActivity(activities.RefundPayment)
    w.RegisterActivity(activities.SendConfirmation)
    
    // Start worker
    if err := w.Run(worker.InterruptCh()); err != nil {
        log.Fatalln("Unable to start worker:", err)
    }
}

Workflow Patterns

Pattern 1: Fan-Out/Fan-In (Parallel Execution)

Parallel Activity Execution
func ProcessBatchWorkflow(ctx workflow.Context, orders []Order) error {
    // Execute all activities in parallel
    futures := make([]workflow.Future, len(orders))
    
    for i, order := range orders {
        // Create detached context for parallel execution
        activityCtx := workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
            StartToCloseTimeout: time.Minute,
        })
        
        futures[i] = workflow.ExecuteActivity(activityCtx, ProcessOrder, order)
    }
    
    // Wait for all to complete
    for i, future := range futures {
        var result OrderResult
        if err := future.Get(ctx, &result); err != nil {
            log.Printf("Order %d failed: %v\n", i, err)
        }
    }
    
    return nil
}

Pattern 2: Child Workflows

For complex processes, decompose into child workflows:

Parent Workflow with Child Workflows
func OrderFulfillmentWorkflow(ctx workflow.Context, order Order) error {
    // Payment workflow (child)
    paymentFuture := workflow.ExecuteChildWorkflow(
        ctx,
        PaymentWorkflow,
        order.Payment,
        workflow.ChildWorkflowOptions{
            WorkflowID: "payment-" + order.ID,
        },
    )
    
    // Inventory workflow (child, parallel)
    inventoryFuture := workflow.ExecuteChildWorkflow(
        ctx,
        InventoryWorkflow,
        order.Items,
        workflow.ChildWorkflowOptions{
            WorkflowID: "inventory-" + order.ID,
        },
    )
    
    // Wait for both
    var paymentResult PaymentResult
    var inventoryResult InventoryResult
    
    if err := paymentFuture.Get(ctx, &paymentResult); err != nil {
        return err
    }
    
    if err := inventoryFuture.Get(ctx, &inventoryResult); err != nil {
        // Compensate payment
        _ = workflow.ExecuteActivity(ctx, RefundPayment, paymentResult.ID).Get(ctx, nil)
        return err
    }
    
    // Continue with fulfillment...
    return nil
}

Pattern 3: Human-in-the-Loop

Workflow with Approval Signal
func ExpenseApprovalWorkflow(ctx workflow.Context, expense Expense) error {
    // Send notification to approver
    _ = workflow.ExecuteActivity(ctx, NotifyApprover, expense).Get(ctx, nil)
    
    // Set up signal handler for approval
    selector := workflow.NewSelector(ctx)
    var approval ApprovalDecision
    
    // Signal channel
    approvalCh := workflow.GetSignalChannel(ctx, "approval")
    selector.AddReceive(approvalCh, func(ch workflow.ReceiveChannel, more bool) {
        ch.Receive(ctx, &approval)
    })
    
    // Timeout after 7 days
    timer := workflow.NewTimer(ctx, time.Hour*24*7)
    selector.AddFuture(timer, func(f workflow.Future) {
        approval = ApprovalDecision{Approved: false, Reason: "timeout"}
    })
    
    // Wait for either signal or timeout
    selector.Select(ctx)
    
    if approval.Approved {
        return workflow.ExecuteActivity(ctx, ProcessExpense, expense).Get(ctx, nil)
    }
    
    return workflow.ExecuteActivity(ctx, RejectExpense, expense).Get(ctx, nil)
}

// External system sends signal:
// temporalClient.SignalWorkflow(ctx, "expense-workflow-123", "", "approval", ApprovalDecision{Approved: true})

Error Handling & Retries

Temporal provides sophisticated retry and error handling mechanisms:

Retry Policy Options
ao := workflow.ActivityOptions{
    StartToCloseTimeout: 5 * time.Minute,
    
    // Retry policy determines behavior on activity failure
    RetryPolicy: &temporal.RetryPolicy{
        InitialInterval:        time.Second,     // First retry after 1s
        BackoffCoefficient:     2.0,             // Double each time
        MaximumInterval:        time.Minute,     // Cap at 1 minute
        MaximumAttempts:        10,              // Try 10 times total
        NonRetryableErrorTypes: []string{"InvalidCreditCard", "InsufficientFunds"},
    },
}
ctx = workflow.WithActivityOptions(ctx, ao)

Error Types

Error Type Description Retry Behavior
ApplicationError Business logic failure Configurable via NonRetryableErrorTypes
TimeoutError Activity exceeded time limit Retries (if within schedule-to-close)
CanceledError Activity was explicitly canceled No retry
TerminatedError Workflow was terminated No retry (fatal)

The Saga Pattern

Sagas manage long-running transactions across multiple services. When one step fails, compensating actions undo previous steps.

💡 Saga Pattern Explained

A saga is a sequence of local transactions where each service updates its data and publishes an event or calls the next service. If a step fails, compensating transactions are executed to undo the changes.

Saga Implementation
func SagaOrderWorkflow(ctx workflow.Context, order Order) error {
    // Track completed steps for compensation
    var completedSteps []string
    var compensations []func() error
    
    // Helper to execute with compensation tracking
    execute := func(name string, fn, comp interface{}) error {
        err := workflow.ExecuteActivity(ctx, fn, order).Get(ctx, nil)
        if err != nil {
            // Execute compensations in reverse order
            for i := len(compensations) - 1; i >= 0; i-- {
                _ = compensations[i]()
            }
            return err
        }
        completedSteps = append(completedSteps, name)
        compensations = append(compensations, func() error {
            return workflow.ExecuteActivity(ctx, comp, order).Get(ctx, nil)
        })
        return nil
    }
    
    // Step 1: Reserve inventory
    if err := execute("reserve", ReserveInventory, ReleaseInventory); err != nil {
        return err
    }
    
    // Step 2: Process payment
    if err := execute("payment", ProcessPayment, RefundPayment); err != nil {
        return err
    }
    
    // Step 3: Create shipment
    if err := execute("shipment", CreateShipment, CancelShipment); err != nil {
        return err
    }
    
    // Step 4: Send notification (no compensation needed)
    _ = workflow.ExecuteActivity(ctx, SendNotification, order).Get(ctx, nil)
    
    return nil
}
⚠️ Compensation Considerations
  • Compensations can also fail—design idempotent compensations
  • Some operations cannot be truly undone (emails sent, notifications delivered)
  • Consider business implications: partial compensation may leave system in consistent but unexpected state

Schedules & Cron Workflows

Temporal supports scheduled workflows—perfect for recurring jobs:

Cron Workflow Example
// DailyReportWorkflow runs on a schedule
func DailyReportWorkflow(ctx workflow.Context) error {
    // This workflow starts, completes, and Temporal schedules the next run
    
    activityOpts := workflow.ActivityOptions{
        StartToCloseTimeout: 10 * time.Minute,
    }
    ctx = workflow.WithActivityOptions(ctx, activityOpts)
    
    // Generate daily report
    yesterday := workflow.Now(ctx).Add(-24 * time.Hour)
    
    var report Report
    if err := workflow.ExecuteActivity(ctx, GenerateDailyReport, yesterday).Get(ctx, &report); err != nil {
        return err
    }
    
    // Email report
    return workflow.ExecuteActivity(ctx, EmailReport, report).Get(ctx, nil)
}

// Schedule creation via client:
// schedule, err := temporalClient.ScheduleClient().Create(ctx, client.ScheduleOptions{
//     ID: "daily-report",
//     Spec: client.ScheduleSpec{
//         CronExpressions: []string{"0 9 * * *"}, // 9 AM daily
//     },
//     Action: &client.ScheduleWorkflowAction{
//         ID:        "daily-report-workflow",
//         Workflow:  DailyReportWorkflow,
//         TaskQueue: "report-queue",
//     },
// })

Production Deployment

Temporal Server Architecture

Frontend
Frontend
Frontend
↓ gRPC ↓
Matching
History
Worker
↓ SQL ↓
PostgreSQL/MySQL
Elasticsearch

Docker Compose Deployment

docker-compose.yml
version: '3'
services:
  postgresql:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: temporal
      POSTGRES_PASSWORD: temporal
      POSTGRES_DB: temporal
    volumes:
      - postgresql_data:/var/lib/postgresql/data

  elasticsearch:
    image: elasticsearch:8.11.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data

  temporal-server:
    image: temporalio/server:1.22.3
    depends_on:
      - postgresql
      - elasticsearch
    environment:
      - DB=postgresql
      - DB_PORT=5432
      - POSTGRES_SEEDS=postgresql
      - POSTGRES_USER=temporal
      - POSTGRES_PWD=temporal
      - ENABLE_ES=true
      - ES_SEEDS=elasticsearch
      - ES_PORT=9200
    ports:
      - "7233:7233"

  temporal-web:
    image: temporalio/web:2.25.0
    depends_on:
      - temporal-server
    environment:
      - TEMPORAL_GRPC_ENDPOINT=temporal-server:7233
    ports:
      - "8080:8080"

volumes:
  postgresql_data:
  elasticsearch_data:

Kubernetes Deployment

Helm - Temporal on Kubernetes
# Add Temporal Helm repo
helm repo add temporal https://temporalio.github.io/helm-charts
helm repo update

# Install with production values
helm install temporal temporal/temporal \
  --values values-production.yaml \
  --set server.replicaCount=3 \
  --set cassandra.enabled=false \
  --set postgresql.enabled=true \
  --set elasticsearch.enabled=true \
  --set prometheus.enabled=true \
  --set grafana.enabled=true

# values-production.yaml highlights:
server:
  config:
    persistence:
      default:
        driver: "sql"
        sql:
          driver: "postgres12"
          host: "postgresql.temporal.svc"
          port: 5432
          database: "temporal"
          user: "temporal"
          password: "${POSTGRES_PASSWORD}"
      visibility:
        driver: "es"
        es:
          version: "v8"
          url:
            scheme: "http"
            host: "elasticsearch.temporal.svc:9200"
  
  # Resource requests
  resources:
    requests:
      memory: "2Gi"
      cpu: "1000m"
    limits:
      memory: "4Gi"
      cpu: "2000m"

# Deploy workers as separate Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: temporal-worker
spec:
  replicas: 5
  selector:
    matchLabels:
      app: temporal-worker
  template:
    spec:
      containers:
      - name: worker
        image: myapp/temporal-worker:v1.0.0
        env:
        - name: TEMPORAL_HOST
          value: "temporal-frontend.temporal.svc:7233"
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"

Observability & Debugging

Temporal Web UI

The Temporal Web UI provides:

Custom Search Attributes

Adding Searchable Attributes
// In your workflow, upsert searchable attributes
func OrderWorkflow(ctx workflow.Context, input OrderWorkflowInput) (*OrderWorkflowResult, error) {
    // Make order attributes searchable
    workflow.UpsertSearchAttributes(ctx, map[string]interface{}{
        "CustomOrderID":     input.OrderID,
        "CustomCustomerID":  input.CustomerID,
        "CustomOrderStatus": "PENDING",
    })
    
    // ... workflow logic ...
    
    // Update status
    workflow.UpsertSearchAttributes(ctx, map[string]interface{}{
        "CustomOrderStatus": "COMPLETED",
    })
    
    return result, nil
}

// Then search via API:
// tctl workflow list --query "CustomOrderStatus='FAILED' AND CustomCustomerID='cust-123'"
// Or in Web UI: CustomOrderStatus="FAILED"

Metrics and Alerting

Temporal exposes Prometheus metrics:

Prometheus Queries
# Workflow success rate
sum(rate(temporal_workflow_completed_count{status="completed"}[5m]))
/
sum(rate(temporal_workflow_completed_count[5m]))

# Activity failure rate by type
sum(rate(temporal_activity_execution_failed_count[5m])) by (activity_type)

# Worker task backlog
temporal_task_scheduler_pending_tasks{namespace="default"}

# Workflow execution latency (p99)
histogram_quantile(0.99, 
  sum(rate(temporal_workflow_execution_latency_bucket[5m])) by (le)
)

Temporal vs Alternatives

Feature Temporal Cadence AWS Step Functions Camunda
Self-hosted ✅ Yes ✅ Yes ❌ No ✅ Yes
Code-first ✅ Yes ✅ Yes ❌ JSON/YAML ⚠️ BPMN
Multi-language Go, Java, TS, Python, PHP, .NET Go, Java JSON only Java, JavaScript
Max workflow duration Unlimited Unlimited 1 year Unlimited
Community Active, growing Legacy (Uber) AWS only Active
Cloud offering Temporal Cloud None AWS Step Functions Camunda Cloud

Conclusion>

Temporal solves one of the hardest problems in distributed systems: reliably orchestrating complex operations across unreliable infrastructure. By separating business logic from reliability concerns, it lets developers focus on what matters.

Key takeaways:

  • Durable execution means your workflows survive anything short of a database wipe
  • Code-first workflows are more maintainable than JSON/YAML definitions
  • Activity retries and sagas handle failure automatically
  • Observability is built-in—see exactly what happened and when
  • Self-hosted or cloud—choose based on your compliance needs
🎯 When to Use Temporal

Use Temporal when you have multi-step business processes that span services and time. If your code has retry loops, state machines, or saga orchestration—you should probably be using Temporal.

Start small with the local development server, then scale to production with confidence. The investment in learning Temporal pays dividends in reliability and maintainability.