DevOps and Kubernetes: Orchestrating Containerized Workloads

From control plane internals to real-world GitOps workflows

Introduction

The shift from monolithic deployments to containerized microservices has fundamentally changed how engineering teams ship software. But containers alone solve only part of the problem. Running a single Docker container locally is straightforward; running hundreds of interdependent containers across a fleet of machines — with automatic failover, rolling updates, traffic routing, and resource enforcement — requires a coordination layer. That layer is Kubernetes.

Kubernetes, originally developed at Google and heavily informed by its internal Borg cluster manager, has become the de facto standard for container orchestration since its release in 2014. It is now maintained by the Cloud Native Computing Foundation (CNCF) and has an enormous ecosystem of tooling built around it. But its popularity comes with a steep learning curve, and many teams adopt it without fully understanding what problem it is actually solving, or how its internal machinery works.

This article is written for engineers who have moved past the tutorial phase and want a grounded, practical understanding of how Kubernetes fits into a mature DevOps practice. We will cover the control plane internals, deployment strategies, autoscaling mechanisms, GitOps workflows, and the sharp edges that tend to cause real production incidents.

The Problem Kubernetes Solves

Before diving into how Kubernetes works, it is worth being precise about the problem it exists to solve. Many engineers conflate container orchestration with container runtime or container networking — these are related but distinct concerns.

The core problem is desired state management at scale. When you have dozens of services, each with multiple replicas, running across a cluster of machines, you need something that continuously reconciles what is currently running against what should be running. Machines fail. Containers crash. Traffic spikes. Deployments need to roll forward or back. Doing all of this manually, or even with bespoke shell scripts and cron jobs, is brittle and does not scale with organizational complexity.

Kubernetes addresses this through a declarative API and a reconciliation loop architecture. Instead of issuing imperative commands like "start container X on node 3," you declare the desired state in a manifest — "I want three replicas of this container running at all times" — and the control plane continuously works to make reality match that declaration. This shift from imperative to declarative operations is the mental model that unlocks most of what Kubernetes offers.

Kubernetes Architecture: What Actually Runs Where

Understanding the Kubernetes control plane is not optional if you operate clusters in production. When something goes wrong — and it will — you need to know which component to look at and why.

The Control Plane

The control plane is the brain of the cluster. In a production setup, it runs across multiple nodes for high availability. Its core components are:

etcd is a distributed key-value store that holds the entire cluster state. Every Kubernetes object — pods, services, config maps, secrets, deployments — is stored as a serialized protobuf record in etcd. All reads and writes from the API server go through etcd. This means etcd's health is directly tied to cluster health. If etcd loses quorum, the API server stops accepting writes, and your cluster becomes effectively read-only. Backup and restore procedures for etcd are not an afterthought — they are a critical operational requirement.

kube-apiserver is the front door of the cluster. Every operation — whether from kubectl, a CI system, or an in-cluster controller — goes through the API server. It handles authentication, authorization (via RBAC), and admission control before persisting anything to etcd. The admission controller chain is particularly important: it is where policy enforcement tools like OPA/Gatekeeper, resource quota enforcement, and pod security admission operate.

kube-scheduler watches for pods that have been created but not yet assigned to a node, and assigns them based on resource requests, node selectors, affinity/anti-affinity rules, taints and tolerations, and topology spread constraints. A common misconception is that the scheduler is simple. In practice, scheduling decisions are complex, and suboptimal pod placement is a frequent source of resource waste and uneven cluster utilization.
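Several of these scheduling constraints can appear together in a single pod spec. A sketch (the label keys, taint key, and image name are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-worker
spec:
  nodeSelector:
    disktype: ssd                  # only schedule onto nodes labeled disktype=ssd
  tolerations:
    - key: "workload"              # allow scheduling onto nodes tainted workload=gpu:NoSchedule
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
  affinity:
    podAntiAffinity:               # prefer spreading replicas across distinct nodes
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: gpu-worker
            topologyKey: kubernetes.io/hostname
  containers:
    - name: worker
      image: my-registry/gpu-worker:1.0
```

Note that a nodeSelector is a hard constraint, while the preferred anti-affinity above is a soft one: the scheduler will violate it rather than leave the pod pending.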

kube-controller-manager runs a collection of controllers — each a reconciliation loop — that manage objects like ReplicaSets, Deployments, StatefulSets, Jobs, and endpoints. Each controller watches the API server for changes to its resource type and takes action to reconcile actual state toward desired state.

The Worker Nodes

Each worker node runs three components. The kubelet is an agent that receives pod specifications from the API server and drives the container runtime (via the Container Runtime Interface, or CRI) to start, stop, and health-check containers. The kube-proxy maintains network rules (using iptables or IPVS) that implement the Service abstraction — directing traffic to the correct pod endpoints. The container runtime (containerd or CRI-O in modern clusters) actually manages the container lifecycle on the host.

Workload Resources: Choosing the Right Abstraction

Kubernetes provides several workload resource types, and choosing the wrong one for a given use case is a common mistake.

Deployments and ReplicaSets

For stateless workloads — web servers, API gateways, background workers that do not need persistent identity — a Deployment is the right choice. A Deployment manages a ReplicaSet, which in turn manages a set of Pods. The Deployment resource provides rolling update semantics: by default, it ensures that a minimum number of pods remain available during a rollout (maxUnavailable) and that the total number of pods does not exceed a ceiling (maxSurge). Rollbacks are implemented by creating a new ReplicaSet with the previous pod template and scaling it up while scaling down the current one.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  namespace: production
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 1
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
        version: "1.4.2"
    spec:
      containers:
        - name: api
          image: my-registry/api-service:1.4.2
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          readinessProbe:
            httpGet:
              path: /healthz/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /healthz/live
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20

A few things worth noting in this manifest. Resource requests are what the scheduler uses to place pods on nodes — they represent a guaranteed minimum. Resource limits are enforced at runtime by the kubelet via cgroups. Setting limits much higher than requests can lead to noisy-neighbor problems where one pod monopolizes node resources. The readiness and liveness probes are not cosmetic: the readiness probe controls when a pod starts receiving traffic from its Service, and the liveness probe controls when the kubelet restarts a container. Getting these wrong causes deployment-time outages or zombie pods that never restart.

StatefulSets

For workloads that require stable network identity, stable storage, or ordered deployment and scaling — databases, message brokers, distributed caches — a StatefulSet is the appropriate abstraction. Unlike Deployments, StatefulSet pods are named with an ordinal suffix (pod-0, pod-1, pod-2) and that identity persists across restarts. Each pod can have its own PersistentVolumeClaim, and those claims are not deleted when the pod is rescheduled.

StatefulSets are more operationally complex than Deployments. Updates are applied one pod at a time, starting from the highest ordinal. This protects workloads like etcd or ZooKeeper from losing quorum during upgrades, but it also means updates are slower. Careful thought around PodDisruptionBudgets is required to avoid data loss or availability gaps during node maintenance or rolling updates.
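A minimal StatefulSet sketch showing the pieces that distinguish it from a Deployment: the headless Service reference and per-pod volume claims (names and sizes are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cache
spec:
  serviceName: cache-headless      # must reference an existing headless Service
  replicas: 3
  selector:
    matchLabels:
      app: cache
  template:
    metadata:
      labels:
        app: cache
    spec:
      containers:
        - name: cache
          image: my-registry/cache:2.0
          volumeMounts:
            - name: data
              mountPath: /var/lib/cache
  volumeClaimTemplates:            # each pod (cache-0, cache-1, cache-2) gets its own PVC
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```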

Jobs and CronJobs

For batch workloads — one-time data migrations, scheduled report generation, periodic cleanup tasks — Kubernetes provides Jobs and CronJobs. A Job creates one or more pods, ensures they run to completion, and tracks successful completions. A CronJob creates Job objects on a schedule. These are often overlooked in favor of external schedulers, but running batch work inside the cluster keeps resource accounting unified and simplifies access to internal services.
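A CronJob sketch for the periodic-cleanup case (schedule, image, and names are illustrative):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-cleanup
spec:
  schedule: "0 3 * * *"          # every day at 03:00 cluster time
  concurrencyPolicy: Forbid      # skip a run if the previous one is still active
  jobTemplate:
    spec:
      backoffLimit: 2            # retry a failed pod at most twice
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: cleanup
              image: my-registry/cleanup:1.0
```

The concurrencyPolicy is worth setting explicitly: the default (Allow) permits overlapping runs, which is rarely what cleanup or reporting jobs want.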

Deployment Strategies: Beyond Rolling Updates

The rolling update strategy built into Deployments is a sensible default, but production systems often need more precise control over how new versions are introduced to traffic.

Blue-Green Deployments

In a blue-green deployment, you maintain two identical environments — the "blue" environment serving production traffic and the "green" environment running the new version. Once the green environment passes validation, you shift traffic (typically by updating a Service selector or an Ingress rule) atomically. This provides instant rollback: if something goes wrong, you flip traffic back to blue.

The downside is resource cost: you are running double the capacity during the switchover. In Kubernetes, this is typically implemented by running two Deployments with different label values and managing which one the Service selector targets.

# Service selector targets the active deployment via a label
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  selector:
    app: api-service
    slot: blue  # Change to "green" to cut over traffic
  ports:
    - port: 80
      targetPort: 8080

Canary Deployments

Canary deployments take a more gradual approach: you route a small percentage of traffic to the new version and observe its behavior before committing to a full rollout. This is particularly valuable when the risk profile of a change is high — new database access patterns, changes to external API calls, modified serialization logic.

Implementing canary deployments in raw Kubernetes is awkward because Service traffic distribution is based on the number of pod endpoints, not weights. Running one pod of the new version alongside nine pods of the old version gives you roughly 10% of traffic to the canary, but this is a blunt instrument. For precise traffic weighting, you need a service mesh (Istio, Linkerd), an ingress controller that supports weighted traffic splitting (ingress-nginx, Traefik), or a progressive delivery controller such as Argo Rollouts.
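As one concrete option, the ingress-nginx controller supports canary routing through annotations on a second Ingress resource. A sketch (hostname and service names are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-service-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"         # mark this Ingress as the canary
    nginx.ingress.kubernetes.io/canary-weight: "10"    # send ~10% of traffic here
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service-canary
                port:
                  number: 80
```

The canary Ingress must share its host rule with a non-canary Ingress pointing at the stable service; the controller splits traffic between the two by weight, independently of how many pods back each service.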

Scaling: Horizontal, Vertical, and Cluster-Level

Kubernetes offers scaling mechanisms at three levels: the pod, the node, and the workload. Using them effectively requires understanding what each one controls and how they interact.

Horizontal Pod Autoscaler (HPA)

The HPA watches a metric — by default, CPU utilization relative to the requested amount, but extensible via the custom metrics API — and adjusts the replica count of a Deployment or StatefulSet accordingly. The reconciliation loop runs every 15 seconds by default.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "500"

This HPA scales on both CPU and a custom metric (requests per second), a production-realistic pattern: when multiple metrics are configured, the HPA computes a desired replica count for each and uses the largest. CPU alone is often insufficient as a scaling signal for I/O-bound or latency-sensitive services. Scaling on request rate or queue depth provides faster and more accurate response to traffic changes.
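The per-metric computation the HPA applies is simple enough to sketch in a few lines of Python (a hypothetical helper, not part of Kubernetes — the real controller also applies tolerance bands and stabilization windows):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Approximate the documented HPA formula:
    desired = ceil(currentReplicas * currentMetricValue / targetMetricValue),
    clamped to the [minReplicas, maxReplicas] range."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# 5 replicas averaging 90% CPU against a 60% target -> scale to 8
print(desired_replicas(5, 90, 60, min_replicas=3, max_replicas=20))  # 8
```

Running the same formula against the manifest above with a sustained spike would walk replicas up toward maxReplicas over successive reconciliation cycles.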

There is a subtle interaction to be aware of: the HPA requires that pods have resource requests set. Utilization is measured relative to the requested amount, so without requests the HPA cannot compute utilization percentages and will not function. This is one of several reasons why omitting resource requests is an antipattern.

Vertical Pod Autoscaler (VPA)

The VPA adjusts the resource requests and limits of individual pods based on historical usage. It is useful for workloads where resource needs are hard to estimate upfront. However, VPA has a significant operational constraint: in its default mode, applying new resource recommendations requires restarting the pod, which can be disruptive. Do not run HPA and VPA on the same resource dimension (e.g., both scaling on CPU) simultaneously — they will interfere with each other.

Cluster Autoscaler

The Cluster Autoscaler (CA) adjusts the number of nodes in a cluster. When pods are pending due to insufficient resources, the CA provisions new nodes. When nodes are underutilized and their pods can be safely rescheduled, the CA drains and terminates them. This is typically integrated with cloud provider node groups (AWS Auto Scaling Groups, GCP Managed Instance Groups, Azure VMSS).

Getting the CA tuned correctly is one of the more nuanced operational challenges in Kubernetes. Scale-up is typically fast (one to three minutes for a new node to join and become schedulable), but scale-down is deliberately conservative: the CA waits for a node to be underutilized for a configurable period (default 10 minutes) before terminating it. Aggressive scale-down can cause pod churn; overly conservative scale-down wastes cloud spend.

GitOps: Kubernetes as a Reconciliation Target

The declarative nature of Kubernetes manifests makes it a natural fit for GitOps — the practice of using a Git repository as the single source of truth for cluster state, with automated agents continuously reconciling what is running against what is declared in Git.

The Pull-Based Model

Traditional CI/CD pipelines push changes to infrastructure: the pipeline runs, builds an artifact, then directly executes kubectl apply against the cluster. GitOps inverts this: an in-cluster agent (Flux or Argo CD) pulls the desired state from Git and applies it. This has several advantages. The cluster credentials never need to leave the cluster. Every change is tracked in Git history with a commit author and message. Drift detection is automatic — if someone modifies a resource manually, the agent detects and corrects it.

# Example: Generating Kubernetes manifests programmatically with Python
# before committing to the GitOps repository

from dataclasses import dataclass, field
from typing import List
import yaml

@dataclass
class ContainerSpec:
    name: str
    image: str
    cpu_request: str = "250m"
    memory_request: str = "256Mi"
    cpu_limit: str = "500m"
    memory_limit: str = "512Mi"

@dataclass
class DeploymentConfig:
    name: str
    namespace: str
    replicas: int
    containers: List[ContainerSpec] = field(default_factory=list)

def render_deployment(config: DeploymentConfig) -> dict:
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {
            "name": config.name,
            "namespace": config.namespace,
        },
        "spec": {
            "replicas": config.replicas,
            "selector": {
                "matchLabels": {"app": config.name}
            },
            "template": {
                "metadata": {"labels": {"app": config.name}},
                "spec": {
                    "containers": [
                        {
                            "name": c.name,
                            "image": c.image,
                            "resources": {
                                "requests": {
                                    "cpu": c.cpu_request,
                                    "memory": c.memory_request
                                },
                                "limits": {
                                    "cpu": c.cpu_limit,
                                    "memory": c.memory_limit
                                }
                            }
                        }
                        for c in config.containers
                    ]
                }
            }
        }
    }

if __name__ == "__main__":
    config = DeploymentConfig(
        name="api-service",
        namespace="production",
        replicas=5,
        containers=[
            ContainerSpec(
                name="api",
                image="my-registry/api-service:1.4.2"
            )
        ]
    )
    print(yaml.dump(render_deployment(config), default_flow_style=False))

This pattern of generating manifests programmatically — rather than maintaining raw YAML files — is common in larger organizations where configuration varies across environments and teams. The generated YAML is then committed to the GitOps repository where it is picked up by the reconciler.

Argo CD and Application Sets

Argo CD is the most widely adopted GitOps tool in the Kubernetes ecosystem. It introduces the Application CRD, which maps a source (a path in a Git repository, optionally processed through Helm or Kustomize) to a destination (a cluster and namespace). ApplicationSets extend this to manage applications across multiple clusters from a single template, which is valuable for multi-region or multi-tenant setups.

A key operational question with Argo CD is sync policy: do you auto-sync (apply changes to the cluster immediately when the Git repository changes) or require manual approval? Auto-sync with pruning enabled is the fully automated path, but it requires high confidence in your Git workflows and review processes. Many teams start with auto-sync disabled and progressively enable it as they build trust in the pipeline.

Trade-offs and Production Pitfalls

Kubernetes provides powerful primitives, but its flexibility also creates many opportunities for misconfiguration. The following are failure patterns that appear repeatedly across production incidents.

Resource Request Misconfiguration

Setting resource requests too low — or omitting them entirely — is one of the most common root causes of cluster instability. When requests are too low, the scheduler places too many pods on a node, and when those pods actually use their resources under load, the node becomes overcommitted. Memory overcommitment can trigger the Linux OOM killer, which terminates processes non-deterministically. CPU overcommitment leads to throttling, which manifests as elevated latency that is difficult to attribute.

Setting limits much higher than requests compounds this problem. A single pod with a limit of 4GB of memory and a request of 256MB can consume the memory headroom of an entire node. The right approach is to profile your workloads under realistic load, set requests based on steady-state consumption, and set limits modestly above that — close enough to catch runaway processes, but with enough headroom to handle bursts.

Missing Pod Disruption Budgets

Node maintenance events — OS patching, node pool upgrades, spot instance preemptions — cause pods to be evicted. Without a PodDisruptionBudget (PDB), there is no guarantee that a minimum number of replicas remain available during these events. A PDB that ensures minAvailable: 2 for a deployment with three replicas means the cluster will not evict a pod if doing so would leave fewer than two replicas running. This is a simple resource to add and frequently prevents unnecessary downtime during infrastructure operations.
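The minAvailable: 2 example above is only a few lines of YAML (the name and label are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service-pdb
spec:
  minAvailable: 2                # never voluntarily evict below 2 ready replicas
  selector:
    matchLabels:
      app: api-service
```

Note that a PDB only constrains voluntary disruptions (drains, evictions); it cannot protect against a node crashing outright.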

Misconfigured Health Probes

Health probe misconfiguration is a recurring source of self-inflicted incidents. A liveness probe that is too aggressive — low timeout, low failure threshold, checking an endpoint that is slow during startup — will restart healthy pods unnecessarily. An initialDelaySeconds that is too short will kill pods before they have finished initializing. Conversely, a liveness probe that is too lenient will leave zombie processes running that should be restarted.

The readiness probe has different failure semantics: a failing readiness probe removes the pod from Service endpoints but does not restart it. This is the right mechanism for temporarily taking a pod out of rotation (during a dependent service outage, for example) without triggering a restart. Many engineers set identical readiness and liveness probes, which misses the distinction between "this pod is temporarily unable to serve traffic" and "this pod is stuck and needs to be killed."

Namespace and RBAC Design

A flat namespace structure — all workloads in a handful of namespaces — quickly becomes an operational liability. Namespaces provide isolation boundaries for ResourceQuotas, NetworkPolicies, and RBAC. Designing namespace hierarchy upfront — by team, by environment, or by security domain — is much easier than refactoring it later when services have accumulated interdependencies.

RBAC policies in Kubernetes use a least-privilege model: by default, service accounts have no permissions. Teams often work around this by granting overly broad permissions (binding the cluster-admin ClusterRole to a service account) to unblock themselves, and those permissions persist long after the immediate need is gone. Regular auditing of ClusterRoleBindings and RoleBindings is a hygiene requirement, not a one-time setup.
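Part of that audit can be automated against a JSON dump of bindings (such as the output of kubectl get clusterrolebindings -o json). A minimal sketch, with a hypothetical helper name and inline sample data:

```python
import json

def risky_bindings(bindings_json: str, risky_roles=("cluster-admin",)) -> list:
    """Return (binding, subject kind, subject name) tuples for bindings
    that grant a role in risky_roles. Expects the JSON shape produced by
    `kubectl get clusterrolebindings -o json`."""
    doc = json.loads(bindings_json)
    findings = []
    for item in doc.get("items", []):
        role = item.get("roleRef", {}).get("name")
        if role in risky_roles:
            for subject in item.get("subjects", []) or []:
                findings.append((item["metadata"]["name"],
                                 subject.get("kind"),
                                 subject.get("name")))
    return findings

# Inline sample standing in for real cluster output
sample = json.dumps({
    "items": [
        {
            "metadata": {"name": "debug-binding"},
            "roleRef": {"name": "cluster-admin"},
            "subjects": [{"kind": "ServiceAccount", "name": "ci-runner"}],
        },
        {
            "metadata": {"name": "view-binding"},
            "roleRef": {"name": "view"},
            "subjects": [{"kind": "Group", "name": "developers"}],
        },
    ]
})
print(risky_bindings(sample))
# [('debug-binding', 'ServiceAccount', 'ci-runner')]
```

A script like this run on a schedule, with findings routed to the owning team, turns quarterly auditing from a manual chore into a standing control.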

Best Practices for Production-Grade Kubernetes

Accumulating operational experience with Kubernetes eventually converges on a set of practices that reduce incident frequency and blast radius. The following represent the highest-leverage items.

Treat Manifests as Code

All Kubernetes manifests should live in version control, be reviewed via pull request, and pass automated validation before being applied to any cluster. Tools like kubeval, kubeconform, and kube-score can validate manifests against the Kubernetes API schema and flag common misconfigurations — missing resource limits, missing probes, deprecated API versions — in CI. This feedback loop catches problems before they reach the cluster, where they are more costly to diagnose.

Policy-as-code tools like OPA/Gatekeeper or Kyverno extend this to runtime: they intercept admission requests and reject or mutate resources that violate organizational policy. Encoding policies like "all production pods must have resource limits" or "container images must come from an approved registry" as admission webhooks ensures compliance across all deployment mechanisms, not just the ones your team controls.
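As a sketch of what such a policy can look like, here is a Kyverno ClusterPolicy requiring resource limits on production pods (names are illustrative; consult the Kyverno documentation for the schema current to your version):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce   # reject non-compliant resources at admission
  rules:
    - name: check-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - production
      validate:
        message: "All production containers must set CPU and memory limits."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"     # any non-empty value
                    cpu: "?*"
```

Starting with validationFailureAction set to Audit, reviewing the reports, and then flipping to Enforce is a common rollout path that avoids breaking existing workloads on day one.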

Separate Application Config from Application Code

ConfigMaps and Secrets provide a Kubernetes-native mechanism for separating configuration from the container image. However, Secrets are only base64-encoded in etcd by default — they are not encrypted at rest unless etcd encryption is explicitly configured. For sensitive credentials, consider external secret management via the External Secrets Operator, which syncs secrets from Vault, AWS Secrets Manager, or GCP Secret Manager into Kubernetes Secrets. This keeps the source of truth for sensitive values outside the cluster while maintaining the familiar Secret API for applications.
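The encoding point is easy to demonstrate: a Secret's data values are plain base64, recoverable by anyone who can read the object (the value here is invented for illustration):

```python
import base64

# A Secret's data values as they appear in a manifest or in etcd
# without encryption at rest: base64-encoded, not encrypted.
encoded_password = base64.b64encode(b"s3cr3t-value").decode()
print(encoded_password)                    # czNjcjN0LXZhbHVl
print(base64.b64decode(encoded_password))  # b's3cr3t-value'
```

This is why read access to Secrets (and to etcd backups) must be treated as equivalent to access to the credentials themselves.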

Implement Structured Observability

Kubernetes generates a large volume of metrics through the metrics-server and through the Kubernetes API itself. The most useful signals for daily operations are pod-level resource utilization versus requests (to identify mismatched sizing), container restart rates (to identify stability issues), scheduler pending pod duration (to identify capacity constraints), and API server request latency (to identify control plane health). The kube-state-metrics exporter exposes most of these in Prometheus format.

Distributed tracing is increasingly important in microservices environments and should be instrumented at the application level — not bolted on at the infrastructure layer. OpenTelemetry has become the de facto standard for instrumentation libraries and provides a vendor-neutral API that can export traces to Jaeger, Zipkin, or commercial backends.

Plan for Multi-Tenancy Early

If multiple teams or customers will share a cluster, the isolation model needs to be designed before problems emerge. Hard multi-tenancy — where workloads from different tenants must be completely isolated — is difficult to achieve with a single Kubernetes cluster given shared control plane components. Soft multi-tenancy — where teams trust each other but need resource accounting and RBAC isolation — is achievable using a combination of namespaces, ResourceQuotas, NetworkPolicies, and namespace-scoped RBAC.

For stricter isolation requirements, virtual cluster tools (vcluster) or dedicated clusters per tenant may be more appropriate than fighting the boundaries of single-cluster isolation.

Key Takeaways

Five practices that engineering teams can apply immediately to improve their Kubernetes operations:

  1. Profile before you configure resources. Run load tests against your services and set resource requests based on p95 consumption, not guesswork. Misconfigured requests cause more production incidents than most other single factors.

  2. Add PodDisruptionBudgets to every production workload. This is a two-minute change per Deployment that prevents maintenance events from causing unplanned downtime. There is rarely a good reason not to have them.

  3. Audit your RBAC bindings quarterly. Overly permissive service accounts accumulate over time. A ClusterRole that was granted in a hurry during an incident often stays in place permanently. Automate the detection of bindings to cluster-admin or wildcard verbs.

  4. Shift manifest validation left. Add kubeconform and kube-score to your CI pipeline so schema and best-practice violations are caught before merge, not after deployment.

  5. Adopt a GitOps tool and enforce declarative state. Once you have a GitOps agent reconciling cluster state, ad-hoc kubectl apply commands become drift that gets corrected automatically. This dramatically reduces the "works in staging, broken in production" class of problems.

Analogies and Mental Models

Kubernetes is often described using the metaphor of a shipping container system, and the analogy holds further than most people take it. The control plane is like a port authority: it tracks where every container is, routes incoming shipments to available berths, and reallocates containers when vessels move or berths become unavailable. The containers themselves are standardized and interchangeable — it does not matter what is inside; the logistics system handles them uniformly.

A more precise mental model for day-to-day operations is to think of Kubernetes as a desired state database with automatic repair. Every resource you create is a row in that database expressing an intent. The controllers are background jobs that read those rows and take action to make the real world match the record. When something breaks — a node goes down, a container crashes — the database still holds the desired state, and the controllers keep working until the state is achieved again. Your job as an operator is to write correct desired state, not to manage the repair process manually.

The 80/20 Insight

Kubernetes has an enormous surface area — custom resource definitions, admission webhooks, service meshes, storage classes, network plugins, and much more. But 80% of the value in most production environments comes from mastering five things: correct resource requests and limits, health probes that accurately reflect application state, a deployment strategy appropriate for the workload's risk profile, horizontal autoscaling on the right signal, and a GitOps workflow that provides auditability and drift correction.

Everything else — Istio, KEDA, OPA/Gatekeeper, multi-cluster federation — adds value in specific contexts, but often introduces complexity that teams are not ready for. Getting the fundamentals right, consistently, across all workloads in the cluster is more valuable than adopting advanced features on a subset of them.

Conclusion

Kubernetes is genuinely powerful, but its power is proportional to the depth of understanding brought to its operation. Treating it as a black box — applying manifests and hoping the cluster sorts things out — leads to resource waste, fragile deployments, and incidents that are difficult to diagnose. Treating it as a system with well-documented internals — a reconciliation engine built on etcd, a scheduling model based on resource requests, a networking model based on label selectors and endpoint slices — makes it comprehensible and predictable.

The transition to containerized workloads orchestrated by Kubernetes is not just a technology change; it is an operational model change. It requires investing in manifest quality, observability infrastructure, RBAC discipline, and GitOps tooling. Teams that make those investments find that Kubernetes enables a level of deployment velocity and operational reliability that was impractical with previous approaches. Teams that skip them often find themselves managing a more complex version of the problems they had before.

The fundamentals are stable. The ecosystem continues to evolve, but the core architecture of Kubernetes — the API server, the controller pattern, the scheduler, the kubelet — has been remarkably consistent across versions. Learning it deeply is an investment that pays dividends across years, not just the next release cycle.

References

  1. Kubernetes Official Documentation. https://kubernetes.io/docs/
  2. Kubernetes Design Proposals (GitHub). https://github.com/kubernetes/design-proposals-archive
  3. Borg, Omega, and Kubernetes — Brendan Burns, Brian Grant, David Oppenheimer, et al. ACM Queue, Vol. 14, 2016. https://dl.acm.org/doi/10.1145/2898442.2898444
  4. CNCF Kubernetes Project — Cloud Native Computing Foundation. https://www.cncf.io/projects/kubernetes/
  5. Argo CD Documentation. https://argo-cd.readthedocs.io/
  6. Flux CD Documentation. https://fluxcd.io/docs/
  7. Kubernetes Horizontal Pod Autoscaler Walkthrough. https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
  8. OpenTelemetry Documentation. https://opentelemetry.io/docs/
  9. kube-state-metrics. https://github.com/kubernetes/kube-state-metrics
  10. External Secrets Operator. https://external-secrets.io/
  11. OPA/Gatekeeper. https://open-policy-agent.github.io/gatekeeper/
  12. Kyverno Policy Engine. https://kyverno.io/docs/
  13. kubeconform. https://github.com/yannh/kubeconform
  14. kube-score. https://github.com/zegl/kube-score
  15. Production Kubernetes — Josh Rosso, Rich Lander, Alexander Brand, John Harris. O'Reilly Media, 2021.
  16. Kubernetes Patterns — Bilgin Ibryam, Roland Huß. O'Reilly Media, 2019.