Introduction
There is a moment familiar to almost every software engineer: a colleague says "it works on my machine," and the entire team quietly resigns itself to hours of debugging environment discrepancies. Different OS versions, mismatched library dependencies, conflicting runtime configurations — the accidental complexity of software deployment has historically consumed enormous engineering time. Docker was built to address exactly this problem.
Docker is an open-source platform that packages applications and their dependencies into lightweight, portable units called containers. Unlike virtual machines, containers share the host operating system's kernel, making them significantly more resource-efficient while still providing strong isolation boundaries. Since Docker's public release in 2013, it has become a foundational tool across development, CI/CD pipelines, and production infrastructure. Understanding how to install, configure, and work with Docker effectively is no longer optional for professional engineers — it is a baseline competency.
This article walks through Docker's architecture and core concepts, the installation and setup process across major platforms, and practical patterns for using Docker in real engineering work. It is written for engineers who understand software systems and want a thorough, no-fluff guide rather than a surface-level tutorial.
Context: Why Containerization Matters
Before the container era, deployment meant either shipping entire virtual machines (heavy, slow, expensive) or relying on carefully documented setup scripts that inevitably drifted from reality. Configuration management tools like Chef, Puppet, and Ansible helped, but they operated on mutable infrastructure — servers that changed over time in ways that were hard to audit or reproduce.
The insight behind containerization is that an application's runtime environment should be a first-class artifact, versioned and shipped alongside the code itself. This is sometimes called "immutable infrastructure": instead of updating a running system, you replace it with a new, known-good image. This shift dramatically reduces the surface area for "works on my machine" failures and makes rollbacks trivially reproducible.
Docker popularized this model by providing a practical, developer-friendly interface over Linux kernel features that had existed for years: cgroups for resource isolation and namespaces for process, network, and filesystem isolation. The innovation was not the kernel primitives themselves but the tooling and image distribution model built on top of them. Docker Hub and the OCI (Open Container Initiative) image format created a shared ecosystem where pre-built images for databases, runtimes, and services could be pulled and run in seconds.
Core Concepts Before You Install
Understanding a handful of concepts before touching the CLI will save significant confusion later. Docker has a specific vocabulary, and conflating terms like "image" and "container" is a common source of early mistakes.
A Docker image is a read-only, layered filesystem snapshot. It contains your application code, runtime, system libraries, and configuration. Images are built from a Dockerfile — a declarative text file describing the steps to assemble the image. Images are immutable: once built, they do not change. You can think of an image as a class definition in object-oriented programming.
A Docker container is a running instance of an image — the class instantiated into an object. Containers are ephemeral by default: when a container stops, any filesystem changes made inside it are discarded unless explicitly persisted using volumes. Multiple containers can run from the same image simultaneously, each with its own isolated process space and network interface.
A Docker registry is a storage and distribution system for images. Docker Hub is the default public registry. Organisations commonly run private registries (AWS ECR, Google Artifact Registry, or self-hosted Harbor) to store proprietary images. The docker pull command fetches images from a registry; docker push publishes them.
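A sketch of the registry round-trip — the registry hostname and repository path here are placeholders, and the commands assume a running Docker daemon and valid credentials:

```shell
# Fetch an image from Docker Hub (the default registry)
docker pull postgres:16

# Re-tag it for a private registry (registry.example.com is a placeholder)
docker tag postgres:16 registry.example.com/platform/postgres:16

# Authenticate, then publish to the private registry
docker login registry.example.com
docker push registry.example.com/platform/postgres:16
```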
Volumes are the mechanism for persisting data beyond a container's lifecycle. A volume is a directory managed by Docker and mounted into a container at a specified path. Volumes survive container removal and can be shared across containers — the canonical way to persist database data or share build artifacts between containers in a pipeline.
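A minimal sketch of volume persistence, assuming a local Docker daemon: the file written by the first container survives its removal and is visible to a brand-new container mounting the same volume.

```shell
# Create a named volume and write a file into it from a throwaway container
docker volume create demo-data
docker run --rm -v demo-data:/data alpine sh -c 'echo hello > /data/greeting.txt'

# A new container sees the same file, because the volume outlives containers
docker run --rm -v demo-data:/data alpine cat /data/greeting.txt

# Clean up
docker volume rm demo-data
```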
Docker Compose is a tool for defining and running multi-container applications using a single YAML configuration file. In a typical web application stack, you might define a service for the application server, a database, and a cache — each in its own container, all networked together and started with a single docker compose up command.
Installing Docker
Linux
On Linux, Docker Engine can be installed directly, with no virtualization layer, because containers use the host's Linux kernel natively.
The recommended approach is to use Docker's official APT or YUM repositories rather than distribution-packaged versions, which are often outdated. On Ubuntu or Debian:
# Remove any older versions
sudo apt-get remove docker docker-engine docker.io containerd runc
# Add Docker's official GPG key and repository
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker Engine
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
After installation, add your user to the docker group so you can run Docker commands without sudo. This is important for usability and required by many toolchains:
sudo usermod -aG docker $USER
# Either log out and back in, or apply the new group in the current shell:
newgrp docker
Verify the installation with docker run hello-world. Docker will pull the official hello-world image and run a container that prints a confirmation message to stdout.
macOS and Windows
On macOS and Windows, Docker containers still run Linux workloads, which means a lightweight Linux virtual machine must run underneath the container layer. Docker Desktop handles this transparently, managing a bundled VM (HyperKit on older versions of macOS, Apple's Virtualization framework on newer ones, Hyper-V or WSL2 on Windows) and providing an integrated GUI for managing containers and images.
Docker Desktop can be downloaded from docker.com/products/docker-desktop. On Apple Silicon (M1/M2/M3) Macs, ensure you download the ARM64 build — Docker Desktop runs natively on Apple Silicon and performs well. On Windows, the WSL2 (Windows Subsystem for Linux 2) backend is strongly preferred over the legacy Hyper-V backend for better performance and compatibility.
On macOS, you can also run Docker without Docker Desktop using tools like OrbStack or Colima, which are lighter weight. Colima is open source and free for any use; OrbStack is free for personal use but requires a paid license for commercial use. OrbStack in particular has gained significant traction in the engineering community for its fast startup and low resource usage:
# Using Homebrew
brew install orbstack
# or
brew install colima docker
colima start
For most engineers, Docker Desktop is the path of least resistance on macOS and Windows. For teams with licensing concerns or who want finer control, Colima is a sound alternative, and OrbStack is worth evaluating under its own commercial terms.
Your First Container: Moving Beyond Hello World
Once Docker is installed, the next step is understanding the practical shape of container operations before diving into writing your own Dockerfiles.
The docker run command is the workhorse of container interaction. Its general form is:
docker run [OPTIONS] IMAGE [COMMAND] [ARGS]
A few immediately useful patterns:
# Run an interactive bash shell inside an Ubuntu container
docker run -it ubuntu:22.04 bash
# Run a Postgres database in the background, mapping port 5432
docker run -d \
  --name my-postgres \
  -e POSTGRES_PASSWORD=secret \
  -e POSTGRES_DB=myapp \
  -p 5432:5432 \
  postgres:16
# Run a Node.js script in the current directory without installing Node locally
docker run --rm -v "$(pwd)":/app -w /app node:20 node index.js
The -d flag (detach) runs the container in the background. The -p host:container flag maps a port on the host to a port inside the container. The -v source:target flag mounts a host directory into the container. The --rm flag removes the container automatically when it exits — useful for one-off commands.
These patterns reveal a use case that is underappreciated by engineers new to Docker: using containers as isolated, reproducible command environments, not just for deployment. You can run any tool — linters, compilers, database clients, data processing scripts — without installing it on the host machine, ensuring consistent behavior across developer machines and CI.
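Two more sketches of this pattern, assuming the images are available on Docker Hub (koalaman/shellcheck is a community-maintained image; the script name and the pinned black version are illustrative):

```shell
# Lint shell scripts without installing shellcheck on the host
docker run --rm -v "$(pwd)":/mnt koalaman/shellcheck:stable deploy.sh

# Format Python code with a pinned tool version, identical on every machine
docker run --rm -v "$(pwd)":/src -w /src python:3.12-slim \
  sh -c "pip install black==24.4.2 && black ."
```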
Writing Dockerfiles
A Dockerfile is the source of truth for an image. Writing good Dockerfiles requires understanding both the instruction set and the layer caching model, which has significant impact on build performance.
Layer Caching
Each instruction in a Dockerfile creates a new layer in the image. Docker caches layers and reuses them on subsequent builds as long as the instruction and everything before it in the file remains unchanged. This means instruction order matters enormously for build speed. Dependencies that change infrequently should be installed early; frequently-changing application code should be copied late.
A poorly-ordered Dockerfile for a Node.js application:
# Inefficient: copies all source files before installing dependencies
FROM node:20-alpine
WORKDIR /app
# ← entire source copied first (note: Dockerfile comments cannot trail a COPY)
COPY . .
RUN npm install # ← cache busts on ANY source file change
CMD ["node", "server.js"]
A well-ordered Dockerfile that leverages caching effectively:
FROM node:20-alpine
WORKDIR /app
# Copy dependency manifests first — these change rarely
COPY package.json package-lock.json ./
RUN npm ci --omit=dev # ← cached unless package*.json changes
# Copy source last — this changes frequently
COPY . .
CMD ["node", "server.js"]
In a large project with many dependencies, this ordering change can reduce incremental build time from minutes to seconds.
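BuildKit cache mounts push this further: a package manager's download cache can persist across builds even when the dependency layer itself is rebuilt. A sketch, assuming BuildKit is enabled (the default in current Docker versions):

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
# The npm cache directory survives between builds, so even a cache-busted
# npm ci re-downloads far fewer packages
RUN --mount=type=cache,target=/root/.npm npm ci
COPY . .
CMD ["node", "server.js"]
```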
Multi-Stage Builds
Multi-stage builds are a powerful pattern for producing lean production images. The idea is to use one image for building (which may include compilers, test tools, and dev dependencies) and a separate, minimal image for running the application. The final image contains only the compiled output — none of the build toolchain.
A realistic multi-stage Dockerfile for a TypeScript application:
# --- Build stage ---
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json tsconfig.json ./
RUN npm ci
COPY src/ ./src/
RUN npm run build # Outputs to /app/dist
# --- Production stage ---
FROM node:20-alpine AS production
ENV NODE_ENV=production
WORKDIR /app
# Copy only production dependencies and compiled output
COPY package.json package-lock.json ./
RUN npm ci --omit=dev && npm cache clean --force
COPY --from=builder /app/dist ./dist
# Run as non-root user for security
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
EXPOSE 3000
CMD ["node", "dist/server.js"]
This pattern typically reduces image size by 60–80% compared to a single-stage build that includes all development tooling. Smaller images mean faster pulls in CI/CD pipelines, smaller attack surface, and lower storage costs in registries.
Docker Compose for Multi-Container Applications
Real applications are rarely single-container affairs. A typical stack involves an application server, a database, a cache, and perhaps a background worker. Docker Compose provides a declarative way to define and orchestrate these services as a unit.
The following docker-compose.yml defines a realistic development environment for a web application with a Python FastAPI backend, a PostgreSQL database, and a Redis cache:
services:
  api:
    build:
      context: .
      dockerfile: Dockerfile
      target: development # Use the dev stage of a multi-stage Dockerfile
    volumes:
      - .:/app # Mount source for hot reloading
    ports:
      - "8000:8000"
    environment:
      DATABASE_URL: postgresql://appuser:secret@db:5432/myapp
      REDIS_URL: redis://cache:6379
    depends_on:
      db:
        condition: service_healthy # Wait until Postgres is ready
      cache:
        condition: service_started
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: appuser
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: myapp
    volumes:
      - postgres_data:/var/lib/postgresql/data # Persist data across restarts
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U appuser -d myapp"]
      interval: 5s
      timeout: 5s
      retries: 5
  cache:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
volumes:
  postgres_data:
  redis_data:
With this file in place, docker compose up starts all three services in the correct order, with networking configured so that services can reach each other by their service names (db, cache) as hostnames. docker compose down -v tears everything down and removes volumes — useful for a clean-slate reset during development.
The depends_on with condition: service_healthy is a frequently overlooked detail. Without it, the API container may start before the database is ready to accept connections, causing initialization failures that are race-condition-dependent and hard to reproduce. The healthcheck ensures Compose waits until Postgres is genuinely ready.
Trade-offs and Pitfalls
Image Size and Attack Surface
There is a persistent tendency among engineers new to Docker to use generic, large base images (e.g., ubuntu:22.04 or node:20) when smaller, purpose-built alternatives exist. Alpine Linux-based images (node:20-alpine, python:3.12-alpine) are significantly smaller and have fewer pre-installed packages, which directly reduces the attack surface for security vulnerabilities. One caveat: Alpine uses musl libc rather than glibc, which occasionally causes subtle compatibility or performance issues with software that assumes glibc; the "slim" Debian-based variants (node:20-slim, python:3.12-slim) are a reasonable middle ground.
"Distroless" images, produced by Google, go further: they contain only the application runtime and its dependencies, with no shell, package manager, or any other OS tooling. This makes them highly secure but can complicate debugging. The trade-off is worth it for production images of stable, well-understood services.
One common pitfall is leaving sensitive data — API keys, credentials, private keys — in image layers. Because image layers are content-addressable and immutable, credentials added in one layer and deleted in a subsequent layer are still present in the history and recoverable with docker image history or by inspecting the layer directly. The correct approach is to never write secrets into a Dockerfile at all; use build-time secrets via --secret or runtime environment variables injected at container start.
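A sketch of the build-secret approach with BuildKit — the secret id and the token's use here are placeholders for whatever your build actually needs:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
# The secret is mounted only for the duration of this RUN instruction
# and is never written into any image layer
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN="$(cat /run/secrets/npm_token)" npm ci
```

The secret is supplied at build time, e.g. `docker build --secret id=npm_token,src=./npm_token.txt .`, so it never appears in the Dockerfile, the build context, or the image history.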
Ephemeral Containers and State Management
Docker containers are designed to be stateless. The most common beginner mistake is relying on a container's internal filesystem to persist important data — database files, uploaded user content, generated reports. When the container is removed and recreated (which happens constantly in CI/CD and in production with orchestrators like Kubernetes), that data is gone.
State should always live in named volumes or on external storage systems. For databases, this means mounting the data directory as a volume. For user-generated files, it typically means using cloud object storage (S3, GCS) and having the application write there rather than to the local filesystem.
Networking Subtleties
Container networking is a frequent source of confusion, particularly the distinction between localhost inside a container and on the host. When a containerised application tries to connect to localhost:5432, it is resolving localhost relative to the container's network namespace — not the host's. If Postgres is running on the host (not in a container), the correct address from within a container depends on the platform: host.docker.internal on Docker Desktop for macOS and Windows resolves to the host machine, but on Linux, you may need to use --add-host=host.docker.internal:host-gateway to achieve the same.
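A sketch of the Linux mapping, assuming a Postgres server listening on the host (the credentials and database name are placeholders):

```shell
# Make host.docker.internal resolve to the host's gateway address on Linux
docker run --rm \
  --add-host=host.docker.internal:host-gateway \
  postgres:16 \
  psql "postgresql://appuser:secret@host.docker.internal:5432/myapp" -c "SELECT 1"
```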
In a Compose-defined multi-container setup, use service names as hostnames. Docker's embedded DNS resolves these automatically within the shared bridge network.
Best Practices
Use specific image tags, not latest. The latest tag is mutable and can change without notice when the upstream image is updated, introducing silent regressions into builds. Always pin to a specific version tag (e.g., postgres:16.2-alpine) in your Dockerfile and docker-compose.yml. Use a tool like Dependabot or Renovate to automate version bumps.
Run containers as non-root. By default, processes inside containers run as root (UID 0). While they are isolated from the host by the container boundary, running as root is a security liability — if the container is compromised, the attacker has root inside the container and may be able to exploit kernel vulnerabilities or misconfigured volume mounts. Always create and switch to a non-root user in your Dockerfile before the CMD or ENTRYPOINT instruction.
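The earlier multi-stage example used Alpine's addgroup/adduser; on Debian-based images the equivalent sketch uses groupadd/useradd (the user and group names are illustrative):

```dockerfile
FROM node:20-slim
WORKDIR /app
COPY . .
# Create an unprivileged system user and drop root before the process starts
RUN groupadd --system app && useradd --system --gid app --no-create-home app
USER app
CMD ["node", "server.js"]
```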
Use .dockerignore aggressively. The .dockerignore file works like .gitignore, specifying paths that should be excluded from the build context sent to the Docker daemon. Without it, COPY . . will include node_modules, .git, build artifacts, test fixtures, and local configuration files — inflating the build context and potentially leaking sensitive files into the image. At minimum, include node_modules, .git, *.log, and any local environment files.
# .dockerignore
node_modules
.git
.gitignore
*.log
.env
.env.*
dist
coverage
Separate development and production Dockerfiles (or use multi-stage targets). Development containers often need live-reloading, test frameworks, and debugging tools. Production containers should be as lean and immutable as possible. Multi-stage Dockerfiles with named stages (AS development, AS production) allow a single file to serve both purposes: docker build --target development for local work, and docker build (defaulting to the final stage) for production images.
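A sketch of the single-file, two-target pattern (stage names and commands are illustrative):

```dockerfile
FROM node:20-alpine AS base
WORKDIR /app
COPY package.json package-lock.json ./

FROM base AS development
RUN npm ci # includes dev dependencies and test tools
COPY . .
CMD ["npm", "run", "dev"]

FROM base AS production
RUN npm ci --omit=dev
COPY . .
USER node # the official node images ship a non-root "node" user
CMD ["node", "server.js"]
```

`docker build --target development -t myapp:dev .` builds the dev image for local work; a plain `docker build -t myapp:prod .` builds the final (production) stage.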
Structure health checks correctly. The HEALTHCHECK instruction in a Dockerfile tells Docker how to test whether the container is working correctly. This is important for orchestrators (Compose, Kubernetes, ECS) that need to know when a container is ready to receive traffic. A health check should probe the application's readiness, not just whether the process is running. Note that the probe command must actually exist in the image — minimal bases often lack curl, so use a tool the image ships or have the application expose its own check.
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
CMD curl -f http://localhost:3000/health || exit 1
Key Takeaways
There are five habits that produce most of the practical value from Docker, worth internalizing immediately:
- Use multi-stage builds for all compiled or transpiled applications. The build toolchain does not belong in your production image. A 2GB image with compiler tools and dev dependencies produces the same application as a 150MB production image — the former just costs more in storage, pull time, and attack surface.
- Pin all image versions in every file that references an image. Treat image version updates as dependency upgrades: deliberate, tested, and audited. Use Renovate or Dependabot to keep versions current without manual effort.
- Never put secrets in Dockerfiles. Use Docker build secrets (--secret), runtime environment variables, or secret management systems (Vault, AWS Secrets Manager). Audit your image history periodically with docker image history <image> to catch accidental credential leaks.
- Think in services, not containers once you move beyond single-container use cases. Docker Compose is the right unit of abstraction for a local development environment. A docker-compose.yml in every repository, usable with a single command, eliminates the "getting started" friction for new team members.
- Understand the layer cache as a first-class engineering concern. Ordering Dockerfile instructions to maximize cache hits is not an optimization detail — it is the difference between 30-second and 10-minute CI builds. Dependencies first, source code last, always.
Analogies and Mental Models
The container-as-shipping-container analogy is so well-worn it has become cliché, but it is genuinely useful: before standardized shipping containers, loading and unloading cargo was slow, error-prone, and required specialist knowledge of each cargo type. Standardization didn't change what was being shipped — it changed how it moved through the logistics system. Docker does the same thing for software: the application is packaged into a standard format that runs the same way on a developer's laptop, in CI, and in production.
A more nuanced mental model for layers: think of a Docker image as a stack of transparent acetate sheets, each adding or modifying what is below it. When a container runs, it sees the composite of all layers as a single coherent filesystem. The container itself adds one more writable layer on top — but only that top layer goes away when the container stops. The layers underneath are shared, read-only, and reused across all containers running from the same image.
The Dockerfile as a recipe (not a configuration file) is another useful frame. A recipe describes the steps to produce a dish; the dish itself is the image. Every time you build the image, you re-execute the recipe from the top — unless the cache short-circuits a step because neither the instruction nor its inputs have changed. Modifying an ingredient list midway through a recipe forces re-execution from that point forward, which is exactly how Docker's layer cache works.
The 80/20 Insight
Most of Docker's practical value in day-to-day engineering work comes from mastering a small core:
The Dockerfile with proper instruction ordering and multi-stage builds is where most of the productive leverage lives. A well-written Dockerfile eliminates environment inconsistencies, enables fast CI, and produces secure, minimal production images. Invest time here before exploring advanced features.
The docker-compose.yml for local development is the second high-leverage artifact. Every project should have one. It removes the "environment setup" step for anyone who joins the project and makes spinning up dependencies — databases, caches, message queues — a non-event.
The volume and networking model is where most engineers hit their first significant wall. Understanding that containers have their own network namespace, that service names in Compose are DNS hostnames, and that persistent data belongs in named volumes — not in the container filesystem — resolves the majority of confusing behaviors engineers encounter in their first few weeks with Docker.
Everything else — Swarm, advanced BuildKit features, custom network drivers, multi-host networking — builds on top of this foundation. Master these three things first.
Conclusion
Docker has fundamentally changed how software is built, tested, and deployed. The shift it represents — from mutable, snowflake servers to immutable, reproducible container images — is one of the most durable architectural ideas to emerge in the last decade. The tooling has matured considerably since its initial release: multi-stage builds, BuildKit, the Compose specification, and the broader OCI ecosystem have addressed most of the early rough edges.
Getting started with Docker is straightforward. Getting good at Docker requires understanding a small set of mental models — layers, immutability, ephemerality, networking namespaces — and applying them consistently. The engineers who get the most from Docker are those who treat their Dockerfiles and Compose configurations with the same care as their application code: reviewing them, testing them, and refining them over time.
The investment is worth it. A team that ships Docker-containerized applications has eliminated an entire class of "but it worked in dev" failures, enabled developers to spin up full application stacks in seconds, and laid the foundation for a straightforward path to Kubernetes or other container orchestration platforms when scale demands it.
References
- Docker, Inc. Docker Documentation. https://docs.docker.com
- Open Container Initiative. OCI Image Format Specification. https://github.com/opencontainers/image-spec
- Docker, Inc. Dockerfile reference. https://docs.docker.com/reference/dockerfile/
- Docker, Inc. Docker Compose specification. https://docs.docker.com/compose/compose-file/
- Docker, Inc. Use multi-stage builds. https://docs.docker.com/build/building/multi-stage/
- Google. Distroless container images. https://github.com/GoogleContainerTools/distroless
- Linux Kernel Documentation. Control Groups (cgroups). https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
- Docker, Inc. Docker security. https://docs.docker.com/engine/security/
- Liz Rice. Container Security: Fundamental Technology Concepts that Protect Containerized Applications. O'Reilly Media, 2020.
- Brendan Burns, Brian Grant, David Oppenheimer, Eric Brewer, John Wilkes. Borg, Omega, and Kubernetes. ACM Queue, 2016. https://queue.acm.org/detail.cfm?id=2898444