Tuesday, January 27, 2026

Why Service Providers Don't Accept Customer BGP FlowSpec

And Why It's Not About Upselling DDoS Protection
After ~25 years in networking, I often hear: "ISPs block customer FlowSpec because they want to upsell DDoS protection."

That's only half the story.

The real reason FlowSpec rarely crosses the ISP–customer boundary is the collision of control, accountability, and shared infrastructure.

🎯 The Common Misconception

BGP FlowSpec (RFC 8955/8956) is one of the most powerful yet underutilized tools in DDoS mitigation. In theory, it allows a customer to signal filtering rules to their upstream provider during an attack — dynamically, without manual intervention.
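Conceptually, a FlowSpec rule couples match criteria (the NLRI components) with an action carried in BGP extended communities. A minimal sketch in Python — illustrative field names and values only, not a wire-format implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FlowSpecRule:
    """Illustrative model of an RFC 8955 flow specification.

    Field names are for readability only; the real NLRI is a
    type/value encoding, and the action is carried as a BGP
    extended community (traffic-rate 0 means "drop").
    """
    dst_prefix: str                       # component type 1
    src_prefix: Optional[str] = None      # component type 2
    protocol: Optional[int] = None        # component type 3 (17 = UDP)
    dst_port: Optional[int] = None        # component type 5
    src_port: Optional[int] = None        # component type 6
    action: str = "traffic-rate:0"        # drop

# During a DNS amplification attack, a customer might want to signal:
rule = FlowSpecRule(dst_prefix="203.0.113.0/24", protocol=17, src_port=53)
print(rule.dst_prefix, rule.action)  # 203.0.113.0/24 traffic-rate:0
```

The customer's tooling would encode this as FlowSpec NLRI and advertise it over the BGP session to the upstream.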

But in practice, most service providers don't accept FlowSpec from customers.

The typical explanation? "They want to upsell managed DDoS scrubbing."

While there's truth to that, it misses the deeper technical and operational reasons why FlowSpec is fundamentally incompatible with the ISP–customer trust model.

"FlowSpec works best where control and accountability are aligned.
That's why it thrives inside an AS, but rarely across AS boundaries."

1️⃣ The Business Shift Is Real — But It's About Liability, Not Just Margin

Transit Became Cheap. DDoS Protection Didn't.

Over the last decade, IP transit pricing collapsed. What used to cost hundreds of dollars per Mbps now costs pennies. ISPs can't make meaningful margin on connectivity alone anymore.

But the deeper issue is ownership:

  • If the ISP scrubs → they own the outcome
  • If the customer injects FlowSpec → the ISP inherits the risk

One bad rule can blackhole legitimate traffic, and the ISP still gets blamed.

💡 Key Point: When you hand a customer the ability to inject drop rules into your network, you inherit liability for every mistake they make — without the visibility or control to validate their intent.

2️⃣ TCAM Is a Shared Fate Problem

FlowSpec Rules Consume Scarce Hardware Resources

Modern routers use TCAM (Ternary Content Addressable Memory) to perform line-rate packet filtering. TCAM is:

  • Expensive
  • Finite
  • Shared across all customers on the same router

Complex FlowSpec matches expand entries:

  • Multi-field matches (source port + destination port + protocol + packet length)
  • Fragment handling (differs by platform)
  • DSCP marking, TCP flags, ICMP types

There is no safe per-customer quota that works during a real attack.

⚠️ Technical Reality: A single customer under DDoS stress can inject hundreds of FlowSpec rules. If those rules exhaust TCAM, it impacts everyone on that router — not just the customer under attack.

Why ISPs Can't Just "Allocate TCAM Per Customer"

TCAM isn't like bandwidth — you can't partition it cleanly:

  • Rule expansion is unpredictable
  • Platform behavior varies (Juniper MX vs Cisco ASR vs Arista 7280)
  • During an actual volumetric attack, FlowSpec rules compete with ACLs, uRPF, and other control-plane protections

A "fair share" policy doesn't exist in TCAM world.

3️⃣ Validation Breaks Customer Expectations

RFC 8955 Protects the Network — But Creates Operational Ambiguity

FlowSpec includes validation to prevent abuse:

  • A FlowSpec rule is only valid if its destination prefix has a best-match unicast route in the RIB, learned from the same neighbor (or origin AS) that sent the rule
  • If that check fails, the rule is kept in the table but marked Invalid and never installed

This sounds safe. But in asymmetric routing environments, it breaks:

📌 Example Scenario:
  • Customer owns 203.0.113.0/24
  • ISP receives this prefix via Peer A (best path)
  • Customer injects FlowSpec via Transit Link B
  • The ISP's router doesn't have a route to 203.0.113.0/24 via that session
  • Result: FlowSpec rule is silently marked Invalid

💡 The Problem: Rules aren't rejected — they're silently inactive. The customer thinks they mitigated. They didn't. This operational ambiguity is poison during an incident.
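The check this scenario trips over can be reduced to a few lines — a deliberate simplification of RFC 8955's validation rule, which compares the rule's originator with the originator of the best-match unicast route:

```python
def flowspec_valid(dst_prefix, rib_best_paths, received_from):
    """Simplified RFC 8955 validation check (illustrative).

    A rule is usable only if the best unicast route for the
    destination prefix was learned from the same neighbor that
    sent the FlowSpec rule. Otherwise it is kept but marked
    Invalid -- no error goes back to the sender.
    """
    best = rib_best_paths.get(dst_prefix)
    return best is not None and best == received_from

rib = {"203.0.113.0/24": "peer-A"}     # best path learned via Peer A
# Customer injects the rule over transit link B instead:
ok = flowspec_valid("203.0.113.0/24", rib, received_from="transit-B")
print(ok)  # False -> silently inactive; the customer sees no error
```

Nothing in BGP tells the customer this happened; the session stays up and the rule simply never filters a packet.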

4️⃣ Source-Based Blocking Is Intentionally Constrained

You Don't Own Attacker Prefixes

FlowSpec is destination-anchored by design to prevent abuse:

  • You can't inject a rule saying "drop all traffic from 1.2.3.0/24" unless you own that prefix
  • If you could, a malicious actor could blackhole any prefix on the internet

That makes it safer — but it also means FlowSpec is not true push-back.

FlowSpec Is RTBH++, Not Attacker Suppression

Think of FlowSpec as:

  • Remotely Triggered Black Hole (RTBH) with granular match criteria
  • You can say "drop packets to MY prefix matching X"
  • You cannot say "block this attacker globally"

This limits its effectiveness against distributed attacks from thousands of sources.

5️⃣ Multi-Vendor Reality Hurts

What Works on One Platform May Fail on Another

ISPs run heterogeneous networks:

  • Juniper MX at peering points
  • Cisco ASR9k at aggregation
  • Arista 7280 at customer edge

FlowSpec behavior differs:

  • Some platforms support fragment filtering; others don't
  • TCAM layout varies (e.g., Broadcom Trident3 vs Jericho2)
  • Actions like "rate-limit" vs "redirect-to-VRF" aren't universally supported

⚠️ Real-World Impact: ISPs struggle to normalize FlowSpec internally. Letting customers inject rules multiplies that risk — now the ISP has to guarantee consistent behavior across platforms they don't fully control.

❌ So What Actually Kills Customer FlowSpec?

Not One Team — Every Team

Engineering fears blast radius:

  • One bad rule can affect hundreds of customers
  • TCAM exhaustion is silent until it's catastrophic

Operations fears silent failure and troubleshooting hell:

  • "Why isn't my FlowSpec rule working?" becomes the #1 ticket
  • Debugging asymmetric routing + validation state + multi-vendor TCAM behavior at 2 AM

Security fears abuse:

  • A compromised customer could inject rules targeting someone else
  • Even with validation, the attack surface is non-zero

Finance asks: Who pays when this goes wrong?

  • If the ISP's network drops traffic due to a customer-injected rule, who's liable?
  • SLAs don't cover "customer shot themselves in the foot"

🎯 The Core Issue: Shared Control Without Shared Responsibility Doesn't Scale

FlowSpec isn't broken. It's incredibly powerful inside a single administrative domain:

  • A large enterprise using FlowSpec between DC and branches
  • A cloud provider using it internally across regions
  • An ISP using it for internal DDoS response teams

But across AS boundaries, the trust model collapses:

  • The customer doesn't own the ISP's TCAM
  • The ISP doesn't control the customer's filtering logic
  • When something breaks, both sides blame each other

"It's not that FlowSpec is broken.
It's that shared control without shared responsibility doesn't scale."

🔮 What's the Alternative?

If Not Customer FlowSpec, Then What?

1. ISP-Managed Scrubbing Centers

  • BGP-triggered diversion to dedicated scrubbing infrastructure
  • ISP owns the filtering logic and liability
  • Customer pays for the service

2. Customer-Side FlowSpec (Within Their AS)

  • Customer runs FlowSpec internally (e.g., from firewall to edge routers)
  • ISP only sees the "clean" side

3. RTBH (Remotely Triggered Black Hole)

  • Simpler, less risky
  • Customer signals via BGP community: "drop all traffic to this /32"
  • ISP implements it at their edge

4. API-Based On-Demand Filtering

  • Customer calls ISP API during attack
  • ISP validates and applies rules in controlled manner
  • Combines automation with ISP oversight
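For option 3, the customer-side signal is just a most-specific route tagged with a blackhole community. A sketch of what gets advertised — the BLACKHOLE community 65535:666 is standardized in RFC 7999, though many providers define their own; the next-hop value here is purely illustrative, since the provider typically rewrites it toward a discard route:

```python
BLACKHOLE = (65535, 666)   # RFC 7999 well-known BLACKHOLE community

def rtbh_announcement(victim_ip, community=BLACKHOLE):
    """Build the BGP announcement a customer sends to trigger RTBH.

    Check your provider's communities document for the exact value;
    the shape of the announcement is the same either way.
    """
    return {
        "prefix": f"{victim_ip}/32",    # most-specific route to the victim
        "communities": [community],
        "next_hop": "192.0.2.1",        # placeholder; provider maps it to discard
    }

ann = rtbh_announcement("203.0.113.7")
print(ann["prefix"], ann["communities"])  # 203.0.113.7/32 [(65535, 666)]
```

The trade-off is blunt: RTBH completes the attack against that one address, but it protects everything else — and the liability model is clean.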

✅ Final Takeaway

Customer FlowSpec across ISP boundaries fails not because ISPs are greedy, but because the operational model is fundamentally misaligned.

FlowSpec requires:

  • Trust in the customer's filtering logic
  • Shared fate in TCAM exhaustion risk
  • Multi-vendor consistency that doesn't exist
  • Clear liability when things break

None of these exist at the ISP–customer boundary.

"FlowSpec works best where control and accountability are aligned.
Inside your AS? Powerful.
Across AS boundaries? A liability nightmare."

The next time someone says "ISPs just want to upsell scrubbing" — remind them: the technical reasons are more fundamental than the business reasons. And until we solve TCAM scarcity, validation ambiguity, and multi-vendor normalization, customer FlowSpec will remain an idea that works in slides, but breaks in production.


Saturday, January 24, 2026

FIB Failures: When the Control Plane Is Right and Traffic Still Drops

✍️ Written by: RJS Expert
Understanding the gap between RIB convergence and FIB programming in production networks.

Most large networks don't fail because the design is wrong.

They fail because the Forwarding Information Base (FIB) hits limits that architecture reviews never model.

📋 What Design & Config Checks Validate

Design and config checks validate:

  • ✔ Routing correctness
  • ✔ Features like PIC, TI-LFA, SR, Add-Path
  • ✔ Timers and best practices

All necessary.
Still insufficient.

Because forwarding is constrained by silicon, not by intent.

⚠️ RIB Converged ≠ Forwarding Correct

A familiar production pattern:

✓ Control Plane Status

  • BGP converged
  • IGP stable
  • PIC triggered
  • Routes present in RIB

✗ Forwarding Reality

  • Selective packet loss
  • Prefix-level blackholes
  • Drops during failover

This is not a control-plane issue.
It's a FIB programming failure.

🔍 Common Real-World FIB Failure Patterns

1. TCAM Exhaustion & Fragmentation

  • Asymmetric programming across line cards
  • Fragmentation blocks new entries
  • Prefixes exist in RIB but never reach hardware

Often triggered by combined scale: Internet routes + ACLs + QoS + SR

2. PIC Edge Timing Gaps

  • Software switches next-hops instantly
  • Hardware lags under scale
  • Micro-blackholes, stale adjacencies, VRF-specific loss

PIC works.
Forwarding timing doesn't always match.

3. Segment Routing / TI-LFA Scale Pressure

  • Node SIDs, Adj-SIDs, repair paths, policies all compete for FIB
  • Backup paths compute correctly
  • Only partially program in hardware

Failures surface during large topology events—exactly when protection is needed.

❌ Why Design & Config Audits Miss This

| Audit Type | What It Answers |
| --- | --- |
| Design Review | Should this work? |
| Config Audit | Is it enabled? |
| ❓ Missing Question | Can the hardware sustain worst-case churn, scale, and recovery simultaneously? |

FIB failures are stress-induced, incremental, and often invisible until failure conditions align.

✅ Post-Incident FIB Audit Checklist

After every major incident, check:

| Audit Area | What to Check |
| --- | --- |
| RIB vs FIB | Prefixes present in RIB but missing in hardware; per-line-card inconsistencies |
| TCAM Health | Utilization and fragmentation; feature-wise consumption (BGP, ACL, QoS, SR) |
| Failover Reality | PIC trigger time vs actual forwarding switchover; micro-blackholes during convergence |
| SR / Labels | Repair paths actually installed in FIB; label space pressure or partial installs |
| Programming Performance | FIB update latency during failure; hardware programming drops or queueing |
| Asymmetry & Churn | Uneven FIB pressure across cards; route churn volume during the event |
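The first item in the checklist is scriptable: parse the RIB and each line card's FIB into prefix sets and diff them. A sketch with illustrative data — in practice you would feed it the parsed output of the platform show commands covered later in this post:

```python
def fib_audit(rib_prefixes, fib_by_linecard):
    """Report prefixes present in the RIB that a line card never programmed."""
    report = {}
    for card, fib in fib_by_linecard.items():
        missing = rib_prefixes - fib
        if missing:
            report[card] = sorted(missing)
    return report

rib = {"10.0.0.0/8", "203.0.113.0/24", "198.51.100.0/24"}
fib = {
    "LC0": {"10.0.0.0/8", "203.0.113.0/24", "198.51.100.0/24"},
    "LC3": {"10.0.0.0/8", "203.0.113.0/24"},   # TCAM full: one prefix missing
}
print(fib_audit(rib, fib))  # {'LC3': ['198.51.100.0/24']}
```

A per-card diff matters: a prefix programmed on five cards and missing on the sixth produces exactly the "selective packet loss" pattern described above.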

💡 The Hard Truth

Most "random" outages are not bugs.

They are hardware scale limits discovered during failure.

The control plane did exactly what it should.
The silicon couldn't keep up.

🔧 Diagnostic Commands for FIB Validation

Cisco IOS-XR

# Compare RIB vs FIB
show route
show cef
show cef inconsistency

# TCAM utilization
show controllers npu resources all location all
show controllers fia diagshell 0 "diag cosq stat" location all

# Per-line-card FIB
show cef location 0/0/CPU0
show adjacency location 0/0/CPU0

Cisco IOS-XE / NX-OS

# RIB vs FIB
show ip route
show ip cef
show ip cef inconsistency

# TCAM health
show platform hardware fed active fwd-asic resource tcam utilization
show hardware capacity

Juniper Junos

# RIB vs FIB
show route
show route forwarding-table

# FIB programming
show pfe statistics traffic
show chassis forwarding

📊 Real-World Scenario: When Everything "Works" But Traffic Drops

Incident Timeline:

| Time | Event |
| --- | --- |
| T+0 | Link failure triggers PIC Edge |
| T+50ms | RIB updates complete, next-hops switched |
| T+200ms | FIB programming starts on line cards |
| T+2s | Line card 3 TCAM full, drops 1,200 prefixes |
| T+5s | Monitoring shows "BGP converged" ✓ |
| Impact | Traffic to 1,200 prefixes blackholed for 8 minutes until manual intervention |

Root cause: TCAM fragmentation + scale. No config error. No design flaw. Hardware couldn't sustain the churn.

🛠️ Preventive Measures

  1. Baseline TCAM utilization across all line cards
    • Track per-feature consumption (routing, ACLs, QoS, SR labels)
    • Monitor fragmentation levels
    • Set alerts at 70%, not 90%
  2. Test FIB programming under failure conditions
    • Simulate link failures during peak routing table size
    • Measure actual FIB update latency, not just RIB convergence
    • Validate per-line-card consistency
  3. Implement FIB monitoring in production
    • Compare RIB vs FIB prefix counts continuously
    • Alert on inconsistencies that persist > 30 seconds
    • Track hardware programming queue depth
  4. Right-size SR/TI-LFA deployments
    • Not every prefix needs backup path protection
    • Limit repair path depth
    • Test combined scale: Internet + SR + ACLs
  5. Include FIB validation in change windows
    • Post-change: verify RIB/FIB consistency
    • Check TCAM utilization trends
    • Document FIB programming timing
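Point 1's alerting rule is trivial to automate once per-card utilization is being collected — a sketch with made-up numbers, wired to the 70% early-warning threshold recommended above:

```python
def tcam_alerts(utilization, warn=0.70, crit=0.90):
    """Classify per-line-card TCAM utilization; alert early, at 70%."""
    status = {}
    for card, used in utilization.items():
        status[card] = ("CRIT" if used >= crit else
                        "WARN" if used >= warn else "OK")
    return status

print(tcam_alerts({"LC0": 0.45, "LC1": 0.72, "LC3": 0.93}))
# {'LC0': 'OK', 'LC1': 'WARN', 'LC3': 'CRIT'}
```

The point of 70% rather than 90% is headroom: fragmentation and churn during a failure event can consume the last 20-30% far faster than a human can react.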

🎯 Final Thought

"Your network is defined not by what the RIB converges to, but by what the FIB can sustain under stress."

If you don't audit the FIB after incidents,
you're debugging symptoms—not root cause.

And hope is not an operational strategy.

📚 Key Takeaways:

  • RIB convergence ≠ Forwarding correctness — Always verify FIB programming
  • TCAM exhaustion is silent — Until failure strikes during churn
  • PIC timing gaps are real — Software and hardware don't always sync
  • SR/TI-LFA scale matters — Protection paths compete for limited resources
  • Post-incident FIB audits are mandatory — Not optional
  • Design reviews miss hardware limits — Test under stress, not just steady-state

Friday, January 23, 2026

Docker Data Management and Volumes: Complete Guide

Written by: RJS Expert

This guide builds upon the Docker Introduction and Docker Images and Containers guides, exploring how to manage data persistence in Docker containers using volumes, bind mounts, and understanding the critical differences between them.

Understanding Data Types in Docker Applications

Before diving into volumes and data persistence mechanisms, it's essential to understand the three fundamental types of data that exist in containerized applications.

1. Application Code and Environment

Characteristics:

  • Read-Only: Once the image is built, this data doesn't change
  • Source: Copied into the image during the build process
  • Examples: Application source code, dependencies, configuration files
  • Location: Stored in image layers, accessible via container's read-only layer

# Dockerfile example - Application code
FROM node:14
WORKDIR /app
COPY package.json .
RUN npm install
COPY . .
CMD ["node", "server.js"]

2. Temporary Data

Characteristics:

  • Read-Write: Generated and modified during runtime
  • Volatile: It's acceptable if this data is lost when container stops
  • Examples: Temporary files, cache data, session information
  • Location: Stored in container's read-write layer

⚠️ 3. Permanent Data (Critical Data Type)

Characteristics:

  • Read-Write: Generated and modified during runtime
  • Persistent: Must survive container restarts and removals
  • Examples: User accounts, uploaded files, database records, log files
  • Solution: Requires Docker Volumes or Bind Mounts

The Data Persistence Problem

Docker containers operate with a layered file system architecture that creates a fundamental challenge for data persistence. Understanding this architecture is crucial to solving data management problems.

Understanding Container Isolation

Container Layer Architecture

| Layer Type | Access | Lifecycle | Purpose |
| --- | --- | --- | --- |
| Image Layers | Read-Only | Permanent (until image deleted) | Contains application code and dependencies |
| Container Layer | Read-Write | Temporary (deleted with container) | Stores runtime changes and new data |

The Problem Scenario: What happens when you remove a container?

  1. The container's read-write layer is deleted
  2. All data stored in that layer is permanently lost
  3. The base image remains unchanged (read-only)
  4. New containers start with a clean slate

Example: Feedback Application

// Node.js application storing user feedback
const express = require('express');
const fs = require('fs');        // needed for writeFileSync below

const app = express();
app.use(express.json());         // parse JSON bodies into req.body

app.post('/feedback', (req, res) => {
    // Store feedback in /app/feedback directory
    const feedbackPath = '/app/feedback/' + req.body.title + '.txt';
    fs.writeFileSync(feedbackPath, req.body.content);
    res.json({ message: 'Feedback saved!' });
});

Problem: When you stop and remove the container, all feedback files are lost because they were stored in the container's read-write layer!

Docker Volumes: The Solution

Volumes are folders on your host machine that are mounted (mapped) into Docker containers. They create a bidirectional connection that solves the data persistence problem.

What Are Volumes?

  • Changes in the container are reflected on the host machine
  • Changes on the host machine are reflected in the container
  • Data persists even after container removal
  • Multiple containers can share the same volume

Volumes vs COPY Instruction

| Aspect | COPY Instruction | Volumes |
| --- | --- | --- |
| When It Happens | During image build (one-time) | At container runtime (continuous) |
| Connection Type | Snapshot - no ongoing relation | Live connection - bidirectional |
| Updates | Requires image rebuild | Automatic and immediate |
| Data Persistence | Lost when container removed | Persists on host machine |

Types of Volumes

1. Anonymous Volumes

Anonymous Volume Characteristics

  • Docker generates a random ID as the volume name
  • Tied to a specific container lifecycle
  • Automatically deleted when container is removed (with --rm flag)
  • Created with VOLUME instruction in Dockerfile or -v flag without a name

# In Dockerfile
VOLUME ["/app/temp"]

# Or via command line
docker run -v /app/temp myimage

Use Cases for Anonymous Volumes:

  • Performance optimization - offload temporary data from container layer
  • Protecting specific folders from being overwritten by bind mounts
  • Data that doesn't need to persist beyond container lifecycle

2. Named Volumes

Named Volume Characteristics

  • You assign a meaningful name to the volume
  • Not tied to any specific container
  • Survives container shutdown and removal
  • Can be shared across multiple containers
  • Managed by Docker (location on host is abstracted)

# Create and use a named volume
docker run -v feedback:/app/feedback myimage

# List all volumes
docker volume ls

# Inspect a specific volume
docker volume inspect feedback

# Remove a volume
docker volume rm feedback

# Remove all unused volumes
docker volume prune

🎯 Best Practice: Named volumes are the recommended approach for data that needs to persist. Docker manages the storage location, providing portability and ease of management.

Comparison: Anonymous vs Named Volumes

| Feature | Anonymous Volume | Named Volume |
| --- | --- | --- |
| Creation | VOLUME in Dockerfile or -v /path | -v name:/path on docker run |
| Naming | Random ID generated by Docker | User-defined name |
| Container Binding | Attached to specific container | Independent of containers |
| Persistence | Deleted with container (--rm) | Survives container removal |
| Sharing | Cannot be shared | Can be shared across containers |
| Use Case | Performance, protecting paths | Persistent data storage |

Bind Mounts: Development Powerhouse

Bind mounts map a specific directory on your host machine to a directory in the container. Unlike volumes, you control the exact location on the host filesystem.

Key Differences from Volumes

  • Host Path: You specify the exact host directory path
  • Management: You manage the directory, not Docker
  • Visibility: Full access to files on host machine
  • Primary Use: Development environments for live code updates

# Bind mount syntax
docker run -v /absolute/path/on/host:/app/code myimage

# macOS/Linux shortcut
docker run -v $(pwd):/app myimage

# Windows shortcut
docker run -v "%cd%":/app myimage

# Example with complete command
docker run -d \
  --name feedback-app \
  -p 3000:80 \
  -v /Users/developer/project:/app \
  -v /app/node_modules \
  feedback-node

Bind Mounts Use Case: Live Development

Development Workflow:

  1. Mount your source code directory into the container
  2. Edit code on your host machine with your favorite IDE
  3. Changes are immediately available in the running container
  4. No need to rebuild the image for every code change
  5. Combine with nodemon or similar tools for automatic server restart

The node_modules Problem

Common Issue

When you bind mount your entire project directory, you overwrite the node_modules folder that was created during the image build!

Solution: Use an anonymous volume to protect node_modules

# Complete command with node_modules protection:
# named volume for data, bind mount for source code,
# anonymous volume protects node_modules
# (comments can't follow a trailing backslash, so they live up here)
docker run -d \
  --name feedback-app \
  -p 3000:80 \
  -v feedback:/app/feedback \
  -v /Users/dev/project:/app \
  -v /app/node_modules \
  feedback-node

🔍 How Volume Priority Works

When multiple volumes map to overlapping paths, Docker uses this rule:

The most specific (longest) path wins

In the example above:
  • -v /Users/dev/project:/app maps the entire app folder
  • -v /app/node_modules is more specific
  • Result: the bind mount controls /app, but node_modules is preserved from the image
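Docker resolves this in the kernel mount table, but the documented rule can be modeled as longest-path matching over mount targets. A toy model, not Docker's implementation:

```python
def winning_mount(container_path, mounts):
    """Return the mount target that controls container_path: longest match wins."""
    best = None
    for mount_path in mounts:
        p = mount_path.rstrip("/")
        if container_path == p or container_path.startswith(p + "/"):
            if best is None or len(p) > len(best):
                best = p
    return best

mounts = ["/app", "/app/node_modules"]   # bind mount + anonymous volume
print(winning_mount("/app/server.js", mounts))             # /app
print(winning_mount("/app/node_modules/express", mounts))  # /app/node_modules
```

So edits to server.js on the host flow into the container, while everything under /app/node_modules keeps coming from the anonymous volume populated at build time.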

Read-Only Volumes

You can make volumes or bind mounts read-only from the container's perspective to prevent accidental modifications.

# Read-only bind mount
docker run -v /host/path:/container/path:ro myimage

# Example: source code read-only; feedback and temp folders stay
# writable via anonymous volumes
docker run -d \
  -v $(pwd):/app:ro \
  -v /app/feedback \
  -v /app/temp \
  feedback-node

🛡️ Security Best Practice: Use read-only volumes for application code to prevent the container from accidentally modifying your source files.

Volume Management Commands

Essential Docker Volume Commands

| Command | Description | Example |
| --- | --- | --- |
| docker volume create | Create a volume manually | docker volume create mydata |
| docker volume ls | List all volumes | docker volume ls |
| docker volume inspect | View volume details | docker volume inspect mydata |
| docker volume rm | Remove a specific volume | docker volume rm mydata |
| docker volume prune | Remove all unused volumes | docker volume prune |

# Create a volume
docker volume create feedback-data

# Run container with pre-created volume
docker run -v feedback-data:/app/data myimage

# Inspect volume to see mount point
docker volume inspect feedback-data

# Output shows internal Docker mount point
{
    "CreatedAt": "2024-01-20T10:30:00Z",
    "Driver": "local",
    "Mountpoint": "/var/lib/docker/volumes/feedback-data/_data",
    "Name": "feedback-data"
}

# Remove unused volumes
docker volume prune

Environment Variables and Build Arguments

Environment Variables (Runtime)

Environment variables allow you to configure containers at runtime without rebuilding images.

# In Dockerfile
ENV PORT=80
EXPOSE $PORT

# Set at runtime with --env or -e
docker run -e PORT=8000 -p 8000:8000 myimage

# Use environment file
docker run --env-file .env myimage

# .env file contents
PORT=8000
DB_HOST=localhost
DB_NAME=mydb

⚠️ Security Warning: Don't hardcode sensitive data (passwords, API keys) in Dockerfile. Use environment variables at runtime and keep .env files out of version control!

Build Arguments (Build-time)

Build arguments allow you to pass values during image build, creating flexible images without modifying the Dockerfile.

# In Dockerfile
ARG DEFAULT_PORT=80
ENV PORT=$DEFAULT_PORT
EXPOSE $PORT

# Build with different argument values
docker build --build-arg DEFAULT_PORT=80 -t myapp:web .
docker build --build-arg DEFAULT_PORT=8000 -t myapp:dev .

ARG vs ENV Comparison

| Aspect | ARG (Build Arguments) | ENV (Environment Variables) |
| --- | --- | --- |
| Availability | Only during image build | At build time and runtime |
| Set via | --build-arg flag | --env flag or --env-file |
| Visible in Code | No (only in Dockerfile) | Yes (accessible in application) |
| Use Case | Build-time configuration | Runtime configuration |
| Security | Stored in image history | Not in image (if set at runtime) |

Best Practices for Data Management

✅ Development Best Practices

  1. Use Bind Mounts: For source code to enable live updates
  2. Protect Dependencies: Use anonymous volumes for node_modules, vendor folders
  3. Use .dockerignore: Prevent unnecessary files from being copied
  4. Hot Reload Tools: Implement nodemon, webpack-dev-server for automatic restarts
  5. Read-Only Mounts: Make source code read-only from container

✅ Production Best Practices

  1. Named Volumes Only: No bind mounts in production
  2. Snapshot Images: Use COPY in Dockerfile for code
  3. Data Persistence: Use named volumes for databases, user files
  4. Environment Variables: Configure via --env at runtime
  5. Backup Strategy: Regularly backup volume data
  6. Volume Cleanup: Implement volume pruning strategies

Volume Strategy by Data Type

| Data Type | Development | Production |
| --- | --- | --- |
| Source Code | Bind mount (read-only) | COPY in Dockerfile (no volume) |
| Dependencies | Anonymous volume | In image via RUN command |
| User Data | Named volume | Named volume |
| Logs | Named volume or bind mount | Named volume or logging service |
| Temporary Files | Anonymous volume | Anonymous volume or tmpfs |
| Configuration | Bind mount | Environment variables or secrets |

Troubleshooting Common Issues

Issue 1: Data Not Persisting

Symptom: Data disappears when container restarts

Causes:

  • Using anonymous volumes instead of named volumes
  • Using --rm flag without proper volumes
  • Removing volumes with container

Solution: Use named volumes: -v mydata:/app/data and verify with docker volume ls

Issue 2: Bind Mount Not Working (WSL2 Windows)

Symptom: File changes don't reflect in container

Cause: Project in Windows filesystem, not Linux filesystem

Solution: Move project to WSL Linux filesystem and access via \\wsl$\Ubuntu\home\user\project

Issue 3: Permission Denied Errors

Symptom: Container cannot write to volume

Solutions:

  • Remove :ro flag if write access needed
  • Check host directory permissions: chmod 755
  • Run container with correct user: --user $(id -u):$(id -g)

Issue 4: node_modules Overwritten by Bind Mount

Symptom: Module not found errors after adding bind mount

Cause: Bind mount overwrites node_modules from image

Solution: Add anonymous volume for node_modules: -v /app/node_modules

Key Takeaways

Summary of Core Concepts

  1. Three Data Types: Application code (read-only), temporary data (volatile), permanent data (must persist)
  2. Container Isolation: Data in container's read-write layer is lost when container is removed
  3. Volumes: Folders on host machine mounted into containers for data persistence
  4. Anonymous Volumes: Container-specific, good for performance and path protection
  5. Named Volumes: Persistent, shareable, managed by Docker - best for permanent data
  6. Bind Mounts: Development tool for live code updates, you control host path
  7. Read-Only Volumes: Security practice for source code
  8. Volume Priority: More specific (longer) paths override general ones
  9. Environment Variables: Runtime configuration without image rebuild
  10. Build Arguments: Build-time customization for flexible images

Volume Type Quick Reference

| When You Need | Use This |
| --- | --- |
| Persistent data across container lifecycles | Named Volume |
| Live code updates during development | Bind Mount |
| Protect folders from bind mount override | Anonymous Volume |
| Share data between containers | Named Volume |
| Temporary performance optimization | Anonymous Volume or tmpfs |
| Prevent container from modifying code | Read-Only Bind Mount |

🎯 Production Reminder

In production environments:

  • Never use bind mounts (no source code connections)
  • Use named volumes for all persistent data
  • Application code comes from COPY in Dockerfile (snapshot)
  • Configure via environment variables, not bind mounts
  • Implement proper backup strategies for volume data

Wednesday, January 21, 2026

Docker Images and Containers: A Complete Deep Dive

Written by: RJS Expert

This guide builds upon the Docker Introduction and explores the relationship between images and containers, how to build custom images, and optimize your Docker workflow.

Images vs Containers: The Core Relationship

When working with Docker, understanding the distinction between images and containers is fundamental to mastering containerization.

What Are Docker Images?

Images are templates, blueprints for containers.

An image contains:

  • The application code
  • The required tools to execute the code
  • All dependencies and libraries
  • Environment configuration
  • Setup instructions

Key Point: Images are read-only and shareable. You create them once and can use them to run multiple containers.

What Are Docker Containers?

Containers are running instances of images.

A container is:

  • The concrete running application
  • Based on an image
  • Isolated from other containers
  • Can be started, stopped, and removed
  • Has its own filesystem and network

Analogy: If an image is a class in programming, a container is an instance of that class. You can create multiple containers (instances) from a single image (class).

The Relationship

| Images | Containers |
| --- | --- |
| Templates/blueprints | Running instances |
| Read-only | Read-write layer on top |
| Created once, reused many times | Multiple containers can run from one image |
| Contain code and environment | Execute the code |

We run containers, which are based on images.

Working with Pre-Built Images

Before building custom images, let's understand how to use existing images from Docker Hub.

Docker Hub: The Image Registry

Docker Hub (hub.docker.com) hosts thousands of pre-built images:

  • Official images (Node.js, Python, Nginx, PostgreSQL, etc.)
  • Community-maintained images
  • Your own custom images

Pulling and Running an Image

Example: Running the official Node.js image

# Pull and run Node.js image
docker run node

# Run with interactive terminal
docker run -it node

# List all containers (including stopped)
docker ps -a

Important: By default, containers are isolated. Even if a process inside the container exposes a port or interface, it's not automatically available to the host machine.

Understanding Container Isolation

When you run docker run node, the container starts and immediately exits because:

  • No interactive session is exposed
  • The container ran its default command and finished
  • Containers run in isolation from the host

To interact with the container, use the -it flag:

# -i = interactive (keep STDIN open)
# -t = tty (allocate pseudo-terminal)
docker run -it node

# Now you can run Node commands
> 1 + 1
2
> console.log("Hello from container!")
Hello from container!

Building Custom Images

Most real-world scenarios require building custom images with your application code.

The Dockerfile

A Dockerfile contains instructions for building an image. It's a plain text file (no extension) that Docker reads to create your custom image.

Sample Application Structure

my-node-app/
├── server.js           # Application code
├── package.json        # Dependencies
├── public/
│   └── styles.css      # Static files
└── Dockerfile          # Image instructions

Creating Your First Dockerfile

# Use Node.js as base image
FROM node

# Set working directory
WORKDIR /app

# Copy package.json first (optimization)
COPY package.json /app

# Install dependencies
RUN npm install

# Copy application code
COPY . /app

# Document exposed port
EXPOSE 80

# Command to run when container starts
CMD ["node", "server.js"]

Understanding Dockerfile Instructions

Instruction   Purpose                                       When Executed
FROM          Specifies the base image                      During image build
WORKDIR       Sets the working directory inside container   During image build
COPY          Copies files from host into the image         During image build
RUN           Executes commands during the build            During image build
EXPOSE        Documents which port the container uses       Documentation only
CMD           Default command to run                        When the container starts

Critical Difference: RUN vs CMD

  • RUN executes during image build (e.g., installing packages)
  • CMD executes when container starts (e.g., starting your application)
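A minimal Dockerfile makes the distinction concrete (an illustrative sketch, not part of the sample app):

```
FROM node

# RUN executes now, while the image is being built,
# and its result is baked into the image
RUN echo "executed at build time"

# CMD only records the default command; it executes
# each time a container is started from this image
CMD ["node", "-e", "console.log('executed at container start')"]
```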

Building the Image

# Build image from Dockerfile in current directory
docker build .

# Output shows each step
Step 1/7 : FROM node
Step 2/7 : WORKDIR /app
Step 3/7 : COPY package.json /app
Step 4/7 : RUN npm install
Step 5/7 : COPY . /app
Step 6/7 : EXPOSE 80
Step 7/7 : CMD ["node", "server.js"]
Successfully built abc123def456

Running Your Custom Container

# Run container from image ID
docker run abc123def456

# Won't work yet! Need to publish ports...

Port Publishing: Exposing Container Ports

Even though we added EXPOSE 80 in the Dockerfile, the container port is still not accessible from the host.

Why EXPOSE Alone Isn't Enough

The EXPOSE instruction is documentation only. It tells users which port the container uses, but doesn't actually publish it.

Publishing Ports with -p Flag

# -p HOST_PORT:CONTAINER_PORT
docker run -p 3000:80 abc123def456

# Now accessible at localhost:3000
# Host port 3000 → Container port 80

Port Mapping Explained:

  • 3000 (left side) = Port on your host machine
  • 80 (right side) = Port inside the container
  • You can map to any available host port
  • Multiple containers can use the same container port (80) as long as host ports differ
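For example, using the image ID from the build above, the same container port can be published twice as long as the host ports differ:

```shell
# Two containers from the same image, both listening on container port 80
docker run -d -p 3000:80 abc123def456
docker run -d -p 3001:80 abc123def456

# localhost:3000 and localhost:3001 each reach a separate container
```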

Managing Containers

Essential Container Commands

# List running containers
docker ps

# List all containers (including stopped)
docker ps -a

# Stop a container
docker stop CONTAINER_NAME

# Start a stopped container
docker start CONTAINER_NAME

# Remove a container
docker rm CONTAINER_NAME

# Remove an image
docker rmi IMAGE_ID

# View container logs
docker logs CONTAINER_NAME

Using Short IDs

You don't need to type the full container or image ID. Docker accepts unique prefixes:

# Full ID
docker run abc123def456

# Short ID (first few characters)
docker run abc

# If unique, even single character works
docker run a

Images Are Immutable: The Snapshot Concept

This is one of the most important concepts to understand about Docker images.

What Happens When You Change Code

Let's say you modify your source code after building an image:

// server.js - Original
res.send('<h1>My Course Goal</h1>');

// server.js - Modified
res.send('<h1>My Course Goal!</h1>');  // Added exclamation mark

If you restart the container, the change won't appear.

Why Code Changes Don't Appear

Understanding the Snapshot:

  1. When you run COPY . /app, Docker copies files at that moment
  2. The image stores a snapshot of your code
  3. Changes to source files after building don't affect the image
  4. The image is read-only and locked

Solution: Rebuild the Image

# Rebuild to pick up code changes
docker build .

# New image ID is generated
Successfully built xyz789abc012

# Run container with new image
docker run -p 3000:80 xyz789abc012

Key Takeaway: Images are templates that are finalized when built. To update the code in an image, you must rebuild it.

Layer-Based Architecture: Understanding Caching

Docker images use a layer-based architecture to optimize build performance.

How Layers Work

Each instruction in a Dockerfile creates a layer:

FROM node           # Layer 1
WORKDIR /app        # Layer 2
COPY package.json   # Layer 3
RUN npm install     # Layer 4
COPY . /app         # Layer 5
EXPOSE 80           # Layer 6 (metadata)
CMD ["node"...]     # Layer 7

Caching Behavior

How Docker Uses Cache:

  • Each layer result is cached
  • If nothing changed, Docker uses the cached layer
  • If a layer changes, that layer and all subsequent layers are rebuilt
  • Cache dramatically speeds up rebuilds
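If you ever suspect a stale cached layer (for example, an outdated npm install result), the cache can be bypassed entirely with the --no-cache flag:

```shell
# Ignore all cached layers and re-execute every Dockerfile step
docker build --no-cache .
```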

Rebuilding Without Code Changes

# First build - all layers executed
docker build .

# Second build (no changes) - uses cache
docker build .

Step 1/7 : FROM node
 ---> Using cache
Step 2/7 : WORKDIR /app
 ---> Using cache
Step 3/7 : COPY package.json
 ---> Using cache
Step 4/7 : RUN npm install
 ---> Using cache
...
Successfully built (almost instant!)

When Code Changes

# Modified server.js, rebuild
docker build .

Step 1/7 : FROM node
 ---> Using cache
Step 2/7 : WORKDIR /app
 ---> Using cache
Step 3/7 : COPY package.json
 ---> Using cache
Step 4/7 : RUN npm install
 ---> Using cache
Step 5/7 : COPY . /app
 ---> abc123def  # NEW - detects file change
Step 6/7 : EXPOSE 80
 ---> xyz456abc
Step 7/7 : CMD ["node", "server.js"]
 ---> hij789klm

Notice: Layers 1-4 used the cache, but layers 5-7 were rebuilt.

Optimizing Dockerfile Layer Order

The order of instructions matters significantly for build performance.

Unoptimized Dockerfile

FROM node
WORKDIR /app
COPY . /app              # Copies ALL files
RUN npm install          # Runs every time code changes
CMD ["node", "server.js"]

Problem: Any code change invalidates the COPY layer, which means npm install runs again unnecessarily.

Optimized Dockerfile

FROM node
WORKDIR /app
COPY package.json /app   # Copy dependencies first
RUN npm install          # Install dependencies
COPY . /app              # Copy source code last
CMD ["node", "server.js"]

Benefit: Source code changes don't trigger npm install again unless package.json changes.

Optimization Strategy:

  • Place stable instructions (rarely change) at the top
  • Place frequently changing instructions (code) at the bottom
  • Separate dependency installation from code copying
  • This maximizes cache utilization

Impact Example

Scenario                       Unoptimized                       Optimized
First build                    60 seconds                        60 seconds
Rebuild (code change)          55 seconds (npm install again)    5 seconds (cache used)
Rebuild (dependency change)    55 seconds                        55 seconds

Complete Workflow Example

Let's put everything together with a complete workflow:

1. Create Application Files

// server.js
const express = require('express');
const app = express();

app.get('/', (req, res) => {
  res.send('<h1>Hello from Docker!</h1>');
});

app.listen(80, () => {
  console.log('Server running on port 80');
});

// package.json
{
  "name": "docker-demo",
  "version": "1.0.0",
  "dependencies": {
    "express": "^4.18.0"
  }
}

2. Create Optimized Dockerfile

FROM node
WORKDIR /app
COPY package.json /app
RUN npm install
COPY . /app
EXPOSE 80
CMD ["node", "server.js"]

3. Build and Run

# Build the image
docker build -t my-node-app .

# Run the container
docker run -p 3000:80 my-node-app

# Access at http://localhost:3000

4. Make Code Changes

// Update server.js
res.send('<h1>Hello from Docker - Updated!</h1>');

# Rebuild (fast - uses cache for npm install)
docker build -t my-node-app .

# Stop old container
docker stop CONTAINER_NAME

# Run new container
docker run -p 3000:80 my-node-app

Key Takeaways

Essential Concepts

  • Images are templates - Read-only blueprints containing code and environment
  • Containers are instances - Running applications based on images
  • Images are immutable - Must rebuild to incorporate changes
  • Layers enable caching - Each instruction creates a cached layer
  • Order matters - Place stable instructions first to maximize cache usage
  • EXPOSE is documentation - Use -p flag to actually publish ports
  • RUN vs CMD - RUN during build, CMD when container starts

Best Practices Summary

Dockerfile Optimization

  1. Copy dependency files (package.json) before copying source code
  2. Run dependency installation before copying full application
  3. Place frequently changing layers (code) at the bottom
  4. Use .dockerignore to exclude unnecessary files
  5. Combine multiple RUN commands to reduce layers
  6. Use specific base image versions (node:14) instead of :latest
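A typical .dockerignore for a Node project might look like this (entries are common examples; adjust to your project):

```
node_modules
.git
Dockerfile
*.log
```

Excluding node_modules is especially important here: it keeps COPY . /app from overwriting the dependencies that RUN npm install already placed inside the image.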

Container Management

  • Name your containers with --name for easier management
  • Use -d flag to run containers in detached mode
  • Regularly clean up stopped containers with docker container prune
  • Remove unused images with docker image prune
  • Use docker logs to troubleshoot container issues
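Putting these flags together (the container name goals-app is an arbitrary example):

```shell
# Run detached (-d) with a readable name instead of a random one
docker run -d --name goals-app -p 3000:80 my-node-app

# Inspect output without attaching to the container
docker logs goals-app

# Stop and clean up by name
docker stop goals-app
docker rm goals-app
```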

What's Next?

Now that you understand images and containers deeply, you can explore:

  • Data Persistence: Docker volumes and bind mounts
  • Networking: Container communication and networks
  • Multi-Container Apps: Docker Compose
  • Environment Variables: Configuration management
  • Multi-Stage Builds: Advanced image optimization
  • Container Orchestration: Kubernetes for production