Introduction: Why Measure Folder Size in Bash?
Managing disk space is a fundamental task for software engineers, sysadmins, and power users. Whether you're running a personal laptop or overseeing mission-critical servers, knowing which directories are consuming the most space can prevent outages, slowdowns, or even data loss. Bash, the ubiquitous Unix shell, provides powerful tools—especially the du utility—for quickly checking folder sizes with minimal setup. In this post, we'll walk through a tutorial that evolves from a quick one-liner to a full-featured folder size checker script suitable for production monitoring.
Learning how to use Bash for disk usage checks brings both flexibility and power. While GUI tools exist, scripts are essential for remote servers, CI/CD pipelines, or cron-based automation. We'll explore not just basic commands but also advanced scripting patterns, error handling, sorting, size thresholds, and integration with monitoring workflows. By the end, you'll have a reusable, customizable Bash script for folder size checks that scales from quick local checks to automated, server-grade health monitoring.
The Simplest Approach: One-Liners for Immediate Checks
If you need a fast answer to “How big is this folder?”, Bash offers an elegant one-liner using the du command. Here's a quick snippet:
#!/bin/bash
folder_path="/path/to/folder"
# Check and display the size of a specified folder
du -sh "$folder_path"
echo "Folder size checked."
This script uses du -sh, where -s summarizes the total size and -h makes the output human-readable (e.g., in MB or GB). It's ideal for quick, ad-hoc usage. Just change the folder_path variable, run the script, and you're done. This approach is also perfect for embedded checks in larger Bash workflows, or for use directly in the terminal.
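For instance, if a larger script needs the size as a value rather than as printed output, a minimal sketch looks like this (du separates size and path with a tab, so cut -f1 keeps just the size):

folder_path="/path/to/folder"
# Keep only the size column from du's "size<TAB>path" output
size=$(du -sh -- "$folder_path" | cut -f1)
echo "Current size of $folder_path: $size"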
But what if you want to make the script more robust? For example, you may want to pass the folder as a command-line argument, handle errors gracefully, and ensure the script behaves predictably in edge cases. Here's an improved version:
#!/usr/bin/env bash
set -Eeuo pipefail
target="${1:-.}"
if [[ ! -d "$target" ]]; then
  echo "Error: '$target' is not a directory" >&2
  exit 1
fi
echo "Checking size of: $target"
du -sh -- "$target"
This version checks if the argument is a valid directory and uses best practices (set -Eeuo pipefail: exit on errors, treat unset variables as errors, fail a pipeline if any stage fails, and inherit the ERR trap in functions) to make the script safer. It defaults to the current directory if no argument is provided, making it convenient and user-friendly.
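If you routinely check several locations, the same pattern extends naturally to a loop over all arguments. Here's a minimal sketch under the same safety settings:

#!/usr/bin/env bash
set -Eeuo pipefail
# Report the size of every directory passed on the command line
for target in "$@"; do
  if [[ -d "$target" ]]; then
    du -sh -- "$target"
  else
    echo "Skipping '$target': not a directory" >&2
  fi
done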
Deep Dive: Sorting, Excludes, and Top-N Largest Subdirectories
While one-liners are handy, real-world storage issues usually require more detail. Which subdirectories are the biggest culprits? How can we ignore noisy directories like .git or node_modules? Bash and du can help here, too:
# Top 10 largest subdirs at depth 1, excluding common noise
du -xh --max-depth=1 -- . \
  | grep -Ev '/\.git($|/)|/node_modules($|/)' \
  | sort -hr \
  | head -n 10
Let's break it down:
- du -xh --max-depth=1 -- . lists each first-level subdirectory with a human-readable size, plus a final total for . itself (-x keeps du from crossing filesystem boundaries).
- grep -Ev '/\.git($|/)|/node_modules($|/)' filters out .git and node_modules folders, which often inflate usage but aren't relevant to most disk audits.
- sort -hr sorts the results by size in descending order.
- head -n 10 gives you the top ten largest offenders.
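One portability note: --max-depth is a GNU du option. On macOS and the BSDs the equivalent flag is -d, so a sketch of the same pipeline there would be (assuming your sort supports -h, as recent BSD and macOS versions do):

du -xh -d 1 . \
  | grep -Ev '/\.git($|/)|/node_modules($|/)' \
  | sort -hr \
  | head -n 10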
This pattern is invaluable for diagnosing space leaks, especially in development environments or during incident response. You can customize the exclude pattern to fit your stack—think Python virtualenvs, log directories, or vendor folders. With a few tweaks, this approach serves as the core of many server health scripts, CI/CD checks, or even developer onboarding guides.
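For example, a variant tuned for a Python-heavy tree might look like this (the directory names are just illustrative; swap in whatever clutters your stack):

# Top 10, ignoring VCS metadata, virtualenvs, and bytecode caches
du -xh --max-depth=1 -- . \
  | grep -Ev '/(\.git|node_modules|\.venv|venv|__pycache__)($|/)' \
  | sort -hr \
  | head -n 10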
Building a Production-Ready Folder Size Checker Script
Simple commands are great, but there comes a time when you need a script that's reusable, parameterized, and safe for automation. Below is a robust Bash script (sizecheck.sh) that supports excludes, thresholds, top-N reporting, JSON output, and more:
#!/usr/bin/env bash
set -Eeuo pipefail
usage() {
cat <<'EOF'
Usage: sizecheck.sh [-p PATH] [-n TOP] [-e NAME] [-t SIZE] [-x] [--json] [--apparent]
  -p PATH      Directory to analyze (default: current directory)
  -n TOP       Show top N largest first-level subdirectories (default: 0 = none)
  -e NAME      Exclude first-level entries by name, matched literally (can be repeated).
               Example: -e node_modules -e .git
  -t SIZE      Fail if PATH is larger than SIZE (e.g., 500M, 2G)
  -x           Do not cross filesystem boundaries
  --json       Print a JSON summary in addition to human output
  --apparent   Use apparent size instead of blocks allocated
  -h           Show help
Examples:
  sizecheck.sh -p . -n 10 -e node_modules -e .git
  sizecheck.sh -p /var/log -t 5G -x --json
EOF
}
# Parse human sizes like 500M, 2G, 1024K -> bytes
to_bytes() {
  local s="${1^^}"
  if [[ "$s" =~ ^([0-9]+)([KMGTP]?)B?$ ]]; then
    local num="${BASH_REMATCH[1]}" unit="${BASH_REMATCH[2]}"
    local mul=1
    case "$unit" in
      K) mul=$((1024));;
      M) mul=$((1024**2));;
      G) mul=$((1024**3));;
      T) mul=$((1024**4));;
      P) mul=$((1024**5));;
    esac
    echo $(( num * mul ))
  else
    echo "Invalid size: $1" >&2
    return 1
  fi
}
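# Examples (illustrative values): to_bytes 500M -> 524288000 (500 * 1024^2)
#                                 to_bytes 2G   -> 2147483648 (2 * 1024^3)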
PATH_ARG="."
TOP_N=0
EXCLUDES=()
THRESHOLD_BYTES=0
NO_CROSS=0
JSON_OUT=0
APPARENT=0
while (($#)); do
  case "$1" in
    -p) PATH_ARG="$2"; shift 2;;
    -n) TOP_N="$2"; shift 2;;
    -e) EXCLUDES+=("$2"); shift 2;;
    -t) THRESHOLD_BYTES="$(to_bytes "$2")"; shift 2;;
    -x) NO_CROSS=1; shift;;
    --json) JSON_OUT=1; shift;;
    --apparent) APPARENT=1; shift;;
    -h|--help) usage; exit 0;;
    --) shift; break;;
    -*) echo "Unknown option: $1" >&2; usage; exit 2;;
    *) break;;
  esac
done
if [[ ! -d "$PATH_ARG" ]]; then
  echo "Error: '$PATH_ARG' is not a directory" >&2
  exit 1
fi
DU_OPTS=(-h -s)
(( APPARENT )) && DU_OPTS+=(--apparent-size)
(( NO_CROSS )) && DU_OPTS+=(-x)
# Compute total size (human-readable)
total_human="$(du "${DU_OPTS[@]}" -- "$PATH_ARG" | awk '{print $1}')"
# Recompute the total in raw bytes for the threshold check. Note: plain `du -b`
# implies --apparent-size, so build the flags explicitly to preserve the
# allocated-blocks default.
BYTE_OPTS=(-s --block-size=1)
(( APPARENT )) && BYTE_OPTS+=(--apparent-size)
(( NO_CROSS )) && BYTE_OPTS+=(-x)
total_bytes="$(du "${BYTE_OPTS[@]}" -- "$PATH_ARG" | awk '{print $1}')"
echo "Path: $PATH_ARG"
echo "Total size: $total_human ($total_bytes bytes)"
# Optional top-N breakdown
if (( TOP_N > 0 )); then
  echo
  echo "Top $TOP_N largest first-level subdirectories:"
  # Build an ERE alternation from the excluded names, escaping metacharacters
  exclude_regex=""
  if ((${#EXCLUDES[@]})); then
    exclude_regex="$(printf "%s\n" "${EXCLUDES[@]}" | sed 's/[].[^$*+?(){}|\\]/\\&/g' | paste -sd'|' -)"
  fi
  LIST_OPTS=(-h --max-depth=1)
  (( APPARENT )) && LIST_OPTS+=(--apparent-size)
  (( NO_CROSS )) && LIST_OPTS+=(-x)
  # du prints the grand total for the directory itself last; sed '$d' drops it.
  # `|| true` keeps pipefail from aborting when grep filters out every line, and
  # awk takes the first N lines without head's early-exit SIGPIPE under pipefail.
  du_list=$(du "${LIST_OPTS[@]}" -- "$PATH_ARG" \
    | sed '$d' \
    | { if [[ -n "$exclude_regex" ]]; then grep -Ev "/($exclude_regex)(/|$)" || true; else cat; fi; } \
    | sort -hr \
    | awk -v n="$TOP_N" 'NR <= n')
  printf "%s\n" "$du_list"
fi
# JSON output (simple summary)
if (( JSON_OUT )); then
  echo
  printf '{ "path": "%s", "total_bytes": %s, "total_human": "%s" }\n' \
    "$(printf "%s" "$PATH_ARG" | sed 's/"/\\"/g')" \
    "$total_bytes" \
    "$total_human"
fi
# Threshold check
if (( THRESHOLD_BYTES > 0 )) && (( total_bytes > THRESHOLD_BYTES )); then
  echo "Threshold exceeded: $(printf "%'d" "$total_bytes") > $(printf "%'d" "$THRESHOLD_BYTES") bytes" >&2
  exit 3
fi
This advanced script offers:
- Flexible CLI options: Analyze any directory, show the top N largest subfolders, exclude arbitrary patterns, and more.
- Threshold checks: Fail with a nonzero exit code if the directory exceeds a given size, perfect for automation or monitoring (see the sketch after this list).
- Apparent size vs. allocated blocks: Decide whether to use “apparent” file size or disk blocks used.
- JSON Output: Easily plug the script into log aggregators or dashboards.
- Robust error handling: Safe for use in production, with clear error messages and exit codes.
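Because failures surface as exit codes (1 for a bad path, 2 for bad usage, 3 for a breached threshold), wiring the script into automation needs no output parsing. A minimal sketch, with the alert command as a stand-in for your own notifier:

if ! /usr/local/bin/sizecheck.sh -p /var/data -t 10G >/dev/null; then
  # Placeholder alert hook: swap in mail, a Slack webhook, PagerDuty, etc.
  echo "Disk alert: /var/data over 10G on $(hostname)" | logger -t sizecheck
fi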
You can deploy this script to /usr/local/bin/sizecheck.sh and run it interactively, from cron, or in CI pipelines. The ability to exclude patterns and set thresholds makes it suitable for complex environments where not all large folders are equally important.
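Installation can be as simple as the following (the destination path is just the usual convention):

# Copy onto the PATH with sane permissions
sudo install -m 0755 sizecheck.sh /usr/local/bin/sizecheck.sh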
Automation with Cron: Scheduled Monitoring and Alerting
Manual checks are great, but the real power comes when you automate them. With cron, you can schedule daily (or even hourly) size checks and trigger alerts if disk usage crosses critical thresholds. Here are some example cron entries:
# Edit with: crontab -e
# Every day at 8:00, fail if /var/log is > 5G
0 8 * * * /usr/local/bin/sizecheck.sh -p /var/log -t 5G -x || echo "Warning: /var/log exceeded 5G" >> /var/log/disk_warnings.log
# Every day at 8:10, capture top 10 heavy subdirs in $HOME
# (the script prints its JSON summary last, so tail -n 1 keeps the file pure JSON)
10 8 * * * /usr/local/bin/sizecheck.sh -p "$HOME" -n 10 --json | tail -n 1 >> "$HOME/.disk_reports/$(date +\%F).json"
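Two practical details: the report directory must exist before the second job runs, and cron will email any output a job produces if a local mailer and MAILTO are configured, which pairs nicely with the threshold message going to stderr. A sketch (the address is a placeholder):

# One-time setup so the daily JSON job has somewhere to write
mkdir -p "$HOME/.disk_reports"

# In the crontab: silence normal stdout so mail arrives only on errors
MAILTO="ops@example.com"
0 8 * * * /usr/local/bin/sizecheck.sh -p /var/log -t 5G -x >/dev/null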
These jobs provide two critical benefits:
- Proactive alerts: Get notified before disk usage becomes a crisis. The script exits nonzero when thresholds are breached, so you can trigger emails, SMS, or other alerts.
- Historical logs: By saving JSON output daily, you can track trends, investigate incidents, or build dashboards showing how your disk usage evolves over time.
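For example, once a few daily reports have accumulated, a one-line jq query (assuming jq is installed) turns them into a trend table:

# One line per day: report filename and total bytes
jq -r '"\(input_filename)\t\(.total_bytes)"' "$HOME"/.disk_reports/*.json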
Automated folder size monitoring is often the difference between a healthy system and a surprise outage. If you're running multiple servers, consider centralizing logs or integrating with monitoring solutions like Prometheus or Datadog.
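As one sketch of such an integration, you could expose a folder's size to Prometheus through node_exporter's textfile collector (the collector directory and metric name below are assumptions; use whatever directory your node_exporter is configured to read):

# Write a gauge the node_exporter textfile collector will pick up
bytes=$(du -s --block-size=1 -- /var/log | awk '{print $1}')
printf 'folder_size_bytes{path="/var/log"} %s\n' "$bytes" \
  > /var/lib/node_exporter/textfile_collector/folder_size.prom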
Conclusion: Empower Your Infrastructure with Smart Bash Disk Checks
In this quick tutorial, we've shown how Bash, combined with the power of du, sort, and robust scripting practices, can help you understand and control disk usage at any scale. From a simple one-liner to a comprehensive production-ready script, these patterns are proven in development, DevOps, and SRE environments.
Investing a few minutes in learning and automating folder size checks pays off massively in uptime, predictability, and peace of mind. Whether you're maintaining your laptop, running a SaaS platform, or helping your team troubleshoot disk issues, these Bash techniques are tools you'll reach for again and again.
Ready to take your monitoring to the next level? Start with the provided scripts, adapt them to your needs, and schedule them with cron. The result: less firefighting, more time building what matters.