Python argparse Tutorial: Build a Command-Line Log Analyzer
Introduction
Every serious tool eventually needs a command-line interface. Whether it's a data pipeline that runs on a schedule, a deployment script that accepts environment flags, or a report generator that needs flexible input, the pattern is the same: users pass arguments to control behavior, and your code needs to read and validate those arguments reliably.
Python ships with argparse — a batteries-included argument parsing library that transforms raw command-line strings into structured, validated Python objects. This tutorial walks through building a real log analysis CLI from scratch, starting with the fragile approach of reading sys.argv directly and ending with a professional-grade tool that supports subcommands, type validation, and automatic help text.
Background
Every Python program receives its command-line arguments as a list of strings in sys.argv. The first item is always the script name. Everything after that is what the user typed.
Parsing those strings manually is possible for very simple cases, but it breaks quickly: you have to handle missing arguments, wrong types, unknown flags, and help text yourself. argparse does all of that for you, plus it generates --help output automatically and produces meaningful error messages when the user makes a mistake.
The result is a tool that behaves the way users expect command-line tools to behave.
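To see the raw material argparse works with, a two-line script is enough; sys.argv holds exactly the strings the shell passed in. (The filename show_args.py is just an illustrative name.)

```python
# show_args.py -- print the raw argument list the interpreter received
import sys

print(sys.argv)
# e.g. `python3 show_args.py logs.csv --status 404` prints:
# ['show_args.py', 'logs.csv', '--status', '404']
```

Note that every element is a string, including numbers like 404 — any conversion is up to the parsing code.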
Practical Scenario
Consider a web server that writes access logs in CSV format — one row per HTTP request, with fields like timestamp, method, URL path, status code, and response time in milliseconds. Over time these logs accumulate and become hard to reason about manually.
You need a CLI tool that can:
- Load a log file and filter rows by status code
- Summarize traffic: top paths by frequency, status code breakdown, average response time
- Export filtered results to CSV or JSON for further processing
This is a pattern that comes up constantly in DevOps pipelines, backend debugging, and data engineering. A well-structured CLI makes the tool reusable and easy to integrate into shell scripts without any code changes.
The Problem with sys.argv
The most direct way to read arguments in Python is to inspect sys.argv directly.
Create a new file:
touch log_analyzer.py
Add the following code:

import sys

if len(sys.argv) < 2:
    print("Usage: log_analyzer.py <file> [--status <code>]")
    sys.exit(1)

filename = sys.argv[1]
status_filter = None

if "--status" in sys.argv:
    idx = sys.argv.index("--status")
    if idx + 1 >= len(sys.argv):
        print("Error: --status requires a value")
        sys.exit(1)
    status_filter = sys.argv[idx + 1]

print(f"Reading: {filename}")
print(f"Status filter: {status_filter}")

Run it using:

python3 log_analyzer.py logs.csv --status 404
The output will look like this:
Reading: logs.csv
Status filter: 404
This works for exactly this combination of arguments. The moment a user passes an unexpected flag, forgets the file, or runs --help, the program either crashes or prints a vague message. Every new argument you add requires more index arithmetic and more manual validation. This approach does not scale.
Basic argparse: Help and Error Handling for Free
Let's replace the manual parsing with argparse. Replace the entire content of log_analyzer.py with the following:
import argparse
parser = argparse.ArgumentParser(
    description="Analyze web server access logs."
)
parser.add_argument("file", help="Path to the CSV log file")
parser.add_argument("--status", help="Filter by HTTP status code")
args = parser.parse_args()
print(f"Reading: {args.file}")
print(f"Status filter: {args.status}")
Run it with no arguments:
python3 log_analyzer.py
usage: log_analyzer.py [-h] [--status STATUS] file
log_analyzer.py: error: the following arguments are required: file
Run it with --help:
python3 log_analyzer.py --help
usage: log_analyzer.py [-h] [--status STATUS] file
Analyze web server access logs.
positional arguments:
  file             Path to the CSV log file

options:
  -h, --help       show this help message and exit
  --status STATUS  Filter by HTTP status code
Why this is better
argparse validates inputs automatically, generates formatted help text, and reports errors clearly — all with zero extra code on your side. Your application code reads clean Python attributes like args.file instead of fragile string indexes like sys.argv[1]. Adding a new argument is additive, not surgical.
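One convenient property worth noting: parse_args accepts an explicit list of strings, so you can exercise a parser without touching the real command line (handy for unit tests). A standalone sketch mirroring the parser above:

```python
import argparse

parser = argparse.ArgumentParser(description="Analyze web server access logs.")
parser.add_argument("file", help="Path to the CSV log file")
parser.add_argument("--status", help="Filter by HTTP status code")

# Passing a list bypasses sys.argv entirely -- useful in tests
args = parser.parse_args(["logs.csv", "--status", "404"])
print(args.file)    # logs.csv
print(args.status)  # 404 (still a string; no type= was given)
```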
Types, Defaults, Choices, and Short Flags
Arguments are strings by default. For a log analyzer, some values should be integers, and some should be restricted to a known set of options. Replace the content of log_analyzer.py:
import argparse
parser = argparse.ArgumentParser(
    description="Analyze web server access logs."
)
parser.add_argument("file", help="Path to the CSV log file")
parser.add_argument("--status", type=int, help="Filter by HTTP status code (e.g. 404)")
parser.add_argument("--top", "-n", type=int, default=10, help="Show top N paths (default: 10)")
parser.add_argument("--format", choices=["table", "json"], default="table",
help="Output format (default: table)")
parser.add_argument("--verbose", "-v", action="store_true", help="Print each loaded row")
args = parser.parse_args()
print(f"File: {args.file}")
print(f"Status: {args.status}")
print(f"Top N: {args.top}")
print(f"Format: {args.format}")
print(f"Verbose: {args.verbose}")
Run it with several options:
python3 log_analyzer.py logs.csv --status 500 -n 5 --format json -v
File: logs.csv
Status: 500
Top N: 5
Format: json
Verbose: True
Try passing a format value that is not allowed:
python3 log_analyzer.py logs.csv --format xml
usage: log_analyzer.py [-h] [--status STATUS] [--top TOP] [--format {table,json}] [--verbose] file
log_analyzer.py: error: argument --format: invalid choice: 'xml' (choose from 'table', 'json')
Why this is better
type=int converts the string automatically and rejects non-numeric input before it reaches your code. choices restricts what values are legal and communicates that restriction in both the error message and the help text. default means unspecified arguments have sensible values rather than None. action="store_true" handles boolean flags cleanly — no value needed, the flag itself sets it to True. The -n and -v short forms are registered alongside the long ones with a single line.
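type= is not limited to built-ins like int: any callable that takes the raw string works, and raising argparse.ArgumentTypeError produces a properly formatted parser error. A sketch of a stricter status validator (the http_status name is our own, not part of the tutorial's tool):

```python
import argparse

def http_status(value):
    # Custom type= callable: convert and range-check the raw string
    code = int(value)
    if not 100 <= code <= 599:
        raise argparse.ArgumentTypeError(f"{code} is outside the HTTP status range")
    return code

parser = argparse.ArgumentParser()
parser.add_argument("--status", type=http_status)

args = parser.parse_args(["--status", "404"])
print(args.status)  # 404 (as an int)
```

A value like 999 is now rejected at parse time with a clear error, before any of your own code runs.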
Building a Working Analyzer
Now let's wire everything up to actual log processing. First we need some data to work with. Replace the entire content of log_analyzer.py:
import argparse
import csv
import json
from collections import Counter
def generate_sample_logs(filename):
    rows = [
        ["2024-01-15 10:23:01", "GET", "/home", 200, 45],
        ["2024-01-15 10:23:05", "GET", "/api/users", 200, 120],
        ["2024-01-15 10:23:10", "POST", "/api/login", 401, 30],
        ["2024-01-15 10:23:15", "GET", "/api/users", 200, 115],
        ["2024-01-15 10:23:20", "GET", "/missing-page", 404, 12],
        ["2024-01-15 10:23:25", "GET", "/home", 200, 42],
        ["2024-01-15 10:23:30", "DELETE", "/api/users/42", 500, 230],
        ["2024-01-15 10:23:35", "GET", "/api/users", 200, 118],
        ["2024-01-15 10:23:40", "GET", "/home", 200, 44],
        ["2024-01-15 10:23:45", "POST", "/api/login", 200, 95],
        ["2024-01-15 10:23:50", "GET", "/missing-page", 404, 11],
        ["2024-01-15 10:23:55", "GET", "/api/data", 500, 310],
    ]
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "method", "path", "status", "response_ms"])
        writer.writerows(rows)
    print(f"Sample log file written to: {filename}")
def load_logs(filename, status_filter=None, verbose=False):
    rows = []
    with open(filename, newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            row["status"] = int(row["status"])
            row["response_ms"] = int(row["response_ms"])
            if status_filter is None or row["status"] == status_filter:
                rows.append(row)
                if verbose:
                    print(f"  loaded: {row['method']} {row['path']} -> {row['status']}")
    return rows
def summarize(rows, top_n, output_format):
    if not rows:
        print("No matching rows found.")
        return
    path_counts = Counter(row["path"] for row in rows)
    status_counts = Counter(row["status"] for row in rows)
    avg_response = sum(row["response_ms"] for row in rows) / len(rows)
    report = {
        "total_requests": len(rows),
        "avg_response_ms": round(avg_response, 1),
        "status_breakdown": dict(sorted(status_counts.items())),
        "top_paths": dict(path_counts.most_common(top_n)),
    }
    if output_format == "json":
        print(json.dumps(report, indent=2))
    else:
        print(f"Total requests : {report['total_requests']}")
        print(f"Avg response   : {report['avg_response_ms']} ms")
        print("\nStatus breakdown:")
        for code, count in report["status_breakdown"].items():
            print(f"  {code}: {count} requests")
        print(f"\nTop {top_n} paths:")
        for path, count in report["top_paths"].items():
            print(f"  {count:4d} {path}")
parser = argparse.ArgumentParser(description="Analyze web server access logs.")
parser.add_argument("file", help="Path to the CSV log file")
parser.add_argument("--status", type=int, help="Filter by HTTP status code")
parser.add_argument("--top", "-n", type=int, default=10, help="Show top N paths (default: 10)")
parser.add_argument("--format", choices=["table", "json"], default="table", help="Output format")
parser.add_argument("--verbose", "-v", action="store_true", help="Print each loaded row")
parser.add_argument("--generate-sample", action="store_true",
help="Write a sample log file to FILE and exit")
args = parser.parse_args()
if args.generate_sample:
generate_sample_logs(args.file)
else:
rows = load_logs(args.file, status_filter=args.status, verbose=args.verbose)
summarize(rows, args.top, args.format)
Generate the sample data first:
python3 log_analyzer.py sample_logs.csv --generate-sample
Sample log file written to: sample_logs.csv
Now run the full analysis:
python3 log_analyzer.py sample_logs.csv
Total requests : 12
Avg response   : 97.7 ms

Status breakdown:
  200: 7 requests
  401: 1 requests
  404: 2 requests
  500: 2 requests

Top 10 paths:
     3 /home
     3 /api/users
     2 /api/login
     2 /missing-page
     1 /api/users/42
     1 /api/data
Filter to only 500 errors with JSON output:
python3 log_analyzer.py sample_logs.csv --status 500 --format json
{
  "total_requests": 2,
  "avg_response_ms": 270.0,
  "status_breakdown": {
    "500": 2
  },
  "top_paths": {
    "/api/users/42": 1,
    "/api/data": 1
  }
}
The same tool now handles filtering, formatting, and summary generation through clean CLI flags with no code changes needed to switch between modes.
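One small idiom that pays off as a tool like this grows: vars() converts the parsed Namespace into a plain dict, which is convenient for logging the effective configuration of a run or forwarding settings to another function. A minimal standalone sketch:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("file")
parser.add_argument("--status", type=int)
parser.add_argument("--top", "-n", type=int, default=10)

args = parser.parse_args(["logs.csv", "--status", "500"])

# vars() exposes the Namespace as an ordinary dict,
# with defaults filled in for anything the user omitted
config = vars(args)
print(config)  # {'file': 'logs.csv', 'status': 500, 'top': 10}
```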
Subcommands with Subparsers
Real CLI tools often do multiple distinct things: git has commit, push, and pull. docker has build, run, and ps. argparse supports this pattern with subparsers, where each subcommand defines its own arguments independently.
We will split the tool into two subcommands: analyze for generating summaries and export for writing filtered rows to a new file. Replace everything from the parser = argparse.ArgumentParser(...) line to the end of the file with the following:
def export_rows(rows, output_file, output_format):
    if not rows:
        print("No matching rows found.")
        return
    if output_format == "json":
        with open(output_file, "w") as f:
            json.dump(rows, f, indent=2)
    else:
        with open(output_file, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)
    print(f"Exported {len(rows)} rows to: {output_file}")
parser = argparse.ArgumentParser(description="Web server log analysis tool.")
parser.add_argument("file", help="Path to the CSV log file")
parser.add_argument("--status", type=int, help="Filter by HTTP status code")
parser.add_argument("--verbose", "-v", action="store_true", help="Print each loaded row")
subparsers = parser.add_subparsers(dest="command", help="Available commands")
analyze_parser = subparsers.add_parser("analyze", help="Summarize log statistics")
analyze_parser.add_argument("--top", "-n", type=int, default=10, help="Show top N paths (default: 10)")
analyze_parser.add_argument("--format", choices=["table", "json"], default="table", help="Output format")
export_parser = subparsers.add_parser("export", help="Export filtered rows to a new file")
export_parser.add_argument("output", help="Output file path")
export_parser.add_argument("--format", choices=["csv", "json"], default="csv", help="Output format (default: csv)")
args = parser.parse_args()

if args.command == "analyze":
    rows = load_logs(args.file, status_filter=args.status, verbose=args.verbose)
    summarize(rows, args.top, args.format)
elif args.command == "export":
    rows = load_logs(args.file, status_filter=args.status, verbose=args.verbose)
    export_rows(rows, args.output, args.format)
else:
    parser.print_help()
Run the analyze subcommand, showing only the top 3 paths:
python3 log_analyzer.py sample_logs.csv analyze --top 3
Total requests : 12
Avg response   : 97.7 ms

Status breakdown:
  200: 7 requests
  401: 1 requests
  404: 2 requests
  500: 2 requests

Top 3 paths:
     3 /home
     3 /api/users
     2 /api/login
Export only the 404 errors to a JSON file:
python3 log_analyzer.py sample_logs.csv --status 404 export errors_404.json --format json
Each subcommand has its own --help page:
python3 log_analyzer.py sample_logs.csv export --help
usage: log_analyzer.py file export [-h] [--format {csv,json}] output
positional arguments:
  output               Output file path

options:
  -h, --help           show this help message and exit
  --format {csv,json}  Output format (default: csv)
Why this is better
Subparsers let each command define its own arguments without polluting a single flat namespace. The top-level parser still owns shared arguments like --status and --verbose that apply regardless of which subcommand runs. Each subcommand gets its own required arguments, its own defaults, and its own help page. Users only see the flags that are relevant to what they are doing.
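An alternative to the if/elif dispatch used above, also built into argparse, is set_defaults: each subparser attaches its own handler function, so dispatch collapses to a single call. A standalone sketch (do_analyze and do_export are illustrative names, not part of the tutorial's tool):

```python
import argparse

def do_analyze(args):
    print(f"analyzing {args.file}")

def do_export(args):
    print(f"exporting {args.file} to {args.output}")

parser = argparse.ArgumentParser()
parser.add_argument("file")
subparsers = parser.add_subparsers(dest="command")

analyze_parser = subparsers.add_parser("analyze")
analyze_parser.set_defaults(func=do_analyze)

export_parser = subparsers.add_parser("export")
export_parser.add_argument("output")
export_parser.set_defaults(func=do_export)

# Each subparser stamps its handler onto the namespace
args = parser.parse_args(["logs.csv", "export", "out.json"])
args.func(args)  # prints: exporting logs.csv to out.json
```

Adding a new subcommand then never touches the dispatch logic; you register a parser, attach a function, and you are done.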
Mutually Exclusive Arguments
Some argument combinations are logically incompatible. argparse has a built-in mechanism for declaring that two flags cannot be used together, which enforces the constraint before your code ever runs.
Add the following output mode flags to the analyze subparser. Replace the analyze_parser block with this:
analyze_parser = subparsers.add_parser("analyze", help="Summarize log statistics")
analyze_parser.add_argument("--top", "-n", type=int, default=10, help="Show top N paths (default: 10)")
analyze_parser.add_argument("--format", choices=["table", "json"], default="table", help="Output format")
output_group = analyze_parser.add_mutually_exclusive_group()
output_group.add_argument("--summary-only", action="store_true",
help="Show only the total count and average response time")
output_group.add_argument("--paths-only", action="store_true",
help="Show only the path frequency table")
Try passing both flags at the same time:
python3 log_analyzer.py sample_logs.csv analyze --summary-only --paths-only
usage: log_analyzer.py file analyze [-h] [--top TOP] [--format {table,json}] [--summary-only | --paths-only]
log_analyzer.py: error: argument --paths-only: not allowed with argument --summary-only
The [--summary-only | --paths-only] notation in the usage line communicates the mutual exclusion directly to the user through the help text, before they have made any mistake.
To actually use these flags in the analyze branch, update the call to summarize and the function itself. Replace the summarize function with:
def summarize(rows, top_n, output_format, summary_only=False, paths_only=False):
    if not rows:
        print("No matching rows found.")
        return
    avg_response = sum(row["response_ms"] for row in rows) / len(rows)
    path_counts = Counter(row["path"] for row in rows)
    status_counts = Counter(row["status"] for row in rows)
    if paths_only:
        print(f"Top {top_n} paths:")
        for path, count in path_counts.most_common(top_n):
            print(f"  {count:4d} {path}")
        return
    if output_format == "json":
        report = {
            "total_requests": len(rows),
            "avg_response_ms": round(avg_response, 1),
            "status_breakdown": dict(sorted(status_counts.items())),
            "top_paths": dict(path_counts.most_common(top_n)),
        }
        print(json.dumps(report, indent=2))
        return
    print(f"Total requests : {len(rows)}")
    print(f"Avg response   : {round(avg_response, 1)} ms")
    if summary_only:
        return
    print("\nStatus breakdown:")
    for code, count in sorted(status_counts.items()):
        print(f"  {code}: {count} requests")
    print(f"\nTop {top_n} paths:")
    for path, count in path_counts.most_common(top_n):
        print(f"  {count:4d} {path}")
And update the analyze branch to pass the new flags:
elif args.command == "analyze":
rows = load_logs(args.file, status_filter=args.status, verbose=args.verbose)
summarize(rows, args.top, args.format,
summary_only=args.summary_only,
paths_only=args.paths_only)
Now run with --summary-only to get just the headline numbers:
python3 log_analyzer.py sample_logs.csv analyze --summary-only
Total requests : 12
Avg response   : 97.7 ms
And with --paths-only to get just the frequency table:
python3 log_analyzer.py sample_logs.csv analyze --paths-only -n 4
Top 4 paths:
     3 /home
     3 /api/users
     2 /api/login
     2 /missing-page
Why this is better
Without mutually exclusive groups, enforcing this constraint would mean writing your own check, something like if args.summary_only and args.paths_only: followed by a hand-rolled error message and exit. With add_mutually_exclusive_group, the constraint is declared once at the parser level, enforced automatically, and documented in the help text, all without any conditional logic in your application code.
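One related option: add_mutually_exclusive_group accepts required=True, which forces the user to pick exactly one member of the group rather than allowing both to be omitted. A standalone sketch:

```python
import argparse

parser = argparse.ArgumentParser()
# required=True: the user must supply exactly one of the two flags
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("--summary-only", action="store_true")
group.add_argument("--paths-only", action="store_true")

args = parser.parse_args(["--paths-only"])
print(args.paths_only)    # True
print(args.summary_only)  # False
```

Calling the script with neither flag now produces a parser error instead of silently falling through to a default mode.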
Summary
Python's argparse module transforms command-line argument handling from a brittle string-parsing problem into a structured, self-documenting API. In this tutorial we built a functional log analysis tool that demonstrates the full progression:
- sys.argv makes simple cases work but fails as soon as requirements grow
- A basic ArgumentParser provides automatic validation and --help output with almost no effort
- type=, choices=, default=, and action= cover the most common argument patterns without any manual checking
- Subparsers give each command its own isolated argument namespace and help page
- Mutually exclusive groups encode incompatible-flag constraints at the parser level, keeping that logic out of your application code
The same patterns apply to any CLI tool: data pipelines, deployment scripts, file converters, database seeders, or automation utilities. Once you know how to structure an argparse setup, building tools that are both easy to use and hard to misuse becomes straightforward.