Python argparse Tutorial: Build a Command-Line Log Analyzer
Introduction
Every serious tool eventually needs a command-line interface. Whether it's a data pipeline that runs on a schedule, a deployment script that accepts environment flags, or a report generator that needs flexible input, the pattern is the same: users pass arguments to control behavior, and your code needs to read and validate those arguments reliably.
Python ships with argparse — a batteries-included argument parsing library that transforms raw command-line strings into structured, validated Python objects. This tutorial walks through building a real log analysis CLI from scratch, starting with the fragile approach of reading sys.argv directly and ending with a professional-grade tool that supports subcommands, type validation, and automatic help text.
Background
Every Python program receives its command-line arguments as a list of strings in sys.argv. The first item is always the script name. Everything after that is what the user typed.
Parsing those strings manually is possible for very simple cases, but it breaks quickly: you have to handle missing arguments, wrong types, unknown flags, and help text yourself. argparse does all of that for you, plus it generates --help output automatically and produces meaningful error messages when the user makes a mistake.
The result is a tool that behaves the way users expect command-line tools to behave.
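To see the raw material argparse works with, a two-line script is enough; sys.argv holds exactly the strings the shell passed in. (The filename show_args.py is just an illustrative name.)

```python
# show_args.py -- print the raw argument list the interpreter received
import sys

print(sys.argv)
# e.g. `python3 show_args.py logs.csv --status 404` prints:
# ['show_args.py', 'logs.csv', '--status', '404']
```

Note that every element is a string, including numbers like 404 — any conversion is up to the parsing code.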
Practical Scenario
Consider a web server that writes access logs in CSV format — one row per HTTP request, with fields like timestamp, method, URL path, status code, and response time in milliseconds. Over time these logs accumulate and become hard to reason about manually.
You need a CLI tool that can:
- Load a log file and filter rows by status code
- Summarize traffic: top paths by frequency, status code breakdown, average response time
- Export filtered results to CSV or JSON for further processing
This is a pattern that comes up constantly in DevOps pipelines, backend debugging, and data engineering. A well-structured CLI makes the tool reusable and easy to integrate into shell scripts without any code changes.
The Problem with sys.argv
The most direct way to read arguments in Python is to inspect sys.argv directly.
Create a new file:
touch log_analyzer.py
Add the following code:

import sys

if len(sys.argv) < 2:
    print("Usage: log_analyzer.py <file> [--status <code>]")
    sys.exit(1)

filename = sys.argv[1]
status_filter = None

if "--status" in sys.argv:
    idx = sys.argv.index("--status")
    if idx + 1 >= len(sys.argv):
        print("Error: --status requires a value")
        sys.exit(1)
    status_filter = sys.argv[idx + 1]

print(f"Reading: {filename}")
print(f"Status filter: {status_filter}")

Run it using:

python3 log_analyzer.py logs.csv --status 404
The output will look like this:
Reading: logs.csv
Status filter: 404
This works for exactly this combination of arguments. The moment a user passes an unexpected flag, forgets the file, or runs --help, the program either crashes or prints a vague message. Every new argument you add requires more index arithmetic and more manual validation. This approach does not scale.
Basic argparse: Help and Error Handling for Free
Let's replace the manual parsing with argparse. Replace the entire content of log_analyzer.py with the following:
import argparse
parser = argparse.ArgumentParser(
    description="Analyze web server access logs."
)
parser.add_argument("file", help="Path to the CSV log file")
parser.add_argument("--status", help="Filter by HTTP status code")
args = parser.parse_args()
print(f"Reading: {args.file}")
print(f"Status filter: {args.status}")
Run it with no arguments:
python3 log_analyzer.py
usage: log_analyzer.py [-h] [--status STATUS] file
log_analyzer.py: error: the following arguments are required: file
Run it with --help:
python3 log_analyzer.py --help
usage: log_analyzer.py [-h] [--status STATUS] file
Analyze web server access logs.
positional arguments:
  file             Path to the CSV log file

options:
  -h, --help       show this help message and exit
  --status STATUS  Filter by HTTP status code
Why this is better
argparse validates inputs automatically, generates formatted help text, and reports errors clearly — all with zero extra code on your side. Your application code reads clean Python attributes like args.file instead of fragile string indexes like sys.argv[1]. Adding a new argument is additive, not surgical.
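One convenient property worth noting: parse_args accepts an explicit list of strings, so you can exercise a parser without touching the real command line (handy for unit tests). A standalone sketch mirroring the parser above:

```python
import argparse

parser = argparse.ArgumentParser(description="Analyze web server access logs.")
parser.add_argument("file", help="Path to the CSV log file")
parser.add_argument("--status", help="Filter by HTTP status code")

# Passing a list bypasses sys.argv entirely -- useful in tests
args = parser.parse_args(["logs.csv", "--status", "404"])
print(args.file)    # logs.csv
print(args.status)  # 404 (still a string; no type= was given)
```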
Types, Defaults, Choices, and Short Flags
Arguments are strings by default. For a log analyzer, some values should be integers, and some should be restricted to a known set of options. Replace the content of log_analyzer.py:
import argparse
parser = argparse.ArgumentParser(
    description="Analyze web server access logs."
)
parser.add_argument("file", help="Path to the CSV log file")
parser.add_argument("--status", type=int, help="Filter by HTTP status code (e.g. 404)")
parser.add_argument("--top", "-n", type=int, default=10, help="Show top N paths (default: 10)")
parser.add_argument("--format", choices=["table", "json"], default="table",
help="Output format (default: table)")
parser.add_argument("--verbose", "-v", action="store_true", help="Print each loaded row")
args = parser.parse_args()
print(f"File: {args.file}")
print(f"Status: {args.status}")
print(f"Top N: {args.top}")
print(f"Format: {args.format}")
print(f"Verbose: {args.verbose}")
Run it with several options:
python3 log_analyzer.py logs.csv --status 500 -n 5 --format json -v
File: logs.csv
Status: 500
Top N: 5
Format: json
Verbose: True
Try passing a format value that is not allowed:
python3 log_analyzer.py logs.csv --format xml
usage: log_analyzer.py [-h] [--status STATUS] [--top TOP] [--format {table,json}] [--verbose] file
log_analyzer.py: error: argument --format: invalid choice: 'xml' (choose from 'table', 'json')
Why this is better
type=int converts the string automatically and rejects non-numeric input before it reaches your code. choices restricts what values are legal and communicates that restriction in both the error message and the help text. default means unspecified arguments have sensible values rather than None. action="store_true" handles boolean flags cleanly — no value needed, the flag itself sets it to True. The -n and -v short forms are registered alongside the long ones with a single line.
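type= is not limited to built-ins like int: any callable that takes the raw string works, and raising argparse.ArgumentTypeError produces a properly formatted parser error. A sketch of a stricter status validator (the http_status name is our own, not part of the tutorial's tool):

```python
import argparse

def http_status(value):
    # Custom type= callable: convert and range-check the raw string
    code = int(value)
    if not 100 <= code <= 599:
        raise argparse.ArgumentTypeError(f"{code} is outside the HTTP status range")
    return code

parser = argparse.ArgumentParser()
parser.add_argument("--status", type=http_status)

args = parser.parse_args(["--status", "404"])
print(args.status)  # 404 (as an int)
```

A value like 999 is now rejected at parse time with a clear error, before any of your own code runs.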
Building a Working Analyzer
Now let's wire everything up to actual log processing. First we need some data to work with. Replace the entire content of log_analyzer.py:
import argparse
import csv
import json
from collections import Counter
def generate_sample_logs(filename):
    rows = [
        ["2024-01-15 10:23:01", "GET", "/home", 200, 45],
        ["2024-01-15 10:23:05", "GET", "/api/users", 200, 120],
        ["2024-01-15 10:23:10", "POST", "/api/login", 401, 30],
        ["2024-01-15 10:23:15", "GET", "/api/users", 200, 115],
        ["2024-01-15 10:23:20", "GET", "/missing-page", 404, 12],
        ["2024-01-15 10:23:25", "GET", "/home", 200, 42],
        ["2024-01-15 10:23:30", "DELETE", "/api/users/42", 500, 230],
        ["2024-01-15 10:23:35", "GET", "/api/users", 200, 118],
        ["2024-01-15 10:23:40", "GET", "/home", 200, 44],
        ["2024-01-15 10:23:45", "POST", "/api/login", 200, 95],
        ["2024-01-15 10:23:50", "GET", "/missing-page", 404, 11],
        ["2024-01-15 10:23:55", "GET", "/api/data", 500, 310],
    ]
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "method", "path", "status", "response_ms"])
        writer.writerows(rows)
    print(f"Sample log file written to: {filename}")
def load_logs(filename, status_filter=None, verbose=False):
    rows = []
    with open(filename, newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            row["status"] = int(row["status"])
            row["response_ms"] = int(row["response_ms"])
            if status_filter is None or row["status"] == status_filter:
                rows.append(row)
                if verbose:
                    print(f"  loaded: {row['method']} {row['path']} -> {row['status']}")
    return rows
def summarize(rows, top_n, output_format):
    if not rows:
        print("No matching rows found.")
        return
    path_counts = Counter(row["path"] for row in rows)
    status_counts = Counter(row["status"] for row in rows)
    avg_response = sum(row["response_ms"] for row in rows) / len(rows)
    report = {
        "total_requests": len(rows),
        "avg_response_ms": round(avg_response, 1),
        "status_breakdown": dict(sorted(status_counts.items())),
        "top_paths": dict(path_counts.most_common(top_n)),
    }
    if output_format == "json":
        print(json.dumps(report, indent=2))
    else:
        print(f"Total requests : {report['total_requests']}")
        print(f"Avg response   : {report['avg_response_ms']} ms")
        print("\nStatus breakdown:")
        for code, count in report["status_breakdown"].items():
            print(f"  {code}: {count} requests")
        print(f"\nTop {top_n} paths:")
        for path, count in report["top_paths"].items():
            print(f"  {count:4d} {path}")
parser = argparse.ArgumentParser(description="Analyze web server access logs.")
parser.add_argument("file", help="Path to the CSV log file")
parser.add_argument("--status", type=int, help="Filter by HTTP status code")
parser.add_argument("--top", "-n", type=int, default=10, help="Show top N paths (default: 10)")
parser.add_argument("--format", choices=["table", "json"], default="table", help="Output format")
parser.add_argument("--verbose", "-v", action="store_true", help="Print each loaded row")
parser.add_argument("--generate-sample", action="store_true",
help="Write a sample log file to FILE and exit")
args = parser.parse_args()
if args.generate_sample:
generate_sample_logs(args.file)
else:
rows = load_logs(args.file, status_filter=args.status, verbose=args.verbose)
summarize(rows, args.top, args.format)
Generate the sample data first:
python3 log_analyzer.py sample_logs.csv --generate-sample
Sample log file written to: sample_logs.csv
Now run the full analysis:
python3 log_analyzer.py sample_logs.csv
Total requests : 12
Avg response   : 97.7 ms

Status breakdown:
  200: 7 requests
  401: 1 requests
  404: 2 requests
  500: 2 requests

Top 10 paths:
     3 /home
     3 /api/users
     2 /api/login
     2 /missing-page
     1 /api/users/42
     1 /api/data
Filter to only 500 errors with JSON output:
python3 log_analyzer.py sample_logs.csv --status 500 --format json
{
  "total_requests": 2,
  "avg_response_ms": 270.0,
  "status_breakdown": {
    "500": 2
  },
  "top_paths": {
    "/api/users/42": 1,
    "/api/data": 1
  }
}
The same tool now handles filtering, formatting, and summary generation through clean CLI flags with no code changes needed to switch between modes.
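One small idiom that pays off as a tool like this grows: vars() converts the parsed Namespace into a plain dict, which is convenient for logging the effective configuration of a run or forwarding settings to another function. A minimal standalone sketch:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("file")
parser.add_argument("--status", type=int)
parser.add_argument("--top", "-n", type=int, default=10)

args = parser.parse_args(["logs.csv", "--status", "500"])

# vars() exposes the Namespace as an ordinary dict,
# with defaults filled in for anything the user omitted
config = vars(args)
print(config)  # {'file': 'logs.csv', 'status': 500, 'top': 10}
```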
Subcommands with Subparsers
Real CLI tools often do multiple distinct things: git has commit, push, and pull. docker has build, run, and ps. argparse supports this pattern with subparsers, where each subcommand defines its own arguments independently.
We will split the tool into two subcommands: analyze for generating summaries and export for writing filtered rows to a new file. Replace everything from the parser = argparse.ArgumentParser(...) line to the end of the file with the following:
def export_rows(rows, output_file, output_format):
    if not rows:
        print("No matching rows found.")
        return
    if output_format == "json":
        with open(output_file, "w") as f:
            json.dump(rows, f, indent=2)
    else:
        with open(output_file, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)
    print(f"Exported {len(rows)} rows to: {output_file}")
parser = argparse.ArgumentParser(description="Web server log analysis tool.")
parser.add_argument("file", help="Path to the CSV log file")
parser.add_argument("--status", type=int, help="Filter by HTTP status code")
parser.add_argument("--verbose", "-v", action="store_true", help="Print each loaded row")
subparsers = parser.add_subparsers(dest="command", help="Available commands")
analyze_parser = subparsers.add_parser("analyze", help="Summarize log statistics")
analyze_parser.add_argument("--top", "-n", type=int, default=10, help="Show top N paths (default: 10)")
analyze_parser.add_argument("--format", choices=["table", "json"], default="table", help="Output format")
export_parser = subparsers.add_parser("export", help="Export filtered rows to a new file")
export_parser.add_argument("output", help="Output file path")
export_parser.add_argument("--format", choices=["csv", "json"], default="csv", help="Output format (default: csv)")
args = parser.parse_args()

if args.command == "analyze":
    rows = load_logs(args.file, status_filter=args.status, verbose=args.verbose)
    summarize(rows, args.top, args.format)
elif args.command == "export":
    rows = load_logs(args.file, status_filter=args.status, verbose=args.verbose)
    export_rows(rows, args.output, args.format)
else:
    parser.print_help()
Run the analyze subcommand, showing only the top 3 paths:
python3 log_analyzer.py sample_logs.csv analyze --top 3
Total requests : 12
Avg response   : 97.7 ms

Status breakdown:
  200: 7 requests
  401: 1 requests
  404: 2 requests
  500: 2 requests

Top 3 paths:
     3 /home
     3 /api/users
     2 /api/login
Export only the 404 errors to a JSON file:
python3 log_analyzer.py sample_logs.csv --status 404 export errors_404.json --format json
Each subcommand has its own --help page:
python3 log_analyzer.py sample_logs.csv export --help
usage: log_analyzer.py file export [-h] [--format {csv,json}] output
positional arguments:
  output               Output file path

options:
  -h, --help           show this help message and exit
  --format {csv,json}  Output format (default: csv)
Why this is better
Subparsers let each command define its own arguments without polluting a single flat namespace. The top-level parser still owns shared arguments like --status and --verbose that apply regardless of which subcommand runs. Each subcommand gets its own required arguments, its own defaults, and its own help page. Users only see the flags that are relevant to what they are doing.
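An alternative to the if/elif dispatch used above, also built into argparse, is set_defaults: each subparser attaches its own handler function, so dispatch collapses to a single call. A standalone sketch (do_analyze and do_export are illustrative names, not part of the tutorial's tool):

```python
import argparse

def do_analyze(args):
    print(f"analyzing {args.file}")

def do_export(args):
    print(f"exporting {args.file} to {args.output}")

parser = argparse.ArgumentParser()
parser.add_argument("file")
subparsers = parser.add_subparsers(dest="command")

analyze_parser = subparsers.add_parser("analyze")
analyze_parser.set_defaults(func=do_analyze)

export_parser = subparsers.add_parser("export")
export_parser.add_argument("output")
export_parser.set_defaults(func=do_export)

# Each subparser stamps its handler onto the namespace
args = parser.parse_args(["logs.csv", "export", "out.json"])
args.func(args)  # prints: exporting logs.csv to out.json
```

Adding a new subcommand then never touches the dispatch logic; you register a parser, attach a function, and you are done.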
Mutually Exclusive Arguments
Some argument combinations are logically incompatible. argparse has a built-in mechanism for declaring that two flags cannot be used together, which enforces the constraint before your code ever runs.
Add the following output mode flags to the analyze subparser. Replace the analyze_parser block with this:
analyze_parser = subparsers.add_parser("analyze", help="Summarize log statistics")
analyze_parser.add_argument("--top", "-n", type=int, default=10, help="Show top N paths (default: 10)")
analyze_parser.add_argument("--format", choices=["table", "json"], default="table", help="Output format")
output_group = analyze_parser.add_mutually_exclusive_group()
output_group.add_argument("--summary-only", action="store_true",
help="Show only the total count and average response time")
output_group.add_argument("--paths-only", action="store_true",
help="Show only the path frequency table")
Try passing both flags at the same time:
python3 log_analyzer.py sample_logs.csv analyze --summary-only --paths-only
usage: log_analyzer.py file analyze [-h] [--top TOP] [--format {table,json}] [--summary-only | --paths-only]
log_analyzer.py: error: argument --paths-only: not allowed with argument --summary-only
The [--summary-only | --paths-only] notation in the usage line communicates the mutual exclusion directly to the user through the help text, before they have made any mistake.
To actually use these flags in the analyze branch, update the call to summarize and the function itself. Replace the summarize function with:
def summarize(rows, top_n, output_format, summary_only=False, paths_only=False):
    if not rows:
        print("No matching rows found.")
        return
    avg_response = sum(row["response_ms"] for row in rows) / len(rows)
    path_counts = Counter(row["path"] for row in rows)
    status_counts = Counter(row["status"] for row in rows)
    if paths_only:
        print(f"Top {top_n} paths:")
        for path, count in path_counts.most_common(top_n):
            print(f"  {count:4d} {path}")
        return
    if output_format == "json":
        report = {
            "total_requests": len(rows),
            "avg_response_ms": round(avg_response, 1),
            "status_breakdown": dict(sorted(status_counts.items())),
            "top_paths": dict(path_counts.most_common(top_n)),
        }
        print(json.dumps(report, indent=2))
        return
    print(f"Total requests : {len(rows)}")
    print(f"Avg response   : {round(avg_response, 1)} ms")
    if summary_only:
        return
    print("\nStatus breakdown:")
    for code, count in sorted(status_counts.items()):
        print(f"  {code}: {count} requests")
    print(f"\nTop {top_n} paths:")
    for path, count in path_counts.most_common(top_n):
        print(f"  {count:4d} {path}")
And update the analyze branch to pass the new flags:
elif args.command == "analyze":
rows = load_logs(args.file, status_filter=args.status, verbose=args.verbose)
summarize(rows, args.top, args.format,
summary_only=args.summary_only,
paths_only=args.paths_only)
Now run with --summary-only to get just the headline numbers:
python3 log_analyzer.py sample_logs.csv analyze --summary-only
Total requests : 12
Avg response   : 97.7 ms
And with --paths-only to get just the frequency table:
python3 log_analyzer.py sample_logs.csv analyze --paths-only -n 4
Top 4 paths:
     3 /home
     3 /api/users
     2 /api/login
     2 /missing-page
Why this is better
Without mutually exclusive groups, enforcing this constraint would mean writing your own check, something like if args.summary_only and args.paths_only: followed by a hand-rolled error message and exit. With add_mutually_exclusive_group, the constraint is declared once at the parser level, enforced automatically, and documented in the help text, all without any conditional logic in your application code.
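One related option: add_mutually_exclusive_group accepts required=True, which forces the user to pick exactly one member of the group rather than allowing both to be omitted. A standalone sketch:

```python
import argparse

parser = argparse.ArgumentParser()
# required=True: the user must supply exactly one of the two flags
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("--summary-only", action="store_true")
group.add_argument("--paths-only", action="store_true")

args = parser.parse_args(["--paths-only"])
print(args.paths_only)    # True
print(args.summary_only)  # False
```

Calling the script with neither flag now produces a parser error instead of silently falling through to a default mode.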
Summary
Python's argparse module transforms command-line argument handling from a brittle string-parsing problem into a structured, self-documenting API. In this tutorial we built a functional log analysis tool that demonstrates the full progression:
- sys.argv makes simple cases work but fails as soon as requirements grow
- A basic ArgumentParser provides automatic validation and --help output with almost no effort
- type=, choices=, default=, and action= cover the most common argument patterns without any manual checking
- Subparsers give each command its own isolated argument namespace and help page
- Mutually exclusive groups encode incompatible-flag constraints at the parser level, keeping that logic out of your application code
The same patterns apply to any CLI tool: data pipelines, deployment scripts, file converters, database seeders, or automation utilities. Once you know how to structure an argparse setup, building tools that are both easy to use and hard to misuse becomes straightforward.