Python

How to Handle Temporary Files Safely in Python

Dima · May 13, 2026

Introduction

Most programs that process data work in stages: something arrives, gets transformed, and something goes out. Between those stages you often need a place to hold intermediate results. Writing to hardcoded paths like temp.csv or stage2_output.txt works for a single-user script but breaks immediately in any real service: two concurrent requests overwrite each other's data, a crash leaves orphaned files scattered across the filesystem, and sensitive data may persist on disk far longer than intended.

Python's tempfile module solves all of this. It creates files and directories that are uniquely named to avoid collisions, placed in the correct system location on any operating system, and automatically deleted when your code is done with them. This tutorial walks through every tool the module provides, starting with the simplest case and ending with the patterns needed for production-quality data pipelines.


Background

The operating system provides a designated location for temporary files: /tmp on Linux and macOS, a per-user path such as C:\Users\<name>\AppData\Local\Temp on Windows. Python's tempfile.gettempdir() returns this location, honoring the TMPDIR, TEMP, and TMP environment variables when they are set. You can inspect it at any time:

import tempfile
print(tempfile.gettempdir())

Every file and directory tempfile creates gets a randomly generated component in its name, making simultaneous use by multiple processes safe by default.
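
You can see the unique naming for yourself; the exact names will differ on every run:

with tempfile.NamedTemporaryFile() as a, tempfile.NamedTemporaryFile() as b:
    print(a.name)  # e.g. /tmp/tmp3k1v9q2x
    print(b.name)  # e.g. /tmp/tmpq7w0m4hz, never the same as the first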

The other core guarantee is automatic cleanup. Every high-level tool in tempfile supports the context manager protocol. When the with block exits — whether normally or due to an exception — the file or directory is deleted without any extra code on your part.
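
A minimal demonstration of that guarantee, using NamedTemporaryFile so the path can be checked after the block:

import os
import tempfile

with tempfile.NamedTemporaryFile() as f:
    path = f.name
    print(os.path.exists(path))  # True while the block is open

print(os.path.exists(path))      # False: deleted on exit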


Practical Scenario

Consider a backend service that processes customer order exports from an e-commerce platform. Clients send a CSV file of raw order records. The service needs to:

  1. Accept the incoming data and hold it while processing begins
  2. Filter and clean the records — remove cancelled orders, fix malformed rows
  3. Group results into per-region summaries and stage each as a separate file
  4. Bundle the staged files and deliver them before cleaning up

The service handles many concurrent requests. Every temporary path must be unique, every intermediate file must vanish after the request finishes, and a crash at any stage must not leave stale or sensitive data behind on disk.


The Problem with Hardcoded Temp Filenames

A common first attempt at staging intermediate data is to write to a fixed filename.

Create a new file:

touch order_processor.py

Add the following code to it:

import csv
import io

RAW_DATA = """order_id,region,amount,status
1001,north,250.00,completed
1002,south,89.50,cancelled
1003,north,410.00,completed
1004,east,175.00,completed
1005,south,320.00,cancelled
1006,east,95.00,completed
1007,south,430.00,completed
"""

def process_orders():
    with open("temp_orders.csv", "w") as f:
        f.write(RAW_DATA)

    results = []
    with open("temp_orders.csv", "r") as f:
        reader = csv.DictReader(f)
        for row in reader:
            if row["status"] == "completed":
                results.append(row)

    print(f"Processed {len(results)} completed orders")
    for row in results:
        print(f"  {row['order_id']}  {row['region']}  {row['amount']}")

process_orders()


Run the script:

python3 order_processor.py


The result should look like this:

Processed 5 completed orders
  1001  north  250.00
  1003  north  410.00
  1004  east  175.00
  1006  east  95.00
  1007  south  430.00


The logic works, but temp_orders.csv stays on disk after the function returns. Two concurrent calls to process_orders would read and write the same file simultaneously, corrupting each other's data. A crash between writing and reading leaves the file behind permanently. On a shared system this leaks client data between requests.


TemporaryFile: Auto-Deleted Anonymous Files

The simplest fix is tempfile.TemporaryFile. On POSIX systems it creates a file with no visible path on disk: the file descriptor is open, but no directory entry points to it, so other processes cannot find the file and the OS reclaims its storage the moment it closes. On platforms that cannot support anonymous files, Python falls back to a uniquely named file that is still deleted automatically.

Replace the process_orders function with the following:

import tempfile

def process_orders():
    with tempfile.TemporaryFile(mode="w+") as f:
        f.write(RAW_DATA)
        f.seek(0)  # rewind before reading back

        results = []
        reader = csv.DictReader(f)
        for row in reader:
            if row["status"] == "completed":
                results.append(row)

    print(f"Processed {len(results)} completed orders")
    for row in results:
        print(f"  {row['order_id']}  {row['region']}  {row['amount']}")

process_orders()


The output is identical to before. The difference is what happens on disk — nothing. No file exists outside the with block, not even during an exception.

The mode="w+" argument opens the file for both reading and writing in text mode. The f.seek(0) call rewinds the cursor to the start of the file before reading back what was just written; you will use this write-then-rewind pattern often with temporary files.

Why this is better

There is nothing to clean up, nothing to delete on crash, and no name that another concurrent request could stumble across. The file exists only inside the with block and nowhere else.

Note: TemporaryFile has no accessible filename. If you need to pass a path to an external tool, a subprocess, or a library function that only accepts a string, you need NamedTemporaryFile instead.


NamedTemporaryFile: When You Need the Path

Some workflows require a real file path — passing data to a command-line tool, handing it to a library that opens files by name, or logging the location for debugging. NamedTemporaryFile creates a uniquely named file and exposes its path through the .name attribute.

Replace the process_orders function:

def process_orders():
    with tempfile.NamedTemporaryFile(mode="w+", suffix=".csv") as f:
        print(f"Working at: {f.name}")

        f.write(RAW_DATA)
        f.flush()   # push buffered data to the OS before seeking
        f.seek(0)

        results = []
        reader = csv.DictReader(f)
        for row in reader:
            if row["status"] == "completed":
                results.append(row)

    print(f"Processed {len(results)} completed orders")
    for row in results:
        print(f"  {row['order_id']}  {row['region']}  {row['amount']}")

process_orders()


Working at: /tmp/tmpk8xmq3ld.csv
Processed 5 completed orders
  1001  north  250.00
  1003  north  410.00
  1004  east  175.00
  1006  east  95.00
  1007  south  430.00


The path includes the .csv suffix we specified and a randomly generated component that guarantees uniqueness across concurrent requests. The file is still deleted when the with block exits.

Why this is better

The randomly generated name means two simultaneous requests each get their own file with no coordination required. The suffix parameter lets you match the extension that external tools expect without giving up unique naming.

Note: On Windows, a NamedTemporaryFile with the default delete=True cannot be opened a second time while it is still open. If you need to pass the path to another function that will open the file independently, use delete=False as shown in the next section.
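
To make the path handoff concrete, here is a sketch that feeds the temp file to an external program. The "tool" is just another Python process that counts lines, standing in for any CLI that accepts a filename; on Windows, combine this with delete=False per the note above.

import subprocess
import sys

with tempfile.NamedTemporaryFile(mode="w+", suffix=".csv") as f:
    f.write(RAW_DATA)
    f.flush()  # make sure the bytes are on disk before the child opens the path
    result = subprocess.run(
        [sys.executable, "-c",
         "import sys; print(sum(1 for _ in open(sys.argv[1])))",
         f.name],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())  # 8: the header line plus 7 order rows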


delete=False: Handing a Temp File to Another Stage

Sometimes a temporary file needs to outlive the function that created it — for example, when you pass processed data to a separate delivery step that is responsible for sending it and then cleaning it up.

Setting delete=False prevents automatic deletion on close. The file persists until you explicitly remove it with os.unlink.

Replace the process_orders function:

import os

def process_orders():
    with tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False) as f:
        temp_path = f.name
        writer = csv.writer(f)
        writer.writerow(["order_id", "region", "amount"])
        reader = csv.DictReader(io.StringIO(RAW_DATA))
        for row in reader:
            if row["status"] == "completed":
                writer.writerow([row["order_id"], row["region"], row["amount"]])

    # File is closed but still on disk — safe to hand off
    print(f"Staged at: {temp_path}")
    deliver(temp_path)

def deliver(path):
    try:
        with open(path) as f:
            print(f"\nDelivering:\n{f.read()}")
    finally:
        os.unlink(path)
        print(f"Cleaned up: {os.path.basename(path)}")

process_orders()


Staged at: /tmp/tmpw2n7p1qc.csv

Delivering:
order_id,region,amount
1001,north,250.00
1003,north,410.00
1004,east,175.00
1006,east,95.00
1007,south,430.00

Cleaned up: tmpw2n7p1qc.csv


Why this is better

Using a randomly named temp file as a handoff point means concurrent requests never share a path. The stage that finishes with the data owns the cleanup — that ownership is explicit in the code rather than implied. The try/finally in deliver guarantees the file is removed even if delivery raises an exception midway through.
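
The try/finally covers delivery, but one narrow gap remains: if staging raises after the file is created but before deliver is called, the delete=False file is orphaned. A minimal sketch of a guard for that window (process_orders_guarded is a hypothetical variant of the function above):

def process_orders_guarded():
    f = tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False)
    temp_path = f.name
    try:
        with f:
            f.write("order_id,region,amount\n")  # staging work goes here
    except BaseException:
        os.unlink(temp_path)  # staging failed: remove the orphan...
        raise                 # ...and let the error propagate
    deliver(temp_path)        # success: deliver owns cleanup, as before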


TemporaryDirectory: An Isolated Workspace for Multiple Files

When a pipeline produces more than one output file — for example, one file per region — a temporary directory gives you an isolated workspace that cleans up everything at once.

Replace the entire content of order_processor.py:

import tempfile
import csv
import io
import os
from collections import defaultdict

RAW_DATA = """order_id,region,amount,status
1001,north,250.00,completed
1002,south,89.50,cancelled
1003,north,410.00,completed
1004,east,175.00,completed
1005,south,320.00,cancelled
1006,east,95.00,completed
1007,south,430.00,completed
"""

def process_orders():
    with tempfile.TemporaryDirectory() as tmpdir:
        print(f"Staging in: {tmpdir}\n")

        by_region = defaultdict(list)
        reader = csv.DictReader(io.StringIO(RAW_DATA))
        for row in reader:
            if row["status"] == "completed":
                by_region[row["region"]].append(row)

        staged = []
        for region, orders in by_region.items():
            path = os.path.join(tmpdir, f"{region}_orders.csv")
            with open(path, "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=["order_id", "region", "amount"])
                writer.writeheader()
                for o in orders:
                    writer.writerow({"order_id": o["order_id"],
                                     "region":   o["region"],
                                     "amount":   o["amount"]})
            staged.append(path)
            print(f"  wrote {len(orders)} orders -> {os.path.basename(path)}")

        bundle(staged)

    print(f"\nDirectory gone: {not os.path.exists(tmpdir)}")

def bundle(files):
    print("\n--- Bundle preview ---")
    for path in files:
        with open(path) as f:
            print(f.read().strip(), "\n")

process_orders()


Staging in: /tmp/tmpb9d3k2lx

  wrote 2 orders -> north_orders.csv
  wrote 2 orders -> east_orders.csv
  wrote 1 orders -> south_orders.csv

--- Bundle preview ---
order_id,region,amount
1001,north,250.00
1003,north,410.00

order_id,region,amount
1004,east,175.00
1006,east,95.00

order_id,region,amount
1007,south,430.00

Directory gone: True


Why this is better

The directory and every file inside it vanish when the with block exits, regardless of how many files were created or whether an exception occurred midway. You do not need to track which files to delete or loop through them during cleanup. Two concurrent pipeline runs each operate in their own isolated directory with no chance of interference.
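
The exception guarantee is easy to verify. This small experiment simulates a crash midway through staging and confirms nothing survives:

try:
    with tempfile.TemporaryDirectory() as tmpdir:
        open(os.path.join(tmpdir, "partial.csv"), "w").close()
        raise RuntimeError("simulated crash mid-pipeline")
except RuntimeError:
    print(f"Cleaned up after crash: {not os.path.exists(tmpdir)}")  # True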


SpooledTemporaryFile: Memory First, Disk as Fallback

SpooledTemporaryFile starts in memory and only writes to disk once the data exceeds a size threshold you specify. For small payloads this avoids disk I/O entirely. For large ones, it spills to disk automatically. The calling code does not change either way.

Add the following function at the bottom of order_processor.py and call it:

def process_with_spooling(data):
    # Up to 64 KB stays in memory; larger data spills to disk automatically
    with tempfile.SpooledTemporaryFile(max_size=64 * 1024, mode="w+") as f:
        f.write(data)
        f.seek(0)
        rows = list(csv.DictReader(f))

    completed = [r for r in rows if r["status"] == "completed"]
    print(f"Spooled result: {len(completed)} completed orders from {len(rows)} total rows")

# Small payload — stays entirely in memory
process_with_spooling(RAW_DATA)

# Large payload — spills to disk transparently (repeat only the data rows,
# keeping a single header so the CSV parses cleanly)
header, _, body = RAW_DATA.partition("\n")
process_with_spooling(header + "\n" + body * 5000)


Spooled result: 5 completed orders from 7 total rows
Spooled result: 25000 completed orders from 35000 total rows


Why this is better

A pure in-memory buffer (io.StringIO) never spills to disk — fast for small data but risky if a payload grows unexpectedly large. A regular TemporaryFile always uses disk, adding I/O overhead even for tiny payloads. SpooledTemporaryFile picks the right storage automatically based on actual data size and presents the same interface in both cases. Request handlers that usually process small exports but occasionally receive large ones benefit most from this.
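
One detail worth knowing: certain operations force an early spill regardless of size. Calling fileno(), for example, requires a real OS-level file, so the buffer rolls over to disk; you can also trigger the spill deliberately with rollover(). A short sketch:

with tempfile.SpooledTemporaryFile(max_size=64 * 1024, mode="w+") as f:
    f.write(RAW_DATA)
    f.rollover()  # explicitly move the in-memory buffer to a real temp file
    f.seek(0)
    print(f.readline().strip())  # order_id,region,amount,status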


mkstemp: Low-Level Control for Security-Sensitive Data

All of tempfile's creation functions are secure by default: the underlying file is opened with the O_EXCL flag and owner-only (0o600) permissions from the moment it exists. What mkstemp adds is explicit, low-level control.

mkstemp returns the raw OS file descriptor and the path directly, with no wrapper object and no automatic deletion. You decide how the file is written, when its permissions change, and exactly when it is removed, which matters when sensitive data demands that every step be visible in the code.

Add the following to order_processor.py:

import stat

def process_secure():
    fd, path = tempfile.mkstemp(suffix=".csv", prefix="orders_secure_")
    try:
        # mkstemp already created the file with mode 0o600; this chmod makes
        # the owner-only permissions explicit before a single byte is written
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # owner read/write only

        with os.fdopen(fd, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["order_id", "amount"])
            reader = csv.DictReader(io.StringIO(RAW_DATA))
            for row in reader:
                if row["status"] == "completed":
                    writer.writerow([row["order_id"], row["amount"]])

        mode = oct(os.stat(path).st_mode)
        print(f"Written to:  {path}")
        print(f"Permissions: {mode}  (owner read/write only)")

        with open(path) as f:
            print(f"\n{f.read()}")
    finally:
        os.unlink(path)
        print(f"Deleted: {not os.path.exists(path)}")

process_secure()


Written to:  /tmp/orders_secure_m4k9p2xr.csv
Permissions: 0o100600  (owner read/write only)

order_id,amount
1001,250.00
1003,410.00
1004,175.00
1006,95.00
1007,430.00

Deleted: True


The 0o100600 permission means only the file owner can read and write it. No other user on the system can access the data while it is being processed.
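
If the octal form is hard to read, stat.filemode translates it into the familiar ls-style permission string:

import stat
print(stat.filemode(0o100600))  # -rw-------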

Why this is better

mkstemp hands you the file descriptor and path with nothing hidden behind a wrapper object. Any permission you set with os.chmod is in place before your code writes the first byte, so the rules in force while the data exists are exactly the ones you chose. The try/finally ensures deletion happens even if processing raises an exception, the same cleanup guarantee the context-manager tools provide, written out explicitly.

Note: os.fdopen(fd, "w") takes ownership of the file descriptor returned by mkstemp. Do not call os.close(fd) separately after this — closing the file object closes the descriptor too.


Summary

Python's tempfile module provides the right tool for every level of temporary file handling. In this tutorial we built a multi-stage order processing pipeline that demonstrated the full range:

  • Hardcoded temp filenames cause name collisions, leave stale data on disk, and are unsafe for concurrent use
  • TemporaryFile gives you an anonymous, auto-deleted stream — the simplest option when no path is needed
  • NamedTemporaryFile adds a unique, accessible path for workflows that require passing a file location to an external tool or library
  • delete=False lets a temp file outlive the function that created it, enabling clean handoffs between pipeline stages with explicit ownership of cleanup
  • TemporaryDirectory provides an isolated workspace that removes itself and all its contents automatically, even after exceptions
  • SpooledTemporaryFile stays in memory for small payloads and spills transparently to disk for large ones, combining the speed of in-memory I/O with the capacity of disk storage
  • mkstemp gives you a file descriptor before default permissions are set, closing the brief race window that the higher-level tools leave open — essential when handling sensitive data on shared systems
