MySQL

Fast Queries at Scale: Index Design and EXPLAIN in MySQL

grivel Iun 24, 2026

Introduction

A query that returns results in two milliseconds on a thousand-row development table can take forty seconds on the same logic running against fifty million rows in production. The code did not change. The data did. Indexes are the primary tool for keeping query time bounded as tables grow, and designing them incorrectly — or omitting them entirely — is among the most common causes of production performance incidents in MySQL applications.

The problem with indexes is that their absence is invisible until it is catastrophic. MySQL executes queries without indexes using full table scans, reading every row to find the ones that match. For small tables this is fast enough that no one notices. For large tables, full table scans block concurrent writes, spike CPU, and produce query times that grow linearly with the row count instead of logarithmically. By the time the problem is noticed, the table may already have tens of millions of rows and the engineers are adding indexes under load.

This tutorial builds a query optimization workflow for an e-commerce order management system. Starting from a schema with no secondary indexes, it progressively adds indexes for WHERE filters, multi-column conditions, JOIN keys, and covering lookups. Every step uses EXPLAIN to measure what the optimizer is doing before and after each change, turning index design from guesswork into a reproducible diagnostic process.

Background

MySQL's default storage engine, InnoDB, stores table data in a clustered index sorted by primary key. Every non-primary-key lookup therefore starts with a scan of a separate data structure — a secondary index — to find the primary key values matching the filter, then fetches the actual rows from the clustered index.

EXPLAIN shows the optimizer's plan for executing a query. The most important columns:

type: how the table is accessed. ALL means a full table scan — every row is read. ref means an index is used to find matching rows. const means at most one matching row (primary key or unique index lookup). range means an index range scan. ALL is almost always bad on large tables.
key: the index the optimizer chose. NULL means no index was used.
rows: the optimizer's estimate of rows examined to produce the result. Lower is better.
Extra: additional execution details. Using where means rows are filtered after retrieval. Using index means a covering index is used — no row lookup needed. Using filesort means results are sorted in memory because no suitable index exists for the ORDER BY.

A composite index covers multiple columns. MySQL can use it for queries that filter on a left-prefix of the indexed columns — if the index is on (status, order_date), a query filtering only on status can use it, but a query filtering only on order_date cannot. Column order in a composite index is a design decision that must match the query patterns it serves.

A covering index is one where all columns referenced by the query — in SELECT, WHERE, JOIN, and ORDER BY — appear in the index itself. When a covering index applies, MySQL reads the index without touching the table rows at all, which is typically twice as fast for read-heavy queries.

Practical Scenario

A B2B e-commerce platform processes thousands of orders per day across hundreds of enterprise customers. The analytics team runs a set of recurring reports: completed orders by region, high-value pending orders that need manual review, and join queries that combine order data with customer profiles. At launch with a few thousand rows these queries ran instantly. Eighteen months later with 30 million orders and 2 million customers, the same queries lock the database for minutes during morning report generation, delaying the sales team's daily standup and occasionally cascading into connection pool exhaustion that takes the entire application offline.

The team needs to identify which queries are doing full table scans, add targeted indexes to eliminate them, and verify the improvement using EXPLAIN before and after. Blindly adding indexes to every column is not the answer — each index adds write overhead, and redundant indexes waste memory in the InnoDB buffer pool. The goal is one index per query pattern, designed to cover exactly what the query needs.

The Problem

The orders table has only a primary key. Every filter runs a full table scan.

touch script.sql

mysql -u root --table < script.sql

DROP DATABASE IF EXISTS storefront;
CREATE DATABASE storefront;
USE storefront;

CREATE TABLE customers (
    customer_id INT           NOT NULL AUTO_INCREMENT,
    email       VARCHAR(120)  NOT NULL,
    region      VARCHAR(40)   NOT NULL,
    created_at  DATE          NOT NULL,
    PRIMARY KEY (customer_id)
);

CREATE TABLE orders (
    order_id    INT            NOT NULL AUTO_INCREMENT,
    customer_id INT            NOT NULL,
    order_date  DATE           NOT NULL,
    status      VARCHAR(20)    NOT NULL,
    amount      DECIMAL(10,2)  NOT NULL,
    PRIMARY KEY (order_id)
);

INSERT INTO customers (email, region, created_at) VALUES
    ('alice@retailcorp.com',  'West',    '2021-06-15'),
    ('bob@techsupply.com',    'East',    '2021-09-20'),
    ('carol@globalmart.com',  'West',    '2022-01-08'),
    ('david@northtrade.com',  'Central', '2022-03-14'),
    ('elena@southbiz.com',    'East',    '2022-07-30'),
    ('frank@westdist.com',    'West',    '2022-11-11'),
    ('grace@centerco.com',    'Central', '2023-02-05');

INSERT INTO orders (customer_id, order_date, status, amount) VALUES
    (1, '2023-11-01', 'completed',  849.00),
    (2, '2023-11-05', 'pending',    120.50),
    (3, '2023-11-08', 'completed', 2340.00),
    (4, '2023-11-10', 'cancelled',   95.00),
    (5, '2023-11-12', 'completed',  560.00),
    (1, '2023-11-15', 'failed',     180.00),
    (6, '2023-11-18', 'pending',    445.00),
    (2, '2023-11-20', 'completed', 1100.00),
    (7, '2023-11-22', 'completed',  275.00),
    (3, '2023-11-25', 'cancelled',   90.00),
    (4, '2023-11-28', 'completed', 3200.00),
    (5, '2023-12-01', 'pending',    650.00),
    (6, '2023-12-04', 'completed',  420.00),
    (7, '2023-12-07', 'failed',      75.00),
    (1, '2023-12-10', 'completed', 1850.00);

EXPLAIN SELECT order_id, customer_id, amount
FROM orders
WHERE status = 'completed'\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 15
     filtered: 33.33
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

type: ALL means every row in orders is read before the WHERE filter is applied. key: NULL confirms no index was used. rows: 15 is the full table. With 30 million orders, this number becomes 30,000,000 — MySQL reads the entire table on every execution of this report query.

Single-Column Index on a WHERE Filter

Create an index on the status column. The optimizer can now jump directly to the rows with status = 'completed' instead of reading every row.

Replace the query section of script.sql with the following (keep the schema and INSERT statements):

-- (schema and INSERT statements from above — keep them)

CREATE INDEX idx_status ON orders (status);

EXPLAIN SELECT order_id, customer_id, amount
FROM orders
WHERE status = 'completed'\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
   partitions: NULL
         type: ref
possible_keys: idx_status
          key: idx_status
      key_len: 82
          ref: const
         rows: 8
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

: type changed from ALL to ref, meaning MySQL now uses the index to locate matching rows directly. key: idx_status confirms the index is active. rows dropped from 15 to 8 (only the matching rows are estimated). At 30 million rows with 40% completed, rows would drop from 30 million to 12 million — a proportional improvement that scales with table size.

Note: On small tables, the MySQL optimizer may choose a full scan even with an index present, because the overhead of the index lookup outweighs a sequential read of a few pages. The ref access type and its associated gains become decisive once tables hold tens of thousands of rows or more.

Composite Index for Multi-Column Filters

A query that filters on both status and order_date can use a composite index. The column order matters: MySQL can use a composite index only when the query filters on a left-prefix of the indexed columns.

Replace the query section with:

-- (schema and INSERT statements from above — keep them)

CREATE INDEX idx_status ON orders (status);
CREATE INDEX idx_status_date ON orders (status, order_date);

-- Uses idx_status_date: filters on the full prefix (status, order_date)
EXPLAIN SELECT order_id, amount
FROM orders
WHERE status = 'completed'
  AND order_date >= '2023-12-01'\G

-- Uses idx_status: filters only on the leading column (status)
EXPLAIN SELECT order_id, amount
FROM orders
WHERE status = 'pending'\G

-- Cannot use idx_status_date: skips the leading column (status)
EXPLAIN SELECT order_id, amount
FROM orders
WHERE order_date >= '2023-12-01'\G

-- Query 1 (status + order_date): uses composite index
*************************** 1. row ***************************
         type: range
possible_keys: idx_status,idx_status_date
          key: idx_status_date
      key_len: 87
         rows: 3
        Extra: Using index condition

-- Query 2 (status only): uses single-column index
*************************** 1. row ***************************
         type: ref
possible_keys: idx_status,idx_status_date
          key: idx_status
         rows: 3
        Extra: NULL

-- Query 3 (order_date only): full table scan
*************************** 1. row ***************************
         type: ALL
possible_keys: NULL
          key: NULL
         rows: 15
        Extra: Using where

: Query 1 uses type: range on the composite index, limiting the scan to only completed orders after the date cutoff — a narrow slice of the table. Query 3 gets no benefit because order_date alone is not a prefix of idx_status_date. If order_date-only queries are common, a separate CREATE INDEX idx_date ON orders (order_date) would be needed. Design composite indexes around your most frequent query shapes — a (status, order_date) index serves both "status-only" and "status + date" queries simultaneously.

Index on a JOIN Key

When MySQL joins orders to customers, it must look up each order's customer_id in the customers table. Without an index on the join key in orders, MySQL performs a full scan of orders for every row it needs to match.

Replace the query section with:

-- (schema and INSERT statements from above — keep them)

CREATE INDEX idx_status_date ON orders (status, order_date);

-- Before: no index on orders.customer_id for the join
EXPLAIN SELECT o.order_id, c.email, c.region, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'completed'\G

-- After: add index on the join key
CREATE INDEX idx_customer ON orders (customer_id);

EXPLAIN SELECT o.order_id, c.email, c.region, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'completed'\G

-- Before: orders.customer_id has no index
*************************** 2. row ***************************
        table: c
         type: ALL
possible_keys: PRIMARY
          key: NULL
         rows: 7
        Extra: Using where; Using join buffer (hash join)

-- After: idx_customer on orders.customer_id
*************************** 2. row ***************************
        table: c
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
         rows: 1
        Extra: NULL

: type: eq_ref means MySQL uses the primary key of customers to look up exactly one row per join, the most efficient join access type possible. The Using join buffer (hash join) in the before plan means MySQL was accumulating rows from orders in memory and then hashing the customers table — a strategy that degrades with large datasets. Foreign key columns that appear in JOIN ON clauses should always be indexed.

Covering Index to Eliminate Row Lookups

A secondary index stores the indexed columns plus the primary key. When a query selects columns that are not in the index, MySQL must do a second lookup into the clustered index (the table) to fetch them — called a row lookup. A covering index includes all columns the query needs, eliminating that second lookup entirely.

Replace the query section with:

-- (schema and INSERT statements from above — keep them)

CREATE INDEX idx_status ON orders (status);

-- Without covering: index on status, but amount must be fetched from the row
EXPLAIN SELECT order_id, status, amount
FROM orders
WHERE status = 'completed'\G

-- Drop and replace with covering index that includes amount
DROP INDEX idx_status ON orders;
CREATE INDEX idx_status_covering ON orders (status, amount);

-- With covering: all needed columns (status, amount, order_id/PK) are in the index
EXPLAIN SELECT order_id, status, amount
FROM orders
WHERE status = 'completed'\G

-- Without covering index: row lookup required
*************************** 1. row ***************************
         type: ref
          key: idx_status
         rows: 8
        Extra: NULL

-- With covering index: no row lookup
*************************** 1. row ***************************
         type: ref
          key: idx_status_covering
         rows: 8
        Extra: Using index

: Extra: Using index confirms MySQL reads only the index and never touches the table rows. For a reporting query that runs thousands of times per hour, eliminating the row lookup roughly halves the I/O — the difference between reading the index B-tree and also reading the table's B-tree. Covering indexes are most valuable for high-frequency read queries on large tables where the selected columns are stable and few.

Note: InnoDB secondary indexes always include the primary key, so order_id is implicitly available in any secondary index without being explicitly listed. Include only the additional columns the query references in SELECT, WHERE, and ORDER BY — over-including columns wastes index space and slows writes.

Summary

This tutorial built a systematic index design workflow for an e-commerce order management system, using EXPLAIN to measure query plans before and after each index addition.

EXPLAIN reveals the optimizer's access strategy: type: ALL is a full table scan; type: ref uses an index; type: eq_ref uses a primary key or unique index for exactly one row per join; type: range scans an index range. Always address ALL on large tables.
A single-column index on a WHERE filter column changes type from ALL to ref and reduces the rows estimate to only the matching rows — the performance gain scales linearly with table size.
Composite indexes serve queries that filter on multiple columns; MySQL can only use a composite index when the query filters on a left-prefix of the indexed columns — filtering on only non-leading columns forces a full scan.
Foreign key columns used in JOIN ON clauses must be indexed on the referencing side; without this index, MySQL uses a hash join buffer that degrades at scale, shown as type: ALL with Using join buffer in Extra.
A covering index includes all columns the query references so MySQL reads only the index without touching the table rows; Extra: Using index confirms covering access is in use.
Every index adds write overhead and consumes buffer pool memory; add indexes for specific measured query patterns rather than speculatively indexing every column.