Architecture & Internals

Understanding how pg_ttl_index works under the hood.

System Architecture

┌────────────────────────────────────────────────────────┐
│                   PostgreSQL Server                    │
│                                                        │
│  ┌──────────────────────────────────────────────────┐  │
│  │            Background Worker Process             │  │
│  │  ┌────────────────────────────────────────────┐  │  │
│  │  │ 1. Wake up every naptime seconds           │  │  │
│  │  │ 2. Try acquire advisory lock               │  │  │
│  │  │ 3. Call ttl_runner()                       │  │  │
│  │  │ 4. Sleep naptime seconds                   │  │  │
│  │  │ 5. Repeat                                  │  │  │
│  │  └────────────────────────────────────────────┘  │  │
│  └──────────────────────────────────────────────────┘  │
│                            │                           │
│                            ▼                           │
│  ┌──────────────────────────────────────────────────┐  │
│  │              ttl_runner() Function               │  │
│  │  ┌────────────────────────────────────────────┐  │  │
│  │  │ FOR EACH active TTL in ttl_index_table     │  │  │
│  │  │   LOOP (batch deletion)                    │  │  │
│  │  │     DELETE batch_size rows WHERE expired   │  │  │
│  │  │     UPDATE statistics                      │  │  │
│  │  │     Sleep 10ms (yield to other processes)  │  │  │
│  │  │     EXIT when no more expired rows         │  │  │
│  │  │   END LOOP                                 │  │  │
│  │  │ END FOR                                    │  │  │
│  │  └────────────────────────────────────────────┘  │  │
│  └──────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────┘

Components

1. Extension SQL Objects

Location: pg_ttl_index--2.0.0.sql

  • ttl_index_table - Configuration and statistics storage
  • ttl_create_index() - Create/update TTL rules
  • ttl_drop_index() - Remove TTL rules
  • ttl_runner() - Execute cleanup logic
  • ttl_summary() - View with computed fields
  • ttl_worker_status() - Worker health information
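
A typical session touches most of these objects. The sketch below is illustrative only: the ttl_create_index() call mirrors the example later on this page, while the ttl_drop_index() arguments and the ttl_summary / ttl_worker_status() call forms are assumptions based on the descriptions above.

-- Register a TTL rule: expire rows of sessions older than 3600 seconds
SELECT ttl_create_index('sessions', 'created_at', 3600);

-- Inspect configuration, statistics, and worker health
SELECT * FROM ttl_summary;
SELECT * FROM ttl_worker_status();

-- Remove the rule when it is no longer needed
SELECT ttl_drop_index('sessions', 'created_at');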

2. C Extension Layer

Location: src/ directory

  • pg_ttl_index.c - Extension initialization
  • worker.c - Background worker implementation
  • api.c - C functions (ttl_start_worker, ttl_stop_worker)
  • utils.c - Helper utilities

3. Configuration

Location: postgresql.conf

  • shared_preload_libraries - Load extension at startup
  • pg_ttl_index.naptime - Cleanup interval (GUC)
  • pg_ttl_index.enabled - Enable/disable worker (GUC)
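
A minimal postgresql.conf sketch, assuming pg_ttl_index.naptime is expressed in seconds and pg_ttl_index.enabled is a boolean (the values shown are illustrative, not documented defaults). Changing shared_preload_libraries requires a server restart.

# Load the extension at server start
shared_preload_libraries = 'pg_ttl_index'

# Run the cleanup loop every 60 seconds
pg_ttl_index.naptime = 60

# Keep the background worker enabled
pg_ttl_index.enabled = on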

Background Worker Lifecycle

Startup Sequence

1. PostgreSQL starts
   └─> Loads shared_preload_libraries
       └─> Loads pg_ttl_index.so
           └─> Registers background worker template

2. User calls ttl_start_worker()
   └─> Spawns background worker process
       └─> Worker connects to current database
           └─> Enters main loop
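
For example, once the library is preloaded, the worker can be launched and verified from psql (the ttl_worker_status() call form is assumed from its description above):

SELECT ttl_start_worker();          -- spawn the worker for the current database
SELECT * FROM ttl_worker_status();  -- confirm it is running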

Main Worker Loop

while (true) {
    // 1. Check if shutdown requested
    if (shutdown_requested) break;

    // 2. Execute cleanup
    execute_sql("SELECT ttl_runner()");

    // 3. Sleep for naptime seconds
    sleep(pg_ttl_index_naptime);
}

Shutdown Sequence

1. ttl_stop_worker() called
   └─> Signals worker to terminate
       └─> Worker exits main loop
           └─> Process terminates
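
The corresponding SQL call is simply:

SELECT ttl_stop_worker();  -- signals the worker to exit its main loop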

Cleanup Algorithm

ttl_runner() Internals

1. Acquire advisory lock (pg_try_advisory_lock)
   └─> If already locked, exit (another instance running)

2. FOR EACH active TTL configuration:
   a. Initialize batch counter

   b. LOOP (until no more expired rows):
      i.   SELECT ctid of expired rows (LIMIT batch_size)
      ii.  DELETE rows with matching ctid
      iii. Update row counter
      iv.  Sleep 10ms (yield)
      v.   EXIT if deleted < batch_size

   c. UPDATE statistics (rows_deleted, last_run)
   d. COMMIT (per-table transaction)

3. Release advisory lock
4. RETURN total rows deleted
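
The inner loop of step 2b can be written out as a standalone PL/pgSQL sketch for a hypothetical sessions(created_at) rule with a one-hour TTL and batch_size = 10,000. The real ttl_runner() presumably builds these statements dynamically and commits per table, which a DO block cannot do, so this is purely illustrative:

DO $$
DECLARE
    deleted bigint;
BEGIN
    LOOP
        -- Delete one batch of expired rows by physical location
        DELETE FROM sessions
        WHERE ctid = ANY(ARRAY(
            SELECT ctid
            FROM sessions
            WHERE created_at < NOW() - INTERVAL '1 hour'
            LIMIT 10000                     -- batch_size
        ));
        GET DIAGNOSTICS deleted = ROW_COUNT;
        PERFORM pg_sleep(0.01);             -- step 2b.iv: yield ~10 ms
        EXIT WHEN deleted < 10000;          -- step 2b.v: last (partial) batch
    END LOOP;
END $$;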

Why ctid?

Deleting rows by ctid (their physical location) is faster than re-matching them on a key column, because each expired row is located once via the index and then removed by direct physical access:

-- Slow: requires index scan for each row
DELETE FROM table WHERE id IN (SELECT id FROM ...);

-- Fast: direct physical row access
DELETE FROM table WHERE ctid = ANY(ARRAY(SELECT ctid FROM ...));

Concurrency Control

Advisory Lock Mechanism

-- At start of ttl_runner()
SELECT pg_try_advisory_lock(hashtext('pg_ttl_index_runner'));

-- If lock acquired = true:
--   Proceed with cleanup
-- If lock acquired = false:
--   Another instance is running, skip this run

-- At end of ttl_runner()
SELECT pg_advisory_unlock(hashtext('pg_ttl_index_runner'));

Benefits:

  • Prevents overlapping cleanup runs
  • Safe in clustered/replica environments
  • No database schema changes required
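
While a cleanup run is in progress, the lock is visible in the standard pg_locks view, which is a convenient way to observe the mechanism:

SELECT pid, locktype, granted
FROM pg_locks
WHERE locktype = 'advisory';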

Auto-Indexing

Index Creation

When ttl_create_index() is called:

1. Generate index name: idx_ttl_{table}_{column}
2. Execute: CREATE INDEX IF NOT EXISTS idx_ttl_... ON table(column)
3. Store index name in ttl_index_table.index_name

Example:

SELECT ttl_create_index('sessions', 'created_at', 3600);
-- Creates: idx_ttl_sessions_created_at

Index Usage

The index accelerates the expired row selection:

-- The query text is identical either way:
SELECT ctid FROM table WHERE created_at < NOW() - INTERVAL '1 hour';

-- Without the index, the planner must fall back to a full (sequential) table scan.
-- With idx_ttl_table_created_at in place, it can use an index scan instead,
-- which is much faster on large tables.
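
You can confirm the plan change with EXPLAIN, here against the sessions example from the auto-indexing section above; depending on how many rows qualify, the planner should switch from a Seq Scan to an Index Scan or Bitmap Index Scan on idx_ttl_sessions_created_at:

EXPLAIN
SELECT ctid FROM sessions
WHERE created_at < NOW() - INTERVAL '1 hour';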

Batch Deletion Strategy

Why Batching?

┌─────────────────────────────────────┐
│ Without Batching (delete all) │
├─────────────────────────────────────┤
│ ❌ Long-running transactions │
│ ❌ Locks table for extended period │
│ ❌ Large WAL generation │
│ ❌ Table bloat and performance issues │
└─────────────────────────────────────┘

┌─────────────────────────────────────┐
│ With Batching (batch_size = 10K) │
├─────────────────────────────────────┤
│ ✅ Short transactions │
│ ✅ Minimal locking │
│ ✅ Manageable WAL size │
│ ✅ Other queries can proceed │
└─────────────────────────────────────┘

Batch Processing Flow

Expired rows: 250,000
Batch size: 10,000

Iteration 1: Delete 10,000 rows → Sleep 10ms
Iteration 2: Delete 10,000 rows → Sleep 10ms
...
Iteration 25: Delete 10,000 rows → Sleep 10ms
Total: 250,000 rows deleted in 25 batches
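
At these numbers, the inter-batch sleeps add only about 25 × 10 ms = 250 ms of idle time to the entire run, so the cost of yielding is negligible compared to the deletions themselves.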

Statistics Tracking

How Statistics are Updated

-- At the end of each table's cleanup:
UPDATE ttl_index_table
SET
    last_run = NOW(),
    rows_deleted_last_run = {rows deleted this run},
    total_rows_deleted = total_rows_deleted + {rows deleted this run}
WHERE table_name = ... AND column_name = ...;
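
These counters can be read back directly from ttl_index_table (or via the ttl_summary view described earlier):

SELECT table_name, column_name, last_run,
       rows_deleted_last_run, total_rows_deleted
FROM ttl_index_table;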

Counter Precision

  • rows_deleted_last_run: BIGINT (max: 9,223,372,036,854,775,807)
  • total_rows_deleted: BIGINT (will never overflow in practice)
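
For scale: even at a sustained one billion deletions per day, a BIGINT counter would take roughly 9.2 × 10^18 / 10^9 ≈ 9.2 billion days, on the order of 25 million years, to overflow.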

Error Handling

Per-Table Isolation

FOR rec IN SELECT ... FROM ttl_index_table LOOP
    BEGIN
        -- Cleanup for this table
        ...
    EXCEPTION WHEN OTHERS THEN
        -- Log error, continue to next table
        RAISE WARNING 'Failed to cleanup %: %', rec.table_name, SQLERRM;
    END;
END LOOP;

Result: Error in one table doesn't affect others.

Performance Characteristics

Time Complexity

Operation           Complexity      Notes
-----------------   -------------   --------------------------------
Create TTL index    O(n log n)      Index creation on existing table
Delete batch        O(batch_size)   With index
Find expired rows   O(log n)        Binary search via index
Update statistics   O(1)            Single row update

Space Complexity

  • ttl_index_table: O(m) where m = number of TTL configurations (typically < 100)
  • Auto-created indexes: O(n) per table, where n = table size

Resource Usage

Per cleanup run:

  • CPU: Minimal (index lookups + deletes)
  • Memory: ~10-50 MB (batch processing)
  • I/O: Proportional to deleted rows
  • WAL: ~(row_size × rows_deleted)

Comparison with Alternatives

vs. Triggers

Feature       pg_ttl_index        Triggers
-----------   ----------------    -----------------------
Performance   ✅ Batch deletion   ❌ Per-row overhead
Complexity    ✅ Simple setup     ❌ Complex trigger logic
Control       ✅ Centralized      ❌ Spread across tables

vs. CRON Jobs

Feature       pg_ttl_index           CRON
-----------   -------------------    ----------------------
Integration   ✅ Native PostgreSQL   ❌ External dependency
Monitoring    ✅ Built-in views      ❌ Custom logging
Concurrency   ✅ Advisory locks      ❌ Manual coordination

vs. Partitioning

Feature       pg_ttl_index              Partitioning
-----------   ----------------------    --------------------------
Granularity   ✅ Row-level              ❌ Partition-level
Setup         ✅ Simple function call   ❌ Complex schema changes
Flexibility   ✅ Change TTL anytime     ❌ Requires repartitioning

Internal Functions (Not Public API)

These functions exist but are not meant for direct use:

  • pg_ttl_index_main() - Worker main loop (C)
  • ttl_worker_launch() - Internal worker launcher (C)

Source Code Organization

src/
├── pg_ttl_index.c # Extension initialization, _PG_init()
├── pg_ttl_index.h # Header file, constants
├── worker.c # Background worker implementation
├── api.c # Public C functions (start/stop worker)
├── utils.c # Helper functions
└── utils.h # Helper headers

See Also