Skip to main content

extract

The extract command connects to a PostgreSQL database and exports its complete schema to a JSON file. This JSON representation serves as the “current state” for schema comparison.

Usage

pgtofu extract [flags]

Flags

FlagShortDescriptionDefault
--database-urlPostgreSQL connection URL$DATABASE_URL
--output-oOutput file path (- for stdout)schema.json
--exclude-schemaAdditional schemas to exclude (repeatable)
--help-hHelp for extract

Examples

Basic Extraction

# Using command-line flag
pgtofu extract --database-url "postgres://user:pass@localhost:5432/mydb" \
  --output current-schema.json

# Using environment variable
export DATABASE_URL="postgres://user:pass@localhost:5432/mydb"
pgtofu extract --output current-schema.json

Output to stdout

pgtofu extract --output - | jq '.tables | length'

Exclude Additional Schemas

# Exclude Prisma and GraphQL schemas
pgtofu extract \
  --exclude-schema _prisma \
  --exclude-schema graphql_public \
  --output current-schema.json

Docker

docker run --rm \
  -v "$(pwd):/workspace" \
  -w /workspace \
  -e DATABASE_URL="$DATABASE_URL" \
  accented/pgtofu:latest extract --output current-schema.json

Connection URL Format

The PostgreSQL connection URL follows this format:
postgres://[user[:password]@][host][:port][/database][?param1=value1&...]

Examples

# Basic connection
postgres://user:password@localhost:5432/mydb

# With SSL mode
postgres://user:[email protected]:5432/mydb?sslmode=require

# With connection timeout
postgres://user:password@localhost:5432/mydb?connect_timeout=10

# Using socket connection
postgres://user:password@/mydb?host=/var/run/postgresql

SSL Modes

ModeDescription
disableNo SSL
allowTry non-SSL, fall back to SSL
preferTry SSL, fall back to non-SSL (default)
requireSSL required, no certificate verification
verify-caSSL required, verify CA
verify-fullSSL required, verify CA and hostname

Schema Exclusion

Automatically Excluded Schemas

pgtofu automatically excludes system and internal schemas:
CategorySchemas
PostgreSQLpg_catalog, information_schema, pg_toast
TimescaleDBtimescaledb_information, timescaledb_internal, _timescaledb_*
Hasurahdb_catalog

Excluding Third-Party Schemas

Use --exclude-schema to exclude schemas from third-party tools:
# Prisma
pgtofu extract --exclude-schema _prisma --output schema.json

# Supabase
pgtofu extract \
  --exclude-schema auth \
  --exclude-schema storage \
  --exclude-schema graphql_public \
  --output schema.json

# PostgREST
pgtofu extract --exclude-schema postgrest --output schema.json

Output Format

The extracted schema is a JSON object containing all database objects:
{
  "version": "1.0",
  "database_name": "mydb",
  "extracted_at": "2024-01-15T10:30:00Z",
  "schemas": [
    {"name": "public", "owner": "postgres"}
  ],
  "extensions": [
    {"name": "uuid-ossp", "schema": "public", "version": "1.1"}
  ],
  "custom_types": [
    {"schema": "public", "name": "status_enum", "type": "enum", "values": ["pending", "active"]}
  ],
  "sequences": [...],
  "tables": [
    {
      "schema": "public",
      "name": "users",
      "columns": [
        {"name": "id", "data_type": "bigint", "is_nullable": false, ...}
      ],
      "constraints": [...],
      "indexes": [...]
    }
  ],
  "views": [...],
  "materialized_views": [...],
  "functions": [...],
  "triggers": [...],
  "hypertables": [...],
  "continuous_aggregates": [...]
}

What Gets Extracted

  • Column definitions (types, nullability, defaults)
  • Primary keys and foreign keys
  • Unique constraints and check constraints
  • Exclusion constraints
  • Indexes (including partial and covering indexes)
  • Table comments
  • Partition information (for partitioned tables)
  • Regular views with definitions
  • Materialized views with refresh settings
  • View indexes (for materialized views)
  • View comments
  • Function signatures (name, arguments, return type)
  • Function bodies (preserving original language)
  • Volatility settings (STABLE, VOLATILE, IMMUTABLE)
  • Security settings (DEFINER vs INVOKER)
  • Triggers with timing and events
  • Enum types with values
  • Composite types with fields
  • Domain types with constraints
  • Sequences with start/increment values
  • Hypertables with dimensions
  • Compression policies and settings
  • Retention policies
  • Continuous aggregates
  • Refresh policies

Performance Considerations

  • Extraction queries the PostgreSQL system catalogs directly
  • Large databases (1000+ tables) may take 30-60 seconds
  • Consider using --output - with streaming for very large schemas

Troubleshooting

Ensure the database is running and accessible:
psql "$DATABASE_URL" -c "SELECT 1"
The user needs read access to system catalogs. Grant if needed:
GRANT SELECT ON ALL TABLES IN SCHEMA pg_catalog TO myuser;
Try different SSL modes:
DATABASE_URL="postgres://...?sslmode=require"
DATABASE_URL="postgres://...?sslmode=disable"

See Also

  • diff - Compare extracted schema with desired schema
  • generate - Generate migrations from schema differences