Schema Extraction

The extraction phase connects to your PostgreSQL database and exports its complete schema as a JSON representation. This JSON serves as the “current state” for schema comparison.

How It Works

pgtofu queries PostgreSQL’s system catalogs directly to gather comprehensive schema information:

┌───────────────────────────────────────┐
│        PostgreSQL Database            │
├───────────────────────────────────────┤
│  pg_catalog                           │
│  ├── pg_class (tables, indexes)       │
│  ├── pg_attribute (columns)           │
│  ├── pg_constraint (constraints)      │
│  ├── pg_index (index details)         │
│  ├── pg_proc (functions)              │
│  ├── pg_trigger (triggers)            │
│  ├── pg_type (types)                  │
│  └── pg_extension (extensions)        │
└───────────────────────────────────────┘
              │
              ▼
┌───────────────────────────────────────┐
│     pgtofu Extractor                  │
│     ─────────────────                 │
│     Queries → Parsing → Normalization │
└───────────────────────────────────────┘
              │
              ▼
┌───────────────────────────────────────┐
│     JSON Schema Output                │
│     ──────────────────                │
│     {                                 │
│       "tables": [...],                │
│       "views": [...],                 │
│       "functions": [...]              │
│     }                                 │
└───────────────────────────────────────┘

Extraction Order

Objects are extracted in dependency order:

1. Schemas - Database namespaces
2. Extensions - Installed extensions
3. Custom Types - Enums, composites, domains
4. Sequences - Serial/identity sources
5. Tables - With columns, constraints, indexes
6. Views - Regular and materialized
7. Functions - All languages
8. Triggers - Table triggers
9. Hypertables - TimescaleDB (if installed)
10. Continuous Aggregates - TimescaleDB (if installed)
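
To illustrate what "dependency order" means here, the following is a minimal sketch that derives a valid ordering with a topological sort. The `DEPENDS_ON` edges and the `extraction_order` helper are assumptions made for illustration; they are not taken from pgtofu's source, and a topological sort may return any valid ordering, of which the list above is one.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key lists the object kinds it depends on.
# These edges are illustrative assumptions, not pgtofu's internal data.
DEPENDS_ON = {
    "schemas": [],
    "extensions": ["schemas"],
    "custom_types": ["schemas", "extensions"],
    "sequences": ["schemas"],
    "tables": ["custom_types", "sequences"],
    "views": ["tables"],
    "functions": ["custom_types", "tables"],
    "triggers": ["tables", "functions"],
    "hypertables": ["tables", "extensions"],
    "continuous_aggregates": ["hypertables", "views"],
}


def extraction_order(depends_on):
    """Return object kinds ordered so every kind follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def is_valid_order(order, depends_on):
    """Check that each kind appears after everything it depends on."""
    position = {kind: i for i, kind in enumerate(order)}
    return all(
        position[dep] < position[kind]
        for kind, deps in depends_on.items()
        for dep in deps
    )


order = extraction_order(DEPENDS_ON)
print(order)
print(is_valid_order(order, DEPENDS_ON))
```

Extracting in such an order means that by the time a table is read, the types and sequences its columns refer to are already known.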

What Gets Extracted

Tables

For each table, pgtofu extracts:

{
  "schema": "public",
  "name": "users",
  "columns": [
    {
      "name": "id",
      "data_type": "bigint",
      "position": 1,
      "is_nullable": false,
      "is_identity": true,
      "identity_generation": "ALWAYS"
    },
    {
      "name": "email",
      "data_type": "character varying",
      "max_length": 255,
      "is_nullable": false
    }
  ],
  "constraints": [
    {
      "name": "users_pkey",
      "type": "PRIMARY KEY",
      "columns": ["id"]
    },
    {
      "name": "users_email_key",
      "type": "UNIQUE",
      "columns": ["email"]
    }
  ],
  "indexes": [
    {
      "name": "idx_users_created_at",
      "columns": ["created_at"],
      "type": "btree"
    }
  ],
  "comment": "Application users"
}

Column Details

Property	Description
`name`	Column name
`data_type`	PostgreSQL type name
`position`	Ordinal position (1-based)
`is_nullable`	Whether NULL is allowed
`default`	Default value expression
`max_length`	For VARCHAR/CHAR types
`precision`, `scale`	For NUMERIC types
`is_array`	Whether it’s an array type
`is_identity`	GENERATED AS IDENTITY column
`identity_generation`	ALWAYS or BY DEFAULT
`is_generated`	Computed/generated column
`generation_expression`	Expression for generated columns
`comment`	Column comment
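
As a rough illustration of how these properties fit together, the sketch below renders one extracted column as a DDL-like fragment. The `render_column` helper is hypothetical and not part of pgtofu; it assumes the property names listed above and that absent properties can be treated as unset.

```python
def render_column(col):
    """Render one extracted column dict as a DDL-like fragment."""
    type_sql = col["data_type"]
    if col.get("max_length") is not None:
        # VARCHAR/CHAR length modifier
        type_sql += f"({col['max_length']})"
    elif col.get("precision") is not None:
        # NUMERIC precision and optional scale
        digits = [str(col["precision"])]
        if col.get("scale") is not None:
            digits.append(str(col["scale"]))
        type_sql += f"({', '.join(digits)})"

    parts = [col["name"], type_sql]
    if not col.get("is_nullable", True):
        parts.append("NOT NULL")
    if col.get("is_identity"):
        parts.append(f"GENERATED {col['identity_generation']} AS IDENTITY")
    elif col.get("is_generated"):
        parts.append(f"GENERATED ALWAYS AS ({col['generation_expression']}) STORED")
    elif col.get("default") is not None:
        parts.append(f"DEFAULT {col['default']}")
    return " ".join(parts)


# The two columns from the "users" example above.
id_col = {
    "name": "id", "data_type": "bigint", "position": 1,
    "is_nullable": False, "is_identity": True, "identity_generation": "ALWAYS",
}
email_col = {
    "name": "email", "data_type": "character varying",
    "max_length": 255, "is_nullable": False,
}
print(render_column(id_col))     # id bigint NOT NULL GENERATED ALWAYS AS IDENTITY
print(render_column(email_col))  # email character varying(255) NOT NULL
```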

Constraint Details

Constraint Type	Extracted Properties
PRIMARY KEY	Columns
FOREIGN KEY	Columns, referenced table, ON DELETE/UPDATE
UNIQUE	Columns, deferrable settings
CHECK	Check expression
EXCLUDE	Columns, operators, where clause

Index Details

Property	Description
`name`	Index name
`columns`	Indexed columns
`include_columns`	INCLUDE columns (covering index)
`type`	btree, hash, gin, gist, etc.
`is_unique`	Unique index
`where`	Partial index condition
`definition`	Full CREATE INDEX statement

Functions

{
  "schema": "public",
  "name": "update_updated_at",
  "language": "plpgsql",
  "argument_types": [],
  "return_type": "trigger",
  "volatility": "VOLATILE",
  "is_strict": false,
  "is_security_definer": false,
  "body": "BEGIN\n    NEW.updated_at = NOW();\n    RETURN NEW;\nEND;",
  "definition": "CREATE OR REPLACE FUNCTION..."
}
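
Because PostgreSQL allows functions to be overloaded, a function is identified by its schema, name, and argument types together rather than by name alone. The sketch below builds such a key from an extracted entry. The `function_key` and `index_functions` helpers are hypothetical; how pgtofu itself matches functions during comparison is not described here.

```python
def function_key(fn):
    """Build an identity such as public.add(integer, integer) for one entry."""
    args = ", ".join(fn.get("argument_types", []))
    return f"{fn['schema']}.{fn['name']}({args})"


def index_functions(functions):
    """Map each signature key to its entry, rejecting duplicate signatures."""
    by_key = {}
    for fn in functions:
        key = function_key(fn)
        if key in by_key:
            raise ValueError(f"duplicate function signature: {key}")
        by_key[key] = fn
    return by_key


trigger_fn = {
    "schema": "public",
    "name": "update_updated_at",
    "language": "plpgsql",
    "argument_types": [],
    "return_type": "trigger",
}
print(function_key(trigger_fn))  # public.update_updated_at()
```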

Views

{
  "schema": "public",
  "name": "active_users",
  "definition": "SELECT id, email, name FROM users WHERE status = 'active'",
  "is_updatable": false,
  "check_option": "NONE"
}

TimescaleDB Objects

If TimescaleDB is installed, pgtofu also extracts:

{
  "hypertables": [
    {
      "schema": "public",
      "table_name": "metrics",
      "time_column_name": "time",
      "chunk_time_interval": "1 day",
      "compression_enabled": true,
      "compression_settings": {
        "segment_by_columns": ["device_id"],
        "order_by_columns": [{"column": "time", "desc": true}]
      },
      "retention_policy": {
        "drop_after": "2 years"
      }
    }
  ],
  "continuous_aggregates": [
    {
      "schema": "public",
      "view_name": "metrics_hourly",
      "query": "SELECT time_bucket('1 hour', time) AS bucket...",
      "refresh_policy": {
        "start_offset": "3 days",
        "end_offset": "1 hour",
        "schedule_interval": "1 hour"
      }
    }
  ]
}
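
One way to review these settings is to flatten them into a short report. The following sketch assumes the field names shown in the JSON above; `hypertable_report` is a hypothetical helper, not a pgtofu command.

```python
def hypertable_report(schema):
    """Summarize compression and retention settings for each hypertable."""
    rows = []
    for ht in schema.get("hypertables", []):
        retention = ht.get("retention_policy") or {}
        rows.append({
            "table": f"{ht['schema']}.{ht['table_name']}",
            "chunk_time_interval": ht.get("chunk_time_interval"),
            "compressed": bool(ht.get("compression_enabled")),
            "drop_after": retention.get("drop_after"),  # None if no policy
        })
    return rows


# Trimmed copy of the example above.
sample = {
    "hypertables": [
        {
            "schema": "public",
            "table_name": "metrics",
            "time_column_name": "time",
            "chunk_time_interval": "1 day",
            "compression_enabled": True,
            "retention_policy": {"drop_after": "2 years"},
        }
    ]
}
for row in hypertable_report(sample):
    print(row)
```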

Schema Filtering

Automatically Excluded

pgtofu excludes system and internal schemas by default:

Category	Schemas
PostgreSQL	`pg_catalog`, `information_schema`, `pg_toast`
TimescaleDB	`_timescaledb_*`, `timescaledb_information`, `timescaledb_internal`
Hasura	`hdb_catalog`

Manual Exclusion

Exclude additional schemas using --exclude-schema:

pgtofu extract \
  --exclude-schema _prisma \
  --exclude-schema graphql_public \
  --output schema.json

Common third-party schemas to exclude:

Tool	Schemas
Prisma	`_prisma`, `_prisma_migrations`
Supabase	`auth`, `storage`, `graphql_public`, `supabase_*`
PostgREST	`postgrest`
pgAdmin	`pgagent`
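
The exclusion rules in this section can be sketched as a simple pattern match. This is an illustration only: `DEFAULT_EXCLUDED` and `is_excluded` are hypothetical names, the `*` in the tables is interpreted as a shell-style wildcard, and whether `--exclude-schema` itself accepts wildcards is not stated here.

```python
from fnmatch import fnmatchcase

# Schemas listed under "Automatically Excluded" above.
DEFAULT_EXCLUDED = [
    "pg_catalog", "information_schema", "pg_toast",
    "_timescaledb_*", "timescaledb_information", "timescaledb_internal",
    "hdb_catalog",
]


def is_excluded(schema_name, extra_patterns=()):
    """Return True if the schema matches a default or user-supplied pattern."""
    patterns = list(DEFAULT_EXCLUDED) + list(extra_patterns)
    return any(fnmatchcase(schema_name, pattern) for pattern in patterns)


def filter_schemas(schema_names, extra_patterns=()):
    """Keep only the schemas that would still be extracted."""
    return [s for s in schema_names if not is_excluded(s, extra_patterns)]


found = ["public", "pg_catalog", "_timescaledb_internal", "_prisma", "app"]
print(filter_schemas(found, extra_patterns=["_prisma", "graphql_public"]))
```

Running a filter like this against the schema list of a database can help explain why an expected object is missing from the output.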

Type Normalization

pgtofu normalizes type names for consistent comparison:

PostgreSQL Representation	Normalized Form
`int4`	`integer`
`int8`	`bigint`
`bool`	`boolean`
`varchar`	`character varying`
`timestamp`	`timestamp without time zone`
`timestamptz`	`timestamp with time zone`
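
A minimal sketch of this kind of normalization is a lookup table keyed by alias. The mapping below contains only the pairs listed above; `TYPE_ALIASES` and `normalize_type` are hypothetical names, and pgtofu's actual rules may cover more types.

```python
# Alias -> normalized form, taken from the table above.
TYPE_ALIASES = {
    "int4": "integer",
    "int8": "bigint",
    "bool": "boolean",
    "varchar": "character varying",
    "timestamp": "timestamp without time zone",
    "timestamptz": "timestamp with time zone",
}


def normalize_type(type_name):
    """Return the normalized form of a type name, or the name unchanged."""
    stripped = type_name.strip()
    return TYPE_ALIASES.get(stripped.lower(), stripped)


print(normalize_type("int4"))         # integer
print(normalize_type("timestamptz"))  # timestamp with time zone
print(normalize_type("bigint"))       # bigint
```

Normalizing both sides before comparison avoids reporting a difference when two spellings refer to the same type.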

Performance Considerations

Query Efficiency

All extraction queries use indexed system catalog columns
Single-pass extraction minimizes database round trips
Extraction typically completes in 5-30 seconds

Large Databases

For databases with 1000+ tables:

Consider extracting specific schemas
Use output streaming (--output -)
Extraction time may reach 60+ seconds

Troubleshooting

Permission errors

Ensure the user has SELECT access to system catalogs:

GRANT SELECT ON ALL TABLES IN SCHEMA pg_catalog TO myuser;
GRANT SELECT ON ALL TABLES IN SCHEMA information_schema TO myuser;

Missing objects

Objects in excluded schemas won’t appear. Check the exclusion list:

System schemas (automatic)
--exclude-schema flags

TimescaleDB not detected

Ensure the TimescaleDB extension is installed and accessible:

SELECT * FROM pg_extension WHERE extname = 'timescaledb';

Output Verification

After extraction, verify the output:

# Check table count
jq '.tables | length' schema.json

# List all tables
jq '.tables[].name' schema.json

# Check specific table
jq '.tables[] | select(.name == "users")' schema.json
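
Where jq is not available, the same checks can be approximated with a short script. This sketch assumes the top-level `tables` array shown earlier; `summarize` and `find_tables` are hypothetical helpers, and the inline sample stands in for the contents of schema.json.

```python
import json


def summarize(schema):
    """Return the table count and table names, like the first two jq checks."""
    tables = schema.get("tables", [])
    return {
        "table_count": len(tables),
        "table_names": [t["name"] for t in tables],
    }


def find_tables(schema, name):
    """Return every table entry with the given name, across all schemas."""
    return [t for t in schema.get("tables", []) if t["name"] == name]


# Inline sample; in practice, load the file with json.load(open("schema.json")).
sample = json.loads(
    '{"tables": [{"schema": "public", "name": "users"},'
    ' {"schema": "public", "name": "orders"}]}'
)
print(summarize(sample))
print(find_tables(sample, "users"))
```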
