
extractor Package

The extractor package queries PostgreSQL system catalogs to build a complete schema representation.
import "github.com/accented-ai/pgtofu/internal/extractor"

Extractor

Main type for schema extraction.
type Extractor struct {
    pool    *database.Pool
    options Options
}

func New(pool *database.Pool, options Options) *Extractor

Options

type Options struct {
    // Additional schemas to exclude (beyond system defaults)
    ExcludeSchemas []string
}

func DefaultOptions() Options

Extract

Extracts the complete database schema.
func (e *Extractor) Extract(ctx context.Context) (*schema.Database, error)
Extraction Order:
  1. Schemas
  2. Extensions
  3. Custom Types
  4. Sequences
  5. Tables (with columns, constraints)
  6. Indexes
  7. Views
  8. Materialized Views
  9. Functions
  10. Triggers
  11. Hypertables (if TimescaleDB)
  12. Continuous Aggregates (if TimescaleDB)
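The ordering above can be pictured as a list of steps run one after another, with the TimescaleDB steps skipped when the extension is not installed. The following is a minimal sketch of that idea only, not the package's actual code; `step`, `runSteps`, `demoSteps`, and `runDemo` are hypothetical names, and the sketch assumes the TimescaleDB check comes from the extensions found in step 2.

```go
package main

import (
	"context"
	"fmt"
)

// step is a hypothetical unit of extraction work; names mirror the list above.
type step struct {
	name    string
	enabled bool // false for TimescaleDB-only steps when the extension is absent
	run     func(ctx context.Context) error
}

// runSteps executes enabled steps in order and stops at the first failure,
// wrapping the error with the step name. It returns the completed step names.
func runSteps(ctx context.Context, steps []step) ([]string, error) {
	var done []string
	for _, s := range steps {
		if !s.enabled {
			continue
		}
		// Stop early if the caller's context was cancelled or timed out.
		if err := ctx.Err(); err != nil {
			return done, fmt.Errorf("extracting %s: %w", s.name, err)
		}
		if err := s.run(ctx); err != nil {
			return done, fmt.Errorf("extracting %s: %w", s.name, err)
		}
		done = append(done, s.name)
	}
	return done, nil
}

// noop stands in for a real catalog query.
func noop(context.Context) error { return nil }

// demoSteps builds a shortened step list; hasTimescale is an assumption
// standing in for "the timescaledb extension was found".
func demoSteps(hasTimescale bool) []step {
	return []step{
		{"schemas", true, noop},
		{"extensions", true, noop},
		{"tables", true, noop},
		{"hypertables", hasTimescale, noop},
	}
}

func runDemo(hasTimescale bool) ([]string, error) {
	return runSteps(context.Background(), demoSteps(hasTimescale))
}

func main() {
	done, err := runDemo(false)
	fmt.Println(done, err)
}
```

Running dependent steps later (tables before indexes, views before continuous aggregates) means each step can refer to objects that earlier steps have already collected.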

Example Usage

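import (
    "context"
    "fmt"

    "github.com/accented-ai/pgtofu/internal/extractor"
    "github.com/accented-ai/pgtofu/pkg/database"
    // Also import the package that defines schema.Database
    // (its import path is not shown on this page).
)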

func extractDatabase(ctx context.Context, dbURL string) (*schema.Database, error) {
    // Create connection pool
    pool, err := database.NewPoolFromURL(ctx, dbURL)
    if err != nil {
        return nil, fmt.Errorf("connecting to database: %w", err)
    }
    defer pool.Close()

    // Create extractor with options
    opts := extractor.Options{
        ExcludeSchemas: []string{
            "_prisma",
            "graphql_public",
        },
    }
    ext := extractor.New(pool, opts)

    // Extract schema
    db, err := ext.Extract(ctx)
    if err != nil {
        return nil, fmt.Errorf("extracting schema: %w", err)
    }

    return db, nil
}

System Schema Exclusion

These schemas are excluded by default:
Category      Schemas
PostgreSQL    pg_catalog, information_schema, pg_toast
TimescaleDB   _timescaledb_cache, _timescaledb_catalog, _timescaledb_config, _timescaledb_internal, timescaledb_information, timescaledb_internal
Hasura        hdb_catalog
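One way to picture how the defaults combine with Options.ExcludeSchemas is a merged lookup set. This is a sketch under assumptions: `systemSchemas`, `excludedSet`, and `isExcluded` are hypothetical helpers, and the real package may filter inside its SQL queries instead.

```go
package main

import "fmt"

// systemSchemas copies the default exclusions listed above.
var systemSchemas = []string{
	"pg_catalog", "information_schema", "pg_toast",
	"_timescaledb_cache", "_timescaledb_catalog", "_timescaledb_config",
	"_timescaledb_internal", "timescaledb_information", "timescaledb_internal",
	"hdb_catalog",
}

// excludedSet merges the defaults with user-supplied schemas
// (the ExcludeSchemas option) into a set for constant-time lookups.
func excludedSet(extra []string) map[string]struct{} {
	set := make(map[string]struct{}, len(systemSchemas)+len(extra))
	for _, s := range systemSchemas {
		set[s] = struct{}{}
	}
	for _, s := range extra {
		set[s] = struct{}{}
	}
	return set
}

// isExcluded reports whether a schema should be skipped during extraction.
func isExcluded(set map[string]struct{}, schema string) bool {
	_, ok := set[schema]
	return ok
}

func main() {
	set := excludedSet([]string{"_prisma", "graphql_public"})
	for _, s := range []string{"public", "pg_catalog", "_prisma"} {
		fmt.Printf("%s excluded=%t\n", s, isExcluded(set, s))
	}
}
```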

Internal Extraction Methods

Tables

func (e *Extractor) extractTables(ctx context.Context) ([]schema.Table, error)
Queries:
  • pg_class - Table metadata
  • pg_attribute - Column definitions
  • pg_constraint - Constraints
  • pg_attrdef - Default values
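A query that joins pg_class, pg_attribute, and pg_attrdef naturally yields one row per column, so the rows have to be folded back into tables. The sketch below shows one possible way to do that grouping. The types `columnRow`, `column`, and `table` are hypothetical stand-ins, not the real schema.Table, and the rows are assumed to arrive sorted by schema, table, and column position.

```go
package main

import "fmt"

// columnRow models one row of a hypothetical per-column catalog query.
type columnRow struct {
	Schema, Table, Column, Type string
	NotNull                     bool
	Default                     string // empty when there is no pg_attrdef entry
}

type column struct {
	Name, Type string
	NotNull    bool
	Default    string
}

type table struct {
	Schema, Name string
	Columns      []column
}

// groupColumns folds sorted rows into tables, preserving column order.
func groupColumns(rows []columnRow) []table {
	var tables []table
	for _, r := range rows {
		n := len(tables)
		// Start a new table whenever the (schema, table) pair changes.
		if n == 0 || tables[n-1].Schema != r.Schema || tables[n-1].Name != r.Table {
			tables = append(tables, table{Schema: r.Schema, Name: r.Table})
			n++
		}
		t := &tables[n-1]
		t.Columns = append(t.Columns, column{
			Name: r.Column, Type: r.Type, NotNull: r.NotNull, Default: r.Default,
		})
	}
	return tables
}

// sampleRows is illustrative data, not output from a real database.
func sampleRows() []columnRow {
	return []columnRow{
		{Schema: "public", Table: "orders", Column: "id", Type: "bigint", NotNull: true},
		{Schema: "public", Table: "users", Column: "id", Type: "bigint", NotNull: true},
		{Schema: "public", Table: "users", Column: "email", Type: "text", Default: "''::text"},
	}
}

func main() {
	for _, t := range groupColumns(sampleRows()) {
		fmt.Printf("%s.%s: %d column(s)\n", t.Schema, t.Name, len(t.Columns))
	}
}
```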

Indexes

func (e *Extractor) extractIndexes(ctx context.Context, tables []schema.Table) error
Queries:
  • pg_index - Index metadata
  • pg_class - Index properties
  • pg_am - Access method (btree, hash, etc.)
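Because extractIndexes receives the tables slice and returns only an error, it presumably attaches each index to its owning table in place. The following sketch illustrates that pattern with hypothetical types (`index`, `table`, `tableKey`) and a hypothetical helper `attachIndexes`; the real schema.Table fields may differ.

```go
package main

import "fmt"

type index struct {
	Schema, Table, Name string
	Method              string // access method from pg_am, e.g. "btree"
	Unique              bool
}

type table struct {
	Schema, Name string
	Indexes      []index
}

// tableKey avoids ambiguity that string concatenation could introduce
// when a schema or table name itself contains a dot.
type tableKey struct{ schema, name string }

// attachIndexes appends each index to its owning table, mutating the
// elements of the tables slice in place.
func attachIndexes(tables []table, indexes []index) error {
	byKey := make(map[tableKey]*table, len(tables))
	for i := range tables {
		t := &tables[i] // pointer into the slice so updates are visible to the caller
		byKey[tableKey{t.Schema, t.Name}] = t
	}
	for _, ix := range indexes {
		t, ok := byKey[tableKey{ix.Schema, ix.Table}]
		if !ok {
			return fmt.Errorf("index %s: unknown table %s.%s", ix.Name, ix.Schema, ix.Table)
		}
		t.Indexes = append(t.Indexes, ix)
	}
	return nil
}

func main() {
	tables := []table{{Schema: "public", Name: "users"}}
	err := attachIndexes(tables, []index{
		{Schema: "public", Table: "users", Name: "users_pkey", Method: "btree", Unique: true},
	})
	fmt.Println(len(tables[0].Indexes), err)
}
```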

Functions

func (e *Extractor) extractFunctions(ctx context.Context) ([]schema.Function, error)
Queries:
  • pg_proc - Function definitions
  • pg_type - Argument and return types
  • pg_language - Implementation language

TimescaleDB

func (e *Extractor) extractHypertables(ctx context.Context) ([]schema.Hypertable, error)
func (e *Extractor) extractContinuousAggregates(ctx context.Context) ([]schema.ContinuousAggregate, error)
Queries TimescaleDB-specific catalog views:
  • timescaledb_information.hypertables
  • timescaledb_information.dimensions
  • timescaledb_information.compression_settings
  • timescaledb_information.continuous_aggregates

Error Handling

Extraction errors are wrapped with context:
// Common error patterns
if err != nil {
    return nil, fmt.Errorf("extracting tables: %w", err)
}

if err != nil {
    return nil, fmt.Errorf("table %s: extracting columns: %w", tableName, err)
}
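Wrapping with %w keeps the original error reachable, so callers can still test for a specific cause with errors.Is even after several layers of context have been added. Here is a self-contained sketch of the two patterns above chained together; `errPermission`, `extractColumns`, `extractTables`, and `isPermissionError` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errPermission is a hypothetical low-level failure.
var errPermission = errors.New("permission denied")

// extractColumns simulates a per-table failure and adds table context.
func extractColumns(tableName string) error {
	return fmt.Errorf("table %s: extracting columns: %w", tableName, errPermission)
}

// extractTables adds a second layer of context on top.
func extractTables() error {
	if err := extractColumns("users"); err != nil {
		return fmt.Errorf("extracting tables: %w", err)
	}
	return nil
}

// isPermissionError looks through every wrapping layer for the root cause.
func isPermissionError(err error) bool {
	return errors.Is(err, errPermission)
}

func main() {
	err := extractTables()
	fmt.Println(err)
	// prints: extracting tables: table users: extracting columns: permission denied
	fmt.Println(isPermissionError(err))
	// prints: true
}
```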

Performance Considerations

  • Single database connection used throughout extraction
  • Queries are optimized to minimize round trips
  • Large databases (1000+ tables) may take 30-60 seconds
  • Context timeout should be set appropriately (default: 5 minutes)
// Set custom timeout
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
defer cancel()

db, err := ext.Extract(ctx)
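When the deadline passes, the returned error should wrap context.DeadlineExceeded, provided the extractor propagates the context error with %w as in the patterns above (an assumption, not something this page confirms). The sketch below uses a stand-in function, `slowExtract`, in place of ext.Extract to show how a caller could tell a timeout apart from other failures.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowExtract stands in for ext.Extract: it finishes after `work`
// unless the context ends first.
func slowExtract(ctx context.Context, work time.Duration) error {
	select {
	case <-time.After(work):
		return nil
	case <-ctx.Done():
		return fmt.Errorf("extracting schema: %w", ctx.Err())
	}
}

// timedOutMs reports whether extraction taking workMs milliseconds
// hits a timeout of limitMs milliseconds.
func timedOutMs(workMs, limitMs int) bool {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(limitMs)*time.Millisecond)
	defer cancel()
	err := slowExtract(ctx, time.Duration(workMs)*time.Millisecond)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("slow extraction timed out:", timedOutMs(200, 20))
	fmt.Println("fast extraction timed out:", timedOutMs(5, 500))
}
```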
