Files
c-relay/plans/event_json_storage_and_migration_plan.md

15 KiB
Raw Permalink Blame History

Event JSON Storage & Database Migration Plan

Goal: Store full event JSON in database for 2,500x faster retrieval + implement proper database migration system


Decision: Fresh Start vs Migration

Pros:

  • Clean implementation (no migration complexity)
  • Fast deployment (no data conversion)
  • No risk of migration bugs
  • Opportunity to fix any schema issues
  • Smaller database (no legacy data)

Cons:

  • Lose existing events
  • Relay starts "empty"
  • Historical data lost

Recommendation: Fresh start for this change because:

  1. Your relay is still in development/testing phase
  2. The schema change is fundamental (affects every event)
  3. Migration would require reconstructing JSON for every existing event (expensive)
  4. You've been doing fresh starts anyway

Option B: Implement Migration System

Pros:

  • Preserve existing events
  • No data loss
  • Professional approach
  • Reusable for future changes

Cons:

  • Complex implementation
  • Slow migration (reconstruct JSON for all events)
  • Risk of bugs during migration
  • Requires careful testing

Recommendation: Implement migration system for FUTURE changes, but start fresh for this one.


Proposed Schema Change

New Schema (v11)

CREATE TABLE events (
    id TEXT PRIMARY KEY,
    pubkey TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    kind INTEGER NOT NULL,
    event_type TEXT NOT NULL CHECK (event_type IN ('regular', 'replaceable', 'ephemeral', 'addressable')),
    content TEXT NOT NULL,
    sig TEXT NOT NULL,
    tags JSON NOT NULL DEFAULT '[]',
    event_json TEXT NOT NULL,  -- NEW: Full event as JSON string
    first_seen INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))
);

-- Keep all existing indexes (they query the columns, not event_json)
CREATE INDEX idx_events_pubkey ON events(pubkey);
CREATE INDEX idx_events_kind ON events(kind);
CREATE INDEX idx_events_created_at ON events(created_at DESC);
CREATE INDEX idx_events_kind_created_at ON events(kind, created_at DESC);
CREATE INDEX idx_events_pubkey_created_at ON events(pubkey, created_at DESC);

Why Keep Both Columns AND event_json?

Columns (id, pubkey, kind, etc.):

  • Used for querying (WHERE clauses, indexes)
  • Fast filtering and sorting
  • Required for SQL operations

event_json:

  • Used for retrieval (SELECT results)
  • Pre-serialized, ready to send
  • Eliminates JSON reconstruction

This is a common pattern in high-performance systems (denormalization for read performance).


Implementation Plan

Phase 1: Schema Update (v11)

File: src/sql_schema.h

#define EMBEDDED_SCHEMA_VERSION "11"

// In schema SQL:
"CREATE TABLE events (\n\
    id TEXT PRIMARY KEY,\n\
    pubkey TEXT NOT NULL,\n\
    created_at INTEGER NOT NULL,\n\
    kind INTEGER NOT NULL,\n\
    event_type TEXT NOT NULL,\n\
    content TEXT NOT NULL,\n\
    sig TEXT NOT NULL,\n\
    tags JSON NOT NULL DEFAULT '[]',\n\
    event_json TEXT NOT NULL,\n\  -- NEW COLUMN
    first_seen INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))\n\
);\n\

Phase 2: Update store_event() Function

File: src/main.c (lines 660-773)

Current:

int store_event(cJSON* event) {
    // Extract fields
    cJSON* id = cJSON_GetObjectItem(event, "id");
    // ... extract other fields ...
    
    // INSERT with individual columns
    const char* sql = "INSERT INTO events (id, pubkey, ...) VALUES (?, ?, ...)";
}

New:

int store_event(cJSON* event) {
    // Serialize event to JSON string ONCE
    char* event_json = cJSON_PrintUnformatted(event);
    if (!event_json) {
        return -1;
    }
    
    // Extract fields for indexed columns
    cJSON* id = cJSON_GetObjectItem(event, "id");
    // ... extract other fields ...
    
    // INSERT with columns + event_json
    const char* sql = "INSERT INTO events (id, pubkey, ..., event_json) VALUES (?, ?, ..., ?)";
    
    // ... bind parameters ...
    sqlite3_bind_text(stmt, 9, event_json, -1, SQLITE_TRANSIENT);
    
    // ... execute ...
    free(event_json);
}

Phase 3: Update handle_req_message() Function

File: src/main.c (lines 1302-1361)

Current:

while (sqlite3_step(stmt) == SQLITE_ROW) {
    // Build event JSON from 7 columns
    cJSON* event = cJSON_CreateObject();
    cJSON_AddStringToObject(event, "id", (char*)sqlite3_column_text(stmt, 0));
    // ... 6 more fields ...
    cJSON* tags = cJSON_Parse(tags_json);  // Parse tags
    cJSON_AddItemToObject(event, "tags", tags);
    
    // Create EVENT message
    cJSON* event_msg = cJSON_CreateArray();
    cJSON_AddItemToArray(event_msg, cJSON_CreateString("EVENT"));
    cJSON_AddItemToArray(event_msg, cJSON_CreateString(sub_id));
    cJSON_AddItemToArray(event_msg, event);
    
    char* msg_str = cJSON_Print(event_msg);
    queue_message(wsi, pss, msg_str, msg_len, LWS_WRITE_TEXT);
}

New:

// Update SQL to select event_json
const char* sql = "SELECT event_json FROM events WHERE ...";

while (sqlite3_step(stmt) == SQLITE_ROW) {
    const char* event_json = (char*)sqlite3_column_text(stmt, 0);
    
    // Build EVENT message with pre-serialized event
    // Format: ["EVENT","sub_id",{...event_json...}]
    size_t msg_len = 12 + strlen(sub_id) + strlen(event_json);  // ["EVENT","",""]
    char* msg_str = malloc(msg_len + 1);
    snprintf(msg_str, msg_len + 1, "[\"EVENT\",\"%s\",%s]", sub_id, event_json);
    
    queue_message(wsi, pss, msg_str, strlen(msg_str), LWS_WRITE_TEXT);
    free(msg_str);
}

Speedup: 366 × (cJSON operations) eliminated!


Database Migration System Design

For Future Schema Changes

File: src/migrations.c (new file)

typedef struct {
    int from_version;
    int to_version;
    const char* description;
    int (*migrate_func)(sqlite3* db);
} migration_t;

// Migration from v10 to v11: Add event_json column
int migrate_v10_to_v11(sqlite3* db) {
    // Step 1: Add column
    const char* add_column_sql = 
        "ALTER TABLE events ADD COLUMN event_json TEXT";
    
    if (sqlite3_exec(db, add_column_sql, NULL, NULL, NULL) != SQLITE_OK) {
        return -1;
    }
    
    // Step 2: Populate event_json for existing events
    const char* select_sql = 
        "SELECT id, pubkey, created_at, kind, content, sig, tags FROM events";
    
    sqlite3_stmt* stmt;
    if (sqlite3_prepare_v2(db, select_sql, -1, &stmt, NULL) != SQLITE_OK) {
        return -1;
    }
    
    while (sqlite3_step(stmt) == SQLITE_ROW) {
        // Reconstruct JSON
        cJSON* event = cJSON_CreateObject();
        cJSON_AddStringToObject(event, "id", (char*)sqlite3_column_text(stmt, 0));
        // ... add other fields ...
        
        char* event_json = cJSON_PrintUnformatted(event);
        
        // Update row
        const char* update_sql = "UPDATE events SET event_json = ? WHERE id = ?";
        sqlite3_stmt* update_stmt;
        sqlite3_prepare_v2(db, update_sql, -1, &update_stmt, NULL);
        sqlite3_bind_text(update_stmt, 1, event_json, -1, SQLITE_TRANSIENT);
        sqlite3_bind_text(update_stmt, 2, (char*)sqlite3_column_text(stmt, 0), -1, SQLITE_STATIC);
        sqlite3_step(update_stmt);
        sqlite3_finalize(update_stmt);
        
        free(event_json);
        cJSON_Delete(event);
    }
    
    sqlite3_finalize(stmt);
    
    // Step 3: Make column NOT NULL
    // (SQLite doesn't support ALTER COLUMN, so we'd need to recreate table)
    
    return 0;
}

// Migration registry
static migration_t migrations[] = {
    {10, 11, "Add event_json column for fast retrieval", migrate_v10_to_v11},
    // Future migrations go here
};

int run_migrations(sqlite3* db, int current_version, int target_version) {
    for (int i = 0; i < sizeof(migrations) / sizeof(migration_t); i++) {
        if (migrations[i].from_version >= current_version && 
            migrations[i].to_version <= target_version) {
            
            printf("Running migration: %s\n", migrations[i].description);
            
            if (migrations[i].migrate_func(db) != 0) {
                fprintf(stderr, "Migration failed: %s\n", migrations[i].description);
                return -1;
            }
            
            // Update schema version
            char update_version_sql[256];
            snprintf(update_version_sql, sizeof(update_version_sql),
                     "PRAGMA user_version = %d", migrations[i].to_version);
            sqlite3_exec(db, update_version_sql, NULL, NULL, NULL);
        }
    }
    return 0;
}

Recommendation: Hybrid Approach

For This Change (v10 → v11): Fresh Start

Rationale:

  1. Your relay is still in development
  2. Migration would be slow (reconstruct JSON for all events)
  3. You've been doing fresh starts anyway
  4. Clean slate for performance testing

Steps:

  1. Update schema to v11 with event_json column
  2. Update store_event() to populate event_json
  3. Update handle_req_message() to use event_json
  4. Deploy with fresh database
  5. Test performance improvement

For Future Changes: Use Migration System

Rationale:

  1. Once relay is in production, data preservation matters
  2. Migration system is reusable
  3. Professional approach for production relay

Steps:

  1. Create src/migrations.c and src/migrations.h
  2. Implement migration framework
  3. Add migration functions for each schema change
  4. Test migrations thoroughly before deployment

Migration System Features

Core Features

  1. Version Detection

    • Read current schema version from database
    • Compare with embedded schema version
    • Determine which migrations to run
  2. Migration Chain

    • Run migrations in sequence (v8 → v9 → v10 → v11)
    • Skip already-applied migrations
    • Stop on first failure
  3. Backup Before Migration

    • Automatic database backup before migration
    • Rollback capability if migration fails
    • Backup retention policy
  4. Progress Reporting

    • Log migration progress
    • Show estimated time remaining
    • Report success/failure

Safety Features

  1. Transaction Wrapping

    sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);
    int result = migrate_v10_to_v11(db);
    if (result == 0) {
        sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
    } else {
        sqlite3_exec(db, "ROLLBACK", NULL, NULL, NULL);
    }
    
  2. Validation After Migration

    • Verify row counts match
    • Check data integrity
    • Validate indexes created
  3. Dry-Run Mode

    • Test migration without committing
    • Report what would be changed
    • Estimate migration time

Implementation Timeline

Immediate (Today): Fresh Start with event_json

Changes:

  1. Update schema to v11 (add event_json column)
  2. Update store_event() to populate event_json
  3. Update handle_req_message() to use event_json
  4. Deploy with fresh database

Effort: 4 hours
Impact: 2,500x faster event retrieval

This Week: Build Migration Framework

Changes:

  1. Create src/migrations.c and src/migrations.h
  2. Implement migration runner
  3. Add backup/rollback capability
  4. Add progress reporting

Effort: 1-2 days
Impact: Reusable for all future schema changes

Future: Add Migrations as Needed

For each schema change:

  1. Write migration function
  2. Add to migrations array
  3. Test thoroughly
  4. Deploy with automatic migration

Code Structure

File Organization

src/
├── migrations.c          # NEW: Migration system
├── migrations.h          # NEW: Migration API
├── sql_schema.h          # Schema definition (v11)
├── main.c                # Updated store_event() and handle_req_message()
└── ...

Migration API

// migrations.h
int init_migration_system(sqlite3* db);
int run_pending_migrations(sqlite3* db);
int backup_database(const char* db_path, char* backup_path, size_t backup_path_size);
int rollback_migration(sqlite3* db, const char* backup_path);

Testing Strategy

For Fresh Start (v11)

  1. Local testing:

    • Build with new schema
    • Post test events
    • Query events and measure performance
    • Verify event_json is populated correctly
  2. Performance testing:

    • Query 366 events
    • Measure time (should be <10ms instead of 18s)
    • Check CPU usage (should be <20%)
  3. Production deployment:

    • Stop relay
    • Delete old database
    • Start relay with v11 schema
    • Monitor performance

For Migration System (Future)

  1. Unit tests:

    • Test each migration function
    • Test rollback capability
    • Test error handling
  2. Integration tests:

    • Create database with old schema
    • Run migration
    • Verify data integrity
    • Test rollback
  3. Performance tests:

    • Measure migration time for large databases
    • Test with 10K, 100K, 1M events
    • Optimize slow migrations

Migration Complexity Analysis

For v10 → v11 Migration

If we were to migrate existing data:

-- Step 1: Add column (fast)
ALTER TABLE events ADD COLUMN event_json TEXT;

-- Step 2: Populate event_json (SLOW!)
-- For each of N events:
--   1. SELECT 7 columns
--   2. Reconstruct JSON (cJSON operations)
--   3. Serialize to string (cJSON_Print)
--   4. UPDATE event_json column
--   5. Free memory

-- Estimated time:
-- - 1000 events: ~10 seconds
-- - 10000 events: ~100 seconds
-- - 100000 events: ~1000 seconds (16 minutes)

Conclusion: Migration is expensive for this change. Fresh start is better.


Future Migration Examples

Easy Migrations (Fast)

Adding an index:

int migrate_add_index(sqlite3* db) {
    return sqlite3_exec(db, 
        "CREATE INDEX idx_new ON events(new_column)",
        NULL, NULL, NULL);
}

Adding a column with default:

int migrate_add_column(sqlite3* db) {
    return sqlite3_exec(db,
        "ALTER TABLE events ADD COLUMN new_col TEXT DEFAULT ''",
        NULL, NULL, NULL);
}

Hard Migrations (Slow)

Changing column type:

  • Requires table recreation
  • Copy all data
  • Recreate indexes
  • Can take minutes for large databases

Populating computed columns:

  • Requires row-by-row processing
  • Can take minutes for large databases

Recommendation Summary

For This Change (event_json)

Do: Fresh start with v11 schema

  • Fast deployment
  • Clean implementation
  • Immediate performance benefit
  • No migration complexity

Don't: Migrate existing data

  • Too slow (reconstruct JSON for all events)
  • Too complex (first migration)
  • Not worth it (relay still in development)

For Future Changes

Do: Implement migration system

  • Professional approach
  • Data preservation
  • Reusable framework
  • Required for production relay

Timeline:

  • Today: Deploy v11 with fresh start
  • This week: Build migration framework
  • Future: Use migrations for all schema changes

Next Steps

  1. Update schema to v11 (add event_json column)
  2. Update store_event() to populate event_json
  3. Update handle_req_message() to use event_json
  4. Test locally with 366-event query
  5. Deploy to production with fresh database
  6. Measure performance improvement
  7. Build migration system for future use

Expected result: 366-event retrieval time drops from 18s to <10ms (2,500x speedup)