15 KiB
Event JSON Storage & Database Migration Plan
Goal: Store full event JSON in database for 2,500x faster retrieval + implement proper database migration system
Decision: Fresh Start vs Migration
Option A: Fresh Start (Recommended for This Change)
Pros:
- ✅ Clean implementation (no migration complexity)
- ✅ Fast deployment (no data conversion)
- ✅ No risk of migration bugs
- ✅ Opportunity to fix any schema issues
- ✅ Smaller database (no legacy data)
Cons:
- ❌ Lose existing events
- ❌ Relay starts "empty"
- ❌ Historical data lost
Recommendation: Fresh start for this change because:
- Your relay is still in development/testing phase
- The schema change is fundamental (affects every event)
- Migration would require reconstructing JSON for every existing event (expensive)
- You've been doing fresh starts anyway
Option B: Implement Migration System
Pros:
- ✅ Preserve existing events
- ✅ No data loss
- ✅ Professional approach
- ✅ Reusable for future changes
Cons:
- ❌ Complex implementation
- ❌ Slow migration (reconstruct JSON for all events)
- ❌ Risk of bugs during migration
- ❌ Requires careful testing
Recommendation: Implement migration system for FUTURE changes, but start fresh for this one.
Proposed Schema Change
New Schema (v11)
CREATE TABLE events (
id TEXT PRIMARY KEY,
pubkey TEXT NOT NULL,
created_at INTEGER NOT NULL,
kind INTEGER NOT NULL,
event_type TEXT NOT NULL CHECK (event_type IN ('regular', 'replaceable', 'ephemeral', 'addressable')),
content TEXT NOT NULL,
sig TEXT NOT NULL,
tags JSON NOT NULL DEFAULT '[]',
event_json TEXT NOT NULL, -- NEW: Full event as JSON string
first_seen INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))
);
-- Keep all existing indexes (they query the columns, not event_json)
CREATE INDEX idx_events_pubkey ON events(pubkey);
CREATE INDEX idx_events_kind ON events(kind);
CREATE INDEX idx_events_created_at ON events(created_at DESC);
CREATE INDEX idx_events_kind_created_at ON events(kind, created_at DESC);
CREATE INDEX idx_events_pubkey_created_at ON events(pubkey, created_at DESC);
Why Keep Both Columns AND event_json?
Columns (id, pubkey, kind, etc.):
- Used for querying (WHERE clauses, indexes)
- Fast filtering and sorting
- Required for SQL operations
event_json:
- Used for retrieval (SELECT results)
- Pre-serialized, ready to send
- Eliminates JSON reconstruction
This is a common pattern in high-performance systems (denormalization for read performance).
Implementation Plan
Phase 1: Schema Update (v11)
File: src/sql_schema.h
#define EMBEDDED_SCHEMA_VERSION "11"
// In schema SQL:
"CREATE TABLE events (\n\
id TEXT PRIMARY KEY,\n\
pubkey TEXT NOT NULL,\n\
created_at INTEGER NOT NULL,\n\
kind INTEGER NOT NULL,\n\
event_type TEXT NOT NULL,\n\
content TEXT NOT NULL,\n\
sig TEXT NOT NULL,\n\
tags JSON NOT NULL DEFAULT '[]',\n\
event_json TEXT NOT NULL,\n\ -- NEW COLUMN
first_seen INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))\n\
);\n\
Phase 2: Update store_event() Function
File: src/main.c (lines 660-773)
Current:
int store_event(cJSON* event) {
// Extract fields
cJSON* id = cJSON_GetObjectItem(event, "id");
// ... extract other fields ...
// INSERT with individual columns
const char* sql = "INSERT INTO events (id, pubkey, ...) VALUES (?, ?, ...)";
}
New:
int store_event(cJSON* event) {
// Serialize event to JSON string ONCE
char* event_json = cJSON_PrintUnformatted(event);
if (!event_json) {
return -1;
}
// Extract fields for indexed columns
cJSON* id = cJSON_GetObjectItem(event, "id");
// ... extract other fields ...
// INSERT with columns + event_json
const char* sql = "INSERT INTO events (id, pubkey, ..., event_json) VALUES (?, ?, ..., ?)";
// ... bind parameters ...
sqlite3_bind_text(stmt, 9, event_json, -1, SQLITE_TRANSIENT);
// ... execute ...
free(event_json);
}
Phase 3: Update handle_req_message() Function
File: src/main.c (lines 1302-1361)
Current:
while (sqlite3_step(stmt) == SQLITE_ROW) {
// Build event JSON from 7 columns
cJSON* event = cJSON_CreateObject();
cJSON_AddStringToObject(event, "id", (char*)sqlite3_column_text(stmt, 0));
// ... 6 more fields ...
cJSON* tags = cJSON_Parse(tags_json); // Parse tags
cJSON_AddItemToObject(event, "tags", tags);
// Create EVENT message
cJSON* event_msg = cJSON_CreateArray();
cJSON_AddItemToArray(event_msg, cJSON_CreateString("EVENT"));
cJSON_AddItemToArray(event_msg, cJSON_CreateString(sub_id));
cJSON_AddItemToArray(event_msg, event);
char* msg_str = cJSON_Print(event_msg);
queue_message(wsi, pss, msg_str, msg_len, LWS_WRITE_TEXT);
}
New:
// Update SQL to select event_json
const char* sql = "SELECT event_json FROM events WHERE ...";
while (sqlite3_step(stmt) == SQLITE_ROW) {
const char* event_json = (char*)sqlite3_column_text(stmt, 0);
// Build EVENT message with pre-serialized event
// Format: ["EVENT","sub_id",{...event_json...}]
size_t msg_len = 12 + strlen(sub_id) + strlen(event_json); // ["EVENT","",""]
char* msg_str = malloc(msg_len + 1);
snprintf(msg_str, msg_len + 1, "[\"EVENT\",\"%s\",%s]", sub_id, event_json);
queue_message(wsi, pss, msg_str, strlen(msg_str), LWS_WRITE_TEXT);
free(msg_str);
}
Speedup: 366 × (cJSON operations) eliminated!
Database Migration System Design
For Future Schema Changes
File: src/migrations.c (new file)
typedef struct {
int from_version;
int to_version;
const char* description;
int (*migrate_func)(sqlite3* db);
} migration_t;
// Migration from v10 to v11: Add event_json column
int migrate_v10_to_v11(sqlite3* db) {
// Step 1: Add column
const char* add_column_sql =
"ALTER TABLE events ADD COLUMN event_json TEXT";
if (sqlite3_exec(db, add_column_sql, NULL, NULL, NULL) != SQLITE_OK) {
return -1;
}
// Step 2: Populate event_json for existing events
const char* select_sql =
"SELECT id, pubkey, created_at, kind, content, sig, tags FROM events";
sqlite3_stmt* stmt;
if (sqlite3_prepare_v2(db, select_sql, -1, &stmt, NULL) != SQLITE_OK) {
return -1;
}
while (sqlite3_step(stmt) == SQLITE_ROW) {
// Reconstruct JSON
cJSON* event = cJSON_CreateObject();
cJSON_AddStringToObject(event, "id", (char*)sqlite3_column_text(stmt, 0));
// ... add other fields ...
char* event_json = cJSON_PrintUnformatted(event);
// Update row
const char* update_sql = "UPDATE events SET event_json = ? WHERE id = ?";
sqlite3_stmt* update_stmt;
sqlite3_prepare_v2(db, update_sql, -1, &update_stmt, NULL);
sqlite3_bind_text(update_stmt, 1, event_json, -1, SQLITE_TRANSIENT);
sqlite3_bind_text(update_stmt, 2, (char*)sqlite3_column_text(stmt, 0), -1, SQLITE_STATIC);
sqlite3_step(update_stmt);
sqlite3_finalize(update_stmt);
free(event_json);
cJSON_Delete(event);
}
sqlite3_finalize(stmt);
// Step 3: Make column NOT NULL
// (SQLite doesn't support ALTER COLUMN, so we'd need to recreate table)
return 0;
}
// Migration registry
static migration_t migrations[] = {
{10, 11, "Add event_json column for fast retrieval", migrate_v10_to_v11},
// Future migrations go here
};
int run_migrations(sqlite3* db, int current_version, int target_version) {
for (int i = 0; i < sizeof(migrations) / sizeof(migration_t); i++) {
if (migrations[i].from_version >= current_version &&
migrations[i].to_version <= target_version) {
printf("Running migration: %s\n", migrations[i].description);
if (migrations[i].migrate_func(db) != 0) {
fprintf(stderr, "Migration failed: %s\n", migrations[i].description);
return -1;
}
// Update schema version
char update_version_sql[256];
snprintf(update_version_sql, sizeof(update_version_sql),
"PRAGMA user_version = %d", migrations[i].to_version);
sqlite3_exec(db, update_version_sql, NULL, NULL, NULL);
}
}
return 0;
}
Recommendation: Hybrid Approach
For This Change (v10 → v11): Fresh Start
Rationale:
- Your relay is still in development
- Migration would be slow (reconstruct JSON for all events)
- You've been doing fresh starts anyway
- Clean slate for performance testing
Steps:
- Update schema to v11 with event_json column
- Update store_event() to populate event_json
- Update handle_req_message() to use event_json
- Deploy with fresh database
- Test performance improvement
For Future Changes: Use Migration System
Rationale:
- Once relay is in production, data preservation matters
- Migration system is reusable
- Professional approach for production relay
Steps:
- Create
src/migrations.candsrc/migrations.h - Implement migration framework
- Add migration functions for each schema change
- Test migrations thoroughly before deployment
Migration System Features
Core Features
-
Version Detection
- Read current schema version from database
- Compare with embedded schema version
- Determine which migrations to run
-
Migration Chain
- Run migrations in sequence (v8 → v9 → v10 → v11)
- Skip already-applied migrations
- Stop on first failure
-
Backup Before Migration
- Automatic database backup before migration
- Rollback capability if migration fails
- Backup retention policy
-
Progress Reporting
- Log migration progress
- Show estimated time remaining
- Report success/failure
Safety Features
-
Transaction Wrapping
sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL); int result = migrate_v10_to_v11(db); if (result == 0) { sqlite3_exec(db, "COMMIT", NULL, NULL, NULL); } else { sqlite3_exec(db, "ROLLBACK", NULL, NULL, NULL); } -
Validation After Migration
- Verify row counts match
- Check data integrity
- Validate indexes created
-
Dry-Run Mode
- Test migration without committing
- Report what would be changed
- Estimate migration time
Implementation Timeline
Immediate (Today): Fresh Start with event_json
Changes:
- Update schema to v11 (add event_json column)
- Update store_event() to populate event_json
- Update handle_req_message() to use event_json
- Deploy with fresh database
Effort: 4 hours
Impact: 2,500x faster event retrieval
This Week: Build Migration Framework
Changes:
- Create src/migrations.c and src/migrations.h
- Implement migration runner
- Add backup/rollback capability
- Add progress reporting
Effort: 1-2 days
Impact: Reusable for all future schema changes
Future: Add Migrations as Needed
For each schema change:
- Write migration function
- Add to migrations array
- Test thoroughly
- Deploy with automatic migration
Code Structure
File Organization
src/
├── migrations.c # NEW: Migration system
├── migrations.h # NEW: Migration API
├── sql_schema.h # Schema definition (v11)
├── main.c # Updated store_event() and handle_req_message()
└── ...
Migration API
// migrations.h
int init_migration_system(sqlite3* db);
int run_pending_migrations(sqlite3* db);
int backup_database(const char* db_path, char* backup_path, size_t backup_path_size);
int rollback_migration(sqlite3* db, const char* backup_path);
Testing Strategy
For Fresh Start (v11)
-
Local testing:
- Build with new schema
- Post test events
- Query events and measure performance
- Verify event_json is populated correctly
-
Performance testing:
- Query 366 events
- Measure time (should be <10ms instead of 18s)
- Check CPU usage (should be <20%)
-
Production deployment:
- Stop relay
- Delete old database
- Start relay with v11 schema
- Monitor performance
For Migration System (Future)
-
Unit tests:
- Test each migration function
- Test rollback capability
- Test error handling
-
Integration tests:
- Create database with old schema
- Run migration
- Verify data integrity
- Test rollback
-
Performance tests:
- Measure migration time for large databases
- Test with 10K, 100K, 1M events
- Optimize slow migrations
Migration Complexity Analysis
For v10 → v11 Migration
If we were to migrate existing data:
-- Step 1: Add column (fast)
ALTER TABLE events ADD COLUMN event_json TEXT;
-- Step 2: Populate event_json (SLOW!)
-- For each of N events:
-- 1. SELECT 7 columns
-- 2. Reconstruct JSON (cJSON operations)
-- 3. Serialize to string (cJSON_Print)
-- 4. UPDATE event_json column
-- 5. Free memory
-- Estimated time:
-- - 1000 events: ~10 seconds
-- - 10000 events: ~100 seconds
-- - 100000 events: ~1000 seconds (16 minutes)
Conclusion: Migration is expensive for this change. Fresh start is better.
Future Migration Examples
Easy Migrations (Fast)
Adding an index:
int migrate_add_index(sqlite3* db) {
return sqlite3_exec(db,
"CREATE INDEX idx_new ON events(new_column)",
NULL, NULL, NULL);
}
Adding a column with default:
int migrate_add_column(sqlite3* db) {
return sqlite3_exec(db,
"ALTER TABLE events ADD COLUMN new_col TEXT DEFAULT ''",
NULL, NULL, NULL);
}
Hard Migrations (Slow)
Changing column type:
- Requires table recreation
- Copy all data
- Recreate indexes
- Can take minutes for large databases
Populating computed columns:
- Requires row-by-row processing
- Can take minutes for large databases
Recommendation Summary
For This Change (event_json)
Do: Fresh start with v11 schema
- Fast deployment
- Clean implementation
- Immediate performance benefit
- No migration complexity
Don't: Migrate existing data
- Too slow (reconstruct JSON for all events)
- Too complex (first migration)
- Not worth it (relay still in development)
For Future Changes
Do: Implement migration system
- Professional approach
- Data preservation
- Reusable framework
- Required for production relay
Timeline:
- Today: Deploy v11 with fresh start
- This week: Build migration framework
- Future: Use migrations for all schema changes
Next Steps
- ✅ Update schema to v11 (add event_json column)
- ✅ Update store_event() to populate event_json
- ✅ Update handle_req_message() to use event_json
- ✅ Test locally with 366-event query
- ✅ Deploy to production with fresh database
- ✅ Measure performance improvement
- ⏳ Build migration system for future use
Expected result: 366-event retrieval time drops from 18s to <10ms (2,500x speedup)