
Building a Real-Time Claude AI Conversation Analytics Platform: Architecture Overview


How to capture, sync, and analyze AI coding assistant conversations with a modern TypeScript + Python data stack


Every day, I have dozens of conversations with Claude Code. Some are brilliant problem-solving sessions where we refactor complex systems together. Others are debugging marathons that finally end with that satisfying “aha!” moment. And a few are… well, let’s just say they teach me more about prompt engineering than I planned.

But here’s the thing: all those conversations disappear into the void. Or rather, they hide away in scattered JSONL files across my filesystem, never to be seen again.

I wanted to change that.

What if I could search through past conversations? What if I could see which tools Claude uses most often in my projects? What if I could understand my own AI-assisted development patterns?

This article introduces a platform I built to do exactly that. We’ll explore the architecture of a real-time sync system that captures Claude Code conversations and transforms them into searchable, analyzable data. By the end, you’ll understand how the pieces fit together and be ready to dive deeper into each component in the articles that follow.


The Problem: Invisible Conversations

Claude Code stores your conversation history as JSONL files in ~/.claude/projects/. Each project gets its own directory, and each session creates entries with timestamps, messages, and tool calls.

Here’s what a typical entry looks like:

{
  "type": "message",
  "sessionId": "abc123",
  "timestamp": "2024-12-15T10:30:00.000Z",
  "message": {
    "role": "assistant",
    "content": [
      { "type": "text", "text": "I'll help you refactor that function..." },
      { "type": "tool_use", "name": "Edit", "input": {...} }
    ]
  }
}

This format works great for Claude Code’s internal use, but it creates several challenges for developers who want to learn from their AI interactions:

No Search Capability: Want to find that conversation where Claude helped you set up a complex Docker configuration? Good luck grepping through nested JSON across hundreds of files.

No Analytics: How often does Claude use the Edit tool versus Write? Which projects generate the most conversation volume? These patterns remain hidden in raw log files.

No Historical Review: Debugging a production issue and need to remember how you solved something similar three months ago? You’ll be manually scanning through files trying to find it.

No Aggregation: Each project lives in isolation. You can’t query across your entire Claude Code usage history.

These limitations matter because understanding your AI-assisted workflows can make you more productive. Knowing which tools work best for certain tasks, seeing patterns in successful debugging sessions, or identifying where you spend the most time with AI assistance are all valuable insights for improving how you work.


The Solution: A Three-Component Architecture

To solve these problems, I designed a platform with three distinct components, each handling a specific responsibility:

+------------------+    +------------------+    +------------------+
|   Sync Service   |    |        UI        |    |    Analytics     |
|   (TypeScript)   |    |    (Next.js)     |    |     (Python)     |
+------------------+    +------------------+    +------------------+
|                  |    |                  |    |                  |
| - File watching  |    | - Browsing       |    | - ELT pipeline   |
| - SQLite buffer  |    | - Search         |    | - dbt models     |
| - MongoDB sync   |    | - Filtering      |    | - Dashboards     |
|                  |    |                  |    |                  |
+--------+---------+    +--------+---------+    +--------+---------+
         |                       |                       |
         v                       v                       v
+----------------------------------------------------------------+
|                             MongoDB                             |
+----------------------------------------------------------------+

Sync Service handles the real-time ingestion. It watches JSONL files for changes, buffers entries in SQLite for resilience, and batch-syncs to MongoDB. Written in TypeScript for strong typing and excellent async file handling.

UI provides a Next.js application for browsing and searching conversations. Built with shadcn/ui components, it offers filtering by project, session, and date range, plus visualization of conversation patterns over time.

Analytics runs a Python-based ELT pipeline. It extracts data from MongoDB to Parquet files, loads them into DuckDB, and uses dbt to transform raw data through Bronze, Silver, and Gold layers. Metabase provides self-service dashboards.

Why three separate components instead of one monolith? Each serves a different access pattern:

  • Sync needs to run continuously, handling file events as they happen
  • UI needs to serve interactive queries with low latency
  • Analytics needs to run batch transformations on large datasets

Separating them means each can scale, deploy, and fail independently.


Complete Data Flow

Here’s the complete architecture, from JSONL files to analytics dashboards:

~/.claude/projects/**/*.jsonl
        |
        v
   +---------+
   | Watcher |  (chokidar)
   +----+----+
        |
        v
   +---------+
   | SQLite  |  (buffer.db)
   | Buffer  |
   +----+----+
        |
        v
   +---------+
   | MongoDB |
   +----+----+
        |
   +----+--------+
   |             |
   v             v
+------+    +-----------+
|  UI  |    | Analytics |
|(3000)|    | Extractor |
+------+    +-----+-----+
                  |
                  v
            +---------+
            | DuckDB  |  <- dbt (Bronze -> Silver -> Gold)
            +----+----+
                 |
                 v
            +----------+
            | Metabase |
            |  (3001)  |
            +----------+

Let me walk you through what happens when you have a conversation with Claude Code:

1. Claude Code writes entries to a JSONL file in ~/.claude/projects/[project-hash]/conversations.jsonl.

2. Chokidar detects the file change within milliseconds. The Watcher component tracks file positions using SQLite, so it only reads new lines, not the entire file.

3. New entries go into the SQLite buffer. This is the critical resilience layer. If MongoDB is down, entries wait safely in SQLite. If the process restarts, file positions are preserved.

4. Every 5 seconds, the sync worker pulls pending entries from SQLite and batch-inserts them into MongoDB. Failed syncs leave entries in pending state for retry.

5. MongoDB stores the canonical data, indexed for efficient queries by project, session, and timestamp (a sketch of these indexes follows just after this walkthrough). Both the UI and Analytics pipeline read from here.

6. The UI queries MongoDB directly for interactive browsing. Users can search, filter by date range, and drill into specific sessions.

7. The Analytics Extractor periodically pulls data from MongoDB and writes Parquet files with date-based partitioning. A high-water mark ensures incremental extraction.

8. DuckDB loads the Parquet files, providing a fast analytical query engine. dbt models transform raw data through the medallion architecture.

9. Metabase connects to DuckDB for self-service analytics and dashboards.

This flow provides real-time sync (seconds of latency) for the UI while enabling batch analytics workloads that can run on larger time windows.
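Step 5 mentions that MongoDB is indexed for the query patterns the UI and extractor rely on. Here is a rough sketch of what creating those indexes might look like with the MongoDB Node.js driver; the database and collection names are assumptions, and the field names follow the JSONL sample and the sync code shown later.

// index-setup.ts - One-time index creation sketch. The "claude_analytics"
// database and "entries" collection names are assumptions for illustration.
import { MongoClient } from 'mongodb';

async function ensureIndexes(uri: string): Promise<void> {
  const client = await MongoClient.connect(uri);
  try {
    const entries = client.db('claude_analytics').collection('entries');
    // Serve the UI's common filters: by project, by session, and by time range
    await entries.createIndex({ projectId: 1, timestamp: -1 });
    await entries.createIndex({ sessionId: 1, timestamp: -1 });
    // Support the analytics extractor's incremental pulls by ingestion time
    await entries.createIndex({ ingestedAt: 1 });
  } finally {
    await client.close();
  }
}

The compound indexes mirror the UI's filter panel (project and session plus a date range), so interactive queries stay index-covered even as the collection grows.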


Component Deep Dive

Let’s look at each component at a high level; the subsequent articles in this series provide the implementation details.

Sync Service: The Real-Time Backbone

The sync service consists of three TypeScript classes with clear responsibilities:

// watcher.ts - Monitors JSONL files
export class Watcher {
  private watcher: FSWatcher | null = null;
  private buffer: Buffer;
  private processing = new Set<string>(); // Prevents concurrent file processing

  start(): void {
    this.watcher = chokidar.watch(`${this.watchDir}/**/*.jsonl`, {
      persistent: true,
      ignoreInitial: false,
      awaitWriteFinish: { stabilityThreshold: 300, pollInterval: 100 },
    });
    this.watcher.on('add', (filePath) => this.processFile(filePath));
    this.watcher.on('change', (filePath) => this.processFile(filePath));
  }
}

The Watcher uses chokidar to monitor all JSONL files recursively. The awaitWriteFinish option ensures we don’t read partial lines while Claude Code is still writing.
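The processFile method isn't shown above. A minimal sketch of the incremental read it performs might look like the following, assuming the buffer exposes position helpers keyed by file path; the helper names here are hypothetical.

// A hypothetical sketch of the incremental read: start at the stored byte
// offset, take only complete new lines, then persist the new offset.
// The method names on EntryBuffer are assumptions, not the project's API.
import { promises as fs } from 'node:fs';

interface EntryBuffer {
  getPosition(filePath: string): number | undefined;
  setPosition(filePath: string, offset: number): void;
  insertEntry(filePath: string, line: string): void;
}

async function readNewLines(filePath: string, buffer: EntryBuffer): Promise<void> {
  const start = buffer.getPosition(filePath) ?? 0;      // where we stopped last time
  const handle = await fs.open(filePath, 'r');
  try {
    const { size } = await handle.stat();
    if (size <= start) return;                          // nothing new (or file truncated)
    const chunk = Buffer.alloc(size - start);
    await handle.read(chunk, 0, chunk.length, start);
    const text = chunk.toString('utf8');
    const lastNewline = text.lastIndexOf('\n');          // skip a trailing partial line
    if (lastNewline === -1) return;
    for (const line of text.slice(0, lastNewline).split('\n')) {
      if (line.trim()) buffer.insertEntry(filePath, line);
    }
    buffer.setPosition(filePath, start + lastNewline + 1);
  } finally {
    await handle.close();
  }
}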

// buffer.ts - SQLite persistence layer
export class Buffer {
  private db: DatabaseType;
  private statements: {
    getPosition: Statement<[string]>;
    insertEntry: Statement<[string, string | null, string, string]>;
    getPending: Statement<[number]>;
    markSynced: Statement<[number]>;
    // ... more prepared statements
  };

  constructor(dbPath?: string) {
    this.db = new Database(dbPath);
    this.db.pragma('journal_mode = WAL'); // Better concurrent performance
    this.statements = this.prepareStatements();
  }
}

The Buffer uses SQLite with WAL mode for concurrent reads and writes. All queries are prepared at startup for performance. The buffer tracks two key things: file positions (where we left off reading each file) and pending entries (waiting to sync).
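The schema behind those two concerns isn't shown here; a plausible minimal version, created with better-sqlite3, might look like this. Table and column names beyond entry_json and project_id (which appear in the sync code below) are assumptions.

// schema.ts - A plausible minimal schema for the buffer (names are assumptions)
import Database from 'better-sqlite3';

export function initSchema(db: Database.Database): void {
  db.exec(`
    -- Where we stopped reading each JSONL file (byte offset)
    CREATE TABLE IF NOT EXISTS file_positions (
      file_path  TEXT PRIMARY KEY,
      position   INTEGER NOT NULL DEFAULT 0,
      updated_at TEXT NOT NULL DEFAULT (datetime('now'))
    );

    -- Raw entries waiting to be synced to MongoDB
    CREATE TABLE IF NOT EXISTS entries (
      id          INTEGER PRIMARY KEY AUTOINCREMENT,
      project_id  TEXT,
      session_id  TEXT,
      entry_json  TEXT NOT NULL,
      status      TEXT NOT NULL DEFAULT 'pending',  -- 'pending' | 'synced'
      created_at  TEXT NOT NULL DEFAULT (datetime('now'))
    );
    CREATE INDEX IF NOT EXISTS idx_entries_status ON entries (status, id);
  `);
}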

// sync.ts - MongoDB batch sync
export class MongoSync {
  async sync(): Promise<number> {
    const pending = this.buffer.getPendingEntries(this.batchSize);
    if (pending.length === 0) return 0;

    const docs = pending.map((row) => ({
      ...JSON.parse(row.entry_json),
      projectId: row.project_id,
      ingestedAt: new Date(),
    }));

    // Unordered insert allows partial success on duplicates
    await this.collection.insertMany(docs, { ordered: false });
    this.buffer.markAsSynced(pending.map((r) => r.id));
    return docs.length;
  }
}

The MongoSync class pulls pending entries in batches and inserts them into MongoDB. Using ordered: false means a duplicate key error won’t stop the entire batch, only the specific duplicate. This is essential for idempotent retries.
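The walkthrough above says the worker runs every 5 seconds; a sketch of how that loop might be wired up is below. The logging and error handling shown are assumptions built on the class above.

// A sketch of the periodic sync loop. The 5-second interval comes from the
// data-flow walkthrough; the logging shown here is an assumption.
const SYNC_INTERVAL_MS = 5_000;

function startSyncLoop(sync: MongoSync): NodeJS.Timeout {
  return setInterval(async () => {
    try {
      const count = await sync.sync();
      if (count > 0) console.log(`synced ${count} entries`);
    } catch (err) {
      // Entries stay in the pending state, so the next tick retries them
      console.error('sync failed, will retry on next tick', err);
    }
  }, SYNC_INTERVAL_MS);
}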

UI: Conversation Browser

The UI is a Next.js 14 application using shadcn/ui components. It connects directly to MongoDB for low-latency queries.

// page.tsx - Main conversation viewer
function ConversationViewer() {
  const { data, fetchNextPage, hasNextPage, isLoading } = useConversations({
    projectId,
    sessionId,
    search,
    startDate,
    endDate,
    sortOrder,
  });

  return (
    <>
      <FilterPanel onExport={handleExport} />
      <SessionChart
        projectId={projectId}
        onBarClick={handleChartBarClick}
      />
      <ConversationList
        conversations={conversations}
        hasMore={hasNextPage}
        onLoadMore={() => fetchNextPage()}
      />
    </>
  );
}

Key features include:

  • Infinite scroll with React Query for efficient pagination (a sketch of the hook follows below)
  • Project and session filtering to narrow down results
  • Date range selection with chart-based filtering
  • Full-text search across conversation content
  • Export functionality for offline analysis

The session chart provides a visual timeline of conversation activity, and clicking a bar filters the list to that time period.
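The useConversations hook used in the component above isn't shown; a rough sketch with React Query's useInfiniteQuery is below. The /api/conversations route and its { conversations, nextCursor } response shape are assumptions for illustration.

// useConversations.ts - A sketch of cursor-based infinite pagination with
// React Query. The API route and response shape are assumptions.
import { useInfiniteQuery } from '@tanstack/react-query';

interface ConversationFilters {
  projectId?: string;
  sessionId?: string;
  search?: string;
  startDate?: string;
  endDate?: string;
  sortOrder?: 'asc' | 'desc';
}

export function useConversations(filters: ConversationFilters) {
  return useInfiniteQuery({
    queryKey: ['conversations', filters],
    queryFn: async ({ pageParam }) => {
      const params = new URLSearchParams({ cursor: pageParam });
      for (const [key, value] of Object.entries(filters)) {
        if (value != null) params.set(key, String(value));
      }
      const res = await fetch(`/api/conversations?${params}`);
      if (!res.ok) throw new Error('Failed to load conversations');
      return res.json() as Promise<{ conversations: unknown[]; nextCursor: string | null }>;
    },
    initialPageParam: '',
    getNextPageParam: (lastPage) => lastPage.nextCursor ?? undefined,
  });
}

Keying the query on the full filter object means any change to project, session, search text, or date range resets pagination automatically.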

Analytics Pipeline: From Raw to Insights

The analytics component uses a modern Python data stack:

MongoDB -> Extractor -> Parquet Files -> DuckDB -> dbt -> Metabase

Extractor (Python with PyMongo):

class MongoExtractor:
    def extract(self, full_backfill: bool = False) -> list[Path]:
        # Track where we left off
        since = None if full_backfill else self.high_water_mark.get()
        for doc in self._fetch_documents(since=since):
            record = self.transformer.transform(doc, extracted_at)
            records_by_date[date_key].append(record)
        # Write date-partitioned Parquet files
        for date_key, records in records_by_date.items():
            self._write_partition(records, date_key, output_dir)

The extractor pulls from MongoDB, transforms nested message structures into flat records, and writes Parquet files partitioned by date. A high-water mark file tracks the last extraction timestamp for incremental runs.

Loader (DuckDB):

class DuckDBLoader:
    def load_from_parquet(self, parquet_path: Path) -> int:
        # Upsert with conflict handling
        self.conn.execute(f"""
            INSERT INTO raw.conversations
            SELECT * FROM read_parquet('{parquet_path}', hive_partitioning = true)
            ON CONFLICT (_id) DO UPDATE SET ...
        """)

DuckDB’s native Parquet reader with Hive partitioning support makes loading efficient. The upsert pattern handles re-extraction gracefully.

dbt Models (Medallion Architecture):

staging/                      <- Bronze: Clean source data
  stg_conversations.sql
  stg_messages.sql
  stg_tool_calls.sql
intermediate/                 <- Silver: Enriched entities
  int_messages_enriched.sql
  int_sessions_computed.sql
  int_tool_usage.sql
marts/                        <- Gold: Analytics-ready
  dim_projects.sql
  dim_sessions.sql
  fct_messages.sql
  fct_tool_calls.sql
  agg_daily_summary.sql

The staging layer cleans and types the raw data. Intermediate models enrich entities with computed fields. Marts provide fact and dimension tables optimized for BI tools.

Prefect Orchestration:

@flow(name="claude-analytics-pipeline")
def analytics_pipeline(
    full_backfill: bool = False,
    full_refresh: bool = False,
) -> dict:
    # Step 1: Extract from MongoDB
    extraction_stats = extract_task(full_backfill=full_backfill)
    # Step 2: Load into DuckDB
    load_stats = load_task(extraction_stats, full_refresh=full_refresh)
    # Step 3: Run dbt transformations
    transform_stats = transform_task(load_stats, full_refresh=full_refresh)
    return {"extraction": extraction_stats, "load": load_stats, "transform": transform_stats}

Prefect orchestrates the ELT pipeline with retries, logging, and scheduling. You can run ad-hoc backfills or schedule hourly incremental runs.


Key Design Decisions

Several design choices make this architecture robust:

SQLite Buffer with WAL Mode

Why buffer through SQLite instead of writing directly to MongoDB? Resilience. If MongoDB goes down, you don’t lose entries. If the sync process crashes, file positions are preserved. When everything comes back up, pending entries sync automatically.

WAL (Write-Ahead Logging) mode enables concurrent reads and writes without blocking. The watcher can insert new entries while the sync worker reads pending ones.

Processing Lock for File Events

private processing = new Set<string>();

private async processFile(filePath: string): Promise<void> {
  if (this.processing.has(filePath)) return;
  this.processing.add(filePath);
  try {
    // ... process file
  } finally {
    this.processing.delete(filePath);
  }
}

File watchers can fire multiple events rapidly. This Set prevents concurrent processing of the same file, avoiding race conditions where two handlers might read overlapping content.

Unordered MongoDB Inserts

await this.collection.insertMany(docs, { ordered: false });

With ordered: false, a duplicate key error on one document doesn’t fail the entire batch. This is crucial for idempotent retries. If a batch partially succeeds and then fails, re-running it will skip the duplicates and insert the rest.
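For a duplicate key error to occur at all, re-ingested entries need to collide on something unique. One way to get that property, shown here purely as an illustration rather than the project's actual approach, is to derive a deterministic _id from the raw JSONL line:

// A sketch of one way to get duplicate-key protection: a content-derived
// _id means re-inserting the same entry collides on MongoDB's built-in
// unique index on _id. (Illustration only, not necessarily this project's approach.)
import { createHash } from 'node:crypto';

function toDocument(entryJson: string, projectId: string) {
  return {
    _id: createHash('sha256').update(entryJson).digest('hex'),
    ...JSON.parse(entryJson),
    projectId,
    ingestedAt: new Date(),
  };
}

With a deterministic key like this, replaying a partially synced batch only trips duplicate errors for the documents that already landed, and the rest insert cleanly.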

High Water Mark for Incremental Extraction

class HighWaterMark:
    def get(self) -> datetime | None:
        # Read last extraction timestamp from file (None on the first run)
        if not self.path.exists():
            return None
        return datetime.fromisoformat(self.path.read_text().strip())
    def set(self, timestamp: datetime) -> None:
        # Update after successful extraction
        self.path.write_text(timestamp.isoformat())

Instead of re-extracting everything, the analytics pipeline tracks the last successfully extracted timestamp. Each run only pulls documents newer than the high water mark, making hourly incremental runs efficient.

Medallion Architecture for dbt

Organizing dbt models into Bronze (staging), Silver (intermediate), and Gold (marts) layers provides several benefits:

  • Testability: Each layer has specific data quality expectations
  • Debuggability: You can query intermediate results when something breaks
  • Reusability: Silver layer entities feed multiple Gold layer aggregations
  • Performance: Materializing intermediate results speeds up downstream models

Deployment Strategy

The platform uses different deployment approaches for each component:

Sync Service: PM2

# Install PM2 globally
npm install -g pm2
# Start the sync service
pm2 start ecosystem.config.js
# Useful commands
pm2 status
pm2 logs claude-mongo-sync
pm2 restart claude-mongo-sync
# Auto-start on boot
pm2 startup
pm2 save

PM2 provides process management, automatic restarts on failure, and log aggregation. For production Linux servers, a systemd service file is also available.
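The ecosystem.config.js referenced above isn't shown; a minimal sketch might look like the following. The script path and environment values are assumptions, and only the app name matches the pm2 commands above.

// ecosystem.config.js - A minimal sketch; the script path and env values
// are assumptions, the app name matches the pm2 commands above.
module.exports = {
  apps: [
    {
      name: 'claude-mongo-sync',
      script: 'dist/index.js',        // compiled TypeScript entry point (assumed)
      autorestart: true,              // restart on crash
      max_memory_restart: '200M',     // recycle the process if memory climbs
      env: {
        NODE_ENV: 'production',
        MONGODB_URI: 'mongodb://localhost:27017/claude_analytics', // assumed
      },
    },
  ],
};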

Analytics: Docker Compose

cd analytics
make up             # Start Prefect + Metabase
make deploy         # Deploy pipeline flows
make run-backfill   # Initial full extraction

Docker Compose orchestrates the analytics services, including the Prefect server, worker, and Metabase. Volumes persist DuckDB data and Metabase configuration.

Port Assignments

Service        Port   Purpose
Sync Health    9090   Health check endpoint
UI             3000   Next.js application
Metabase       3001   Analytics dashboards
Prefect UI     4200   Pipeline orchestration
dbt Docs       8080   Data model documentation

The health endpoint at localhost:9090/health returns sync status:

{
  "status": "ok",
  "pending": 0,
  "synced": 1523,
  "lastSyncAt": "2024-12-15T10:30:00.000Z",
  "mongoConnected": true,
  "uptime": 3600
}
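The article doesn't show how this endpoint is implemented; a sketch using Node's built-in http module might look like this, with the counter sources stubbed as assumptions (the real service presumably reads them from the buffer and sync classes).

// health.ts - A sketch of the health endpoint using Node's http module.
// How the counters are obtained is an assumption for illustration.
import { createServer } from 'node:http';

export function startHealthServer(
  buffer: { countPending(): number; countSynced(): number },
  sync: { lastSyncAt: Date | null; isConnected(): boolean },
  port = 9090,
) {
  const startedAt = Date.now();
  return createServer((req, res) => {
    if (req.url === '/health') {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({
        status: 'ok',
        pending: buffer.countPending(),
        synced: buffer.countSynced(),
        lastSyncAt: sync.lastSyncAt?.toISOString() ?? null,
        mongoConnected: sync.isConnected(),
        uptime: Math.floor((Date.now() - startedAt) / 1000),
      }));
    } else {
      res.writeHead(404).end();
    }
  }).listen(port);
}

A process manager like PM2 (or an uptime monitor) can probe this endpoint to confirm the sync loop is alive and MongoDB is reachable.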

What’s Next: The Series Roadmap

This article provided the architecture overview. The next six articles will dive deep into each component:

Article 2: Real-Time File Watching with Chokidar and SQLite Buffering We’ll implement the Watcher and Buffer classes from scratch, exploring file system events, SQLite prepared statements, and graceful error handling.

Article 3: Resilient MongoDB Sync with TypeScript Building the MongoSync class with connection management, batch processing, duplicate handling, and the health endpoint.

Article 4: Building a Conversation Browser with Next.js and shadcn/ui Creating the UI with infinite scroll, filtering, search, and interactive charts using React Query and Recharts.

Article 5: ELT Pipeline Design: MongoDB to DuckDB with Python Implementing the Extractor and Loader with PyMongo, PyArrow, and DuckDB. Covering high water mark tracking and Parquet partitioning.

Article 6: Medallion Architecture with dbt: From Raw to Analytics Designing dbt models across Bronze, Silver, and Gold layers. Writing data quality tests and generating documentation.

Article 7: Self-Service Analytics with Metabase Dashboards Connecting Metabase to DuckDB, building dashboards for conversation analytics, and deploying with Docker Compose.


Conclusion

Building an analytics platform for Claude AI conversations transforms scattered log files into searchable, analyzable data. The three-component architecture separates concerns: real-time sync handles ingestion, a Next.js UI enables browsing, and a Python pipeline powers analytics.

Key takeaways:

  • Resilience through buffering: SQLite bridges the gap between file events and MongoDB availability
  • Separation of concerns: Each component scales and fails independently
  • Modern data stack: dbt’s medallion architecture brings software engineering practices to data transformation

If you’re using Claude Code regularly, understanding your conversation patterns can make you more productive. This platform gives you the tools to do exactly that.

The complete source code is available on GitHub. In the next article, we’ll start building: implementing real-time file watching with chokidar and SQLite buffering.


Written by

Farshad Akbari

Software engineer writing about Java, Kotlin, TypeScript, Python, data systems, and AI
