Building Self-Service Analytics Dashboards with Metabase
The Final Chapter: Turning Your Data Warehouse into Actionable Insights
You have spent weeks building a modern analytics stack. Your Python extractors pull data from MongoDB flawlessly. DuckDB loads and stores everything with blazing speed. Your dbt models transform raw chaos into a pristine star schema. But here is the uncomfortable truth: none of that matters if your stakeholders cannot actually see the data.
This is the last-mile problem in analytics. We obsess over extraction patterns and transformation logic, yet the visualization layer often becomes an afterthought. It should not be.
This is Article 7 of our series on building a complete analytics platform for Claude AI conversation logs. In previous articles, we covered the Python extraction layer (Article 4), DuckDB loading strategies (Article 5), and dbt transformations using the medallion architecture (Article 6). Now we complete the journey by connecting Metabase to our gold layer models and building self-service dashboards that anyone can explore.
Why Metabase? Three reasons. First, it is open-source and self-hostable, which means no vendor lock-in and full control over your data. Second, it has native DuckDB support through a community driver, eliminating the need for intermediate services. Third, its visual query builder empowers business users to explore data without writing SQL, while still offering full SQL access for power users.
By the end of this article, you will have three production-ready dashboards: Developer Productivity, AI Interaction Patterns, and Project Insights. More importantly, you will understand how to design self-service analytics that scale with your organization.
Let us finish what we started.
Metabase Architecture in Our Stack
Before diving into dashboard creation, we need to understand how Metabase fits into our existing Docker Compose infrastructure. The analytics platform runs several interconnected services, and Metabase serves as the presentation layer that queries our DuckDB warehouse.
The key insight here is that Metabase queries DuckDB directly through a shared Docker volume. There is no intermediate API layer or data copying. When a user views a dashboard, Metabase executes SQL against the same analytics.db file that dbt populates during transformation runs.
The Custom Dockerfile
Metabase does not ship with DuckDB support out of the box. We need a custom image that includes the community DuckDB driver. Here is our Dockerfile:
```dockerfile
# Custom Metabase image with DuckDB support
# Based on MotherDuck's recommended approach for glibc compatibility
FROM eclipse-temurin:21-jre-jammy

# Build arguments for versions
ARG METABASE_VERSION=0.56.9
ARG METABASE_DUCKDB_DRIVER_VERSION=1.4.3.0

ENV MB_PLUGINS_DIR=/home/metabase/plugins/

# Create metabase user
RUN groupadd -r metabase && useradd -r -g metabase metabase

# Install CA certificates for HTTPS downloads and curl for health checks
RUN apt-get update && apt-get install -y \
    ca-certificates \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Create directories for plugins and data
RUN mkdir -p /home/metabase/plugins /home/metabase/data /duckdb && \
    chown -R metabase:metabase /home/metabase /duckdb

WORKDIR /home/metabase

# Download Metabase JAR
ADD --chown=metabase:metabase \
    https://downloads.metabase.com/v${METABASE_VERSION}/metabase.jar \
    /home/metabase/

# Download DuckDB driver
ADD --chown=metabase:metabase \
    https://github.com/MotherDuck-Open-Source/metabase_duckdb_driver/releases/download/${METABASE_DUCKDB_DRIVER_VERSION}/duckdb.metabase-driver.jar \
    /home/metabase/plugins/

# Ensure proper file permissions
RUN chmod 755 /home/metabase/metabase.jar && \
    chmod 755 /home/metabase/plugins/duckdb.metabase-driver.jar

EXPOSE 3000

USER metabase

CMD ["java", "-jar", "/home/metabase/metabase.jar"]
```

A few important decisions in this Dockerfile deserve explanation:
Eclipse Temurin base image. We use eclipse-temurin:21-jre-jammy instead of Alpine because the DuckDB driver requires glibc compatibility. Alpine uses musl libc, which causes cryptic runtime errors with DuckDB’s native extensions.
Version pinning. Both METABASE_VERSION and METABASE_DUCKDB_DRIVER_VERSION are pinned as build arguments. This ensures reproducible builds and allows easy upgrades by changing a single value.
Non-root user. The container runs as the metabase user rather than root, following security best practices for production deployments.
Docker Compose Integration
The Metabase service definition in our Docker Compose file handles the integration:
```yaml
metabase:
  build:
    context: ./metabase
    dockerfile: Dockerfile
  container_name: metabase
  environment:
    # PostgreSQL for Metabase metadata
    - MB_DB_TYPE=postgres
    - MB_DB_DBNAME=metabase
    - MB_DB_PORT=5432
    - MB_DB_USER=metabase
    - MB_DB_PASS=metabase
    - MB_DB_HOST=metabase-db
    - MB_EMOJI_IN_LOGS=false
  ports:
    - "3001:3000"
  volumes:
    # DuckDB needs write access for lock files even when reading
    - duckdb-data:/duckdb
  depends_on:
    metabase-db:
      condition: service_healthy
  healthcheck:
    test: ["CMD-SHELL", "curl -f http://localhost:3000/api/health || exit 1"]
    interval: 30s
    timeout: 10s
    retries: 5
    start_period: 120s
  networks:
    - analytics-network
```

The port mapping `3001:3000` exposes Metabase on port 3001 externally while it runs on 3000 internally. This avoids conflicts with other services that might use port 3000, such as Next.js development servers.
The duckdb-data volume mount is critical. This shared volume allows both the analytics worker (which runs dbt) and Metabase to access the same DuckDB database file. Note that DuckDB requires write access even for read operations because it creates lock files.
Setting Up the DuckDB Connection
With Metabase running, the first step is connecting it to our DuckDB warehouse. Start the analytics stack if you have not already:
```bash
cd analytics
make up
```

After the containers finish initializing (Metabase takes about 2 minutes on first startup), navigate to http://localhost:3001 in your browser. Complete the initial setup wizard by creating an admin account.
Adding DuckDB as a Data Source
From the Metabase admin panel, navigate to Admin Settings > Databases > Add Database. Select DuckDB as the database type (it appears in the list because we installed the driver in our custom image).
Configure the connection with these settings:
| Setting | Value | Notes |
|---|---|---|
| Database type | DuckDB | Community driver |
| Display name | Claude Analytics | User-friendly name |
| Database file path | /duckdb/analytics.db | Path inside container |
Click Save and Metabase will scan the database schema. Within a few seconds, you should see all the tables from our dbt project appear in the data browser.
Schema Discovery
Metabase automatically discovers four schemas in our DuckDB database:
- raw: Source tables loaded by the Python loader
- staging: Bronze layer models (cleaned source data)
- intermediate: Silver layer models (enriched and joined)
- marts: Gold layer models (star schema for BI)
For dashboards, we focus exclusively on the marts schema. These tables are specifically designed for analytics consumption with pre-aggregated metrics and denormalized dimensions.
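To sanity-check the scan, you can run a quick query against DuckDB's information schema from Metabase's native SQL editor (or any DuckDB client). This is a sketch; it simply lists what Metabase should see:

```sql
-- List every table Metabase should have discovered in the marts schema
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'marts'
ORDER BY table_name;
```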
Understanding the Gold Layer Models
Before building dashboards, let us examine what data is available in the gold layer. Our dbt project creates a classic star schema with dimension tables, fact tables, and pre-aggregated summary tables.
Key Tables for Analytics
Dimension Tables:
- dim_date: Calendar dimension with temporal attributes (year, month, day of week, weekend flags)
- dim_sessions: Session-level attributes including duration, message counts, and project association
- dim_projects: Project dimension with activity metrics and status classification
- dim_tools: Tool catalog with categories and popularity rankings
Fact Tables:
- fct_messages: Message-level analytics with content analysis (code blocks, questions, length)
- fct_tool_calls: Individual tool invocations with execution timing
- fct_file_operations: File read/write/edit operations with path analysis
Aggregate Tables:
- agg_daily_summary: Pre-computed daily metrics for time-series dashboards
- agg_hourly_activity: Activity heatmap data by hour and day of week
- agg_session_metrics: Session-level statistics and percentiles
- agg_tool_efficiency: Tool performance and usage rankings
The aggregate tables deserve special attention. By pre-computing metrics during dbt runs, we shift computational work from query time to transformation time. This means dashboards load instantly even with millions of underlying records.
Consider the agg_daily_summary model. Instead of calculating session counts on every dashboard refresh, dbt computes these values once:
```sql
-- From agg_daily_summary.sql
daily_sessions as (
    select
        date_key,
        count(*) as session_count,
        count(distinct project_id) as active_projects,
        sum(duration_minutes) as total_session_minutes,
        avg(duration_minutes) as avg_session_minutes,
        sum(message_count) as total_messages,
        avg(message_count) as avg_messages_per_session,
        sum(tool_call_count) as total_tool_calls
    from sessions
    group by date_key
)
```

When Metabase queries this table, it simply reads pre-aggregated rows rather than scanning millions of session records.
Building the Sample Dashboards
Our analytics platform includes a sample_questions.json file that defines 12 pre-configured questions across three dashboards. This file serves as both documentation and a template for recreating these visualizations in Metabase.
Dashboard Structure
{ "dashboards": [ { "name": "Developer Productivity", "description": "Session duration, activity patterns, and productivity metrics", "questions": [ "Sessions Over Time", "Average Session Duration", "Active Hours Heatmap", "Task Category Distribution" ] }, { "name": "AI Interaction Patterns", "description": "Tool usage, efficiency, and interaction analysis", "questions": [ "Tool Usage Distribution", "Tool Category Breakdown", "Top 10 Most Used Tools", "Response Time Trends" ] }, { "name": "Project Insights", "description": "Project activity, code changes, and file operations", "questions": [ "Project Activity Ranking", "Project Status Distribution", "Daily Code Changes", "File Types Worked On" ] } ]}Let us build each dashboard step by step.
Dashboard 1: Developer Productivity
This dashboard answers the question: “How am I using Claude AI, and when am I most productive?”
Sessions Over Time (Line Chart)
```sql
SELECT date_key, session_count
FROM marts.agg_daily_summary
ORDER BY date_key
```

This simple query powers a trend line showing daily session counts. Spikes indicate intense development periods, while dips might correlate with meetings or context switches.
Active Hours Heatmap (Pivot Table)
```sql
SELECT hour_of_day, day_name, total_activity
FROM marts.agg_hourly_activity
```

Configure this as a pivot table with day_name on rows, hour_of_day on columns, and total_activity as the cell value. The result is a heatmap showing when you are most actively coding with Claude. Many developers discover surprising patterns, like unexpected productivity during early morning hours.
Average Session Duration (Scalar)
```sql
SELECT AVG(duration_minutes) as avg_duration
FROM marts.dim_sessions
```

Display this as a single number card. Track this metric over time to understand whether your Claude sessions are becoming more focused (shorter) or more exploratory (longer).
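Averages are easily skewed by a few marathon sessions. If you want a more robust center, DuckDB ships median and quantile aggregates; a variant of the same question:

```sql
-- Mean, median, and 90th percentile give a fuller picture than AVG alone
SELECT
    AVG(duration_minutes) AS avg_duration,
    MEDIAN(duration_minutes) AS median_duration,
    QUANTILE_CONT(duration_minutes, 0.9) AS p90_duration
FROM marts.dim_sessions
```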
Task Category Distribution (Pie Chart)
```sql
SELECT task_category, COUNT(*) as count
FROM marts.fct_messages
GROUP BY task_category
ORDER BY count DESC
```

This visualization breaks down your work into categories: debugging, feature development, refactoring, documentation, and others. It provides insight into how you allocate your coding time.
Dashboard 2: AI Interaction Patterns
This dashboard focuses on how you interact with Claude’s tools and capabilities.
Tool Usage Distribution (Pie Chart)
```sql
SELECT tool_name, total_calls
FROM marts.agg_tool_efficiency
ORDER BY total_calls DESC
```

See at a glance which tools dominate your workflow. Is it file reading? Code editing? Shell commands? The distribution often reveals opportunities to expand your usage of underutilized capabilities.
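If you prefer percentages over raw counts, a window function computes each tool's share of total calls without a self-join:

```sql
-- Share of all tool calls, computed with a windowed SUM over the whole result
SELECT
    tool_name,
    total_calls,
    ROUND(100.0 * total_calls / SUM(total_calls) OVER (), 1) AS pct_of_calls
FROM marts.agg_tool_efficiency
ORDER BY total_calls DESC
```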
Tool Category Breakdown (Bar Chart)
```sql
SELECT tool_category, SUM(total_calls) as calls
FROM marts.agg_tool_efficiency
GROUP BY tool_category
ORDER BY calls DESC
```

Aggregate tools into categories (file operations, search, shell, web) for a higher-level view of interaction patterns.
Top 10 Most Used Tools (Row Chart)
```sql
SELECT tool_name, total_calls, sessions_used
FROM marts.agg_tool_efficiency
ORDER BY popularity_rank
LIMIT 10
```

A ranked horizontal bar chart makes it easy to compare your most-used tools. The sessions_used column shows breadth of usage across different coding sessions.
Response Time Trends (Line Chart)
```sql
SELECT date_key, AVG(avg_response_time_seconds) as avg_response
FROM marts.dim_sessions
GROUP BY date_key
ORDER BY date_key
```

Track Claude's response latency over time. Sudden increases might indicate complex queries or infrastructure issues.
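Day-to-day latency is noisy. A 7-day trailing average, a window-function variant of the same query, smooths the trend line considerably:

```sql
-- Smooth daily latency with a 7-day trailing window
WITH daily AS (
    SELECT date_key, AVG(avg_response_time_seconds) AS avg_response
    FROM marts.dim_sessions
    GROUP BY date_key
)
SELECT
    date_key,
    AVG(avg_response) OVER (
        ORDER BY date_key
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS avg_response_7d
FROM daily
ORDER BY date_key
```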
Dashboard 3: Project Insights
This dashboard provides visibility into project-level activity and code changes.
Project Activity Ranking (Bar Chart)
```sql
SELECT project_id, total_messages, session_count
FROM marts.dim_projects
ORDER BY total_messages DESC
LIMIT 15
```

See which projects consume the most of your Claude interactions. This helps prioritize documentation and knowledge transfer for heavily-used codebases.
Project Status Distribution (Pie Chart)
```sql
SELECT activity_status, COUNT(*) as project_count
FROM marts.dim_projects
GROUP BY activity_status
```

Projects are classified as active, dormant, or archived based on recent activity. This visualization shows the health of your project portfolio.
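Under the hood, a classification like this typically boils down to a CASE expression over recency. A sketch of the idea only; the column name, upstream relation, and thresholds here are illustrative, not the actual dbt model:

```sql
-- Illustrative recency-based classification (thresholds are assumptions)
select
    project_id,
    case
        when last_active_date >= current_date - interval 14 day then 'active'
        when last_active_date >= current_date - interval 90 day then 'dormant'
        else 'archived'
    end as activity_status
from project_activity  -- hypothetical upstream relation, one row per project
```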
Daily Code Changes (Area Chart)
```sql
SELECT date_key, file_reads, file_writes, file_edits
FROM marts.agg_daily_summary
WHERE has_activity = true
ORDER BY date_key
```

This stacked area chart shows the volume of file operations over time. Large spikes in writes often correlate with feature implementations, while edit-heavy periods suggest refactoring work.
File Types Worked On (Pie Chart)
```sql
SELECT aggregation_key as file_type, total_operations
FROM marts.agg_code_changes
WHERE aggregation_level = 'by_file_type'
ORDER BY total_operations DESC
```

Distribution of file operations by extension (.ts, .py, .sql, etc.). This reveals your technology stack focus over time.
Self-Service Features for End Users
The true power of Metabase lies in its self-service capabilities. Once the dashboards are set up, team members can explore data without SQL knowledge or engineering support.
Visual Query Builder
Metabase’s visual query builder allows users to create questions through a point-and-click interface. Users select a table, choose columns, add filters, and pick a visualization type. Behind the scenes, Metabase generates optimized SQL.
For example, a product manager could:
- Select the `dim_projects` table
- Choose the `project_id` and `total_messages` columns
- Add a filter for `activity_status = 'active'`
- Sort by `total_messages` descending
- Visualize as a bar chart
No SQL required.
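For the curious, the SQL Metabase generates behind the scenes for that question would look roughly like this (a sketch of the equivalent query, not Metabase's literal output):

```sql
-- Equivalent of the point-and-click question above
SELECT project_id, total_messages
FROM marts.dim_projects
WHERE activity_status = 'active'
ORDER BY total_messages DESC
```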
Interactive Filters
Add dashboard-level filters to enable dynamic exploration:
Date Range Filter
- Field: date_key
- Default: Last 30 days
- Allows users to analyze specific time periods
Project Filter
- Field: project_id
- Type: Multi-select dropdown
- Filters all dashboard cards simultaneously
Activity Level Filter
- Field: activity_level
- Values: minimal, light, moderate, heavy
- Focus on high-activity or low-activity periods
These filters connect to all relevant questions on a dashboard, providing a unified exploration experience.
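Native SQL questions can participate in the same filters through Metabase's template variables, where double brackets mark optional clauses that are dropped when the widget is empty. A sketch, assuming a simple `project` variable defined in the question's sidebar:

```sql
-- [[ ]] clauses disappear when the corresponding filter widget is blank
SELECT project_id, duration_minutes, message_count
FROM marts.dim_sessions
WHERE 1 = 1
  [[AND project_id = {{project}}]]
ORDER BY duration_minutes DESC
```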
Sharing and Collaboration
Metabase offers several ways to share insights:
- Public links: Generate shareable URLs for specific questions or dashboards
- Embedding: Embed dashboards in internal tools or documentation
- Subscriptions: Schedule email delivery of dashboard snapshots
- Slack integration: Post dashboard updates to team channels
For our Claude analytics use case, consider setting up a weekly email subscription that summarizes productivity metrics for each team member.
Permissions Model
Metabase supports granular permissions at the database, schema, and table levels. For a self-service analytics deployment, consider this structure:
- Admins: Full access to all data and Metabase configuration
- Analysts: Query access to all schemas, can create questions and dashboards
- Viewers: Can view existing dashboards and filter data, but cannot create new questions
This prevents accidental exposure of sensitive data while enabling broad access to insights.
Deployment and Operations
Running Metabase in production requires attention to reliability, performance, and data freshness.
Starting the Stack
Launch the complete analytics platform with a single command:
```bash
cd analytics
make up
```

This starts Prefect (orchestration), the analytics worker, PostgreSQL (Metabase metadata), and Metabase itself. Verify all services are healthy:
```bash
docker-compose -f docker-compose.analytics.yml ps
```

You should see all containers in a healthy or running state.
Health Monitoring
Metabase exposes a health endpoint for monitoring:
```bash
curl http://localhost:3001/api/health
```

A healthy response returns `{"status":"ok"}`. Integrate this with your monitoring system (Prometheus, Datadog, etc.) to alert on Metabase availability issues.
Backup and Restore
Two data stores require backup consideration:
PostgreSQL Metadata Database
Metabase stores all questions, dashboards, user accounts, and configuration in PostgreSQL. Back up this database regularly:
```bash
docker exec metabase-db pg_dump -U metabase metabase > metabase_backup.sql
```

Restore with:
```bash
cat metabase_backup.sql | docker exec -i metabase-db psql -U metabase metabase
```
DuckDB Analytics Database

The DuckDB file contains your actual analytics data. Since dbt can regenerate this from source data, full backups are less critical. However, for faster recovery, periodically copy the database file:
```bash
docker cp analytics-worker:/duckdb/analytics.db ./analytics_backup.db
```
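For a logical backup that survives DuckDB version upgrades, you can also export the whole warehouse to Parquet from any DuckDB client. A minimal sketch; the target directory is illustrative, and the export should run while no dbt job holds the write lock:

```sql
-- Write every schema, table definition, and table's data to Parquet
EXPORT DATABASE '/duckdb/backups/analytics_export' (FORMAT PARQUET);

-- Recreate the warehouse in a fresh database file later with:
-- IMPORT DATABASE '/duckdb/backups/analytics_export';
```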
Performance Tuning

Several strategies improve Metabase performance with DuckDB:
Question Caching
Enable query caching in Metabase settings. For stable historical data, cache results for 1-6 hours. This dramatically reduces DuckDB query load.
Refresh Schedules
Configure appropriate refresh intervals based on data freshness requirements:
| Dashboard Type | Refresh Interval | Rationale |
|---|---|---|
| Real-time activity | 1 hour | Balance freshness with load |
| Historical trends | 6 hours | Data changes slowly |
| Aggregates | Daily at 3 AM | After nightly dbt runs |
Pre-aggregation
Our dbt models already implement pre-aggregation (the agg_* tables). If you add new questions that perform expensive aggregations, consider adding corresponding dbt models rather than computing on the fly.
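For instance, if a new question needed weekly tool usage, a small model like this sketch would keep the heavy lifting in dbt. The model name and timestamp column are hypothetical; `fct_tool_calls` is the gold-layer fact table described earlier:

```sql
-- models/marts/agg_weekly_tool_usage.sql (hypothetical model)
{{ config(materialized='table') }}

select
    date_trunc('week', called_at) as week_start,  -- called_at is an assumed timestamp column
    tool_name,
    count(*) as total_calls
from {{ ref('fct_tool_calls') }}
group by 1, 2
```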
Series Conclusion: The Complete Data Journey
We have reached the end of our seven-article journey. Let us step back and appreciate what we have built together.
Articles 1-3 covered the sync service that watches Claude's JSONL log files, buffers changes in SQLite, and syncs to MongoDB. This gave us durable, queryable storage for conversation history.
Article 4 introduced the Python extraction layer using Prefect for orchestration. We built incremental extraction logic that efficiently pulls new data from MongoDB without full table scans.
Article 5 explored DuckDB as our analytics warehouse. We implemented a loader that converts Parquet files into queryable tables, leveraging DuckDB’s columnar storage for analytics performance.
Article 6 was our deepest dive, covering dbt transformations using the medallion architecture. Bronze, silver, and gold layers progressively refined raw data into analytics-ready star schemas.
Article 7 (this article) completed the stack with Metabase dashboards. We connected to DuckDB, built three comprehensive dashboards, and explored self-service features that empower non-technical users.
What We Built
The complete platform provides:
- Real-time data capture from Claude AI sessions
- Durable storage in MongoDB for operational queries
- Columnar analytics in DuckDB for fast aggregations
- Semantic modeling through dbt transformations
- Self-service visualization via Metabase dashboards
- Orchestrated pipelines with Prefect scheduling
All of this runs locally in Docker containers, requiring no cloud services or external dependencies.
Future Enhancements
Several directions could extend this platform:
- Alerting: Configure Metabase alerts when metrics exceed thresholds (e.g., session duration drops significantly)
- Semantic Layer: Add a metrics layer using dbt Semantic Layer or Cube.js for consistent metric definitions
- Machine Learning: Export data to train models that predict task complexity or suggest optimal tools
- Multi-user Analytics: Track metrics across a development team rather than individual usage
- Cost Attribution: Correlate Claude usage with API costs for budgeting and optimization
Final Thoughts
Building an analytics platform is an investment. Each layer required careful design decisions, from the choice of SQLite for buffering to DuckDB for warehousing to Metabase for visualization. But the payoff is substantial: complete visibility into how you use AI-assisted development tools.
More importantly, this architecture is not specific to Claude. The patterns we explored, which include real-time sync, medallion architecture, and self-service BI, apply to any analytics use case. Swap MongoDB for PostgreSQL, change the dbt models, and you have an analytics platform for a SaaS product, an IoT system, or an e-commerce store.
I hope this series has demystified the modern data stack. The tools are mature, the patterns are proven, and the barriers to entry have never been lower. Build something great.
Suggested Tags: Metabase, DuckDB, Data Analytics, Business Intelligence, Data Engineering
This is Article 7 of 7 in the “Building a Modern Analytics Platform” series. Read the complete series to learn how to build production-ready data infrastructure from scratch.