Building Self-Service Analytics Dashboards with Metabase
The Final Chapter: Turning Your Data Warehouse into Actionable Insights
You have spent weeks building a modern analytics stack. Your Python extractors pull data from MongoDB flawlessly. DuckDB loads and stores everything with blazing speed. Your dbt models transform raw chaos into a pristine star schema. But here is the uncomfortable truth: none of that matters if your stakeholders cannot actually see the data.
This is the last-mile problem in analytics. We obsess over extraction patterns and transformation logic, yet the visualization layer often becomes an afterthought. It should not be.
This is Article 7 of our series on building a complete analytics platform for Claude AI conversation logs. In previous articles, we covered the Python extraction layer (Article 4), DuckDB loading strategies (Article 5), and dbt transformations using the medallion architecture (Article 6). Now we complete the journey by connecting Metabase to our gold layer models and building self-service dashboards that anyone can explore.
Why Metabase? Three reasons. First, it is open-source and self-hostable, which means no vendor lock-in and full control over your data. Second, it has native DuckDB support through a community driver, eliminating the need for intermediate services. Third, its visual query builder empowers business users to explore data without writing SQL, while still offering full SQL access for power users.
By the end of this article, you will have three production-ready dashboards: Developer Productivity, AI Interaction Patterns, and Project Insights. More importantly, you will understand how to design self-service analytics that scale with your organization.
Let us finish what we started.
Metabase Architecture in Our Stack
Before diving into dashboard creation, we need to understand how Metabase fits into our existing Docker Compose infrastructure. The analytics platform runs several interconnected services, and Metabase serves as the presentation layer that queries our DuckDB warehouse.
The key insight here is that Metabase queries DuckDB directly through a shared Docker volume. There is no intermediate API layer or data copying. When a user views a dashboard, Metabase executes SQL against the same analytics.db file that dbt populates during transformation runs.
The Custom Dockerfile
Metabase does not ship with DuckDB support out of the box. We need a custom image that includes the community DuckDB driver. Here is our Dockerfile:
```dockerfile
# Custom Metabase image with DuckDB support
# Based on MotherDuck's recommended approach for glibc compatibility
FROM eclipse-temurin:21-jre-jammy

# Build arguments for versions
ARG METABASE_VERSION=0.56.9
ARG METABASE_DUCKDB_DRIVER_VERSION=1.4.3.0

ENV MB_PLUGINS_DIR=/home/metabase/plugins/

# Create metabase user
RUN groupadd -r metabase && useradd -r -g metabase metabase

# Install CA certificates for HTTPS downloads and curl for health checks
RUN apt-get update && apt-get install -y \
    ca-certificates \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Create directories for plugins and data
RUN mkdir -p /home/metabase/plugins /home/metabase/data /duckdb && \
    chown -R metabase:metabase /home/metabase /duckdb

WORKDIR /home/metabase

# Download Metabase JAR
ADD --chown=metabase:metabase \
    https://downloads.metabase.com/v${METABASE_VERSION}/metabase.jar \
    /home/metabase/

# Download DuckDB driver
ADD --chown=metabase:metabase \
    https://github.com/MotherDuck-Open-Source/metabase_duckdb_driver/releases/download/${METABASE_DUCKDB_DRIVER_VERSION}/duckdb.metabase-driver.jar \
    /home/metabase/plugins/

# Ensure proper file permissions
RUN chmod 755 /home/metabase/metabase.jar && \
    chmod 755 /home/metabase/plugins/duckdb.metabase-driver.jar

EXPOSE 3000

USER metabase

CMD ["java", "-jar", "/home/metabase/metabase.jar"]
```

A few important decisions in this Dockerfile deserve explanation:
Eclipse Temurin base image. We use eclipse-temurin:21-jre-jammy instead of Alpine because the DuckDB driver requires glibc compatibility. Alpine uses musl libc, which causes cryptic runtime errors with DuckDB’s native extensions.
Version pinning. Both METABASE_VERSION and METABASE_DUCKDB_DRIVER_VERSION are pinned as build arguments. This ensures reproducible builds and allows easy upgrades by changing a single value.
Non-root user. The container runs as the metabase user rather than root, following security best practices for production deployments.
Docker Compose Integration
The Metabase service definition in our Docker Compose file handles the integration:
```yaml
metabase:
  build:
    context: ./metabase
    dockerfile: Dockerfile
  container_name: metabase
  environment:
    # PostgreSQL for Metabase metadata
    - MB_DB_TYPE=postgres
    - MB_DB_DBNAME=metabase
    - MB_DB_PORT=5432
    - MB_DB_USER=metabase
    - MB_DB_PASS=metabase
    - MB_DB_HOST=metabase-db
    - MB_EMOJI_IN_LOGS=false
  ports:
    - "3001:3000"
  volumes:
    # DuckDB needs write access for lock files even when reading
    - duckdb-data:/duckdb
  depends_on:
    metabase-db:
      condition: service_healthy
  healthcheck:
    test: ["CMD-SHELL", "curl -f http://localhost:3000/api/health || exit 1"]
    interval: 30s
    timeout: 10s
    retries: 5
    start_period: 120s
  networks:
    - analytics-network
```

The port mapping `3001:3000` exposes Metabase on port 3001 externally while it runs on 3000 internally. This avoids conflicts with other services that might use port 3000, such as Next.js development servers.
The duckdb-data volume mount is critical. This shared volume allows both the analytics worker (which runs dbt) and Metabase to access the same DuckDB database file. Note that DuckDB requires write access even for read operations because it creates lock files.
Setting Up the DuckDB Connection
With Metabase running, the first step is connecting it to our DuckDB warehouse. Start the analytics stack if you have not already:
```bash
cd analytics
make up
```

After the containers finish initializing (Metabase takes about 2 minutes on first startup), navigate to http://localhost:3001 in your browser. Complete the initial setup wizard by creating an admin account.
Adding DuckDB as a Data Source
From the Metabase admin panel, navigate to Admin Settings > Databases > Add Database. Select DuckDB as the database type (it appears in the list because we installed the driver in our custom image).
Configure the connection with these settings:
| Setting | Value | Notes |
|---|---|---|
| Database type | DuckDB | Community driver |
| Display name | Claude Analytics | User-friendly name |
| Database file path | /duckdb/analytics.db | Path inside container |
Click Save and Metabase will scan the database schema. Within a few seconds, you should see all the tables from our dbt project appear in the data browser.
Schema Discovery
Metabase automatically discovers four schemas in our DuckDB database:
- raw: Source tables loaded by the Python loader
- staging: Bronze layer models (cleaned source data)
- intermediate: Silver layer models (enriched and joined)
- marts: Gold layer models (star schema for BI)
For dashboards, we focus exclusively on the marts schema. These tables are specifically designed for analytics consumption with pre-aggregated metrics and denormalized dimensions.
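To sanity-check the scan, you can run a quick query against DuckDB's information schema from Metabase's native SQL editor (or any DuckDB client). This is a sketch; it simply lists what Metabase should see:

```sql
-- List every table Metabase should have discovered in the marts schema
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'marts'
ORDER BY table_name;
```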
Understanding the Gold Layer Models
Before building dashboards, let us examine what data is available in the gold layer. Our dbt project creates a classic star schema with dimension tables, fact tables, and pre-aggregated summary tables.
Key Tables for Analytics
Dimension Tables:
- dim_date: Calendar dimension with temporal attributes (year, month, day of week, weekend flags)
- dim_sessions: Session-level attributes including duration, message counts, and project association
- dim_projects: Project dimension with activity metrics and status classification
- dim_tools: Tool catalog with categories and popularity rankings
Fact Tables:
- fct_messages: Message-level analytics with content analysis (code blocks, questions, length)
- fct_tool_calls: Individual tool invocations with execution timing
- fct_file_operations: File read/write/edit operations with path analysis
Aggregate Tables:
- agg_daily_summary: Pre-computed daily metrics for time-series dashboards
- agg_hourly_activity: Activity heatmap data by hour and day of week
- agg_session_metrics: Session-level statistics and percentiles
- agg_tool_efficiency: Tool performance and usage rankings
The aggregate tables deserve special attention. By pre-computing metrics during dbt runs, we shift computational work from query time to transformation time. This means dashboards load instantly even with millions of underlying records.
Consider the agg_daily_summary model. Instead of calculating session counts on every dashboard refresh, dbt computes these values once:
```sql
-- From agg_daily_summary.sql
daily_sessions as (
    select
        date_key,
        count(*) as session_count,
        count(distinct project_id) as active_projects,
        sum(duration_minutes) as total_session_minutes,
        avg(duration_minutes) as avg_session_minutes,
        sum(message_count) as total_messages,
        avg(message_count) as avg_messages_per_session,
        sum(tool_call_count) as total_tool_calls
    from sessions
    group by date_key
)
```

When Metabase queries this table, it simply reads pre-aggregated rows rather than scanning millions of session records.
Building the Sample Dashboards
Our analytics platform includes a sample_questions.json file that defines 12 pre-configured questions across three dashboards. This file serves as both documentation and a template for recreating these visualizations in Metabase.
Dashboard Structure
{ "dashboards": [ { "name": "Developer Productivity", "description": "Session duration, activity patterns, and productivity metrics", "questions": [ "Sessions Over Time", "Average Session Duration", "Active Hours Heatmap", "Task Category Distribution" ] }, { "name": "AI Interaction Patterns", "description": "Tool usage, efficiency, and interaction analysis", "questions": [ "Tool Usage Distribution", "Tool Category Breakdown", "Top 10 Most Used Tools", "Response Time Trends" ] }, { "name": "Project Insights", "description": "Project activity, code changes, and file operations", "questions": [ "Project Activity Ranking", "Project Status Distribution", "Daily Code Changes", "File Types Worked On" ] } ]}Let us build each dashboard step by step.
Dashboard 1: Developer Productivity
This dashboard answers the question: “How am I using Claude AI, and when am I most productive?”
Sessions Over Time (Line Chart)
```sql
SELECT date_key, session_count
FROM marts.agg_daily_summary
ORDER BY date_key
```

This simple query powers a trend line showing daily session counts. Spikes indicate intense development periods, while dips might correlate with meetings or context switches.
Active Hours Heatmap (Pivot Table)
```sql
SELECT hour_of_day, day_name, total_activity
FROM marts.agg_hourly_activity
```

Configure this as a pivot table with day_name on rows, hour_of_day on columns, and total_activity as the cell value. The result is a heatmap showing when you are most actively coding with Claude. Many developers discover surprising patterns, like unexpected productivity during early morning hours.
Average Session Duration (Scalar)
```sql
SELECT AVG(duration_minutes) as avg_duration
FROM marts.dim_sessions
```

Display this as a single number card. Track this metric over time to understand whether your Claude sessions are becoming more focused (shorter) or more exploratory (longer).
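Averages are easily skewed by a few marathon sessions. If you want a more robust center, DuckDB ships median and quantile aggregates; a variant of the same question:

```sql
-- Mean, median, and 90th percentile give a fuller picture than AVG alone
SELECT
    AVG(duration_minutes) AS avg_duration,
    MEDIAN(duration_minutes) AS median_duration,
    QUANTILE_CONT(duration_minutes, 0.9) AS p90_duration
FROM marts.dim_sessions
```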
Task Category Distribution (Pie Chart)
```sql
SELECT task_category, COUNT(*) as count
FROM marts.fct_messages
GROUP BY task_category
ORDER BY count DESC
```

This visualization breaks down your work into categories: debugging, feature development, refactoring, documentation, and others. It provides insight into how you allocate your coding time.
Dashboard 2: AI Interaction Patterns
This dashboard focuses on how you interact with Claude’s tools and capabilities.
Tool Usage Distribution (Pie Chart)
```sql
SELECT tool_name, total_calls
FROM marts.agg_tool_efficiency
ORDER BY total_calls DESC
```

See at a glance which tools dominate your workflow. Is it file reading? Code editing? Shell commands? The distribution often reveals opportunities to expand your usage of underutilized capabilities.
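If you prefer percentages over raw counts, a window function computes each tool's share of total calls without a self-join:

```sql
-- Share of all tool calls, computed with a windowed SUM over the whole result
SELECT
    tool_name,
    total_calls,
    ROUND(100.0 * total_calls / SUM(total_calls) OVER (), 1) AS pct_of_calls
FROM marts.agg_tool_efficiency
ORDER BY total_calls DESC
```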
Tool Category Breakdown (Bar Chart)
```sql
SELECT tool_category, SUM(total_calls) as calls
FROM marts.agg_tool_efficiency
GROUP BY tool_category
ORDER BY calls DESC
```

Aggregate tools into categories (file operations, search, shell, web) for a higher-level view of interaction patterns.
Top 10 Most Used Tools (Row Chart)
```sql
SELECT tool_name, total_calls, sessions_used
FROM marts.agg_tool_efficiency
ORDER BY popularity_rank
LIMIT 10
```

A ranked horizontal bar chart makes it easy to compare your most-used tools. The sessions_used column shows breadth of usage across different coding sessions.
Response Time Trends (Line Chart)
```sql
SELECT date_key, AVG(avg_response_time_seconds) as avg_response
FROM marts.dim_sessions
GROUP BY date_key
ORDER BY date_key
```

Track Claude's response latency over time. Sudden increases might indicate complex queries or infrastructure issues.
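Day-to-day latency is noisy. A 7-day trailing average, a window-function variant of the same query, smooths the trend line considerably:

```sql
-- Smooth daily latency with a 7-day trailing window
WITH daily AS (
    SELECT date_key, AVG(avg_response_time_seconds) AS avg_response
    FROM marts.dim_sessions
    GROUP BY date_key
)
SELECT
    date_key,
    AVG(avg_response) OVER (
        ORDER BY date_key
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS avg_response_7d
FROM daily
ORDER BY date_key
```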
Dashboard 3: Project Insights
This dashboard provides visibility into project-level activity and code changes.
Project Activity Ranking (Bar Chart)
```sql
SELECT project_id, total_messages, session_count
FROM marts.dim_projects
ORDER BY total_messages DESC
LIMIT 15
```

See which projects consume the most of your Claude interactions. This helps prioritize documentation and knowledge transfer for heavily-used codebases.
Project Status Distribution (Pie Chart)
```sql
SELECT activity_status, COUNT(*) as project_count
FROM marts.dim_projects
GROUP BY activity_status
```

Projects are classified as active, dormant, or archived based on recent activity. This visualization shows the health of your project portfolio.
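Under the hood, a classification like this typically boils down to a CASE expression over recency. A sketch of the idea only; the column name, upstream relation, and thresholds here are illustrative, not the actual dbt model:

```sql
-- Illustrative recency-based classification (thresholds are assumptions)
select
    project_id,
    case
        when last_active_date >= current_date - interval 14 day then 'active'
        when last_active_date >= current_date - interval 90 day then 'dormant'
        else 'archived'
    end as activity_status
from project_activity  -- hypothetical upstream relation, one row per project
```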
Daily Code Changes (Area Chart)
```sql
SELECT date_key, file_reads, file_writes, file_edits
FROM marts.agg_daily_summary
WHERE has_activity = true
ORDER BY date_key
```

This stacked area chart shows the volume of file operations over time. Large spikes in writes often correlate with feature implementations, while edit-heavy periods suggest refactoring work.
File Types Worked On (Pie Chart)
```sql
SELECT aggregation_key as file_type, total_operations
FROM marts.agg_code_changes
WHERE aggregation_level = 'by_file_type'
ORDER BY total_operations DESC
```

Distribution of file operations by extension (.ts, .py, .sql, etc.). This reveals your technology stack focus over time.
Self-Service Features for End Users
The true power of Metabase lies in its self-service capabilities. Once the dashboards are set up, team members can explore data without SQL knowledge or engineering support.
Visual Query Builder
Metabase’s visual query builder allows users to create questions through a point-and-click interface. Users select a table, choose columns, add filters, and pick a visualization type. Behind the scenes, Metabase generates optimized SQL.
For example, a product manager could:
- Select the `dim_projects` table
- Choose the `project_id` and `total_messages` columns
- Add a filter for `activity_status = 'active'`
- Sort by `total_messages` descending
- Visualize as a bar chart
No SQL required.
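For the curious, the SQL Metabase generates behind the scenes for that question would look roughly like this (a sketch of the equivalent query, not Metabase's literal output):

```sql
-- Equivalent of the point-and-click question above
SELECT project_id, total_messages
FROM marts.dim_projects
WHERE activity_status = 'active'
ORDER BY total_messages DESC
```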
Interactive Filters
Add dashboard-level filters to enable dynamic exploration:
Date Range Filter
- Field: date_key
- Default: Last 30 days
- Allows users to analyze specific time periods
Project Filter
- Field: project_id
- Type: Multi-select dropdown
- Filters all dashboard cards simultaneously
Activity Level Filter
- Field: activity_level
- Values: minimal, light, moderate, heavy
- Focus on high-activity or low-activity periods
These filters connect to all relevant questions on a dashboard, providing a unified exploration experience.
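Native SQL questions can participate in the same filters through Metabase's template variables, where double brackets mark optional clauses that are dropped when the widget is empty. A sketch, assuming a simple `project` variable defined in the question's sidebar:

```sql
-- [[ ]] clauses disappear when the corresponding filter widget is blank
SELECT project_id, duration_minutes, message_count
FROM marts.dim_sessions
WHERE 1 = 1
  [[AND project_id = {{project}}]]
ORDER BY duration_minutes DESC
```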
Sharing and Collaboration
Metabase offers several ways to share insights:
- Public links: Generate shareable URLs for specific questions or dashboards
- Embedding: Embed dashboards in internal tools or documentation
- Subscriptions: Schedule email delivery of dashboard snapshots
- Slack integration: Post dashboard updates to team channels
For our Claude analytics use case, consider setting up a weekly email subscription that summarizes productivity metrics for each team member.
Permissions Model
Metabase supports granular permissions at the database, schema, and table levels. For a self-service analytics deployment, consider this structure:
- Admins: Full access to all data and Metabase configuration
- Analysts: Query access to all schemas, can create questions and dashboards
- Viewers: Can view existing dashboards and filter data, but cannot create new questions
This prevents accidental exposure of sensitive data while enabling broad access to insights.
Deployment and Operations
Running Metabase in production requires attention to reliability, performance, and data freshness.
Starting the Stack
Launch the complete analytics platform with a single command:
```bash
cd analytics
make up
```

This starts Prefect (orchestration), the analytics worker, PostgreSQL (Metabase metadata), and Metabase itself. Verify all services are healthy:
```bash
docker-compose -f docker-compose.analytics.yml ps
```

You should see all containers in a healthy or running state.
Health Monitoring
Metabase exposes a health endpoint for monitoring:
```bash
curl http://localhost:3001/api/health
```

A healthy response returns `{"status":"ok"}`. Integrate this with your monitoring system (Prometheus, Datadog, etc.) to alert on Metabase availability issues.
Backup and Restore
Two data stores require backup consideration:
PostgreSQL Metadata Database
Metabase stores all questions, dashboards, user accounts, and configuration in PostgreSQL. Back up this database regularly:
```bash
docker exec metabase-db pg_dump -U metabase metabase > metabase_backup.sql
```

Restore with:
```bash
cat metabase_backup.sql | docker exec -i metabase-db psql -U metabase metabase
```
DuckDB Analytics Database

The DuckDB file contains your actual analytics data. Since dbt can regenerate this from source data, full backups are less critical. However, for faster recovery, periodically copy the database file:
```bash
docker cp analytics-worker:/duckdb/analytics.db ./analytics_backup.db
```
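For a logical backup that survives DuckDB version upgrades, you can also export the whole warehouse to Parquet from any DuckDB client. A minimal sketch; the target directory is illustrative, and the export should run while no dbt job holds the write lock:

```sql
-- Write every schema, table definition, and table's data to Parquet
EXPORT DATABASE '/duckdb/backups/analytics_export' (FORMAT PARQUET);

-- Recreate the warehouse in a fresh database file later with:
-- IMPORT DATABASE '/duckdb/backups/analytics_export';
```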
Performance Tuning

Several strategies improve Metabase performance with DuckDB:
Question Caching
Enable query caching in Metabase settings. For stable historical data, cache results for 1-6 hours. This dramatically reduces DuckDB query load.
Refresh Schedules
Configure appropriate refresh intervals based on data freshness requirements:
| Dashboard Type | Refresh Interval | Rationale |
|---|---|---|
| Real-time activity | 1 hour | Balance freshness with load |
| Historical trends | 6 hours | Data changes slowly |
| Aggregates | Daily at 3 AM | After nightly dbt runs |
Pre-aggregation
Our dbt models already implement pre-aggregation (the agg_* tables). If you add new questions that perform expensive aggregations, consider adding corresponding dbt models rather than computing on the fly.
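For instance, if a new question needed weekly tool usage, a small model like this sketch would keep the heavy lifting in dbt. The model name and timestamp column are hypothetical; `fct_tool_calls` is the gold-layer fact table described earlier:

```sql
-- models/marts/agg_weekly_tool_usage.sql (hypothetical model)
{{ config(materialized='table') }}

select
    date_trunc('week', called_at) as week_start,  -- called_at is an assumed timestamp column
    tool_name,
    count(*) as total_calls
from {{ ref('fct_tool_calls') }}
group by 1, 2
```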
Series Conclusion: The Complete Data Journey
We have reached the end of our seven-article journey. Let us step back and appreciate what we have built together.
Articles 1-3 covered the sync service that watches Claude's JSONL log files, buffers changes in SQLite, and syncs to MongoDB. This gave us durable, queryable storage for conversation history.
Article 4 introduced the Python extraction layer using Prefect for orchestration. We built incremental extraction logic that efficiently pulls new data from MongoDB without full table scans.
Article 5 explored DuckDB as our analytics warehouse. We implemented a loader that converts Parquet files into queryable tables, leveraging DuckDB’s columnar storage for analytics performance.
Article 6 was our deepest dive, covering dbt transformations using the medallion architecture. Bronze, silver, and gold layers progressively refined raw data into analytics-ready star schemas.
Article 7 (this article) completed the stack with Metabase dashboards. We connected to DuckDB, built three comprehensive dashboards, and explored self-service features that empower non-technical users.
What We Built
The complete platform provides:
- Real-time data capture from Claude AI sessions
- Durable storage in MongoDB for operational queries
- Columnar analytics in DuckDB for fast aggregations
- Semantic modeling through dbt transformations
- Self-service visualization via Metabase dashboards
- Orchestrated pipelines with Prefect scheduling
All of this runs locally in Docker containers, requiring no cloud services or external dependencies.
Future Enhancements
Several directions could extend this platform:
- Alerting: Configure Metabase alerts when metrics exceed thresholds (e.g., session duration drops significantly)
- Semantic Layer: Add a metrics layer using dbt Semantic Layer or Cube.js for consistent metric definitions
- Machine Learning: Export data to train models that predict task complexity or suggest optimal tools
- Multi-user Analytics: Track metrics across a development team rather than individual usage
- Cost Attribution: Correlate Claude usage with API costs for budgeting and optimization
Final Thoughts
Building an analytics platform is an investment. Each layer required careful design decisions, from the choice of SQLite for buffering to DuckDB for warehousing to Metabase for visualization. But the payoff is substantial: complete visibility into how you use AI-assisted development tools.
More importantly, this architecture is not specific to Claude. The patterns we explored, which include real-time sync, medallion architecture, and self-service BI, apply to any analytics use case. Swap MongoDB for PostgreSQL, change the dbt models, and you have an analytics platform for a SaaS product, an IoT system, or an e-commerce store.
I hope this series has demystified the modern data stack. The tools are mature, the patterns are proven, and the barriers to entry have never been lower. Build something great.
Suggested Tags: Metabase, DuckDB, Data Analytics, Business Intelligence, Data Engineering
This is Article 7 of 7 in the “Building a Modern Analytics Platform” series. Read the complete series to learn how to build production-ready data infrastructure from scratch.