Building the Foundation: Livepeer NaaP Analytics

I’m excited to share an update on what the Livepeer Cloud SPE has been working on: our treasury proposal for Network-as-a-Product (NaaP) MVP – SLA Metrics, Analytics, and Public Infrastructure has passed, and we’re deep into Milestone 1.

πŸ‘‰ Treasury Proposal (On-Chain): Livepeer Explorer

πŸ‘‰ Forum Discussion: Metrics and SLA Foundations for NaaP


The Problem: You Can’t Optimize What You Can’t Measure

Livepeer’s AI video infrastructure is growing fast. Gateways route inference jobs to orchestrators. GPUs process real-time video streams. The network is doing work.

But here’s the challenge: we lack a shared, network-wide view of performance, reliability, and demand that participants can use to assess Livepeer for production use.

Right now, there’s no centralized way to answer questions like:

  • What’s the average prompt-to-first-frame latency for a given orchestrator?
  • Which GPUs are delivering consistent 20+ FPS performance?
  • What’s the jitter coefficient across the network under load?
  • Are orchestrators meeting SLA targets for uptime and reliability?

Without this data, we’re flying blind. Gateway providers can’t set SLAs. Orchestrators can’t benchmark themselves. And external developers struggle to evaluate Livepeer for serious workloads.

This proposal changes that.


What We’re Building

The NaaP MVP delivers a focused, end-to-end metrics system for observability and learning. It’s designed to make the Livepeer network measurable, comparable, and trustworthy.

The Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         LIVEPEER NETWORK                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                        β”‚
β”‚  β”‚   Gateway    β”‚           β”‚ Orchestrator β”‚                        β”‚
β”‚  β”‚  (Daydream)  β”‚           β”‚   (GPU Node) β”‚                        β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜           β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜                        β”‚
β”‚         β”‚ Events                   β”‚ Events                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚                          β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β”‚       KAFKA         β”‚  ← Event streaming backbone
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β”‚   APACHE FLINK      β”‚  ← Stream processing & correlation
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β–Ό                     β–Ό
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚ClickHouseβ”‚          β”‚ MinIO   β”‚
    β”‚ (hot)    β”‚          β”‚ (cold)  β”‚
    β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β–Ό                             β–Ό
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚ Grafana  β”‚                β”‚ Public APIs β”‚
    β”‚Dashboard β”‚                β”‚ /gpu/metricsβ”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                β”‚ /sla/compliance
                                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                       β”‚
                                       β–Ό
                           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                           β”‚   Apps & Gateways     β”‚
                           β”‚ (Orchestrator Selection)
                           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The Stack

Component           Role
────────────────────────────────────────────────────────────────
Apache Kafka 3.9    Durable event log — ingests all streaming events
Apache Flink 1.20   Parses, transforms, and correlates events in real time
ClickHouse 24.11    Fast columnar database for analytics queries
MinIO               S3-compatible cold storage for audit trail and replay
Grafana 11.x        Visualization layer — dashboards for operators and network

This isn’t a toy. It’s designed to handle 1000+ events per second, with sub-second query latency and 90-day retention.
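To make the pipeline concrete, here is a minimal sketch in Python of the kind of event envelope a gateway or orchestrator might publish to Kafka. The field names and values are illustrative assumptions, not the actual schemas, which live in the project repository.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical event envelope -- field names are illustrative,
# not the published schema from livepeer-naap-analytics.
@dataclass
class StreamEvent:
    event_type: str      # e.g. "ai_stream_status"
    stream_id: str       # correlates gateway and orchestrator events
    orchestrator: str    # orchestrator identifier (placeholder here)
    gpu_id: str
    timestamp_ms: int    # epoch milliseconds
    fps: float
    latency_ms: float

    def to_kafka_value(self) -> bytes:
        """Serialize to the JSON bytes a Kafka producer would send."""
        return json.dumps(asdict(self)).encode("utf-8")

event = StreamEvent(
    event_type="ai_stream_status",
    stream_id="stream-abc123",
    orchestrator="0xOrchestrator",  # placeholder, not a real address
    gpu_id="gpu-0",
    timestamp_ms=1730000000000,
    fps=22.5,
    latency_ms=180.0,
)
payload = event.to_kafka_value()
```

Flink jobs then parse these JSON payloads off the Kafka topic, correlate them per stream, and write aggregates into ClickHouse for querying.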


Proposal Deliverables

The NaaP MVP covers five key areas:

1. Core SLA Metrics (MVP Scope)

A standardized set of network, performance, and reliability metrics sufficient to evaluate orchestrator and GPU behavior across workflows.

2. Network Test & Verification Signals

Reference load-test gateways generating consistent, reproducible performance signals, with public test scenarios captured on GitHub for community verification.

3. Analytics & Aggregation Layer

Lightweight ETL pipelines transforming raw metrics into network-level views. Derived indicators like jitter coefficient, latency percentiles, and uptime scores.
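To make these derived indicators concrete, here is one plausible way to compute a jitter coefficient (coefficient of variation of inter-frame intervals) and a latency percentile in Python. These definitions are a sketch; the pipeline's exact formulas may differ.

```python
import statistics

def jitter_coefficient(frame_times_ms):
    """Coefficient of variation of inter-frame intervals (stdev / mean).
    0.0 means perfectly even frame pacing. One plausible definition --
    the pipeline's actual formula may differ."""
    intervals = [b - a for a, b in zip(frame_times_ms, frame_times_ms[1:])]
    mean = statistics.mean(intervals)
    return statistics.stdev(intervals) / mean if mean else float("inf")

def percentile(samples, p):
    """Nearest-rank percentile, p in [0, 100]."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

# A perfectly paced 20 FPS stream: 50 ms between frames, zero jitter.
even = [0, 50, 100, 150, 200, 250]
print(jitter_coefficient(even))   # -> 0.0

latencies = [120, 150, 180, 200, 450, 130, 160, 140, 170, 155]
print(percentile(latencies, 95))  # p95 latency in ms
```

In practice these aggregations would run in Flink windows or ClickHouse queries rather than application code, but the math is the same.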

4. Public Dashboard & APIs

A standalone public dashboard presenting live and historical metrics. Crucially, read-only APIs that any application can consume for aggregate SLA scores, GPU performance data, and network demand metrics.

5. Operations & Stewardship

Ongoing operation of testing, analytics, and dashboard infrastructure. Maintenance and community support for 1 year.


The API Layer: Enabling Smart Orchestrator Selection

This is the part I’m most excited about. The APIs aren’t just for dashboards β€” they’re building blocks for the entire ecosystem.

The proposal includes public API endpoints that applications can consume:

Endpoint           Purpose
────────────────────────────────────────────────────────────────
/gpu/metrics       Real-time per-GPU performance metrics (FPS, latency, jitter)
/network/demand    Aggregate network demand and capacity data
/sla/compliance    SLA compliance scores for any orchestrator
/datasets          Public load test datasets for verification

What This Enables

For Gateway Providers:

  • Query real-time SLA scores before routing workloads
  • Select high-performing orchestrators automatically
  • Avoid underperforming GPUs based on historical data

For Orchestrators:

  • Benchmark against network averages
  • Identify performance gaps
  • Prove their reliability with verifiable data

For Builders:

  • Build custom tools on top of the metrics API
  • Create specialized dashboards for specific use cases
  • Integrate network health into their applications

This is the data layer that will power intelligent workload routing. When a gateway needs to select an orchestrator, it won’t be guessing β€” it’ll be querying live performance data.
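A gateway-side selection loop over /sla/compliance data could look roughly like the sketch below. The response shape, field names, and thresholds are assumptions for illustration; the real API schemas will be defined by the project.

```python
# Hypothetical consumer of the /sla/compliance endpoint. The JSON shape
# below is an assumption for illustration, not the published API schema.
def fetch_sla_scores():
    """Stand-in for an HTTP GET against /sla/compliance."""
    return [
        {"orchestrator": "0xAAA...", "uptime_pct": 99.9,
         "p95_latency_ms": 180, "sla_score": 0.97},
        {"orchestrator": "0xBBB...", "uptime_pct": 97.1,
         "p95_latency_ms": 420, "sla_score": 0.81},
        {"orchestrator": "0xCCC...", "uptime_pct": 99.5,
         "p95_latency_ms": 210, "sla_score": 0.93},
    ]

def select_orchestrator(min_uptime=99.0, max_p95_ms=300):
    """Filter by hard SLA floors, then pick the best composite score."""
    candidates = [
        o for o in fetch_sla_scores()
        if o["uptime_pct"] >= min_uptime and o["p95_latency_ms"] <= max_p95_ms
    ]
    if not candidates:
        return None  # fall back to the gateway's default routing
    return max(candidates, key=lambda o: o["sla_score"])

best = select_orchestrator()
print(best["orchestrator"])  # the highest-scoring orchestrator meeting the floors
```

The design point is the two-stage filter: hard SLA floors eliminate unacceptable nodes first, and only then does the gateway rank the survivors, so a high composite score can never mask a blown latency or uptime target.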


Timeline & Milestones

Duration: ~6 months (work began November 2025)

Milestone                               Target          Status
────────────────────────────────────────────────────────────────
M1: Metrics Collection & Aggregation    February 2026   🟡 In Progress
M2: Test Signals & Derived Analytics    March 2026      Upcoming
M3: Stabilization & Review              April 2026      Upcoming

Milestone 1 Progress (Current Focus)

We’ve built the initial infrastructure for data ingestion, processing, and deployment:

  • βœ… Event ingestion pipeline (Kafka + Flink + ClickHouse)
  • βœ… Schema design for all 7 event types
  • βœ… Basic Grafana dashboard provisioning
  • βœ… End-to-end data flow validated
  • πŸ”„ E2E latency calculation (stream trace correlation)
  • πŸ”„ DLQ (Dead Letter Queue) for failed event parsing

Code: Cloud-SPE/livepeer-naap-analytics

Project Board: GitHub Projects

Milestone Tracking: GitHub Milestone


The Bigger Picture: Network-as-a-Product Vision

This work is part of a broader vision from Livepeer Inc and the Livepeer Foundation to transform the protocol into a true Network-as-a-Product (NaaP).

The NaaP vision defines three core components:

  1. Permissionless Livepeer Protocol β€” Orchestrators enroll GPUs with clear SLA requirements and compensation structures
  2. Public Monitorable SLA Framework β€” Users get assurance that inference requests meet pre-agreed, published SLAs
  3. Workload Management Utility β€” Tools for deploying, executing, analyzing, and managing AI workloads

What we’re building (Milestone 1) is the foundation for Component #2 β€” the measurement layer that makes everything else possible.

Future Milestones (Beyond This Proposal)

The NaaP roadmap extends well beyond metrics:

  • Milestone 2: SLA-based scoring, selection algorithms, and incentive frameworks
  • Milestone 3: Workload control plane with deployment, lifecycle, and security management
  • Milestone 4: Complex multi-GPU workload handling with cluster-based redundancy

Our job right now is to get the fundamentals right. You can’t build SLA-aware routing without SLA data. You can’t score orchestrators without metrics. You can’t scale intelligently without visibility.


Why This Matters

Livepeer is transitioning from a transcoding network to a full AI video compute platform. That’s a massive shift.

But you can’t run a production-grade network without production-grade observability. SLAs require data. Optimization requires benchmarks. Trust requires transparency.

This infrastructure lays the groundwork for:

  • Gateway SLAs β€” Providers can offer quality guarantees backed by real data
  • Orchestrator accountability β€” Operators can prove their performance
  • Demand routing β€” Route jobs to the best-performing GPUs
  • Network visibility β€” Everyone can see how the network behaves

As the proposal states: β€œBy focusing on shared measurement rather than enforcement or protocol change, this work aims to give the Livepeer ecosystem a common understanding of network behavior today β€” and a solid foundation for deciding what to build next.”


Get Involved

This is a Cloud SPE initiative, but the code is open and contributions are welcome.

If you’re an orchestrator operator, a gateway provider, or just interested in decentralized infrastructure β€” I’d love to hear your thoughts.


This is my primary focus over the next few weeks. Building the foundation. Shipping milestones. Making the network measurable.

Let’s keep the conversation going!
Share your thoughts or ask questions on Twitter (now X.com) at @mikezupper.