NNO Observability
Date: 2026-03-30
Status: Partially Implemented (Phase 1). Structured logging is deployed; Logpush, CFAE, and dashboards are Phase 2
Parent: System Architecture
Scope: All NNO core services + every provisioned platform Worker
Phase 1 implemented:
packages/logger is live with the Logger class, the requestLogger Hono middleware, and x-trace-id propagation. The gateway also has requestIdMiddleware (generating/propagating x-request-id) and tracingMiddleware (distributed tracing context via @neutrino-io/core/tracing) as deployed observability primitives. These are deployed in 6 NNO core services (IAM, Registry, Billing, Provisioning, CLI Service, Gateway).
Phase 2 not yet implemented: Cloudflare Logpush configuration, Analytics Engine (CFAE) bindings, SLOs, alerting, and observability dashboards are designed below and planned for Phase 2.
Overview
Neutrino's observability stack is built on three Cloudflare-native pillars:
| Pillar | Technology | Purpose |
|---|---|---|
| Logs | Structured console.log → Cloudflare Logpush → R2 / external SIEM | Audit trail, debugging, security forensics |
| Metrics | Cloudflare Analytics Engine (CFAE) | Real-time operational signals, usage metering, SLO tracking |
| Traces | x-trace-id propagation + CFAE span events | Cross-service request correlation |
Each pillar serves two audiences:
- NNO operators — Visibility across all platforms, all services, all tenants. Used for platform health monitoring, incident response, and capacity planning.
- Platform admins — Scoped visibility into their platform only. Accessible via the NNO Portal observability dashboard. No access to other platforms' data or NNO internal service internals.
1. Structured Logging [Phase 1]
1.1 Log Format
Every NNO service and every provisioned platform Worker emits logs as newline-delimited JSON (NDJSON). Cloudflare Workers capture console.log() output and include it in Logpush streams.
// packages/logger/src/logger.ts
export type LogLevel = "info" | "warn" | "error" | "debug";
export interface LogEntry {
timestamp: string; // ISO 8601
level: LogLevel;
service: string; // e.g. 'registry' | 'provisioning'
traceId?: string; // x-trace-id propagated across services
requestId?: string; // x-request-id per request
message: string;
[key: string]: unknown; // arbitrary structured data spread into entry
}
The LogEntry is intentionally flat: the Logger spreads any extra data fields directly into the entry (via ...data) rather than nesting them under a data key. This keeps log lines compact and easily queryable.
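Because extra fields sit at the top level, downstream tooling can read them directly after a single JSON.parse. The sketch below assumes nothing beyond the LogEntry shape above. parseLogLine is a hypothetical helper and is not an export of packages/logger. It rejects any line that is not a JSON object or that lacks the required fields.

```typescript
// Hypothetical helper (not part of packages/logger): parse one NDJSON line
// back into a LogEntry-shaped object, or return null if it is not one.
type LogLevel = "info" | "warn" | "error" | "debug";

interface LogEntry {
  timestamp: string;
  level: LogLevel;
  service: string;
  traceId?: string;
  requestId?: string;
  message: string;
  [key: string]: unknown; // flat extra fields, e.g. status, duration
}

const LEVELS: readonly string[] = ["info", "warn", "error", "debug"];

function parseLogLine(line: string): LogEntry | null {
  let value: unknown;
  try {
    value = JSON.parse(line);
  } catch {
    return null; // not JSON at all (e.g. a stray plain-text console.log)
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return null;
  }
  const obj = value as Record<string, unknown>;
  const { timestamp, level, service, message } = obj;
  const valid =
    typeof timestamp === "string" &&
    typeof level === "string" &&
    LEVELS.includes(level) &&
    typeof service === "string" &&
    typeof message === "string";
  return valid ? (obj as LogEntry) : null;
}
```

With the flat layout, a response log's duration is simply parseLogLine(line)?.duration, with no nested data key to unwrap.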
1.2 Logger Implementation
// packages/logger/src/logger.ts
export class Logger {
constructor(
private readonly service: string,
private readonly traceId?: string,
private readonly requestId?: string,
) {}
private emit(
level: LogLevel,
message: string,
data?: Record<string, unknown>,
): void {
const entry: LogEntry = {
timestamp: new Date().toISOString(),
level,
service: this.service,
...(this.traceId !== undefined && { traceId: this.traceId }),
...(this.requestId !== undefined && { requestId: this.requestId }),
message,
...data, // extra fields are spread into the top-level entry
};
const output = JSON.stringify(entry);
if (level === "error") {
console.error(output);
} else {
console.log(output);
}
}
info(message: string, data?: Record<string, unknown>): void {
this.emit("info", message, data);
}
warn(message: string, data?: Record<string, unknown>): void {
this.emit("warn", message, data);
}
error(message: string, data?: Record<string, unknown>): void {
this.emit("error", message, data);
}
debug(message: string, data?: Record<string, unknown>): void {
this.emit("debug", message, data);
}
}
/** Convenience factory — creates a Logger without trace/request IDs */
export function createLogger(service: string): Logger {
return new Logger(service);
}
Key differences from the previous documentation:
- The constructor takes (service: string, traceId?: string, requestId?: string), not a context object with version, platformId, etc.
- Error-level logs use console.error(); all others use console.log().
- Extra data fields are spread directly into the top-level log entry (not nested under data).
- traceId and requestId are only included in the entry when defined (conditional spread).
- No child(), request(), or fatal() methods exist.
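The Logger writes straight to console.log and console.error, so unit tests can assert on its output by temporarily swapping those two functions. What follows is a minimal sketch. captureConsole is a hypothetical test utility and is not part of packages/logger.

```typescript
// Hypothetical test helper: temporarily replaces console.log / console.error
// so that everything a Logger emits during fn() can be inspected.
interface Captured {
  stdout: string[]; // lines written via console.log (info/warn/debug)
  stderr: string[]; // lines written via console.error (error level)
}

function captureConsole(fn: () => void): Captured {
  const captured: Captured = { stdout: [], stderr: [] };
  const originalLog = console.log;
  const originalError = console.error;
  console.log = (...args: unknown[]) => {
    captured.stdout.push(args.map(String).join(" "));
  };
  console.error = (...args: unknown[]) => {
    captured.stderr.push(args.map(String).join(" "));
  };
  try {
    fn();
  } finally {
    // Always restore the real console, even if fn throws
    console.log = originalLog;
    console.error = originalError;
  }
  return captured;
}
```

Consider wrapping new Logger("registry").error("boom", { code: 42 }) in captureConsole. It should yield one stderr line whose parsed object has code: 42 at the top level and no traceId key, following the routing and conditional-spread rules listed above.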
1.3 Hono Request Logging Middleware
All NNO service Workers and platform Workers use a shared middleware that logs every request:
// packages/logger/src/middleware.ts
import type { MiddlewareHandler } from "hono";
import { Logger } from "./logger.js";
import { initTrace } from "./trace.js";
export function requestLogger(service: string): MiddlewareHandler {
return async (c, next) => {
const start = Date.now();
const { traceId, requestId } = initTrace(c.req.raw);
// Set on Hono context — available to all downstream handlers
c.set("traceId", traceId);
c.set("requestId", requestId);
const logger = new Logger(service, traceId, requestId);
c.set("logger", logger);
logger.info("→ request", { method: c.req.method, path: c.req.path });
await next();
logger.info("← response", {
method: c.req.method,
path: c.req.path,
status: c.res.status,
duration: Date.now() - start,
});
};
}
Key differences from the previous documentation:
- Takes service: string (not a Logger instance) and creates the Logger internally after extracting trace context.
- Calls initTrace(c.req.raw) to extract or generate both traceId and requestId.
- Sets three values on the Hono context: traceId, requestId, and logger. Downstream handlers access the logger via c.get("logger").
- Emits both a request log (→ request) and a response log (← response) with duration.
1.4 Metrics Recording
packages/logger/src/metrics.ts provides a lightweight helper for recording metric data points. In production it writes to the Cloudflare Analytics Engine dataset nno_metrics; in development (or when the binding is unavailable) it falls back to structured console.log output.
// packages/logger/src/metrics.ts
export interface MetricLabels {
[key: string]: string;
}
export function recordMetric(
name: string, // e.g. 'gateway.request.count'
value: number, // count, latency ms, bytes, etc.
labels: MetricLabels, // key/value dimensions for filtering
env?: { NNO_METRICS?: AnalyticsDataset }, // CF Workers env binding
): void;
Behaviour:
- Production (env.NNO_METRICS present): calls env.NNO_METRICS.writeDataPoint() with the metric name and label values as blobs, the numeric value as doubles, and the metric name as indexes. The call is wrapped in a try/catch so metric failures never throw.
- Development (no binding): emits { metric, value, labels, timestamp } via console.log (not via Logger, to avoid a circular dependency).
Note: recordMetric is not currently re-exported from the barrel index.ts. Import it directly: import { recordMetric } from '@neutrino-io/logger/metrics'.
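The behaviour described above can be sketched as follows. This is an illustration, not the source of metrics.ts. Several pieces are assumptions:
- The dataset interface is a local stand-in for the Workers Analytics Engine binding type.
- recordMetricSketch is a hypothetical name.
- Label keys are sorted so that the same label always lands in the same blobN column.
- The development fallback emits a JSON string.

```typescript
// Sketch of the documented recordMetric behaviour (not the real source).
interface AnalyticsDataPoint {
  blobs?: string[];
  doubles?: number[];
  indexes?: string[];
}
// Local stand-in for the Workers Analytics Engine dataset binding
interface AnalyticsDataset {
  writeDataPoint(point: AnalyticsDataPoint): void;
}
interface MetricLabels {
  [key: string]: string;
}

function recordMetricSketch(
  name: string,
  value: number,
  labels: MetricLabels,
  env?: { NNO_METRICS?: AnalyticsDataset },
): void {
  // Assumption: sort label keys so each label keeps a stable blob position
  const labelValues = Object.keys(labels).sort().map((k) => labels[k]);
  if (env?.NNO_METRICS) {
    try {
      env.NNO_METRICS.writeDataPoint({
        blobs: [name, ...labelValues],
        doubles: [value],
        indexes: [name],
      });
    } catch {
      // Metric failures must never break the request path
    }
    return;
  }
  // Development fallback: plain console.log, not Logger (no circular import)
  console.log(
    JSON.stringify({ metric: name, value, labels, timestamp: new Date().toISOString() }),
  );
}
```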
Phase 2 additions to audit infrastructure: The Registry audit_log table gains dedicated columns actor_email TEXT, ip_address TEXT, user_agent TEXT for queryability. Historical rows will have NULL in these columns; the existing metadata JSON column already captures this data for pre-Phase 2 entries. The platform_lifecycle_events table (Phase 2) provides a dedicated lifecycle audit trail separate from the general audit_log — it records every platform status transition with actor, trigger type, and reason.
1.5 What Each Service Logs
NNO Registry
| Event | Level | Key fields |
|---|---|---|
| Resource created/updated/deleted | info | resourceType, resourceId, platformId |
| Manifest fetched | debug | platformId, entityId, featureCount |
| Audit log write | debug | action, actorId |
| Query timeout (>500ms) | warn | query, durationMs |
| Internal error | error | error.stack |
NNO Provisioning
| Event | Level | Key fields |
|---|---|---|
| Job created | info | jobId, operation, platformId |
| Step started/completed | info | jobId, step, durationMs |
| Step failed | error | jobId, step, error, willRollback |
| Rollback started/completed | warn | jobId, stepsToRollback |
| CF API rate limit hit | warn | endpoint, retryAfterMs |
| CF API call | debug | method, endpoint, status, durationMs |
NNO CLI Service
| Event | Level | Key fields |
|---|---|---|
| Repo created | info | platformId, repoUrl |
| Feature config committed | info | platformId, featureId, commitSha |
| CF Pages build triggered | info | platformId, buildId |
| GitHub API error | error | endpoint, status, error |
Platform Auth Workers
| Event | Level | Key fields |
|---|---|---|
| Login success/failure | info | platformId, userId, method, result |
| Session created/invalidated | info | platformId, userId, sessionId |
| Permission denied | warn | platformId, userId, permission |
| 2FA triggered | info | platformId, userId, method |
Auth events are also written to the auth D1 audit_authentication and audit_authorization tables (90-day retention) by the existing audit middleware. The structured log provides real-time streaming; D1 provides queryable history.
Platform Feature Workers
| Event | Level | Key fields |
|---|---|---|
| Request handled | info | platformId, entityId, featureId, status, durationMs |
| Auth validation failure | warn | platformId, featureId, reason |
| D1 query slow (>200ms) | warn | featureId, query, durationMs |
| Unhandled error | error | featureId, error.stack |
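Several rows above are threshold-based warnings, such as Registry queries over 500ms and feature Worker D1 queries over 200ms. A small timing wrapper is one way to produce these warnings consistently. The sketch assumes a warn callback compatible with Logger.warn. timedQuery and its parameters are hypothetical, and the clock is injectable so the threshold logic can be tested deterministically.

```typescript
// Hypothetical helper: time an async query and emit a warn-level entry when it
// exceeds a threshold (200ms default, matching the feature Worker D1 row).
type WarnFn = (message: string, data: Record<string, unknown>) => void;

async function timedQuery<T>(
  featureId: string,
  query: string,
  run: () => Promise<T>,
  warn: WarnFn,
  thresholdMs = 200,
  now: () => number = Date.now,
): Promise<T> {
  const start = now();
  try {
    return await run();
  } finally {
    // Runs on success and on failure, so slow failing queries are reported too
    const durationMs = now() - start;
    if (durationMs > thresholdMs) {
      warn("D1 query slow", { featureId, query, durationMs });
    }
  }
}
```

For example, a feature Worker might call timedQuery("analytics", sql, () => stmt.all(), (m, d) => logger.warn(m, d)), where stmt is a D1 prepared statement. Passing thresholdMs = 500 gives the Registry's timeout warning.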
2. Cloudflare Logpush [Phase 2]
Cloudflare Logpush streams real-time Worker logs (including console.log output and HTTP request fields) to a configurable destination.
2.1 Logpush Destinations
| Audience | Destination | Retention |
|---|---|---|
| NNO internal (all services) | R2 bucket nno-logs-internal | 90 days |
| Per-platform logs (for admins) | R2 bucket nno-logs-{platformId} | 30 days |
| Security/SIEM (optional) | Datadog / Splunk / custom HTTPS endpoint | Per SIEM policy |
2.2 NNO Internal Logpush Configuration
One Logpush job per NNO core service, configured via Cloudflare API at provisioning time:
// Called once during NNO core service deployment
async function createLogpushJob(
  accountId: string,
  cfApiToken: string,
  workerName: string,
  r2BucketName: string,
): Promise<void> {
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/logpush/jobs`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${cfApiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        name: `logpush-${workerName}`,
        // R2 destinations also require account-id, access-key-id and
        // secret-access-key query parameters (omitted here for brevity).
        destination_conf: `r2://${r2BucketName}/{DATE}/{HOUR}/{filename}`,
        dataset: "workers_trace_events",
        // Only ship trace events emitted by this Worker script
        filter: JSON.stringify({
          where: {
            key: "ScriptName",
            operator: "eq",
            value: workerName,
          },
        }),
        logpull_options:
          "fields=Event,EventTimestampMs,Outcome,Logs,ScriptName,Ray",
        enabled: true,
      }),
    },
  );
  if (!res.ok) {
    // Fail the deployment step instead of silently skipping log shipping
    throw new Error(
      `Logpush job creation failed: ${res.status} ${await res.text()}`,
    );
  }
}
2.3 Platform Logpush Configuration
NNO Provisioning creates a Logpush job for each platform's Workers during platform provisioning. All Workers belonging to a platform (auth, feature Workers) are collected into one per-platform R2 bucket:
nno-logs-{platformId}/
├── 2026-02-22/
│ ├── 00/
│ │ └── {platformId}-auth-prod-{uuid}.json.gz
│ ├── 01/
│ │ └── {platformId}-analytics-prod-{uuid}.json.gz
│ └── ...
└── 2026-02-23/
    └── ...
Platform admins can download log files directly from R2 via a signed URL generated by the NNO Portal. In Phase 2, a basic log search UI is provided in the Portal.
2.4 Logpush Record Format
Each Logpush record from Cloudflare Workers contains:
{
  "ScriptName": "k3m9p2xw7q-r8n4t6y1z5-analytics-prod",
  "Ray": "8b2e4f1a2b3c4d5e",
  "Outcome": "ok",
  "EventTimestampMs": 1740220440123,
  "Logs": [
    {
      "Level": "log",
      "Message": [
        "{\"timestamp\":\"2025-02-22T10:34:00.100Z\",\"level\":\"info\",\"service\":\"analytics\",\"traceId\":\"3f2b8c1e-6a4d-4e9f-b1c7-5d0a9e2f4b68\",\"requestId\":\"9a1c4e7b-2d3f-4a8e-b6c5-0f1e2d3c4b5a\",\"message\":\"← response\",\"method\":\"GET\",\"path\":\"/api/data\",\"status\":200,\"duration\":45}"
      ],
      "TimestampMs": 1740220440100
    }
  ]
}
The Logs[*].Message[0] field contains the JSON-stringified LogEntry emitted by the Worker's logger. NNO log tooling parses this to extract structured fields.
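That parsing step can be sketched as below. The sketch assumes each Logpush file has already been decompressed into NDJSON text, with one record per line. The record interface lists only the fields this document shows. extractEntries and extractFromNdjson are hypothetical helpers. Log lines whose first message argument is not a JSON object (plain-text console output, for example) are skipped.

```typescript
// Sketch: pull structured LogEntry objects out of workers_trace_events records.
interface LogpushLog {
  Level: string;
  Message: unknown[];
  TimestampMs: number;
}
interface LogpushRecord {
  ScriptName: string;
  Ray?: string;
  Outcome: string;
  EventTimestampMs: number;
  Logs: LogpushLog[];
}
interface ExtractedEntry {
  scriptName: string;
  ray?: string;
  entry: Record<string, unknown>; // the parsed LogEntry
}

function extractEntries(record: LogpushRecord): ExtractedEntry[] {
  const out: ExtractedEntry[] = [];
  for (const log of record.Logs) {
    const first = log.Message[0];
    if (typeof first !== "string") continue;
    try {
      const parsed: unknown = JSON.parse(first);
      if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
        out.push({ scriptName: record.ScriptName, ray: record.Ray, entry: parsed as Record<string, unknown> });
      }
    } catch {
      // Plain-text console output: not emitted by the NNO Logger, skip it
    }
  }
  return out;
}

// One Logpush record per non-empty line of the decompressed file
function extractFromNdjson(text: string): ExtractedEntry[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .flatMap((line) => extractEntries(JSON.parse(line) as LogpushRecord));
}
```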
3. Cloudflare Analytics Engine (Metrics) [Phase 2]
3.1 CFAE Datasets
One Analytics Engine dataset per logical domain. Dataset names follow the NNO naming convention:
| Dataset | Workers that write to it | Key measurements |
|---|---|---|
| nno-core-ops | Registry, Provisioning, CLI Service, Stack Registry, IAM | Operation counts, durations, error rates per NNO service |
| {platformId}-usage | All Workers for a given platform | Per-feature invocation counts (reused from billing metering) |
| {platformId}-perf | All Workers for a given platform | Response latency percentiles per feature + endpoint |
| nno-auth-events | All Auth Workers across all platforms | Login events, session counts (anonymised), 2FA usage |
| nno-builds | NNO CLI Service | CF Pages build outcomes, durations |
3.2 Data Point Schema
nno-core-ops
analytics.writeDataPoint({
blobs: [
service, // blob1: e.g. 'registry'
operation, // blob2: e.g. 'GET /platforms'
outcome, // blob3: 'success' | 'error' | 'timeout'
platformId, // blob4: which platform the operation was for (or 'nno-internal')
],
doubles: [
1, // double1: request count (always 1 per data point)
durationMs, // double2: response time in ms
isError ? 1 : 0, // double3: error flag
],
indexes: [service],
});
{platformId}-perf
analytics.writeDataPoint({
blobs: [
featureId, // blob1: e.g. 'analytics'
endpoint, // blob2: e.g. 'GET /api/data'
String(status), // blob3: HTTP status code
entityId, // blob4: tenant
],
doubles: [1, durationMs, status >= 500 ? 1 : 0],
indexes: [featureId],
});
3.3 Querying CFAE
NNO Portal queries CFAE via the Analytics Engine SQL API:
// Error rate for all features on a platform (last 24h)
const sql = `
SELECT
blob1 AS feature_id,
SUM(double1) AS total_requests,
SUM(double3) AS error_count,
AVG(double2) AS avg_duration_ms,
quantileWeighted(0.95)(double2, double1) AS p95_duration_ms
FROM ${platformId}_perf
WHERE timestamp >= now() - INTERVAL '24' HOUR
GROUP BY blob1
ORDER BY total_requests DESC
`;
// Provisioning job success rate (last 7 days)
const sql = `
SELECT
blob2 AS operation,
SUM(double1) AS total,
SUM(double3) AS errors,
ROUND(100.0 * SUM(double3) / SUM(double1), 2) AS error_pct
FROM nno_core_ops
WHERE blob1 = 'provisioning'
AND timestamp >= now() - INTERVAL '7' DAY
GROUP BY blob2
`;
4. Distributed Tracing [Phase 1]
Cloudflare Workers do not support OpenTelemetry natively (no spans, no trace context propagation built-in). NNO implements a lightweight tracing model using HTTP headers and CFAE span events.
4.1 Trace ID Propagation
An x-trace-id header is generated at the NNO Gateway and propagated through every downstream service call:
Client request
→ NNO Gateway x-trace-id: <uuid> x-request-id: <uuid> (generated here via crypto.randomUUID())
→ NNO Registry x-trace-id: <uuid> x-request-id: <uuid> (forwarded via withTraceHeaders)
→ NNO Provisioning x-trace-id: <uuid> x-request-id: <uuid> (forwarded)
        → CF API call (external, trace stops)
If a request already carries x-trace-id (e.g., from the NNO CLI), it is preserved and used throughout.
// packages/logger/src/trace.ts
export function initTrace(request: Request): {
traceId: string;
requestId: string;
} {
const traceId = request.headers.get("x-trace-id") ?? crypto.randomUUID();
const requestId =
request.headers.get("x-request-id") ?? crypto.randomUUID();
return { traceId, requestId };
}
export function withTraceHeaders(
headers: Headers | [string, string][] | Record<string, string> | undefined,
traceId: string,
requestId: string,
): Headers {
const result = new Headers(headers);
result.set("x-trace-id", traceId);
result.set("x-request-id", requestId);
return result;
}
Key differences from the previous documentation:
- No global TRACE_STORE: there is no module-level Map. Trace context is passed explicitly via function arguments and the Hono context, not stored globally.
- No currentTrace(): this function does not exist. Services access traceId/requestId from the Hono context (c.get("traceId"), c.get("requestId")) or pass them explicitly.
- initTrace uses crypto.randomUUID() (Web Crypto API, available in all CF Workers), not nanoid. IDs have no tr_/req_ prefix.
- initTrace reads both the x-trace-id and x-request-id headers from the incoming request, falling back to crypto.randomUUID() for each.
- withTraceHeaders requires explicit traceId and requestId arguments; it does not read from a global store. It sets both x-trace-id and x-request-id on the outgoing headers.
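A sketch of how a service might forward trace context on an outbound call follows. withTraceHeaders is reproduced from trace.ts above, with the same parameter types, so that the block is self-contained. tracedFetch, FetchLike, and the injectable fetchImpl parameter are hypothetical additions for illustration and testing.

```typescript
// withTraceHeaders: copied from packages/logger/src/trace.ts (see above)
function withTraceHeaders(
  headers: Headers | [string, string][] | Record<string, string> | undefined,
  traceId: string,
  requestId: string,
): Headers {
  const result = new Headers(headers);
  result.set("x-trace-id", traceId);
  result.set("x-request-id", requestId);
  return result;
}

// Hypothetical wrapper types; fetchImpl is injectable so tests can stub it
type OutboundInit = { method?: string; headers?: Record<string, string>; body?: string };
type FetchLike = (
  url: string,
  init: { method?: string; headers: Headers; body?: string },
) => Promise<Response>;

function tracedFetch(
  url: string,
  init: OutboundInit,
  ctx: { traceId: string; requestId: string }, // e.g. from c.get("traceId")
  fetchImpl: FetchLike = fetch,
): Promise<Response> {
  return fetchImpl(url, {
    ...init,
    headers: withTraceHeaders(init.headers, ctx.traceId, ctx.requestId),
  });
}
```

Because withTraceHeaders uses Headers.set, any stale x-trace-id or x-request-id already present in init.headers is overwritten rather than duplicated.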
4.2 CFAE Span Events [Phase 2]
Not yet implemented. The spans.ts file does not exist in packages/logger. The emitSpan function and CFAE span dataset are planned for Phase 2 alongside the broader Analytics Engine integration. The design below is the target specification.
For operations spanning multiple async steps (provisioning jobs, stack activation pipeline), NNO will emit span events to CFAE:
// packages/logger/src/spans.ts (Phase 2 — not yet implemented)
export function emitSpan(
analytics: AnalyticsEngineDataset,
span: {
traceId: string;
spanId: string;
parentSpanId?: string;
service: string;
operation: string;
startMs: number;
endMs: number;
outcome: "ok" | "error";
platformId?: string;
},
): void {
analytics.writeDataPoint({
blobs: [
span.traceId,
span.spanId,
span.parentSpanId ?? "",
span.service,
span.operation,
span.outcome,
span.platformId ?? "nno-internal",
],
doubles: [
span.endMs - span.startMs, // duration in ms
span.outcome === "error" ? 1 : 0,
],
indexes: [span.traceId],
});
}
With indexes: [traceId], all spans for a given trace will be fetchable efficiently:
-- All spans for a given trace (Phase 2)
SELECT blob4 AS service, blob5 AS operation,
double1 AS duration_ms, blob6 AS outcome,
blob2 AS span_id, blob3 AS parent_span_id
FROM nno_core_ops_spans
WHERE index1 = '3f2b8c1e-6a4d-4e9f-b1c7-5d0a9e2f4b68'
ORDER BY timestamp ASC
4.3 Trace Correlation with Logs
Because every log entry includes traceId, a single trace ID allows correlation of:
- All log lines across all NNO services that handled the request
- All CFAE span events for the operation
- The Logpush records for that specific Cloudflare Ray ID
This gives a complete picture of a single user action across the entire stack without a dedicated tracing backend.
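Suppose we have parsed log entries from several services, for example the output of a Logpush parser. Building the per-trace timeline is then a filter plus a sort. This sketch assumes the timestamps are the ISO 8601 UTC strings produced by toISOString(), as the Logger emits. timelineForTrace is a hypothetical name.

```typescript
// Sketch: merge log entries from several services into one timeline per trace.
interface TracedEntry {
  timestamp: string; // ISO 8601 UTC, as produced by toISOString()
  service: string;
  traceId?: string;
  message: string;
  [key: string]: unknown;
}

function timelineForTrace(entries: TracedEntry[], traceId: string): TracedEntry[] {
  return entries
    .filter((e) => e.traceId === traceId) // new array, input is not mutated
    // Fixed-width toISOString() strings sort chronologically as plain strings
    .sort((a, b) => (a.timestamp < b.timestamp ? -1 : a.timestamp > b.timestamp ? 1 : 0));
}
```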
5. Key Metrics & SLOs [Phase 2]
5.1 NNO Core Service SLOs
| Service | Metric | Target | Alert at |
|---|---|---|---|
| NNO Gateway | Request error rate (5xx) | < 0.1% | > 1% |
| NNO Gateway | p99 latency | < 500ms | > 1000ms |
| NNO Registry | Read p99 latency | < 100ms | > 300ms |
| NNO Registry | Write p99 latency | < 200ms | > 500ms |
| NNO Provisioning | Job success rate | > 99% | < 97% |
| NNO Provisioning | PROVISION_PLATFORM duration | < 120s | > 300s |
| NNO Provisioning | ACTIVATE_FEATURE duration | < 60s | > 180s |
| NNO CLI Service | Feature activation commit time | < 10s | > 30s |
| Stack Registry | Template publish p95 | < 5s | > 15s |
| Stack Registry | Version validation duration | < 60s | > 300s |
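Each row pairs a target with a looser alert threshold. The direction of "better" also differs between rows: error rate and latency should be low, while job success rate should be high. A sketch of classifying an observed value follows. evaluateSlo, SloRow, and the three status names are hypothetical. Values are expressed as fractions, so 0.1% is written as 0.001.

```typescript
// Sketch: classify an observed value against a row's Target and "Alert at".
type Direction = "lower-is-better" | "higher-is-better";
type SloStatus = "ok" | "off-target" | "alert";

interface SloRow {
  metric: string;
  direction: Direction;
  target: number; // e.g. 0.001 for "< 0.1%"
  alertAt: number; // e.g. 0.01 for "> 1%"
}

function evaluateSlo(row: SloRow, observed: number): SloStatus {
  const lowerIsBetter = row.direction === "lower-is-better";
  const breachesAlert = lowerIsBetter ? observed > row.alertAt : observed < row.alertAt;
  if (breachesAlert) return "alert";
  const meetsTarget = lowerIsBetter ? observed < row.target : observed > row.target;
  // Between target and alert threshold: missing the SLO but not yet paging
  return meetsTarget ? "ok" : "off-target";
}
```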
5.2 Platform Worker SLOs (per platform)
| Metric | Target | Notes |
|---|---|---|
| Auth Worker p95 latency | < 200ms | Login, session validation |
| Feature Worker p95 latency | < 300ms | Per activated feature |
| Feature Worker error rate | < 0.5% | 5xx responses |
| CF Pages build success rate | > 98% | Platform shell rebuild |
| CF Pages build duration | < 120s | Typical pnpm install + build |
5.3 Platform Shell SLOs (client-facing)
| Metric | Target | Notes |
|---|---|---|
| Shell TTFB (CF Pages CDN) | < 50ms | Static asset, fully cached |
| Auth session check latency | < 100ms | Cookie cache hit |
| Feature registry init time | < 50ms | Static imports, no network |
| Remote manifest fetch (Phase 2) | < 30ms | CDN-cached KV read |
6. Alerting [Phase 2]
Alerts are sent via email (NNO email Worker) and optionally to a Slack webhook or PagerDuty.
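For the Slack channel, Slack incoming webhooks accept an HTTP POST with a JSON body containing a text field. The sketch below assumes that minimal payload. sendSlackAlert and the injectable fetchImpl are hypothetical, and richer formatting (blocks, attachments) is out of scope.

```typescript
// Sketch: post an alert message to a Slack incoming webhook.
type PostFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

async function sendSlackAlert(
  webhookUrl: string, // e.g. AlertConfig.slackWebhookUrl
  text: string,
  fetchImpl: PostFn = fetch,
): Promise<void> {
  const res = await fetchImpl(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) {
    // Let the caller fall back to email or retry
    throw new Error(`Slack webhook returned ${res.status}`);
  }
}
```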
6.1 Alert Configuration
// Stored in NNO Registry — per-platform alert config
interface AlertConfig {
platformId: string;
email: string; // platform admin email
slackWebhookUrl?: string;
pagerdutyKey?: string;
alerts: {
errorRateThreshold: number; // e.g. 0.05 = 5%
latencyP99ThresholdMs: number; // e.g. 1000
buildFailureAlert: boolean;
provisioningFailureAlert: boolean;
usageAlerts: boolean; // from billing metering
};
}
6.2 Alert Categories
| Category | Trigger | Severity | Default recipients |
|---|---|---|---|
| Platform Worker error spike | Feature Worker 5xx rate > 5% in 5-min window | High | Platform admin |
| Auth Worker down | Auth Worker returning 0 requests for 2 min | Critical | Platform admin + NNO ops |
| CF Pages build failure | Build exits non-zero | Medium | Platform admin |
| Provisioning job failed | Job reaches FAILED state | High | NNO ops team |
| Provisioning job timeout | Job running > 10 min | Medium | NNO ops team |
| Registry D1 latency | p99 read > 300ms for 5 min | High | NNO ops team |
| Stack Registry validation failure rate | > 20% of publish attempts in 1 hour | Low | NNO ops team |
| Usage threshold | 50% / 75% / 90% / 100% of tier limit | Info → Critical | Platform admin |
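The first rule fires when the 5xx rate exceeds 5% within a 5-minute window. It reduces to the arithmetic below. In Phase 2 this would presumably be computed from CFAE data rather than from raw samples held in memory. The RequestSample type and both functions are hypothetical and only illustrate the windowing and threshold comparison.

```typescript
// Sketch: sliding-window 5xx rate and the "> 5% in 5-min window" check.
interface RequestSample {
  timestampMs: number;
  status: number; // HTTP status code
}

function errorRateInWindow(samples: RequestSample[], nowMs: number, windowMs = 5 * 60_000): number {
  // Keep samples in the half-open window (nowMs - windowMs, nowMs]
  const recent = samples.filter((s) => s.timestampMs > nowMs - windowMs && s.timestampMs <= nowMs);
  if (recent.length === 0) return 0;
  const errors = recent.filter((s) => s.status >= 500).length;
  return errors / recent.length;
}

function shouldAlert(samples: RequestSample[], nowMs: number, threshold = 0.05): boolean {
  return errorRateInWindow(samples, nowMs) > threshold;
}
```

A window with no requests returns 0 here. The "Auth Worker down" rule (zero requests for 2 minutes) therefore needs its own check rather than this error-rate test.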
6.3 Alert Message Format
[NNO ALERT] Platform k3m9p2xw7q — analytics Worker error rate critical
Platform: AcmeCorp (k3m9p2xw7q)
Feature: analytics
Alert: Worker error rate exceeded threshold
Threshold: 5%
Current: 12.4% (over last 5 minutes)
Time: 2026-02-22 10:34:00 UTC
Affected endpoints:
POST /api/data/export — 45% error rate
GET /api/report — 8% error rate
Action: Review logs at https://portal.nno.app/platforms/k3m9p2xw7q/observability
Trace a recent error: https://portal.nno.app/platforms/k3m9p2xw7q/logs?traceId=3f2b8c1e-6a4d-4e9f-b1c7-5d0a9e2f4b68
—
NNO Platform Monitoring
7. Platform Admin Dashboard (NNO Portal) [Phase 2]
Platform admins access observability via NNO Portal → Observability (/platforms/{id}/observability).
7.1 Overview Tab
- Health status — Traffic light per deployed Worker (auth, each feature) based on real-time error rate
- Request volume — Chart: total requests/hour across all platform Workers (last 24h, Recharts)
- Error rate — Chart: 5xx error rate per feature (last 24h)
- Active builds — CF Pages build status card with link to CF dashboard
7.2 Feature Performance Tab
Per-feature breakdown:
- Request rate (req/min), error rate (%), p50/p95/p99 latency
- Top 10 slowest endpoints
- Top 10 most-erroring endpoints
Data source: GET /api/observability/features?platformId={id}&range=24h
Backed by a CFAE query on the {platformId}-perf dataset.
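A sketch of how the Portal backend might call the Analytics Engine SQL API follows. It assumes the default JSON output, in which result rows appear under a data array. It also assumes an API token with read access to account analytics. queryCfae, SqlFetch, and the injectable fetchImpl are hypothetical.

```typescript
// Sketch: POST a SQL string to the Analytics Engine SQL API and return rows.
interface CfaeResult<Row> {
  data: Row[];
  rows: number;
}

type SqlFetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; text(): Promise<string>; json(): Promise<unknown> }>;

async function queryCfae<Row>(
  accountId: string,
  apiToken: string,
  sql: string, // e.g. one of the section 3.3 queries
  fetchImpl: SqlFetch = fetch,
): Promise<Row[]> {
  const res = await fetchImpl(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/analytics_engine/sql`,
    { method: "POST", headers: { Authorization: `Bearer ${apiToken}` }, body: sql },
  );
  if (!res.ok) {
    throw new Error(`CFAE SQL API ${res.status}: ${await res.text()}`);
  }
  const result = (await res.json()) as CfaeResult<Row>;
  return result.data;
}
```

The 3.3 queries interpolate platformId into the SQL text, so the backend should validate the ID against the expected format before building the query. Aggregate columns may also come back as strings, so converting them with Number() before doing arithmetic is a reasonable precaution.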
7.3 Logs Tab
- Search by: time range, feature, log level, trace ID, user ID, free-text
- Log viewer — Paginated list of structured log entries, expandable to show full JSON
- Download: export log files as .json.gz from R2 (signed URL, 15-min expiry)
Data source: Logpush files in the nno-logs-{platformId} R2 bucket, read via the NNO Portal backend.
Phase 1: Download only. Phase 2: Real-time search via a lightweight log indexing Worker.
7.4 Audit Log Tab
- All auth events (logins, logouts, org changes) from the audit_authentication table
- All authorization decisions from the audit_authorization table
- 90-day retention, paginated
Data source: GET /api/auth/admin/audit (auth Worker endpoint, platform-admin only)
8. NNO Operator Dashboard (Internal) [Phase 2]
NNO operators access a richer view via the NNO Portal's internal tools section (/internal/observability).
8.1 Cross-Platform Health
- Fleet overview — One row per active platform: Worker count, aggregate error rate, last build status, subscription tier
- Incident heatmap — Time × platform grid; cells coloured by error rate
8.2 Provisioning Monitor
- Active provisioning jobs with live step progress
- Failed jobs with rollback status and error details
- Historical job completion times (p50/p95 per operation type)
- CF API quota usage (Workers scripts deployed today vs. 200/day limit)
8.3 Stack Registry Pipeline
- Submission queue depth (PENDING + IN_REVIEW counts)
- Automated validation pass/fail rate (last 7 days)
- Average review cycle time (submission → approval)
- Recently approved/rejected packages
8.4 Trace Explorer
- Search by traceId across all CFAE span datasets
- Renders a waterfall diagram of span durations across services
- Links to Logpush records for the same trace
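The waterfall itself can be derived from the span rows returned by the 4.2 query. emitSpan stores only a duration, not a start time, and it writes its data point after the span ends. This sketch therefore assumes the row's CFAE timestamp approximates endMs, and derives each start as endMs - durationMs. buildWaterfall and both interfaces are hypothetical.

```typescript
// Sketch: turn span rows into waterfall bars (depth + offset + duration).
interface SpanRow {
  spanId: string;
  parentSpanId: string; // "" for a root span, as written by emitSpan
  service: string;
  operation: string;
  endMs: number; // assumption: approximated by the CFAE row timestamp
  durationMs: number;
}

interface WaterfallBar {
  spanId: string;
  label: string;
  depth: number; // number of ancestors present in the trace (0 = root)
  offsetMs: number; // start relative to the earliest span in the trace
  durationMs: number;
}

function buildWaterfall(spans: SpanRow[]): WaterfallBar[] {
  if (spans.length === 0) return [];
  const byId = new Map(spans.map((s) => [s.spanId, s] as const));
  const startOf = (s: SpanRow) => s.endMs - s.durationMs;
  const traceStart = Math.min(...spans.map(startOf));

  const depthOf = (span: SpanRow): number => {
    let depth = 0;
    let parent = byId.get(span.parentSpanId);
    const seen = new Set<string>([span.spanId]); // guards against parent cycles
    while (parent && !seen.has(parent.spanId)) {
      seen.add(parent.spanId);
      depth += 1;
      parent = byId.get(parent.parentSpanId);
    }
    return depth;
  };

  return [...spans]
    .sort((a, b) => startOf(a) - startOf(b))
    .map((s) => ({
      spanId: s.spanId,
      label: `${s.service} ${s.operation}`,
      depth: depthOf(s),
      offsetMs: startOf(s) - traceStart,
      durationMs: s.durationMs,
    }));
}
```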
9. Retention Policies
| Data store | Retention | Reasoning |
|---|---|---|
| CFAE metrics (all datasets) | 90 days rolling | CF Analytics Engine limit |
| Logpush to R2 (internal) | 90 days | NNO debugging and audit |
| Logpush to R2 (per platform) | 30 days | Platform admin access |
| Auth D1 audit tables | 90 days | Compliance requirement |
| Registry D1 audit_log | 1 year | Platform provisioning audit |
| Billing usage_snapshots | 2 years | Invoice dispute resolution |
| Billing invoices | 7 years | Legal/accounting requirement |
R2 lifecycle rules are configured at bucket creation to auto-delete objects beyond their retention window.
10. Wrangler Configuration
CFAE Binding (all NNO Workers + platform Workers)
# Added to every NNO service and every provisioned platform Worker template
[[analytics_engine_datasets]]
binding = "ANALYTICS"
dataset = "nno-core-ops" # NNO internal services
# Platform Workers use:
# dataset = "{platformId}-usage" (invocation metering — billing)
# dataset = "{platformId}-perf" (latency / error rate: observability)
R2 Bindings (NNO logging Worker)
# services/nno-logging/wrangler.toml
[[r2_buckets]]
binding = "INTERNAL_LOGS"
bucket_name = "nno-logs-internal"
# Per-platform log buckets are created dynamically at provisioning time:
# bucket_name = "nno-logs-{platformId}"
Secrets
| Secret | Description |
|---|---|
| CF_API_TOKEN | Token with Logpush:Edit permission (for job creation at provisioning) |
| CF_ACCOUNT_ID | Cloudflare account ID |
| ALERT_EMAIL | NNO ops team email for internal alerts |
| ALERT_SLACK_WEBHOOK | Slack webhook for NNO ops alerts |
| PAGERDUTY_KEY | PagerDuty routing key for critical alerts |
11. Implementation Phases
Phase 1 Current State
The following observability infrastructure is built and deployed:
| Component | Status | Details |
|---|---|---|
| packages/logger: Logger class | ✅ Live | Structured JSON log emission via console.log |
| packages/logger: requestLogger middleware | ✅ Live | Hono middleware logging every HTTP request |
| packages/logger: x-trace-id propagation | ✅ Live | Trace ID generated at gateway, forwarded to downstream services |
| packages/logger: recordMetric helper | ✅ Live | Writes to CF Analytics Engine nno_metrics dataset (console fallback in dev) |
| Deployed in IAM | ✅ Live | services/iam uses Logger + requestLogger |
| Deployed in Registry | ✅ Live | services/registry uses Logger + requestLogger |
| Deployed in Billing | ✅ Live | services/billing uses Logger + requestLogger |
| Deployed in Provisioning | ✅ Live | services/provisioning uses Logger + requestLogger |
| Deployed in CLI Service | ✅ Live | services/cli uses Logger + requestLogger |
| Deployed in Gateway | ✅ Live | services/gateway has requestIdMiddleware + tracingMiddleware (via @neutrino-io/core/tracing) |
The following are not yet implemented (Phase 2):
- Cloudflare Logpush jobs and R2 log destinations
- Cloudflare Analytics Engine (CFAE) dataset bindings and metric writes
- SLO definitions and alert configuration
- Platform admin observability dashboard (NNO Portal)
- NNO operator internal dashboard
Implementation delta: Observability Phase 1 Plan.
Status: Partially implemented — Phase 1 structured logging deployed; Phase 2 Logpush/CFAE/dashboards planned
Implementation target: packages/logger/ · services/logging/ · apps/console/
Related: System Architecture · NNO Provisioning · NNO Registry · NNO Billing & Metering · NNO Auth Model