# Audit Middleware — Staff Portal API

## What it does

A single middleware class, `AuditMiddleware`, is registered in the Staff Portal API's `main.py` after `AuthMiddleware`. It captures API calls and emits one CloudEvent to the Audit Manager service per audited call.

Key properties:

* **Never blocks the response.** Emission is scheduled with `asyncio.create_task` as fire-and-forget, so the response is returned to the caller without waiting for the audit POST to complete.
* **Never raises.** If Audit Manager is unreachable, slow, or returning errors, the failure is logged and never propagated. A broken audit pipeline cannot break the Staff Portal API.
* **Disabled by default.** Both `audit_enabled=true` and a non-empty `audit_manager_url` are required to actually emit events. The default setup is a no-op — safe to ship without configuring Audit Manager at all.
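The three properties above can be sketched with plain `asyncio`. This is a minimal illustration, not the middleware's actual code: `emit_in_background`, `_emit`, and the `send` callable are hypothetical names standing in for whatever performs the POST to Audit Manager.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("audit")

# Hypothetical type: any coroutine function that delivers one event.
Sender = Callable[[dict], Awaitable[None]]


async def _emit(send: Sender, event: dict) -> None:
    """Deliver one event; log and swallow every failure."""
    try:
        await send(event)
    except Exception as exc:  # deliberately broad: audit must never break the API
        logger.warning("Audit emission failed: %s", exc)


def emit_in_background(
    send: Sender, event: dict, enabled: bool, url: str
) -> Optional[asyncio.Task]:
    """Schedule emission without awaiting it; no-op unless fully configured."""
    if not (enabled and url):
        return None  # disabled by default: both conditions are required
    # Must be called from inside a running event loop.
    return asyncio.create_task(_emit(send, event))
```

The caller returns its response right after `emit_in_background(...)`; nothing awaits the task on the request path, and a failing `send` ends up as a log line rather than an exception.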

### Audit policy

| Request kind                                                                   | Audited?                                                                                                                                 |
| ------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------- |
| Authenticated (`request.state.auth` set), any outcome                          | **Yes**                                                                                                                                  |
| Anonymous + outcome non-2xx (rejected attempt)                                 | **Yes** — captured as `actor.type=anonymous` (or recovered from JWT on 403, see below). Toggle off via `audit_anonymous_failures=false`. |
| Anonymous + outcome 2xx (legitimate public endpoint)                           | No                                                                                                                                       |
| Health probes (`/ping`)                                                        | No                                                                                                                                       |
| OpenAPI surfaces (`/docs`, `/redoc`, `/openapi.json`, `/docs/oauth2-redirect`) | No                                                                                                                                       |
| `OPTIONS` preflight                                                            | No                                                                                                                                       |

**Why audit rejected anonymous calls?** They are attempted unauthorized access, which is exactly the signal a security review needs. Skipping legitimate anonymous traffic while capturing rejected anonymous traffic gives you compliance signal without flooding the audit store with bot pings or browser CORS preflights.

**Disabling anonymous-failure auditing.** Set `REGISTRY_STAFF_PORTAL_API_AUDIT_ANONYMOUS_FAILURES=false`. The middleware then reverts to the original "audit only authenticated user calls" rule.
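The policy table can be read as a single decision function. The sketch below is an illustration only: `should_audit` and `SKIPPED_PATHS` are hypothetical names, and it assumes the path and `OPTIONS` exclusions are checked before anything else.

```python
# Paths from the policy table that are never audited.
SKIPPED_PATHS = {"/ping", "/docs", "/redoc", "/openapi.json", "/docs/oauth2-redirect"}


def should_audit(
    method: str,
    path: str,
    status_code: int,
    authenticated: bool,
    audit_anonymous_failures: bool = True,
) -> bool:
    """Return True when the policy table says the call is audited."""
    # Assumption: exclusions apply regardless of who is calling.
    if method.upper() == "OPTIONS" or path in SKIPPED_PATHS:
        return False
    if authenticated:
        return True  # any outcome
    is_2xx = 200 <= status_code < 300
    # Anonymous: only rejected attempts, and only while the toggle is on.
    return audit_anonymous_failures and not is_2xx
```

With `audit_anonymous_failures=False` the function collapses to "authenticated calls only", which matches the original rule described above.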

### Recovering the real user on a 403

When a user has a **valid token but the wrong role**, the existing `AuthMiddleware` raises `ForbiddenError` *before* setting `request.state.auth` — so by default the audit would have no user context. The middleware handles this specially: on a `403` with a bearer token present, it **decodes the JWT payload itself** (without re-verifying the signature — `AuthMiddleware` already did that before raising) to recover `sub`, `name`, `preferred_username`, and the client roles. This is safe because we know the signature was validated; we're just reading what the upstream already accepted.

For `401` (no token, invalid signature, expired token), the JWT cannot be trusted, so the actor is recorded as `anonymous` with only the client IP preserved.
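Reading a JWT payload without re-verifying it needs only the standard library. The following is a sketch of the idea, not the middleware's helper: the function names are hypothetical, and it is only appropriate where the signature has already been checked upstream, as on the 403 path described above.

```python
import base64
import json


def decode_jwt_payload_unverified(token: str) -> dict:
    """Return the claims of a JWT WITHOUT checking its signature."""
    try:
        payload_b64 = token.split(".")[1]
        # JWTs strip base64url padding; restore it before decoding.
        padded = payload_b64 + "=" * (-len(payload_b64) % 4)
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except (IndexError, ValueError):
        return {}  # malformed token: caller falls back to an anonymous actor
    return claims if isinstance(claims, dict) else {}


def actor_from_claims(claims: dict, client_id: str) -> dict:
    """Map the claims named in this section onto actor fields."""
    roles = claims.get("resource_access", {}).get(client_id, {}).get("roles", [])
    return {
        "type": "user",
        "id": claims.get("sub"),
        "name": claims.get("name"),
        "username": claims.get("preferred_username"),
        "roles": roles,
    }
```

Returning an empty dict on any decoding problem keeps the helper consistent with the "never raises" property: a bad token degrades to an anonymous actor instead of an exception.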

## Where it sits in the middleware stack

```
Request
   │
   ▼
┌──────────────────────────────┐
│ AuditMiddleware (outermost)  │  ← added LAST in main.py
│  - dispatch starts           │
│  - calls call_next ──────────┼─┐
└──────────────────────────────┘ │
                                 ▼
┌──────────────────────────────┐
│ AuthMiddleware               │  ← validates JWT, sets request.state.auth
│  - dispatch starts           │
│  - calls call_next ──────────┼─┐
└──────────────────────────────┘ │
                                 ▼
                         FastAPI handler runs
                                 │
                  ┌──────────────┘
                  ▼
┌──────────────────────────────┐
│ AuthMiddleware (return path) │
└──────────────────────────────┘
                  │
                  ▼
┌──────────────────────────────┐
│ AuditMiddleware (return path)│  ← reads request.state.auth + response.status,
│  builds CloudEvent           │     fires async task to Audit Manager,
│  returns response to caller  │     returns response immediately
└──────────────────────────────┘
                  │
                  ▼
              Response
```

The order matters: audit must wrap auth (not the other way around) so that by the time we read `request.state.auth` after the response, it has been populated.
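The wrapping order can be modelled without any framework. The sketch below is a toy stand-in, not Starlette code: `build_stack`, `auth_layer`, and `audit_layer` are hypothetical, and the model simply assumes (as the diagram does) that the layer registered last ends up outermost.

```python
import asyncio


class State:
    """Stand-in for request.state."""
    auth = None


async def handler(state):
    return 200  # pretend status code


def auth_layer(inner):
    async def dispatch(state):
        state.auth = {"sub": "user-1"}  # populated on the way in
        return await inner(state)
    return dispatch


def audit_layer(inner, seen):
    async def dispatch(state):
        status = await inner(state)        # equivalent of call_next
        seen.append((state.auth, status))  # read on the return path
        return status
    return dispatch


def build_stack(app, layers):
    """Wrap in registration order, so the layer added last is outermost."""
    for layer in layers:
        app = layer(app)
    return app


seen = []
app = build_stack(handler, [auth_layer, lambda inner: audit_layer(inner, seen)])
status = asyncio.run(app(State()))
```

Because the audit layer is outermost, `state.auth` is already set when it resumes after `await inner(state)`. Reversing the two entries in `layers` would still record the call, but only because this toy auth layer never rejects; the point of the ordering is that audit sees the final response and state for every request.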

## What gets emitted (per call)

A single CloudEvents 1.0 envelope with the OpenG2P `data` conventions:

| Field                      | Source in the request                                                                                                                                                                                               |
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `id`                       | UUID4 generated by the middleware                                                                                                                                                                                   |
| `source`                   | `/openg2p/registry-staff-portal-api` (configurable via `audit_source`)                                                                                                                                              |
| `type`                     | `org.openg2p.staff_portal.<endpoint_function_name>`                                                                                                                                                                 |
| `time`                     | UTC timestamp when the response was built                                                                                                                                                                           |
| `data.actor.type`          | `"user"` for authenticated callers; `"anonymous"` for unauthenticated rejected attempts                                                                                                                             |
| `data.actor.id`            | `principal.sub` (Keycloak subject id), JWT `sub` on 403, or `"anonymous"`                                                                                                                                           |
| `data.actor.name`          | `principal.name` / JWT `name` claim (display name, e.g. "Admin User")                                                                                                                                               |
| `data.actor.username`      | JWT `preferred_username` claim (login handle, e.g. "admin"). Decoded directly from the bearer token. Not in the `Actor` schema explicitly — preserved via `extra="allow"` and lands under `details.actor.username`. |
| `data.actor.roles`         | `principal.client_roles[<keycloak_client_id>]` (or `resource_access.<client>.roles` from JWT on 403) — roles for this client only                                                                                   |
| `data.actor.ip`            | `X-Forwarded-For` first hop → `X-Real-IP` → `request.client.host`. Picks the real user IP behind Istio / a load balancer rather than the proxy's IP.                                                                |
| `data.actor.session_id`    | JWT `session_state` (or `sid`) claim — useful for grouping all actions in the same Keycloak login session.                                                                                                          |
| `data.action`              | First word of the endpoint function name (e.g. `approve_change_request` → `approve`, `get_individuals` → `get`). See note below.                                                                                    |
| `data.outcome`             | `2xx → success`, `401/403 → denied`, other `4xx/5xx → failure`                                                                                                                                                      |
| `data.context.api`         | `"<METHOD> <path>"` — e.g. `"POST /change-requests/approve_change_request"`                                                                                                                                         |
| `data.context.module`      | `"registry-staff-portal-api"` (configurable via `audit_module`)                                                                                                                                                     |
| `data.context.http_status` | `response.status_code`                                                                                                                                                                                              |
| `data.context.request_id`  | Value of the `X-Request-ID` header if present                                                                                                                                                                       |
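Two rows of the table describe small pieces of logic: the status-to-outcome mapping and the client IP fallback chain. A minimal sketch follows; the function names are hypothetical, header keys are assumed to be lower-cased, and mapping every remaining status (including 3xx, which the table does not mention) to `failure` is an assumption.

```python
from typing import Mapping, Optional


def outcome_for(status_code: int) -> str:
    """Map an HTTP status to data.outcome."""
    if 200 <= status_code < 300:
        return "success"
    if status_code in (401, 403):
        return "denied"
    return "failure"  # assumption: anything else, not only other 4xx/5xx


def client_ip(headers: Mapping[str, str], peer_host: Optional[str]) -> Optional[str]:
    """Resolve X-Forwarded-For first hop, then X-Real-IP, then the socket peer."""
    first_hop = headers.get("x-forwarded-for", "").split(",")[0].strip()
    if first_hop:
        return first_hop
    real_ip = headers.get("x-real-ip", "").strip()
    return real_ip or peer_host
```

For example, `client_ip({"x-forwarded-for": "203.0.113.7, 10.0.0.1"}, "10.0.0.2")` picks the first entry of the forwarded chain rather than the proxy address.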

### Why `action` is the first word, not the full function name

Most Staff Portal endpoints are declared as `POST` (they take JSON bodies for filters / pagination / sort), so the HTTP method tells you nothing about intent. The endpoint **function name** does — `get_individuals`, `create_change_request`, `delete_template`. The middleware splits on the first `_` and stores just the verb in `data.action`.

This keeps `action` a **low-cardinality dimension** (\~6 verbs: get / list / create / update / delete / search) so it's useful for cross-service dashboards and filters like "all `delete` events last week" or "all `login` failures across the platform". If we stored the full name there, the column would have hundreds of distinct values and be useless for aggregation.

Nothing is lost — the full function name is preserved in two other places on the same row:

* `type` → `org.openg2p.staff_portal.approve_change_request` (the full operation name)
* `details.context.api` → `POST /change-requests/approve_change_request` (the wire-level call)

So `action` is the **summary verb**, `type` is the **full op**, and `context.api` is the **HTTP form**. Three layers, three uses.
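The three layers can be derived from the endpoint function name and the request line. This is a small sketch with a hypothetical helper name, `describe_call`; the field formats are taken from the table above.

```python
def describe_call(endpoint_name: str, method: str, path: str) -> dict:
    """Derive the summary verb, the full operation type, and the HTTP form."""
    return {
        # Everything before the first underscore is the verb.
        "action": endpoint_name.split("_", 1)[0],
        "type": f"org.openg2p.staff_portal.{endpoint_name}",
        "api": f"{method} {path}",
    }


call = describe_call(
    "approve_change_request", "POST", "/change-requests/approve_change_request"
)
```

Here `call["action"]` is `"approve"`, while `call["type"]` and `call["api"]` keep the full name, so nothing is lost by storing only the verb in `data.action`.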

`data.resource` is intentionally **not** populated in this iteration — most Staff Portal endpoints are RPC-shaped POSTs without a clean URL-path entity to extract. We can add it later via per-route hints if needed.

`data.actor.id` and `data.actor.type` land in **flat indexed columns** (`actor_id`, `actor_type`); the rest of `actor.*` (name, username, roles, ip) lives under the `details.actor.*` JSONB column on the audit-manager side — see [Mapping from CloudEvents to Postgres columns](/platform/platform-services/audit-manager/functional-specifications.md#mapping-from-cloudevents-to-postgres-columns).

## Files changed

In `openg2p-registry-gen2-apis/openg2p-registry-staff-portal-api/`:

| File                                                        | Change                                                                                                                                     |
| ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `src/openg2p_registry_staff_portal_api/audit_middleware.py` | **New** — the middleware class. \~280 lines including JWT-decode helper for the 403 recovery path.                                         |
| `src/openg2p_registry_staff_portal_api/config.py`           | **+6 settings**: `audit_enabled`, `audit_manager_url`, `audit_timeout_seconds`, `audit_source`, `audit_module`, `audit_anonymous_failures` |
| `src/openg2p_registry_staff_portal_api/main.py`             | **+11 lines** to register `AuditMiddleware` after `AuthMiddleware`                                                                         |
| `.env.example`                                              | **+8 lines** documenting the new env vars                                                                                                  |

In `openg2p-audit-manager/`:

| File                                      | Change                                                                                                                                                        |
| ----------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `src/audit_manager/schema/cloud_event.py` | `Actor` model gains `extra="allow"` so emitter-supplied custom actor fields (e.g. `username`) flow through to `details.actor.*` without a schema change here. |

## Configuration

Six new environment variables (all prefixed with `REGISTRY_STAFF_PORTAL_API_`):

| Env var                                              | Default                              | Purpose                                                                                                                                          |
| ---------------------------------------------------- | ------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| `REGISTRY_STAFF_PORTAL_API_AUDIT_ENABLED`            | `false`                              | Master on/off switch. Must be `true` AND a URL must be set for emission to happen.                                                               |
| `REGISTRY_STAFF_PORTAL_API_AUDIT_MANAGER_URL`        | empty                                | Base URL of Audit Manager, e.g. `http://localhost:8002` or `http://audit-manager:80`.                                                            |
| `REGISTRY_STAFF_PORTAL_API_AUDIT_TIMEOUT_SECONDS`    | `2.0`                                | Timeout on each POST to Audit Manager. Bounded so a slow audit endpoint can't pile up.                                                           |
| `REGISTRY_STAFF_PORTAL_API_AUDIT_SOURCE`             | `/openg2p/registry-staff-portal-api` | CloudEvents `source` field. Override only if you run multiple staff-portal deployments.                                                          |
| `REGISTRY_STAFF_PORTAL_API_AUDIT_MODULE`             | `registry-staff-portal-api`          | Module name placed in `data.context.module`.                                                                                                     |
| `REGISTRY_STAFF_PORTAL_API_AUDIT_ANONYMOUS_FAILURES` | `true`                               | When `true`, also audit rejected anonymous calls (non-2xx outcomes, typically 401/403). Set to `false` to revert to the original "audit only authenticated user calls" rule. |

**To disable auditing entirely:** set `REGISTRY_STAFF_PORTAL_API_AUDIT_ENABLED=false`, or leave `REGISTRY_STAFF_PORTAL_API_AUDIT_MANAGER_URL` empty. Either condition makes the middleware a no-op, so there is no need to remove the middleware from `main.py`. The startup log will say `AuditMiddleware disabled (...). No-op.` so you can confirm.

**To disable only anonymous-failure auditing** (and keep authenticated auditing): set `REGISTRY_STAFF_PORTAL_API_AUDIT_ANONYMOUS_FAILURES=false`. Useful in environments where the service is exposed to bot/scanner traffic and you don't want the audit store to fill with rejected anonymous probes.

**To enable for local dev** (Audit Manager port-forwarded from cluster):

```bash
# In another terminal, port-forward the cluster's audit-manager service
kubectl -n trial port-forward svc/audit-manager 8002:80

# In your .env (or shell env)
REGISTRY_STAFF_PORTAL_API_AUDIT_ENABLED=true
REGISTRY_STAFF_PORTAL_API_AUDIT_MANAGER_URL=http://localhost:8002
```

Restart uvicorn and the startup log will show:

```
AuditMiddleware enabled — emitting to http://localhost:8002/v1/auditmanager/events
```
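The gating rules in this section (enabled flag plus non-empty URL, with separate defaults for the other settings) can be sketched as follows. This is an illustration using `os.environ` directly to stay dependency-free; the service's own `config.py` is not shown here, and `load_audit_settings` and `_flag` are hypothetical helpers.

```python
import os
from typing import Mapping

PREFIX = "REGISTRY_STAFF_PORTAL_API_"


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean env var; the accepted spellings are an assumption."""
    raw = env.get(PREFIX + name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


def load_audit_settings(env: Mapping[str, str] = os.environ) -> dict:
    enabled = _flag(env, "AUDIT_ENABLED", False)
    url = env.get(PREFIX + "AUDIT_MANAGER_URL", "").strip()
    return {
        # Emission needs BOTH the switch and a non-empty URL.
        "emitting": enabled and bool(url),
        "anonymous_failures": _flag(env, "AUDIT_ANONYMOUS_FAILURES", True),
        "timeout_seconds": float(env.get(PREFIX + "AUDIT_TIMEOUT_SECONDS", "2.0")),
    }
```

Calling `load_audit_settings({})` yields `emitting=False`, which is the documented default: an unconfigured deployment emits nothing.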

### Wiring through the registry Helm chart

You **do not** set `REGISTRY_STAFF_PORTAL_API_AUDIT_*` env vars by hand on each deployment. The registry's [base Helm chart](https://github.com/OpenG2P/openg2p-registry-gen2-deployment/blob/develop/charts/openg2p-registry/values.yaml) already plumbs them through three `global.audit*` values that flow into the staff-portal-api's `envVars` block:

```yaml
# charts/openg2p-registry/values.yaml — global section
global:
  # ...
  auditEnabled: false
  auditManagerUrl: 'http://audit-manager:80'
  auditAnonymousFailures: true

# charts/openg2p-registry/values.yaml — staffPortalApi.envVars block
staffPortalApi:
  envVars:
    # ...
    REGISTRY_STAFF_PORTAL_API_AUDIT_ENABLED:            '{{ .Values.global.auditEnabled }}'
    REGISTRY_STAFF_PORTAL_API_AUDIT_MANAGER_URL:        '{{ .Values.global.auditManagerUrl }}'
    REGISTRY_STAFF_PORTAL_API_AUDIT_ANONYMOUS_FAILURES: '{{ .Values.global.auditAnonymousFailures }}'
```

**To enable auditing for an environment**, add to your per-env values file (e.g. `values-trial.yaml`):

```yaml
global:
  auditEnabled: true
  auditManagerUrl: 'http://audit-manager:80'   # override only if cross-namespace
  # auditAnonymousFailures: false              # uncomment to suppress anon noise
```

…then `helm upgrade` the registry release. The Rancher UI also exposes these three under the **Audit Manager** group via [`questions.yaml`](https://github.com/OpenG2P/openg2p-registry-gen2-deployment/blob/develop/charts/openg2p-registry/questions.yaml), so operators can flip them without editing YAML.

**Cross-namespace deployment** — if audit-manager is not in the same namespace as the registry release, set the FQDN:

```yaml
global:
  auditManagerUrl: 'http://audit-manager.<audit-namespace>.svc.cluster.local:80'
```

For the full registry-side documentation (dependency table, version matrix, the 4.1.0 release entry that introduces this feature), see the [registry Helm chart 4.x doc — Audit Manager integration](/products/registry/registry/deployment/helm-chart-4.x.md#audit-manager-integration).

**Enabling without redeploying staff-portal-api code.** Because every audit env var is no-op-by-default, you can ship the chart change first (while the staff-portal-api image still lacks the `AuditMiddleware` — the unknown env vars are silently ignored by pydantic-settings' `extra="allow"`). When you later roll the new image with the middleware, the variables are already in place and emission turns on at restart.

## Verification — end-to-end

After enabling:

```bash
# 1. Hit any authenticated endpoint (token from Phase 6)
curl -i -X POST http://localhost:8001/registry-config/get_registry_configuration \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}'
```

The response is unchanged — same status code, same body, same latency.

```bash
# 2. Wait ~3 seconds for the Kafka → Postgres flush in Audit Manager
sleep 3

# 3. Query the cluster's audit Postgres for the row
kubectl -n trial exec -it commons-postgresql-0 -- \
  psql -U <audit_db_user> -d audit_manager -c \
  "SELECT id, type, actor_id, action, outcome, occurred_at
   FROM audit_events
   WHERE source = '/openg2p/registry-staff-portal-api'
   ORDER BY ingested_at DESC LIMIT 5;"
```

Expected: at least one new row with

* `type = org.openg2p.staff_portal.get_registry_configuration`
* `actor_id = <your admin user's sub>`
* `action = get`
* `outcome = failure` (because the empty body returned 400)

The DB user / DB name may differ depending on how the `postgres-init` chart provisioned them in your cluster — adjust the `psql` command accordingly. See the [Audit Manager deployment notes](/platform/platform-services/audit-manager/deployment.md) for naming.

## What "no-op" looks like

When auditing is off (default), the middleware adds essentially zero overhead per request:

1. One attribute access (`request.state.auth`)
2. One short-circuit return

No CloudEvent is built, no HTTP client is created, no async task is scheduled. A startup log line confirms the disabled state:

```
AuditMiddleware disabled (enabled=False, url=''). No-op.
```

## What we deliberately did NOT do (yet)

| Skipped feature                                         | Why                                                                                                                     | When to add it                                                                       |
| ------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------ |
| `data.resource` extraction                              | Most endpoints are RPC POSTs without clean entity URLs                                                                  | When we want investigators to filter by entity id                                    |
| Capture response body's error reason into `data.reason` | Reading the body in middleware needs care with streaming responses                                                      | When 4xx/5xx volume warrants quick triage by reason                                  |
| Local disk spool on emission failure                    | At-least-once is already provided by the Audit Manager itself; staff portal pod crashes mid-emit are rare and tolerable | If volumes/availability targets ever require zero loss on the producer side          |
| Sampling                                                | Audit volume from staff portal is small enough to capture every call                                                    | If a future caller is too chatty (high-traffic public API)                           |
| Promotion to a shared library                           | Keep the integration scoped to one service while we validate the shape                                                  | Once a second service wants the same middleware, lift it to `openg2p-fastapi-common` |

## Operational notes

* **Cold start cost:** the `httpx.AsyncClient` is lazy-created on the first emission, not at app import time. The first audited call sees a small extra latency (\~ms) for client setup; subsequent calls reuse the connection pool.
* **Restart safety:** all in-flight `asyncio.create_task(_emit(...))` calls are cancelled on shutdown. Up to a few hundred ms of tail emissions can be lost during a rolling restart. Acceptable per design (audits at this layer are best-effort; durability is provided by Audit Manager itself once the event reaches Kafka).
* **Audit Manager errors are logged at WARN.** A spike of WARN lines with `Audit emission failed` typically means: Audit Manager is down, unreachable, returning 503 backpressure, or the URL is wrong. None of these affect Staff Portal API responses.
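The cold-start and restart notes above correspond to two common `asyncio` patterns: creating the client lazily, and keeping references to background tasks so they can be cancelled on shutdown. The class below is a sketch under those assumptions; `BackgroundEmitter` and its methods are hypothetical, and `client_factory` stands in for constructing the real HTTP client.

```python
import asyncio
from typing import Any, Awaitable, Callable, Set


class BackgroundEmitter:
    def __init__(self, client_factory: Callable[[], Any]) -> None:
        self._client_factory = client_factory
        self._client: Any = None                # not created at import time
        self._tasks: Set[asyncio.Task] = set()  # strong refs to in-flight tasks

    def _get_client(self) -> Any:
        if self._client is None:                # first emission pays the setup cost
            self._client = self._client_factory()
        return self._client

    def submit(self, send: Callable[[Any], Awaitable[Any]]) -> asyncio.Task:
        """Schedule one emission and track it until it finishes."""
        task = asyncio.create_task(send(self._get_client()))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return task

    async def shutdown(self) -> None:
        """Cancel whatever is still in flight; tail emissions may be lost."""
        pending = list(self._tasks)
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
```

Holding the tasks in a set also prevents them from being garbage-collected mid-flight, since the event loop itself keeps only weak references to tasks.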

## Related pages

* [Local Install — Staff Portal API](/platform/platform-services/audit-manager/integration-with-registry/local-install.md) — set up the service locally before enabling audit.
* [Audit Manager — Functional Specifications](/platform/platform-services/audit-manager/functional-specifications.md) — the CloudEvents schema and DB column mapping this middleware emits to.
* [Audit Manager — API Reference](/platform/platform-services/audit-manager/api-reference.md) — the HTTP contract the middleware POSTs to.


