12. Evaluation and Recalibration

This section specifies the mechanism by which Workers and Roles are evaluated against expectations over time, and by which the Charter governing them is recalibrated in response. Together, Evaluation and Recalibration provide ongoing performance management: the analogue, for non-human Workers, of the management surface a human organisation maintains over a human team.

12.1 Position in the Protocol

Evaluation is a derived artefact, not a primitive. The six primitives in §3 remain orthogonal; Evaluation derives from the Audit Envelope (§9) in the same sense that the Cost Record (§10) derives from Capability invocations: structured, signed, citable, and computable from chain contents, but not itself primitive.

Recalibration is also derived. It composes Architect Charter authoring (§7.1, §4.1.5) with Operator-signed evidence (§7.4) and the Function migration mechanism (§16.2). It is named here because the composition has its own conformance requirements, lifecycle, and audit obligations distinct from any of its parts.

A Core-conformant runtime (§13.1) MAY omit Evaluation and Recalibration support; an Evaluation-conformant runtime MUST implement this section in full. The conformance axis is specified in §13.1.

12.2 Evaluation Artefact

An Evaluation is a signed, structured assessment of one or more Workers’ or Roles’ performance against the expectations declared in their Charter, computed over a bounded time window using events drawn from the Audit Envelope.

12.2.1 Scope

An Evaluation MUST declare its scope. The scope identifies the population and dimension over which the Evaluation reports. Implementations MUST support all six scopes:

worker-instance — a single Worker identified by Worker ID. Used to detect instance-specific failure (configuration drift, stale binding, key issues).
role — all Workers holding a specific Role across the Workforce. Used as the primary input to Charter recalibration.
role-context — all Workers holding a specific Role, restricted to a deployment-context filter (e.g. tenant, region, customer segment). Used to detect context-specific failures the role-aggregate would mask.
role-version — all Workers holding a specific version of a Role. Used to compare performance across Charter versions, including champion/challenger evaluation.
function — all Workers across all Roles within a specific Function. Used to detect structural failures that span Roles.
function-version — all Workers within a specific version of a Function. Used to verify that prior structural recalibrations achieved their intent.

An Evaluation MAY declare additional implementation-specific scopes provided they do not redefine the meaning of the six above.
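
Non-normative sketch. One way to read the six scopes is as predicates over the attribution carried by each event. The Attribution field names and the selector dictionary below are hypothetical; this section does not define them.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    WORKER_INSTANCE = "worker-instance"
    ROLE = "role"
    ROLE_CONTEXT = "role-context"
    ROLE_VERSION = "role-version"
    FUNCTION = "function"
    FUNCTION_VERSION = "function-version"


@dataclass
class Attribution:
    # Hypothetical attribution fields; this section does not fix their names.
    worker_id: str
    role_id: str
    role_version: str
    function_id: str
    function_version: str
    context: dict = field(default_factory=dict)  # e.g. {"region": "eu"}


def in_scope(kind: Scope, selector: dict, a: Attribution) -> bool:
    """Return True if an event attributed to `a` belongs to the declared scope."""
    if kind is Scope.WORKER_INSTANCE:
        return a.worker_id == selector["worker_id"]
    if kind is Scope.ROLE:
        return a.role_id == selector["role_id"]
    if kind is Scope.ROLE_CONTEXT:
        # Same Role, restricted to events matching every context filter key.
        return a.role_id == selector["role_id"] and all(
            a.context.get(k) == v for k, v in selector["context"].items()
        )
    if kind is Scope.ROLE_VERSION:
        return (a.role_id, a.role_version) == (
            selector["role_id"], selector["role_version"])
    if kind is Scope.FUNCTION:
        return a.function_id == selector["function_id"]
    if kind is Scope.FUNCTION_VERSION:
        return (a.function_id, a.function_version) == (
            selector["function_id"], selector["function_version"])
    raise ValueError(f"unsupported scope: {kind!r}")
```

An implementation-specific scope would add a new enum member and branch rather than altering any of the six branches above.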

12.2.2 Window

An Evaluation MUST declare a bounded time window over which it draws events. The window is specified by a start timestamp and an end timestamp, both with the same logical/wallclock pairing as Audit Envelope events (§9.2). Events outside the declared window MUST NOT contribute to the Evaluation’s computed signals.

The runtime MUST ensure that the events available to an Evaluation are exactly those whose wallclock_at falls within the declared window and whose attribution falls within the declared scope.
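
Non-normative sketch. The selection rule can be pictured as a filter over candidate events. The event dictionaries and the scope predicate are hypothetical, and treating both window bounds as inclusive is an assumption of this sketch; the text above says only that wallclock_at falls within the window.

```python
from datetime import datetime, timezone


def select_events(events, start, end, scope_predicate):
    """Keep exactly the events inside the declared window and scope.

    `events` is an iterable of dicts carrying a `wallclock_at` datetime
    (hypothetical shape). Bounds are treated as inclusive (assumption).
    """
    if start > end:
        raise ValueError("window start must not be after window end")
    return [
        e for e in events
        if start <= e["wallclock_at"] <= end and scope_predicate(e)
    ]


def hour(h):
    """Helper for the usage example: an hour on an arbitrary UTC day."""
    return datetime(2025, 1, 1, h, tzinfo=timezone.utc)


sample = [
    {"id": 1, "wallclock_at": hour(1), "role_id": "triage"},
    {"id": 2, "wallclock_at": hour(5), "role_id": "triage"},
    {"id": 3, "wallclock_at": hour(5), "role_id": "billing"},
    {"id": 4, "wallclock_at": hour(9), "role_id": "triage"},
]
selected = select_events(
    sample, hour(2), hour(8), lambda e: e["role_id"] == "triage")
```

Here only the event with id 2 survives: events 1 and 4 fall outside the window, and event 3 is outside the scope.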

12.2.3 Expectations Reference

An Evaluation MUST reference, by URI and version, the expectations against which performance is measured:

The Function ID and version under which the evaluated Workers operated.
The Role IDs and versions in scope.
Any Compliance Profiles attached to the Function during the window.
Optionally, an Operator-supplied expectation overlay declaring additional success criteria not present in the Charter (e.g. a quarterly target for outcome quality).

If the Charter or attached Profiles changed during the window, the Evaluation MUST partition the window by Charter state and report signals per partition, OR, where per-partition reporting would be misleading, refuse to produce an aggregate signal. The refusal MUST be recorded as a partial-evaluation outcome.
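
Non-normative sketch. Partitioning by Charter state amounts to cutting the window at every change point that falls strictly inside it. The function name and the representation of change points as comparable timestamps are assumptions of this sketch.

```python
def partition_window(start, end, change_points):
    """Split [start, end] at each Charter or Profile change inside the window.

    Returns a list of (partition_start, partition_end) pairs in time order.
    Change points outside the window, or on its boundary, are ignored.
    """
    if start >= end:
        raise ValueError("window must have positive length")
    cuts = sorted({t for t in change_points if start < t < end})
    bounds = [start, *cuts, end]
    return list(zip(bounds[:-1], bounds[1:]))
```

For example, a window from 0 to 10 with changes at 3 and 7 yields three partitions, (0, 3), (3, 7), and (7, 10), and signals would be computed separately for each. With no change points the result is the single original window.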

12.2.4 Computed Signals

An Evaluation MUST report, at minimum, the following signals computed from Audit Envelope events within scope and window:

Outcome distribution. Counts and proportions of Outcomes (§6.6) by terminal state: achieved, refused (with sub-state breakdown), escalated, abandoned.
Authority discipline. Count of AuthorityViolationAttempted events (§9.1) attributed to Workers in scope, with breakdown by Authority Grant clause.
Compliance signals. Count of ComplianceConstraintApplied events, with breakdown by Profile and constraint.
Escalation pattern. Escalation rate per Worker per Intent kind, distinguishing required Escalations (Compliance-driven) from discretionary Escalations.
Cost discipline. Aggregate Cost Records (§10) against Authority Grant cost bounds (§3.2.4); count and severity of bound violations.
Reviewer interaction. Count of IntentRefused events traceable to Reviewer refusal versus Worker self-refusal versus runtime refusal; distribution of refusal reasons.
Provider quality. Count of CapabilityFailed events, with breakdown by Capability Provider. (This signal is a partial resolution of Open Question D.10; full Provider SLA telemetry is left for a subsequent draft.)

Implementations MAY report additional signals; additional signals MUST NOT redefine the meaning of those above.
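
Non-normative sketch. The Outcome distribution signal is a count-and-proportion table over the four terminal states. The Outcome record shape (a dict with a state key) is hypothetical, and the sub-state breakdown for refused is omitted for brevity.

```python
from collections import Counter

TERMINAL_STATES = ("achieved", "refused", "escalated", "abandoned")


def outcome_distribution(outcomes):
    """Count Outcomes by terminal state and derive proportions.

    Every terminal state is reported, including those with zero count,
    so that two Evaluations can be compared key for key.
    """
    counts = Counter(o["state"] for o in outcomes)
    unknown = set(counts) - set(TERMINAL_STATES)
    if unknown:
        raise ValueError(f"unknown terminal state(s): {sorted(unknown)}")
    total = sum(counts.values())
    return {
        state: {
            "count": counts[state],
            "proportion": counts[state] / total if total else 0.0,
        }
        for state in TERMINAL_STATES
    }
```

The other signals in the list follow the same pattern: filter the in-scope events by kind, then group by the stated breakdown key (Authority Grant clause, Profile and constraint, Capability Provider, and so on).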

12.2.5 Operator Synthesis

An Evaluation MAY include an Operator-authored synthesis: qualitative human judgment layered over the computed signals. The synthesis MUST be signed by the Operator and is bounded in length and structure by deployment policy.

An Evaluation without Operator synthesis is termed a raw Evaluation and is sufficient for runtime processing (e.g. Compliance Profile threshold evaluation per §12.6). A synthesised Evaluation is required as evidence for any Recalibration that amends a Charter element.

12.2.6 Evaluation Envelope

An Evaluation MUST be persisted as a signed envelope distinct from but referencing Audit Envelopes. The Evaluation envelope MUST contain:

An Evaluation ID, unique within the runtime.
The scope and window declarations from §12.2.1 and §12.2.2.
The expectations reference from §12.2.3.
The computed signals from §12.2.4.
The Operator synthesis if present (§12.2.5).
A reference to every Audit Envelope contributing events to the computation (a deterministic reference set, such that the Evaluation is reproducible).
A signature by the Evaluation Service (a runtime service) over the computed signals.
A signature by the Operator over the synthesis, if present.

The Evaluation envelope MUST be retained for at least the longest retention horizon of any contributing Audit Envelope (§3.1.6, plus any Compliance Profile constraint).
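
Non-normative sketch. The envelope contents can be pictured as a record whose Audit Envelope reference set is sorted and de-duplicated, so that the same inputs always produce the same envelope. The field names are hypothetical. HMAC-SHA256 over canonical JSON stands in for whatever signature scheme the runtime actually uses; it is a placeholder, not a recommendation.

```python
import hashlib
import hmac
import json


def canonical(obj) -> bytes:
    """Serialise to a stable byte form so equal content signs identically."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_envelope(evaluation_id, scope, window, expectations, signals,
                   audit_envelope_ids, service_key: bytes) -> dict:
    """Assemble an Evaluation envelope (hypothetical field names)."""
    return {
        "evaluation_id": evaluation_id,
        "scope": scope,
        "window": window,
        "expectations": expectations,
        "signals": signals,
        # Deterministic reference set: order and duplicates do not matter.
        "audit_envelope_refs": sorted(set(audit_envelope_ids)),
        # Evaluation Service signature covers the computed signals only.
        "service_signature": hmac.new(
            service_key, canonical(signals), hashlib.sha256).hexdigest(),
    }


def verify_service_signature(envelope: dict, service_key: bytes) -> bool:
    expected = hmac.new(
        service_key, canonical(envelope["signals"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["service_signature"])
```

An Operator synthesis, when present, would be carried as a further field with its own Operator signature over the synthesis text.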

12.3 Cadence

Evaluations are produced on cadence, on demand, or on threshold breach.

Cadence-driven. A Function MAY declare a default Evaluation cadence (e.g. weekly per-Role, monthly per-Role-context, quarterly per-Function). Cadence declarations MAY be overridden per Workforce by deployment configuration.
On-demand. An Operator MAY commission an Evaluation at any time within their authority scope by emitting an EvaluationCommissioned event.
Threshold-driven. A Compliance Profile MAY require automatic Evaluation when a constraint observes degradation (e.g. Reviewer refusal rate exceeds a profile-declared threshold over a rolling window).

The runtime MUST emit an EvaluationCommissioned event when any of the three triggers fires, and MUST emit an EvaluationCompleted event when the Evaluation envelope is signed and persisted.

12.4 Recalibration Artefact

A Recalibration is an Operator-signed proposal to modify the Charter, Worker lifecycle, Capability binding, or Compliance Profile attachment of a Workforce, citing one or more Evaluations as evidence.

12.4.1 Targets

A Recalibration MUST declare its target. The protocol defines the following target types:

role-charter-amendment — change one or more Charter elements of a Role: Standing Intent, Authority Grant, Intent Grammar, Signal Subscriptions, Escalation Routes. Realised via Function version migration (§16.2).
function-structural-amendment — change the Roles within a Function or their relationships. Realised via Function version migration (§16.2).
worker-decommission — initiate decommissioning of one or more Worker instances. Realised via §12.5.
capability-binding-rebinding — change the Capability Provider bound to a Capability kind for a Workforce, without changing the Charter.
compliance-profile-attachment — attach or detach a Compliance Profile from a Function.

A Recalibration MAY combine multiple targets in a single artefact provided all targets are within the Operator’s authority scope and all targets are realised atomically with respect to the runtime’s commit boundary.

12.4.2 Evidence References

A Recalibration MUST cite at least one Evaluation envelope as evidence. The cited Evaluation(s) MUST be in scope for the target: a role-charter-amendment MUST cite an Evaluation whose scope is role, role-context, or role-version for the affected Role; a worker-decommission MUST cite an Evaluation whose scope is worker-instance for the affected Worker, OR a role Evaluation that explicitly identifies the Worker as an underperforming instance.

The runtime MUST refuse a Recalibration whose cited Evaluations do not satisfy the scope requirement.
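
Non-normative sketch. The scope requirement can be checked before any signature handling. The dictionary shapes are hypothetical, including the underperforming_workers field used to model a role Evaluation that explicitly identifies a Worker. For target types with no scope rule stated above, the sketch requires only that at least one Evaluation is cited.

```python
ROLE_SCOPES = {"role", "role-context", "role-version"}


def evidence_satisfies(target: dict, evaluations: list) -> bool:
    """Return True if the cited Evaluations meet the scope requirement."""
    if not evaluations:
        return False  # at least one Evaluation envelope must be cited
    kind = target["type"]
    if kind == "role-charter-amendment":
        return any(
            e["scope"] in ROLE_SCOPES and e.get("role_id") == target["role_id"]
            for e in evaluations
        )
    if kind == "worker-decommission":
        worker = target["worker_id"]
        return any(
            (e["scope"] == "worker-instance" and e.get("worker_id") == worker)
            or (e["scope"] == "role"
                and worker in e.get("underperforming_workers", ()))
            for e in evaluations
        )
    # No scope rule is stated for the remaining target types.
    return True
```

A runtime following this sketch would refuse the Recalibration whenever evidence_satisfies returns False.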

12.4.3 Signing

A Recalibration MUST be signed by an Operator with authority over the target scope.

A Recalibration whose target is role-charter-amendment or function-structural-amendment MUST additionally be co-signed by an Architect (§7.1) before the runtime applies it. This preserves the Architect monopoly on Charter authoring (§7.1) while permitting Operators to drive recalibration evidence and proposals.

A Recalibration whose target is worker-decommission, capability-binding-rebinding, or compliance-profile-attachment does NOT require Architect co-signature; Operator authority is sufficient. Implementations MAY tighten this via Compliance Profile constraint.
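
Non-normative sketch. The signing rules reduce to a small decision over the declared target types. The signer labels and the profile_requires_architect flag (modelling a Compliance Profile that tightens the rule) are hypothetical names.

```python
ARCHITECT_COSIGN_TARGETS = {
    "role-charter-amendment",
    "function-structural-amendment",
}


def required_signers(target_types, profile_requires_architect=False) -> set:
    """Return the set of signer kinds a Recalibration needs before application."""
    signers = {"operator"}  # always required
    if profile_requires_architect or ARCHITECT_COSIGN_TARGETS & set(target_types):
        signers.add("architect")
    return signers


def ready_to_apply(target_types, present_signers,
                   profile_requires_architect=False) -> bool:
    """True once every required signer kind has signed."""
    needed = required_signers(target_types, profile_requires_architect)
    return needed <= set(present_signers)
```

For a combined Recalibration, a single Charter-amending target is enough to require the Architect co-signature for the whole artefact in this sketch, since the targets are realised atomically.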

12.4.4 Application

When all required signatures are present, the runtime MUST:

Emit a RecalibrationProposed event recording the Operator signature and cited evidence.
For Charter-amendment targets: emit a further RecalibrationProposed event recording the Architect co-signature, then realise the change via the Function versioning mechanism (§16.2). Open Intents are handled per the run-out vs re-bind decision recorded in the Recalibration.
For Worker-decommission targets: realise per §12.5.
For other targets: realise atomically with respect to the runtime’s commit boundary.
Emit a RecalibrationApplied event recording successful realisation.

If realisation fails partway, the runtime MUST emit a RecalibrationApplied event with outcome: failed and SHOULD return the affected Workforce to its prior state. Partial realisation is not permitted.
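
Non-normative sketch. One way to avoid partial realisation is to apply every change to a working copy and publish it only if all steps succeed. The Workforce-as-dictionary model, the step callables, and the outcome value "succeeded" are hypothetical; only outcome: failed appears in the text above.

```python
import copy


def apply_recalibration(workforce: dict, steps, emit) -> bool:
    """Apply all steps or none; `emit` receives event dicts."""
    emit({"event": "RecalibrationProposed"})
    working = copy.deepcopy(workforce)  # changes are staged here first
    try:
        for step in steps:
            step(working)
    except Exception as exc:
        # The live Workforce was never touched, so it is still in its prior state.
        emit({"event": "RecalibrationApplied", "outcome": "failed",
              "reason": str(exc)})
        return False
    # Commit boundary: publish the fully realised state in one move.
    workforce.clear()
    workforce.update(working)
    emit({"event": "RecalibrationApplied", "outcome": "succeeded"})
    return True
```

Staging on a copy means there is no separate rollback path to get wrong: a failed step simply discards the working copy.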

12.5 Worker Decommissioning Protocol

This section specifies the protocol for decommissioning a Worker. It closes Open Question D.5.

12.5.1 Initiation

A Worker is decommissioned by:

An Operator-signed Recalibration with target worker-decommission (§12.4).
An automatic action under a Compliance Profile constraint (§12.6).
A Function version migration that does not re-bind the Worker (§16.2).
Worker self-decommissioning where the Authority Grant explicitly permits it (§3.2.4). This closes Open Question D.8.

In all cases, the runtime MUST emit a WorkerDecommissioned event to the Audit Envelope of every Intent the Worker has open at the moment of decommissioning.

12.5.2 Open Intent Handover

For each open Intent held by the decommissioning Worker, the runtime MUST take exactly one of the following actions, as specified by the handover policy of the initiating Recalibration or other initiating action (§12.5.1):

Reassign. Reassign the Intent to another Worker holding the same Role with capacity. The runtime MUST emit an IntentReassignedOnDecommission event recording the source Worker, target Worker, and full provenance. The receiving Worker accepts the Intent in the same state it held (typically Accepted or In-progress) and assumes responsibility from that point forward; prior actions remain attributed to the decommissioning Worker.
Escalate. Escalate the Intent to a Human Role per §6.5. The Escalation envelope MUST identify decommissioning as the cause.
Abandon. Resolve the Intent with Outcome abandoned and reason decommission. Used where neither reassignment nor escalation is appropriate (e.g. the Function is being retired entirely).

The handover policy MAY differ per Intent kind. The runtime MUST apply the handover atomically with respect to decommissioning: a Worker MUST NOT be observable as decommissioned with un-handled open Intents.
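
Non-normative sketch. Atomic handover can be pictured as plan first, commit second: every open Intent is assigned exactly one action before the Worker is marked decommissioned. The record shapes are hypothetical. Raising when no peer has capacity is an assumption of this sketch; the text above does not say what happens in that case.

```python
def decommission(worker: dict, policy_by_kind: dict,
                 peers_with_capacity: list, emit) -> list:
    """Hand over every open Intent, then mark the Worker decommissioned."""
    plan = []
    for intent in worker["open_intents"]:
        action = policy_by_kind[intent["kind"]]  # policy may differ per Intent kind
        if action == "reassign":
            if not peers_with_capacity:
                raise RuntimeError("no Worker with capacity; nothing committed")
            plan.append({"intent": intent["id"], "action": action,
                         "target": peers_with_capacity[0]})
        elif action in ("escalate", "abandon"):
            plan.append({"intent": intent["id"], "action": action,
                         "target": None})
        else:
            raise ValueError(f"unknown handover action: {action!r}")

    # Commit: reached only if every open Intent has a planned action.
    for entry in plan:
        emit({"event": "WorkerDecommissioned", "worker": worker["id"],
              "intent": entry["intent"]})
        if entry["action"] == "reassign":
            emit({"event": "IntentReassignedOnDecommission",
                  "intent": entry["intent"], "source": worker["id"],
                  "target": entry["target"]})
    worker["open_intents"] = []
    worker["state"] = "decommissioned"
    return plan
```

Because planning can fail without side effects, an observer never sees the Worker in the decommissioned state while Intents remain unhandled.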

12.5.3 History Retention

A decommissioned Worker’s history (§3.1.6) MUST be retained for the longer of the implementation-defined horizon or any Compliance Profile retention horizon attached to the Function in which the Worker held its Role at any point during its lifetime. As required by §3.1.2, the Worker ID MUST NOT be reassigned.

The decommissioned Worker’s history MUST remain queryable by Reviewers, Operators, and Architects with appropriate authority over the Function in which the Worker held its Role, for purposes including audit, retrospective Evaluation, and successor Worker briefing under sediment-aware routing (§6.3.1). Where a Worker was re-bound across Functions during its lifetime (per §3.2.1), the retention horizon is the longest applicable across all such Functions, and history is queryable by appropriately-authorised principals of any of them.
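
Non-normative sketch. The retention rule is a maximum over the implementation-defined horizon and every Profile horizon from every Function the Worker was bound to. Expressing horizons in days and keying them by Function ID are assumptions of this sketch.

```python
def retention_horizon_days(implementation_days: int,
                           profile_days_by_function: dict) -> int:
    """Longest applicable retention horizon for a decommissioned Worker.

    `profile_days_by_function` maps each Function the Worker was ever
    bound to onto the retention horizons of the Profiles attached to it.
    """
    candidates = [implementation_days]
    for horizons in profile_days_by_function.values():
        candidates.extend(horizons)
    return max(candidates)
```

For instance, with an implementation horizon of 365 days and Profiles requiring 730 days in one Function and 2555 days in another, the history is retained for 2555 days.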

12.5.4 Key Revocation

The decommissioned Worker’s cryptographic identity (§3.1.4) MUST be moved to the historical key registry per §14.3. Subsequent attempts to verify a signature attributed to the decommissioned Worker MUST succeed for signatures dated at or before the decommissioning moment, and MUST fail for signatures dated after.

The runtime MUST refuse all outbox actions presented under the decommissioned Worker’s identity from the moment decommissioning is committed.
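
Non-normative sketch. The verification rule combines the ordinary cryptographic check with a cut-off at the decommissioning moment. The parameter names are hypothetical, and the cryptographic check itself is passed in as a boolean rather than modelled.

```python
def signature_acceptable(signed_at, decommissioned_at, crypto_valid: bool) -> bool:
    """Apply the historical-key rule to one signature.

    `signed_at` and `decommissioned_at` are comparable timestamps;
    `decommissioned_at` is None while the Worker is still active.
    """
    if not crypto_valid:
        return False  # a bad signature never verifies, whatever its date
    if decommissioned_at is None:
        return True
    # At or before the decommissioning moment succeeds; after it fails.
    return signed_at <= decommissioned_at
```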

12.6 Compliance Profile Hooks

A Compliance Profile MAY declare evaluation-driven constraints that auto-activate based on Evaluation signals. Specifically, a Profile MAY declare:

Evaluation cadence requirements. The Profile mandates that Evaluations of specified scopes be produced at specified cadences. The runtime MUST emit EvaluationCommissioned events on the mandated cadence and MUST refuse continued operation of the Workforce if Evaluations are not produced and signed within a profile-declared grace period.
Evaluation threshold constraints. The Profile declares thresholds against §12.2.4 computed signals. When a threshold is breached, the runtime MUST automatically restrict specified Authority Grant clauses on the affected Workers (or all Workers in the affected Role) and MUST hold the restriction until a subsequent Evaluation reports the signal back within threshold. This is the formal performance-improvement-plan mechanism for non-human Workers.

Threshold-driven restrictions are recorded as ComplianceConstraintApplied events (§9.1). Restoration of the original Authority Grant on threshold recovery is similarly recorded.

The protocol does not prescribe a threshold language; profile authors MAY declare thresholds in any expression the Profile’s evaluation engine supports.
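
Non-normative sketch. A threshold constraint behaves as a two-state latch driven by successive Evaluations: restrict on breach, hold while breached, restore on recovery. Modelling the threshold as "signal value exceeds a limit" is an assumption, since no threshold language is prescribed, and the action field on the event is a hypothetical name.

```python
def apply_threshold(signal_value: float, limit: float, restricted: bool,
                    clauses, emit) -> bool:
    """Return the new restricted state after one Evaluation is processed."""
    breached = signal_value > limit
    if breached and not restricted:
        emit({"event": "ComplianceConstraintApplied",
              "action": "restrict", "clauses": list(clauses)})
        return True
    if not breached and restricted:
        emit({"event": "ComplianceConstraintApplied",
              "action": "restore", "clauses": list(clauses)})
        return False
    return restricted  # no transition, so nothing new to record
```

Feeding in signal values of 0.30, 0.25, and 0.10 against a limit of 0.20 produces one restrict event, no event while the breach persists, and one restore event once the signal is back within threshold.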

12.7 Relationship to Human Roles

The four Human Roles defined in §7 cooperate as follows in the Evaluation and Recalibration loop:

Reviewers (§7.2) generate qualitative evidence by refusing Outcomes and surfacing patterns. Reviewer refusals contribute to the §12.2.4 Reviewer interaction signal.
Operators (§7.4) commission Evaluations, synthesise findings, and propose Recalibrations.
Architects (§7.1) co-sign Recalibrations whose targets are Charter elements, exercising the Architect monopoly on Charter authoring.
Resolvers (§7.3) are not directly involved in the Evaluation loop, but Resolver-handled Escalations contribute to the §12.2.4 Escalation pattern signal as a measure of Workforce capability boundary.

In small organisations, a single human MAY hold multiple Human Roles. In larger organisations, separation of Operator from Architect is RECOMMENDED so that Recalibration evidence is independently reviewed by the original Charter author. Separation of Operator from Reviewer is likewise RECOMMENDED so that quality refusals are not synthesised by the same person who proposes the resulting Recalibrations.