Authority & scope
This policy binds every engineer, engineering manager, Domain Owner, on-call rota member, and external contractor working on Fincra code, infrastructure, or production systems. It also binds Product, QA, and any operations team member whose decisions touch the engineering work-stream — sprint planning, deploy approvals, incident calls, change requests.
The policy operates as the contract between the OS and daily work. The Engineering OS defines the documents — the SDLC, QA Standards, SLO Implementation Guide, KPI Framework, and the rest. This policy makes those documents binding by tying each commitment to a mechanism (a CI gate, a PR-template clause, a Kaizen review, an Encoded Rule). Where this policy and an OS document agree, the OS document is canonical for detail. Where they disagree, this policy reflects intent and the discrepancy is resolved by ADR within five working days.
1.1 Source documents
The clauses below derive from, and refer to, the following authoritative artefacts:
| Document | Contents |
|---|---|
| Re-Architecture Plan | The DDD re-carving (ARCH-001 Rev 0.2) covering Platform, GPS, Processing, OTC verticals, and the Formance migration. Defines the target bounded contexts and the eight migration steps. |
| OS Doc 1 | Engineering Operating Structure Memo: Sequenced Hybrid framing, Interim Lead, four shifts. |
| OS Doc 2 | Domain Ownership Register: D1–D9, primary + backup per domain. |
| OS Doc 7 | Encoded Rules Register: Tech-NN rules with named enforcement. |
| OS Doc 14 | Architecture Standards & ADR Framework: mandatory stack, ADR template, Forum charter. |
| OS Doc 15 | SDLC Framework v2: nine phases, per-layer coverage, CI gates. |
| OS Doc 16 | QA & Quality Standards v2: three environments, DoD, Hotfix, five Deployment Quality Gates. |
| OS Doc 17 | Sprint Planning Standard v2: Sprint Goal as gate, given/when/then ACs, grooming-before-planning. |
| OS Doc 11 | SLO Implementation Guide v2: 29 SLIs, error-budget policy. |
| OS Doc 12 | Incident, On-Call & Rollback Doctrine. |
1.2 Hierarchy of authority
- Regulation, contractual obligation, and security mandate (PCI, KYC/AML, scheme rules) — supersede everything.
- Architecture Forum decisions recorded as ADRs.
- This policy.
- OS documents (Docs 1–25).
- Team-local conventions and runbooks.
Where any clause below conflicts with a higher tier, the higher tier wins and the clause shall be revised in the next quarterly review.
1.3 Update mechanism
This policy is reviewed at the Architecture Forum every quarter. Changes are proposed via ADR, signed by the Engineering Lead, and counter-signed by the TAC during the CTO transition. Out-of-cycle changes are permitted only when an Encoded Rule (Tech-NN) is added in response to an incident — in that case, the policy is amended in the same ADR that records the rule.
Architectural ground rules
Four rules govern the shape of the system. They are derived from the inverse-Conway target in the re-architecture plan and they are non-negotiable. Every code change, every new service, every cross-team integration is checked against them. A change that requires breaking one of these rules is a change that requires an ADR.
- […] gps-collections, processing-acquiring, or otc-dealing, the logic belongs either inside that Vertical or inside a new Platform capability, never in a back-pointer. Reviewers reject such PRs without further argument.
- […] platform-payment command or a published event on the bus. Direct vertical-to-vertical calls are how the distributed monolith returns.
- […] platform-integrations. No Vertical, and no other Platform service, holds provider SDKs, credentials, webhook endpoints, or signed URLs. Translation, retry, idempotency, and webhook ingestion live in the adapter, and only there.
2.1 Detection & enforcement
The four rules are detected at three levels:
- Mechanical (CI). Module-boundary linting (architecture tests) blocks pull requests that import across forbidden boundaries — e.g. a Vertical importing a provider SDK, a Platform module importing a Vertical, a non-Wallet service importing the Formance client.
- Review. The PR template includes a "boundary check" section. The reviewer must affirm that no rule is broken before approving.
- Forum. Cross-context architecture decisions are reviewed monthly at the Architecture Forum; rule-breaking patterns spotted in code, even if they slipped through CI, are surfaced and an Encoded Rule is added.
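The mechanical level can be illustrated with a small import-boundary check. This is a minimal sketch using Python's standard ast module, not the actual architecture tests: the module names (platform_wallet, platform_integrations, provider_sdk, formance) and the rule table are hypothetical stand-ins for the three examples named above.

```python
import ast

# Hypothetical module prefixes mirroring the repo prefixes (assumption).
VERTICALS = ("gps_", "processing_", "otc_")

def forbidden_prefixes(module: str) -> tuple:
    """Return the import prefixes that `module` may not use."""
    if module.startswith("platform_"):
        banned = list(VERTICALS)  # Platform never imports a Vertical
    else:
        # A Vertical (or shared code) never imports another Vertical.
        banned = [v for v in VERTICALS if not module.startswith(v)]
    if not module.startswith("platform_integrations"):
        banned.append("provider_sdk")  # provider SDKs live only in the adapter layer
    if not module.startswith("platform_wallet"):
        banned.append("formance")      # only the Wallet service imports the Formance client
    return tuple(banned)

def violations(module: str, source: str) -> list:
    """List imported names in `source` that cross a forbidden boundary."""
    banned = forbidden_prefixes(module)
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        found += [name for name in names if name.startswith(banned)]
    return found
```

A CI job built on this idea would fail the pull request whenever violations returns a non-empty list for any changed file.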
2.2 The shape this enforces
Three vertical streams (GPS, Processing, OTC) consume one Platform layer. Platform exists to make their work cheap. The Platform layer comprises ten bounded contexts plus six shared/generic services; the Verticals comprise nine bounded contexts plus their portals. The full map is held in the Architecture Map of the Re-Architecture Plan; deviations from it require ADR.
2.3 Naming & repository discipline
Target bounded contexts use the prefix conventions platform-*, gps-*, processing-*, otc-*, shared-*. New repositories created during the migration shall follow these prefixes. Existing repositories retain their names until decommissioned, but their target context allocation shall be recorded in OS Doc 2 and in the repo's README.md within thirty days of this policy taking effect.
Bounded context governance
Every service belongs to one bounded context. Every bounded context has one Domain Owner (primary) and one named backup. This is recorded in OS Doc 2 (Domain Ownership Register) and re-confirmed quarterly.
3.1 What a Domain Owner owns
The Domain Owner is the single human accountable end-to-end for their domain. The accountabilities, in order of weight, are:
- Service health. SLOs met; error budgets respected; incidents owned through to post-mortem and Encoded Rule.
- Boundary integrity. No drift from the target architecture without ADR. Other contexts do not absorb logic that belongs in this domain; this domain does not absorb logic that belongs elsewhere.
- Standards within the domain. Coding standards (§5), test discipline (§5.4), runbook freshness, dependency hygiene.
- People. Hiring rubric (OS Doc 4), deputy training, succession.
- Roadmap input. The domain's contribution to the bi-weekly Roadmap Triage; intake-gate scoring of work that touches the domain.
3.2 Decision rights
The Domain Owner may decide the following without escalation:
- Internal architecture of the domain — module boundaries within the bounded context, library choice within the mandated stack, internal data model.
- Sprint backlog priority within the domain's allocation.
- Tactical incident response for severities P2 and below.
- Sign-off on PRs touching only the domain's services.
The Domain Owner shall escalate (per OS Doc 8) for the following:
- Any change that affects another domain's published events, API surface, or data contracts.
- Architecture decisions that establish or modify a cross-context port (adapter interface, event schema, ledger template).
- Any P0/P1 incident.
- Any deviation from the target architecture (§2) — even when the deviation is inside the domain — if other contexts depend on the affected interface.
3.3 Cross-context changes
A change is "cross-context" if it modifies any of: a published event schema, an external API surface, a Numscript template, a Chart-of-Accounts mapping, or an adapter port (PayinAdapter, PayoutAdapter, FxAdapter, KycAdapter). Such changes must:
- Be recorded in an ADR following the OS Doc 14 template (Context · Options Considered · Choice · Rationale · Consequences · Status).
- Carry sign-off from both Domain Owners (the writing context and the consuming context). For Ledger / COA changes, CFO sign-off is also required when the change affects GL mappings.
- Pass review at the next Architecture Forum (or the next ad-hoc Forum if the change is urgent).
- Use additive, backward-compatible schema evolution by default. Breaking changes require a deprecation window of at least one full sprint and explicit consumer migration plans.
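The "additive, backward-compatible" default can be checked mechanically. The sketch below assumes a deliberately simplified schema representation (a flat mapping of field name to type name); the function name and the representation are illustrative, not the format used by any Fincra schema registry.

```python
def breaking_changes(old: dict, new: dict, required_new=()) -> list:
    """Compare two flat {field: type} schemas and list non-additive changes.

    A change is treated as breaking when a field is removed, a field is
    retyped, or a newly added field is mandatory for existing producers.
    """
    problems = []
    for field, field_type in old.items():
        if field not in new:
            problems.append(f"removed: {field}")
        elif new[field] != field_type:
            problems.append(f"retyped: {field}")
    for field in required_new:
        if field not in old:
            problems.append(f"new required: {field}")
    return problems

# Adding an optional field is additive; retyping an existing one is not.
v1 = {"payment_id": "string", "amount_minor": "integer"}
v2 = {**v1, "corridor": "string"}
print(breaking_changes(v1, v2))  # []
```

Any non-empty result would send the change down the deprecation-window path rather than the default additive path.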
3.4 The Architecture Forum
The Forum convenes monthly. Standing attendees: Engineering Lead (chair), all D1–D9 Domain Owners, SRE Lead, AI Lead. Standing agenda:
- ADRs proposed since the last Forum — accept, defer, reject, amend.
- Cross-context patterns observed in PRs — surfaced as candidate Encoded Rules.
- Architecture drift report — divergence from target architecture by domain.
- Migration step status (§12) — gating decisions for the next step.
Decisions are by consensus where possible; the Engineering Lead breaks ties. Forum decisions land in OS Doc 14 within forty-eight hours; any rules they generate land in OS Doc 7.
SDLC phase gates
All work follows the Fincra SDLC v2 (OS Doc 15). The nine phases are preserved from v1; what v2 adds — and what this policy enforces — is mechanical phase-exit gating. A work item does not leave a phase until its exit criteria are demonstrably met. CI is the demonstration.
4.1 The nine phases & their exit gates
| # | Phase | Exit gate (what must be true to advance) |
|---|---|---|
| 1 | Initiation | Work item passes the five-question Roadmap Triage intake gate (OS Doc 20). Domain Owner signs the intake row. |
| 2 | Planning | NFRs captured: latency, throughput, durability, security, multi-tenancy posture. Affected bounded contexts named. ADR raised if §3.3 cross-context conditions apply. |
| 3 | Design | API/event-schema diff documented. Numscript template change (if any) drafted and reviewed by Finance. Architecture Forum signs off if cross-context. |
| 4 | Implementation | All ACs in given/when/then format (§6.2). Code passes static analysis, complexity, and method-length checks (§5). Per-layer coverage thresholds met (§5.4). |
| 5 | Testing | Unit, integration, and contract tests green. Sandbox validation by QA. NFR tests run for the categories declared in Phase 2. |
| 6 | Deployment | Staging → Sandbox promotion green. Five Deployment Quality Gates (§8.4) signed. Error-budget posture acceptable (§14.2). |
| 7 | Operation | SLI dashboards reflect the new surface. Alerts route to the Domain Owner + on-call. Runbook updated. |
| 8 | Maintenance | Post-release review at the Daily Kaizen day-7 slot. Defect rate measured against baseline; trend posted to the KPI dashboard. |
| 9 | Disposal | Decommission ADR. Data retention policy applied. Endpoints permanently retired in platform-config. |
4.2 The CI gate as enforcement
Phase exits 4–6 are enforced by CI — not by reviewer goodwill. The CI pipeline blocks merges and deploys that do not satisfy the gate. The pipeline jobs are:
- lint & static analysis — language linter, security linter (Semgrep ruleset), dependency vulnerability scan.
- complexity — cyclomatic ≤ 15 per method (§5.1); method length ≤ 30 lines (§5.2). Hard-fail.
- boundary check — architecture tests against the four ground rules (§2.1).
- unit tests — coverage ≥ the per-layer floor (§5.4). Hard-fail below floor.
- integration tests — coverage ≥ floor; provider adapters tested with recorded fixtures.
- contract tests — every published event and every adapter port covered. 100% of boundary contracts.
- build & package — image scan; SBOM emitted.
- deploy gate — dependent on Staging→Sandbox green, error-budget posture, and approver sign-off.
Any job marked hard-fail blocks the merge. Soft-fail jobs (e.g. dependency scan flagging a low-severity advisory) raise warnings that the reviewer must explicitly acknowledge in the PR.
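The hard-fail and soft-fail distinction can be expressed as a small decision function. This is a sketch under one stated assumption: an unacknowledged soft-fail is treated as blocking until the reviewer acknowledges it. The JobResult type and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class JobResult:
    name: str
    passed: bool
    hard_fail: bool             # True for jobs the policy marks hard-fail
    acknowledged: bool = False  # reviewer acknowledged the warning in the PR

def merge_allowed(results):
    """Return (allowed, blockers) for a set of pipeline job results."""
    blockers = [
        r.name
        for r in results
        # A failed hard-fail job always blocks; a failed soft-fail job
        # blocks only until it has been explicitly acknowledged.
        if not r.passed and (r.hard_fail or not r.acknowledged)
    ]
    return (not blockers, blockers)
```

Acknowledging a hard-fail job has no effect in this sketch, which matches the intent that hard-fail gates cannot be waived at review.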
4.3 Traceability: Sprint Goal → AC → DoD
v2 requires traceability from the Sprint Goal (Phase 2) through the Phase 4 ACs to the Definition of Done (Phase 5/6). The tracker (Linear/Jira) carries this chain:
- Sprint Goal field (mandatory; see §6.1) — populated before the sprint opens.
- Story title + given/when/then ACs in the description.
- PR description references the story; PR template enforces the DoD checklist (§7).
- Deploy event in the pipeline carries the story ID; the SLI dashboard slice reflects the change.
4.4 AI-SDLC tooling
The OS commits to AI as force multiplier (OS Doc 1; OS Doc 19). Adoption is mandatory but bounded:
- AI-generated tests are encouraged for closing the coverage gap from 15% to 60% baseline (per the Index's July 31 commitment). They are still subject to the per-layer floor and reviewer judgement — coverage from generated tests must be meaningful, not nominal.
- AI PR-review runs on 100% of merges. It is advisory, not gating. A human reviewer still approves; the AI review is one input.
- AI-assisted code generation is permitted, but the engineer who merges remains accountable for every line. Provenance is not a defence.
- The World Model (D9) generates the Daily Kaizen pre-read. Its output is reviewed, not blindly accepted.
Coding standards
Coding standards are not literary preferences; they are the shape that lets the SDLC gates work mechanically. The thresholds below are inherited from the existing SDLC document and are enforced by CI in v2.
5.1 Cyclomatic complexity
CI hard-fails any method whose cyclomatic complexity exceeds 15 (§4.2). Where existing code already exceeds the threshold, the change shall not increase complexity further; refactors that reduce it are encouraged. Documented exemptions exist only for state-machine driver functions and parser/dispatch tables, and only when the alternative is meaningfully worse. Exemptions are recorded in the file with a comment naming the ADR or design rationale.
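To show how the complexity and method-length thresholds from §4.2 could be gated mechanically, here is a rough sketch for Python sources using the standard ast module. The complexity count is an approximation (one plus the number of branch points) rather than the output of any specific analysis tool, and nested functions are counted towards their parent; both are simplifications made for illustration.

```python
import ast

MAX_COMPLEXITY = 15  # cyclomatic ceiling per method (§4.2)
MAX_LENGTH = 30      # line ceiling per method (§4.2)
BRANCHES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.comprehension)

def complexity(func) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, BRANCHES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1  # each extra and/or operand adds a path
    return score

def check(source: str) -> list:
    """Return one message per function that breaks a threshold."""
    failures = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = complexity(node)
            length = node.end_lineno - node.lineno + 1
            if score > MAX_COMPLEXITY:
                failures.append(f"{node.name}: complexity {score}")
            if length > MAX_LENGTH:
                failures.append(f"{node.name}: {length} lines")
    return failures
```

The "do not make it worse" rule for legacy code would need a comparison against the score on the base branch, which this sketch leaves out.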
5.2 Method length
Methods shall not exceed 30 lines; CI hard-fails above that limit (§4.2).
5.3 Naming & structure
- Bounded context names match the prefix conventions in §2.3.
- Aggregate roots, value objects, and domain events use the names declared in the bounded context's ubiquitous-language register (held in the repo's DOMAIN.md).
- Adapter classes/modules end in Adapter; ports end in Port; service interfaces end in Service.
- Numscript templates live in ledger/templates/*.num and are versioned with semver.
5.4 Test coverage: per-layer floors
v2 restates the existing 75–80% target as a per-layer floor rather than a single repo-level number. The floor in this policy is what CI gates against:
| Layer | Floor | What it covers |
|---|---|---|
| Unit | ≥ 80% | Domain model, pure functions, value-object invariants, adapter translation logic. |
| Integration | ≥ 60% | Service-to-database, service-to-bus, service-to-Formance, adapter-to-provider with recorded fixtures. |
| Contract | = 100% | Every published event schema. Every adapter port (PayinAdapter, PayoutAdapter, FxAdapter, KycAdapter). Every external API surface. |
| End-to-end | Smoke pack | Critical-path flow per Vertical, run nightly against Sandbox. |
Coverage is measured by line for unit and integration; by surface (every event, every endpoint, every port) for contract. AI-generated tests count toward the floor only when they exercise meaningful behaviour — hash-of-input assertions and trivially-true expectations are rejected at review.
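The per-layer gate amounts to a few comparisons. The sketch below assumes coverage figures arrive as percentages keyed by layer name and that contract coverage is computed by surface, as described above; the function names and the input shape are illustrative.

```python
# Floors from the table above, in percent.
FLOORS = {"unit": 80.0, "integration": 60.0, "contract": 100.0}

def contract_coverage(surfaces: set, tested: set) -> float:
    """Surface-based coverage: share of events, endpoints and ports with a contract test."""
    if not surfaces:
        return 100.0
    return 100.0 * len(surfaces & tested) / len(surfaces)

def layers_below_floor(measured: dict) -> list:
    """Return the layers whose measured coverage is under the floor."""
    return [layer for layer, floor in FLOORS.items() if measured.get(layer, 0.0) < floor]

surfaces = {"PaymentSettled", "WalletDebited", "PayinAdapter", "PayoutAdapter"}
tested = {"PaymentSettled", "WalletDebited", "PayinAdapter"}
measured = {"unit": 82.5, "integration": 60.0, "contract": contract_coverage(surfaces, tested)}
print(layers_below_floor(measured))  # ['contract']
```

A layer missing from the report is treated as 0% here, so an absent measurement fails the gate rather than passing silently.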
5.5 Idempotency & safety
- Every Platform write endpoint must require an idempotency key. Keys are stored in platform-payment for payments, in platform-wallet for ledger-affecting calls, and in adapters for outbound provider calls.
- Every outbound provider call must carry an idempotency key into the provider when the provider supports it; when it does not, the adapter shall persist a deduplication record before invocation.
- Every Formance transaction must carry an idempotency_key in metadata as defence-in-depth.
- Compensating transactions are explicit. There is no "undo by writing the inverse"; reversals are first-class and lifecycle-tracked.
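A minimal sketch of the idempotency-key requirement on a write endpoint follows. It uses an in-memory dictionary where a real service would use durable storage, and it assumes that reusing a key with a different payload is rejected; that conflict behaviour and the class names are assumptions, not something this policy specifies.

```python
import hashlib
import json

class IdempotencyConflict(Exception):
    """Raised when a key is reused with a different payload (assumed behaviour)."""

class IdempotentEndpoint:
    def __init__(self, handler):
        self._handler = handler
        self._seen = {}  # key -> (payload fingerprint, stored response)

    def call(self, idempotency_key: str, payload: dict):
        if not idempotency_key:
            raise ValueError("idempotency key is required")
        fingerprint = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest()
        if idempotency_key in self._seen:
            stored_fingerprint, response = self._seen[idempotency_key]
            if stored_fingerprint != fingerprint:
                raise IdempotencyConflict(idempotency_key)
            return response  # replay: return the original result, do not re-execute
        response = self._handler(payload)
        self._seen[idempotency_key] = (fingerprint, response)
        return response
```

The record is written only after the handler returns in this sketch; an adapter that must persist a deduplication record before invoking a provider would reverse that order and track an in-flight state.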
5.6 Mandatory stack
The mandatory stack is held in OS Doc 14 (Architecture Standards). Deviations require ADR. Library choice within a mandated language is the Domain Owner's prerogative subject to the dependency-vulnerability and licensing checks in CI.
Sprint planning discipline
The Sprint Planning Standard v2 (OS Doc 17) preserves the existing template's substance — Sprint Goal, given/when/then ACs, backlog grooming — and adds tracker-level enforcement so the substance becomes binding rather than aspirational.
6.1 Sprint Goal as gate
The Sprint Goal is one or two sentences. Examples from the existing template: "Integrate the Front-end with the CSV reconciliation upload API"; "Implement the updated reconciliation Figma designs."
6.2 Acceptance criteria: given / when / then
Every story carries acceptance criteria in the given/when/then format. ACs are testable, clear, and written from the user's perspective. Stories without ACs in the agreed format shall not be assigned to an engineer; the tracker enforces this.
The four traits an AC must demonstrate (per the template):
- Testable — pass/fail or yes/no.
- Clear and concise — no ambiguity.
- Understood by everyone — Product, Engineering, QA all read it the same way.
- From the user perspective — written in the customer's frame.
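Tracker-level enforcement of the format could be as simple as a pattern check before a story becomes assignable. The sketch below tests only the given/when/then shape; it cannot judge whether an AC is testable or unambiguous, which remains a human call. The function name and pattern are illustrative.

```python
import re

# "given ... when ... then ..." in that order, case-insensitive, across lines.
AC_PATTERN = re.compile(
    r"^\s*given\b.+?\bwhen\b.+?\bthen\b.+",
    re.IGNORECASE | re.DOTALL,
)

def is_given_when_then(ac: str) -> bool:
    """True when the acceptance criterion follows the given/when/then shape."""
    return bool(AC_PATTERN.match(ac))

def assignable(acceptance_criteria: list) -> bool:
    """A story is assignable only if it has ACs and every one is well-formed."""
    return bool(acceptance_criteria) and all(
        is_given_when_then(ac) for ac in acceptance_criteria
    )
```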
6.3 Grooming closes before planning starts
Backlog grooming and sprint planning are separate meetings. Grooming closes before planning starts; planning is not a vehicle for grooming. The grooming activities (per the template):
- Add user stories reflecting newly discovered insights.
- Break down broad stories into smaller items.
- Reorder stories by priority.
- Define stories clearly to avoid black-box communication.
- Assign or re-assign story points.
- Identify roadblocks and minimise risks.
If grooming is incomplete the Friday before planning, planning is rescheduled or shortened — never lengthened with grooming work.
6.4 Demo & retrospective cadence
- Sprint demo at end of sprint — visible to Product, Operations, and the wider engineering org.
- Retrospective the same day or next working day — actions tracked in the Daily Kaizen until closed.
- Sprint Goal achievement scored at retro: did the sprint deliver on the goal, in part, or not at all? Posted to the KPI dashboard.
6.5 Allocation between feature work, migration, and operational debt
To prevent migration work being permanently squeezed out by feature pressure, every Vertical shall reserve a minimum allocation of sprint capacity for:
- Migration (§12) — at least 20% while the migration programme is active.
- Operational debt & encoded-rule remediation (§13) — at least 15%.
- SLO/error-budget remediation when the relevant budget is below 25% — capacity automatically reallocated until restored.
Reduction below these floors requires Engineering Lead approval recorded against the sprint.
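The two fixed floors reduce to simple arithmetic on the sprint's allocation. The sketch below assumes capacity is expressed in story points per category and that the category names are "migration" and "debt"; both are illustrative choices. The error-budget reallocation rule is not modelled because the policy gives no fixed percentage for it.

```python
MIGRATION_FLOOR_PCT = 20  # applies while the migration programme is active
DEBT_FLOOR_PCT = 15       # operational debt and encoded-rule remediation

def capacity_shortfalls(points: dict, migration_active: bool = True) -> dict:
    """Return {category: missing points} for every floor the sprint misses."""
    total = sum(points.values())
    if total <= 0:
        raise ValueError("sprint has no capacity allocated")
    floors = {"debt": DEBT_FLOOR_PCT}
    if migration_active:
        floors["migration"] = MIGRATION_FLOOR_PCT
    shortfalls = {}
    for category, floor_pct in floors.items():
        needed = floor_pct * total / 100
        allocated = points.get(category, 0)
        if allocated < needed:
            shortfalls[category] = needed - allocated
    return shortfalls

# 100 points with 15 on migration misses the 20% floor by 5 points.
print(capacity_shortfalls({"feature": 70, "migration": 15, "debt": 15}))  # {'migration': 5.0}
```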
Code review & Definition of Done
The QA & Quality Standards v2 (OS Doc 16) preserves Omotoyosi's three-environment model, the Test Data & Credential Policy, the Hotfix Workflow, and the existing Definition of Done. v2 makes the DoD binding by embedding it as an inline PR-template checklist that the reviewer cannot bypass.
7.1 The PR template
Every PR uses the standard template. The template carries:
- Story link (with given/when/then ACs).
- Bounded context(s) affected (with the four-rule boundary check).
- NFR impact statement (latency, durability, security, multi-tenancy).
- The DoD checklist (§7.2).
- Rollback plan — explicit, tested, named.
- SLI impact — what dashboards/alerts change as a result of this merge.
7.2 The Definition of Done: checklist
- All acceptance criteria validated against the deployed Sandbox build.
- Unit, integration, and contract tests pass; per-layer coverage floors met.
- NFRs declared in Phase 2 are tested; results posted to the PR.
- Documentation updated: repo README.md, DOMAIN.md if ubiquitous-language changed, runbook if alerts changed.
- Peer review approval (≥ one engineer not the author; ≥ two for cross-context changes).
- QA sign-off recorded.
- Validated in Sandbox.
- UAT completed where applicable (merchant-facing surfaces).
- Idempotency verified (§5.5) for any new write endpoint.
- Boundary check passed (§2.1) — no rule broken.
7.3 The two-eyes rule and its variants
One reviewer not the author is the floor. Cross-context changes (§3.3) require two — one within the writing context, one within the consuming context. PRs touching the Ledger / Wallet / Payment kernel require two reviewers including either the Domain Owner or the named backup. PRs to platform-card (PCI scope) require additional sign-off per the security review process.
7.4 QA as a gate, not advisory
QA sign-off is required for promotion to Sandbox and to Production. It is not a courtesy step. The QA team's denial blocks the deploy; their approval is recorded in the deploy event for traceability. Hotfix path (§8.3) carries an abridged QA requirement appropriate to severity.
7.5 Engineering ↔ Business interface
Engineering does not field user-facing tickets directly. The four-tier escalation (OS Doc 18) applies: User → Support → Ops/Back-office → Engineering. Engineering's obligation is to make Tier 1 and Tier 2 resolvable without engineering — through admin tooling, runbooks, and self-service capability. Tickets that arrive at Tier 3 carrying ops-resolvable causes are bounced back with a runbook reference, not silently absorbed.
Deployment & environments
8.1 The three environments
QA Standards v2 preserves the three-environment model:
| Environment | Purpose | Promotion path |
|---|---|---|
| Staging | Engineering's own validation; experimental builds; integration with stub providers. | Open to engineering merges; CI green required. |
| Sandbox | External-facing test environment; merchant integrations; full provider sandboxes. | Promotion from Staging only after CI green and QA sign-off. |
| Production | Live customer traffic. | Promotion from Sandbox only after the five Deployment Quality Gates. |
8.2 The deploy pipeline guard
8.3 Hotfix path
The Hotfix Workflow from QA v1 is preserved. Hotfix is permitted for:
- P0 incidents (customer-impacting outage; data loss risk; regulatory breach).
- P1 incidents where standard pipeline timing would extend customer impact materially.
Hotfix requirements:
- Incident commander declares hotfix authorisation.
- Abridged review: one reviewer + Domain Owner (or backup) + on-call SRE.
- Sandbox validation runs but may be parallel to deploy when severity warrants.
- Post-deploy: full DoD reconstructed within 24 hours; ADR recorded; encoded rule (Tech-NN) added if the root cause is reusable knowledge.
8.4 The five Deployment Quality Gates
Inherited from QA v1, mechanised in v2. Promotion to Production requires:
- CI green at the candidate build. Unit, integration, contract, security, and dependency-vulnerability checks all pass.
- Sandbox validation by QA against the AC set for every story in the release.
- NFR validation for the categories declared in Phase 2 — performance test, security scan, scalability check as relevant.
- Error-budget posture acceptable per §14.2: no service in the release has less than 50% of its error budget remaining.
- Deploy authoriser sign-off — Engineering Lead or designated deputy for releases above a threshold; Domain Owner for routine releases.
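The five gates can be evaluated together as one promotion check. In the sketch below the ReleaseCandidate type and its field names are hypothetical, and the error-budget gate is read as "every service keeps at least half of its budget"; that reading of the 50% figure is an assumption and §14.2 remains authoritative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ReleaseCandidate:
    ci_green: bool
    stories: frozenset               # story IDs included in the release
    qa_validated_stories: frozenset  # story IDs QA has validated in Sandbox
    nfr_validated: bool
    budget_remaining_pct: dict       # service name -> % of error budget left
    authoriser: Optional[str]        # who signed off, if anyone

def failed_gates(rc: ReleaseCandidate, min_budget_pct: float = 50.0) -> list:
    """Return the gates that block promotion to Production, in gate order."""
    failed = []
    if not rc.ci_green:
        failed.append("ci")
    if not rc.stories <= rc.qa_validated_stories:  # every story must be validated
        failed.append("sandbox-qa")
    if not rc.nfr_validated:
        failed.append("nfr")
    if any(pct < min_budget_pct for pct in rc.budget_remaining_pct.values()):
        failed.append("error-budget")
    if not rc.authoriser:
        failed.append("authoriser")
    return failed
```

Promotion proceeds only when the returned list is empty; returning every failed gate rather than the first gives the release owner the full picture in one pass.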
8.5 Rollback discipline
Every deploy carries a tested rollback. "Rollback" means a return to the previous known-good state with measurable verification, not just a redeploy of an older tag. Database migrations are forward-only by convention — a rollback for a schema change requires a forward-compatible migration written ahead of the deploy. The rollback procedure is recorded in the PR (§7.1) and rehearsed in the Sandbox before Production promotion.
8.6 Test data & credentials
The Test Data & Credential Policy from QA v1 is preserved. Production credentials never appear in non-Production environments. Test data is synthetic or de-identified; PII never crosses environment boundaries. Provider sandbox credentials are stored in the Sandbox secret store; rotation cadence per the policy.
Cross-cutting decisions
Six decisions cross every bounded context. They are pinned in the re-architecture plan and binding under this policy. Each is a constraint that, once relaxed, cannot easily be re-imposed.
- […] PaymentSettled, WalletDebited, KYCApproved. Verticals subscribe; Platform publishes. Verticals publishing events for other Verticals is permitted only via Platform-hosted channels.
- […] idempotency_key in metadata for defence-in-depth. Non-negotiable; the cost of failing this is duplicate value movement.
- […] platform-config. Tenant-aware logic embedded inside Verticals reading from inconsistent sources is non-compliant.
- platform-card and processing-acquiring are in PCI scope. Everything else handles tokens, never PANs. platform-card is stood up greenfield specifically to keep the audit boundary tight from day one.
- shared-utils contains only true generics: HTTP clients, error types, date helpers. Domain logic in shared libs is the #1 path back to a distributed monolith. The fincra-core repository is audited for hidden domain logic; anything domain-shaped is moved into the relevant context.
Provider integration & the ACL law
The Anti-Corruption Layer is the most violated boundary in pre-rearchitecture Fincra and the cheapest one to police. Every external provider — Wema, Alpay, Nsano, BluPay, Choice, Tembo+, Airtel, Digicash, Visa, Mastercard, Verve — is reachable from Fincra code only through platform-integrations.
10.1 The four ports
Provider integrations sit behind exactly four stable ports. New provider categories require an ADR to introduce a fifth port; this is intentionally hard.
| Port | Responsibility | Example providers |
|---|---|---|
| PayinAdapter | Inbound funds — VA credit, mobile-money credit, bank transfer credit. | Wema, Nsano, BluPay (pay-in side), Choice |
| PayoutAdapter | Outbound funds — mobile-money debit, bank account credit. | Alpay, Tembo+, Airtel, Digicash |
| FxAdapter | Quote retrieval and FX execution against external counterparties. | TradePass, XTransfer, TTS |
| KycAdapter | KYC checks, document verification, watchlist screening. | (per fincra-kyc-integration shims) |
Card scheme integrations (Visa, Mastercard, Verve) sit behind platform-card's scheme adapters — they are part of platform-integrations only insofar as they consume the same ACL pattern.
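To make the idea of a stable port concrete, here is a sketch of one of the four ports expressed as a Python Protocol, with a toy adapter behind it. The method name send_payout, the request and result fields, and the provider wire format are all invented for illustration; the real port signatures are defined by the Architecture Forum, not by this example.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable

@dataclass(frozen=True)
class PayoutRequest:
    idempotency_key: str
    amount_minor: int
    currency: str
    destination: str

@dataclass(frozen=True)
class PayoutResult:
    provider_reference: str
    accepted: bool

@runtime_checkable
class PayoutAdapter(Protocol):
    """The port: callers depend on this shape, never on a provider SDK."""
    def send_payout(self, request: PayoutRequest) -> PayoutResult: ...

class FakeProviderPayoutAdapter:
    """Translation only: no routing, no ledger postings, no merchant-based decisions."""

    def __init__(self, transport):
        self._transport = transport  # callable that sends the provider wire dict

    def send_payout(self, request: PayoutRequest) -> PayoutResult:
        # Canonical message -> hypothetical provider wire format.
        wire = {
            "ref": request.idempotency_key,
            "amt": request.amount_minor,
            "ccy": request.currency,
            "to": request.destination,
        }
        reply = self._transport(wire)
        # Provider reply -> canonical result.
        return PayoutResult(provider_reference=reply["id"], accepted=reply["status"] == "OK")
```

Because the adapter satisfies the Protocol structurally, swapping providers means writing a new class with the same method; nothing upstream changes.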
10.2 What an adapter contains, and what it does not
[…] platform-payment and platform-switching. Business logic in an adapter is rejected at review. An adapter contains:
- Translation between Fincra's canonical message and the provider's wire format.
- Authentication, signing, and credential handling.
- Idempotency key propagation (and local deduplication where the provider does not honour keys).
- Retry logic appropriate to the transport (network errors, 5xx) — not appropriate to the business outcome.
- Webhook ingestion: signature verification, deduplication, normalisation to a Fincra event.
An adapter does not contain:
- Routing or fallback to a different provider; that is platform-switching.
- Any decision based on the customer, merchant, corridor, or amount; those decisions are made before the call.
- Ledger postings or balance checks; those happen elsewhere in Platform.
- Any reference to another adapter; adapters are leaves, not branches.
10.3 Webhook ingestion
Webhooks are received only by adapter endpoints in platform-integrations. Each webhook receiver:
- Verifies the signature against the provider's published key.
- Records the raw payload to durable storage before any processing.
- Deduplicates against the idempotency key (or a synthetic key derived from the payload when absent).
- Translates to a Fincra-canonical event and publishes to the bus.
- Returns the provider's expected acknowledgement only after the event is durably published.
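The five receiver steps can be sketched in order as follows. The sketch assumes an HMAC-SHA256 shared-secret signature, which is only one of the schemes providers use (others sign with a published public key), and it uses in-memory lists for the raw store and the bus. The class name, the event type, and the payload field idempotency_key are illustrative.

```python
import hashlib
import hmac
import json

class WebhookReceiver:
    def __init__(self, secret: bytes, raw_store: list, bus: list):
        self._secret = secret
        self._raw_store = raw_store  # stand-in for durable storage
        self._bus = bus              # stand-in for the event bus
        self._seen = set()           # deduplication keys already published

    def handle(self, body: bytes, signature_hex: str) -> int:
        # 1. Verify the signature before trusting anything in the body.
        expected = hmac.new(self._secret, body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, signature_hex):
            return 401
        # 2. Record the raw payload before any processing.
        self._raw_store.append(body)
        # 3. Deduplicate on the provider key, or a synthetic key from the payload.
        payload = json.loads(body)
        key = payload.get("idempotency_key") or hashlib.sha256(body).hexdigest()
        if key in self._seen:
            return 200  # duplicate delivery: acknowledge without republishing
        # 4. Translate to a canonical event and publish.
        self._bus.append({"type": "ProviderEventReceived", "key": key, "data": payload})
        self._seen.add(key)
        # 5. Acknowledge only after the event has been published.
        return 200
```

If publishing raised an exception, the method would never reach the final return, so the provider would see a failure and redeliver; the deduplication step is what makes that redelivery safe.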
10.4 Adding a new provider: runbook
- Open ADR naming the provider, the corridor(s), the port (one of the four), and the SLA.
- Implement the adapter behind the port. Recorded fixtures of provider responses populate integration tests.
- Add the provider to platform-config with corridor, fee, and capacity entries. Configuration emits a versioned event; consumers cache.
- Add scoring inputs to platform-switching (success-rate metric, cost data); switching auto-considers once the inputs land.
- Run the new provider in shadow mode (production traffic, decisions logged but not executed) for at least seven days.
- Promote to live routing after Domain Owner sign-off and switching score threshold.
Ledger authority & Formance discipline
The Ledger is the most consequential service in the system. Errors here are not bugs; they are reconciliation failures with regulatory consequence. The Formance migration (steps 04–05 of §12) makes the Ledger a vendor product; what stays Fincra is the Wallet/Ledger Service, the Chart of Accounts mapping, and the Numscript template library.
11.1 Single Formance caller
[…] platform-wallet from importing the Formance SDK.
11.2 The four ledger instances
Per the Formance migration spec, four Formance instances are deployed, each isolating a product line's accounting. Cross-instance entries are explicit and brokered by the Wallet/Ledger Service.
| Instance | Scope |
|---|---|
| processing | Acquiring, Checkout, Card flows. |
| gps | Collections, Disbursements, Settlements. |
| otc | Dealing desk, FX, SSA corridors, stablecoin ramps. Isolated from product P&L. |
| internal | Treasury sweep, internal transfers, suspense and reconciliation accounts. |
11.3 Numscript templates
Templates are infrastructure detail, not domain. They are versioned in Git under ledger/templates/*.num and generated from the canonical inputs:
- Chart of Accounts — CSV-driven, Git-versioned.
- GL Mappings — corridor × instrument × direction → account namespace.
- Template catalog: payin_collection.num, payout_disbursement.num, fx_conversion.num, stablecoin_onramp.num, stablecoin_offramp.num, internal_transfer.num.
New templates and template changes follow §3.3 (cross-context change). CFO sign-off is required when GL mappings are affected.
11.4 Defence-in-depth
Even with Formance providing immutable double-entry, the Wallet/Ledger Service retains:
- Idempotency manager — local deduplication before the Formance call.
- Pre-Formance balance verification — confirm the source has the funds before the post.
- Compensating-transaction support — explicit reversal flows; no "negate by inverse".
- CDC stream — Formance changes propagated to read-models for reporting and downstream subscribers.
11.5 COA & template ownership
Pending the resolution of the open question recorded in §17, the Numscript template library is jointly owned by Engineering (the writers) and Finance (the policy authors). Engineering writes templates; Finance signs off on the postings they generate. The CSV-driven generation pipeline produces auditable diffs; every template version is tied to a Finance-signed PR.
Migration programme governance
The eight-step migration sequence in the re-architecture plan is the path from today's repos to the target architecture. Each step is independently shippable and reversible. This policy commits to the order — lowest risk first, hardest decompositions deferred until the supporting Platform pieces exist.
12.1 The eight steps
| # | Step | Risk | Effort |
|---|---|---|---|
| 01 | Carve out platform-config from corridor-management. | Low | 1 quarter |
| 02 | Carve out platform-integrations properly; define the four ports; cut gps-disbursements over first. | Medium | 1–2 quarters |
| 03 | Productize fincra-switching-engine → platform-switching; tighten the RouteDecision contract. | Low | 1 quarter |
| 04 | Deploy Formance + Chart-of-Accounts migration → platform-ledger. Phase 1+2 of the Formance spec. | Medium | ~4 weeks |
| 05 | Refactor fincra-wallets → Wallet/Ledger Service over Formance. Phase 3+4+5 of the Formance spec. | Medium | ~7 weeks |
| 06 | Stand up platform-payment as orchestration kernel. | High | 2–3 quarters |
| 07 | Decompose checkout-core into processing-checkout, processing-acquiring, and platform-payment. | High | 2–3 quarters |
| 08 | Stand up platform-card for Processing, greenfield. | High | 2–3 quarters |
12.2 Step gating
A migration step does not start until the previous step is in production and stable for at least one full sprint. "Stable" means: SLOs met; no regression incidents attributable to the migration; downstream consumers report no contract issues. The Architecture Forum confirms stability and authorises the next step.
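The gating rule can be written as a predicate over the previous step's status. The StepStatus type and its fields are hypothetical; in particular, "stable for one full sprint" is modelled as a simple counter that someone still has to maintain from SLO and incident data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StepStatus:
    in_production: bool
    stable_sprints: int          # full sprints with SLOs met and no migration regressions
    open_contract_issues: int    # contract issues reported by downstream consumers
    forum_authorised_next: bool  # Architecture Forum has authorised the next step

def may_start(step: int, history: dict) -> bool:
    """True when migration step `step` (1 to 8) is allowed to begin."""
    if step == 1:
        return True  # the first step has no predecessor to wait for
    previous = history.get(step - 1)
    return bool(
        previous
        and previous.in_production
        and previous.stable_sprints >= 1
        and previous.open_contract_issues == 0
        and previous.forum_authorised_next
    )
```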
12.3 Reversibility
12.4 Capacity reservation
While the migration programme is active (steps 01–08 not all complete), every Vertical reserves at least 20% of sprint capacity for migration work (§6.5). This is not negotiable except by Engineering Lead approval recorded against the sprint and reviewed at the next Forum.
12.5Step-specific notes
- Step 01 (Config) — establishes the pattern. The published-language convention learned here is the template for steps 02 and 03.
- Step 02 (Integrations) — gps-disbursements is the first cutover because its provider mesh is well-understood and its blast radius is bounded. The contract that emerges becomes the published language for the other Verticals' cutovers.
- Steps 04–05 (Formance) — together about eleven weeks per the migration spec. These are the load-bearing changes; they unblock everything in steps 06–08 that depends on a stable ledger boundary.
- Step 06 (Payment) — greenfield, no saga skeleton to lift; the largest pure-greenfield work in the programme. Plan as a quarterly programme, not a sprint.
- Step 07 (Checkout decomposition) — the hardest. Four bounded contexts in one repo. Plan as a quarterly programme; expect surprises.
- Step 08 (Card) — done greenfield specifically to keep PCI scope tight from day one. Issuing future-proofs the design.
Incident discipline & encoded rules
Every incident becomes either a permanent rule or a deliberate non-rule. The Encoded Rules Register (OS Doc 7) is the durable memory of what the system has learned. The philosophy — Kintsugi — is that breakage, repaired well, becomes part of the structure.
13.1Severity classification
| Severity | Definition | Response |
|---|---|---|
| P0 | Customer-impacting outage; data loss risk; regulatory breach; payments failing at scale. | All hands; Engineering Lead notified within 15 min; TAC notified within 60 min; status-page comms within 30 min. |
| P1 | Material customer impact; SLO breach; partial outage of a Tier 0/1 service. | Domain Owner + on-call; Engineering Lead notified within 60 min; status-page comms within 90 min. |
| P2 | Limited impact; SLO at risk; degradation a subset of users perceive. | Domain Owner + on-call; resolved or scheduled within 24 hours. |
| P3 | Internal-only; no customer impact; tooling regression; alert noise. | Backlog; sprint scheduling. |
| P4 | Cosmetic; documentation; non-urgent improvement. | Backlog. |
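The notification windows in the table can be turned into concrete deadlines once an incident is declared. This is a minimal sketch, assuming windows are measured from the declaration time (the table does not say which timestamp starts the clock); the dictionary keys and the function name are hypothetical.

```python
from datetime import datetime, timedelta

# Notification windows in minutes, transcribed from the P0 and P1 rows of the table.
# P2 to P4 carry no notification windows in the table, so they have no entries here.
NOTIFY_WITHIN_MIN = {
    "P0": {"engineering_lead": 15, "status_page": 30, "tac": 60},
    "P1": {"engineering_lead": 60, "status_page": 90},
}


def notification_deadlines(severity: str, declared_at: datetime) -> dict[str, datetime]:
    """Map each party to the latest time it must be notified (assumed from declaration)."""
    windows = NOTIFY_WITHIN_MIN.get(severity, {})
    return {who: declared_at + timedelta(minutes=m) for who, m in windows.items()}
```

For example, a P0 declared at 09:00 would need the Engineering Lead notified by 09:15, status-page comms by 09:30, and TAC by 10:00.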
13.2The post-mortem
Every P0 and P1 incident produces a post-mortem within five working days. The post-mortem follows OS Doc 12's template and covers:
- Timeline — detection, declaration, mitigation, resolution.
- Customer impact — who, what, how much, for how long.
- Root cause analysis — five-whys minimum; technical and organisational factors.
- What worked.
- What did not.
- Encoded Rule proposal — the durable lesson.
- Action items with named owners and dates.
Post-mortems are blameless in framing, specific in causation. The point is to fix the system, not to score the human.
13.3The Encoded Rule
The post-mortem proposes a Tech-NN Encoded Rule. The rule is reviewed at the next Daily Kaizen and, if accepted, recorded in OS Doc 7 with:
- The rule statement (one sentence).
- The incident that produced it (link to post-mortem).
- Named enforcement mechanism — a CI check, a PR-template clause, an alert, a runbook step. A rule without an enforcement mechanism is rejected.
- Owner — the Domain Owner accountable for keeping the enforcement live.
Some rules amend this policy directly (§1.3). The amendment is recorded in the same ADR as the rule.
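The four required fields of an Encoded Rule lend themselves to a simple completeness check before the Kaizen review. The sketch below is illustrative only: the `EncodedRule` type, its field names, and the enforcement identifiers are assumptions, not the schema of OS Doc 7. It does not attempt to verify that the statement is a single sentence.

```python
from dataclasses import dataclass

# The four enforcement kinds named above; the identifiers themselves are illustrative.
ALLOWED_ENFORCEMENT = {"ci_check", "pr_template_clause", "alert", "runbook_step"}


@dataclass(frozen=True)
class EncodedRule:
    rule_id: str          # for example "Tech-NN"
    statement: str        # the one-sentence rule
    postmortem_link: str  # the incident that produced the rule
    enforcement: str      # one of ALLOWED_ENFORCEMENT
    owner: str            # accountable Domain Owner


def rejection_reasons(rule: EncodedRule) -> list[str]:
    """Return the reasons a proposed rule would be rejected; empty means acceptable."""
    reasons = []
    if not rule.statement.strip():
        reasons.append("missing rule statement")
    if not rule.postmortem_link.strip():
        reasons.append("missing post-mortem link")
    if rule.enforcement not in ALLOWED_ENFORCEMENT:
        reasons.append("no named enforcement mechanism")
    if not rule.owner.strip():
        reasons.append("missing owner")
    return reasons
```

Returning every reason at once, rather than failing on the first, lets the proposer fix a draft in one pass.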
13.4Customer-detected vs internally-detected
The diagnostic noted that customer-detected outages routinely outnumber internally detected ones. Closing this gap is a tracked KPI. Every customer-detected P0/P1 raises one question at the post-mortem: "what alert should have caught this first?" The answer becomes a Tech-NN rule with a named SLI/alert.
13.5Game-Day drills
Quarterly Game-Day drills test recovery without the system's owner present. An engineer who has never operated the service performs a cold-start recovery of a Tier 0/1 service. The drill tests Tech-06 (the three-engineer recovery rule) and the freshness of runbooks. Results are reported to TAC and the Architecture Forum.
SLOs & error-budget policy
The 29 SLIs defined in the SLO Implementation Guide v2 (OS Doc 11) are the authoritative reliability measure for engineering. Every SLI is live in New Relic; every alert routes to a Domain Owner and an on-call rota.
14.1Service tiers and SLO classes
Every named service is classified in the Service Tier & SLO Sheet (OS Doc 10):
| Tier | Class of service | RTO / RPO |
|---|---|---|
| 0 | Critical revenue / regulatory; outage materially impacts the business. | RTO 4h · RPO 15min |
| 1 | Core product surface; outage impacts customers but not regulatory compliance. | RTO 4h · RPO 15min |
| 2 | Important but not critical; degradation tolerable for hours. | RTO 24h · RPO 1h |
| 3 | Internal/operational; degradation tolerable for a working day. | RTO best-effort |
The Wallet/Ledger Service, platform-payment, platform-integrations, and the Vertical Pay-in / Pay-out paths are Tier 0. Misclassification is an ADR-worthy decision.
14.2Error-budget policy
v2 introduces a binding error-budget policy. The budget is the complement of the SLO: at a 99.9% SLO, the budget is 0.1% of the period. Burn is the share of that budget already consumed.
| Burn | Action |
|---|---|
| ≥ 25% | Weekly review by Domain Owner + SRE Lead; remediation actions added to next sprint at the §6.5 floor or above. |
| ≥ 50% | Deploy-freeze for the affected service. Only fixes for the burn cause may deploy. Freeze lifted only after burn is below 50% and trending down. |
| ≥ 75% | Engineering Lead intervention. Capacity reallocated; sprint commitments may be deferred. |
| = 100% | Reliability incident declared independently of any single outage; full post-mortem. |
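As a worked illustration of the budget arithmetic and the burn ladder, the sketch below converts an SLO into allowed minutes and lists the actions triggered at a given burn level. Two assumptions are not stated in the policy: a 30-day period, and that the actions are cumulative (a service at 60% burn is both under weekly review and deploy-frozen). Function names are hypothetical.

```python
def error_budget_minutes(slo_percent: float, period_days: int = 30) -> float:
    """Allowed 'bad' minutes in the period: (100 - SLO)% of the period length."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (100.0 - slo_percent) / 100.0


def burn_actions(burn_fraction: float) -> list[str]:
    """Every action triggered at this burn level, lowest threshold first."""
    ladder = [
        (0.25, "weekly review by Domain Owner + SRE Lead"),
        (0.50, "deploy-freeze for the affected service"),
        (0.75, "Engineering Lead intervention"),
        (1.00, "reliability incident declared; full post-mortem"),
    ]
    # Assumption: thresholds stack, so higher burn keeps the earlier actions in force.
    return [action for threshold, action in ladder if burn_fraction >= threshold]
```

Under these assumptions, a 99.9% SLO over 30 days allows roughly 43.2 minutes of budget, so the deploy-freeze threshold is reached after about 21.6 of those minutes are consumed.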
14.3Alert routing
Every alert lands at a named human, not a shared channel:
- Primary: Domain Owner of the affected service.
- Secondary: on-call SRE per OS Doc 13's rota.
- Tertiary: backup Domain Owner.
Alerts that fire without anyone reading them are noise; they are reviewed monthly at the SLO review and tuned, suppressed, or escalated. An alert that fires more than twice without resolution becomes either a Tech-NN rule with a fix, or a deliberate decision to suppress.
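The routing order and the "more than twice" rule can be sketched as follows. The `ServiceContacts` type is hypothetical, and the convention that a shared channel is written with a leading `#` is an assumption made only so the "named human" requirement can be checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceContacts:
    domain_owner: str         # primary
    on_call_sre: str          # secondary
    backup_domain_owner: str  # tertiary


def escalation_chain(contacts: ServiceContacts) -> list[str]:
    """Return the routing order, rejecting anything that looks like a shared channel."""
    chain = [contacts.domain_owner, contacts.on_call_sre, contacts.backup_domain_owner]
    for target in chain:
        # Assumption: channel names start with '#'; alerts must land at a named human.
        if not target.strip() or target.startswith("#"):
            raise ValueError(f"alert target must be a named human, got {target!r}")
    return chain


def needs_decision(unresolved_firings: int) -> bool:
    """True once an alert has fired more than twice without resolution."""
    return unresolved_firings > 2
```

When `needs_decision` returns True, the outcome is one of the two options in the paragraph above: a Tech-NN rule with a fix, or a recorded decision to suppress.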
14.4Monthly SLO review
The Engineering Lead chairs a monthly SLO review with Domain Owners and the SRE Lead. Standing agenda: services in burn; alert noise; SLI gaps; D5 / D9 SLI additions per v2. Outputs feed into the KPI Framework dashboard (OS Doc 9).
Enforcement mechanisms
A policy without enforcement is fiction. The OS does not write fictional standards. Every clause above ties to one or more of the five enforcement layers below; clauses that lack a layer are flagged for revision.
15.1Mechanical — CI
The CI pipeline enforces:
- Boundary checks (§2.1) — module imports, no Formance client outside Wallet, no provider SDK outside platform-integrations.
- Coding-standard floors (§5) — cyclomatic, method length, per-layer coverage, contract coverage.
- Phase exits (§4.2).
- Dependency-vulnerability and licensing checks.
- Deploy gate (§8.2) — Staging → Sandbox → Production sequence.
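To illustrate the kind of boundary check listed first, here is a minimal sketch of an import scan. It uses Python only for illustration; the policy does not state the language of the repositories or the tool that implements the gate. The package names and path prefixes in `RESTRICTED` are placeholders, not Fincra's real layout.

```python
import ast

# Hypothetical mapping: restricted top-level package -> the only path prefix allowed
# to import it. Both sides are placeholders chosen for this example.
RESTRICTED = {
    "formance_client": "wallet/",
    "provider_sdk": "platform_integrations/",
}


def boundary_violations(file_path: str, source: str) -> list[str]:
    """List imports of restricted packages made from outside their allowed prefix."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from x import y" only
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            allowed_prefix = RESTRICTED.get(top)
            if allowed_prefix and not file_path.startswith(allowed_prefix):
                violations.append(f"{file_path}:{node.lineno} imports {top}")
    return violations
```

A CI job built on this idea would run the scan over every changed file and fail the build when the returned list is non-empty.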
15.2Procedural — PR template & trackers
- PR template inline DoD (§7.2) — reviewer cannot approve unchecked items.
- Tracker enforcement of Sprint Goal (§6.1) and AC format (§6.2).
- ADR template (OS Doc 14) for every cross-context change (§3.3).
15.3Continuous — the Daily Kaizen
The Daily Kaizen (OS Doc 5) is the system's heartbeat. Thirty minutes, six blocks, chaired by the Engineering Manager. Standing items relevant to this policy:
- SLO and error-budget posture (§14).
- Incident review and Encoded Rule candidates (§13).
- Migration step status (§12).
- Boundary-check failures and adapter-rule violations from the prior day's PRs (§2, §10).
- Countermeasure tracking — actions from previous Kaizens.
15.4Periodic — Forums & reviews
- Architecture Forum — monthly. ADRs, drift report, migration gating.
- SLO review — monthly. Alert tuning, SLI gaps, error-budget posture by service.
- Failure Modes review — quarterly (OS Doc 23). Strategic mitigations; reported to TAC.
- Roadmap Triage — bi-weekly (OS Doc 20). Five-question intake gate.
- Quarterly Game-Day (§13.5).
15.5Incident-driven — the Encoded Rules Register
OS Doc 7. Every incident produces either a Tech-NN rule or a recorded decision not to add one. Each rule names its enforcement mechanism (which lands in §15.1, §15.2, §15.3, or §15.4); a rule without enforcement is rejected.
Exceptions & deviations
This policy admits exceptions because reality admits exceptions. What it does not admit is silent deviation. Every exception is recorded, time-boxed, and reviewed.
16.1The exception process
- The deviation is described in an ADR — what is being deviated from, why, what alternative was considered, what compensating controls apply.
- The ADR is reviewed at the Architecture Forum (§3.4) — or out-of-cycle for urgent need, with a confirmatory Forum review at the next sitting.
- The ADR carries an expiry date. Default expiry is six months; longer expiry requires Engineering Lead + TAC sign-off.
- The ADR identifies the owner accountable for either making the deviation permanent (by amending the policy) or removing it.
- At expiry, the ADR is reviewed: closed (deviation removed), renewed (with explicit re-justification), or formalised into the policy via amendment.
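The expiry rules in the process above reduce to date arithmetic. The sketch below assumes "six months" means calendar months counted from the ADR's approval date, which the policy does not specify; the function names are hypothetical.

```python
import calendar
from datetime import date

DEFAULT_EXPIRY_MONTHS = 6


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def needs_extended_signoff(approved: date, expiry: date) -> bool:
    """True when the requested expiry runs past the six-month default."""
    return expiry > add_months(approved, DEFAULT_EXPIRY_MONTHS)


def is_due_for_review(expiry: date, today: date) -> bool:
    """True on or after the expiry date, when the ADR is closed, renewed, or formalised."""
    return today >= expiry
```

Under this reading, a deviation approved on 15 January with an expiry of 16 July or later would need Engineering Lead and TAC sign-off.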
16.2What cannot be deviated from
Some clauses are non-deviable. They reflect regulatory, contractual, or load-bearing-architecture commitments where exception creates more risk than the system can carry:
- The four architectural ground rules (§2). Deviation from these is not exception, it is failure.
- PCI scope containment (§9 DEC-04, §10).
- Idempotency on every Platform write (§5.5, §9 DEC-02).
- Single Formance caller (§9 DEC-06, §11.1).
- Production deploy guard (§8.2). Hotfix path (§8.3) is the legitimate variant, not a deviation.
16.3Recording & visibility
All active deviations are listed in OS Doc 14's ADR index with status active, alongside their expiry date. The Architecture Forum reviews active deviations as a standing agenda item. Sustained deviation pressure on a particular clause is a signal to revise the clause; this is healthy and expected.
Adoption schedule
This policy comes into force on its date of issue. Adoption is phased; the steps below align with the 90-Day Implementation Plan (OS Doc 21).
17.1Day 1 — what binds immediately
- The four architectural ground rules (§2) — including the ACL law (§10).
- Domain Ownership Register (§3.1) — every service has a named owner and backup as of Day 1.
- Sprint Goal as gate, AC format (§6.1, §6.2). Stories without ACs in given/when/then are not assigned.
- PR template with inline DoD (§7.1, §7.2) — reviewer cannot approve unchecked items.
- Severity definitions and on-call rota (§13.1) — the on-call schedule from OS Doc 13 is live.
- Daily Kaizen (§15.3).
17.2Day 30 — phase 1 commitments
- CI architecture-test gate live for the four ground rules.
- Per-layer coverage floors enforced (§5.4).
- Cyclomatic and method-length CI gates active (§5.1, §5.2).
- Migration step 01 (Config) shipped or in active flight.
- Encoded Rules Register populated with the existing Tech-01 to Tech-14 rules and their enforcement mechanisms confirmed.
- Visibility / User Journey doc tracking gap reduced — committed targets per OS index (closing 17 No / 9 Pending toward 0 / 0 by July 31).
17.3Day 90 — full enforcement
- All 29 SLIs live in New Relic with alerting (§14).
- Error-budget policy (§14.2) active; deploy-freeze automation in place at 50% burn.
- Migration step 02 (Integrations) cutover for first Vertical complete.
- AI PR-review running on 100% of merges.
- First quarterly Game-Day drill executed (§13.5).
- Architecture Forum operating monthly with ADR index live.
17.4July 31, 2026 — board commitments
Per the OS index's commitments to the board (re-stated here as policy outcomes, not aspiration):
- 22-dimension diagnostic re-scored to ≥ 14 of 22.
- Test coverage raised from the 15% baseline to 60% (75–80% target by year-end).
- Visibility / User Journey tracking gap closed to 0 / 0.
- All 29 SLIs live in New Relic with alerting; SLO breaches feeding the Daily Kaizen pre-read.
- World Model v1 query-capable; AI PR-review on 100% of merges.
- First quarterly Game-Day drill passed.
- The incoming CTO inheriting a system, not chaos — domain-mapped, KPI-anchored, daily-Kaizen-running, AI-augmented.
17.5Open questions tracked under this policy
Five unknowns from the re-architecture plan are tracked under this policy and shall be resolved by ADR before the migration steps that depend on them:
- OTC vertical organisational ownership — separate dealing-desk team or absorbed by GPS?
- Numscript template-library ownership between Engineering and Finance.
- Formance deployment topology — single multi-tenant cluster, per-environment, per-region?
- Risk & fraud bounded context — embedded in switching, in a Vertical, or a gap?
- Shared-observability greenfield vs existing platform reuse.
Each is owned, with a target resolution date set at the next Architecture Forum.