Incident Readiness Runbook
This document defines the minimum monitoring and incident posture for PrivateDAO before and after a Mainnet launch.
It is intentionally practical:
- detect failures early
- preserve operator clarity
- shorten response time
- keep the launch boundary honest
Core monitoring targets
PrivateDAO should continuously watch:
RPC failures
- request timeouts
- repeated degraded latency
- transport failures across primary and fallback providers
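Here is a minimal TypeScript sketch of the RPC checks above. It assumes the client routes each read call through one helper. `callWithFallback`, `RpcTransport`, and the timeout value are hypothetical names for illustration. The caller supplies the endpoint list and the function that actually performs the request. Each attempt is recorded, so timeouts and transport failures can be counted per provider.

```typescript
// Sketch only: endpoints and the transport function are supplied by the caller;
// nothing here is a real PrivateDAO or RPC-provider API.
type RpcTransport<T> = (endpoint: string) => Promise<T>;

interface RpcAttempt {
  endpoint: string;
  outcome: "ok" | "timeout" | "error";
  elapsedMs: number;
}

type RpcResult<T> =
  | { ok: true; result: T; attempts: RpcAttempt[] }
  | { ok: false; attempts: RpcAttempt[] };

// Sentinel that tells our own timeout apart from transport errors.
const RPC_TIMEOUT = Symbol("rpc-timeout");

// Reject with RPC_TIMEOUT if `promise` has not settled within `timeoutMs`.
// The timer is cleared either way so it cannot keep the process alive.
function withTimeout<T>(promise: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(RPC_TIMEOUT), timeoutMs);
  });
  return Promise.race([promise, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (err: unknown) => {
      clearTimeout(timer);
      throw err;
    },
  );
}

// Try the primary endpoint first, then each fallback, recording every attempt.
async function callWithFallback<T>(
  endpoints: string[],
  transport: RpcTransport<T>,
  timeoutMs: number,
): Promise<RpcResult<T>> {
  const attempts: RpcAttempt[] = [];
  for (const endpoint of endpoints) {
    const started = Date.now();
    try {
      const result = await withTimeout(transport(endpoint), timeoutMs);
      attempts.push({ endpoint, outcome: "ok", elapsedMs: Date.now() - started });
      return { ok: true, result, attempts };
    } catch (err) {
      const outcome: RpcAttempt["outcome"] = err === RPC_TIMEOUT ? "timeout" : "error";
      attempts.push({ endpoint, outcome, elapsedMs: Date.now() - started });
    }
  }
  // Every provider failed: the attempt log is the evidence for the alert.
  return { ok: false, attempts };
}
```

The sketch is aimed at read calls. Resubmitting a write to a fallback provider should first pass the duplicate checks listed under replay and retry anomalies.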
Wallet errors
- wallet connection failures
- repeated signature rejections
- signing-boundary confusion, where users are unsure what they are signing in commit, reveal, or execute flows
Instruction failures
- failed `create_dao`
- failed `create_proposal`
- failed `commit_vote`
- failed `reveal_vote`
- failed `finalize_proposal`
- failed `execute_proposal`
Replay and retry anomalies
- repeated transaction attempts
- duplicate execute attempts
- unexpected duplicate commit or reveal patterns
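Replay and retry anomalies can be flagged by keying each attempt on the logical action it represents. The sketch below is a minimal version. It assumes one execute per proposal, and one commit and one reveal per wallet per proposal. `ReplayWatcher` and the key format are hypothetical, and the real program's rules should replace these assumptions. A count above one is a prompt for review, not proof of an attack. A wallet retrying after a dropped transaction produces the same pattern.

```typescript
// Hypothetical action names mirroring the instructions listed above.
type GuardedAction = "commit_vote" | "reveal_vote" | "execute_proposal";

interface AttemptEvent {
  action: GuardedAction;
  proposal: string; // proposal account address
  wallet: string; // signer public key
}

// Key that defines "the same logical action" under this sketch's assumptions:
// execute happens once per proposal; commit and reveal once per wallet per proposal.
function dedupeKey(e: AttemptEvent): string {
  return e.action === "execute_proposal"
    ? `${e.action}:${e.proposal}`
    : `${e.action}:${e.proposal}:${e.wallet}`;
}

class ReplayWatcher {
  private seen = new Map<string, number>();

  // Record an attempt and return how many times this logical action has been seen.
  record(e: AttemptEvent): number {
    const key = dedupeKey(e);
    const count = (this.seen.get(key) ?? 0) + 1;
    this.seen.set(key, count);
    return count;
  }

  // Keys seen more than once: candidates for a replay or retry review.
  anomalies(): string[] {
    return Array.from(this.seen.entries())
      .filter(([, n]) => n > 1)
      .map(([key]) => key);
  }
}
```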
State inconsistencies
- proposal state moving unexpectedly
- reveal or execute attempted outside valid timing windows
- on-chain treasury or proposal state diverging from what the UI expects
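State-inconsistency checks need an explicit model of which transitions and timing windows are valid. The sketch below assumes a simplified lifecycle (`Voting`, `Revealing`, `Finalized`, `Executed`) and three hypothetical timestamp fields. The actual on-chain proposal layout and status names may differ and should be substituted.

```typescript
// Hypothetical timing fields (unix seconds); the real proposal account may differ.
interface ProposalWindows {
  commitEnd: number; // last moment commit_vote is valid
  revealEnd: number; // last moment reveal_vote is valid
  executeAfter: number; // earliest moment execute_proposal is valid
}

type ProposalStatus = "Voting" | "Revealing" | "Finalized" | "Executed";

// Allowed forward transitions in this sketch; anything else is "moving unexpectedly".
const ALLOWED: Record<ProposalStatus, ProposalStatus[]> = {
  Voting: ["Revealing"],
  Revealing: ["Finalized"],
  Finalized: ["Executed"],
  Executed: [],
};

function isExpectedTransition(from: ProposalStatus, to: ProposalStatus): boolean {
  return from === to || ALLOWED[from].indexOf(to) !== -1;
}

type TimedAction = "commit_vote" | "reveal_vote" | "execute_proposal";

// Return null when the action falls inside its window, otherwise a reason for the log.
function windowViolation(action: TimedAction, w: ProposalWindows, now: number): string | null {
  if (action === "commit_vote") {
    return now <= w.commitEnd ? null : "commit after commit window closed";
  }
  if (action === "reveal_vote") {
    if (now <= w.commitEnd) return "reveal before commit window closed";
    return now <= w.revealEnd ? null : "reveal after reveal window closed";
  }
  // execute_proposal
  return now >= w.executeAfter ? null : "execute before execution window opened";
}
```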
Minimum alerts
The production operating stack should raise alerts for:
- repeated RPC failure or sustained latency regression
- repeated wallet-sign errors on critical routes
- repeated instruction failures for the same action
- unexpected account or proposal state transitions
- treasury-action mismatches or unexpectedly blocked execution
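Most of the alerts above reduce to "the same failure, repeated within a short window." The sketch below is a minimal sliding-window counter. It assumes in-memory state and caller-chosen keys such as `instruction_failure:execute_proposal`. `SlidingWindowAlerter` and the threshold values are hypothetical and would need tuning against real traffic.

```typescript
// Hypothetical rule: alert when `threshold` events with the same key occur within `windowMs`.
interface AlertRule {
  threshold: number;
  windowMs: number;
}

class SlidingWindowAlerter {
  private readonly rule: AlertRule;
  private events = new Map<string, number[]>();

  constructor(rule: AlertRule) {
    this.rule = rule;
  }

  // Record one failure for `key` at `nowMs`; return true if the key should alert now.
  record(key: string, nowMs: number): boolean {
    const cutoff = nowMs - this.rule.windowMs;
    // Keep only timestamps still inside the window, then add this one.
    const recent = (this.events.get(key) ?? []).filter((t) => t > cutoff);
    recent.push(nowMs);
    this.events.set(key, recent);
    return recent.length >= this.rule.threshold;
  }
}
```

Keying by action (or by route for wallet-sign errors) keeps one noisy flow from masking another.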
Minimum logs
The operator should always be able to reconstruct:
- what action was attempted
- by which wallet or operator role
- on which proposal or DAO
- against which network and program ID
- whether the action failed at wallet, RPC, or program level
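A fixed record shape for each attempted action keeps all of these questions answerable. The field names below are hypothetical. The classifier assumes the client knows which stage failed:

- signing failed (wallet)
- sending or confirming failed (RPC)
- the transaction landed with an error (program)

For `@solana/web3.js` users, the program case is the non-null `err` in a confirmation result.

```typescript
// Hypothetical record shape; field names are illustrative, not an existing schema.
type FailureLayer = "wallet" | "rpc" | "program" | "none";

interface ActionLogRecord {
  timestamp: string; // ISO-8601 time of the attempt
  action: string; // e.g. "execute_proposal"
  actor: string; // wallet public key or operator role label
  dao: string; // DAO account address
  proposal?: string; // proposal account address, when the action has one
  network: "mainnet-beta" | "devnet" | "testnet" | "localnet";
  programId: string; // program the transaction targeted
  failureLayer: FailureLayer;
  detail?: string; // raw error text, kept for evidence collection
}

// What the client observed while submitting one action.
interface SubmitOutcome {
  failedAt?: "sign" | "send" | "confirm"; // client-side stage that threw, if any
  txErr?: unknown; // error attached to a landed transaction, if any
}

// Map the observed outcome to the layer that failed.
// A confirm-stage failure (e.g. a timeout) only means the RPC path did not report
// a result; the transaction may still have landed and should be re-checked.
function failureLayer(o: SubmitOutcome): FailureLayer {
  if (o.failedAt === "sign") return "wallet";
  if (o.failedAt === "send" || o.failedAt === "confirm") return "rpc";
  if (o.txErr !== undefined && o.txErr !== null) return "program";
  return "none";
}

// One JSON object per line keeps logs easy to search and to ship to a log pipeline.
function toLogLine(record: ActionLogRecord): string {
  return JSON.stringify(record);
}
```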
Simple incident flow
- detect
- classify
- freeze or contain if needed
- verify proposal and treasury state
- switch to fallback RPC or safe path if required
- collect evidence
- publish operator update
- document the permanent fix
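The flow above can be kept deterministic by recording each step and refusing to skip required ones. The sketch below assumes in-memory storage, and the step labels and `IncidentLog` are hypothetical. The two conditional steps (containment and fallback) may be skipped, but each skip is still written to the history so the timeline can be reconstructed.

```typescript
// Step labels are hypothetical shorthand for the eight steps listed above.
const STEPS = [
  "detect",
  "classify",
  "contain", // freeze or contain if needed
  "verify_state", // verify proposal and treasury state
  "fallback", // switch to fallback RPC or safe path if required
  "collect_evidence",
  "publish_update",
  "document_fix",
] as const;
type Step = (typeof STEPS)[number];

// Conditional steps may be skipped, but the skip is still recorded.
const OPTIONAL: readonly Step[] = ["contain", "fallback"];

interface StepEntry {
  step: Step;
  at: string;
  note: string;
  skipped: boolean;
}

class IncidentLog {
  private entries: StepEntry[] = [];
  private next = 0; // index in STEPS of the next expected step

  // Record `step`. Earlier optional steps are logged as skipped; a missing required
  // step, or a step that is already behind the current position, is rejected.
  complete(step: Step, note: string, at: string = new Date().toISOString()): void {
    const target = STEPS.indexOf(step);
    if (target < this.next) {
      throw new Error(`step "${step}" is already behind the current position`);
    }
    // Validate first so a rejected call leaves the history unchanged.
    for (let i = this.next; i < target; i++) {
      if (OPTIONAL.indexOf(STEPS[i]) === -1) {
        throw new Error(`required step "${STEPS[i]}" missing before "${step}"`);
      }
    }
    for (let i = this.next; i < target; i++) {
      this.entries.push({ step: STEPS[i], at, note: "skipped: not needed", skipped: true });
    }
    this.entries.push({ step, at, note, skipped: false });
    this.next = target + 1;
  }

  isClosed(): boolean {
    return this.next === STEPS.length;
  }

  history(): readonly StepEntry[] {
    return this.entries;
  }
}
```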
Runbook principle
The incident path should stay smaller than the product path.
Normal users use the UI.
Operators and reviewers should still have a short, deterministic runbook when something goes wrong.
Truth boundary
This document defines the operating target.
It does not claim 24/7 monitoring, SIRN membership, or external incident-response coverage unless those are explicitly evidenced elsewhere.