The architectural pattern
Three intellectual precedents shape the design — none of them new, all of them load-bearing in their original domain:
| Precedent | What we borrow |
|---|---|
| FICO (1956) | Cohort-conditioned baseline + bounded enumerated adjustments. Score is reproducible from the input — never a black-box opinion. |
| Berkus (2001) | Pre-revenue venture pricing under information scarcity: anchor on observable cohort priors, layer additive credits for evidence. |
| PECOTA (2003) | Cohort-shape-aware projection — comparable players' career arcs shape the band, not a Gaussian assumption that flattens the tails. |
Each pattern handles a problem Preflop has. FICO: the inputs are heterogeneous and the model has to be auditable. Berkus: most issuers list with very little personal data. PECOTA: a founder’s outcome distribution is power-law, not Gaussian — and pretending otherwise is wrong.
Why it matters: The forecast engine doesn’t invent any of these patterns. What’s new is composing them on a single output object, the TEBForecast, that drops directly into the Obligation Ledger’s pricing math without translation.
The data model
CohortBaseline schema
A cohort is a group of issuers whose TEB trajectories the engine treats as similar enough to share a baseline. v1.0 ships 6 cohorts (5 priority + 1 fallback); the design accommodates ~30 by v1.1.
interface CohortBaseline {
id: string; // e.g. "founder-pre-seed-bb-saas-us"
name: string;
parent: string | null;
populationEstimate: number; // approximate count of US individuals
baseline: {
anchorTebByPercentile: { p10, p25, p50, p75, p90 };
ageCurve: Array<{ ageRange, teb: { p10, p50, p90 } }>;
growthSegments: Array<{ window, g: { p10, p50, p90 } }>;
terminalGrowth: { p10, p50, p90 };
};
shape: "gaussian" | "log-normal" | "shifted-pareto";
shrinkageTau: number; // quarters of personal data to equal cohort weight
dataSources: string[]; // e.g. "BLS OES 2024 ortho surgeons"
selectionCorrection: { method, lambda, rationale };
version: string;
lastCalibrated: string;
}
TEBForecast output schema
Every forecast produced by the engine ships with a complete provenance trail. Auditors can reconstruct exactly which cohort fired, which adjustments were triggered with which magnitudes, what the shrinkage weight was, and which (if any) hard caps were hit.
interface TEBForecast {
issuerId: string;
anchorTime: number;
tebAtAnchor: { low, mid, high };
growthSchedule: Array<{ tFrom, tTo, g, g_low?, g_high? }>;
terminalGrowth: number; // mid-band, used by priceToken
terminalGrowthBands?: { low, mid, high };
horizonYears: number;
discountRate: number;
provenance: {
engineVersion: "v2.1.0";
cohortId: string;
cohortVersion: string;
shrinkage: { N_personal_quarters, tau, w_personal };
adjustments: Adjustment[]; // one entry per non-zero adjustment
bandShape: "gaussian" | "log-normal" | "shifted-pareto";
convictionModifier: number;
selectionCorrection: { method, lambda, rationale };
capsTriggered: string[]; // empty unless an adjustment hit a cap
computedAt: number; // unix ms
sourceCitations: string[];
};
}
discountRate field: design v1 (in progress)
The discountRate field is currently a platform constant written into every forecast. The forecast engine is being extended to produce a cohort-conditional discount rate (with a survival-curve auxiliary) as part of engine v2.2.0. The design specification is at /spec/discount-rate. Engine v2.1.0 (current) is unchanged.
The six cohorts shipped today (five priority + one fallback)
| ID | Population | Shape | Terminal mid | Key data sources |
|---|---|---|---|---|
| founder-pre-seed-bb-saas-us | ~12K | log-normal | 2.5% | Crunchbase 2024 founder cohort report; YC 2014–2024 outcomes; PitchBook compensation |
| medicine-surgical-private | ~38K | gaussian | 2.0% | BLS OES 29-1242; Medscape 2024 compensation; MGMA Provider Comp 2024 |
| biglaw-partner | ~28K | gaussian | 2.5% | NALP 2024 Partner Compensation; AmLaw 100 PEP; Major Lindsey & Africa 2024 |
| athlete-major-veteran | ~3.4K | shifted-pareto | 0.5% | NBA CBA 2023–2030; NFLPA 2024 salary db; Forbes highest-paid athletes |
| creator-mid-tier | ~280K | log-normal | 1.0% | ConvertKit Creator Economy 2024; Patreon 2024; YouTube Partner Program; Spotify Loud & Clear |
| _other-fallback | ~50M | log-normal | 2.0% | BLS OES 2024 economy-wide; CPS 2024; Census ACS 2024 |
Why it matters: v1.1 expands the registry to ~30 cohorts (founder sub-stages, additional medical specialties, finance roles, athletes by sport, creators by platform/audience tier). Roadmap: 3 months from v2.1.0 launch.
Design update v2: implementation pending (in progress)
Cohort priors are being re-anchored to entry-pool data sources (Census ABS, Kauffman, BLS, AAMC, NCAA, IRS) instead of the funded-survivor sources cited in the table above (Crunchbase, NALP, league veteran salary databases, platform-monetized creator surveys). A survival-mixture mechanism is added to the VHC integration so the engine no longer silently assumes that an issuer survives forever. See /spec/cohort-priors for the full design specification. Ships in engine v2.3.0, alongside Discount-Rate-v1 in v2.2.0; the two are designed to ship together.
The 15-adjustment table
Every adjustment computes a magnitude from issuer evidence, gets clamped to a per-adjustment hard bound, and is then dampened by the composition coefficient β so that a stack of small, individually defensible adjustments can’t compound into a large net shift:
β = 1 / (1 + 0.5·k_nz)
where k_nz is the number of non-zero adjustments.
With 5 simultaneous adjustments, each is dampened to 28.6% of its nominal magnitude. With 10, each falls to 16.7%. A single-adjustment scenario (k_nz = 1) gets 67% of nominal: still dampened, even alone.
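The percentages above follow from the damping form β = 1/(1 + 0.5·k_nz) quoted in the adversarial-defenses table. A minimal sketch that reproduces them is below; the function name `compositionBeta` is illustrative and not taken from the engine source.

```typescript
// Composition damping coefficient: beta = 1 / (1 + 0.5 * k_nz),
// where kNz is the number of adjustments with a non-zero magnitude.
// `compositionBeta` is a hypothetical name used only for this sketch.
function compositionBeta(kNz: number): number {
  if (!Number.isInteger(kNz) || kNz < 0) {
    throw new RangeError("kNz must be a non-negative integer");
  }
  return 1 / (1 + 0.5 * kNz);
}

// Print the share of nominal magnitude each adjustment keeps.
for (const k of [1, 4, 5, 6, 10]) {
  console.log(`k_nz = ${k}: beta = ${compositionBeta(k).toFixed(3)}`);
}
```

The k_nz = 4 and k_nz = 6 rows correspond to the β ≈ 0.33 and β ≈ 0.25 figures used in the two worked examples later on this page.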
Hard caps (post-composition)
- Anchor TEB shift: ≤ ±30%
- Segment growth shift: ≤ ±15pp absolute
- Terminal growth: immune to per-issuer adjustment (cohort-only)
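The clamp, damp, and cap sequence for the anchor channel can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: it assumes damped anchor adjustments are summed additively before the ±30% hard cap is applied, and that k_nz counts every firing adjustment across all channels (so it is passed in rather than derived from the anchor list). The type `RawAdjustment`, the function `netAnchorShift`, and the sample magnitudes are hypothetical.

```typescript
// Hypothetical shape for one anchor-channel adjustment.
interface RawAdjustment {
  name: string;
  magnitude: number;        // nominal shift as a fraction, e.g. 0.10 for +10%
  bound: [number, number];  // per-adjustment hard bound from the table
}

function clamp(x: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, x));
}

// Step 1: clamp each magnitude to its own bound.
// Step 2: damp by beta = 1 / (1 + 0.5 * kNz).
// Step 3: sum (assumed additive) and apply the post-composition hard cap.
function netAnchorShift(
  anchorAdjs: RawAdjustment[],
  kNz: number,
  hardCap: number = 0.30
): { shift: number; capHit: boolean } {
  const beta = 1 / (1 + 0.5 * kNz);
  const damped = anchorAdjs
    .map((a) => beta * clamp(a.magnitude, a.bound[0], a.bound[1]))
    .reduce((sum, m) => sum + m, 0);
  const shift = clamp(damped, -hardCap, hardCap);
  return { shift, capHit: shift !== damped };
}

// Illustrative stack of anchor credits; #7 over-claims and is clamped to +20%.
const hypotheticalStack: RawAdjustment[] = [
  { name: "#3 diversification", magnitude: 0.10, bound: [0, 0.10] },
  { name: "#4 event capture", magnitude: 0.15, bound: [0, 0.15] },
  { name: "#7 pipeline scale", magnitude: 0.50, bound: [0, 0.20] },
  { name: "#13 TEB-bucket override", magnitude: 0.20, bound: [-0.20, 0.20] },
];
const stackResult = netAnchorShift(hypotheticalStack, hypotheticalStack.length);
console.log(stackResult.shift.toFixed(4), stackResult.capHit);
```

Under these assumptions, four maxed-out anchor credits sum to 65% nominal but land near 21.7% after damping, which is why the hard cap acts as a backstop rather than the primary limiter.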
The 15 adjustments
| # | Name | Applies to | Bound | Citation |
|---|---|---|---|---|
| 1 | Personal CAGR vs cohort growth | S1 growth | [-10%, +15%] | tax-return |
| 2 | Income volatility (CV of historical TEB) | bands | [-10%, +30%] | tax-return |
| 3 | Income diversification credit | anchor | [0, +10%] | tax-return |
| 4 | Documented event-capture history | anchor | [0, +15%] | transaction-record |
| 5 | Tax-compliance penalty | anchor | [-20%, 0] | tax-status |
| 6 | Employment-gap penalty | S1 growth | [-8%, 0] | self-disclosed |
| 7 | Pipeline scale (forward $) | anchor | [0, +20%] | signed-contract |
| 8 | Brand-momentum credit (proof links) | bands | [-8%, 0] | public-link |
| 9 | Narrative depth (intent signal) | anchor | [0, +3%] | self-disclosed |
| 10 | Quarterly-reporting commitment | bands | [-5%, 0] | platform-attestation |
| 11 | Commit-letter signed | bands | [-4%, 0] | platform-attestation |
| 12 | Age-vs-cohort runway | all growth | [-5%, +5%] | (structural) |
| 13 | TEB-bucket override anchor scaling | anchor | [-20%, +20%] | self-disclosed |
| 14 | Industry-specificity refinement | S1+S2 growth | [-5%, +10%] | (structural) |
| 15 | Reporting transparency aggregate | bands | [-6%, 0] | platform-attestation |
Why it matters: 11 of 15 adjustments map directly to conviction-score signals. The remaining 4 (5, 12, 13, 14) come from issuer profile attributes that conviction doesn’t score directly but the forecast cares about.
Shared-input dependencies (declared, not double-counted): A small number of adjustments consume the same input field as a conviction signal but emit into a different downstream channel. The most prominent example is Adjustment #2 (CV of historical TEB → band width) and conviction signal B2 (CV of historical TEB → drift μ via the composite score): same evidence (historicalTEB), two orthogonal outputs (band width vs drift). The saturation point (CV = 1.5) is intentionally shared so the two channels agree on the regime boundary. This is not a double-count: the covariance structure is explicit and the downstream effects compose multiplicatively rather than additively. Any future adjustment that consumes a signal-shared input must declare the same orthogonality (which channel it emits to) and survive the shift-equivalence test from the discount-rate spec.
Bayesian shrinkage between personal and cohort
When personal TEB data is sparse, the cohort dominates. As personal data accrues, the engine shifts weight to it. The weight on personal data is monotonically increasing in the number of personal quarters N, equal to 0.5 at N = τ, and asymptotes toward 1.0 as N → ∞:
w_personal = N / (N + τ)
Sensitivity table
| Quarters of personal data | Founder (τ=4) | Surgeon (τ=16) |
|---|---|---|
| 0 | 0% (cohort dominates) | 0% (cohort dominates) |
| 4 | 50% | 20% |
| 8 | 67% | 33% |
| 16 | 80% | 50% |
| 40 | 91% | 71% |
Why it matters: Maya at N = 0 on the founder cohort: w_personal = 0, cohort dominates 100%. Amara at N = 16 on the surgeon cohort: w_personal = 0.5, personal and cohort weighted equally. Same engine, opposite mix, driven by data availability.
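Every cell of the sensitivity table is consistent with a weight of the form N / (N + τ), which also matches the schema comment on shrinkageTau ("quarters of personal data to equal cohort weight"). A short sketch that regenerates the table is below; `personalWeight` and `pct` are illustrative names, not engine identifiers.

```typescript
// Shrinkage weight on personal data: w = N / (N + tau).
// N   = quarters of personal TEB data
// tau = cohort shrinkageTau (quarters at which personal weight reaches 0.5)
function personalWeight(nQuarters: number, tau: number): number {
  if (nQuarters < 0 || tau <= 0) {
    throw new RangeError("need nQuarters >= 0 and tau > 0");
  }
  return nQuarters / (nQuarters + tau);
}

// Format a weight as a whole-number percentage, as in the table.
const pct = (w: number): string => `${Math.round(w * 100)}%`;

for (const n of [0, 4, 8, 16, 40]) {
  const founder = pct(personalWeight(n, 4));
  const surgeon = pct(personalWeight(n, 16));
  console.log(`${n} quarters: founder ${founder}, surgeon ${surgeon}`);
}
```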
Cohort-shape-aware bands
Symmetric (Gaussian) bands are correct for some cohorts and wrong for others. The engine carries a shape tag per cohort and renders bands accordingly:
| Shape | Best fit for | Why |
|---|---|---|
| Gaussian | Medicine, biglaw, established professionals | Outcome distribution is bounded by procedural / billable-hour ceilings; symmetric dispersion is empirically reasonable. |
| Log-normal | Founders, creators, fallback | Right tail has long upside (a small fraction reach $1M+ TEB); left tail bounded near zero. Symmetric bands understate upside dispersion. |
| Shifted-Pareto | Veteran athletes | Career cliff at retirement → most decline sharply; a small fraction (broadcasting, equity) extend high earnings. Power-law fits this two-regime structure. |
Why it matters: Symmetric bands on a power-law cohort would understate upside and overstate the symmetry of failure. Cohort-shape-aware rendering preserves directional information that matters for how a backer reads the band.
Maya end-to-end · v2.1.0 engine output
Maya: age 20, pre-revenue founder, mostRecentTEB $20K, self-assessed taxes, two proof links, willing to report quarterly. Inputs match the canonical Maya from the math foundation Part 9 — but the engine’s outputs are materially different from the math doc’s authorial numbers (more conservative). See the cold-start note below.
Stage 1 — Cohort lookup
Profession founder, industry contains SaaS, age 20 → cohort founder-pre-seed-bb-saas-us. Shape: log-normal. τ = 4 quarters.
Stage 2 — Cohort baseline at age 20
Age 20 is below the lowest defined age band, so the engine uses the [22, 25] band, where the cohort p50 is $25K.
Stage 3 — Shrinkage
Maya has no historicalTEB → N = 0. Shrinkage weight: w_personal = 0 / (0 + 4) = 0. Cohort dominates 100%. Maya’s mostRecentTEB $20K is honored as the personal anchor mid (it overrides the cohort mid because it is an explicit declaration), but the band shape inherits the cohort dispersion ratio.
Stage 4 — Adjustments triggered
4 of 15 adjustments fire (composition damping β ≈ 0.33):
- #5 Tax-compliance penalty (self-assessed): -5%
- #10 Quarterly-reporting commitment: -5% bands
- #12 Age-vs-cohort runway (age 20, cohort peak ~38): +5% growth
- #15 Reporting transparency aggregate: -2% bands
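As a quick arithmetic check on the list above, the sketch below applies β for four firing adjustments to Maya's two mid-affecting adjustments. It assumes the damped anchor penalty is applied multiplicatively to the declared $20K anchor and that the damped growth credit is an additive shift in percentage points; both are assumptions chosen because they line up with the Stage 5 figures, not a statement of how the engine is implemented.

```typescript
// Maya: 4 adjustments fire, so beta = 1 / (1 + 0.5 * 4) = 1/3.
const mayaKNz = 4;
const mayaBeta = 1 / (1 + 0.5 * mayaKNz);

// #5 Tax-compliance penalty: -5% nominal on the anchor.
// Assumption: damped shift scales the declared $20K anchor multiplicatively.
const declaredAnchor = 20000;
const mayaAnchorMid = declaredAnchor * (1 + mayaBeta * -0.05);

// #12 Age-vs-cohort runway: +5% nominal on growth.
// Assumption: damped shift is added to segment growth in percentage points.
const mayaGrowthShiftPp = mayaBeta * 5;

console.log(`anchor mid ≈ $${(mayaAnchorMid / 1000).toFixed(1)}K`);
console.log(`growth shift ≈ +${mayaGrowthShiftPp.toFixed(2)}pp`);
```

The anchor result rounds to the $19.7K tebAtAnchor.mid reported in Stage 5, and the growth shift is about +1.67pp.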
Stage 5 — Final TEBForecast
| Field | Value |
|---|---|
| tebAtAnchor.low | $4.3K |
| tebAtAnchor.mid | $19.7K |
| tebAtAnchor.high | $58K |
| growthSchedule[0] | [0, 3) g_mid = 26.7% (cohort segment 1, post-adj) |
| growthSchedule[1] | [3, 7) g_mid = 19.7% |
| growthSchedule[2] | [7, 15) g_mid = 8% |
| growthSchedule[3] | [15, 30) g_mid = 4% |
| growthSchedule[4] | [30, 75) g_mid = 2.5% (terminal tail) |
| terminalGrowthBands | { low: -1%, mid: 2.5%, high: 5% } |
| bandShape | log-normal |
| capsTriggered | [] (none) |
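To see what the schedule above implies for Maya's mid trajectory, here is a minimal projection sketch. It assumes annual compounding at each segment's g_mid and a schedule that is sorted and contiguous; the engine's piecewise-exponential structure may compound differently. `GrowthSegment` mirrors the growthSchedule entry fields, and `projectTeb` is a hypothetical helper.

```typescript
// One entry of growthSchedule, mid values only.
interface GrowthSegment {
  tFrom: number; // segment start, years from anchor (inclusive)
  tTo: number;   // segment end, years from anchor (exclusive)
  g: number;     // mid growth rate as a fraction
}

// Project the mid TEB at year t.
// Assumption: annual compounding, (1 + g)^years, within each segment.
function projectTeb(anchorMid: number, schedule: GrowthSegment[], t: number): number {
  let value = anchorMid;
  for (const seg of schedule) {
    if (t <= seg.tFrom) break;
    const yearsInSegment = Math.min(t, seg.tTo) - seg.tFrom;
    value *= Math.pow(1 + seg.g, yearsInSegment);
  }
  return value;
}

// Maya's Stage 5 schedule (mid values from the table).
const mayaSchedule: GrowthSegment[] = [
  { tFrom: 0, tTo: 3, g: 0.267 },
  { tFrom: 3, tTo: 7, g: 0.197 },
  { tFrom: 7, tTo: 15, g: 0.08 },
  { tFrom: 15, tTo: 30, g: 0.04 },
  { tFrom: 30, tTo: 75, g: 0.025 },
];

const mayaTeb10 = projectTeb(19700, mayaSchedule, 10);
console.log(`TEB(10) mid ≈ $${Math.round(mayaTeb10 / 1000)}K`);
```

Under these assumptions the year-10 mid comes out on the order of $100K, which is the gap against the math doc's $600K target that the cold-start note describes.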
Stage 6 — priceToken on canonical covenant
Run Maya’s forecast through priceToken with a canonical covenant (s1 = 3%, s2 = 1%, T = 10):
Engine output
Per-token band: CI width ≈ 273%.
Phase 1 accounts for 66% of mid PV; Phase 2 accounts for 34%.
At the target raise, κ falls in the speculative tier.
Cold-start note: Maya’s κ here is materially higher than in the math foundation Part 9 (which estimated κ ≈ 1.20 anchored). The engine is more conservative because the cohort prior says the median pre-seed founder doesn’t reach the math doc’s aggressive TEB(10) = $600K target: the cohort p50 at age 22–25 is $25K, growing piecewise. This is the engine telling us the truth about cohort-anchored expectations. The math doc’s Maya numbers will be reconciled to engine output in a future vault session; this site reflects engine output as the canonical reference.
Dr. Amara end-to-end · DL with personal data
Amara: age 45, orthopedic surgeon, mostRecentTEB $2M, 5 years of historicalTEB (annual quarterly attestable), 3 income streams, clean tax status, willing to report quarterly, commit-letter signed.
Cohort + shrinkage
Profession professional + industry orthopedic → cohort medicine-surgical-private. Shape: gaussian. τ = 16 quarters.
5 annual historicalTEB points × 4 quarters/year ≈ N = 16 quarters of personal data. Shrinkage: w_personal = 16 / (16 + 16) = 0.5. Personal and cohort are weighted equally, the opposite of Maya’s pure-cohort case.
Adjustments triggered
6 of 15 adjustments fire — composition damping β ≈ 0.25:
- #1 Personal CAGR vs cohort: +1.4% S1 growth (Amara’s ~4% trailing CAGR > cohort ~3%)
- #2 Income volatility (CV of historical TEB): -10% bands (Amara’s 5-yr CV ≈ 0.064, well below the 0.20 stable-issuer threshold)
- #10 Quarterly-reporting commitment: -5% bands
- #11 Commit-letter signed: -4% bands
- #12 Age-vs-cohort runway: +1% (age 45, cohort peak ~50)
- #15 Reporting transparency aggregate: -4% bands
Adj #2 swing, HHI → CV migration (2026-04): Prior to 2026-04, Adj #2 measured income-stream concentration via HHI. Under that logic, Amara’s 3 streams (0.70/0.20/0.10, HHI = 0.54) produced +2% band widening. The implementation now uses the realized CV of historical TEB; her stable trajectory ($1.7M → $2.0M over 5 years) gives CV ≈ 0.064, returning −10% band narrowing: a 12pp swing on the raw magnitude, ~3pp post-damping. The spec previously narrated Adj #3 firing at +2.5% for Amara; that was incorrect (HHI 0.54 ≥ 0.45 threshold, so #3 returns 0). Both narrative errors are corrected here. See also: shared-input note for B2 + Adj #2 in §adjustments.
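The two statistics in this note can be checked with a short sketch. It assumes a sample (n − 1) standard deviation for the CV and a fail-closed return of null when fewer than three positive years are available; the evenly spaced $1.7M → $2.0M series is a hypothetical stand-in for Amara's actual history, which this page does not list year by year. `sampleCv` and `hhi` are illustrative names.

```typescript
// Sample coefficient of variation of historical TEB.
// Returns null (fail closed) when fewer than minYears positive values exist.
function sampleCv(history: number[], minYears: number = 3): number | null {
  const positive = history.filter((x) => x > 0);
  const n = positive.length;
  if (n < minYears) return null;
  const mean = positive.reduce((s, x) => s + x, 0) / n;
  const sumSq = positive.reduce((s, x) => s + (x - mean) * (x - mean), 0);
  const sd = Math.sqrt(sumSq / (n - 1)); // sample standard deviation
  return sd / mean;
}

// Herfindahl-Hirschman index over income-stream shares (shares sum to 1).
function hhi(shares: number[]): number {
  return shares.reduce((s, x) => s + x * x, 0);
}

// Hypothetical evenly spaced trajectory from $1.7M to $2.0M over 5 years.
const hypotheticalHistory = [1.7e6, 1.775e6, 1.85e6, 1.925e6, 2.0e6];
const amaraCv = sampleCv(hypotheticalHistory);
const amaraHhi = hhi([0.7, 0.2, 0.1]);

console.log(amaraCv === null ? "insufficient data" : amaraCv.toFixed(3));
console.log(amaraHhi.toFixed(2));
```

With this stand-in series the CV lands at about 0.064 and the HHI at 0.54, in line with the figures quoted above; a different year-by-year path between the same endpoints would give a somewhat different CV.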
Final forecast + priceToken (DL e=2%)
| Field | Value |
|---|---|
| tebAtAnchor | { low: $1.03M, mid: $2M, high: $3.51M } |
| growthSchedule (5 segs) | ramps at 4.5% → 3.2% → 1.0% → -5% (career decline) → 2.0% terminal |
| bandShape | gaussian |
| capsTriggered | [] (none) |
| per-token (DL e=2%) | low $22.73, mid $42.99, high $74.50 · CI 120% |
| VHC_mid (back-derived) | ~$21.49M |
| e_eff | 2.00% (exact, DL window matches VHC) |
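The CI figures quoted on this page are consistent with a width defined as (high − low) / mid. That definition is an assumption inferred from the numbers rather than something the page states; the sketch below applies it to Amara's per-token band and to Maya's tebAtAnchor band. `Band` and `ciWidth` are illustrative names.

```typescript
// A low / mid / high band, as used for tebAtAnchor and per-token prices.
interface Band {
  low: number;
  mid: number;
  high: number;
}

// Assumed definition: band width relative to the mid value.
function ciWidth(band: Band): number {
  return (band.high - band.low) / band.mid;
}

const amaraPerToken: Band = { low: 22.73, mid: 42.99, high: 74.5 };
const mayaAnchor: Band = { low: 4300, mid: 19700, high: 58000 };

console.log(`Amara per-token CI ≈ ${Math.round(ciWidth(amaraPerToken) * 100)}%`);
console.log(`Maya anchor CI ≈ ${Math.round(ciWidth(mayaAnchor) * 100)}%`);
```

Under that definition the two bands come out at roughly 120% and 273%, matching the Amara table and the Maya Stage 6 figure.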
κ at $400K target raise
κ falls in the anchored tier. The target raise sits below engine-mid by ~7%; the auction would clear smoothly.
This number reconciles closely with the math foundation Part 9 (κ = 0.94). Established issuers with substantial personal data don’t diverge from math-doc estimates; cold-start issuers do.
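The "~7% below engine-mid" figure can be reproduced from the table if κ is read as the target raise divided by the engine-mid value of the offered stake (e_eff × VHC_mid). That reading of κ is an assumption made for this sketch; the authoritative definition lives in the math foundation, not on this page. `kappaAtTarget` is a hypothetical helper.

```typescript
// Assumed reading: kappa = targetRaise / (stake * vhcMid),
// i.e. the raise relative to the engine-mid value of the stake on offer.
function kappaAtTarget(targetRaise: number, stake: number, vhcMid: number): number {
  if (stake <= 0 || vhcMid <= 0) {
    throw new RangeError("stake and vhcMid must be positive");
  }
  return targetRaise / (stake * vhcMid);
}

// Amara: $400K target raise, e_eff = 2.00%, VHC_mid ≈ $21.49M.
const amaraEngineMid = 0.02 * 21.49e6;
const amaraKappa = kappaAtTarget(400000, 0.02, 21.49e6);

console.log(`engine-mid stake value ≈ $${Math.round(amaraEngineMid / 1000)}K`);
console.log(`kappa ≈ ${amaraKappa.toFixed(2)}`);
```

On that reading the engine-mid stake value is about $430K and κ is about 0.93, close to the κ = 0.94 from Part 9.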
Calibration plan
What can change quarterly
- Cohort baseline numbers (anchor TEB, segment growth, terminal growth): refit against realized TEB once cohort crosses ≥ 50 issuer-quarters
- Adjustment magnitudes: bound stays fixed but the elasticity inside each adjustment can recalibrate
- Shrinkage τ per cohort: empirical mixing-time calibration
- Selection-correction λ (Heckman): activates once each cohort has enough listings to estimate selection
What stays structural (locked)
- Conviction weighting 0.6 backward / 0.4 forward
- Piecewise-exponential growth structure
- 25% cap invariant
- Hard caps: anchor ±30%, segment ±15pp, terminal cohort-only
- 15-adjustment table identity (adjustments don’t get added or removed without an engine version bump)
Honest cold-start label: Engine v2.1.0. Calibrated against 0 realized outcomes (cold start). Cohort baselines: analyst priors anchored to public data sources. First quarterly refit targeted once cohort crosses ≥ 50 issuer-quarters of observed data.
Adversarial defenses
| Attack | Defense |
|---|---|
| Inflate TEB declarations | TEB-bucket override capped at ±20% (adj 13); also subject to anchor hard cap ±30% |
| Stack many small credits to compound | Composition damping β = 1/(1 + 0.5·k_nz): 5 credits each get 28.6% weight, not 100% |
| Misclassify cohort to game baseline | Cohort lookup is deterministic from inputs; classification disputed at review |
| Front-run a forecast revision | Forecast outputs deterministic; no nondeterminism to front-run |
| Pump conviction signals just before listing | Conviction signals all require attestation; back-dated attestations flagged at review |
| Backdate historicalTEB | Tax-return attestation required for years to count toward shrinkage N |
| Claim higher growth than cohort allows | Per-segment growth bound ±15pp absolute; no escape valve |
| Hide income volatility | Adj #2 computes sample CV from tax-return-attested historicalTEB (≥3 positive years required). Concealing volatility means either omitting attested years (caught at review) or having too sparse data → adjustment fails closed → no band-narrowing benefit |
| Selectively report only good quarters | Quarterly reporting attestation triggers cure-period mechanism on missing quarters |
| Switch industries to find favorable cohort | Cohort lookup uses profession + industry at time of listing; mid-listing switches require new disclosure |
Full attack surface + 10-adversarial-stress-test analysis lives in the design doc Part L.
What this engine does NOT claim
- Not a prediction of any specific issuer’s actual outcome. The forecast is a band, not a point. Engine output describes cohort-conditioned expectations under analyst priors, not realized future TEB.
- Not a substitute for issuer due diligence. Backers should still read the disclosure pack. The engine’s job is to anchor pricing, not to vouch for the issuer.
- Not calibrated against realized data yet. Cohort baselines are analyst priors anchored to public data sources. First refit fires once cohort accrues ≥ 50 issuer-quarters of observed TEB.
- Not infallible at the cohort boundary. Some issuers don’t cleanly fit any cohort; v1.1 expands the registry, but v1.0 fallbacks may produce wider bands than ideal.
- Not the price. The auction sets the clearing price; the engine sets the reserve. Market signal can disagree with engine-mid by up to the κ-tier ceiling, and that disagreement is itself information.
Read the full design
The complete design document — including the 30-cohort roadmap, full F-table derivation, 10-adversarial-stress-test analysis, and quarterly-refit operational playbook — lives in PreFlop/wiki/Forecast-Engine-Design.md. v2.1.0 is the first published implementation; v1.1 expands the cohort registry and activates Heckman selection correction.