Forecast Engine — Specs

The architectural pattern

Three intellectual precedents shape the design. None of them is new, and each is load-bearing in its original domain:

Precedent	What we borrow
FICO (1956)	Cohort-conditioned baseline + bounded enumerated adjustments. Score is reproducible from the input — never a black-box opinion.
Berkus (2001)	Pre-revenue venture pricing under information scarcity: anchor on observable cohort priors, layer additive credits for evidence.
PECOTA (2003)	Cohort-shape-aware projection — comparable players' career arcs shape the band, not a Gaussian assumption that flattens the tails.

Each pattern handles a problem Preflop has. FICO: the inputs are heterogeneous and the model has to be auditable. Berkus: most issuers list with very little personal data. PECOTA: a founder’s outcome distribution is power-law, not Gaussian, and a Gaussian band would flatten exactly the tails that matter.

Why it matters: the forecast engine doesn’t invent any of these patterns. What’s new is composing them on a single output object, the TEBForecast, which drops directly into the Obligation Ledger’s pricing math without translation.

The data model

CohortBaseline schema

A cohort is a group of issuers whose TEB trajectories the engine treats as similar enough to share a baseline. v1.0 ships 6 cohorts (5 priority + 1 fallback); the design accommodates ~30 by v1.1.

interface CohortBaseline {
  id: string;                                  // e.g. "founder-pre-seed-bb-saas-us"
  name: string;
  parent: string | null;
  populationEstimate: number;                  // approximate count of US individuals
  baseline: {
    anchorTebByPercentile: { p10: number; p25: number; p50: number; p75: number; p90: number };
    ageCurve: Array<{ ageRange: [number, number]; teb: { p10: number; p50: number; p90: number } }>; // ageRange inclusive, e.g. [22, 25]
    growthSegments: Array<{ window: [number, number]; g: { p10: number; p50: number; p90: number } }>; // window [start, end), e.g. [0, 3)
    terminalGrowth: { p10: number; p50: number; p90: number };
  };
  shape: "gaussian" | "log-normal" | "shifted-pareto";
  shrinkageTau: number;                        // quarters of personal data to equal cohort weight
  dataSources: string[];                       // e.g. "BLS OES 2024 ortho surgeons"
  selectionCorrection: { method: string; lambda: number; rationale: string };
  version: string;
  lastCalibrated: string;
}

TEBForecast output schema

Every forecast produced by the engine ships with a complete provenance trail. Auditors can reconstruct exactly which cohort fired, which adjustments were triggered with which magnitudes, what the shrinkage weight was, and which (if any) hard caps were hit.

interface TEBForecast {
  issuerId: string;
  anchorTime: number;
  tebAtAnchor: { low: number; mid: number; high: number };
  growthSchedule: Array<{ tFrom: number; tTo: number; g: number; g_low?: number; g_high?: number }>;
  terminalGrowth: number;                      // mid-band, used by priceToken
  terminalGrowthBands?: { low: number; mid: number; high: number };
  horizonYears: number;
  discountRate: number;
  provenance: {
    engineVersion: "v2.1.0";
    cohortId: string;
    cohortVersion: string;
    shrinkage: { N_personal_quarters: number; tau: number; w_personal: number };
    adjustments: Adjustment[];                 // one entry per non-zero adjustment
    bandShape: "gaussian" | "log-normal" | "shifted-pareto";
    convictionModifier: number;
    selectionCorrection: { method: string; lambda: number; rationale: string };
    capsTriggered: string[];                   // empty unless an adjustment hit a cap
    computedAt: number;                        // unix ms
    sourceCitations: string[];
  };
}

discountRate field: design v1 (in progress)

The discountRate field is currently a platform constant $r = 12\%$ written by every forecast. The forecast engine is being extended to produce cohort-conditional $r_c$ (with a survival-curve auxiliary) as part of engine v2.2.0. Design specification at /spec/discount-rate. Engine v2.1.0 (current) is unchanged.

The cohorts shipped today (5 priority + 1 fallback)

ID	Population (US, approx.)	Shape	Terminal growth (mid)	Key data sources
founder-pre-seed-bb-saas-us	~12K	log-normal	2.5%	Crunchbase 2024 founder cohort report; YC 2014–2024 outcomes; PitchBook compensation
medicine-surgical-private	~38K	gaussian	2.0%	BLS OES 29-1242; Medscape 2024 compensation; MGMA Provider Comp 2024
biglaw-partner	~28K	gaussian	2.5%	NALP 2024 Partner Compensation; AmLaw 100 PEP; Major Lindsey & Africa 2024
athlete-major-veteran	~3.4K	shifted-pareto	0.5%	NBA CBA 2023–2030; NFLPA 2024 salary db; Forbes highest-paid athletes
creator-mid-tier	~280K	log-normal	1.0%	ConvertKit Creator Economy 2024; Patreon 2024; YouTube Partner Program; Spotify Loud & Clear
_other-fallback	~50M	log-normal	2.0%	BLS OES 2024 economy-wide; CPS 2024; Census ACS 2024

Why it matters: v1.1 expands the registry to ~30 cohorts (founder sub-stages, additional medical specialties, finance roles, athletes by sport, creators by platform/audience tier). Roadmap: 3 months from the v2.1.0 launch.

Design update v2 (in progress, implementation pending)

Cohort priors are being re-anchored to entry-pool data sources (Census ABS, Kauffman, BLS, AAMC, NCAA, IRS) instead of the funded-survivor sources cited in the table above (Crunchbase, NALP, league veteran salary databases, platform-monetized creator surveys). A survival-mixture mechanism is added to the $V_{\text{HC}}$ integration so the engine no longer silently assumes $P(\text{stay}) = 1$ forever; for founders, the empirical $P(\text{stay at 45} \mid \text{founder at 22}) \approx 6\%$. See /spec/cohort-priors for the full design specification. This ships in engine v2.3.0, following Discount-Rate-v1 in v2.2.0; the two designs are built as a pair.

The 15-adjustment table

Every adjustment computes a magnitude from issuer evidence, is clamped to a per-adjustment hard bound, and is then dampened by the composition coefficient β, so that a stack of small, individually defensible adjustments can’t compound into a large net shift.

\beta \;=\; \frac{1}{1 + 0.5 \cdot k_{nz}}

Here $k_{nz}$ is the number of adjustments with non-zero magnitude. With 5 simultaneous adjustments, each is dampened to 28.6% of its nominal magnitude. With 10, each falls to 16.7%. Single-adjustment scenarios ($k_{nz} = 1$) get 67% of nominal: still dampened, even alone.
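A minimal TypeScript sketch of the clamp-then-damp step, assuming each adjustment arrives as a signed fraction (`-0.05` for −5%) together with its bound from the table. `RawAdjustment`, `compositionBeta`, and `dampAdjustments` are hypothetical names, not the engine's API. The Maya inputs reuse her Stage 4 adjustments; the zero-magnitude #9 entry is an illustrative addition to show that it drops out of $k_{nz}$.

```typescript
// Hypothetical shape of one adjustment before composition.
interface RawAdjustment {
  id: number;              // row number in the 15-adjustment table
  magnitude: number;       // nominal magnitude from issuer evidence, as a signed fraction
  bound: [number, number]; // per-adjustment hard bound [min, max]
}

function clamp(x: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, x));
}

// beta = 1 / (1 + 0.5 * k_nz), where k_nz counts non-zero adjustments.
function compositionBeta(kNonZero: number): number {
  return 1 / (1 + 0.5 * kNonZero);
}

// Clamp each adjustment to its bound, drop zeros, then scale the survivors by beta.
function dampAdjustments(adjs: RawAdjustment[]): { id: number; effective: number }[] {
  const clamped = adjs.map((a) => ({ id: a.id, value: clamp(a.magnitude, a.bound[0], a.bound[1]) }));
  const nonZero = clamped.filter((a) => a.value !== 0);
  const beta = compositionBeta(nonZero.length);
  return nonZero.map((a) => ({ id: a.id, effective: a.value * beta }));
}

// Maya's four firing adjustments plus an illustrative zero-magnitude #9.
const mayaAdjustments = dampAdjustments([
  { id: 5, magnitude: -0.05, bound: [-0.2, 0] },
  { id: 10, magnitude: -0.05, bound: [-0.05, 0] },
  { id: 12, magnitude: 0.05, bound: [-0.05, 0.05] },
  { id: 15, magnitude: -0.02, bound: [-0.06, 0] },
  { id: 9, magnitude: 0, bound: [0, 0.03] },
]);
console.log(mayaAdjustments); // four entries, each at one third of nominal (beta = 1/3)
```

Because β depends only on the count of non-zero adjustments, adding a tiny extra credit also shrinks every other adjustment, which is what makes stacking unprofitable.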

Hard caps (post-composition)

—Anchor TEB shift: ≤ ±30%
—Segment growth shift: ≤ ±15pp absolute
—Terminal growth: immune to per-issuer adjustment — cohort-only
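The post-composition caps can be sketched the same way. This assumes the damped anchor adjustments have already been summed into one fractional shift, and the damped growth adjustments into one shift per segment expressed in absolute percentage points (0.01 = 1pp). `applyHardCaps` and the cap labels it records are hypothetical.

```typescript
const ANCHOR_CAP = 0.3;      // anchor TEB shift ≤ ±30%
const SEGMENT_CAP_PP = 0.15; // segment growth shift ≤ ±15pp absolute

interface CappedShifts {
  anchorShift: number;
  segmentShifts: number[];
  terminalGrowth: number; // always the cohort value
  capsTriggered: string[];
}

function capSymmetric(x: number, cap: number): number {
  return Math.max(-cap, Math.min(cap, x));
}

function applyHardCaps(
  anchorShift: number,
  segmentShifts: number[],
  cohortTerminalGrowth: number,
): CappedShifts {
  const capsTriggered: string[] = [];
  const anchor = capSymmetric(anchorShift, ANCHOR_CAP);
  if (anchor !== anchorShift) capsTriggered.push("anchor");
  const segments = segmentShifts.map((s, i) => {
    const capped = capSymmetric(s, SEGMENT_CAP_PP);
    if (capped !== s) capsTriggered.push(`segment[${i}]`);
    return capped;
  });
  // Terminal growth is immune to per-issuer adjustment: no issuer input reaches it.
  return { anchorShift: anchor, segmentShifts: segments, terminalGrowth: cohortTerminalGrowth, capsTriggered };
}

console.log(applyHardCaps(0.42, [0.02, -0.18], 0.025));
```

In this sketch the anchor shift is cut to 0.30, the second segment to −0.15, and `capsTriggered` records both, which is the information the `provenance.capsTriggered` field exists to carry.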

The 15 adjustments

#	Name	Applies to	Bound	Citation
1	Personal CAGR vs cohort growth	S1 growth	[-10%, +15%]	tax-return
2	Income volatility (CV of historical TEB)	bands	[-10%, +30%]	tax-return
3	Income diversification credit	anchor	[0, +10%]	tax-return
4	Documented event-capture history	anchor	[0, +15%]	transaction-record
5	Tax-compliance penalty	anchor	[-20%, 0]	tax-status
6	Employment-gap penalty	S1 growth	[-8%, 0]	self-disclosed
7	Pipeline scale (forward $)	anchor	[0, +20%]	signed-contract
8	Brand-momentum credit (proof links)	bands	[-8%, 0]	public-link
9	Narrative depth (intent signal)	anchor	[0, +3%]	self-disclosed
10	Quarterly-reporting commitment	bands	[-5%, 0]	platform-attestation
11	Commit-letter signed	bands	[-4%, 0]	platform-attestation
12	Age-vs-cohort runway	all growth	[-5%, +5%]	(structural)
13	TEB-bucket override anchor scaling	anchor	[-20%, +20%]	self-disclosed
14	Industry-specificity refinement	S1+S2 growth	[-5%, +10%]	(structural)
15	Reporting transparency aggregate	bands	[-6%, 0]	platform-attestation

Why it matters: 11 of 15 adjustments map directly to conviction-score signals. The remaining 4 (#5, #12, #13, #14) come from issuer profile attributes that conviction doesn’t score directly but the forecast cares about.

Shared-input dependencies (declared, not double-counted). A small number of adjustments consume the same input field as a conviction signal but emit into a different downstream channel. The most prominent example is Adjustment #2 (CV of historical TEB → band width) alongside conviction signal B₂ (CV of historical TEB → drift μ via the composite score). Same evidence (historicalTEB), two orthogonal outputs (band width vs. drift). The saturation point (CV = 1.5) is intentionally shared so the two channels agree on the regime boundary. This is not a double-count: the covariance structure is explicit, and the downstream effects compose multiplicatively rather than additively. Any future adjustment that consumes a signal-shared input must declare the same orthogonality (which channel it emits to) and pass the shift-equivalence test from the discount-rate spec.

Bayesian shrinkage between personal and cohort

When personal TEB data is sparse, the cohort dominates. As personal data accrues, the engine shifts weight toward it. The weight on personal data is monotonically increasing in $N$, equals 0.5 at $N = \tau$, and asymptotes toward 1.0 as $N \to \infty$:

w_{\text{personal}} \;=\; \frac{N_{\text{quarters}}}{N_{\text{quarters}} + \tau_{\text{cohort}}}

Sensitivity table

Quarters of personal data	Founder (τ=4)	Surgeon (τ=16)
0	0% (cohort dominates)	0% (cohort dominates)
4	50%	20%
8	67%	33%
16	80%	50%
40	91%	71%
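A short sketch that computes $w_{\text{personal}}$ and rebuilds the sensitivity table above. The `shrinkAnchor` helper assumes the weight enters as a linear, credibility-style blend of a personal estimate and the cohort value; the spec gives the weight but not the blending rule, so treat that part (and both function names) as assumptions.

```typescript
// w_personal = N / (N + tau)
function personalWeight(nQuarters: number, tauCohort: number): number {
  if (nQuarters < 0 || tauCohort <= 0) throw new RangeError("require N >= 0 and tau > 0");
  return nQuarters / (nQuarters + tauCohort);
}

// Assumed linear blend: w * personal + (1 - w) * cohort.
function shrinkAnchor(personal: number, cohort: number, nQuarters: number, tauCohort: number): number {
  const w = personalWeight(nQuarters, tauCohort);
  return w * personal + (1 - w) * cohort;
}

// Rebuild the sensitivity table, rounded to whole percent.
const rows = [0, 4, 8, 16, 40].map((n) => ({
  quarters: n,
  founder: Math.round(100 * personalWeight(n, 4)),  // tau = 4
  surgeon: Math.round(100 * personalWeight(n, 16)), // tau = 16
}));
console.table(rows);
```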

Why it matters: Maya at $N = 0$ on the founder cohort has $w = 0$, so the cohort dominates 100%. Amara at $N = 16$ on the surgeon cohort has $w = 0.5$, so personal and cohort data are weighted equally. Same engine, opposite mix, driven by data availability.

Cohort-shape-aware bands

Symmetric (Gaussian) bands are correct for some cohorts and wrong for others. The engine carries a shape tag per cohort and renders bands accordingly:

Shape	Best fit for	Why
Gaussian	Medicine, biglaw, established professionals	Outcome distribution is bounded by procedural / billable-hour ceilings; symmetric dispersion is empirically reasonable.
Log-normal	Founders, creators, fallback	Right tail has long upside (a small fraction reach $1M+ TEB); left tail bounded near zero. Symmetric bands understate upside dispersion.
Shifted-Pareto	Veteran athletes	Career cliff at retirement → most decline sharply; a small fraction (broadcasting, equity) extend high earnings. Power-law fits this two-regime structure.

Why it matters: symmetric bands on a power-law cohort would understate upside and overstate the symmetry of failure. Cohort-shape-aware rendering preserves the directional information that matters for how a backer reads the band.
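To make the shape distinction concrete, the sketch below derives p10/p90 around a common median for each shape tag. The parametrizations are illustrative assumptions, not the engine's fitted distributions: a dispersion `sigma` (relative for the Gaussian, in log space for the log-normal) and, for the shifted-Pareto case, a Lomax quantile with tail index `alpha`, shifted by `floor` and scaled so its median equals `mid`. `bandFor` is a hypothetical name.

```typescript
type BandShape = "gaussian" | "log-normal" | "shifted-pareto";

interface Band { p10: number; p50: number; p90: number }

const Z90 = 1.2815515655446004; // standard-normal 90th-percentile z-score

function bandFor(shape: BandShape, mid: number, sigma: number, alpha = 2, floor = 0): Band {
  switch (shape) {
    case "gaussian":
      // Symmetric in dollars; a large sigma can push p10 below zero.
      return { p10: mid * (1 - Z90 * sigma), p50: mid, p90: mid * (1 + Z90 * sigma) };
    case "log-normal":
      // Symmetric in log-dollars, so the upside distance exceeds the downside.
      return { p10: mid * Math.exp(-Z90 * sigma), p50: mid, p90: mid * Math.exp(Z90 * sigma) };
    case "shifted-pareto": {
      // Lomax quantile per unit scale: (1 - p)^(-1/alpha) - 1.
      const q = (p: number) => Math.pow(1 - p, -1 / alpha) - 1;
      const scale = (mid - floor) / q(0.5);
      return { p10: floor + scale * q(0.1), p50: mid, p90: floor + scale * q(0.9) };
    }
    default:
      throw new Error(`unknown band shape: ${shape as string}`);
  }
}

for (const shape of ["gaussian", "log-normal", "shifted-pareto"] as const) {
  const b = bandFor(shape, 100, 0.4);
  console.log(shape, b.p10.toFixed(1), b.p50, b.p90.toFixed(1));
}
```

With the same median, the Gaussian band is equidistant from mid, the log-normal band reaches further up than down, and the heavy-tailed Pareto band stretches furthest to the right.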

Maya end-to-end · v2.1.0 engine output

Maya: age 20, pre-revenue founder, mostRecentTEB $20K, self-assessed taxes, two proof links, willing to report quarterly. Inputs match the canonical Maya from the math foundation Part 9, but the engine’s outputs are materially more conservative than the math doc’s authored numbers. See the cold-start note below.

Stage 1 — Cohort lookup

Profession founder, industry contains SaaS, age 20 → cohort founder-pre-seed-bb-saas-us. Shape: log-normal. $\tau = 4$ quarters.
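Because the lookup is deterministic, it can be expressed as an ordered, first-match rule list with a fallback. This sketch encodes only the two rules implied by the worked examples (Maya and Amara); `IssuerProfile`, `COHORT_RULES`, and `lookupCohort` are hypothetical, and the real rule set, including how age enters, is not specified in this section.

```typescript
interface IssuerProfile {
  profession: string;
  industry: string;
  age: number; // not used by the two rules sketched here
}

interface CohortRule {
  cohortId: string;
  matches: (p: IssuerProfile) => boolean;
}

// Ordered rules; the first match wins, so the same inputs always map to the same cohort.
const COHORT_RULES: CohortRule[] = [
  {
    cohortId: "founder-pre-seed-bb-saas-us",
    matches: (p) => p.profession === "founder" && /saas/i.test(p.industry),
  },
  {
    cohortId: "medicine-surgical-private",
    matches: (p) => p.profession === "professional" && /orthopedic/i.test(p.industry),
  },
];

function lookupCohort(p: IssuerProfile): string {
  const rule = COHORT_RULES.find((r) => r.matches(p));
  return rule ? rule.cohortId : "_other-fallback";
}

console.log(lookupCohort({ profession: "founder", industry: "B2B SaaS", age: 20 }));
// prints founder-pre-seed-bb-saas-us
```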

Stage 2 — Cohort baseline at age 20

Age 20 is below the lowest defined age band, [22, 25], so the lookup clamps to that band:

\text{anchor}_{\text{cohort}}(20) \;=\; \{\,p10 = \$5\text{K},\;\; p50 = \$25\text{K},\;\; p90 = \$75\text{K}\,\}
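The clamping in Stage 2 can be sketched as "take the last band whose lower edge is at or below the issuer's age, or the lowest band if there is none". `AgeBand` mirrors the `ageCurve` entries of `CohortBaseline`; `selectAgeBand` is a hypothetical helper, and the example curve holds only the [22, 25] founder band quoted above because the other bands aren't listed here.

```typescript
interface AgeBand {
  ageRange: [number, number]; // inclusive, e.g. [22, 25]
  teb: { p10: number; p50: number; p90: number };
}

function selectAgeBand(curve: AgeBand[], age: number): AgeBand {
  if (curve.length === 0) throw new Error("age curve is empty");
  const sorted = [...curve].sort((a, b) => a.ageRange[0] - b.ageRange[0]);
  // Below the lowest band: clamp to it. Above the highest: clamp to the highest.
  let chosen = sorted[0];
  for (const band of sorted) {
    if (band.ageRange[0] <= age) chosen = band;
  }
  return chosen;
}

const founderCurve: AgeBand[] = [
  { ageRange: [22, 25], teb: { p10: 5000, p50: 25000, p90: 75000 } },
];
console.log(selectAgeBand(founderCurve, 20).teb); // { p10: 5000, p50: 25000, p90: 75000 }
```

Selecting by lower edge rather than by strict containment also keeps fractional ages or gaps between bands from falling through to the wrong end of the curve.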

Stage 3 — Shrinkage

Maya has no historicalTEB, so $N_{\text{quarters}} = 0$. Shrinkage weight: $w_{\text{personal}} = 0/(0+4) = 0$, and the cohort dominates 100%. Maya’s mostRecentTEB of $20K is still honored as the personal anchor mid (it overrides the cohort mid because it’s an explicit declaration), but the band shape inherits the cohort dispersion ratio.

Stage 4 — Adjustments triggered

4 of 15 adjustments fire (composition damping β ≈ 0.33):

—#5 Tax-compliance penalty (self-assessed): -5%
—#10 Quarterly-reporting commitment: -5% bands
—#12 Age-vs-cohort runway (age 20, cohort peak ~38): +5% growth
—#15 Reporting transparency aggregate: -2% bands

Stage 5 — Final TEBForecast

Field	Value
tebAtAnchor.low	$4.3K
tebAtAnchor.mid	$19.7K
tebAtAnchor.high	$58K
growthSchedule[0]	[0, 3) g_mid = 26.7% (cohort segment 1, post-adj)
growthSchedule[1]	[3, 7) g_mid = 19.7%
growthSchedule[2]	[7, 15) g_mid = 8%
growthSchedule[3]	[15, 30) g_mid = 4%
growthSchedule[4]	[30, 75) g_mid = 2.5% (terminal tail)
terminalGrowthBands	{ low: -1%, mid: 2.5%, high: 5% }
bandShape	log-normal
capsTriggered	[] (none)
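One reading of Stages 3 to 5 reproduces the `tebAtAnchor` rows to the displayed precision: the damped anchor adjustment (#5) scales the declared $20K mid; the un-adjusted band keeps the cohort's p10/p50 = 0.2 and p90/p50 = 3 ratios; and the damped band adjustments (#10, #15) shrink the distance from mid to each edge. This is a reverse-engineered sketch under those assumptions, not the engine's published formula, and `anchorBands` is a hypothetical name.

```typescript
interface Percentiles { p10: number; p50: number; p90: number }

function anchorBands(
  declaredMid: number,
  cohort: Percentiles,
  anchorAdj: number[], // nominal anchor adjustments, signed fractions
  bandAdj: number[],   // nominal band-width adjustments, signed fractions
  kNonZero: number,    // count of all non-zero adjustments (anchor, band, growth)
): { low: number; mid: number; high: number } {
  const beta = 1 / (1 + 0.5 * kNonZero);
  const sum = (xs: number[]) => xs.reduce((s, x) => s + x, 0);
  const mid = declaredMid * (1 + sum(anchorAdj) * beta);
  const widthScale = 1 + sum(bandAdj) * beta;           // < 1 narrows the band
  const downside = mid * (1 - cohort.p10 / cohort.p50); // cohort dispersion below mid
  const upside = mid * (cohort.p90 / cohort.p50 - 1);   // cohort dispersion above mid
  return { low: mid - downside * widthScale, mid, high: mid + upside * widthScale };
}

const maya = anchorBands(
  20000,
  { p10: 5000, p50: 25000, p90: 75000 }, // founder cohort at the [22, 25] age band
  [-0.05],        // #5 tax-compliance penalty
  [-0.05, -0.02], // #10 quarterly reporting, #15 transparency aggregate
  4,              // #5, #10, #12 and #15 fire
);
console.log(maya); // low ≈ 4300, mid ≈ 19667, high ≈ 58082
```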

Stage 6 — priceToken on canonical covenant

Run Maya’s forecast through priceToken with a canonical covenant (s₁ = 3%, s₂ = 1%, T = 10):

Engine output

Per-token: low $\$0.33$, mid $\$1.53$, high $\$4.51$ · CI width ≈ 273%

Phase 1 PV mid: $\$1.00/\text{tok}$ (66% of value). Phase 2 PV mid: $\$0.53/\text{tok}$ (34%).

$V_{\text{HC}}^{\text{mid}} \approx \$860\text{K}$. $e_{\text{eff}}^{\text{piecewise}} \approx 1.78\%$.

At a $\$60\text{K}$ target raise: $\hat V_{\text{HC}} = \$3.37\text{M}$, $\kappa \approx 3.92$ → speculative tier.
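The Stage 6 summary numbers follow from two ratios: CI width as (high − low) / mid, and κ as the raise-implied valuation $\hat V_{\text{HC}} = \text{raise} / e_{\text{eff}}$ divided by $V_{\text{HC}}^{\text{mid}}$. A minimal sketch, with hypothetical function names; the tier cutoffs (speculative vs. anchored) are left out because this section doesn't state them.

```typescript
// CI width as a fraction of the mid value.
function ciWidth(low: number, mid: number, high: number): number {
  return (high - low) / mid;
}

// kappa = V̂_HC / V_HC^mid, where V̂_HC = targetRaise / e_eff.
function kappa(targetRaise: number, eEff: number, vHcMid: number): number {
  return targetRaise / eEff / vHcMid;
}

console.log(ciWidth(0.33, 1.53, 4.51).toFixed(2));       // Maya per-token band: 2.73 (≈ 273%)
console.log(kappa(60000, 0.0178, 860000).toFixed(2));    // Maya: 3.92
console.log(kappa(400000, 0.02, 21490000).toFixed(2));   // Amara: 0.93
```

The same two functions cover Amara's DL case further down, which is a quick way to see that κ only compares what the raise implies against what the engine's mid says.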

Cold-start note: Maya’s κ here is materially higher than in the math foundation Part 9, which estimated κ ≈ 1.20 (anchored). The engine is more conservative because the cohort prior says the median pre-seed founder doesn’t reach the math doc’s aggressive TEB(10) = $600K target: the cohort p50 at age 22–25 is $25K, growing piecewise. This is the engine reporting what cohort-anchored expectations actually imply. The math doc’s Maya numbers will be reconciled to engine output in a future vault session; this site treats engine output as the canonical reference.

Dr. Amara end-to-end · DL with personal data

Amara: age 45, orthopedic surgeon, mostRecentTEB $2M, 5 years of historicalTEB (annual quarterly attestable), 3 income streams, clean tax status, willing to report quarterly, commit-letter signed.

Cohort + shrinkage

Profession professional + industry orthopedic → cohort medicine-surgical-private. Shape: gaussian. $\tau = 16$ quarters.

5 annual historicalTEB points span 4 years, giving $N = 4 \times 4 = 16$ quarters of personal data. Shrinkage: $w_{\text{personal}} = 16 / (16 + 16) = 0.50$. Personal and cohort data are weighted equally, the opposite of Maya’s pure-cohort case.

Adjustments triggered

6 of 15 adjustments fire — composition damping β ≈ 0.25:

—#1 Personal CAGR vs cohort: +1.4% S1 growth (Amara’s ~4% trailing CAGR > cohort ~3%)
—#2 Income volatility (CV of historical TEB): -10% bands (Amara’s 5-yr CV ≈ 0.064, well below the 0.20 stable-issuer threshold)
—#10 Quarterly-reporting commitment: -5% bands
—#11 Commit-letter signed: -4% bands
—#12 Age-vs-cohort runway: +1% (age 45, cohort peak ~50)
—#15 Reporting transparency aggregate: -4% bands

Adj #2 swing: HHI → CV migration (2026-04). Prior to 2026-04, Adj #2 measured income-stream concentration via HHI. Under that logic, Amara’s 3 streams (0.70/0.20/0.10, HHI = 0.54) produced +2% band widening. The implementation now uses the realized CV of historical TEB; her stable trajectory ($1.7M → $2.0M over 5 years) gives CV ≈ 0.064 and returns −10% band narrowing, a 12pp swing on the raw magnitude (~3pp post-damping at β = 0.25). The spec previously narrated Adj #3 firing at +2.5% for Amara; that was incorrect (HHI 0.54 ≥ 0.45 threshold, so #3 returns 0). Both narrative errors are corrected here. See also the shared-input note for B₂ + Adj #2 in §adjustments.
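The inputs behind this swing are easy to recompute. The sketch below uses the sample CV (n − 1 denominator) and fails closed, returning `null` when fewer than 3 positive years are attested, as the adversarial-defenses table describes; it also computes the HHI that drove the old logic and the trailing CAGR quoted for Adj #1. Amara's five annual values are hypothetical: the spec gives only the $1.7M and $2.0M endpoints, so the three middle years are assumed to be evenly spaced, which yields CV ≈ 0.064. All function names are hypothetical.

```typescript
// Sample coefficient of variation; null means "too sparse, no band-narrowing benefit".
function sampleCV(values: number[]): number | null {
  if (values.filter((v) => v > 0).length < 3) return null;
  const n = values.length;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  const variance = values.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1);
  return Math.sqrt(variance) / mean;
}

// Herfindahl-Hirschman index of income-stream shares (shares sum to 1).
function hhi(shares: number[]): number {
  return shares.reduce((s, x) => s + x * x, 0);
}

// Compound annual growth rate between the first and last annual values.
function cagr(values: number[]): number {
  const years = values.length - 1;
  return Math.pow(values[values.length - 1] / values[0], 1 / years) - 1;
}

// Hypothetical evenly spaced path between the published endpoints ($M).
const amaraTeb = [1.7, 1.775, 1.85, 1.925, 2.0];
console.log(sampleCV(amaraTeb)?.toFixed(3)); // 0.064
console.log(hhi([0.7, 0.2, 0.1]).toFixed(2)); // 0.54
console.log(cagr(amaraTeb).toFixed(3));       // 0.041
console.log(sampleCV([2.0, 0, 1.9]));         // null: only 2 positive years
```

Zero-income years stay in the CV calculation rather than being filtered out, so omitting a bad year can only be done by omitting an attested year, which review catches.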

Final forecast + priceToken (DL e=2%)

Field	Value
tebAtAnchor	{ low: $1.03M, mid: $2M, high: $3.51M }
growthSchedule (5 segs)	steps through 4.5% → 3.2% → 1.0% → -5% (career decline) → 2.0% terminal
bandShape	gaussian
capsTriggered	[] (none)
per-token (DL e=2%)	low $22.73, mid $42.99, high $74.50 · CI 120%
V_HC^mid (back-derived)	~$21.49M
e_eff	2.00% (exact, DL window matches V_HC)

κ at $400K target raise

$\hat V_{\text{HC}} = \$400\text{K} / 0.02 = \$20\text{M}$, $\kappa = \$20\text{M} / \$21.49\text{M} \approx 0.93$ → anchored tier. That is ~7% below engine-mid, so the auction would clear smoothly.

This number reconciles closely with the math foundation Part 9 (κ = 0.94). Established issuers with substantial personal data don’t diverge from math-doc estimates; cold-start issuers do.

Calibration plan

What can change quarterly

—Cohort baseline numbers (anchor TEB, segment growth, terminal growth), refit against realized TEB once the cohort crosses ≥ 50 issuer-quarters
—Adjustment magnitudes — bound stays fixed but the elasticity inside each adjustment can recalibrate
—Shrinkage τ per cohort — empirical mixing-time calibration
—Selection-correction λ (Heckman) — activates once each cohort has enough listings to estimate selection

What stays structural (locked)

—Conviction weighting 0.6 backward / 0.4 forward
—Piecewise-exponential growth structure
—25% cap invariant
—Hard caps: anchor ±30%, segment ±15pp, terminal cohort-only
—15-adjustment table identity (adjustments don’t get added or removed without an engine version bump)

Honest cold-start label: Engine v2.1.0. Calibrated against 0 realized outcomes (cold start). Cohort baselines: analyst priors anchored to public data sources. First quarterly refit targeted once a cohort crosses ≥ 50 issuer-quarters of observed data.

Adversarial defenses

Attack	Defense
Inflate TEB declarations	TEB-bucket override capped at ±20% (adj 13); also subject to anchor hard cap ±30%
Stack many small credits to compound	Composition damping β = 1/(1 + 0.5·k_nz): with 5 credits, each gets 28.6% weight, not 100%
Misclassify cohort to game baseline	Cohort lookup is deterministic from inputs; classification can be disputed at review
Front-run a forecast revision	Forecast outputs deterministic; no nondeterminism to front-run
Pump conviction signals just before listing	Conviction signals all require attestation; back-dated attestations flagged at review
Backdate historicalTEB	Tax-return attestation required for years to count toward shrinkage N
Claim higher growth than cohort allows	Per-segment growth bound ±15pp absolute; no escape valve
Hide income volatility	Adj #2 computes sample CV from tax-return-attested historicalTEB (≥3 positive years required). Concealing volatility means either omitting attested years (caught at review) or having too sparse data → adjustment fails closed → no band-narrowing benefit
Selectively report only good quarters	Quarterly reporting attestation triggers cure-period mechanism on missing quarters
Switch industries to find favorable cohort	Cohort lookup uses profession + industry at time of listing; mid-listing switches require new disclosure

Full attack surface + 10-adversarial-stress-test analysis lives in the design doc Part L.

What this engine does NOT claim

—Not a prediction of any specific issuer’s actual outcome. The forecast is a band, not a point. Engine output describes cohort-conditioned expectations under analyst priors, not realized future TEB.
—Not a substitute for issuer due diligence. Backers should still read the disclosure pack. The engine’s job is to anchor pricing, not to vouch for the issuer.
—Not calibrated against realized data yet. Cohort baselines are analyst priors anchored to public data sources. The first refit fires once a cohort accrues ≥ 50 issuer-quarters of observed TEB.
—Not infallible at the cohort boundary. Some issuers don’t cleanly fit any cohort; v1.1 expands the registry, but v1.0 fallbacks may produce wider bands than ideal.
—Not the price. The auction sets the clearing price; the engine sets the reserve. Market signal can disagree with engine-mid by up to the κ-tier ceiling — and that disagreement is itself information.

Read the full design

The complete design document — including the 30-cohort roadmap, full F-table derivation, 10-adversarial-stress-test analysis, and quarterly-refit operational playbook — lives in PreFlop/wiki/Forecast-Engine-Design.md. v2.1.0 is the first published implementation; v1.1 expands the cohort registry and activates Heckman selection correction.

How Preflop forecasts a person’s future income