Spec · Forecast Engine · Building next

How Preflop forecasts a person’s future income

Cohort baselines + bounded adjustments + Bayesian shrinkage + cohort-shape-aware bands. This is the bottom of the pricing pipeline: everything downstream (\(V_{\text{HC}}\), \(e_{\text{eff}}\), κ, reserve, clearing) inherits whatever rigor (or sloppiness) lives here. Engine v2.1.0. Calibrated against 0 realized outcomes today; cohort baselines are analyst priors anchored to named public data sources.

The architectural pattern

Three intellectual precedents shape the design — none of them new, all of them load-bearing in their original domain:

| Precedent | What we borrow |
| --- | --- |
| FICO (1956) | Cohort-conditioned baseline + bounded enumerated adjustments. Score is reproducible from the input, never a black-box opinion. |
| Berkus (2001) | Pre-revenue venture pricing under information scarcity: anchor on observable cohort priors, layer additive credits for evidence. |
| PECOTA (2003) | Cohort-shape-aware projection: comparable players' career arcs shape the band, not a Gaussian assumption that flattens the tails. |

Each pattern handles a problem Preflop has. FICO: the inputs are heterogeneous and the model has to be auditable. Berkus: most issuers list with very little personal data. PECOTA: a founder’s outcome distribution is power-law, not Gaussian — and pretending otherwise is wrong.

Why it matters: The forecast engine doesn’t invent any of these patterns. What’s new is composing them on a single output object, the TEBForecast, that drops directly into the Obligation Ledger’s pricing math without translation.

The data model

CohortBaseline schema

A cohort is a group of issuers whose TEB trajectories the engine treats as similar enough to share a baseline. v1.0 ships 6 cohorts (5 priority + 1 fallback); the design accommodates ~30 by v1.1.

interface CohortBaseline {
  id: string;                                  // e.g. "founder-pre-seed-bb-saas-us"
  name: string;
  parent: string | null;
  populationEstimate: number;                  // approximate count of US individuals
  baseline: {
    anchorTebByPercentile: { p10, p25, p50, p75, p90 };
    ageCurve: Array<{ ageRange, teb: { p10, p50, p90 } }>;
    growthSegments: Array<{ window, g: { p10, p50, p90 } }>;
    terminalGrowth: { p10, p50, p90 };
  };
  shape: "gaussian" | "log-normal" | "shifted-pareto";
  shrinkageTau: number;                        // quarters of personal data to equal cohort weight
  dataSources: string[];                       // e.g. "BLS OES 2024 ortho surgeons"
  selectionCorrection: { method, lambda, rationale };
  version: string;
  lastCalibrated: string;
}

TEBForecast output schema

Every forecast produced by the engine ships with a complete provenance trail. Auditors can reconstruct exactly which cohort fired, which adjustments were triggered with which magnitudes, what the shrinkage weight was, and which (if any) hard caps were hit.

interface TEBForecast {
  issuerId: string;
  anchorTime: number;
  tebAtAnchor: { low, mid, high };
  growthSchedule: Array<{ tFrom, tTo, g, g_low?, g_high? }>;
  terminalGrowth: number;                      // mid-band, used by priceToken
  terminalGrowthBands?: { low, mid, high };
  horizonYears: number;
  discountRate: number;
  provenance: {
    engineVersion: "v2.1.0";
    cohortId: string;
    cohortVersion: string;
    shrinkage: { N_personal_quarters, tau, w_personal };
    adjustments: Adjustment[];                 // one entry per non-zero adjustment
    bandShape: "gaussian" | "log-normal" | "shifted-pareto";
    convictionModifier: number;
    selectionCorrection: { method, lambda, rationale };
    capsTriggered: string[];                   // empty unless an adjustment hit a cap
    computedAt: number;                        // unix ms
    sourceCitations: string[];
  };
}

discountRate field: design v1 (in progress)

The discountRate field is currently a platform constant \(r = 12\%\) written by every forecast. The forecast engine is being extended to produce cohort-conditional \(r_c\) (with a survival-curve auxiliary) as part of engine v2.2.0. Design specification at /spec/discount-rate. Engine v2.1.0 (current) is unchanged.

The six cohorts shipped today (five priority + one fallback)

| ID | Population | Shape | Terminal mid | Key data sources |
| --- | --- | --- | --- | --- |
| founder-pre-seed-bb-saas-us | ~12K | log-normal | 2.5% | Crunchbase 2024 founder cohort report; YC 2014–2024 outcomes; PitchBook compensation |
| medicine-surgical-private | ~38K | gaussian | 2.0% | BLS OES 29-1242; Medscape 2024 compensation; MGMA Provider Comp 2024 |
| biglaw-partner | ~28K | gaussian | 2.5% | NALP 2024 Partner Compensation; AmLaw 100 PEP; Major Lindsey & Africa 2024 |
| athlete-major-veteran | ~3.4K | shifted-pareto | 0.5% | NBA CBA 2023–2030; NFLPA 2024 salary db; Forbes highest-paid athletes |
| creator-mid-tier | ~280K | log-normal | 1.0% | ConvertKit Creator Economy 2024; Patreon 2024; YouTube Partner Program; Spotify Loud & Clear |
| _other-fallback | ~50M | log-normal | 2.0% | BLS OES 2024 economy-wide; CPS 2024; Census ACS 2024 |

Why it matters: v1.1 expands the registry to ~30 cohorts (founder sub-stages, additional medical specialties, finance roles, athletes by sport, creators by platform/audience tier). Roadmap: 3 months from v2.1.0 launch.

Design update v2: implementation pending (in progress)

Cohort priors are being re-anchored to entry-pool data sources (Census ABS, Kauffman, BLS, AAMC, NCAA, IRS) instead of the funded-survivor sources cited in the table above (Crunchbase, NALP, league veteran salary databases, platform-monetized creator surveys). A survival-mixture mechanism is added to the VHC integration so the engine no longer silently assumes \(P(\text{stay}) = 1\) forever; for founders, the empirical \(P(\text{stay at 45} \mid \text{founder at 22}) \approx 6\%\). See /spec/cohort-priors for the full design specification. Ships in engine v2.3.0 alongside Discount-Rate-v1 in v2.2.0; both are designed to ship together.

The 15-adjustment table

Every adjustment computes a magnitude from issuer evidence, is clamped to a per-adjustment hard bound, and is then dampened by the composition coefficient β so a stack of small, individually defensible adjustments can’t compound into a large net shift.

\[ \beta \;=\; \frac{1}{1 + 0.5 \cdot k_{nz}} \]

With 5 simultaneous adjustments, each is dampened to 28.6% of its nominal magnitude. With 10, each falls to 16.7%. Single-adjustment scenarios (\(k_{nz} = 1\)) get 67% of nominal: still dampened, even alone.
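The damping percentages quoted above follow directly from the formula. A minimal sketch, assuming only the stated definition of β (the function name `compositionBeta` is illustrative, not an engine identifier):

```typescript
// Composition coefficient: beta = 1 / (1 + 0.5 * k_nz), where k_nz is the
// number of adjustments that returned a non-zero magnitude.
function compositionBeta(kNonZero: number): number {
  if (kNonZero < 0 || Math.floor(kNonZero) !== kNonZero) {
    throw new RangeError("kNonZero must be a non-negative integer");
  }
  return 1 / (1 + 0.5 * kNonZero);
}

// The three cases quoted in the text.
const betaFor1 = compositionBeta(1);   // 1 / 1.5, about 67% of nominal
const betaFor5 = compositionBeta(5);   // 1 / 3.5, about 28.6% of nominal
const betaFor10 = compositionBeta(10); // 1 / 6,   about 16.7% of nominal
```

Because β depends only on the count of firing adjustments, adding one more small credit lowers the weight on every other credit already in the stack.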

Hard caps (post-composition)

  • Anchor TEB shift: ≤ ±30%
  • Segment growth shift: ≤ ±15pp absolute
  • Terminal growth: immune to per-issuer adjustment — cohort-only
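The clamp, damp, cap sequence can be sketched as follows. This is an illustration under stated assumptions rather than the engine's code: the type and function names are hypothetical, damped magnitudes are assumed to sum within a channel, and \(k_{nz}\) is assumed to count firing adjustments across all channels (consistent with the Maya walkthrough, where four adjustments on different channels give β ≈ 0.33). Terminal growth is left out because it is cohort-only.

```typescript
type Channel = "anchor" | "growth" | "bands";

interface RawAdjustment {
  id: number;
  channel: Channel;
  magnitude: number;       // fractional shift computed from evidence, e.g. 0.12 = +12%
  bound: [number, number]; // per-adjustment hard bound from the adjustment table
}

function clamp(x: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, x));
}

// Step 1: clamp each magnitude to its own bound.
// Step 2: damp every bounded magnitude by beta (k_nz counted across channels).
// Step 3: sum the channel's damped magnitudes and apply the post-composition hard cap.
function netShift(adjustments: RawAdjustment[], channel: Channel, hardCap: number) {
  const bounded = adjustments.map(a => ({
    channel: a.channel,
    value: clamp(a.magnitude, a.bound[0], a.bound[1]),
  }));
  const kNonZero = bounded.filter(b => b.value !== 0).length;
  const beta = 1 / (1 + 0.5 * kNonZero);
  let summed = 0;
  for (let i = 0; i < bounded.length; i++) {
    if (bounded[i].channel === channel) summed += beta * bounded[i].value;
  }
  const shift = clamp(summed, -hardCap, hardCap);
  return { shift, capHit: shift !== summed };
}

// Hypothetical evidence: adjustment #3 overshoots its [0, +10%] bound and is clamped.
const exampleAdjustments: RawAdjustment[] = [
  { id: 3, channel: "anchor", magnitude: 0.25, bound: [0, 0.10] },
  { id: 4, channel: "anchor", magnitude: 0.15, bound: [0, 0.15] },
  { id: 7, channel: "anchor", magnitude: 0.20, bound: [0, 0.20] },
];
const anchorResult = netShift(exampleAdjustments, "anchor", 0.30);
```

In this example β = 1 / 2.5 = 0.4, so the net anchor shift is 0.4 × (0.10 + 0.15 + 0.20) = +18%, inside the ±30% cap.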

The 15 adjustments

| # | Name | Applies to | Bound | Citation |
| --- | --- | --- | --- | --- |
| 1 | Personal CAGR vs cohort growth | S1 growth | [-10%, +15%] | tax-return |
| 2 | Income volatility (CV of historical TEB) | bands | [-10%, +30%] | tax-return |
| 3 | Income diversification credit | anchor | [0, +10%] | tax-return |
| 4 | Documented event-capture history | anchor | [0, +15%] | transaction-record |
| 5 | Tax-compliance penalty | anchor | [-20%, 0] | tax-status |
| 6 | Employment-gap penalty | S1 growth | [-8%, 0] | self-disclosed |
| 7 | Pipeline scale (forward $) | anchor | [0, +20%] | signed-contract |
| 8 | Brand-momentum credit (proof links) | bands | [-8%, 0] | public-link |
| 9 | Narrative depth (intent signal) | anchor | [0, +3%] | self-disclosed |
| 10 | Quarterly-reporting commitment | bands | [-5%, 0] | platform-attestation |
| 11 | Commit-letter signed | bands | [-4%, 0] | platform-attestation |
| 12 | Age-vs-cohort runway | all growth | [-5%, +5%] | (structural) |
| 13 | TEB-bucket override anchor scaling | anchor | [-20%, +20%] | self-disclosed |
| 14 | Industry-specificity refinement | S1+S2 growth | [-5%, +10%] | (structural) |
| 15 | Reporting transparency aggregate | bands | [-6%, 0] | platform-attestation |

Why it matters: 11 of 15 adjustments map directly to conviction-score signals. The remaining 4 (5, 12, 13, 14) come from issuer profile attributes that conviction doesn’t score directly but the forecast cares about.

Shared-input dependencies: declared, not double-counted

A small number of adjustments consume the same input field as a conviction signal but emit into a different downstream channel. Most prominent example: Adjustment #2 (CV of historical TEB → band width) and conviction signal B2 (CV of historical TEB → drift μ via the composite score). Same evidence (historicalTEB), two orthogonal outputs (band width vs drift). The saturation point (CV = 1.5) is intentionally shared so the two channels agree on the regime boundary. This is not a double-count: the covariance structure is explicit and the downstream effects compose multiplicatively rather than additively. Any future adjustment that consumes a signal-shared input must declare the same orthogonality (which channel it emits to) and survive the shift-equivalence test from the discount-rate spec.

Bayesian shrinkage between personal and cohort

When personal TEB data is sparse, the cohort dominates. As personal data accrues, the engine shifts weight to it. The weight on personal data is monotonically increasing, equal to 0.5 at \(N = \tau\), and asymptotes toward 1.0 as \(N \to \infty\):

\[ w_{\text{personal}} \;=\; \frac{N_{\text{quarters}}}{N_{\text{quarters}} + \tau_{\text{cohort}}} \]

Sensitivity table

| Quarters of personal data | Founder (τ=4) | Surgeon (τ=16) |
| --- | --- | --- |
| 0 | 0% (cohort dominates) | 0% (cohort dominates) |
| 4 | 50% | 20% |
| 8 | 67% | 33% |
| 16 | 80% | 50% |
| 40 | 91% | 71% |
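The sensitivity table can be regenerated from the shrinkage formula alone. A minimal sketch (the function name `personalWeight` is illustrative; only the weight is computed here, not how the engine blends personal and cohort values):

```typescript
// Shrinkage weight on personal data: w = N / (N + tau).
function personalWeight(nQuarters: number, tau: number): number {
  if (nQuarters < 0 || tau <= 0) {
    throw new RangeError("nQuarters must be >= 0 and tau must be > 0");
  }
  return nQuarters / (nQuarters + tau);
}

const quarters = [0, 4, 8, 16, 40];

// Percent weights rounded to whole numbers, matching the table's columns.
const founderRow = quarters.map(n => Math.round(100 * personalWeight(n, 4)));
const surgeonRow = quarters.map(n => Math.round(100 * personalWeight(n, 16)));
// founderRow -> [0, 50, 67, 80, 91]
// surgeonRow -> [0, 20, 33, 50, 71]
```

A larger τ means a cohort whose baseline is trusted for longer: the surgeon cohort needs four times as many quarters as the founder cohort to reach equal weighting.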

Why it matters: Maya at \(N = 0\) on the founder cohort: \(w = 0\), cohort dominates 100%. Amara at \(N = 16\) on the surgeon cohort: \(w = 0.5\), personal and cohort weighted equally. Same engine, opposite mix, driven by data availability.

Cohort-shape-aware bands

Symmetric (Gaussian) bands are correct for some cohorts and wrong for others. The engine carries a shape tag per cohort and renders bands accordingly:

| Shape | Best fit for | Why |
| --- | --- | --- |
| Gaussian | Medicine, biglaw, established professionals | Outcome distribution is bounded by procedural / billable-hour ceilings; symmetric dispersion is empirically reasonable. |
| Log-normal | Founders, creators, fallback | Right tail has long upside (a small fraction reach $1M+ TEB); left tail bounded near zero. Symmetric bands understate upside dispersion. |
| Shifted-Pareto | Veteran athletes | Career cliff at retirement → most decline sharply; a small fraction (broadcasting, equity) extend high earnings. Power-law fits this two-regime structure. |

Why it matters: Symmetric bands on a power-law cohort would understate upside and overstate the symmetry of failure. Cohort-shape-aware rendering preserves directional information that matters for how a backer reads the band.
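To make the asymmetry concrete, the sketch below compares a p10/p50/p90 band under a Gaussian and under a log-normal with the same median. It uses textbook quantile formulas, not the engine's rendering code; the function names and the example parameters (median 100, σ = 20, log-σ = 0.8) are illustrative assumptions, and the shifted-Pareto case is not covered.

```typescript
// Standard normal quantile at the 90th percentile (the 10th is its negative).
const Z90 = 1.2815515655446004;

interface Band { low: number; mid: number; high: number }

// Gaussian: p10 and p90 sit symmetrically around the median.
function gaussianBand(median: number, sigma: number): Band {
  return { low: median - Z90 * sigma, mid: median, high: median + Z90 * sigma };
}

// Log-normal: symmetric in log space, so the band is multiplicative around the median.
function logNormalBand(median: number, logSigma: number): Band {
  return {
    low: median * Math.exp(-Z90 * logSigma),
    mid: median,
    high: median * Math.exp(Z90 * logSigma),
  };
}

// Ratio of upside distance to downside distance; 1 means a symmetric band.
function upsideToDownside(b: Band): number {
  return (b.high - b.mid) / (b.mid - b.low);
}

const gaussianExample = gaussianBand(100, 20);
const logNormalExample = logNormalBand(100, 0.8);
const gaussianSkew = upsideToDownside(gaussianExample);   // 1 by construction
const logNormalSkew = upsideToDownside(logNormalExample); // exp(Z90 * 0.8), well above 1
```

The log-normal band keeps its low end above zero while stretching the high end, which is the directional information a symmetric band would discard.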

Maya end-to-end · v2.1.0 engine output

Maya: age 20, pre-revenue founder, mostRecentTEB $20K, self-assessed taxes, two proof links, willing to report quarterly. Inputs match the canonical Maya from the math foundation Part 9 — but the engine’s outputs are materially different from the math doc’s authorial numbers (more conservative). See the cold-start note below.

Stage 1 — Cohort lookup

Profession founder, industry contains SaaS, age 20 → cohort founder-pre-seed-bb-saas-us. Shape: log-normal. \(\tau = 4\) quarters.
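A deterministic lookup of this kind might look like the sketch below. Only the two mappings narrated in this spec's walkthroughs and the fallback ID are taken from the document; the `IssuerProfile` shape, the `lookupCohort` name, and the substring-matching predicates are assumptions made for illustration.

```typescript
// Hypothetical input shape; the real issuer profile schema is not given in this spec.
interface IssuerProfile {
  profession: string;
  industry: string;
}

// Rules are evaluated in a fixed order and depend only on the inputs,
// so the same profile always resolves to the same cohort.
function lookupCohort(profile: IssuerProfile): string {
  const industry = profile.industry.toLowerCase();
  if (profile.profession === "founder" && industry.indexOf("saas") !== -1) {
    return "founder-pre-seed-bb-saas-us";
  }
  if (profile.profession === "professional" && industry.indexOf("orthopedic") !== -1) {
    return "medicine-surgical-private";
  }
  // Anything unmatched lands in the fallback cohort.
  return "_other-fallback";
}

const mayaCohort = lookupCohort({ profession: "founder", industry: "B2B SaaS" });
const amaraCohort = lookupCohort({ profession: "professional", industry: "Orthopedic surgery" });
```

Keeping the rules as pure functions of the inputs is what makes a disputed classification reviewable: the reviewer can replay the lookup.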

Stage 2 — Cohort baseline at age 20

Age 20 is below the lowest defined age band, [22, 25], so the engine uses that band:

\[ \text{anchor}_{\text{cohort}}(20) \;=\; \{\,p10 = \$5\text{K},\;\; p50 = \$25\text{K},\;\; p90 = \$75\text{K}\,\} \]
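The age-band resolution can be sketched as a nearest-band lookup. The [22, 25] band and its percentiles come from this stage; the function name `bandForAge`, the typed `AgeBand` shape, and the behavior for ages above the highest band are assumptions.

```typescript
interface AgeBand {
  ageRange: [number, number]; // inclusive lower and upper age
  teb: { p10: number; p50: number; p90: number };
}

// Returns the band containing `age`; ages below the first band use the first band,
// and ages above the last band use the last band (assumed behavior).
function bandForAge(ageCurve: AgeBand[], age: number): AgeBand {
  if (ageCurve.length === 0) throw new Error("ageCurve must not be empty");
  const sorted = ageCurve.slice().sort((a, b) => a.ageRange[0] - b.ageRange[0]);
  for (let i = 0; i < sorted.length; i++) {
    if (age <= sorted[i].ageRange[1]) return sorted[i];
  }
  return sorted[sorted.length - 1];
}

// Only the lowest founder band is stated in this spec, so only that band is listed.
const founderAgeCurve: AgeBand[] = [
  { ageRange: [22, 25], teb: { p10: 5000, p50: 25000, p90: 75000 } },
];

const mayaBand = bandForAge(founderAgeCurve, 20); // resolves to the [22, 25] band
```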

Stage 3 — Shrinkage

Maya has no historicalTEB → \(N_{\text{quarters}} = 0\). Shrinkage weight: \(w_{\text{personal}} = 0/(0+4) = 0\). Cohort dominates 100%. Maya’s mostRecentTEB $20K is honored as the personal anchor mid (overrides the cohort mid since it’s an explicit declaration), but the band shape inherits the cohort dispersion ratio.

Stage 4 — Adjustments triggered

4 of 15 adjustments fire (composition damping β ≈ 0.33):

  • #5 Tax-compliance penalty (self-assessed): -5%
  • #10 Quarterly-reporting commitment: -5% bands
  • #12 Age-vs-cohort runway (age 20, cohort peak ~38): +5% growth
  • #15 Reporting transparency aggregate: -2% bands
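The damping arithmetic for these four adjustments can be checked by hand. The sketch below assumes damped magnitudes sum within a channel and that the anchor shift applies multiplicatively to the declared $20K; under those assumptions it reproduces the $19.7K anchor mid reported in the Stage 5 table. Variable names are illustrative.

```typescript
type Channel = "anchor" | "bands" | "growth";

// The four firing adjustments listed above, as raw (pre-damping) magnitudes.
const firing: Array<{ id: number; channel: Channel; raw: number }> = [
  { id: 5, channel: "anchor", raw: -0.05 },
  { id: 10, channel: "bands", raw: -0.05 },
  { id: 12, channel: "growth", raw: 0.05 },
  { id: 15, channel: "bands", raw: -0.02 },
];

// k_nz = 4, so beta = 1 / (1 + 0.5 * 4) = 1/3.
const mayaBeta = 1 / (1 + 0.5 * firing.length);

// Sum of damped magnitudes on one channel.
function dampedShift(channel: Channel): number {
  let total = 0;
  for (let i = 0; i < firing.length; i++) {
    if (firing[i].channel === channel) total += mayaBeta * firing[i].raw;
  }
  return total;
}

const anchorShift = dampedShift("anchor"); // about -1.67%
const bandShift = dampedShift("bands");    // about -2.33%
const growthShift = dampedShift("growth"); // about +1.67 (read here as percentage points)

// Declared mostRecentTEB of $20K, shifted by the damped anchor adjustment.
const mayaAnchorMid = 20000 * (1 + anchorShift); // about 19,667
```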

Stage 5 — Final TEBForecast

| Field | Value |
| --- | --- |
| tebAtAnchor.low | $4.3K |
| tebAtAnchor.mid | $19.7K |
| tebAtAnchor.high | $58K |
| growthSchedule[0] | [0, 3) g_mid = 26.7% (cohort segment 1, post-adj) |
| growthSchedule[1] | [3, 7) g_mid = 19.7% |
| growthSchedule[2] | [7, 15) g_mid = 8% |
| growthSchedule[3] | [15, 30) g_mid = 4% |
| growthSchedule[4] | [30, 75) g_mid = 2.5% (terminal tail) |
| terminalGrowthBands | { low: -1%, mid: 2.5%, high: 5% } |
| bandShape | log-normal |
| capsTriggered | [] (none) |

Stage 6 — priceToken on canonical covenant

Run Maya’s forecast through priceToken with a canonical covenant (s1 = 3%, s2 = 1%, T = 10):

Engine output

Per-token: low $0.33, mid $1.53, high $4.51 · CI width ≈ 273%

Phase 1 PV mid: $1.00/tok (66% of value). Phase 2 PV mid: $0.53/tok (34%).

\(V_{\text{HC}}^{\text{mid}} \approx \$860\text{K}\). \(e_{\text{eff}}^{\text{piecewise}} \approx 1.78\%\).

At a $60K target raise: \(\hat V_{\text{HC}} = \$3.37\text{M}\), \(\kappa \approx 3.92\) (speculative tier).
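The figures in this block are linked by simple ratios. The sketch below uses the κ construction shown in the Amara walkthrough (implied valuation = target raise / effective equity, κ = implied valuation / engine mid); the CI-width formula, (high − low) / mid, is inferred from the reported numbers rather than stated in the spec. Function names are illustrative.

```typescript
// Valuation implied by the raise: target raise divided by effective equity share.
function impliedVhc(targetRaise: number, eEff: number): number {
  return targetRaise / eEff;
}

// kappa compares the implied valuation with the engine's mid valuation.
function kappa(targetRaise: number, eEff: number, vhcMid: number): number {
  return impliedVhc(targetRaise, eEff) / vhcMid;
}

// Band width relative to the mid (inferred definition).
function ciWidth(low: number, mid: number, high: number): number {
  return (high - low) / mid;
}

// Maya's engine output from this stage.
const mayaImplied = impliedVhc(60000, 0.0178);   // about $3.37M
const mayaKappa = kappa(60000, 0.0178, 860000);  // about 3.92
const mayaCiWidth = ciWidth(0.33, 1.53, 4.51);   // about 2.73, i.e. 273%
```

Inputs here are the rounded values printed in the spec, so results match the published figures only to the displayed precision.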

Cold-start note: Maya’s κ here is materially higher than the math foundation Part 9 (which estimated κ ≈ 1.20 anchored). The engine is more conservative because the cohort prior says the median pre-seed founder doesn’t reach the math doc’s aggressive TEB(10) = $600K target: the cohort p50 at age 22–25 is $25K, growing piecewise. This is the engine telling us the truth about cohort-anchored expectations. The math doc’s Maya numbers will reconcile to engine output in a future vault session; this site reflects engine output as the canonical reference.

Dr. Amara end-to-end · DL with personal data

Amara: age 45, orthopedic surgeon, mostRecentTEB $2M, 5 years of historicalTEB (annual quarterly attestable), 3 income streams, clean tax status, willing to report quarterly, commit-letter signed.

Cohort + shrinkage

Profession professional + industry orthopedic → cohort medicine-surgical-private. Shape: gaussian. \(\tau = 16\) quarters.

5 annual historicalTEB points × 4 quarters/year ≈ \(N = 16\) quarters of personal data. Shrinkage: \(w_{\text{personal}} = 16 / (16 + 16) = 0.50\). Personal and cohort are weighted equally, the opposite of Maya’s pure-cohort case.

Adjustments triggered

6 of 15 adjustments fire — composition damping β ≈ 0.25:

  • #1 Personal CAGR vs cohort: +1.4% S1 growth (Amara’s ~4% trailing CAGR > cohort ~3%)
  • #2 Income volatility (CV of historical TEB): -10% bands (Amara’s 5-yr CV ≈ 0.064, well below the 0.20 stable-issuer threshold)
  • #10 Quarterly-reporting commitment: -5% bands
  • #11 Commit-letter signed: -4% bands
  • #12 Age-vs-cohort runway: +1% (age 45, cohort peak ~50)
  • #15 Reporting transparency aggregate: -4% bands

Adj #2 swing: HHI → CV migration (2026-04)

Prior to 2026-04, Adj #2 measured income-stream concentration via HHI. Under that logic, Amara’s 3 streams (0.70/0.20/0.10, HHI = 0.54) produced +2% band widening. The implementation is now realized CV of historical TEB; her stable trajectory ($1.7M → $2.0M over 5 years) gives CV ≈ 0.064, returning −10% band narrowing, a 12pp swing on the raw magnitude and ~3pp post-damping. The spec previously narrated Adj #3 firing at +2.5% for Amara; that was incorrect (HHI 0.54 ≥ 0.45 threshold, so #3 returns 0). Both narrative errors are corrected here. See also: shared-input note for B2 + Adj #2 in §adjustments.
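Both measures in this note are easy to compute. The sketch below shows HHI over income shares and a sample coefficient of variation that returns null below three positive years (the fail-closed rule described under adversarial defenses). Amara's year-by-year TEB is not published in this spec, so the five-point path is a hypothetical even ramp from $1.7M to $2.0M, chosen only to illustrate the calculation.

```typescript
// Herfindahl-Hirschman index: sum of squared income shares.
function hhi(shares: number[]): number {
  return shares.reduce((acc, s) => acc + s * s, 0);
}

// Sample coefficient of variation (n - 1 denominator). Returns null when fewer
// than 3 positive observations are available, so no band narrowing is granted.
function sampleCv(values: number[]): number | null {
  const positive = values.filter(v => v > 0);
  if (positive.length < 3) return null;
  const mean = positive.reduce((a, v) => a + v, 0) / positive.length;
  const sumSq = positive.reduce((a, v) => a + (v - mean) * (v - mean), 0);
  return Math.sqrt(sumSq / (positive.length - 1)) / mean;
}

const amaraHhi = hhi([0.70, 0.20, 0.10]); // 0.54, matching the figure quoted above

// Hypothetical evenly spaced path between the two stated endpoints.
const amaraTebPath = [1700000, 1775000, 1850000, 1925000, 2000000];
const amaraCv = sampleCv(amaraTebPath);   // about 0.064 for this assumed path
```

With the raw swing of 12pp and β = 0.25 for six firing adjustments, the post-damping swing is 12 × 0.25 = 3pp, as stated above.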

Final forecast + priceToken (DL e=2%)

| Field | Value |
| --- | --- |
| tebAtAnchor | { low: $1.03M, mid: $2M, high: $3.51M } |
| growthSchedule (5 segs) | ramps at 4.5% → 3.2% → 1.0% → -5% (career decline) → 2.0% terminal |
| bandShape | gaussian |
| capsTriggered | [] (none) |
| per-token (DL e=2%) | low $22.73, mid $42.99, high $74.50 · CI 120% |
| \(V_{\text{HC}}^{\text{mid}}\) (back-derived) | ~$21.49M |
| \(e_{\text{eff}}\) | 2.00% (exact, DL window matches VHC) |

κ at $400K target raise

\(\hat V_{\text{HC}} = \$400\text{K} / 0.02 = \$20\text{M}\), \(\kappa = \$20\text{M} / \$21.49\text{M} \approx 0.93\) (anchored tier). Below engine-mid by ~7%; auction would clear smoothly.

This number reconciles closely with the math foundation Part 9 (κ = 0.94). Established issuers with substantial personal data don’t diverge from math-doc estimates; cold-start issuers do.

Calibration plan

What can change quarterly

  • Cohort baseline numbers (anchor TEB, segment growth, terminal growth) — refit against realized TEB once cohort crosses ≥ 50 issuer-quarters
  • Adjustment magnitudes — bound stays fixed but the elasticity inside each adjustment can recalibrate
  • Shrinkage τ per cohort — empirical mixing-time calibration
  • Selection-correction λ (Heckman) — activates once each cohort has enough listings to estimate selection

What stays structural (locked)

  • Conviction weighting 0.6 backward / 0.4 forward
  • Piecewise-exponential growth structure
  • 25% cap invariant
  • Hard caps: anchor ±30%, segment ±15pp, terminal cohort-only
  • 15-adjustment table identity (adjustments don’t get added or removed without an engine version bump)

Honest cold-start label: Engine v2.1.0. Calibrated against 0 realized outcomes (cold start). Cohort baselines: analyst priors anchored to public data sources. First quarterly refit targeted once a cohort reaches ≥ 50 issuer-quarters of observed data.

Adversarial defenses

| Attack | Defense |
| --- | --- |
| Inflate TEB declarations | TEB-bucket override capped at ±20% (adj 13); also subject to anchor hard cap ±30% |
| Stack many small credits to compound | Composition damping β = 1/(1 + 0.5·k_nz): 5 credits each get 28.6% weight, not 100% |
| Misclassify cohort to game baseline | Cohort lookup is deterministic from inputs; classification disputed at review |
| Front-run a forecast revision | Forecast outputs deterministic; no nondeterminism to front-run |
| Pump conviction signals just before listing | Conviction signals all require attestation; back-dated attestations flagged at review |
| Backdate historicalTEB | Tax-return attestation required for years to count toward shrinkage N |
| Claim higher growth than cohort allows | Per-segment growth bound ±15pp absolute; no escape valve |
| Hide income volatility | Adj #2 computes sample CV from tax-return-attested historicalTEB (≥3 positive years required). Concealing volatility means either omitting attested years (caught at review) or having too sparse data → adjustment fails closed → no band-narrowing benefit |
| Selectively report only good quarters | Quarterly reporting attestation triggers cure-period mechanism on missing quarters |
| Switch industries to find favorable cohort | Cohort lookup uses profession + industry at time of listing; mid-listing switches require new disclosure |

Full attack surface + 10-adversarial-stress-test analysis lives in the design doc Part L.

What this engine does NOT claim

  • Not a prediction of any specific issuer’s actual outcome. The forecast is a band, not a point. Engine output describes cohort-conditioned expectations under analyst priors, not realized future TEB.
  • Not a substitute for issuer due diligence. Backers should still read the disclosure pack. The engine’s job is to anchor pricing, not to vouch for the issuer.
  • Not calibrated against realized data yet. Cohort baselines are analyst priors anchored to public data sources. First refit fires once cohort accrues ≥ 50 issuer-quarters of observed TEB.
  • Not infallible at the cohort boundary. Some issuers don’t cleanly fit any cohort; v1.1 expands the registry, but v1.0 fallbacks may produce wider bands than ideal.
  • Not the price. The auction sets the clearing price; the engine sets the reserve. Market signal can disagree with engine-mid by up to the κ-tier ceiling — and that disagreement is itself information.

Read the full design

The complete design document — including the 30-cohort roadmap, full F-table derivation, 10-adversarial-stress-test analysis, and quarterly-refit operational playbook — lives in PreFlop/wiki/Forecast-Engine-Design.md. v2.1.0 is the first published implementation; v1.1 expands the cohort registry and activates Heckman selection correction.
