GEP 9 — Runtime Type Checking and the User/Canonical Type Split#

Author: Hans-Martin von Gaudecker

Status: Accepted

Type: Standards Track

Created: 2026-05-23

Resolution: Accepted

Abstract#

  • GETTSIM today has limited runtime type checking. Mismatched user inputs surface as cryptic TypeErrors from deep inside the DAG — or worse, as silent numerical bugs (TTSIM #97).

  • This GEP adopts beartype as a runtime type checker that automatically verifies every annotated function in ttsim, gettsim, and gettsim-personas against its declared signature. Users get curated errors at the boundary they wrote, not at an internal helper six frames deep.

  • GETTSIM/TTSIM become explicit about the types they expect and about how user inputs (a wide vocabulary, e.g. pd.Series or Python scalars) are canonicalised into a single internal vocabulary (numpy / jax arrays).

  • The convention “every @policy_function carries full type annotations” — already universally followed in gettsim — is promoted to a decoration-time check, so any future omission is caught at the function’s definition site with a clear PolicyFunctionDefinitionError.

Terminology#

  • claw — beartype’s import-hook mechanism: a single beartype_package() call installs an AST rewriter that automatically applies @beartype to every annotated function in the package. No per-file decorator, no opt-in list.

  • runtime check — a guard that runs at function call time and validates the argument values against the function’s declared type annotations.

  • boundary — a function the user calls directly: main(), InputData.df_and_mapper, the @policy_function decorator, and similar entry points.

Motivation and Scope#

The TTSIM DAG accepts a wide range of objects as inputs — pandas Series, numpy arrays, Python scalars, JAX arrays — and converts them internally into a narrower, performance-oriented representation: jaxtyping-shaped JAX or numpy arrays for columns and Python or numpy scalars for parameters. Today this distinction is implicit. typing.py exposes a single Array-based vocabulary, the canonical internal types are not named separately from their user-friendly supersets, and no runtime check guarantees compliance. Four problems follow:

  1. Silent type drift. A policy function annotated int that is invoked with a jax.Array runs silently today; the mismatch only surfaces if a downstream JAX-specific operation fails on the unexpected dtype, often far from the offending input. Annotations are documentation, not specification.

  2. Scattered canonicalisation. Every code path that takes user input re-implements the cast from a pd.Series, Python scalar, or numpy scalar to a canonical numpy / JAX form, and nothing enforces that these conversions agree.

  3. Indistinguishable bug classes. When a TT DAG raises TypeError, the user cannot tell whether they passed bad data, mis-declared a policy function, or hit an internal TTSIM bug. There is no exception vocabulary that maps to architectural layers.

  4. Past silent bugs. Missing or imprecise type checks have caused real, hard-to-diagnose bugs (e.g. TTSIM #97).

These cost real time during model development and during workshop teaching.

Scope. The GEP covers ttsim, gettsim, and gettsim-personas. The @policy_function dual-mode contract (scalar default vs. column-direct via vectorization_strategy="not_required") is touched here only insofar as the claw makes it enforceable; the full contract is specified in a separate update to GEP 4.

Usage and Impact#

Users see the same API, with sharper errors#

A mistyped input still raises a TTSIMError subclass, but now with the beartype violation message attached. Calling main() with policy_date_str set to a datetime.date instead of a string raises EntryPointError at the boundary, not AttributeError six frames deep. Passing a pandas Series with object dtype where a FloatColumn is expected raises InputDataError. Writing

@policy_function(start_date="2025-01-01")
def betrag_m(anzahl: int, satz: float) -> float:
    return anzahl * satz

and accidentally passing vectorization_strategy="not_required" to the decorator raises PolicyFunctionDefinitionError at decoration time, because the scalar annotations are incompatible with column-direct execution.

Wider boundary types, narrower internal types#

  • A broad set of input types is accepted at the boundary. Their collective name is UserX, where X is FloatColumn, IntColumn, BoolColumn, etc.

  • Each UserX is converted by an explicit _canonicalize_* function into a single internal representation (e.g., a pd.Series of unsigned ints becomes a numpy int64 array). The internal type collection is named X.

  • This formalises how types are reasoned about inside TTSIM and pins the conversion to one named function per boundary. The public API does not change in shape: users still pass the wide forms.

Same runtime, more discoverable failures#

The claw adds an O(n) container check on entry to every clawed function, but TTSIM’s entry points are called rarely (per-run, not per-row), so the cost is invisible at the boundary.

Backward Compatibility#

Existing user code keeps working unchanged in shape. Three narrowed claims:

  • With the claw on by default for everyone (Option A; see Discussion), a user reform or pipeline that previously relied on an implicit type coercion that beartype rejects now raises at the boundary where the mismatched value enters, instead of running silently. This did not occur in our own test suite during the default-on trial. A user who needs the pre-GEP behaviour while triaging can opt out with GETTSIM_BEARTYPE_CLAW=0 (or the TTSIM_BEARTYPE_CLAW analogue).

  • Code that caught TypeError or ValueError from inside TTSIM should broaden to TTSIMError (or the relevant subclass). Code that catches Exception is unaffected. Two pre-existing exception types are hoisted into the hierarchy without changing their definition site: ConflictingActivePeriodsError and TranslateToVectorizableError. Both keep their original import path.

  • Every @*_function decorator now requires a type annotation on every parameter and on the return value. This is the convention every existing gettsim policy function already follows. Pre-GEP, missing annotations were tolerated unevenly: @policy_function silently fell back to a wide default union via dags, while @policy_input raised a KeyError at decoration time. Post-GEP, all five @*_function decorators raise their matching *DefinitionError at decoration time, so the convention is enforced uniformly.

Internal code that passes wide types into narrow-typed functions surfaces as BeartypeCallHintViolation from the package-wide claw. These are pre-existing TTSIM bugs to fix at the call site, not user-facing changes.

Detailed Description#

The type vocabulary#

ttsim.typing exposes three layers:

# Narrow canonical column aliases — what flows on the TT DAG.
FloatColumn: TypeAlias = Float[Array | np.ndarray, " n_obs"]
IntColumn: TypeAlias = Int[Array | np.ndarray, " n_obs"]
BoolColumn: TypeAlias = Bool[Array | np.ndarray, " n_obs"]

# Narrow canonical scalar aliases — what flows out of param processing.
ScalarFloat: TypeAlias = float | np.floating
ScalarInt: TypeAlias = int | np.integer
ScalarBool: TypeAlias = bool | np.bool_

# Wide user-boundary aliases — what `main()` and friends accept.
UserFloatColumn: TypeAlias = FloatColumn | pd.Series
UserIntColumn: TypeAlias = IntColumn | pd.Series
UserBoolColumn: TypeAlias = BoolColumn | pd.Series
UserScalarFloat: TypeAlias = float | int | np.floating | np.integer
UserScalarInt: TypeAlias = int | np.integer
UserScalarBool: TypeAlias = bool | np.bool_

The column aliases use the Array | np.ndarray union so the same vocabulary covers both backends. This is the single source of truth for column shapes; callers do not branch on the backend.

The aliases live at module top level, not under if TYPE_CHECKING. beartype needs them at runtime to resolve the annotations it checks.

The wide forms are restricted to the user boundary. Inside TTSIM, the narrow forms are the rule. Conversions are funnelled through explicit _canonicalize_* helpers (one per boundary), each typed UserX → X. Outside these helpers, no code converts pandas Series to JAX arrays or numerically promotes Python scalars to numpy scalars on the fly.

The exception hierarchy#

ttsim.exceptions defines a single root and one subclass per architectural boundary:

class TTSIMError(Exception): ...


class EntryPointError(TTSIMError): ...


class InputDataError(TTSIMError): ...


class TTTargetsError(TTSIMError): ...


class PolicyFunctionDefinitionError(TTSIMError): ...


class PolicyInputDefinitionError(TTSIMError): ...


class ParamFunctionDefinitionError(TTSIMError): ...


class AggregationDefinitionError(TTSIMError): ...


class GroupCreationDefinitionError(TTSIMError): ...


class RoundingSpecError(TTSIMError): ...

gettsim reuses the hierarchy without adding a GETTSIMError of its own. gettsim-personas adds one class, PersonaDefinitionError(TTSIMError), for persona-construction validation.
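A short sketch of how calling code might use such a hierarchy to tell bug classes apart. The three classes are redefined locally so the example runs on its own; `run` and `classify` are hypothetical helpers, not part of TTSIM.

```python
class TTSIMError(Exception): ...


class EntryPointError(TTSIMError): ...


class InputDataError(TTSIMError): ...


def run(kind: str) -> str:
    """Hypothetical stand-in for a call that fails at a given boundary."""
    if kind == "entry":
        raise EntryPointError("policy_date_str must be a string")
    if kind == "data":
        raise InputDataError("column has object dtype")
    return "ok"


def classify(kind: str) -> str:
    try:
        return run(kind)
    except InputDataError:
        # Most specific handler first: the user passed bad data.
        return "fix the input data"
    except TTSIMError as exc:
        # Catch-all for every other documented boundary.
        return f"other TTSIM problem: {type(exc).__name__}"
```

Catching the root class keeps working when new subclasses are added, which is the reason to broaden existing `except TypeError` handlers to `TTSIMError`.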

Per-component beartype configurations#

ttsim._beartype_conf builds one BeartypeConf per exception class. The violation_param_type argument is the beartype hook that maps type-check failures to the documented project exception:

from beartype import BeartypeConf, BeartypeStrategy

from ttsim.exceptions import (
    AggregationDefinitionError,
    EntryPointError,
    GroupCreationDefinitionError,
    InputDataError,
    ParamFunctionDefinitionError,
    PolicyFunctionDefinitionError,
    PolicyInputDefinitionError,
    RoundingSpecError,
    TTSIMError,
    TTTargetsError,
)


def _conf(exc: type[TTSIMError]) -> BeartypeConf:
    return BeartypeConf(
        violation_param_type=exc,
        strategy=BeartypeStrategy.On,
        is_pep484_tower=True,
    )


ENTRY_POINT_CONF = _conf(EntryPointError)
INPUT_DATA_CONF = _conf(InputDataError)
TT_TARGETS_CONF = _conf(TTTargetsError)
POLICY_FUNCTION_CONF = _conf(PolicyFunctionDefinitionError)
POLICY_INPUT_CONF = _conf(PolicyInputDefinitionError)
PARAM_FUNCTION_CONF = _conf(ParamFunctionDefinitionError)
AGGREGATION_CONF = _conf(AggregationDefinitionError)
GROUP_CREATION_CONF = _conf(GroupCreationDefinitionError)
ROUNDING_SPEC_CONF = _conf(RoundingSpecError)

INTERNAL_CONF = BeartypeConf(
    strategy=BeartypeStrategy.On,
    is_pep484_tower=True,
)

The On strategy validates every entry of every container so a bad row inside a dict-of-columns is reported rather than sampled past. is_pep484_tower=True keeps the PEP 484 numeric tower active so that an int argument satisfies a float parameter — the same implicit promotion that Python and ruff’s PYI041 both assume.

INTERNAL_CONF is the default for the package-wide claw. Its violations surface as beartype’s own BeartypeCallHintViolation, marking them as internal bugs.

The package-wide claw#

Each package’s __init__.py registers the claw before any submodule loads:

# src/ttsim/__init__.py — top of file, before any ttsim.* import
from beartype.claw import beartype_package

from ttsim._beartype_conf import INTERNAL_CONF

beartype_package("ttsim", conf=INTERNAL_CONF)

# ...remaining imports

beartype_package installs an AST rewriter against the package’s import hook. Every subsequent import ttsim.* produces a module whose annotated callables wrap themselves in a beartype check on load. There is no per-file decorator, no opt-in list, and no way to forget a function. gettsim and gettsim-personas do the same with their own root packages and their own INTERNAL_CONF.

Explicit decorators at user boundaries#

The package claw catches every internal mistake. User-facing entry points and decorator factories stack an explicit @beartype(conf=<COMPONENT_CONF>) on top so violations there surface as the documented project exception, not as BeartypeCallHintViolation. The explicit decorator wins at its call site.

The user boundaries covered are:

  • ttsim.main() — ENTRY_POINT_CONF

  • InputData.df_and_mapper, InputData.tree, and any sibling factories — INPUT_DATA_CONF

  • TTTargets.tree, TTTargets.qnames, and siblings — TT_TARGETS_CONF

  • @policy_function (the decorator factory; checks meta-arguments such as start_date, end_date, vectorization_strategy) — POLICY_FUNCTION_CONF

  • @policy_input, @param_function, @agg_by_group_function, @agg_by_p_id_function, @group_creation_function — their matching confs

  • RoundingSpec dataclass — ROUNDING_SPEC_CONF

The five @*_function decorators (@policy_function, @param_function, @agg_by_p_id_function, @agg_by_group_function, @group_creation_function) additionally require the wrapped function to carry an annotation on every parameter and on the return — missing annotations raise PolicyFunctionDefinitionError (or the decorator’s matching analogue) at decoration time, so the stack trace points at the function’s definition site rather than at an internal DAG-build helper.
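A decoration-time annotation check of this kind could look roughly as follows. This is a sketch, not the TTSIM implementation: `require_full_annotations` is a hypothetical name, and the exception is a local stand-in for the class of the same name.

```python
import inspect
from collections.abc import Callable


class PolicyFunctionDefinitionError(Exception):
    """Local stand-in for the TTSIM exception of the same name (assumption)."""


def require_full_annotations(func: Callable) -> Callable:
    """Hypothetical check: every parameter and the return must be annotated."""
    sig = inspect.signature(func)
    missing = [
        name
        for name, param in sig.parameters.items()
        if param.annotation is inspect.Parameter.empty
    ]
    if sig.return_annotation is inspect.Signature.empty:
        missing.append("return")
    if missing:
        msg = f"{func.__qualname__} lacks annotations for: {', '.join(missing)}"
        raise PolicyFunctionDefinitionError(msg)
    return func


@require_full_annotations
def betrag_m(anzahl: int, satz: float) -> float:
    return anzahl * satz
```

Because the check runs inside the decorator, the traceback ends at the `def` statement of the offending function rather than at a later DAG-build step.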

The auto-vectorized-wrapper annotation problem#

Scalar policy functions are wrapped at DAG-build time by ttsim.tt.vectorization and ttsim.tt.rounding. The wrapper closes over the user function but is itself called on columns. If the wrapper inherits the user function’s scalar annotations via functools.wraps, the claw checks column inputs against scalar annotations and rejects every legitimate call.

The fix uses two layers. The inner runtime executor — numpy.vectorize, the AST-rewrite output, or the rounding callable — wraps the user function with functools.wraps(func, assigned=...) that omits __annotations__ and __annotate__ (the PEP 649 deferred alias). This layer carries no annotations and is never beartype-decorated; its only job is to apply the auto-vectorisation or rounding logic.

import functools

# Module-level so other wrappers can re-use it.
_WRAPPER_ASSIGNMENTS_NO_ANNOTATIONS = tuple(
    a for a in functools.WRAPPER_ASSIGNMENTS
    if a not in ("__annotations__", "__annotate__")
)


@functools.wraps(func, assigned=_WRAPPER_ASSIGNMENTS_NO_ANNOTATIONS)
def wrapper(...): ...
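A runnable variant of the inner layer shows the effect of the reduced assignment tuple. The factory name `strip_annotations` is hypothetical, and the plain forwarding body stands in for the vectorisation or rounding logic.

```python
import functools

_WRAPPER_ASSIGNMENTS_NO_ANNOTATIONS = tuple(
    a
    for a in functools.WRAPPER_ASSIGNMENTS
    if a not in ("__annotations__", "__annotate__")
)


def strip_annotations(func):
    """Hypothetical inner-layer factory: forwards calls, drops annotations."""

    @functools.wraps(func, assigned=_WRAPPER_ASSIGNMENTS_NO_ANNOTATIONS)
    def wrapper(*args, **kwargs):
        # Placeholder for the auto-vectorisation or rounding logic.
        return func(*args, **kwargs)

    return wrapper


def betrag_m(anzahl: int, satz: float) -> float:
    return anzahl * satz


inner = strip_annotations(betrag_m)
```

The wrapper keeps the name and docstring of the user function but has an empty `__annotations__`, so scalar hints are not copied onto a callable that receives columns.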

The outer layer is a real-parameter forwarder synthesised at DAG-build time. It mirrors the wrapped function’s parameter list verbatim (same names, same order) and forwards every argument positionally. Two channels of annotation live on it:

  • __signature__ carries the narrow per-kind column-type strings (FloatColumn, IntColumn, BoolColumn). dags’ annotation-consistency check reads these and distinguishes a producer typed IntColumn from a consumer expecting BoolColumn.

  • __annotations__ carries the wide numeric-or-scalar union — FloatColumn | IntColumn | BoolColumn | ScalarFloat | ScalarInt | ScalarBool | 0-d-array. beartype compiles its runtime check against this wider type so the boundary catches structural misuse (a string / mapping / None reaching a numeric node) without enforcing exact array dtype.

The forwarder is defined with its __module__ pointed at ttsim.typing, so beartype resolves the column-type strings against the module where the aliases live (rather than the user-function module where they are not importable). It is then decorated with @beartype(conf=INTERNAL_CONF). The result is what the DAG sees and consumers call: beartype catches structural misuse at this boundary, and dags sees concrete column types for the consistency check.

Both vectorize_function and RoundingSpec.apply_rounding build this outer forwarder through a shared helper (build_beartype_checkable_wrapper) so the synthesis pattern stays single-source.

Forward references, from __future__ import annotations, and recursive aliases#

from __future__ import annotations turns every annotation into a string, which breaks the claw’s runtime resolution for any name that exists only under TYPE_CHECKING. Python 3.14’s PEP 649 deferred evaluation makes the pragma unnecessary, but at the time of writing we still support 3.11–3.13, so the pragma stays. The trade is local: only the specific names beartype must resolve at decoration time are lifted out of TYPE_CHECKING blocks and into runtime scope. These are the column aliases, scalar aliases, User* aliases, DashedISOString, Callable, Any, ModuleType, datetime, and the few NestedX families that decorated boundaries reference directly. Everything else stays in TYPE_CHECKING to avoid import-cycle costs. A future bump to requires-python = ">=3.14" will let the pragma go and the hoists with it.

Two annotation shapes still cannot be resolved at runtime even after hoisting:

  1. Recursive aliases. NestedData = Mapping[str, "FloatColumn | ... | NestedData"] and its siblings (NestedTargetDict, NestedLookupDict, NestedStrings, PolicyEnvironment, FlatPolicyEnvironment) contain stringified inner references that beartype’s runtime forward-ref resolver cannot evaluate. The two-definition pattern resolves them:

    if TYPE_CHECKING:
        NestedData = Mapping[str, "FloatColumn | IntColumn | BoolColumn | NestedData"]
    else:
        NestedData = Mapping[str, object]
    

    ty and IDE tooling see the narrow recursive form; beartype sees a coarse runtime form that always accepts the shape. The runtime check on these specific aliases degrades to “is a mapping with string keys” — weaker than the static type but consistent with the wider claw’s intent to surface structural rather than per-leaf violations on nested trees.

  2. PEP 612 ParamSpec. def __call__(self, *args: P.args, **kwargs: P.kwargs) -> R: is unresolvable when stringified annotations are combined with the claw. The affected methods (InterfaceFunction.__call__, ColumnFunction.__call__, and ParamFunction.__call__) are decorated @no_type_check until the migration to PEP 695 generic syntax (which allows the typing machinery to live without the from __future__ pragma).

Limitations#

  • Callable-instance binding under normal import. @policy_function and similar decorators return a callable dataclass instance (PolicyFunction) that wraps the raw function. The claw rebinds module-level names that point at such callable instances: under normal import, the module-level name becomes a bound method of __call__, so isinstance(x, PolicyFunction) fails. The standard policy-module loader (orig_policy_objects.py → importlib.util.spec_from_file_location) bypasses the claw, so policy modules loaded via main(orig_policy_objects=OrigPolicyObjects(root=...)) are unaffected. The binding only bites users who write from my_pkg.policy_module import my_fn and then perform isinstance checks on the imported name. See beartype.md (“The claw binds decorator-produced callable instances”) for the workaround (use the claw-free loader, or read .function off the bound method).

Implementation#

The pattern — package-wide beartype claw, TTSIMError exception hierarchy, wide UserX types at user boundaries narrowing to canonical X types internally — is implemented across ttsim, gettsim, gettsim-personas, and pylcm. Each package’s __init__.py calls beartype_package(...) behind an env-var gate (TTSIM_BEARTYPE_CLAW and GETTSIM_BEARTYPE_CLAW; gettsim-personas reuses GETTSIM_BEARTYPE_CLAW rather than a separate switch). Following acceptance of this GEP (Option A), the gate defaults to on: every import installs the claw, so a user writing a reform, a custom @policy_function, or a pipeline on top of GETTSIM gets the runtime check without doing anything. The env var stays in place as an opt-out (GETTSIM_BEARTYPE_CLAW=0 disables the claw for one process or environment) so anyone who hits a false positive — or who wants the pre-GEP behaviour for their own code — can unblock themselves while the rejection is triaged. Every pixi test environment continues to set the env var explicitly, so CI exercises the claw on every run regardless of the default. See the Discussion section for the resolution.

.ai-instructions/modules/beartype.md documents the conventions contributors follow: when to use UserX vs X, how to add a new boundary decorator, the wrapper-annotation rule, the claw-and-callable-instance gotcha, and the diagnostic workflow when a beartype violation surfaces. The module is included in the tier-a profile by default so every agent picks it up.

Alternatives#

Module-level @beartype decorators instead of a package claw#

Decorating each module’s functions individually keeps the registration explicit but leaves it possible to forget. The package claw makes coverage a property of import, not of discipline. pylcm tried the per-module approach first and migrated to the claw.

A single TTSIMError with code= attribute#

A flat exception with a discriminator is shorter to write but harder to catch selectively, harder to grep for, and harder to document per call site. The named hierarchy maps one-to-one onto user-facing decorators and is the convention pylcm chose.

Keep scalar annotations on auto-vectorized wrappers, suppress the claw on them#

Possible via a per-function opt-out (@beartype(conf=BeartypeConf(...)) with claw_skip_mandatory_conf=True). Rejected because the wrappers are the exact site at which precise column-typed annotations can be synthesised at DAG-build time; an opt-out would have left the boundary unchecked even when the synthesis is mechanical. The layered inner-strip / outer-synthesised-forwarder pattern keeps the boundary beartype-checkable.

Validate vectorization_strategy consistency at TT-DAG-build time#

Possible, but later in the lifecycle than at @policy_function decoration time. Validation at decoration gives the user a stack trace pointing at their function definition, not at an internal DAG-build helper. The full contract specification lives in the GEP 4 update.

Custom per-function type checks instead of runtime checking#

The pre-GEP approach — _fail_if_* helpers in fail_if.py per validated input — remains valid but does not scale. Adding a check requires editing a separate file and remembering to wire it in. The claw eliminates the wiring step; the cost is exactly one mandatory annotation per parameter, which the codebase already carries.

Leave annotations on user-written functions optional#

Possible: _fail_if_missing_annotations could downgrade to a warning, and beartype would silently skip un-annotated parameters. Rejected because the goal — “every function in the package is checked” — degrades into “some functions are checked, some are not”, with no in-band signal to the reader that the missing check is deliberate vs. accidental. Since every existing gettsim policy function already carries full annotations, the strict policy enforces an existing convention, not a new requirement.

Discussion#

Resolved 2026-06-02 in favour of Option A — runtime checking on by default for everyone — with an explicit env-var opt-out retained.

A genuinely open question at the point of asking for feedback on this GEP was the default for user-written code: reforms, custom @policy_functions, and microsim pipelines built on top of GETTSIM. Our own packages (ttsim, gettsim, and gettsim-personas) are already checked on every commit because the claw is on in every test environment; the decision was what import gettsim should do in a user’s own process.

  • Option A — on by default for everyone (accepted). Dropping the opt-in means every import gettsim / import ttsim / import gettsim_personas installs the claw automatically, so a type error is surfaced loudly at the boundary the user wrote, without them doing anything. This matters most for AI-assisted development: agents writing reforms see type violations immediately, at the boundary, with a message they can iterate on — rather than producing code that runs silently with the wrong types and surfaces the bug far from its cause. In a world where most new GETTSIM code is written with AI assistance, the users least likely to set an opt-in flag are exactly the ones whose code most needs the check.

  • Option B — opt-in for user code (rejected). Keeping the switch off by default removes all risk of a user pipeline breaking, but leaves enforcement conditional on the user remembering to set an env var. “Your inputs are checked unless someone forgot to set an env var” is not the guarantee the GEP promises, and it withholds the safety net from precisely the AI-assisted workflows that benefit most.

  • Opt-out caveat (Christian Pugnaghi Zimpelmann). “Remove the switch” was narrowed to “flip the default and keep the switch as a fallback”: the env var stays as an opt-out, so GETTSIM_BEARTYPE_CLAW=0 (and the TTSIM_BEARTYPE_CLAW analogue) disables the claw for one process or environment. Anyone who hits a false positive — or who wants the pre-GEP behaviour for their own code — can unblock themselves while the rejection is triaged.

References and Footnotes#