Skip to content

feat: support JSON-native union inputs#3048

Merged
markphelps merged 3 commits into
mainfrom
union-inputs
Jun 5, 2026
Merged

feat: support JSON-native union inputs#3048
markphelps merged 3 commits into
mainfrom
union-inputs

Conversation

@markphelps

Copy link
Copy Markdown
Collaborator

What

Adds support for JSON-native union input types, e.g.:

def predict(self, value: str | float) -> str: ...
def predict(self, value: str | float | None = Input(default=None)) -> str: ...
def predict(self, nums: list[int] | list[float]) -> str: ...

Unions are restricted to JSON-native members (str, int, float, bool, dict/Any, list[T], None) so request validation happens at the HTTP edge against the OpenAPI schema. Unions involving Path, File, Secret, custom coders, and BaseModel are rejected at build/schema-generation time. Output unions remain unsupported.

How

  • pkg/schema — recursive InputType model + resolver; emits OpenAPI anyOf for union inputs. Multi-variant nullable unions stay required when no default is supplied; non-union optionals are unchanged.
  • pkg/predict — CLI -i parsing inspects anyOf members: numeric values parse as numbers for unions that accept one (schemaAcceptsNumber), and a failed numeric parse falls back to the string member for number-first unions like float | str (schemaAcceptsString).
  • python/cog/_adt — deterministic union normalisation with strict per-member compatibility (bool is not int/float, scalars don't pass through dict/Any, list unions validate each element); anyOf json_type emission.
  • docsarchitecture/02-schema.md and docs/python.md document supported members, anyOf, and nullable semantics.

Behavioral notes

  • No runtime primitive coercion. Input validation happens at the HTTP edge via the OpenAPI schema; the worker does not re-coerce primitives. So -i value=1 for str | float stays an int (consistent with how a plain float field already receives a Python int for -i num=10); only fractional values become floats.
  • Nullable means "may be omitted", not "accepts explicit null". nullable: true lets an omitted optional fall back to its default. An explicit JSON null is still validated against the field type and rejected at the HTTP edge — identical to every other optional field today.

Testing

  • Go unit tests: pkg/schema/..., pkg/predict/...
  • Python regression tests: python/tests/test_adt.py (union ordering, bool/int/float overlap, list-element validation, optional/required None handling, empty list, mixed scalar+list)
  • End-to-end txtar integration tests: union_input_http, union_input_cli (both member orderings), union_input_list_http

All passing locally; lint:go clean; ruff clean.

Add support for union input types such as `str | float` and
`str | float | None`. Unions are restricted to JSON-native members
(str, int, float, bool, dict/Any, list[T], None) so request
validation happens at the HTTP edge against the OpenAPI schema.
Unions involving Path, File, Secret, custom coders, and BaseModel
are rejected at build/schema-generation time, and output unions
remain unsupported.

- pkg/schema: add a recursive InputType model and resolver, emit
  OpenAPI anyOf for union inputs, and keep multi-variant nullable
  unions required when no default is supplied
- pkg/predict: parse numeric CLI values for unions that accept a
  number (schemaAcceptsNumber), and fall back to a string member
  when a numeric parse fails for number-first unions like
  `float | str` (schemaAcceptsString)
- python/cog/_adt: add deterministic union normalisation with strict
  per-member compatibility (bool not int/float, scalars not dict/Any,
  list unions validate elements) and anyOf json_type emission
- tests: Go unit tests, Python regression tests, and end-to-end
  txtar integration tests for HTTP, CLI, and list unions
- docs: document union inputs and nullable semantics
@ask-bonk

ask-bonk Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

I'm Bonk, and I've done a quick review of your PR.

Adds support for JSON-native union input types with recursive InputType modelling, OpenAPI anyOf emission, deterministic Python normalisation, and schema-directed CLI parsing.

Issues found (2 related bugs in CLI union parsing):

  1. int | float fractional values error instead of coercing to float — When the resolved schema type is integer (e.g. int | float where int is first in anyOf), a fractional CLI value like -i value=1.5 hits the integer branch, ParseInt fails, and since schemaAcceptsString is false the function returns an error. It should fall back to the float member.

  2. str | int non-integer numeric values incorrectly sent as float — When resolveSchemaType returns string (e.g. str | int), the switch falls through to schemaAcceptsNumber, which tries ParseFloat unconditionally. A value like 1.5 for str | int gets sent as float even though the union doesn't accept floats. It should fall back to string.

Both bugs stem from the schemaAcceptsNumber / integer branch not distinguishing between unions that accept any number vs. unions that only accept integers.

Suggested fixes:

Add a schemaAcceptsFloat helper (mirroring schemaAcceptsString / schemaAcceptsNumber):

func schemaAcceptsFloat(s *openapi3.Schema) bool {
	if s == nil {
		return false
	}
	if s.Type != nil && s.Type.Is("number") {
		return true
	}
	for _, ref := range s.AnyOf {
		if ref.Value != nil && schemaAcceptsFloat(ref.Value) {
			return true
		}
	}
	for _, ref := range s.AllOf {
		if ref.Value != nil && schemaAcceptsFloat(ref.Value) {
			return true
		}
	}
	return false
}

In the integer branch (around line 115), add a float fallback:

case propertySchema.Type.Is("integer"):
	value, err := strconv.ParseInt(val, 10, 32)
	if err != nil {
		// See the number case above: fall back to a string
		// member for unions such as `int | str`.
		if schemaAcceptsString(originalSchema) {
			break
		}
		// Also fall back to a float member for unions such as `int | float`.
		if schemaAcceptsFloat(originalSchema) {
			if value, err := strconv.ParseFloat(val, 32); err == nil {
				float := float32(value)
				input[key] = Input{Float: &float}
				continue
			}
		}
		return input, err
	}
	valueInt := int32(value)
	input[key] = Input{Int: &valueInt}
	continue

In the schemaAcceptsNumber branch (around line 127), gate ParseFloat:

case schemaAcceptsNumber(originalSchema):
	// Union input (anyOf) that includes a numeric member, e.g.
	// `str | float`. Parse numeric-looking values as numbers so
	// the runtime receives the intended type; otherwise fall
	// through to the string member below.
	if value, err := strconv.ParseInt(val, 10, 32); err == nil {
		valueInt := int32(value)
		input[key] = Input{Int: &valueInt}
		continue
	}
	if schemaAcceptsFloat(originalSchema) {
		if value, err := strconv.ParseFloat(val, 32); err == nil {
			float := float32(value)
			input[key] = Input{Float: &float}
			continue
		}
	}

Please also add unit tests for int | float and str | int unions in pkg/predict/input_test.go.

@ask-bonk

ask-bonk Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Posted a review comment to PR #3048 identifying 2 related CLI parsing bugs for union inputs:

  1. int | float with fractional values errors instead of falling back to the float member
  2. str | int with non-integer numeric values incorrectly parses as float instead of falling back to string

Both stem from schemaAcceptsNumber not distinguishing integer-only unions from float-accepting unions. I suggested adding a schemaAcceptsFloat helper and gating ParseFloat behind it, plus adding unit tests for int | float and str | int unions.

github run

Address two related bugs in CLI `-i` parsing of union inputs:

- `int | float` resolves to the integer member first, so a fractional
  value like `1.5` failed ParseInt and errored instead of falling back
  to the float member.
- `str | int` resolves to the string member, then the schemaAcceptsNumber
  branch parsed `1.5` as a float even though the union only accepts an
  integer, sending an invalid float.

Add a schemaAcceptsFloat helper that matches number members but not
integer-only members, and gate float parsing behind it in both the
integer branch (with a float fallback) and the schemaAcceptsNumber
branch. Add unit tests for `int | float` and `str | int` unions.
@markphelps markphelps marked this pull request as ready for review June 4, 2026 22:03
@markphelps markphelps requested a review from a team as a code owner June 4, 2026 22:03
@ask-bonk

ask-bonk Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

@markphelps Bonk workflow was cancelled.

View workflow run · To retry, trigger Bonk again.

Comment thread architecture/02-schema.md Outdated
| `list[T]` | `{"type": "array", "items": {...}}` | |
| `Optional[T]` | Type T + not in `required` | Input fields only |
| `A \| B` / `Union[A, B]` | `{"anyOf": [A, B]}` | Input-only, JSON-native unions only |
| `A \| None` / `Optional[A]` | Type A + `nullable: true` | Input fields may still be required if no default is supplied |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it's always better to include the default = None if an input filed is optional right? even if we have Optional[A] defined, otherwise the schema might not pick them up as optional?
I don't recall what I did with optional fields previously but I always had default=something, guess it's not a problem with me but just asking

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, and you've actually hit on a real subtlety.

Short answer: Optional[T] / T | None does not need default=None. What it needs is some = Input(...) assignment. The Input(...) is what makes the field omittable; the None default is implicit for optionals.

BUT doing value: Optional[str] (bare, no Input) generates the openAPI schema correctly (not required, nullable: true), but it fails at runtime with no argument with TypeError: missing 1 required positional argument.. so that is actually a preexisting bug that we've been papering over with =Input() which we should fix!

But for now will update the docs so we are consistent everywhere and look into what it will take to actually fix or atleast make it more clear as to why it doesnt work as expected with a better error message or something

dennnnnnnnnis
dennnnnnnnnis previously approved these changes Jun 5, 2026
pfrohlich
pfrohlich previously approved these changes Jun 5, 2026
The previous table conflated plain single-type optionals with
multi-variant nullable unions. A plain Optional[T] / T | None is never
placed in required, while a multi-variant union like A | B | None stays
required unless a default is supplied. Split these into separate rows
and add a runtime caveat: an optional needs a Python-level default
(via = Input(...) or default=None) so an omitted value resolves to
None; a bare Optional[T] annotation raises TypeError when omitted.
@markphelps markphelps dismissed stale reviews from pfrohlich and dennnnnnnnnis via ad283fe June 5, 2026 12:55
@ask-bonk

ask-bonk Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

LGTM

github run

@markphelps markphelps merged commit 63c65df into main Jun 5, 2026
45 checks passed
@markphelps markphelps deleted the union-inputs branch June 5, 2026 13:26
markphelps added a commit that referenced this pull request Jun 5, 2026
The union-inputs feature (#3048) added input union support, but the
"Type limitations" section still claimed only Optional[T] was supported,
contradicting the new Union section. Scope the limitation correctly:
output unions remain unsupported, JSON-native input unions are
supported, and input unions of Path/File/Secret/custom-coder/BaseModel
members fail at build. Also add the missing Union table-of-contents
entry and regenerate llms.txt.
markphelps added a commit that referenced this pull request Jun 5, 2026
* test: add fuzzing for input type resolution and OpenAPI generation

Add fuzz coverage for the schema-gen codepaths exercised by union
inputs, which previously had no fuzzing and no validity oracle:

- FuzzResolveInputType: feeds arbitrary TypeAnnotation trees through
  ResolveInputType, then validates that any successfully resolved input
  type generates an OpenAPI document the build-time validator accepts.
- FuzzInputTypeJSONSchema: builds arbitrary InputType trees directly
  (reaching shapes the resolver never produces) and validates the
  generated OpenAPI document.

Both use an assertValidOpenAPI oracle (the same kin-openapi validator
as writeAndValidateSchema), so a union shape that resolves cleanly but
emits an invalid schema (e.g. an unsupported `type: null` branch) fails
the fuzzer rather than surfacing as a confusing user build error.

Make the test:fuzz task auto-discover every Fuzz* target via
`go test -list` instead of hardcoding names, so new fuzz tests are
picked up automatically. This also surfaced targets the hardcoded list
missed (pkg/config's three targets and two parser helpers): 11 total
vs the 4 previously run. Manage jq (used for discovery) as a mise tool
and simplify the CI fuzz-go job to call the task.

* docs: correct union input type support in python.md

The union-inputs feature (#3048) added input union support, but the
"Type limitations" section still claimed only Optional[T] was supported,
contradicting the new Union section. Scope the limitation correctly:
output unions remain unsupported, JSON-native input unions are
supported, and input unions of Path/File/Secret/custom-coder/BaseModel
members fail at build. Also add the missing Union table-of-contents
entry and regenerate llms.txt.

* test: use testify require in assertValidOpenAPI

Replace raw t.Fatalf with require.NoError per the project testing
conventions (AGENTS.md).
@markphelps markphelps mentioned this pull request Jun 5, 2026
markphelps added a commit that referenced this pull request Jun 8, 2026
* origin/main:
  Bump version to 0.21.0-rc.3 (#3050)
  test: fuzz schema generation for union inputs + docs fixup (#3049)
  feat: support JSON-native union inputs (#3048)
  ci: pin mise version in release workflows (#3046)
  chore: regen lockfile
  Bump version to 0.21.0-rc.2 (#3045)
  fix: mount weights in cog serve like cog run does (#3044)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants