Type hints overhaul #352

Open
OutSquareCapital wants to merge 18 commits into duckdb:v1.5-variegata from OutSquareCapital:expr-typing

OutSquareCapital commented Feb 27, 2026

This PR provides numerous improvements and one bugfix regarding type hints.

This is my follow-up to our discussion here @evertlammerts:
#341 (comment)

All changes are only in the type stubs, which means that there's no impact whatsoever on any runtime logic.

Changes

  1. New _expression.pyi file that separates out the Expression class and allows circular imports and references. It also slims down the __init__ file a bit, which is nice.
  2. Two new Protocols for numpy arrays and types. They allow type checking those without emitting errors if the user doesn't have the library installed. Array is useful for Expression conversions, Dtype for DuckDBPyType conversions.
  3. Refactored and expanded the _ExpressionLike type alias. Renamed it to IntoExpr, and added various new type aliases covering as many situations as possible for Expression conversions.
  4. Added a few Literals to cover the ids and str conversions to DuckDBPyType, providing nice autocompletion for arguments and a nice interaction with pattern matching when checking the id value.
    Also provides convenient JSON and BIGNUM instantiation as an added bonus (ATM they are absent from the sqltypes constants).
  5. Added various new type aliases to cover all paths for DuckDBPyType conversions: as Python/numpy static type hints, as dict instances, or as Literal | str. This significantly improves the type hints for datatype arguments, which very often only accepted str or DuckDBTypes in the signatures.
  6. Added various new Literals for the option arguments of file methods/functions.
  7. Centralized type aliases, Literals, and Protocols in a _typing.pyi file, to avoid bloating the __init__.
  8. Added a new CppEnum class to reduce code duplication for enum-like classes, and centralized them in a new _enum.pyi file.
  9. Fixed the StatementType class, which had incorrect values (the member names have no _STATEMENT suffix).

Notes

  • I tried to document this as best I could, with docstrings for users and "private" comments.
    I left a few observations, but one thing I would add is that the types accepted at runtime are clearly all over the place (sometimes Mapping is OK, sometimes only dict is OK, etc.).
    As I said in Typing stubs are too strict about arguments of type Expression #341, prioritizing collections.abc as much as possible would be the best way to go in the future.

  • Centralizing the type aliases and using them as much as possible makes sense IMO, especially with an API that has repeated signatures (connection methods vs. module-level functions, for example).

  • The next step would be to move the type definitions into a concrete .py file, allowing users to import them if they want to annotate custom functions or do runtime type introspection.
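
As a rough sketch of the two ideas in these notes, assuming nothing about the real signatures: the `JoinHow` values and the helper names `column_names` and `is_valid_how` are hypothetical. The first helper shows why annotating with `collections.abc.Mapping` is more permissive than `dict` when a function only reads its argument. The second shows the kind of runtime introspection that becomes possible once an alias lives in a real .py module rather than only in a stub.

```python
from collections.abc import Mapping
from types import MappingProxyType
from typing import Literal, get_args

# Hypothetical alias with an illustrative subset of values (assumption).
JoinHow = Literal["inner", "left", "right", "outer"]


def column_names(schema: Mapping[str, str]) -> list[str]:
    """Only reads from `schema`, so any Mapping is acceptable, not just dict."""
    return list(schema)


def is_valid_how(value: str) -> bool:
    """Runtime introspection: the Literal's members are available via get_args."""
    return value in get_args(JoinHow)


# A read-only mapping is a Mapping but not a dict; a `dict`-typed
# parameter would make a type checker reject this call.
frozen = MappingProxyType({"id": "INTEGER", "name": "VARCHAR"})
print(column_names(frozen))
print(is_valid_how("left"), is_valid_how("sideways"))
```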

…d allow circular imports between files.

- added nested dtypes, bytearray, and memoryview as literal, convertible Python types
- PythonLiteral is a recursive type, to allow dict of list, list of list, etc...
- _ExpressionLike -> IntoExpr
- Expression | str -> IntoExprColumn
…mpy ndarray without creating unknown type errors if the library isn't installed in the venv
- Using IntoExprColumn on StarExpression
- fixed lhs type for LambdaExpression, and value type for ConstantExpression
- fixed all places where it was too narrow. Most of the time str is accepted for sqltypes; the odd exception seems to be the map method on Relation
- using Self for annotations on arguments when pertinent
- reorganized expressions/values conversions types, improved their doc
- added Literals for sqltypes ids and string conversion, and various type aliases, covering all paths.
- using the aforementioned literals in _sqltypes signatures
- added various new literals for files arguments
- moved join "how" literal in _typing for centralization
- renamed IntoNestedDType -> IntoFields
- added all new literals and type aliases in the main init file
- Builtins Literal had incorrect values for time/timestamp with time zone
- typo fixes
- renamed `DType` for Literals to `PyType` to keep the naming conventions consistent
- Fixed StatementType members, they had incorrect values. The "_STATEMENT" part was only on the C++ side, not on the Python side
- Moved all enums in __init__ file in a new _enums.pyi file, to avoid bloating the init file
- Created a new CppEnum Protocol, and used it as a base class for all public enums to reduce duplication.
- Created Literal types and used them as arguments in conjunction with the corresponding enum whenever pertinent
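
The recursive PythonLiteral type mentioned in the commit notes above could look roughly like the following. This is a simplified sketch under assumptions: the alias name `PyLiteral`, its member types, and the `depth` helper are hypothetical and do not reproduce the PR's definition. It shows how a self-referencing alias lets a checker accept a list of lists, a dict of lists, and so on.

```python
from __future__ import annotations

from typing import Union

# Hypothetical recursive alias: the quoted name refers back to the alias itself.
PyLiteral = Union[
    int,
    float,
    str,
    bool,
    None,
    list["PyLiteral"],
    dict[str, "PyLiteral"],
]


def depth(value: PyLiteral) -> int:
    """Return the nesting depth: 0 for scalars, 1 + deepest child for containers."""
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    return 0


print(depth(42))
print(depth({"a": [1, [2]]}))
```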