Merged
31 changes: 24 additions & 7 deletions README.md
@@ -140,6 +140,8 @@ mvn -q -Dexec.mainClass=lakehouse.AzureDemo test-compile exec:java

The `MAVEN_OPTS` flag is required for Apache Arrow on Java 17+.
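The flag in question is a JDK module-system opt-in; a minimal sketch, assuming the commonly documented Arrow `--add-opens` value (verify the exact value against your Arrow release):

```shell
# Arrow's memory module needs reflective access to java.nio internals on
# JDK 17+; without this the JVM denies the access and the demo fails at
# startup. The value below is the commonly documented one — confirm it
# against your Arrow version's docs.
export MAVEN_OPTS="--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED"
echo "$MAVEN_OPTS"
```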

For ADBC support details, see the [ADBC coverage matrix](docs/adbc-coverage.md).

### Run the live backend tests

The live backend pytest is opt-in because it queries the deployed Azure Container App and reads the `lakehouse-password` secret from Key Vault:
@@ -157,7 +159,7 @@ LAKEHOUSE_LIVE_BACKEND=1 LAKEHOUSE_LIVE_BACKEND_ADBC_BASIC=1 \
uv run pytest -q tests/test_live_azure_backend.py
```

The ADBC Basic check is marked `xfail` because that is the known client path currently failing against the deployed Container App. A result such as `1 passed, 1 xfailed` means the supported bearer smoke test passed and the tracked ADBC Basic issue reproduced as expected. If that changes to `1 passed, 1 xpassed`, the ADBC Basic path has started working and the `xfail` marker should be removed.
The ADBC Basic check is marked `xfail` because that is the known client path currently failing against the deployed Container App. A result with all bearer-path tests passing and one `xfailed` direct-Basic test means the supported ADBC live checks passed and the tracked ADBC Basic issue reproduced as expected. If that changes to `xpassed`, the ADBC Basic path has started working and the `xfail` marker should be removed.
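As a sketch of how that reporting works (an assumed test shape, not the repo's actual test file), a reproduced failure under an `xfail` marker reports as `xfailed` rather than failing the run:

```python
import pytest

# Hypothetical stand-in for the live direct-Basic check: because of the
# marker, a raised error reports as "xfailed"; an unexpected success
# reports as "xpassed", signalling the marker can be removed.
@pytest.mark.xfail(reason="known ADBC Basic-auth failure against the Container App")
def test_adbc_basic_auth_live():
    raise RuntimeError("simulated known failure")
```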

If you want one copy/paste block for the demo itself:

@@ -250,16 +252,26 @@ pip install adbc-driver-flightsql
Connect to your Azure deployment:

```python
import base64
import pyarrow.flight as fl
import adbc_driver_flightsql.dbapi as flight_sql
from adbc_driver_flightsql import DatabaseOptions

endpoint = "grpc+tls://ca-lakehouse-xxxxx.centralus.azurecontainerapps.io:443"
token = base64.b64encode(b"lakehouse:<your-password>").decode()

# Supported Azure path: use PyArrow for the Basic-token handshake, then
# pass the returned Bearer token to ADBC.
client = fl.connect(endpoint)
header_name, bearer_header = client.authenticate_basic_token("lakehouse", "<your-password>")
if isinstance(header_name, bytes):
    header_name = header_name.decode()
if isinstance(bearer_header, bytes):
    bearer_header = bearer_header.decode()
if header_name.lower() != "authorization" or not bearer_header.startswith("Bearer "):
    raise RuntimeError("Basic auth did not return a Bearer authorization header")

conn = flight_sql.connect(
    endpoint,
    db_kwargs={DatabaseOptions.AUTHORIZATION_HEADER.value: f"Basic {token}"},
    db_kwargs={DatabaseOptions.AUTHORIZATION_HEADER.value: bearer_header},
)

cursor = conn.cursor()
@@ -269,6 +281,8 @@ print(cursor.fetchall())

You should see the rows inserted by the JDBC demo, or an empty result if you have not run it yet.

The direct Basic-auth ADBC path is tracked separately as an opt-in `xfail` live test. Do not use it as the supported Azure ADBC example until that test is promoted.

## What Just Happened?

```text
@@ -499,15 +513,18 @@ docker run -p 31337:31337 -v ./data:/data lakehouse serve \

## Flight SQL Protocol Support

Lakehouse implements all standard Flight SQL RPCs:
Lakehouse implements the following Flight SQL operations:

| Category | Supported Operations |
| -------- | -------------------- |
| **Queries** | `CommandStatementQuery`, `CommandStatementUpdate`, `CommandStatementSubstraitPlan` |
| **Queries** | `CommandStatementQuery`, `CommandStatementUpdate` |
| **Prepared Statements** | `ActionCreatePreparedStatementRequest`, `ActionClosePreparedStatementRequest`, `CommandPreparedStatementQuery`, `CommandPreparedStatementUpdate` |
| **Catalog Metadata** | `CommandGetCatalogs`, `CommandGetDbSchemas`, `CommandGetTables`, `CommandGetTableTypes`, `CommandGetPrimaryKeys`, `CommandGetExportedKeys`, `CommandGetImportedKeys`, `CommandGetCrossReference` |
| **SQL Info** | `CommandGetSqlInfo`, `CommandGetXdbcTypeInfo` |
| **Transactions** | `ActionBeginTransactionRequest`, `ActionEndTransactionRequest`, `ActionBeginSavepointRequest`, `ActionEndSavepointRequest` |
| **Schemas** | Flight `GetSchema` for statement queries, prepared statements, and metadata commands |
| **Transactions** | `ActionBeginTransactionRequest`, `ActionEndTransactionRequest` |

Substrait and savepoint commands are intentionally not listed as supported. See the [ADBC coverage matrix](docs/adbc-coverage.md) for ADBC-specific support and limitations.

---

47 changes: 47 additions & 0 deletions docs/adbc-coverage.md
@@ -0,0 +1,47 @@
# ADBC Coverage Matrix

This matrix documents ADBC compatibility for this repository's Flight SQL server backed by DuckDB/DuckLake. It is scoped to server-visible behavior through the Python `adbc-driver-flightsql` client, not to implementing the ADBC C ABI inside this server.

Primary sources:
- [ADBC API Standard](https://arrow.apache.org/adbc/current/format/specification.html)
- [ADBC Flight SQL driver](https://arrow.apache.org/adbc/current/driver/flight_sql.html)
- [ADBC driver implementation status](https://arrow.apache.org/adbc/current/driver/status.html)
- [Canonical `adbc.h`](https://raw.githubusercontent.com/apache/arrow-adbc/main/c/include/arrow-adbc/adbc.h)
- [Arrow Flight SQL protocol](https://arrow.apache.org/docs/format/FlightSql.html)

Status values:
- `covered`: implemented and exercised by repo tests or live smoke tests.
- `partial`: implemented for the main path, with known limitations.
- `unsupported-by-server`: the Flight SQL server does not implement this feature.
- `not-supported-by-python-flightsql-driver`: the server may expose a Flight SQL feature, but the Python ADBC Flight SQL driver does not currently expose a matching ADBC path.
- `client-only/N/A`: ADBC driver manager, client ABI, or client option behavior that this server does not implement.
- `needs-verification`: not enough repo evidence to claim support.

| ADBC area | Status | Notes |
| --- | --- | --- |
| ADBC object and driver ABI: database, connection, statement lifecycle, driver loading, option getter/setter variants, `AdbcError` layout | `client-only/N/A` | These are ADBC driver/driver-manager responsibilities from the ADBC API and `adbc.h`, not Flight SQL server APIs. |
| Connection URI and basic client setup | `covered` | Python `adbc-driver-flightsql` connects to local and deployed Flight SQL endpoints. |
| Authorization header bearer flow | `covered` | Supported Azure path is PyArrow Basic-token bootstrap followed by ADBC Bearer auth. |
| Direct ADBC Basic auth against Azure Container App | `needs-verification` | Tracked as an opt-in `xfail` path; do not document as the supported Azure ADBC path. |
| TLS server authentication | `covered` | Covered by TLS e2e tests and live Azure smoke path. |
| mTLS client certificates, OAuth flows, cookies, timeout knobs, queue size, custom call headers | `client-only/N/A` | Mostly Flight SQL driver options. Only server-observable effects should be promoted to support claims after dedicated tests. |
| SQL query execution and Arrow result fetch | `covered` | Maps to Flight SQL statement query `GetFlightInfo`/`DoGet`. |
| SQL update, DDL/DML, and row counts | `covered` | Maps to Flight SQL update `DoPut` with update-result metadata. |
| Execute schema / Flight `GetSchema` | `covered` | Server implements Flight `GetSchema` for statement queries, prepared statements, and metadata commands, with direct unit and ADBC `execute_schema` acceptance coverage. |
| Prepared statement create/close | `covered` | Server supports Flight SQL prepared statement actions. |
| Prepared statement query/update with single-row parameter batches | `covered` | Supported by current handler behavior and tests. |
| Prepared statement multi-row parameter batches | `partial` | Current implementation should be treated as single-row oriented until multi-row parameter binding is tested or implemented. |
| Metadata: `GetInfo`, catalogs, schemas, tables, table types, table schema, XDBC type info | `covered` | Maps to Flight SQL metadata commands and local ADBC `GetObjects` depth/filter tests. |
| Metadata: primary keys, imported/exported keys, cross-reference | `covered` | Server has Flight SQL handlers and unit coverage. |
| ADBC `GetObjects` table constraints | `not-supported-by-python-flightsql-driver` | The Flight SQL driver docs note `AdbcConnectionGetObjects()` does not currently populate column constraint info such as primary/foreign keys. |
| ADBC metadata pattern/filter semantics | `partial` | Flight SQL driver docs note catalog filters are simple string matches, not `LIKE` patterns. |
| Transactions: begin/end, commit, rollback, autocommit discovery | `covered` | Current Flight SQL and SQL transaction `SqlInfo` IDs are emitted from the generated proto constants, and ADBC DBAPI rollback/autocommit discovery is covered. |
| Savepoints | `unsupported-by-server` | Server reports savepoints as unsupported. |
| Cancellation via ADBC and Flight SQL actions | `partial` | Server supports deprecated `CancelQuery` behavior returning cancellation state, but does not guarantee true DuckDB query interruption and does not claim modern `CancelFlightInfo` coverage. |
| Partitioned or incremental results | `partial` | ADBC `ExecutePartitions`/`ReadPartition` is covered for the server's single-endpoint result path. Multi-endpoint distributed optimization and incremental execution are not implemented. |
| Bulk ingest via Flight SQL `CommandStatementIngest` | `covered` | Server-side ingest handlers and tests cover create/append/replace/fail behavior. |
| Bulk ingest through Python `adbc-driver-flightsql` | `partial` | The current local `adbc-driver-flightsql` exposes `adbc_ingest` and the test passes; the test is marked `xfail` for drivers/environments that return `NotSupportedError`, since the upstream Flight SQL driver docs still state bulk ingestion is not implemented. |
| ADBC statistics APIs: `GetStatistics`, `GetStatisticNames` | `unsupported-by-server` | No server support is claimed for ADBC statistics metadata. |
| Substrait plans | `unsupported-by-server` | Substrait commands are intentionally not implemented and should not be listed as supported protocol surface. |
| Rich ADBC error details | `needs-verification` | Server maps errors to Flight/gRPC status paths, but ADBC driver-specific error-detail propagation has no dedicated coverage. |
| Flight SQL session options | `unsupported-by-server` | Session option get/set/erase lifecycle is not part of the current server support claim. |
120 changes: 94 additions & 26 deletions src/lakehouse/server.py
@@ -35,7 +35,7 @@
import pyarrow.flight as flight

from lakehouse.dispatch import FlightSqlServer
from lakehouse.proto import fs, pack_any
from lakehouse.proto import fs, pack_any, unpack_any
from lakehouse.session import SessionManager

logger = logging.getLogger(__name__)
@@ -324,22 +324,22 @@ def _prepare_get_tables_query(
    [
        pa.field("catalog_name", pa.utf8()),
        pa.field("db_schema_name", pa.utf8()),
        pa.field("table_name", pa.utf8()),
        pa.field("table_type", pa.utf8()),
        pa.field("table_name", pa.utf8(), nullable=False),
        pa.field("table_type", pa.utf8(), nullable=False),
    ]
)

_TABLES_SCHEMA_WITH_SCHEMA = pa.schema(
    [
        pa.field("catalog_name", pa.utf8()),
        pa.field("db_schema_name", pa.utf8()),
        pa.field("table_name", pa.utf8()),
        pa.field("table_type", pa.utf8()),
        pa.field("table_schema", pa.binary()),
        pa.field("table_name", pa.utf8(), nullable=False),
        pa.field("table_type", pa.utf8(), nullable=False),
        pa.field("table_schema", pa.binary(), nullable=False),
    ]
)

_TABLE_TYPES_SCHEMA = pa.schema([pa.field("table_type", pa.utf8())])
_TABLE_TYPES_SCHEMA = pa.schema([pa.field("table_type", pa.utf8(), nullable=False)])

_PRIMARY_KEYS_SCHEMA = pa.schema(
    [
@@ -726,24 +726,31 @@ def _build_xdbc_type_info_table(
    ]
)

# SqlInfo enum constants (matching Flight SQL protobuf values)
_FLIGHT_SQL_SERVER_NAME = 0
_FLIGHT_SQL_SERVER_VERSION = 1
_FLIGHT_SQL_SERVER_ARROW_VERSION = 2
_FLIGHT_SQL_SERVER_READ_ONLY = 500
_FLIGHT_SQL_SERVER_SQL = 501
_FLIGHT_SQL_SERVER_SUBSTRAIT = 502
_FLIGHT_SQL_SERVER_TRANSACTION = 504
_FLIGHT_SQL_SERVER_CANCEL = 505
_FLIGHT_SQL_SERVER_BULK_INGESTION = 507
_FLIGHT_SQL_SERVER_INGEST_TRANSACTIONS_SUPPORTED = 508
_FLIGHT_SQL_SERVER_STATEMENT_TIMEOUT = 100
_FLIGHT_SQL_SERVER_TRANSACTION_TIMEOUT = 101
_SQL_DDL_CATALOG = 500
_SQL_DDL_SCHEMA = 501
_SQL_DDL_TABLE = 502
_SQL_IDENTIFIER_QUOTE_CHAR = 503
_SQL_IDENTIFIER_CASE = 504
# SqlInfo enum constants (matching the generated Flight SQL protobuf values)
_FLIGHT_SQL_SERVER_NAME = fs.FLIGHT_SQL_SERVER_NAME
_FLIGHT_SQL_SERVER_VERSION = fs.FLIGHT_SQL_SERVER_VERSION
_FLIGHT_SQL_SERVER_ARROW_VERSION = fs.FLIGHT_SQL_SERVER_ARROW_VERSION
_FLIGHT_SQL_SERVER_READ_ONLY = fs.FLIGHT_SQL_SERVER_READ_ONLY
_FLIGHT_SQL_SERVER_SQL = fs.FLIGHT_SQL_SERVER_SQL
_FLIGHT_SQL_SERVER_SUBSTRAIT = fs.FLIGHT_SQL_SERVER_SUBSTRAIT
_FLIGHT_SQL_SERVER_TRANSACTION = fs.FLIGHT_SQL_SERVER_TRANSACTION
_FLIGHT_SQL_SERVER_CANCEL = fs.FLIGHT_SQL_SERVER_CANCEL
_FLIGHT_SQL_SERVER_BULK_INGESTION = fs.FLIGHT_SQL_SERVER_BULK_INGESTION
_FLIGHT_SQL_SERVER_INGEST_TRANSACTIONS_SUPPORTED = (
    fs.FLIGHT_SQL_SERVER_INGEST_TRANSACTIONS_SUPPORTED
)
_FLIGHT_SQL_SERVER_STATEMENT_TIMEOUT = fs.FLIGHT_SQL_SERVER_STATEMENT_TIMEOUT
_FLIGHT_SQL_SERVER_TRANSACTION_TIMEOUT = fs.FLIGHT_SQL_SERVER_TRANSACTION_TIMEOUT
_SQL_DDL_CATALOG = fs.SQL_DDL_CATALOG
_SQL_DDL_SCHEMA = fs.SQL_DDL_SCHEMA
_SQL_DDL_TABLE = fs.SQL_DDL_TABLE
_SQL_IDENTIFIER_CASE = fs.SQL_IDENTIFIER_CASE
_SQL_IDENTIFIER_QUOTE_CHAR = fs.SQL_IDENTIFIER_QUOTE_CHAR
_SQL_DEFAULT_TRANSACTION_ISOLATION = fs.SQL_DEFAULT_TRANSACTION_ISOLATION
_SQL_TRANSACTIONS_SUPPORTED = fs.SQL_TRANSACTIONS_SUPPORTED
_SQL_SUPPORTED_TRANSACTIONS_ISOLATION_LEVELS = fs.SQL_SUPPORTED_TRANSACTIONS_ISOLATION_LEVELS
_SQL_DATA_DEFINITION_CAUSES_TRANSACTION_COMMIT = fs.SQL_DATA_DEFINITION_CAUSES_TRANSACTION_COMMIT
_SQL_DATA_DEFINITIONS_IN_TRANSACTIONS_IGNORED = fs.SQL_DATA_DEFINITIONS_IN_TRANSACTIONS_IGNORED


def _build_sql_info_table(
@@ -764,12 +771,25 @@ def _build_sql_info_table(
        (_FLIGHT_SQL_SERVER_READ_ONLY, 1, False),
        (_FLIGHT_SQL_SERVER_SQL, 1, True),
        (_FLIGHT_SQL_SERVER_SUBSTRAIT, 1, False),
        (_FLIGHT_SQL_SERVER_TRANSACTION, 2, 1),  # TRANSACTION supported
        (
            _FLIGHT_SQL_SERVER_TRANSACTION,
            3,
            fs.SQL_SUPPORTED_TRANSACTION_TRANSACTION,
        ),
        (_FLIGHT_SQL_SERVER_CANCEL, 1, True),
        (_FLIGHT_SQL_SERVER_BULK_INGESTION, 1, True),
        (_FLIGHT_SQL_SERVER_INGEST_TRANSACTIONS_SUPPORTED, 1, True),
        (_FLIGHT_SQL_SERVER_STATEMENT_TIMEOUT, 3, 0),  # no timeout
        (_FLIGHT_SQL_SERVER_TRANSACTION_TIMEOUT, 3, 0),  # no timeout
        (_SQL_DEFAULT_TRANSACTION_ISOLATION, 3, fs.SQL_TRANSACTION_SERIALIZABLE),
        (_SQL_TRANSACTIONS_SUPPORTED, 1, True),
        (
            _SQL_SUPPORTED_TRANSACTIONS_ISOLATION_LEVELS,
            3,
            1 << fs.SQL_TRANSACTION_SERIALIZABLE,
        ),
        (_SQL_DATA_DEFINITION_CAUSES_TRANSACTION_COMMIT, 1, False),
        (_SQL_DATA_DEFINITIONS_IN_TRANSACTIONS_IGNORED, 1, False),
    ]
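As a rough sketch of the value encoding in these rows (the middle element appears to select the SqlInfo value variant — 1 for bool, 3 for int64 — an inference from the surrounding code, not a verified proto mapping), the supported-isolation-levels entry is a bitmask over the isolation-level enum:

```python
# Assumed enum ordinal, for illustration only — the real value comes from
# the generated Flight SQL proto constants (fs.SQL_TRANSACTION_SERIALIZABLE).
SQL_TRANSACTION_SERIALIZABLE = 4

# Supporting only SERIALIZABLE means setting exactly that bit.
levels_bitmask = 1 << SQL_TRANSACTION_SERIALIZABLE
print(levels_bitmask)  # → 16
```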

# Filter if specific IDs requested
@@ -876,6 +896,54 @@ def shutdown(self) -> None:
        self._db.close()
        super().shutdown()

    def get_schema(
        self,
        context: flight.ServerCallContext,
        descriptor: flight.FlightDescriptor,
    ) -> flight.SchemaResult:
        """Return the schema for supported Flight SQL descriptors.

        ``GetSchema`` must be side-effect-free. In particular, prepared
        statements with empty schemas can be DDL/DML; do not route through
        ``get_flight_info_prepared_statement`` because that method eagerly
        executes such statements for ADBC query execution compatibility.
        """
        command = unpack_any(descriptor.command)

        if isinstance(command, fs.CommandStatementQuery):
            conn = self._get_session(context)
            schema_query = f"SELECT * FROM ({command.query}) AS __schema_probe LIMIT 0"
            return flight.SchemaResult(_execute_query(conn, schema_query).schema)

        if isinstance(command, fs.CommandPreparedStatementQuery):
            session_id = _get_session_id(context)
            handle = command.prepared_statement_handle.decode("utf-8")
            meta = self._prepared_meta.get((session_id, handle))
            return flight.SchemaResult(meta.schema if meta is not None else pa.schema([]))

        if isinstance(command, fs.CommandGetCatalogs):
            return flight.SchemaResult(_CATALOGS_SCHEMA)
        if isinstance(command, fs.CommandGetDbSchemas):
            return flight.SchemaResult(_DB_SCHEMAS_SCHEMA)
        if isinstance(command, fs.CommandGetTables):
            schema = _TABLES_SCHEMA_WITH_SCHEMA if command.include_schema else _TABLES_SCHEMA
            return flight.SchemaResult(schema)
        if isinstance(command, fs.CommandGetTableTypes):
            return flight.SchemaResult(_TABLE_TYPES_SCHEMA)
        if isinstance(command, fs.CommandGetXdbcTypeInfo):
            return flight.SchemaResult(_XDBC_TYPE_INFO_SCHEMA)
        if isinstance(command, fs.CommandGetSqlInfo):
            return flight.SchemaResult(_SQL_INFO_SCHEMA)
        if isinstance(command, fs.CommandGetPrimaryKeys):
            return flight.SchemaResult(_PRIMARY_KEYS_SCHEMA)
        if isinstance(command, (fs.CommandGetImportedKeys, fs.CommandGetExportedKeys)):
            return flight.SchemaResult(_FK_KEYS_SCHEMA)
        if isinstance(command, fs.CommandGetCrossReference):
            return flight.SchemaResult(_FK_KEYS_SCHEMA)

        msg = f"Unsupported Flight SQL command for get_schema: {type(command).__name__}"
        raise NotImplementedError(msg)
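The `LIMIT 0` probe used for statement queries is a general trick: wrapping the statement forces the engine to plan it and report column metadata without materializing any rows. A minimal illustration with stdlib `sqlite3` in place of DuckDB, using a hypothetical table `t`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")

# Wrap the "user query" and LIMIT 0: the engine plans it, so column
# metadata (cursor.description) is populated, but no rows come back.
cur = conn.execute("SELECT * FROM (SELECT * FROM t) AS __schema_probe LIMIT 0")
print([d[0] for d in cur.description])  # → ['id', 'name']
print(cur.fetchall())  # → []
```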

Comment on lines +945 to +946

Copilot AI Apr 14, 2026

isinstance() cannot take a PEP 604 union (A | B) as the second argument; this will raise TypeError and break get_schema for imported/exported keys (and any command that reaches this branch). Use a tuple instead (e.g., isinstance(command, (fs.CommandGetImportedKeys, fs.CommandGetExportedKeys))).
# ═══════════════════════════════════════════════════════════════════════
# get_flight_info handlers
# ═══════════════════════════════════════════════════════════════════════
2 changes: 2 additions & 0 deletions tests/jdbc/pom.xml
@@ -54,6 +54,8 @@
</argLine>
<systemPropertyVariables>
<flight.url>${flight.url}</flight.url>
<flight.user>${flight.user}</flight.user>
<live.azure.jdbc.required>${live.azure.jdbc.required}</live.azure.jdbc.required>
<tls.ca.cert>${tls.ca.cert}</tls.ca.cert>
</systemPropertyVariables>
</configuration>