[Bug] Plan IcebergCreateTable operator In CTAS queries to avoid double table create requests#974

Open
Tmonster wants to merge 8 commits into duckdb:v1.5-variegata from Tmonster:CTAS_union_with_reference

Member

@Tmonster Tmonster commented May 6, 2026

This supersedes #962

Resolves: #595
related: dbt-labs/dbt-fusion#1635
related: duckdb/dbt-duckdb#725

#962 registers a client context on attach, but that approach breaks down when many clients are involved, since only one client runs the attach.

The solution in this PR is to introduce an intermediate Create Iceberg Table operator between the Copy to File and the Select that is responsible only for creating the table on the first chunk. This way the table is not created during binding/planning.

This also fixes an issue where EXPLAIN CREATE TABLE AS ... queries would actually create the table (assuming supports_stage_create = false).
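
As a rough illustration of the idea (not the code in this PR), the sketch below shows a pass-through operator that forwards every chunk unchanged and issues a create request only when the first chunk arrives. All names here (`Chunk`, `CreateTableGlobalState`, `CreateOnFirstChunk`) are hypothetical stand-ins and do not use DuckDB's real operator API. Because the request is made in `Execute` rather than in the constructor, merely planning the operator, as EXPLAIN does, never touches the catalog.

```cpp
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a chunk of rows; this is NOT DuckDB's DataChunk.
using Chunk = std::vector<int>;

// State shared by every thread that executes the operator
// (plays the role of a global operator state in this sketch).
struct CreateTableGlobalState {
    std::mutex lock;       // serializes the one-time create request
    bool created = false;  // set once the create request has been issued
};

// Hypothetical pass-through operator: forwards chunks unchanged and
// issues the create-table request exactly once, on the first chunk.
class CreateOnFirstChunk {
public:
    // Constructing (planning) the operator only stores the callback;
    // nothing is sent to the catalog at this point.
    explicit CreateOnFirstChunk(std::function<void()> create_table)
        : create_table_(std::move(create_table)) {}

    Chunk Execute(CreateTableGlobalState &gstate, const Chunk &input) const {
        {
            std::lock_guard<std::mutex> guard(gstate.lock);
            if (!gstate.created) {
                create_table_();        // the single create request
                gstate.created = true;  // later chunks skip the request
            }
        }
        return input;  // pass the data through untouched
    }

private:
    std::function<void()> create_table_;
};
```

Holding the lock while the request is in flight means other threads wait until the table exists before passing their own chunks downstream, which is the property a downstream insert would presumably need.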

@Tmonster Tmonster changed the title [Bug] Plan intermediate operator responsible for making API call. [Bug] Plan IcebergCreateTable operator In CTAS queries to avoid double table create requests May 6, 2026

@dataders dataders left a comment

really appreciate you picking this up @Tmonster -- cheers!

I've left an LLM-proposed test for your consideration and a pedantic suggestion to remove what smells like a spurious ;

Tmonster and others added 3 commits May 7, 2026 09:55
CTAS with partition information and table properties

Co-authored-by: Anders <anders.swanson@dbtlabs.com>
…n so copy operator has partition information as well
OperatorResultType PhysicalIcebergCreateTable::Execute(ExecutionContext &context, DataChunk &input, DataChunk &chunk,
GlobalOperatorState &gstate_p, OperatorState &state) const {
auto &global_state = gstate_p.Cast<IcebergCreateTableGlobalState>();
// Create the table in the IRC, record the field ids (columns ids) and update the binding in the
Member

update the binding in the?


// create shared state to be used between IcebergTableCreate and IcebergInsert
auto create_state = make_shared_ptr<IcebergCTASCreateState>();
// create a pass through IcebergCTASCrecateStatement operator to make the
Member

IcebergCTASCrecateStatement ?

GlobalOperatorState &gstate_p, OperatorState &state) const {
auto &global_state = gstate_p.Cast<IcebergCreateTableGlobalState>();
// Create the table in the IRC, record the field ids (columns ids) and update the binding in the
MakeCreateTableRequest(context, global_state);
Member

Can't we do this only in FinalExecute? That should ensure it runs only once, I believe.

Development

Successfully merging this pull request may close these issues.

CREATE TABLE fails when Iceberg is accessed from Adbc

3 participants