Add single-dispatch layer-by-layer multi-head attention #91

Draft
andrej wants to merge 11 commits into amd:devel from andrej:mha-lxl-sd

@andrej commented Apr 6, 2026

A "naive" alternative to the currently checked-in dataflow design for multi-head attention. This is a simple layer-by-layer implementation, but it uses the single-dispatch mechanism to fuse all layers into one MLIR file, which saves CPU round trips and XRT overhead.

Includes two variants:

  1. "core": Only does the core matmuls and softmax; assumes projected and repeated inputs Q, K, V. This matches the functionality of the checked-in dataflow MHA.
  2. "projected": Performs the Q, K, V projections, applies a RoPE positional embedding and repeats K and V matrices for grouped-query attention. Takes an embedding vector and RoPE angles as input.
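
For orientation, the computation covered by the "core" variant can be sketched as a small NumPy reference. This is only an illustration under stated assumptions, not the code in this PR: the function names, the `(num_heads, seq_len, head_dim)` layout, and the `1/sqrt(head_dim)` scaling are assumptions, and no masking is applied.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def core_attention(q, k, v):
    """Layer-by-layer attention on already projected and repeated inputs.

    q, k, v: arrays of shape (num_heads, seq_len, head_dim) (assumed layout).
    """
    head_dim = q.shape[-1]
    # Layer 1: attention scores, one (seq_len, seq_len) matmul per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    # Layer 2: row-wise softmax over the key axis.
    weights = softmax(scores, axis=-1)
    # Layer 3: weighted sum of the values.
    return weights @ v

# Hypothetical sizes, chosen only to exercise the sketch.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8, 16)).astype(np.float32) for _ in range(3))
out = core_attention(q, k, v)
```

Each of the three commented layers corresponds to one step that a layer-by-layer design would run in sequence; the single-dispatch mechanism described above concerns how those steps are launched, which this sketch does not model.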

Collaborator Author

Can we reuse the reference from the existing mha? (Note: does not include RoPE and Q, K, V projections, but some code reuse should be possible.)
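
One way such reuse might look: keep the projection, RoPE and K/V repetition as a separate pre-processing step whose output has the layout a core-attention reference expects. The sketch below is hypothetical and not taken from this PR or the existing MHA reference. The names `apply_rope` and `project_inputs`, the weight shapes, the `(seq_len, head_dim // 2)` angle layout, and the pairing of feature `i` with feature `i + head_dim // 2` are all assumptions; other RoPE conventions (for example interleaved pairs) exist.

```python
import numpy as np

def apply_rope(x, angles):
    """Rotate feature pairs of x by the given angles.

    x: (num_heads, seq_len, head_dim); angles: (seq_len, head_dim // 2).
    Assumed convention: feature i is paired with feature i + head_dim // 2.
    """
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    cos, sin = np.cos(angles), np.sin(angles)
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def project_inputs(x, w_q, w_k, w_v, angles, num_heads, num_kv_heads):
    """Produce Q, K, V of shape (num_heads, seq_len, head_dim).

    x: (seq_len, embed_dim); w_q: (embed_dim, num_heads * head_dim);
    w_k, w_v: (embed_dim, num_kv_heads * head_dim). All shapes are assumed.
    """
    seq_len = x.shape[0]
    head_dim = w_q.shape[1] // num_heads

    def split_heads(t, n):
        # (seq_len, n * head_dim) -> (n, seq_len, head_dim)
        return t.reshape(seq_len, n, head_dim).transpose(1, 0, 2)

    # Q and K receive the positional rotation; V does not.
    q = apply_rope(split_heads(x @ w_q, num_heads), angles)
    k = apply_rope(split_heads(x @ w_k, num_kv_heads), angles)
    v = split_heads(x @ w_v, num_kv_heads)

    # Grouped-query attention: each K/V head serves a group of query heads.
    group = num_heads // num_kv_heads
    return q, np.repeat(k, group, axis=0), np.repeat(v, group, axis=0)

# Hypothetical sizes, chosen only to exercise the sketch.
rng = np.random.default_rng(0)
seq_len, embed_dim, num_heads, num_kv_heads, head_dim = 8, 32, 4, 2, 8
x = rng.standard_normal((seq_len, embed_dim))
w_q = rng.standard_normal((embed_dim, num_heads * head_dim))
w_k = rng.standard_normal((embed_dim, num_kv_heads * head_dim))
w_v = rng.standard_normal((embed_dim, num_kv_heads * head_dim))
angles = rng.uniform(0, 2 * np.pi, (seq_len, head_dim // 2))
q, k, v = project_inputs(x, w_q, w_k, w_v, angles, num_heads, num_kv_heads)
```

With this split, only the pre-processing would need its own reference code, and the resulting `q`, `k`, `v` could be handed to whichever core reference already exists.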

github-actions bot commented Apr 7, 2026

📊 Test Results for Test Example Applications

1d87fe8 (2026_04_07_21_05_39)

IRONCLAD

Tested on 2026_04_07_21_05_39 at commit 1d87fe8.

| Test | Checks | TTFT (mean) | TPS (mean) |
| --- | --- | --- | --- |
| llama_3.2_1b_prompt_1024_tokens_1 | ✅ 5/5 | 2.13 | n/a |
| llama_3.2_1b_prompt_1024_tokens_40 | ✅ 5/5 | 2.18 | 4.31 |
| llama_3.2_1b_prompt_13_tokens_1 | ✅ 5/5 | 2.09 | n/a |
| llama_3.2_1b_prompt_13_tokens_40 | ✅ 5/5 | 2.09 | 4.31 |

📈 Trends (vs main branch) for Test Example Applications

1d87fe8 (2026_04_07_21_05_39)

IRONCLAD Trends

llama_3.2_1b

| Commit/Date | Num Tokens (max) | Num Tokens (mean) | Num Tokens (median) | Num Tokens (min) | Num Tokens (stddev) | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) | Total (max) | Total (mean) | Total (median) | Total (min) | Total (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 130b6ea — 2025-12-05 21:33:12 | 40.00 (+0.00%) | 40.00 (+0.00%) | 40.00 (+0.00%) | 40.00 (+0.00%) | 0.00 (n/a) | 4.71 (-0.42%) | 4.64 (-0.09%) | 4.64 (+0.65%) | 4.55 (-0.22%) | 0.05 (-17.66%) | 4.41 (-0.34%) | 4.39 (-0.19%) | 4.38 (-0.33%) | 4.37 (-0.15%) | 0.01 (-25.90%) | 12.96 (-0.00%) | 12.80 (+0.07%) | 12.80 (-0.23%) | 12.67 (+0.44%) | 0.09 (-21.12%) |
| 0a6c11c — 2025-12-03 23:35:15 | 40.00 (n/a) | 40.00 (n/a) | 40.00 (n/a) | 40.00 (n/a) | 0.00 (n/a) | 4.73 (n/a) | 4.64 (n/a) | 4.61 (n/a) | 4.56 (n/a) | 0.06 (n/a) | 4.42 (n/a) | 4.40 (n/a) | 4.40 (n/a) | 4.37 (n/a) | 0.02 (n/a) | 12.96 (n/a) | 12.79 (n/a) | 12.83 (n/a) | 12.62 (n/a) | 0.12 (n/a) |

llama_3.2_1b_prompt_1024_tokens_1

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- |
| 1d87fe8 — 2026-04-07 21:00:00 | 2.15 (+0.09%) | 2.13 (+0.08%) | 2.13 (-0.42%) | 2.12 (+0.62%) | 0.01 (-31.21%) |
| 912e6bc — 2026-04-07 19:08:43 | 2.15 (n/a) | 2.13 (n/a) | 2.13 (n/a) | 2.11 (n/a) | 0.02 (n/a) |

llama_3.2_1b_prompt_1024_tokens_40

| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1d87fe8 — 2026-04-07 21:00:00 | 4.33 (+2.90%) | 4.31 (+3.44%) | 4.31 (+3.58%) | 4.29 (+3.77%) | 0.01 (-46.93%) | 2.29 (+0.48%) | 2.18 (+0.83%) | 2.15 (+0.80%) | 2.13 (+0.61%) | 0.07 (-4.73%) |
| 912e6bc — 2026-04-07 19:08:43 | 4.21 (n/a) | 4.17 (n/a) | 4.16 (n/a) | 4.14 (n/a) | 0.03 (n/a) | 2.28 (n/a) | 2.16 (n/a) | 2.13 (n/a) | 2.12 (n/a) | 0.07 (n/a) |

llama_3.2_1b_prompt_13_tokens_1

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- |
| 1d87fe8 — 2026-04-07 21:00:00 | 2.10 (-0.10%) | 2.09 (+0.11%) | 2.09 (+0.19%) | 2.09 (+0.00%) | 0.01 (+8.87%) |
| 912e6bc — 2026-04-07 19:08:43 | 2.10 (n/a) | 2.09 (n/a) | 2.09 (n/a) | 2.09 (n/a) | 0.01 (n/a) |

llama_3.2_1b_prompt_13_tokens_40

| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1d87fe8 — 2026-04-07 21:00:00 | 4.36 (+4.23%) | 4.31 (+3.57%) | 4.30 (+3.44%) | 4.29 (+3.23%) | 0.03 (+128.30%) | 2.09 (-0.38%) | 2.09 (-0.04%) | 2.09 (+0.00%) | 2.08 (+0.44%) | 0.01 (-34.93%) |
| 912e6bc — 2026-04-07 19:08:43 | 4.18 (n/a) | 4.16 (n/a) | 4.16 (n/a) | 4.15 (n/a) | 0.01 (n/a) | 2.10 (n/a) | 2.09 (n/a) | 2.09 (n/a) | 2.07 (n/a) | 0.01 (n/a) |

llama_3.2_1b_prompt_2048_tokens_1

| Commit/Date | Num_Tokens (max) | Num_Tokens (mean) | Num_Tokens (median) | Num_Tokens (min) | Num_Tokens (stddev) | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 897d04e — 2026-03-06 22:56:07 | 1.00 (+0.00%) | 1.00 (+0.00%) | 1.00 (+0.00%) | 1.00 (+0.00%) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 2.68 (-1.06%) | 2.68 (-1.06%) | 2.68 (-1.06%) | 2.68 (-1.06%) | 0.00 (n/a) |
| 84d3478 — 2026-02-17 23:16:23 | 1.00 (n/a) | 1.00 (n/a) | 1.00 (n/a) | 1.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 2.70 (n/a) | 2.70 (n/a) | 2.70 (n/a) | 2.70 (n/a) | 0.00 (n/a) |

llama_3.2_1b_prompt_2048_tokens_40

| Commit/Date | Num_Tokens (max) | Num_Tokens (mean) | Num_Tokens (median) | Num_Tokens (min) | Num_Tokens (stddev) | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 897d04e — 2026-03-06 22:56:07 | 40.00 (+0.00%) | 40.00 (+0.00%) | 40.00 (+0.00%) | 40.00 (+0.00%) | 0.00 (n/a) | 4.00 (-1.72%) | 4.00 (-1.72%) | 4.00 (-1.72%) | 4.00 (-1.72%) | 0.00 (n/a) | 2.70 (-0.44%) | 2.70 (-0.44%) | 2.70 (-0.44%) | 2.70 (-0.44%) | 0.00 (n/a) |
| 84d3478 — 2026-02-17 23:16:23 | 40.00 (n/a) | 40.00 (n/a) | 40.00 (n/a) | 40.00 (n/a) | 0.00 (n/a) | 4.07 (n/a) | 4.07 (n/a) | 4.07 (n/a) | 4.07 (n/a) | 0.00 (n/a) | 2.71 (n/a) | 2.71 (n/a) | 2.71 (n/a) | 2.71 (n/a) | 0.00 (n/a) |

github-actions bot commented Apr 15, 2026

CI Test Results

c2f68ef (2026_04_15_17_54_50)

IRONCLAD - CI Summary

Examples

| Test | Krackan | Phoenix |
| --- | --- | --- |
| llama_3.2_1b_prompt_1024_tokens_1 | pass | - |
| llama_3.2_1b_prompt_1024_tokens_40 | pass | - |
| llama_3.2_1b_prompt_13_tokens_1 | pass | - |
| llama_3.2_1b_prompt_13_tokens_40 | pass | - |

Small

| Test | Krackan | Phoenix |
| --- | --- | --- |
| GPT2-S1024-causal | - | no pass |
| GPT2-S1024-nomask | - | no pass |
| GPT2-S2048-causal | - | no pass |
| GPT2-S2048-nomask | - | no pass |
| GPT2-S256-causal | - | no pass |
| GPT2-S256-nomask | - | no pass |
| GPT2-S4096-causal | - | no pass |
| GPT2-S4096-nomask | - | no pass |
| GPT2-S512-causal | - | no pass |
| GPT2-S512-nomask | - | no pass |
| GPT2-Small-256seq | pass | no pass |
| H2 | pass | no pass |
| Llama3.2-256seq | pass | no pass |
| M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128 | pass | pass |
| M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1 | pass | - |
| M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1 | pass | pass |
| M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1 | pass | pass |
| M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 | pass | pass |
| M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 | pass | pass |
| M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1 | pass | - |
| M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048 | pass | pass |
| M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024 | pass | pass |
| M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512 | pass | pass |
| M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256 | pass | - |
| M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8 | - | pass |
| M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8 | - | pass |
| M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1 | pass | pass |
| M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4 | pass | pass |
| M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024 | pass | pass |
| M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024 | pass | pass |
| M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024 | pass | pass |
| M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024 | pass | - |
| M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1 | pass | - |
| embedding_dim_2048-hidden_dim_2048 | - | pass |
| input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048 | pass | pass |
| input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32 | pass | pass |
| input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False | - | pass |
| input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True | - | pass |
| input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024 | pass | pass |
| input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32 | pass | pass |
| input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False | - | pass |
| input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True | - | pass |
| input_length_2048-num_aie_columns_1-tile_size_2048 | pass | pass |
| input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0 | pass | pass |
| input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024 | pass | pass |
| input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32 | pass | pass |
| input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False | - | pass |
| input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True | - | pass |
| input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512 | pass | pass |
| input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32 | pass | pass |
| input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False | - | pass |
| input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True | - | pass |
| input_length_2048-num_aie_columns_2-tile_size_1024 | pass | pass |
| input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0 | pass | pass |
| input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512 | pass | pass |
| input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32 | pass | pass |
| input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False | - | pass |
| input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True | - | pass |
| input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256 | pass | pass |
| input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32 | pass | pass |
| input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False | - | pass |
| input_length_2048-num_aie_columns_4-tile_size_512 | pass | pass |
| input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0 | pass | pass |
| input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256 | pass | - |
| input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32 | pass | - |
| input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128 | pass | - |
| input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32 | pass | - |
| input_length_2048-num_aie_columns_8-tile_size_256 | pass | - |
| input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0 | pass | - |
| input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048 | pass | pass |
| input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128 | pass | - |
| input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024 | pass | pass |
| input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024 | pass | pass |
| input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512 | pass | pass |
| input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512 | pass | pass |
| input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256 | pass | - |
| input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256 | pass | pass |
| input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024 | - | pass |
| input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048 | - | pass |
| input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512 | - | pass |
| rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0 | - | pass |
| rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0 | - | pass |
| rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0 | - | pass |
| rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0 | - | pass |
| rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0 | - | pass |
| rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0 | - | pass |
| seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0 | pass | - |
| seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False | - | pass |

Extensive

| Test | Krackan | Phoenix |
| --- | --- | --- |
| (no data) | - | - |

Krackan - Small

IRONCLAD

Tested on 2026_04_15_17_54_50 at commit c2f68ef.

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| GPT2-Small-256seq | ✅ 5/5 | 30593.80 | n/a | 50.86 |
| H2 | ✅ 5/5 | 26837.70 | n/a | 13.31 |
| Llama3.2-256seq | ✅ 5/5 | 33790.72 | n/a | 67.86 |
| M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128 | ✅ 5/5 | n/a | 0.20 | 0.20 |
| M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1 | ✅ 5/5 | 2300.86 | 4.12 | 1618.71 |
| M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1 | ✅ 5/5 | 205.06 | 1.10 | 46.87 |
| M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1 | ✅ 5/5 | 201.32 | 1.11 | 47.15 |
| M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 | ✅ 5/5 | 49212.40 | 0.51 | 349.10 |
| M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 | ✅ 5/5 | 28763.26 | 0.87 | 597.30 |
| M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1 | ✅ 5/5 | 7502.08 | 3.36 | 2291.26 |
| M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048 | ✅ 5/5 | n/a | 13.24 | 13.24 |
| M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024 | ✅ 5/5 | n/a | 24.42 | 24.40 |
| M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512 | ✅ 5/5 | n/a | 39.97 | 39.95 |
| M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256 | ✅ 5/5 | n/a | 42.59 | 42.57 |
| M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1 | ✅ 5/5 | 2430.86 | 3.35 | 877.46 |
| M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4 | ✅ 5/5 | 3304.30 | 0.39 | 20.91 |
| M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024 | ✅ 5/5 | n/a | 12.18 | 12.17 |
| M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024 | ✅ 5/5 | n/a | 23.90 | 23.88 |
| M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024 | ✅ 5/5 | n/a | 38.44 | 38.42 |
| M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024 | ✅ 5/5 | n/a | 41.36 | 41.34 |
| M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1 | ✅ 5/5 | 1383.96 | 4.96 | 1531.18 |
| input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048 | ✅ 10/10 | 167.40 | 0.05 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32 | ✅ 5/5 | 159.10 | 0.03 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024 | ✅ 10/10 | 169.12 | 0.05 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32 | ✅ 5/5 | 165.70 | 0.03 | n/a |
| input_length_2048-num_aie_columns_1-tile_size_2048 | ✅ 10/10 | 178.69 | 0.07 | n/a |
| input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0 | ✅ 5/5 | 168.76 | 0.07 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024 | ✅ 10/10 | 222.19 | 0.04 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32 | ✅ 5/5 | 163.60 | 0.03 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512 | ✅ 10/10 | 183.29 | 0.05 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32 | ✅ 5/5 | 164.44 | 0.03 | n/a |
| input_length_2048-num_aie_columns_2-tile_size_1024 | ✅ 10/10 | 175.73 | 0.07 | n/a |
| input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0 | ✅ 5/5 | 189.36 | 0.07 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512 | ✅ 10/10 | 191.78 | 0.04 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32 | ✅ 5/5 | 176.42 | 0.03 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256 | ✅ 10/10 | 171.03 | 0.05 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32 | ✅ 5/5 | 177.38 | 0.03 | n/a |
| input_length_2048-num_aie_columns_4-tile_size_512 | ✅ 10/10 | 178.15 | 0.07 | n/a |
| input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0 | ✅ 5/5 | 215.74 | 0.06 | n/a |
| input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256 | ✅ 10/10 | 197.66 | 0.04 | n/a |
| input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32 | ✅ 5/5 | 196.08 | 0.03 | n/a |
| input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128 | ✅ 10/10 | 204.91 | 0.04 | n/a |
| input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32 | ✅ 5/5 | 223.82 | 0.02 | n/a |
| input_length_2048-num_aie_columns_8-tile_size_256 | ✅ 10/10 | 198.62 | 0.06 | n/a |
| input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0 | ✅ 5/5 | 264.70 | 0.05 | n/a |
| input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048 | ✅ 5/5 | 157.42 | 0.05 | n/a |
| input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128 | ✅ 5/5 | 227.66 | 0.04 | n/a |
| input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024 | ✅ 5/5 | 167.14 | 0.05 | n/a |
| input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024 | ✅ 5/5 | 166.70 | 0.05 | n/a |
| input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512 | ✅ 5/5 | 180.94 | 0.05 | n/a |
| input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512 | ✅ 5/5 | 191.98 | 0.04 | n/a |
| input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256 | ✅ 5/5 | 199.00 | 0.04 | n/a |
| input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256 | ✅ 5/5 | 180.54 | 0.05 | n/a |
| seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0 | ✅ 5/5 | 40566.88 | 0.21 | n/a |

Trends:

IRONCLAD Trends

GPT2-Small-256seq

| Commit/Date | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 140228.80 (n/a) | 30593.80 (n/a) | 3188.60 (n/a) | 3160.60 (n/a) | 61287.83 (n/a) | 63.70 (n/a) | 50.86 (n/a) | 63.14 (n/a) | 1.44 (n/a) | 27.63 (n/a) |

H2

| Commit/Date | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 125961.80 (n/a) | 26837.70 (n/a) | 2233.30 (n/a) | 1689.50 (n/a) | 55412.62 (n/a) | 19.86 (n/a) | 13.31 (n/a) | 15.02 (n/a) | 0.27 (n/a) | 7.60 (n/a) |

Llama3.2-256seq

| Commit/Date | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 143319.20 (n/a) | 33790.72 (n/a) | 6528.60 (n/a) | 6033.60 (n/a) | 61228.73 (n/a) | 88.98 (n/a) | 67.86 (n/a) | 82.23 (n/a) | 3.75 (n/a) | 35.98 (n/a) |

M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 0.24 (n/a) | 0.20 (n/a) | 0.19 (n/a) | 0.18 (n/a) | 0.03 (n/a) | 0.24 (n/a) | 0.20 (n/a) | 0.19 (n/a) | 0.18 (n/a) | 0.02 (n/a) |

M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 4.43 (n/a) | 4.12 (n/a) | 4.28 (n/a) | 3.52 (n/a) | 0.36 (n/a) | 2673.60 (n/a) | 2300.86 (n/a) | 2195.50 (n/a) | 2125.00 (n/a) | 220.72 (n/a) | 1740.84 (n/a) | 1618.71 (n/a) | 1685.00 (n/a) | 1383.67 (n/a) | 142.12 (n/a) |

M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 1.40 (n/a) | 1.10 (n/a) | 1.05 (n/a) | 0.93 (n/a) | 0.18 (n/a) | 237.30 (n/a) | 205.06 (n/a) | 210.70 (n/a) | 158.00 (n/a) | 28.87 (n/a) | 59.74 (n/a) | 46.87 (n/a) | 44.79 (n/a) | 39.76 (n/a) | 7.55 (n/a) |

M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 1.25 (n/a) | 1.11 (n/a) | 1.04 (n/a) | 1.03 (n/a) | 0.10 (n/a) | 214.40 (n/a) | 201.32 (n/a) | 212.00 (n/a) | 176.40 (n/a) | 16.85 (n/a) | 53.49 (n/a) | 47.15 (n/a) | 44.51 (n/a) | 44.01 (n/a) | 4.17 (n/a) |

M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 0.51 (n/a) | 0.51 (n/a) | 0.51 (n/a) | 0.51 (n/a) | 0.00 (n/a) | 49412.60 (n/a) | 49212.40 (n/a) | 49245.20 (n/a) | 48910.10 (n/a) | 183.69 (n/a) | 351.25 (n/a) | 349.10 (n/a) | 348.86 (n/a) | 347.68 (n/a) | 1.31 (n/a) |

M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 0.88 (n/a) | 0.87 (n/a) | 0.88 (n/a) | 0.87 (n/a) | 0.00 (n/a) | 28922.30 (n/a) | 28763.26 (n/a) | 28753.00 (n/a) | 28599.00 (n/a) | 153.18 (n/a) | 600.72 (n/a) | 597.30 (n/a) | 597.50 (n/a) | 594.00 (n/a) | 3.18 (n/a) |

M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 3.46 (n/a) | 3.36 (n/a) | 3.32 (n/a) | 3.25 (n/a) | 0.09 (n/a) | 7735.40 (n/a) | 7502.08 (n/a) | 7579.70 (n/a) | 7273.80 (n/a) | 194.86 (n/a) | 2361.89 (n/a) | 2291.26 (n/a) | 2266.56 (n/a) | 2220.93 (n/a) | 59.71 (n/a) |

M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 13.58 (n/a) | 13.24 (n/a) | 13.30 (n/a) | 13.00 (n/a) | 0.24 (n/a) | 13.57 (n/a) | 13.24 (n/a) | 13.29 (n/a) | 12.99 (n/a) | 0.24 (n/a) |

M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 25.37 (n/a) | 24.42 (n/a) | 24.44 (n/a) | 22.92 (n/a) | 0.95 (n/a) | 25.35 (n/a) | 24.40 (n/a) | 24.43 (n/a) | 22.90 (n/a) | 0.95 (n/a) |

M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 41.58 (n/a) | 39.97 (n/a) | 40.67 (n/a) | 36.53 (n/a) | 2.00 (n/a) | 41.56 (n/a) | 39.95 (n/a) | 40.64 (n/a) | 36.51 (n/a) | 2.00 (n/a) |

M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 43.48 (n/a) | 42.59 (n/a) | 42.54 (n/a) | 41.83 (n/a) | 0.59 (n/a) | 43.45 (n/a) | 42.57 (n/a) | 42.52 (n/a) | 41.81 (n/a) | 0.59 (n/a) |

M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 3.72 (n/a) | 3.35 (n/a) | 3.48 (n/a) | 2.89 (n/a) | 0.35 (n/a) | 2791.80 (n/a) | 2430.86 (n/a) | 2315.60 (n/a) | 2166.60 (n/a) | 261.80 (n/a) | 975.69 (n/a) | 877.46 (n/a) | 912.93 (n/a) | 757.20 (n/a) | 91.07 (n/a) |

M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 0.48 (n/a) | 0.39 (n/a) | 0.35 (n/a) | 0.31 (n/a) | 0.08 (n/a) | 3953.20 (n/a) | 3304.30 (n/a) | 3522.70 (n/a) | 2579.70 (n/a) | 610.21 (n/a) | 26.01 (n/a) | 20.91 (n/a) | 19.05 (n/a) | 16.98 (n/a) | 4.07 (n/a) |

M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 13.14 (n/a) | 12.18 (n/a) | 12.11 (n/a) | 11.01 (n/a) | 0.95 (n/a) | 13.13 (n/a) | 12.17 (n/a) | 12.10 (n/a) | 11.01 (n/a) | 0.95 (n/a) |

M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 24.77 (n/a) | 23.90 (n/a) | 24.22 (n/a) | 22.09 (n/a) | 1.09 (n/a) | 24.76 (n/a) | 23.88 (n/a) | 24.21 (n/a) | 22.08 (n/a) | 1.08 (n/a) |

M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 40.02 (n/a) | 38.44 (n/a) | 38.81 (n/a) | 35.55 (n/a) | 1.81 (n/a) | 39.99 (n/a) | 38.42 (n/a) | 38.79 (n/a) | 35.53 (n/a) | 1.81 (n/a) |

M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 43.35 (n/a) | 41.36 (n/a) | 42.43 (n/a) | 37.30 (n/a) | 2.40 (n/a) | 43.32 (n/a) | 41.34 (n/a) | 42.40 (n/a) | 37.27 (n/a) | 2.40 (n/a) |

M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 6.18 (n/a) | 4.96 (n/a) | 5.10 (n/a) | 3.60 (n/a) | 0.93 (n/a) | 1847.40 (n/a) | 1383.96 (n/a) | 1305.10 (n/a) | 1076.50 (n/a) | 284.45 (n/a) | 1909.18 (n/a) | 1531.18 (n/a) | 1574.74 (n/a) | 1112.49 (n/a) | 285.80 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 188.50 (n/a) | 167.40 (n/a) | 176.20 (n/a) | 128.40 (n/a) | 20.04 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 178.90 (n/a) | 159.10 (n/a) | 166.40 (n/a) | 127.80 (n/a) | 20.07 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 227.70 (n/a) | 169.12 (n/a) | 162.25 (n/a) | 120.90 (n/a) | 31.50 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 179.20 (n/a) | 165.70 (n/a) | 164.60 (n/a) | 149.10 (n/a) | 11.27 (n/a) |

input_length_2048-num_aie_columns_1-tile_size_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 0.11 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.02 (n/a) | 221.40 (n/a) | 178.69 (n/a) | 184.75 (n/a) | 113.70 (n/a) | 36.82 (n/a) |

input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 0.09 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 202.10 (n/a) | 168.76 (n/a) | 169.30 (n/a) | 139.50 (n/a) | 24.87 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 346.00 (n/a) | 222.19 (n/a) | 205.00 (n/a) | 157.60 (n/a) | 57.90 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 195.10 (n/a) | 163.60 (n/a) | 157.30 (n/a) | 133.80 (n/a) | 28.83 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 0.06 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 227.80 (n/a) | 183.29 (n/a) | 189.55 (n/a) | 132.20 (n/a) | 31.50 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 201.40 (n/a) | 164.44 (n/a) | 166.30 (n/a) | 128.00 (n/a) | 32.10 (n/a) |

input_length_2048-num_aie_columns_2-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 0.11 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 243.10 (n/a) | 175.73 (n/a) | 169.05 (n/a) | 108.80 (n/a) | 42.90 (n/a) |

input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| c2f68ef — 2026-04-15 17:49:35 | 0.07 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.00 (n/a) | 207.10 (n/a) | 189.36 (n/a) | 192.60 (n/a) | 173.50 (n/a) | 13.61 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 253.80 (n/a) | 191.78 (n/a) | 191.95 (n/a) | 134.50 (n/a) | 33.28 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 231.90 (n/a) | 176.42 (n/a) | 173.20 (n/a) | 136.40 (n/a) | 36.23 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 214.50 (n/a) | 171.03 (n/a) | 176.45 (n/a) | 118.60 (n/a) | 28.92 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 217.30 (n/a) | 177.38 (n/a) | 175.70 (n/a) | 138.60 (n/a) | 27.92 (n/a) |

input_length_2048-num_aie_columns_4-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.09 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 243.40 (n/a) | 178.15 (n/a) | 175.95 (n/a) | 141.30 (n/a) | 29.96 (n/a) |

input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.09 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 277.60 (n/a) | 215.74 (n/a) | 219.70 (n/a) | 142.20 (n/a) | 48.18 (n/a) |

input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 249.50 (n/a) | 197.66 (n/a) | 201.15 (n/a) | 160.70 (n/a) | 32.33 (n/a) |

input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 319.10 (n/a) | 196.08 (n/a) | 171.60 (n/a) | 151.80 (n/a) | 69.73 (n/a) |

input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 229.70 (n/a) | 204.91 (n/a) | 213.90 (n/a) | 159.00 (n/a) | 24.96 (n/a) |

input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 284.50 (n/a) | 223.82 (n/a) | 223.90 (n/a) | 165.50 (n/a) | 43.21 (n/a) |

input_length_2048-num_aie_columns_8-tile_size_256

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.09 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 249.40 (n/a) | 198.62 (n/a) | 201.75 (n/a) | 141.30 (n/a) | 33.21 (n/a) |

input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.06 (n/a) | 0.05 (n/a) | 0.06 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 359.30 (n/a) | 264.70 (n/a) | 221.30 (n/a) | 217.90 (n/a) | 64.51 (n/a) |

input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 190.70 (n/a) | 157.42 (n/a) | 164.50 (n/a) | 121.10 (n/a) | 34.00 (n/a) |

input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 253.40 (n/a) | 227.66 (n/a) | 221.90 (n/a) | 216.10 (n/a) | 15.26 (n/a) |

input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 192.00 (n/a) | 167.14 (n/a) | 171.00 (n/a) | 146.50 (n/a) | 19.26 (n/a) |

input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 186.40 (n/a) | 166.70 (n/a) | 170.60 (n/a) | 145.20 (n/a) | 20.01 (n/a) |

input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 214.00 (n/a) | 180.94 (n/a) | 172.10 (n/a) | 160.20 (n/a) | 22.08 (n/a) |

input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 215.60 (n/a) | 191.98 (n/a) | 196.80 (n/a) | 156.90 (n/a) | 25.39 (n/a) |

input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.00 (n/a) | 219.80 (n/a) | 199.00 (n/a) | 203.30 (n/a) | 170.00 (n/a) | 18.72 (n/a) |

input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 223.50 (n/a) | 180.54 (n/a) | 168.40 (n/a) | 142.40 (n/a) | 33.73 (n/a) |

seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:49:35 | 0.21 (n/a) | 0.21 (n/a) | 0.21 (n/a) | 0.21 (n/a) | 0.00 (n/a) | 40740.40 (n/a) | 40566.88 (n/a) | 40663.60 (n/a) | 40166.90 (n/a) | 234.97 (n/a) |

Krackan - Examples

IRONCLAD

Tested on 2026_04_15_17_46_04 at commit c2f68ef.

| Test | Checks | TTFT (mean) | TPS (mean) |
|---|---|---|---|
| llama_3.2_1b_prompt_1024_tokens_1 | ✅ 5/5 | 2.14 | n/a |
| llama_3.2_1b_prompt_1024_tokens_40 | ✅ 5/5 | 2.17 | 4.31 |
| llama_3.2_1b_prompt_13_tokens_1 | ✅ 5/5 | 2.10 | n/a |
| llama_3.2_1b_prompt_13_tokens_40 | ✅ 5/5 | 2.09 | 4.29 |

Trends:

IRONCLAD Trends

llama_3.2_1b_prompt_1024_tokens_1

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:26 | 2.15 (n/a) | 2.14 (n/a) | 2.14 (n/a) | 2.12 (n/a) | 0.01 (n/a) |

llama_3.2_1b_prompt_1024_tokens_40

| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:26 | 4.33 (n/a) | 4.31 (n/a) | 4.31 (n/a) | 4.28 (n/a) | 0.02 (n/a) | 2.27 (n/a) | 2.17 (n/a) | 2.15 (n/a) | 2.13 (n/a) | 0.06 (n/a) |

llama_3.2_1b_prompt_13_tokens_1

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:26 | 2.10 (n/a) | 2.10 (n/a) | 2.10 (n/a) | 2.08 (n/a) | 0.01 (n/a) |

llama_3.2_1b_prompt_13_tokens_40

| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:26 | 4.32 (n/a) | 4.29 (n/a) | 4.29 (n/a) | 4.27 (n/a) | 0.02 (n/a) | 2.10 (n/a) | 2.09 (n/a) | 2.09 (n/a) | 2.08 (n/a) | 0.01 (n/a) |

Phoenix - Small

IRONCLAD

Tested on 2026_04_15_19_22_37 at commit c2f68ef.

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
|---|---|---|---|---|
| GPT2-S1024-causal | ❌ 0/5 | n/a | n/a | n/a |
| GPT2-S1024-nomask | ❌ 0/5 | n/a | n/a | n/a |
| GPT2-S2048-causal | ❌ 0/5 | n/a | n/a | n/a |
| GPT2-S2048-nomask | ❌ 0/5 | n/a | n/a | n/a |
| GPT2-S256-causal | ❌ 0/5 | n/a | n/a | n/a |
| GPT2-S256-nomask | ❌ 0/5 | n/a | n/a | n/a |
| GPT2-S4096-causal | ❌ 0/5 | n/a | n/a | n/a |
| GPT2-S4096-nomask | ❌ 0/5 | n/a | n/a | n/a |
| GPT2-S512-causal | ❌ 0/5 | n/a | n/a | n/a |
| GPT2-S512-nomask | ❌ 0/5 | n/a | n/a | n/a |
| GPT2-Small-256seq | ❌ 0/10 | n/a | n/a | n/a |
| H2 | ❌ 0/10 | n/a | n/a | n/a |
| Llama3.2-256seq | ❌ 0/10 | n/a | n/a | n/a |
| M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128 | ✅ 5/5 | n/a | 0.09 | 0.09 |
| M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1 | ✅ 5/5 | 467.56 | 0.49 | 20.89 |
| M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1 | ✅ 5/5 | 388.74 | 0.57 | 24.44 |
| M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 | ✅ 5/5 | 81977.48 | 0.31 | 209.60 |
| M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 | ✅ 5/5 | 24105.74 | 1.04 | 712.71 |
| M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048 | ✅ 5/5 | n/a | 3.65 | 3.65 |
| M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024 | ✅ 5/5 | n/a | 6.48 | 6.47 |
| M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512 | ✅ 5/5 | n/a | 10.29 | 10.28 |
| M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8 | ✅ 5/5 | 464.02 | 1.19 | n/a |
| M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8 | ✅ 5/5 | 657.56 | 0.94 | n/a |
| M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1 | ✅ 5/5 | 3212.66 | 2.76 | 724.18 |
| M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4 | ✅ 5/5 | 6168.16 | 0.21 | 11.20 |
| M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024 | ✅ 5/5 | n/a | 3.78 | 3.77 |
| M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024 | ✅ 5/5 | n/a | 6.44 | 6.44 |
| M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024 | ✅ 5/5 | n/a | 11.43 | 11.42 |
| embedding_dim_2048-hidden_dim_2048 | ✅ 5/5 | 11245.34 | 0.00 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048 | ✅ 30/30 | 423.52 | 0.02 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32 | ✅ 5/5 | 386.34 | 0.01 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False | ✅ 5/5 | 489.92 | 0.02 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True | ✅ 5/5 | 398.20 | 0.03 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024 | ✅ 25/25 | 439.32 | 0.02 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32 | ✅ 5/5 | 429.30 | 0.01 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False | ✅ 5/5 | 656.10 | 0.02 | n/a |
| input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True | ✅ 5/5 | 378.80 | 0.03 | n/a |
| input_length_2048-num_aie_columns_1-tile_size_2048 | ✅ 10/10 | 454.07 | 0.03 | n/a |
| input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0 | ✅ 5/5 | 388.44 | 0.03 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024 | ✅ 30/30 | 529.66 | 0.02 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32 | ✅ 5/5 | 308.38 | 0.02 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False | ✅ 5/5 | 433.40 | 0.02 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True | ✅ 5/5 | 397.16 | 0.03 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512 | ✅ 25/25 | 523.54 | 0.02 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32 | ✅ 5/5 | 439.00 | 0.01 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False | ✅ 5/5 | 358.02 | 0.03 | n/a |
| input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True | ✅ 5/5 | 426.82 | 0.02 | n/a |
| input_length_2048-num_aie_columns_2-tile_size_1024 | ✅ 10/10 | 370.66 | 0.04 | n/a |
| input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0 | ✅ 5/5 | 557.82 | 0.02 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512 | ✅ 30/30 | 459.11 | 0.02 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32 | ✅ 5/5 | 863.98 | 0.01 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False | ✅ 5/5 | 390.38 | 0.02 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True | ✅ 5/5 | 378.10 | 0.03 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256 | ✅ 25/25 | 495.19 | 0.02 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32 | ✅ 5/5 | 519.62 | 0.01 | n/a |
| input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False | ✅ 5/5 | 415.78 | 0.02 | n/a |
| input_length_2048-num_aie_columns_4-tile_size_512 | ✅ 10/10 | 383.80 | 0.04 | n/a |
| input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0 | ✅ 5/5 | 504.12 | 0.03 | n/a |
| input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048 | ✅ 5/5 | 695.04 | 0.02 | n/a |
| input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024 | ✅ 5/5 | 517.62 | 0.02 | n/a |
| input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024 | ✅ 5/5 | 488.20 | 0.02 | n/a |
| input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512 | ✅ 5/5 | 410.88 | 0.02 | n/a |
| input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512 | ✅ 5/5 | 417.28 | 0.02 | n/a |
| input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256 | ✅ 5/5 | 519.08 | 0.02 | n/a |
| input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024 | ✅ 5/5 | 483.94 | 0.32 | n/a |
| input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048 | ✅ 5/5 | 417.14 | 0.37 | n/a |
| input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512 | ✅ 5/5 | 756.98 | 0.31 | n/a |
| rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0 | ✅ 5/5 | 463.14 | 0.23 | n/a |
| rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0 | ✅ 5/5 | 412.56 | 0.25 | n/a |
| rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0 | ✅ 5/5 | 517.44 | 0.24 | n/a |
| rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0 | ✅ 5/5 | 433.18 | 0.20 | n/a |
| rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0 | ✅ 5/5 | 730.92 | 0.22 | n/a |
| rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0 | ✅ 5/5 | 496.68 | 0.15 | n/a |
| seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False | ✅ 5/5 | 19788.62 | 0.11 | n/a |

Trends:

IRONCLAD Trends

GPT2-S1024-causal

No metrics available.

GPT2-S1024-nomask

No metrics available.

GPT2-S2048-causal

No metrics available.

GPT2-S2048-nomask

No metrics available.

GPT2-S256-causal

No metrics available.

GPT2-S256-nomask

No metrics available.

GPT2-S4096-causal

No metrics available.

GPT2-S4096-nomask

No metrics available.

GPT2-S512-causal

No metrics available.

GPT2-S512-nomask

No metrics available.

GPT2-Small-256seq

No metrics available.

H2

No metrics available.

Llama3.2-256seq

No metrics available.

M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.11 (n/a) | 0.09 (n/a) | 0.10 (n/a) | 0.06 (n/a) | 0.02 (n/a) | 0.11 (n/a) | 0.09 (n/a) | 0.10 (n/a) | 0.06 (n/a) | 0.02 (n/a) |

M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.60 (n/a) | 0.49 (n/a) | 0.47 (n/a) | 0.37 (n/a) | 0.10 (n/a) | 590.10 (n/a) | 467.56 (n/a) | 475.30 (n/a) | 370.00 (n/a) | 96.52 (n/a) | 25.51 (n/a) | 20.89 (n/a) | 19.86 (n/a) | 15.99 (n/a) | 4.33 (n/a) |

M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.64 (n/a) | 0.57 (n/a) | 0.58 (n/a) | 0.51 (n/a) | 0.05 (n/a) | 437.50 (n/a) | 388.74 (n/a) | 381.40 (n/a) | 347.80 (n/a) | 35.47 (n/a) | 27.14 (n/a) | 24.44 (n/a) | 24.74 (n/a) | 21.57 (n/a) | 2.19 (n/a) |

M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.31 (n/a) | 0.31 (n/a) | 0.31 (n/a) | 0.30 (n/a) | 0.00 (n/a) | 83669.80 (n/a) | 81977.48 (n/a) | 81891.40 (n/a) | 80835.20 (n/a) | 1048.88 (n/a) | 212.53 (n/a) | 209.60 (n/a) | 209.79 (n/a) | 205.33 (n/a) | 2.66 (n/a) |

M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 1.05 (n/a) | 1.04 (n/a) | 1.05 (n/a) | 1.03 (n/a) | 0.01 (n/a) | 24317.90 (n/a) | 24105.74 (n/a) | 24051.50 (n/a) | 23948.50 (n/a) | 147.30 (n/a) | 717.37 (n/a) | 712.71 (n/a) | 714.29 (n/a) | 706.47 (n/a) | 4.34 (n/a) |

M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 3.76 (n/a) | 3.65 (n/a) | 3.64 (n/a) | 3.55 (n/a) | 0.10 (n/a) | 3.76 (n/a) | 3.65 (n/a) | 3.64 (n/a) | 3.54 (n/a) | 0.10 (n/a) |

M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 7.15 (n/a) | 6.48 (n/a) | 6.79 (n/a) | 5.44 (n/a) | 0.72 (n/a) | 7.14 (n/a) | 6.47 (n/a) | 6.78 (n/a) | 5.44 (n/a) | 0.72 (n/a) |

M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 13.91 (n/a) | 10.29 (n/a) | 10.09 (n/a) | 7.62 (n/a) | 2.52 (n/a) | 13.90 (n/a) | 10.28 (n/a) | 10.08 (n/a) | 7.61 (n/a) | 2.52 (n/a) |

M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 1.49 (n/a) | 1.19 (n/a) | 1.10 (n/a) | 0.80 (n/a) | 0.29 (n/a) | 654.70 (n/a) | 464.02 (n/a) | 475.80 (n/a) | 351.90 (n/a) | 122.31 (n/a) |

M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 1.57 (n/a) | 0.94 (n/a) | 0.90 (n/a) | 0.49 (n/a) | 0.43 (n/a) | 1065.70 (n/a) | 657.56 (n/a) | 584.50 (n/a) | 333.40 (n/a) | 294.94 (n/a) |

M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 4.15 (n/a) | 2.76 (n/a) | 2.11 (n/a) | 1.93 (n/a) | 1.00 (n/a) | 4166.00 (n/a) | 3212.66 (n/a) | 3828.10 (n/a) | 1942.70 (n/a) | 1016.02 (n/a) | 1088.14 (n/a) | 724.18 (n/a) | 552.21 (n/a) | 507.42 (n/a) | 263.53 (n/a) |

M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.28 (n/a) | 0.21 (n/a) | 0.21 (n/a) | 0.17 (n/a) | 0.04 (n/a) | 7189.40 (n/a) | 6168.16 (n/a) | 6071.50 (n/a) | 4476.90 (n/a) | 1088.54 (n/a) | 14.99 (n/a) | 11.20 (n/a) | 11.05 (n/a) | 9.33 (n/a) | 2.28 (n/a) |

M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 3.83 (n/a) | 3.78 (n/a) | 3.82 (n/a) | 3.61 (n/a) | 0.09 (n/a) | 3.83 (n/a) | 3.77 (n/a) | 3.82 (n/a) | 3.61 (n/a) | 0.09 (n/a) |

M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 7.67 (n/a) | 6.44 (n/a) | 5.97 (n/a) | 5.63 (n/a) | 0.93 (n/a) | 7.67 (n/a) | 6.44 (n/a) | 5.96 (n/a) | 5.63 (n/a) | 0.92 (n/a) |

M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 14.04 (n/a) | 11.43 (n/a) | 12.69 (n/a) | 8.01 (n/a) | 2.94 (n/a) | 14.03 (n/a) | 11.42 (n/a) | 12.69 (n/a) | 8.01 (n/a) | 2.93 (n/a) |

embedding_dim_2048-hidden_dim_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 17939.32 (n/a) | 11245.34 (n/a) | 9154.00 (n/a) | 5559.89 (n/a) | 5235.20 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.04 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 1923.20 (n/a) | 423.52 (n/a) | 325.45 (n/a) | 194.00 (n/a) | 312.11 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 548.20 (n/a) | 386.34 (n/a) | 374.50 (n/a) | 286.40 (n/a) | 99.10 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 565.10 (n/a) | 489.92 (n/a) | 484.30 (n/a) | 439.00 (n/a) | 48.83 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 641.50 (n/a) | 398.20 (n/a) | 410.10 (n/a) | 247.10 (n/a) | 155.76 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.04 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 1920.00 (n/a) | 439.32 (n/a) | 303.20 (n/a) | 222.70 (n/a) | 335.25 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 658.60 (n/a) | 429.30 (n/a) | 379.80 (n/a) | 262.60 (n/a) | 167.09 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.03 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 1957.40 (n/a) | 656.10 (n/a) | 317.00 (n/a) | 268.00 (n/a) | 732.03 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 492.10 (n/a) | 378.80 (n/a) | 415.90 (n/a) | 251.60 (n/a) | 113.21 (n/a) |

input_length_2048-num_aie_columns_1-tile_size_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 1058.40 (n/a) | 454.07 (n/a) | 381.95 (n/a) | 242.10 (n/a) | 246.13 (n/a) |

input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.05 (n/a) | 0.03 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 571.40 (n/a) | 388.44 (n/a) | 342.90 (n/a) | 251.30 (n/a) | 134.54 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 2489.70 (n/a) | 529.66 (n/a) | 479.80 (n/a) | 197.50 (n/a) | 431.57 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 373.70 (n/a) | 308.38 (n/a) | 345.40 (n/a) | 195.60 (n/a) | 74.79 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 601.40 (n/a) | 433.40 (n/a) | 444.90 (n/a) | 272.90 (n/a) | 119.86 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.04 (n/a) | 0.03 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 683.10 (n/a) | 397.16 (n/a) | 273.90 (n/a) | 271.00 (n/a) | 184.63 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 1031.60 (n/a) | 523.54 (n/a) | 531.70 (n/a) | 182.50 (n/a) | 222.01 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 536.20 (n/a) | 439.00 (n/a) | 463.40 (n/a) | 261.50 (n/a) | 103.98 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 545.30 (n/a) | 358.02 (n/a) | 309.30 (n/a) | 215.00 (n/a) | 135.74 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 569.70 (n/a) | 426.82 (n/a) | 414.00 (n/a) | 347.00 (n/a) | 90.80 (n/a) |

input_length_2048-num_aie_columns_2-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 571.80 (n/a) | 370.66 (n/a) | 362.35 (n/a) | 239.60 (n/a) | 127.32 (n/a) |

input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 835.30 (n/a) | 557.82 (n/a) | 518.30 (n/a) | 327.50 (n/a) | 185.01 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.05 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 2450.30 (n/a) | 459.11 (n/a) | 382.45 (n/a) | 162.30 (n/a) | 401.82 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 2448.70 (n/a) | 863.98 (n/a) | 485.50 (n/a) | 360.90 (n/a) | 888.68 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 523.60 (n/a) | 390.38 (n/a) | 417.50 (n/a) | 243.90 (n/a) | 104.64 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 558.40 (n/a) | 378.10 (n/a) | 346.90 (n/a) | 230.60 (n/a) | 121.64 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 1406.00 (n/a) | 495.19 (n/a) | 463.40 (n/a) | 281.60 (n/a) | 242.52 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 648.40 (n/a) | 519.62 (n/a) | 559.50 (n/a) | 257.50 (n/a) | 151.52 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 685.20 (n/a) | 415.78 (n/a) | 346.50 (n/a) | 254.10 (n/a) | 170.19 (n/a) |

input_length_2048-num_aie_columns_4-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.07 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 624.80 (n/a) | 383.80 (n/a) | 335.75 (n/a) | 188.20 (n/a) | 145.16 (n/a) |

input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 852.00 (n/a) | 504.12 (n/a) | 490.00 (n/a) | 240.40 (n/a) | 231.14 (n/a) |

input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 1907.50 (n/a) | 695.04 (n/a) | 477.50 (n/a) | 224.00 (n/a) | 685.85 (n/a) |

input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 631.20 (n/a) | 517.62 (n/a) | 560.20 (n/a) | 267.70 (n/a) | 145.26 (n/a) |

input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 583.70 (n/a) | 488.20 (n/a) | 486.60 (n/a) | 390.00 (n/a) | 69.07 (n/a) |

input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 606.30 (n/a) | 410.88 (n/a) | 435.10 (n/a) | 243.10 (n/a) | 152.03 (n/a) |

input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 577.90 (n/a) | 417.28 (n/a) | 478.00 (n/a) | 260.50 (n/a) | 144.55 (n/a) |

input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 681.30 (n/a) | 519.08 (n/a) | 483.90 (n/a) | 429.40 (n/a) | 99.30 (n/a) |

input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.48 (n/a) | 0.32 (n/a) | 0.21 (n/a) | 0.20 (n/a) | 0.15 (n/a) | 643.20 (n/a) | 483.94 (n/a) | 610.80 (n/a) | 270.30 (n/a) | 194.05 (n/a) |

input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.54 (n/a) | 0.37 (n/a) | 0.38 (n/a) | 0.19 (n/a) | 0.15 (n/a) | 675.70 (n/a) | 417.14 (n/a) | 349.00 (n/a) | 241.00 (n/a) | 186.61 (n/a) |

input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.58 (n/a) | 0.31 (n/a) | 0.23 (n/a) | 0.07 (n/a) | 0.21 (n/a) | 1980.20 (n/a) | 756.98 (n/a) | 574.20 (n/a) | 224.20 (n/a) | 714.29 (n/a) |

rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.33 (n/a) | 0.23 (n/a) | 0.19 (n/a) | 0.16 (n/a) | 0.08 (n/a) | 604.70 (n/a) | 463.14 (n/a) | 509.40 (n/a) | 293.70 (n/a) | 135.88 (n/a) |

rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.34 (n/a) | 0.25 (n/a) | 0.21 (n/a) | 0.19 (n/a) | 0.07 (n/a) | 506.40 (n/a) | 412.56 (n/a) | 472.40 (n/a) | 292.30 (n/a) | 103.58 (n/a) |

rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.40 (n/a) | 0.24 (n/a) | 0.22 (n/a) | 0.09 (n/a) | 0.11 (n/a) | 1055.00 (n/a) | 517.44 (n/a) | 453.40 (n/a) | 246.80 (n/a) | 315.32 (n/a) |

rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.29 (n/a) | 0.20 (n/a) | 0.20 (n/a) | 0.10 (n/a) | 0.08 (n/a) | 715.70 (n/a) | 433.18 (n/a) | 372.40 (n/a) | 258.60 (n/a) | 198.36 (n/a) |

rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.31 (n/a) | 0.22 (n/a) | 0.27 (n/a) | 0.03 (n/a) | 0.12 (n/a) | 2487.50 (n/a) | 730.92 (n/a) | 271.20 (n/a) | 240.80 (n/a) | 984.39 (n/a) |

rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.19 (n/a) | 0.15 (n/a) | 0.14 (n/a) | 0.12 (n/a) | 0.03 (n/a) | 612.00 (n/a) | 496.68 (n/a) | 514.40 (n/a) | 380.50 (n/a) | 85.03 (n/a) |

seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| c2f68ef — 2026-04-15 17:40:19 | 0.15 (n/a) | 0.11 (n/a) | 0.10 (n/a) | 0.08 (n/a) | 0.03 (n/a) | 25118.22 (n/a) | 19788.62 (n/a) | 19965.81 (n/a) | 13744.68 (n/a) | 4880.90 (n/a) |

Phoenix - Examples

IRONCLAD

Tested on 2026_04_15_19_26_50 at commit c2f68ef.

| Test | Checks | TTFT (mean) | TPS (mean) |
|---|---|---|---|

Trends:

IRONCLAD Trends
