Add single-dispatch layer-by-layer multi-head attention #91
Conversation
Can we reuse the reference from the existing mha? (Note: does not include RoPE and Q, K, V projections, but some code reuse should be possible.)
📊 Test Results for Test Example Applications — 1d87fe8 (2026_04_07_21_05_39) IRONCLAD
📈 Trends (vs main branch): llama_3.2_1b end-to-end benchmarks (prompt lengths 13/1024/2048, 1 and 40 generated tokens)

CI Test Results — c2f68ef (2026_04_15_17_54_50) IRONCLAD CI Summary (Examples, Small, Extensive)
- Krackan, Small: GPT2-Small-256seq, H2, Llama3.2-256seq attention configs; matmul and matvec sweeps (M/K/N up to 8192, 1–8 AIE columns, row/column-major variants); elementwise and softmax kernels at input_length 2048 across column, channel, and tile-size combinations; seq_len_16384 single-head attention pipeline.
- Krackan, Examples: llama_3.2_1b benchmarks (prompt lengths 13/1024, 1 and 40 tokens).
- Phoenix, Small: same kernel suites plus rows_32-cols_512 RoPE-angle configs and input_length_32768 sweeps; GPT2 sequence-length variants (S256–S4096, causal/nomask) report "No metrics available."
- Phoenix, Examples: IRONCLAD trends recorded.
A "naive" alternative to the currently checked-in dataflow multi-head attention design. It computes attention layer by layer, but uses the single-dispatch mechanism to fuse everything into one MLIR file, saving CPU round-trips and XRT overhead.
Includes two variants:
- Q, K, V: matches the functionality of the checked-in dataflow MHA.
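For reference, the layer-by-layer structure can be sketched in NumPy as below (RoPE omitted). All shapes, weight names, and the `mha_layer_by_layer` helper are illustrative assumptions, not taken from the PR's MLIR design; on the NPU each step would be a separate kernel fused into a single dispatch.

```python
import numpy as np

def mha_layer_by_layer(x, wq, wk, wv, wo, num_heads):
    """Layer-by-layer multi-head attention: projections, per-head
    scaled dot-product, and the output projection run as distinct
    steps, mirroring the structure fused into one dispatch."""
    seq_len, dim = x.shape
    head_dim = dim // num_heads
    # Q, K, V projections (one "layer" each)
    q, k, v = x @ wq, x @ wk, x @ wv
    # split into heads: (num_heads, seq_len, head_dim)
    split = lambda t: t.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    # per-head scaled dot-product attention
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v
    # merge heads, then apply the output projection
    out = out.transpose(1, 0, 2).reshape(seq_len, dim)
    return out @ wo

rng = np.random.default_rng(0)
seq_len, dim, heads = 8, 16, 4
x = rng.standard_normal((seq_len, dim)).astype(np.float32)
wq, wk, wv, wo = (rng.standard_normal((dim, dim)).astype(np.float32)
                  for _ in range(4))
y = mha_layer_by_layer(x, wq, wk, wv, wo, heads)
print(y.shape)  # (8, 16)
```

Running the fused sequence host-side like this clarifies what the single-dispatch mechanism saves: without fusion, each of these steps would be its own XRT kernel launch.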