RISC-V Matrix: PULP Quadrilatero
ISA Extension Spec
Instructions
- mld_w
- mst_b
- mst_h
- mst_w
- mzero
- mmaqa_b
- mmada_h
- mmasa_w
- fmmacc_b
- fmmacc_h
- fmmacc_s
Gem5 Implementation
Matrix Register Model
The matrix register model is defined in: src/arch/riscv/regs/mat.hh
- NumMatRegs = 8
- Matrix registers are exposed as m0 through m7
- Storage uses MatRegContainer = gem5::MatStore<16, 4>
- Each matrix register stores:
  - 4 * 4 * 32 bits
  - 4 rows (16B), 4 cols
The same raw storage is reinterpreted as:
- int8_t for mmaqa_b / fmmacc_b
- int16_t / fp16 bit patterns for mmada_h / fmmacc_h
- int32_t / fp32 bit patterns for mmasa_w / fmmacc_s
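The reinterpretation of one raw buffer under several element types can be sketched as below. This is a minimal illustration, not the gem5 code: `Tile`, `get`, and `set` are hypothetical names, and the real `gem5::MatStore<16, 4>` interface may look quite different. The only facts taken from the notes above are the 4 x 16-byte layout and the element types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for one matrix register: 4 rows of 16 bytes each.
struct Tile
{
    static constexpr std::size_t Rows = 4;
    static constexpr std::size_t RowBytes = 16;
    std::array<std::uint8_t, Rows * RowBytes> raw{};

    // Read lane `idx` of row `row` as element type T (int8_t, int16_t,
    // int32_t or float). memcpy keeps the access free of aliasing issues.
    template <typename T>
    T get(std::size_t row, std::size_t idx) const
    {
        assert(row < Rows && (idx + 1) * sizeof(T) <= RowBytes);
        T value;
        std::memcpy(&value, raw.data() + row * RowBytes + idx * sizeof(T),
                    sizeof(T));
        return value;
    }

    // Write lane `idx` of row `row` as element type T.
    template <typename T>
    void set(std::size_t row, std::size_t idx, T value)
    {
        assert(row < Rows && (idx + 1) * sizeof(T) <= RowBytes);
        std::memcpy(raw.data() + row * RowBytes + idx * sizeof(T), &value,
                    sizeof(T));
    }
};
```

With this layout a row holds 16 int8_t lanes, 8 int16_t lanes, or 4 int32_t / fp32 lanes, which matches the dot-product lengths of the arithmetic instructions.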
Matrix Register Row packing helpers
mat.hh also contains the helper functions used by the memory micro-ops:
- row byte serialization
- row word deserialization
- per-macro load state support
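As a rough sketch of what row serialization and deserialization could look like, the helpers below convert between 4 x 32-bit words and 16 bytes. The names `rowToBytes` and `bytesToRow` are hypothetical and not taken from mat.hh; the little-endian byte order is an assumption, chosen to be consistent with the mst_b description later in this post.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using RowWords = std::array<std::uint32_t, 4>;  // one matrix row as words
using RowBytes = std::array<std::uint8_t, 16>;  // the same row as bytes

// Serialize a row: each word is emitted least significant byte first
// (assumed little-endian order).
inline RowBytes rowToBytes(const RowWords &words)
{
    RowBytes out{};
    for (std::size_t w = 0; w < words.size(); ++w)
        for (std::size_t b = 0; b < 4; ++b)
            out[w * 4 + b] = static_cast<std::uint8_t>(words[w] >> (8 * b));
    return out;
}

// Deserialize a row: rebuild each word from its 4 little-endian bytes.
inline RowWords bytesToRow(const RowBytes &bytes)
{
    RowWords out{};
    for (std::size_t w = 0; w < out.size(); ++w)
        for (std::size_t b = 0; b < 4; ++b)
            out[w] |= static_cast<std::uint32_t>(bytes[w * 4 + b]) << (8 * b);
    return out;
}
```

Shifting instead of casting pointers keeps the result independent of the host byte order.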
Instruction
decoder.isa:
- Check and remove ENABLE_QMAT
- decode: the RD naming is invalid
Matrix instruction format support lives in: src/arch/riscv/isa/formats/matrix.isa
Includes:
- regular matrix op templates
- matrix macro instruction templates
- matrix load row micro-op templates
- matrix store row micro-op templates
Arithmetic Instructions
Arithmetic instructions:
- mzero clears the destination matrix register
- mmaqa_b performs 16-element signed byte dot products per output element
- mmada_h performs 8-element signed halfword dot products per output element
- mmasa_w performs 4-element signed word dot products per output element
- fmmacc_s performs 4-element fp32 dot products per output element
- fmmacc_h converts fp16 bit patterns to fp32 and accumulates in fp32
- fmmacc_b currently treats 8-bit lanes as integer values converted to float before accumulation
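To make "16-element signed byte dot products per output element" concrete, here is a minimal sketch of an mmaqa_b-style update. It rests on assumptions the notes above do not confirm: each output element (i, j) is taken to be the dot product of row i of the first source with row j of the second source, the result is accumulated into a 4 x 4 tile of int32_t, and overflow wraps. `ByteTile`, `AccTile`, and `mmaqaB` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using ByteTile = std::array<std::array<std::int8_t, 16>, 4>;  // 4 rows x 16 int8 lanes
using AccTile = std::array<std::array<std::int32_t, 4>, 4>;   // 4 x 4 int32 accumulators

// Assumed semantics: acc[i][j] += dot(a row i, b row j) over 16 int8 lanes.
inline void mmaqaB(AccTile &acc, const ByteTile &a, const ByteTile &b)
{
    for (std::size_t i = 0; i < 4; ++i) {
        for (std::size_t j = 0; j < 4; ++j) {
            std::int32_t sum = 0;  // at most 16 * 128 * 128, fits easily
            for (std::size_t k = 0; k < 16; ++k)
                sum += static_cast<std::int32_t>(a[i][k]) *
                       static_cast<std::int32_t>(b[j][k]);
            // Add through unsigned to avoid signed-overflow UB; the
            // wrap-around itself is an assumption of this sketch.
            acc[i][j] = static_cast<std::int32_t>(
                static_cast<std::uint32_t>(acc[i][j]) +
                static_cast<std::uint32_t>(sum));
        }
    }
}
```

The halfword and word variants would follow the same shape with 8 and 4 lanes per row respectively.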
Memory Access Instructions
Matrix memory instructions are expanded into row-level micro-ops during decoding:
- mld_w expands into 4 row-load micro-ops
- mst_b, mst_h, mst_w each expand into 4 row-store micro-ops
  - mst_w: 4 x 32-bit words per row
  - mst_h: each 32-bit word split into low 16 bits then high 16 bits
  - mst_b: each 32-bit word split into 4 little-endian bytes
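The per-row splitting for the three store widths can be sketched with a single helper. This is only an illustration of the ordering described above (low part first, then higher parts); `splitRow` is a hypothetical name, and the actual micro-op templates in matrix.isa may be organized differently.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <vector>

using RowWords = std::array<std::uint32_t, 4>;  // one matrix row as words

// Split a row into store elements of type Elem:
//   uint32_t -> 4 words (mst_w), uint16_t -> low half then high half of
//   each word (mst_h), uint8_t -> 4 little-endian bytes per word (mst_b).
template <typename Elem>
std::vector<Elem> splitRow(const RowWords &row)
{
    static_assert(std::is_unsigned<Elem>::value && sizeof(Elem) <= 4,
                  "Elem must be uint8_t, uint16_t or uint32_t");
    constexpr unsigned bits = 8 * sizeof(Elem);
    constexpr unsigned perWord = 4 / sizeof(Elem);

    std::vector<Elem> out;
    out.reserve(row.size() * perWord);
    for (std::uint32_t word : row)
        for (unsigned i = 0; i < perWord; ++i)
            out.push_back(static_cast<Elem>(word >> (bits * i)));
    return out;
}
```

For example, `splitRow<std::uint16_t>` turns the word 0x11223344 into 0x3344 followed by 0x1122.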
Architectural visibility: mld_w uses atomic final visibility semantics:
- the 4 row loads fill a temporary tile buffer
- the matrix destination register is only committed after all rows complete
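The commit-after-all-rows behaviour can be sketched as follows. This is a simplified model under assumptions: `TileLoadState`, `completeRow`, and `rowLoadDone` are hypothetical names, rows may complete in any order, and the per-macro load state in mat.hh may track completion differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using RowWords = std::array<std::uint32_t, 4>;
using TileWords = std::array<RowWords, 4>;

// Hypothetical per-macro-op load state: a temporary tile buffer plus a
// bitmask of the rows whose loads have completed.
struct TileLoadState
{
    TileWords buffer{};
    unsigned doneMask = 0;

    // Record one completed row load; returns true once all 4 rows are in.
    bool completeRow(std::size_t row, const RowWords &data)
    {
        assert(row < 4);
        buffer[row] = data;
        doneMask |= 1u << row;
        return doneMask == 0xfu;
    }
};

// Called when a row-load micro-op finishes. The destination register is
// left untouched until the last row arrives, then written in one step.
inline bool rowLoadDone(TileLoadState &state, TileWords &destReg,
                        std::size_t row, const RowWords &data)
{
    if (!state.completeRow(row, data))
        return false;
    destReg = state.buffer;
    return true;
}
```

Until the final row completes, a reader of the destination register still sees its old contents, so no partially loaded tile becomes architecturally visible.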
All articles on this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.