SiFive Patents(US20230367715A1): Load-Store Pipeline Selection for Vector

Created 2026-03-03 | Updated 2026-03-20 | Arch, vector, rvv

Patents	Inc.	Title
US20230367715A1	SiFive	Load-Store Pipeline Selection for Vector.

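Background and Problem Addressed

An RVV vector instruction's operand can be longer than what memory can accept in a single cycle, so a vector store executes over multiple beats.
A vector store may use strided or indexed addressing, so its addresses are not contiguous and cannot all be determined when the instruction is scheduled.
The processor may have multiple store units (for example, a load/store pipeline and a store-only pipeline) to raise memory write bandwidth.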
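
To make the multi-beat and non-contiguous-address points concrete, here is a minimal sketch. The function names, the packed-data beat model, and the example widths are illustrative assumptions, not taken from the patent.

```python
def num_beats(vl: int, sew_bits: int, store_width_bits: int) -> int:
    """Beats needed to write vl packed elements of sew_bits each
    through a store port that accepts store_width_bits per cycle (assumed model)."""
    total_bits = vl * sew_bits
    return (total_bits + store_width_bits - 1) // store_width_bits  # ceiling division


def strided_addresses(base: int, stride_bytes: int, vl: int) -> list[int]:
    """Element addresses of a strided store: base + i * stride."""
    return [base + i * stride_bytes for i in range(vl)]


def indexed_addresses(base: int, offsets: list[int]) -> list[int]:
    """Element addresses of an indexed store: base + offset[i].
    The offsets come from a vector register, so they are only known at run time."""
    return [base + off for off in offsets]
```

For example, 16 elements of 64 bits through a 256-bit store port take `num_beats(16, 64, 256)` = 4 beats, and a stride of 24 bytes from `0x1000` touches `0x1000, 0x1018, 0x1030, ...`, which may fall in different cache lines.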

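The problem:

Execution strategies for a store instruction:
1. Write to memory via the L1 cache; bandwidth is limited.
2. Bypass the L1 cache and write directly to the L2 cache; bandwidth is higher, but this may trigger cache invalidations (required for coherence) and reduce performance.
The key question is how to dynamically select the best store unit without causing cache coherence problems.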

Solution

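1. Dynamic selection based on the first-beat address

When a vector store instruction is scheduled, identify the address of its first beat (that is, the target address of the first element).

That address determines which store unit is used:

If the address hits in the L1 cache, use the first store unit (write via the L1 cache).
If it misses, use the second store unit (bypass the L1 cache and write directly to the L2 cache).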
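
The hit/miss rule above can be sketched as a small software model. The `L1Tags` class, the 64-byte line size, and the `StoreUnit` names are assumptions made for illustration; the patent describes hardware, not this code.

```python
from enum import Enum

LINE_BYTES = 64  # assumed cache line size


class StoreUnit(Enum):
    FIRST = "write via L1 cache"
    SECOND = "bypass L1, write directly to L2"


class L1Tags:
    """Toy stand-in for an L1 tag lookup: a set of resident line numbers."""

    def __init__(self, resident_addrs=()):
        self.lines = {addr // LINE_BYTES for addr in resident_addrs}

    def hits(self, addr: int) -> bool:
        return addr // LINE_BYTES in self.lines


def select_store_unit(first_beat_addr: int, l1: L1Tags) -> StoreUnit:
    """Pick the store unit from the first-beat address only."""
    return StoreUnit.FIRST if l1.hits(first_beat_addr) else StoreUnit.SECOND
```

Only the first element's address is checked, which is what lets the decision be made at scheduling time even when later element addresses are not yet known.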

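2. Selection using a predictor

A predictor circuit is introduced. Its inputs are:

PC
First-beat address

Its output is a prediction: select the first or the second store unit.

Predictor entries are indexed by a hash of the PC and the address, and each entry stores a saturating counter.

Update mechanism:

If an instruction that used the second store unit causes an L1 cache invalidation, the corresponding counter is decremented by one.
If the instruction completes without causing an invalidation, the counter is incremented by one.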
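
A minimal sketch of such a predictor follows. The table size, the 2-bit counter width, the XOR hash, the initial counter value, and the threshold are all assumptions chosen for illustration; the summary above only specifies a hashed index, a saturating counter, and the direction of the updates.

```python
FIRST, SECOND = "first", "second"


class StoreUnitPredictor:
    """Table of saturating counters indexed by a hash of PC and first-beat address."""

    def __init__(self, entries: int = 256, counter_bits: int = 2):
        self.entries = entries
        self.max_count = (1 << counter_bits) - 1
        # Assumed policy: counter >= threshold predicts the second (bypass) unit.
        self.threshold = (self.max_count + 1) // 2
        self.table = [self.threshold] * entries  # assumed initial state

    def _index(self, pc: int, addr: int) -> int:
        # Hypothetical hash: XOR of the PC and the cache-line address bits.
        return ((pc >> 2) ^ (addr >> 6)) % self.entries

    def predict(self, pc: int, addr: int) -> str:
        return SECOND if self.table[self._index(pc, addr)] >= self.threshold else FIRST

    def update(self, pc: int, addr: int, caused_invalidation: bool) -> None:
        i = self._index(pc, addr)
        if caused_invalidation:
            self.table[i] = max(0, self.table[i] - 1)  # saturate at 0
        else:
            self.table[i] = min(self.max_count, self.table[i] + 1)  # saturate at max
```

With these assumed parameters, repeated invalidations drive an entry toward 0 so the store is steered to the first unit, while clean completions push it back toward the bypass path.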

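3. Memory-mapped I/O (MMIO) handling

If the first-beat address falls in an MMIO region, the first store unit is forced, which preserves the ordering and correctness of I/O operations.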
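
The MMIO rule acts as an override on top of whatever the dynamic selection chose. A minimal sketch, assuming a hypothetical address map (`MMIO_REGIONS`) and string unit names that are not from the patent:

```python
# Hypothetical MMIO address map: half-open [start, end) ranges.
MMIO_REGIONS = [(0x1000_0000, 0x2000_0000)]


def is_mmio(addr: int, regions=MMIO_REGIONS) -> bool:
    """True if addr falls inside any MMIO region."""
    return any(start <= addr < end for start, end in regions)


def choose_unit(first_beat_addr: int, dynamic_choice: str, regions=MMIO_REGIONS) -> str:
    """Force the first store unit for MMIO; otherwise keep the dynamic choice."""
    if is_mmio(first_beat_addr, regions):
        return "first"
    return dynamic_choice
```

Here `dynamic_choice` stands for the result of the L1 hit check or the predictor; the MMIO check simply takes priority over it.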

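4. Handling instruction dependencies (hazard handling)

Because the two store units execute in parallel, WAW or WAR hazards can occur.

Solutions:

Force dispatch to the same unit: if a target-address conflict is detected, schedule the later instruction to the same store unit so the two execute in order.
Delay dispatch: if a conflict exists, hold the later instruction until the earlier one completes.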
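
The two hazard policies can be sketched as follows. This is a simplified model under stated assumptions: each in-flight access is represented by one conservative contiguous byte range, and the names `InFlightOp`, `dispatch`, and the `policy` strings are hypothetical. Falling back to a stall when conflicts span both units is a choice made in this sketch, not something stated in the source.

```python
from dataclasses import dataclass


@dataclass
class InFlightOp:
    start: int  # first byte touched
    end: int    # one past the last byte touched (conservative range)
    unit: str   # "first" or "second"


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Half-open interval overlap test."""
    return a_start < b_end and b_start < a_end


def dispatch(new_start, new_end, preferred_unit, in_flight, policy="same_unit"):
    """Return (unit, stall) for a new store given older in-flight accesses."""
    conflict_units = {
        op.unit for op in in_flight
        if overlaps(new_start, new_end, op.start, op.end)
    }
    if not conflict_units:
        return preferred_unit, False  # no hazard: use the preferred unit
    if policy == "same_unit" and len(conflict_units) == 1:
        # Follow the conflicting op into its unit so the two execute in order.
        return next(iter(conflict_units)), False
    # Delay policy, or conflicts in both units: hold until older ops complete.
    return preferred_unit, True
```

Forcing the same unit keeps the pipeline busy at the cost of overriding the preferred unit, whereas delaying keeps the preferred unit but costs cycles.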

Author: xixi

Link: https://xixi-shredp.github.io/2026/03/03/Arch/vector/rvv/patent_US20230367715A1/

Copyright Notice: All articles on this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.
