Apple Data Prefetcher Overview
Patents List
| Title | Filing Date | Status | Number / URL | Main focus |
|---|---|---|---|---|
| Coprocessor prefetcher | 2024-07-25 latest continuation; original 2021-12-10 | Grant / Application | US11755333B2, https://patents.google.com/patent/US11755333B2/en; US12050918B2, https://patents.google.com/patent/US12050918B2/en; US20250094174A1, https://patents.google.com/patent/US20250094174A1/en | Prefetch driven by capturing coprocessor operand addresses; not a CPU DMP, but part of Apple's recent prefetcher / indirect operand fetch portfolio |
| Deny list for a memory prefetcher circuit | 2023-05-24 | Grant | US12306762B1, https://patents.google.com/patent/US12306762B1/en | Useless-prefetch filtering, deny list, pollution control |
| Multi-table signature prefetch | 2021-07-21 | Grant | US11630670B2, https://patents.google.com/patent/US11630670B2/en | Instruction/control-flow signature prefetch; adjacent to the signature mechanisms used in data prefetching |
| Multi-block cache fetch techniques | 2021-05-19 | Grant | US12248399B2, https://patents.google.com/patent/US12248399B2/en | Multi-block aggregated fetch after a cache miss; close to an adjacent/sector prefetch fill path |
| Low latency fetch circuitry for compute kernels | 2020-10-08 continuation; original 2018-09-26 | Grant | US10838725B2, https://patents.google.com/patent/US10838725B2/en; US11256510B2, https://patents.google.com/patent/US11256510B2/en | Early parsing of indirect-data-access items in the compute command stream; IMA-like fetch in the GPU/compute front end rather than a CPU cache DMP |
| Secondary prefetch circuit that reports coverage to a primary prefetch circuit to limit prefetching by primary prefetch circuit | 2020-03-27 | Grant | US11176045B2, https://patents.google.com/patent/US11176045B2/en | Primary/secondary prefetcher cooperation, coverage hint, large-stride/SMS |
| Sequential prefetch boost | 2017-04-26 | Grant | US10346309B1, https://patents.google.com/patent/US10346309B1/en | Sequential-stream prefetch boost / prefetch depth adjustment |
| Prefetch circuit with global quality factor to reduce aggressiveness in low power modes | 2017-02-17 | Grant | US10331567B1, https://patents.google.com/patent/US10331567B1/en | Global quality factor, low-power throttling, large-stride/SMS adjunct |
| Unified prefetch circuit for multi-level caches | 2016-04-07 | Grant | US10180905B1, https://patents.google.com/patent/US10180905B1/en; continuation US10621100B1, https://patents.google.com/patent/US10621100B1/en | AMPM/access-map prefetch, multi-level cache targets, per-cache quality factor |
| Prefetch throttling in a multi-core system | 2016-04-07 | Grant | US9904624B1, https://patents.google.com/patent/US9904624B1/en | Shared-cache/request-queue pressure, low-confidence prefetch throttling |
| Access map-pattern match based prefetch unit for a processor | 2013-07-16 | Grant | US9015422B2, https://patents.google.com/patent/US9015422B2/en | AMPM/access-map pattern match, wildcards, quality factor |
| Pointer chasing prediction | 2013-05-09 | Grant | US9116817B2, https://patents.google.com/patent/US9116817B2/en | Early scheduling of dependent loads with LSU-internal forwarding; a supporting mechanism for IMA/pointer chasing |
| Prefetching across page boundaries in hierarchically cached processors | 2012-11-29 | Grant | US9047198B2, https://patents.google.com/patent/US9047198B2/en | Upper-level prefetcher obtains the next-page translation early so the lower-level stream prefetcher does not stall at the page boundary |
| Converting memory accesses near barriers into prefetches | 2012-07-17 | Grant | US8856447B2, https://patents.google.com/patent/US8856447B2/en | Loads/stores blocked behind a memory barrier are converted into prefetch requests; latency hiding for ARM DMB/DSB scenarios |
| Coordinated prefetching in hierarchically cached processors | 2012-03-20; EP filed 2013-03-18 | Grant | US9098418B2, https://patents.google.com/patent/US9098418B2/en; EP2642398B1 / EP2642398A1, https://patents.google.com/patent/EP2642398A1/en | Coordination across cache levels, stream training, prefetch target selection |
| Prefetch unit | 2009-01-07 | Grant | US7779208B2, https://patents.google.com/patent/US7779208B2/en; continuation US7996624B2, https://patents.google.com/patent/US7996624B2/en | Multiple active streams, software/hardware-initiated data cache prefetch |
Stream Prefetching
Sequential prefetch boost (Apple, US10346309B1, filed 2017-04-26)
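- Background: Sequential stream prefetchers are sometimes trained conservatively and cannot extend the prefetch distance quickly enough to hide memory latency.
- Core design: Once a stable sequential access pattern / stream is detected, boost prefetch behavior, for example by increasing the depth or advancing the prefetch position more aggressively.
- Relevance: This is a conventional stream-prefetch enhancement, but it can serve as a low-complexity, high-coverage component of Apple's data prefetch subsystem.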
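The boost behavior can be illustrated with a toy model. This is only a sketch under assumed parameters: the class name, the two depths, and the "four consecutive +1 accesses" stability rule are hypothetical and are not taken from the patent.

```python
class SequentialBoostStream:
    """Toy model: prefetch depth grows once a sequential run is stable."""

    def __init__(self, base_depth=2, boost_depth=8, stable_run=4):
        self.base_depth = base_depth      # depth while still training (assumed)
        self.boost_depth = boost_depth    # depth after the boost (assumed)
        self.stable_run = stable_run      # consecutive +1 accesses required (assumed)
        self.last_line = None
        self.run_length = 0

    def access(self, line):
        """Observe a demand access to cache line `line`; return lines to prefetch."""
        if self.last_line is not None and line == self.last_line + 1:
            self.run_length += 1
        else:
            self.run_length = 0           # pattern broken: fall back to base depth
        self.last_line = line
        boosted = self.run_length >= self.stable_run
        depth = self.boost_depth if boosted else self.base_depth
        return [line + i for i in range(1, depth + 1)]


stream = SequentialBoostStream()
depths = [len(stream.access(line)) for line in range(100, 106)]
print(depths)   # depth per access as the run stabilizes
```

In this model the depth stays at the base value while the run is short and jumps to the boosted value once the run is long enough; a non-sequential access resets it.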
Prefetch unit (Apple / originally P.A. Semi lineage, US7779208B2, filed 2009-01-07; continuation US7996624B2, filed 2010-07-06)
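- Background: Early prefetch patents assigned to Apple still centered on data cache stream prefetching.
- Core design: The prefetch unit is coupled to the data cache and maintains multiple active prefetch streams at once; a stream can be initiated by a software prefetch instruction or by hardware on a load/store miss.
- Relevance: Represents the early Apple / P.A. Semi baseline for data-cache stream prefetching; the later AMPM/access-map and multi-level control work can be read as a systematic extension of it.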
Prefetching across page boundaries in hierarchically cached processors (Apple, US9047198B2, filed 2012-11-29)
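- Background: Lower-level prefetch units usually run ahead of upper-level prefetch units, and stall at a virtual page boundary if they lack the next-page translation.
- Core design: As it approaches a page boundary, the upper-level prefetch unit requests the next-page translation in advance and passes it to the lower-level prefetch units before they reach the end of the current page, so they can jump to the next physical page and keep prefetching.
- Relevance: Not DMP/IMA, but it completes Apple's multi-level stream prefetch design with respect to page boundaries, TLB/translation, and L2/L3 prefetch continuity.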
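A minimal sketch of the handoff, assuming 4 KiB pages, 64-byte lines, a dictionary standing in for the page table, and an 8-line lookahead window. All names and constants here are hypothetical; the example only shows why the lower level stalls without the translation and resumes once it has one.

```python
PAGE_SIZE = 4096                       # assumed page size in bytes
LINE_SIZE = 64                         # assumed cache-line size in bytes
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE

# Hypothetical page table: virtual page number -> physical page number.
PAGE_TABLE = {10: 77, 11: 5}


class LowerLevelPrefetcher:
    """Runs ahead on physical addresses; needs a translation to cross a page."""

    def __init__(self, phys_page, line_in_page):
        self.phys_page = phys_page
        self.line = line_in_page
        self.next_phys_page = None     # filled in by the upper level

    def step(self):
        """Return the next physical line address to prefetch, or None if stalled."""
        if self.line + 1 < LINES_PER_PAGE:
            self.line += 1
        elif self.next_phys_page is not None:
            self.phys_page, self.next_phys_page = self.next_phys_page, None
            self.line = 0
        else:
            return None                # page boundary reached without a translation
        return self.phys_page * PAGE_SIZE + self.line * LINE_SIZE


def upper_level_observe(virt_addr, lower, lookahead_lines=8):
    """Near the end of the page, obtain the next-page translation early."""
    vpn, offset = divmod(virt_addr, PAGE_SIZE)
    lines_left = LINES_PER_PAGE - offset // LINE_SIZE
    if lines_left <= lookahead_lines and lower.next_phys_page is None:
        lower.next_phys_page = PAGE_TABLE.get(vpn + 1)


lower = LowerLevelPrefetcher(phys_page=77, line_in_page=62)
first = lower.step()                   # last line of physical page 77
stalled = lower.step()                 # None: no next-page translation yet
upper_level_observe(10 * PAGE_SIZE + 60 * LINE_SIZE, lower)
resumed = lower.step()                 # first line of physical page 5
```

In the patent's design the translation is meant to arrive before the lower level reaches the boundary; the stalled step above is included only to show the case the mechanism avoids.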
Spatial Prefetcher
Access map-pattern match based prefetch unit for a processor (Apple, US9015422B2, filed 2013-07-16)
- Background: Out-of-order execution adds noise to the cache access map, and a fixed stride/stream model cannot express many irregular-but-repeatable patterns.
- Core design: An access map memory records the access state of each cache block within a region, and an access pattern memory is matched against it; patterns may contain wildcards, and a quality factor controls the prefetch rate.
- Relevance: This is the foundational patent for Apple's AMPM-style data prefetcher and is closely related to region/spatial-footprint prefetchers.
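The interaction of access map, wildcard pattern, and quality factor can be sketched as follows. This is an illustrative model, not the patented circuit: the one-character block states, the pattern encoding, and the credit-style quality counter are all assumptions made for the example.

```python
REGION_BLOCKS = 16   # assumed number of cache blocks tracked per region


def matches(access_map, pos, pattern):
    """Pattern covers the blocks ending at `pos`; '*' matches any state."""
    start = pos - len(pattern) + 1
    if start < 0:
        return False
    window = access_map[start:pos + 1]
    return all(p == '*' or p == w for p, w in zip(pattern, window))


class AccessMapPrefetcher:
    def __init__(self, patterns, quality=4):
        self.map = ['.'] * REGION_BLOCKS   # '.' untouched, 'A' accessed, 'P' prefetched
        self.patterns = patterns           # list of (pattern, prefetch offsets)
        self.quality = quality             # assumed credit counter

    def access(self, block):
        """Record a demand access; return the blocks to prefetch."""
        if self.map[block] == 'P':
            self.quality += 1              # a prefetched block was used: earn a credit
        self.map[block] = 'A'
        for pattern, offsets in self.patterns:
            if matches(self.map, block, pattern):
                issued = []
                for off in offsets:
                    target = block + off
                    in_region = 0 <= target < REGION_BLOCKS
                    if in_region and self.map[target] == '.' and self.quality > 0:
                        self.map[target] = 'P'
                        self.quality -= 1  # each prefetch spends a credit
                        issued.append(target)
                return issued
        return []


pf = AccessMapPrefetcher([("A*A", [1, 2])])
r0 = pf.access(0)    # too little history, nothing issued
r2 = pf.access(2)    # block 1 not seen yet; the wildcard still lets "A*A" match
r4 = pf.access(4)    # hit on a prefetched block keeps the pattern going
```

The wildcard is what lets the pattern match even though block 1 was skipped, which mirrors the out-of-order noise described in the background bullet.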
IMA/DMP
| Title | Filing Date | Number / URL | Main focus |
|---|---|---|---|
| Content-directed prefetch circuit with quality filtering | 2016-08-25 | US9886385B1, https://patents.google.com/patent/US9886385B1/en | Pointer candidates found in loaded cache lines, filtered by a quality factor table |
| Prefetch circuit for a processor with pointer optimization | 2015-06-24 | US9971694B1, https://patents.google.com/patent/US9971694B1/en; continuation US10402334B1, https://patents.google.com/patent/US10402334B1/en | Pointer-aware load/store prefetch |
CDP: Content-directed prefetch circuit with quality filtering (Apple, US9886385B1, filed 2016-08-25)
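- Background: Pointer chasing and linked data structures are poorly covered by simple stride/stream prefetching, while identifying pointer candidates from cache line contents is prone to false positives.
- Core design: Search the loaded cache line for memory pointer candidates and filter them through a quality factor table; the table can be indexed by context such as the PC and the relative cache-line offset, and a prefetch is allowed only once the threshold is reached.
- Relevance: Among the published patents, this is the text closest to Apple's CPU DMP: it explicitly scans the contents of data cache line fills, identifies memory pointer candidates, and uses quality feedback to decide whether to prefetch the cache line a candidate points to.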
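A rough sketch of the scan-then-filter flow described above. The candidate test (a word sharing its upper 32 address bits with the line being filled), the 2-bit saturating counters, and the threshold are assumptions chosen for illustration; the patent text summarized here only states that candidates are filtered through a quality factor table indexed by context such as PC and relative offset.

```python
WORDS_PER_LINE = 8     # assumed: 64-byte line of 8-byte words
LINE_SIZE = 64
UPPER_SHIFT = 32       # assumed heuristic: candidate shares upper bits with the line address
THRESHOLD = 2          # assumed quality threshold


class ContentDirectedPrefetcher:
    def __init__(self):
        self.quality = {}              # (pc, word offset) -> saturating counter

    def on_fill(self, pc, line_addr, words):
        """Scan a filled line; return the line addresses worth prefetching."""
        targets = []
        for offset, value in enumerate(words[:WORDS_PER_LINE]):
            same_upper = (value >> UPPER_SHIFT) == (line_addr >> UPPER_SHIFT)
            is_candidate = value != 0 and same_upper
            if is_candidate and self.quality.get((pc, offset), 0) >= THRESHOLD:
                targets.append(value // LINE_SIZE * LINE_SIZE)
        return targets

    def feedback(self, pc, offset, useful):
        """Adjust the counter once a candidate is (or is not) later dereferenced."""
        q = self.quality.get((pc, offset), 0) + (1 if useful else -1)
        self.quality[(pc, offset)] = max(0, min(3, q))


cdp = ContentDirectedPrefetcher()
line = [0x7F00_0000_1040, 42, 0, 0, 0, 0, 0, 0]    # word 0 looks like a pointer
before = cdp.on_fill(pc=0x400, line_addr=0x7F00_0000_2000, words=line)
cdp.feedback(0x400, 0, useful=True)
cdp.feedback(0x400, 0, useful=True)
after = cdp.on_fill(pc=0x400, line_addr=0x7F00_0000_2000, words=line)
```

Before any feedback the candidate is recognized but filtered out; only after the (PC, offset) context has proven useful twice does the same fill produce a prefetch.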
AMPM-Pointer: Prefetch circuit for a processor with pointer optimization (Apple, US9971694B1, filed 2015-06-24; US10402334B1, filed 2018-04-09)
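- Background: In pointer-heavy code, load/store sequences may contain target addresses that could be identified early, but ordinary AMPM/stride mechanisms cover them poorly.
- Core design: Add a pointer optimization to the processor prefetch circuit that controls pointer-aware prefetching around load/store accesses, cache state, and prefetch requests.
- Relevance: Together with "Content-directed prefetch circuit with quality filtering", it forms Apple's patent coverage of pointer and pointer-like data accesses.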
Pointer Chasing Prediction
| Title | Filing Date | Number / URL | Main focus |
|---|---|---|---|
| Reducing latency for pointer chasing loads | 2014-04-29 | US9710268B2, https://patents.google.com/patent/US9710268B2/en | Load-to-load/store address-generation bypass for pointer chasing; not a prefetcher, but it explains Apple's hardware optimization of IMA/pointer-chasing latency |
Reducing latency for pointer chasing loads (Apple, US9710268B2, filed 2014-04-29)
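- Background: In pointer chasing, the result of a producer load is the address input of a younger dependent load/store; forwarding it through the register file / reservation station in the usual way adds load-to-load latency.
- Core design: When the producer load is predicted not to hit in the store queue, the dependent load/store can issue early; the producer load's result is bypassed directly from the data cache to the address generation unit to form the dependent load/store address.
- Relevance: Not a prefetcher, but it targets pointer-chasing latency directly and is an Apple load-use bypass optimization worth keeping alongside the IMA/DMP discussion.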
Pointer chasing prediction (Apple, US9116817B2, filed 2013-05-09)
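- Background: Pointer chasing, such as linked list traversal, creates a dependency chain of older producing load -> younger consuming load.
- Core design: When the scheduler predicts that the producing load's result is suitable for LSU-internal forwarding and a younger load depends on that result, it issues the younger load early; the LSU forwards the producing load's result to the address generation logic.
- Relevance: Likewise not a prefetcher but dependent-load scheduling; it targets pointer-chasing / indirect-address latency together with Apple's DMP/IMA work.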
Instruction Prefetching
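Multi-table signature prefetch (Apple, US11630670B2, filed 2021-07-21)
- Background: A single signature generation technique behaves inconsistently across control-flow paths in terms of history length and aliasing.
- Core design: Use multiple signature generation techniques and multiple signature prefetch tables, update them on training events, and use them for future instruction/control-flow prefetches.
- Relevance: It leans toward instruction/control-flow prefetch and should not be compared directly with data prefetchers, but its ideas on multi-table signatures, history hashing, and confidence management can be carried over to data delta/signature prefetchers.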
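To make the aliasing argument concrete, here is a small sketch with two signature techniques of different history length. Everything specific is assumed for illustration: the input (recent branch targets), the use of raw tuples in place of hardware hashes, instruction-cache misses as the training event, and the policy of preferring the longer-history table.

```python
from collections import deque


class MultiTableSignaturePrefetcher:
    def __init__(self):
        self.history = deque(maxlen=4)   # recent taken-branch targets (assumed input)
        self.tables = [{}, {}]           # one prefetch table per signature technique

    def _signatures(self):
        h = list(self.history)
        short_sig = tuple(h[-1:])        # technique 0: last target only
        long_sig = tuple(h)              # technique 1: up to four targets
        return [short_sig, long_sig]     # real hardware would hash these

    def branch(self, target):
        self.history.append(target)

    def train(self, missed_line):
        """Training event (assumed: an instruction-cache miss) updates every table."""
        for table, sig in zip(self.tables, self._signatures()):
            table[sig] = missed_line

    def predict(self):
        """Prefer the longer-history table when it has an entry (assumed policy)."""
        for table, sig in reversed(list(zip(self.tables, self._signatures()))):
            if sig in table:
                return table[sig]
        return None


p = MultiTableSignaturePrefetcher()
for target in (0x10, 0x20, 0x30):
    p.branch(target)
p.train(0x900)                 # path A ends in a miss on line 0x900

p.history.clear()
for target in (0x50, 0x30):
    p.branch(target)
short_hit = p.predict()        # only the short signature matches, so it aliases to path A
p.train(0xA00)                 # path B actually misses on line 0xA00

p.history.clear()
for target in (0x10, 0x20, 0x30):
    p.branch(target)
long_hit = p.predict()         # the long signature still distinguishes path A
```

The short-history table gives early coverage but aliases the two paths; the long-history table keeps them apart, which is the trade-off the background bullet refers to.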
Prefetch Throttling
Deny list for a memory prefetcher circuit (Apple, US12306762B1, filed 2023-05-24)
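- Background: An aggressive prefetcher brings unused cache lines into the cache, causing pollution and wasting bandwidth and power.
- Core design: On a cache eviction, read the prefetch indicator; if a prefetched line is evicted untouched, add its address or address group to a prefetch deny list. Later prefetch requests that hit an active deny-list entry are rejected.
- Relevance: A very direct industrial useless-prefetch filter that can serve as a lightweight negative-feedback throttle alongside FDP/PPF/NST.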
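The eviction-driven feedback loop can be sketched with a tiny LRU cache. The cache organization, the capacities, and the use of capacity-based replacement to model deny-list entries becoming inactive are assumptions for the example, not details from the patent.

```python
from collections import OrderedDict


class DenyListCache:
    """Tiny fully associative LRU cache with a prefetch deny list."""

    def __init__(self, capacity=2, deny_capacity=4):
        self.lines = OrderedDict()     # line address -> {"prefetched", "touched"}
        self.capacity = capacity
        self.deny = OrderedDict()      # insertion-ordered set of denied lines
        self.deny_capacity = deny_capacity

    def _fill(self, line, prefetched):
        if len(self.lines) >= self.capacity:
            victim, state = self.lines.popitem(last=False)
            if state["prefetched"] and not state["touched"]:
                self.deny[victim] = True           # evicted untouched: useless prefetch
                if len(self.deny) > self.deny_capacity:
                    self.deny.popitem(last=False)  # oldest entry is no longer active
        self.lines[line] = {"prefetched": prefetched, "touched": not prefetched}

    def demand(self, line):
        if line in self.lines:
            self.lines[line]["touched"] = True
            self.lines.move_to_end(line)
        else:
            self._fill(line, prefetched=False)

    def prefetch(self, line):
        """Return True if the prefetch was issued, False if denied or already cached."""
        if line in self.deny or line in self.lines:
            return False
        self._fill(line, prefetched=True)
        return True


cache = DenyListCache()
first = cache.prefetch(0x100)     # issued
cache.demand(0x200)
cache.demand(0x300)               # evicts 0x100, which was never touched
second = cache.prefetch(0x100)    # rejected by the deny list
```

A design point worth noting: the filter needs no accuracy counters in the prefetcher itself, only one prefetch bit and one touched bit per line plus the list.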
Prefetch throttling in a multi-core system (Apple, US9904624B1, filed 2016-04-07)
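- Background: When multiple cores share a lower-level cache / memory subsystem, one core's prefetches can block another core's demand fetches.
- Core design: On the shared/external cache side, track per processor the request-queue occupancy of demand fetches and low-confidence prefetches; based on thresholds and historical samples, send throttle control to the cores and progressively restrict low-confidence prefetching.
- Relevance: Shares the goal of multi-core prefetch throttling schemes such as HPAC/SPAC/NST, but puts more emphasis on shared-cache queue pressure and per-core feedback.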
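A simplified sketch of occupancy-based, per-core throttling. It is deliberately reduced: it looks only at each core's share of low-confidence prefetches in the queue, averaged over a fixed window, and steps a throttle level up or down by one. The threshold, window length, and level range are hypothetical.

```python
class SharedCacheThrottle:
    """Per-core throttle level derived from request-queue occupancy samples."""

    def __init__(self, cores, high=0.5, window=4, max_level=3):
        self.samples = {core: [] for core in cores}
        self.level = {core: 0 for core in cores}
        self.high = high               # assumed occupancy threshold
        self.window = window           # assumed number of samples to average
        self.max_level = max_level     # assumed strongest throttle setting

    def sample(self, queue):
        """`queue` holds (core, kind) entries; kind is "demand" or "lowconf_pf"."""
        for core in self.samples:
            count = sum(1 for c, kind in queue if c == core and kind == "lowconf_pf")
            share = count / len(queue) if queue else 0.0
            self.samples[core] = (self.samples[core] + [share])[-self.window:]
            if len(self.samples[core]) == self.window:
                avg = sum(self.samples[core]) / self.window
                step = 1 if avg > self.high else -1
                new_level = self.level[core] + step
                self.level[core] = max(0, min(self.max_level, new_level))
        return dict(self.level)


throttle = SharedCacheThrottle(cores=[0, 1])
busy = [(0, "lowconf_pf")] * 6 + [(1, "demand")] * 2
levels = [throttle.sample(busy) for _ in range(5)][-1]
```

Core 0 fills most of the queue with low-confidence prefetches, so its throttle level climbs one step per sample once the window is full, while core 1 stays unthrottled.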
Multi-Prefetcher Management
Secondary prefetch circuit that reports coverage to a primary prefetch circuit to limit prefetching by primary prefetch circuit (Apple, US11176045B2, filed 2020-03-27)
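- Background: When several data prefetchers use different mechanisms, they may cover the same data stream at the same time, causing over-prefetching, low accuracy, and extra power.
- Core design: The primary prefetch circuit generates prefetches for demand accesses and invokes the secondary prefetch circuit; once the secondary circuit reaches a threshold confidence, it returns a coverage hint, and the primary circuit reduces the number of prefetches for the corresponding demand access / access map.
- Relevance: Very close to the multi-prefetcher coordination problem; the patent explicitly mentions a large stride prefetch circuit and a Spatial Memory Streaming (SMS) prefetch circuit as possible secondary prefetchers.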
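The coverage-hint handshake can be sketched with a stride detector standing in for the secondary circuit. The confidence threshold and the two depths are assumptions for illustration; only the overall flow (primary invokes secondary, secondary reports coverage once confident, primary backs off) follows the description above.

```python
class SecondaryStridePrefetcher:
    """Large-stride detector that reports coverage once it is confident."""

    def __init__(self, threshold=2):
        self.last = None
        self.stride = None
        self.confidence = 0
        self.threshold = threshold     # assumed confidence threshold

    def observe(self, addr):
        """Return True (coverage hint) when this circuit covers the stream."""
        if self.last is not None:
            stride = addr - self.last
            if stride == self.stride:
                self.confidence += 1
            else:
                self.stride, self.confidence = stride, 0
        self.last = addr
        return self.confidence >= self.threshold


class PrimaryPrefetcher:
    def __init__(self, secondary, depth=4, covered_depth=1):
        self.secondary = secondary
        self.depth = depth                   # normal prefetch count (assumed)
        self.covered_depth = covered_depth   # reduced count under a hint (assumed)

    def demand(self, addr, line=64):
        covered = self.secondary.observe(addr)   # primary invokes the secondary
        depth = self.covered_depth if covered else self.depth
        return [addr + i * line for i in range(1, depth + 1)]


primary = PrimaryPrefetcher(SecondaryStridePrefetcher())
counts = [len(primary.demand(addr)) for addr in range(0, 5 * 4096, 4096)]
```

With a steady 4 KiB stride, the primary issues its full depth until the secondary becomes confident and then drops to the reduced depth.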
Unified prefetch circuit for multi-level caches (Apple, US10180905B1, filed 2016-04-07; continuation US10621100B1, filed 2018-12-05)
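- Background: The best fill level for a given access pattern differs between L1/data cache, L2/LLC, and other levels; separate prefetchers make it hard to control accuracy and target level in a unified way.
- Core design: A prefetch circuit based on access map-pattern matching observes all load/store demand accesses, generates prefetches aimed at different cache levels according to the pattern, and uses per-cache quality factors to control prefetching at each level.
- Relevance: Essentially the multi-level version of Apple's AMPM-style data prefetcher; a useful reference for L1/L2/LLC target selection and per-level feedback.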
Coordinated prefetching in hierarchically cached processors (Apple, US9098418B2, filed 2012-03-20; EP2642398B1 / EP2642398A1 filed 2013-03-18)
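- Background: In a multi-level cache hierarchy, prefetchers at different levels that train and issue requests independently tend to duplicate work, fill the wrong level, or interfere with one another.
- Core design: A single unified training mechanism trains the streams generated by the core; when the core sends a prefetch request to a lower-level cache, it carries a stream ID and training information, from which the lower-level cache generates its own prefetches.
- Relevance: Closer to L1/L2/LLC coordination than a standalone AMPM; fits in the same discussion as Intel physical-page prefetch, AMD throttling, and Arm pattern selection.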
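A small sketch of a lower-level cache that has no trainer of its own and relies on what the core sends. The request fields (a line address, a stream ID, and a stride as the training information) and the fixed extra depth are assumptions; addresses are in units of cache lines and the stride is assumed to be nonzero.

```python
from dataclasses import dataclass


@dataclass
class PrefetchRequest:
    addr: int        # line address the core is prefetching
    stream_id: int   # identifies the trained stream
    stride: int      # training information: line stride, nonzero (assumed field)


class LowerLevelCache:
    """Generates its own prefetches from the core's stream information."""

    def __init__(self, extra_depth=4):
        self.extra_depth = extra_depth   # how far past the core this level runs (assumed)
        self.streams = {}                # stream_id -> last line prefetched at this level

    def on_core_prefetch(self, req):
        """Extend the stream beyond the core's request without re-training."""
        last = self.streams.get(req.stream_id, req.addr)
        target = req.addr + req.stride * self.extra_depth
        lines = list(range(last + req.stride, target + req.stride, req.stride))
        if lines:
            self.streams[req.stream_id] = lines[-1]
        return lines


l2 = LowerLevelCache()
first = l2.on_core_prefetch(PrefetchRequest(addr=100, stream_id=7, stride=1))
second = l2.on_core_prefetch(PrefetchRequest(addr=101, stream_id=7, stride=1))
```

Because the stream ID lets the lower level remember where it left off, the second request only adds the one new line instead of re-requesting the overlap.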
Coprocessor Prefetcher
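Coprocessor prefetcher (Apple, US11755333B2 / US12050918B2 / US20250094174A1, original filed 2021-12-10; latest continuation filed 2024-07-25)
- Background: When a processor and a coprocessor execute together, the addresses of coprocessor operand data are generated by the processor-side code sequence; waiting until the coprocessor executes before fetching the data exposes extra latency.
- Core design: The coprocessor prefetcher monitors the code sequence fetched by the processor, identifies coprocessor instructions, captures the operand memory addresses the processor generates, and issues prefetches into a cache accessible to the coprocessor before the coprocessor executes.
- Relevance: A prefetcher patent family that Apple was still continuing in 2023-2025, which suggests Apple is still building out coverage of indirect operand fetch / heterogeneous-execution latency hiding.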
Power
Prefetch circuit with global quality factor to reduce aggressiveness in low power modes (Apple, US10331567B1, filed 2017-02-17)
- Background: In a mobile SoC, prefetch performance has to be balanced dynamically against energy and battery life.
- Core design: In addition to per-entry quality factors, the prefetch circuit maintains a global quality factor; when the overall quality tracked across outstanding prefetches is low, particularly in low-power modes, it generates fewer prefetch requests.
- Relevance: By placing AMPM/access-map, large-stride prefetch, an SMS-like spatial mechanism, and global throttling in one prefetch subsystem, it is a key connecting point in Apple's data prefetcher patent family.
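One way to picture the two-level control is a pair of credit counters. This is a sketch under assumptions: both quality factors are modeled as simple credit counts, the initial values are arbitrary, and the global factor is applied as a cap only while a hypothetical `low_power` flag is set.

```python
class GlobalQualityPrefetcher:
    def __init__(self, entry_credits=4, global_credits=6):
        self.entry_credits = entry_credits   # initial per-entry quality factor (assumed)
        self.entries = {}                    # entry id -> remaining per-entry credits
        self.global_q = global_credits       # global quality factor (assumed counter)
        self.low_power = False

    def request(self, entry, wanted):
        """Return how many of `wanted` prefetches are issued for `entry`."""
        budget = self.entries.setdefault(entry, self.entry_credits)
        if self.low_power:
            budget = min(budget, self.global_q)   # global cap applies in low power mode
        granted = min(wanted, budget)
        self.entries[entry] -= granted
        self.global_q = max(0, self.global_q - granted)
        return granted

    def consumed(self, entry):
        """A prefetched line was used by a demand access: earn credits back."""
        self.entries[entry] += 1
        self.global_q += 1


pf = GlobalQualityPrefetcher()
normal = [pf.request("A", 4), pf.request("B", 4)]   # limited by per-entry credits only
pf.low_power = True
low = pf.request("C", 4)                            # global factor is exhausted
pf.consumed("A")
after = pf.request("C", 4)                          # one credit earned back
```

In normal mode each entry spends its own credits freely; in low-power mode a fresh entry with full per-entry credits is still held back until earlier prefetches prove useful.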