Look Before You Leap: Precision Instruction Supply via SmartScout

Xuefeng Zhang, Peng Qu, Tingji Zhang, Fang Su, Zhe Pan, Youhui Zhang

May 2026

Abstract

Modern high-performance processors extensively employ Fetch-Directed Instruction Prefetching (FDIP) to mitigate instruction supply bottlenecks. However, the efficacy of FDIP is fundamentally constrained by the accuracy of the Branch Prediction Unit (BPU). The Branch Target Buffer (BTB), a critical component of the BPU, faces severe capacity bottlenecks. Pre-decoding-based prefetching offers a remedy, but existing approaches suffer from two critical impediments: (1) the Noise Dilemma, a suboptimal trade-off between coverage and accuracy; and (2) Inefficient Miss Resolution, where current designs rely on reactive recovery or stalls and fail to leverage available front-end slack for proactive correction. To address these challenges, we propose SmartScout, a high-accuracy and timely BTB prefetching architecture. SmartScout integrates two synergistic mechanisms: (1) Runtime-Based Noise Filtering, which leverages branch prediction confidence to isolate and prefetch only taken-biased branches, eliminating pollution at the source; and (2) FTQ In-Flight Correction, which exploits the "Verification Slack" within the populated Fetch Target Queue (FTQ) to detect and correct BTB misses before instructions enter the backend. Evaluated against state-of-the-art baselines, SmartScout delivers a 5% performance improvement at iso-hardware cost.
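To make the two mechanisms more concrete, the following is a minimal Python sketch of how they might interact. It is not SmartScout's actual design. It assumes per-PC 2-bit saturating counters as the "branch prediction confidence" signal. It also assumes a simplified FTQ entry that records only the start PC of each fetch block and the taken branch the BPU found. The names `TakenBiasFilter`, `FTQEntry`, `first_inflight_miss`, and the threshold value are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Kind(Enum):
    COND = "cond"  # conditional direct branch
    JUMP = "jump"  # unconditional direct jump or call


@dataclass
class TakenBiasFilter:
    """Hypothetical runtime noise filter built on per-PC 2-bit counters."""
    threshold: int = 3  # assumed: only the "strongly taken" state passes
    counters: Dict[int, int] = field(default_factory=dict)

    def update(self, pc: int, taken: bool) -> None:
        # Train on each resolved conditional branch (saturating at 0 and 3).
        c = self.counters.get(pc, 1)  # assumed start: weakly not-taken
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)

    def should_prefetch(self, pc: int, kind: Kind) -> bool:
        # Unconditional direct branches are always taken, so they always pass.
        if kind is Kind.JUMP:
            return True
        # Conditional branches pass only with high taken confidence; unseen
        # PCs are dropped rather than risk polluting the BTB.
        return self.counters.get(pc, 0) >= self.threshold


@dataclass
class FTQEntry:
    start_pc: int                       # first PC of the fetch block
    predicted_branch_pc: Optional[int]  # taken branch the BPU found; None = fall-through


# Pre-decoded branches per fetch block: (pc, kind, target), sorted by pc.
Predecoded = Dict[int, List[Tuple[int, Kind, int]]]


def first_inflight_miss(
    ftq: List[FTQEntry], predecoded: Predecoded, filt: TakenBiasFilter
) -> Optional[Tuple[int, int]]:
    """Scan queued FTQ entries (oldest first) and return (index, target) for
    the first block where pre-decode exposes a taken-biased branch that the
    BPU skipped over, i.e. a likely BTB miss; None if nothing needs fixing."""
    for i, entry in enumerate(ftq):
        for pc, kind, target in predecoded.get(entry.start_pc, []):
            if not filt.should_prefetch(pc, kind):
                continue  # filtered as noise: leave the BPU's prediction alone
            # First taken-biased branch in program order: the BPU missed it if
            # it predicted fall-through or a taken branch that comes later.
            bpu = entry.predicted_branch_pc
            if bpu is None or bpu > pc:
                return i, target
            break  # BPU already ends the block at (or before) this branch
    return None
```

Because the scan runs over entries still waiting in the FTQ, a detected miss can redirect fetch before the wrongly fetched instructions reach the backend. Low-confidence conditional branches are simply skipped, so they never trigger a correction.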

Publication

In ACM International Conference on Supercomputing 2026