Stream-based Memory Access Specialization for General Purpose Processors
Authors: Zhengrong Wang, Tony Nowatzki
Venue: ISCA 2019
This paper presents changes to the architecture, ISA, and compiler to improve the performance of memory accesses in load/store streams. Streams are defined as "the dynamic sequence of memory operations associated with a static instruction, where the longest extent is defined as the entry and exit of the outermost containing loop." These can be characterized as affine (simple strides), indirect (based off a single pointer), or pointer-chasing. According to the authors' measurements, most streams are affine or indirect.
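To make the three categories concrete, here is a minimal sketch that generates the address sequence each kind of stream would produce. The function names and the dictionary-based linked list are my own illustration, not anything taken from the paper.

```python
from typing import Dict, List, Optional


def affine_stream(base: int, stride: int, count: int) -> List[int]:
    """Addresses touched by a strided access such as a[i] in a simple loop."""
    return [base + i * stride for i in range(count)]


def indirect_stream(base: int, elem_size: int, index_values: List[int]) -> List[int]:
    """Addresses touched by a[b[i]]: each address depends on a loaded index."""
    return [base + idx * elem_size for idx in index_values]


def pointer_chasing_stream(head: Optional[int],
                           next_ptr: Dict[int, Optional[int]]) -> List[int]:
    """Addresses touched by walking a linked list (p = p->next).

    next_ptr is a hypothetical stand-in for memory: it maps a node address
    to the address of the next node, or None at the end of the list.
    """
    addrs = []
    node = head
    while node is not None:
        addrs.append(node)
        node = next_ptr[node]
    return addrs
```

The affine sequence is fully known from the base, stride, and count. The indirect sequence needs the index values to be loaded first. The pointer-chasing sequence can only be discovered one node at a time, which is why it is the hardest of the three to fetch ahead.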
To accelerate the memory subsystem, code must be augmented with semantics that pass stream information to the proposed stream engine. Based on this and other information from the code, the stream engine can fetch data ahead of time. The authors also extend the design with an option for streams to bypass the cache. The mechanism as a whole outperforms 1000-instruction run-ahead execution as well as hardware prefetching designs.
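As a rough mental model of that interaction, the sketch below shows a toy stream engine for a single affine stream: software configures the stream once, and the engine then keeps a small FIFO filled ahead of the consumer. Everything here (the class name `ToyStreamEngine`, the methods `configure` and `load_and_step`, the FIFO depth, and memory modeled as a Python list indexed by word address) is a hypothetical simplification and does not reproduce the paper's actual ISA or microarchitecture.

```python
from collections import deque
from typing import List


class ToyStreamEngine:
    """Hypothetical model of one affine stream with a run-ahead FIFO."""

    def __init__(self, memory: List[int], fifo_depth: int = 4):
        self.memory = memory          # toy memory, indexed by word address
        self.fifo_depth = fifo_depth  # how far the engine may run ahead
        self.fifo = deque()
        self.next_addr = 0
        self.stride = 0
        self.remaining = 0

    def configure(self, base: int, stride: int, length: int) -> None:
        """Describe the whole stream up front, before the loop body runs."""
        self.fifo.clear()
        self.next_addr, self.stride, self.remaining = base, stride, length
        self._run_ahead()

    def _run_ahead(self) -> None:
        """Fetch upcoming elements until the FIFO is full or the stream ends."""
        while self.remaining > 0 and len(self.fifo) < self.fifo_depth:
            self.fifo.append(self.memory[self.next_addr])
            self.next_addr += self.stride
            self.remaining -= 1

    def load_and_step(self) -> int:
        """Consume the next element; the freed slot is refilled right away."""
        value = self.fifo.popleft()
        self._run_ahead()
        return value


# Usage: a strided walk over memory, consumed one element per "iteration".
memory = list(range(100, 120))
engine = ToyStreamEngine(memory, fifo_depth=4)
engine.configure(base=2, stride=3, length=5)
values = [engine.load_and_step() for _ in range(5)]
```

The point of the sketch is only that the access pattern is declared once, so the engine knows the future addresses instead of having to guess them from past misses as a hardware prefetcher does.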
Discussion: One of the key benefits of this approach seems to be prefetching beyond page boundaries, which hardware prefetchers typically cannot cross. The additional semantic information strengthens the approach, but at the cost of recompilation, ISA changes, etc. The paper also raises a potential side-channel security vulnerability, as prefetched accesses can bypass memory protection faults. Despite this, the paper establishes a new state of the art in what I would deem a well-studied area. The paper is well written and definitely worth a read if you are interested in memory subsystem improvements at the prefetcher/ISA level.