Final project for CS246 - Advanced Computer Architecture
This is a simulation of a Value Prediction Unit (VPU) as introduced in the Exceeding the Dataflow Limit via Value Prediction [Mikko H. Lipasti et. all, 1996], using Intel Pin Utility
The basic structure includes a Value Prediction Table (VPT), containing the previous write values for each address indexed by the lowest n bits of the instruction pointer (PC), and a Classification Table (CT) containing an y bit saturating counter that represents the likelihood of a positive prediction, indexed by the lowest m bits of PC address
Values for m,n, and y are all configurable knob parameters.
Later versions of this VPU includes an implimentation of variable-depth value histories, where multiple previous values can be stored in the VPT under a Least Recently Used (LRU) eviction policy.
The most recent version of this VPU adds the option to add a Victim Cache to the the VPT. The victim cache stores values evicted from the VPT with an x bit LRU stack.
install Intel Pin Utility
make
pin -t obj-intel64/main.so -outfile results.out -size 10 -- /bin/ls
To create a 1024 entry VPT (and default 256 entry), and simulate on /bin/ls
bash command
CLI arguments
ARG | DEFAULT | DESCRIPTION |
---|---|---|
"outfile" | "tool.out" | "Output file for the pintool" |
"pid" | "0" | "Append pid to output" |
"inst_limit" | "1000000000" | "Quit after executing x number of instructions" |
"inst_cat" | "ALL" | "What category of instructions?" (Not supported) |
"size" | "8" | "Size of Value Prediction table in bits. Total length = 2**size" |
"CTbits" | "1" | "Size of CT prediction history counter in bits" |
"CTsize" | "8" | "Size of Classification table in bits. Total length = 2**size"" |
"HistDepth" | "1" | "Value history size" |
"VictimCache" | "0" | "Entries in victim cache" |
Instructions are catagorized by the following for processing:
Instruction Category | Description |
---|---|
I_PURE_LOAD | Integer Pure Load Instruction (e.g. mov esi, dword ptr [rsi] ) |
I_LOAD_ARITH | Integer Load + Arithmetic Instruction (e.g. add r8, qword ptr [rsi+0x10] ) |
I_ARITH_1OP | Integer Pure Arithmetic Instruction, 1 operand (e.g add rcx, 0x40 ) |
I_ARITH_2OP | Integer Pure Arithmetic Instruction, 2 operand (e.g add rcx, rax ) |
I_REG_MOV | Integer Register Move Instruction (e.g. mov rax, rdi ) |
F_PURE_LOAD | Floating Point Pure Load Instruction (e.g. movdqa xmm5, xmmword ptr [rdi] ) |
F_LOAD_ARITH | Floating Point Load with Arithmetic Instruction (e.g. pminub xmm4, xmmword ptr [rdi+0x30] ) |
F_PURE_ARITH | Floating Point Pure Arithmetic Instruction (e.g. paddd xmm0, xmm6 ) |
F_REG_MOVE | Floating Point Register Move Instructions (e.g. movd xmm0, esi ) |
UNKNOWN | Anything not classified above |
Full results can be found here: Report
Instruction Value Locality
The following bars indicated the % of cases where the output of an operation was the same as the previous time the operation was executed (i.e. history depth of 1). The dashed line shows the % of instructions seen for each catagory.
We used a simple calculation to model the speedup for an architecture implementing our Value Prediction Unit. The maximum possible speedup under this would be 2x.