background.tex

\section{Background}
\label{sec:background}
%The background section covers knowledge that is necessary
%for understanding the proposed solution, but you cannot expect
%the audience of the paper to be familiar with. This is commonly
%the case when you propose solutions that incorporate
%knowledge from different research areas. An example would
%be an implementation of a compression scheme between
%memory and processor in order to decrease memory bandwidth
%utilization. It is very likely that you would need to explain
%the compression scheme, since most computer architecture
%researchers would not be familiar with it.

% == This stuff can probably be assumed known ==
%The speed of processors have been improving much faster than the
%speed of memory and thus memory is not able to keep up with the
%processors. Processors have to wait for too long if needed data
%must be fetched from memory. Caches that are smaller, but faster
%than main memory are thus used to store data which is in use by
%the processor. As the cache needs to be a lot faster than main
%memory, it has to be small to not be too expensive. Therefore it
%is limited what can be stored in the cache

In a cache hierarchy with no prefetcher, data is fetched from a lower level every time a cache miss occurs.
By prefetching data, ensuring that it is present in cache at the right time, we can reduce the amount of misses and thereby improve performance.
There are several prefetching strategies available.

\subsubsection{Sequential prefetching}

The simplest form of prefetching is to just prefetch the next block
as well when fetching a block the processor needs. This can
sometimes be improved by changing the prefetch distance. With a
different prefetch distance, the second or a later block is
prefetched instead of the next block.
Sometimes the processor needs the next block before it is in
the cache even if it was prefetched. In these cases prefetching a
later block can improve performance.

In addition, more than one block can be prefetched at a time.
We refer to this as increasing the prefetching degree. If the
running program traverses memory in a linear fashion, this can
improve performance. If too many blocks are prefetched, it can cause cache pollution, where blocks that
are still in use can be evicted from the cache, which can degrade performance.

\subsubsection{Stride Directed Prefetching}

With stride directed prefetching (SDP), the memory address requested by an instruction
is stored together with the program counter address of the instruction
and a valid bit is set. When the instruction is encountered again, the stride
between the address accessed the last time, and the address currently being accessed
is calculated. Then the stride is added to the address currently
being accessed and the block containing the current address is prefetched.

\subsubsection{Reference Prediction Tables}

The reference prediction table (RPT) technique is similar to SDP, but in
addition stores the stride and a state. The first time an
instruction is encountered, an entry for that address is added to a table
and the entry is set to the initial state. The second time the
instruction is encountered, the stride is calculated and stored as in SDP.
The state changes to training. The next time the instruction is
encountered, the stride is calculated again, and compared to the
currently stored stride. If they are equal, the state changes to
prefetching and the stride added to the current address is
prefetched.

\subsubsection{Global History Buffer}

With a global history buffer (GHB), all accesses by the same
instructions are stored in a linked list. Each entry points to
the previous entry which came from the same instruction. This
history can be used by a variety of different algorithms to
determine what to prefetch.

\subsubsection{Delta Correlation}

Using delta correlation (DC), the access history is traversed and the
deltas between the addresses accessed are stored. Then the delta
history is searched to see if a pair equal to the most recent pair
of deltas exists. If such a pair is found, the next delta, or
several of the next deltas, are added to the current address,
and the result is prefetched.