methodology.tex

\section{Methodology}
\label{sec:methodology}
% \subsection{Description}
% The methodology section explains your experimental setup.
% It should state every bit of information that is necessary in
% order to be able to reproduce your results. It further covers
% your actions taken in order to ensure that you are really
% measuring what you think you are measuring as well as your
% actions that were taken in order to validate your results. The
% goals are to set up realistic experiments and be absolutely
% convinced that you took care of all side-effects as well as to
% state enough information such that your experiments are fully
% reproducible.

\subsection{Simulated Architecture}

We use the M5 computer architecture simulator~\cite{bib:m5} to
evaluate different prefetching techniques.  The simulated architecture
is that of the Alpha 21264\cite{bib:alpha-21264}, a superscalar,
out-of-order CPU.  The simulator is configured with a two level cache
system; level 1 is a split cache with 32 kB for instructions and 64 kB
for data, level 2 a combined cache of 1 MB.  Access to main memory
takes $30$ns over a $64$ bit memory bus clocked at $400$ MHz.

We implement a prefetcher as a set of C++ functions specified by the
memory interface of the simulator~\cite[Section 3.2]{bib:doc}.
\texttt{prefetch\_init()}, which is called when the simulation starts,
allows us to initialize data structures.  \texttt{prefetch\_access()}
is called every time the L2 cache is accessed.  Here, we decide if and
what we should prefetch.

The prefetcher is compiled into the simulator before running
benchmarks to evaluate its performance.

\subsection{Benchmarking}

The SPEC CPU2000\cite{bib:cpu2000} is a suite of benchmark programs
designed to test hardware performance.  To evaluate prefetcher
performance, we use a subset of the programs available in the suite.
They are as follows; \texttt{bzip2}, a compression algorithm,
\texttt{twolf}, a place-and-route package used in microchip design,
\texttt{wupwise}, a solver for a set of equations in the field of
quantum chromodynamics, \texttt{swim} a weather prediction program,
\texttt{applu} and \texttt{ammp}, differential equation solvers,
\texttt{galgel}, a solver for a particular problem in fluid dynamics,
\texttt{art}, a neural network, and \texttt{apsi}, a program used to
predict spread of air pollution.  These programs have different memory
access patterns, helping us to expose flaws in prefetching schemes.
The size of each history entry varies between SDP, RPT and GHB. For
each of these prefetchers, we set the maximum number of entries so
that it does not exceed the $8$kB memory usage limit. SDP 508 stores
entries, RPT 474, DC and hybrid 700.

\subsection{Performance measures}
Several different characteristics are used to measure the performance
of the prefetchers. The most important one is the speedup of a
prefetcher compared to running without a prefetcher:

$speedup = \frac{execution time_{without prefetcher}}{execution time_{with prefetcher}}$.

In additon, accuracy and coverage are useful measures. Accuracy measures
how often the prefetcher's prediction is correct:

$accuracy = \frac{good prefetches}{total prefetches}$.

Even if the prefetches were needed it is not guaranteed that they
were used. They may have been issued too late.

Coverage measures how many of the misses were avoided by the
prefetcher:

$coverage = \frac{good prefetches}{cache misses without prefetching}$.

\cite{bib:doc}