-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathintroduction.tex
55 lines (45 loc) · 4.07 KB
/
introduction.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
\section{Introduction}
\label{sec:introduction}
In 1965 Gordon E. Moore observed that the transistor density on integrated circuits doubles approximately every 2 years,
and predicted that it would continue to do so for many years~\cite{bib:moore}.
Up until recently, this prediction has held true~\cite{bib:moore-slowdown}.
Processor performance has increased alongside transistor density.
However, memory technology has not improved at the same rate as the processor.
As a result there is a gap between processor and memory performance, known as the memory gap.
Increasing processor performance means little if the processor spends a lot of time idle, waiting for data from memory.
Modern computer architectures often include different mechanisms for reducing this time.
One such mechanism is the cache hierarchy.
The idea behind the cache hierarchy is to have frequently accessed data in small, fast memories closer to the processor.
By reducing the total amount of requests that have to go through to main memory, the processor spends less time idle.
Modern architectures often have 3-4 levels of cache between processor cores and main memory.
L1 and L2 are often local to each core, while L3 and L4 are shared.
While cache hierarchies reduce the effect of the memory gap, performance can still be improved.
When a program requests data that is not in cache, we say that a cache miss has occurred.
Every time a miss happens, the latency penalty of accessing main memory is incurred.
To minimize the number of cache misses, and thereby increasing performance, a technique called prefetching is utilized.
By predicting what data is required by the processor, we can avoid a cache miss by fetching it in advance.
In this paper we evaluate various prefetching techniques across a collection of benchmarks.
We also present a prefetcher that combines ideas from several other schemes and
we show how and why this approach is outperformed by simpler schemes.
Our hybrid prefetcher achieves an average speedup of 1.08 while the best prefetcher we evaluate achieves an average speedup of 1.09.
Section~\ref{sec:background} provides a background on various prefetcher strategies.
In Section~\ref{sec:prefetcher} our final prefetcher design is presented.
Subsequently, our methodology and process is reviewed in Section~\ref{sec:methodology}.
Finally, we present and discuss our findings in Section~\ref{sec:results-and-discussion}, provide an overview of some related works in Section~\ref{sec:related-work} and a conclusion in Section~\ref{sec:conclusion}.
%In modern computer architecture, the memory hierarchy plays an important part.
%Without intelligent usage of caches and scheduling of requests to main memory, the processor will spend an undesirable amount of time idle while waiting for data.
%Prefetching aims to reduce cache misses and increase data availability by attempting to predict which data will be needed in the future and fetch it early. \todo{rephrase}
%Designing a prefetcher that performs well for all programs is difficult.
%Depending on the type of program, different memory access patterns arise, making it necessary for the prefetcher to be adaptive.
%In this paper, we present a prefetcher design that measurably increases performance for a wide range of program types.
%Our final prefetcher is an implementation of global history buffer with delta correlation.
%The memory accesses made by the instructions are stored in a linked list.
%When an instruction is encountered again, the linked list is traversed and the deltas between the earlier accesses are examined to see wether a pair of deltas equal to the most recent ones exist.
%f it does, a block is prefetched using the same delta.
%The introduction section introduces the larger research area
%the paper is a part of and illustrates the concrete problem(s) at
%hand the paper tries to solve. It explains the proposed solution
%from a 20.000 feet abstraction level. Furthermore, it states
%the contributions of the paper and briefly highlights its main
%results. It finishes with an outline of the paper, giving a short
%explanation of the contribution/meaning of each section.