Project 1: Gabriel Naghi #18

Open
wants to merge 45 commits into
base: master
8ffe708
brute force kernel and helper implementation
Sep 5, 2016
6d5e5f2
step simulation impl
Sep 5, 2016
b274078
update for Moore100 computers
Sep 5, 2016
1f2907c
= != ==
gabenaghi Sep 6, 2016
941f9d7
remove incorrect averaging, add relative COM velocity calc. Looks pre…
Sep 7, 2016
312b489
fixed rule3 and update README
Sep 7, 2016
48872be
initial part 2.1 stuff
Sep 7, 2016
834ab74
kernComputeIndices impl
Sep 8, 2016
5077f67
kernIdentifyCellStartEnd impl
Sep 8, 2016
fbf50cf
started writing getNeighbors. Yet i know its crap
Sep 8, 2016
cb23e69
this commit actually has getNeighbors;
gabenaghi Sep 8, 2016
745582c
inital work on getNeighbors as described by @krupkad
Sep 8, 2016
4b82963
does getNeighbors imple, some updateScattered impl
Sep 8, 2016
7f93c65
add rules calculation to updateScattered
Sep 8, 2016
970c8a9
finish updateScattered impl
Sep 8, 2016
9559bbf
finished uniformGrid impl. does not build
Sep 8, 2016
2690585
fix build errors
Sep 9, 2016
823bea9
Revert "fixed rule3 and update README"
Sep 9, 2016
b4e890d
whoops. Start debugging 2.1 functionality
Sep 9, 2016
1f746fb
lots of int/float casting confusing. build status unknown
Sep 9, 2016
c0e138d
initial implementation work for coherent search
Sep 9, 2016
50ee99f
a couple fixes. More dbug
Sep 11, 2016
ee95cc1
a couple minor fixes, more debug. Boids seem to be gravitating to cer…
Sep 11, 2016
23c925d
fix wrong boid index for rule2
Sep 11, 2016
85c4174
add second pos buffer
Sep 12, 2016
f80c0b3
initial coherent implementation. Doesnt work properly
Sep 12, 2016
70183a1
implementation complete. Now need to clean up, readme, and profile
Sep 12, 2016
c2b4c50
readme edits
Sep 12, 2016
7c0df7a
some profiling features
Sep 12, 2016
da98383
needs work
gabenaghi Sep 13, 2016
6923df4
boids everywhere
gabenaghi Sep 13, 2016
ad8c79d
Update README.md
gabenaghi Sep 13, 2016
2c4040e
add simulation image
gabenaghi Sep 13, 2016
1e5ff00
Update README.md
gabenaghi Sep 13, 2016
2bc2bc4
Merge branch 'master' of https://github.com/gabenaghi/Project1-CUDA-F…
gabenaghi Sep 13, 2016
e82f6ba
Update README.md
gabenaghi Sep 13, 2016
c4de626
performace analysis
Sep 13, 2016
f4b9963
Merge branch 'master' of https://github.com/gabenaghi/Project1-CUDA-F…
gabenaghi Sep 13, 2016
c696eff
README.md: analysis questions
Sep 13, 2016
a007436
add blocksize images
gabenaghi Sep 13, 2016
ea2c2b0
block size analysis and bugs
Sep 13, 2016
6ed7c83
add blocksize images
gabenaghi Sep 13, 2016
cf4e457
Merge branch 'master' of https://github.com/gabenaghi/Project1-CUDA-F…
gabenaghi Sep 13, 2016
2739d49
update readme with scattered threadperblock images
Sep 14, 2016
588e6e1
Update README.md
gabenaghi Sep 14, 2016
193 changes: 183 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,183 @@
**University of Pennsylvania, CIS 565: GPU Programming and Architecture,
Project 1 - Flocking**

* (TODO) YOUR NAME HERE
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)

### (TODO: Your README)

Include screenshots, analysis, etc. (Remember, this is public, so don't put
anything here that you don't want to share with the world.)
University of Pennsylvania,
[CIS 565: GPU Programming and Architecture](http://www.seas.upenn.edu/~cis565/)

Implemented by [Gabriel Naghi](https://www.linkedin.com/in/gabriel-naghi-78ab4738) on
Windows 7, Xeon E5-1630 @ 3.70GHz 32GB, GeForce GTX 1070 4095MB
(MOR103-56 in SIG Lab)

Project 1 - Flocking
=====================

![](images/simulation.png)

Over the course of this project, we implemented a flocking
simulation. It is intended to roughly mimic the behavior of
groups of fish or birds, known throughout the code base as Boids.

There are 3 components to the flocking algorithm:
1. Boids gravitate toward the local center of gravity within a radius r1.
2. Boids maintain a minimum distance r2 from their neighbors.
3. Boids attempt to match the velocity of their neighbors within a radius r3.
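
As a rough illustration of how the three rules combine for a single boid, here is a CPU-side C++ sketch of the brute force check. It is not the project's kernel: `Vec3`, `FlockParams`, `velocityChange`, and the per-rule scale factors are hypothetical names introduced for this example, and the rule 3 formula (steer toward the neighbors' average velocity) is just one common formulation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

inline float dist(Vec3 a, Vec3 b) {
    Vec3 d = a - b;
    return std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
}

// Hypothetical tuning parameters (assumed for this sketch, not from the project).
struct FlockParams {
    float r1, r2, r3;              // radii for rules 1, 2 and 3
    float scale1, scale2, scale3;  // weight of each rule
};

// Velocity change for boid `self`, checking every other boid (naive approach).
Vec3 velocityChange(std::size_t self,
                    const std::vector<Vec3>& pos,
                    const std::vector<Vec3>& vel,
                    const FlockParams& p) {
    assert(self < pos.size() && pos.size() == vel.size());
    Vec3 center{0.0f, 0.0f, 0.0f};
    Vec3 separation{0.0f, 0.0f, 0.0f};
    Vec3 avgVel{0.0f, 0.0f, 0.0f};
    int n1 = 0, n3 = 0;

    for (std::size_t j = 0; j < pos.size(); ++j) {
        if (j == self) continue;
        float d = dist(pos[self], pos[j]);
        if (d < p.r1) { center = center + pos[j]; ++n1; }            // rule 1
        if (d < p.r2) { separation = separation - (pos[j] - pos[self]); } // rule 2
        if (d < p.r3) { avgVel = avgVel + vel[j]; ++n3; }            // rule 3
    }

    Vec3 dv{0.0f, 0.0f, 0.0f};
    // Rule 1: move toward the local center of the neighbors.
    if (n1 > 0) dv = dv + (center * (1.0f / n1) - pos[self]) * p.scale1;
    // Rule 2: move AWAY from close neighbors (note the minus sign above).
    dv = dv + separation * p.scale2;
    // Rule 3: steer toward the neighbors' average velocity.
    if (n3 > 0) dv = dv + (avgVel * (1.0f / n3) - vel[self]) * p.scale3;
    return dv;
}
```

The sign in rule 2 matters: accumulating `pos[j] - pos[self]` with a plus instead of a minus would pull boids toward their close neighbors rather than push them apart.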

We implemented three different methods of calculating the effects
of these rules. The first, the naive implementation, checks each
boid against every other boid and applies each of the rules
if the other boid is within that rule's area of effect. The second implementation
uses a uniform grid: it sorts the boid indices by the
"sector" (grid cell) of the scene they occupy, and only checks the relevant
adjacent cells for boids. The final implementation also uses a
uniform grid, but removes one layer of indirection by reordering
the data itself rather than saving a pointer to its original
location, maximizing data coherence.
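
The grid bookkeeping can be sketched on the CPU as follows. This is a minimal sketch under stated assumptions, not the project's kernels: `cellIndex1D`, `cellOf`, `CellRanges`, and `buildCellRanges` are hypothetical names, the grid is assumed to be a cube of `res` cells per side starting at `gridMin`, and a sequential `std::stable_sort` stands in for the parallel key-value sort a GPU version would typically use.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Flatten 3D cell coordinates into a single cell index.
int cellIndex1D(int x, int y, int z, int res) {
    return x + y * res + z * res * res;
}

// Map a position to the index of the grid cell that contains it.
int cellOf(float px, float py, float pz, float gridMin, float cellWidth, int res) {
    int x = static_cast<int>(std::floor((px - gridMin) / cellWidth));
    int y = static_cast<int>(std::floor((py - gridMin) / cellWidth));
    int z = static_cast<int>(std::floor((pz - gridMin) / cellWidth));
    assert(x >= 0 && x < res && y >= 0 && y < res && z >= 0 && z < res);
    return cellIndex1D(x, y, z, res);
}

// For each cell: first and last position (inclusive) in the sorted boid list,
// or -1 if the cell is empty.
struct CellRanges {
    std::vector<int> start;
    std::vector<int> end;
};

// Sort boid indices by cell, then record where each cell's run begins and ends.
CellRanges buildCellRanges(const std::vector<int>& boidCell, int numCells,
                           std::vector<int>& sortedBoid) {
    sortedBoid.resize(boidCell.size());
    std::iota(sortedBoid.begin(), sortedBoid.end(), 0);
    std::stable_sort(sortedBoid.begin(), sortedBoid.end(),
                     [&](int a, int b) { return boidCell[a] < boidCell[b]; });

    CellRanges r{std::vector<int>(numCells, -1), std::vector<int>(numCells, -1)};
    for (int i = 0; i < static_cast<int>(sortedBoid.size()); ++i) {
        int c = boidCell[sortedBoid[i]];
        if (i == 0 || c != boidCell[sortedBoid[i - 1]]) r.start[c] = i;
        r.end[c] = i;  // overwritten until the last boid of the cell
    }
    return r;
}
```

With these ranges, a neighbor search only needs to walk `sortedBoid[start[c]..end[c]]` for the handful of cells around a boid instead of the whole boid list.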

Performance Analysis
----------------------
My performance analysis was not done in an efficient manner. If
I had to do this again, I would alter the program to take in
command line args for the parameters (N_FOR_VIS and blockSize)
and print time elapsed between events. I would then write a script
to iterate through my test cases.
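
A minimal sketch of what such a harness could look like is shown here. Everything in it is an assumption for illustration: the `RunConfig` struct, the argument order (`numBoids` then `blockSize`), the default values, and the helper names `parseArgs` and `averageStepMs` are hypothetical and do not exist in the project.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>

struct RunConfig {
    int numBoids;
    int blockSize;
};

// Hypothetical command line: prog [numBoids] [blockSize].
// The defaults below are placeholders, not the project's settings.
RunConfig parseArgs(int argc, const char* const* argv) {
    RunConfig cfg{5000, 128};
    if (argc > 1) cfg.numBoids = static_cast<int>(std::strtol(argv[1], nullptr, 10));
    if (argc > 2) cfg.blockSize = static_cast<int>(std::strtol(argv[2], nullptr, 10));
    assert(cfg.numBoids > 0 && cfg.blockSize > 0);
    return cfg;
}

// Run `step` `iterations` times and return the mean wall-clock time per call
// in milliseconds.
template <typename Step>
double averageStepMs(Step step, int iterations) {
    assert(iterations > 0);
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) step();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count() / iterations;
}
```

When the timed step launches CUDA kernels, the step would also need to wait for the device to finish (for example with `cudaDeviceSynchronize()`) before the clock is read, since kernel launches return immediately on the host.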

But alas, I did not do that and instead relied on the Nsight
performance analysis tools to take time readings. I didn't have
a chance to sum up all the results, but the raw readings are
displayed below.

Essentially, what we are trying to optimize here is the time it
takes to prepare the new velocities for the updatePos kernel,
which is the same across implementations.
This is the time interval I am trying to show in the results below.

The metrics below clearly indicate that performance degrades as the number of boids increases. This is because as the number of boids rises, so does the population density. As a result, each boid will have that many more neighbors for which to calculate the three rules. Moreover, since each boid in the naive implementation needs to check every other boid, the total work there grows quadratically with the number of boids (on the order of N^2 pairwise checks for N boids), not exponentially.
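
For the naive implementation, the number of pairwise checks per step can be counted directly. The helper below is a hypothetical illustration (`naivePairChecks` is not part of the project) and assumes every boid checks every other boid exactly once per step.

```cpp
#include <cassert>
#include <cstdint>

// Pairwise distance checks per simulation step when every boid
// examines every other boid: N * (N - 1).
std::uint64_t naivePairChecks(std::uint64_t numBoids) {
    assert(numBoids <= 4000000000ULL);  // keep the product within 64 bits
    return numBoids == 0 ? 0 : numBoids * (numBoids - 1);
}
```

Going from 500 to 5,000 boids raises the count from 249,500 to 24,995,000, about 100 times more checks for 10 times more boids. The naive timings in the table below grow by only about 9x over the same range; one possible reading, which is an assumption and not something measured here, is that at these sizes the GPU still has spare parallel capacity, so elapsed time grows more slowly than total work.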

Interestingly enough, increasing the number of threads per block seems to
have improved the performance of the brute force algorithm up to a point (best time @ 256),
while it generally hurt performance for both uniform grid algorithms (best time @ 128).
This might have occurred because the brute force implementation's performance relies heavily on computation,
since it does computation for each boid against every other boid, whereas the grid implementations
are bottlenecked by memory. Perhaps increasing the number of threads per block allows for more simultaneous computation but also causes more competition for memory bandwidth. This would explain the performance impact well.
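
For context on what changing the block size does to a launch, the sketch below shows the usual ceiling-division calculation for the number of blocks. It assumes one thread per boid, which is a common arrangement but an assumption here; `blocksForBoids` and `idleThreads` are hypothetical helper names.

```cpp
#include <cassert>

// Smallest number of blocks such that blocks * threadsPerBlock >= numBoids.
int blocksForBoids(int numBoids, int threadsPerBlock) {
    assert(numBoids >= 0 && threadsPerBlock > 0);
    return (numBoids + threadsPerBlock - 1) / threadsPerBlock;
}

// Threads in the final block that have no boid to process.
int idleThreads(int numBoids, int threadsPerBlock) {
    return blocksForBoids(numBoids, threadsPerBlock) * threadsPerBlock - numBoids;
}
```

Under that assumption, 50,000 boids need 391 blocks at 128 threads per block but only 98 blocks at 512, so larger blocks mean fewer, heavier units for the hardware to schedule.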


Implementing the coherent uniform grid definitely resulted in a performance
increase. This is the result we expected, since it cuts out a memory
access and instead uses a uniform addressing scheme. Still, I found it a bit surprising, since we are still required to do a memory access, albeit in
the form of data relocation. Perhaps it has to do with not needing to
flush a data set out of cache.
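
The data relocation step amounts to a gather through the sorted index array. The sketch below is a CPU-side illustration only; `gatherBySortedIndex` and `sortedBoid` are hypothetical names, and the project would do the equivalent in a kernel for positions and velocities.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy `data` into cell-sorted order: out[i] = data[sortedBoid[i]].
// After this, boids in the same grid cell sit next to each other in memory.
template <typename T>
std::vector<T> gatherBySortedIndex(const std::vector<T>& data,
                                   const std::vector<int>& sortedBoid) {
    assert(data.size() == sortedBoid.size());
    std::vector<T> out(data.size());
    for (std::size_t i = 0; i < sortedBoid.size(); ++i) {
        out[i] = data[static_cast<std::size_t>(sortedBoid[i])];
    }
    return out;
}
```

The scattered version reads `data[sortedBoid[i]]` every time a neighbor is visited, while the coherent version pays for the gather once per step and then reads `out[i]` directly.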

## Implementation vs Boid Count

### Naive Implementation
Fortunately, only one kernel call occurs between position updates
in the naive implementation.

|# Boids| Time Elapsed |
|-------|--------------|
| 500 | 1.2 ms |
| 5000 | 11.2 ms |
| 50000 | crashed CUDA |

### Uniform Grid Implementation

500 Boids

![](images/uniform500.PNG)

5,000 Boids

![](images/uniform5_000.PNG)

50,000 Boids

![](images/uniform50_000.PNG)

500,000 Boids

![](images/uniform500_000.PNG)

5,000,000 Boids

![](images/uniform5_000_000.PNG)

### Coherent Grid Implementation

500 Boids

![](images/coherent500.PNG)

5,000 Boids

![](images/coherent5_000.PNG)

50,000 Boids

![](images/coherent50_000.PNG)

500,000 Boids

![](images/coherent500_000.PNG)

5,000,000 Boids

![](images/coherent5_000_000.PNG)

## Block Sizes

### Naive Implementation

|Threads Per Block| Time Elapsed |
|-----------------|--------------|
| 128 | 11.2 ms |
| 256 | 9.6 ms |
| 512 | 12.2 ms |
| 1024 | 12.2 ms |


### Scattered Grid Implementation

128 Threads per Block (same as scattered/50,000 above)

![](images/uniform50_000.PNG)

256 Threads per Block

![](images/scattteredblocksize256.PNG)

512 Threads per Block

![](images/scattteredblocksize512.PNG)


1024 Threads per Block - for reasons unknown, attempting to launch the
program with a block size of 1024 crashed the program at the point where it
would have done the grid search.

### Coherent Grid Implementation

128 Threads per Block (same as coherent/50,000 above)

![](images/coherent50_000.PNG)

256 Threads per Block

![](images/blocksize256.PNG)

512 Threads per Block

![](images/blocksize512.PNG)


1024 Threads per Block - Crashed, as in scattered implementation.

# Big Bugs

![](images/boids_meme.jpg)

## Black Hole Boids

My coherent grid search had a bug where, instead of moving away from neighbors as per rule 2, boids would gravitate toward them. This resulted in some boids clumping and, as the clumps moved around, sucking in any boids that came within their rule2Distance event horizon.

## Boid outta Hell

Due to a faulty type declaration, boids which were being set with their own values were getting large negative values, most likely resulting from an implicit float to int cast. This resulted in red boids zipping around on the top of the scene like embers above a fire.
Binary file added images/blocksize256.PNG
Binary file added images/blocksize512.PNG
Binary file added images/boids_meme.jpg
Binary file added images/coherent500.PNG
Binary file added images/coherent500_000.PNG
Binary file added images/coherent50_000.PNG
Binary file added images/coherent5_000.PNG
Binary file added images/coherent5_000_000.PNG
Binary file added images/scattteredblocksize256.PNG
Binary file added images/scattteredblocksize512.PNG
Binary file added images/simulation.png
Binary file added images/uniform500.PNG
Binary file added images/uniform500_000.PNG
Binary file added images/uniform50_000.PNG
Binary file added images/uniform5_000.PNG
Binary file added images/uniform5_000_000.PNG
2 changes: 1 addition & 1 deletion src/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -10,5 +10,5 @@ set(SOURCE_FILES

cuda_add_library(src
${SOURCE_FILES}
OPTIONS -arch=sm_20
OPTIONS -arch=sm_30
)