- Deeper understanding of the barrier as a synchronization structure.
- Being able to recognize parallelizable sections of code.
- Profiling and debugging a parallel program.
You will implement a reusable barrier using only condition variables, mutexes and/or semaphores. Using this barrier we ask that you parallelize a provided serial implementation of a Poisson equation solver. Here’s some interesting, but unnecessary to complete this assignment, background:
You’ve already done most of this
Perhaps you haven’t realized it yet but throughout the last few assignments chances are that you’ve implemented a barrier. You’ve probably written some code along these lines:
This is the essence of a barrier. The question we would like for you to ponder before moving on is this: What happens if we want to go through this critical section more than once? In an iterative parallel application that has to go through this critical section hundreds of times we need to have a way to automatically reset the barrier without introducing any race conditions. The only real guarantee that we would like to give our barrier is that the number of threads it can handle is fixed at initiation, any more trying to use the barrier would result in undefined behavior and any less would result in our code hanging.
Implement your barrier
This is the meat and potatoes of this lab. We want to see how you decide to implement a reusable
barrier without giving you any explicit hints outside of what is in the wiki (wink, wink, nudge,
nudge). Ask the TAs if the direction you are going in is the right one or if you really feel lost.
You can find some stub functions inside of
barrier.c to get you started.
Test your barrier
We provide some silly code inside
barrier_test.c which should give you an idea of whether
or not your barrier is working at all. We strongly encourage you to come up with some more robust
tests. In addition we provide a library with a reference implementation. Simply call
to use our barrier implementation with your test file.
poisson.c you will find the serial implementation of the Poisson Solver. There are a lot
of moving parts as this is a full implementation of a relatively complex application, try to
understand what is happening at at least a high level before diving into your parallelization.
If you read the pdf linked at the top you would know that the way we have chosen to solve the poisson equation (you don’t need to know what poisson equations are) is to use a Jacobi iterative numerical method (you don’t need to know what that is either). This method gets closer to the real solution of the equation every time it goes through an iteration until there is an arbitrarily small difference between iterations (we call this arbitrary number epsilon).
Given an equation in the form provided in
basic_function, let’s say, a sine wave, this program starts
with an empty image (all the indices of the array are initialized to 0). Each iteration of this solver
then populates the
current array with the next guess of the solution and calculates the error between it
previous array. This updating takes place inside the
During this process we want BMP image files of the state of the solution to be created at a fixed interval,
we call this interval the granularity. That is, if we decide we want to see the state every 20 iterations
(i.e. a granularity of 20) we expect to have an image of what our closest guess to the solution is at
iteration 20, 40, 60, … and so on. In addition we want an output of the image when we hit the desired error.
The math required for each index of an iteration in our 2-D array is completely independent of any other result in the array. This means that, if we wanted to, we could have an individual thread calculate the result for each index entirely at the same time. A problem with this attribute is said to be embarrassingly parallel. However, each iteration depends on the previous one. Therefore you need to make sure all the indices of the previous iteration have been calculated before moving on (hint, hint).
To do all this image writing we provide a little BMP library. The only function you absolutely need
to use in it is
write_to_file. We do provide some “lower level” functions inside of
bmp.h in case
you would like to increase the parallelism of your code. This should be unnecessary however and is simply
a challenge to you if you would like to implement it. Just use function
write_to_file in your parallel
implementation at first. Remember, however, you probably don’t want all your threads writing this file and you
need to make sure all the calculations are done before writing it (hint, hint).
There are three pointers to external memory in
poisson_struct, which is defined inside of
poisson.h. The double pointers
previous are the arrays for the calculated value of the current
iteration and the previous iteration respectively. We need to keep both these arrays in memory as each
new iteration depends on the previous one. At the end of the iteration you may notice that we swap the
pointer values. This way we overwrite the solution from two iterations ago which is no longer needed.
The double pointer
target_image is the function that we are trying to satisfy the poisson equation for.
The memory that these three
double ** variables point to are the same for every thread.
The other values are metadata that we use inside of solve_poisson to decide, amongst other things,
what section of
previous a thread should work on.
n is simply the size of the grid the user
requested be solved.
num_threads is a hint as to the number of threads to be created. You are not
forced to create threads if you don’t think it will be beneficial.
granularity is how many
iterations to calculate before printing an image.
Once you look over the code you will notice that the metadata variables are just input from the user
and are put in the
poisson_struct in order for all the functions inside
poisson.c to be as opaque
as possible. Any information from the outside that any part of the poisson equation solver may need
in its serial form is represented inside the struct. We expect you to add more elements to this struct
in order to accommodate the parallel implementation.
iter variable is only really there to get returned to the user, it’s simply the number of
iterations that were run.
Functions you must modify
parallel_poisson is the function we will call when we test your code. We do not care about the
details of the implementation of any other function. As long as the output BMPs are identical to the
reference implementation and you use your own barrier you will get full credit for correctness.
Functions you can modify
Any function definition or declaration inside
bmp.h is fair game to modify. We expect the signatures for the functions inside of
be unchanged. We highly recommend that you modify only the poisson files however. TAs will not
debug nor guide any modification to
Functions we recommend you modify
This function is provided inside of
poisson_test.c as an example of a
mathematical function that can be solved by our program. If you wish to input a different
mathematical function to the solver we recommend you copy and modify
We use this function as the primary interface with the solver. This is where you should spin up your threads and handle the memory creation and cleanup.
Here is where the bulk of the parallelization can be achieved. We expect you to use the helper function to do the actual calculations based off of what thread is currently running this function. Please focus most of your attention on this function.
We provide this function as a convenience to the user. We allocate the “2-D” array of
doubles here. In addition we allocate the memory necessary for a
per thread. If you take a closer look at the implementation of this function you will notice
that we’re actually allocating each of the “2-D” arrays as a single piece of memory. This helps
the compiler and the machine implement optimizations that make your code faster.
Here we clean up all the memory allocated to each the
poisson_struct. Since many resources
are shared between threads
poisson_setup didn’t need to allocate a
for each struct for example. Don’t be alarmed by the way we free things in here it’s only
done this way because of the 1-D to 2-D abstraction we did in setup.
You may notice a lot of
N+2 floating around the code. This is because we want the boundaries of the
uo to always be 0. Therefore we allocate an additional 2 rows and columns in our
arrays. If you examine the manner in which the solution is calculated inside of
it should be obvious why we need to give ourselves this little bit of extra space and to set it to an
arbitrary value (think about going out of bounds in the array).
Compile and Run
will make both your barrier_test and poisson_test binaries.
will use your test files with the reference implementations. Please tell us if you see something that you think is wrong with the references. This is a young lab.
will make debug binaries for all the code you have written.
You can run your tests any which way you would like but we have given you a simple interface in
poisson_test.c. You can run the test this way:
Where size is the size of the grid we would like to solve over, num_threads is the number of threads to be used, granularity is the number of iterations to calculate before writing an image and epsilon is the desired precision of the calculation.
We recommend using a small value for size and large granularity and epsilon values until you are fairly certain that your code works. An epsilon between 0.05 and 0.0005 is adequate. Be very careful when using a small granularity on sizes that will run for a long time. BMPs are uncompressed image files and will use up your filesystem quota very quickly if you are not careful.
Some recommended inputs to use until you feel comfortable about the correctness of your code are:
where they should take an increasing amount of time. When testing with different sizes you will probably notice some input that takes much longer than an input that is not much smaller than it. This is normal. If you recall from 233 this is called a performance cliff and happens in this application when the data no longer fits in a cache level, or in the cache at all.
60% of this lab will be based on correctness, 50% from barrier and 50% from poisson. The following 40% will be based on performance compared to the reference implementation. The 40% will be on a sliding scale and no you do not need to be as fast or faster than the reference to get full credit, we will choose goals the vast majority of the class can achieve.