In this MP, you will be creating a program that can recover lost passwords.
For security reasons, passwords are usually never stored in plain text.
Instead, the hashed versions of passwords are stored.
For an example of this, take a look at the
/etc/shadow file on any modern linux machine.
When a user tries to log in, the password they enter is hashed and compared with the stored hash. This way, there’s no need to store your actual password.
Given the output of a good hash function, it is hard or impossible to reconstruct the input using the hashed value. However, if you are willing to burn some CPU time, it is possible to try every possible password (brute force attack) until you find one such that hashes to the target hash.
We will be using
crypt_r() (a reentrant/thread safe version) of the
crypt() function, as our hashing function.
crypt_r() takes three arguments: the string to hash, a salt string, a
Make sure to set the
initialized member of your
struct crypt_data before using it for the first time.
This code outputs the following:
struct crypt_data is necessary for
This is because
crypt() needs to store information between invocations, so calling
crypt() on multiple threads will cause this information to be inaccurate.
crypt_r() gets around this by storing information in a
struct crypt_data instead.
You should check the man page for
crypt_r, do you need to free the string it returns?
Why is there salt in my hash?
The salt argument ‘flavors’ the string so that when you hash the same password twice, you’ll get a different result (if you use a different salt). For example, we can use part of a user’s username to salt their password before hashing. This way, two users with the same password will not have the same hash stored in the database.
For the sake of this assignment, always use
"xx" for the salt argument.
You will be given a list of password hashes which you must recover. Each password is a password partial, meaning that we have these two pieces of information:
- The first few letters in their password
- The total length of the password
All the passwords only contain lowercase letters!
For example, we may say that a password begins with
"hello" and has a total of 8 letters. We know the hashed value associated with this password is
"xxsczBXm6z4zA", so we simply have to try hashing each possible password (staring with the prefix provided) until we find one that hashes to the desired value.
Your input will be a file with one line for each password to recover. Each line will contain:
- Username (1-8 characters)
- Password (13 characters)
- Known part of password (plus periods representing unknown characters) (1-8 characters, contains 0-8 lowercase letters followed by 0-8 periods)
These three fields are separated by a single space. Don’t worry about duplicate usernames, duplicate password hashes, or duplicate prefixes. Each line is an independent task.
We will not provide input not in this format.
Version 1: Thread Pool
We will not grade any output which is not the result of a call to a function in
You will not be graded on a single threaded implementation, but it is always a good idea to write a single threaded version of your code before trying to parallelize!
Use multiple threads to speed up the password processing.
The main thread will start up a pool of worker threads, then read lines of input from standard input.
Each line will be converted to a task which is added to a task queue.
The task queue is provided in
It is very similar to the synchronized queue you’ve implemented in lab.
The worker threads will pull one task from the task queue, then process the task.
When a worker thread starts processing a task, it will print the username of the task (use
When a worker thread finishes a task, use
format.h to print the cracked password, along with the index of the thread (starting with index 1), and the amount of CPU time spent working on the password (use
When your main thread finishes reading in lines from the input, we can’t shut down immediately. The worker threads may still cracking some passwords. You need to decide how to cleanly shut all the threads down when there are no more passwords to crack. Every thread must join with the main thread!
After all the worker threads have exited, the main thread will print (this is provided):
- Number of successful and unsuccessful password cracks
- Wall clock time since the program was started (via
- CPU time used (a sum of the CPU time used in all threads).
- Proportion of CPU time to wall clock time.
By default, the provided code creates 4 worker threads. If a command line argument is supplied to the program, it will use that as the number of worker threads rather than the default.
The times and order may vary slightly.
Remember to use appropriate synchronization, and make sure to use
If you create a new thread for each task (instead of keeping the threads in the thread pool running), you will loose points! (and your implementation will be very slow)
Version 2: Parallelize each task
We will not grade any output which is not the result of a call to a function in
Version 1 works great when there is a long list of passwords that need cracking in parallel, but it’s no faster than a single threaded version when there’s one really hard password that needs cracking. For version 2, you’ll still have a pool of threads, but rather than assigning one thread to each password task, all the threads will work in parallel on each password task.
Distribute the work by splitting the search space into equal-sized chunks, one for each worker thread. For example, if there are 3 unknown characters, then there are 26^3 = 17576 possible passwords that need to be tested. With 4 worker threads, you would split the work up like this:
- Thread 1: 0..4393 (aaa..gmz)
- Thread 2: 4394..8787 (gna..mzz)
- Thread 3: 8788..13181 (naa..tmz)
- Thread 4: 13182..17575 (tna..zzz)
When the number of threads doesn’t divide the search space evenly, it’s easy to get off-by-one errors due to integer rounding.
We’ve provided functions
setStringPosition() to help you with this.
With all the threads working on the same task, you may want to restructure your thread synchronization a little. Rather than a queue, you may wish to use a barrier.
Startup Task 0.............................. Task 1.............................. main thread: read task | idle | print results, read next | idle | print results, read next worker threads: idle | computing | idle | computing | idle ↑ barrier
Like with version 1, you may not create new threads for each task. The threads you create at the beginning of the program must be the same threads that compute the last task.
When the main thread reads a task, it should print
When a thread starts processing a task, it should print its index and starting position.
As usual, make sure to use
When a worker thread finds the correct password, it should tell all the other threads to stop working on the task. You can implement this with a simple flag variable that each thread checks on each iteration. Since all the threads are reading this variable and any thread may write to it, you’ll need to properly synchronize access to it.
When the worker threads finish a task, each thread will print the number of passwords it tried and a word describing how its run finished:
- (found) - this thread found the password
- (cancelled) - stopped early because another thread found the password
- (end) - finished with no password found. Note: this can happen if the password was found, but this thread finished its chunk before another thread found the password.
After all worker threads finish each task, the main thread will print the password (if found), the total number of hashes, the wall clock and CPU time spent on that task, and the ratio of CPU time to wall clock time.
Note that we have not provided any of the timing print statements in
Building and Running
As usual, we have provided a Makefile which can build a
release and a
debug version of your code.
make will compile
cracker2 in release mode, as well as a tool called
create_examples (more on this in the next section).
make debug will compile
cracker2 in debug mode, and will also compile
We have also included the target
make tsan, which compiles your code with Thread Sanitizer (run
ThreadSantizer is a race condition detection tool. See this page for more information.
We will be using ThreadSanitizer to grade your code! If the autograder detects a data race, you won’t automatically get 0 points, but a few points will be deducted.
Thread status hook
We’ve provided a simple tool to help you when debugging your program. See
thread_status.c. We’ve install
threadStatusPrint() as a handler for
It will print a brief summary of what each thread is currently doing any time you hit ctrl-c.
To use it:
- Call threadStatusSet() to describe what the thread is currently doing. The argument to
threadStatusSet()should be a string constant. For example:
threadStatusPrint() is called, it doesn’t print the exact line number that each thread is at. It just prints the line number of the most recent call to
threadStatusSet(). So, for more precise reporting, add more calls to
threadStatusSet() to your code.
thread_status.h contains macros that will redefine calls to common thread synchronization functions so that when a thread is blocking on one of them, its status will represent that (like the “semaphore wait” on line 219 in the example above).
Note: Since Thread Status is hooked to Ctrl-C, you might need to use Ctrl-D (EOF) or Ctrl-\ (SIGQUIT) to shutdown a running password cracker
You’re not required to use the thread status tool as part of the assignment, we just thought it might make your debugging easier.
We’ve also provided a small program to create example input files, to help you with your testing. To build the
create_examples program, run
To use the program, write it’s output to a file, then use the file as input to a cracker program.
To see what the cracked passwords should be, use the
-soln flag when running
create_examples (see the usage documentation given when running the program with no arguments).
CPU time and so called “wall clock” time are not always the same thing. CPU time is defined as “the amount of time your program spends running on a CPU,” and wall clock time is quite literally, the amount of time that would pass on a wall clock (the kind of clock on a wall) between the time a program starts and a program finishes running. These numbers are often not the same!
If your program makes a large number of blocking system calls, it may take 10 seconds to run, but only actually consume 5 seconds of CPU time. In this case, the kernel spent time reading from a file, or writing packets to the network, while your program sat idle.
CPU time can also be much larger than wall clock time. If a program runs in multiple threads, it may use 40 seconds of CPU time, but only take 10 seconds of wall clock time (4 threads, each ran for 10 seconds).
To demonstrate these differences, we’ve provided a program in
tools/timing.c which shows an example of both kinds cases.
To compile this program, run
make timing, then run
./timing. You should see output like this: