MP7: Build Your Own Proxy Server

CS 241


Purpose

For this MP, you will implement a simple HTTP proxy server with a cache. When a client connects to the proxy server and requests a web page hosted on a different server, the proxy provides that page to the client. If a copy of that page is stored locally at the proxy, the proxy serves the request from that copy. Otherwise, the proxy connects to the relevant server, fetches the page on behalf of the client, and sends the page back to the client. You can find more information about proxy servers here.

Your MP7 directory contains the following files:
proxy.c and proxy.h: You will implement your proxy server here.
aux_socket.c and aux_socket.h: You will implement helper functions here for setting up TCP sockets and tearing down connections.
aux_http.c and aux_http.h: Provide all the functions you need to interpret HTTP packets for this MP. Do NOT modify these files.
priqueue.c and priqueue.h: Provide all the functions you need to build the proxy cache. Do NOT modify these files.

Tasks

Task 1:

Set up your proxy server so that it listens on a specific port for incoming connections from the web browser. This port is given on the command line as argv[1] (the second element of the command line, after the program name).

Empty helper functions are provided in aux_socket.c to set up the proxy server socket and accept connections from clients on a particular port. Implement these functions in aux_socket.c, then use them to set up your proxy in proxy.c:

int newServerSocket(const char *port)
Creates a new socket file descriptor, binds to the local port (port), and sets up a listening queue.
Returns the server socket on success, otherwise returns -1

int acceptSocket(int serverFd)
Accepts new connections on the server socket serverFd and returns the new client socket file descriptor.
Returns -1 on error
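One possible way to fill in the two server-side helpers is sketched below, assuming a POSIX system where getaddrinfo() is available. Setting SO_REUSEADDR is an addition the MP does not require: it only lets you rebind a port recently used by your own, now-exited proxy, so it does not help when another user holds the port. LISTEN_BACKLOG is a made-up constant.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define LISTEN_BACKLOG 64 /* assumption: any reasonable backlog works */

int newServerSocket(const char *port)
{
    struct addrinfo hints, *res, *rp;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */
    hints.ai_flags = AI_PASSIVE;     /* wildcard address, suitable for bind() */

    if (getaddrinfo(NULL, port, &hints, &res) != 0)
        return -1;

    /* Try each candidate address until one can be bound and listened on. */
    for (rp = res; rp != NULL; rp = rp->ai_next) {
        fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (fd == -1)
            continue;
        int yes = 1;
        /* Allow rebinding a port whose old connections are still in TIME_WAIT. */
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);
        if (bind(fd, rp->ai_addr, rp->ai_addrlen) == 0 &&
            listen(fd, LISTEN_BACKLOG) == 0)
            break; /* success: fd is a listening socket */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Returns the new client descriptor, or -1 on error. A signal such as SIGINT
 * also produces -1 (errno == EINTR); the caller decides whether to retry. */
int acceptSocket(int serverFd)
{
    return accept(serverFd, NULL, NULL);
}
```

In proxy.c, main() would call newServerSocket(argv[1]) once and then call acceptSocket() in a loop.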

Task 2:

Upon accepting a TCP connection from the browser, the proxy launches a thread to handle that connection. Threads allow your proxy server to process multiple requests at the same time. (This happens more often than you might think: if a page has eight images on it, default Firefox settings open two sockets to download the images in parallel.)

Inside the thread, the proxy reads data from and sends data to the accepted socket. In later sections, we refer to this thread as the 'connection handler'.
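A sketch of the per-connection thread, assuming POSIX threads. connectionHandler() and spawnConnectionHandler() are hypothetical names, and the handler body is a placeholder that only closes the socket. The descriptor is copied to the heap so each thread gets its own value, and the thread is detached because nothing joins it.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical connection handler. A real one would read the request,
 * consult the cache or contact the web server, and reply. */
static void *connectionHandler(void *arg)
{
    int clientFd = *(int *)arg;
    free(arg);                /* the heap copy of the descriptor belongs to this thread */
    /* ... handle the request on clientFd ... */
    close(clientFd);
    return NULL;
}

/* Starts a detached thread for one accepted connection.
 * Returns 0 on success, -1 on failure (clientFd is closed in that case). */
int spawnConnectionHandler(int clientFd)
{
    int *fdCopy = malloc(sizeof *fdCopy);
    if (fdCopy == NULL) {
        close(clientFd);
        return -1;
    }
    *fdCopy = clientFd;       /* a shared stack variable could change before the thread reads it */

    pthread_t tid;
    if (pthread_create(&tid, NULL, connectionHandler, fdCopy) != 0) {
        free(fdCopy);
        close(clientFd);
        return -1;
    }
    pthread_detach(tid);      /* no join: resources are reclaimed when the thread exits */
    return 0;
}
```

The accept loop in main() then reduces to calling acceptSocket() and passing each returned descriptor to spawnConnectionHandler().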

Task 3:

In the connection handler for each TCP connection, you need to recv() data from the socket. The data received will be the HTTP request sent by the web browser for a particular file. Because recv() is not guaranteed to return the entire request in one call, you may have to call it multiple times. To know when the end of the request has been reached, note that HTTP ends the header section of every request with the '\r\n\r\n' sequence. You may also find select() useful to check whether more data is available to be read from the socket.
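The receive loop can be sketched as follows. recvHTTPRequest() and MAX_REQUEST_SIZE are hypothetical; the sketch reads only up to the end of the headers, which is enough for the GET requests this MP evaluates (a request with a body would need more work), and it uses a plain blocking recv() instead of select().

```c
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_REQUEST_SIZE 65536 /* assumption: cap to avoid unbounded growth */

/* Reads from fd until the header terminator "\r\n\r\n" has arrived.
 * Returns a NUL-terminated heap buffer (caller frees), or NULL on error,
 * on EOF before the terminator, or if the request exceeds MAX_REQUEST_SIZE. */
char *recvHTTPRequest(int fd)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (len + 1 == cap) {                /* full: keep one byte for the NUL */
            if (cap >= MAX_REQUEST_SIZE)
                break;
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL)
                break;
            buf = bigger;
            cap *= 2;
        }
        ssize_t n = recv(fd, buf + len, cap - 1 - len, 0);
        if (n <= 0)                          /* error, or browser closed too early */
            break;
        len += (size_t)n;
        buf[len] = '\0';
        if (strstr(buf, "\r\n\r\n") != NULL) /* end of the request headers */
            return buf;
    }
    free(buf);
    return NULL;
}
```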

The proxy can serve this HTTP request in one of two ways: from its own cache, if a recent copy of the HTTP response for the same page exists there, or by fetching the page from the page's host on behalf of the web browser and then sending it back to the browser. We have provided helper functions in priqueue.h for building your proxy cache; they are described in a later section.

Task 3a: Fetching the page from a web server

If the file is not stored in the proxy cache, the proxy server sets up a TCP connection with the host of that file. You can obtain the hostname and port to connect to by using the following auxiliary function in aux_http.h:

hostAddress* getHostNameFromHTTPRequest(const char *HTTPRequest)
Returns the hostname and port of the host serving the file requested by HTTPRequest in a hostAddress structure pointer. The function allocates memory for the structure as well as for the hostname and port strings inside it. It is your responsibility to free this memory.
Returns -1 on error.

typedef struct _host{
    char *hostname;
    char *port;
} hostAddress;
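Because the structure and both strings are heap-allocated, a small helper keeps the cleanup in one place. freeHostAddress() is a hypothetical name, and the typedef is repeated only so the snippet compiles by itself (in the MP you would include aux_http.h instead). It assumes the three allocations are separate, as described above.

```c
#include <stdlib.h>

/* Copied from aux_http.h so this sketch is self-contained. */
typedef struct _host {
    char *hostname;
    char *port;
} hostAddress;

/* Hypothetical helper: releases both strings and the structure itself.
 * Safe to call with NULL. */
void freeHostAddress(hostAddress *addr)
{
    if (addr == NULL)
        return;
    free(addr->hostname);
    free(addr->port);
    free(addr);
}
```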

Once you have the host and port of the web server for the requested file, you have to connect to that web server. An empty helper function, declared in aux_socket.h, is provided in aux_socket.c to set up a client socket. Implement this function and use it to set up the connection from your proxy to the web server:
int newClientSocket(const char *host, const char *port)
Creates a new socket file descriptor, and then connects to the web server host and the port.
Returns the client socket on success, otherwise returns -1
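A sketch of newClientSocket() using getaddrinfo(), which resolves both hostnames and numeric addresses and, unlike gethostbyname(), is required by POSIX to be thread-safe; that matters because every connection handler runs in its own thread. The loop tries each resolved address until one connects.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int newClientSocket(const char *host, const char *port)
{
    struct addrinfo hints, *res, *rp;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 results */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;                   /* name resolution failed */

    /* Try each resolved address until one accepts the connection. */
    for (rp = res; rp != NULL; rp = rp->ai_next) {
        fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, rp->ai_addr, rp->ai_addrlen) == 0)
            break;                   /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```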

The proxy server will now send() the HTTPRequest recv()ed from the web browser to the client socket, and wait for responses back from the web server. The proxy will recv() the HTTPResponse from the client socket and forward the response back to the web browser. You may need multiple recv() and send() calls to receive and forward the entire HTTPResponse from the web server to the web browser. A 0 value returned from recv() indicates that the web server has closed its connection. At this point, the proxy can safely close the client socket to the web server. The proxy should also close its socket to the web browser to indicate to the web browser that the web server has torn down the connection.
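The forwarding loop might look like the sketch below. relayResponse() is a hypothetical helper that forwards each chunk to the browser and also keeps a copy of the full response, so the caller can afterwards decide with cacheHTTPResponse() whether to store it. The copy is tracked by length because responses such as images are binary. As described above, the sketch relies on the web server closing its connection to mark the end of the response.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Forwards everything received on serverFd to browserFd until the server
 * closes its side (recv() returns 0), keeping a copy for the cache.
 * Returns the heap copy (caller frees; NUL-terminated for convenience, but
 * use *outLen since responses may contain NUL bytes), or NULL on error. */
char *relayResponse(int serverFd, int browserFd, size_t *outLen)
{
    size_t cap = 8192, len = 0;
    char *copy = malloc(cap);
    char chunk[4096];
    if (copy == NULL)
        return NULL;

    for (;;) {
        ssize_t n = recv(serverFd, chunk, sizeof chunk, 0);
        if (n == 0)
            break;                        /* server closed: response complete */
        if (n < 0)
            goto fail;

        /* send() may accept fewer bytes than requested, so loop. */
        for (ssize_t sent = 0; sent < n; ) {
            ssize_t m = send(browserFd, chunk + sent, (size_t)(n - sent), 0);
            if (m < 0)
                goto fail;
            sent += m;
        }

        /* Grow the copy if needed (room for the final NUL), then append. */
        if (len + (size_t)n + 1 > cap) {
            while (len + (size_t)n + 1 > cap)
                cap *= 2;
            char *bigger = realloc(copy, cap);
            if (bigger == NULL)
                goto fail;
            copy = bigger;
        }
        memcpy(copy + len, chunk, (size_t)n);
        len += (size_t)n;
    }
    copy[len] = '\0';
    *outLen = len;
    return copy;

fail:
    free(copy);
    return NULL;
}
```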

Task 3b: Fetching the page from the proxy cache

To enable the proxy to serve requests from its cache, you need to store the HTTP response for each HTTP request in the cache. Since not all files are cacheable, you must use the auxiliary function cacheHTTPResponse() in aux_http.h to decide whether to cache a given (HTTPRequest, HTTPResponse) pair. You may need to store additional information with each pair to decide which page to remove when the cache becomes full.

int cacheHTTPResponse(const char *HTTPResponse)
Returns 1 if the HTTPResponse indicates that the file should be cached. Otherwise returns 0.
You can build your cache using the auxiliary functions in priqueue.h.
void priqueue_init(priqueue_t* Q, int MaxElements, int (*comparer)(const void* qElem1, const void* qElem2), int (*match)(const void* qElem, const void* key))
Initializes the priority queue Q to have a capacity of MaxElements, and to use the function comparer to compare two elements on the queue, and to use the function match to find out whether a particular key exists in the queue.
The comparer function returns 0 if qElem1 and qElem2 are equal according to some criterion, a value greater than 0 if qElem1 is greater than qElem2, and a value less than 0 if qElem1 is smaller than qElem2.
The match function returns 0 if the given key (key) matches the key in the queue element (qElem), and -1 otherwise.

void priqueue_destroy(priqueue_t* Q)
Frees the queue. You must free each element inside the queue yourself.

int priqueue_insert(priqueue_t* Q, void* X)
Inserts the element X in the queue Q. Returns 0 on success, and -1 if the queue is full.

void* priqueue_element_exists(priqueue_t* Q, void* key)
Returns the element in the queue that matches key according to the match function. The function does NOT remove the element from the queue. Returns NULL if none of the elements matches with key.

void* priqueue_element_remove(priqueue_t* Q, void* key)
Removes and returns the element inside the queue with the key. Returns NULL if no such key exists.

void* priqueue_delete_min(priqueue_t* Q)
Removes and returns the minimum element according to the comparer function from the queue. Returns NULL if the queue is empty.

After receiving the request from the browser, if the proxy finds an entry for that request in the cache, it must serve the response from the cache. You have to write the match function that determines whether the new HTTP request matches a particular (HTTP request, HTTP response) entry in the cache.
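A possible match function is sketched below. The cacheEntry layout, cacheMatch(), and the choice to compare only the request line are assumptions, not part of the provided code; ignoring headers is a simplification that lets a cached page be reused even when the browser sends slightly different headers for it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cache element; the MP leaves the layout up to you. */
typedef struct {
    char *request;           /* NUL-terminated HTTP request that produced the response */
    char *response;          /* raw response bytes (may contain NULs) */
    size_t responseLen;
    unsigned long lastUsed;  /* recency stamp for the replacement policy */
} cacheEntry;

/* Length of the request line, i.e. up to (not including) the first "\r\n". */
static size_t requestLineLength(const char *req)
{
    const char *end = strstr(req, "\r\n");
    return end ? (size_t)(end - req) : strlen(req);
}

/* match callback for priqueue_init(): key is the new request string.
 * Only the request line ("GET http://host/path HTTP/1.1") is compared. */
int cacheMatch(const void *qElem, const void *key)
{
    const char *cached = ((const cacheEntry *)qElem)->request;
    const char *wanted = (const char *)key;
    size_t a = requestLineLength(cached);
    size_t b = requestLineLength(wanted);
    return (a == b && memcmp(cached, wanted, a) == 0) ? 0 : -1;
}
```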

For this MP, you may assume that all responses stored in the cache are recent and have not expired. If the proxy does not find the page, it must get it from the web server and, if the page is cacheable, store the response in the cache.

When inserting a new element into the cache, if the proxy finds the cache full, it must remove the oldest (least recently used) element from the cache. A cache element counts as recent if it was inserted recently or was recently fetched from the cache by the proxy server. Write your comparer function to reflect this cache replacement policy.
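One way to express this policy is to stamp each entry with a logical clock on insertion and on every cache hit, and to order entries by that stamp so the least recently used entry is the minimum that priqueue_delete_min() returns. nextUseStamp() and cacheCompare() are hypothetical names, and a counter is used instead of time() because time() has one-second resolution and would produce ties. The description of priqueue above does not say whether the queue re-orders an element whose key changes in place, so on a hit it is safer to priqueue_element_remove() the entry, refresh its stamp, and priqueue_insert() it again, all while holding a lock that protects the cache.

```c
#include <pthread.h>
#include <stddef.h>

/* Same hypothetical element layout as in the match sketch; only lastUsed matters here. */
typedef struct {
    char *request;
    char *response;
    size_t responseLen;
    unsigned long lastUsed;
} cacheEntry;

static unsigned long useClock = 0;   /* logical clock, not wall time */
static pthread_mutex_t clockLock = PTHREAD_MUTEX_INITIALIZER;

/* Returns a fresh, strictly increasing stamp; call on insert and on every hit. */
unsigned long nextUseStamp(void)
{
    pthread_mutex_lock(&clockLock);
    unsigned long stamp = ++useClock;
    pthread_mutex_unlock(&clockLock);
    return stamp;
}

/* comparer callback for priqueue_init(): the least recently used entry
 * compares smallest, so priqueue_delete_min() evicts it. */
int cacheCompare(const void *qElem1, const void *qElem2)
{
    unsigned long a = ((const cacheEntry *)qElem1)->lastUsed;
    unsigned long b = ((const cacheEntry *)qElem2)->lastUsed;
    return (a > b) - (a < b);
}
```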

Task 4:

Your proxy server now sends the response back to the web browser. You may have to call send() multiple times to transmit the entire response, because send() is not guaranteed to send the whole buffer in one call.
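A small helper that keeps calling send() until the whole buffer has been transmitted, sketched under the assumption that the sockets are blocking. sendAll() is a hypothetical name.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sends exactly len bytes from buf, looping because send() may transmit
 * only part of the buffer. Returns 0 on success, -1 on error. */
int sendAll(int fd, const char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;     /* interrupted by a signal before sending: retry */
            return -1;
        }
        sent += (size_t)n;
    }
    return 0;
}
```

If the browser closes its socket early, send() raises SIGPIPE, which terminates the process by default; calling signal(SIGPIPE, SIG_IGN) once at startup makes send() fail with EPIPE instead, which sendAll() reports as -1.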

Task 5:

Once the proxy server has sent the entire HTTP response back to the web browser, it can safely close its connections to the web server and to the web browser. You are not required to implement persistent HTTP connections. Implement the closeSocket() function declared in aux_socket.h in aux_socket.c.

void closeSocket(int sockFd)
Closes the socket (sockFd)
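closeSocket() can be as small as a call to close(). The sketch below also calls shutdown(), which is optional: it signals end-of-stream to the peer immediately even if another descriptor still refers to the same socket. Errors are ignored because there is little useful to do with them at teardown.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Closes sockFd; negative descriptors are ignored. */
void closeSocket(int sockFd)
{
    if (sockFd < 0)
        return;
    shutdown(sockFd, SHUT_RDWR);  /* peer sees EOF right away */
    close(sockFd);                /* release the descriptor */
}
```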

Once you have completed the above tasks, your simple proxy server is almost complete. Your proxy server must exit on a SIGINT signal (you can deliver SIGINT to your program by pressing CTRL-C in the console where it is running). Before exiting, your program should finish all outstanding requests, perform any necessary cleanup, and print the total number of HTTP requests served using the following line of code:

printf("%10d total HTTP requests served.\n", num_requests);

Note: You may need some form of synchronization to ensure this statistic is correct.

Compiling And Running

Running the proxy server:

To run your program, run the following commands:

%> make
%> ./proxy <port#>

where <port#> is a port number.

Note:
1. When choosing your <port#>, choose a number above 1000 and below 30000.
2. Since ports are shared globally, your bind() call will fail if someone else is already using your port. If this happens, wait a few seconds and then try again. If bind still fails, choose a new port.

Running the client:

To test your proxy server, you need to set up the web browser so that every request goes through your proxy server. To do this, change the proxy settings of your web browser to use the hostname of the machine running your proxy and the port number you specified. To determine the hostname of the machine where you are running your proxy server, run the following command:

%> hostname
You can find instructions on how to change the proxy settings in Firefox here and the instructions for Chrome here. Remember to restore the original proxy settings once you are done testing your MP; otherwise, the web browser will not be able to fetch pages from the Internet.

Once you have set up the proxy connection, your web browser will redirect all its requests through your proxy.


To start, try simple pages such as www.google.com. If you see a web page, your proxy server has successfully served an HTTP request! We will not grade anything on the command-line output, so feel free to use stdout and stderr for any debugging or status messages you'd like.

We expect your browser to be able to successfully fetch pages with multiple images, for example http://news.google.com/nwshp?hl=en&tab=wn. In this MP, we will only evaluate GET HTTP requests, not CONNECT requests.

Lab Report

In your lab report, please include answers to the following points:

Submission, Grading And Other Details

Please fully read cs241.html for more details on grading, submission, and other topics that are shared between all MPs in CS 241.