COMP30023 Project 2
Web proxy
Release date: 1 May 2025
Due date: No later than 11:59pm Monday 26 May, 2025 AEST
Weight: 15% of the final mark
1 Project Overview
The aim of this project is to familiarize you with socket programming. Your task is to write a caching web proxy
for HTTP/1.1 (without persistent connections).
Your code must be written in C or Rust. Submissions that do not compile and run on a cloud VM may receive
zero marks.
A web proxy is a process that runs on an internet host and receives web requests for URLs hosted on other hosts.
It either serves these requests from a cache, or forwards the requests to the actual hosts.
There are many reasons for using proxies.
One reason is to cache content. Web browsers cache content locally, but if multiple computers try to download
the same content, they cannot get it from another browser’s cache. If they all use the same nearby proxy, then the
proxy can download the content once, and individual computers can download copies from there.
Caching is difficult with HTTPS, for which clients often simply rely on proxies to forward encrypted data without
being able to look at the headers. (For the reasons for forcing HTTPS, see
https://www.troyhunt.com/heres-why-your-static-website-needs-https and the accompanying video
https://youtu.be/gZ1mM6OtXIc.)
A web server accessible from your VM will be provided. If you want to test from your personal machine, you can
use the following public sites that do not force an upgrade to HTTPS:
• http://www.washington.edu
• http://yimg.com
• http://icio.us
• http://rs6.net
• http://www.faqs.org/faqs
• http://icanhazip.com
• http://example.com
• http://detectportal.firefox.com
• http://info.cern.ch
• http://anzac.unimelb.edu.au
Another reason for proxying is security. It is common for private networks to use “private” IP addresses, and so
hosts cannot make TCP connections to hosts on the global internet. However, they can be configured to download
web resources by using an HTTP proxy. The proxy has two IP addresses: one in the “private” address space and
another in the global address space, which can reach the web servers.
2 Project Details
Your task is to design and code a simple caching web proxy, capable of proxying GET requests.
You should create an executable named htproxy, with command line syntax:
./htproxy -p listen-port [-c]
If the optional -c is on the command line, then caching should be performed (stages 2–4 and stretch goal). The
order of arguments is fixed. Argument listen-port is a TCP port number. You may assume that the input is valid.
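Because the argument order is fixed and the input may be assumed valid, command-line handling can stay very small. The sketch below is one possible approach; the `proxy_args` type and `parse_args` function are hypothetical names, not part of the required interface.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper for "./htproxy -p listen-port [-c]".
 * Relies on the spec's guarantee that arguments are valid and in a fixed
 * order, so it performs no error checking. */
typedef struct {
    const char *port;   /* listen-port exactly as given, e.g. "8080" */
    bool caching;       /* true when the optional -c is present */
} proxy_args;

static proxy_args parse_args(int argc, char *argv[]) {
    proxy_args args;
    args.port = argv[2];                                   /* argv[1] is "-p" */
    args.caching = (argc > 3 && strcmp(argv[3], "-c") == 0);
    return args;
}
```

Keeping the port as a string lets it be passed straight to `getaddrinfo()` when creating the listening socket.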
2.1 Stage 1: Simple proxy
The first stage is simply to proxy all requests, without caching.
This stage will create a listening TCP socket on the port specified by -p on the command line, listening to all
interfaces (including IPv6 interfaces), queueing up to backlog=10 incoming requests in listen(3). For any
request it receives on that socket, it should identify the host (the “origin server”) from the Host header, create a
TCP connection to that on port 80 and send the request (unchanged) to that host. It will then read the complete
response and send it back to the host that sent the request.
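One way to listen on all interfaces, including IPv6, is a single IPv6 wildcard socket with `IPV6_V6ONLY` cleared, so that IPv4 clients also arrive (as v4-mapped addresses). The sketch below assumes that approach, falls back to an IPv4-only socket if IPv6 is unavailable, and sets the `SO_REUSEADDR` option that the spec asks for before `bind()`. For brevity it ignores `setsockopt()` failures, whereas the spec's own snippet exits on them. `make_listener` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L   /* expose getaddrinfo() under strict -std= modes */
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sketch: listen on `port` on all interfaces with a backlog of 10.
 * Tries an IPv6 wildcard socket first, with IPV6_V6ONLY cleared so IPv4
 * clients are accepted too; falls back to a plain IPv4 socket if IPv6 is
 * unavailable. Returns the listening fd, or -1 on failure. */
static int make_listener(const char *port) {
    const int families[] = {AF_INET6, AF_INET};
    for (size_t i = 0; i < sizeof families / sizeof families[0]; i++) {
        struct addrinfo hints, *res;
        memset(&hints, 0, sizeof hints);
        hints.ai_family = families[i];
        hints.ai_socktype = SOCK_STREAM;
        hints.ai_flags = AI_PASSIVE;                 /* wildcard address */
        if (getaddrinfo(NULL, port, &hints, &res) != 0)
            continue;
        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd >= 0) {
            int enable = 1, v6only = 0;
            /* SO_REUSEADDR must be set before bind() */
            setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof enable);
            if (res->ai_family == AF_INET6)
                setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &v6only, sizeof v6only);
            if (bind(fd, res->ai_addr, res->ai_addrlen) == 0 && listen(fd, 10) == 0) {
                freeaddrinfo(res);
                return fd;
            }
            close(fd);
        }
        freeaddrinfo(res);
    }
    return -1;
}
```

The main loop would then call `accept()` on the returned descriptor, log `Accepted`, serve one request, and close the connection socket.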
The request is terminated by the first blank line. (That is, requests do not have a body.)
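Since a request may arrive in several packets (the telnet tests below deliberately provoke this), the proxy has to keep reading until it has seen the blank line, rather than trusting a single `recv()`. A minimal sketch, assuming the caller supplies a fixed-size buffer; `read_request` is a hypothetical name.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sketch: read a request header from `fd` until the blank line ("\r\n\r\n")
 * that ends it, however many recv() calls that takes. `buf` is
 * NUL-terminated after every read so strstr() finds the terminator even
 * when it is split across packets. Returns the number of bytes read, or -1
 * on error, early EOF, or a header that does not fit in cap-1 bytes. */
static ssize_t read_request(int fd, char *buf, size_t cap) {
    size_t len = 0;
    while (len + 1 < cap) {
        ssize_t n = recv(fd, buf + len, cap - 1 - len, 0);
        if (n <= 0)
            return -1;                       /* error, or client closed early */
        len += (size_t)n;
        buf[len] = '\0';
        if (strstr(buf, "\r\n\r\n") != NULL)
            return (ssize_t)len;
    }
    return -1;                               /* header too large for buffer */
}
```

Searching the whole buffer (not just the newest chunk) matters: a `\r\n` at the end of one packet and another at the start of the next must still be recognised as the terminator.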
All header names are case-insensitive, and you can assume they are followed immediately by a single colon and a
single space, and that the rest of the line is the value of that header, which you can assume is case-sensitive.
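Given the "Name: value" format guaranteed above, a header can be found by comparing only the name case-insensitively and copying the rest of the line. The sketch below, with the hypothetical name `find_header`, skips the first line, so the same routine can serve for `Host` in requests and for `Content-Length` or `Cache-Control` in responses.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Sketch: find header `name` (matched case-insensitively) in a NUL-terminated
 * header block and copy its value, without the trailing "\r\n", into `out`
 * (truncated to outlen-1 bytes). The request/status line is skipped.
 * Returns 1 if found, 0 otherwise. */
static int find_header(const char *headers, const char *name,
                       char *out, size_t outlen) {
    size_t nlen = strlen(name);
    const char *line = strstr(headers, "\r\n");      /* end of first line */
    while (line != NULL) {
        line += 2;                                     /* start of next line */
        if (strncmp(line, "\r\n", 2) == 0)
            break;                                     /* blank line: done */
        const char *eol = strstr(line, "\r\n");
        if (eol == NULL)
            break;
        if (strncasecmp(line, name, nlen) == 0 &&
            line[nlen] == ':' && line[nlen + 1] == ' ') {
            const char *val = line + nlen + 2;         /* value is case-sensitive */
            size_t vlen = (size_t)(eol - val);
            if (vlen >= outlen)
                vlen = outlen - 1;
            memcpy(out, val, vlen);
            out[vlen] = '\0';
            return 1;
        }
        line = eol;
    }
    return 0;
}
```

Checking for the colon right after the name prevents a header such as `Hostname:` from matching a lookup for `Host`.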
The length in bytes of the body of the response is specified by the Content-Length header. Note that there is
no limit on the maximum size of the response; you should not need to read the entire response into memory before
starting to send it. However, you may choose to truncate responses longer than 100 kiB, with a penalty of only 0.5
marks. Long responses should not cause your code to abort.
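To avoid holding a large response in memory, the body can be relayed through a small fixed buffer, counting down the bytes still expected. In the sketch below, `remaining` is assumed to be the Content-Length value minus any body bytes that arrived in the same reads as the response header (those bytes must be forwarded first). `relay_body` is a hypothetical name, and `MSG_NOSIGNAL` is used so that a client that disconnects early produces an error return rather than a `SIGPIPE` that would kill the proxy.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Sketch: copy exactly `remaining` body bytes from `from` to `to` through a
 * 4 KiB buffer, so response size is unbounded while memory use stays fixed.
 * Returns 0 on success, -1 if either side fails or closes early. */
static int relay_body(int from, int to, unsigned long long remaining) {
    char chunk[4096];
    while (remaining > 0) {
        size_t want = remaining < sizeof chunk ? (size_t)remaining : sizeof chunk;
        ssize_t n = recv(from, chunk, want, 0);
        if (n <= 0)
            return -1;                                   /* origin failed or closed */
        for (ssize_t off = 0; off < n; ) {               /* send() may be partial */
            ssize_t w = send(to, chunk + off, (size_t)(n - off), MSG_NOSIGNAL);
            if (w < 0)
                return -1;                               /* client went away */
            off += w;
        }
        remaining -= (unsigned long long)n;
    }
    return 0;
}
```

With caching enabled, the same loop is a natural place to also copy bytes into a cache entry while the running total stays within 100 kiB, and to abandon the cache copy once it does not.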
The program should log the line:
Accepted
to stdout once a connection socket is created. If socket creation fails (client or server), the proxy may discard
this request and return to the loop waiting for a new request.
The program should log the last line (only) of the header in the format:
Request tail last line
to stdout once the request has been read. Omit the trailing \r\n from last line before printing.
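The last header line is the one immediately before the terminating blank line, so it can be found by locating `\r\n\r\n` and walking back to the previous line break. A short sketch; `last_header_line` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: copy the last line of a request header (the line just before the
 * terminating blank line) into `out`, without its trailing "\r\n".
 * `req` must be NUL-terminated. Returns 1 on success, 0 if the header has
 * no terminating blank line. */
static int last_header_line(const char *req, char *out, size_t outlen) {
    const char *end = strstr(req, "\r\n\r\n");     /* end of the last line */
    if (end == NULL)
        return 0;
    const char *start = end;
    while (start > req && start[-1] != '\n')
        start--;                                    /* back to its line start */
    size_t len = (size_t)(end - start);
    if (len >= outlen)
        len = outlen - 1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

The log line would then be printed with something like `printf("Request tail %s\n", line);` followed by `fflush(stdout);`.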
Whenever a request is forwarded to the origin server, the program should log a line:
GETting host request-URI
to stdout, where the request-URI is the second value specified on the first line of the request (request-
line), separated by a single space from the GET and by a single space from the HTTP/1.1. There should be a
single space between host and request-URI in the log line.
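Because the request line is guaranteed to use single spaces, the request-URI is simply the text between the first and second spaces of the first line. A minimal sketch; `request_uri` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: copy the request-URI (the second field of the request line
 * "GET request-URI HTTP/1.1") into `out`, relying on the single spaces the
 * spec guarantees. Returns 1 on success, 0 if the request line is malformed. */
static int request_uri(const char *req, char *out, size_t outlen) {
    const char *eol = strstr(req, "\r\n");       /* end of request line */
    const char *sp1 = strchr(req, ' ');           /* after the method */
    if (eol == NULL || sp1 == NULL || sp1 > eol)
        return 0;
    const char *uri = sp1 + 1;
    const char *sp2 = strchr(uri, ' ');           /* before "HTTP/1.1" */
    if (sp2 == NULL || sp2 > eol)
        return 0;
    size_t len = (size_t)(sp2 - uri);
    if (len >= outlen)
        len = outlen - 1;                         /* truncate to fit */
    memcpy(out, uri, len);
    out[len] = '\0';
    return 1;
}
```

Together with the Host value, this gives the log line, for example `printf("GETting %s %s\n", host, uri);` followed by `fflush(stdout);`.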
On receiving the response, the program should log the Content-Length header value:
Response body length content-length
to stdout.
All lines logged to stdout must be terminated by a LF character. Flush stdout after each write, for example by
using fflush(3), or ensure that stdout is line buffered.
When connecting to the origin server, remember to check both its IPv4 and IPv6 addresses.
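Calling `getaddrinfo()` with `AF_UNSPEC` returns both IPv6 and IPv4 addresses for the host, and the proxy can try each until one connects. A sketch under that assumption; `connect_origin` is a hypothetical name, and it takes the port as a string (always `"80"` in this project) so it can also be tried against a local test server.

```c
#define _POSIX_C_SOURCE 200809L   /* expose getaddrinfo() under strict -std= modes */
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sketch: connect to `host` on `port`, trying each address getaddrinfo()
 * returns, IPv6 and IPv4 alike, until one succeeds. Returns the connected
 * fd, or -1 so the caller can drop the request and keep accepting. */
static int connect_origin(const char *host, const char *port) {
    struct addrinfo hints, *res, *p;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;            /* both AAAA and A results */
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    int fd = -1;
    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;                           /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

A host whose first address is IPv6 but unreachable from the VM will still be served, because the loop moves on to the next (possibly IPv4) address.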
After serving one request, the proxy should close the connection socket (that is, not support persistent connections)
but keep listening for the next request. To kill it, use CTRL-C (SIGINT).
You may notice that a port and interface which has been bound to a socket sometimes cannot be reused until after
a timeout. To make your testing and our marking easier, please override this behaviour by placing the following
lines before the bind() call:
int enable = 1;
if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(int)) < 0) {
    perror("setsockopt");
    exit(1);
}
You can assume that:
1. The initial request and the server’s response conform to the RFCs.
2. There will be no requests to ports other than 80.
3. Each header entry is a single line (no line folding).
4. All characters in header names and values are printable ASCII.
5. There will be exactly one occurrence of the Host header in the request, with no commas in the field value.
6. There will be exactly one occurrence of the Content-Length header in the response, with a single
decimal non-negative integer field value.
You do not need a timeout; if the server doesn’t respond, your program is allowed to hang. If you implement a
timeout, it should not be less than 30 seconds.
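If a timeout is implemented, one option (an assumption, not required by the spec) is the `SO_RCVTIMEO` socket option, which makes a blocking `recv()` return -1 with `errno` set to `EAGAIN` or `EWOULDBLOCK` after a period of silence. The sketch clamps the value to the 30-second minimum; `set_recv_timeout` is a hypothetical name.

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Sketch (optional feature): make recv() on `fd` give up after `seconds`
 * of silence, never less than the 30 s the spec requires.
 * Returns setsockopt()'s result: 0 on success, -1 on error. */
static int set_recv_timeout(int fd, long seconds) {
    struct timeval tv;
    tv.tv_sec = seconds < 30 ? 30 : seconds;    /* enforce the 30 s floor */
    tv.tv_usec = 0;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}
```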
2.2 Stage 2: Naive caching
The second stage is to keep a copy of all requests and their responses.
Allocate a cache with 10 entries, each 100 kiB in size.
This stage is not required to distinguish between cacheable and non-cacheable requests. Its behaviour on non-
cacheable requests is undefined. Until you attempt stage 3, it can simply cache all requests.
The cache is a key-value store, with the key being the entire request. For this project, two requests match if they
are the exact same byte string. (In practice, fields can be reordered, and some fields are case-insensitive.)
For every request received, if the request is less than 2000 bytes, look in the cache to see if you have received this
request before. If you have, then reply with the response that you received last time.
If you do not have the entry in the cache, evict the least recently used (LRU) element of the cache if there is no
empty slot. Then fetch the response from the actual host, and if the request is less than 2000 bytes and the response
is 100 kiB or less, place it in the cache. The eviction should occur even if the request is not cached.
Whenever a response is sent from the cache, the program should log a line:
Serving host request-URI from cache
to stdout, instead of logging the GET command.
Whenever an entry is evicted, the program should log a line:
Evicting host request-URI from cache
to stdout.
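The naive cache described above can be sketched as ten fixed slots keyed by the exact request bytes, with a logical clock for LRU ordering. This is one possible design, not a prescribed one; the type and function names are hypothetical. The fixed 100 kiB buffers make the whole cache about 1 MiB, so it should be a global or `static` variable rather than a local on the stack. Because the request is stored as the key, the host and request-URI for the `Evicting` log line can be re-derived from the victim's `request` bytes before the slot is cleared.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_ENTRIES 10
#define MAX_REQUEST 2000            /* only requests shorter than this are cached */
#define MAX_RESPONSE (100 * 1024)   /* 100 kiB per cached response */

typedef struct {
    int in_use;
    unsigned long last_used;        /* logical clock value: larger = more recent */
    size_t request_len, response_len;
    char request[MAX_REQUEST];      /* key: the exact request bytes */
    char response[MAX_RESPONSE];
} cache_entry;

typedef struct {
    cache_entry entries[CACHE_ENTRIES];
    unsigned long clock;            /* ticks on every hit or insertion */
} cache;

/* Exact byte-for-byte match on the request; a hit becomes most recently used. */
static cache_entry *cache_lookup(cache *c, const char *req, size_t len) {
    for (int i = 0; i < CACHE_ENTRIES; i++) {
        cache_entry *e = &c->entries[i];
        if (e->in_use && e->request_len == len && memcmp(e->request, req, len) == 0) {
            e->last_used = ++c->clock;
            return e;
        }
    }
    return NULL;
}

/* An empty slot if one exists, else the least recently used entry. If the
 * returned slot is in use, the caller logs "Evicting ..." and clears in_use. */
static cache_entry *cache_victim(cache *c) {
    cache_entry *victim = &c->entries[0];
    for (int i = 0; i < CACHE_ENTRIES; i++) {
        cache_entry *e = &c->entries[i];
        if (!e->in_use)
            return e;
        if (e->last_used < victim->last_used)
            victim = e;
    }
    return victim;
}

/* Copies the request/response pair into slot `e`, or returns -1 (leaving the
 * slot unchanged) if either exceeds the size limits. */
static int cache_store(cache *c, cache_entry *e, const char *req, size_t len,
                       const char *resp, size_t resp_len) {
    if (len >= MAX_REQUEST || resp_len > MAX_RESPONSE)
        return -1;
    memcpy(e->request, req, len);
    memcpy(e->response, resp, resp_len);
    e->request_len = len;
    e->response_len = resp_len;
    e->in_use = 1;
    e->last_used = ++c->clock;
    return 0;
}
```

Following the order in the text, a miss calls `cache_victim()` and evicts before fetching, so eviction happens even when the fetched response later turns out to be too large to store.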
2.3 Stage 3: Valid caching
Not all responses can be cached. For this stage, only responses to commands that can be cached should be cached.
Cache-Control headers in responses should be respected. If a Cache-Control header contains
private, no-store, no-cache, max-age=0, must-revalidate or proxy-revalidate then do
not cache that response.
The ABNF for the Cache-Control header (according to RFC 9111 and RFC 9110) is as follows:
token = 1*tchar
tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*"
/ "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
/ DIGIT / ALPHA
OWS = *( SP / HTAB )
Cache-Control = [ cache-directive *( OWS "," OWS cache-directive ) ]
cache-directive = token [ "=" ( token / quoted-string ) ]
RFC 9111 5.2: Cache directives are identified by a token, to be compared case-insensitively.
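Following the ABNF above, a Cache-Control value can be scanned directive by directive: skip optional whitespace and commas, read a token name, then an optional `=` argument that is either a token or a quoted-string. Skipping a quoted-string as a unit keeps commas inside quotes from splitting directives, and comparing whole names case-insensitively stops, say, `no-cache-ext` from matching `no-cache`. The sketch below also returns the `max-age` value for Stage 4. The names are hypothetical, any `max-age` that parses to 0 is treated as `max-age=0`, and arguments longer than 31 characters are truncated, which is harmless because only `max-age` arguments are inspected.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

typedef struct {
    int no_cache;            /* 1 if any directive forbids caching */
    int has_max_age;
    unsigned long max_age;   /* seconds; valid only if has_max_age */
} cache_policy;

/* Sketch: scan a Cache-Control field value per the ABNF above. */
static cache_policy parse_cache_control(const char *s) {
    static const char *const forbid[] = {"private", "no-store", "no-cache",
                                         "must-revalidate", "proxy-revalidate"};
    cache_policy p = {0, 0, 0};
    while (*s != '\0') {
        while (*s == ' ' || *s == '\t' || *s == ',')
            s++;                                        /* OWS and separators */
        if (*s == '\0')
            break;
        const char *name = s;                           /* directive token */
        while (*s != '\0' && *s != '=' && *s != ',' && *s != ' ' && *s != '\t')
            s++;
        size_t nlen = (size_t)(s - name);
        char arg[32] = "";                              /* optional argument */
        if (*s == '=') {
            size_t alen = 0;
            s++;
            if (*s == '"') {                            /* quoted-string */
                s++;
                while (*s != '\0' && *s != '"') {
                    if (*s == '\\' && s[1] != '\0')
                        s++;                            /* quoted-pair */
                    if (alen + 1 < sizeof arg)
                        arg[alen++] = *s;
                    s++;
                }
                if (*s == '"')
                    s++;
            } else {                                    /* token */
                while (*s != '\0' && *s != ',' && *s != ' ' && *s != '\t') {
                    if (alen + 1 < sizeof arg)
                        arg[alen++] = *s;
                    s++;
                }
            }
            arg[alen] = '\0';
        }
        for (size_t i = 0; i < sizeof forbid / sizeof forbid[0]; i++)
            if (strlen(forbid[i]) == nlen && strncasecmp(name, forbid[i], nlen) == 0)
                p.no_cache = 1;
        if (nlen == 7 && strncasecmp(name, "max-age", 7) == 0) {
            p.has_max_age = 1;
            p.max_age = strtoul(arg, NULL, 10);
            if (p.max_age == 0)
                p.no_cache = 1;                         /* max-age=0 */
        }
        while (*s != '\0' && *s != ',')
            s++;                                        /* skip to next directive */
    }
    return p;
}
```

The value passed in would be the one returned by a case-insensitive header lookup for `Cache-Control`; a response with no such header is cacheable.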
Whenever an item is fetched but not put in the cache due to the Cache-Control header, the program should
log a line:
Not caching host request-URI
to stdout, after logging the GET command and response body length.
2.4 Stage 4: Expiration
The Cache-Control header can specify a max-age=xxx field. This specifies how many seconds the response
is valid for. You can assume xxx is a positive integer that fits in uint32. If it is cached, then the cache entry
should become “stale” after that time. You may assume that there will be at most one Cache-Control header,
with at most one max-age directive.
For this stage, the code should not respond with stale cache entries. Instead, it should fetch a fresh copy. If it fits
the criteria for caching then the fresh copy should be cached. Otherwise, the entry should be evicted, and logged
as such after the GET command and response body length, and if applicable, after Not caching from Stage 3.
Whenever a stale cache entry matches, the program should log a line:
Stale entry for host request-URI
to stdout, before logging the GET command.
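Staleness can be decided by recording when each response was fetched and comparing its age with `max-age`. Following RFC 9111, a response is fresh while its age is strictly less than its freshness lifetime, so the sketch treats an entry as stale once the age reaches `max-age`. Using a monotonic clock (`clock_gettime(CLOCK_MONOTONIC, ...)` at fetch time and at lookup time) is an assumption that keeps wall-clock adjustments from affecting expiry; the names are hypothetical.

```c
#include <time.h>

/* Sketch: per-entry freshness data recorded when a response is cached. */
typedef struct {
    struct timespec fetched_at;   /* e.g. from clock_gettime(CLOCK_MONOTONIC, ...) */
    int has_max_age;
    unsigned long max_age;        /* seconds */
} freshness;

static double seconds_between(const struct timespec *then, const struct timespec *now) {
    return (double)(now->tv_sec - then->tv_sec) + (now->tv_nsec - then->tv_nsec) / 1e9;
}

/* Stale once the entry's age reaches max-age; entries without a max-age
 * directive never become stale in this stage. */
static int is_stale(const freshness *f, const struct timespec *now) {
    return f->has_max_age && seconds_between(&f->fetched_at, now) >= (double)f->max_age;
}
```

Storing the fetch time per entry is what lets different cache entries expire at different times.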
2.5 Stretch goal: Checking for updates
If the cache entry is stale, it is still not always necessary to download the document again. Instead, it is possible
for the proxy to insert the If-Modified-Since header to request that the page be downloaded only if it is
newer than the cached version. Note that this is no longer simply forwarding the request as-is.
This allows the origin server to reply with status code 304 (Not Modified) if the cached entry is still valid. In this
case, return the contents of the cache. The max-age argument will not be updated, and so the next request to this
URL should again query with If-Modified-Since.
This information can also be obtained using a HEAD request, although that is less efficient as it requires two
HTTP requests.
For simplicity, use the Date: header to determine the argument for If-Modified-Since. (In practice, the
Last-Modified header is preferable, but neither is a required field, and so a real proxy would have multiple
if-thens to determine a time.)
Whenever a stale cache entry is served this way, without being downloaded again, the program should log a line:
Entry for host request-URI unmodified
to stdout, after logging the presence of the stale entry and after logging Serving...from cache.
The key value for the cache entry should be the header received from the client, without the If-Modified-Since
line.
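For the stretch goal, the conditional request can be built by inserting an `If-Modified-Since` line just before the blank line that ends the client's request, using the cached response's `Date` value. The client's original request, without this line, remains the cache key. A sketch with the hypothetical name `add_if_modified_since`:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Sketch: write into `out` a copy of `req` with "If-Modified-Since: <date>"
 * inserted before the terminating blank line. Returns the new length, or -1
 * if `req` has no blank line or `out` is too small. */
static int add_if_modified_since(const char *req, const char *date,
                                 char *out, size_t outlen) {
    const char *end = strstr(req, "\r\n\r\n");
    if (end == NULL)
        return -1;
    int head = (int)(end - req) + 2;     /* up to and including the last "\r\n" */
    int n = snprintf(out, outlen, "%.*sIf-Modified-Since: %s\r\n\r\n", head, req, date);
    if (n < 0 || (size_t)n >= outlen)
        return -1;                       /* would have been truncated */
    return n;
}
```

If the origin replies with a status line beginning `HTTP/1.1 304`, the cached response is sent to the client unchanged; otherwise the new response is handled as a normal fetch.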
The marks allocated to this stretch goal are deliberately not worth the effort it will take. It should only be attempted
by those realistically hoping to get 15/15. If the project seems too big, then do not attempt this stretch goal.
3 Development and Testing
Your code will be marked using curl, and so you should use curl for testing. In addition, you should test that
it works in the following contexts:
1. Access the provided server (or one of the HTTP websites listed above) using
telnet [hostname] 80
Note that pressing the Backspace key in telnet sends the backspace character, rather than erasing the last
character you typed. It may be better to cut-and-paste the query from a text document. You should try
pasting part of a line at a time. (What bugs will this highlight?)
2. Access the site using a browser. You can use lynx on your VM. If you are testing on your local machine,
it may help to install a second browser (for example, if you use Chrome, install Firefox). That way you can
keep your main browser working while your second browser has its proxy setting set to use your proxy.
4 Marking Criteria
The marks are broken down as follows:
Task # and description                     Marks
1. Correctly proxy requests                    4
2. Naive caching                               3
3. Valid caching                               2
4. Expiration                                  1
5. Safety                                      1
6. Build quality                               1
7. Quality of software practices              2
Stretch goal: Checking for updates             1
Code that does not compile and run on cloud VM will usually be awarded zero marks for parts 1–5. Use the
GitHub CI infrastructure to ensure your submission is valid. Your submission will be tested and marked with the
following criteria:
Task 1. Correctly proxy requests Your code correctly
• opens a socket (logs Accepted at the right time) (0.5 marks)
• receives a request, which may come as multiple packets (0.5 marks)
• logs the Content-Length of the response (1 mark)
• sends the (correct) response to the client (1 mark)
• continues to process requests after the first is served (0.5 marks)
• processes replies longer than 100 kiB (0.5 marks)
Task 2. Naive caching Your code correctly
• serves the second and later requests from the cache. (This will only be tested for pages that should be
cached; valid stage 3 code will pass.) (2 marks)
• serves pages too large for the cache (0.5 marks)
• evicts entries correctly (0.5 marks)
Task 3. Valid caching Your code correctly
• doesn’t cache requests whose headers require them not to be cached, with simple Cache-Control header
values (1 mark)
• doesn’t cache requests whose headers require them not to be cached, with complex Cache-Control
header values (1 mark)
(To do well on Task 2, the code must cache responses with no Cache-Control header; don’t break that by
attempting Task 3.)
Task 4. Expiration Your code correctly
• re-loads stale entries after the expiration time (0.5 marks).
• expires pages, even if the Cache-Control header is complex (0.2 marks)
• handles different expiration times for different cache entries (0.3 marks)
(To do well on Task 2, the code must serve from cache until the expiration time; don’t break that by attempting
Task 4.)
Task 5. Safety Network code should never crash with a segmentation fault, even if the hosts on the other
side behave poorly. The sorts of bugs that cause segmentation faults (memory errors) also introduce security
vulnerabilities. It is OK to print an error message to stderr and abandon the request (or, if necessary, call
exit() with an error code). Task 5 covers segmentation faults, but code that crashes with a segmentation fault
may be marked down in other tasks too.
Task 6. Build quality
• The repository must contain a Makefile that produces an executable named “htproxy”, along with all
source files required to compile the executable. Place the Makefile at the root of your repository, and
ensure that running make places the executable there too.
• Running make clean && make -B && ./htproxy should execute the submission.
• Compiling using “-Wall” should yield no warnings (C).
Compiling using “rustc” should yield no warnings (Rust).
Do not suppress any default warnings inline.
• Running make clean should remove all object code and executables.
• Do not commit htproxy or other executable files. Scripts (with .sh extension) are exempted.
Test this by committing regularly, and checking the CI feedback; the CI will tell you the mark that you get for this
section. (If you need help, ask on the forum.)
Task 7. Quality of software practices
• Proper use of version control, based on the regularity of commit and push events, their content and associated
commit messages (e.g., repositories with a single commit and/or non-informative commit messages
will lose marks).
• Quality of code, based on the choice of variable names, comments, formatting (e.g. consistent indentation
and spacing), and structure (e.g. abstraction, modularity).
• Proper memory management, based on the absence of memory errors and memory leaks.
Code will be tested with Valgrind to ensure no memory errors, as these are a security risk. Avoid memory leaks,
but you should not catch SIGINT to clean up memory when terminating.
Further deductions may be applied to inappropriate submissions, e.g. catching segmentation faults, hard-coding
the output into the code.
Stretch goal As stated in the instructions.
5 Submission
All code must be written in C or Rust (e.g., it should not be a C wrapper over code in another language) and cannot
use any external libraries, except standard libraries as noted below. You must not use or adapt any code or libraries
relating to HTTP. Rust submissions must be compiled with stable rustc, with no external crates or build scripts.
You can reuse the code that you wrote for your other individual projects if you clearly specify when and for what
purpose you have written it (e.g., the code and the name of the subject, project description and the date, that can
be verified if needed). You may use standard libraries (e.g., to create sockets, send, receive data etc.). Your code
must compile and run on the provided VMs.
The repository must contain a Makefile which produces an executable htproxy along with all source files
required to compile the executable. Place the Makefile at the root of your repository, and ensure that running
make places the executable there too.
Make sure that all source code is committed and pushed. Executable files (that is, all files with the executable bit
which are in your repository) will be removed before marking, and cause loss of marks. Hence, ensure that none
of your source files have the executable flag set. (You can verify this by cloning your repo onto your VM, and
using ls -l.)
If you import code from somewhere else, within the collaboration policy, there should be a commit that does
nothing but import that code, with a commit message saying “importing code from [reference]”. You should then
customise the imported code in later commits.
GitHub The use of GitHub is mandatory. Your submission will be assessed using the code in your
Project 2 repository (proj2-〈usernames...〉) under the subject’s organization.
We strongly encourage you to commit your code at least once per day. Be sure to push after you commit. This
is important not only to maintain a backup of your code, but also because the git history may be considered for
matters such as special consideration, extensions and potential plagiarism. Proper use of git will have a positive
effect on the mark you get for quality of software practices.
Submission To submit your project, please follow these steps carefully:
1. Push your code to the repository named proj2-〈usernames...〉 under the subject’s organization,
https://github.com/feit-comp30023-2025.
Ensure your code compiles and runs on the provided VMs. Code that does not compile or produce correct
output on VMs will typically receive very low or 0 marks.
2. Submit the full 40-digit SHA1 hash of the commit you want us to mark to the Project 2 Assignment on
the LMS.
You are allowed to update your chosen commit by resubmitting the LMS assignment as many times as desired.
However, only the last commit hash submitted to the LMS before the deadline (or approved extension)
will be marked without a late penalty.
3. Ensure that the commit that you submitted to the LMS is correct and accessible from a fresh clone of your
repository. An example of how to do this is as follows:
git clone git@github.com:feit-comp30023-2025/proj2- proj2
cd proj2 && git checkout
Please be aware that we will only mark the commit submitted via the LMS. It is your responsibility to ensure
that the submission is correct and corresponds to the commit you want us to mark.
Late submissions will incur a deduction of 2 marks per day (or part thereof).
We strongly encourage you to allow sufficient time to follow the submission process outlined above. Leaving it to
the last minute usually results in a submission that is a few minutes to a few hours late, or in the submission of the
incorrect commit hash. Either case leads to late penalties.
The submission date is determined solely by the date on which the LMS assignment was submitted. Forgetting to
submit via the LMS or submitting the wrong commit hash will result in a late penalty that will apply regardless of
the commit date.
We will not give partial marks or allow code edits for either known or hidden cases without applying a late penalty
(calculated from the deadline).
Extension policy For extensions between 1-3 business days, you must:
1. Have an AAP or fill in FEIT’s short extension declaration form before the project’s deadline.
2. Submit an extension request via form in Project Module on LMS.
For extensions of more than 3 business days, you must:
1. Apply for an extension via the special consideration portal before the assessment deadline.
2. Receive a successful outcome for your application.
3. Submit the outcome of your application via form in Project Module on LMS.
Further details are available on the “FEIT Extensions and Special consideration" page on Canvas (under the
Welcome module).
6 Testing
You will have access to several test cases (via an HTTP server – see Ed) and their expected outputs. However,
these test cases are far from exhaustive; they are mainly to avoid misinterpretation of the specification. Designing
and running your own tests is a part of this project. Your code will be assessed on these cases and on other cases that
you haven’t seen before. The unseen cases are not “trick” cases, but are chosen to reflect the fact that real world
programming tasks do not come with an exhaustive list of test cases.
Project 2 Repository : The project skeleton and sample outputs are available from:
feit-comp30023-2025/project2.
Continuous Integration Testing: To provide you with feedback on your progress before the deadline, we will
set up a Continuous Integration (CI) pipeline on GitHub with the same set of test cases.
Though you are strongly encouraged to use this service, the usage of CI is not assessed, i.e., we do not require CI
tasks to complete for a submission to be considered for marking.
The requisite ci.yml file has been provisioned and placed in your repository, but is also available from the
.github/workflows directory of the project2 repository linked above.
7 Teamwork
Both team members are expected to contribute equally to the project. If this is not the case, please approach the
head tutor or lecturer to discuss your situation. In cases in which a student’s contribution is deemed inadequate,
the student’s mark for the project will be adjusted to reflect their lack of contribution. We will look at git history
when making such an assessment.
8 Collaboration and Plagiarism
This is a pair project. Please keep a log of your group interactions in a GIT file called collab.txt or
collab.tex. This should include things like who agreed to do what at which meeting, and any changes of
plan.
There are no marks allocated to this file, but it will be used in cases where either party wants marks to be allocated
unequally between the two partners. We will look at the GIT history of this file, so please update it as soon as an
issue arises, such as if one of you is unable to attend a meeting. Please check this file regularly to make sure that you
are happy with what your partner may have written.
Even if you do not expect problems, it is good practice to keep minutes of meetings, and this file is a suitable place
for that. If you want to keep a formatted document, you can either use LaTeX, or keep a separate word processor
document and export a plain text version for GIT.
Collaboration outside your group You may discuss this project abstractly with your classmates but what gets
typed into your program must be individual work, not copied from anyone else. Do not share your code and do not
ask others to give you their programs. The best way to help your friends in this regard is to say a very firm “no” if
they ask to see your program, point out that your “no”, and their acceptance of that decision, are the only way to
preserve your friendship. See https://academicintegrity.unimelb.edu.au for more information.
Note also that solicitation of solutions via posts to online forums, whether or not there is payment involved, is
also Academic Misconduct. You should not post your code to any public location (e.g., github.com) until final
subject marks are released.
If you use a small amount of code not written by you, you must attribute that code to the source you got it from
(e.g., a book or Stack Exchange) in both the comments and the git commit messages.
Do not post your code on the subject’s discussion board Ed, except in a Private thread.
Plagiarism policy: You are reminded that all submitted project work in this subject is to be your own individual
work. Automated similarity checking software will be used to compare submissions. It is University policy that
cheating by students in any form is not permitted, and that work submitted for assessment purposes must be the
independent work of the student concerned.
Using git properly is an important step in the verification of authorship. We should see the stages of your code
being written, not just the finished product.
AI software such as ChatGPT can generate code, but it will not earn you marks. You are allowed to use tools like
ChatGPT, but if you do then you must strictly adhere to the following rules.
1. Have a file called AI.txt
2. That file must state the query you gave to the AI, and the response it gave
3. You will only be marked on the differences between your final submission and the AI output.
If the AI has built you something that gains you points for Task 1, then you will not get points for Task 1;
the AI will get all those points.
If the AI has built you something that gains no marks by itself, but you only need to modify five lines to get
something that works, then you will get credit for identifying and modifying those five lines.
4. If you ask a generic question like “How do I convert an integer to network byte order?” or “What does the
error ‘implicit declaration of function rpc_close_server’ mean?” then you will not lose any marks for using
its answer, but please report it in your AI.txt file.
If these rules seem too strict, then do not use the AI tools.
These issues are new, and this may not be the best policy, but it is this year’s policy. If you have suggestions for
better rules for future years, please mention them on the forum.
Good luck!