CS0019-无代写|学霸联盟

CS0019-无代写

时间：2023-02-12

CS 0019是一门专为留学生开设的计算机科学课程，旨在帮助学生更好地了解计算机科学的基本概念和技能。课程内容涵盖了计算机科学的核心领域，包括算法与数据结构、计算机网络、数据库系统、操作系统、编程语言等。通过本课程，学生将学会如何使用编程语言进行软件开发，并了解如何使用计算机科学的知识解决实际问题。本课程采用了多种教学方法，包括课堂讲解、实际项目实践和小组合作，旨在帮助学生更好地掌握课程知识，培养他们的创新能力和团队合作精神。总之，CS 0019是一门富有启发性和实用价值的课程，适合所有对计算机科学感兴趣的留学生。

UCL CS 0019 Brad Karp
Individual Assessed Coursework 3:
A Bit-Level LZW Compressor and Decompressor
Due date: 4 PM, 23rd February 2023
Value: 13% of marks for module
Introduction
Network link speeds increase quickly, as do magnetic disk and SSD storage capacities, but so do
the sizes of data sets we manipulate. No matter how fast the network link or how big the disk,
there are workloads that exceed available capacity. One way to try to squeeze every last bit of
capacity out of a link or a disk is to compress data—to take an input stream of data, encode it
so as to be represented by fewer bits, and send or store the resulting smaller, compressed data
stream. A complementary processing pass can then decompress the compressed data stream and
recover the original data. In a lossless compression scheme, decompressing the compressed data
recovers the exact same original bits that the compressor took as input. In a lossy compression
scheme, the result of decompressing the compressed data may be close to the original data, but
not identical; JPEG image compression, for example, is a lossy compression scheme.
In this coursework, you will build a complete implementation of the Lempel-Ziv-Welch
(LZW) lossless data compression algorithm, including both a compressor (which takes an in-
put file and produces a compressed version of it as output) and a decompressor (which reverses
this process, and produces the original data from a compressed file). LZW is a widely used com-
pression algorithm: PDF, TIFF, and GIF files all employ variants of LZW compression. Your
LZW implementation will have a unique twist: unlike most others, yours will not operate on
input data consisting of 8-bit bytes (as do most file compression utilities), but will rather be tai-
lored for compression and decompression of DNA data. There are only four possible data values
in DNA data, which correspond to the four bases that comprise DNA: guanine (G), adenine (A),
cytosine (C), and thymine (T).
This coursework will cause you to become proficient in bit-level manipulation of data in C:
both the input to your compressor (a stream of DNA bases encoded in a file) and its output,
and by symmetry, the input to your decompressor and its output, will consist of data values that
are bit strings that will often not begin or end at byte boundaries. And the compressed data
you produce with LZW will include bit strings of varying lengths. UNIX’s (Linux’s) file input
and output (I/O) operations, however, operate only at a granularity of whole bytes. A major
part of your task in this coursework is thus to bridge between the bit-level granularity of data
your compressor and decompressor must manipulate and the byte-level granularity of file I/O.
You’ll become familiar with basic file I/O operations in C as supported by the UNIX standard
I/O (stdio) library along the way.
Tasks
• Implement an LZW compressor that takes a series of DNA bases stored in a packed binary
format as input from a file and produces LZW-compressed output to a file.
• Implement an LZW decompressor that takes LZW-compressed input from a file and pro-
duces the original, decompressed DNA data encoded as a series of DNA bases in a packed
binary format in a file.
1
Your LZW compressor and decompressor must comply with all of the specification we pro-
vide below. It is vital that you understand the specification before you begin coding: an imple-
mentation that deviates from the specification will fail the tests, and the tests alone determine
your mark.1 Read this handout in its entirety carefully before you begin!
As ever, it is important to get started early. While LZW is not all that complex an algorithm,
and a complete solution can fit in on the order of only 300 lines of code, coming to grips with
bit-level operations in C will likely take time. You will likely need the two weeks allotted to
complete CW3.
Specification
You should implement the LZW algorithms for compression and decompression (one separate C
function for each) as described in Welch’s 1984 article (available on the UCL CS 0019 web site),
with five modifications:
• Because your compressor operates over DNA bases, for which there are only four possible
values, each input symbol is two bits long (rather than one byte long, as described in
Welch’s article). We describe below the exact format of an input file of DNA bases.
• When you initialize your compressor’s table, you should populate its first four entries with
the four possible input base values (0, 1, 2, 3) in increasing order (as Welch does in his
version of the algorithm, but for all 256 possible 8-bit byte values).
• You must initialize your compressor and decompressor to begin with a code length of
3 bits (i.e., such that the first compressed code your compressor outputs and that your
decompressor expects as input is 3 bits long).2
• If your compressor’s or decompressor’s table of codes fills, you must dynamically grow
it as needed at runtime. (That is, do not implement “clearing” of the code table or a
reserved code indicating that the decompressor should clear its table.) See “Tips” later in
this handout for more information on how to correctly grow the table of codes.
• Your compressor and decompressor must increase their output or input code length (re-
spectively) when necessary to accommodate growth in the number of entries in the table.
See the further explanation in “Tips” later in this handout.
The format of files containing uncompressed DNA data is as follows:
• the first four bytes of a file of DNA data contain an unsigned, little-endian integer that
specifies the number of bases that follow in the remainder of the file;
• each base is encoded in two bits; each of the four possible two-bit values represents one of
the four bases;
1There is one exception: if a submission clearly produces output using code that is not an LZW implementation,
that submission will receive zero marks, regardless of test results.
2We make this requirement because if you begin running your compressor and decompressor with a table of codes
that is already fully populated, there is a corner case that arises at the decompressor that Welch doesn’t discuss
(because he uses a 4K table that has only 256 entries at the start). It’s readily solvable, but we didn’t feel the need to
include it in the coursework.
2
• within an eight-bit byte, the order of the bases stored within that byte is as follows: the
first base is in the two most significant bits, the second base in the next two less significant
bits, the third base in the next two less significant bits, and the fourth base in the two least
significant bits;
• each successive byte contains bases that are ordered after the bases contained in the prior
byte
• in the final byte of a DNA file, any trailing bit positions that do not contain base data must
be zero-filled (i.e., must contain only zero bits)
In schematic form, the above format looks like this:
The format of files containing LZW-compressed DNA data is as follows:
• the first four bytes of a file of LZW-compressed DNA data contain an unsigned, little-
endian integer that specifies the number of bases represented by the compressed stream in
the remainder of the file;
• unlike in the figure above, which shows fixed-width, two-bit bases, LZW codes increase
in width (in bits) over the course of a run of LZW, and a single code may span byte
boundaries; the compressor and decompressor track how long in bits the codes they write
and read (respectively) should be;
• a series of codes fills in the bits of an eight-bit byte from most significant to least significant;
should a code span two bytes, it continues in the second byte’s most significant bit;3
• in the final byte of an LZW-compressed DNA file, any trailing bit positions that do not
contain codes must be zero-filled (i.e., must contain only zero bits)
Further Requirements
Implement your compressor in C source files named comp0019.{c,h} (pun intended), and your
decompressor in C source files named decomp0019.{c,h}. If you wish, you may implement
library routines that you use in both the compressor and decompressor in C source files named
comp0019lib.{c,h}. We provide vestigial, nearly empty versions of these files for you in the
code we hand out. Don’t delete any of these files from your repo, even if you don’t wind up using
them. The grading server requires that these files all exist, and its compilation of your code will
fail, yielding a failure of all tests, if any of them is missing.
Your compressor and decompressor must be invoked with the following function prototypes:
void Encode(FILE *in_file, FILE *out_file); // compressor
void Decode(FILE *in_file, FILE *out_file); // decompressor
3As it happens, this is how PDF and TIFF files pack bit strings shorter than one byte in length into bytes; GIF files
do so in the opposite direction.
3
Our tests provide a main() that invokes your compressor and decompressor by the above API.
If your compressor or decompressor function in the above API encounters input that is
invalid (i.e., that deviates from the specification for the input format for a file containing a
sequence of bases or the input format for a file containing an LZW-compressed sequence of
bases, respectively), it should print an error message Invalid {encoder,decoder} input:
aborting... (depending on whether the compressor or decompressor is running) and return
cleanly, without crashing. If you encounter this error case in your decompressor, the code should
also write out any buffered valid decompressed bits to the output file and terminate the output
file correctly, without writing any invalid decompressed bits to the output file.
Note that an LZW compressor or decompressor will yield very different results depending
upon the width in bits with which it considers its input and renders its output, respectively: a
byte-width-symbol LZW compressor or decompressor will yield different results than a two-bit-
width-symbol LZW compressor or decompressor. Our tests (which determine your mark on this
coursework) expect a compressor that treats its input as two-bit symbols and a decompressor
that renders its output as two-bit symbols.
You should use the C stdio file I/O library routines for file I/O; these include fopen(),
fread(), fwrite(), and fclose(), among others. Do not use the UNIX I/O system calls in
this coursework (e.g., open(), read(), write(), close(), etc.). (Code that writes very small
numbers of bytes at a time to a file, such as the code you are likely to write in this coursework,
benefits in performance from the buffering that the stdio file I/O library routines provide.)
Tips
One detail of implementing LZW omitted from Welch’s classic paper is what to do when the
table of codes fills—i.e., when the compressor or decompressor generates a table index that
falls outside the currently allocated number of entries in the table. For this coursework, you are
required to grow your table (both in the compressor and decompressor) in this situation. Broadly
speaking, you should do so by allocating more memory for further table entries (so that you can
continue by filling in entries in the now-larger table). Your implementation should be capable of
growing its table without bound (until the system reports that no more memory is available).
Another implementation detail omitted by Welch is how to dynamically increase the size
of a code in bits (i.e., the number of bits in a single compressed output value; notionally a
“table index” in Welch’s description of the compressor). That is, while the specification in this
coursework requires you to begin by emitting codes that are three bits long, as your table grows,
you will find that you need to emit codes that cannot be represented in three bits, because the
table index to emit is too large to fit in three bits. You will thus need to maintain a notion of the
current code size in bits, and dynamically increase it when necessary as your table grows. Your
compressor and decompressor must note when they need to emit or read (respectively) a code
longer than the current code size, increment the current code size at that moment, and thereafter
emit (read) only codes of that length in bits (until the next code length increase, if further ones are
needed). N.B. that once you increase the code size, subsequent emissions/reads of all codes must
use the new, increased code size. That is, even if some code had previously been emitted/read
using a smaller code size, from the moment your implementation increases the code size onward,
that code should be emitted/read using the increased code size.
For the compressor, you will need to implement a data structure that lets you search for the
table entry that contains a string. Various data structures, including some variant of trie or a hash
table, allow efficient lookups of this sort. We do not require you to implement any particularly
efficient data structure for this part of the compressor. Even a linear search, while asymptotically
inefficient, is allowed. You are of course welcome to implement a more efficient data structure
if you so choose, but the marking scheme doesn’t reward doing so (the glory of knowing you’ve
implemented a clever, efficient data structure is its own reward). Strategically speaking, it’s in
4
your interest not to optimize too early: better to focus first on producing a simple implementation
with an inefficient string search that passes all the tests and earns full marks, and then to circle
back to implement a more ambitious data structure for this search if that’s your bag, than to risk
running out of time before the deadline because the implementation of the more ambitious data
structure winds up taking more time than expected. (This really happens to people; don’t let it
happen to you!)
In his pseudocode in the assigned LZW reading, Welch (writing in 1984) is very concerned
about memory efficiency. For the decompressor, consider a single code c read from the com-
pressed input that corresponds to some string B of multiple bases. Being memory-efficient in the
decoder means storing in code c’s table entry only the last base in B represented by code c, along
with a “pointer” (in the guise of another code p that represents all prior bases in B) to code p’s
table entry (and so on, until reaching a “terminal” code for the first base represented by c, whose
table entry contains only that first base, and no pointer to a further code). This approach is
asymptotically memory-efficient as the size of the table increases, as it stores a constant number
of bits per table entry. We don’t require your implementation to be this memory-efficient: in
2020, it’s fine for you to keep the entire sequence of bases represented by a code in that code’s ta-
ble entry, which will make your implementation simpler (not least because the memory-efficient
one above must also reverse the order of bases extracted from the table before outputting them—
the job of the stack in the Welch article). Again, while the marking scheme doesn’t reward it, if
you’re looking for more of a challenge after you’ve passed all the tests, you’re welcome to circle
back and for your own edification attempt the memory-scrimping design in Welch’s pseudocode.
You will want to familiarize yourself with the C operators for manipulating individual bits,
including bitwise and (&), bitwise or (|), shift left and right (<<, >>), and complement (˜).
These operators allow you to extract bit fields from within a byte and set bit fields within a byte.
You can find helpful examples of the use of these operators in CS:APP/3e §2.1.7 and 2.1.9. The
discussion of masking in 2.1.7 is particularly relevant.
Introductory programming classes often suggest a “top-down” approach to programming:
implementing code for the high-level sequence of operations a program should take first, leaving
the details of lower-level functionality initially unspecified, and then iteratively writing routines
that fill in progressively lower levels of detail. Experienced programmers who study the definition
of the problem they are solving often proceed in the exact opposite fashion: by first building a
library of low-level routines that parse input and place it into a representation where the program
can operate on it, and that convert this program-internal representation into the needed output
format. We suggest you adopt this “bottom-up” approach as you work on this coursework: first
implement routines that can do the bit-twiddling necessary to read and write two-bit symbols in
the file format specified in this handout, and to handle the generation and reading of variable-
length bit strings, as needed by a bit-granularity LZW compressor and decompressor. Once these
routines are in place, you can then tackle implementing the LZW compressor and decompressor
atop them.
It is likely that you may find implementing the bit-twiddling routines more time consuming
than implementing the LZW compressor and decompressor—possibly significantly moreso, de-
pending on how much you’ve worked with bit fields in C or other languages before. This is
intentional: one of our chief goals for this coursework is to give you experience grappling with
bit fields in C.
You may find the shell command xxd -b useful if you’d like to examine the bit-level content
of files while debugging your compressor and decompressor.
5
Logistics
Before you follow the instructions below to retrieve the code for CW3, you MUST first
complete the 0019 grading server registration process. You only need to complete that pro-
cess once for the whole term, so if you did so before for CW1 or CW2, you can proceed
straight to retrieving the code for CW3.
If you have not yet done so, STOP NOW, find the email you received with the subject
line “your 0019 grading server token,” retrieve the instructions document at the link in that
email, follow those instructions, and only thereafter proceed with the instructions below for
retrieving the code for CW3.
We will use GitHub for CW3 in much the same manner as for CW2. To obtain a copy of the
initial code for CW3, please visit the following URL:
https://classroom.github.com/a/k2ovyWH5
If you’d like a refresher on using git with your own local repository and syncing it to GitHub,
please refer to the CW2 handout.
Each time you push updated code to your GitHub repository for CW3, our automatic grading
server will pull a copy of your code, run our automated tests on your code, and place a grade
report in a file grade report.md in your GitHub repository. Your mark on CW3 will be
that produced by the automated tests run by our automatic grading server on the latest commit
(update) you make to your GitHub repository before the CW3 deadline. More on these tests
below.
You must develop your code for CW3 in the official 0019 Linux development environment
on an x86-64 CPU.4 As for CW2, the UCL CS Linux lab machines (remotely accessible via ssh)
and our 0019 Linux Docker container are available to you for CW3, and you must use one of
these two environments to do your development and testing. As we advised for CW2, if you
encounter difficulty getting Docker working on your own machine, you should “give up early”
on Docker and switch to working over ssh, as you will otherwise lose valuable time that you
need for the real work of CW3.
Please note that your code’s behavior on the automated tests when run in the 0019 Linux
Docker container on the grading server will determine your mark on CW3, without excep-
tion. The CS 0019 staff cannot “support” development environments other than the 0019
ssh-accessible Linux machines and the 0019 Docker Linux container: we cannot diagnose
problems you encounter should your code pass the tests in some other environment, but fail
them in the official 0019 Linux environment on the grading server. In past years students
who have disregarded this requirement and done CW3 under Mac OS or Windows Subsys-
tem for Linux have found that their code yields different results in these non-0019-supported
environments than on the 0019-supported version of Linux used on the grading server, 0019
ssh-accessible Linux machines, and in the Docker environment we provide. Don’t let that
happen to you–do your work in the supported 0019 Linux environment!
Building Your Code and Running the Tests
Clone your repository from GitHub to your working environment. You will find the C source
files named above in the src/ subdirectory of your repository. You should add your implemen-
tation of LZW to these files in this directory.
4If you have an ARM CPU on your machine, as we explained in the CW2 handout, the 0019 Docker environment
will emulate an x86-64 CPU for you.
6
To prepare your working copy for running the tests, you must first issue the following shell
commands in the top-level directory of your working copy:
mkdir build
cd build
cmake ../
(You only need do the above once, when you first clone the repository.)
To build your code and run the tests, go into the build directory that you created above,
and issue the shell command
make test-all
Further information on how to interpret the output of the tests and debug follows below.
ARM users only:
As with CW2, to run the gdb debugger on the x86-64 code that your 0019 Docker Linux
environment produces, you need to take an extra step, because gdb does not quite work
seamlessly in this cross-development (ARM to x86-64) environment.
If you want to run gdb on an executable that you build as part of CW3, you will need to
do so using a gdb script we provide in the CW3 code we hand out. For example, if you want
to run gdb on the executable file named encode, you may do so by typing the following
commands:
• Make sure you are in the build subdirectory of your CW3 repo.
• Run gdb at the shell prompt.
• At the (gdb) prompt, run the command source ../lzw.malloc.
• At the (gdb) prompt, run the command run-lzw encode [ARG1] [ARG2]
[...] , where ARG1 and so on are further command-line arguments (if any) you
would like to supply to the program being run under gdb.
Hereafter, you may run whatever gdb commands you wish in the usual way (setting
breakpoints, stepping through execution one line of C source a time, etc.). Substitute the
name of any CW3 executable you want to debug for encode in the above.
Keep in mind that if you find using gdb in this fashion inconvenient, you can always ssh
to the 0019 x86-64 Linux machines we provide at UCL CS, clone your git repository there,
and run gdb there on a native x86-64 box.
Submitting via GitHub
We will deem the timestamp of your CW3 submission to be the timestamp on GitHub’s server of
the last push you make before the submission deadline of 4 PM GMT on 23rd February 2023.
Your mark will be the mark you receive on the automated tests for that version of your code.
0019 follows the UCL-wide standard late coursework penalties, as described on the 0019 class
web site.
If you wish to submit after the deadline, you must take the following steps for your course-
work to be marked:
7
1. When you wish to receive a mark for a version of your code that you push to GitHub after
the submission deadline, you must begin your commit log message for that commit with
the exact string LATESUBMIT. Our grading system will not record a mark for your late
submission unless you comply with this requirement. We follow this policy so that if a
student accidentally pushes a further commit after the deadline, they aren’t penalized for a
late submission.
2. You may make only one late submission (i.e., one GitHub commit with the initial string
LATESUBMIT. If you make more than one late submission, we will only mark the first one.
Tests and Marking Scheme
There are three groups of tests, as follows:
1. Group 1: Exhaustive tests of compression and decompression of all possible base sequences
up to and including length 6.
2. Group 2: Tests of compression and decompression of fixed, randomly pre-generated base
sequences up to a length of 1 million bases.
3. Group 3: Tests of corner-case inputs to your compressor and decompressor, and whether
your implementations handle these cases correctly.
Within each group of tests, there are multiple individual tests. In groups 1 and 2, each
individual test consists of an input file of bases and the corresponding correct compressed output
file from a run of a correct, spec-compliant compressor. Each of the tests in groups 1 and 2
runs your compressor on the input file of bases and compares the resulting output with that
of the correct compressed output; if they don’t match (even in a single bit), you will receive a
report about the failure. The test will then progress to run your decompressor on the correct
compressed data, and compare your decompressor’s output with the original input file of bases;
if they don’t match (again, even in a single bit), you will receive a report about the failure.
In group 3, each individual test consists of an invalid input file, either for your compressor or
decompressor, runs the compressor or decompressor on that file, and validates that the invoked
code handles the invalid input in a way that complies with the specification in this handout.
In some cases, a group of tests includes as many as thousands of individual tests. Do not fear
the sheer total number of tests; they are intended to be exhaustive and are highly redundant. By
redundant, we mean that code that passes one test will almost certainly pass a great many of
them (and similarly, fixing one bug that had caused many tests to fail will almost certainly cause
many of those tests to pass thereafter).
Because there are so many individual tests, the test code we provide runs them in a test
harness that hides much of the detail of the individual tests. The test harness outputs results for
the tests within a group in “subgroups” of closely related individual tests (which again may be
numerous), and summarizes the results for each such subgroup.
Again, because there are so many individual tests, we don’t provide you the thousands of
input files and output files for all of them, as that would be unwieldy. Rather, the test harness
dynamically synthesizes the test input and output files as it runs the tests. When you fail an
individual test, we of course realize that you may wish to learn more detail about the individual
test that failed, and further to run and debug your code on that test’s input, and/or study the
content of the input or the correct output. To let you do so, when your code fails a test, the
test harness will automatically tell you two shell commands to explore the failed test further:
one command for running just that failed test and seeing its detailed output, and a second for
generating the input and output files for the test you failed, and reporting their names. You can
then run your code on these files.
8
For example, here is the start of the output from the tests when the implementation of the
compressor and decompressor is empty, which is the state of affairs in the code you receive when
you first create your GitHub repository:
Test group one (all possible sequences with N elements):
testing encode for 4 sequences of size 1: FAILED
to repeat only this test run:
./lzw_test --file ../test_data/all_bases_1 --test_index 0
--type encode
to extract input for test run:
./lzw_test --file ../test_data/all_bases_1 --test_index 0
--type encode --dump
To see more detail on the individual test you failed in this subgroup of tests, you can issue
the first command above, which will produce:
[student@host] ./lzw_test --file ../test_data/all_bases_1 --test_index 0
--type encode
Will run test at index 0
Encoder output does not match model output
The above output tells you why this test failed: your encoder’s output doesn’t match that
produced by a spec-compliant one (as it shouldn’t; we’re running a null implementation here).
To obtain the input and output files for the test you failed, so that you can examine them and
run your code on them directly, you can issue the second command above, which will produce:
./lzw_test --file ../test_data/all_bases_1 --test_index 0 --type encode
--dump
Encoder input saved to ’encoder_input’
Model encoder output saved to ’model_encoder_output’
You can then run your compressor on this test’s base sequence input with:
./encode encoder input outfile
where your compressor’s output will be written to outfile, and your decompressor on a spec-
compliant compressed version of the base sequence with
./decode model encoder output
(You can of course run gdb on these two programs with these inputs, as well.)
The tests in Groups 1, 2, and 3 are worth 35, 35, and 30 marks respectively. Within a group,
you earn marks linearly in the number of lines (subgroups) whose tests you fully pass. (For
example, if there are 5 separate lines of tests in a group and you fully pass the tests in two lines
only, that will be 40% of the marks within that group.)
Once again, we urge you to get started early.
Academic Honesty
This coursework is an individual coursework. Every line of code you submit must have been
written by you alone, and must not be a reproduction of the work of others—whether from the
work of students in this (or any other) class from this year or prior years, from the Internet,
or elsewhere (where “elsewhere” includes code written by anyone anywhere). You may not
use AI-based tools to produce any lines of code for your submission (even if you modify them
manually).
9
Students are permitted to discuss with one another the definition of a problem posed in the
coursework and the general outline of an approach to a solution, but not the details of or code
for a solution. Students are strictly prohibited from showing their solutions to any problem
(in code or prose) to a student from this year or in future years. In accordance with academic
practice, students must cite all sources used; thus, if you discuss a problem with another student,
you must state in your solution that you did so, and what the discussion entailed.
Any use of any online question-and-answer forum (other than the CS 0019 Piazza web site) to
obtain assistance on this coursework is strictly prohibited, constitutes academic dishonesty, and
will be dealt with in the same way as copying of code. Reading any online material specifically
directed toward solving this coursework is also strictly prohibited, and will also be dealt with in
the same way.
You are free to read other reference materials found on the Internet (and any other reference
materials), apart from source code in any language that implements LZW compression or im-
plements storage of variable-length bit strings in files. You may of course use the code we have
given you. Again, all other code you submit must be written by you alone.
Copying of code from student to student (or by a student from the Internet or elsewhere)
is a serious infraction; it typically results in awarding of zero marks to all students involved,
and is viewed by the UCL administration as cheating under the regulations concerning Plagia-
rism, Collusion, and/or Falsification. Penalties imposed can include exclusion from all further
examinations at UCL. The course staff use extremely accurate plagiarism detection software to
compare code submitted by all students (as well as code found on the Internet) and identify in-
stances of copying of code; this software sees through attempted obfuscations such as renaming
of variables and reformatting, and compares the actual parse trees of the code. Rest assured that
it is far more work to modify someone else’s code to evade the plagiarism detector than to write
code for the assignment yourself!
Read the Piazza Web Site
You will find it useful to monitor the 0019 Piazza web site during the period between now and
the due date for the coursework. Any announcements (e.g., helpful tips on how to work around
unexpected problems encountered by others) will be posted there. And you may ask questions
there. Please remember that if you wish to ask a question that reveals the design of your solution,
you must mark your post on Piazza as private, so that only the instructors may see it. Questions
about the interpretation of the coursework text, or general questions about C that do not relate
to your solution, however, may be asked publicly—and we encourage you to do so, so that the
whole class benefits from the discussion.
References
Welch, Terry, A Technique for High-Performance Data Compression, in IEEE Computer, 17(6),
June 1984.
CS:APP/3e §2.1.7 and 2.1.9
K&R §2.9
Linux/UNIX man pages for fread(), fwrite(), fputc(), xxd
10