python代写-FIT3155-Assignment 3|学霸联盟

python代写-FIT3155-Assignment 3

时间：2021-10-19

FIT3155 S2/2021: Assignment 3 (Due midnight 11:55pm on FRIDAY 22 Oct 2021) [Weight: 10 = 2.5 + 2.5 + 5 marks.] Your assignment will be marked on the performance/efficiency of your program. You must write all the code yourself, and should not use any external library routines (that conflict with the core components being assessed. Standard data structures (such as dictionary, list, etc.) are allowed, but you will have to consider their efficiency in places of use. Also, use of bitarray is permitted for this assignment. Follow these procedures while submitting this assignment: The assignment should be submitted online via moodle strictly as follows: All your scripts MUST contain your name and student ID. Use gzip or Winzip to bundle your work into an archive which uses your student ID as the filename. (STRICTLY AVOID UPLOADING .rar ARCHIVES!) – Your archive should extract to a directory which is your student ID. – This directory should contain a subdirectory for each of the three questions, named as: q1/, q2/ and q3/. – Your corresponding scripts and work should be tucked within those subdirectories. Submit your zipped file electronically via Moodle. Academic integrity, plagiarism and collusion Monash University is committed to upholding high standards of honesty and academic integrity. As a Monash student your responsibilities include developing the knowl- edge and skills to avoid plagiarism and collusion. Read carefully the material available at https://www.monash.edu/students/academic/policies/academic-integrity to understand your responsibilities. As per FIT policy, all submissions will be scanned via MOSS. Assignment Questions 1. Towards a string-based proof of an observation you have learnt in this unit: In this question you will implement something really straightforward. Although the exercise itself is easy, the result your program will produce will inform a proof for an observation we made in one of the weeks in this unit. (Feel assured that you will be assessed only for Prepared by: Arun Konagurthu, Monash University your implementation and output. Yet it is encouraged that you reason and analyse the observed behaviour of your output.) Your program takes two inputs: (i) the size of a (small) alphabet, say |A|. For practicality during testing, restrict |A| within the range [1, 5]. Also, assume the characters in the alphabet start from ‘a’ onwards. That is, if |A| = 3, assume the alphabet A is: {‘a’,‘b’,‘c’}. (ii) the length of the string, say N . For practicality during testing, restrict N within the range [1, 10]. Using the two input parameters, |A| and N , write a program to: generate all possible strings of length N from the alphabet A, and for each string, report the number of distinct cyclic rotations that are possible for that string. (Refer to Week 1 tute sheet, questions 4 and 5, for the definition of a cyclic rotation.) Note: Although the inputs (|A| and N) are restricted within a range for testing, your implementation however must be generalizable to any values (≥ 1) of those parameters. Strictly follow the following specification to address this question: Program name: stringtest.py Arguments to your program: Two integers: (i) alphabet size (during testing, restrict to the range [1, 5]) (ii) string length (during testing, restrict to the range [1, 10]). Command line usage of your script: stringtest.py

Output to the terminal the following statistics (in a single line, space separated)

(i) The number of strings of length N from alphabet A with ≥ 2 distinct cyclic
rotations.
(ii) The number of strings of length N from alphabet A with exactly N distinct
cyclic rotations.
(iii) The number of strings of length N from alphabet A with exactly 1 distinct cyclic
rotation.
(iv) Is the number of strings with ≥ 2 distinct cyclic rotations an integer multiple of
N? If yes, print true, else print false.
Sample outputs (with supporting debug information, which you do not have to report)
(SAMPLE 1) Input values:
2 4
Output:
14 12 2 false
---------------------------------------------------------------------------------------------
Debug information for |A| = 2 and N = 4:
2
cyclicrotations(aaaa) | cyclicrotations(abaa) | cyclicrotations(baaa) | cyclicrotations(bbaa)
aaaa | abaa | baaa | bbaa
aaaa | baaa | aaab | baab
aaaa | aaab | aaba | aabb
aaaa | aaba | abaa | abba
nDistinct = 1 | nDistinct = 4 | nDistinct = 4 | nDistinct = 4
cyclicrotations(aaab) | cyclicrotations(abab) | cyclicrotations(baab) | cyclicrotations(bbab)
aaab | abab | baab | bbab
aaba | baba | aabb | babb
abaa | abab | abba | abbb
baaa | baba | bbaa | bbba
nDistinct = 4 | nDistinct = 2 | nDistinct = 4 | nDistinct = 4
cyclicrotations(aaba) | cyclicrotations(abba) | cyclicrotations(baba) | cyclicrotations(bbba)
aaba | abba | baba | bbba
abaa | bbaa | abab | bbab
baaa | baab | baba | babb
aaab | aabb | abab | abbb
nDistinct = 4 | nDistinct = 4 | nDistinct = 2 | nDistinct = 4
cyclicrotations(aabb) | cyclicrotations(abbb) | cyclicrotations(babb) | cyclicrotations(bbbb)
aabb | abbb | babb | bbbb
abba | bbba | abbb | bbbb
bbaa | bbab | bbba | bbbb
baab | babb | bbab | bbbb
nDistinct = 4 | nDistinct = 4 | nDistinct = 4 | nDistinct = 1
---------------------------------------------------------------------------------------------
(SAMPLE 2) Input values:
3 3
Output:
24 24 3 true
---------------------------------------------------------------------------------------------
Debug information for |A| = 3 and N = 3:
cyclicrotations(aaa) | cyclicrotations(acb) | cyclicrotations(bbc) | cyclicrotations(cba)
aaa | acb | bbc | cba
aaa | cba | bcb | bac
aaa | bac | cbb | acb
nDistinct = 1 | nDistinct = 3 | nDistinct = 3 | nDistinct = 3
cyclicrotations(aab) | cyclicrotations(acc) | cyclicrotations(bca) | cyclicrotations(cbb)
aab | acc | bca | cbb
aba | cca | cab | bbc
baa | cac | abc | bcb
nDistinct = 3 | nDistinct = 3 | nDistinct = 3 | nDistinct = 3
cyclicrotations(aac) | cyclicrotations(baa) | cyclicrotations(bcb) | cyclicrotations(cbc)
aac | baa | bcb | cbc
aca | aab | cbb | bcc
caa | aba | bbc | ccb
nDistinct = 3 | nDistinct = 3 | nDistinct = 3 | nDistinct = 3
cyclicrotations(aba) | cyclicrotations(bab) | cyclicrotations(bcc) | cyclicrotations(cca)
aba | bab | bcc | cca
baa | abb | ccb | cac
aab | bba | cbc | acc
nDistinct = 3 | nDistinct = 3 | nDistinct = 3 | nDistinct = 3
cyclicrotations(abb) | cyclicrotations(bac) | cyclicrotations(caa) | cyclicrotations(ccb)
abb | bac | caa | ccb
bba | acb | aac | cbc
bab | cba | aca | bcc
nDistinct = 3 | nDistinct = 3 | nDistinct = 3 | nDistinct = 3
cyclicrotations(abc) | cyclicrotations(bba) | cyclicrotations(cab) | cyclicrotations(ccc)
abc | bba | cab | ccc
bca | bab | abc | ccc
cab | abb | bca | ccc
nDistinct = 3 | nDistinct = 3 | nDistinct = 3 | nDistinct = 1
cyclicrotations(aca) | cyclicrotations(bbb) | cyclicrotations(cac) |
aca | bbb | cac |
caa | bbb | acc |
aac | bbb | cca |
nDistinct = 3 | nDistinct = 1 | nDistinct = 3 |
---------------------------------------------------------------------------------------------
Non-examinable component: Systematically generate output values by varying the input
parameters |A| and N in the ranges specified above, and see if you can detect a pattern
3
that emerges whenever the output line reports ‘true’. Does that inform any observation
we have considered in this unit? If so, reason why to glean a ‘string based’ proof for that
key observation we learnt.
2. A prefix-free code for positive integers based on Fibonacci numbers: The se-
quence of Fibonacci numbers can be used to construct a uniquely decodable prefix-free
code for positive integers. As you know, any Fibonacci number, denoted by Fn, is defined
by the second-order recurrence Fn = Fn−1 +Fn−2, starting from F1 = 1 and F2 = 2. This
recurrence results in the infinite Fibonacci sequence: {1, 2, 3, 5, 8, 13, 21, . . .}.
A Belgian medical doctor, Edouard Zeckendorf, proved that every positive integer N > 0
can be uniquely represented as a sum of one or more distinct Fibonacci numbers, that
is, N = Fi1 + Fi2 + Fi3 + . . ., such that the sum does NOT contain any two consecutive
Fibonacci numbers from the Fibonacci sequence. For example, 73 = 55 + 13 + 5 =
F9 + F6 + F4.
Finding such a unique decomposition for any N > 0 translates to a simple greedy algo-
rithm. Start from N and iteratively decompose by choosing the largest possible Fibonacci
number at each iteration, to identify the terms Fi1 , Fi2 , . . . in a strictly decreasing order
(subject to the constraint highlighted in red above).
Then, the prefix-free Fibonacci code word for any positive integer N consists of i1 + 1
bits. If some Fibonacci number Fk appears in unique decomposition of N , then the k-th
bit of the code word is set to ‘1’ otherwise it is ‘0’. Finally, the (i1 + 1)-th bit is set to ‘1’
to signify the end of the code word. For example, the Fibonacci code word for N = 73
is: FibonacciCode(73) = 0001010011.
It is easy to see that this encoding is decodable and prefix-free, because the above con-
straint that Zeckendorf defined ensures that one never observes two consecutive Fibonacci
numbers in the decomposition of any N .
Write a program that accepts an integer MaxN > 0 and generates Fibonacci code words
for all integers in the range [1, MaxN]. Strictly follow the following specification to address
this question:
Program name: fibonaccicode.py
Arguments to your program: A positive integer MaxN > 0.
Command line usage of your script:
fibonaccicode.py
Output to terminal: Fibonacci code words of each integer N in the range [1, MaxN],
one per line (see output format below).
Sample output for MaxN = 10 :
1 11
2 011
3 0011
4 1011
5 00011
6 10011
7 01011
8 000011
4
9 100011
10 010011
3. BWT-based lossless data compression (encoder and decoder) scheme: In this exercise,
you will be implementing a simple Burrows-Wheeler Transform (BWT) based encoder
and decoder.
Ideally, the input to BWT-based encoders should be a string that we want to compress. The
general pipeline of the encoding involves reading the original string, computing its BWT, and
compressing the BWT using a set of encoding schemes. The decoder on the other hand decodes
the BWT and then inverts the BWT to recover the original string. However, in this exercise,
to simplify matters for you, your encoder will be reading a BWT directly (rather than computing
it from the original string). Similarly, the decoder will only have to recover the encoded BWT
(without having to invert it to the original string). Having said this, as a non-examinable target, it would be an
extremely satisfactory exercise for your encoder and decoder to accept and recover (respectively) the original string (rather than
its BWT). You already know how to construct a BWT of a string and invert it, both in linear time.
Your encoder will read a BWT of some original string from an ASCII file, containing
all characters strictly in the range [36,126], and with ‘$’ (ASCII value 36) being the
smallest (terminal) character in the string. Your encoder will then perform a run-length
encoding on the BWT using Huffman code words (see Lecture 9) for encoding characters
and Fibonacci code words (see Question 2 above) for encoding any positive integer.
What is run-length encoding? As the name suggests, run-length encoding of a string
(here, BWT) is based on encoding the lengths of runs of individual characters as they
appear, when scanning the string from left to right. The observed characters and their
corresponding run-lengths can be encoded as tuples of the form 〈character, run-length〉.
For example, the run-length encoding of this BWT string aaac$bb results in the tuples:
〈a, 3〉, i.e. a appears over a run of length=3, followed by
〈c, 1〉, i.e. c appears over a run of length=1, followed by
〈$, 1〉, i.e. $ appears over a run of length=1, followed by
〈b, 2〉, i.e. b appears over a run of length=2.
To generate a fully-decodable binary stream, your encoder encodes (as a continuous
stream of bits) the following pieces of information (in that order):
(a) The length of the BWT, encoded using its Fibonacci code word.
(b) The number of distinct BWT characters, encoded using its Fibonacci code word.
(c) For each distinct character in the BWT (in any order of these characters):
the 7-bit ASCII code word of that character,
the length of its constructed Huffman code word, where the length is encoded
using its Fibonacci code word, and
the Huffman code word of that character.
(d) For each run-length encoded tuple of the BWT (in left-to-right order):
the Huffman code word of the character being encoded, and
the Fibonacci code word of its run-length.
Strictly follow the following specification to address this question.
Write a BWT-based encoder and decoder using the following specification. For dealing
with packed binary input/output, you are free to use Python’s bitarray.
5
ENCODER SPEC: [3 marks]
Program name: mybzip.py
Arguments to your program: An input file containg the BWT of some original
string. (Note: You do not have to compute the BWT in this exercise.)
Command line usage of your script:
mybzip.py
Output file name: encodedBWT.bin
Output format: The output is a binary stream of bits that losslessly (run-
length) encodes the input BWT, by concatenating component code words
for all pieces of information listed above.
Encoding example:
If the INPUT FILE contained the BWT string: "b$aaaaaaaaaa"
the OUTPUT FILE will contain the following stream of bits:
10101100111100001111110001001100010010001101001101111010011
The above binary stream encodes the following pieces of information
------------------------------------------------------------------------
Length(BWT) = 12 => FibonacciCode(12) = 101011
nUniqChars = 3 (i.e., {‘a’,‘b’,‘$’}) => FibonacciCode(3) = 0011
Encoding unique characters in the BWT and their constructed...
...Huffman code words: {‘a’ = 1, ‘b’ = 00, ‘$’ = 01}:
ASCII(‘a’)= 1100001 FibonacciCode(codelen=1) = 11 HuffmanCode(‘a’)= 1
ASCII(‘b’)= 1100010 FibonacciCode(codelen=2) = 011 HuffmanCode(‘b’)= 00
ASCII(‘$’)= 0100100 FibonacciCode(codelen=2) = 011 HuffmanCode(‘$’)= 01
Run-length encoded tuples of the BTW string b$aaaaaaaaaa
<‘b’,1> => HuffmanCode(‘b’) = 00 FibonacciCode(runlen=1) = 11
<‘$’,1> => HuffmanCode(‘$’) = 01 FibonacciCode(runlen=1) = 11
<‘a’,10>=> HuffmanCode(‘a’) = 1 FibonacciCode(runlen=10)= 010011
ENCODING COMPLETE!-------------------------------------------------------
Illustration of the packed (binary) representation of the OUTPUT:
|--------|--------|--------|--------|--------|--------|--------|--------|
|Byte-1 |Byte-2 |Byte-3 |Byte-4 |Byte-5 |Byte-6 |Byte-7 |Byte-8 |
|--------|--------|--------|--------|--------|--------|--------|--------|
|10101100|11110000|11111100|01001100|01001000|11010011|01111010|011ooooo|
|--------|--------|--------|--------|--------|--------|--------|--------|
Note: The binary output has all the bits packed into bytes. Therefore,
depending on the length of your stream, you may have to pad the encoded
stream with additional bits so that the length of the encoded binary
stream becomes a perfect multiple of 8. In the example above, the encoded
stream is 59 bits long. So, the last byte (Byte-8) is padded with five
extra ‘0’ bits (shown above as ‘o’s, for readability).
6
DECODER SPEC: [2 marks]
The decoder is a separate program that should be able to recover the BWT from
the encoded file, without any additional information besides what it can gather from
the binary file it is reading.
Program name: mybunzip.py
Arguments to your program: Any output file generated by your encoder (above).
Command line usage of your script:
mybunzip.py
Output file name: decodedBWT.txt
If the input file encodedBWT.bin file contained the following bit stream:
10101100111100001111110001001100010010001101001101111010011
then, the output decodedBWT.txt file will contain:
b$aaaaaaaaaa
-=o0o=-
END
-=o0o=-
7

学霸联盟