Rstudio代写-ST340-Assignment 1
时间:2021-01-31
ST340 Programming for Data Science
Assignment 1
Released: Friday week 2, 2021-1-22.
Deadline: 12:00 on Monday week 5, 2021-2-8.
Instructions
• Links and instructions on how to submit can be found on the module website.
• Work in groups of at least one and at most three. Group work is preferred. Please note that
no special consideration for marking will be given to solo work despite more effort.
• Specify your student numbers and names on your assignment. You need submit only one copy
for each group.
• Any programming should be in R. Your report should be created using R markdown. Submit
two files: a knitted pdf document and the corresponding R markdown file.
• This assignment is worth16% of your overall mark.
1. Mergesort is a divide and conquer algorithm for sorting an array.
(a) Devise an algorithm, merge, that takes as input two sorted arrays and returns a single
array with the elements of the two input arrays in sorted order. For example, with inputs
(1, 3, 3, 6, 7) and (2, 4, 5) it should output (1, 2, 3, 3, 4, 5, 6, 7). If the lengths of the input
arrays are n1 and n2, your algorithm should require O(n1 + n2) comparisons in the worst
case. Describe, implement and test your algorithm.
(b) The mergesort algorithm is recursive. Taking as input an array of length n it calls itself
twice, once on the first half of the array and once on the second half of the array (with
appropriate modifications if n is odd). It then uses merge to combine the output of the
two recursive calls into an array, which it then returns. Describe, implement and test the
mergesort algorithm.
(c) Prove by induction that mergesort outputs its input array in sorted order.
(d) Let T (n) be the total number of comparisons made by mergesort on an input of size
n. Write a recurrence inequality for T (n), assuming n is even. Prove by induction that
T (n) ≤ n log2 n for all n ∈ {2k : k ∈ N}.
(e) Compare and contrast mergesort with quicksort in a few short sentences.
2. A simple version of the majority element problem is as follows. Let a be an array of n = 2k
elements and ≡ be an equivalence relation for those elements. We want to find a majority element
of a, which is an element that occurs in an equivalence class with strictly greater than n2 elements
(e.g. if n = 8 then the element is equivalent to at least four others plus itself). Furthermore, we
want to do this only using the ≡ relation and no < relations, i.e., we cannot order the elements.
(a) Devise a recursive, divide and conquer algorithm to return the majority element x of a if
it exists and which returns “no majority” if it doesn’t. At each step, your algorithm should
make two recursive calls and perform no more than 2n equivalence (≡) tests outside of these
recursive calls. Remember to present your pseudo code. Explain why your algorithm works
(by proving something).
(b) Provide an upper bound on the number of equivalence checks that your algorithm makes
on an input of size n, along the lines of Q1(d).
3. For this question you are encouraged to re-use as much of the code from Lab 2 as possible (include
in your report only the parts you have changed). You are asked to compress them×n = 767×584
matrix representation of Nightingale in pictures.rdata in such a way that the accuracy of the
approximation is counterbalanced by the space used. You are to use the greyscale representation
computed in Lab 2 as the input matrix A. Specifically, you are asked to produce an integer
k ∈ {1, . . . , n}, vectors b1, . . . , bk, d1, . . . ,dk and values c1, . . . , ck such that you minimize the
function
f(k, b1, . . . , bk,d1, . . . ,dk, c1, . . . , ck) = exp (‖A−B‖F ) + k(m+ n+ 1),
where ‖ · ‖F denotes the Frobenius norm, and
B =
k∑
i=1
cibid
T
i .
Use R to find the value of (k, b1, . . . , bk,d1, . . . ,dk, c1, . . . , ck) that minimizes f . Explain all
computations you have done, including why your answer minimizes the function above. (You do
not need to ensure that the entries of B are in [0, 1], which we did in the lab only to visualize
the matrices.) You should not need to compute any matrices of the form of B to find what is
required, nor do you need to print any of B, bi, di, or ci in your submitted answer.