xuebaunion@vip.163.com

3551 Trousdale Rkwy, University Park, Los Angeles, CA

留学生论文指导和课程辅导

无忧GPA：https://www.essaygpa.com

工作时间：全年无休-早上8点到凌晨3点

微信客服：xiaoxionga100

微信客服：ITCS521

Python代写-1ECE 273

时间：2021-05-01

1ECE 273 Course Projects

Spring 2021

General Instructions

1) Deadline for Project Report: June 5, 2021, 11:59 pm. Online Submission on Canvas.

2) Guidelines: You should work individually on the project and choose one topic from the list provided below.

Alternatively, you can define your own project (see announcement from Prof. Pal on Canvas). You are expected

to submit a project report, along with figures, as well as MATLAB/Python source codes, and the datasets with

proper citations. You will also present your project results in the form of a short presentation ( „ 15 minutes per

student) which will be scheduled during the finals week. You are welcome to compose your report in Jupyter

Notebook with live code and visualizations. Wherever applicable, you should write your own code. Any

plagiarism will be considered a violation of academic integrity and will be dealt with accordingly. Ideally, we

should be able to reproduce the exact results as those in your report; otherwise you will lose points. Negative

results are welcome, but be sure to discuss them in your report. The report should include the following

sections:

‚ Objective/Goal of the project.

‚ Background and motivation: You should cite relevant references, explain why the problem is important,

and its applications. You should also cite the datasets being used.

‚ Method/Techniques.

‚ Results: Include all figures; you can also use tables to present your results, especially for comparing

multiple algorithms.

‚ Discussion: This section will provide a critical discussion of your results, and evaluate the strengths and

weaknesses of different algorithms and the associated trade-offs.

‚ Dataset: In your report, you should include a web link to all datasets you use (if any) and give proper

credit. If the datasets are not publicly accessible, submit them with your report in a separate file. If you

have more than one file, pack all files into a .zip file. Dont use the formats (like .rar) that require specific

software to unpack your file.

‚ You will use CVX (for MATLAB) or CVXPY (for Python) which come with inbuilt solvers (optimized

for respective platforms) for convex problems. Since CVX is a software package meant for modeling, you

will only need to formulate the optimization problem in a standard form using MATLAB/Python syntax.

Refer to your syllabus for more details for resources on how to quickly get started with CVX/CVXPY.

List of Topics

1. Topic 1: Blind Deconvolution

References:

1) Ahmed, A. and Recht, B. (2014). “Blind Deconvolution Using Convex Programming” IEEE Transactions

on Signal Processing,vol. 60, no. 3, Mar. 2014..

Blind deconvolution is the problem of recovering two signals given the output of their convolution. The

signals can not be recovered unless certain restrictions are imposed on them. In this project, you will study

and implement convex algorithms for solving the blind deconvolution problem under suitable conditions, and

evaluate their performance guarantees.

– Study the paper by Ahmed, Recht and Romberg to understand the formulation of the blind deconvolution

problem. Why is the solution to the problem non-unique in general? Can you explain what kind of

ambiguities exist? Under what conditions on the signals (sparsity, subspace structure etc), the problem

can permit unique solutions? Explain why.

– Why is the problem non-convex? Explain the main idea in this paper under which the measurements can be

expressed as a linear function of a suitable transformed variable. How is this transformed variable related

to the constituent signals, and given this transformed variable, how can one obtain the constituent signals?

2Does the variable transformation make the problem convex? What additional relaxations are necessary to

cast it as a convex problem?

– What are the theoretical guarantees of the convex algorithm proposed in the paper? Are they deterministic,

or probabilistic? What should be the relation of the number of measurements, sparsity and the dimension

of the subspace, so that exact (or stable, in presence of noise) recovery is possible? Clearly state the

mathematical guarantees and spell out the assumptions.

– Implement the convex algorithm proposed in the paper. You should be able to recreate the figures. Test

the algorithm by generating the phase transition plots which plots the probability of successful recovery

as a function of number of measurements, and the sparsity and/or dimension of the subspace. Do you see

a sharp region around which the probability changes from close to zero, to close to 1? Compare this with

the theoretical guarantees from the paper.

– Compare performance of the blind deconvolution algorithm against non-blind deconvolution (i.e. where

you are given one of the constituent signals). Do a literature survey on alternating minimization techniques

for blind deconvolution (which are non-convex), and implement one algorithm of your choice. Compare

the performance of convex and non-convex blind deconvolution.

– Design numerical experiments to evaluate the performance of the algorithm when one (or more) conditions

(such as sparsity, low rank) are violated. Can you comment on the robustness of the algorithm against

such violations?

– (BONUS): Extend the algorithm to perform blind deconvolution in 2D. One example is image deblurring.

Test the algorithm on an image of your choice (by blurring it with a suitable kernel).

2. Topic 2: Low-Rank Matrix Recovery From Quadratic Measurements (Application in Polynomial Neural

Network)

References

1) Soltani, Mohammadreza, and Chinmay Hegde. “Towards provable learning of polynomial neural networks

using low-rank matrix estimation”. International Conference on Artificial Intelligence and Statistics. 2018.

2) Candes, Emmanuel J., and Yaniv Plan. “Tight oracle inequalities for low-rank matrix recovery from a

minimal number of noisy random measurements.” IEEE Transactions on Information Theory 57.4 (2011):

2342-2359.

3) Recht, B., Fazel, M. and Parrilo, P.A., 2010. “Guaranteed minimum-rank solutions of linear matrix equations

via nuclear norm minimization”. SIAM review, 52(3), pp.471-501.

4) Goldstein, Tom, Christoph Studer, and Richard Baraniuk. “FASTA: A generalized implementation of forward-

backward splitting.” arXiv preprint arXiv:1501.04979 (2015).

In [1], the authors showed that training a neural network with one hidden layer consisting of fewer number of

units compared to the size of input, and with quadratic activation function can be modeled as low-rank matrix

recovery problem with noisy quadratic measurements.

min

LPRpˆp F pLq “

1

2m

mÿ

i“1

pyi ´ xTi Lxiq2

s.t. rankpLq ď r (1)

where tpyi,xiqumi“1,xi P Rp is the input data set, r is the number of hidden units with the activation function

σpzq “ z2. The ground truth matrix L˚ is constructed from the weights in the first and second layers of the

corresponding neural network in the following form:

L˚ “

rÿ

j“1

αjwjwTj

where wj is the weight vector in the jth hidden unit and α P Rr is the weight vector of the second layer.

This optimization problem is not a convex problem (Why?), however, there are approaches which first convert

the original problem to a simpler convex problem and then solve it as a convex optimization problem. The

3authors in [2,3] have proposed a convex relaxation of the low-rank constraint based on the nuclear norm of a

matrix. In particular, consider the following problem:

y “ ApLq ` n (2)

where L P Rpˆp is an unknown matrix, A : Rpˆp Ñ Rm is a linear mapping,

rApLqsi “ xTi Lxi

and n P Rm denotes additive noise. The desired matrix L˚ (which has p2 unknown entries) can be recovered

from (compressive) measurements (with m ! p2) by solving the following convex problem:

min

LPRpˆp }L}˚

s.t. }A˚peq} ď λ

e “ y ´ApLq (3)

where A˚ is the adjoint operator of A,

A˚peq “

mÿ

i“1

eixix

T

i

}.} is the operator norm, }.}˚ is its dual (i.e the nuclear norm), and λ is a constant which is dependent on the

size of L and noise.

In this project, you are expected to do all of the followings (exlcuding BONUS):

– Read the references and explain why solving the problem (1) is equal to training the corresponded network?

Why is it not a convex optimization problem? Why is problem (3) a suitable convex relaxation for problem

(2)?

– Generate a random data set txiumi“1 from a normal distribution (you can choose the mean and the

covariance), and one ground truth network (Based on the given architecture in [1]) which generates

the output yi corresponding to the input xi. Implement Algorithm 1 in [1] and test it on the data set you

have generated. Try data sets with different sizes and different initialization. Can you recover the ground

truth L˚ (or equivalently, the network)? Plot the probability of success by varying the values of m, p and

r, as well as different initializations (you can refer to [1] for some representative plots). Comment on

your results. Add noise to the measurements and try different noise powers. Report where the algorithm

fails and succeeds.

– Formulate the problem of training neural network (or equivalently, finding L˚) as a convex problem,

based on [2] and [3]. Use the data set you generated in the last part and solve this convex problem with

CVX/CVXPY library. You may need to tune λ by trying different values and cross validating against the

training dataset (Read [2] for suitable ways to choose λ). Compare your result with previous solutions

and plots, including the need for initialization. Try different sizes of data sets. How accurate is the convex

relaxation for this problem? How sensitive is it to the size of data set and noise?

– Generate a new data set where each xi can come from two different Gaussian distributions (representing

two classes) with two different means (you can keep the covariance same if you wish). Assign each data

point a binary label based on its class. (Note that there is no ground truth network in this case). Run

the two algorithms that you have implemented in the last two parts (non-convex and convex) to train the

polynomial neural network (which is also equivalent to learning a matrix L). Generate some new test data

points from your classes and evaluate the performance of the learned network on the test dataset. Are

these algorithms able to classify the data? If not, why? Try to justify.

– (BONUS) If you were able to solve a binary classification problem on the random data set in previous

question, try to apply your algorithm on the MNIST dataset (You can restrict your dataset to only 0 and

1 digits). Is there any way to extend this idea for multi-class classification? How?

43. Topic 3: 1-bit Compressed Sensing

References:

1) Plan, Y., and Vershynin, R. (2013) “Robust 1-bit compressed sensing and sparse logistic regression: A convex

programming approach”, IEEE Transactions on Information Theory, 59(1), 482-494.

2) Chen, L., Hassani, S. H., and Karbasi, A. (2017) “Near-Optimal Active Learning of Halfspaces via Query

Synthesis in the Noisy Setting”, In AAAI (pp. 1798-1804).

Compressed Sensing (sparse signal recovery) has been a popular research topic in signal processing since

early 90s. It has many applications in image processing, machine learning, communication, radar, and medical

imaging. The main goal of Compressed Sensing is to recover a sparse signal x (}x}0 ď s) from linear

measurements of the form y “ Ax where A is a measurement matrix. For many practical applications, infinite-

precision linear measurements are not available. Instead, linear measurements are collected with finite precision

(quantization). In the extreme case with 1-bit quantization, the noiseless 1-bit compressed sensing measurements

is modeled as

yi “ signpaTi xq, i “ 1, 2, ¨ ¨ ¨m (4)

In this project, you are expected to do the following:

– Find two different applications of 1-bit Compressive Sensing. Describe why 1-bit is important in those

applications. Examine their noise models and comment on whether you think the noise is modelled

reasonably in each application. Cite references in your report

– Read [1] and compute, by simulation, the Gaussian mean width of the sparse signal set wpSn,sq. Fix s “ 5

and plot wpSn,sq versus n with its upper bound and lower bound given by Lemma 2.3

– Let S1n,s “ tx : x P Sn,s, xi ě 0u, S2n,s “ tx : x P Sn,s, xi “ c or 0u and S3n,s “ tx : x P Sn,s, xi “

˘c or 0u Plot wpS1n,sq, wpS2n,sq and wpS3n,sq with (b). Interpret the result by using Theorem 1.1 in [1]

– Explain why the relaxation from l0-norm to l1-norm is reasonable, by using the Gaussian mean width,

(III.1) and Theorem 1.1

– In [2], the measurement/observation is also t`1,´1u. Compare the problem in [1] and [2] and describe

the mathematical differences between them

– By using the noise model in [2], implement both algorithms in [1] and [2] and compare them numerically

for s “ 1 to s “ n with various n, which one is better and why.

– (BONUS) Develop an algorithm that incorporates the sparse constraint s into the algorithm in [2] and

compare your algorithm with [1].

4. Topic 4: Coordinate Descent, Dykstra’s Algorithm and LASSO

References:

1) R. J. Tibshirani, “Dykstra’s algorithm, ADMM, and Coordinate Descent: Connections, insights, and ex-

tensions”, in Advances in Neural Information Processing Systems (NeurIPS), I. Guyon, U. V. Luxburg, S.

Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc.,

2017.

2) Tseng, Paul, “Convergence of a block coordinate descent method for nondifferentiable minimization”, Journal

of optimization theory and applications 109.3 (2001): 475-494.

3) Hastie, Trevor, Robert Tibshirani, and Martin Wainwright, “Statistical learning with sparsity: the LASSO

and generalizations”. CRC press, 2015.

In this project, you will implement two different algorithms: Dykstra and Block Coordinate descent to solve

certain types of optimization problems. You are expected to implement those algorithms (i.e. write the codes

on your own) on different problems and study their connections. Consider the following two types of problems

(P1) and (P2)

min

β P Rp

1

2

}y ´Xβ}22 `

dÿ

i“1

hipβq (P1)

5where y P Rn,X P Rnˆp, hi : Rp Ñ R

and

min

z P Rn }y ´ z}2 (P2)

s.t. z P C1 X C2 X . . .X Cd

where Ci’s are closed convex sets.

1) Review coordinate-descent algorithm. Briefly describe how and why it works. What are the conditions on

hi for coordinate-descent to succeed on pP1q?

2) Review Dykstra’s algorithm. Describe for what kind of problems it works, and how. How is it different

from alternating projection algorithm? When are they equivalent?

3) What are the relationships between pP1q and pP2q? Under what conditions are they equivalent? When they

are equivalent, how are Dykstra’s algorithm and Coordinate Descent related?

4) Consider the problem pP1q when hi “ λ|βi| and d “ p. This problem is called LASSO. This problem is

often used to recover sparse β˚, |β˚|0 “ s from measurements of the form

y “ Xβ˚ `w

where w denotes additive noise. What is the dual of the LASSO problem? How are the solutions of LASSO

and its dual related?

5) Write a code implementing the coordinate descent algorithm to solve LASSO. What should be the coordinate

descent iterations? Design various experiments (by generating data in suitable ways) for analyzing the

recovery performance of LASSO in different regimes of N, s, p. Analyze both l2 error }β´β˚}2 and support

recovery performance. Clearly describe your settings and experiments. Study the performance convergence

performance of coordinate descent. Compare the speed of your implementation of coordinate descent and

that from CVX for large problem sizes.

6) Solve the dual of LASSO with Dykstra’s algorithm. Verify the relationship between coordinate descent and

Dykstra’s algorithm through your simulations.

7) (BONUS). Choose another problem/application in literature for which you can apply coordinate descent.

Describe coordinate descent iterations. Similarly find and implement another problem which can be solved

using Dykstra.

5. Topic 5: Federated Learning

References:

1) Gafni, Tomer, et al. “Federated Learning: A Signal Processing Perspective”, arXiv preprint arXiv:2103.17150

(2021).

2) T. Li, A. K. Sahu, A. Talwalkar and V. Smith, “Federated Learning: Challenges, Methods, and Future Direc-

tions”, in IEEE Signal Processing Magazine, vol. 37, no. 3, pp. 50-60, May 2020, doi: 10.1109/MSP.2020.2975749.

3) Yin, Dong, Yudong Chen,Kannan Ramchandran and Peter Bartlett. “Byzantine-robust distributed learning:

Towards optimal statistical rates”, International Conference on Machine Learning. PMLR, 2018.

4) L. Zhu, Z. Liu, and S. Han, “Deep leakage from gradients”, in Advances in Neural Information Processing

Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alch-Buc, E. Fox,and R. Garnett, Eds., vol.

32.Curran Associates, Inc., 2019.

In this project, you will study various task pertaining to Federated Learning. Federated Learning (FL) is a

distributed Machine Learning scheme where multiple users train a model using only local (and possibly private)

data. Then each user shares its trained model with a central server. The server aggregates the model of each

user, judiciously generates one global model and sends that model back to the user devices. You can find more

information about federated learning in references [1] and [2].

1) Find two applications of Federated Learning in literature and include them in your report. Describe a simple

Federative Learning model. Describe how each edge device trains their model, what update they send to the

central server and how central device aggregates them.

62) Implement a simple Federated Learning task of your choice using Fedarative Averaging Scheme from [1].

Describe your data set, loss function etc. Report your results. How quickly does the algorithm converge?

Suppose now you can train the same model using a centralized learning. How quick is it compared to

federated learning?

3) The success of Federated Learning depends on successful participation of edge users in process. Now suppose

a percentage of edge users sends wrong information to the central server during training as described in

[3]. Implement algorithms described in [3] to adapt your federative learning task to this challenge. Conduct

simulations for different percentages of corrupt users and report your results.

4) Even though one of the purposes of Federated Learning is to ensure user privacy, it is possible to infer user

data from their updates to the central server as described in [4]. Implement the algorithm described in [4]

to learn user data from their gradient updates. Do multiple experiments and report your results.

5) (BONUS) Peform literature review on Federated Learning and identify one more associated challenge related

to Federated Learning and discuss possible solutions for that challenge as reported in literature. Implement

the problem that you found interesting and report your results.

学霸联盟

Spring 2021

General Instructions

1) Deadline for Project Report: June 5, 2021, 11:59 pm. Online Submission on Canvas.

2) Guidelines: You should work individually on the project and choose one topic from the list provided below.

Alternatively, you can define your own project (see announcement from Prof. Pal on Canvas). You are expected

to submit a project report, along with figures, as well as MATLAB/Python source codes, and the datasets with

proper citations. You will also present your project results in the form of a short presentation ( „ 15 minutes per

student) which will be scheduled during the finals week. You are welcome to compose your report in Jupyter

Notebook with live code and visualizations. Wherever applicable, you should write your own code. Any

plagiarism will be considered a violation of academic integrity and will be dealt with accordingly. Ideally, we

should be able to reproduce the exact results as those in your report; otherwise you will lose points. Negative

results are welcome, but be sure to discuss them in your report. The report should include the following

sections:

‚ Objective/Goal of the project.

‚ Background and motivation: You should cite relevant references, explain why the problem is important,

and its applications. You should also cite the datasets being used.

‚ Method/Techniques.

‚ Results: Include all figures; you can also use tables to present your results, especially for comparing

multiple algorithms.

‚ Discussion: This section will provide a critical discussion of your results, and evaluate the strengths and

weaknesses of different algorithms and the associated trade-offs.

‚ Dataset: In your report, you should include a web link to all datasets you use (if any) and give proper

credit. If the datasets are not publicly accessible, submit them with your report in a separate file. If you

have more than one file, pack all files into a .zip file. Dont use the formats (like .rar) that require specific

software to unpack your file.

‚ You will use CVX (for MATLAB) or CVXPY (for Python) which come with inbuilt solvers (optimized

for respective platforms) for convex problems. Since CVX is a software package meant for modeling, you

will only need to formulate the optimization problem in a standard form using MATLAB/Python syntax.

Refer to your syllabus for more details for resources on how to quickly get started with CVX/CVXPY.

List of Topics

1. Topic 1: Blind Deconvolution

References:

1) Ahmed, A. and Recht, B. (2014). “Blind Deconvolution Using Convex Programming” IEEE Transactions

on Signal Processing,vol. 60, no. 3, Mar. 2014..

Blind deconvolution is the problem of recovering two signals given the output of their convolution. The

signals can not be recovered unless certain restrictions are imposed on them. In this project, you will study

and implement convex algorithms for solving the blind deconvolution problem under suitable conditions, and

evaluate their performance guarantees.

– Study the paper by Ahmed, Recht and Romberg to understand the formulation of the blind deconvolution

problem. Why is the solution to the problem non-unique in general? Can you explain what kind of

ambiguities exist? Under what conditions on the signals (sparsity, subspace structure etc), the problem

can permit unique solutions? Explain why.

– Why is the problem non-convex? Explain the main idea in this paper under which the measurements can be

expressed as a linear function of a suitable transformed variable. How is this transformed variable related

to the constituent signals, and given this transformed variable, how can one obtain the constituent signals?

2Does the variable transformation make the problem convex? What additional relaxations are necessary to

cast it as a convex problem?

– What are the theoretical guarantees of the convex algorithm proposed in the paper? Are they deterministic,

or probabilistic? What should be the relation of the number of measurements, sparsity and the dimension

of the subspace, so that exact (or stable, in presence of noise) recovery is possible? Clearly state the

mathematical guarantees and spell out the assumptions.

– Implement the convex algorithm proposed in the paper. You should be able to recreate the figures. Test

the algorithm by generating the phase transition plots which plots the probability of successful recovery

as a function of number of measurements, and the sparsity and/or dimension of the subspace. Do you see

a sharp region around which the probability changes from close to zero, to close to 1? Compare this with

the theoretical guarantees from the paper.

– Compare performance of the blind deconvolution algorithm against non-blind deconvolution (i.e. where

you are given one of the constituent signals). Do a literature survey on alternating minimization techniques

for blind deconvolution (which are non-convex), and implement one algorithm of your choice. Compare

the performance of convex and non-convex blind deconvolution.

– Design numerical experiments to evaluate the performance of the algorithm when one (or more) conditions

(such as sparsity, low rank) are violated. Can you comment on the robustness of the algorithm against

such violations?

– (BONUS): Extend the algorithm to perform blind deconvolution in 2D. One example is image deblurring.

Test the algorithm on an image of your choice (by blurring it with a suitable kernel).

2. Topic 2: Low-Rank Matrix Recovery From Quadratic Measurements (Application in Polynomial Neural

Network)

References

1) Soltani, Mohammadreza, and Chinmay Hegde. “Towards provable learning of polynomial neural networks

using low-rank matrix estimation”. International Conference on Artificial Intelligence and Statistics. 2018.

2) Candes, Emmanuel J., and Yaniv Plan. “Tight oracle inequalities for low-rank matrix recovery from a

minimal number of noisy random measurements.” IEEE Transactions on Information Theory 57.4 (2011):

2342-2359.

3) Recht, B., Fazel, M. and Parrilo, P.A., 2010. “Guaranteed minimum-rank solutions of linear matrix equations

via nuclear norm minimization”. SIAM review, 52(3), pp.471-501.

4) Goldstein, Tom, Christoph Studer, and Richard Baraniuk. “FASTA: A generalized implementation of forward-

backward splitting.” arXiv preprint arXiv:1501.04979 (2015).

In [1], the authors showed that training a neural network with one hidden layer consisting of fewer number of

units compared to the size of input, and with quadratic activation function can be modeled as low-rank matrix

recovery problem with noisy quadratic measurements.

min

LPRpˆp F pLq “

1

2m

mÿ

i“1

pyi ´ xTi Lxiq2

s.t. rankpLq ď r (1)

where tpyi,xiqumi“1,xi P Rp is the input data set, r is the number of hidden units with the activation function

σpzq “ z2. The ground truth matrix L˚ is constructed from the weights in the first and second layers of the

corresponding neural network in the following form:

L˚ “

rÿ

j“1

αjwjwTj

where wj is the weight vector in the jth hidden unit and α P Rr is the weight vector of the second layer.

This optimization problem is not a convex problem (Why?), however, there are approaches which first convert

the original problem to a simpler convex problem and then solve it as a convex optimization problem. The

3authors in [2,3] have proposed a convex relaxation of the low-rank constraint based on the nuclear norm of a

matrix. In particular, consider the following problem:

y “ ApLq ` n (2)

where L P Rpˆp is an unknown matrix, A : Rpˆp Ñ Rm is a linear mapping,

rApLqsi “ xTi Lxi

and n P Rm denotes additive noise. The desired matrix L˚ (which has p2 unknown entries) can be recovered

from (compressive) measurements (with m ! p2) by solving the following convex problem:

min

LPRpˆp }L}˚

s.t. }A˚peq} ď λ

e “ y ´ApLq (3)

where A˚ is the adjoint operator of A,

A˚peq “

mÿ

i“1

eixix

T

i

}.} is the operator norm, }.}˚ is its dual (i.e the nuclear norm), and λ is a constant which is dependent on the

size of L and noise.

In this project, you are expected to do all of the followings (exlcuding BONUS):

– Read the references and explain why solving the problem (1) is equal to training the corresponded network?

Why is it not a convex optimization problem? Why is problem (3) a suitable convex relaxation for problem

(2)?

– Generate a random data set txiumi“1 from a normal distribution (you can choose the mean and the

covariance), and one ground truth network (Based on the given architecture in [1]) which generates

the output yi corresponding to the input xi. Implement Algorithm 1 in [1] and test it on the data set you

have generated. Try data sets with different sizes and different initialization. Can you recover the ground

truth L˚ (or equivalently, the network)? Plot the probability of success by varying the values of m, p and

r, as well as different initializations (you can refer to [1] for some representative plots). Comment on

your results. Add noise to the measurements and try different noise powers. Report where the algorithm

fails and succeeds.

– Formulate the problem of training neural network (or equivalently, finding L˚) as a convex problem,

based on [2] and [3]. Use the data set you generated in the last part and solve this convex problem with

CVX/CVXPY library. You may need to tune λ by trying different values and cross validating against the

training dataset (Read [2] for suitable ways to choose λ). Compare your result with previous solutions

and plots, including the need for initialization. Try different sizes of data sets. How accurate is the convex

relaxation for this problem? How sensitive is it to the size of data set and noise?

– Generate a new data set where each xi can come from two different Gaussian distributions (representing

two classes) with two different means (you can keep the covariance same if you wish). Assign each data

point a binary label based on its class. (Note that there is no ground truth network in this case). Run

the two algorithms that you have implemented in the last two parts (non-convex and convex) to train the

polynomial neural network (which is also equivalent to learning a matrix L). Generate some new test data

points from your classes and evaluate the performance of the learned network on the test dataset. Are

these algorithms able to classify the data? If not, why? Try to justify.

– (BONUS) If you were able to solve a binary classification problem on the random data set in previous

question, try to apply your algorithm on the MNIST dataset (You can restrict your dataset to only 0 and

1 digits). Is there any way to extend this idea for multi-class classification? How?

43. Topic 3: 1-bit Compressed Sensing

References:

1) Plan, Y., and Vershynin, R. (2013) “Robust 1-bit compressed sensing and sparse logistic regression: A convex

programming approach”, IEEE Transactions on Information Theory, 59(1), 482-494.

2) Chen, L., Hassani, S. H., and Karbasi, A. (2017) “Near-Optimal Active Learning of Halfspaces via Query

Synthesis in the Noisy Setting”, In AAAI (pp. 1798-1804).

Compressed Sensing (sparse signal recovery) has been a popular research topic in signal processing since

early 90s. It has many applications in image processing, machine learning, communication, radar, and medical

imaging. The main goal of Compressed Sensing is to recover a sparse signal x (}x}0 ď s) from linear

measurements of the form y “ Ax where A is a measurement matrix. For many practical applications, infinite-

precision linear measurements are not available. Instead, linear measurements are collected with finite precision

(quantization). In the extreme case with 1-bit quantization, the noiseless 1-bit compressed sensing measurements

is modeled as

yi “ signpaTi xq, i “ 1, 2, ¨ ¨ ¨m (4)

In this project, you are expected to do the following:

– Find two different applications of 1-bit Compressive Sensing. Describe why 1-bit is important in those

applications. Examine their noise models and comment on whether you think the noise is modelled

reasonably in each application. Cite references in your report

– Read [1] and compute, by simulation, the Gaussian mean width of the sparse signal set wpSn,sq. Fix s “ 5

and plot wpSn,sq versus n with its upper bound and lower bound given by Lemma 2.3

– Let S1n,s “ tx : x P Sn,s, xi ě 0u, S2n,s “ tx : x P Sn,s, xi “ c or 0u and S3n,s “ tx : x P Sn,s, xi “

˘c or 0u Plot wpS1n,sq, wpS2n,sq and wpS3n,sq with (b). Interpret the result by using Theorem 1.1 in [1]

– Explain why the relaxation from l0-norm to l1-norm is reasonable, by using the Gaussian mean width,

(III.1) and Theorem 1.1

– In [2], the measurement/observation is also t`1,´1u. Compare the problem in [1] and [2] and describe

the mathematical differences between them

– By using the noise model in [2], implement both algorithms in [1] and [2] and compare them numerically

for s “ 1 to s “ n with various n, which one is better and why.

– (BONUS) Develop an algorithm that incorporates the sparse constraint s into the algorithm in [2] and

compare your algorithm with [1].

4. Topic 4: Coordinate Descent, Dykstra’s Algorithm and LASSO

References:

1) R. J. Tibshirani, “Dykstra’s algorithm, ADMM, and Coordinate Descent: Connections, insights, and ex-

tensions”, in Advances in Neural Information Processing Systems (NeurIPS), I. Guyon, U. V. Luxburg, S.

Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc.,

2017.

2) Tseng, Paul, “Convergence of a block coordinate descent method for nondifferentiable minimization”, Journal

of optimization theory and applications 109.3 (2001): 475-494.

3) Hastie, Trevor, Robert Tibshirani, and Martin Wainwright, “Statistical learning with sparsity: the LASSO

and generalizations”. CRC press, 2015.

In this project, you will implement two different algorithms: Dykstra and Block Coordinate descent to solve

certain types of optimization problems. You are expected to implement those algorithms (i.e. write the codes

on your own) on different problems and study their connections. Consider the following two types of problems

(P1) and (P2)

min

β P Rp

1

2

}y ´Xβ}22 `

dÿ

i“1

hipβq (P1)

5where y P Rn,X P Rnˆp, hi : Rp Ñ R

and

min

z P Rn }y ´ z}2 (P2)

s.t. z P C1 X C2 X . . .X Cd

where Ci’s are closed convex sets.

1) Review coordinate-descent algorithm. Briefly describe how and why it works. What are the conditions on

hi for coordinate-descent to succeed on pP1q?

2) Review Dykstra’s algorithm. Describe for what kind of problems it works, and how. How is it different

from alternating projection algorithm? When are they equivalent?

3) What are the relationships between pP1q and pP2q? Under what conditions are they equivalent? When they

are equivalent, how are Dykstra’s algorithm and Coordinate Descent related?

4) Consider the problem pP1q when hi “ λ|βi| and d “ p. This problem is called LASSO. This problem is

often used to recover sparse β˚, |β˚|0 “ s from measurements of the form

y “ Xβ˚ `w

where w denotes additive noise. What is the dual of the LASSO problem? How are the solutions of LASSO

and its dual related?

5) Write a code implementing the coordinate descent algorithm to solve LASSO. What should be the coordinate

descent iterations? Design various experiments (by generating data in suitable ways) for analyzing the

recovery performance of LASSO in different regimes of N, s, p. Analyze both l2 error }β´β˚}2 and support

recovery performance. Clearly describe your settings and experiments. Study the performance convergence

performance of coordinate descent. Compare the speed of your implementation of coordinate descent and

that from CVX for large problem sizes.

6) Solve the dual of LASSO with Dykstra’s algorithm. Verify the relationship between coordinate descent and

Dykstra’s algorithm through your simulations.

7) (BONUS). Choose another problem/application in literature for which you can apply coordinate descent.

Describe coordinate descent iterations. Similarly find and implement another problem which can be solved

using Dykstra.

5. Topic 5: Federated Learning

References:

1) Gafni, Tomer, et al. “Federated Learning: A Signal Processing Perspective”, arXiv preprint arXiv:2103.17150

(2021).

2) T. Li, A. K. Sahu, A. Talwalkar and V. Smith, “Federated Learning: Challenges, Methods, and Future Direc-

tions”, in IEEE Signal Processing Magazine, vol. 37, no. 3, pp. 50-60, May 2020, doi: 10.1109/MSP.2020.2975749.

3) Yin, Dong, Yudong Chen,Kannan Ramchandran and Peter Bartlett. “Byzantine-robust distributed learning:

Towards optimal statistical rates”, International Conference on Machine Learning. PMLR, 2018.

4) L. Zhu, Z. Liu, and S. Han, “Deep leakage from gradients”, in Advances in Neural Information Processing

Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alch-Buc, E. Fox,and R. Garnett, Eds., vol.

32.Curran Associates, Inc., 2019.

In this project, you will study various task pertaining to Federated Learning. Federated Learning (FL) is a

distributed Machine Learning scheme where multiple users train a model using only local (and possibly private)

data. Then each user shares its trained model with a central server. The server aggregates the model of each

user, judiciously generates one global model and sends that model back to the user devices. You can find more

information about federated learning in references [1] and [2].

1) Find two applications of Federated Learning in literature and include them in your report. Describe a simple

Federative Learning model. Describe how each edge device trains their model, what update they send to the

central server and how central device aggregates them.

62) Implement a simple Federated Learning task of your choice using Fedarative Averaging Scheme from [1].

Describe your data set, loss function etc. Report your results. How quickly does the algorithm converge?

Suppose now you can train the same model using a centralized learning. How quick is it compared to

federated learning?

3) The success of Federated Learning depends on successful participation of edge users in process. Now suppose

a percentage of edge users sends wrong information to the central server during training as described in

[3]. Implement algorithms described in [3] to adapt your federative learning task to this challenge. Conduct

simulations for different percentages of corrupt users and report your results.

4) Even though one of the purposes of Federated Learning is to ensure user privacy, it is possible to infer user

data from their updates to the central server as described in [4]. Implement the algorithm described in [4]

to learn user data from their gradient updates. Do multiple experiments and report your results.

5) (BONUS) Peform literature review on Federated Learning and identify one more associated challenge related

to Federated Learning and discuss possible solutions for that challenge as reported in literature. Implement

the problem that you found interesting and report your results.

学霸联盟