Assignment 1

COMP9418 – Advanced Topics in Statistical Machine Learning

Lecturer: Gustavo Batista

Last revision: Thursday 1st October, 2020 at 13:57

Instructions

Submission deadline: Sunday, 18th October 2020, at 18:00:00.

Late Submission Policy: The penalty is set at 20% per late day. This is a ceiling penalty, so if a group is

marked 60/100 and they submitted two days late, they still get 60/100.

Form of Submission: This is a group assignment. Each group can have up to two students. Write the

names and zIDs of each student in the Jupyter notebook. Only one member of the group should

submit the assignment.

The group should submit the solution in one single file in zip format with the name solution.zip. There is

a maximum file size cap of 5MB, so make sure your submission does not exceed this size. The zip file should

contain one Jupyter notebook file. The Jupyter notebook should have all your source code. Use markdown

text to organise and explain your implementation and findings.

You are allowed to use any Python library used in the tutorial notebooks. No other library will be accepted,

particularly libraries for graph and Bayesian network representation and operation. Also, you can reuse any

piece of source code developed in the tutorials.

Submit your files using give. On a CSE Linux machine, type the following on the command-line:

$ give cs9418 ass1 solution.zip

Alternatively, you can submit your solution via WebCMS.

Recall the guidance regarding plagiarism in the course introduction: this applies to this assignment, and if

evidence of plagiarism is detected, it will result in penalties ranging from loss of marks to suspension.

The dataset and breast cancer domain description in the Background section are from the assignment

developed by Peter Lucas, Institute for Computing and Information Sciences, Radboud Universiteit.

Introduction

In this assignment, you will develop some sub-routines in Python to implement operations on Bayesian

Networks. You will code an efficient independence test, learn parameters from complete data, and classify

examples.

We will use a Bayesian Network for diagnosis of breast cancer. We start with some background information

about the problem.

Background

Breast cancer is the most common form of cancer and the second leading cause of cancer death in women.

One in nine women will develop breast cancer in her lifetime. Although it is not possible to say what

exactly causes breast cancer, some factors may increase or change the risk for the development of breast

cancer. These include age, genetic predisposition, history of breast cancer, breast density and lifestyle factors.

Age, for example, is the most significant risk factor for non-hereditary breast cancer: women aged 50

or older have a higher chance of developing breast cancer than younger women. Presence of BRCA1/2 genes

leads to an increased risk of developing breast cancer irrespective of other risk factors. Furthermore, breast

characteristics, such as high breast density, are determining factors for breast cancer.

The primary technique used currently for detection of breast cancer is mammography, an X-ray image of the

breast. It is based on the differential absorption of X-rays between the various tissue components of the breast

such as fat, connective tissue, tumour tissue and calcifications. On a mammogram, radiologists can recognise

breast cancer by the presence of a focal mass, architectural distortion or microcalcifications. Masses are

localised findings, generally asymmetrical to the other breast, distinct from the surrounding tissues. Masses

on a mammogram are characterised by several features, which help distinguish between malignant and benign

(non-cancerous) masses, such as size, margin and shape. For example, a mass with irregular shape and ill-defined

margin is highly suspicious for cancer, whereas a mass with round shape and well-defined margin is likely to

be benign. Architectural distortion is focal disruption of the normal breast tissue pattern, which appears on

a mammogram as a distortion in which surrounding breast tissues appear to be “pulled inward” into a focal

point, often leading to spiculation (star-like structures). Microcalcifications are tiny bits of calcium, which

may show up in clusters, or in patterns (like circles or lines) and are associated with extra cell activity in

breast tissue. They can also be benign or malignant. It is also known that most of the cancers are located in

the upper outer quadrant of the breast. Finally, breast cancer is characterised by several physical symptoms:

nipple discharge, skin retraction and a palpable lump.

Breast cancer develops in stages. The early stage is referred to as in situ (“in place”), meaning that cancer

remains confined to its original location. When it has invaded the surrounding fatty tissue and possibly has

spread to other organs or the lymph, so-called metastasis, it is referred to as invasive cancer. It is known that

early detection of breast cancer can help improve the survival rates.

[20 Marks] Task 1 – Efficient d-separation test

In this part of the assignment, you will implement an efficient version of the d-separation algorithm. Let us

start with a definition for d-separation:

Definition. Let X, Y and Z be disjoint sets of nodes in a DAG G. We will say that X and Y are d-separated

by Z, written dsep(X,Z,Y), iff every path between a node in X and a node in Y is blocked by Z where a

path is blocked by Z iff there is at least one inactive triple on the path.

This definition of d-separation considers all paths connecting a node in X with a node in Y. The number of

such paths can be exponential. The following algorithm provides a more efficient implementation of the test

that does not require enumerating all paths.

Algorithm. Testing whether X and Y are d-separated by Z in a DAG G is equivalent to testing whether X

and Y are disconnected in a new DAG G′, which is obtained by pruning DAG G as follows:

1. We delete any leaf node W from DAG G as long as W does not belong to X ∪ Y ∪ Z. This process is

repeated until no more nodes can be deleted.

2. We delete all edges outgoing from nodes in Z.

Implement the efficient version of the d-separation algorithm in a function d_separation(G,X,Z,Y) that

returns a boolean: True if X is d-separated from Y by Z, and False otherwise.
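
The two pruning steps and the final connectivity check can be sketched as follows. This is a minimal illustration, not a reference solution: it assumes the graph G is a dictionary that maps every node to a list of its children, and that every node appears as a key. Adapt it to the graph representation used in your own notebook.

```python
def d_separation(G, X, Z, Y):
    """Return True if X and Y are d-separated by Z in the DAG G.

    Assumption: G maps every node to a list of its children, and X, Y, Z
    are disjoint collections of node names.
    """
    X, Y, Z = set(X), set(Y), set(Z)
    keep = X | Y | Z
    # Work on a copy so that the caller's graph is not modified.
    children = {node: set(kids) for node, kids in G.items()}

    # Step 1: repeatedly delete leaf nodes that are not in X, Y or Z.
    changed = True
    while changed:
        leaves = [n for n, kids in children.items() if not kids and n not in keep]
        changed = bool(leaves)
        for leaf in leaves:
            del children[leaf]
        for kids in children.values():
            kids.difference_update(leaves)

    # Step 2: delete all edges outgoing from nodes in Z.
    for z in Z:
        if z in children:
            children[z] = set()

    # Build an undirected view of the pruned graph.
    neighbours = {node: set() for node in children}
    for node, kids in children.items():
        for kid in kids:
            neighbours[node].add(kid)
            neighbours[kid].add(node)

    # X and Y are d-separated iff no node of Y is reachable from X.
    stack, visited = list(X), set(X)
    while stack:
        node = stack.pop()
        if node in Y:
            return False
        for nb in neighbours[node] - visited:
            visited.add(nb)
            stack.append(nb)
    return True


# Collider A -> C <- B with a descendant C -> D.
G = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
print(d_separation(G, ["A"], [], ["B"]))     # collider not observed
print(d_separation(G, ["A"], ["D"], ["B"]))  # descendant of collider observed
```

The leaf-pruning loop above rescans all nodes on every pass, which keeps the sketch short; a version that tracks child counts would avoid the repeated scans.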

[10 Marks] Task 2 – Estimate Bayesian Network parameters from data

Estimating the parameters of a Bayesian Network is a relatively simple task if we have complete data. The

file bc.csv has 20,000 complete instances, i.e., without missing values. The task is to estimate and store the

conditional probability tables for each node of the graph. As we will see in more detail in the Naive Bayes

and Bayesian Network learning lectures, the Maximum Likelihood Estimates (MLE) for those probabilities are

simply the empirical probabilities (normalised counts) obtained from data.

Implement a function learn_outcome_space(data) that learns the outcome space (the valid values for each

variable) from the pandas dataframe data and returns a dictionary outcomeSpace with these values.
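
One possible shape for this function is sketched below. It assumes that every column of the dataframe is a discrete variable; the toy column names and values are made up for illustration and are not taken from bc.csv.

```python
import pandas as pd


def learn_outcome_space(data):
    """Map each column of the dataframe to a tuple of its observed values."""
    # Sorting makes the order of the values reproducible across runs.
    return {col: tuple(sorted(data[col].unique())) for col in data.columns}


# Hypothetical toy data; names and values are illustrative only.
toy = pd.DataFrame({"Age": ["<35", "35-49", "<35"],
                    "BC": ["No", "Invasive", "No"]})
outcomeSpace = learn_outcome_space(toy)
print(outcomeSpace)
```

Values that never occur in the data will be missing from the outcome space, so it is safer to learn it from the full dataset than from a single training fold.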

Implement a function learn_bayes_net(G, data, outcomeSpace) that learns the parameters of the

Bayesian Network G. This function should return a dictionary prob_tables with all the conditional

probability tables (one for each node).
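
A minimal sketch of the counting step is given below. Several details are assumptions made for the example: G maps each node to a list of its children, each table is stored as a dictionary with the keys "dom" and "table", and a parent configuration that never occurs in the data falls back to a uniform distribution. The table layout used in your tutorials may differ.

```python
from collections import Counter
from itertools import product

import pandas as pd


def learn_bayes_net(G, data, outcomeSpace):
    """Estimate one conditional probability table per node by relative frequency."""
    # Invert the child lists to obtain the parents of every node.
    parents = {node: [] for node in G}
    for node, kids in G.items():
        for kid in kids:
            parents[kid].append(node)

    prob_tables = {}
    for node in G:
        dom = tuple(parents[node]) + (node,)
        # Count how often each joint assignment of (parents..., node) occurs.
        counts = Counter(zip(*(data[col] for col in dom)))
        table = {}
        for pa_values in product(*(outcomeSpace[p] for p in parents[node])):
            total = sum(counts[pa_values + (v,)] for v in outcomeSpace[node])
            for v in outcomeSpace[node]:
                if total > 0:
                    table[pa_values + (v,)] = counts[pa_values + (v,)] / total
                else:
                    # Assumption: unseen parent configuration, use a uniform row.
                    table[pa_values + (v,)] = 1 / len(outcomeSpace[node])
        prob_tables[node] = {"dom": dom, "table": table}
    return prob_tables


# Hypothetical two-node network A -> B with binary variables.
G = {"A": ["B"], "B": []}
toy = pd.DataFrame({"A": [0, 0, 0, 1], "B": [0, 1, 1, 1]})
space = {"A": (0, 1), "B": (0, 1)}
prob_tables = learn_bayes_net(G, toy, space)
print(prob_tables["B"]["table"])
```

Each key of a table is a tuple of parent values followed by the value of the node itself, so the entry (0, 1) of the table for B holds the estimate of P(B=1 | A=0).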

[20 Marks] Task 3 – Bayesian Network Classification

This particular Bayesian Network has a variable that plays a central role in the analysis. The variable BC

(Breast Cancer) can assume the values No, Invasive and InSitu. Accurately identifying its correct value

would lead to an automatic system that could help in early breast cancer diagnosis.

First, remove the variables metastasis and lymphnodes since these two variables can be understood as

pieces of information derived from BC and they may not be available at the point when BC is classified.

Use the Bayesian Network to classify cases of the dataset. First, use 10-fold cross-validation to split the

dataset into training and test sets. Use the function learn_bayes_net(G, data, outcomeSpace) to learn

the Bayesian network parameters from the training set.

Design a new function assess_bayes_net(G, prob_tables, data, outcomeSpace, class_var) that uses

the test cases in data to assess the performance of the Bayesian network. Implement the efficient classification

procedure discussed in the lectures. The function should return the classifier accuracy. Compute and

report the average accuracy over the ten cross-validation runs as well as the standard deviation.
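
The cross-validation loop itself can be sketched independently of the classifier. In the example below, k_fold_indices, cross_validate and the evaluate_fold callback are hypothetical names introduced for illustration; the callback is expected to train on the training rows and return the accuracy on the test rows.

```python
import numpy as np


def k_fold_indices(n_rows, k=10, seed=0):
    """Shuffle the row positions and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_rows), k)


def cross_validate(n_rows, evaluate_fold, k=10, seed=0):
    """Return the mean and standard deviation of the per-fold accuracies.

    evaluate_fold(train_idx, test_idx) is a hypothetical callback that
    returns the accuracy obtained on the rows in test_idx.
    """
    folds = k_fold_indices(n_rows, k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        accuracies.append(evaluate_fold(train_idx, test_idx))
    return float(np.mean(accuracies)), float(np.std(accuracies))


# Dummy callback that reports the fraction of rows used for training.
mean_acc, std_acc = cross_validate(20, lambda tr, te: len(tr) / 20)
print(mean_acc, std_acc)
```

In the assignment, the callback would typically call learn_bayes_net on data.iloc[train_idx] and assess_bayes_net on data.iloc[test_idx].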

[10 Marks] Task 4 – Naïve Bayes Classification

Implement a Naïve Bayes classifier. Design a new function assess_naive_bayes(G, prob_tables, data,

outcomeSpace, class_var) to classify the cases in data using the log probability trick discussed in the

lectures. Run 10-fold cross-validation, as in Task 3, and report the average accuracy and its standard deviation. Since

the Naïve Bayes classifier is essentially a Bayesian network, you can call the function learn_bayes_net(G,

data, outcomeSpace) to learn the Naïve Bayes parameters from a training set.
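
The idea behind the log probability trick is to replace the product P(c) · Π P(x_i | c) with the sum log P(c) + Σ log P(x_i | c), which avoids numerical underflow when there are many attributes. A minimal sketch follows; the function name naive_bayes_predict and the dictionary layouts of prior and likelihood are assumptions made for the example, and the variable and value names are made up.

```python
import math


def naive_bayes_predict(prior, likelihood, evidence):
    """Return the class with the highest log-posterior score.

    Assumed layouts: prior[c] is P(C=c); likelihood[var][(c, value)] is
    P(var=value | C=c); evidence maps each attribute to its observed value.
    Returns None if every class has probability zero.
    """
    best_class, best_score = None, -math.inf
    for c, p_c in prior.items():
        if p_c == 0:
            continue
        score = math.log(p_c)
        for var, value in evidence.items():
            p = likelihood[var][(c, value)]
            # A zero probability rules the class out; log(0) is undefined.
            score += math.log(p) if p > 0 else -math.inf
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical numbers for a single attribute.
prior = {"No": 0.6, "Invasive": 0.4}
likelihood = {"Mass": {("No", "Benign"): 0.8, ("No", "Malign"): 0.2,
                       ("Invasive", "Benign"): 0.1, ("Invasive", "Malign"): 0.9}}
print(naive_bayes_predict(prior, likelihood, {"Mass": "Malign"}))
```

Because the logarithm is monotonic, the class that maximises the sum of logs is the same class that maximises the original product.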

[20 Marks] Task 5 – Tree-augmented Naïve Bayes Classification

Similarly to the previous task, implement a Tree-augmented Naïve Bayes (TAN) classifier and evaluate your

implementation on the breast cancer dataset. Design a function learn_tan_structure(data, outcomeSpace,

class_var) that learns the TAN structure (graph) from data and returns it.
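
A common way to learn a TAN structure is to weight every pair of attributes by their conditional mutual information given the class, build a maximum spanning tree over the attributes, orient its edges away from a root, and finally make the class a parent of every attribute. The sketch below follows that recipe under several assumptions: the graph is returned as a dictionary of child lists, the first attribute is chosen as the root, and conditional_mutual_information is a helper name introduced here for illustration.

```python
import math
from collections import Counter
from itertools import combinations

import pandas as pd


def conditional_mutual_information(data, a, b, class_var):
    """Empirical I(a; b | class_var) computed from relative frequencies."""
    n = len(data)
    n_abc = Counter(zip(data[a], data[b], data[class_var]))
    n_ac = Counter(zip(data[a], data[class_var]))
    n_bc = Counter(zip(data[b], data[class_var]))
    n_c = Counter(data[class_var])
    # P(x,y,c) * log[ P(x,y,c) P(c) / (P(x,c) P(y,c)) ], written with counts.
    return sum(
        (count / n) * math.log(count * n_c[c] / (n_ac[(x, c)] * n_bc[(y, c)]))
        for (x, y, c), count in n_abc.items()
    )


def learn_tan_structure(data, outcomeSpace, class_var):
    """Return a TAN graph as a dict mapping each node to a list of its children."""
    attributes = [v for v in outcomeSpace if v != class_var]
    weight = {
        frozenset(pair): conditional_mutual_information(data, *pair, class_var)
        for pair in combinations(attributes, 2)
    }
    # The class variable is a parent of every attribute.
    graph = {class_var: list(attributes)}
    graph.update({v: [] for v in attributes})
    # Prim's algorithm for a maximum spanning tree, rooted at the first
    # attribute; each edge goes from the tree to the newly added node.
    in_tree = attributes[:1]
    while len(in_tree) < len(attributes):
        u, v = max(
            ((u, v) for u in in_tree for v in attributes if v not in in_tree),
            key=lambda edge: weight[frozenset(edge)],
        )
        graph[u].append(v)
        in_tree.append(v)
    return graph


# Hypothetical data: A and B always agree, D is independent of both given C.
toy = pd.DataFrame({"A": [0, 0, 1, 1, 0, 0, 1, 1],
                    "B": [0, 0, 1, 1, 0, 0, 1, 1],
                    "D": [0, 1, 0, 1, 0, 1, 0, 1],
                    "C": [0, 0, 0, 0, 1, 1, 1, 1]})
space = {col: (0, 1) for col in toy.columns}
tan = learn_tan_structure(toy, space, "C")
print(tan)
```

The resulting dictionary has the same shape as the graph assumed in the earlier sketches, so it can be passed on to the parameter-learning step.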

Since the TAN classifier is also a Bayesian network, you can use the function learn_bayes_net(G, data,

outcomeSpace) to learn the TAN parameters from a training set.

You can also use the previously designed function assess_bayes_net(G, prob_tables, data,

outcomeSpace, class_var) to classify and assess the test cases in data and measure the classifier

accuracy.

[20 Marks] Task 6 – Report

Write a report (of fewer than 500 words) summarising your findings in this assignment. Your report

should address the following:

a. Summarise and discuss the experimental results (accuracy). Use plots to illustrate your

results.

b. Discuss the complexity of the implemented algorithms.

Use Markdown and LaTeX to write your report in the Jupyter notebook. Develop some plots using Matplotlib

to illustrate your results. Be mindful of the maximum number of words. Please be concise and objective.
