CSCI 4150 Spring 2025
Introduction to Artificial Intelligence
Homework 5: Machine Learning

Note: This homework contains typical exam-level questions. During the exam, you would be under time pressure and would have to complete the questions on your own. Therefore, we strongly encourage you to first try this on your own to help you understand where you currently stand. Then, feel free to discuss the questions with other students and/or staff (during office hours) before independently writing up your solution.

Hint: Make sure to show all your work and justify your answers. Many questions offer partial credit, so showing your work is important.

Submission: Your submission on Submitty should be a PDF with your answers. You can write your answers on regular paper, scan them, and upload the scans as a single PDF. If you prefer a digital workflow, you're welcome to copy the questions into your own LaTeX file or any other format you're comfortable with. Whichever option you take, please ensure the questions are answered in the correct order and numbered appropriately. Your final submission should be a single PDF file.

Policy: Submitting work that is misrepresented as entirely your own is a violation of course policy. If you discuss homework questions with other students, you must list them as collaborators in the declarations section below. While you may use generative AI tools to deepen your understanding of the topics covered in the homework, we strictly prohibit submitting solutions that are direct outputs from such tools. Therefore, any use of generative AI tools must also be declared. Additionally, include an appendix at the end of your submission that documents the full exchange with the AI tool, detailing all prompts and responses related to this homework. Failure to provide this information may result in academic integrity violations.
Remember: relying on these AI tools without first trying the questions independently will only hurt you during the exams, which account for a high percentage of your grade for this course.

First Name:
Last Name:
RPI Email Address:
RIN:

Declarations:

1 Spam Detection using Naive Bayes (20 points)

Naive Bayes is a common probabilistic algorithm for text classification tasks such as spam filtering. In this problem, you will use a small dataset of labeled messages to determine whether a new, unlabeled message is Spam or Ham.

Message  Label  Text
M1       Spam   "send us your password"
M2       Spam   "review us"
M3       Spam   "send us your account"
M4       Spam   "send your password"
M5       Ham    "password review"
M6       Ham    "send us your review"

(a) [10 points] You receive the new message "review account". Using a Naive Bayes classifier with the Bag of Words approach, determine whether this message should be classified as Spam or Ham. In your solution, explicitly show each step in your process, including:
• Step 1: Identifying the vocabulary from the training messages,
• Step 2: Computing word frequencies per label (Spam/Ham),
• Step 3: Calculating conditional probabilities for each word,
• Step 4: Determining the posterior probabilities for the new message,
• Step 5: Making your final classification decision.

(b) [10 points] Now repeat the Spam/Ham classification for the "review account" message using Laplace smoothing with k = 1. Follow the same general procedure as in the previous question, but be sure to determine all unique words in the vocabulary V and calculate the corresponding smoothed conditional probabilities.

2 Perceptron (30 points)

Perceptron is one of the fundamental building blocks of neural networks.
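As a way to check your hand computations for Question 1, the counting-and-scoring procedure can be sketched in Python. This is a minimal sketch, not the required solution method; the function and variable names are illustrative, and the single parameter k covers both parts (k = 0 reproduces the unsmoothed estimates of part (a), k = 1 gives the Laplace-smoothed estimates of part (b)).

```python
from collections import Counter

# Training data transcribed from the table in Question 1.
messages = [
    ("Spam", "send us your password"),
    ("Spam", "review us"),
    ("Spam", "send us your account"),
    ("Spam", "send your password"),
    ("Ham",  "password review"),
    ("Ham",  "send us your review"),
]

def train(data):
    """Count word frequencies per label, messages per label, and the vocabulary."""
    counts = {"Spam": Counter(), "Ham": Counter()}
    label_totals = Counter()
    for label, text in data:
        counts[label].update(text.split())
        label_totals[label] += 1
    vocab = {w for _, text in data for w in text.split()}
    return counts, label_totals, vocab

def score(message, label, counts, label_totals, vocab, k=0):
    """Unnormalized posterior: P(label) * prod over words of P(word | label),
    with Laplace smoothing parameter k (k = 0 means no smoothing)."""
    p = label_totals[label] / sum(label_totals.values())  # prior from message counts
    total_words = sum(counts[label].values())
    for w in message.split():
        p *= (counts[label][w] + k) / (total_words + k * len(vocab))
    return p

counts, label_totals, vocab = train(messages)
for k in (0, 1):
    spam = score("review account", "Spam", counts, label_totals, vocab, k)
    ham = score("review account", "Ham", counts, label_totals, vocab, k)
    # With k = 0, the Ham score is zero because "account" never appears in Ham.
    print(f"k={k}: Spam score {spam:.6f}, Ham score {ham:.6f}")
```

Comparing the two unnormalized scores for each k gives the classification decision; normalizing them (dividing by their sum) gives the posterior probabilities.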
For this question, you will work with a dataset representing the logical AND function and observe how a perceptron can learn this function through iterative weight updates.

Dataset Description

The dataset consists of binary inputs and outputs representing the logical AND function:

x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1

Where:
• x1 and x2 are binary input features (0 or 1),
• y is the target output (0 or 1), representing the logical AND operation.

Parameters of the perceptron:
• Initial weights: w1 = 0.3, w2 = −0.1
• Learning rate: α = 0.1
• Threshold: Th = 0.2
• Activation function (the perceptron computes the output as follows):

y = 1 if w1·x1 + w2·x2 ≥ Th, and y = 0 otherwise.

2.1 Weight Update Analysis

(a) [3 points] After processing the entire dataset for one epoch (one pass over all training examples), calculate the perceptron's prediction (0 or 1) for each input sample (x1, x2). Show how you computed each prediction using the current weights and threshold.

(b) [2 points] Report the updated weights (w1, w2) at the end of this first epoch. Briefly show how each update was performed based on any misclassifications.

2.2 Convergence Analysis

Continue training the perceptron for additional epochs, updating the weights on any misclassified examples, until convergence: an epoch in which no weights are updated because all examples are correctly classified.

(a) [10 points] How many epochs are required for the perceptron to converge? Please show all your work for each of the epochs.

(b) [2 points] What are the final weights after convergence?

(c) [3 points] Provide a brief explanation of why these weights correctly implement the AND function.

2.3 Perceptron Behavior and Initialization Effects

Let's investigate how the perceptron responds to new inputs and how different weight initializations might affect convergence.
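The training loop described in Sections 2.1 and 2.2 can be sketched in Python to sanity-check a hand trace. This is an illustrative sketch, not the required solution: it assumes the standard perceptron rule w ← w + α(y − ŷ)x on each misclassified example, and it uses exact fractions because floating-point arithmetic can misbehave exactly at the threshold (e.g., 0.3 − 0.1 < 0.2 in IEEE 754 doubles).

```python
from fractions import Fraction as F

# AND training data and the parameters from Question 2.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w1, w2 = F(3, 10), F(-1, 10)       # initial weights 0.3, -0.1
alpha, threshold = F(1, 10), F(1, 5)  # learning rate 0.1, threshold 0.2

def predict(x1, x2, w1, w2):
    """Threshold activation: 1 if w1*x1 + w2*x2 >= Th, else 0."""
    return 1 if w1 * x1 + w2 * x2 >= threshold else 0

epochs = 0
converged = False
while not converged:
    epochs += 1
    converged = True
    for (x1, x2), y in data:
        y_hat = predict(x1, x2, w1, w2)
        if y_hat != y:
            # Perceptron rule: nudge the weights toward the correct label.
            w1 += alpha * (y - y_hat) * x1
            w2 += alpha * (y - y_hat) * x2
            converged = False

print(f"converged after epoch {epochs} with w1={w1}, w2={w2}")
```

Note that this loop counts the final clean pass (the epoch with no updates) as an epoch; if your hand trace uses a different convention, your epoch count may differ by one.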
(a) [5 points] Suppose you introduce a new data point (x1, x2) = (1, −1), which does not align with the logical AND function's domain (where inputs are 0 or 1). Using your converged weights and threshold from Question 2.2, what output does the perceptron produce for this new input?

(b) [5 points] If the weights were instead randomly initialized from a standard normal distribution N(0, 1), would the perceptron still converge when trained on the AND dataset? Explain your reasoning based on the linear separability of the AND function.

3 Multi-Layer Perceptrons / Neural Networks (50 points)

3.1 Neural Network Computation Graph (38 points)

Consider the following computation graph for a simple neural network for binary classification. Here, x is a single real-valued input feature with an associated class y* (0 or 1). There are two weight parameters, w1 and w2, and non-linearity functions g1 and g2 (to be defined below). Linear combinations are represented as zi, and activations as ai, for each layer i. The network will output a value a2 between 0 and 1, representing the probability of being in class 1. We will use a loss function Loss (to be defined below) to compare the prediction a2 with the true class y*.

Figure 1: Neural Network Computation Graph for Question 3.

(a) [4 points] Perform the forward pass on this network, writing the output values for each node z1, a1, z2, and a2 in terms of the node's input values.

(b) [5 points] Compute the loss Loss(a2, y*) in terms of the input x, the weights wi, and the activation functions gi.

(c) [5 points] Now we will work through parts of the backward pass, incrementally. Use the chain rule to derive ∂Loss/∂w2. Write your expression as a product of partial derivatives at each node, i.e., the partial derivative of the node's output with respect to its inputs.
(Hint: the series of expressions you wrote for Question 3.1(a) will be helpful; you may use any of those variables.)

(d) [8 points] Suppose the loss function is quadratic, Loss(a2, y*) = ½(a2 − y*)², and g1 and g2 are both sigmoid functions, g(z) = 1/(1 + e^(−z)). Using the chain rule from Question 3.1(c), and the fact that ∂g(z)/∂z = g(z)(1 − g(z)) for the sigmoid function, write ∂Loss/∂w2 in terms of the values from the forward pass, y*, a1, and a2.

(e) [4 points] Now use the chain rule to derive ∂Loss/∂w1 as a product of partial derivatives at each node used in the chain rule.

(f) [8 points] Write ∂Loss/∂w1 in terms of x, y*, wi, ai, and zi.

(g) [4 points] What is the gradient descent update for w1 with learning rate (step size) α, in terms of the values computed above?

3.2 Neural Network Representation (12 points)

In this question, you will analyze the expressiveness of simple neural network architectures in approximating different piecewise-linear functions. The networks shown in Figure 2 fall into two families:

• Gi: These use only scalar (1-dimensional) intermediate values. This means the input is processed through a series of scalar operations like multiplication, addition, and ReLU, one at a time.
• Hi: These use 2-dimensional intermediate representations (e.g., vectors or matrices), allowing more complex transformations such as parallel ReLU activations and multiple weighted combinations before producing the scalar output.

In the diagrams:
• Circles labeled with * represent multiplication (linear transformation),
• Circles labeled with + represent bias addition,
• Circles labeled with relu represent the element-wise ReLU nonlinearity: relu(z) = max(0, z).

Figure 2: Neural Network Representations for Question 3.2.

Below are four plots (1–4), each representing a target function over the domain x ∈ (−∞, ∞).
Your task: For each plot, determine which of the networks (G1, G2, G3, H1, H2, H3) can represent the function exactly. If none of the networks can represent the function, write "none". Briefly justify your choice.

(a) Plot 1 [3 points]
(b) Plot 2 [3 points]
(c) Plot 3 [3 points]
(d) Plot 4 [3 points]
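As intuition for the difference between the two families, here is a small NumPy sketch. The exact wirings of G1–G3 and H1–H3 are given in Figure 2 (not reproduced in this text), so the two functional forms below are illustrative assumptions, one per family, not the actual networks from the figure.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Hypothetical G-style network (scalar chain): one scalar ReLU between two
# affine maps. A scalar ReLU adds at most one "kink", and once it clips,
# one whole side of the input axis is mapped to a constant.
def g_net(x, w1, b1, w2, b2):
    return w2 * relu(w1 * x + b1) + b2

# Hypothetical H-style network (2-dimensional hidden layer): two parallel
# ReLUs combined by a weighted sum, allowing up to two kinks.
def h_net(x, w, b, v, c):
    return float(v @ relu(w * x + b) + c)

# Example: |x| = relu(x) + relu(-x) is exactly an H-style function, but no
# chain of scalar affine/ReLU steps can produce it, since after the first
# scalar ReLU one side of the axis has already been flattened to a constant.
w = np.array([1.0, -1.0])
b = np.array([0.0, 0.0])
v = np.array([1.0, 1.0])
for x in (-3.0, 0.5, 2.0):
    assert h_net(x, w, b, v, 0.0) == abs(x)
```

When justifying your answers, counting the number of kinks (slope changes) a target plot has, and comparing it with how many kinks each architecture can produce, is a useful first check.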