CSCI 4150 Spring 2025
Introduction to Artificial Intelligence
Homework 5: Machine Learning

Note: This homework contains typical exam-level questions. During the exam, you would be under time pressure and would have to complete the questions on your own. Therefore, we strongly encourage you to first try this on your own to help you understand where you currently stand. Then, feel free to discuss the questions with other students and/or staff (during office hours) before independently writing up your solution.

Hint: Make sure to show all your work and justify your answers. Many questions offer partial credit, so showing your work is important.

Submission: Your submission on Submitty should be a PDF with your answers. You can write your answers on regular paper, scan them, and upload the scans as a single PDF. If you prefer a digital workflow, you're welcome to copy the questions into your own LaTeX file or any other format you're comfortable with. Whichever option you take, please ensure the questions are answered in the correct order and numbered appropriately. Your final submission should be a single PDF file.

Policy: Submitting work that is misrepresented as entirely your own is a violation of course policy. If you discuss homework questions with other students, you must list them as collaborators in the declarations section below. While you may use generative AI tools to deepen your understanding of the topics covered in the homework, we strictly prohibit submitting solutions that are direct outputs from such tools. Therefore, any use of generative AI tools must also be declared. Additionally, include an appendix at the end of your submission that documents the full exchange with the AI tool, detailing all prompts and responses related to this homework. Failure to provide this information may result in academic integrity violations.
Remember: relying on these AI tools without first trying the questions independently will only hurt you during the exams, which account for a high percentage of your grade for this course.

First Name:
Last Name:
RPI Email Address:
RIN:

Declarations:

1 Spam Detection using Naive Bayes (20 points)

Naive Bayes is a common probabilistic algorithm for text classification tasks such as spam filtering. In this problem, you will use a small dataset of labeled messages to determine whether a new, unlabeled message is Spam or Ham.

Message  Label  Text
M1       Spam   "send us your password"
M2       Spam   "review us"
M3       Spam   "send us your account"
M4       Spam   "send your password"
M5       Ham    "password review"
M6       Ham    "send us your review"

(a) [10 points] You receive the new message "review account". Using a Naive Bayes classifier with the Bag of Words approach, determine whether this message should be classified as Spam or Ham. In your solution, explicitly show each step in your process, including:
• Step 1: Identifying the vocabulary from the training messages,
• Step 2: Computing word frequencies per label (Spam/Ham),
• Step 3: Calculating conditional probabilities for each word,
• Step 4: Determining the posterior probabilities for the new message,
• Step 5: Making your final classification decision.

(b) [10 points] Now repeat the Spam/Ham classification for the "review account" message using Laplace smoothing with k = 1. Follow the same general procedure as in the previous question, but be sure to determine all unique words in the vocabulary V and calculate the corresponding smoothed conditional probabilities.

2 Perceptron (30 points)

Perceptron is one of the fundamental building blocks of neural networks.
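As a way to check your hand computations for Question 1, the counting-and-scoring procedure can be sketched in Python. This is a minimal sketch, not the required solution method; the function and variable names are illustrative, and the single parameter k covers both parts (k = 0 reproduces the unsmoothed estimates of part (a), k = 1 gives the Laplace-smoothed estimates of part (b)).

```python
from collections import Counter

# Training data transcribed from the table in Question 1.
messages = [
    ("Spam", "send us your password"),
    ("Spam", "review us"),
    ("Spam", "send us your account"),
    ("Spam", "send your password"),
    ("Ham",  "password review"),
    ("Ham",  "send us your review"),
]

def train(data):
    """Count word frequencies per label, messages per label, and the vocabulary."""
    counts = {"Spam": Counter(), "Ham": Counter()}
    label_totals = Counter()
    for label, text in data:
        counts[label].update(text.split())
        label_totals[label] += 1
    vocab = {w for _, text in data for w in text.split()}
    return counts, label_totals, vocab

def score(message, label, counts, label_totals, vocab, k=0):
    """Unnormalized posterior: P(label) * prod over words of P(word | label),
    with Laplace smoothing parameter k (k = 0 means no smoothing)."""
    p = label_totals[label] / sum(label_totals.values())  # prior from message counts
    total_words = sum(counts[label].values())
    for w in message.split():
        p *= (counts[label][w] + k) / (total_words + k * len(vocab))
    return p

counts, label_totals, vocab = train(messages)
for k in (0, 1):
    spam = score("review account", "Spam", counts, label_totals, vocab, k)
    ham = score("review account", "Ham", counts, label_totals, vocab, k)
    # With k = 0, the Ham score is zero because "account" never appears in Ham.
    print(f"k={k}: Spam score {spam:.6f}, Ham score {ham:.6f}")
```

Comparing the two unnormalized scores for each k gives the classification decision; normalizing them (dividing by their sum) gives the posterior probabilities.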
For this question, you will work with a dataset representing the logical AND function and observe how a perceptron can learn this function through iterative weight updates.

Dataset Description

The dataset consists of binary inputs and outputs representing the logical AND function:

x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1

Where:
• x1 and x2 are binary input features (0 or 1),
• y is the target output (0 or 1), representing the logical AND operation.

Parameters of the perceptron:
• Initial weights: w1 = 0.3, w2 = −0.1
• Learning rate: α = 0.1
• Threshold: Th = 0.2
• Activation function (the perceptron computes the output as follows):

y = 1 if w1·x1 + w2·x2 ≥ Th, and y = 0 otherwise.

2.1 Weight Update Analysis

(a) [3 points] After processing the entire dataset for one epoch (one pass over all training examples), calculate the perceptron's prediction (0 or 1) for each input sample (x1, x2). Show how you computed each prediction using the current weights and threshold.

(b) [2 points] Report the updated weights (w1, w2) at the end of this first epoch. Briefly show how each update was performed based on any misclassifications.

2.2 Convergence Analysis

Continue training the perceptron for additional epochs, updating the weights on any misclassified examples, until convergence: an epoch in which no weights are updated because all examples are correctly classified.

(a) [10 points] How many epochs are required for the perceptron to converge? Please show all your work for each of the epochs.

(b) [2 points] What are the final weights after convergence?

(c) [3 points] Provide a brief explanation of why these weights correctly implement the AND function.

2.3 Perceptron Behavior and Initialization Effects

Let's investigate how the perceptron responds to new inputs and how different weight initializations might affect convergence.
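The training loop described in Sections 2.1 and 2.2 can be sketched in Python to sanity-check a hand trace. This is an illustrative sketch, not the required solution: it assumes the standard perceptron rule w ← w + α(y − ŷ)x on each misclassified example, and it uses exact fractions because floating-point arithmetic can misbehave exactly at the threshold (e.g., 0.3 − 0.1 < 0.2 in IEEE 754 doubles).

```python
from fractions import Fraction as F

# AND training data and the parameters from Question 2.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w1, w2 = F(3, 10), F(-1, 10)       # initial weights 0.3, -0.1
alpha, threshold = F(1, 10), F(1, 5)  # learning rate 0.1, threshold 0.2

def predict(x1, x2, w1, w2):
    """Threshold activation: 1 if w1*x1 + w2*x2 >= Th, else 0."""
    return 1 if w1 * x1 + w2 * x2 >= threshold else 0

epochs = 0
converged = False
while not converged:
    epochs += 1
    converged = True
    for (x1, x2), y in data:
        y_hat = predict(x1, x2, w1, w2)
        if y_hat != y:
            # Perceptron rule: nudge the weights toward the correct label.
            w1 += alpha * (y - y_hat) * x1
            w2 += alpha * (y - y_hat) * x2
            converged = False

print(f"converged after epoch {epochs} with w1={w1}, w2={w2}")
```

Note that this loop counts the final clean pass (the epoch with no updates) as an epoch; if your hand trace uses a different convention, your epoch count may differ by one.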
(a) [5 points] Suppose you introduce a new data point (x1, x2) = (1, −1), which does not align with the logical AND function's domain (where inputs are 0 or 1). Using your converged weights and threshold from Question 2.2, what output does the perceptron produce for this new input?

(b) [5 points] If the weights were instead randomly initialized from a standard normal distribution N(0, 1), would the perceptron still converge when trained on the AND dataset? Explain your reasoning based on the linear separability of the AND function.

3 Multi-Layer Perceptrons / Neural Networks (50 points)

3.1 Neural Network Computation Graph (38 points)

Consider the following computation graph for a simple neural network for binary classification. Here, x is a single real-valued input feature with an associated class y* (0 or 1). There are two weight parameters, w1 and w2, and non-linearity functions g1 and g2 (to be defined below). Linear combinations are represented as zi, and activations as ai, for each layer i. The network will output a value a2 between 0 and 1, representing the probability of being in class 1. We will use a loss function Loss (to be defined below) to compare the prediction a2 with the true class y*.

Figure 1: Neural Network Computation Graph for Question 3.

(a) [4 points] Perform the forward pass on this network, writing the output values for each node z1, a1, z2, and a2 in terms of the node's input values.

(b) [5 points] Compute the loss Loss(a2, y*) in terms of the input x, the weights wi, and the activation functions gi.

(c) [5 points] Now we will work through parts of the backward pass, incrementally. Use the chain rule to derive ∂Loss/∂w2. Write your expression as a product of partial derivatives at each node, i.e., the partial derivative of the node's output with respect to its inputs.
(Hint: the series of expressions you wrote for Question 3.1(a) will be helpful; you may use any of those variables.)

(d) [8 points] Suppose the loss function is quadratic, Loss(a2, y*) = ½(a2 − y*)², and g1 and g2 are both sigmoid functions, g(z) = 1/(1 + e^(−z)). Using the chain rule from Question 3.1(c), and the fact that ∂g(z)/∂z = g(z)(1 − g(z)) for the sigmoid function, write ∂Loss/∂w2 in terms of the values from the forward pass, y*, a1, and a2.

(e) [4 points] Now use the chain rule to derive ∂Loss/∂w1 as a product of partial derivatives at each node used in the chain rule.

(f) [8 points] Write ∂Loss/∂w1 in terms of x, y*, wi, ai, and zi.

(g) [4 points] What is the gradient descent update for w1 with learning rate (step size) α, in terms of the values computed above?

3.2 Neural Network Representation (12 points)

In this question, you will analyze the expressiveness of simple neural network architectures in approximating different piecewise-linear functions. The networks shown in Figure 2 fall into two families:

• Gi: These use only scalar (1-dimensional) intermediate values. This means the input is processed through a series of scalar operations like multiplication, addition, and ReLU, one at a time.
• Hi: These use 2-dimensional intermediate representations (e.g., vectors or matrices), allowing more complex transformations such as parallel ReLU activations and multiple weighted combinations before producing the scalar output.

In the diagrams:
• Circles labeled with * represent multiplication (linear transformation),
• Circles labeled with + represent bias addition,
• Circles labeled with relu represent the element-wise ReLU nonlinearity: relu(z) = max(0, z).

Figure 2: Neural Network Representations for Question 3.2.

Below are four plots (1–4), each representing a target function over the domain x ∈ (−∞, ∞).
Your task: For each plot, determine which of the networks (G1, G2, G3, H1, H2, H3) can represent the function exactly. If none of the networks can represent the function, write "none". Briefly justify your choice.

(a) Plot 1 [3 points]
(b) Plot 2 [3 points]
(c) Plot 3 [3 points]
(d) Plot 4 [3 points]
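As intuition for the difference between the two families, here is a small NumPy sketch. The exact wirings of G1–G3 and H1–H3 are given in Figure 2 (not reproduced in this text), so the two functional forms below are illustrative assumptions, one per family, not the actual networks from the figure.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Hypothetical G-style network (scalar chain): one scalar ReLU between two
# affine maps. A scalar ReLU adds at most one "kink", and once it clips,
# one whole side of the input axis is mapped to a constant.
def g_net(x, w1, b1, w2, b2):
    return w2 * relu(w1 * x + b1) + b2

# Hypothetical H-style network (2-dimensional hidden layer): two parallel
# ReLUs combined by a weighted sum, allowing up to two kinks.
def h_net(x, w, b, v, c):
    return float(v @ relu(w * x + b) + c)

# Example: |x| = relu(x) + relu(-x) is exactly an H-style function, but no
# chain of scalar affine/ReLU steps can produce it, since after the first
# scalar ReLU one side of the axis has already been flattened to a constant.
w = np.array([1.0, -1.0])
b = np.array([0.0, 0.0])
v = np.array([1.0, 1.0])
for x in (-3.0, 0.5, 2.0):
    assert h_net(x, w, b, v, 0.0) == abs(x)
```

When justifying your answers, counting the number of kinks (slope changes) a target plot has, and comparing it with how many kinks each architecture can produce, is a useful first check.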