Exam Paper
Semester 1, 2021
COMP5046 Natural Language Processing
*This is a take-home exam with 5 questions.
*The final exam will be an open-book, unsupervised exam.
*Your answers MUST be submitted in PDF format.
NOTE: You do not need to put the questions in your submission.
Question 1. Word Representation
Q1-1. Using concrete examples, discuss why the TF-IDF model includes the IDF factor
instead of using TF alone. (4 marks)
Q1-2. Explain the difference between Word2Vec and ELMo with examples. (4 marks)
Question 2. Syntactic Text Analysis
Q2-1. The following diagram shows the Markov model for Part-of-Speech (POS) tagging with
transition probabilities, which were calculated from a given corpus.
Suppose that you need to evaluate the phrase ‘the happy boy’. Unfortunately, you do not
know any token’s part-of-speech tag, except the given ‘the (DT)’. Given the tag ‘the (DT)’,
specify all possible tag sequences and explain which tag sequence is most likely. (4 marks)
Q2-2. Most graph-based dependency parsing approaches have high accuracy. However,
transition-based approaches are more popular. Give one disadvantage of graph-based
approaches compared with transition-based approaches, with concrete examples. (4 marks)
Question 3. Attention and Transformer
Q3-1. Explain the difference between Global and Local Attention with examples. (4 marks)
Q3-2. Explain the motivation for the positional encoding used in a Transformer, with
examples. (4 marks)
Question 4. Recurrent Neural Networks
Q4-1. Explain the intuition behind using Sigmoid and Tanh in an LSTM with examples. (4 marks)
Q4-2. You implemented a chatbot with an RNN model using an encoder-decoder architecture.
During training, you use a technique called teacher forcing. Describe the aim of the teacher
forcing technique and how it is used when training an RNN-based chatbot model. (4 marks)
Question 5. Essay Question
Q5-1. Your client, Google, asks you to develop a question answering system capable of
providing answers to both very specific and more general questions from its users. Google
keeps various logs for users and wants to automatically answer user-specific questions (for
which the answer can be generated from the logs) such as:
● “Which Google application is the most often used?”
● “Which Keyword is the most often searched?”
● “How many unread emails in my mailbox?”
Additionally, the company wants to automatically provide answers to more general questions
(by searching the web), such as the following:
● “How to make money?”
● “What is the difference between Google Colab and Google Cloud?”
● “How to make pancakes?”
Unlike the specific questions, the general questions typically do not have a single piece of
information as an answer, but rather require a more descriptive answer (at least a sentence or
a paragraph). The answer provided to the user for general questions should be merged
from several top web-search results. The answer should be coherent, should not contain
redundant information and should be at most ten sentences long.
Elaborate on how you would solve the task. (18 marks)