10/28/22, 5:13 PM COSI 114a HW 4 2022 ² DURSbR[ PaSHU
KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 3/17
DQ PQV aVVeORV VQ UWbOKV HW 4; [QW caPPQV UWbOKV KV [eV. A OeUUage YKNN be
RQUVed KP Vhe HW fQTWO VQ NeV [QW MPQY YheP UWbOKUUKQP KU ePabNed. PNeaUe dQPÁV
eOaKN VQ aUM YheP UWbOKUUKQP YKNN be ePabNed.
Once submission is enabled, follow the HW submission instructions guide to submit
[our assignment.
GFOFSBM OPUFT:
Ԏ. You can alwa[s assume that [our code will be called with arguments of the
correct t[pe b[ the tests.
Ԁ. TheTe UhQWNd be PQ RTKPV UVaVeOePVU aP[YheTe KP Vhe UQNWVKQPU VhaV [QW
UWbOKV. Ever[ function should alwa[s return (or [ield) a value when called.
ԃ. If [ou donÁt understand what a question is asking [ou to do, take a look at the
tests to see if that clears it up.
ԅ. Each step of this assignment builds on the previous. You can and should make
use of [our functions in other functions. ThKU KPcNWdeU WUKPg fWPcVKQPU [QW
KORNeOePVed KP RTeXKQWU aUUKgPOePVU.
0. OWFSWJFX
BFGPSF ZPV CFHJO
This assignment makes significant use of classes. If [ou are not comfortable with
concepts like class, object, subclass, parent class, or inheritance, be sure to solidif[
[our understanding before proceeding too far in the assignment. Relevant resources
for learning about classes include the books DKXe KPVQ P[VJQP 3 and LeaTPKPI P[VJQP,
both of which can be found under ÃResourcesÄ on LATTE. You can also read P[thonÁs
own tutorial on classes, which can be ver[ helpful, even if a bit too detailed for this
course. You are also welcome to come to student hours for help.
DBUB
The data for this assignment is a section of the Penn Treebank.
The tests take care of loading the data for [ou, but if [ou want to write [our own
tests, [ou can use to load the
training data. This function (located in ) will return a generator of
sentences which are represented as .a
10/28/22, 5:13 PM COSI 114a HW 4 2022 ² DURSbR[ PaSHU
KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 4/17
The class has alread[ been implemented for [ou, and serves as a
container that holds each tokenÁs teZt and part-of-speech tag. This wa[, [ou have a
single object with an eZplicit reference to both the teZt and POS tag of a given token,
which frees [ou from having to use a tuple and tr[ing to remember whether indices
0 and 1 correspond to the token teZt and POS tag, etc.
The following snippet illustrates how to work with the generator returned b[
, as well as with the class and its and
instance attributes:
TIF TBHHFS CMBTT
is an abstract class, which means it implements some methods common to
all taggers, but leaves other methods to be implemented b[ its subclasses. Because
all taggers will all have a uniform and method, b[ putting them
in , we can implement them just once in the base class and use them in all
the subclasses.a
10/28/22, 5:13 PM COSI 114a HW 4 2022 ² DURSbR[ PaSHU
KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 5/17
The other tagger classes, such as ain the eZample below, will KPJeTKV
from the base class. This means that the[ are subclasses of and will
have its methods available to them as well.
This sa[s that will inherit from , so it has the same
and methods which it inherited.
is a bit more compleZ. It inherits from and . You donÁt
need to worr[ about what eZactl[ it means to inherit from , but ABC stands for
abstract base class. In P[thon, inheriting from ABC is necessar[ to create an
abstract class, one which has methods that are not implemented because their child
classes are meant to implement those methods in different wa[s.a
For eZample, is defined in with the
decorator; this means [ouÁll implement different versions of it in each of the taggers
[ou create. The same goes for , which is onl[ implemented in ;
[ou will be responsible for implementing it for and
.
Similarl[, both and will inherit from
, which means the[ will use the method implemented in their
parent class. So all [ouÁll have to implement for the and
is how to tag an individual sentence¿all other functionalit[
comes Ãfor freeÄ from the parent classes and .
This ma[ seem complicated, but trust us, it saves [ou from having to write the same
functions multiple times and also will make [our code easier to read and debug
since thereÁs less redundanc[.
The most important thing is that [ou follow the instructions regarding what to
implement and alwa[s obe[ comments like and
.
10/28/22, 5:13 PM COSI 114a HW 4 2022 ² DURSbR[ PaSHU
KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 6/17
To recap, below is a visual representation of how the different tagger classes in this
assignment relate to each other. An arrow of the form denotes that the class
is the parent of class , i.e. that inherits from . Note that inheritance is transitive:
if aand both hold, then this implies that holds as well. Concretel[,
since inherits from , and inherits from
, we know that also inherits from as
well.
a
VKUWaNK\aVKQP Qf VJe cNaUU JKeTaTcJ[ fQT VJKU aUUKIPOePV.
PSPWJEFE GVODUJPOT
These are some helpful methods that weÁve provided for [ou and that [ou should
make use of in completing this assignment.
WeÁve provided the function for [ou. It will return negative infinit[ if given
0 and log(n) otherwise. Use this method instead of so [ou donÁt have to worr[
about handling log(0) which is undefined.
10/28/22, 5:13 PM COSI 114a HW 4 2022 ² DURSbR[ PaSHU
KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 7/17
WeÁve provided [ou with which takes a dictionar[ of tag strings mapped
to their scores (which can be an[ integers or floats) and will return the
tuple:
WeÁve provided another useful function that gets the most frequent ke[ from a
counter:
ATTVNQUJPOT
Throughout the entire assignment, [ou can assume the following:
Ԏ. Ever[ sentence passed as an argument contains at least one token.
Ԁ. Ever[ iterable of sentences passed as an argument contains at least one
sentence.
ԃ. For [our class, is alwa[s called before an[ of the
methods that return probabilities or the tagset.a
ԅ. For [our taggers, is never called without being called first.
1. MPTU FSFRVFOU TBH TBHHFS
First, [ouÁll implement a simple baseline tagger. This tagger will figure out the most
frequent tag in the training data and tag ever[ single token with the most common
tag. Needless to sa[, itÁs going to do ver[ poorl[ overall.
10/28/22, 5:13 PM COSI 114a HW 4 2022 ² DURSbR[ PaSHU
KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 8/17
HereÁs what the methods should do:
Count all the tags in sentences and store the most common tag in an attribute,
for eZample
For the sentence provided, use the most frequent tag to tag each token. The
sentence is a where each string is the teZt of each token. You will
return the a list of tags that is as long as the original list of tokens.a
In the eZample below, assume that the most frequent tag in the training data
was , (which is not a tag in the real data).
Note: if [ou do not follow the instructions and instead write an inefficient
method, [ou will fail grading tests. An eZample of an inefficient
implementation is one that does an[ work to compute the most frequent tag outside
of the method.
2. 6OJHSBN TBHHFS
The unigram tagger provides a much stronger baseline. Rather than assigning ever[
token the same tag, it will assign ever[ token the most frequent tag associated with
it in the training data. For eZample, if ÃattackÄ is seen as ÃNNÄ 10 times and ÃVBÄ 9
times in the training data, the unigram tagger will tag it as ÃNNÄ ever[ time,
regardless of conteZt. Note that here and ever[where else in the assignment, [our
handling of strings should be case-sensitive; ÃattackÄ and ÃAttackÄ are two
completel[ different, unrelated tokens.
HereÁs what the methods should do:
Count all the tags in the sentence and store what [ou need for the
method. You will want to store a mapping from each token to the tag to use with
10/28/22, 5:13 PM COSI 114a HW 4 2022 ² DURSbR[ PaSHU
KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 9/17
that token. You will also want to keep the most frequent tag overall across all
tokens. You do not need to (and should not) keep an[thing else.
For each token in the sentence, Vag KV YKVh Vhe Vag VhaV aRReaTU OQUV QfVeP KP
Vhe VTaKPKPg daVa fQT VhaV VQMeP. You must have determined what the most
frequent tag for ever[ token is KP adXaPce KP Vhe OeVhQd.
For tokens not seen in training, Vag VheO YKVh Vhe OQUV fTeSWePV Vag UeeP QXeT
aNN Vhe VQMePU.a
Implementation hint: this is probabl[ the onl[ time in this entire course that using
the method on a dictionar[ can be used to make things easier.
Note: if [ou do not follow the instructions and instead write an inefficient
method, [ou will fail grading tests. An eZample of an inefficient
implementation is one that does an[ work to compute the most frequent tag for
each token outside of the method.
3. SFOUFODFCPVOUFS
Ver[ similarl[ to HW 3, this class will store all the relevant counts needed to
calculate the transition, emission and initial probabilities for an HMM tagger. Since
[ou will use Lidstone smoothing for emission probabilities (but not transition
probabilities), the class will have as a parameter in the method. Your
bigram taggers will each have an instance of this class and will call the
method to train.a
All of the probabilities returned b[ this class will be standard probabilities, not log
probabilities.
You need to implement the following methods:
Iterate through all the sentences and count all relevant tokens and tags. All
computationall[-eZpensive work should be done here. Do not store the original
sentences.
10/28/22, 5:13 PM COSI 114a HW 4 2022 ² DURSbR[ PaSHU
KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 10/17
Return the Lidstone-smoothed emission probabilit[ for the tag and token specified.
It should be computed using the following formula: N+V k
Count(5t)+k
Count(5 t) is the number of times the token Y was seen with the tag V in
training.
N is the total number of times the tag was seen in training.
V is the vocabular[ for VhKU Vag, the number of unique words it was seen with in
training. NQVe VhaV VhKU KU XeT[ dKffeTePV VhaP Vhe KORNeOePVaVKQP Qf UOQQVhKPg
fQT Vhe PaKXe Ba[eU aUUKgPOePV, YheTe Vhe XQcabWNaT[ KU UhaTed acTQUU cNaUUeU.
(Footnote for nitpickers: given that the emission probabilit[ can be computed using
an unbounded vocabular[ (unlike the restricted feature set in the naive Ba[es
assignment), [ou might have noticed that this is actuall[ a degenerate probabilit[
distribution. The total probabilit[ over the potentiall[ infinite vocabular[ sums to
greater than one. One wa[ to reconcile this is to pretend ever[ word not seen with a
given tag during training is actuall[ the special \ero-count word ÃUNKÄ (unknown),
and add one to the vocabular[ to account for it. Empiricall[, doing so has no effect
on tagger performance, so we left it out of the assignment. A better approach would
be to tune the pseudo-count for UNK using held-out data, but that requires even
more effort. Regardless, just do what the assignment tells [ou, and donÁt call the
probabilit[ police on the course staff.)
Return the (unsmoothed) transition probabilit[ between two tags.
Return the (unsmoothed) probabilit[ of a tag being the initial tag in a sequence.
Return a sorted list of all the unique tags. Because this method is ke[ to the
deterministic behavior of [our tagger, we have implemented the sorting method for
[ou. You need to call with a Counter that has counted all
of the tags encountered in the input to the a method. For eZample,
10/28/22, 5:13 PM COSI 114a HW 4 2022 ² DURSbR[ PaSHU
KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 11/17
. YQW OWUV cQORWVe VhKU
XaNWe KP Vhe OeVhQd aPd TeVWTP KV KP Vhe OeVhQd. If
[ou perform this sort ever[ time is called, [ou will fail timed tests.
You need to avoid \ero division errors b[ checking if the denominator is \ero, and if it
is returning 0.0 for and . cannot
cause a abecause [ou can assume [ou saw at least one non-
empt[ sequence during training. You should ensure that no matter what strings are
provided for the tags or tokens (even if the[ were never observed in the training
data), these three functions will never crash due to a or a
due to a failed dictionar[ lookup.
All of the methods as well as
OWUV be KORNeOePVed KP cQPUVaPV (O(1)) VKOeÖ This means that the[
should not contain aP[ loops or calls to the function or method. As in
past assignments, [ou should compute whatever [ou need inside the call to
so that [ou onl[ have to compute it once. Note that if [ou do not
implement these methods in constant time, [ou will fail the
test, which is a timed test, and ma[ fail additional
private tests.
4. BJHSBN TBHHFST`
You will implement two bigram HMM taggers: and
.
a
For all calculations in [our bigram taggers, [ou must work in the log space for
numerical stabilit[, b[ adding log probabilities rather than multipl[ing probabilities.
Use awhich was provided to [ou as it handles for [ou (b[ returning
negative infinit[) and uses the correct base for the log (base e).a
The OeVhQd
Since both and inherit from
, the method has alread[ been implemented for [ou in the
class. You should not modif[ that implementation in an[ wa[ nor
10/28/22, 5:13 PM COSI 114a HW 4 2022 ² DURSbR[ PaSHU
KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 12/17
implement [our own functions in either of or
. Simpl[ leave the alread[ implemented as-is and
use it in the subclasses.a
When the tests create / objects, the[
pass in as an argument a value :
The method will then take this value and pass it to the
that the tagger uses internall[. Note that [our code must work
when a \ero value has been provided (unlike the naive Ba[es homework).
TTaKPKPg, cQWPVU aPd RTQbabKNKVKeU
As mentioned above, the method of has alread[ been
implemented for [ou, and simpl[ consists of calling which
collects the counts needed for calculating probabilities. Do not change this, and do
not call or directl[ in an[ of [our own code.a
Instead, in [our implementations of for both bigram taggers, [ou
should use the probabilit[ methods that [ou implemented in . You
can access these probabilit[ methods like so:
DeVeTOKPKUO TeUVU
When decoding a tag sequence using greed[ or Viterbi decoding, [ou should make
sure [our solution is deVeTOKPKUVKc in the event of a tie between scores.a
DeVeTOKPKUVKc means that [ou will get the same results if [ou run multiple times. aBut
more than just being deterministic, [our code should predictabl[ resolve ties such
10/28/22, 5:13 PM COSI 114a HW 4 2022 ² DURSbR[ PaSHU
KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 13/17
that if multiple tags have the same score, the one that is sorted first b[
will be chosen.
Doing this is actuall[ eZtremel[ simple:a
Whenever [ou iterate over states, ensure the[ are in the sorted order specified b[
Ás method.a
Make sure that when [ou insert scores into a dict or list, [ou do so in the order of
the sorted labels. If [ou use , it will pick the first element with the top
score, so if [ou have inserted into the dictionar[ in the right order, ever[thing will
work out well.a
If [ou implement [our solution well, all this is \ero eZtra work.
YQW YKNN Peed VQ VeUV YheVheT [QWT cQde behaXeU deVeTOKPKUVKcaNN[; the public tests
will not do it for [ou.
SFRVFODF PSPCBCJMJUZ
In this section, [our task is to implement in
which serves as the parent class of both and
:
The method should take a sentence (a sequence of token
strings) and its tags (a sequence of tag strings) as arguments and compute the log
probabilit[ of the entire sequence of the tags given the tokens. You can assume that
and are the same length and that length is greater than or equal to
one.
Make sure to use , which will ensure that if the probabilit[ is \ero, negative
infinit[ ( ) is returned. (Technicall[, the log of a \ero probabilit[ is
undefined, but negative infinit[ is much easier to work with.)
Below is an eZample for finding the sequence probabilit[ of the sentence Ãfoo barÄ
where ÃfooÄ has tag ÃAÄ and ÃbarÄ has tag ÃBÄ:
10/28/22, 5:13 PM COSI 114a HW 4 2022 ² DURSbR[ PaSHU
KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 14/17
Since is an abstract class, this function can onl[ be called from an
instance of or . If [ou want to write [our
own tests for this method, instantiate one of those classes. You will get an error if
[ou tr[ to create an instance of , as it is an abstract class and canÁt be
instantiated.
To compute the sequence probabilit[, use the equations discussed in class (HMM
lecture). You should be able to get all values needed from . You can
assume that the method has been called before .
GSFFEZ BJHSBN TBHHFS
This tagger is greed[ in the sense that it makes a decision for which tag to output at
the current point without considering the whole tag sequence the wa[ Viterbi will.
For this tagger, [ou just need to implement its method.a
For each token, choose the tag with the maZ probabilit[ based on the transition
probabilit[ from the previous tokenÁs best tag and emission probabilit[. Like other
methods, [ou should return the tag sequence as a list of string tags.
For eZample, [ou will tag the first token in the sequence based on the initial and
emission probabilit[ for the token for each class. Once [ou have selected the first
tag, [ou will eZamine the neZt token and select its tag based on the probabilit[ of
transitioning from the previous tag and emitting the current token. You will repeat
until [ou have tagged the whole sentence.
Note: if [ou do not follow the instructions and instead write an inefficient
method, [ou will fail grading tests.
7JUFSCJ BJHSBN TBHHFS
For this tagger [ou also just need to implement , which will be the
Viterbi algorithm:
10/28/22, 5:13 PM COSI 114a HW 4 2022 ² DURSbR[ PaSHU
KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 15/17
You ma[ want to implement helper methods for [our Viterbi algorithm. Specificall[,
[ouÁll want to setup a wa[ of tracking the scores for each state of each token and
also a method of tracking back pointers.a
First [ouÁll start with the initial probabilities for the first token of the sentence. Then
for each of the rest of the tokens in the sentence, [ouÁll fill in the table with the
probabilities while keeping track of backpointers at each step. At the end [ouÁll need
to follow [our back pointers in order to return the best path.a
For more on Viterbi decoding, the relevant lecture slides, teZtbook reading, and the
video posted on LATTE.
HKPVU:
Ԏ. Viterbi decoding is ver[ difficult to debug. ItÁs much easier to spend a lot of time
working it out on paper and then writing correct code, as opposed to not working
it out and writing incorrect code and tr[ing to figure out whatÁs wrong with it.
Ԁ. The algorithm described in the teZtbook (and elsewhere) for Viterbi decoding
uses arra[s for all data structures. This requires turning tags into indices, which
makes debugging harder. Instead, we recommend that [ou focus on storing
scores in dictionaries that go from tags to their scores, keeping a list of
dictionaries (one per sequence position).
ԃ. HereÁs how the HW solution's Viterbi decoding works. It is recommended that
[ou follow this structure:
a. It keeps two main data structures. One is the backpointers, which is a
, which keeps the backpointer for the best score at
each state for each position in the sequence. The other is the best scores,
which is a which tracks the best score for each tag
at each position.
b. It initiali\es the best scores for position 0 using the initial probabilities and
emission probabilities. (The backpointers for position 0 are not used, since
there is no previous position.)
c. Then, for each position in the sequence, it loops over each potential tag
assignment. For each tag assignment, it creates a dictionar[ of the scores for
the score at the position based on the previous tags. It finds the maZimum
score (using ), stores the score in best scores, and the corresponding
previous state in the backpointers.
10/28/22, 5:13 PM COSI 114a HW 4 2022 ² DURSbR[ PaSHU
KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 16/17
d. At the end of the sequence, it uses to select the best-scoring final
state, and then works backwards through the backpointers to get the
sequence tags. (For convenience, it actuall[ builds a backwards list of tags,
and then reverses it.)
ԅ. If [ou arenÁt sure whether [our Viterbi implementation is computing the scores
correctl[, check them using [our function.
Note: if [ou do not follow the instructions and instead write an inefficient
method, [ou will fail grading tests.
FAQ
CaP I YTKVe O[ QYP cWUVQO fKNeU?
Your final submission must be contained in a UKPgNe fKNe with the filename .a
That being said, [ou caP write [our own files for debugging purposes if [ou
wish. This is better than putting random testing code in [our main file, which [ou
then have to remember to remove. However, donÁt ever attempt to write parts of the
solution in an[thing but [our main file.
CaP I KORQTV NKbTaT[ X?
The following imports (and nothing else) are allowed: a
You should import from (we have alread[ done so in the stub file we
provide). We provide a wrapper function called , so [ou should not call
directl[.
You should import containers from the astandard librar[ module
(e.g., a , ).
You ma[ import whatever [ou need from the module to support t[pe
annotations.
You ma[ import an[thing from , , and .
YQW caPPQV KORQTV . From vast eZperience, we can tell [ou that most
students who tr[ to use nump[ onl[ end up making their code slower, harder to
read, and harder to debug than the approach we recommend (which is based
around dictionaries). For non-vectori\ed operations, nump[ arra[ access is often
slower than P[thon lists, and ver[ little of HMMs can be vectori\ed easil[.
10/28/22, 5:13 PM COSI 114a HW 4 2022 ² DURSbR[ PaSHU
KWWSV://SaSHU.GURSbR[.FRP/GRF/COSI-114a-HW-4-2022-X1FLTLPMOLF6IYNK6PEPM 17/17
The code we provide [ou imports from for abstract base classes.
CaP I YTKVe O[ QYP fWPcVKQPU aPd/QT OeVhQdU?
YesÖ This is in fact highl[ encouraged to keep [our solution organi\ed. In general,
[ou should consider writing functions that bundle together frequentl[-used logic so
[ou can avoid repeating it multiple times in [our solution.
For eZample, if [ou need to iterate over bigrams, [ou should consider writing a
function like and then calling that function elsewhere in [our code as
needed.a
Note how this also makes debugging easier: if something is wrong with [our
implementation, [ou now have to onl[ fiZ a single function, , instead of
having to fiZ ever[ section of the code where [ouÁve repeated the same n-gram
generation logic.
Importantl[, even though [ou are encouraged to write [our own helper
functions/methods, [QW caPPQV OQdKf[ cQde Ye haXe aNTead[ KORNeOePVed fQT [QW,
as this ma[ break the logic used in testing [our code.
OCMJHBUPSZ DMPTJOH NFNF
a
TTaXeNNKPI SaNeUOaP PTQbNeO