Date: 2023-03-29
CS5012 Mark-Jan Nederhof Practical 2
Practical 2: Grammar engineering
This practical is worth 50% of the coursework credit for this module. Its due date
is Tuesday 4th of April 2023, at 21:00. Note that MMS is the definitive source for
deadlines and weights.
The usual penalties for lateness apply, namely Scheme B: 1 mark per 8-hour period
or part thereof.
The purpose of this assignment is to make you familiar with specifications of syntax.
You will be engineering a grammar for a small subset of English. For your convenience,
this task is broken up into steps. At first, the grammar will be a pure context-free
grammar and later this will be refined to become a feature (or ‘unification’) grammar.
Only the final grammar needs to be submitted.
As in the first practical, we will be using NLTK with Python 3.
Step 0: getting started
Investigate the provided files:
• parse.py
• grammar.fcfg
• positives.txt
• negatives.txt
The first is a small Python program that compiles a parser from a feature grammar,
and applies it to two files, one with positive examples and one with negative examples.
Have a close look at how ARG is used for arguments of verbs. Keep this treatment of
arguments in what follows, so we may later implement subcategorisation in a general
and elegant way, with arguments that can be NPs or other categories depending on
the verb. Here we assume that “invents” requires exactly one argument, which is an
NP; hence the two negative examples, with zero and two NP arguments, respectively.
Run /usr/local/python/bin/python3 parse.py (or just python3 parse.py or
python parse.py if working on your own machine) and see what happens.
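The provided parse.py is not reproduced here, but the core of a driver of this kind can be sketched with NLTK's FeatureGrammar.fromstring and FeatureChartParser. This is a minimal sketch under assumptions: the helper names parser_from_file and count_parses are hypothetical, and TOY_GRAMMAR is an invented stand-in for grammar.fcfg (the real file may organise ARG differently). It mirrors the “invents” example above: exactly one NP argument is accepted, while zero or two are rejected.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Hypothetical stand-in for grammar.fcfg: "invents" takes exactly one NP argument.
# The left-hand side of the first rule (S) becomes the start symbol.
TOY_GRAMMAR = """
S -> NP VP
VP -> V ARG
ARG -> NP
NP -> 'Wallace' | 'contraptions'
V -> 'invents'
"""


def parser_from_file(path):
    """Read a feature grammar file (e.g. grammar.fcfg) and compile a chart parser."""
    with open(path, encoding="utf-8") as f:
        return FeatureChartParser(FeatureGrammar.fromstring(f.read()))


def count_parses(parser, sentence):
    """Return how many parse trees the parser finds for a whitespace-separated sentence."""
    return len(list(parser.parse(sentence.split())))


parser = FeatureChartParser(FeatureGrammar.fromstring(TOY_GRAMMAR))
for sentence in ["Wallace invents contraptions",                # one NP argument
                 "Wallace invents",                             # zero arguments
                 "Wallace invents contraptions contraptions"]:  # two arguments
    print(count_parses(parser, sentence), sentence)
```

This should report one parse for the first sentence and none for the other two.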
Step 1: lexicon and context-free grammar
Consider the following positive examples:
Gromit barks
Gromit barked
Wallace and Gromit eat cheese
Wallace and Gromit ate cheese
Wallace feeds Gromit
Wallace seldom feeds Gromit cheese
Wallace thinks Gromit barks and eats cheese
Wallace often eats cheese in the kitchen after dinner
Wallace puts the contraptions in the kitchen
when Gromit barks Wallace feeds Gromit
when does Wallace eat cheese
when do Wallace and Gromit invent contraptions
Wallace eats cheese in the kitchen and invents contraptions after dinner
Note that all punctuation has been removed in order to avoid complications, and we
do not enforce capitalisation at the beginning of sentences.
Extend grammar.fcfg with more rules so that all words from the above sentences
are included. You will need to introduce more parts of speech such as Det (e.g. “the”),
Prep (e.g. “in”, “after”), and a few more. Note that “when” occurs with two different
functions, so needs to be associated with two different parts of speech.
At this point, you may not want to distinguish between singular and plural noun
phrases, nor between different verb forms, nor between verbs with different subcategorisation.
Also add more context-free rules, so that the above sentences can be derived. Make
sure the rules defining the start symbol (S) come first. (NLTK by default assumes that
the first mentioned nonterminal is the start symbol.)
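As a rough illustration of Step 1, the fragment below is a pure context-free grammar (no features yet) that covers four of the example sentences. It is a hypothetical sketch, not a reference solution: the category names Sub (subordinating “when”) and WhAdv (interrogative “when”), and the use of ARG for both NP and PP arguments, are assumptions, and the lecture notes may suggest other names. The rules for S come first, so NLTK takes S as the start symbol; an .fcfg file may contain such feature-less rules.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Hypothetical Step 1 fragment: plain context-free rules, with the S rules first.
STEP1_GRAMMAR = """
S -> NP VP
S -> Sub S S
S -> WhAdv Aux NP VP
VP -> V | V ARG | V ARG ARG
ARG -> NP | PP
NP -> PropN | N | Det N
PP -> Prep NP
PropN -> 'Wallace' | 'Gromit'
N -> 'cheese' | 'kitchen' | 'contraptions'
Det -> 'the'
Prep -> 'in'
V -> 'barks' | 'feeds' | 'puts' | 'eat'
Aux -> 'does'
Sub -> 'when'
WhAdv -> 'when'
"""

parser = FeatureChartParser(FeatureGrammar.fromstring(STEP1_GRAMMAR))


def accepts(sentence):
    """True if the grammar assigns at least one parse to the sentence."""
    return any(True for _ in parser.parse(sentence.split()))


for sentence in ["Gromit barks",
                 "Wallace puts the contraptions in the kitchen",
                 "when Gromit barks Wallace feeds Gromit",  # "when" as Sub
                 "when does Wallace eat cheese",            # "when" as WhAdv
                 "when does Gromit"]:                       # a negative example
    print(accepts(sentence), sentence)
```

Because nothing yet restricts which verbs take which arguments, this fragment also accepts strings such as “Gromit barks Wallace”; Step 3 is where that gets fixed.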
When designing your grammar, beware of the distinction between argument and
adjunct. In the above example sentences, the PP “in the kitchen” is arguably an
argument in one instance, and an adjunct in the remaining instances. (Which ones and
why?)
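One common way to keep the distinction visible in a CFG is to introduce arguments together with the verb (here through ARG) and to attach adjunct PPs with a recursive rule VP -> VP PP. The sketch below uses a hypothetical fragment; because the rule is recursive, it admits any number of adjunct PPs, which is also relevant to the hint below about three or more adjuncts.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Hypothetical fragment: NP arguments via ARG, adjunct PPs via recursive VP -> VP PP.
ADJUNCT_GRAMMAR = """
S -> NP VP
VP -> V | V ARG | VP PP
ARG -> NP
NP -> PropN | N | Det N
PP -> Prep NP
PropN -> 'Wallace' | 'Gromit'
N -> 'cheese' | 'kitchen' | 'dinner'
Det -> 'the'
Prep -> 'in' | 'after'
V -> 'barks' | 'eats'
"""

parser = FeatureChartParser(FeatureGrammar.fromstring(ADJUNCT_GRAMMAR))


def count_parses(sentence):
    """Number of parse trees found for a whitespace-separated sentence."""
    return len(list(parser.parse(sentence.split())))


# Two and three adjunct PPs: each adds one more VP -> VP PP layer. Since this
# fragment has no NP -> NP PP rule, each sentence gets exactly one parse.
print(count_parses("Wallace eats cheese in the kitchen after dinner"))
print(count_parses("Gromit barks in the kitchen after dinner in the kitchen"))
```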
Step 2: intermediate testing of the grammar
Add the above positive examples to positives.txt and add further negative examples
to negatives.txt such as:
when Wallace invents
when does Gromit
contraptions puts the cheese in the kitchen
Wallace thinks the kitchen
(Some of the above might be grammatical in special contexts, e.g. assuming ‘ellipsis’,
i.e. omitted phrases that are understood from the larger context of a dialogue.
In this practical, we don’t consider ellipsis, nor do we consider uncommon usage of
words that would require a stretch of the imagination to justify.) Once more run
/usr/local/python/bin/python3 parse.py.
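As the lists grow, a small pass/fail summary can be easier to read than the raw output. The sketch below is plain Python and assumes nothing about parse.py: read_examples and evaluate are hypothetical helpers, and accepts is any function that returns True when a sentence has at least one parse (with an NLTK parser, for instance, lambda s: any(True for _ in parser.parse(s.split()))). Note that NLTK's chart parsers raise a ValueError, rather than returning no parse, when a word is missing from the lexicon, which is one more reason to keep negative examples within the lexicon.

```python
def read_examples(text):
    """Return the non-blank lines of a positives/negatives file, stripped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def evaluate(accepts, positives, negatives):
    """Return (positives wrongly rejected, negatives wrongly accepted)."""
    missed = [s for s in positives if not accepts(s)]
    overgenerated = [s for s in negatives if accepts(s)]
    return missed, overgenerated


# Toy stand-in for a parser-based check (hypothetical): accept a fixed set.
def toy_accepts(sentence):
    return sentence in {"Gromit barks", "when Wallace invents"}


positives = read_examples("Gromit barks\nGromit barked\n")
negatives = read_examples("when Wallace invents\n\nwhen does Gromit\n")
missed, overgenerated = evaluate(toy_accepts, positives, negatives)
print("rejected positives:", missed)         # ['Gromit barked']
print("accepted negatives:", overgenerated)  # ['when Wallace invents']
```

In practice the two texts would be the contents of positives.txt and negatives.txt, and the goal of Step 4 is for both lists to come back empty.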
Step 3: feature grammar
The grammar that you wrote in Step 1 will likely accept some of the negative examples.
This is because the following have not been modelled:
• number agreement,
• subcategorisation.
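To see the problem concretely, the hypothetical feature-free fragment below (in the spirit of Step 1, not a reference grammar) accepts two of the negative examples from Step 2: one violates number agreement, the other subcategorisation.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Hypothetical CFG with no agreement or subcategorisation information.
CFG_ONLY = """
S -> NP VP
VP -> V | V ARG | V ARG ARG
ARG -> NP | PP | S
NP -> PropN | N | Det N
PP -> Prep NP
PropN -> 'Wallace' | 'Gromit'
N -> 'cheese' | 'kitchen' | 'contraptions'
Det -> 'the'
Prep -> 'in'
V -> 'puts' | 'thinks'
"""

parser = FeatureChartParser(FeatureGrammar.fromstring(CFG_ONLY))


def accepts(sentence):
    """True if at least one parse exists."""
    return any(True for _ in parser.parse(sentence.split()))


# Plural subject with a singular verb form: nothing checks number.
print(accepts("contraptions puts the cheese in the kitchen"))
# "thinks" with an NP instead of a clause: nothing checks subcategorisation.
print(accepts("Wallace thinks the kitchen"))
```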
For number agreement, remember that English has five verb forms for ordinary verbs
(and a few more for to be):
• base form: to write, you write
• third person singular present : he writes
• preterite (a.k.a. simple past): wrote
• past participle: written
• present participle (a.k.a. gerund if used as noun): writing
For our simple examples, we only need the first form (used for third person plural
present, and infinitive), the second (third person singular present), and the third
(preterite). Features can be added to the grammar to ensure that only the correct
verb forms are allowed, and that there is number agreement for those verb forms where
it is relevant.
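One possible encoding, sketched below under assumptions, uses a NUM feature (sg/pl) and a verb-form feature VFORM with the hypothetical values fin (third person singular present, plural present, or preterite) and base (for instance after “does”). Preterite forms leave NUM unspecified, so they unify with either number. Just as the subcategorisation example below ignores agreement, this fragment deliberately ignores subcategorisation. The feature names and values are illustrative choices, not prescribed by the practical.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Hypothetical agreement fragment (ignores subcategorisation).
AGREEMENT_GRAMMAR = """
S -> NP[NUM=?n] VP[NUM=?n, VFORM=fin]
S -> WhAdv Aux[NUM=?n] NP[NUM=?n] VP[VFORM=base]
VP[NUM=?n, VFORM=?f] -> V[NUM=?n, VFORM=?f]
VP[NUM=?n, VFORM=?f] -> V[NUM=?n, VFORM=?f] ARG
ARG -> NP
NP[NUM=?n] -> PropN[NUM=?n]
NP[NUM=?n] -> N[NUM=?n]
# coordinated NPs such as "Wallace and Gromit" are plural
NP[NUM=pl] -> NP Conj NP
PropN[NUM=sg] -> 'Wallace' | 'Gromit'
N[NUM=sg] -> 'cheese'
N[NUM=pl] -> 'contraptions'
Conj -> 'and'
WhAdv -> 'when'
Aux[NUM=sg] -> 'does'
Aux[NUM=pl] -> 'do'
# third person singular present
V[VFORM=fin, NUM=sg] -> 'barks' | 'eats'
# base form used as plural present
V[VFORM=fin, NUM=pl] -> 'bark' | 'eat'
# preterite: no NUM, so it agrees with any subject
V[VFORM=fin] -> 'barked' | 'ate'
# base form after "do"/"does"
V[VFORM=base] -> 'bark' | 'eat'
"""

parser = FeatureChartParser(FeatureGrammar.fromstring(AGREEMENT_GRAMMAR))


def accepts(sentence):
    """True if at least one parse exists."""
    return any(True for _ in parser.parse(sentence.split()))


for sentence in ["Gromit barks", "Gromit barked", "Wallace and Gromit eat cheese",
                 "Wallace and Gromit ate cheese", "when does Wallace eat cheese",
                 "Gromit bark", "Wallace and Gromit eats cheese",
                 "when does Wallace eats cheese", "when do Wallace eat cheese"]:
    print(accepts(sentence), sentence)
```

The first five sentences should be accepted and the last four rejected.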
Subcategorisation should be implemented as illustrated by the following example
(which ignores the issue of number agreement):
S -> NP VP[SUBCAT=nil]
VP[SUBCAT=?rest] -> VP[SUBCAT=[HEAD=?arg, TAIL=?rest]] ARG[CAT=?arg]
VP[SUBCAT=?args] -> V[SUBCAT=?args]
ARG[CAT=np] -> NP
ARG[CAT=pp] -> PP
V[SUBCAT=nil] -> 'sneezed'
V[SUBCAT=[HEAD=np, TAIL=[HEAD=pp, TAIL=nil]]] -> 'gave'

  S
    NP ... he
    VP[SUBCAT=nil]
      VP[SUBCAT=[HEAD=pp,TAIL=nil]]
        VP[SUBCAT=[HEAD=np,TAIL=[HEAD=pp,TAIL=nil]]]
          V[SUBCAT=[HEAD=np,TAIL=[HEAD=pp,TAIL=nil]]]
            gave
        ARG[CAT=np]
          NP ... the bike
      ARG[CAT=pp]
        PP ... to his brother

Figure 1: Graphical representation of the parse of “he gave the bike to his brother”
(subtrees shown as “...” are not expanded). Note the two applications of
VP[SUBCAT=?rest] -> VP[SUBCAT=[HEAD=?arg, TAIL=?rest]] ARG[CAT=?arg]. Also note the
topmost VP has [SUBCAT=nil], which is needed to apply the rule with left-hand side S.
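To make the bookkeeping of Figure 1 explicit, the sketch below mimics, in plain Python, what repeated applications of the recursive VP rule do to the SUBCAT list. It does not use NLTK's unification at all: plain dicts and the string "nil" stand in for feature structures, and subcat and consume are hypothetical helper names. The returned chain lists the SUBCAT values of the VP nodes in Figure 1 from bottom to top.

```python
NIL = "nil"


def subcat(*cats):
    """Build [HEAD=c1, TAIL=[HEAD=c2, ..., TAIL=nil]] from categories c1, c2, ..."""
    result = NIL
    for cat in reversed(cats):
        result = {"HEAD": cat, "TAIL": result}
    return result


def consume(verb_subcat, arg_cats):
    """Mimic VP[SUBCAT=?args] -> V[SUBCAT=?args] followed by repeated use of
    VP[SUBCAT=?rest] -> VP[SUBCAT=[HEAD=?arg, TAIL=?rest]] ARG[CAT=?arg].

    Returns the SUBCAT values of the successive VPs (bottom to top), or None
    if an argument does not match the category at the head of the list.
    """
    chain = [verb_subcat]
    current = verb_subcat
    for cat in arg_cats:
        if current == NIL or current["HEAD"] != cat:
            return None  # no argument expected here, or the wrong category
        current = current["TAIL"]  # the mother VP keeps the rest of the list
        chain.append(current)
    return chain


GAVE = subcat("np", "pp")
print(consume(GAVE, ["np", "pp"])[-1])  # nil: S -> NP VP[SUBCAT=nil] can apply
print(consume(GAVE, ["np"])[-1])        # {'HEAD': 'pp', 'TAIL': 'nil'}: a PP is still missing
print(consume(GAVE, ["pp", "np"]))      # None: arguments in the wrong order
print(consume(subcat(), []))            # ['nil']: an intransitive verb such as "sneezed"
```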
Figure 1 illustrates how the rules for subcategorisation are applied, for a subcategorisation
of “gave” with two arguments, namely an NP and a PP. In order to handle the verbs
in our example sentences, further rules for V and ARG are needed, but it should be possible
to reuse the rule VP[SUBCAT=?rest] -> VP[SUBCAT=[HEAD=?arg, TAIL=?rest]] ARG[CAT=?arg]
for several verbs, regardless of their subcategorisation.
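The subcategorisation rules from Step 3 can be run directly in NLTK once NP and PP are defined. In the sketch below, the NP, PP, Det, N and P rules, the extra ARG[CAT=s] rule and the entry for “thinks” are hypothetical additions; the point is that the single recursive VP rule serves “sneezed”, “gave” and “thinks” alike, so each new subcategorisation frame only needs new V (and possibly ARG) entries.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

SUBCAT_GRAMMAR = """
# rules from the handout
S -> NP VP[SUBCAT=nil]
VP[SUBCAT=?rest] -> VP[SUBCAT=[HEAD=?arg, TAIL=?rest]] ARG[CAT=?arg]
VP[SUBCAT=?args] -> V[SUBCAT=?args]
ARG[CAT=np] -> NP
ARG[CAT=pp] -> PP
V[SUBCAT=nil] -> 'sneezed'
V[SUBCAT=[HEAD=np, TAIL=[HEAD=pp, TAIL=nil]]] -> 'gave'
# hypothetical additions: clausal arguments and a minimal NP/PP grammar
ARG[CAT=s] -> S
V[SUBCAT=[HEAD=s, TAIL=nil]] -> 'thinks'
NP -> 'he' | Det N
PP -> P NP
Det -> 'the' | 'his'
N -> 'bike' | 'brother'
P -> 'to'
"""

parser = FeatureChartParser(FeatureGrammar.fromstring(SUBCAT_GRAMMAR))


def accepts(sentence):
    """True if at least one parse exists."""
    return any(True for _ in parser.parse(sentence.split()))


for sentence in ["he gave the bike to his brother",
                 "he sneezed",
                 "he thinks he gave the bike to his brother",
                 "he gave the bike",     # PP argument missing
                 "he sneezed the bike",  # argument not allowed
                 "he thinks the bike"]:  # NP where a clause is required
    print(accepts(sentence), sentence)
```

The first three sentences should be accepted and the last three rejected, all with the same VP rules.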
Step 4: final testing
Again test the positive and negative examples, and verify that all positive examples
are accepted, and none of the negative examples are accepted. You may add further
positive and negative examples (with words in the lexicon) to convince yourself that
your grammar is satisfactory.
Requirements
Submit a zipped file containing:
• parse.py (unmodified)
• grammar.fcfg (extended by you)
• positives.txt (extended by you)
• negatives.txt (extended by you)
• a report in PDF
The report should contain:
• Description of, and motivation for, any interesting choices you have made in
engineering the grammar, for example the choice of the set of categories.
• Critical reflection on the language the grammar accepts. Does it accept any sentences
that you would consider ungrammatical with regard to standard English?
• Any other thoughts on this practical.
• Explicit mention of contributions that you consider to be extensions (see below).
A very good report can consist of fewer than 5 pages. It is strongly discouraged to write
a report longer than 10 pages. We do not expect an essay on wider issues of syntactic
analysis or English grammar, or anything of the sort.
Marking and extensions
Marking is according to the school handbook. The basic requirements above earn you
up to 17 marks if all is done well; this will need to include implementation of number
agreement, verb forms and subcategorisation as outlined above. If only a context-free
grammar is produced, without features that implement number agreement, verb forms
or subcategorisation, then no more than 12 marks can be attained. Marks higher
than 17 require extensions that contribute to demonstrating your understanding of
grammars for natural language. Possible extensions include the implementation of
new types of sentences, and their discussion in the report, such as:
Wallace likes inventing contraptions
Gromit may have barked
Gromit watches Wallace invent contraptions
what does Gromit eat
what does Wallace feed Gromit
whom does Wallace feed cheese
what does Gromit think Wallace invents
where does Gromit think Wallace puts the cheese
Finding a general and elegant solution to handle sentences of the last five types above
is quite challenging, as it requires processing the subcategorisation in a novel manner;
ideally, you would not need to add new (lexical) rules for verbs.
Hints
• Try to prevent your feature grammar from accepting sentences that are syntactically
incorrect. Do not worry, however, about accepting sentences with nonsensical
meaning; e.g. we consider “the kitchen barks” to be perfectly acceptable from
a syntactic viewpoint.
• We generally prefer small numbers of simple rules generating many different (correct)
sentences over large numbers of rules that each capture few cases. If your
grammar can handle only the given positive examples and little more, then this
is not very satisfactory. For example, the largest number of adjunct PPs we see
in any example sentence in Step 1 is two, but why should we not allow sentences
with three or more adjuncts?
• Do not make matters unnecessarily complicated by covering grammatical constructions
that are not in the listed example sentences. For example, none of the
examples here include compound nouns, so don’t introduce rules for compound
nouns.
• Ambiguity is unavoidable in natural language. It is fine if your grammar allows
several parses for a single sentence, provided these different parses correspond to
different interpretations.
• There may be more than one solution to achieve roughly the same language, but
a grammar that uses commonly accepted category names (see e.g. lecture notes)
is preferable over one that does not.
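As an illustration of acceptable ambiguity, the hypothetical fragment below gives “Wallace thinks Gromit barks in the kitchen” two parses: one where the PP modifies the VP headed by “barks” (the barking happens in the kitchen) and one where it modifies the VP headed by “thinks” (the thinking happens there). Because the two parses correspond to different interpretations, both are legitimate in the sense of the hint above.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Hypothetical fragment: clausal arguments via ARG -> S, adjunct PPs via VP -> VP PP.
AMBIGUOUS_GRAMMAR = """
S -> NP VP
VP -> V | V ARG | VP PP
ARG -> NP | S
NP -> PropN | Det N
PP -> Prep NP
PropN -> 'Wallace' | 'Gromit'
Det -> 'the'
N -> 'kitchen'
Prep -> 'in'
V -> 'thinks' | 'barks'
"""

parser = FeatureChartParser(FeatureGrammar.fromstring(AMBIGUOUS_GRAMMAR))
trees = list(parser.parse("Wallace thinks Gromit barks in the kitchen".split()))
print(len(trees))  # one parse per attachment site of "in the kitchen"
for tree in trees:
    print(tree)
```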
Pointers
• Marking
http://info.cs.st-andrews.ac.uk/student-handbook/learning-teaching/feedback.html#Mark_Descriptors
• Lateness
http://info.cs.st-andrews.ac.uk/student-handbook/learning-teaching/assessment.html#lateness-penalties
• Good Academic Practice
https://www.st-andrews.ac.uk/students/rules/academicpractice/