DECEMBER 2020 72 HOUR ASSESSMENT
SCHOOL OF COMPUTER SCIENCE
ANSWER SHEET
MODULE CODE: CS5010
MODULE TITLE: AI Principles
This assessment consists of exam-style questions and you should answer as you would
in an exam. As such, citations of sources are not expected, but your answers should be
from your own memory and understanding and significant stretches of text should not be
taken verbatim from sources. Any illustrations or diagrams you include should be original
(hand or computer drawn). You may word-process your answers, or hand-write and scan
them. In either case, please return your answers as a single PDF. If you handwrite, please
make sure the pages are legible, the right way up and in the right order. Your submission
should be your own unaided work. While you are encouraged to work with your peers to
understand the content of the course while revising, once you have seen the questions you
should avoid any further discussion until you have submitted your results. You must submit
your completed assessment on MMS within 72 hours of it being sent to you. Assuming
you have revised the module contents beforehand, answering the questions should take no
more than three hours.
1. Let E be the restaurant domain data given in the table below, in which input attributes
influence decisions on whether or not to wait for a table.
EXAMPLE   PRICE       ESTIMATED WAIT   HUNGRY?   BAR?   WAIT?
D1        Expensive   30-60            No        Yes    No
D2        Expensive   30-60            No        No     No
D3        Medium      30-60            No        Yes    Yes
D4        Cheap       10-30            No        Yes    Yes
D5        Cheap       0-10             Yes       Yes    Yes
D6        Cheap       0-10             Yes       No     No
D7        Medium      0-10             Yes       No     Yes
D8        Expensive   10-30            No        Yes    No
D9        Expensive   0-10             Yes       Yes    Yes
D10       Cheap       10-30            Yes       Yes    Yes
D11       Expensive   10-30            Yes       No     Yes
D12       Medium      10-30            No        No     Yes
D13       Medium      30-60            Yes       Yes    Yes
D14       Cheap       10-30            No        No     No
(a) In a decision tree representation:
(i) Each internal node tests an attribute.
(ii) Each branch corresponds to an attribute value.
(iii) Each leaf node assigns a classification, i.e. a decision about waiting.
A new example is classified by passing it down the tree until it reaches a leaf, whose label gives the decision. [4 marks]
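For concreteness, here is a minimal Python sketch of this representation (not part of the original answer; the tree shown is one hand-built tree over the table's attributes, with PRICE at the root as part (e) recommends, and it happens to classify all 14 rows of the table correctly):

```python
# Internal nodes are (attribute, branches) pairs; branches map attribute
# values to subtrees; leaves are plain "Yes"/"No" decisions.
tree = ("PRICE", {
    "Expensive": ("HUNGRY?", {"Yes": "Yes", "No": "No"}),
    "Medium": "Yes",
    "Cheap": ("BAR?", {"Yes": "Yes", "No": "No"}),
})

def classify(node, example):
    """Pass an example down the tree until a leaf is reached."""
    while isinstance(node, tuple):           # still at an internal node
        attribute, branches = node
        node = branches[example[attribute]]  # follow the matching branch
    return node                              # leaf: the decision

print(classify(tree, {"PRICE": "Cheap", "BAR?": "Yes"}))  # -> Yes (cf. D4)
```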
(b) The curve plots the percentage of correct decisions on test data as the number of training instances increases. A curve rising to near 100% shows realisable performance (the hypothesis space can express the ground truth); a curve with slow but monotone growth shows redundant performance (many redundant or irrelevant attributes slow the learning); a curve with a fast rise followed by a plateau well below 100% shows nonrealisable performance (due to either missing attributes or a restricted hypothesis class). [4 marks]
(c) This training set was not used in the course notes, but similar calculations were done for similar data. We have 9 positive and 5 negative examples overall, so

H(E) = −(9/14) log₂(9/14) − (5/14) log₂(5/14)
     = −(9/14) · log₁₀(9/14)/log₁₀(2) − (5/14) · log₁₀(5/14)/log₁₀(2)
     ≈ −(9/14) · (−0.192/0.301) − (5/14) · (−0.447/0.301)
     = 0.410 + 0.530
     = 0.940

Hence H(E) ≈ 0.940 bits. [2 marks]
(d) For PRICE we have 2 positives and 3 negatives on the Expensive branch; 4 positives and 0 negatives on the Medium branch; 3 positives and 2 negatives on the Cheap branch. These have entropies 0.971, 0 and 0.971 respectively, so the information gain is

0.940 − (5/14) · 0.971 − (4/14) · 0 − (5/14) · 0.971 = 0.246.

For ESTIMATED WAIT we have 2 positives and 2 negatives on the 30-60 branch; 4 positives and 2 negatives on the 10-30 branch; 3 positives and 1 negative on the 0-10 branch. These have entropies 1, 0.918 and 0.811 respectively, so the information gain is

0.940 − (4/14) · 1.0 − (6/14) · 0.918 − (4/14) · 0.811 = 0.029.

For HUNGRY? we have 3 positives and 4 negatives on the No branch; 6 positives and 1 negative on the Yes branch. These have entropies 0.985 and 0.592 respectively, so the information gain is

0.940 − (7/14) · 0.985 − (7/14) · 0.592 = 0.151.

For BAR? we have 6 positives and 2 negatives on the Yes branch; 3 positives and 3 negatives on the No branch. These have entropies 0.811 and 1 respectively, so the information gain is

0.940 − (8/14) · 0.811 − (6/14) · 1.0 = 0.048.

[4 marks]
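The arithmetic above can be checked with a short Python sketch (not part of the original answer). Note that working with exact entropies, rather than the 3-decimal-place intermediates used above, shifts the PRICE and HUNGRY? gains by one in the last digit:

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a node with pos positive and neg negative examples."""
    total = pos + neg
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c)

def gain(branches, pos=9, neg=5):
    """Information gain of splitting the 9+/5- root into (pos, neg) branches."""
    total = pos + neg
    remainder = sum((p + n) / total * entropy(p, n) for p, n in branches)
    return entropy(pos, neg) - remainder

print(round(gain([(2, 3), (4, 0), (3, 2)]), 3))  # PRICE          -> 0.247
print(round(gain([(2, 2), (4, 2), (3, 1)]), 3))  # ESTIMATED WAIT -> 0.029
print(round(gain([(3, 4), (6, 1)]), 3))          # HUNGRY?        -> 0.152
print(round(gain([(6, 2), (3, 3)]), 3))          # BAR?           -> 0.048
```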
(e) PRICE has the greatest information gain, hence gives the greatest reduction in entropy (i.e. leads to child nodes with greater purity), and hence should be preferred as the attribute to split on. [2 marks]
(f) We would have a tree with 14 branches, each leading to a decision. This would describe the current data perfectly, but would not generalise to unseen examples (i.e. the model would overfit the data). [2 marks]
(g) The DT learning agent finds a simple hypothesis that is approximately consistent with the training examples: there is no guarantee of perfect accuracy. In other terms from the lectures, h ≈ f for a perfectly rational agent, where h is the hypothesis used and f is the ground truth; equality is never realised in practice. [2 marks]
[Total marks 20]
2. (a) FOL sentence [2 marks]:
∀x ∀y ((W(x) ∧ C(y) ∧ M(x, y)) =⇒ L(x, y))
Replace the implication [1]:
¬(W(x) ∧ C(y) ∧ M(x, y)) ∨ L(x, y)
Distribute the negation [1]:
¬W(x) ∨ ¬C(y) ∨ ¬M(x, y) ∨ L(x, y)
[4 marks]
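The propositional core of this conversion can be sanity-checked with SymPy (a sketch, not part of the original answer; quantifiers are dropped and the symbols W, C, M, L stand in for the atoms W(x), C(y), M(x, y), L(x, y)):

```python
from sympy import symbols, Implies
from sympy.logic.boolalg import to_cnf

W, C, M, L = symbols("W C M L")
# to_cnf performs exactly the two steps above: eliminate the
# implication, then push the negation inwards.
print(to_cnf(Implies(W & C & M, L)))  # the clause ¬W ∨ ¬C ∨ ¬M ∨ L
                                      # (printed argument order may vary)
```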
(b) Write as FOL with negation of conclusion [4 marks]:
(i) ∀x(MIG(x) =⇒ PK(x))
(ii) ∀x∀y(HAVE(x, y) ∧VIRUS(y) =⇒ ¬∃z(HAVE(x, z) ∧ ATWORK(z)))
(iii) ∀x(PROF(x) =⇒ ¬∃y(HAVE(x, y) ∧ PK(y)))
(iv) ∃x(HAVE(Tom, x) ∧ (VIRUS(x) ∨MIG(x)))
(v) ¬[PROF(Tom) =⇒ ¬∃z(HAVE(Tom, z) ∧ ATWORK(z))]
Convert to CNF [2 marks]:
(i) ¬MIG(x) ∨ PK(x)
(ii) ¬HAVE(x, y) ∨ ¬VIRUS(y) ∨ ¬HAVE(x, z) ∨ ¬ATWORK(z)
(iii) ¬PROF(x) ∨ ¬HAVE(x, y) ∨ ¬PK(y)
(iv) A. HAVE(Tom, a)
B. VIRUS(a) ∨MIG(a)
(v) A. PROF(Tom)
B. HAVE(Tom, b)
C. ATWORK(b)
Resolution proof [4 marks]:
(vi) PK(a) ∨ VIRUS(a) from (i) and (iv)B – {x/a}
(vii) ¬HAVE(x, y) ∨ ¬VIRUS(y) ∨ ¬HAVE(x, b) from (ii) and (v)C – {z/b}
(viii) ¬HAVE(Tom, y) ∨ ¬VIRUS(y) from (vii) and (v)B – {x/Tom}
(ix) ¬HAVE(Tom, a) ∨ PK(a) from (vi) and (viii) – {y/a}
(x) PK(a) from (iv)A and (ix)
(xi) ¬PROF(x) ∨ ¬HAVE(x, a) from (x) and (iii) – {y/a}
(xii) ¬PROF(Tom) from (xi) and (iv)A – {x/Tom}
(xiii) ∅ from (xii) and (v)A
[10 marks]
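Since the proof only ever instantiates variables with the ground terms Tom, a and b, it can be double-checked mechanically by propositional resolution over the corresponding ground clauses. A minimal sketch (not part of the original answer; the literal strings and saturation loop are illustrative):

```python
from itertools import combinations

# Ground instances of clauses (i)-(v) above, with "-" marking negation.
clauses = {
    frozenset({"-MIG(a)", "PK(a)"}),                                         # (i), x/a
    frozenset({"-HAVE(Tom,a)", "-VIRUS(a)", "-HAVE(Tom,b)", "-ATWORK(b)"}),  # (ii)
    frozenset({"-PROF(Tom)", "-HAVE(Tom,a)", "-PK(a)"}),                     # (iii)
    frozenset({"HAVE(Tom,a)"}),                                              # (iv)A
    frozenset({"VIRUS(a)", "MIG(a)"}),                                       # (iv)B
    frozenset({"PROF(Tom)"}),                                                # (v)A
    frozenset({"HAVE(Tom,b)"}),                                              # (v)B
    frozenset({"ATWORK(b)"}),                                                # (v)C
}

def negate(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def refutes(clauses):
    """Saturate under binary resolution; True iff the empty clause appears."""
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for lit in c1:
                if negate(lit) in c2:
                    resolvent = (c1 - {lit}) | (c2 - {negate(lit)})
                    if not resolvent:
                        return True      # empty clause: contradiction found
                    new.add(resolvent)
        if new <= clauses:
            return False                 # saturated with no empty clause
        clauses = clauses | new

print(refutes(clauses))  # True, matching steps (vi)-(xiii) above
```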
(c) From (i), (ii), (vii) with {w/0, x/7, y/3, z/9} infer
(ix) 0 + 7 ≤ 3 + 9
From (ix), (vi), (viii) with {x1/0, y1/7, x2/0, y2/7, z2/3 + 9} infer
(x) 7 + 0 ≤ 3 + 9
where x1, y1 are renamed in (vi) and x2, y2, z2 are renamed in (viii). From (iv), (x), (viii) with {x3/7, x4/7, y4/7, z4/3} infer
(xi) 7 ≤ 3 + 9
where x3 is renamed in (iv) and x4, y4, z4 are renamed in (viii). [6 marks]
[Total marks 20]
3. (a) (i) Write N+5+9 to mean: we are at node N, we have travelled 5 blocks to get there, and we have a lower bound of 9 blocks to the goal. Branch and bound, depth first, maintaining an agenda list:
Root: [A+0+12]
Expand A to B/C: [B+6+8, C+8+10]
Expand B to D/E: [E+9+5, D+10+6, C+8+10]
Expand E to F: [F+11+5, D+10+6, C+8+10]
Expand F to G: [G+14+6, D+10+6, C+8+10]
Expand G to H: [H+20+0, D+10+6, C+8+10]
Best solution found so far: H in 20. Is there a better one?
[D+10+6, C+8+10]
Expand D to H: [H+16+0, C+8+10]
New best solution: H in distance 16. Is there a better one?
[C+8+10]
We can remove C+8+10, as 8+10 = 18 > 16 breaks the bound.
[]
Empty list, so finished. Best distance is 16.
[6 marks]
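The same search can be run in a few lines of Python. This is a sketch, not part of the original answer; since the exam's figure is not reproduced here, the edge costs and heuristic values below are read off the N+g+h entries in the trace above (e.g. A-B = 6 because B enters the list as B+6+8):

```python
# Edges (block distances) and lower-bound estimates h, inferred from the trace.
# C's onward edges never appear in the trace, so it is left without successors.
GRAPH = {"A": {"B": 6, "C": 8}, "B": {"E": 3, "D": 4}, "C": {},
         "D": {"H": 6}, "E": {"F": 2}, "F": {"G": 3}, "G": {"H": 6}, "H": {}}
H = {"A": 12, "B": 8, "C": 10, "D": 6, "E": 5, "F": 5, "G": 6, "H": 0}

def branch_and_bound(graph, h, start, goal):
    best_cost, best_path = float("inf"), None
    agenda = [(start, 0, [start])]           # (node, g, path), as in the lists above
    while agenda:
        node, g, path = agenda.pop(0)
        if g + h[node] >= best_cost:         # lower bound breaks the incumbent: prune
            continue
        if node == goal:
            best_cost, best_path = g, path   # new best solution; keep searching
            continue
        children = [(n, g + c, path + [n]) for n, c in graph[node].items()]
        children.sort(key=lambda t: t[1] + h[t[0]])  # order children by g + h
        agenda = children + agenda           # depth first: children go on the front
    return best_cost, best_path

print(branch_and_bound(GRAPH, H, "A", "H"))  # (16, ['A', 'B', 'D', 'H'])
```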
(ii) Start with A, estimate 12: A+0+12
Add B/C in order: B+6+8, C+8+10
Expand B to D/E, in order: E+9+5, D+10+6, C+8+10
Expand E to F: F+11+5, D+10+6, C+8+10
Expand F to G: D+10+6, C+8+10, G+14+6
Expand D to H: H+16+0, C+8+10, G+14+6
H is now at the front of the queue and is the goal: H in distance 16. Shortest route is A/B/D/H.
[4 marks]
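For comparison, A* on the same reconstructed graph (a sketch under the same assumptions as the branch-and-bound code above, reusing its GRAPH and H):

```python
import heapq

def a_star(graph, h, start, goal):
    # Priority queue ordered by f = g + h; entries are (f, g, node, path).
    frontier = [(h[start], 0, start, [start])]
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path   # with an admissible h, the first goal popped is optimal
        for nbr, cost in graph[node].items():
            heapq.heappush(frontier,
                           (g + cost + h[nbr], g + cost, nbr, path + [nbr]))
    return float("inf"), None

print(a_star(GRAPH, H, "A", "H"))  # (16, ['A', 'B', 'D', 'H'])
```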
Consider the following simple description of faults in the brakes of a bus. Two
problems may cause a brake repair to be necessary. These are a leak in the brake
fluid cable, and worn brake pads. If either of these happens, one of two things
may alert the driver to the problem: either noticeably reduced braking power
in the bus, or an indicator light of a brake fault on the dashboard. If the driver
observes either of these problems, they should take the bus to a garage for repair.
Figure 1 shows the Bayesian network for modelling repairs in a bus. Nodes represent the following: L for a leak in the brake cable; P for a worn brake pad; F for a fault in the brakes; R for reduced braking power; I for an indicator light of a fault on the dashboard; and G for the bus being taken to a garage for a brake repair.
(b) [Figure 1: The Bayesian network required for Q3b. Original drawing not reproduced.]
(i) To construct the diagram we take the letters in the order L, P, F, R, I, G. L therefore has no incoming arrows. P is independent of L, because the two faults occur independently of each other. F clearly depends on both L and P: if either happens, there is a fault. R happens with a certain chance but depends only on there being a fault, so if we know the status of F we can ignore L and P [perhaps an unrealistic assumption, but acceptable here]. Similarly for I; moreover, I is conditionally independent of R given F, i.e. if we know whether or not there is a fault, the status of I depends only on F and is not affected by whether we notice reduced braking power. Finally, G clearly depends on R and I: if the driver notices either of these symptoms, they will likely take the bus in. But G is conditionally independent of F, L and P given R and I, since the driver takes the bus in based only on the symptoms they notice. [4 marks]
(ii) P(F) = P(F|L,P)P(L)P(P) + P(F|¬L,P)P(¬L)P(P) + P(F|L,¬P)P(L)P(¬P) + P(F|¬L,¬P)P(¬L)P(¬P)
= 0.999 × 0.1 × 0.2 + 0.99 × 0.9 × 0.2 + 0.99 × 0.1 × 0.8 + 0.1 × 0.9 × 0.8
= 0.349 to 3 d.p. [3 marks]
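This marginalisation is easy to verify in Python (a sketch, not part of the original answer; the CPT numbers are the ones used in the calculation above, since the exam's Figure 1 tables are not reproduced):

```python
from itertools import product

P_L, P_P = 0.1, 0.2                                # priors for leak, worn pad
P_F = {(True, True): 0.999, (False, True): 0.99,   # P(F | L, P)
       (True, False): 0.99, (False, False): 0.1}

# P(F) = sum over l, p of P(F | l, p) P(l) P(p)
p_f = sum(P_F[l, p] * (P_L if l else 1 - P_L) * (P_P if p else 1 - P_P)
          for l, p in product((True, False), repeat=2))
print(round(p_f, 3))  # 0.349
```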
(iii) [3 marks]
[Total marks 20]
*** END OF PAPER ***