COMP4620/8620-无代写-Assignment 2|学霸联盟

COMP4620/8620-无代写-Assignment 2

时间：2024-10-09

COMP4620/8620: Advanced Topics in ML – Intelligent Robotics
Semester-2 2024 – Assignment 2
Convenor & Lecturer: Hanna Kurniawati
Due date: Tuesday, 15 October 2024 10:00 am Canberra time
Grace period: 6 hours after the due date
Late submission: Not permitted
Please read the following notes first before starting to work on the assignment.
1. This is an individual assignment.
2. The maximum total mark for this assignment is 100 points
3. This assignment consists of two parts: Part A and Part B. Part A contains conceptual questions
only. The maximum total mark for Part A is 20 points. Part B contains conceptual questions,
programming and analysis. The maximum total mark for Part B is 80 points.
4. Submission Instruction:
(a) You can write your program in a programming language of your choice, but we provide
supporting code in Java.
(b) You must submit all of your source codes. If your program consists of multiple files, you
must place all files under a single folder, compress the folder into a single file with one of the
following extensions: .zip or .7z or .tar.gz, and submit this compressed file. If your program
consists of multiple files in multiple folders, your compressed file should preserve the folder
structure.
(c) In your selection of programming language and in compressing the files, you must consider
that during the demo, you will need to download your submission, extract your source codes,
compile (if needed), and run the program in front of the tutor marking your assignment
without making any changes to the source code.
(d) All non-programming parts of the assignment must be written in a single .pdf file.
(e) The two files (source codes and .pdf) must be submitted via wattle before the due date.
(f) Late submission is NOT permitted. However, we provide a 6 hours grace period. Within the
grace period, you can still submit your assignment. However, after the grace period ends,
you will NOT be able to submit your assignment.
5. Information about the in-person demo will be announced in the class forum.
PART A [20 pts]
1. [10 pts] Suppose ML and MH are two infinite horizon discounted MDPs (ie, the value function
of the MDP we discussed at length in class) with exactly the same state space, action space,
transition function and reward function but with different discount factor γ. The discount factor
of ML is much lower than that of MH. Please answer the following:
(a) [5 pts] How will the behaviour of ML and MH agents differ when they run their respective
optimal policy? Please explain.
(b) [5 pts] If we use the vanilla Value Iteration, which MDPs converge to the optimal value
function faster? Please explain.
2. [10 pts] Mr S is solving an MDP using a Policy Iteration method but with only a single iteration
of Bellman update for each of its policy evaluation step. Will Mr S’ Policy iteration converge
to an optimal policy? Please explain and provide evidence.
Page 1 of 5 – Advanced Topics in ML: Intelligent Robotics – COMP4680/8650
PART B [80 pts]
Figure 1: An illustration of the game Changing
Technology. Picture taken from https://www.
oranz.co.nz/
In this assignment, you will develop an MDP agent
to play a game, called Changing Technology. The
game Changing Technology is an off-road touring
car racing game, where the emphasize is on the
strategy that a team should take –that is, decid-
ing which driver, car models, and car components
should be used when and where–, so that the team
can reach the goal within a given number of steps.
In game Changing Technology, a car operates in
an environment with varying terrain types. There
are three variables that influence terrain types:
dirt/asphalt, straight/slalom, and hilly/flat. The en-
vironment map, including the terrain types, is known perfectly and is represented as 1-dimensional
grid cell of size N, where column-1 is the starting point and column-N is the goal region.
At each step, the team might be able to do one of the following classes of actions:
A1. Continue moving. Depending on the terrain, driver, car models, and car components, each
movement might move the car k cells, where k ∈ [−4, 5]⋃{slip, breakdown}. The result of
the exact movement is uncertain and represented as a probability distribution. This distribution
is conditioned on the components discussed in action #2–#6 below. To simplify computation
of transition probability, this game assumes conditional independence of the parameters below
(i.e., car type, driver, and tire model, given k) and prior distributions of the parameters and of
k is assumed to be uniform. If the movement causes the car to have non-zero probability mass
of being in cell index > N, then all probability mass for cells beyond N will be added to the
probability mass of being at cell-N. The same is applied to cell-1. The car might also slip,
which will incur a substantial time before it can start moving again. The car might also break
down, which will incur a repair time.
A2. Change the car type. Different type will have different speed in different terrains and different
fuel efficiency. After the change, the car fuel is at full tank (50 liters) and the pressure of all of
its tires are at 100% capacity.
A3. Change the driver. Different drivers will have different speed in different terrains.
A4. Change the tire(s) of existing car. The team has an option to change its entire tires with another
model. The available tire types are: all-terrain, mud, low-profile, performance. The different
types of tires and terrains will influence how far the car will move in a single step. After the
change, the pressure of all its tires is at 100% capacity.
A5. Add fuel to existing car. The car fuel level is an integer, with range 0 (empty tank) to 50 liters
(full tank). The time it takes to fill up the fuel by x amount is ceil(x/10) steps. For example: If
8 liters is added, at the next step the agent can do something else, but if 19 liters is added, only
at the next next step, the agent can choose another different action.
A6. Change pressure to the tires. The team can add/reduce the pressure to 50% capacity, 75%
capacity, or 100% capacity. The fuel consumption when the pressure is at 75% capacity is
double that of 100% capacity. The fuel consumption when pressure is at 50% capacity is triple
that of 100% capacity. However, the chances of slip at 75% capacity is double that of 50%
capacity, and the chances of slip at 100% capacity is triple that of 50% capacity.
A7. Combination of options A2 and A3.
A8. Combination of options A4 —A6. The time to do the change follows the slowest step (i.e., if
fuel is added, this will be fuel addition).
Page 2 of 5 – Advanced Topics in ML: Intelligent Robotics – COMP4680/8650
The game has 4 levels, which differ in the possible terrain types and available actions, as follows:
Level-1: Only two types of terrain (i.e., dirt and asphalt) with N ≤ 10, and only actions A1–A4 are
possible, with two different car models and two different drivers to choose from. Fuel and tire
pressure are assumed to remain the same throughout the race.
Level-2: Only four types of terrain (i.e., dirt-straight, dirt-slalom, asphalt-straight, and asphalt-
slalom) with N ≤ 10, and only actions A1–A6 are possible, with three different car models
and two different drivers to choose from.
Level-3: All eight terrain types (i.e., dirt-straight-hilly, dirt-straight-flat, dirt-slalom-hilly, dirt-slalom-
flat, asphalt-straight-hilly, asphalt-straight-flat, asphalt-slalom-hilly, and asphalt-slalom-flat)
with N ≤ 30, and only actions A1–A6 are possible, with five different car models and five
different drivers to choose from.
Level-4: All eight terrain types with N ≤ 30, and actions A1–A7 are possible, with five different car
models and five different drivers to choose from.
Level-5: All eight terrain types with N ≤ 30, and all eight classes of actions are possible, with five
different car models and five different drivers to choose from.
What you need to do
Your tasks in this part of the assignment are:
1. Design the solution. This task contains two main components, i.e., framing the problem as an
infinite horizon MDP problem and deciding how to solve the MDP problem.
2. Implement your design. To this end, you are provided with supporting Java codes that provide
a parser for the input and a simulator. Your program is allowed a maximum of 2 minutes com-
putation prior to each simulation run. If you use an online method for solving the MDP, at each
decision point step, your program is allowed a maximum of 15 seconds computation time. Your
program will be run on a PC in the demo room. The time requirement is for a program that runs
as a single-threaded process. If you use multi-threading, then we will divide the aforementioned
time limit with the number of threads you use.
Note: You are not allowed to use any library for linear algebra, optimization, and MDP solver.
3. Write a report, which answers the questions we post for the Report section (see the last part of
this document). Note that to answer the questions well, you need will need to run experiments.
Input and Output format
Input format. The input is a single .txt file, containing information about the race and how the terrain
types affect performance of the car, driver, and car components. The format is as follows.
1. The first line is the level of the game, i.e., 1, 2, 3, 4, or 5.
2. The second line consists of three numbers separated by a white space. The first number is the
discount factor, the second number is the time to recover from a slip, the last number is repair
time to recover from a breakdown.
3. The third line consists of two numbers. The first number is the number of cells in the map, i.e.,
N. The second number is the maximum number of time-steps the agent is allowed to take for
reaching the goal region. Let’s denote this number as MaxT . The purpose of this number is to
prevent having to wait for a very long time before realising the policy generated does not solve
the problem. The number MaxT will be sufficiently large, such that framing the problem as a
suitable infinite horizon MDP is sufficient.
4. Each line in line 4 to (3 + NT ), where NT is the number of terrain types, has two components
separated by a colon (‘:’). To the left of the colon is the type of terrain, written as all small letters
Page 3 of 5 – Advanced Topics in ML: Intelligent Robotics – COMP4680/8650
as per description of the terrains above. To the right of the colon is the index of the cells with
such a terrain type, separated by a comma. A continuous sequence of cells with the same terrain
type can be written as beginningCellIdx − endCellIdx. For example: dirt-straight-hilly:1-3,5,7
means cell index 1, 2, 3, 5, and 7 has dirt-straight-hilly as its terrain type. Note: If there’s no cell
with such terrain type, then the right side of the colon is empty. The cell index starts from 1.
5. Line 3 + NT + 1 is the number of car types. Let’s denote this as CT .
6. Each line in the subsequent CT lines has two components separated by a colon (‘:’). To the left
of the colon is the car type. The type of the car is always a single word. To the right of the colon
is 12 numbers separated by a white space. The numbers represent conditional probability that the
results of moving (i.e., k) is -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, slip, or breakdown, given the particular
car type is used.
7. Subsequent line is the number of drivers. Let’s denote this as DT .
8. Each line in the subsequent DT lines has two components separated by a colon (‘:’). To the left
of the colon is the driver’s name. The driver’s name is always a single word. To the right of the
colon is 12 numbers separated by a white space. The numbers represent conditional probability
that the results of moving (i.e., k) is -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, slip, or breakdown, given the
particular driver. 9. Each line in the subsequent 4 lines has two components separated by a colon
(‘:’). To the left of the colon is the tires model, written as per written in A4. To the right is
12 numbers separated by a white space. The numbers represent conditional probability that the
results of moving (i.e., k) is -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, slip, or breakdown, given the particular
tires model.
9. The next line consists of NT × CT (NT : #types of terrain, CT : #types of car) numbers. Each
number represents the amount of fuel used by a single moving action (i.e., A1) when pressure of
the tires is at 100% capacity. The first number is the number for the combination between the first
terrain type (as per Input #4) and the first car type (as per Input #6), the second is the number for
the first terrain type and the second car type, etc..
10. The next line consists of NT (NT : #types of terrain) numbers. Each number represents the slip
probability of a single moving action (i.e., A1) when pressure of the tires is at 50% capacity. The
first number is probability given the first terrain type and pressure of 50% capacity. The probabil-
ity of all other possible value of k is uniform.
Output format. The output is a single .txt file. The number of lines in the output file is at most the
same as the lower among thenumbero f stepstoreachthegoal+ 1 and MaxT + 1 (maxT refers to #3 on
Input format) lines.
Each line-i consists of a number and 2 tuples separated by a semi-colon (‘;’). The number represents
the current time-step. If the car receives a time “penalty” (i.e., due to slip/breakdown/adding fuel),
then the time step will jump accordingly. For instance, if at step-5, the car goes into a slip state and
the time to recover from slip is 2 steps, then the output file will have a line with time-step 5, no action
performed, and subsequent line with time-step 7.
The first tuple consists of 8 components separated by a comma. In sequential order, the components
are: the cell index where the car is currently located, whether the car is in slip condition or not (1 slip,
0 not slip), whether the car is in breakdown condition or not (1 breaks down, 0 good condition), the
current car type, the current driver’s name, the current tires type, the current fuel level, the current tire
pressure.
The second tuple consists of the index (e.g., A1) of the action(s) performed at time-i (in ascending
order) followed by the value (if values are required). The action index and its value are separated by
a colon (‘:’). If more than one actions are performed, the actions are separated by a comma. If there’s
no action performed, the second tuple is written as ‘(n.a.)’.
Page 4 of 5 – Advanced Topics in ML: Intelligent Robotics – COMP4680/8650
Marking Scheme for the Programming Part (total points: 50/100)
The details of the grading scheme is as follows. If your situation satisfies more than one marking
band, we will use the higher band.
[1, 5): The program does not compile nor run (staff discretion).
[5, 10): The program runs but fails to solve any query within the given time limit (staff discretion).
20: The program solves a query involving up to 2 level-1 cases.
30: The program solves a query involving up to 2 level-2 cases.
40: The program solves a query involving up to 2 level-3 cases.
45: The program solves a query involving up to 2 level-4 cases.
50: The program solves a query involving up to 2 level-5 cases.
Report (total points: 30/100)
Your report must contain answers to the following questions:
1. [10 pts] Please define your MDP problem.
2. [10 pts] Please explain the method you use to solve the problem at the conceptual level, i.e.,
approach and pseudo code, and additional heuristics applied.
3. [10 pts] Please analyse and discuss the positive and negative of your method and additional tricks
used. Please note that in this question, you will get higher mark for stronger arguments. A
strong argument, esp. for the positive of your proposed solution, should at least include a logical
explanation of why and include experimental results that backs your explanation. Also, please
also note that good explanation is NOT equal to long explanation.
Page 5 of 5 – Advanced Topics in ML: Intelligent Robotics – COMP4680/8650