SEEM3550A-无代写-Assignment3
时间:2024-04-18
The Chinese University of Hong Kong Department of Systems Engineering
and Engineering Management SEEM3550A Fundamentals of Information
Systems 2023-24 Assignment 3: Query Processing
This is an
individual assignment. Everyone must submit it individually. The
due date for the assignment 3 is 17:00, Apr 25th, 2024. The late
penalty will be 10% per day. A submission will not be accepted after the
solution release. Q1. [20 marks] Answer the following questions
related to query cost. (a) [5 marks] “A relational algebra expression
specifies a unique evaluation algorithm”. Is this statement true? (b) [5
marks] What are the differences between the primary index and the
secondary index? (c) [5 marks] “A secondary index must be dense for the
index to be useful”. Is this statement true? (d) [5 marks] When
performing a merge join operation, what is the minimum number of
memory buffers required? Explain briefly.
Q2. [59 marks] You are
given the following three relations in a university database as below
(with primary keys underlined): INSTRUCTOR (Id, Name, DeptName, Salary)
COURSE (Cid, Name) TEACHES (Id, Cid, Semester, Year) The relation
INSTRUCTOR has 500 tuples, and 20 tuples fit into one block; the
relation COURSE has 200 tuples, and 20 tuples fit into one block; the
relation TEACHES has 500 tuples, and 50 tuples fit into one block. All
records are sorted in ascending order of the primary keys of the table
on disk (TEACHES is first sorted on Id, and then on Cid if two records
have the same Id). Assume that the INSTRUCTOR Id ranges from 1 to 500,
COURSE Cid from 1 to 200. It is assumed that a relation is stored on
disk consecutively. Please answer the following questions and show your
calculations clearly. (a) [8 marks] Answer the following questions and
elaborate on your answers. i. [4 marks] Assume that a relation is
stored on disk consecutively, how many disk seeks do we need if we read
the entire relation, and read the relation from a certain block,
respectively? ii. [4 marks] Assume that relation INSTRUCTOR has a
primary index A and a secondary index B. On what possible search keys
are A and B built, respectively? (b) [30 marks] Answer the following
questions and show your calculations clearly. i. [10 marks] Consider
the following SQL query: select * from COURSE where Name=’Data
Structure’ or Cid=50 Assume that relation COURSE has a B+-tree index of
height 5 on the Cid attribute. Further assume that the Cid of COURSE are
integers between 1 to 200. What are the numbers of disk seeks and block
transfers associated with the SQL query if we use the B+-tree index?
What are the numbers if we do not use the B+-tree index? Should we use
the B+-tree index? Explain briefly. ii. [10 marks] Consider the
following SQL query: select * from INSTRUCTOR where Name = ‘Jack’ If we
perform linear search, what are the numbers of disk seeks and block
transfers associated with the SQL query? iii. [10 marks] Consider
the following SQL query. select * from INSTRUCTOR where Id>100 Assume
that the Id of relation INSTRUCTOR are integers between 1 and 500.
Further assume that INSTRUCTOR has a B+-tree index of height 10 on the
Id attribute. If we use the B+-tree for the query, what are the numbers
of disk seeks and block transfers associated with the SQL query? (c)
[8 marks] Assume that relation INSTRCUTOR has a B+-tree index of height 3
on the
DeptName attribute, and a B+-tree index of height 4 on the
Name attribute. Further assume that there are 30 instructors from the
Computer Science Department. i. [4 marks] Is the B+-tree on Name a
primary or secondary index? Explain briefly. ii. [4 marks] Consider
the following SQL query: select * from INSTRUCTOR where DeptName =
‘Computer Science’ If we use the B+-tree on the DeptName attribute for
the query, what are the numbers of disk seeks and block transfers
associated with the SQL query? The number of accesses to the blocks that
contain pointers to data records in the data file is ignored: (d) [13
marks] Consider the following SQL query performing the join operation:
select * from INSTRUCTOR, TEACHES where INSTRUCTOR.Id = TEACHES.Id
i. [9 marks] Assume that 12 blocks are allocated to the memory buffer.
The query is implemented by nested-loop join. Which of the relations
should be the inner relation that leads to a smaller number of seeks and
block transfers? Explain your answer by comparing the two cases with
calculating the number of seeks and block transfers, when different
relations are chosen as the inner relation. ii. [4 marks] Assume that
INSTRUCTOR has a B+-tree index of height 3 on the Id attribute. The
query is implemented by indexed nested-loop join with TEACHES as the
outer relation. What are the numbers of disk seeks and block transfers?
Q3. [21 marks] Consider the following two relations R1 and R2: a b
b c 2 P 3 Q 3 R R2 R1 We would like to merge-join R1 and R2
on attribute b. However, R1 is unsorted on b, so we have to sort R1
before joining the relations. The records R2 are stored on attribute b
in ascending order sequentially from top to bottom. Assume 1 record
fits into 1 block of memory. Please show how an external sort-merge on
R1 on the attribute b is conducted, by illustrating the following steps:
(a) [6 marks] Create sorted runs, assuming 4 blocks are allocated to
the memory buffer.
• Draw figures as in p.35 of Ch. 12 lecture slides
•
There should be 2 runs. Label them sequentially as r1, r2. (b) [12
marks] Merge the sorted runs created above via 2-way merge (p.36 of Ch.
12 lecture slides). The memory buffer has 3 blocks denoted as M1, M2 and
O, respectively. Memory buffer
(blocks):
A 1 B 8 C 2 D 3 E 9 F 7 G 10 H 6
M1 M2 O
•
Show this process by stating the read/write operations in the form of
“[Read/Write] [X] to [Y]” For instance, “Read r1 to M1” stands for
reading a block of r1 from disk to the memory block M1, “Write M1 to O”
stands for writing the record in M1 to the output buffer, and “Write O
to Disk” stands for writing the record in O to disk. The first seven
operations are shown as a reference: Read r1 to M1 Read r2 to M2
Write M1 to O Write O to Disk Read r1 to M1 Write M1 to O Write O to
Disk (c) [3 marks] Show the resulting joined relation.