INFO20003-sql代写|学霸联盟

INFO20003-sql代写

时间：2023-05-15

INFO20003 Database Systems 1© University of Melbourne
INFO20003 Database Systems
Lecture 12
Query Processing Part II
Week 6
Dr Renata Borovica-Gajic
INFO20003 Database Systems 2© University of Melbourne
Remember this? Components of a DBMS
DBMS
Index files
Heap files
Database
Concurrency
control module
File and access
methods mgr.
Buffer pool mgr.
Disk space mgr.
Storage module Concurrency
control module
Transaction
mgr.
Lock mgr.
Crash recovery
module
Log mgr.
Query processing module
Parser/
Compiler Optimizer Executor
This is one of several possible architectures;
each system has its own slight variations.
TODAY
Joins
Will briefly
touch upon …
INFO20003 Database Systems 3© University of Melbourne
Coverage
• Nested loops join
• Sort-merge join
• Hash join
• General joins
Readings: Chapter 14, Ramakrishnan & Gehrke, Database Systems
INFO20003 Database Systems 4© University of Melbourne
Joins
• Are very common and can be very expensive (cross
product in the worst case)
• There are many implementation techniques for join
operations
• Join techniques we will cover:
1. Nested-loops join
2. Sort-merge join
3. Hash join
INFO20003 Database Systems 5© University of Melbourne
• In algebra: R S. They are very common and need to be
carefully optimized.
• R X S is large; so, R X S followed by a selection is inefficient.
• Join is associative and commutative:
− AxB == BxA
− Ax(BxC)==(AxB)xC
• Cost metric : Number of pages; Number of I/O
Equality Joins With One Join Column
SELECT *
FROM Reserves R1, Sailors S1
WHERE R1.sid=S1.sid

R S
12,10
12,10
11,80
13,74
12,20
13,75
13,35
13,7515,80
15,44
13,20
11,20
Example:
Left / Outer Right / Inner
INFO20003 Database Systems 6© University of Melbourne
Schema for Examples
• Sailors (S):
–80 tuples per page, 500 pages
–NPages(S) = 500, NTuplesPerPage(S) = 80
–NTuples(S) = 500*80 = 40000
• Reserves (R):
–100 tuples per page, 1000 pages
–NPages(R) = 1000, NTuplesPerPage(R) =100
–NTuples(R) = 100000
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: dates, rname: string)
INFO20003 Database Systems 7© University of Melbourne
Simple Nested Loops Join
• For each tuple in the outer relation R, we scan the entire inner
relation S
• Cost:
Cost (SNJL) = NPages(Outer) +
NTuples(Outer) * NPages(Inner)
• Our example:
Cost (SNLJ)= 1000+ 100*1000*500
= 50001000 (I/O)
foreach tuple r in R do
foreach tuple s in S do
if ri == sj then add to result
R S
12,10
12,10
11,80
13,74
12,20
13,75
13,35
13,7515,80
15,44
13,20
11,20
…
Pseudo code:
INFO20003 Database Systems 8© University of Melbourne
Page-Oriented Nested Loops Join
• For each page of R
–get each page of S
–write out matching pairs of tuples , where r is in R-page
and S is in S-page
Cost (PNJL) = NPages(Outer) +
NPages(Outer) * NPages(Inner)
• Our example:
Cost (PNLJ)= 1000+1000*500 = 501000 (I/O)
foreach page bR in R do
foreach page bS in S do
foreach tuple r in bR do
foreach tuple s in bSdo
if ri == sj then add to result
R S
12,10
12,10
11,80
13,74
12,20
13,75
13,35
13,7515,80
15,44
13,20
11,20
p1
p2
p1
p2
Pseudo code:
INFO20003 Database Systems 12© University of Melbourne
Block Nested Loops Join
• Page-oriented NL doesn't exploit extra memory buffers
• Alternative approach:
–Use one page as an input buffer for scanning the inner S,
one page as the output buffer, and use all remaining pages
to hold ‘block’ of outer R
• For each matching tuple r in R-block, s in S-page, add s> to result. Then read next R-block, scan S, etc
. . .
. . .
R & S
block of R tuples
(B-2 pages)
Input buffer for S Output buffer
. . .
Join Result
INFO20003 Database Systems 13© University of Melbourne
Block Nested Loops Join Cost
Cost (BNJL) = NPages(Outer) +
NBlocks(Outer) * NPages(Inner)
• NBlocks(Outer) = ()
−2
• Our example:
Let’s say we have 102 pages of space in memory, and consider
Reserves (R) as the outer and Sailors (S) as the inner table.
NBlocks(R) = 1000/(102-2) = 10
Cost(BNLJ) = 1000 + 10* 500 = 6000 I/O
R S
12,10
12,10
11,80
13,74
12,20
13,75
13,35
13,7515,80
15,44
13,20
11,20
B1
B2
INFO20003 Database Systems 14© University of Melbourne
Query Processing: Joins
• Nested loops join
• Sort-merge join
• Hash join
• General joins
Readings: Chapter 14, Ramakrishnan & Gehrke, Database Systems
INFO20003 Database Systems 15© University of Melbourne
• Sort R and S on the join column, then scan them to do a
merge (on join column), and output result tuples
• Sorted R is scanned once;
• Each S group of the same key values is scanned once per matching
R tuple (typically means Sorted S is scanned once too).
• Useful when:
–one or both inputs are already sorted on join attribute(s)
–output is required to be sorted on join attributes(s)
Sort-Merge Join (R S)i=j
R S
12,10
12,10
11,80
13,75
12,20
13,74
13,35
13,75
15,80
15,44
13,20
11,20
INFO20003 Database Systems 16© University of Melbourne
Sort-Merge Join Cost
Cost (SMJ) = Sort(Outer) + Sort(Inner)
+ NPages(Outer) + NPages(Inner)
Sort(R) = External Sort Cost = 2*NumPasses*NPages(R)
Our example:
Let’s say that both Reserves and Sailors can be sorted in 2 passes,
then:
Cost(SMJ) = Sort R + Sort S + NPages(R) + NPages(S)
= 2*2*NPages(R)+ 2*2*NPages(S)
+ NPages(R) + NPages (S)
= 5*1000 + 5* 500 = 7500 I/O
Sort inputs
Merge inputs
INFO20003 Database Systems 18© University of Melbourne
Query Processing: Joins
• Nested loops join
• Sort-merge join
• Hash join
• General joins
Readings: Chapter 14, Ramakrishnan & Gehrke, Database Systems
INFO20003 Database Systems 19© University of Melbourne
Hash-Join
• Partition both relations
using hash function h:
R tuples in partition l
will only match S
tuples in partition I
• Read in a partition of
R, hash it using h2 (<>
h!). Scan matching
partition of S, probe
hash table for matches
Partitions
of R & S
Input buffer
for Si
Hash table for partition
Ri (k < B-1 pages)
B main memory buffersDisk
Output
buffer
Disk
Join Result
hash
fn
h2
h2
B main memory buffers DiskDisk
Original
Relation OUTPUT
2INPUT
1
hashfunction
h B-1
Partitions
1
2
B-1
. . .
INFO20003 Database Systems 20© University of Melbourne
Hash-Join Cost
1. In partitioning phase, we read+write both relations
2. In matching phase, we read both relations
• Our example:
Cost(HJ) = 2*NPages(R) + 2*NPages(S) + NPages(R) + NPages(S)
= 3 * 1000 + 3* 500 = 4500 I/Os
Cost (HJ) = 2 * NPages(Outer) + 2* NPages(Inner)
+ NPages(Outer) + NPages(Inner)
Create partitions
Match partitions
INFO20003 Database Systems 21© University of Melbourne
Watch this video if you are confused
https://www.youtube.com/watch?v=o1dMJ6-CKzU
From 0:58
INFO20003 Database Systems 24© University of Melbourne
Query Processing: Joins
• Nested loops join
• Sort-merge join
• Hash join
• General joins
Readings: Chapter 14, Ramakrishnan & Gehrke, Database Systems
INFO20003 Database Systems 25© University of Melbourne
General Join Conditions
• Equalities over several attributes (e.g., R.sid=S.sid AND
R.rname=S.sname):
–For Sort-Merge and Hash Join, sort/partition on combination of
the two join columns
• Inequality conditions (e.g., R.rname < S.sname):
–Hash Join, Sort Merge Join not applicable
–Block NL quite likely to be the best join method here
INFO20003 Database Systems 28© University of Melbourne
Summary
• A virtue of relational DBMSs:
– Queries are composed of a few basic operators
– Implementation of operators can be carefully tuned
– Important to do this
• Many alternative implementations for each operator
–No universally superior technique for most operators
• Must consider alternatives for each operation in a query and
choose best one based on system statistics…
–Part of the broader task of optimizing a query composed of
several operations
INFO20003 Database Systems 29© University of Melbourne
What’s examinable
• Understand alternatives for join operator implementations
–Be able to calculate the cost of alternatives
• Important for Assignment 3 as well
INFO20003 Database Systems 30© University of Melbourne
Next Lecture
- -
• Query optimization
‒ How does a DBMS pick a good query plan?