CSCI 4380 -无代写|学霸联盟

CSCI 4380 -无代写

时间：2025-04-21

CSCI 4380 Database Systems
2025 spring
Instructor: Lei Yu
Assistant Prof, Department of Computer Science
Outline:
• Test 2 grades released.
• Makeup test grades will be available soon.
• Lecture 21 Exercise after class: 48 hours
• Conventional indexes
• B-Trees
The Main Purpose of Index Structures
Speedup the search process
3
indexσa=6(R)
blocks
containing
the desired
tuples
quickly
figure out
disks
otherwise have to
scan the entire R
also need to handle dynamic changes of R
Root
B+Tree Example n=3
10
0
12
0
15
0
18
0
30
3 5 11 30 35 10
0
10
1
11
0
12
0
13
0
15
0
15
6
17
9
18
0
20
0
Sample non-leaf
to keys to keys to keys to keys
< 57 57£ k<81 81£k<95 ³95
57 81 95
Sample leaf node:
From non-leaf node
to next leaf
in sequence
57 81 95
To
re
co
rd

w
ith
k
ey
5
7
To
re
co
rd

w
ith
k
ey
8
1
To
re
co
rd

w
ith
k
ey
8
5
In textbook’s notation n=3
Leaf:
Non-leaf:
30 35
30
30 35
30
B+tree rules
q Rule 1. All leaves are at same lowest level (balanced
tree)
q Rule 2. Pointers in leaves point to records except for
“sequence pointer”
q Rule 3. Number of keys/pointers in nodes:
8
Max. #
pointers
Max. #
keys
Min. #
pointers
Min. #
keys
Non-leaf n+1 n é(n+1)/2ù é(n+1)/2ù-1
Leaf n+1 n ë(n+1)/2û+1 ë(n+1)/2û
Root n+1 n 2 1
A B+tree node of order n
pn+1knk2p2k1p1 p3 ……
k1 k2 ‧‧‧‧‧‧ kn
p1 p2 p3 pn+1pn
pn
k1 k2 …
p1 p2 p3
Minimum Occupancy Rule (or Minimum Fill
Factor Rule)
• Balance and Predictable Height: To keep the tree from
becoming lopsided or sparse, we enforce a rule that each node
(except the root) must be at least half full.
• Space Efficiency: Nodes map directly to disk blocks, and
Accessing a half-empty block is a waste of I/O.
Max. #
pointers
Max. #
keys
Min. #
pointers
Min. #
keys
Non-leaf n+1 n é(n+1)/2ù é(n+1)/2ù-1
Leaf n+1 n ë(n+1)/2û+1 ë(n+1)/2û
Root n+1 n 2 1
Full node min. node
Non-leaf
Leaf
n=3
12
0
15
0
18
0
30
3 5 11 30 35
co
un
ts
e
ve
n
if
nu
ll
Search in a B+tree
1. Start from the root
2. Search in a leaf block
3. May not have to go to the data file
11
Search(ptr, k);
\\ search a record of key value k in the subtree rooted at ptr
\\ assume the B+tree is a dense index of order n
Case 1. ptr is a leaf \\ pn+1 is the sequence pointer
IF (k = ki) for a key ki in *ptr THEN return(pi);
ELSE return(Null);
Case 2. ptr is not a leaf
find a key ki in *ptr such that ki-1 ≤ k < ki;
return(Search(pi, k));
Search in B+tree
12
A tree node
Search(ptr, k);
\\ search a record of key value k in the subtree rooted at ptr
\\ assume the B+tree is a dense index of order n
Case 1. ptr is a leaf \\ pn+1 is the sequence pointer
IF (k = ki) for a key ki in *ptr THEN return(pi);
ELSE return(Null);
Case 2. ptr is not a leaf
find a key ki in *ptr such that ki-1 ≤ k < ki;
return(Search(pi, k));
k1 k2 ‧‧‧‧‧‧ kn
p1 p2 p3 pn+1pn
Search in B+tree
13
100
30 120 150 180
3 5 11 30 35 100 101 110 120 130 150 156 179 180 200
root
Search(ptr, k);
\\ search a record of key value k in the subtree rooted at ptr
\\ assume the B+tree is a dense index of order n
Case 1. ptr is a leaf \\ pn+1 is the sequence pointer
IF (k = ki) for a key ki in *ptr THEN return(pi);
ELSE return(Null);
Case 2. ptr is not a leaf
find a key ki in *ptr such that ki-1 ≤ k < ki;
return(Search(pi, k));
A tree node
k1 k2 ‧‧‧‧‧‧ kn
p1 p2 p3 pn+1pn
Search in B+tree
14
100
30 120 150 180
3 5 11 30 35 100 101 110 120 130 150 156 179 180 200
root
Search(*prt, 130)
Search(ptr, k);
\\ search a record of key value k in the subtree rooted at ptr
\\ assume the B+tree is a dense index of order n
Case 1. ptr is a leaf \\ pn+1 is the sequence pointer
IF (k = ki) for a key ki in *ptr THEN return(pi);
ELSE return(Null);
Case 2. ptr is not a leaf
find a key ki in *ptr such that ki-1 ≤ k < ki;
return(Search(pi, k));
A tree node
k1 k2 ‧‧‧‧‧‧ kn
p1 p2 p3 pn+1pn
Search in B+tree
15
100
30 120 150 180
3 5 11 30 35 100 101 110 120 130 150 156 179 180 200
root
Search(*prt, 130)
Search(ptr, k);
\\ search a record of key value k in the subtree rooted at ptr
\\ assume the B+tree is a dense index of order n
Case 1. ptr is a leaf \\ pn+1 is the sequence pointer
IF (k = ki) for a key ki in *ptr THEN return(pi);
ELSE return(Null);
Case 2. ptr is not a leaf
find a key ki in *ptr such that ki-1 ≤ k < ki;
return(Search(pi, k));
A tree node
k1 k2 ‧‧‧‧‧‧ kn
p1 p2 p3 pn+1pn
Search in B+tree
16
100
30 120 150 180
3 5 11 30 35 100 101 110 120 130 150 156 179 180 200
root
Search(*prt, 130)
100 £ 130
Search(ptr, k);
\\ search a record of key value k in the subtree rooted at ptr
\\ assume the B+tree is a dense index of order n
Case 1. ptr is a leaf \\ pn+1 is the sequence pointer
IF (k = ki) for a key ki in *ptr THEN return(pi);
ELSE return(Null);
Case 2. ptr is not a leaf
find a key ki in *ptr such that ki-1 ≤ k < ki;
return(Search(pi, k));
A tree node
k1 k2 ‧‧‧‧‧‧ kn
p1 p2 p3 pn+1pn
Search in B+tree
17
100
30 120 150 180
3 5 11 30 35 100 101 110 120 130 150 156 179 180 200
root
Search(*prt, 130)
100 £ 130
12
0 £
13
0 <
15
0
Search(ptr, k);
\\ search a record of key value k in the subtree rooted at ptr
\\ assume the B+tree is a dense index of order n
Case 1. ptr is a leaf \\ pn+1 is the sequence pointer
IF (k = ki) for a key ki in *ptr THEN return(pi);
ELSE return(Null);
Case 2. ptr is not a leaf
find a key ki in *ptr such that ki-1 ≤ k < ki;
return(Search(pi, k));
A tree node
k1 k2 ‧‧‧‧‧‧ kn
p1 p2 p3 pn+1pn
Search in B+tree
18
100
30 120 150 180
3 5 11 30 35 100 101 110 120 130 150 156 179 180 200
root
Search(*prt, 130)
12
0 £
13
0 <
15
0
130 =130
return
100 £ 130
Search(ptr, k);
\\ search a record of key value k in the subtree rooted at ptr
\\ assume the B+tree is a dense index of order n
Case 1. ptr is a leaf \\ pn+1 is the sequence pointer
IF (k = ki) for a key ki in *ptr THEN return(pi);
ELSE return(Null);
Case 2. ptr is not a leaf
find a key ki in *ptr such that ki-1 ≤ k < ki;
return(Search(pi, k));
A tree node
k1 k2 ‧‧‧‧‧‧ kn
p1 p2 p3 pn+1pn
Range Search in B+tree
19
To research all records whose key values are
between k1 and k2:
Range Search in B+tree
20
To research all records whose key values are
between k1 and k2:
Range-Search(ptr, k1, k2)
1. Call Search(ptr, k1);
2. Follow the sequence pointers until the
search key value is larger than k2.
Range Search in
B+tree
21
Range-Search(ptr, k1, k2)
1. Call Search(ptr, k1);
2. Follow the sequence pointers until the
search key value is larger than k2.
100
30 120 150 180
3 5 11 30 35 100 101 110 120 130 150 156 179 180 200
root
Search(ptr, k);
\\ search a record of key value k in the subtree rooted at ptr
\\ assume the B+tree is a dense index of order n
Case 1. ptr is a leaf \\ pn+1 is the sequence pointer
IF (k = ki) for a key ki in *ptr THEN return(pi);
ELSE return(Null);
Case 2. ptr is not a leaf
find a key ki in *ptr such that ki-1 ≤ k < ki;
return(Search(pi, k));
A tree node
k1 k2 ‧‧‧‧‧‧ kn
p1 p2 p3 pn+1pn
Range Search in
B+tree
22
Range-Search(ptr, k1, k2)
1. Call Search(ptr, k1);
2. Follow the sequence pointers until the
search key value is larger than k2.
100
30 120 150 180
3 5 11 30 35 100 101 110 120 130 150 156 179 180 200
root
Range-Search(*prt, 50, 125)
Search(ptr, k);
\\ search a record of key value k in the subtree rooted at ptr
\\ assume the B+tree is a dense index of order n
Case 1. ptr is a leaf \\ pn+1 is the sequence pointer
IF (k = ki) for a key ki in *ptr THEN return(pi);
ELSE return(Null);
Case 2. ptr is not a leaf
find a key ki in *ptr such that ki-1 ≤ k < ki;
return(Search(pi, k));
A tree node
k1 k2 ‧‧‧‧‧‧ kn
p1 p2 p3 pn+1pn
Range Search in
B+tree
23
Range-Search(ptr, k1, k2)
1. Call Search(ptr, k1);
2. Follow the sequence pointers until the
search key value is larger than k2.
100
30 120 150 180
3 5 11 30 35 100 101 110 120 130 150 156 179 180 200
root
Range-Search(*prt, 50, 125)
Search(*prt, 50)
Search(ptr, k);
\\ search a record of key value k in the subtree rooted at ptr
\\ assume the B+tree is a dense index of order n
Case 1. ptr is a leaf \\ pn+1 is the sequence pointer
IF (k = ki) for a key ki in *ptr THEN return(pi);
ELSE return(Null);
Case 2. ptr is not a leaf
find a key ki in *ptr such that ki-1 ≤ k < ki;
return(Search(pi, k));
A tree node
k1 k2 ‧‧‧‧‧‧ kn
p1 p2 p3 pn+1pn
Range Search in
B+tree
24
Range-Search(ptr, k1, k2)
1. Call Search(ptr, k1);
2. Follow the sequence pointers until the
search key value is larger than k2.
100
30 120 150 180
3 5 11 30 35 100 101 110 120 130 150 156 179 180 200
root
50
< 1
00
30 £ 50
Search(ptr, k);
\\ search a record of key value k in the subtree rooted at ptr
\\ assume the B+tree is a dense index of order n
Case 1. ptr is a leaf \\ pn+1 is the sequence pointer
IF (k = ki) for a key ki in *ptr THEN return(pi);
ELSE return(Null);
Case 2. ptr is not a leaf
find a key ki in *ptr such that ki-1 ≤ k < ki;
return(Search(pi, k));
Range-Search(*prt, 50, 125)
Search(*prt, 50)A tree node
k1 k2 ‧‧‧‧‧‧ kn
p1 p2 p3 pn+1pn
Range Search in
B+tree
25
Range-Search(ptr, k1, k2)
1. Call Search(ptr, k1);
2. Follow the sequence pointers until the
search key value is larger than k2.
100
30 120 150 180
3 5 11 30 35 100 101 110 120 130 150 156 179 180 200
root
120 £ 125
return
50
< 1
00
30 £ 50
110 £ 125
101 £ 125
100 £ 125
return
returnreturn
135 > 125
STOP
35 < 50
Not
Return
Search(ptr, k);
\\ search a record of key value k in the subtree rooted at ptr
\\ assume the B+tree is a dense index of order n
Case 1. ptr is a leaf \\ pn+1 is the sequence pointer
IF (k = ki) for a key ki in *ptr THEN return(pi);
ELSE return(Null);
Case 2. ptr is not a leaf
find a key ki in *ptr such that ki-1 ≤ k < ki;
return(Search(pi, k));
Range-Search(*prt, 50, 125)
Search(*prt, 50)A tree node
k1 k2 ‧‧‧‧‧‧ kn
p1 p2 p3 pn+1pn
Insert into B+tree
26
Insert into B+tree
Basic idea:
1. Find the leaf L where the record r should be
inserted;
2. If L has further room, then insert r into L, and return;
3. If L is full, spilt L plus r into two leaves (each is
about half full): this causes an additional child for the
parent P of L, thus we need to add a child to P;
4. If P is already full, then we have to split P and add an
additional child to P’s parent … (recursively)
27
Insert into B+tree
q Simple case (space available for new child)
q Leaf overflow
q Non-leaf overflow
q New root
28
Insert into B+tree
q Simple case (space available for new child)
q Leaf overflow
q Non-leaf overflow
q New root
29
order n=3
30
I. Simple case: Insert key 32
100
30 40
3 5 11 30 31
order n=3
31
I. Simple case: Insert key 32
100
30 40
3 5 11 30 31
Insert(prt, 32)
order n=3
32
I. Simple case: Insert key 32
100
30 40
3 5 11 30 31
Insert(prt, 32)
32
< 1
00
30 £ 32 <40
order n=3
33
I. Simple case: Insert key 32
100
30 40
3 5 11 30 31
Insert(prt, 32)
32
< 1
00
30 £ 32 <40
room for 32
order n=3
34
I. Simple case: Insert key 32
100
30 40
3 5 11 30 31 32
Insert(prt, 32)
32
< 1
00
30 £ 32 <40
Insert into B+tree
q Simple case (space available for new child)
q Leaf overflow
q Non-leaf overflow
q New root
35
Insert into B+tree
q Simple case (space available for new child)
q Leaf overflow
q Non-leaf overflow
q New root
36
Insert into B+tree
q Simple case (space available for new child)
q Leaf overflow
q Non-leaf overflow
q New root
37
Idea: when there is no space for a new child
(all pointers are in use), split the node into
two nodes, with both at least half full.
38
Complication: node overflow
39
Complication: Leaf overflow
40
pn+1 kn+1
p1 k1 … p k p k … … pn kn p* + 1 /2 + 1 /2 + 1 /2 +1 + 1 /2 +1
Complication: Leaf overflow
p k p’
41
p1 k1 … p k --- p** p k … pn kn pn+1 kn+1 --- p* + 1 /2 + 1 /2 + 1 /2 +1 + 1 /2 +1
pn+1 kn+1
p1 k1 … p k p k … … pn kn p* + 1 /2 + 1 /2 + 1 /2 +1 + 1 /2 +1
Complication: Leaf overflow
p k k p’
p k p’
42
p1 k1 … p k --- p** p k … pn kn pn+1 kn+1 --- p* + 1 /2 + 1 /2 + 1 /2 +1 + 1 /2 +1
pn+1 kn+1
p1 k1 … p k p k … … pn kn p* + 1 /2 + 1 /2 + 1 /2 +1 + 1 /2 +1
Complication: Leaf overflow
p k k p’
p k p’
43
p1 k1 … p k --- p** p k … pn kn pn+1 kn+1 --- p* + 1 /2 + 1 /2 + 1 /2 +1 + 1 /2 +1
pn+1 kn+1
p1 k1 … p k p k … … pn kn p* + 1 /2 + 1 /2 + 1 /2 +1 + 1 /2 +1
Complication: Leaf overflow
p k k p’ q
p k p’
44
p1 k1 … p k --- p** p k … pn kn pn+1 kn+1 --- p*
p k k p’ q
+ 1 /2 + 1 /2 + 1 /2 +1 + 1 /2 +1
pn+1 kn+1
p1 k1 … p k p k … … pn kn p* + 1 /2
p k p’
+ 1 /2 + 1 /2 +1 + 1 /2 +1
Complication: Leaf overflow
?
What is the
key here?
45
p1 k1 … p k --- p** p k … pn kn pn+1 kn+1 --- p*
p k k p’ q
+ 1 /2 + 1 /2 + 1 /2 +1 + 1 /2 +1
pn+1 kn+1
p1 k1 … p k p k … … pn kn p* + 1 /2
p k p’
+ 1 /2 + 1 /2 +1 + 1 /2 +1
Complication: Leaf overflow
?
What is the
key here?
46
p1 k1 … p k --- p** p k … pn kn pn+1 kn+1 --- p*
p k k p’ q
+ 1 /2 + 1 /2 + 1 /2 +1 + 1 /2 +1
pn+1 kn+1
p1 k1 … p k p k … … pn kn p* + 1 /2
p k p’
+ 1 /2 + 1 /2 +1 + 1 /2 +1
Complication: Leaf overflow
+ 1 /2 +1
k ≤ x < k; + 1 /2 +1
47
100
30
3 5 11 30 31
Leaf overflow: Insert key 7 (order n = 3)
48
100
30
3 5 11 30 31
Leaf overflow: Insert key 7 (order n = 3)
7
49
100
30
3 5 30 31
Leaf overflow: Insert key 7 (order n = 3)
7 11
50
100
30
3 5 30 31
Leaf overflow: Insert key 7 (order n = 3)
7 11
3 5 11
7
51
100
30
3 5 30 31
Leaf overflow: Insert key 7 (order n = 3)
7 11
3 5 11
7
52
100
30
3 5 30 31
Leaf overflow: Insert key 7 (order n = 3)
7 11
3 5 11
7
53
100
30
3 5 30 31
Leaf overflow: Insert key 7 (order n = 3)
7 11
3 5 11
7
54
100
7 30
3 5 30 31
Leaf overflow: Insert key 7 (order n = 3)
7 11
3 5 11
7
55
Complication: nonleaf overflow
56
kn+1 pn+2p1 k1 … p k p k p k … pn kn pn+1 ( + 1)/2( + 1)/2 -1 ( + 1)/2 -1 ( + 1)/2
p k’ p’
( + 1)/2 +1 ( + 1)/2 +1
Complication: nonleaf overflow
57
kn+1 pn+2p1 k1 … p k p k p k … pn kn pn+1 ( + 1)/2( + 1)/2 -1 ( + 1)/2 -1 ( + 1)/2
p k’ p’
p1 k1 p2 k2 … p k p( + 1)/2 -1 ( + 1)/2 -1 p k … pn kn pn+1 kn+1 pn+2( + 1)/2 +1
k ( + 1)/2p k’ p’q
( + 1)/2 ( + 1)/2 +1
( + 1)/2 +1 ( + 1)/2 +1
Complication: nonleaf overflow
58
kn+1 pn+2p1 k1 … p k p k p k … pn kn pn+1 ( + 1)/2( + 1)/2 -1 ( + 1)/2 -1 ( + 1)/2
p k’ p’
p1 k1 p2 k2 … p k p( + 1)/2 -1 ( + 1)/2 -1 p k … pn kn pn+1 kn+1 pn+2( + 1)/2 +1
k ( + 1)/2p k’ p’q
( + 1)/2 ( + 1)/2 +1
( + 1)/2 +1 ( + 1)/2 +1
Complication: nonleaf overflow
59
kn+1 pn+2p1 k1 … p k p k p k … pn kn pn+1 ( + 1)/2( + 1)/2 -1 ( + 1)/2 -1 ( + 1)/2
p k’ p’
p1 k1 p2 k2 … p k p( + 1)/2 -1 ( + 1)/2 -1 p k … pn kn pn+1 kn+1 pn+2( + 1)/2 +1
k ( + 1)/2p k’ p’q
( + 1)/2 ( + 1)/2 +1
( + 1)/2 +1 ( + 1)/2 +1
Complication: nonleaf overflow
60
kn+1 pn+2p1 k1 … p k p k p k … pn kn pn+1 ( + 1)/2( + 1)/2 -1 ( + 1)/2 -1 ( + 1)/2
p k’ p’
p1 k1 p2 k2 … p k p( + 1)/2 -1 ( + 1)/2 -1 p k … pn kn pn+1 kn+1 pn+2( + 1)/2 +1
k ( + 1)/2p k’ p’q
( + 1)/2 ( + 1)/2 +1
( + 1)/2 +1 ( + 1)/2 +1
Complication: nonleaf overflow
?
What is the
key here?
61
kn+1 pn+2p1 k1 … p k p k p k … pn kn pn+1 ( + 1)/2( + 1)/2 -1 ( + 1)/2 -1 ( + 1)/2
p k’ p’
p1 k1 p2 k2 … p k p( + 1)/2 -1 ( + 1)/2 -1 p k … pn kn pn+1 kn+1 pn+2( + 1)/2 +1
k ( + 1)/2p k’ p’q
( + 1)/2 ( + 1)/2 +1
( + 1)/2 +1 ( + 1)/2 +1
Complication: nonleaf overflow
?
not used
What is the
key here?
62
kn+1 pn+2p1 k1 … p k p k p k … pn kn pn+1 ( + 1)/2( + 1)/2 -1 ( + 1)/2 -1 ( + 1)/2
p k’ p’
p1 k1 p2 k2 … p k p( + 1)/2 -1 ( + 1)/2 -1 p k … pn kn pn+1 kn+1 pn+2( + 1)/2 +1
k ( + 1)/2p k’ p’q
( + 1)/2 ( + 1)/2 +1
( + 1)/2 +1 ( + 1)/2 +1
Complication: nonleaf overflow

63
Nonleaf overflow: Insert key 160 (order n = 3)
100
120 150 180
150 156 179 180 210
160
64
Nonleaf overflow: Insert key 160 (order n = 3)
100
120 150 180
150 156 180 210
150 156 179
160
160 179
65
Nonleaf overflow: Insert key 160 (order n = 3)
100
120 150 180
150 156 180 210
150 156 179
160
160 179
66
Nonleaf overflow: Insert key 160 (order n = 3)
100
120 150 180
150 156 180 210
150 156 179
160
160 179
160
67
Nonleaf overflow: Insert key 160 (order n = 3)
100
120 150 180
150 156 180 210
150 156 179
160
160 179
no room!
160
68
Nonleaf overflow: Insert key 160 (order n = 3)
100
120
150 156 180 210
150 156 179
160
160 179
120 150 180
160
160 180150
69
Nonleaf overflow: Insert key 160 (order n = 3)
100
120
150 156 180 210
150 156 179
160
160 179
160 180150
120 150 180
160
70
Nonleaf overflow: Insert key 160 (order n = 3)
100 150
120
150 156 180 210
150 156 179
160
160 179
160 180150
120 150 180
160
Delete in B+tree
71
Delete in B+tree
Basic idea:
1. Find the leaf L where the record r should be
deleted;
2. If L remains at least half-full after deleting r, then
delete r, and return;
3. Else consider the sibling L’ of L;
3.1 If L’ is more than half-full, then move a record
from L’ to L, and return;
3.2 Else combine L and L’ into a single leaf;
3.3 But now you need to consider the case of deleting
a child from L’s parent … (recursively)
72
q Simple case: the node remains at least half-full after deletion.
q Re-distribute keys among siblings
q Coalesce with a sibling and delete a pointer from its father
73
Delete in B+tree
q Simple case: the node remains at least half-full after deletion.
q Re-distribute keys among siblings
q Coalesce with a sibling and delete a pointer from its father
74
Delete in B+tree
75
order n=3Simple case: Delete key 35
76
order n=3Simple case: Delete key 35
105
10 40
3 5 8 10 20 35
Delete(prt, 35)
40 50
77
order n=3Simple case: Delete key 35
105
10 40
3 5 8 10 20 35
Delete(prt, 35)
40 50
Delete
35
78
order n=3Simple case: Delete key 35
105
10 40
3 5 8 10 20 35
Delete(prt, 35)
40 50
After deletion, the
node still keeps at
least half-full
79
order n=3Simple case: Delete key 35
105
10 40
3 5 8 10 20 40 50
q Simple case: the node remains at least half-full after
deletion.
q Re-distribute keys among siblings
q Coalesce with a sibling and delete a pointer from its father
80
Delete in B+tree
81
Key re-distribution at leaves
p1 k1 … pi ki ---
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht --- p* q*
i = ë(n+1)/2û - 1 t > ë(n+1)/2û
82
p1 k1 … pi ki ---
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht --- p* q*
p1 k1 … pi ki q1 h1 ---
p’ k’ p k p” k”
q2 h2 … qt ht --- p* q*
i = ë(n+1)/2û - 1 t > ë(n+1)/2û
Key re-distribution at leaves
83
p1 k1 … pi ki ---
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht --- p* q*
p1 k1 … pi ki q1 h1 ---
p’ k’ p k p” k”
q2 h2 … qt ht --- p* q*
h2
i = ë(n+1)/2û - 1 t > ë(n+1)/2û
Key re-distribution at leaves
84
order n=3
Key re-distribution at leaves: Delete 5
85
order n=3
105
10 40
3 5 10 20 35
Delete(prt, 5)
40 50
Key re-distribution at leaves: Delete 5
86
105
10 40
3 5 10 20 35
Delete(prt, 5)
40 50
Delete 5
order n=3
Key re-distribution at leaves: Delete 5
87
105
10 40
3 5 10 20 35
Delete(prt, 5)
40 50
order n=3
Key re-distribution at leaves: Delete 5
88
105
10 40
3 5 10 20 35
Delete(prt, 5)
40 50
Less than
half-full !!
order n=3
Key re-distribution at leaves: Delete 5
89
105
10 40
3 5 10 20 35
Delete(prt, 5)
40 50
Look at the sibling,
which is more than
half-full, so we can
redistribute the keysa
order n=3
Key re-distribution at leaves: Delete 5
90
105
10 40
3 5 10 20 35
Delete(prt, 5)
40 50
Look at the sibling,
which is more than
half-full, so we can
redistribute the keys
order n=3
Key re-distribution at leaves: Delete 5
91
105
10 40
3 10 20 35
Delete(prt, 5)
40 50
redistribution
3 10 20 35
order n=3
Key re-distribution at leaves: Delete 5
92
105
10 40
3 10 20 35
Delete(prt, 5)
40 50
redistribution
3 10 20 35
20
order n=3
Key re-distribution at leaves: Delete 5
93
105
10 40
3 10 20 35
Delete(prt, 5)
40 50
redistribution
3 10 20 35
20
20
order n=3
Key re-distribution at leaves: Delete 5
94
105
20 40
3 10 20 35 40 50
order n=3
Key re-distribution at leaves: Delete 5
95
Key re-distribution at nonleaves
96
Key re-distribution at nonleaves
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù
97
Key re-distribution at nonleaves
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù
98
Key re-distribution at nonleaves
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
p1 k1 … pi ki pi+1 q1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù
99
Key re-distribution at nonleaves
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
p1 k1 … pi ki pi+1 q1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
?
i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù
100
Key re-distribution at nonleaves
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
p1 k1 … pi ki pi+1 q1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
?
i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù
101
Key re-distribution at nonleaves
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
p1 k1 … pi ki pi+1 q1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
k’
i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù
102
Key re-distribution at nonleaves
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
p1 k1 … pi ki pi+1 q1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
k’
?
i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù
103
Key re-distribution at nonleaves
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
p1 k1 … pi ki pi+1 q1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
k’
?
i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù
104
Key re-distribution at nonleaves
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
p1 k1 … pi ki pi+1 q1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
k’
h1
i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù
105
Key re-distribution at nonleaves
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 q2 h2 … qt ht qt+1
-
p1 k1 … pi ki pi+1 q1
p’ k’ p k p” k”
q2 h2 … qt ht qt+1 k’
h1
i+1 = é(n+1)/2ù - 1 t+1> é(n+1)/2ù
q Simple case: the node remains at least half-full after
deletion.
q Re-distribute keys among siblings
q Coalesce with a sibling and delete a pointer from its father
106
Delete in B+tree
q Simple case: the node remains at least half-full after
deletion.
q Re-distribute keys among siblings
q Coalesce with a sibling and delete a pointer from its father
107
Delete in B+tree
Observation: when two siblings both are no more than half full,
coalesce them into a single node (which is nearly full)
108
Node Coalescence
109
Leaf Coalescence
110
p1 k1 … pi ki ---
p’ k’ p k p” k”
q1 h1 … qt ht --- p* q*
i = ë(n+1)/2û - 1 t = ë(n+1)/2û
Leaf Coalescence
111
p1 k1 … pi ki ---
p’ k’ p k p” k”
q1 h1 … qt ht --- p* q*
p1 k1 … pi ki
p’ k’ p k p” k”
q1 h1 … qt ht - q* q*
i = ë(n+1)/2û - 1 t = ë(n+1)/2û
Leaf Coalescence
112
p1 k1 … pi ki ---
p’ k’ p k p” k”
q1 h1 … qt ht --- p* q*
p1 k1 … pi ki
p’ k’ p k p” k”
q1 h1 … qt ht - q* q*
i = ë(n+1)/2û - 1 t = ë(n+1)/2û
Leaf Coalescence
113
Leaf coalescence : Delete key 5
114
order n=3
Leaf coalescence : Delete key 5
38
10
3 5 10 20
Delete(prt, 5)
40 50
60 80
60 75 80 90
115
order n=3
Leaf coalescence : Delete key 5
38
10
3 5 10 20
Delete(prt, 5)
40 50
60 80
60 75
Delete 5
80 90
116
order n=3
Leaf coalescence : Delete key 5
38
10
3 5 10 20
Delete(prt, 5)
40 50
60 80
60 75 80 90
117
order n=3
Leaf coalescence : Delete key 5
38
10
3 5 10 20
Delete(prt, 5)
40 50
60 80
60 75
The sibling is just
half-full, so we
should coalesce
80 90
118
order n=3
Leaf coalescence : Delete key 5
38
10
3 10 20 10 20
Delete(prt, 5)
40 50
60 80
60 75
3 5 10 20
Leaf
coalescence
80 90
119
order n=3
Leaf coalescence : Delete key 5
38
10
3 10 20 10 20
Delete(prt, 5)
40 50
60 80
60 75
3 5 10 20
Less than
half-full
Leaf
coalescence
80 90
120
order n=3
Leaf coalescence : Delete key 5
38
10
3 10 20 10 20
Delete(prt, 5)
40 50
60 80
60 75
3 5 10 20
Less than
half-full
Leaf
coalescence
more than
half-full, so
we can re-
distribute
pointers at
nonleaves
80 90
121
order n=3
38
10
3 10 20 10 20
Delete(prt, 5)
40 50
60 80
60 75
3 5 10 20
Less than
half-full
Leaf
coalescence
more than
half-full, so
we can re-
distribute
pointers at
nonleaves
80 90
Leaf coalescence : Delete key 5
122
order n=3
Key re-distribution at Nonleaves
38
10
3 10 20 10 20
Delete(prt, 5)
40 50
60 80
60 75
3 5 10 20
Less than
half-full
Leaf
coalescence
more than
half-full, so
we can re-
distribute
pointers at
nonleaves
80 90
123
order n=3
Key re-distribution at Nonleaves
38
10
3 10 20 10 20
Delete(prt, 5)
40 50
60 80
60 75
3 5 10 20
Less than
half-full
Leaf
coalescence
38
60 more than
half-full, so
we can re-
distribute
pointers at
nonleaves
80 90
124
order n=3
Key re-distribution at Nonleaves
38
10
3 10 20 10 20
Delete(prt, 5)
40 50
60 80
60 75
3 5 10 20
Less than
half-full
Leaf
coalescence
38
60
80
more than
half-full, so
we can re-
distribute
pointers at
nonleaves
80 90
125
order n=3
Key re-distribution at Nonleaves
60
38
3 10 20 40 50
80 80
60 75 80 90
126
Nonleaf Coalescence
127
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 … qt ht qt+1
i+1 = é(n+1)/2ù - 1 t+1= é(n+1)/2ù
Nonleaf Coalescence
128
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 … qt ht qt+1
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 … qt ht qt+1 q1 h1 … qt ht qt+1
i+1 = é(n+1)/2ù - 1 t+1= é(n+1)/2ù
Nonleaf Coalescence
129
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 … qt ht qt+1
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 … qt ht qt+1 q1 h1 … qt ht qt+1 ?
i+1 = é(n+1)/2ù - 1 t+1= é(n+1)/2ù
Nonleaf Coalescence
130
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 … qt ht qt+1
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 … qt ht qt+1 q1 h1 … qt ht qt+1 ?
i+1 = é(n+1)/2ù - 1 t+1= é(n+1)/2ù
Nonleaf Coalescence
131
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 … qt ht qt+1
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 … qt ht qt+1 q1 h1 … qt ht qt+1 k
i+1 = é(n+1)/2ù - 1 t+1= é(n+1)/2ù
Nonleaf Coalescence
132
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 … qt ht qt+1
p1 k1 … pi ki pi+1
p’ k’ p k p” k”
q1 h1 … qt ht qt+1 q1 h1 … qt ht qt+1 k
i+1 = é(n+1)/2ù - 1 t+1= é(n+1)/2ù
Nonleaf Coalescence
133
order n=3
Nonleaf coalescence : Delete key 5
134
order n=3
Nonleaf coalescence : Delete key 5
55
10
Delete(prt, 5)
55 58
60
61 723 5 10 20
135
order n=3
Nonleaf coalescence : Delete key 5
55
10
Delete(prt, 5)
55 58
60
61 723 5 10 20
delete 5
136
order n=3
Nonleaf coalescence : Delete key 5
55
10
Delete(prt, 5)
55 58
60
61 723 5 10 20
137
order n=3
Nonleaf coalescence : Delete key 5
55
10
Delete(prt, 5)
55 58
60
61 723 10 20 10 20
leaf
coalescence
3 5 10 20
138
order n=3
Nonleaf coalescence : Delete key 5
55
10
Delete(prt, 5)
55 58
60
61 723 10 20 10 20
leaf
coalescence
3 5 10 20
less than
half-full
139
order n=3
Nonleaf coalescence : Delete key 5
55
10
Delete(prt, 5)
55 58
60
61 723 10 20 10 20
leaf
coalescence
3 5 10 20
less than
half-full
just half-full,
so we need to
coalesce
140
order n=3
Nonleaf coalescence : Delete key 5
55
55 60
Delete(prt, 5)
55 58
60
61 723 10 20 10 20
leaf
coalescence
3 5 10 20
55 60
nonleaf
coalescence
141
order n=3
Nonleaf coalescence : Delete key 5
55
55 60
Delete(prt, 5)
55 58
60
61 723 10 20 10 20
leaf
coalescence
3 5 10 20
55 60
nonleaf
coalescence new root
142
order n=3
Nonleaf coalescence : Delete key 5
55 60
55 58 61 723 10 20
B+tree deletions in practice
q Often, coalescing is not implemented
Too hard and not worth it!
143
Pseudo Code for Insertion in B+tree
Insert(pt, (p, k), (p', k')); \\ technically, the smallest key kmin in pt is also returned
\\ (p, k) is a pointer-key pair to be inserted into the subtree rooted at pt; p' is a new sibling
\\ of pt, if created, and k' is the smallest key value in p' ;
Case 1. pt = (p1, k1, ..., pi, ki, --, pn+1) is a leaf \\ pn+1 is the sequence pointer
IF (i < n) THEN insert (p, k) into pt; return(p' = Null, k' = 0);
ELSE re-arrange (p1, k1), ..., (pn, kn), and (p, k) into (ρ1, κ1, ..., ρn+1, κn+1);
create a new leaf p''; put (ρr+1, κr+1, …, ρn+1, κn+1, --, pn+1) in p''; \\ r = ë(n+1)/2û
put (ρ1, κ1, ..., ρr, κr, --, p'') in pt; \\ pn+1 and p'' are sequence pointers in p'' and pt.
IF pt is the root THEN create a new root with children pt and p'' (and key κr+1)
ELSE return(p' = p'', k' = κr+1);
Case 2. pt is not a leaf
find a key ki in pt such that ki-1 ≤ k < ki;
Insert(pi , (k, p), (k", p"));
IF (p" = Null) THEN return(k' = 0, p' = Null);
ELSE IF there is room in pt, THEN insert (k", p") into pt; return(k' = 0, p' = Null);
ELSE re-arrange the content in pt and (k", p") into (ρ1, κ1, ..., ρn+1, κn+1, ρn+2);
create a new node p''; put (ρr+1, κr+1, …, ρn+1, κn+1, ρn+2, -- ) in p''; \\ r = é(n+1)/2ù
leave (ρ1, κ1, ..., ρr-1, κr-1, ρr , -- ) in pt;
IF pt is the root THEN create a new root with children pt and p'' (and key κr)
ELSE return(p' = p'', k' = κr ).

144
Pseudo Code for Deletion in a B+tree
Delete(pt, (k,p), belowmin); \\ technically, the smallest key kmin in *pt is also returned
\\ (k,p) is the data record to be deleted from the subtree rooted at pt; belowmin = true if
\\ after deletion, pt has fewer than the required min # of pointers;
Case 1. pt is a leaf
delete (k,p) in pt;
IF pt has at least ë(n+1)/2û data pointers OR pt is the root
THEN return (belowmin = false) ELSE return (belowmin = true);
Case 2. pt is not a leaf
find a key ki in pt such that ki-1 ≤ k < ki;
Delete(pi , (k, p), belowmin');
IF (not belowmin') THEN return(belowmin= false);
ELSE IF pi has an adjacent sibling p' that has more than the required min # of pointers
THEN move one key-pointer pair from p' to pi;
ELSE combine pi with an adjacent sibling of pi into a single node;
IF pt is the root with only one pointer pi
THEN pt = pi; return(belowmin= false);
IF pt has at least é(n+1)/2ù pointers OR pt is the root
THEN return(belowmin= false) ELSE return(belowmin= true);

145
Application of B+ Tree
• Range query 19. Range Search in B+tree
• (a,b)
• What if b is infinite? - to the end of the chain of leaves.
• What if a is –infinite? – start from the first leaf.
• Duplicate keys
• B-tree on multiple attributes
• Scan efficiency
B-tree with Duplicate keys
• A non-leaf node stores the first key of the first node that is not
already present in the previous left sibling.
• If there is no such key, then a null (placeholder) value is stored at
this location.
B-tree with Duplicate keys
• A non-leaf node stores the first key of the first node that is not
already present in the previous left sibling.
• If there is no such key, then a null (placeholder) value is stored at
this location.
B-tree with Duplicate keys
• A non-leaf node stores the first key of the first node that is not
already present in the previous left sibling.
• If there is no such key, then a null (placeholder) value is stored at
this location.
B-tree with Duplicate keys
• A non-leaf node stores the first key of the first node that is not
already present in the previous left sibling.
• If there is no such key, then a null (placeholder) value is stored at
this location.
B-tree with Duplicate keys
• A non-leaf node stores the first key of the first node that is not
already present in the previous left sibling.
• If there is no such key, then a null (placeholder) value is stored at
this location.
Any key search becomes range query.
Index on multiple attributes
• An index on multiple attributes like A,B will sort tuples first by A
and then by B Index on R(X, Y)
X = 'C' & Y = 8
X = 'J' & 3 <= Y <= 8
Search for J,3 - J,8
'C' <= X <= 'E' & Y = 4
Search for C,4 - E,4
'C' <= X <= 'E' & 3 <= Y <= 8
Search for C,3 - E,8
Y <= 5
B-tree indices
• Each node in a B-tree is a disk page (stored on disk)
• Each B-tree is created on one or more key values
• Leaf nodes store one entry for each tuple: each entry has a key
value and a pointer to a tuple (dense index: one leaf node for each
tuple)
Example
• 92,000 tuples in instructors Index on instructors(coursecode,
instructor)
• To index each tuple: 9 (course code) + 16 (instructor) + 20 byte (for
tuple address) 45 bytes per tuple!
• Index page (assume) is 8KB Store: 8000/45 = about 177 tuples per
node
• Each node must have: 177 tuples max (88 tuples min) Assume
each node is 100% full (mostly):
• At leaf level: ceil(92,000/177) = 520 nodes in the leaf level
• The level above: ceil(520/177) = 3 nodes in the next internal level The level
above: ceil(3/177) = 1 node (root)
• Assume each node is 50% full (mostly):
• At leaf level: 1040 nodes At a level above: 12 nodes At a level above: 1
node
Cost of queries/index search
• Number of disk pages read/written
• For any search, find the first leaf node in the search and then
scan leaf nodes left to right following sibling pointers until the end
of the range is reached. This will find all the matching tuples for a
query.
• Depending on the query, these tuples may need to be read from
disk as well.
Example
• R(A,B,C,D,E), Assume: 100,000 tuples 10,000 pages ( 10 tuples
per page)
• Index I1 on R(A): 400 pages at leaf, 3 levels (root+internal+leaf)
• Index I2 on R(B): 400 pages at leaf, 3 levels (root+internal+leaf)
• Index I3 on R(A,B): 1,000 pages at leaf, 3 levels
(root+internal+leaf)
• Index I4 on R(B,A): 1,000 pages at leaf, 3 levels
(root+internal+leaf)
Example
• SELECT A,B,C,D,E FROM R
WHERE 10 <= A and A <= 20 ; (returns: 100 tuples)
Query plan 1 -> Scan the whole relation: Cost = 10,000
Query plan 2: Scan I1 (find tuple pointers that match 10<=A<=20) + Read all
tuples that I find based on the index from relation R
100,000/400 = 250 tuples per node
100 tuples for this query will fit in a single leaf node!
Cost to scan the index = 1 (root) + 1 (internal) + 1 or 2 (leaf) = 4 max
To read the matching 100 tuples: I need to read pages of R
How many pages do I need to read?
Best case: 10 pages (all packed sorted by A) + 4 = 14 pages
Worst case: 100 Cost of this query (max) : 4 + 100 = 104 pages
Example
• Query plan 3: Use index I3:
• 100,000 (tuples of R) / 1,000 = 100 tuples per node
• Cost of index search = 4 (max)
• Cost of searching the relation = 100 Total cost = 104 (max)
• Query plan 4: Use index I4: Index I4 on R(B,A): 1,000 pages at leaf,
3 levels (root+internal+leaf)
• Scan all leaf, cost of index search = 1 (root) + 1 (internal) + 1,000 (all leaf)
= 1002
• Scan the tuples = 100
• Cost = 1,102
Example
• SELECT B FROM R WHERE 10 <= A and A <= 20 ; (returns: 100
tuples)
• Query plans:
• 1. Scan all of R = 10,000
• 2. Use I1 on R(A) = 104 (max)
• 3. Use I3 on R(A,B) = 4 (index only search)
• 4. Use I4 on R(B,A) = 1,002 (index only search)

学霸联盟