COMP9311-无代写
时间:2023-08-10
Relational
Database Design
COMP9311 2023T2; Week 7.1
By Zhengyi Yang, UNSW
1
Notice
(Welcome back from Quiet Week)
Project 1: Due on the July 15th
Clarification on Project Late Penalties
2
From The Previous Lecture
Redundancy/Anomalies can be removed from relation designs by
decomposing them until they are in a normal form.
3
Decomposition
4
Decomposition
Example: R={A, B, C, D, E}
R1 = {A, B}
R2 = {A, C}
R3 = {C, D, E}
A naive decomposition: each relation has only one attribute?
5
On Decompositions
Important: it is improper to assess the quality of your decompositions
by independently checking to see if the resulting relations are in a
higher form.
A good decomposition should also have the following two properties.
1. the dependency preservation property
2. the nonadditive (or lossless) join property
Together, they gives us desirable decompositions
6
Dependency Preserving
A decomposition D={R1, …, Rn} of R is dependency-preserving
wrt a set F of FDs if:
(F1 ∪ … ∪ Fn)
+ = F+,
where Fi means the projection of F onto Ri.
7
Projection of F
8
Projection of F Example
9
Projection of F Example
10
Dependency Preservation Example (1)
11
Dependency Preservation Example (1)
12
Dependency Preservation Example (2)
13
Dependency Preservation Example (2)
14
Dependency Preservation Example (3)
Third Example:
R = (A, B, C, D, E, G, M)
Consider F = {A -> BC, D -> EG, M -> A , M -> C, C -> D, M -> D}
Decomposition into R1 and R2
R1= ( A, B, C, M) and R2 = (C, D, E, G)
F1 = {A -> BC, M -> A, M -> C}, F2 = {D -> EG, C -> D}
(Question: Is this dependency preserving?)
15
Dependency Preservation Example (3)
Third Example:
R = (A, B, C, D, E, G, M)
Consider F = {A -> BC, D -> EG, M -> A , M -> C, C -> D, M -> D}
Decomposition into R1 and R2
R1= ( A, B, C, M) and R2 = (C, D, E, G)
F1 = {A -> BC, M -> A, M -> C}, F2 = {D -> EG, C -> D}
(Question: Is this dependency preserving?)
Once again F1 U F2 is not the same as F
We can verified that M -> D is inferred by F1 and F2.
Thus, F+ = (F1 U F2)+ (they are equivalent)
Hence, R1 and R2 are dependency preserving regarding F.
16
Lossless Join Decomposition
17
Lossy Join Decomposition(cont)
18
STUDENT_ADVISOR
Name Department Advisor
Jones Comp Sci Smith
Ng Chemistry Turner
Martin Physics Bosky
Dulles Decision Sci Hall
Duke Mathematics James
James Comp Sci Clark
Evan Comp Sci Smith
Baxter English Bronte
A Lossy Join Decomposition(cont)
STUDENT_DEPARTMENT
Name Department
Jones Comp Sci
Ng Chemistry
Martin Physics
Duke Mathematics
Dulles Decision Sci
James Comp Sci
Evan Comp Sci
Baxter English
19
DEPARTMENT_ADVISOR
Department Advisor
Comp Sci Smith
Chemistry Turner
Physics Bosky
Decision Sci Hall
Mathematics James
Comp Sci Clark
English Bronte
STUDENT_ADVISOR
Name Department Advisor
Jones Comp Sci Smith
Ng Chemistry Turner
Martin Physics Bosky
Dulles Decision Sci Hall
Duke Mathematics James
James Comp Sci Clark
Evan Comp Sci Smith
Baxter English Bronte
A Lossy Join Decomposition(cont)
20
Name Department Advisor
Jones Comp Sci Smith
Jones Comp Sci Clark*
Ng Chemistry Turner
Martin Physics Bosky
Dulles Decision Sci Hall
Duke Mathematics James
James Comp Sci Smith*
James Comp Sci Clark
Evan Comp Sci Smith
Evan Comp Sci Clark*
Baxter English Bronte
This is not the same as the
original relation (the tuples
marked with * have been
added). Thus the decomposition
is lossy.
There is a simple test to see if a decomposition is lossy by check if this
dependency exists.
Test: A decomposition of R into R1 and R2 is lossless join iff the common
attributes R1 ∩ R2 form a superkey for either R1 OR R2.
This only works for binary decompositions.
A Lossy Join Decomposition(cont)
21
Lossless Join Property
Note: the above test only applies for simple binary decompositions
We restate the theorem: The decomposition {R1,R2} of R is lossless iff
the common attributes R1∩ R2 form a superkey for either R1 or R2.
Exercise: Given R(A,B,C) and F = {A→ B}.
Is the decomposition into R1(A,B) and R2(A,C) lossless?
22
Lossless Join Property
Note: the above test only applies for simple binary decompositions
We restate the theorem: The decomposition {R1,R2} of R is lossless iff
the common attributes R1∩ R2 form a superkey for either R1 or R2.
Exercise: Given R(A,B,C) and F = {A→ B}.
Is the decomposition into R1(A,B) and R2(A,C) lossless?
Yes
23
Lossless Join Property
Note:
◦ The word loss in lossless refers to loss of information
◦ The word loss in lossless does not refer to a loss of tuples
In fact…
◦ A decomposition without the lossless join property leads to
additional spurious tuples after NATURAL JOIN operations
◦ These additional tuples contribute to erroneous or invalid
information
◦ A decomposition with a lossless join property will not lead to
additional tuples; Therefore, it is also known as non-additive join.
24
Test Lossless Join property
This previous test works on binary decompositions, below is the general
solution to testing lossless join property
Algorithm TEST_LJ
1. Create a matrix S, each element si,j ∈S corresponds the relation
Ri and the attribute Aj, such that: sj,i = a if Ai ∈ Rj , otherwise sj,i =
b.
2. Repeat the following process until (1) S has no change OR (2) one
row is made up entirely of “a” symbols.
i. For each X→ Y , choose the rows where the elements
corresponding to X take the value a.
ii. In those chosen rows (must be at least two rows), the elements
corresponding to Y also take the value a if one of the chosen
rows take the value a on Y .
Verdict: Decomposition is lossless if one row is entirely made up by “a”
values.
25
Testing lossless join property(cont)
Example 1: R = (A,B,C,D),
F = {A→B, A →C, C → D}.
Let R1 = (A,B,C), R2 = (C,D).
Note: rows 1 and 2 of S agree on {C}, which
is the left-hand side of C→D. Therefore,
change the D value on rows 1 to a, matching
the value from row 2.
Now row 1 is entirely a’s, so the
decomposition is lossless.
26
CHEAT SHEET: Algorithm TEST_LJ
1. Create a matrix S, each element si,j∈S
corresponds the relation Ri and the
attribute Aj, such that: sj,i = a if Ai∈ Rj,
otherwise sj,i = b.
2. Repeat the following process till S has no
change or one row is made up entirely of
“a” symbols.
1. For each X→ Y , choose the rows
where the elements corresponding to X
take the value a.
2. In those chosen rows (must be at least
two rows), the elements corresponding
to Y also take the value a if one of the
chosen rows take the value a on Y .
A B C D
R1 a a a b
R2 b b a a
Testing lossless join property(cont)
Example 2: R = (A,B,C,D,E),
F = {AB →CD ,A → E, C → D}.
Let R1 = (A,B,C),
R2 = (B,C,D) and
R3 = (C,D,E).
27
A B C D E
R1 a a a b b
R2 b a a a b
R3 b b a a a
X
a
Not lossless join
CHEAT SHEET: Algorithm TEST_LJ
1. Create a matrix S, each element si,j∈S
corresponds the relation Ri and the
attribute Aj, such that: sj,i = a if Ai∈ Rj,
otherwise sj,i = b.
2. Repeat the following process till S has no
change or one row is made up entirely of
“a” symbols.
1. For each X→ Y , choose the rows
where the elements corresponding to X
take the value a.
2. In those chosen rows (must be at least
two rows), the elements corresponding
to Y also take the value a if one of the
chosen rows take the value a on Y .
Testing lossless join property(cont)
Example 3: R = (A,B,C,D,E,G),
F = {C → DE, A → B, AB → G}.
Let R1 = (A,B), R2 = (C,D,E) and R3 = (A,C,G).
28
A B C D E G
R1 a a b b b b
R2 b b a a a b
R3 a b a a a aX
a
A B C D E G
R1 a a b b b b
R2 b b a a a b
R3 a b a b b aX
a
X
a
CHEAT SHEET: Algorithm TEST_LJ
1. Create a matrix S, each element si,j∈S
corresponds the relation Ri and the
attribute Aj, such that: sj,i = a if Ai∈ Rj,
otherwise sj,i = b.
2. Repeat the following process till S has no
change or one row is made up entirely of
“a” symbols.
1. For each X→ Y , choose the rows
where the elements corresponding to X
take the value a.
2. In those chosen rows (must be at least
two rows), the elements corresponding
to Y also take the value a if one of the
chosen rows take the value a on Y .
Checkpoint
Previous:
1. The test for lossless join property
2. The dependency preservation property
Next:
1. The method to decompose to 3NF and BCNF
2. Minimal Cover and Equivalence
29
Break 15min
30
Testing for BCNF
Testing of a relation schema R to see if it satisfies BCNF can be simplified
in some cases:
o To check if a nontrivial dependency α → β causes a violation of BCNF,
compute α+ (the attribute closure of α), and verify that it includes all
attributes of R; that is, it is a superkey for R.
o To check if a relation schema R is in BCNF, it suffices to check only
the dependencies in the given set F for violation of BCNF, rather than
check all dependencies in F +.
31
Testing for BCNF
NOTE: We cannot use F to test relations Ri (decomposed from R) for
violation of BCNF. It may not suffice.
Consider R(A, B, C, D, E) with F = {A -> B, BC -> D}.
Suppose R is decomposed into R1 = (A, B) and R2 = (A, C, D, E).
Neither of the dependencies in F contains only attributes from R2.
So R2 is in BCNF? No, AC -> D is in F+.
Example above : A X →Y violating BCNF is not always in F.
It passing with respect to the projection of F on Ri
32
Testing Decomposition for BCNF
An alternative BCNF test is sometimes easier than computing every
dependency in F+.
To check if a relation schema Ri in a decomposition of R is truly in BCNF,
we apply this test:
For each subset X of Ri , computer X
+.
o X →(X+|Ri − X) violates BCNF, if X
+|Ri − X ≠ ∅ and Ri − X
+
≠ ∅ .
o This will show if Ri violates BCNF.
Explanation:
o X+|Ri − X = ∅ means each F.D with X as the left-hand side is trivial;
o Ri − X
+ = ∅ means X is a superkey of Ri
33
Lossless Decomposition into BCNF
Algorithm TO_BCNF
◦ D := {R1,R2, ...Rn}
◦ While ( there exists a Ri ∈ D and Ri is not in BCNF ) Do
1 . find a X →Y in Ri that violates BCNF;
2. replace Ri in D by ( Ri − Y ) and (X ∪ Y );
34
Lossless Decomposition into BCNF
Example:
Find a BCNF decomposition of the relation scheme below:
SHIPPING (Ship , Capacity , Date , Cargo , Value)
F consists of:
Ship → Capacity
{Ship , Date} → Cargo
{Cargo , Capacity} → Value
We know this relation is not in BCNF
35
Algorithm TO_BCNF
D := {R1,R2, ...Rn}
While ( there exists a Ri ∈ D
and Ri is not in BCNF ) Do
1 . find a X →Y in Ri that
violates BCNF;
2. replace Ri in D by ( Ri − Y )
and (X ∪ Y );
Lossless Decomposition into BCNF (V1)
From Ship→ Capacity, we decompose SHIPPING into R1A and R 2A
R1A(Ship , Date , Cargo , Value) with Key: {Ship,Date}
A nontrivial FD in F+ violates BCNF: {Ship , Cargo} → Value
and
R2A(Ship , Capacity) with Key: {Ship}
Only one nontrivial FD in F+: Ship → Capacity
36
SHIPPING (Ship , Capacity , Date , Cargo , Value)
F consists of: Ship → Capacity, {Ship , Date}→ Cargo, {Cargo , Capacity}→ Value
Lossless Decomposition into BCNF (V1)
R1 is not in BCNF so we must decompose it further into R11A and R12A
R11A (Ship , Date , Cargo) with Key: {Ship,Date}
Only one nontrivial FD in F+ with single attribute on the right side: {Ship , Date} →Cargo
and
R12A (Ship , Cargo , Value) with Key: {Ship,Cargo}
Only one nontrivial FD in F+ with single attribute on the right side: {Ship,Cargo} → Value
This is in BCNF, and the decomposition is lossless but not dependency
preserving (the FD {Capacity, Cargo} → Value) has been lost.
37
SHIPPING (Ship , Capacity , Date , Cargo , Value)
F consists of: Ship → Capacity, {Ship , Date}→ Cargo, {Cargo , Capacity}→ Value
Lossless Decomposition into BCNF (V2)
Or we could have chosen {Cargo , Capacity} →Value, which would give us:
R1B (Ship , Capacity , Date , Cargo) with Key: {Ship,Date}
A nontrivial FD in F+ violates BCNF: Ship → Capacity
and
R2B (Cargo , Capacity , Value) with Key: {Cargo,Capacity}
Only one nontrivial FD in F+ with single attribute on the right side: {Cargo , Capacity} → Value
Once again, R1B is not in BCNF so we must decompose it further…
38
SHIPPING (Ship , Capacity , Date , Cargo , Value)
F consists of: Ship → Capacity, {Ship , Date}→ Cargo, {Cargo , Capacity}→ Value
Lossless Decomposition into BCNF (V2)
R1 is not in BCNF so we must decompose it further into R11B and R12B
R11B (Ship , Date , Cargo) with Key: {Ship,Date}
Only one nontrivial FD in F+ with single attribute on the right side: {Ship , Date} → Cargo
and
R12B (Ship , Capacity) with Key: {Ship}
Only one nontrivial FD in F+: Ship → Capacity
This is in BCNF, and the decomposition is both lossless and dependency
preserving.
39
SHIPPING (Ship , Capacity , Date , Cargo , Value)
F consists of: Ship → Capacity {Ship , Date}→ Cargo, {Cargo , Capacity}→ Value
Lossless Decomposition into BCNF
With this algorithm from the previous slide…
We get a decomposition D of R that does the following:
◦ May not preserves dependencies
◦ Has the lossless join property
◦ Is such that each resulting relation schema in the decomposition is in BCNF
40
41
Since a X →Y violating BCNF is not always in F, the main difficulty is to verify if Ri is in
BCNF; see the approach below:
1. For each subset X of Ri, computer X
+.
2. X →(X+|Ri − X) violates BCNF, if X
+|Ri − X ≠ ∅ and Ri − X
+ ≠ ∅ .
Here, X+|Ri − X = ∅ means that each F.D with X as the left hand side is trivial;
Ri − X
+ = ∅ means X is a superkey of Ri
Lossless decomposition into BCNF
Algorithm TO_BCNF
D := {R1,R2, ...Rn}
While ∃ a Ri ∈ D and Ri is not in BCNF Do
{ find a X →Y in Ri that violates BCNF; replace Ri in D by (Ri − Y ) and (X
∪ Y ); }
Practice
F = { A→B, A→C, A→D, C→E, E→D, C→G },
R1 = (C, D, E, G), R2 = (A, B, C, D)
42
Practice
F = { A→B, A→C, A→D, C→E, E→D, C→G },
R1 = (C, D, E, G), R2 = (A, B, C, D)
Answer:
R11 = (C, E, G), R12 = (E, D) because of E -> D
R21 = (A, B, C), R22 = (C, D) because of C -> D
43
Lossless and dependency-preserving
decomposition into 3NF
A lossless and dependency-preserving decomposition into 3NF is always
possible.
More definitions regarding FD’s are needed.
44
Equivalence
Definition (equivalence): Two sets of functional dependencies E and
F are equivalent if E+ = F+.
Equivalence can also be understood via cover defined as follows
Definition (cover): A set of functional dependencies F is said to cover
another set of functional dependencies E if every FD in E is also in
F+; that is, if every dependency in E can be inferred from F;
alternatively, we can say that E is covered by F.
Explanation (equivalence): Therefore, equivalence means that every
FD in E can be inferred from F, and every FD in F can be inferred
from E; that is, E is equivalent to F if both the conditions—E covers F
AND F covers E—hold
45
Minimal Cover
Definition (equivalence): Two sets of functional dependencies E
and F are equivalent if E+ = F+.
Definition. A minimal cover Fmin of a set of functional dependencies E is a minimal set of dependencies (in the standard
canonical form and without redundancy) that is equivalent to E.
Property: If any dependency from the set F is removed; this
property is lost F
A minimal cover for F is a minimal set of FD’s Fmin such that F
+ = F+min.
46
Minimal Cover
A set F of FD’s is minimal if
1. Every FD X→ Y in F is simple: Y consists of a single attribute,
2. Every FD X→ A in F is left-reduced: there is no proper subset Y ⊂ X
such that X → A can be replaced with Y→A.
3. No FD in F can be removed; that is, there is no FD X→A in F such that
(F − {X → A})+ = F+.
47
Prereq. for Algorithm (1)
(Condition one)
Algorithm Reduce_right
◦ INPUT: F.
◦ OUTPUT: right side reduced F’.
◦ For each FD X→ Y ∈ F where Y = {A1,A2, ...,Ak}, we use all X →{Ai} (for 1≤
i ≤ k) to replace X→ Y .
48
Prereq. for Algorithm (2)
(Condition two)
Algorithm Reduce_left
◦ INPUT: right side reduced F.
◦ OUTPUT: right and left side reduced F’.
◦ For each X → {A} ∈ F where X = {Ai : 1 ≤ i ≤ k}, do the following. For i = 1
to k, replace X with X − {Ai} if A∈(X − {Ai})
+.
49
Prereq for Algorithm (3)
(Condition three)
Algorithm Reduce_redundancy
◦ INPUT: right and left side reduced F.
◦ OUTPUT: a minimum cover F’ of F.
◦ For each FD X → {A} ∈ F, remove it from F if: A ∈ X+ with respect to F −
{X →{A}}.
50
Algorithm for Minimal Cover
Algorithm Min_Cover
Input: a set F of functional dependencies.
Step 1: Reduce right side.
Apply Algorithm Reduce Right to F.
Step 2: Reduce left side.
Apply Algorithm Reduce Left to the output of Step 1.
Step 3: Remove redundant FDs.
Apply Algorithm Remove_redundency to the output of Step 2.
51
Computing a Minimal Cover (Step 1)
Step 1: Reduce Right: For each FD X→ Y ∈ F where Y = {A1,A2, ...,Ak}, we use all X →{Ai} (for 1≤ i ≤ k) to replace X→ Y .
Practice:
R = (A, B, C, D, E, G)
F = {A ->BCD, B -> CDE, AC -> E}
At the end of step 1 we have : F’ = {A -> B, A -> C, A -> D, B -> C, B -> D, B ->
E, AC -> E}
52
Computing a Minimal Cover (Step 2)
Step 2: Reduce Left: For each X → {A} ∈ F where X = {Ai : 1 ≤ i ≤ k},
do the following. For i = 1 to k, replace X with X − {Ai} if A∈(X −
{Ai})
+.
From Step 1, we had: F’ = {A -> B, A -> C, A -> D, B -> C, B -> D, B -> E, AC -> E}
AC -> E
C+ = {C}; thus C -> E is not inferred by F’.
Hence, AC -> E cannot be replaced by C -> E.
A+ = {A, B, C, D, E}; thus, A -> E is inferred by F’.
Hence, AC -> E can be replaced by A -> E.
We now have F’’ = {A -> B, A -> C, A -> D, A -> E, B -> C, B -> D, B -> E}
53
Computing a Minimal Cover (Step 3)
Step 3: Reduce_redundancy: For each FD X → {A} ∈ F, remove it from F if:
A ∈ X+ with respect to F − {X →{A}}.
From Step 2, we had: F’’ = {A -> B, A -> C, A -> D, A -> E, B -> C, B -> D, B -> E}
A+|F’’ – {A -> B} = {A, C, D, E}; thus A -> B is not inferred by F’’ – {A -> B}.
That is, A -> B is not redundant.
A+|F’’ – {A -> C} = {A, B, C, D, E}; thus, A -> C is redundant.
Thus, we can remove A -> C from F’’ to obtain F’’’.
We find that we can remove A -> D and A -> E but not the others.
Thus, Fmin={A -> B, B -> C, B -> D, B -> E}.
54
A Note on Finding Minimal Cover
There can be more than one possible minimum cover.
We can always find at least one minimal cover F for any set of
dependencies E using this algorithm.
55
3NF Decomposition Algorithm
Algorithm 3NF decomposition
1. Find a minimal cover G for F.
2. For each left-hand-side X of a functional dependency that appears in G,
create a relation schema in D with attributes {X ∪ {A1} ∪ {A2} ... ∪ {Ak}
}, where X -> A1, X -> A2, ..., X -> Ak are the only dependencies in G
with X as left-hand-side (X is the key to this relation).
3. If none of the relation schemas in D contains a key of R, then create
one more relation schema in D that contains attributes that form a key of
R.
4. Eliminate redundant relations from the resulting set of relations in the
relational database schema. A relation R is considered redundant if R is
a projection of another relation S in the schema; alternately, R is
subsumed by S.
56
3NF Decomposition Algorithm
With this algorithm from the previous slide…
We get a decomposition D of R that does the following:
◦ Preserves dependencies
◦ Has the nonadditive join property
◦ Is such that each resulting relation schema in the decomposition is in 3NF
57
3NF Decomposition Algorithm
Example ONE:
R = (A, B, C, D, E, G)
Fmin={A->B, B->C, B->D, B->E}.
Candidate key: (A, G)
R1 = (A, B), R2 = (B, C, D, E)
R3 = (A, G)
58
3NF Decomposition Algorithm
Example TWO:
Following from the SHIPPING relation. The functional dependencies already
form a canonical cover.
• From Ship→Capacity, derive R1(Ship,Capacity),
• From {Ship,Date} → Cargo, derive
R2(Ship , Date , Cargo),
• From {Capacity,Cargo} → Value, derive
R3(Capacity , Cargo , Value).
• There are no attributes not yet included and the original key {Ship,Date}
is included in R2.
59
SHIPPING (Ship , Capacity , Date , Cargo , Value)
F consists of: Ship → Capacity, {Ship , Date}→ Cargo, {Cargo , Capacity}→ Value
3NF Decomposition Algorithm
Example THREE: Apply the algorithm to the LOTS example given earlier.
One possible minimal cover is
{ Property_Id→Lot_No,
Property_Id → Area, {City,Lot_No} → Property_Id,
Area → Price, Area → City, City → Tax_Rate }.
This gives the decomposition:
R1 (Property_Id , Lot_No , Area)
R2 (City , Lot_No , Property_Id)
R3 (Area , Price , City)
R4 (City , Tax_Rate)
60
Summary
1. Data redundancies are undesirable as they create the potential for
update anomalies.
2. One way to remove such redundancies is to normalize a design, guided
by FD’s.
3. BCNF removes all redundancies due to FDs, but a dependency preserving
decomposition cannot always be found.
4. A dependency preserving, lossless decomposition into 3NF can always be
found, but some redundancies may remain.
5. Even where a dependency preserving, lossless decomposition that
removes all redundancies can be found, it may not be possible, for
efficiency reasons, to remove all redundancies.
61
Learning Outcome
● Checking for important decomposition properties
○ Checking for the dependency preserving property
○ Checking for the lossless join property
● Lossless decomposition into BCNF algorithm
● Lossless and Dependency Preserving 3NF decomposition algorithm