CS3402 1-无代写
时间:2023-04-23
CS3402 1
CS3402: Lecture 4
Normal Forms
(Sem B, 2022-2023)
CS3402 2
Problems caused by redundancy
 Redundant storage
 Potential inconsistency
 Insertion anomaly: E.g., to insert a new tuple for an employee
who works in department number 5, we must enter all the
attribute values of department 5 correctly
CS3402 3
Problems caused by redundancy
 Deletion anomaly: E.g., If we delete an employee tuple that
happens to represent the last employee working for a particular
department, the information concerning that department is lost
inadvertently from the database.
CS3402 4
Problems caused by redundancy
 Update anomaly: E.g., If we change the value of one of the
attributes of a particular department—say, the manager of
department 5—we must update the tuples of all employees who
work in that department; otherwise, the database will become
inconsistent.
CS3402 5
Problems caused by redundancy
 Redundancy must be eliminated in a well guided fashion
 How?
Normalization based on functionality dependency
CS3402 6
Normalization
 Normalization
Proposed by Codd (1972)
 take a relation schema through a series of tests based on
functional dependency and primary keys to certify whether it
satisfies a certain normal form
Decompose a relation into several related relation to achieve:
 Minimizing redundancy
 Minimizing the insertion, deletion and update anomalies
CS3402 7
Relational Database Design
Normalization theory
- based on functional dependencies
* universe of relations
* 1st Normal Form (1NF)
* 2NF
* 3NF
* BCNF
* 4NF
* 5NF
1NF
2NF
3NF
BCNF
4NF
5NF
CS3402 8
Relational Database Design
 Issue: Which attributes should or should not be put in the same
table?
 Criteria: Avoid redundancies that may cause problems that we
discussed.
 If a relation is in a certain normal form, it is known that certain kinds
of problems can be avoided / minimized.
CS3402 9
Normalization
 Normalization requires two properties:
Non-additive or lossless join
 Decomposition is reversible and no information is lost
 No spurious tuples (tuples that should not exist) should be
generated by doing a natural-join of any relations
(Extremely important)
Preservation of the functional dependencies
 Ensure each functional dependency is represented in some
individual relation or could be inferred from other
dependencies after decomposition.
CS3402 10
First Normal Form with Primary Key
 First normal form (1NF)
 Disallow multi-values attributes, composite attributes and their
combination
 The domain of an attribute must be atomic (simple and
indivisible) values
Q. Is this relational schema in 1NF?
CS3402 11
CS3402 12
Nested relations
First normal form also disallows multivalued attributes that are
themselves composite. These are called nested relations because
each tuple can have a relation within it.
E.g., Ssn is the primary key of the EMP_PROJ relation in Figure (a)
Pnumber is the partial key of the nested relation; that is, within each
tuple, the nested relation must have unique values of Pnumber.
(a) Schema of the EMP_PROJ
relation with a nested relation
attribute PROJS.
(b) Sample extension of the
EMP_PROJ relation showing
nested relations within each
tuple.
CS3402 13
Nested relations
To normalize this into 1NF, we:
1. Remove the nested relation attributes into a new relation
2. Propagate the primary key into it;
the primary key of the new relation will combine the partial key with the
primary key of the original relation.
(c) Decomposition of EMP_PROJ into relations EMP_PROJ1 and
EMP_PROJ2 by propagating the primary key
CS3402 14
Problems with 1NF
 Problems with 1NF
 Insertion Anomalies
Deletion Anomalies
Update Anomalies
Q: Is it in 1NF?
What are the problems?
CS3402 15
Functional Dependency (Recap)
 Functional dependency is a constraint between two sets of
attributes from the database
 [Definition] A functional dependency denotes by X → Y, between
two sets of attributes X and Y that are subsets of a relation schema
R specifies a constraint on the possible tuples that can form a
relation state r of R
 The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X],
they must also have t1[Y] = t2[Y]
 i.e., The values of the X component of a tuple uniquely (or functionally)
determine the values of the Y component.
 E.g., In DEPARTMENT, Dnumber and Dname
 If you know the department number, you know the department
name, so we have Dnumber → Dname
CS3402 16
Functional Dependency: Full vs Partial
 A FD X→ Y is a full functional dependency if removal of any
attribute A from X means that the dependency does not hold
anymore
 E.g., {Ssn, Pnumber} → Hours is a full dependency
 A FD X→ Y is a partial functional dependency if some attributes A
belonging to X can be removed from X and the dependency still
holds
 E.g., {Ssn, Pnumber} → Ename is a partial dependency
CS3402 17
Attributes: Prime vs Nonprime
 An attribute of relation schema R is called a prime attribute of R if it
is a member of some candidate key of R.
 Otherwise, it is nonprime.
 Example: Consider the CAR relation schema:
 CAR(State, Reg#, SerialNo, Make, Model, Year)
 Suppose CAR has two candidate keys:
 Key1 = {State, Reg#}
 Key2 = {SerialNo}
 Q. Which attributes are nonprime attributes?
CS3402 18
2NF with Primary Key
 [Definition] A relation R is in 2NF if every nonprime attributes A in
R is fully functional dependent on the primary key of R.
 Thus, if the primary key contains a single attribute, the test
need not be applied at all.
 Q.Is the EMP_PROJ relation in 2NF?
CS3402 19
2NF Normalization
 If a relation schema is not in 2NF, it can be 2NF normalized into a
number of 2NF relations in which nonprime attributes are
associated only with the part of the primary key on which they are
fully functionally dependent.
CS3402 20
Problems with 2NF
 Problems with 2NF
 Insertion Anomalies
Deletion Anomalies
Update Anomalies
Q: Is it in 2NF?
What are the problems?
CS3402 21
Transitive Dependency
 3NF is based on the concept of transitive dependency
 [Definition] A functional dependency X → Y in a relation R is
transitive dependency if
 there exists a set of attributes Z in R that is neither a candidate
key nor a subset of any key of R AND
both X → Z and Z → Y hold.
 E.g., The dependency Ssn→ Dmgr_ssn is transitive through
Dnumber
CS3402 22
3NF with Primary Key
 [Codd’s original definition] A relation schema R is in 3NF if it
satisfies 2NF and no non-prime attribute of R is transitively
dependent on the primary key.
 Q. Is EMP_DEPT in 3NF?
CS3402 23
3NF Normalization
 To normalize relation schema R into 3NF, R should be
decomposed to break the transitive dependency.
CS3402 24
Normal Forms Defined Informally
 1st normal form
All attributes are simple (no composite or multivalued attributes)
 2nd normal form
All non-prime attributes depend on the whole key (no partial
dependency)
 3rd normal form
All non-prime attributes only depend on the key (no transitive
dependency)
 In general, 3NF is desirable and powerful enough
 But, still there are cases where a stronger normal form is needed
CS3402 25
Test and Remedy for Normalization
CS3402 26
General Definitions of 2NF and 3NF
 The normalization procedure described is useful for analysis in
practical situations where primary keys have already been defined.
 These definitions, however, do not take other candidate keys of a relation, if any,
into account.
 Here, we give the more general definitions of 2NF and 3NF that
take all candidate keys of a relation into account.
 Notice that this does not affect the definition of 1NF since it is
independent of keys and functional dependencies.
 Partial and full functional dependencies and transitive
dependencies will now be considered with respect to all candidate
keys of a relation.
CS3402 27
General Definition of 2NF
 Alternative definition: A relation schema R is in 2NF is every
nonprime attribute A in R is fully dependent on every key of R.
CS3402 28
Example
 LOTS describes parcels of land for sale in various counties of a
state
 Two candidate keys: Property_id# and {County_name, Lot#}
 Property_id# is chosen as the primary key
 Q. Is LOTS in 2NF?
 A. LOTS violates 2NF due to FD3 as Tax_rate is partially dependent
on the candidate key {County_name, Lot#}
 County_name → Tax-rate (partial dependent)
Q&A
CS3402 29
Example (cont’d)
 Decomposing into the 2NF relations LOTS1 and LOTS2.
CS3402 30
General Definition of 3NF
 This general definition can be applied directly to test whether a
relation schema is in 3NF; it does not have to go through 2NF first.
 In other words, if a relation passes the general 3NF test, then it
automatically passes the 2NF test.
CS3402 31
General Definition of 3NF
 Two candidate keys: Property_id# and {County_name, Lot#}
 Q. Are LOTS1 and LOTS2 in 3NF?
LOTS2 is in 3NF
LOTS1 violates 3NF:
FD4 in LOTS1 violates 3NF because Area is not a superkey
and Price is not a prime attribute in LOTS1
Q&A
CS3402 32
Example (cont’d)
 Decomposing LOTS1 into the 3NF relations LOTS1A and LOTS1B.
CS3402 33
Example (cont’d)
 Progressive normalization of LOTS into a 3NF design..
CS3402 34
Boyce-Codd Normal Form
 BCNF was proposed as a simpler form of 3NF, but it was found to
be stricter than 3NF
 Every relation in BCNF is also in 3NF
BUT relation in 3NF is not necessarily in BCNF
 Difference with 3NF:
Condition which allows A to be prime is absent from BCNF
CS3402 35
Boyce-Codd Normal Form
 Suppose we have thousands of lots in the relation but the lots are
from only two counties: A and B
 Lot sizes in A are only 0.5, 0.6, 0.7, 0.8, 0.9 acres whereas lots in B
are 1.1, 1.2, …, 1.9
 Thus, we further have Area → County_name (FD 5) in LOTS1A
CS3402 36
Boyce-Codd Normal Form
 Recall that there are two candidate keys: Property_id# and
{County_name, Lot#}
 Q. Does LOTS1A satisfy 3NF? How about BCNF?
 FD5 satisfies 3NF in LOTS1A because County_name is a prime
attribute
 FD5 violates BCNF in LOTS1A because Area is not a superkey of
LOTS1A
Q&A
CS3402 37
BCNF Normalization
 This decomposition loses the functional dependency FD2 because
its attributes no longer coexist in the same relation after
decomposition.
CS3402 38
Summary of 3NF and BCNF
 In practice, most relation schemas that are in 3NF are also in
BCNF.
 Only if there exists some f.d. X → A that holds in a relation schema
R with X not being a superkey and A being a prime attribute will R
be in 3NF but not in BCNF.
 Ideally, relational database design should strive to achieve BCNF or
3NF for every relation schema.
 Achieving the normalization status of just 1NF or 2NF is not
considered adequate, since both were developed historically to be
intermediate normal forms as stepping stones to 3NF and BCNF.
CS3402 39
Boyce-Codd Normal Form
 Some notes on BCNF:
BCNF is stronger than 3NF
BCNF is useful when a relation has:
multiple candidate keys, and
these candidate keys are composite ones, and
they overlap on some common attributes
BCNF reduces to 3NF if the above conditions do not apply
CS3402 40
References
 7e
 Ch. 14
essay、essay代写