Shaokun Fan
Assistant Professor in BIS
Oregon State University
BA 574 – Data Management
Lecture 6: Normalization
2• Definition of normalization
• Database anomalies
• Normalization rules
• Denormalization
Agenda
3• A process for evaluating and correcting table structures to minimize
data redundancies, thereby reducing the likelihood of data anomalies.
(Coronel and Morris 12e, p 202)
▫ Structure data to make it more useful
▫ Organize data in to a set of lists (tables)
▫ Pay close attention to redundancy (storing the same data twice)
▫ Decide if one structure is better than another (process for evaluating)
Normalization
4• Used while designing a new database structure
▫ Analyzes the relationship among the attributes within each entity
▫ Determines if the structure can be improved
• Improves the existing data structure and creates an appropriate
database design
When to use normalization?
5• Update Anomaly: the same bit of data has to be updated in multiple
places
• Insert Anomaly: there is no place to nicely save important bits of data
• Delete Anomaly: if we delete some obsolete content we wind up
deleting something important
Database Anomalies
6Example of Database Anomalies
7• Update anomalies. Modifying the JOB_CLASS for employee number 105
requires many potential alterations, one for each EMP_NUM = 105.
• Insertion anomalies. Can you add an employee who is new (not
assigned to a project)?
• Deletion anomalies. Deleting project No. 25 will result in the loss of
information about Employee 107 and 115.
Example of Database Anomalies (Cont.)
8▫ Formally we have first, second, third, and fourth normal forms.
▫ A simplified version:
Organize data into a set of tables
each table provides details about one kind of thing
each item (row) in the table is uniquely identified
Each table has multiple columns capturing ‘attributes’ that describe the listed
things; allow only one value for an attribute in each row
Except the Primary Key, the value in one column is not to determine the value
in another
Normalization Rules of Thumb
9• Each table should list only one kind of thing
• In a kids club where kids who earn badges.
• Each list need to be uniquely identified
▫ Kid list includes a unique member number
▫ Badge list includes a badge name
What should be considered as a table?
Member Name Email
1234567 Sabah sc@gmail.com
7654321 Sally sc2@gmail.com
2222222 Bruce batman@gmail.com
Unique Identifier is included in the list
10
• Allow only 1 value in a column in a row
• Listing multiple earned badges in a list of kids or multiple awardees in a
list of badges is not allowed
• Hint: this can be fixed with another table
No Multi-Valued Attributes Allowed
Member Name Badges
1234567 Sabah Swimming, Hiking, First Aid
7654321 Sally Hiking, Archery
2222222 Bruce Spelunking, Bat husbandry
The Badges column
here is a multi-valued
attribute -
This is not allowed in
normalized data
11
Avoiding Multi-Valued Attributes
Member Name Email
1234567 Sabah sc@gmail.com
7654321 Sally sc2@gmail.com
2222222 Bruce batman@gmail.com
Badge Comment
Swimming Float, Hold breath..
Hiking 25 miles total
First Aid Cuts and Tourniquets
Archery Safety
Member Badge
1234567 Swimming
1234567 Hiking
1234567 First Aid
7654321 Hiking
7654321 Archery
[…]
The new table
associates members
with badges
12
Avoid In-row Dependencies
Member Badge Name Email
1234567 Swimming Sabah sc@gmail.com
1234567 Hiking Sabah sc@gmail.com
1234567 First Aid Sabah sc@gmail.com
7654321 Hiking Sally sc3@gmail.com
7654321 Archery Sally sc2@gmail.com
[…]
Name and Email are actually dependent on Member so they should not be included
here.
Note that in previous slides (p. 9) we listed people in a separate list from the badges
they earned. Because Member was a PK in that table, it was OK. In this table, PK is
the combination of Member and Badge, we can not let Member decide Name and
Email.
13
• Number of database tables expands when tables are
decomposed to conform to normalization requirements
• Joining a larger number of tables:
▫ Takes additional input/output (I/O) operations and
processing logic
▫ Reduces system speed
• In practice we often “de-normalize” our data structures.
• But even if we do, it is useful to understand. It is better to only
break the rules on purpose.
Denormalization
14
• What is normalization?
• What are database anomalies?
• Normalization rules
• Why denormalize database?
Summary
学霸联盟