Date: 2021-02-24
Shaokun Fan
Assistant Professor in BIS
Oregon State University
BA 574 – Data Management
Lecture 8: Big Data and NoSQL
Agenda
• Big Data Concepts
• Big Data Management Strategies
• Hadoop for Big Data Management
• NoSQL vs Relational
Big Data is Everywhere
Data Sizes
Name        Example(s) of Size
Byte        A single letter, like "A."
Kilobyte    1024 Bytes. An e-mail.
Megabyte    1024 Kilobytes. A good-sized book.
Gigabyte    1024 Megabytes. A DVD holds about 1-5 Gigabytes.
Terabyte    1024 Gigabytes. The capacity of a personal computer.
Petabyte    1024 Terabytes. The data available on the web in the year 2000 is thought to occupy 8 Petabytes.
Exabyte     1024 Petabytes. 161 Exabytes of data were created in 2006.
Zettabyte   1024 Exabytes. In 2010, about 1 Zettabyte of data was created.
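The table's rule (each unit is 1024 times the previous one) can be checked with a few lines of Python. This is only a sketch; the helper names `bytes_in` and `humanize` are made up for illustration.

```python
# Unit names from the table, smallest first; each is 1024 times the previous.
UNITS = ["Byte", "Kilobyte", "Megabyte", "Gigabyte",
         "Terabyte", "Petabyte", "Exabyte", "Zettabyte"]

def bytes_in(unit: str) -> int:
    """Return the number of bytes in one `unit`, using the 1024 multiplier."""
    return 1024 ** UNITS.index(unit)

def humanize(n_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value below 1024."""
    value = float(n_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}s"
        value /= 1024

print(bytes_in("Gigabyte"))                # 1073741824
print(humanize(8 * bytes_in("Petabyte")))  # 8.0 Petabytes
```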
Big Data Characteristics
• Volume: Quantity of data to be stored
• Velocity: Speed at which data enters the system and must be processed
• Variety: Variations in the structure of data to be stored
• Other characteristics:
▫ Variability: Changes in meaning of data based on context
▪ Sentiment analysis attempts to determine attitude
▫ Veracity: Trustworthiness of data
▫ Value: Degree to which data can be analyzed for meaningful insight
▫ Visualization: Ability to graphically present data to make it understandable to users
Current View of Big Data
Big Data Management Strategies
• The overall goal is to enable parallel and cost-effective computation.
• Three strategies:
▫ Employ redundant but relatively inexpensive components to control costs
▫ Minimize joins to compartmentalize data requests
▫ Partition datasets to spread the workload
Multiple Inexpensive Components
• Systems divide and conquer big
jobs; many computers share the
load
• Save money using a bunch of
cheap computers in place of a
very expensive one
• … But, more computers = more
failures
[Diagram: cluster of master nodes and data nodes]
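The "more computers = more failures" point can be made concrete with a little arithmetic. The sketch below assumes that nodes fail independently and that each has a 1% chance of failing in some period; both the independence assumption and the numbers are illustrative, not figures from the lecture.

```python
def p_any_node_fails(n_nodes: int, p_fail: float) -> float:
    """Probability that at least one of n independent nodes fails."""
    return 1 - (1 - p_fail) ** n_nodes

def p_block_lost(replicas: int, p_fail: float) -> float:
    """Probability that every replica of one piece of data fails at once."""
    return p_fail ** replicas

# With more machines, some failure becomes likely...
for n in (1, 10, 100):
    print(n, round(p_any_node_fails(n, 0.01), 3))

# ...but redundancy makes losing the data itself very unlikely.
print(p_block_lost(3, 0.01))
```

A cluster therefore has to expect failures and keep redundant copies, which is why the first strategy pairs inexpensive components with redundancy.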
Minimize Joining Data
Products
ID  Price  Desc
1   $8     iPhone Cable
2   $15    Wireless Keyboard
3   $750   iPhone 6S
Carts
ID    Customer
1001  S. Green
1002  J. Sousa
2134  J. Kim
PIC
ProdId  CartID  Qty
3       1001    1
2       1002    1
1       1001    1
Cart Document
CartId: 1001
Customer: “S. Green”
Items:
PId:“1” Dsc:“iPhone Cable” $:“8”
PId:“3” Dsc:“iPhone 6S” $:“750”
GetCartData(1001) with the relational tables:
1) GetCart(1001)
2) GetCProds(1001)
3) GetProd(1,3)
4) SendResult()
GetCartData(1001) with the cart document:
1) GetCart(1001)
2) SendResult()
The hierarchical data structure
reduces the need for joins in high-
volume, high-velocity transactions
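A minimal Python sketch of the two access paths, using plain dictionaries as stand-ins for a relational database and a document store. The function names `get_cart_relational` and `get_cart_document` are hypothetical; the data is the slide's example.

```python
# Relational layout: three separate tables.
products = {1: {"price": 8, "desc": "iPhone Cable"},
            2: {"price": 15, "desc": "Wireless Keyboard"},
            3: {"price": 750, "desc": "iPhone 6S"}}
carts = {1001: "S. Green", 1002: "J. Sousa", 2134: "J. Kim"}
products_in_cart = [(3, 1001, 1), (2, 1002, 1), (1, 1001, 1)]  # (ProdId, CartID, Qty)

def get_cart_relational(cart_id):
    """Assemble one cart from three tables, as a join would."""
    customer = carts[cart_id]                                  # 1) GetCart
    prod_ids = sorted(p for p, c, _ in products_in_cart
                      if c == cart_id)                         # 2) GetCProds
    items = [{"pid": p, **products[p]} for p in prod_ids]      # 3) GetProd
    return {"cart_id": cart_id, "customer": customer,
            "items": items}                                    # 4) SendResult

# Document layout: everything about a cart is stored together.
cart_documents = {
    1001: {"cart_id": 1001, "customer": "S. Green",
           "items": [{"pid": 1, "desc": "iPhone Cable", "price": 8},
                     {"pid": 3, "desc": "iPhone 6S", "price": 750}]},
}

def get_cart_document(cart_id):
    """One lookup by key returns the whole cart; no join is needed."""
    return cart_documents[cart_id]

print(get_cart_relational(1001) == get_cart_document(1001))  # True
```

Both paths return the same cart, but the document version touches a single record, which is what makes it attractive for high-volume requests.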
Spread Workload
[Diagram: web servers, master nodes, and data nodes; data partitioned by date range into Jan-Apr, May-Aug, and Sep-Dec]
Goal: balance the load across the servers
Throughput is limited if all the data in use at one time is on one server
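The effect of the partitioning choice on load can be sketched in Python. The node names, the order data, and the mapping of date ranges to nodes are assumptions for illustration. The sketch compares a date-range partition like the one in the diagram with a simple modulo partition on the order ID.

```python
from collections import Counter

NODES = ["node-0", "node-1", "node-2"]  # hypothetical data node names

def range_partition(month: int) -> str:
    """Partition by date range: Jan-Apr, May-Aug, Sep-Dec."""
    return NODES[(month - 1) // 4]

def modulo_partition(order_id: int) -> str:
    """Partition by order ID so consecutive orders land on different nodes."""
    return NODES[order_id % len(NODES)]

# 300 hypothetical orders, all placed in December (month 12).
orders = [(order_id, 12) for order_id in range(300)]

range_load = Counter(range_partition(month) for _, month in orders)
modulo_load = Counter(modulo_partition(order_id) for order_id, _ in orders)

print(dict(range_load))   # {'node-2': 300}
print(dict(modulo_load))  # {'node-0': 100, 'node-1': 100, 'node-2': 100}
```

When all current activity falls in one date range, the range partition sends every request to one node, while the modulo partition spreads the same requests evenly.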
Hadoop: Implementing Big Data Management Strategies
• De facto standard for most Big Data storage and processing
• Java-based framework for distributing and processing very large data
sets across clusters of computers
• Most important components:
▫ Hadoop Distributed File System (HDFS): Stores large data sets on a cluster of
inexpensive nodes.
▫ MapReduce: Programming model that supports processing large data sets
on a cluster of inexpensive nodes.
Hadoop Distributed File System (HDFS)
• Uses several types of nodes (computers):
▫ Data nodes store the actual file data
▫ The name node contains file system metadata
▫ Client nodes make requests to the file system as needed to support user
applications
▫ Data nodes communicate with the name node by regularly sending block
reports and heartbeats
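The division of labor between name node and data nodes can be illustrated with a toy model. This is not Hadoop's API; the class, its methods, and the 30-second timeout are all assumptions chosen to show the idea that the name node keeps only metadata and learns about data nodes through heartbeats and block reports.

```python
class NameNode:
    """Toy name node: holds metadata only, never file contents."""

    def __init__(self, heartbeat_timeout: float = 30.0):
        self.heartbeat_timeout = heartbeat_timeout  # illustrative value
        self.last_heartbeat = {}   # data node id -> time of last heartbeat
        self.block_locations = {}  # block id -> set of data node ids

    def heartbeat(self, node_id: str, now: float) -> None:
        """Record that a data node is still alive."""
        self.last_heartbeat[node_id] = now

    def block_report(self, node_id: str, block_ids: list) -> None:
        """Replace what we know about the blocks stored on one data node."""
        for holders in self.block_locations.values():
            holders.discard(node_id)
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(node_id)

    def live_nodes(self, now: float) -> set:
        """Data nodes whose last heartbeat is recent enough."""
        return {n for n, t in self.last_heartbeat.items()
                if now - t <= self.heartbeat_timeout}

    def locate(self, block_id: str, now: float) -> set:
        """Live data nodes a client could read this block from."""
        return self.block_locations.get(block_id, set()) & self.live_nodes(now)


name_node = NameNode()
name_node.heartbeat("dn1", now=0)
name_node.heartbeat("dn2", now=0)
name_node.block_report("dn1", ["blk_1", "blk_2"])
name_node.block_report("dn2", ["blk_1"])
name_node.heartbeat("dn2", now=40)          # dn1 has gone silent
print(name_node.locate("blk_1", now=45))    # {'dn2'}
```

A client asks the name node where a block lives and then reads the data from a data node directly; a data node that stops sending heartbeats simply drops out of the answers.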
Figure 14.4 – Hadoop Distributed File System (HDFS)
MapReduce
• Framework used to process large data sets across clusters
▫ Breaks down complex tasks into smaller subtasks, performing the subtasks
and producing a final result
▫ Map function takes a collection of data and sorts and filters it into a set of
key-value pairs
▪ Mapper program performs the map function
▫ Reduce function summarizes the results of the map function to produce a single result
▪ Reducer program performs the reduce function
MapReduce Process
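The map and reduce steps can be sketched in plain Python with the classic word-count example. This runs on one machine and only imitates the programming model; the function names and input lines are illustrative, and a real Hadoop job would distribute the mapper and reducer across the cluster.

```python
from collections import defaultdict

def mapper(line):
    """Map: emit a (word, 1) key-value pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all values by key so each key reaches a single reducer call."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Reduce: summarize all values for one key into a single result."""
    return key, sum(values)

def map_reduce(lines):
    pairs = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())

counts = map_reduce(["big data is big", "data is everywhere"])
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each line is mapped independently and each key is reduced independently, both steps can be split across many inexpensive nodes.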
NoSQL
• Name given to non-relational database technologies developed to
address Big Data challenges
• Key-value (KV) databases store data as a collection of key-value pairs
organized as buckets, which are the equivalent of tables
• Document databases store data in key-value pairs in which the value
components are tag-encoded documents grouped into logical groups
called collections
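The difference between the two models can be sketched with Python dictionaries. This is an in-memory illustration, not any product's API: the helper names are hypothetical, and JSON stands in for the tag-encoded document format.

```python
import json

# Key-value model: a bucket maps keys to values the store does not interpret.
buckets = {}

def kv_put(bucket, key, value):
    buckets.setdefault(bucket, {})[key] = value

def kv_get(bucket, key):
    return buckets[bucket][key]

# Document model: values are tagged documents, so the store can look inside them.
collections = {}

def doc_insert(collection, key, document):
    collections.setdefault(collection, {})[key] = json.dumps(document)

def doc_find(collection, field, expected):
    """Return every document in the collection whose `field` equals `expected`."""
    docs = (json.loads(raw) for raw in collections.get(collection, {}).values())
    return [doc for doc in docs if doc.get(field) == expected]

kv_put("carts", "1001", b"opaque bytes; retrievable only by key")
doc_insert("carts", "1001", {"CartId": 1001, "Customer": "S. Green"})
doc_insert("carts", "1002", {"CartId": 1002, "Customer": "J. Sousa"})

print(kv_get("carts", "1001"))
print(doc_find("carts", "Customer", "S. Green"))
```

The key-value store can only fetch by key, while the document store can also query on the tags inside each document.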
Figure 14.8 – Document Database Tagged Format
We are not throwing away SQL
• NoSQL should mean “Not Only SQL” because relational systems are still
at the heart of most important organizational systems
• SQL (Structured Query Language) was developed to support relational
systems, but versions are increasingly used in NoSQL systems
• Increasingly, hybrid non-relational capabilities are being built into more
traditional relational database management systems (DBMS)
Cost, Performance, and Equipment
• High performance in relational systems is expensive:
▫ Lots of expensive RAM
▫ High-end storage networks and data appliances
▫ Top-of-the-line servers
▫ High licensing costs for enterprise class features
• The NoSQL way, on the other hand, uses:
▫ Many relatively inexpensive computers
▫ Relatively inexpensive DBMS licensing
▫ A “no frills” set of add-on features
Specialized or Versatile?
• Should we make shopping carts fast for customers or should we make
current demand immediately available to suppliers?
Cart Document
CartId: 1001
Customer: “S. Green”
Items:
PId:“1” Dsc:“iPhone Cable” $:“8”
PId:“3” Dsc:“iPhone 6S” $:“750”
This document design optimizes for the customer experience
ShipperProduct
Shipper: 1001
Supplier: “SuperCable”
Items:
PId:“1” Dsc:“iPhone Cable” Qty:86
PId:“12” Dsc:“USB Hub” Qty:45
This document design optimizes product shipment projections
Relational DB systems aim to support a wide variety of data needs reasonably well, while
a NoSQL design is typically optimized for one highly specialized big data workload.
Relational vs. NoSQL
• Neither NoSQL nor relational systems are “better” – they are just
different.
• Understand both task and technology
• Most systems are hybrid and are changing at a rapid pace
• The concepts will apply even as the tools change
Summary
• 3 Vs of Big Data
• Big data management strategies
• Hadoop
• NoSQL databases