JavaScript代写-COMP5338
时间:2021-08-27
Dr. Ying Zhou
School of Computer Science
COMP5338 – Advanced Data Models
Week 3: MongoDB – Aggregation Framework
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
Outline
nNull type
nMongoDB Data Modelling
nAggregation
?Single collection aggregation
?Aggregation pipeline with multiple collection
03-2
Null, empty string and related
operators
n Null (or null) is a special data type
? Similar to None, Null or Nil in any programming language
? It has a singleton value expressed as null
? Indicating no value is here
n The interpretation of null is different depending on where it
appears
n It might represents
? The field exists, but has no value
? The field does not exits
? Or both
n This is different to giving a field an empty string “” as value
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
http://docs.mongodb.com/manual/tutorial/query-for-null-fields/index.html
03-3
Null and empty string example
n Collection tweets has many optional fields
? Retweet_id, retweet_user_id, hash_tags, and more
n Collection users uses empty string for fields with no value
? location, description
n The schema was based on the source JSON file. Making
location and description as optional field is a better
option.
n Examples
? db.tweets.find({retweet_id:{$exists:true}})
? db.users.find({location:{$ne: “”}})
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
http://docs.mongodb.com/manual/tutorial/query-for-null-fields/
03-4
Querying for null or field existence
n Queries
? db.tweets.find({retweet_id:{$exists:true}})
? Find all documents that has a field called retweet_id
? db.tweets.find({retweet_id: {$ne: null}})
? Find all documents whose retweet_id field’s value is not null
? db.tweets.find({retweet_id:null})
? Find all documents that do not have a retweet_id field or the value of
retweet_id field is null
? db.tweets.find({retweet_id:{$exists:false}})
? Find all documents that does not have a retweet_id field
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-5
It is possible to set the value to null
db.users.updateMany({location:"", description:""},
{$set:{location: null}})
db.users.updateMany({location:""},{$unset:{location:1}})
db.users.find({location: null}) would return all documents with either
location field with null value or no location field
db.users.find({location:{$exists:false}}) only returns the documents that
do not have the location field
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-6
Outline
nNull type
nMongoDB Data Modelling
nAggregation
?Single collection aggregation
?Aggregation pipeline with multiple collection
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-7
“Schema” Design Example
n A fully normalized relational
model would have the
following tables:
?User
?Post
?Comment
?PostLink
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
http://docs.mongodb.org/manual/applications/">? The specification can be an existing field name followed by a single
value indicating the inclusion (1) or exclusion (0) of fields
? Or it can be a field name (existing or new) followed by an expression
to compute the value of the field
:
? In the expression, existing field from incoming document can be
accessed using field path: “$fieldname”
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-29
$project examples
n Find the active retweet period of each tweet in the
collection, where the active retweet period is defined as
the duration between the last and the first retweets of that
tweet, assuming the timestamp is of ISODate type
db.tweets.aggregate([
{$match: {retweet_id: {$exists: true}}},
{$group: {_id:"$retweet_id",
first: {$min:"$created_at"},
last: {$max:"$created_at"} }},
{$project: {actp: {$subtract:["$last","$first"]}}},
{$match: {actp:{$gt: 0}}}
])
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
http://docs.mongodb.com/manual/reference/operator/aggregation/#arithmetic-expression-operators
Arithmetic expression operator, part of a large group of Aggregation pipeline operator
03-30
$group then $project
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{_id:ObjectId(“…”), retweet_id: 3333, timestamp:“2016-07-01 00:03:46.000Z”, … }
{_id:ObjectId(“…”), retweet_id: 4444, timestamp:“2016-07-01 00:55:44.000Z”, … }
{_id:ObjectId(“…”), retweet_id: 3333, timestamp:“2016-07-15 12:22:35.000Z”, … }
{_id:ObjectId(“…”), retweet_id: 4444, timestamp:“2016-07-28 00:03:58.000Z”,… }
{_id:ObjectId(“…”), retweet_id: 3333, timestamp:“2016-07-28 00:20:19.000Z”, … }
{_id:3333, first:“2016-07-01 00:03:46.000Z”, last:“2016-07-28 00:20:19.000Z”}
{_id:4444, first:“2016-07-01 00:55:44.000Z”, last:“2016-07-28 00:03:58.000Z”}
{title: 3333, age:2333793000}
{title: 4444, age:2329694000}
{$group: {_id:"$retweet_id",
first: {$min:"$created_at"},
last: {$max:"$created_at"} }},
{$project: {
actp: {$subtract:["$last","$first"]}
}}
03-31
We can combine multiple operators
db.tweets.aggregate([
{$match: {retweet_user_id: {$exists: true}}},
{$group: {_id:"$retweet_id",
first: {$min:"$created_at"},
last: {$max:"$created_at"} }},
{$project: {
actp:{$divide:
[{$subtract:["$last","$first"]},
86400000]}}}
actp_unit: {$literal:"day"}}}
])
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
($last-$first)/86400000
03-32
Dealing with data of array type
n To aggregate (e.g. grouping) values in an array field, it is possible to
flatten the array to access individual value
n $unwind stage flattens an array field from the input documents to output
a document for each element. Each output document is the input
document with the value of the array field replaced by the element.
? { $unwind: } or
? {
$unwind:
{
path: ,
includeArrayIndex: ,
preserveNullAndEmptyArrays:
}
}
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-33
$unwind example
n Default behaviour
? Input document:
{ "_id" : 1, "item" : "ABC1", sizes: [ "S", "M", "L"] }
? After $unwind:"$sizes"
? Becomes 3 output documents:
{ "_id" : 1, "item" : "ABC1", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "L" }
n Find the number of items that are available in each size
db.inventory.aggregate( [
{ $unwind : "$sizes" },
{ $group:{_id: “$sizes”, item_count: {$sum:1}} }
] )
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-34
$unwind then $group
{ "_id" : 1, "item" : "ABC", "sizes": [ "S", "M", "L"] }
{ "_id" : 2, "item" : "EFG", "sizes" : [ ] }
{ "_id" : 3, "item" : "IJK", "sizes": "M" }
{ "_id" : 4, "item" : "LMN" }
{ "_id" : 5, "item" : "XYZ", "sizes" : null }
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{ "_id" : 1, "item" : "ABC", "sizes": "S"}
{ "_id" : 1, "item" : “ABC", "sizes": "M"}
{ "_id" : 1, "item" : “ABC", "sizes": "L"}
{ "_id" : 3, "item" : "IJK", "sizes": "M" }
{ $group:{_id: “$sizes”,
item_count: {$sum:1}}
{ "_id" : "S", "item_count": 1}
{ "_id" : "M", "item_count": 2}
{ "_id" : "L", "item_count": 1}
{ $unwind : "$sizes" },
03-35
$sort, $skip, $limit and $count stages
n $sort stage sorts the incoming documents based on specified field(s)
in ascending or descending order
? The function and format is similar to the sort modifier in find query
? { $sort: { : , : ...
} }
n $skip stage skips over given number of documents
? The function and format is similar to the skip modifier in find query
? { $skip: }
n $limit stage limits the number of documents passed to the next stage
? The function and format is similar to the limit modifier in find query
? { $limit: }
n $count stage counts the number of documents passing to this stage
? The function and format is similar to the count modifier in find query
? { $count: }
? String is the name of the field representing the count
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-36
$sample and $out stages
n The $sample stage randomly selects given number of
documents from the previous stage
? { $sample: { size: } }
? Different sampling approaches depending on the location of the
stage and the size of the sample and the collection
?May fail due to memory constraints
n The $out stage writes the documents in a given collection
? should be the last one in the pipeline
? { $out: "" }
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-37
Aggregation Operators
n A few aggregation stages allow us to add new fields or to
give existing fields new values based on expression
? In $group stage we can use various operators or accumulators to
compute values for new fields
? In $project stage we can use operators to compute values for new
or exiting fields
n There are many predefined operators for various data types
to carry out common operations in that data type
? Arithmetic operators: $mod, $log, $sqrt, $subtract, …
? String operators: $concat, $split, $indexofBytes, …
? Comparison operators: $gt, $gte, $lt, $lte,…
? Set operators: $setEquals, $setIntersection, …
? Boolean operators: $and, $or, $not, …
? Array operators: $in, $size, ..
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-38
Aggregation vs. Query operators
n There is another set of operators that can be used in
find/update/delete queries or the $match stage of an
aggregation
? E.g. $gt, $lt, $in, $all….
n The set is smaller and are different to the operators used in
$group or $project stage
n Some operators look the same but have different syntax and
slightly different interpretation in query and in aggregation.
? E.g. $gt in find query looks like
{age: {$gt:18}}
? $gt in $project stage looks like:
{over18: {$gt:[“$age”, 18]}}
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
Returns true or false
03-39
Outline
nNull type
nAggregation
?Single collection aggregation pipeline
?Aggregation pipeline with multiple
collections
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-40
$lookup stage
n $lookup stage is added since 3.2 to perform left outer join
between two collections
? The collection already in the pipeline (maybe after a few stages)
? Another collection (could be the same one)
n For each incoming document from the pipeline, the $lookup
stage adds a new array field whose elements are the
matching documents from the other collection.
n A few different forms
? Equality match
? Join with other conditions
? Join with uncorrelated sub-queries
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-41
$lookup stage properties
n The output of $lookup stage has the same number of
documents as the previous stage
n Each document is augmented with an array field storing
matching document(s) from the other collection
n The array could contain any number of documents
depending on the match, including zero
n Missing local or foreign field is treated as having null value
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-42
Equality Match $lookup
{$lookup:
{ from: ,
localField: ,
foreignField: ,
as:
}
}
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-43
Equality match $lookup example
db.orders.aggregate([
{
$lookup:
{
from: "inventory",
localField: "item",
foreignField: "sku",
as: "inventory_docs"
}
}
])
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{"_id":1, "item":"abc", "price":12,"quantity":2 }
{"_id":2, "item":"nosku", "price":20,"quantity":1 }
{"_id":3 }
{"_id":1, "sku":"abc", description:"product 1", "instock":120}
{"_id":2, "sku":"def", description:"product 2", "instock":80 }
{"_id":3, "sku":"ijk", description:"product 3", "instock":60}
{"_id":4, "sku":"jkl", description:"product 4", "instock":70 }
{"_id":5, "sku":null, description:"Incomplete" }
{"_id":6}
orders
inventory
http://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#pipe._S_lookup
A document with no item field
A document with sku field
equals null
A document with no sku field
03-44
Equality match $lookup example (cont’d)
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{"_id":1, "item":"abc", "price":12,"quantity":2 }
{"_id":2, "item":"nosku", "price":20,"quantity":1 }
{"_id":3 }
{"_id":1, "sku":"abc", description:"product 1", "instock":120}
{"_id":2, "sku":"def", description:"product 2", "instock":80 }
{"_id":3, "sku":"ijk", description:"product 3", "instock":60}
{"_id":4, "sku":"jkl", description:"product 4", "instock":70 }
{"_id":5, "sku":null, description:"Incomplete" }
{"_id":6}
{"_id":1, "item":"abc", "price":12,"quantity":2,
"inventory_docs": [
{ "_id":1, "sku":"abc", description:"product 1", "instock":120 }] }
{"_id":2, "item":“nosku", "price":20,"quantity":1,
"inventory_docs" : [] }
{"_id":3, "inventory_docs" : [
{ "_id" : 5, "sku" : null, "description" : "Incomplete" },
{ "_id" : 6 }]}
An empty array for no matching from other collection
Non exists field matches null and non exists field
local
foreign
output
03-45
Other format of $lookup
{
$lookup:
{
from: ,
let: { : , …, : },
pipeline: [ ],
as:
}
}
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
let: Optionally specifies variables to use in the pipeline field
stages. Most likely the variable(s) may refer to field(s) in the
local collection already in the pipeline
pipeline: Specifies the pipeline to run on the joined collection.
The pipeline determines the resulting documents from the
joined collection.
03-46
Multiple Join Condition Example
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{ "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2 }
{ "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1 }
{ "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60 }
orders collection
{ "_id" : 1, "stock_item" : "almonds", warehouse: "A", "instock" : 120 }
{ "_id" : 2, "stock_item" : "pecans", warehouse: "A", "instock" : 80 }
{ "_id" : 3, "stock_item" : "almonds", warehouse: "B", "instock" : 60 }
{ "_id" : 4, "stock_item" : "cookies", warehouse: "B", "instock" : 40 }
{ "_id" : 5, "stock_item" : "cookies", warehouse: "A", "instock" : 80 }
warehouses collection
An ordered item may be stocked in multiple warehouses;
We want to find for each ordered item the warehouse with
sufficient stock to cover the order
03-47
Insufficient stock
Multiple Joint Condition
n This query involves comparing two fields of the local and
foreign documents:
? “item” in orders should match ”stock_item” in warehouses
? “ordered” in orders should be less than or equal to “instock” in
warehouses
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{ "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2 }
{ "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1 }
{ "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60 }
{ "_id" : 1, "stock_item" : "almonds", warehouse: "A", "instock" : 120 }
{ "_id" : 2, "stock_item" : "pecans", warehouse: "A", "instock" : 80 }
{ "_id" : 3, "stock_item" : "almonds", warehouse: "B", "instock" : 60 }
{ "_id" : 4, "stock_item" : "cookies", warehouse: "B", "instock" : 40 }
{ "_id" : 5, "stock_item" : "cookies", warehouse: "A", "instock" : 80 }
orders collection
warehouses collection
03-48
Multiple Joint Condition $lookup
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
db.orders.aggregate([
{
$lookup:
{
from: "warehouses",
let: { order_item: "$item", order_qty: "$ordered" },
pipeline: [
{ $match:
{ $expr:
{ $and:
[
{ $eq: [ "$stock_item", "$$order_item" ] },
{ $gte: [ "$instock", "$$order_qty" ] }
]
}
}
},
{ $project: { stock_item: 0, _id: 0 } }
],
as: "stockdata"
}
}
])
This is the way to let the pipeline access local fields:
use variable order_item to access the local
document’s item field; use variable order_qty to
access the local document’s ordered field
This is the
way to
specify
multiple
condition
variables are accessed
using “$$” prefix
$lookup by default
includes the entire matched
foreign document in the
array, we can use $project
stage to get rid of some
fieldsMatching document after
the pipeline stage will be
stored in this variable
03-49
Results
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{ "_id" : 1,
"item" : "almonds",
"price" : 12,
"ordered" : 2,
"stockdata" : [
{ "warehouse" : "A", "instock" : 120 },
{ "warehouse" : "B", "instock" : 60 }
]
}
{ "_id" : 2,
"item" : "pecans",
"price" : 20,
"ordered" : 1,
"stockdata" : [
{ "warehouse" : "A", "instock" : 80 }
]
}
{ "_id" : 3,
"item" : "cookies",
"price" : 10,
"ordered" : 60,
"stockdata" : [
{ "warehouse" : "A", "instock" : 80 }
]
}
03-50
Concise syntax introduced in version 5.0
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-51
$lookup:
{
from: ,
localField: ,
foreignField: ,
let : { : , …, : },
pipeline: [ ],
as:
}
Keep the simple expression of the equality condition
Equivalent Concise Syntax
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-52
db.orders.aggregate([
{
$lookup:
{
from: "warehouses",
localField: "item",
foreignField: "stock_item",
let: {order_qty: "$ordered" },
pipeline: [
{ $match:
{ $expr:
{ $gte: [ "$instock", "$$order_qty" ] }
}
},
{ $project: { stock_item: 0, _id: 0 } }
],
as: "stockdata"
}
}
])
Uncorrelated Subquery Example
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{
"_id" : 1,
"student" : "Ann Aardvark",
sickdays: [ "2018-05-01", 2018-08-23"]
}
{
"_id" : 2,
"student" : "Zoe Zebra",
sickdays: [“2018-02-01", 2018-05-23") ]
}
absences collection
{ "_id" : 1, year: 2018, name: "New Years", date: "2018-01-01" }
{ "_id" : 2, year: 2018, name: "Pi Day", date: 2018-03-14" }
{ "_id" : 3, year: 2018, name: "Ice Cream Day", date: "2018-07-15"}
{ "_id" : 4, year: 2017, name: "New Years", date: "2017-01-01" }
{ "_id" : 5, year: 2017, name: "Ice Cream Day", date: "2017-07-16”}
holidays collection
We want to include all 2018 public holidays in the absences
collection
03-53
Uncorrelated Subquery $lookup
db.absences.aggregate([
{
$lookup:
{
from: "holidays",
pipeline: [
{ $match: { year: 2018 } },
{ $project: { _id: 0,
date: { name: "$name", date: "$date" } } },
{ $replaceRoot: { newRoot: "$date" } }
],
as: "holidays"
}
}
])
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
The inner pipeline selects documents
from holidays collection based on a
condition unrelated with the local
collection, ‘let’ field is not needed
The $project and $replaceRoot change the
structure of the inner pipeline output documents
Has the same effect as: { $project: { _id: 0, year:0 } } in this case
$replaceRoot and similar “project” like stage are useful for promoting an
embedded document at root level
03-54
Results
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou)
{
"_id" : 1,
"student" : "Ann Aardvark",
sickdays: [ "2018-05-01", 2018-08-23"],
"holidays" : [
{ "name" : "New Years", "date" : " 2018-01-01") },
{ "name" : "Pi Day", "date" : "2018-03-14") },
{ "name" : "Ice Cream Day", "date" : "2018-07-15"}
]
}
{
"_id" : 2,
"student" : "Zoe Zebra",
sickdays: [“2018-02-01", 2018-05-23") ],
"holidays" : [
{ "name" : "New Years", "date" : " 2018-01-01") },
{ "name" : "Pi Day", "date" : "2018-03-14") },
{ "name" : "Ice Cream Day", "date" : "2018-07-15"}
]
}
03-55
References
n BSON types
? http://docs.mongodb.com/manual/reference/bson-types/
n Querying for Null or Missing Field
? http://docs.mongodb.com/manual/tutorial/query-for-null-
fields/index.html
n Aggregation Pipelines
? http://docs.mongodb.com/manual/core/aggregation-pipeline/
n Aggregation operators
? http://docs.mongodb.com/manual/reference/operator/aggregation/
COMP5338 "Advanced Data Models" - 2021 (Y. Zhou) 03-56
essay、essay代写