Sql代写|数据库代写 - CS 4320/5320 - Homework 3
时间:2020-10-14
CS 4320/5320 - Homework 3
Your task is to select a query plan joining three tables and to estimate the execution cost of the
plan according to the model seen in class. For instance, you can reuse a data set and query
used in prior homework submissions.
Step 1. Choose a query and an associated query plan on a data set of your choice. The query
must join three different tables (i.e., self-joins are not permitted). Use non-trivial binary join
predicates (i.e., no trivial selectivity of zero or one such as for predicates that are always true),
connecting all tables (e.g., one join predicate connecting the first with the second table, and
one connecting the second with the third). Joined tables must not be empty and the
associated data must consume more than one page on disk. Draw a physical query plan that
answers the query (you may use any drawing program). As seen in class, the physical plan
specifies the operator implementations and how intermediate results are passed on between
operators. Your plan must use two different join operator implementations for the two joins.
Step 2. Apply the formulas seen in class to estimate the selectivity of each query predicate.
Justify predicate selectivity thoroughly by pointing at data statistics (e.g., number of table rows)
and schema properties (e.g., at key-foreign key constraints connecting columns across tables).
Clearly state simplifying assumptions you make in your calculations such as predicate
independence or uniform data distribution. Using predicate selectivity and base table
cardinality values, estimate the cardinality of each intermediate result. You may assume that
different query predicates are independent from each other.
Step 3. Estimate byte sizes of base tables and intermediate results. First, estimate for each row
and intermediate result how many tuples fit on one page. To do so, you must assume an
average byte size per row. You may simplify by assuming that columns of the same type
consume the same number of bytes per field (or you may use the Postgres statistics, as shown
in the last homework instructions). Assume that each page fits 8,000 bytes of data (close to the
Postgres default value), do not account for space consumption due to meta-data (e.g., slot
offsets). Finally, exploit the cardinality estimates from the previous step to estimate byte size.
Step 4. Estimate processing cost for each operator in the query plan. Assume that all data is
initially on hard disk (i.e., not cached in main memory). Apply the cost formulas seen in class
that are based on the amount of data read from and written to hard disk. Assume that 1,000
pages of main memory are available. Finally, sum up all costs to obtain the total execution cost
estimate. Remember to neglect the cost of writing out the final plan result (but count cost for
writing out intermediate results if applicable).
Optionally, if you are curious, compare your cost estimate against the estimate of the optimizer
(using the "analyze" keyword before your query). This step will not be graded.
Submit a .zip file containing a .pdf document with your calculations and plan drawing, as well
as a reference to the data set your are using (or the schema creation commands if you created
your own data set) and the query.
You receive up to 25 Points for Step 1, 25 Points for Step 2, 25 Points for Step 3,
and 25 Points for Step 4.