CS 4320/5320 - Homework 3
Your task is to select a query plan joining three tables and to estimate the execution cost of the plan according to the model seen in class. For instance, you can reuse a data set and query from a prior homework submission.

Step 1. Choose a query and an associated query plan on a data set of your choice. The query must join three different tables (i.e., self-joins are not permitted). Use non-trivial binary join predicates (i.e., predicates whose selectivity is neither zero nor one, which rules out predicates that are always true or always false) that connect all tables (e.g., one join predicate connecting the first table with the second, and one connecting the second with the third). The joined tables must not be empty, and the associated data must consume more than one page on disk. Draw a physical query plan that answers the query (you may use any drawing program). As seen in class, the physical plan specifies the operator implementations and how intermediate results are passed between operators. Your plan must use two different join operator implementations for the two joins.

Step 2. Apply the formulas seen in class to estimate the selectivity of each query predicate. Justify each predicate selectivity thoroughly by pointing to data statistics (e.g., the number of table rows) and schema properties (e.g., key-foreign key constraints connecting columns across tables). Clearly state any simplifying assumptions you make in your calculations, such as predicate independence or uniform data distribution. Using the predicate selectivities and base table cardinalities, estimate the cardinality of each intermediate result. You may assume that different query predicates are independent of each other.

Step 3. Estimate the byte sizes of base tables and intermediate results. First, estimate for each base table and intermediate result how many tuples fit on one page. To do so, you must assume an average byte size per row.
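To illustrate the kind of arithmetic Step 2 asks for, the following sketch estimates join selectivities and intermediate-result cardinalities for a hypothetical three-table join. All table names, row counts, and distinct-value counts are invented; the formula used (1 / max of the distinct-value counts for an equi-join, under uniformity and independence assumptions) is the standard textbook estimate and may differ in detail from the exact formulas seen in class.

```python
# Hypothetical statistics for three tables R, S, T (all numbers invented).
card = {"R": 10_000, "S": 50_000, "T": 2_000}

# Hypothetical distinct-value counts V(table, column) for the join columns.
V = {("R", "a"): 1_000, ("S", "a"): 1_000,
     ("S", "b"): 2_000, ("T", "b"): 2_000}

def equi_join_selectivity(v_left, v_right):
    """Textbook estimate for an equi-join predicate under uniformity:
    selectivity = 1 / max(V_left, V_right)."""
    return 1.0 / max(v_left, v_right)

# Selectivity of each join predicate.
sel_RS = equi_join_selectivity(V[("R", "a")], V[("S", "a")])  # 1/1000
sel_ST = equi_join_selectivity(V[("S", "b")], V[("T", "b")])  # 1/2000

# Cardinality of each intermediate result, assuming independent predicates:
# |R join S| = |R| * |S| * sel, then multiply in the next table the same way.
card_RS = card["R"] * card["S"] * sel_RS    # 10000 * 50000 / 1000 = 500000
card_RST = card_RS * card["T"] * sel_ST     # 500000 * 2000 / 2000 = 500000

print(sel_RS, sel_ST, card_RS, card_RST)
```

The same pattern applies to any connected three-table join: one selectivity per join predicate, then cardinalities chained operator by operator.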
You may simplify by assuming that columns of the same type consume the same number of bytes per field (or you may use the Postgres statistics, as shown in the last homework's instructions). Assume that each page fits 8,000 bytes of data (close to the Postgres default value); do not account for space consumed by meta-data (e.g., slot offsets). Finally, use the cardinality estimates from the previous step to estimate byte sizes.

Step 4. Estimate the processing cost of each operator in the query plan. Assume that all data initially resides on hard disk (i.e., nothing is cached in main memory). Apply the cost formulas seen in class, which are based on the amount of data read from and written to hard disk. Assume that 1,000 pages of main memory are available. Finally, sum up all operator costs to obtain the total execution cost estimate. Remember to neglect the cost of writing out the final plan result (but count the cost of writing out intermediate results, if applicable). Optionally, if you are curious, compare your cost estimate against the optimizer's estimate (using the "explain analyze" keywords before your query). This step will not be graded.

Submit a .zip file containing a .pdf document with your calculations and plan drawing, as well as a reference to the data set you are using (or the schema creation commands if you created your own data set) and the query. You receive up to 25 points for Step 1, 25 points for Step 2, 25 points for Step 3, and 25 points for Step 4.
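The Step 3 and Step 4 calculations can be sketched as follows. All row counts, row sizes, and the choice of join operators are invented for illustration; the cost formulas shown (block nested-loop join and a two-pass Grace hash join) are common textbook versions and may not match the exact formulas seen in class, so substitute the class formulas in your own submission.

```python
import math

PAGE_BYTES = 8_000      # page capacity given in the assignment
MEMORY_PAGES = 1_000    # buffer pool size given in the assignment

def pages(num_rows, bytes_per_row):
    """Step 3: tuples per page, then number of pages, for a table or
    intermediate result (meta-data such as slot offsets is ignored)."""
    tuples_per_page = PAGE_BYTES // bytes_per_row
    return math.ceil(num_rows / tuples_per_page)

# Hypothetical cardinalities and average row sizes (all numbers invented).
B_R = pages(10_000, 100)      # 80 tuples/page -> 125 pages
B_S = pages(50_000, 200)      # 40 tuples/page -> 1,250 pages
B_T = pages(2_000, 120)       # 66 tuples/page -> 31 pages
B_RS = pages(500_000, 300)    # 26 tuples/page -> 19,231 pages (intermediate)

def block_nested_loop_cost(b_outer, b_inner, mem=MEMORY_PAGES):
    """Read the outer once; read the inner once per (mem - 2)-page
    chunk of the outer (output writing is not included)."""
    return b_outer + math.ceil(b_outer / (mem - 2)) * b_inner

def grace_hash_join_cost(b_left, b_right):
    """Two-pass hash join: read both inputs, write both partitioned
    copies, read both partitions back -> 3 * (B_left + B_right)."""
    return 3 * (b_left + b_right)

# First join (R with S) via block nested loop; materialize its result.
cost_join1 = block_nested_loop_cost(B_R, B_S)   # 125 + 1 * 1250 = 1375
cost_write_intermediate = B_RS                  # write the join result to disk

# Second join (intermediate with T) via Grace hash join; the final
# result is NOT written out, per the assignment.
cost_join2 = grace_hash_join_cost(B_RS, B_T)    # 3 * (19231 + 31) = 57786

total_cost = cost_join1 + cost_write_intermediate + cost_join2
print(total_cost)
```

Note that the intermediate-result write is counted once, while the cost of the final result is omitted, matching the assignment's accounting rules.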