COMP9311-comp9311代写|学霸联盟

COMP9311-comp9311代写

时间：2023-03-29

Project 1 SQL
COMP9311 23T1
The deadline for project 1 is: Fri 16:59:59 Mar 31st 2023(Sydney Local Time)
1. Aims
This project aims to give you practice in
• Reading and understanding a moderately large relational schema (MyMyUNSW).
• Implementing SQL queries and views to satisfy requests for information.
• Implementing PL/pgSQL functions to aid in satisfying requests for information
• The goal is to build some useful data access operations on the MyMyUNSW database. The
data may contain some data inconsistencies; however, they won’t affect your answers to the
project.
2. How to do this project:
• Read this specification carefully and completely
• Familiarize yourself with the database schema (description, SQL schema, summary)
• Make a private directory for this project, and put a copy of the proj1.sql template there
• You must use the create statements in proj1.sql when defining your solutions
• Look at the expected outputs in the expected_qX tables loaded as part of the check.sql file
• Solve each of the problems below, and put your completed solutions into proj1.sql
• Check that your solution is correct by verifying against the example outputs and by using
the check_qX() functions
• Test that your proj1.sql file will load without error into a database containing just the original
MyMyUNSW data
• Double-check that your proj1.sql file loads in a single pass into a database containing just the
original MyMyUNSW data
• Submit the project via moodle
• For each question, you must output result within 120 seconds on Nw-syd-vxdb server.
• Hardcode is strictly forbidden.
3. Introduction
All Universities require a significant information infrastructure in order to manage their affairs. This
typically involves a large commercial DBMS installation. UNSW's student information system sits
behind the MyUNSW web site. MyUNSW provides an interface to a PeopleSoft enterprise
management system with an underlying Oracle database. This back-end system (Peoplesoft/Oracle)
is often called NSS.
UNSW has spent a considerable amount of money ($80M+) on the MyUNSW/NSS system, and it
handles much of the educational administration plausibly well. Most people gripe about the quality of
the MyUNSW interface, but the system does allow you to carry out most basic enrolment tasks online.
Despite its successes, MyUNSW/NSS still has several deficiencies, including:
• no waiting lists for course or class enrolment
• no representation for degree program structures
• poor integration with the UNSW Online Handbook
The first point is inconvenient, since it means that enrolment into a full course or class becomes a
sequence of trial-and-error attempts, hoping that somebody has dropped out just before you attempt
to enroll and that no-one else has grabbed the available spot.
The second point prevents MyUNSW/NSS from being used for three important operations that would
be extremely helpful to students in managing their enrolment:
• finding out how far they have progressed through their degree program, and what remains to
be completed
• checking what are their enrolment options for next semester (e.g., get a list of available
courses)
• determining when they have completed all the requirements of their degree program and are
eligible to graduate
NSS contains data about student, courses, classes, pre-requisites, quotas, etc. but does not contain
any representation of UNSW's degree program structures. Without such information in the NSS
database, it is not possible to do any of the above three. So, in 2007 the COMP9311 class devised a
data model that could represent program requirements and rules for UNSW degrees. This was built
on top of an existing schema that represented all the core NSS data (students, staff, courses, classes,
etc.). The enhanced data model was named the MyMyUNSW schema.
The MyMyUNSW database includes information that encompasses the functionality of NSS, the UNSW
Online Handbook, and the CATS (room allocation) database. The MyMyUNSW data model, schema
and database are described in a separate document.
4. Setting Up
To install the MyMyUNSW database under your Nw-syd-vxdb server, simply run the following two
commands:
$ createdb proj1
$ psql proj1 -f /home/cs9311/web/23T1/proj/proj1/mymyunsw.dump
If you've already set up PLpgSQL in your template1 database, you will get one error message as the
database starts to load:
psql:mymyunsw.dump:NN: ERROR: language "plpgsql" already exist.
You can ignore the above error message, but all other occurrences of ERROR during the load needs to
be investigated.
If everything proceeds correctly, the load output should look something like:
SET
SET
SET
SET
SET
psql:mymyunsw.dump:NN: ERROR: language "plpgsql" already exists
... if PLpgSQL is not already defined,
... the above ERROR will be replaced by CREATE LANGUAGE
SET
SET
SET
CREATE TABLE
CREATE TABLE
... a whole bunch of these
CREATE TABLE
ALTER TABLE
ALTER TABLE
... a whole bunch of these
ALTER TABLE
Apart from possible messages relating to plpgsql, you should get no error messages.
The database loading should take less than 60 seconds on Nw-syd-vxdb, assuming that Nw-syd-vxdb
is not under heavy load. (If you leave your project until the last minute, loading the database on Nw-
syd-vxdb will be considerably slower, thus delaying your work even more. The solution: at least load
the database Right Now, even if you don't start using it for a while.) (Note that
the mymyunsw.dump file is 50MB in size; copying it under your home directory or your /srvr directory
is not a good idea).
If you have other large databases under your PostgreSQL server on Nw-syd-vxdb or if you have large
files under your /srvr/YOU/ directory, it is possible that you will exhaust your Nw-syd-vxdb disk quota.
Regardless, it is certain that you will not be able to store two copies of the MyMyUNSW database
under your Nw-syd-vxdb server. The solution: remove any existing databases before loading your
MyMyUNSW database.
Summary on Getting Started
To set up your database for this project, run the following commands in the order supplied:
$ createdb proj1
$ psql proj1 -f /home/cs9311/web/23T1/proj/proj1/mymyunsw.dump
$ psql proj1
... run some checks to make sure the database is ok
$ mkdir Project1Directory
... make a working directory for Project 1
$ cp /home/cs9311/web/23T1/proj/proj1/proj1.sql Project1Directory
The only error messages produced by these commands should be those noted above. If you omit any
of the steps, then things will not work as planned.
5. Important Advice Before You Start
The database instance you are given is not a small one. The first thing you should do is get a feeling
for what data is there in the database. This will help you understand the schema better and will make
the tasks easier to understand. Tip: study the schema of each table to see how tables are related and
try write some queries to explore/ understand what each table is storing.
$ psql proj1
proj1=# \d
... study the schema ...
proj1=# select * from Students;
... look at the data in the Students table ...
proj1=# select p.unswid,p.name from People p join Students s on (p.id=s.id);
... look at the names and UNSW ids of all students ...
proj1=# select p.unswid,p.name,s.phone from People p join Staff s on (p.id=s.id);
... look at the names, staff ids, and phone #s of all staff ...
proj1=# select count(*) from Course_Enrolments;
... get an idea of the number of records each table has...
proj1=# select * from dbpop();
... how many records in all tables ...
proj1=# ... etc. etc. etc.
proj1=# \q
Read these before you start on the exercises:
• The marks reflect the relative difficulty/length of each question.
• Work on the project on the supplied proj1.sql template file.
• Make sure that your queries work on any instance of the MyMyUNSW schema;
don't customize them to work just on this database; we may test them on a different database
instance.
• Do not assume that any query will return just a single result; even if it phrased as "most" or
"biggest", there may be two or more equally "big" instances in the database.
• When queries ask for people's names, use the People.name field; it's there precisely to
produce displayable names.
• When queries ask for student ID, use the People.unswid field; the People.id field is an internal
numeric key and of no interest to anyone outside the database.
• Unless specifically mentioned in the exercise, the order of tuples in the result does not
matter; it can always be adjusted using order by. In fact, our check.sql will order your results
automatically for comparison.
• The precise formatting of fields within a result tuple does matter, e.g., if you convert a number
to a string using to_char it may no longer match a numeric field containing the same value,
even though the two fields may look similar.
• We advise developing queries in stages; make sure that any sub-queries or sub-joins that
you're using works correctly before using them in the query for the final view/function
• You may define as many additional views as you need, provided that (a) the definitions
in proj1.sql are preserved, (b) you follow the requirements in each question on what you are
allowed to define.
• If you meet with error saying something like “cannot change name of view column”, you can
drop the view you just created by using command “drop view VIEWNAME cascade;” then
create your new view again.
Each question is presented with a brief description of what's required. If you want the full details of
the expected output, look at the expected_qX tables supplied in the checking script (check.sql) once
we release it.
6. Tasks
To facilitate the semi-auto marking, please pack all your SQL solutions into view/function as defined
in each problem (see details from the solution template we provided).
Question 1 (4 marks)
Define a SQL view Q1(subject_code):
Give the code of subjects offered by School of Information Systems, Technology
and Management that mentions at least one COMP course in its equivalent description. The
school name refers to the longname in the related table. The equivalent description refers to the
_equivalent column.
• subject_code should be taken from Subjects.code field.
Question 2 (4 marks)
Define an SQL view Q2(course_id) that gives the id of the courses where all the labs in each
course are held in different rooms, where LAB refers to the unswid in the related table. Only
consider the courses offered in year 2011 and having the word data(regardless of case) in
Subjects.name.
• course_id should be taken from Courses.id field.
Question 3 (4 marks)
Define a SQL view Q3(unsw_id, name) that gives all the distinct students who have unwsid
started with 3210 and enrolled in at least two courses in one semester. Only consider the courses
whose staff involve at least one professor. Professor refers to the Prof in People.title.
• unsw_id should be taken from People.unswid field;
• name should be taken from People.name field.
Question 4 (5 marks)
Define an SQL view Q4(unsw_id,program) that gives the students and distinct programs
they enrolled, where the students should enroll in at least two streams for the same program, and
the stream types of programs should involve Major and Research (refers to
stream_types.description).
• unsw_id should be taken from People.unswid field;
• program should be taken from programs.id.
Question 5 (5 marks)
Define a SQL view Q5(unsw_id, program, course)that gives the students, programs and
courses where the program and the course are enrolled in the same term in year 2005 but are
offered by different organizations. Only consider the courses that students have achieved 98
(refers to the course_enrolments.mark).
• unsw_id should be taken from People.unswid field;
• program should be taken from Programs.id field;
• course should be taken from courses.id field.
Question 6 (5 marks)
Define a SQL view Q6(school_id, building_count)that gives the id of orgunits and
number of buildings they used for offered courses in 2011, where the number of buildings is more
than School of Mathematics & Statistics (Name of school refers to
orgunits.longname). Only consider the courses belonging to subjects that are directly offered
by the organization unit that is a School (refer to orgunit_types.name).
• school_id should be taken from Orgunits.id field;
• Return building_count as integer.
Question 7 (5 marks)
Define a SQL view Q7(course_id, staff_name) that gives the id of the courses whose
classes have the highest number of facilities, and the name of the course_staff of these
courses. Only consider the course staff that are from an organization whose phone number starts
with 9382.If a course has more than one course staff, return all of them.
• course_id should be taken from Courses.id field;
• staff_name should be taken from People.name field.
Question 8 (6 marks)
Define SQL view Q8(unsw_id, name) that gives all the distinct students who have already
graduated as Bachelor and are not eligible to graduate as Master before 2012.
• Bachelor degree refers to the degree type whose name contains the word Bachelor;
• Master degree refers to the degree type whose name contains the word Master.
A student that is valid for graduate should satisfy the following conditions in one program:
• the total UOC (refer to subjects.uoc) earned in the program (exclusive) should be no less
than the required UOC of the program (refer to programs.uoc);
• A student can only earn the UOC for a course where he/she has received mark greater than or
equal to 60.
The view should return the following details about each student:
• unsw_id should be taken from People.unswid field;
• name should be taken from People.name field.
Note:
• A course belongs to a program if this student enrolls into course and program in the same
semester (refer to Semesters.id).
Question 9 (6 marks)
Define a PL/pgSQL function Q9(unswid integer) that takes an unswid of a student and returns
the code of courses that the given student enrolled in, the unswid of distinct tutors for these
courses and the number of their employed days. Only consider the courses which have at least one
same-prefix course as prerequisite.
• Tutor refers to staff_roles that includes the word Tutor in its name;
• Same-prefix courses are the courses which have the same first four characters;
• Prerequisite refers to _prereq in the related table;
• Number of employed days refers to the number of days between the employed date of a staff
and the starting date of semester of the given course.
Each line of output (in text type) should contain the following three elements and they are
concatenated with a space:
 A course code should be taken from Course.id field;
 An unswid should be taken from People.unswid field;
 The number of employed days should be an integer.
Question 10(6 marks)
Define a PL/pgSQL function Q10(year courseyeartype, term character(2),
orgunit_id integer)that takes a year, a term and an orgunit id. Output the students who
enrolled in the program provided by the given orgunit in the given term and had the average mark
greater than 85 in the given term, and the average mark.
• year is taken from Semesters.year field;
• term is taken from Semesters.term field;
• orgunit_id is taken from Orgunits.id field;
• We only count the average mark of passed courses.
Each line of the output (in text type) should contain the following two elements and they are
concatenated with a space:
• a unswid of student which is taken from People.unswid field;
• the average mark (as numeric(4,2)) of the related course.
A student passes a course if he/she obtains a grade in {PC, PS, CR, DN, HD} for this course and the
mark is not null. If a student did not pass any course in the given term, i.e., the divisor is zero, ignore
this student in the output. See check file for examples. We will only test with valid inputs.
7. Submission
You can submit this project by doing the following:
• Students must submit an electronic copy of their answers to the above questions to the
course website in Moodle.
• The file name should be proj1_studentID.sql (e.g., proj1_z5100000.sql).
• If you submit your project more than once, the last submission will replace the previous one
• In case that the system is not working properly, you must take the following actions:
• Please keep a copy of your submitted file on the CSE server. If you are not sure how, please
have a look at taggi.
The proj1.sql file should contain answers to all the exercises for this project. It should be completely
self-contained and able to load in a single pass, so that it can be auto-tested as follows:
• A fresh copy of the MyMyUNSW database will be created (using the schema
from mymyunsw.dump)
• The data in this database may be different from the database that you're using for testing
• A new check.sql file may be loaded (with expected results appropriate for the database)
• The contents of your proj1.sql file will be loaded
• Each checking function will be executed, and the results recorded
8. Check your Answers
Before you submit your solution, you should check that it loads correctly for testing by using
something like the following operations. For function questions, we provide five testcases for each
question (E.g., for question 10, they are q10a to q10e). Testcases can be found from line215 in
check.sql:
$ dropdb proj1 ... remove any existing DB
$ createdb proj1 ... create an empty database
$ psql proj1 -f /home/cs9311/web/23T1/proj/proj1/mymyunsw.dump ... load the
MyMyUNSW schema and data
$ psql proj1 -f /home/cs9311/web/23T1/proj/proj1/check.sql ... load the checking code
$ psql proj1 -f proj1.sql ... load your solution
$ psql proj1
proj1=# select check_q1(); … check your solution to question1
…
proj1=# select check_q6(); … check your solution to question6
…
proj1=# select check_q10a(); … check your solution to question10 testcase (a)
proj1=# select check_all(); … check all your solutions
Notes:
1. You must ensure that your proj1.sql file will load and runs correctly (i.e., it has no syntax errors,
and it contains all your view definitions in the correct order).
a. If your database contains any views that are not available in a file somewhere, you should
put them into a file before you drop the database.
b. If we need to manually fix problems with your proj1.sql file in order to test it (e.g., change
the order of some definitions), you will be fined via half of the mark penalty for each
problem.
c. If your code loads with errors, fix it and repeat the above until it does not.
2. In addition, write queries that are reasonably efficient.
a. For each question, you must output result within 120 seconds on Nw-syd-vxdb server.
This time restriction applies to the execution of the ‘select * from check_Qn()’ calls.
b. For each question, you will be fined via half of the mark penalty if your solution cannot
output results within 120 seconds.
9. Late Submission Penalty
5% reduction (-2.5 out of 50) for each 24 hours after the deadline date and time. Submissions that
are more than five days late will not be marked.