sql代写-MISM 6210
MISM 6210 - Week 5 Handout
Myles D. Garvey, Ph.D
Spring, 2021
1 Agenda for Today
• Announcements and Questions (8:00am - 8:10am)
• Prepare for Groups (8:10am - 8:20am)
• MySQL and SQL Basics (8:20am - 9:00am)
• Break (9:00am - 9:10am)
• An Introduction to Tableau (9:10am - 10:30am)
2 Basics of SQL
When we are presented with a database, we often need to prepare a dataset for analysis. As we
have discussed in the prior weeks, data which is in "storage format" is not suitable for analysis.
This is due to the fact that usually, data is stored in multiple tables using complicated relation-
ships across the database to connect the tables. Furthermore, when we prepare a dataset for
analysis, the unit of analysis may be different than what is in the database. We may necessitate
a special unit of analysis that is "smaller" or "larger" than the unit stored in the database. When
this is the case, we must think carefully about the intended unit (or level) of analysis, as well as
the attributes (i.e. the columns, variables, etc) we would like to have in the dataset.
When we need to prepare a data-set for analysis, and conduct operations to aggregate the
data, Structured Query Language (SQL) aids us in this process. Unlike other programming lan-
guages, SQL is a language that is entirely data-driven. The entire language rests on a collection of
operations called statements. The statements are centered around the operations that can be con-
ducted on a tables of data. Generally, these operations involve (but are not limited to) SELECT,
UPDATE, INSERT, and DELETE. However, in our set of notes here, we will focus primarily on
SELECT, which is the primary mechanism to retrieve information from a database (or, as we’ll
see later in this semester, in R/Tableau/etc.)
MISM 6210 - Week 5 Handout
2.1 The SELECT Statement
When we would like to extract data from a database, we can use the SELECT statement. Con-
structing SELECT is an art and a science. It involves deep thinking about which final unit of
analysis and the collection of variables (i.e. columns) you would like in your final data table
(intended for analysis). This is not always as obvious as it may appear. Generally, a SELECT
is structured in the same manner as any other SELECT statement. It begins with the word "SE-
LECT", which is followed by the names of the columns you would like to select. This is then
followed by a FROM keyword, which instructs the database to select columns from one or many
tables. If it is only from a single table, life is good. In most applications, however, you are
selecting columns from more than one table. This is where the art of table joins come into play.
Put simply, a table join is a grouping of columns from different tables at the same unit (or
level) of analysis. After you instruct the database which table(s) you seek to select from (which
is sometimes followed by a JOIN statement), you follow this with the GROUP BY statement,
if needed. Sometimes, when you pull data from a database, you need to aggregate data into
smaller numbers. For example, you may happen to have data on students, but you would like
information about classrooms. Hence, you need a way to combine student information per
classroom together (usually, this is found by computing parameters/statistics such as means,
medians, standard deviations, etc). Which additional unit you’re adding into the unit of analysis
to make it "higher" needs to be specified by "GROUP BY" (in this example, you would "group by"
classroom, since this would be the new unit of analysis). Last, you need to filter out any rows of
data you may not want. Hence, in summary, you need to specify:
• The columns
• From which tables (possibly joined together)
• Any table joins you must undertake.
• Any group by statements if you are using a different level of analysis.
• Any "WHERE" clauses to filter our rows of data you do not wish to have.
2.2 INNER Table Joins
In various databases, we have data split across smaller tables, where each table has its own unit
of analysis. For example, we may have the following two tables:
Course_ID Name Location
0 Foundations of Statistics Smith Hall
1 Supply Chain Management Jack Auditorium
2 Data Science The Library
Student_Name Course_ID
Bob Kelly 0
Joe Locke 0
Ahmed Smith 1
Susan Miller 0
Shannon Pillar 1
MISM 6210 - Week 5 Handout
So what if we wanted a single table that had the student name, course name, and location?
We can do this by using an INNER JOIN statement. Here is how this works. We type:
SELECT t1.Student_Name, t2.Name, t2.Location
FROM student as t1
INNER JOIN course as t2
ON t1.Course_ID = t2.Course_ID
The final columns will simply be three: student name, course name, and location. The way
that SQL will stitch this dataset together will be based on the course id. It will select the unit
of analysis to be that of table student. It will then go to the course table and look up the other
information affiliated with the corresponding id. After, it will take this information and copy it
over to the final table to be produced. The final table will hence look like this:
Student_Name Name Location
Bob Kelly Foundations of Statistics Smith Hall
Joe Locke Foundations of Statistics Smith Hall
Ahmed Smith Supply Chain Management Jack Auditorium
Susan Miller Foundations of Statistics Smith Hall
Shannon Pillar Supply Chain Management Jack Auditorium