INFS5710 Information Technology Infrastructure for Business Analytics
Assessment 3
Project Statement
(Due by 12 PM on Monday 13 November 2023 via Moodle)
• This project accounts for 25% of the total marks for this course.
• The deliverable is a PowerPoint file with video narration and speaker notes.
Bike-sharing has witnessed a surge in popularity worldwide as an affordable transportation option.
Currently, these programs are operational in approximately 1,000 cities, boasting over half a million
bicycles in circulation. The concept behind bike-sharing is elegantly simple: individuals gain access to
bicycles as needed, freeing them from the expenses and responsibilities associated with bike
ownership. This system offers short-term bicycle usage, promoting an eco-friendly mode of public
transportation. Designed to cater to daily commuting needs, it allows users to conveniently retrieve and
return public bicycles at unattended bike stations, with the entire process being self-service. Typically
concentrated in urban areas, bike-sharing programs feature multiple station locations, facilitating the
pickup and return of bicycles at various stations throughout the city.
This project revolves around the bike-sharing program in the metropolitan area of Los Angeles (LA), a
major U.S. city with a population of several million. You and your fellow group members are working as
business consultants for the bike-sharing program.
Bike-sharing Data
The manager of the bike-sharing company has directed you to access historical bike-sharing data
available at the following website: https://bikeshare.metro.net/about/data/. This dataset encompasses
records of nearly 2 million bike trips conducted from July 2016 (Q3) through to September 2023 (Q3).
Ref: https://bikeshare.metro.net/about/data/
However, you do not need to download all the trip data. You only need to download the quarters that
correspond to the size of your group, as listed in the table below. Thus, we do not expect a one-member
group to generate more outputs than a three-member group.
Number of Members in Group    Period                Number of Quarters of Data
1                             Q4 2022 to Q3 2023    4
2                             Q4 2021 to Q3 2023    8
3                             Q4 2020 to Q3 2023    12
Ref: https://bikeshare.metro.net/about/data/
Calculation of Distance
You can also find in the data files the locations of bike stations in the GPS coordinate system. For
example, the coordinate of a station is (x, y), where x is the longitude coordinate and y is the latitude
coordinate. The following link helps you to understand more about the GPS coordinate system:
https://www.ubergizmo.com/how-to/read-gps-coordinates/
Suppose a bike rental starts from (1, 1) and ends at (2, 2). How should you estimate the distance
travelled? In this project, it is recommended that you estimate it using the so-called taxicab
distance, which is |1 − 2| + |1 − 2| = 2. See the following figure for interpretation.
For more information, please see
https://study.com/academy/lesson/taxicab-geometry-history-formula.html.
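The taxicab calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name is an assumption, and the project's outputs still have to be produced in SAS Enterprise Guide. The result is in the same unit as the inputs (degrees for GPS coordinates), exactly as in the worked example; converting it to kilometres would need a further assumption that this statement does not specify.

```python
def taxicab_distance(start, end):
    """Return the taxicab (Manhattan) distance between two (x, y) points.

    x is the longitude and y is the latitude, as in the project statement.
    The result is in the same unit as the inputs (degrees for GPS data).
    """
    start_x, start_y = start
    end_x, end_y = end
    return abs(start_x - end_x) + abs(start_y - end_y)


# Worked example from the text: a rental from (1, 1) to (2, 2).
example = taxicab_distance((1, 1), (2, 2))
print(example)  # |1 - 2| + |1 - 2| = 2
```

The same expression, an absolute difference of longitudes plus an absolute difference of latitudes, can be written as a computed column in a query.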
Weather data
Weather plays an important role when people decide whether or not to use bike-sharing. You are
required to explore the relationship between weather (e.g., temperature, wind speed and humidity) and
the bike-sharing rentals in this project. Unfortunately, there is no easy way to download free historical
weather data. The following provides a way to manually capture weather data month by month from
Weather Underground (wunderground.com).
• First visit https://www.wunderground.com/ and search for the weather conditions in Los
Angeles. (There are other locations that you may also try, e.g., Santa Monica and Valley
Village, where there are many bike stations as well.) You will be led to the page of a weather
station near Los Angeles, which may differ from time to time.
• Click the History tab on the page, and then choose to view Monthly weather data. Once you
choose a month, click View. For example, the following link shows the weather data of
September 2023 measured at the Burbank station (near Los Angeles):
Burbank, CA Weather History | Weather Underground (wunderground.com)
• Scroll down the page, and you will see the table of Daily Observations. Use your mouse to
select and copy the table and paste it into an Excel spreadsheet.
• Copy only the data required for this project.
Holiday data
Another factor that influences bike-sharing rentals is holidays. You can easily search the dates of federal
holidays in the US for each year.
Tasks
In this project, you are expected to manage and clean the data collected; some of it may contain
missing data, inconsistent formatting, and incomplete information. The goal is to overcome such obstacles
commonly encountered and enable the extraction of valuable business insights from the datasets. These
insights will be instrumental in advancing and promoting the bike-sharing program in Los Angeles.
(1) Entity Relationship Model or Star Schema
Before proceeding with data analysis, it is necessary to download and store the data in tables. You have
the flexibility to choose between creating an entity-relationship model or a star schema (data
warehouse). Both options are equally valid, but your group must make a collective decision on which
model or schema to implement. It is imperative to model all the necessary tables for your analysis,
encompassing primary and foreign keys, and attributes. Ensuring that the tables align with the chosen
model or schema is essential for the accuracy and coherence of your data analysis in the next section.
(2) Data Analysis
The manager of the bike-sharing company asked you to collect and analyse the data and “let the data
speak for itself.” You understand that the company wants to further grow the market and attract
more users. Before they do it, they want to have some insights from the data:
“Toss it, flip it, turn it inside out, combine it with your own or someone else's data! Tell us
how many trips happened after midnight, or near your event or business - we invite you to
delve into the data and reveal the hidden gems of understanding that will allow us to make
Metro Bike Share the best it can be.”
The following are some of the common analyses that you may consider:
• Station analysis: For example, what stations are most popular (for start or destination)? At what
times?
• Trip analysis: For example, what routes are most popular, one-way, or round trip? What is the
average distance of trips? Are most trips within a city or across cities?
• Time analysis for demand: For example, what time or day has a higher demand?
• Holiday analysis: How do holidays affect the demand?
• Weather analysis: How does weather influence the demand?
• Customer / Subscriber analysis: For example, what type of pass is most popular? Is there any
relationship between pass type and the trips taken?
Regardless of the analysis topics that you choose, you must conduct a chronological analysis for each
topic chosen. For example, how has the daily, weekly, or quarterly demand pattern evolved over the
past few years? The introduction of motorbikes in late 2018 and COVID-19 between 2020 and 2021 may
have significantly changed customers’ demand for bike sharing. In other words, you should study how
patterns change over years, quarters, or seasons rather than present only an overall, averaged result.
Therefore, it is preferred that each group member studies one topic in depth rather than multiple ones
superficially.
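As a rough illustration of what a chronological breakdown means in practice, the sketch below counts trips per calendar quarter instead of reporting one overall total. It is written in Python purely to show the grouping logic; the function names and the sample dates are hypothetical, and the actual analysis must be done in SAS Enterprise Guide.

```python
from collections import Counter
from datetime import date


def quarter_label(day):
    """Map a date to a label such as '2023 Q3'."""
    quarter = (day.month - 1) // 3 + 1
    return f"{day.year} Q{quarter}"


def trips_per_quarter(trip_dates):
    """Count trips in each calendar quarter, returned in chronological order."""
    counts = Counter(quarter_label(day) for day in trip_dates)
    # Labels start with a four-digit year, so sorting the text sorts by time.
    return dict(sorted(counts.items()))


# Hypothetical trip start dates, for illustration only (not real data).
sample_dates = [
    date(2022, 10, 5),
    date(2022, 12, 31),
    date(2023, 1, 1),
    date(2023, 7, 4),
    date(2023, 9, 30),
]
print(trips_per_quarter(sample_dates))
```

Comparing the per-quarter counts side by side is what reveals a change in pattern that a single average would hide.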
SAS Enterprise Guide (EG)
You are required to use only SAS Enterprise Guide (EG) for this project. To begin the ETL (extract,
transform and load) process, you need to prepare your data in proper tables that will go into SAS.
That is, you need to create tables in the SAS environment. Remember your tables must match your ERD
or Star Schema.
Whenever you want to conduct an analysis (e.g., trip analysis), you must write a query to select relevant
attributes by properly joining multiple tables to obtain a resultant table for a specific analysis. See the
Appendix for using some common data analysis functions of SAS EG. More features of SAS EG will be
introduced in a tutorial session later.
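The idea of joining tables and keeping the result as a new table can be sketched as follows. This is a hedged illustration only: it runs the SQL through Python's built-in sqlite3 module rather than SAS, SQLite's dialect is not identical to SAS SQL, and every table name, column name, and value here (dim_station, fact_trip, station_summary, and so on) is hypothetical rather than taken from the bike-sharing files.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical dimension and fact tables linked by a foreign key.
    CREATE TABLE dim_station (
        station_id   INTEGER PRIMARY KEY,
        station_name TEXT
    );
    CREATE TABLE fact_trip (
        trip_id          INTEGER PRIMARY KEY,
        start_station_id INTEGER REFERENCES dim_station (station_id),
        duration_min     INTEGER
    );
    INSERT INTO dim_station VALUES (1, 'Station A'), (2, 'Station B');
    INSERT INTO fact_trip VALUES (101, 1, 12), (102, 1, 30), (103, 2, 8);

    -- Save the joined, aggregated result as its own table for later graphing.
    CREATE TABLE station_summary AS
    SELECT s.station_name,
           COUNT(*)            AS trip_count,
           AVG(t.duration_min) AS avg_duration_min
    FROM fact_trip AS t
    JOIN dim_station AS s
      ON t.start_station_id = s.station_id
    GROUP BY s.station_name;
    """
)

summary = conn.execute(
    "SELECT station_name, trip_count, avg_duration_min "
    "FROM station_summary ORDER BY station_name"
).fetchall()
print(summary)
conn.close()
```

The join condition follows the foreign key from the trip table to the station table, which is why the tables need to match the chosen ERD or Star Schema.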
Finally, please note that the management (or the LiC) does not know anything beyond this project
statement. Therefore, you need to use your own judgment and make necessary and reasonable
assumptions when doing this project. Make sure to present all assumptions made in the project.
Project Presentation
• The maximum presentation time is 10 minutes for a three-member group; a one- or two-member
group can present in less than 8 minutes.
• You must come to your tutorial to present your work.
• PowerPoint Slides (for a three-member group):
o First slide – it should have your group number, names, and pictures, and the order of the
presenters for the slides. We do not want to give marks to the wrong presenter. You
should introduce yourself and your fellow group members.
o Main slides – a maximum of 9 PowerPoint slides to be used for the presentation (a one- or
two-member group can have fewer than 9 slides). It is up to each group to decide how the slides
are presented.
▪ 1 to 2 slides: Briefly describe your ERM or Star Schema. Your model/schema must
match your tables. You can discuss how you prepare the data for analysis, including
how you clean data, manage missing information, and organise the tables that go
into SAS.
▪ For the remaining slides, you can divide them into several topics such as Topic I, Topic II,
and so on. For each topic, you should describe the research question, major
findings (in terms of data visualisation such as charts), business insights, and
recommendations. The graphs and tables produced from the SAS Enterprise Guide
must look reasonable, that is, we can read the numbers, and the results make sense.
It is recommended each group member presents his/her work.
▪ As time is limited, it is acceptable not to have a conclusion slide.
o During the presentation, please introduce yourself before presenting the content. Again,
we do not want to give a mark to the wrong presenter.
o Appendix slides - Slides from number 11 onwards are treated as Appendices, where you can put
the SQL statements, data analyses not used in the presentation, data cleaning output, and
other details.
• All group members must present as individual marks will be given for an individual’s presentation.
If an individual does not contribute to the presentation, then s/he will get a zero mark for the
presentation.
(a) [Mandatory] Each group must present one descriptive analysis using the SAS Enterprise
Guide to show the datasets you have selected.
(b) [Mandatory] Each group member must create two outputs for two presentation slides. All
the outputs must be created using the SAS Enterprise Guide.
(c) [Optional] Each group has the option to present one output not using the SAS Enterprise
Guide. If the group decides to exercise this option, then one of the group members can
replace one of the outputs in (b) here. Furthermore, you must say what application you
used, and how you achieved the output from the raw data. For example, a group of three
can have five outputs/slides created by the SAS Enterprise Guide and one output/slide
created without using the SAS Enterprise Guide.
• You are expected to add notes to the speaker notes, at least for the Main slides. However, you do
not have to speak in the exact wording on the slides (like me in the lecture). The speaker notes are
limited to approximately 300 words.
• On the top of each slide, you must indicate who is responsible for writing and/or presenting the
slide. However, the slides AND the presenter’s face (highly recommended) should be shown in the
video. You can also do the same for the Appendix slides.
• The file name should be as follows: if your Group ID is T19A-03, then
“GroupID-Presentation.pptx” will be T19A-03-Presentation.pptx.
• For each slide, please include the name(s) of the group member(s) who wrote, and which group
member presents on the corresponding slide.
▪ This slide was written by Vincent Pang and Silvia Lin
▪ This slide was presented by Silvia Lin
• Support for Oral Presentations | UNSW Current Students
• Presentn PowerPoint Using PowerPoint | UNSW Current Students
Deliverables
(a) This project accounts for 25% of the total marks for this course.
(b) Your project report is due in Week 10, Monday, 13 November 2023, 12 pm (Sydney Time) on
Moodle in two submission boxes:
o Submission box for PowerPoint slides and
o Submission box for other files - data files, and script files.
(c) Only the group leader needs to submit.
(d) Upload only the cleaned SAS datasets/tables you used in your data analysis (please clearly label
all the datasets/tables).
(e) All SAS SQL scripts you used – please clearly indicate which script is related to which output on
your slide. Use comments in your file to indicate what you are doing.
(f) You need to include your metadata for all the data and tables you used. You can include the
metadata in an Excel spreadsheet or a Word Document. You can include it as part of the file in (g).
(g) Include an Excel spreadsheet or a Word Document that states the purpose of each of the files you
have uploaded. You can name this file “Readme”. You can include (f) in this file. This will give
the markers an idea of the purpose of each of the files.
(h) You do not need to submit a written report.
Marking Guideline
Item (%) and Description

Data preparation and Design (15%)
• Did you select any external datasets to support your data analysis?
• Did you properly manage the missing data?
• Did you properly process the tables used in the SAS Enterprise Guide?
• How good is your design of the ERD or Star Schema?

Quality of the data analysis (35%)
• Are your research questions or hypotheses meaningful to the business? That is, can the data
support your argument?
• Have you properly analysed the data with the right functions or steps?
• Have you provided proper data visualisation (for example, table or graph) to present and support
your analysis? [Note: graphs produced in SAS are not the best, but it is the numbers we look for
and not the prettiness of the graph.]
• How well have you written the speaker notes in the PowerPoint slides?
[Note: marks might be adjusted for the complexity of the data analysis]

Quality of business insights obtained and recommendation (25%)
• Do you obtain business insights from the data?
• Are your obtained insights helpful for business?
• Do you provide proper recommendations to make use of the obtained insights?
• How well have you written the speaker notes in the PowerPoint slides?

Presentation quality (15%)
• How well did your group present?
• How well did you, as an individual, present?
[It is expected that the presentation times are evenly distributed among the group members.]

Teamwork (10%)
• How well has your team worked together?
• How did you communicate on Teams?
• Can meeting minutes, plans, chats or files be found on Teams?

Total (100%)
Note: we assume the PowerPoint slides are produced at a professional level.
Group Assignment QnA Channel (in Microsoft Teams)
All questions must be posted in the group assignment channel (see
https://teams.microsoft.com/l/channel/19%3a906fb9e41cc24324898c15c7f7e745ef%40thread.tacv2/Class%252002%2520Group%2520Assignment?groupId=9144e682-2580-45e4-818c-f7955954116a&tenantId=3ff6cfa4-e715-48db-b8e1-0867b9f9fba3).
Questions about the group assignment that are emailed to the LiC or tutors will not be answered. To be
fair to all the students, we do not want to be seen to be biased towards one group by giving them extra
information.
The due date is Monday, 13 November 2023, 12 pm.
Please note that we will stop answering questions after Friday 10 November 2023, 12 pm. This is to
ensure you plan ahead rather than panic at the last minute. In the last few days, you probably just have
to finalise your PowerPoint presentation.
Private Group Channel (in Microsoft Teams)
(a) A private group channel will be created in Microsoft Teams by the teaching team
after the group formation.
(b) The name of your private group channel is the same as your group ID, e.g.,
T19A-03.
(c) Only your group members and the teaching team will have access to your
private group channel. That is, no one from outside the group can access
your private group channel.
(d) All conversations and files saved in the private group channel remain in the
private group channel.
(e) The purpose of the private group channel is to work as an environment for you
to meet, chat, leave messages, and upload/download files. Moreover, the
teaching team can communicate directly with the group in this channel.
(f) All the posts, files, meeting times and other group activities will be saved in the
channel.
(g) Another important benefit of using the private group channel in Microsoft Teams
is that it is transparent to all the group members and the teaching team. When there
is a dispute between the group members, the LiC will only examine evidence
such as posts, meeting activities and uploaded files in the private group channel.
The LiC will not examine other communication channels such as WhatsApp,
Google Docs, Facebook, and WeChat.
(h) Every week, you are expected to create a weekly folder and upload all the work
you have done for that week or to date to the folder. You can think of this folder
as a backup folder (you can perform this task on Saturday or Sunday). This will
also encourage all group members to deliver work for the week. These folders
will be used as part of the marking for teamwork.
Trello or Excel for Planning Tasks
• A planner such as Trello, or something similar built in Microsoft Excel, can be
used to improve the planning of tasks.
• The whole team can access the planner so there will be no excuse such as, “I don’t know
what’s going on”.
• Notifications can be used as reminders of tasks to be completed.
• The planner should be maintained by the team leader or an appointee. All other group
members should also assist in managing the tasks in the planner.
Teamwork
One of the key PLOs is PLO 4, Teamwork. We will examine the activities such as chat and planner in the
channel as the means of assessing teamwork. Groups must plan, schedule, and conduct activities in due
time. Groups must meet on a regular basis (at least twice per week) while the assignment is being
undertaken and keep records (diaries, meeting minutes) of such meetings. The groups must ensure that
all members are involved in the completion of the assignment. The work is to be divided equally among
the group members.
All group members are expected to behave professionally and work diligently. Group members should
contribute in a useful and constructive way to the teamwork. Deadlines should be kept, and work should
be delivered at a professional standard. If problems emerge in your group, then these problems should in
the first instance openly be discussed in the group (different members might have different views) and
resolutions should be agreed on. If internal arrangements repeatedly fail to remedy the situation, then
you should bring the issues to the attention of the LiC. The LiC may call a meeting of the group in which
each group member will be asked to describe in detail his or her input into the assignment and provide
supporting documentation of this effort. If group members are found to be making an inadequate effort
or delivering poor quality, then they will be counselled to improve their effort. If sufficient improvement
is not made despite group efforts and LiC interventions, then the mark of underperforming group
member(s) may be moderated to reflect the relatively lower input into the assignment. Note that the
inability to resolve internal group conflicts without involving the LiC does not reflect well on the group’s
project management and teamwork skills.
An important part of a project is to record and evaluate teamwork. You need to keep all the
communications, such as meetings and chats, and upload/download files in Teams so that we can see how
the teamwork was carried out. If one member consistently asks for parts from all the members, but no one
responds until the last two days before the submission, then this is not teamwork. The planner (assuming
you use Excel) should be pinned on top of the tab, so everyone knows what to do.
Keeping minutes of meetings is critical. The minutes must record the following details for each group activity:
• Record what the activity (meetings, work) entailed.
• Record location, time, date and duration of a group activity.
• Record who was present at the activity.
• For “next actions”: Specify who is doing what by when (Action plan)
• The team leader should post the meetings on Teams to all the members.
The Team Leader is responsible for recording the minutes unless other arrangements are made within
the group.
Self and Peer Assessments
You are strongly recommended to do self and peer assessments in Review. Most of the time, the
contributions are evenly distributed as shown below:
However, occasionally, you might have a “free rider” (i.e. a student who does not do any work but has
his or her name on the front cover) or people who just want to do the minimum. In that case, you might
see the contributions distributed as shown below:
The self and peer review assessment will allow you to rate your group members’ contribution.
❑ Rate your group members and your own contribution to the group assignment on a scale out of 5:
▪ 5 = Significantly above expectations (very strong contribution in terms of quality and
quantity, leadership of the project)
▪ 4 = Slightly above expectations (strong contribution in terms of quantity and quality)
▪ 3 = Meeting expectations (did his/her fair share)
▪ 2 = Slightly below expectations (did some work, but could have been more and/or of better
quality)
▪ 1 = Significantly below expectations (did very little work and/or of poor quality)
▪ 0 = Did not participate at all / free riding
❑ Equal contribution is expected.
▪ Please note that simply doing the final proof-reading or making a cup of tea does not count
as equal contribution.
▪ Unequal contribution might lead to redistribution of the marks of the group assignment.
▪ You will be allocated a group channel in Microsoft Teams. You can record all your
communication such as meetings. All posts and file uploads will be date- and
time-stamped.
▪ Conflicting/inconsistent/unfair peer contribution review will lead to the group being assessed
by the Teaching Team. The Teaching Team will then examine the communication between
the members including posts, date and time stamp of the posts, meeting minutes, files and so
on in Microsoft Teams.
Appendix
Using Enterprise Guide for Data Analysis and Visualisation
Given a data file opened in SAS Enterprise Guide, you can see some analysis and visualisation functions
available (from the tool bar below).
Most functions are straightforward to use. Graphs can be found under Graph; some useful analysis tools
can be found under Analyze in the tool bar. You are expected to try them by yourself.
Note that the data visualisation functions apply only to a SAS data file. When you write a query,
before you can graph the table of the query outcome, you need to save the result table as a SAS data file
using a “create” statement, which has been introduced previously.
Graphing:
Line Chart
Bar Chart
Histogram
If you are not familiar with the concept of a histogram, please read the following site about histograms. To
plot a histogram, choose Bar Chart Wizard. In Step 2 of 4, choose Percentage for the Bar height.
Correlation Analysis
You may first plot a 2D scatter chart for the two variables whose correlation you want to study.
If a correlation is revealed from the scatter chart, you may also calculate the exact correlation between
these two variables. Assume these two variables are “Amount” and “Visits”. The following figures show
how their correlation can be calculated.
Drag Amount and Visits from the left pane to the right pane.
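To make clear what the correlation figure represents, the sketch below computes a correlation coefficient by hand. It assumes the Pearson coefficient is the one intended, which this appendix does not state. The function name and the sample values for Amount and Visits are hypothetical, and the Python code only illustrates the arithmetic rather than replacing the SAS Enterprise Guide task.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the two means.
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations of each variable.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross_products / (spread_x * spread_y)


# Hypothetical values for the two variables named in the text.
amount = [10.0, 20.0, 30.0, 40.0]
visits = [1, 2, 3, 4]
print(round(pearson_correlation(amount, visits), 4))
```

A value close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or no linear relationship between the two variables.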