FIT3176 Assignment 1: MongoDB & Cassandra
Weight: 20% of all marks for the unit.
Due Date: Week 8, Wednesday, 15-September-2021, 11:55 pm.
Submission: (i) The assignment submission must be made through Moodle
for this unit by Wednesday, 15-September-2021, 11:55 pm.
(ii) The submission of this assignment must be in the form of a
single PDF file and a single ZIP file. No other forms will be
accepted.
Feedback on your work: Feedback will be provided on student work within ten
working days post submission via Moodle.
Lateness: A penalty of 10% per day after the due date, including the
weekends.
Important: Tasks submitted more than seven days after the
due date will receive a mark of zero for that task.
Learning outcomes:
LO1. Describe various types of non-relational database
systems, including NoSQL.
LO2. Compare and contrast between relational and
non-relational database design and model.
LO3. Design database systems using document-store and
column-store design techniques.
LO4. Explain the concepts of transactions in non-relational
systems.
LO5. Implement and manipulate document-store and
column-store database systems.
Authorship: This assignment is a group assignment, and the final
submission must be identifiably your own work. Breaches of
this requirement will result in an assignment not being
accepted for assessment and may result in disciplinary action.
Version Number 1.00 | 26/08/2021
FIT3176 Assignment2021-Semester2
Getting help and support:
What can you get help for?
● Consultations with the Teaching Team
Talk to the Teaching Team: http://lms.monash.edu/course/view.php?id=115965&section=2
● English language skills
Talk to English Connect: http://www.monash.edu/english-connect
● Study skills
Talk to a learning skills advisor: http://www.monash.edu/library/skills/contacts
● Counselling
Talk to a counsellor: http://www.monash.edu/health/counselling/appointments
Extensions:
If you are experiencing difficulties that you think will impact your ability to meet this
deadline, you may apply for an assignment extension. You must apply no later than two
University working days after the due date of this assignment.
The extension application can be found on Moodle > Assessments > How to Apply for an
Extension. Please allow two business days for your application to be processed.
Please ensure your application is supported by appropriate documentation. You can find more
information about assignment extensions at the Special Consideration website.
Special Considerations:
Students should carefully read the Special Consideration website, especially the details about
what formal documentation is required.
All special consideration requests should be made using the Special Consideration
Application.
Please do not assume that submission of a Special Consideration application guarantees that
it will be granted – you must receive an official confirmation that it has been granted.
Plagiarism and Collusion:
Monash University is committed to upholding standards of academic integrity and honesty.
Please take the time to view these links.
Academic Integrity Module
Student Academic Integrity Policy
Test your knowledge, collusion (FIT No Collusion Module)
FIT3176 Group Assignment - Sem 2/2021 (Weight: 20%)
Due date: Week 8, Wednesday, 15-September-2021, 11:55pm
A. General Information and Submission
o This is a group assignment. One group consists of 2 students.
o Submission method: Submission is online through Moodle.
o Penalty for late submission: 10% deduction for each day (including weekends).
o Assignment Cover Sheet: You will need to sign the assignment cover sheet.
o Contribution Form: The contribution form needs to be completed by all members. Please
sign the form (e-signature is acceptable) as an agreement between members.
o Please carefully read the requirements for EACH section, especially the Task Outputs.
B. Problem Description – MAC
Monash University Agriculture Club (MAC) has recently been studying the parks and different
wildlife habitats across the United States in the hope of finding links between them and the
parks and wildlife of Australia.
The club has hired your team of Advanced Database Experts to use the following sample data
files to help with the parks and wildlife analysis:
● wildlife.json
● parks.csv
Note: These data are raw and do not follow any particular schema. For more
information about the fields/columns in the data, please refer to Appendix A.
For the analysis MAC has asked your team to perform the following tasks:
C. Tasks
Having heard about both MongoDB and Cassandra, MAC wishes to use a combination of both
technologies to analyse the data.
The assignment is divided into FOUR main tasks:
C.1. Analysis using MongoDB.
Data Requirement: The data for this task is contained in the following files:
(i) wildlife.json
(ii) parks.csv
Software Requirement:
(i) MongoDB Compass Software.
(ii) Software that can run Mongo Shell commands (e.g. VS
Code, Terminal, Command Prompt, MongoDB Compass etc.).
Overall Task C.1. Outputs:
(i) Screenshots added to the reports for each query, with before and
after screenshots for any insert/update/delete made to the database.
(ii) All code added in a properly commented file named
C1_MongoDB.json
Task Requirements:
C.1.1. Using MongoDB Compass, create a database called FIT3176MAC and create
one collection called parks and another collection called wildlife.
C.1.2. Using MongoDB Compass, import parks.csv into the parks collection and
wildlife.json into the wildlife collection. All fields should be assigned appropriate
data types (i.e. whole number data should be integers, numbers with decimal points
should be double etc.).
The following tasks require the use of MongoDB Shell commands:
C.1.3. Using the newly created wildlife collection:
(i) convert all fields having multiple values to arrays (if needed). For example, if the
field commonName has the value “Northern White-Tailed Deer, Virginia Deer,
White-Tailed Deer” then it would be converted to
["Northern White-Tailed Deer", "Virginia Deer", "White-Tailed Deer"].
(ii) convert the speciesCharacteristics and speciesParkRecords fields to objects (if
needed).
(iii) after converting the fields, store the new collection as wildlifeArray.
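The conversion in (i) can be sketched as follows. This is an illustration in Python only, not the required MongoDB Shell solution. It assumes that a multi-valued field is stored as a single comma-separated string and that no individual value contains a comma; the helper name split_multi_value is hypothetical.

```python
def split_multi_value(raw):
    """Split a comma-separated string into a list of trimmed values.

    Assumption: values are separated by commas and never contain a comma
    themselves. Missing or empty input yields an empty list.
    """
    if raw is None:
        return []
    return [part.strip() for part in raw.split(",") if part.strip()]


common_names = split_multi_value(
    "Northern White-Tailed Deer, Virginia Deer, White-Tailed Deer"
)
print(common_names)
```

Inside an aggregation pipeline, the $split operator performs a comparable split on a delimiter.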
C.1.4. Using the parks collection with one MongoDB Shell Command:
(i) combine the date data (from the fields parkEstDay, parkEstMonth and
parkEstYear) to proper MongoDB date data type,
(ii) store the combined date data in a new field called dateEst,
(iii) remove the fields for year, month and day,
(iv) add another field with your group name (e.g. groupName: "Group FIT3176" if
your group name is FIT3176), and
(v) store the updated collection in a new collection called parkDates.
Note: For the date data other than day, month and year, the remaining components of
the time can be taken as any value. For Task (iv), you can come up with your own
group name.
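As a rough illustration of steps (i) and (ii), the Python sketch below combines three separate values into one date. It assumes parkEstDay, parkEstMonth and parkEstYear hold numeric values (the source does not state their format), and it sets the time to midnight UTC, which the note above permits. The function name combine_date and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def combine_date(year, month, day):
    """Build a timezone-aware datetime (midnight UTC) from year, month, day.

    Assumption: all three inputs are numbers or numeric strings.
    """
    return datetime(int(year), int(month), int(day), tzinfo=timezone.utc)


# Illustrative values only, not taken from parks.csv.
date_est = combine_date("1872", 3, 1)
print(date_est.isoformat())
```

In the aggregation framework, the $dateFromParts operator builds a date from year, month and day parts in a similar way.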
C.1.5. Add the wildlifeArray collection to the parkDates collection and store the combined
data in a new collection called parksWildlife. Each document of the parksWildlife
collection should contain all fields from the parkDates document with the
wildlifeArray data for each park nested as a field having the name wildlife, field type
array and each element of the array representing a wildlife species. Example of the
structure of the wildlife field inside a document in the parksWildlife collection:
wildlife: [
{ speciesID: ...,
parkName: ...,
speciesCharacteristics: ...,
speciesParkRecords: ... },
{ speciesID: ...,
parkName: ...,
speciesCharacteristics: ...,
speciesParkRecords: ... }....]
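The target structure can be illustrated with a small in-memory sketch in Python. It is not the MongoDB solution; it only shows the shape of the result. It assumes the two collections are matched on parkName, which is the only field Appendix A lists for both files. The function name nest_wildlife and the sample records are hypothetical.

```python
from collections import defaultdict


def nest_wildlife(parks, wildlife):
    """Return park documents with matching species nested under 'wildlife'.

    Assumption: documents are joined on the shared 'parkName' field.
    Parks without any matching species get an empty list.
    """
    by_park = defaultdict(list)
    for species in wildlife:
        by_park[species["parkName"]].append(species)
    return [
        {**park, "wildlife": by_park.get(park["parkName"], [])}
        for park in parks
    ]


# Hypothetical sample documents.
parks = [{"parkID": "AAA", "parkName": "Park A"}, {"parkID": "BBB", "parkName": "Park B"}]
wildlife = [
    {"speciesID": "S1", "parkName": "Park A"},
    {"speciesID": "S2", "parkName": "Park A"},
]
for doc in nest_wildlife(parks, wildlife):
    print(doc["parkName"], len(doc["wildlife"]))
```

In MongoDB, the $lookup aggregation stage is the usual way to produce this kind of nested array from a second collection.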
C.1.6. Using the newly created parksWildlife collection, remove all wildlife species and
data associated with the particular species with the common name “Mouse”.
C.1.7. Use the aggregation pipeline to answer the following queries:
Note: Marks for this section depend on the query efficiency e.g. the processing speed,
the storage, the number of queries used etc. Therefore, using too many queries to
answer a section or using temporary variables, collections, cursors (e.g. for each
loop) may incur mark penalties.
(i) What was the total number of wildlife species found in each park? Your output
should display the park id, park name and the count of wildlife species, which can be
in any format as long as it displays the required information.
(ii) Which park had the most Rare abundance species of the Plant category?
(iii) How many parks were in each state? Note: For parks which are a part of 2 states
e.g. Great Smoky Mountains National Park which is a part of TN and NC states,
would be counted as 1 park for TN and 1 park for NC.
(iv) Which park had the smallest area and was established in the month of October?
Your output should include the name of the month (i.e. October), the name of the park
and the area of the park.
(v) Display a list of unique Common Names of all species belonging to the Bird
category.
(vi) Display the park information and only the wildlife information for all parks with
wildlife having the common name “Coyote”. The output should only display the
wildlife data for coyotes and should exclude the data for all other species.
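The counting rule in query (iii) is easy to get wrong, so here is a minimal Python sketch of the intended logic. It assumes that a multi-state park stores its codes in stateCode as a comma-separated string such as "TN, NC"; the function name parks_per_state and the second sample park are hypothetical.

```python
from collections import Counter


def parks_per_state(parks):
    """Count parks per state, counting a multi-state park once per state.

    Assumption: 'stateCode' is a comma-separated string of state codes.
    """
    counts = Counter()
    for park in parks:
        codes = {c.strip() for c in park["stateCode"].split(",") if c.strip()}
        counts.update(codes)  # each distinct code adds 1 for this park
    return counts


sample = [
    {"parkName": "Great Smoky Mountains National Park", "stateCode": "TN, NC"},
    {"parkName": "Hypothetical Park", "stateCode": "TN"},
]
print(parks_per_state(sample))
```

An aggregation pipeline could follow the same idea by splitting the code string into an array, unwinding it, and then grouping by state code.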
C.1.8. Using Mongo Shell commands, convert the latitude and longitude for each park in
parksWildlife to MongoDB GeoJSON objects with location type as a point in a field
called location and store the modified data into a new collection called
parksWildlifeGeojson. Use MongoDB Compass’s schema tab to display the
location on a map visualisation.
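A common mistake here is the coordinate order: a GeoJSON Point lists longitude first, then latitude. The Python sketch below shows the shape of the object only; the function name to_geojson_point and the sample coordinates are hypothetical.

```python
def to_geojson_point(latitude, longitude):
    """Return a GeoJSON Point dict; coordinates are [longitude, latitude]."""
    lat = float(latitude)
    lon = float(longitude)
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be between -90 and 90")
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude must be between -180 and 180")
    return {"type": "Point", "coordinates": [lon, lat]}


# Illustrative coordinates only.
print(to_geojson_point(44.6, -110.5))
```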
C.1.9. Using MongoDB’s Geospatial Query Operators and MongoDB Shell commands, find the
list of all parks within 50km and 500km of the park with the most Resident
seasonality species. The list should not show duplicate park names. If required you
can split this into more than one query.
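One possible shape for such a distance filter is sketched below as a Python dict, which mirrors the structure of a MongoDB query document. It assumes the $nearSphere operator is used on the location field from Task C.1.8 and that a 2dsphere index exists on that field; distances for GeoJSON points are given in metres. The function name near_query and the reference coordinates are hypothetical, and other operators such as $geoWithin would also be valid choices.

```python
def near_query(point, max_km, min_km=0.0, field="location"):
    """Build a $nearSphere filter around a GeoJSON point.

    Assumption: 'field' holds GeoJSON Points and has a 2dsphere index.
    Kilometres are converted to metres, as $nearSphere expects.
    """
    return {
        field: {
            "$nearSphere": {
                "$geometry": point,
                "$minDistance": min_km * 1000.0,
                "$maxDistance": max_km * 1000.0,
            }
        }
    }


# Hypothetical reference point; the real one comes from the park found first.
reference = {"type": "Point", "coordinates": [-110.5, 44.6]}
print(near_query(reference, max_km=50))
print(near_query(reference, max_km=500))
```

Note that the reference park itself lies at distance zero, so it would appear in the results unless it is filtered out.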
C.1.10. Using MongoDB Compass, export the parkDates collection as a csv file named
parksDates.csv and the wildlifeArray collection as a csv file named wildlifeArray.csv to use
in Task C.2.
C.2. Analysis using Cassandra.
Data Requirement: The data for this task is contained in the files parksDates.csv and
wildlifeArray.csv generated in Task C.1.
Software Requirement:
(i) Software that can run Cassandra Shell (cqlsh) commands (e.g.
VS Code, Terminal, Command Prompt etc.)
Overall Task C.2. Outputs:
(i) Screenshots added to the reports for each query, with before and
after screenshots for any insert/update/delete made to the database.
(ii) All code added in a properly commented file named
C2_Cassandra.cql.
Task Requirements:
C.2.1. Create a keyspace called FIT3176_MAC for the Cassandra database, with
SimpleStrategy and replication factor of 1.
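The statement for this step has a fixed general shape in CQL. The Python sketch below only assembles that statement as a string so that it can be pasted into cqlsh or passed to a driver; the helper name create_keyspace_cql is hypothetical, and the IF NOT EXISTS clause is an optional addition rather than something the task requires.

```python
def create_keyspace_cql(name, replication_factor=1):
    """Return a CREATE KEYSPACE statement using SimpleStrategy.

    Assumption: 'name' is a plain identifier (letters, digits, underscores).
    """
    if not name.replace("_", "").isalnum():
        raise ValueError("keyspace name must be a plain identifier")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        "WITH replication = {'class': 'SimpleStrategy', "
        f"'replication_factor': {int(replication_factor)}}};"
    )


print(create_keyspace_cql("FIT3176_MAC"))
```

Unquoted identifiers in CQL are case-insensitive and are stored in lower case, so this keyspace would be listed as fit3176_mac.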
C.2.2. Using the Cassandra COPY command, import the data from parksDates.csv into
the parks table.
C.2.3. Using the Cassandra COPY command, import the data from wildlifeArray.csv
into the wildlife table, with speciesCharacteristics and speciesParkRecords as their
own data types.
C.2.4. Use the Cassandra shell to answer the following queries:
(i) How many wildlife records were in the database?
(ii) How many parks were in the state with state code WY?
(iii) What was the average area of parks with latitude greater than 24.63 and longitude
less than -82.87?
(iv) How many mammals were in the database?
C.2.5. Insert the following data into the appropriate tables:
Field            Value
speciesID        00007
parkID           RAWO
park name        Rainbow Wonderland
scientific name  unicorn
family           unicorn family
order            unicorn order
category         mythical creature
common name      unicorn
C.3. Reflections
As a summary, you are required to provide a comparison between the three types of database,
i.e. Relational Database, Document-Oriented Database and Column-Oriented Database.
Task Requirement:
C.3.1. Select three real-world examples where each database is used and analyse them by
providing the following:
(i) in your own words provide an explanation of how each type of database works
using your selected examples.
(ii) with the help of your selected examples provide a comparison in a tabular format
with details on the main strengths and weaknesses of each database.
MAC is happy with your work. However, they have to cut down their budget and they have
decided to use only one database (either MongoDB or Cassandra) instead of two.
Task Requirement:
C.3.2. Decide which database to use (either MongoDB or Cassandra) and provide a
non-generic explanation specifically valid for MAC on why you have selected one
database over the other.
Overall Task C.3. Outputs:
● A report addressing the requirements in Task C.3.1. and Task C.3.2., with any
references to sources given in a proper referencing style.
Note: Penalties may be applied for any generic explanations not specific to your examples
(for C.3.1.) and to MAC (for C.3.2.). This section does not require any code submissions.
Mark penalties are applicable if references are not provided e.g. when talking about
real-world examples etc.
C.4. Connecting to Drivers (Optional Bonus Section - up to 5 marks)
Note: This section is for those who are looking for a real challenge. Even without completing
this section, there is a possibility of scoring full marks for the assignment. However, to attain
full marks for Task C.4., both of your scripts must be runnable in the MacOS/Windows
terminal/command prompt. Therefore, attempt at your own risk :)
Data Requirement: Same as Task C.1.2. and Task C.1.3.
Software Requirement:
(i) MongoDB Compass Software.
(ii) Software that can run scripts, such as Python scripts
(e.g. VS Code etc.).
Overall Task C.4. Outputs:
(i) List of steps to take in order to connect MongoDB and
Cassandra drivers.
(ii) Two properly commented driver script files i.e. one for
MongoDB (named e.g. C4_MongoDB.py) and another for
Cassandra (named e.g. C4_Cassandra.py).
After reading your reflections in Task C.3., MAC now has more knowledge about the NoSQL
databases: MongoDB and Cassandra. They also came to know that these databases can be
incorporated into applications and be used as backend database stores.
Amazed by the idea, MAC has asked your team to create a runnable script incorporating the
code from Task C.1. and C.2. into applications (e.g. a Python application) with the help of
drivers.
Note: For details about how to use the MongoDB and Cassandra drivers you can refer to the
steps given in the driver tutorials.
Task Requirement:
(i) Provide in your report a list of the steps you have taken to connect the MongoDB and
Cassandra drivers.
(ii) Convert the code from Task C.1. and C.2. into runnable applications using script files
(e.g. .py files) and not Jupyter notebook files.
D. Submission Checklist
D.1. One combined pdf file containing all tasks mentioned above and named
FIT3176_A1Report.pdf.
The pdf file should contain:
i. Cover page
ii. A signed cover sheet for group assignments
iii. A contribution declaration form, e.g.:
Each student must state the parts of the assignment that he/she did.
An example is as follows:
A. Percentage of contribution:
I. Name: Adam, ID: 210008, Contribution: 60%
II. Name: Ben, ID: 230933, Contribution: 40%
B. List of parts that each student did:
I. Adam: list the parts that Adam did
Task C.1 (tasks 1.1, 1.2,...)
Task C.3
II. Ben: list the parts that Ben did
Task C.2 (tasks 2.1, 2.2,...)
Task C.4
iv. A report that combines all the tasks (including explanations, code and
screenshots) from Task C.1, Task C.2, Task C.3, and Task C.4 (if applicable).
Note: It is expected that this PDF document is quite large due to incorporating the
above components. Please note that the report readability also contributes to your
assignment mark.
D.2. .json and .cql files from the following tasks:
Task C.1 and Task C.2
Note: All of the above files must be runnable in MongoDB and Cassandra respectively. You
should also clearly indicate each task using comments and follow appropriate naming
conventions, e.g. as specified in the MongoDB documentation.
Penalties may apply for any inconsistencies between the screenshots in the PDF
report and the outputs of the script files.
D.3. [OPTIONAL] Driver script files (i.e. .py files) from the following task:
Task C.4 (MongoDB and Cassandra driver scripts, with screenshots provided in the
report as required)
D.4. ZIP all the files from D.2 (and files from D.3 if attempted), and name the ZIP folder as
FIT3176_A1Codes.zip.
D.5. Upload both the PDF file and ZIP file to Moodle.
The submission of this assignment must be in the form of a single PDF file
(FIT3176_A1Report.pdf) AND a single ZIP file (FIT3176_A1Codes.zip). No other
forms will be accepted.
Note: Both Group Members are required to click on the submit button for the
submission to be completed.
Appendix A
Dataset details
wildlife.json:
Field Description
speciesID unique id for each species
parkName name of the park where the species can be found
category category of the species
order order of the species
family family of the species
scientificName scientific name of the species (genus, species, subspecies)
commonName one or more common names of the species it is commonly known by
recordStatus the park record status of the species
occurrence whether the presence of the species in the park has been confirmed
nativeness species native or foreign to park
abundance presence and visibility of species in park
seasonality season and nature of presence in park
conservationStatus species classification according to US Fish & Wildlife Service
parks.csv:
Field Description
parkID unique id for each park
parkName name of the park
stateCode code of the state or states where the park is located
areaInAcres area or size of the park in acres
latitude latitude of the park centre
longitude longitude of the park centre
parkEstDay day of the month the park was established
parkEstMonth month of the year the park was established
parkEstYear year the park was established
References
National Park Service. (2017, January 20). Biodiversity in National Parks [Dataset]. Kaggle.
http://www.kaggle.com/nationalparkservice/park-biodiversity?select=species.csv
THE END