INFS4205/7205 Advanced Techniques for High Dimensional Data Semester 1, 2024
INFS4205/7205 Practice 2 - Individual Project
Due: 16:00 AEST on 10 May 2024
Weighting: 25%
All assignments should be submitted to the UQ Blackboard. If any assignment fails to be submitted
appropriately before the due date, late penalties will be applied as detailed in the ECP. Email
submission will not be accepted.
Overview
The project consists of three sections: (1) Implementation, (2) Report, and (3) Presentation. In this
assignment, you are asked to implement a set of query scenarios in PostgreSQL, utilising spatial /
spatial-temporal data as well as computational geometry algorithms wherever suitable.
Firstly, you need to find spatial datasets that are suitable for this project from any sources you find
interesting. Then, according to the scenarios of the datasets, you need to design at least three
practical query tasks covered in this course, such as k-NN, range search, skyline, trajectory
analysis, shortest path, etc. Next, you need to implement these queries by writing SQL code in
PostgreSQL.
For each query scenario, you need to implement at least three different indexing algorithms
(e.g., sequential scan, k-d tree, R-tree, quadtree). You may use different (suitable) indexes for
different queries. To validate the correctness and efficiency of the implementation, each
query scenario must include at least three tests with different inputs, outputs, and
conditions (i.e., WHERE clauses).
Here is an example.
Query 1: Find the K nearest restaurants to a given point location.
Indexing Methods: 1) sequential search, 2) quadtree and 3) k-d tree.
Test Cases:
1) K=5 and point location is (0,0),
2) K=1 and point location is (0,0),
3) K=5 and point location is (100,100).
For this query scenario, you execute the query nine times: each of the three test cases is run
with each of the three indexing methods.
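As an illustration only, the sequential-search baseline for Query 1 can be sketched in Python and used to cross-check the results returned by the indexed queries. The restaurant coordinates below are invented for the example, and `knn_sequential` is a hypothetical helper, not part of any provided template.

```python
import heapq
import math


def knn_sequential(points, query, k):
    """Return the k points closest to `query` by scanning every point once."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


# Hypothetical restaurant coordinates, invented purely for illustration.
restaurants = [(1, 1), (2, 3), (-1, 0), (50, 50), (99, 101), (100, 98), (3, -4)]

# The three test cases from the example above, as (K, point location).
test_cases = [(5, (0, 0)), (1, (0, 0)), (5, (100, 100))]

for k, location in test_cases:
    print(k, location, knn_sequential(restaurants, location, k))
```

Because the baseline examines every point, its answers do not depend on any index, which makes it a convenient reference when checking that each indexing method returns the same neighbours.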
Presentation – Reflection. Once you have completed the implementation, you need to prepare a
3-minute video presentation that reflects on your work and presents the problem statement,
methodology, outcomes, and analysis from your project report. Present your findings
in a clear and concise manner, with a focus on the insights gained from the project.
Language requirements: You are required to write SQL code in psql or PostgreSQL to
implement the project. If an indexing method or algorithm you implement is not supported in
PostgreSQL/PostGIS, you are allowed to use any other programming language (e.g., Python or
Java) to support this project. You are also allowed to use any existing libraries.
Dataset Selection
Any open-source dataset is allowed as long as it fits the topic of spatial / spatial-temporal data
manipulation. We provide some example datasets for reference, including but not limited to:
Example Datasets       Size        Attributes                               Difficulty   Marks Capped
Chipotle Locations     2,629       Coordinates                              Easy         17
Satellite Data         419,438     Coordinates                              Easy         17
Traffic Accident       2,845,342   Coordinates, Timestamps                  Moderate     21
FourSquare             38,333      Coordinates, Timestamps                  Moderate     21
Taxi Trajectory Data   1,703,650   Coordinate Sequences, Timestamps         Hard         25
Gowalla                6,442,890   Coordinates, Timestamps, Relationships   Hard         25
For any dataset you find that is not listed above, we will evaluate the difficulty based on the size
and attributes of that dataset. For ‘moderate’ datasets, the size is greater than 10,000 and the
attributes contain at least coordinates and timestamps. For ‘hard’ datasets, the dataset size is
greater than 100,000 and the attributes should be more complicated and informative.
Marks Capped (as shown in the last column): If you choose to work with an easy dataset, the
maximum mark you can obtain for this project is 17. This means that any marks beyond 17
will not be counted towards your final grade.
Implementation [10 marks]
1. Once you have determined the datasets, you need to conceptualize at least three query tasks
from the real world. Some example query tasks are listed below:
a. Find all data points in a given rectangular area and within a certain time window.
b. Find all data points within a certain distance of a trajectory emerging on the same day.
c. Find the k nearest neighbours (data points) of a given trajectory for a given date.
d. Find the skyline data points.
e. Find the shortest and fastest trajectory from one given data point to another.
f. Find the trajectory that is most similar to a given trajectory.
2. As described in the Overview, for each query task, you need to implement at least
three indexing methods, and test the query with at least three cases covering different inputs and
outputs.
3. You must upload your full source code, including a) SQL for database setup, data insertion,
index creation, and query implementation, and b) any supporting code in other languages. No
marks will be given for this section if incomplete code is submitted.
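Some of the example tasks in item 1, such as the skyline query (d), have no dedicated operator in PostgreSQL, so supporting code in another language may be useful. The following is a minimal in-memory sketch in the style of a block-nested-loop skyline, assuming that smaller values are better in every dimension. The `hotels` data and the function names are hypothetical and chosen only for illustration.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every dimension and strictly
    better in at least one (smaller values are assumed to be better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Keep the points that are not dominated by any other point."""
    window = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a current candidate, discard it
        # p survives: drop candidates that p dominates, then keep p
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


# Hypothetical (distance_km, price) pairs, invented for illustration.
hotels = [(1.0, 90), (2.0, 60), (3.0, 70), (4.0, 40), (1.5, 95), (5.0, 45)]
print(skyline(hotels))
```

A sketch like this compares each point against the current candidates only, so it serves as an index-free reference against which an indexed implementation can be checked.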
The marking criteria are summarized as follows:
Completeness [6 marks]: The selected high-dimensional dataset must be adequately processed and
cleaned. At least three algorithms taught in this course should be implemented, or methods from
recent scientific research can be reproduced. At least three query tasks from real-world scenarios
need to be given to test your implementation. You need to include at least three test cases
that cover a diverse range of inputs and make full use of the special attributes (e.g., sequences,
relationships) of the dataset, reflecting the completeness of the methods. Each query is worth 2
marks, and full marks will be awarded for successfully tackling challenging tasks (e.g., skyline,
spatiotemporal, trajectory similarity). Marks are not awarded for repetitive queries.
Correctness [4 marks]: To ensure the query is optimized through your implemented indexing
methods, you must employ ‘EXPLAIN (ANALYZE ON, BUFFERS ON)’ to review the query plan and
associated costs. You need to include the returned outputs of both the query and `EXPLAIN` as
comments in your .sql file for each combination of query task, indexing method, and test case. An
example is provided below (the template will be provided):
Report [10 marks]
The report should contain the following sections:
(1) Introduction [2 marks] to the task or problem being proposed, elucidating its practical
application value in industry or its potential contribution to scientific research. For example,
what are the application scenarios for the dataset of your choice? Why and how do we build efficient
index-based queries for these scenarios?
(2) Methodology [3 marks] to explicate the approach employed in a precise and explicit manner,
encompassing the overall algorithm, the technical intricacies of each step or module, as well as
any improvements or innovations you made. You can enrich your descriptions by drawing
detailed flowcharts and/or using rigorous mathematical formulas.
(3) Experimental Results & Analysis [4 marks] based on the query plan across various indexing
methods. This involves a comprehensive comparison of costs and execution strategies,
incorporating detailed analyses of query operations. Utilize tables, graphs, and map visualizations
to highlight efficiency differences among indexing techniques, pinpointing any scenario-specific
advantages. The results should be analysed deeply to unearth insightful findings.
Writing [1 mark]: The report should have an excellent logical structure, physical layout, and
scientific and technical style, with no spelling mistakes or grammar errors. You need to
reference sources appropriately in a correctly formatted bibliography. The report should be around
four pages long and written using the given IEEE doc or LaTeX template.
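For the Experimental Results & Analysis section, the timings that EXPLAIN ANALYZE prints at the end of each plan can be collected into a comparison table with a short script. The Python sketch below is one possible approach, not a required method. `extract_timings` is a hypothetical helper, and the plan text is placeholder input with invented numbers, not measured results.

```python
import re

# Matches the summary lines printed at the end of EXPLAIN ANALYZE output,
# e.g. " Execution Time: 1.500 ms".
TIMING = re.compile(
    r"^\s*(Planning|Execution) Time:\s*([\d.]+) ms",
    re.IGNORECASE | re.MULTILINE,
)


def extract_timings(plan_text):
    """Return a dict such as {'planning': ms, 'execution': ms}."""
    return {name.lower(): float(ms) for name, ms in TIMING.findall(plan_text)}


# Placeholder plan summaries: invented numbers, NOT real measurements.
plans = {
    "sequential scan": "Planning Time: 0.10 ms\nExecution Time: 12.00 ms",
    "quadtree": "Planning Time: 0.20 ms\nExecution Time: 1.50 ms",
}

rows = [(method, extract_timings(text)) for method, text in plans.items()]
# Print one row per indexing method, fastest execution first.
for method, t in sorted(rows, key=lambda r: r[1]["execution"]):
    print(f"{method:<16} planning={t['planning']:.2f} ms  execution={t['execution']:.2f} ms")
```

In practice the placeholder strings would be replaced by the actual EXPLAIN output saved for each query task, indexing method, and test case.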
Presentation [5 marks]
You are required to give a 3-minute presentation that comprehensively reflects on your work and
progress on this project.
Content [3 marks]: You must provide a deep and thorough reflection on your contributions to
this project. For example, how was your progress on this project? What course material did you
learn that helped you complete the program? What are the key challenges you've encountered
and struggled with? What skills, knowledge or other benefits did you gain from completing the
program? How could you improve if you did a similar project in the future, perhaps in your
work?
Presentation [2 marks]: The presenter must speak clearly, organize content logically, share
insights, and ensure the audience can easily follow. The video should include a webcam view of
the presenter's face. The uploaded video should be in 1080p resolution and must not
exceed 3 minutes.
Submission
You are required to submit all of the following files.
− A compressed file (.zip) consisting of all source code:
o a SQL file including the database construction, manipulation, task queries, and
returned outputs of both the query and the query plan. A template is provided.
o supporting code in other programming languages, together with a .txt file
explaining when, how, and why it is used.
− A 3-minute video of your presentation.
− Project report in PDF format.
Only your submitted version will be marked. A penalty will be applied to late submissions
according to the ECP.
Use of AI Tools
Artificial Intelligence (AI) provides emerging tools that may support students in completing this
assessment task. Students may appropriately use AI in completing this assessment task. Students
must clearly reference any use of AI in each instance. A failure to reference AI use may
constitute student misconduct under the Student Code of Conduct. This task has been designed to
be challenging, authentic and complex. Whilst students may use AI technologies, successful
completion of assessment in this course will require students to critically engage in specific
contexts and tasks for which artificial intelligence will provide only limited support and guidance.
To pass this assessment, students are required to demonstrate detailed comprehension
of their written submission independent of AI tools.
When you use generative AI (e.g., ChatGPT) in this assessment:
− Do not provide any private information when using these tools.
− Verify any information provided by generative AI tools with credible sources and check
for missing information.
− Acknowledge any generative tools that you use for your assignments or work and how
you used them. For example, include the name, model or version, date used and how you
used it in your assignment or work.
Useful Tools
− Visualizing spatial(-temporal) data on Google Maps: [link]
− Import CSV file into PostgreSQL table: [link]