COMP9024-COMP9024代写
时间:2023-04-19
2023/4/11 19:12 COMP9024 23T1 - Assignment
https://cgi.cse.unsw.edu.au/~cs9024/23T1/assn/index.php 1/7
COMP9024 23T1 Assignment
MyPageRank
Data Structures and Algorithms
Change Log
We may make minor changes to the spec to address/clarify some outstanding issues. These may require minimal changes in your design/code, if at all. Students are strongly encouraged to check the online forum discussion and the
change log regularly.
Version 0.1
(2023-03-23 17:00)
Initial release.
Version 0.2
(2023-04-05 02:30)
graph.c was missing code stub for graph_edges_count(). Starter code has been updated but the missing code can also be found on this forum post.
Version 0.3
(2023-04-08 12:30)
Revised Makefile to include test for memory leaks. Starter code has been updated but the file can also be downloaded from this forum post.
Assignment Files
Download this zip file.
Unzipping the file will create a directory called 'assn' with all the assignment start-up files.
Alternatively, you can achieve the same thing from a terminal with commands such as:
prompt$ curl https://www.cse.unsw.edu.au/~cs9024/23T1/assn/assn.zip -o assn.zip
prompt$ unzip assn.zip -d assn
The first command will download assn.zip into the current working directory, then the second command will extract it into a sub-directory assn.
You can also make note of the following URLs:
http://www.cse.unsw.edu.au/~cs9024/micro-web
http://www.cse.unsw.edu.au/~cs9024/mini-web
Here is a visual representation of the micro-web:
Once you read the spec, hopefully it will be clear to you what you could use these URLs for! You may also find it useful to construct a similar graph visualisation for the mini-web.
Background
As we have mentioned in lectures, the Internet can be thought of as a graph (a very large graph). Web pages represent vertices and hyperlinks represent directed edges.
With over 1.2 billion unique websites (as of June 2021) and each website having multiple webpages and each webpage having multiple hyperlinks it can understandably be a very difficult job to remember the URL of every website you
want to visit.
In order to make life easier, from the very early days of the internet, there have been search engines that can be used to find websites.
But the job of a search engine is very difficult: First it must index (create a representation of) the entire (or as close to it as possible) internet. Next it must rank the webpages it finds.
2023/4/11 19:12 COMP9024 23T1 - Assignment
https://cgi.cse.unsw.edu.au/~cs9024/23T1/assn/index.php 2/7
In this assignment we will be implementing algorithms to solve each of these problems.
1. To index the internet we will be creating a web crawler.
2. To rank webpages we will implement the PageRank algorithm.
3. To find the worst path between two pages we will implement a modified Dijkstra's algorithm
The Assignment
Overall File Structure
Below is a reference for each file and their purpose.
Note: You cannot modify ANY of the header (.h) files.
Provided File Description Implemented In
dijkstra.h Function prototypes for calculating and outputting paths, extends graph.h. graph.c
graph.h Graph ADT graph.c
pagerank.h Function prototypes for calculating and outputting pageranks, extends graph.h. graph.c
pqueue.h Priority Queue ADT pqueue.c
queue.h Queue ADT queue.c
set.h Set ADT set.c
stack.h Stack ADT stack.c
Subset 0 - Dependencies
Before we can start crawling we need to be able to store our crawled data. As the internet is a Graph, this means we need a Graph ADT. We will also need a Set ADT and one of a Queue ADT or a Stack ADT in order to perform web
scraping (for a BFS or DFS). Later, when Dijkstra's algorithm is needed, we will naturally need a Priority Queue ADT.
Subset 0a - Implement Queue, Stack, Set, Priority Queue ADTs
Task
You have been provided with multiple header files:
queue.h
stack.h
set.h
pqueue.h
Examine each file carefully, they provide interfaces for their corresponding ADT structures.
You must implement the functions prototyped in each header file within its corresponding .c file.
You must store string (char *) data within each ADT.
You must allocate memory dynamically.
You can add utility functions to the .c files.
You can use the string.h library and other standard libraries from the weekly exercises.
You can reuse code previously submitted for weekly assessments and provided in the lectures.
You cannot modify the .h files.
You cannot modify the function prototypes declared in the .h files.
You should write programs that use each ADT to test and debug your code.
As a reminder:
Queue - First In, First Out.
Stack - Last In, First Out.
Set - Only stores unique values.
Priority Queue - Like a Queue, but dequeuing always returns the highest priority element based on some condition. This condition is explained in pqueue.h.
2023/4/11 19:12 COMP9024 23T1 - Assignment
https://cgi.cse.unsw.edu.au/~cs9024/23T1/assn/index.php 3/7
Hint: You should implement a single ADT and test it thoroughly first. Once you're confident that it's correct, you'll find much of your code can be copied and reused for the other ADTs, with slight tweaks.
Testing
We have created scripts to automatically test each of your ADTs. They expect to find the relevant .c file for testing of a specific ADT in the current working directory.
This means that if you want to test your Set data structure, you will need to have set.c in your current working directory, etc.
Limited test cases are provided, so you should always do your own, more thorough, testing.
prompt$ 9024 dryrun assn_(queue|stack|set|pqueue)
Please note that you will need to specifiy which data structure that you want to test with the above command. As an example, the following command tests your stack implementation:
prompt$ 9024 dryrun assn_stack
Subset 0b - Implement Graph ADT
Task
You have been provided with a file graph.h.
Examine the file carefully. It provides the interface to an ADT that will provide Graph functionality.
You must implement the functions prototyped in graph.h within graph.c.
Note that graph.h indicates that each edge has a weight.
You must use an adjacency list representation, but the exact implementation is up to you.
You must store string (char *) data in the vertices.
You must allocate memory dynamically.
You can add utility functions to graph.c.
You can use the string.h library and other standard libraries from the weekly exercises.
You can reuse code previously submitted for weekly assessments and provided in the lectures.
You cannot modify the file graph.h.
You cannot modify the function prototypes declared in graph.h.
You should write programs that use your ADT to test and debug your code.
Subset 1 - Web Crawler
Task
We are now going to use some of the ADTs you have created to implement a web crawler.
Once you have completed the appropriate ADT implementations in subset 0, carefully examine the code in crawler.c.
You are not required to modify crawler.c but you will benefit from understanding its use and operation.
crawler.c requires external dependencies (libcurl and libxml2).
The provided Makefile will work on CSE servers (ssh and vlab) but might not work on your home computer.
After compiling with make crawler (using the Makefile provided), run the executable and check that the output aligns with the navigation of the sample website.
If you have correctly implemented the ADTs from the previous tasks, this part should be mostly free.
Sample Output
Note the edge weights are printed to 3 decimal places.
2023/4/11 19:12 COMP9024 23T1 - Assignment
https://cgi.cse.unsw.edu.au/~cs9024/23T1/assn/index.php 4/7
prompt$ ./crawler http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html crawl
http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html
http://www.cse.unsw.edu.au/~cs9024/micro-web/X.html
http://www.cse.unsw.edu.au/~cs9024/micro-web/Y.html
http://www.cse.unsw.edu.au/~cs9024/micro-web/Z.html
http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html http://www.cse.unsw.edu.au/~cs9024/micro-web/X.html 1.000
http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html http://www.cse.unsw.edu.au/~cs9024/micro-web/Y.html 1.000
http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html http://www.cse.unsw.edu.au/~cs9024/micro-web/Z.html 1.000
http://www.cse.unsw.edu.au/~cs9024/micro-web/X.html http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html 1.000
http://www.cse.unsw.edu.au/~cs9024/micro-web/Y.html http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html 1.000
Testing
We have created a script to automatically test your ADTs. It expects to find queue.c, stack.c, set.c, pqueue.c and graph.c in the current working directory. However, stack.c and pqueue.c do not need to be implemented to run
these tests, unless you have used them within your Graph implementation.
Limited test cases are provided, so you should always do your own, more thorough, testing.
prompt$ 9024 dryrun assn_crawler
Subset 2 - PageRank
Background
Now that we can crawl a network and build a graph, we need a way to determine what pages in our network (vertices) are important.
We haven't kept page content so the only metric we can use to determine importance of a page is to check how much other pages rely on its existence. That is: how easy is it to follow a sequence of one or more links (edges) and end up
on the page.
In 1998 Larry Page and Sergey Brin created the PageRank algorithm to evaluate this metric.
Google still uses the PageRank algorithm to score every page it indexes on the internet to help order its search results.
Task
In graph.c implement the two new functions graph_pagerank() and graph_show_pagerank() that are prototyped in pagerank.h.
First, graph_pagerank() should calculate the PageRank of every page (vertex) in your graph. The algorithm should store the calculated PageRank within the ADT.
Second, graph_show_pagerank() should print the PageRank of every page (vertex) in your graph.
Pages (vertices) should be printed from highest to lowest rank, based on their rank rounded to 3 decimal places. You should use the round() function from math.h. If two pages have the same rounded rank then they should be printed
alphabetically.
You can add utility functions to graph.c.
You can (and most likely will need to) modify your struct definitions in graph.c.
You cannot modify the file graph.h.
You cannot modify the file pagerank.h.
You cannot modify the function prototypes for graph_pagerank() and graph_show_pagerank().
Algorithm
For :
for :
t = 0
PR(p
i
; t) =
1
N
t > 0
2023/4/11 19:12 COMP9024 23T1 - Assignment
https://cgi.cse.unsw.edu.au/~cs9024/23T1/assn/index.php 5/7
Where:
is the number of vertices
and are each some vertex
is the "time" (iteration count)
is the PageRank of vertex at some time
is the damping_factor
is the set of vertices that have an outbound edge towards
is the PageRank of vertex at some time
is the degree of vertex , ie. the number of outbound edges of vertex
is the set of sinks, ie. the set of vertices with no outbound edges, ie. where is 0
This formula is equivalent to the following algorithm:
procedure graph_pagerank(G, damping_factor, epsilon)
N = number of vertices in G
for all V in vertices of G
oldrank of V = 0
pagerank of V = 1 / N
end for
while |pagerank of V - oldrank of V| of any V in vertices of G > epsilon
for all V in vertices of G
oldrank of V = pagerank of V
end for
sink_rank = 0
for all V in vertices of G that have no outbound edges
sink_rank = sink_rank + (damping_factor * (oldrank of V / N))
end for
for all V in vertices of G
pagerank of V = sink_rank + ((1 - damping_factor) / N)
for all I in vertices of G that have an edge from I to V
pagerank of V = pagerank of V + ((damping_factor * oldrank of I) / number of outbound edges from I)
end for
end for
end while
end procedure
Sample Output
Here we're using a modified crawler.c that calculates graph_pagerank() and prints graph_show_pagerank(). With damping set to 0.85 and an epsilon/delta to 0.00001, for the micro-web, the output should be:
prompt$ ./crawler http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html pagerank 0.85 0.00001
http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html (0.412)
http://www.cse.unsw.edu.au/~cs9024/micro-web/X.html (0.196)
http://www.cse.unsw.edu.au/~cs9024/micro-web/Y.html (0.196)
http://www.cse.unsw.edu.au/~cs9024/micro-web/Z.html (0.196)
Testing
We have created a script to automatically test your PageRank algorithm. It expects to find queue.c, stack.c, set.c, pqueue.c and graph.c in the current working directory. However, stack.c and pqueue.c do not need to be
implemented to run these tests, unless you have used them within your Graph implementation.
Limited test cases are provided, so you should always do your own, more thorough, testing.
PR(p
i
; t) =
1 − d
N
+ d× (( ∑
p
j
∈M(p
i
)
PR(p
j
; t− 1)
D(p
j
)
) + (∑
p
j
∈S
PR(p
j
; t− 1)
N
))
N
p
i
p
j
t
PR(p
i
; t) p
i
t
d
M(p
i
) M(p
i
)
PR(p
j
; t− 1) p
j
t− 1
D(p
j
) p
j
p
j
S D(p
j
)
2023/4/11 19:12 COMP9024 23T1 - Assignment
https://cgi.cse.unsw.edu.au/~cs9024/23T1/assn/index.php 6/7
prompt$ 9024 dryrun assn_rankings
Subset 3 - Worst Path by PageRanks (Shortest Path)
Task
In graph.c implement the two new functions graph_worst_path() and graph_show_path() that are prototyped in dijkstra.h.
graph_worst_path() should calculate the worst path between a source vertex and all other vertices.
The worst possible path between two vertices is defined as traversing from the source vertex to the destination vertex such that the sum of all PageRanks of vertices along the path is minimal.
In this situation, PageRanks are specific to a node. However, to solve this problem, PageRanks of a certain node can be extended to the weight of incoming edges to that node. Naturally, graph_worst_path() should use Dijkstra's
algorithm to solve the problem as a result.
graph_show_path() should print the path from the previously given source vertex to a given destination vertex.
You can add more utility functions to graph.c.
You can (and most likely will need to) extend your struct definitions in graph.c.
You cannot modify the file dijkstra.h.
You cannot modify the file graph.h.
You cannot modify the file pagerank.h.
You cannot modify the function prototypes for graph_worst_path() and graph_show_path().
Sample Output
prompt$ ./crawler http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html worstpath http://www.cse.unsw.edu.au/~cs9024/micro-web/X.html http://www.cse.unsw.edu.au/~cs9024/m
http://www.cse.unsw.edu.au/~cs9024/micro-web/X.html
-> http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html
-> http://www.cse.unsw.edu.au/~cs9024/micro-web/Z.html
prompt$ ./crawler http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html worstpath http://www.cse.unsw.edu.au/~cs9024/micro-web/Z.html http://www.cse.unsw.edu.au/~cs9024/m
No path found.
Testing
We have created a script to automatically test your worst path algorithm. It expects to find queue.c, stack.c, set.c, pqueue.c and graph.c in the current working directory. Limited test cases are provided, so you should always do
your own, more thorough, testing.
prompt$ 9024 dryrun assn_path
Assessment
Submission
Submit your implementation (.c) files using the following give command:
prompt$ give cs9024 assn queue.c stack.c set.c pqueue.c graph.c
Make sure you spell the filenames correctly. You can run give multiple times. Only your last submission will be marked.
You can check your latest submission on CSE servers with:
prompt$ 9024 classrun check assn
If you are working at home, you may find it more convenient to upload your work via the "Make Submission" tab at the top of this assignment page on WebCMS, or via give's web interface.
2023/4/11 19:12 COMP9024 23T1 - Assignment
https://cgi.cse.unsw.edu.au/~cs9024/23T1/assn/index.php 7/7
Due Date
This assignment is due Week 10, Tuesday, 18 April 23:59:59 (i.e. before midnight).
The UNSW standard late penalty for assessment is 5% per day for 5 days - this is implemented hourly for this assignment.
Each hour your assignment is submitted late reduces its mark by 0.2%.
For example, if an assignment worth 60% was submitted 10 hours late, it would be awarded 58.8%.
Beware - submissions more than 5 days late will receive zero marks. This again is the UNSW standard assessment policy.
Assessment Scheme
Total 12 marks allocated for this assignment. 10 marks for correctness from automated testing, 2 marks for code quality from manual inspection. The specific breakdown is:
Description Marks
Queue ADT 1
Stack ADT 1
Set ADT 1
Priority Queue ADT 1
Graph ADT 2
PageRank 2
Dijkstra 2
Code Quality 2
TOTAL 12
Code Quality
Code quality will be judged based on the following criteria:
Readability - Code is easy to understand.
Documentation - Code is documented in harder to understand places.
Memory Management - Code should not leak memory.
Important:
Any submission that does not allow us to follow the above mentioned marking procedure "normally" (e.g., missing files, compile errors) will result in your submission not getting marked in time. Depending on the severity of the
errors/problems, we may ask you to resubmit (with max late penalty) or assess your written code instead (e.g., for some "effort" marks only).
Ensure your submitted code compiles on a CSE machine using dcc with the standard options -Wall -Werror.
Plagiarism
Group submissions will not be allowed. Your program must be entirely your own work. Plagiarism detection software will be used to compare all submissions pairwise (including submissions for similar assignments in previous years, if
applicable) and serious penalties will be applied, particularly in the case of repeat offences.
Do not copy ideas or code from others.
Do not use a publicly accessible repository or allow anyone to see your code.
Please refer to the on-line resources to help you understand what plagiarism is and how it is dealt with at UNSW:
Academic Integrity and Plagiarism
UNSW Plagiarism Policy Statement
UNSW Plagiarism Management Procedure
Copyright
Reproducing, publishing, posting, distributing or translating this assignment is an infringement of copyright and will be referred to UNSW Student Conduct and Integrity for action.



























































2023/4/11 19:12 COMP9024 23T1 - Assignment
https://cgi.cse.unsw.edu.au/~cs9024/23T1/assn/index.php 1/7
COMP9024 23T1 Assignment
MyPageRank
Data Structures and Algorithms
Change Log
We may make minor changes to the spec to address/clarify some outstanding issues. These may require minimal changes in your design/code, if at all. Students are strongly encouraged to check the online forum discussion and the
change log regularly.
Version 0.1
(2023-03-23 17:00)
Initial release.
Version 0.2
(2023-04-05 02:30)
graph.c was missing code stub for graph_edges_count(). Starter code has been updated but the missing code can also be found on this forum post.
Version 0.3
(2023-04-08 12:30)
Revised Makefile to include test for memory leaks. Starter code has been updated but the file can also be downloaded from this forum post.
Assignment Files
Download this zip file.
Unzipping the file will create a directory called 'assn' with all the assignment start-up files.
Alternatively, you can achieve the same thing from a terminal with commands such as:
prompt$ curl https://www.cse.unsw.edu.au/~cs9024/23T1/assn/assn.zip -o assn.zip
prompt$ unzip assn.zip -d assn
The first command will download assn.zip into the current working directory, then the second command will extract it into a sub-directory assn.
You can also make note of the following URLs:
http://www.cse.unsw.edu.au/~cs9024/micro-web
http://www.cse.unsw.edu.au/~cs9024/mini-web
Here is a visual representation of the micro-web:
Once you read the spec, hopefully it will be clear to you what you could use these URLs for! You may also find it useful to construct a similar graph visualisation for the mini-web.
Background
As we have mentioned in lectures, the Internet can be thought of as a graph (a very large graph). Web pages represent vertices and hyperlinks represent directed edges.
With over 1.2 billion unique websites (as of June 2021) and each website having multiple webpages and each webpage having multiple hyperlinks it can understandably be a very difficult job to remember the URL of every website you
want to visit.
In order to make life easier, from the very early days of the internet, there have been search engines that can be used to find websites.
But the job of a search engine is very difficult: First it must index (create a representation of) the entire (or as close to it as possible) internet. Next it must rank the webpages it finds.
2023/4/11 19:12 COMP9024 23T1 - Assignment
https://cgi.cse.unsw.edu.au/~cs9024/23T1/assn/index.php 2/7
In this assignment we will be implementing algorithms to solve each of these problems.
1. To index the internet we will be creating a web crawler.
2. To rank webpages we will implement the PageRank algorithm.
3. To find the worst path between two pages we will implement a modified Dijkstra's algorithm
The Assignment
Overall File Structure
Below is a reference for each file and their purpose.
Note: You cannot modify ANY of the header (.h) files.
Provided File Description Implemented In
dijkstra.h Function prototypes for calculating and outputting paths, extends graph.h. graph.c
graph.h Graph ADT graph.c
pagerank.h Function prototypes for calculating and outputting pageranks, extends graph.h. graph.c
pqueue.h Priority Queue ADT pqueue.c
queue.h Queue ADT queue.c
set.h Set ADT set.c
stack.h Stack ADT stack.c
Subset 0 - Dependencies
Before we can start crawling we need to be able to store our crawled data. As the internet is a Graph, this means we need a Graph ADT. We will also need a Set ADT and one of a Queue ADT or a Stack ADT in order to perform web
scraping (for a BFS or DFS). Later, when Dijkstra's algorithm is needed, we will naturally need a Priority Queue ADT.
Subset 0a - Implement Queue, Stack, Set, Priority Queue ADTs
Task
You have been provided with multiple header files:
queue.h
stack.h
set.h
pqueue.h
Examine each file carefully, they provide interfaces for their corresponding ADT structures.
You must implement the functions prototyped in each header file within its corresponding .c file.
You must store string (char *) data within each ADT.
You must allocate memory dynamically.
You can add utility functions to the .c files.
You can use the string.h library and other standard libraries from the weekly exercises.
You can reuse code previously submitted for weekly assessments and provided in the lectures.
You cannot modify the .h files.
You cannot modify the function prototypes declared in the .h files.
You should write programs that use each ADT to test and debug your code.
As a reminder:
Queue - First In, First Out.
Stack - Last In, First Out.
Set - Only stores unique values.
Priority Queue - Like a Queue, but dequeuing always returns the highest priority element based on some condition. This condition is explained in pqueue.h.
2023/4/11 19:12 COMP9024 23T1 - Assignment
https://cgi.cse.unsw.edu.au/~cs9024/23T1/assn/index.php 3/7
Hint: You should implement a single ADT and test it thoroughly first. Once you're confident that it's correct, you'll find much of your code can be copied and reused for the other ADTs, with slight tweaks.
Testing
We have created scripts to automatically test each of your ADTs. They expect to find the relevant .c file for testing of a specific ADT in the current working directory.
This means that if you want to test your Set data structure, you will need to have set.c in your current working directory, etc.
Limited test cases are provided, so you should always do your own, more thorough, testing.
prompt$ 9024 dryrun assn_(queue|stack|set|pqueue)
Please note that you will need to specifiy which data structure that you want to test with the above command. As an example, the following command tests your stack implementation:
prompt$ 9024 dryrun assn_stack
Subset 0b - Implement Graph ADT
Task
You have been provided with a file graph.h.
Examine the file carefully. It provides the interface to an ADT that will provide Graph functionality.
You must implement the functions prototyped in graph.h within graph.c.
Note that graph.h indicates that each edge has a weight.
You must use an adjacency list representation, but the exact implementation is up to you.
You must store string (char *) data in the vertices.
You must allocate memory dynamically.
You can add utility functions to graph.c.
You can use the string.h library and other standard libraries from the weekly exercises.
You can reuse code previously submitted for weekly assessments and provided in the lectures.
You cannot modify the file graph.h.
You cannot modify the function prototypes declared in graph.h.
You should write programs that use your ADT to test and debug your code.
Subset 1 - Web Crawler
Task
We are now going to use some of the ADTs you have created to implement a web crawler.
Once you have completed the appropriate ADT implementations in subset 0, carefully examine the code in crawler.c.
You are not required to modify crawler.c but you will benefit from understanding its use and operation.
crawler.c requires external dependencies (libcurl and libxml2).
The provided Makefile will work on CSE servers (ssh and vlab) but might not work on your home computer.
After compiling with make crawler (using the Makefile provided), run the executable and check that the output aligns with the navigation of the sample website.
If you have correctly implemented the ADTs from the previous tasks, this part should be mostly free.
Sample Output
Note the edge weights are printed to 3 decimal places.
2023/4/11 19:12 COMP9024 23T1 - Assignment
https://cgi.cse.unsw.edu.au/~cs9024/23T1/assn/index.php 4/7
prompt$ ./crawler http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html crawl
http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html
http://www.cse.unsw.edu.au/~cs9024/micro-web/X.html
http://www.cse.unsw.edu.au/~cs9024/micro-web/Y.html
http://www.cse.unsw.edu.au/~cs9024/micro-web/Z.html
http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html http://www.cse.unsw.edu.au/~cs9024/micro-web/X.html 1.000
http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html http://www.cse.unsw.edu.au/~cs9024/micro-web/Y.html 1.000
http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html http://www.cse.unsw.edu.au/~cs9024/micro-web/Z.html 1.000
http://www.cse.unsw.edu.au/~cs9024/micro-web/X.html http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html 1.000
http://www.cse.unsw.edu.au/~cs9024/micro-web/Y.html http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html 1.000
Testing
We have created a script to automatically test your ADTs. It expects to find queue.c, stack.c, set.c, pqueue.c and graph.c in the current working directory. However, stack.c and pqueue.c do not need to be implemented to run
these tests, unless you have used them within your Graph implementation.
Limited test cases are provided, so you should always do your own, more thorough, testing.
prompt$ 9024 dryrun assn_crawler
Subset 2 - PageRank
Background
Now that we can crawl a network and build a graph, we need a way to determine what pages in our network (vertices) are important.
We haven't kept page content so the only metric we can use to determine importance of a page is to check how much other pages rely on its existence. That is: how easy is it to follow a sequence of one or more links (edges) and end up
on the page.
In 1998 Larry Page and Sergey Brin created the PageRank algorithm to evaluate this metric.
Google still uses the PageRank algorithm to score every page it indexes on the internet to help order its search results.
Task
In graph.c implement the two new functions graph_pagerank() and graph_show_pagerank() that are prototyped in pagerank.h.
First, graph_pagerank() should calculate the PageRank of every page (vertex) in your graph. The algorithm should store the calculated PageRank within the ADT.
Second, graph_show_pagerank() should print the PageRank of every page (vertex) in your graph.
Pages (vertices) should be printed from highest to lowest rank, based on their rank rounded to 3 decimal places. You should use the round() function from math.h. If two pages have the same rounded rank then they should be printed
alphabetically.
You can add utility functions to graph.c.
You can (and most likely will need to) modify your struct definitions in graph.c.
You cannot modify the file graph.h.
You cannot modify the file pagerank.h.
You cannot modify the function prototypes for graph_pagerank() and graph_show_pagerank().
Algorithm
For :
for :
t = 0
PR(p
i
; t) =
1
N
t > 0
2023/4/11 19:12 COMP9024 23T1 - Assignment
https://cgi.cse.unsw.edu.au/~cs9024/23T1/assn/index.php 5/7
Where:
is the number of vertices
and are each some vertex
is the "time" (iteration count)
is the PageRank of vertex at some time
is the damping_factor
is the set of vertices that have an outbound edge towards
is the PageRank of vertex at some time
is the degree of vertex , ie. the number of outbound edges of vertex
is the set of sinks, ie. the set of vertices with no outbound edges, ie. where is 0
This formula is equivalent to the following algorithm:
procedure graph_pagerank(G, damping_factor, epsilon)
N = number of vertices in G
for all V in vertices of G
oldrank of V = 0
pagerank of V = 1 / N
end for
while |pagerank of V - oldrank of V| of any V in vertices of G > epsilon
for all V in vertices of G
oldrank of V = pagerank of V
end for
sink_rank = 0
for all V in vertices of G that have no outbound edges
sink_rank = sink_rank + (damping_factor * (oldrank of V / N))
end for
for all V in vertices of G
pagerank of V = sink_rank + ((1 - damping_factor) / N)
for all I in vertices of G that have an edge from I to V
pagerank of V = pagerank of V + ((damping_factor * oldrank of I) / number of outbound edges from I)
end for
end for
end while
end procedure
Sample Output
Here we're using a modified crawler.c that calculates graph_pagerank() and prints graph_show_pagerank(). With damping set to 0.85 and an epsilon/delta to 0.00001, for the micro-web, the output should be:
prompt$ ./crawler http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html pagerank 0.85 0.00001
http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html (0.412)
http://www.cse.unsw.edu.au/~cs9024/micro-web/X.html (0.196)
http://www.cse.unsw.edu.au/~cs9024/micro-web/Y.html (0.196)
http://www.cse.unsw.edu.au/~cs9024/micro-web/Z.html (0.196)
Testing
We have created a script to automatically test your PageRank algorithm. It expects to find queue.c, stack.c, set.c, pqueue.c and graph.c in the current working directory. However, stack.c and pqueue.c do not need to be
implemented to run these tests, unless you have used them within your Graph implementation.
Limited test cases are provided, so you should always do your own, more thorough, testing.
PR(p
i
; t) =
1 − d
N
+ d× (( ∑
p
j
∈M(p
i
)
PR(p
j
; t− 1)
D(p
j
)
) + (∑
p
j
∈S
PR(p
j
; t− 1)
N
))
N
p
i
p
j
t
PR(p
i
; t) p
i
t
d
M(p
i
) M(p
i
)
PR(p
j
; t− 1) p
j
t− 1
D(p
j
) p
j
p
j
S D(p
j
)
2023/4/11 19:12 COMP9024 23T1 - Assignment
https://cgi.cse.unsw.edu.au/~cs9024/23T1/assn/index.php 6/7
prompt$ 9024 dryrun assn_rankings
Subset 3 - Worst Path by PageRanks (Shortest Path)
Task
In graph.c implement the two new functions graph_worst_path() and graph_show_path() that are prototyped in dijkstra.h.
graph_worst_path() should calculate the worst path between a source vertex and all other vertices.
The worst possible path between two vertices is defined as traversing from the source vertex to the destination vertex such that the sum of all PageRanks of vertices along the path is minimal.
In this situation, PageRanks are specific to a node. However, to solve this problem, PageRanks of a certain node can be extended to the weight of incoming edges to that node. Naturally, graph_worst_path() should use Dijkstra's
algorithm to solve the problem as a result.
graph_show_path() should print the path from the previously given source vertex to a given destination vertex.
You can add more utility functions to graph.c.
You can (and most likely will need to) extend your struct definitions in graph.c.
You cannot modify the file dijkstra.h.
You cannot modify the file graph.h.
You cannot modify the file pagerank.h.
You cannot modify the function prototypes for graph_worst_path() and graph_show_path().
Sample Output
prompt$ ./crawler http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html worstpath http://www.cse.unsw.edu.au/~cs9024/micro-web/X.html http://www.cse.unsw.edu.au/~cs9024/m
http://www.cse.unsw.edu.au/~cs9024/micro-web/X.html
-> http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html
-> http://www.cse.unsw.edu.au/~cs9024/micro-web/Z.html
prompt$ ./crawler http://www.cse.unsw.edu.au/~cs9024/micro-web/index.html worstpath http://www.cse.unsw.edu.au/~cs9024/micro-web/Z.html http://www.cse.unsw.edu.au/~cs9024/m
No path found.
Testing
We have created a script to automatically test your worst path algorithm. It expects to find queue.c, stack.c, set.c, pqueue.c and graph.c in the current working directory. Limited test cases are provided, so you should always do
your own, more thorough, testing.
prompt$ 9024 dryrun assn_path
Assessment
Submission
Submit your implementation (.c) files using the following give command:
prompt$ give cs9024 assn queue.c stack.c set.c pqueue.c graph.c
Make sure you spell the filenames correctly. You can run give multiple times. Only your last submission will be marked.
You can check your latest submission on CSE servers with:
prompt$ 9024 classrun check assn
If you are working at home, you may find it more convenient to upload your work via the "Make Submission" tab at the top of this assignment page on WebCMS, or via give's web interface.
2023/4/11 19:12 COMP9024 23T1 - Assignment
https://cgi.cse.unsw.edu.au/~cs9024/23T1/assn/index.php 7/7
Due Date
This assignment is due Week 10, Tuesday, 18 April 23:59:59 (i.e. before midnight).
The UNSW standard late penalty for assessment is 5% per day for 5 days - this is implemented hourly for this assignment.
Each hour your assignment is submitted late reduces its mark by 0.2%.
For example, if an assignment worth 60% was submitted 10 hours late, it would be awarded 58.8%.
Beware - submissions more than 5 days late will receive zero marks. This again is the UNSW standard assessment policy.
Assessment Scheme
Total 12 marks allocated for this assignment. 10 marks for correctness from automated testing, 2 marks for code quality from manual inspection. The specific breakdown is:
Description Marks
Queue ADT 1
Stack ADT 1
Set ADT 1
Priority Queue ADT 1
Graph ADT 2
PageRank 2
Dijkstra 2
Code Quality 2
TOTAL 12
Code Quality
Code quality will be judged based on the following criteria:
Readability - Code is easy to understand.
Documentation - Code is documented in harder to understand places.
Memory Management - Code should not leak memory.
Important:
Any submission that does not allow us to follow the above mentioned marking procedure "normally" (e.g., missing files, compile errors) will result in your submission not getting marked in time. Depending on the severity of the
errors/problems, we may ask you to resubmit (with max late penalty) or assess your written code instead (e.g., for some "effort" marks only).
Ensure your submitted code compiles on a CSE machine using dcc with the standard options -Wall -Werror.
Plagiarism
Group submissions will not be allowed. Your program must be entirely your own work. Plagiarism detection software will be used to compare all submissions pairwise (including submissions for similar assignments in previous years, if
applicable) and serious penalties will be applied, particularly in the case of repeat offences.
Do not copy ideas or code from others.
Do not use a publicly accessible repository or allow anyone to see your code.
Please refer to the on-line resources to help you understand what plagiarism is and how it is dealt with at UNSW:
Academic Integrity and Plagiarism
UNSW Plagiarism Policy Statement
UNSW Plagiarism Management Procedure
Copyright
Reproducing, publishing, posting, distributing or translating this assignment is an infringement of copyright and will be referred to UNSW Student Conduct and Integrity for action.
essay、essay代写