xuebaunion@vip.163.com
3551 Trousdale Rkwy, University Park, Los Angeles, CA
留学生论文指导和课程辅导
无忧GPA:https://www.essaygpa.com
工作时间:全年无休-早上8点到凌晨3点

微信客服:xiaoxionga100

微信客服:ITCS521
SOFT3410 Assignment 1 Due: 11:59pm Sunday, 10th October 2021 This assignment is worth 12% of your final assessment Task description In this assignment we will implement the basic PageRank algorithm in the C programming language using a variety of parallel programming techniques to ensure peak performance is achieved. The PageRank algorithm was developed in 1996 by Larry Page and Sergey Brin when they were graduate students at Stanford University. Google and other search engines compare words in search phrases to words in web pages and use ranking algorithms to determine the most relevant results. PageRank assigns a score to a set of web pages that indicates their importance. The underlying idea behind PageRank is to model a user who is clicking on web pages and following links from one page to another. In this framework, important pages are those which have incoming links from many other pages, or have incoming links from other pages with a high PageRank score, or both. The PageRank algorithm PageRank is an iterative algorithm that is repeated until a stopping criteria is met. The last iteration gives us the result of the search, which is a score per web page. A high score indicates a very relevant web page whereas a low score indicates a not so relevant web page for a search. Sorting the web pages by their scores in descending order gives us the order for the result list of a search query. For describing the PageRank algorithm we introduce the following symbols: • S is the set of all web pages that we are computing the PageRank scores for • N = |S| is the total number of web pages • P = [P t1, P t 2, ..., P t N ] is the vector of PageRank scores • d is a dampening factor for the probability that the user continues clicking on web pages • is the convergence threshold • IN(p) is the set of all pages in S which link to page p • OUT(p) is the set of all pages in S which page p links to 1 SOFT3410 For the PageRank vector, we use the notation P(t)p to represent the PageRank score for page p at iteration t. We initialise the scores for all pages to an initial value of 1 N so that the sum equals 1. P(0)u = 1 N (1) During each iteration of the algorithm, the value of P is updated as follows: P(t+1)u = 1− d N + d ∑ v∈IN(u) P (t) v |OUT(v)| (2) The algorithm continues to iterate until the convergence threshold is reached; that is, the algorithm terminates when PageRank scores stop varying between iterations. The PageRank scores have converged when the following condition is met: ||P(t+1) −P(t)|| ≤ (3) The vector norm is defined to be the standard Euclidean vector norm; that is, for some vector x = {x1, x2, . . . , xn}: ||x|| = √∑ i x2i (4) Page 2 of 8 SOFT3410 Example Our example has four web pages: S = {A,B,C,D}. In this example, d = 0.85 and = 0.005. The referencing structure of the web pages A, B, C, and D is given in the graph below. IN(A) = {B,D} IN(B) = {D} IN(C) = {B,D} IN(D) = ∅ OUT(A) = ∅ OUT(B) = {A,C} OUT(C) = ∅ OUT(D) = {A,B,C} Each node represents a web page. Edges in the graph indicate that the source of the edge is linking to the destination of the edge. Initialise P(0) = 1 N . Then perform the first iteration for each page. P (1) A = 1− 0.85 4 + 0.85 ( P (0) B |{A,C}| + P (0) D |{A,B,C}| ) = 0.15 4 + 0.85 ( 0.25 2 + 0.25 3 ) ≈ 0.214 P (1) B = 1− 0.85 4 + 0.85 ( P (0) D |{A,B,C}| ) = 0.15 4 + 0.85 ( 0.25 3 ) ≈ 0.108 P (1) C = 1− 0.85 4 + 0.85 ( P (0) B |{A,C}| + P (0) D |{A,B,C}| ) = 0.15 4 + 0.85 ( 0.25 2 + 0.25 3 ) ≈ 0.214 P (1) D = 1− 0.85 4 + 0.85 (0) = 0.15 4 + 0 ≈ 0.038 The initialisation and first iteration result in the following values for P: t A B C D 0 0.250 0.250 0.250 0.250 1 0.214 0.108 0.214 0.038 Next, we check whether or not the algorithm has converged. ||P(1) −P(0)|| = ||{−0.036,−0.142,−0.036,−0.215}|| = √ 0.0677 ≈ 0.260 Page 3 of 8 SOFT3410 Since 0.260 6≤ 0.005 (), we perform another iteration. P (2) A = 1− 0.85 4 + 0.85 ( P (1) B |{A,C}| + P (1) D |{A,B,C}| ) = 0.15 4 + 0.85 ( 0.108 2 + 0.038 3 ) ≈ 0.094 P (2) B = 1− 0.85 4 + 0.85 ( P (1) D |{A,B,C}| ) = 0.15 4 + 0.85 ( 0.038 3 ) ≈ 0.048 P (2) C = 1− 0.85 4 + 0.85 ( P (1) B |{A,C}| + P (1) D |{A,B,C}| ) = 0.15 4 + 0.85 ( 0.108 2 + 0.038 3 ) ≈ 0.094 P (2) D = 1− 0.85 4 + 0.85 (0) = 0.15 4 + 0 ≈ 0.038 Which leaves us with the following three iterations of P: t A B C D 0 0.250 0.250 0.250 0.250 1 0.214 0.108 0.214 0.038 2 0.094 0.048 0.094 0.038 Again, we check whether or not the algorithm has converged: ||P(2) −P(1)|| ≈ 0.180 6≤ Since convergence has not been reached, we perform another iteration. P (3) A = 1− 0.85 4 + 0.85 ( P (2) B |{A,C}| + P (2) D |{A,B,C}| ) = 0.15 4 + 0.85 ( 0.048 2 + 0.038 3 ) ≈ 0.069 P (3) B = 1− 0.85 4 + 0.85 ( P (2) D |{A,B,C}| ) = 0.15 4 + 0.85 ( 0.038 3 ) ≈ 0.048 P (3) C = 1− 0.85 4 + 0.85 ( P (2) B |{A,C}| + P (2) D |{A,B,C}| ) = 0.15 4 + 0.85 ( 0.048 2 + 0.038 3 ) ≈ 0.069 P (3) D = 1− 0.85 4 + 0.85 (0) = 0.15 4 + 0 ≈ 0.038 Again, we check whether or not the algorithm has converged: ||P(3) −P(2)|| ≈ 0.040 6≤ Page 4 of 8 SOFT3410 Since convergence has not been reached, we perform another iteration. P (4) A = 1− 0.85 4 + 0.85 ( P (3) B |{A,C}| + P (3) D |{A,B,C}| ) = 0.15 4 + 0.85 ( 0.048 2 + 0.038 3 ) ≈ 0.069 P (4) B = 1− 0.85 4 + 0.85 ( P (3) D |{A,B,C}| ) = 0.15 4 + 0.85 ( 0.038 3 ) ≈ 0.048 P (4) C = 1− 0.85 4 + 0.85 ( P (3) B |{A,C}| + P (3) D |{A,B,C}| ) = 0.15 4 + 0.85 ( 0.048 2 + 0.038 3 ) ≈ 0.069 P (4) D = 1− 0.85 4 + 0.85 (0) = 0.15 4 + 0 ≈ 0.038 Again, we check whether or not the algorithm has converged: ||P(4) −P(3)|| = 0 ≤ We have now reached convergance, so the algorithm terminates and the values of P(4) are the final PageRank scores for each of the pages. If this were a query, the resulting ranks would be A,C,B,D where we arbitrarily rank page A before page C since they have the same score. Page 5 of 8 SOFT3410 Implementation details In the header file pagerank.h we have provided the function read_input that will process the input file for you. The read_input function will output error if the input is malformed in any way and will free any memory that has been previously allocated. We have also provided various structs in the header file that may be useful, including a struct for holding pages and a struct that is a linked list of pages. The function read_input returns the input data in these structs, with the following form: /* singly linked list to store all pages */ struct list { node* head; /* pointer to the head of the list */ node* tail; /* pointer to the tail of the list */ int length; /* length of the entire list */ }; /* struct to hold a linked list of pages */ struct node { page* page; /* pointer to page data structure */ node* next; /* poiner to next page in list */ }; /* struct to hold a page */ struct page { char name[21]; /* page name */ int index; /* index of the page */ int noutlinks; /* number of outlinks from this page */ list* inlinks; /* linked list of pages with inlinks to this page */ }; Warning: Do not add other files or modify the included header file or main function. We will replace these files with our own copy when marking your code. Page 6 of 8 SOFT3410 Program input and output