24T2-无代写
时间:2024-07-16
UNSW COMP9312_24T2
Subgraph Matching
COMP9312_24T2
UNSW COMP9312_24T2
Outline
- Basic Concepts
- Triangle Counting
- General Subgraph Matching
2
UNSW COMP9312_24T2
Graph Homomorphism
Two graphs G and H are homomorphic, if there exists a function f: VG->VH between
vertices of the graph such that if {a,b} is an edge in G then {f(a),f(b)} is an edge in H.
3
1
4
2
3
5 c
a
e
d
b
An ideal case…
G H
UNSW COMP9312_24T2
Graph Homomorphism (cont)
Some other cases …
4
c
a
e
d
b
1
3
2
c
a
ed
b f 1 4
3
5
f(a) = 1 f(b) = 2
f(c) = 1 f(d) = 2
f(e) = 3
Try to find the mapping function
All bipartite graphs can be mapped into an edge
UNSW COMP9312_24T2
Graph Isomorphism
Two graphs G and H are isomorphic, if there exists a bijection f: VG->VH between vertices
of the graph such that if {a,b} is an edge in G then {f(a),f(b)} is an edge in H.
5
1
4
2
3
5
c a e
db
f(a) = 4
f(b) = 5
f(c) = 1
f(d) = 3
f(e) = 2
f(a) = 4
f(b) = 5
f(c) = 3
f(d) = 1
f(e) = 2
or
G
H
There is a one-to-one vertex correspondence in Graph Isomorphism.
UNSW COMP9312_24T2
Subgraph Matching
Given a query graph Q and a data graph G, compute (count/enumerate) all subgraphs of G
that are isomorphic to Q.
All matching instances of f(1),f(2),f(3),f(4):
6
Query graph
Data graph
1
3
2
4
c
a
e
d
b
f
g
h
UNSW COMP9312_24T2
Subgraph Matching
Given a query graph Q and a data graph G, compute (count/enumerate) all subgraphs of G
that are isomorphic to Q.
All matching instances of f(1),f(2),f(3),f(4):
7
Query graph
Data graph
1
3
2
4
1
3
2
4
The query graph is
symmetric
c
a
e
d
b
f
g
h
UNSW COMP9312_24T2
Application
§ Bioinformatics of protein-protein interaction networks
§ Chemistry (similarity between chemical compounds)
§ Node representation
§ Malware detection
§ System analysis
§ …
8
UNSW COMP9312_24T2
Subgraph Matching
§ Exact counting/enumeration
§ Specific patterns (triangle, k-clique, butterfly, …)
§ General subgraphs
§ Approximate counting (estimation)
§ Probability theory
§ Graph Neural Networks
9
UNSW COMP9312_24T2
Triangle
Counting
UNSW COMP9312_24T2
Triangle Counting
Count all triangles in a graph.
11
c
a
e
d
b
f
g
h
Five triangles exist in this example.
UNSW COMP9312_24T2
Triangle Counting
Latapy, M. (2008). Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical computer
science, 407(1-3), 458-473.
Edge-iterator: Given an edge (u, v), any triangle that includes the edge
must contain a third vertex w that has connections to both of u and v.
Thus, we can obtain any triangles containing edge (u, v) based on the
intersection of N(u) and N(v). For each edge, the edge-iterator returns
the set of triangles associated with that edge, and when repeated on
all edges, the set of all triangle solutions is made available.
Duplication: If we use the above method to calculate the triangles,
then we will count a triangle repeatedly.
12
c
a b
UNSW COMP9312_24T2
Triangle Counting
Orientation technique (without duplication): Each undirected edge is
mapped to a directed edge where the direction (i.e., orientation) is decided
by the priority of its endpoints in the vertex-ordering (i.e., u->v if u has a
higher priority than v). We refer to vertex u as a pivot vertex if u has two
out-going edges. We can association a triangle in the undirected graph
with only one pivot vertex to ensure one and only one instance of this
triangle in the output, which significantly improves the performance.
Priority: The priority of vertices can be determined by the degree of each
vertex. The smaller the degree; the smaller the priority. If the degrees of
two vertices are the same, it can be determined by the alphabetical order
or numerical order of the vertices.
Latapy, M. (2008). Main-memory triangle computations for very large (sparse (power-law))
graphs. Theoretical computer science, 407(1-3), 458-473.
13
c
a b
UNSW COMP9312_24T2
Compact Forward (CF) Algorithm: We denote the set of outgoing-neighbors
of vertex u in G as ! , and the out-degree as ! = ! . In line 1,
undirected graph G is transformed into a directed graph G via the orientation
technique. (Line 2 onward follows the edge-iterator framework.) In Line 3,
triangles are enumerated by iterating through the outgoing neighborhoods
rather than the full neighborhood. In Line 4, a merge-based intersection
identifies the common out-going neighbors of u and v, denoted by T. A set
of triangles (u, v, w) is then output for every vertex w ∈ T.
Triangle Counting
c
a b
14
UNSW COMP9312_24T2
The adjacency list of each vertex is unsorted: The time complexity
of CF algorithm is: (∑(#,%)∈( ! ∗ ! ).
Proof: We need to check whether there are common neighbors in the
adjacency list of u and v. For each neighbor of u, we need to check ! times in the adjacency list of v, u has ! neighbors, so the time
complexity of finding common neighbor of two vertices (line 4 in the pseudo
code) is (! ∗ ! ). Thus, the total time complexity of CF
algorithm is: (∑(#,%)∈( ! ∗ ! ).
Adjacency list of u:
Adjacency list of v:
B A D I E
B F I
For the sake of brevity, here we
only give the out-neighbors of u
and v and the degree of vertices
other than u, v is higher than the
degree of u, v.
u v
A B D E F I
N+ (u)
N+ (v)
Triangle Counting: Naïve Join
15
UNSW COMP9312_24T2
Triangle Counting
If the adjacency list of each vertex is sorted, the merge-based
intersection operation at Line 4 takes (! + ! ). The
time complexity of CF algorithm is: (∑(#,%)∈( ! + ! ).
Ordered adjacency list of u:
Ordered adjacency list (with order) of v:
A B D E I
B F I
u v
A B D E F I
N+ (u)
N+ (v)
16
UNSW COMP9312_24T2
A hash table has been built for each vertex: The time complexity of CF algorithm is: (∑ #,% ∈(min(! ,! )) = (∑ #,% ∈(min( , )) = (α ⋅ ) = ().+).
Suppose a hash table has been built for each vertex based on the out-going
neighbors in the oriented graph. At Line 4 of Algorithm CF, we may choose the
vertex with larger number of neighbors as the hash table for intersection operation
with (min(! ,! )) look-up cost.
Triangle Counting
Graph arboricity
17
https://en.wikipedia.org/wiki/Arboricity
UNSW COMP9312_24T2
Choosing the vertex with larger number of neighbors as the hash table for intersection operation.
Adjacency list of u:
Adjacency list of v:
A B D E I
B F I
Here is an example of u’s hash table. We use division hashing
to make this hash table. The value of the vertex is replaced by
the corresponding number (A: 1, B: 2, D: 4, E: 5, I: 9) and the
hash function used is X % 5.
Because the query time complexity of hash table is O(1), we
only need to query ! times.
Final result: there are two common neighbors B and I.
0
1
2
3
4
5
6

A
B
D I
N+(u)
N+ (v)
E
Triangle Counting
18
UNSW COMP9312_24T2
Triangle Counting
A hash table is being built on the fly: The time complexity of CF algorithm is: ∑ #,% ∈(! !() = ∑ #,% ∈(! () = ∑ #,% ∈(min(deg , )
= (α ⋅ ) = ().+).
Coding practice~
E’: the set of all directed edges
• Building a hash table for neighbors of each vertex takes too much
space for big graphs.
• Utilize the degree order
• Scan the smaller set -> scan the out neighbor of v
19
Optional
UNSW COMP9312_24T2
Further Optimization?
[2020 DASFAA] AOT: Pushing the Efficiency Boundary of
Main-memory Triangle Listing
See details in: https://cgi.cse.unsw.edu.au/~cs9312/24T2/DASFAA_2020.pdf
20
Large degree
Small degree
Optimization in previous slide:
Record the neighbor of large degree
Scan the neighbor of small degree
Observation: are we scanning all neighbors?
Only scan out neighbors -> compare out-degree instead of degree
: #,% ∈(min(deg , ) ⇒ : #,% ∈(min(! , ! )
UNSW COMP9312_24T2
K-Clique Enumeration
A clique is a graph in which every pair
of vertices are connected.
A k-clique is a clique with k vertices.
Can you design an algorithm to
enumerate all k-cliques?
21
5-clique
4-clique
UNSW COMP9312_24T2
General Subgraph
Matching
Optional
UNSW COMP9312_24T2
Subgraph Matching
Avoid redundancy for symmetric query graphs by
enforcing f(1)All matching instances of f(1),f(2),f(3),f(4):




23
Query graph
Data graph
1
3
2
4
1
3
2
4
1 and 2 are equivalent in
this case
c
a
e
d
b
f
g
h
Optional
UNSW COMP9312_24T2
Subgraph Matching
1. Set up a matching order
2. Match following the order
3. Apply pruning rules
24
Query graph
Data graph
1
3
2
4
c
a
e
d
b
f
g
h
Optional
UNSW COMP9312_24T2
Subgraph Matching
Matching order <1,2,3,4>
25
Query graph
Data graph
1
3
2
4
Pruning rules:
§ degree
§ connectivity
. . .
Matching tree
1
3
2
4
ca edb f g
b c e c d
c b
d
a
e
c
a
e
d
b
f
g
h
Optional
UNSW COMP9312_24T2
Further Optimization
§ Matching plan
§ order plan (vertex-based)
§ join plan (vertex-based)
§ Efficient common neighbor computation
26
Query graph
Data graph
1
3
2
4
c
a
e
d
b
f
g
h
essay、essay代写