Finding all n-sized subgraphs in a given graph
A few weeks ago a co-worker approached with me with an interesting problem he was dealing with. How could one find all n-sized subtrees in any given undirected graph?
We were driving at the time, and so we talked and hand-waved our way through how we thought we could implement an algorithm that finds all the subtrees of size n.
Originally, this problem was solved in MATLAB, but since MATLAB is not free and not commonly used outside of engineering companies, I reimplemented a solution in Python and then a solution in C++ using the Boost Graph Library (coming soon… eventually…).
A Motivating Demonstration
Say we want to find all the subtrees with only two vertices in the graph shown here. How would we go about doing it by hand? I would start by choosing a random vertex, and then I would make a list of all the neighboring vertices. When combined with the original vertex as a pair, I now have all subgraphs of size two that include our initial vertex.
1
2
3
4
5
6
7
Initial vertex = 4
Neighbors of V[4] = 7, 0
Subgraphs of size 2 = [
[4, 7],
[4, 0],
]
After I make a list of pairs for the chosen vetex, I would move on to another vertex and make a list of the pairs following the same steps, then I would concatenate that list onto some global list of all of the subgraphs of size 2. After doing this process for each vertex, there will be many duplicate subgraphs in the global list, so I’ll need to filter out duplicates, and then I will finally have the list of all subgraphs of size 2 of the given graph.
This process is not too hard to do by hand, but it is time consuming, even for this To do this process repeatedly on larger graphs we need to automate it in some way.
Automating the Process
Please note that code in this section does not actually run due to the way python lists work, but is used to show my thought process in discovering the initial algorithm
When I don’t know exactly how an algorithm or a process is going to look before I write it out, I like to start with what I do know. For this problem, I know that I’ll probably need to start by iterating over each of the vertices in the original graph, as that is essentially how I started when doing this problem by hand. Each vertex in the original graph may be part of a n-sized subgraph, so we need to consider each one. That means that from the get-go, we are looking at a minimum of O(V) complexity, where V is the number of vertices in the starting graph.
1
2
3
4
g = Graph()
for vertex in g.vertices:
...
Next if we consider our hand algorithm, we can see that from each vertex we need to get a list of all the neighbors of the vertex. We do this to consider what the “next vertex” in our subgraph may be.
1
2
3
4
5
6
g = Graph()
for vertex in g.vertices:
next_vertices = g.neighbors(vertex)
...
For our previous example of finding all subgraphs of size two, this is essentially all we need to do! We just need to find the neighbors of each nodes in the graph, enumerate all the unique pairs, and call it a day. A python implementation of this simple case might look something like this.
1
2
3
4
5
6
7
8
9
10
11
12
g = Graph()
subgraphs = []
for vertex in g.vertices:
next_vertices = g.neighbors(vertex)
pairs = [(vertex, v2) for v2 in next_vertices]
subgraphs.append(pairs)
unique_subgraphs = set(subgraphs)
This approach works great for when n = 2, but has no way to track the size of the current subgraph we are looking at which is needed for n to be genericized. That way we can tell when we have created a subgraph of size n, and we can return the subgraph once we have reached the desired size. We actually do this in our previous implementation, but it is not clear unless we rewrite the solution to include comments that track n.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
g = Graph()
subgraphs = []
for vertex in g.vertices:
n = 0
next_vertices = g.neighbors(vertex)
current_subgraph = [vertex]
n = len(current_subgraph) # n = 1
for v2 in next_vertices:
current_subgraph = current_subgraph + [v2]
n = len(current_subgraph) # n = 2
if n == 2:
subgraphs.append(updated_current_subgraph)
Now it’s clear that we actually are building up subgraphs until we reach the desired size (in this case 2) and then stopping and moving on to the next subgraph. If we wanted to extend this to find subgraphs of size 3, it’s as simple as adding one more step to get to n = 3.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
...
for vertex in g.vertices:
current_subgraph = [vertex]
n = len(current_subgraph) # n = 1
for v2 in g.neighbors(current_subgraph[-1]):
current_subgraph = current_subgraph + [v2]
n = len(current_subgraph) # n = 2
for v3 in g.neighbors(current_subgraph[-1]):
current_subgraph = current_subgraph + [v3]
n = len(current_subgraph) # n = 3
if n == 3:
subgraphs.append(updated_current_subgraph)
To extend this further we just need to another for
loop, and another for
loop, etc. etc. If we
take a step back, we can see that we are actually taking recursive steps here. The structure of
the for
loop for v2
and for v3
is the same, so we can actually just write a recursive function
to do this work. Here are the operations for the k_th step in the sequence.
- Iterate over all the neighbors of the last node in the current subgraph
- Append each neighbor to the subgraph
- If k == n, append the subgraph to the global list of subgraphs and end
Here’s what this might look like in python.
1
2
3
for v_k in g.neighbors(current_subgraph[-1]):
current_subgraph = current_subgraph + [v_k]
n = len(current_subgraph) # n = k
Algorithm Developement
Given the four generic steps we determined above, we can now refactor our code into a recursive
function to calculate all the subgraphs. We’ll call this function all_n_subgraphs()
. We’ll also
take this opportunity to plan out what types to use, and how we want our calling specification
laid out.
Types
Python doesn’t have a built-in graph type, and hand rolling a graph type is out of scope, so we’ll
use a prebuilt graph package to represent our graphs. I did some some research, and I decided that the
igraph
package for python would suit
our needs just fine. igraph
has a lot of
functionality, but for this algorithm to run, we really only need two functions from
the igraph
library:
Graph.neighbors(v: vertex)
– returns the neighbors to a vertex in a graph.Graph.vs
– returns the list of all vertices in the graph.
Inputs and Outputs
Let’s consider what we want all_n_subgraphs()
to do, and build a specification for this
function. The graph G, and the desired size of the subtrees, n, will be inputs to
all_n_subgraphs()
. Our output will be a list of all the subgraphs, represented by lists
of size n with the vertex indices (integers) of the subgraph.
1
2
3
import igraph as ig
def all_n_subgraphs(G: ig.Graph, n: int) -> list[list[int]]
Another important part of creating a specification for a function is defining the input and
output conditions. Considering our inputs, we can make a few assertions that should be fulfilled
for the function to work properly. First, we know that n should be greater than zero. While it’s
certainly possible to represent a subgraph of size zero as an empty list, []
, it’s not that
useful. Secondly, we should state that n cannot be greater than the number of vertices V in
the graph G. This is because a subgraph of G cannot have more vertices than G. These
two assertions can be formalized in assert
statements at the top of our function.
Beyond declaring the type of our output, there isn’t really anything we can state about the output
of all_n_subgraphs()
, as the list could theoretically be any size greater than or equal to zero.
Specification
When we bring all the parts together, we get a nice specification for our function.
1
2
3
4
5
6
import igraph as ig
def all_n_subgraphs(G: ig.Graph, n: int) -> list[list[int]]:
assert n > 0, f"Error in 'all_n_subgraphs': n must be greater than 0"
assert len(G.vs) >= n,
f"Error in 'all_n_subgraphs': n is greater than len(Graph.vs)"
Now, is doing input validation in this way very pythonic? Probably not, but it’s good enough for me in this example! Now we need to add in the rest of the logic to complete this function.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
import igraph as ig
def all_n_subgraphs(G: ig.Graph, n: int) -> list[list[int]]:
assert n > 0, f"Error in 'all_n_subgraphs': n must be greater than 0"
assert len(G.vs) >= n,
f"Error in 'all_n_subgraphs': n is greater than len(Graph.vs)"
subgraphs: list[list[int]] = []
for vertex in g.vertices:
next_vertices = g.neighbors(vertex)
current_subgraph = [vertex]
subgraphs = subgraphs + subtrees(G, n - 1,
current_subgraph, next_vertices)
def difference(a: list, b: list) -> list:
return list(set(a) - set(b))
def subtrees(G: ig.Graph,
n: int,
current_subgraph,
next_vertices) -> list[list[int]]:
if n <= 0:
return [current_subgraph]
subgraphs: list[list[int]] = []
for vertex in next_vertices:
# concat instead of appending because we need a copy
new_subgraph = current_subgraph + [vertex]
new_next_vertices = next_vertices + g.neighbors(vertex)
unique_next_steps = difference(new_next_vertices, new_subgraph)
subgraphs = subgraphs + next_subtree(g, n - 1,
new_subgraph, unique_next_steps, original_vertex)