### A "generic" graph traversal algorithm

• Introductory problem

• Given the following graph:

where:

 The nodes represent cities and the edges represent the cost (e.g., by train or airplane) of traveling between 2 cities

(There is no train/airplane connection between cities that are not connected by an edge)

• Question:

 What is the lowest-cost path from node 0 to node 2?

Answer: the lowest-cost path has cost = 3 + 2 + 1 = 6

• To find the answer, we would have to try different ways to go from node 0 to node 2

Example:

• Unsuccessful attempt 1: (cost = 4 + 4 + 4 + 2 = 14)

• Unsuccessful attempt 2: (cost = 7 + 2 = 9)
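Comparing attempts comes down to adding up the edge costs along each candidate path. The following is a minimal sketch of that arithmetic. The 4-city cost matrix `COST` and the helper `pathCost` are hypothetical names and values chosen for illustration; they do not reproduce the graph in the figure.

```java
class PathCostDemo {
    // Hypothetical cost matrix for 4 cities (NOT the graph from the figure).
    // COST[a][b] == 0 means there is no direct connection between a and b.
    static final int[][] COST = {
        {0, 3, 9, 0},
        {3, 0, 0, 2},
        {9, 0, 0, 1},
        {0, 2, 1, 0},
    };

    // Sums the edge costs along a path; returns -1 if some step has no edge.
    static int pathCost(int[][] cost, int... path) {
        int total = 0;
        for (int k = 0; k + 1 < path.length; k++) {
            int c = cost[path[k]][path[k + 1]];
            if (c == 0) {
                return -1;          // the two cities are not connected
            }
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(pathCost(COST, 0, 2));         // direct edge: 9
        System.out.println(pathCost(COST, 0, 1, 3, 2));   // 3 + 2 + 1 = 6
    }
}
```

In this made-up graph the direct connection is more expensive than the three-edge detour, which is why all candidate paths have to be examined.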

• Algorithms used to process the information stored in a graph data structure

• Fact:

• Graphs are like trees and linked lists:

 All of them are used to represent some information that is important to solve our problem.

• In order to process the information stored in a graph, we must visit the nodes of the graph

(Just like in a tree or linked list !!!)

• Terminology:

 Graph traversal = visiting nodes in a graph using the link information of the graph

Note:

 We must access the links in a graph traversal because the links often contain essential information (e.g., cost) for the problem at hand !!!

• Caveat in graph traversal

• Unlike trees/linked lists, a graph can have cycles:

Warning:

 If a program blindly follows the links in a graph that contains a cycle, the program will loop forever !!!

• How to avoid executing in an infinite loop:

• We must maintain some visitation information:

 When we visit a node, we must mark that node as visited. The program must never visit an already visited node again.
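To make the caveat concrete, here is a small sketch that walks a cycle twice: once blindly and once with visitation marks. The 3-node cycle stored in `NEXT` and the method names `blindWalk` and `markedWalk` are assumptions made for this illustration. The cycle is directed (one outgoing link per node) only to keep the code short.

```java
class CycleDemo {
    // Hypothetical 3-node cycle: 0 -> 1 -> 2 -> 0 (NEXT[i] is the node after i).
    static final int[] NEXT = {1, 2, 0};

    // Follows links blindly. It only stops because of the artificial step cap;
    // without the cap, this loop would never end.
    static int blindWalk(int start, int maxSteps) {
        int node = start;
        int steps = 0;
        while (steps < maxSteps) {
            node = NEXT[node];
            steps++;
        }
        return steps;
    }

    // Follows links, but stops as soon as it reaches an already visited node.
    static int markedWalk(int start) {
        boolean[] visited = new boolean[NEXT.length];   // all false initially
        int node = start;
        int visits = 0;
        while (!visited[node]) {
            visited[node] = true;       // mark the node as visited
            visits++;
            node = NEXT[node];
        }
        return visits;
    }

    public static void main(String[] args) {
        System.out.println(blindWalk(0, 1000));   // 1000: uses up the whole cap
        System.out.println(markedWalk(0));        // 3: each node visited once
    }
}
```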

• Visitation information

• How to store information on whether a node has been visited:

    boolean visited[ ];      // denotes whether a node has been visited

    visited = new boolean[ # nodes ];

    We use:

       visited[i] = true    to represent node i has been visited
       visited[i] = false   to represent node i has NOT been visited

• Additional information needed to traverse a graph

• In addition to the visited[ ] information (which stores whether a node has been visited), we also need:

• toVisitNodes:

 This variable contains the nodes that can be reached through known edges, i.e., the candidates to visit next

• General algorithm used to visit all nodes in a graph

• General algorithm that can be used to visit all nodes in a graph using (all) links in the graph:

    Set all nodes to "not visited";
    put an initial node into "toVisitNodes";

    while ( "toVisitNodes" ≠ empty ) do
    {
       x = select (and delete) some node from "toVisitNodes";

       if ( x has not been visited )
       {
          visited[x] = true;        // Visit node x !

          for ( every edge (x, y)  /* we are using all edges ! */ )
             if ( y has not been visited )
                add y to "toVisitNodes";      // Use the edge (x,y) !!!
       }
    }
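The general algorithm can be sketched in runnable Java as follows. This is one possible rendering, not the only one: the adjacency-list representation, the `ArrayList` used for `toVisitNodes`, the random choice that stands in for "select some node", and the 5-node `sampleGraph` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

class GenericTraversal {
    // Hypothetical undirected graph with edges 0-1, 0-2, 1-2, 2-3, 3-4.
    static List<List<Integer>> sampleGraph() {
        return Arrays.asList(
            Arrays.asList(1, 2),        // neighbors of node 0
            Arrays.asList(0, 2),        // neighbors of node 1
            Arrays.asList(0, 1, 3),     // neighbors of node 2
            Arrays.asList(2, 4),        // neighbors of node 3
            Arrays.asList(3));          // neighbors of node 4
    }

    // Visits every node reachable from start; returns the nodes in visiting order.
    static List<Integer> traverse(List<List<Integer>> adj, int start, Random rng) {
        boolean[] visited = new boolean[adj.size()];    // all "not visited"
        List<Integer> toVisitNodes = new ArrayList<>();
        List<Integer> order = new ArrayList<>();

        toVisitNodes.add(start);
        while (!toVisitNodes.isEmpty()) {
            // "Select (and delete) some node": here, one at a random index.
            int x = toVisitNodes.remove(rng.nextInt(toVisitNodes.size()));
            if (!visited[x]) {
                visited[x] = true;                      // visit node x
                order.add(x);
                for (int y : adj.get(x)) {              // every edge (x, y)
                    if (!visited[y]) {
                        toVisitNodes.add(y);
                    }
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // The order varies from run to run, but every node appears exactly once.
        System.out.println(traverse(sampleGraph(), 0, new Random()));
    }
}
```

Whatever node the selection step picks, the visited[ ] check guarantees that each node is visited once, so the loop terminates even though the sample graph contains the cycle 0-1-2.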

• Example of traversal

• Initialization:

    Set all nodes to "not visited";
    We start the traversal at node 0 (i.e., insert initial node = 0 into "toVisitNodes")

Graphically:

• Iteration 1:

• Select (and delete) a node x from toVisitNodes:

(Since toVisitNodes contains only node 0, this node will be selected)

• Node 0 has not been visited, so we will execute:

    visited[0] = true;        // Visit node 0 !

    for ( every edge (0, y)  /* we are using all edges ! */ )
    {
       if ( y has not been visited )
          add y to "toVisitNodes";      // Use the edge (0,y) !!!
    }

The statement visited[0] = true will mark node 0 as visited (this represents the fact that we are visiting the node !):

The for-loop will use the edges that are incident to node 0.

The for-loop adds these nodes to the variable toVisitNodes: 1, 3, 8

Result:

• Iteration 2:

• Select (and delete) a node x from toVisitNodes:

(We picked node 3)

• Node 3 has not been visited, so we will execute:

    visited[3] = true;        // Visit node 3 !

    for ( every edge (3, y)  /* we are using all edges ! */ )
    {
       if ( y has not been visited )
          add y to "toVisitNodes";      // Use the edge (3,y) !!!
    }

The statement visited[3] = true will mark node 3 as visited (this represents the fact that we are visiting the node !):

The for-loop will use the edges that are incident to node 3: (3,0), (3,2), (3,4) (see figure above).

 But node 0 has already been visited, so node 0 is not added !!! (Otherwise, we would visit node 0 again, and we would keep on going forever !!!)

The for-loop adds these nodes to the variable toVisitNodes: 2, 4

Result:

I have colored the edges that we have used in red

You can see that the algorithm uses all edges in the graph to visit all the nodes.

• And so on....

(Too long to go through all the steps !)
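The first two iterations can be replayed in code. The sketch below uses only the edges named in the walkthrough, (0,1), (0,3), (0,8), (3,2) and (3,4); the graph in the figure has more edges than these. The node count of 9, the class `TraversalTrace`, and the method `step` (where the caller chooses which node is selected) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TraversalTrace {
    static final int N_NODES = 9;       // assumption: nodes are numbered 0..8
    // Only the edges mentioned in the walkthrough; the full graph has more.
    static final int[][] EDGES = {{0, 1}, {0, 3}, {0, 8}, {3, 2}, {3, 4}};

    final boolean[] visited = new boolean[N_NODES];
    final List<Integer> toVisitNodes = new ArrayList<>();

    // Performs one loop iteration in which node x is the selected node.
    void step(int x) {
        toVisitNodes.remove(Integer.valueOf(x));    // select (and delete) x
        if (visited[x]) {
            return;                                 // never visit a node twice
        }
        visited[x] = true;                          // visit node x
        for (int[] e : EDGES) {                     // edges are undirected
            int y = (e[0] == x) ? e[1] : (e[1] == x) ? e[0] : -1;
            if (y >= 0 && !visited[y]) {
                toVisitNodes.add(y);                // use the edge (x, y)
            }
        }
    }

    public static void main(String[] args) {
        TraversalTrace t = new TraversalTrace();
        t.toVisitNodes.add(0);                  // initialization
        t.step(0);                              // iteration 1
        System.out.println(t.toVisitNodes);     // [1, 3, 8]
        t.step(3);                              // iteration 2
        System.out.println(t.toVisitNodes);     // [1, 8, 2, 4]
    }
}
```

Node 0 is not added back in iteration 2 because visited[0] is already true when the edge (3,0) is examined.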

• The generic graph traversal algorithm in a Java syntax

• "Pseudo" Java code: (I cannot be specific on how to store the toVisitNodes yet)

    /* ==============================================
       Initialize variables for the traversal
       ============================================== */
    for ( int i = 0; i < NNodes; i++ )
    {
       visited[i] = false;
    }

    add( 0, toVisitNodes);          // Start the "to visit" at node 0

    /* ===========================================
       Loop as long as there are "active" nodes
       =========================================== */
    while( toVisitNodes ≠ empty )
    {
       int nextNode;                // Next node to visit
       int i;

       nextNode = remove( toVisitNodes );    // Remove a node from "toVisitNodes"

       if ( ! visited[nextNode] )
       {
          visited[nextNode] = true;          // Mark node as visited
          System.out.println("nextNode = " + nextNode );

          // Add every unvisited neighbor of nextNode to "toVisitNodes"
          for ( i = 0; i < NNodes; i++ )
             if ( adjMatrix[nextNode][i] > 0 && ! visited[i] )
                add( i, toVisitNodes);
       }
    }

• Classic graph traversal algorithms

• There are 2 classic graph traversal algorithms:

 Breadth First Search (BFS) Depth First Search (DFS)

• These 2 algorithms differ in:

 The data structure used to store the nodes in toVisitNodes
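As a sketch of how the choice of data structure changes the traversal, the code below runs the same loop on an adjacency matrix and removes nodes from `toVisitNodes` either in first-in-first-out order (a queue, which yields a breadth-first order) or in last-in-first-out order (a stack, which yields a depth-first order). The 5-node matrix `ADJ`, the `fifo` flag, and the use of `java.util.ArrayDeque` are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class BfsDfsDemo {
    // Hypothetical undirected graph with edges 0-1, 0-2, 1-3, 2-4, 3-4.
    // An entry > 0 means "there is an edge" (as in the adjMatrix test above).
    static final int[][] ADJ = {
        {0, 1, 1, 0, 0},
        {1, 0, 0, 1, 0},
        {1, 0, 0, 0, 1},
        {0, 1, 0, 0, 1},
        {0, 0, 1, 1, 0},
    };

    // fifo == true : toVisitNodes behaves as a queue (breadth-first order)
    // fifo == false: toVisitNodes behaves as a stack (depth-first order)
    static List<Integer> traverse(int[][] adjMatrix, int start, boolean fifo) {
        int nNodes = adjMatrix.length;
        boolean[] visited = new boolean[nNodes];
        Deque<Integer> toVisitNodes = new ArrayDeque<>();
        List<Integer> order = new ArrayList<>();

        toVisitNodes.addLast(start);
        while (!toVisitNodes.isEmpty()) {
            int nextNode = fifo ? toVisitNodes.removeFirst()
                                : toVisitNodes.removeLast();
            if (!visited[nextNode]) {
                visited[nextNode] = true;       // mark node as visited
                order.add(nextNode);
                for (int i = 0; i < nNodes; i++) {
                    if (adjMatrix[nextNode][i] > 0 && !visited[i]) {
                        toVisitNodes.addLast(i);
                    }
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(traverse(ADJ, 0, true));    // [0, 1, 2, 3, 4]
        System.out.println(traverse(ADJ, 0, false));   // [0, 2, 4, 3, 1]
    }
}
```

With the queue, both neighbors of node 0 are visited before any node two edges away; with the stack, the traversal follows 0, 2, 4, 3 as deep as it can before coming back to node 1.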

• Facts: