# Breadth-first search

## Contents

## General information

**Algorithmic problem:** Graph traversal

**Type of algorithm:** loop

## Abstract view

**Definition:**
On this page, the **distance** of a node [math]v\in V[/math] is the minimal number of arcs on a path from the start node [math]s[/math] to [math]v[/math].

**Additional output:** An arborescence [math]A=(V',A')[/math] rooted at [math]s[/math] such that [math]V'\subseteq V[/math] is the set of all nodes reachable from [math]s[/math] (including [math]s[/math]). For each node [math]v\in V'[/math], the path from [math]s[/math] to [math]v[/math] in [math]A[/math] is a shortest [math](s,v)[/math]-path in [math]G[/math] with respect to that notion of distance.

**Specific characteristic:**
The nodes are finished in the order of increasing distance (which is not unique, in general).

**Auxiliary data:**
A FIFO queue [math]Q[/math] whose elements are nodes in [math]V[/math].

**Invariant:**
Before and after each iteration:

- There is a
**current distance**[math]k\in\mathbb{N}[/math]. - All nodes with distance at most [math]k[/math] are already seen.
- Let [math]n[/math] denote the current size of [math]Q[/math]. There is [math]\ell\in\{1,\ldots,n\}[/math] such that the first [math]\ell[/math] elements of [math]Q[/math] have distance [math]k[/math] and the last [math]n-\ell[/math] elements have distance [math]k+1[/math] (in particular,
*all*elements have distance [math]k[/math] if, and only if, it is [math]\ell=n[/math]).

**Variant:** One node is removed from [math]Q[/math].

**Break condition:** [math]Q=\emptyset[/math].

## Induction basis

**Abstract view:** The start node is seen, no other node is seen. The start node is the only element of [math]Q[/math]. The output sequence is empty and the arborescence [math]A[/math] contains the start node [math]s[/math] and nothing else.

**Implementation:** Obvious.

**Proof:** Obvious.

## Induction step

**Abstract view:**

- Extract the first element [math]v[/math] from [math]Q[/math].
- Append [math]v[/math] to the output sequence.
- For each outgoing arc [math](v,w)[/math] of [math]v[/math] such that [math]w[/math] is not yet seen:
- Insert [math]w[/math] and [math](v,w)[/math] in [math]A[/math].
- Label [math]w[/math] as seen.
- Append [math]w[/math] to [math]Q[/math].

**Implementation:** Obvious.

**Proof:**
The variant is obviously fulfilled. The first point of the invariant is only a notation, so nothing is to show for that.

Let [math]w\in V[/math] not yet seen before the current iteration, and let [math](v,w)\in A[/math]. The induction hypothesis (second point of the invariant) implies that the distance of [math]w[/math] is greater than [math]k[/math]. However, [math]v[/math] has distance [math]k[/math], so the distance of [math]w[/math] cannot be greater than [math]k+1[/math]. Im summary, this proves the third point of the invariant.

The critical iterations for the second invariant are those where [math]k[/math] increases. These are exactly the iterations in which [math]\ell=1[/math] immediately before (and, consequently, [math]\ell=n[/math] immediately after) that iteration. All nodes with old distance [math]k[/math] have been seen and no such node is in [math]Q[/math] anymore after the current iteration. Therefore, all nodes [math]w[/math] such that [math](v,w)\in A[/math] for some [math]v[/math] with distance [math]k[/math] have been seen as well. Clearly, this includes all nodes with old distance [math]k+1[/math], in other words, the nodes with new distance [math]k[/math].

## Correctness

It is easy to see that each operation of the algorithm is well defined. Due to the variant, the loop terminates after a finite number of steps. Based on a straightforward induction on the iterations, the third invariant ensures that the nodes leave [math]Q[/math] in ascending order of their distances. Thus, it remains to show that *all* nodes that are reachable from [math]s[/math] have been seen.

Suppose some node [math]v[/math] is reachable from [math]s[/math] but has not been seen. Let [math]p[/math] be an [math](s,v)[/math]-path. There is an arc [math](x,y)[/math] on [math]p[/math] such that [math]x[/math] is seen and [math]y[/math] is not. However, since [math]x[/math] is seen, [math]x[/math] has been inserted in [math]Q[/math] in some iteration and, consequently, removed from [math]Q[/math] in a later iteration. However, in that later iteration, [math]y[/math] had been seen.

## Complexity

**Statement:** The asymptotic complexity is in [math]\Theta(|V|+|A|)[/math] in the worst case.

**Proof:**
The complexity of each iteration is linear in the number of arcs leaving the current node [math]v[/math]. Therefore, the complexity of the entire loop is in [math]\Theta(|A|)[/math]. The complexity of the initialization is dominated by the initialization of the seen labels of all nodes, which is clearly in [math]\Theta(|V|)[/math].

## Remark

The nodes in [math]Q[/math] form kind of a "frontier line", which separates the nodes that already left [math]Q[/math] from the nodes that have not yet entered [math]Q[/math] so far. More specifically, each path from some seen node to some unseen node contains at least one node that is currently stored in [math]Q[/math].

To see that, we apply a straightforward induction on the iterations. Let [math]p[/math] be a path from some seen node outside [math]Q[/math] to some node that is not yet seen immediately after the current iteration. If [math]p[/math] does not contain [math]v[/math], the claim follows from the induction hypothesis. So assume [math]p[/math] contains [math]v[/math]. Let [math]x[/math] be the last node on [math]p[/math] that is already seen immediately after the current iteration. If [math]x=v[/math], the claim follows from the fact that all unseen nodes [math]w\in V[/math] with [math](v,w)\in A[/math] were put in [math]Q[/math], so the immediate successor of [math]v[/math] on [math]p[/math] is in [math]Q[/math] now. Otherwise, the claim follows from the induction hypothesis again.

## Pseudocode

```
```

```
```BFS(*G,s*)
1 **for** each vertex *u* ∈ *G.V* - {*s*}
2 *u.color* = WHITE
3 *u.d* = ∞
4 *u.*π = NIL
5 *s.color* = GRAY
6 *s.d* = 0
7 *s.π* = NIL
8 *Q* = Ø
9 ENQUE(*Q, s*)
10 **while** *Q* ≠ Ø
11 *u* = DEQUEUE(*Q*)
12 **for** each *v* ∈ *G.Adj*[*u*]
13 **if** *v.color* == WHITE
14 *v.color* = GRAY
15 *v.d* = *u.d* + 1
16 *v*.π = *u*
17 ENQUEUE(*Q, v*)
18 *u.color* = BLACK

```
```