Dictionary ADT S168
Abstract data structure mapping keys (type K) to
values (type V). Operations:
- search(K key) – returns the value associated with the key (or a special value if absent)
- insert(K key, V value) – adds/updates a key-value pair
- delete(K key) – removes the element with the given key
Simple (Naive) Implementations S171
Data size: $n$ = number of elements.
Dominating operation: key comparison. Space: $\Theta(n)$ for all simple
implementations.
| Variant | Search | Insert | Delete | Notes |
|---|---|---|---|---|
| Unordered (Array/List) | O(n) | O(1) | O(n) | Append to end/head |
| Ordered Array | O(log n) | O(n) | O(n) | Binary search; shift on insert/delete |
| Ordered Linked List | O(n) | O(n) | O(n) | Keeping sorted does NOT help! |
⚠ An ordered linked list gains nothing: no random access → no binary search possible.
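As an illustration, the ordered-array variant can be sketched in Python using binary search; the class name and the None-for-absent convention are illustrative choices, not from the lecture:

```python
import bisect

# Sketch of an ordered-array dictionary: O(log n) search via binary
# search, but O(n) insert/delete because elements must be shifted.
class SortedArrayDict:
    def __init__(self):
        self.keys = []    # kept sorted
        self.values = []  # parallel array

    def search(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None  # "special value" for an absent key

    def insert(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value       # update in place
        else:
            self.keys.insert(i, key)     # O(n) shift
            self.values.insert(i, value)

    def delete(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.keys.pop(i)             # O(n) shift
            self.values.pop(i)
```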
Direct Addressing S172
Keys come from universe $U \subseteq \mathbb{N}$. Store element with
key $k$ at index $k$ in a $|U|$-element array.
Operations
- Search: O(1)
- Insert: O(1)
- Delete: O(1)
Trade-off: space is $O(|U|)$. Extremely fast, but wastes memory when $|U| \gg n$; only practical when the key universe is small.
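A direct-address table is short enough to sketch directly; the universe size of 100 is an arbitrary assumption for illustration:

```python
# Direct-address table: the key itself is the array index.
# Practical only when the key universe U = {0, ..., U_SIZE-1} is
# small; U_SIZE = 100 is an illustrative assumption.
U_SIZE = 100

class DirectAddressTable:
    def __init__(self):
        self.slots = [None] * U_SIZE  # Theta(|U|) space

    def search(self, key):
        return self.slots[key]        # O(1)

    def insert(self, key, value):
        self.slots[key] = value       # O(1)

    def delete(self, key):
        self.slots[key] = None        # O(1)
```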
Hash Tables S173
Elements stored in an $m$-element array $[0, \ldots, m-1]$, where $m
\ll |U|$. Position computed by hash function $h : U \to [0..m-1]$.
Hash Function Properties S175
An ideal hash function $h : U \to [0, \ldots, m-1]$ should have:
- Uniform load: each index $0 \le i < m$ equally likely for a random key
- Fast computation: $O(1)$ time
- Avalanche: different values even for very similar keys
Example: $h(k) = k \bmod m$
(usually $m$ is a prime number)
Non-integer keys must first be transformed to integers (e.g., strings → sum of char codes). This pre-hash should also satisfy the hash function properties.
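A sketch of the division method with a string pre-hash; treating the string as a base-256 number (rather than the plain char-code sum mentioned above) is a common refinement, shown here as an illustrative choice so that anagrams hash differently:

```python
# Division-method hash h(k) = k mod m, with a pre-hash that turns a
# string into an integer by reading it as a base-256 number.
M = 13  # table size; a prime, as the notes suggest

def prehash(s):
    h = 0
    for ch in s:
        h = h * 256 + ord(ch)
    return h

def h(key):
    k = key if isinstance(key, int) else prehash(key)
    return k % M
```

Note that with a plain sum of char codes, "ab" and "ba" would collide by construction; the base-256 pre-hash avoids that.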
Collisions S176
Collision: when $h(k) = h(j)$ for two distinct keys $k
\ne j$. Two resolution strategies:
1. Chaining Method
Key $k$ is added to a linked list $l(h(k))$ stored
at position $h(k)$.
2. Open Hashing (Open Addressing)
Other indices are probed in a repeatable sequence
until a free slot is found.
Chaining Method – Operations S177
$n$ = number of stored elements; computing $h(k)$ takes $O(1)$:
| Operation | Cost | Why |
|---|---|---|
| insert(k) | O(1) | Compute $h(k)$, prepend to list at $h(k)$ |
| search(k) | $O(1 + \lvert l(h(k)) \rvert)$ | Compute $h(k)$, scan list $l(h(k))$ |
| delete(k) | $O(1 + \lvert l(h(k)) \rvert)$ | Compute $h(k)$, scan list to find & remove |
Worst case: all $n$ keys map to the same slot → $\lvert l(h(k)) \rvert = n$ → $\Theta(n)$ comparisons (no better than naive!).
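The chaining operations above can be sketched as a small Python class; the names are illustrative, and Python's built-in hash stands in for $h$ (insert here also scans for an existing key to update it, which costs $O(1 + \alpha)$ rather than the pure-prepend $O(1)$):

```python
# Hash table with chaining: each slot holds a list of (key, value)
# pairs; colliding keys share a chain, as in the table above.
class ChainedHashTable:
    def __init__(self, m=11):
        self.m = m
        self.table = [[] for _ in range(m)]

    def _h(self, key):
        return hash(key) % self.m

    def insert(self, key, value):
        chain = self.table[self._h(key)]
        for i, (k, _) in enumerate(chain):   # update if present
            if k == key:
                chain[i] = (key, value)
                return
        chain.insert(0, (key, value))        # O(1) prepend otherwise

    def search(self, key):
        for k, v in self.table[self._h(key)]:  # O(1 + chain length)
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.table[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain.pop(i)
                return
```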
Average Case Analysis – Chaining Method S178
Under uniform load assumption, define load
factor $\alpha = n/m$.
Proof sketch: Let $X$ = length of list
$l(h(k))$. For each element $e \in S$, define indicator $X_e = [h(k) = h(e)]$, so $X =
\sum_{e \in S} X_e$.
$$E[X] = E\left[\sum_{e \in S} X_e\right] = \sum_{e \in S} P(X_e = 1) = |S| \cdot \frac{1}{m} =
\frac{n}{m} = \alpha$$
Result: Average cost of all operations is $O(1 +
\alpha)$. If $m = O(n)$, then $\alpha = O(1)$ and all operations take O(1)
average time.
Universal Hashing S179
A family $H$ of hash functions into range $\{0, \ldots, m-1\}$ is
called $c$-universal (for $c > 0$) if for a randomly chosen $h \in H$, any two
distinct keys $i, j$ collide with probability:
$$P(h(i) = h(j)) \le \frac{c}{m}$$
- Family is universal if $c = 1$
- Purpose: avoid "malicious" data patterns; the hash function is randomly picked from the family at initialization
- Guarantee: With $c$-universal family in chain method, average time of
operations is $O(1 + cn/m)$
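One well-known universal family (due to Carter and Wegman) is $h_{a,b}(k) = ((ak + b) \bmod p) \bmod m$ with $p$ prime and larger than every key. A sketch of picking a random member at initialization; the specific prime is an assumption for illustration:

```python
import random

# Universal family h_{a,b}(k) = ((a*k + b) mod p) mod m, with
# a in {1..p-1}, b in {0..p-1}. Choosing (a, b) at random gives
# P(h(i) = h(j)) <= 1/m for any two distinct keys i != j.
P = 10**9 + 7  # a prime exceeding every key we expect (assumption)

def random_hash(m):
    a = random.randrange(1, P)
    b = random.randrange(0, P)
    return lambda k: ((a * k + b) % P) % m
```

Once chosen, the function is deterministic; the randomness is only in the initial pick, so an adversary cannot prepare a bad input in advance.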
Open Hashing (Open Addressing) S180
Exactly one element per slot.
On collision, probe a sequence $\pi(k) = (h(k,0), h(k,1), \ldots, h(k,m-1))$ until a free slot
is found. Requires $n < m$ (i.e., $\alpha < 1$).
| Probing Strategy | Formula | Problem |
|---|---|---|
| Linear | $h(k,i) = (h(k) + i) \bmod m$ | Primary clustering |
| Quadratic | $h(k,i) = (h(k) + c_1 i + c_2 i^2) \bmod m$ | Secondary clustering |
| Double hashing | $h(k,i) = (h_1(k) + i \cdot h_2(k)) \bmod m$ | Best: "more random" permutations |
For double
hashing: e.g., $h_1(k) = k \bmod m$, $h_2(k) = 1 + (k \bmod m')$, $m' = m-1$. The $h_1,
h_2$ should differ.
Note: delete in open hashing requires special handling: simply emptying a slot would break the probe sequences of keys stored further along them (e.g., use lazy deletion with tombstone markers).
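A sketch of linear probing with tombstone deletion; the EMPTY/DELETED sentinels and class name are illustrative, and Python's built-in hash stands in for $h$:

```python
# Open addressing with linear probing. delete() leaves a tombstone
# (DELETED) so that later searches keep probing past the removed slot.
EMPTY, DELETED = object(), object()

class LinearProbingTable:
    def __init__(self, m=11):
        self.m = m
        self.slots = [EMPTY] * m  # each entry: EMPTY, DELETED or (k, v)

    def _probe(self, key, i):
        return (hash(key) + i) % self.m  # h(k, i) = (h(k) + i) mod m

    def insert(self, key, value):
        first_free = None                # first tombstone seen, if any
        for i in range(self.m):
            j = self._probe(key, i)
            s = self.slots[j]
            if s is EMPTY:
                self.slots[j if first_free is None else first_free] = (key, value)
                return
            if s is DELETED:
                if first_free is None:
                    first_free = j
            elif s[0] == key:            # update existing key
                self.slots[j] = (key, value)
                return
        if first_free is not None:       # only tombstones left on the path
            self.slots[first_free] = (key, value)
            return
        raise RuntimeError("table full: open addressing needs n < m")

    def search(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            s = self.slots[j]
            if s is EMPTY:
                return None              # true gap: key cannot be further on
            if s is not DELETED and s[0] == key:
                return s[1]
        return None

    def delete(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            s = self.slots[j]
            if s is EMPTY:
                return
            if s is not DELETED and s[0] == key:
                self.slots[j] = DELETED  # tombstone, not EMPTY
                return
```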
Average Case Analysis – Open Hashing S181
Under assumption that all probe orders are equally probable, with
load factor $\alpha = n/m < 1$:
Unsuccessful Search (key absent)
$$\text{Avg comparisons} =
\frac{1}{1 - \alpha}$$
Successful Search (key present)
$$\text{Avg comparisons} =
\frac{1}{\alpha} \ln \frac{1}{1-\alpha} + \frac{1}{\alpha}$$
Worst case: $O(n)$, i.e. linear. When $n$ approaches $m$, open hashing degrades to a naive linear scan; $n < m$ must be maintained.
(*) Perfect Hashing S182
Previous methods guarantee expected constant time.
Perfect hashing guarantees worst-case $O(1)$
search time for a static set.
- Can be constructed for a given set of $n$ elements in expected $O(n)$ time
- Based on 2-universal hash function families (Fredman, Komlós, Szemerédi 1984)
- Useful when keys are known in advance and don't change
Hash Table – Space Complexity
| Method | Space | Notes |
|---|---|---|
| Chaining | $\Theta(m + n)$ | $m$ slots + $n$ list nodes |
| Open Addressing | $\Theta(m)$ | Must have $m > n$ ($\alpha < 1$) |
Dynamic Ordered Set S183
Extension of Dictionary where keys have a
linear order. Adds order-based operations:
Dictionary ops: search(K key), insert(K key, V value), delete(K key)
Order ops: minimum(), maximum(), predecessor(K key), successor(K key)
⚠ Hash tables support dictionary ops efficiently but do NOT efficiently support order operations.
Binary Search Tree (BST) S184
BST property: For each node, all keys in left
subtree $\le$ node's key $\le$ all keys in right subtree.
- Min key: leftmost node. Max key: rightmost node.
- All operations run in $O(h)$, where $h$ = height
- Worst case: $h = O(n)$ (degenerate/skewed tree)
BST: Search S185
searchRecursive(node, key): // called with node == root
    if(node == null or node.key == key) return node
    if(key < node.key) return searchRecursive(node.left, key)
    else return searchRecursive(node.right, key)

searchIterative(node, key): // called with node == root
    while(node != null and node.key != key):
        if(key < node.key) node = node.left
        else node = node.right
    return node
BST: Min, Max, Successor, Predecessor S186
minimum(node): // called with node == root
    while(node.left != null) node = node.left
    return node

maximum(node):
    while(node.right != null) node = node.right
    return node

successor(node):
    if(node.right != null) return minimum(node.right)
    p = node.parent
    while(p != null and node == p.right):
        node = p; p = p.parent
    return p

predecessor(node): // mirror of successor
    if(node.left != null) return maximum(node.left)
    p = node.parent
    while(p != null and node == p.left):
        node = p; p = p.parent
    return p
All run in $O(h)$.
BST: Insert S187
insert(node, key): // recursive version, called with node == root
    if(key < node.key) then
        if(node.left == null):
            n = create new node with key
            node.left = n
        else: insert(node.left, key)
    else: // key >= node.key
        if(node.right == null):
            n = create new node with key
            node.right = n
        else: insert(node.right, key)

insert(T, z): // iterative version; z is a new node
    y = null; x = root
    while(x != null):
        y = x
        if(z.key < x.key) x = x.left
        else x = x.right
    z.parent = y
    if(y == null) root = z
    else if(z.key < y.key) y.left = z
    else y.right = z
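A minimal runnable translation of the insert and search pseudocode; having the recursive insert return the subtree root is a slight restructuring for Python, and duplicates go to the right subtree as in the notes:

```python
# Minimal BST: insert and search only, O(h) each.
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    """Insert key and return the (possibly new) root of the subtree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    else:                       # key >= root.key
        root.right = bst_insert(root.right, key)
    return root

def bst_search(node, key):
    """Iterative search: walk down, going left or right by comparison."""
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node                 # the node, or None if absent
```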
BST: Delete & Delete1 S188-189
delete(node, key): // find node by key, then remove
    if(key < node.key) then
        delete(node.left, key)
    else if(key > node.key) then
        delete(node.right, key)
    else begin // key == node.key – found!
        if node is a leaf then
            deletesimple(node) // just remove it
        else
            if(node.left != null) then
                find x = rightmost node in node.left // predecessor
                node.key = x.key
                delete1(x) // x has at most 1 child
            else
                find x = leftmost node in node.right // successor
                node.key = x.key
                delete1(x)

delete1(node): // helper: node has at most 1 child
    subtree = null
    parent = node.parent
    if(node.left != null)
        subtree = node.left
    else
        subtree = node.right
    if(parent == null)
        root = subtree
    else if(parent.left == node) // node is a left son
        parent.left = subtree
    else // node is a right son
        parent.right = subtree
Delete handles 3
cases: leaf (remove), one child (delete1 bypasses), two children (replace key with
predecessor/successor, then delete1).
BST: Average Case Analysis S190-191
Under the random permutation model (every insertion
order equally likely):
Average Height: Expected height
of BST is $O(\log n)$ โ can be proved by analogy to QuickSort (proof omitted in lecture).
Average Depth: For a sequence of
keys $\langle k_i \rangle$ inserted into a BST, define for key $k_j$:
- $G_j = \{k_i : 1 \le i < j $ and $k_l> k_i > k_j$ for all $l < i$ such that $k_l> k_j\}$ โ
direct predecessors
- $L_j = \{k_i : 1 \le i < j $ and $k_l < k_i < k_j$ for all $l < i$ such that $k_l < k_j\}$ โ
direct successors
The path from the root to $k_j$ consists exactly of $G_j \cup L_j$, so $d(k_j) = |G_j| + |L_j|$. The $i$-th inserted element is a current minimum (among the relevant keys) with probability $1/i$, so $E[|G_j|] \le H_n = \sum_{i=1}^{n} 1/i = O(\log n)$, and analogously for $L_j$. Thus the expected depth is $E[d(k_j)] = O(\log n)$.
Note: under the
alternative model where every $n$-element BST shape is equally likely, average height is
$\Theta(\sqrt{n})$ โ this model is less natural.
BST: Complexities of Operations S192
Data size: $n$ = number of elements.
Dominating operation: comparison of keys.
| Operation | Average | Worst Case |
|---|---|---|
| search | $\Theta(\log n)$ | $O(n)$ |
| insert | $\Theta(\log n)$ | $O(n)$ |
| delete | $\Theta(\log n)$ | $O(n)$ |
| minimum / maximum | $\Theta(\log n)$ | $O(n)$ |
| successor / predecessor | $\Theta(\log n)$ | $O(n)$ |
AVL Tree (Adelson-Velskij, Landis) S193
Simplest tree data structure guaranteeing $O(\log n)$
worst-case height.
Definition: A BST where for
every node, the difference in height of left and right subtrees is at most 1:
$$BF(v) = h(v.\text{left}) - h(v.\text{right}) \in \{-1, 0, 1\}$$
AVL: Maximum Height Proof S194
Let $T_h$ = minimum number of nodes in an AVL tree of height $h$:
- $T_0 = 1$, $T_1 = 2$
- $T_h = 1 + T_{h-1} + T_{h-2}$ (root + minimal left subtree + minimal right subtree)
Thus $T_h \ge F_h$ (Fibonacci numbers). Since
Fibonacci grows exponentially in $h$, the minimum node count grows
exponentially โ $h = O(\log n)$.
Conclusion: Height of an
$n$-element AVL tree is at most $O(\log n)$.
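The recurrence above can be checked numerically; a short sketch:

```python
# Minimum node counts T_h of an AVL tree of height h, from the
# recurrence T_h = 1 + T_{h-1} + T_{h-2} ("Fibonacci trees").
# Exponential growth of T_h is what bounds the height by O(log n).
def min_avl_nodes(h):
    t = [1, 2]  # T_0 = 1, T_1 = 2
    for i in range(2, h + 1):
        t.append(1 + t[i - 1] + t[i - 2])
    return t[h]
```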
AVL: Operations & Rotations S195-196
Operations are the same as BST, but after each
modifying operation:
- Balance factors ($bf$) are recomputed bottom-up
- If any $|bf| = 2$, a rotation is performed on that node
Rotation types:
| Type | When | Cost |
|---|---|---|
| Single rotation (LL or RR) | Same-direction imbalance (bf of child matches parent) | O(1) |
| Double rotation (LR or RL) | Opposite-direction imbalance | O(1) |
Rotations preserve BST order and restore the subtree's balance: after the rotation, the affected node again satisfies $|bf| \le 1$.
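A single right rotation (the LL case) is just constant-time pointer surgery; the standalone Node class here is illustrative:

```python
# Single right rotation about node y: y's left child x moves up,
# x's right subtree becomes y's left subtree. BST order is preserved
# because x's right subtree holds exactly the keys in (x.key, y.key).
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(y):
    x = y.left          # x must exist for an LL imbalance
    y.left = x.right    # hand x's right subtree over to y
    x.right = y         # y becomes x's right child
    return x            # x is the new root of this subtree
```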
AVL: Worst-Case Analysis S197
- Each rotation: $O(1)$
- Operations bounded by tree height: $O(\log n)$
- Height is at most logarithmic โ all dictionary operations: $O(\log n)$
worst case
Note: a single
delete may trigger $O(\log n)$ rotations (e.g., on Fibonacci trees). Insert triggers at most 1
rotation.
Self-Organising BST (Splay Trees) S198
Guarantee amortised $O(\log n)$ for all ordered
dictionary operations. Any sequence of $m$ operations has total cost $O(m \log n)$.
Core idea: Every operation uses
a helper splay(k) that moves key $k$ (or its predecessor/successor) to the root via
zig-zig and zig-zag rotations.
- splay(k): bring $k$ (or its closest key) to the root via rotations.
- insert(k): splay(k) brings the successor or predecessor $k'$ of $k$ to the root; make $k'$ a child of a new root node with key $k$.
- delete(k): splay(k) makes $k$ the root; remove it, leaving two subtrees; splay(k) on the left subtree brings $k$'s predecessor to its root (which then has no right child); attach the right subtree there.
Linked List Operations: Print & Splice S120
print(head):
    curr = head
    while(curr != null):
        print(curr.data); curr = curr.next

splice(p, b, e): // move block [b..e] to after p, in O(1)
    b.prev.next = e.next; e.next.prev = b.prev
    e.next = p.next; b.prev = p
    p.next.prev = e; p.next = b
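The splice pseudocode assumes every node has a live neighbour (e.g., sentinel nodes at both ends); a runnable Python version with explicit null checks instead:

```python
# Move the block of nodes [b..e] out of its doubly linked list and
# reinsert it right after node p, in O(1). Assumes b has a predecessor
# and p is not inside [b..e].
class DNode:
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

def splice(p, b, e):
    # unlink [b..e] from its current position
    b.prev.next = e.next
    if e.next is not None:
        e.next.prev = b.prev
    # relink the block after p
    e.next = p.next
    b.prev = p
    if p.next is not None:
        p.next.prev = e
    p.next = b
```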
Complete Dictionary Implementation Comparison
| Structure | Search | Insert | Delete | Space | Order Ops | Notes |
|---|---|---|---|---|---|---|
| Unordered (Arr/List) | O(n) | O(1) | O(n) | $\Theta(n)$ | O(n) | Simplest |
| Ordered Array | O(log n) | O(n) | O(n) | $\Theta(n)$ | O(log n) | Binary search |
| Ordered List | O(n) | O(n) | O(n) | $\Theta(n)$ | O(n) | Sorting doesn't help |
| Direct Addressing | O(1) | O(1) | O(1) | $O(\lvert U \rvert)$ | – | Wastes memory |
| Hash Table (chaining) | O(1) avg | O(1) avg | O(1) avg | $\Theta(m+n)$ | – | $O(n)$ worst |
| Hash Table (open addr.) | O(1) avg | O(1) avg | O(1) avg* | $\Theta(m)$ | – | Requires $\alpha < 1$ |
| Perfect Hashing | O(1) W | – | – | $O(n)$ | – | Static set only |
| BST | O(h) | O(h) | O(h) | $\Theta(n)$ | O(h) | $h$: O(n) W, O(log n) avg |
| AVL Tree | O(log n) | O(log n) | O(log n) | $\Theta(n)$ | O(log n) | Worst-case guaranteed |
| Splay Tree | O(log n) am. | O(log n) am. | O(log n) am. | $\Theta(n)$ | O(log n) am. | Self-adjusting |
| Red-Black Tree | O(log n) | O(log n) | O(log n) | $\Theta(n)$ | O(log n) | Fewer rotations than AVL |
W = worst case, avg =
average case, am. = amortised, * = delete in open addressing needs special handling
Binary Tree Properties
Max nodes in a binary tree of height $h$: $2^{h+1} - 1$. Height of a complete binary tree with $n$ nodes: $\lfloor \log_2 n \rfloor$.