
📋 ASD Exam Cheat Sheet

Quick reference for algorithms, complexities & data structures

📅 Lecture Concept Index

L1-L2: Foundations

  • Correctness, Specification
  • Loop Invariants, Variants
  • Asymptotic Notation (O, Ω, Θ)

L3-L5: Search & Sort

  • Binary Search, QuickSelect
  • O(n²) Sorts (Selection, Insertion)
  • O(n log n) Sorts (Merge, Quick, Heap)
  • Linear Sorts (Count, Radix)

L7-L8: Structures & ADTs

  • Arrays, Linked Lists
  • Dynamic Arrays (Unbounded) S137
  • Stacks, Queues, Deques
  • Priority Queues, Binary Heaps

L9: Dictionary & Trees

  • Hash Tables (O(1) avg)
  • Simple Dictionary (Arrays/Lists)
  • BST, AVL Trees, RB Trees
  • Dynamic Ordered Set (Optional) S183
📚 Correctness of Algorithms

L1

Specification S10

  • Initial condition: Valid input constraints
  • Final condition: Expected output/result

Total Correctness S12

An algorithm is totally correct for a given specification if and only if, for every valid input, it:

  • Stops (Termination)
  • Returns the correct output (Partial Correctness)

Formula: $Total = Partial + Termination$

Logical Sense:

$\forall$ valid input $d$: algorithm(d) stops $\wedge$ result satisfies final condition.

Partial Correctness: IF algorithm stops $\implies$ output is correct. (Note: does NOT require stopping!)

โš ๏ธ Example โ€” partially correct but NOT totally correct:

sum(array, len){ 
  sum=0; 
  i=0; 
  while(i<len){ 
    sum+=array[i]; 
    // MISSING: i++
  } 
  return sum; 
}

Missing i++ → infinite loop. It never returns a wrong output (partial correctness holds vacuously), but it never stops → not totally correct.

Termination Proof S16

Proven using a Variant Function $v: State \to \mathbb{N}$:

  • Decreasing: $v(S_{i+1}) < v(S_i)$
  • Bounded: $v(S_i) \ge 0$

Partial Correctness

IF algorithm stops → output is correct

(Does NOT guarantee termination)

Loop Invariant S17

Predicate true before & after each iteration.

  • Initialization: True before loop
  • Maintenance: Preserved by iteration
  • Termination: Implies result
โฑ๏ธ

Complexity Analysis

L2

Dominating Operations S27

Operations whose count is proportional to total work (e.g., comparisons in sorting).

Time Complexity S30

  • W(n) - Worst case (pessimistic)
  • A(n) - Average case
  • B(n) - Best case

Space Complexity

Memory used relative to input size n.

  • In-place: O(1) extra space
  • Not in-place: O(n) or more

Amortized Analysis Methods S135

Method Approach Formal Condition
Aggregate Total cost for a sequence of $m$ operations $A = \text{TotalCost}(m) / m$
Accounting Assign fixed costs (credits); pre-pay cheap ops $\sum \hat{c}_i \ge \sum c_i$
Potential $\Phi$: state → real number; $\Phi(S_0)=0$ $a_i = c_i + \Phi(S_i) - \Phi(S_{i-1})$
📈 Asymptotic Notation S37

L2 โญ COMPULSORY
Notation Meaning Limit Test (f/g → ?)
f(n) = O(g(n)) f grows at most as fast as g (upper bound) ≤ constant (including 0)
f(n) = Ω(g(n)) f grows at least as fast as g (lower bound) ≥ constant (including ∞)
f(n) = Θ(g(n)) f grows at same rate as g (tight bound) = positive constant
f(n) = o(g(n)) f grows strictly slower than g = 0
f(n) = ω(g(n)) f grows strictly faster than g = ∞

๐Ÿ“ Formal Definitions (Quantified) S37

Notation Symbol Formal Definition
f = O(g) ≤ $\exists c > 0,\; \exists n_0,\; \forall n \geq n_0:\; f(n) \leq c \cdot g(n)$
f = Ω(g) ≥ $\exists c > 0,\; \exists n_0,\; \forall n \geq n_0:\; f(n) \geq c \cdot g(n)$
f = Θ(g) = $f = O(g) \wedge f = \Omega(g)$, i.e. $\exists c_1, c_2 > 0,\; \exists n_0,\; \forall n \geq n_0:\; c_1 g(n) \leq f(n) \leq c_2 g(n)$
f = o(g) < $\forall c > 0,\; \exists n_0,\; \forall n \geq n_0:\; f(n) < c \cdot g(n)$
f = ω(g) > $\forall c > 0,\; \exists n_0,\; \forall n \geq n_0:\; f(n) > c \cdot g(n)$

Capital letter (O, Ω, Θ) = non-sharp inequality (≤, ≥, =). Lowercase (o, ω) = sharp inequality (<, >).

Key Insight: O and Ω use ∃c (existential — "there exists some constant"). o and ω use ∀c (universal — "for every constant"). This is why o means strictly slower — it must hold for every possible constant, not just one.

Complexity Ranking (Growth Hierarchy) S42

O(1) < O(log n) < O(√n) < O(n) < O(n log n) < O(n²) < O(n³) < O(2ⁿ) < O(n!)

📋 Most Common Ranks of Functions S36

Rank Notation Example
constant $\Theta(1)$ $S(n) = 3 = \Theta(1)$
logarithmic $\Theta(\log n)$ $W(n) = 2 + \log_2 n = \Theta(\log n)$
linear $\Theta(n)$ $A(n) = 2n + 1 = \Theta(n)$
linear-logarithmic $\Theta(n \log n)$ $A(n) = 1.44 \cdot n\log(n) = \Theta(n\log n)$
square (polynomial) $\Theta(n^2)$ $W(n) = n^2 + 4 = \Theta(n^2)$
cubic (polynomial) $\Theta(n^3)$ $A(n) = \Theta(n^3)$
sub-exponential $\Theta(n^{\log n})$ $A(n) = \Theta(n^{\log n})$
exponential $\Theta(2^n)$ $A(n) = \Theta(2^n)$
factorial $\Theta(n!)$ $W(n) = \Theta(n!)$

โš ๏ธ Time complexity: An over-polynomial rank (sub-exponential, exponential, factorial) is considered "unacceptably high" in practice.

๐Ÿ’พ Space complexity: Even linear rank is considered "very high". Aim for $O(1)$ or $O(\log n)$ auxiliary space when possible.

Rule: $f(n) = \Theta(g(n)) \iff f(n) = O(g(n)) \text{ AND } f(n) = \Omega(g(n))$

📊 Comparing Function Ranks (Limit Test Method) S41

Limit Test Formula

To compare f(n) vs g(n), compute: lim (n→∞) f(n)/g(n)

Limit Result Meaning Notation
= 0 f grows strictly slower than g f = o(g)
= c > 0 (constant) f and g grow at same rate f = Θ(g)
= ∞ f grows strictly faster than g f = ω(g)

Example Comparisons

f(n) g(n) Limit f/g Result
n n² n/n² = 1/n → 0 n = o(n²) ✓ slower
log n n (log n)/n → 0 log n = o(n) ✓ slower
n log n n² (n log n)/n² → 0 n log n = o(n²) ✓ slower
2ⁿ n! 2ⁿ/n! → 0 2ⁿ = o(n!) ✓ slower
3n² + 5n n² → 3 Θ(n²) same rate

⚡ Quick Hierarchy Rules

• Polynomial beats logarithm: $n^{\varepsilon} = \omega(\log n)$ for any $\varepsilon > 0$
• Exponential beats polynomial: $a^n = \omega(n^k)$ for $a > 1$ and any fixed $k$
• Factorial beats exponential: $n! = \omega(a^n)$
• Constants don't matter: $5n^2 = \Theta(n^2)$

โ“ Exam Questions & Answers (L2) S43

1. How to measure the "speed of algorithm"?

By counting the number of dominating operations as a function of data size $n$, not by wall-clock time (which depends on hardware).

2. What 2 things should be determined before assessing time complexity?

(1) The dominating operation set, and (2) the data size (what in the input influences the number of operations).

3. What is a dominating operation?

Operations whose count is proportional to the total work of the algorithm (they represent the whole well). E.g., comparisons in searching/sorting.

4. Definition of Time Complexity of Algorithm

A function $T(n)$ expressing the number of dominating operations as a function of data size $n$. Since it can vary for different inputs of the same size, we use pessimistic or average variants.

5. Definition of Space Complexity of Algorithm

The amount of additional memory (auxiliary space) used by the algorithm as a function of data size $n$, not counting the input itself. In-place = $O(1)$ extra space.

6. Definition of Pessimistic Time Complexity S30

Let $D_n$ = set of all possible inputs of size $n$, $t(d)$ = number of dominating ops for input $d$.
$W(n) = \sup\{t(d) : d \in D_n\}$
= the maximum number of dominating operations over all inputs of size $n$ (worst case).

7. Definition of Average Time Complexity S31

Let $X_n$ = random variable giving the number of dominating ops, $P_{nk}$ = probability of executing $k$ ops for input of size $n$.
$A(n) = E[X_n] = \sum_k k \cdot P_{nk}$
= the expected value of the number of dominating operations.

8. Be able to determine time complexity for simple algorithms

Steps: (1) identify dominating ops & data size, (2) count ops in best/worst/average cases, (3) express as a function of $n$, (4) simplify using asymptotic notation. See examples in the Correctness & Analysis section below.

9. What is the purpose of asymptotic notation?

To classify functions by their growth rate (rank of order), ignoring constants and lower-order terms. Allows comparing algorithms independently of hardware/implementation.

10–12. Definitions of O, Ω, Θ, o, ω notations & expressing ranks

See the Formal Definitions (Quantified) table and Most Common Ranks table above ↑

📐 Logarithmic Properties

CLRS 🧮 MATH FOUNDATION

Standard Notation (CLRS)

  • $\lg n$ $= \log_2 n$ (Binary log)
  • $\ln n$ $= \log_e n$ (Natural log)
  • $\lg^2 n$ $= (\lg n)^2$
  • $\lg \lg n$ $= \lg(\lg n)$

Basic Properties

For all $a, b, c > 0$:

  • $\log_b xy = \log_b x + \log_b y$
  • $\log_b (x/y) = \log_b x - \log_b y$
  • $\log_b x^a = a \log_b x$
  • $\log_b x = \frac{\log_c x}{\log_c b}$ (Base change)
  • $\log_b (1/x) = -\log_b x$

Crucial Identities ⭐

Often used in Master Theorem & complexity:

  • $a^{\log_b c} = c^{\log_b a}$ (Swap base/arg)
  • $b^{\log_b x} = x$
  • $\log_b b^x = x$
  • $\log_b x = \Theta(\log_c x)$ (All logs same rank)

Inequalities

For $x > -1$:

  • $\frac{x}{1+x} \le \ln(1+x) \le x$
  • For $x=0$, equalities hold.
  • $\lg(n!) = \Theta(n \lg n)$ (Stirling's approx)
  • $x-1 < \lfloor x \rfloor \le x \le \lceil x \rceil < x+1$

Iterated Logarithm ($\log^* n$)

Number of times log must be applied to reach $\le 1$.

  • $\log^* n = 0$ if $n \le 1$
  • $\log^* n = 1 + \log^* (\lg n)$ if $n > 1$
  • Growth: Extremely slow (effectively < 5 for any practical $n$).

⚡ Pro Tip: Calculating Bounds

When you see $2^{\lg n}$, it's just $n$. When you see $n^{\lg 7}$ vs $n^2$, remember $\lg 7 \approx 2.807 \implies n^{\lg 7}$ grows strictly faster than $n^2$.

🔍 Searching Algorithms

L3

1. Sequential Search ⭐ S25 S28 S29 S31 S48

1. Specification:

Pre: Arr sequence, len length, key to find.
Post: Index $i$ s.t. $Arr[i] == key$ OR $-1$.

search(Arr, len, key):
  i = 0
  while(i < len):
    if (Arr[i] == key) return i
    i++
  return -1
2. Analysis:

Dominating operation: comparisons. $W(n) = n$, $A(n) = n/2 \implies \Theta(n)$. $B(n) = \Theta(1)$ (element found at first position). $S(n) = O(1)$.

Note: Multiplicative factor is 1 ($W(n)=n$) and cannot be improved due to the specificity of the problem (must check every element in unsorted data).

1.5 Skipping Search (k-skip) S51

1. Fundamental Mechanism:

Requires sorted sequence. Checks every $k$-th cell (skips $k-1$ elements). When a value $> key$ is found, it performs a linear check on the skipped elements.

2. Efficiency & Complexity:
  • Average Speed: Asymptotically $k$ times faster than sequential.
  • Complexity: $W(n) = O(n/k + k)$ because it jumps $n/k$ times and scans at most $k$ elements. $B(n) = \Theta(1)$ (element at first checked position).
  • Optimal k: $k = \sqrt{n}$ minimizes $n/k + k$, giving $W(n) = \Theta(\sqrt{n})$.
  • Rank: Still linear $\Theta(n)$ for fixed $k$; rank only improves if $k$ scales with $n$.
3. Context:

    Middle ground between basic Sequential Search and Binary Search.

    ๐Ÿ“ Why $k = \sqrt{n}$?

    $W(n) = \frac{n}{k} + k$. To minimize, set $\frac{d}{dk}\left(\frac{n}{k}+k\right) = -\frac{n}{k^2}+1 = 0$, giving $k^2 = n \implies k = \sqrt{n}$. Then $W(n) = \frac{n}{\sqrt{n}} + \sqrt{n} = 2\sqrt{n} = \Theta(\sqrt{n})$.

    โš ๏ธ Fixed k (e.g. k=10): $W(n) = n/10 + 10 = \Theta(n)$ โ€” still linear rank! Only when $k$ grows with n (i.e. $k = \sqrt{n}$) does the rank improve to $\Theta(\sqrt{n})$.

    2. Binary Search ⭐ S52

    Requires sorted sequence for $O(1)$ random access.

    1. Specification:

    Pre: Array $A[0..n-1]$ is non-decreasingly sorted: $\forall_{0 \leq i < n-1}: A[i] \leq A[i+1]$.
    Post: $(result \neq -1 \implies A[result] = key) \;\wedge\; (result = -1 \implies \forall_{0 \leq i < n}: A[i] \neq key)$.

    binSearch(Arr, len, key):
      l = 0, r = len - 1
      while(l <= r):
        m = (l + r) / 2
        if (Arr[m] == key) return m
        else if (Arr[m] > key) r = m - 1
        else l = m + 1
      return -1
    2. Stop Property:

    Sequence length halves each step ($n \to n/2 \to \dots \to 1$).

    3. Complexity:

    $W(n) = \Theta(\log n)$, $A(n) = \Theta(\log n)$, $B(n) = \Theta(1)$ (middle element is the key). $S(n) = O(1)$.

    ๐Ÿ” Loop Invariant:

    "If $key$ exists in $A$, then $key \in A[l..r]$." At each iteration, the discarded half is guaranteed not to contain $key$. Upon termination ($l > r$), the search space is empty $\implies key \notin A$.

    3. Tournament Algorithm (2nd Smallest) S57

    Uses a "Divide & Conquer" tournament tree to find the minimum, then scans the elements that lost directly to it.

    1. Logic:

    Phase 1: Build tree ($n-1$ comparisons). Root is min.
    Phase 2: Find min among keys that lost directly to the winner ($\log n$ candidates).

    2. Complexity:

    Total Ops: $n + \lceil \log n \rceil - 2$.
    $W(n) = B(n) = \Theta(n)$ (linear, but fewer comparisons than $2n-3$).

    4. Hoare's Algorithm (QuickSelect) ⭐ S62

    An efficient "Divide & Conquer" approach to finding the k-th order statistic (the k-th smallest element).

    1. Mechanism:
    • Call Partition on the sequence.
    • If the returned pivot index is equal to k $\implies$ Return the element.
    • Else, continue recursively on only one side (left or right) depending on the pivot index relative to k.
    2. Complexity:

    Best: $B(n) = \Theta(n)$.
    Average: $O(n)$.
    Worst Case: $\Theta(n^2)$.

    3. Space Complexity:

    Worst Case: $O(n)$ due to recursive stack depth. Can be optimized to $\Theta(1)$ using an iterative implementation (unlike QuickSort, QuickSelect is easily tail-recursive).

    Algorithm W(n) A(n) B(n) S(n) Precondition Comparisons Swaps Notes
    Sequential Search S48 Θ(n) Θ(n) Θ(1) O(1) None n (unsucc), n/2 (succ avg) — Works on unsorted data
    Skipping Algorithm S51 Θ(n) Θ(n) Θ(1) O(1) Sorted n/k + k (optimal k=√n) — Cost: n/k+k; optimal k=√n → Θ(√n)
    Binary Search ⭐ S52 Θ(log n) Θ(log n) Θ(1) O(1) Sorted ⌊log₂ n⌋ + 1 — T(n) = T(n/2) + 1; form: lg n + 1
    QuickSelect S62 Θ(n²) Θ(n) Θ(n) $O(n)$ [?] None n (avg), n² (worst) n (avg), n² (worst) Finds k-th smallest element; iterative = $\Theta(1)$ space

    Partition Function ⭐

    Rearranges elements around the pivot: left ≤ pivot ≤ right. Used in QuickSort and QuickSelect.

    Complexity: $W(n) = A(n) = \Theta(n)$ | $S(n) = O(1)$
    Data size: $n = (r - l + 1)$ (segment length)
    📊 Sorting Algorithms

    L4-L5

    1. Selection Sort ⭐ S68

    1. Verbal Description:

    Repeatedly finds the minimum element and swaps it to the front.

    selectionSort(S, len):
      for i from 0 to len-1:
        mini = indexOfMin(S, i, len)
        swap(S, i, mini)
    2. Analysis:

    $W(n) = A(n) = B(n) = \Theta(n^2)$. Exactly $n(n-1)/2$ comparisons. $S(n) = O(1)$. Unstable.

    2. Insertion Sort ⭐ S70

    1. Verbal Description:

    Builds a sorted partition by inserting each element into its correct place.

    insertionSort(arr, len):
      for next from 1 to len-1:
        curr = next; temp = arr[next]
        while(curr > 0 && temp < arr[curr-1]):
          arr[curr] = arr[curr-1]; curr--
        arr[curr] = temp
    2. Analysis:

    $W(n) = \Theta(n^2)$, $B(n) = \Theta(n)$ (adaptive). $S(n) = O(1)$. Stable.

    3. Merge Sort ⭐ S76

    mergeSort(arr, l, r):
      if(l < r):
        m = (l + r) / 2
        mergeSort(arr, l, m); mergeSort(arr, m+1, r)
        merge(arr, l, m, r)
    Analysis:

    $W(n) = \Theta(n \log n)$. $S(n) = O(n)$ (not in-place).

    Data Structure Choice: Using **linked lists** avoids the $O(n)$ space requirement for an auxiliary array, preventing "unacceptably high" space complexity.

    Stable

    Merge Sort Recurrence Summary

    Metric Best Case $B(n)$ Worst Case $T(n)$
    Recurrence $2B(n/2) + n/2$ $2T(n/2) + n - 1$
    Closed Form ($n = 2^k$) $\frac{n}{2} \lg n$ $n \lg n - n + 1$
    Big-Theta $\Theta(n \lg n)$ $\Theta(n \lg n)$

    4. Partition (Lecture) ⭐ S93

    1. Verbal Description:

    Rearranges elements around a pivot (a[l]). Returns index p where a[l..p-1] ≤ a[p] and a[p+1..r] ≥ a[p].

    partition(a, l, r):
      i = l+1; j = r; m = a[l]
      do:
        while(i < r && a[i] <= m) i++
        while(j >= i && a[j] >= m) j--   // j may cross i by one
        if(i < j) swap(a[i], a[j])
      while(i < j)
      swap(a[l], a[j]); return j
    Pivot Properties (after):
    • $a[j] = m$ (pivot in its final sorted position)
    • $a[l..j-1] \le m$ (left side)
    • $a[j+1..r] \ge m$ (right side)
    Loop Invariant:

    $a[l+1..i-1] \leq m$ and $a[j+1..r] \geq m$. The pointers $i$ and $j$ move inward; elements outside are already correctly partitioned.

    Complexity:

    $\Theta(n)$ where $n = r - l + 1$. Each element is visited at most once by either $i$ or $j$.

    Step-by-step trace: partition([5, 3, 8, 1, 7], 0, 4)

    m=5, i=1, j=4: [5, 3, 8, 1, 7]
    Inner: i→2 (stops: a[2]=8>5), j→3 (stops: a[3]=1<5)
    swap(a[2],a[3]): [5, 3, 1, 8, 7]
    Inner: i→3, j→2 → j < i, loop exits
    swap(a[0],a[2]): [1, 3, 5, 8, 7] → return 2
    Result: pivot 5 at index 2, left [1,3]≤5, right [8,7]≥5 ✅

    5. QuickSort ⭐ S93

    quickSort(a, l, r):
      if(l < r):
        p = partition(a, l, r)
        quickSort(a, l, p-1); quickSort(a, p+1, r)
    Analysis:

    $W(n) = \Theta(n^2)$, $A(n) = \Theta(n \log n)$. $S(n) = O(\log n)$. Unstable.

    Case Scenarios:

    • Best Case: Balanced splits (pivot always in middle). Depth: $\Theta(\log n)$ S99.
    • Average Case: Random permutations. Depth: $\Theta(\log n)$ S100.
    • Worst Case: Already sorted or reverse-sorted data (highly unbalanced splits). Depth: $\Theta(n)$ S101.

    Recursion Complexity: At each level of the recursion tree, the total work is $\Theta(n)$ because the partition calls are performed on disjoint subsequences whose total length is $n$ (minus previous pivots) S98.

    In-Place Efficiency: QuickSort is inherently efficient because the Partition procedure works in-place (constant auxiliary space for comparison/swapping), making it practical for large datasets.

    6. Binary Heap Foundation ⭐ S150

    siftDown(i):   // min-heap; n = number of elements in the heap
      l = 2i+1; r = 2i+2; min = i
      if(l < n && heap[l] < heap[min]) min = l
      if(r < n && heap[r] < heap[min]) min = r
      if(min != i): swap(i, min); siftDown(min)

    buildHeap: for(i = n/2-1; i >= 0; i--) siftDown(i)

    7. HeapSort ⭐ S158

    heapSort(arr, n):
      buildHeap(arr)   // use the max-heap variant of siftDown for ascending order
      for(i = n-1; i > 0; i--):
        swap(arr[0], arr[i])
        siftDown(0)    // restore the heap on the shrunken prefix arr[0..i-1]
    Analysis:

    $W(n) = A(n) = \Theta(n \log n)$. $S(n) = O(1)$ (in-place). Unstable.

    8. CountSort ⭐ S106

    countSort(a, n, k):
      // k = magnitude of range [0, k]
      C = new Array(k+1), zeroed   // auxiliary space: O(k)
      B = new Array(n)             // output array: O(n)
      for(i=0; i<n; i++) C[a[i]]++
      for(i=1; i<=k; i++) C[i] += C[i-1] // cumulative counts
      for(i=n-1; i>=0; i--) B[--C[a[i]]] = a[i] // place stably, right-to-left
      return B
    Analysis:
    • Best Case: $B(n) = \Omega(n + k)$
    • Average Case: $A(n) = \Theta(n + k)$
    • Worst Case: $W(n) = O(n + k)$
    • Space: $S(n) = \Theta(n + k)$

    $k$ = size of the count array, determined by the largest value in the input. Stable.

    💡 Memory Breakdown:
    • Count array C: size $k+1$ โ†’ $O(k)$
    • Output array B: size $n$ โ†’ $O(n)$
    • Total auxiliary: $O(n + k)$

    โš ๏ธ When is CountSort inefficient? If $k \gg n$ (e.g., $k = n^2$), both time and space become $O(n^2)$. CountSort is efficient only when $k = O(n)$.

    9. Radix Sort ⭐ S109

    radixSort(a, d):
      for i from 1 to d:
        stablySortByDigit(a, i)
    Analysis:

    $W(n) = \Theta(d(n + k))$. Requires a stable underlying sort. Stable.

    Algorithm W(n) A(n) B(n) Space $S(n)$ Place? Stable? Comparisons Swaps Behavior / Height
    Selection Sort ⭐ [S67] Θ(n²) Θ(n²) Θ(n²) Θ(1) In-place No n(n-1)/2 n-1 Build sorted portion from left; always n² comparisons
    Insertion Sort ⭐ [S72] Θ(n²) Θ(n²) Θ(n) Θ(1) In-place Yes ~n²/4 (avg), n-1 (best) ~n²/4 (avg), 0 (best) Adaptive: best on sorted; worst on reverse
    Merge Sort ⭐ [S78] Θ(n log n) Θ(n log n) Θ(n log n) Θ(n) [?] Out-of-place Yes n log n; merge: m+n-1 n-1 Recursion depth: log n; divide & conquer
    QuickSort [S92] Θ(n²) Θ(n log n) Θ(n log n) $O(n)$ [S102] In-place No n log n (avg), n² (worst) n log n (avg), n² (worst) Implicit memory cost; rewrite long recursion as iterative for $\Theta(\log n)$ S102
    HeapSort [S163] Θ(n log n) Θ(n log n) Θ(n log n) Θ(1) In-place No 2n log n 2n log n Heap height: log n; extract-max repeatedly
    CountSort ⭐ [S106] Θ(n + k) Θ(n + k) Θ(n + k) Θ(n+k) Out-of-place Yes — — Linear time; k = range; non-negative integers only [?]
    RadixSort ⭐ [S108] Θ(d·n) Θ(d·n) Θ(d·n) Θ(n) Out-of-place Yes — — d = digits; ideal for lexicographic sort [?]

    🧮 Non-Comparison Sorting Details

    CountSort Mechanics S106-108

    Uses direct addressing to place elements. Requires non-negative integers fitting in RAM.

    • Memory: Uses 2 helper arrays (counts and result).
    • Pre-decrementation: Uses --counts[a[i]] to place elements, avoiding the need to shift the entire array.
    • Complexity: $A(n,m) = W(n,m) = 2n+m = \Theta(n+m)$. Space $S(n,m) = n+m = \Theta(n+m)$.

    RadixSort Principles S109

    A sorting scheme rather than a single algorithm. Ideal for lexicographic sort of fixed-length objects (strings, multi-digit IDs).

    • Direction: Processes positions from last to first (Least Significant Digit).
    • Requirement: Must use an internal stable sorting algorithm to preserve relative order of previous passes.

    💾 Sorting Space Complexity Analysis

    QuickSort: Why $O(n)$? S102

    QuickSort is In-place, but has an implicit memory cost for recursion:

    • Worst Case: $O(n)$ due to linear recursion depth on skewed splits.
    • Optimization: Rewriting the recursive call for the longer subsequence as an iterative loop (tail recursion optimization) reduces pessimistic space to $\Theta(\log n)$.

    Merge Sort: $O(n)$ vs $O(1)$ S80

    Merge Sort is Out-of-place on arrays due to the auxiliary array needed for the merge step.

    • Array: $\Theta(n)$ space is required for the temporary copy.
    • Linked List: $\Theta(1)$ extra space is possible (ignoring recursion) S87.

    โš–๏ธ What is Stability? S91

    A sorting algorithm is stable if it preserves the relative order of elements with equal keys.

    Example: [A1, B, A2] sorted by letter → Stable: [A1, A2, B] | Unstable: [A2, A1, B]

    Iterative Sorting: If an algorithm is stable, it is possible to sort records by multiple attributes in iterations (from least significant to most significant attribute) without destroying the sorting order of the previous iteration.

    Stable Algorithms (Yes)

    • Insertion Sort: Only shifts elements strictly smaller. Equal elements stay in original relative order.
    • Merge Sort: Stable if the merge function prefers the left element in case of a tie (L[i] <= R[j]).
    • CountSort: Processes the input array backwards (right-to-left) to place elements in the output correctly.
    • RadixSort: Depends on the stability of the underlying sort (usually CountSort) to work correctly.

    Unstable Algorithms (No)

    • Selection Sort: The long-distance swap of the minimum element can jump it over an equal element elsewhere.
    • QuickSort: The Partition function swaps elements across the pivot, often mixing up the relative order of identical values.
    • HeapSort: Building the heap and repeatedly extracting the max/min involves many non-local swaps that destroy stability.

    โš ๏ธ Comparison Sort Lower Bound S103

    Any comparison-based sort requires $\Omega(n \log n)$ comparisons in the worst case.

    Decision Tree Model:
    • Algorithm follows a binary tree (comparisons = nodes).
    • Why $n!$? Each of the $n!$ possible input permutations must lead to a different leaf (output), otherwise the algorithm couldn't distinguish two inputs.
    • Number of reachable leaves $L$ must be $\ge n!$ (to cover all input permutations).
    • Tree height $h$ (max comparisons): $2^h \ge L \ge n! \implies h \ge \log_2(n!)$.
    • Using Stirling's Approximation: $\log_2(n!) \approx n \log_2 n - 1.44n \implies \Omega(n \lg n)$.
    Important Details:
    • Average case too: The $\Omega(n \lg n)$ bound also applies to the average number of comparisons (averaged over all $n!$ permutations), not just worst case.
    • Universal bound: This applies to any possible comparison-based sorting algorithm โ€” even ones not yet invented.
    • Bypass: Non-comparison sorts (CountSort, RadixSort) avoid this bound by using element values as indices rather than comparing pairs.

    Merge Function ⭐

    Merges two sorted segments. Max comparisons = n + m - 1. Time: O(n+m).

    📦 Concrete Data Structures

    L7

    โš–๏ธ Sequence Access Patterns

    Sequences support two basic types of access, with differing efficiency on array implementations:

    • Absolute Access: Location identified by index. Fast on arrays: $O(1)$ constant time.
    • Relative Access: Location identified as successor/predecessor of a given element. Slow on arrays: $\Theta(n)$ linear time (requiring shifts/scans).

    📈 Dynamic Array S137

    A resizing array that grows (doubles) when full. Provides $O(1)$ amortized insertion at the end. Focus: Indexing.

    ๐Ÿ“ Dynamic Ordered Set (OPT) S183

    Extension of Dictionary for linearly ordered keys. Supports order-based operations (min, max, succ, pred). Focus: Order.

    📊 Amortized Analysis

    Averaging cost over a sequence of operations. Focuses on the total cost of $n$ operations.

    Comparison: Dynamic Array vs. Dynamic Ordered Set

    Dynamic Array = "Unbounded Array". Optimized for $O(1)$ access by index and end-insertion. Not necessarily sorted. S137
    Dynamic Ordered Set = "Sorted Dictionary". Optimized for search and order-based proximity (predecessor/successor). Usually implemented via AVL/RB-Trees ($O(\log n)$). S183

    📊 Comparison of Sequence Representations S143

    Operation Array Dynamic Array Singly Linked List Doubly Linked List
    Access by index O(1) O(1) O(n) O(n)
    Insert at front O(n) O(n) O(1) O(1)
    Insert at end O(1)* O(1)** O(1)*** O(1)
    Delete at front O(n) O(n) O(1) O(1)
    Delete at end O(1) O(1) O(n) O(1)
    Search O(n) O(n) O(n) O(n)

    * Fixed size. ** Amortized cost. *** With tail pointer.

    ๐Ÿ Operations on Sequence Ends S114

    In many practical applications, operations concern only the ends of the sequence (e.g., removeFirst, addLast). On **Arrays**, inserts at arbitrary positions require shifting elements โ†’ $\Theta(n)$. Exception: insert at end is $O(1)$ (fixed array, if space available) or $O(1)$ amortized (dynamic array). Linked lists or Cyclic Arrays are often more efficient for front-end operations.

    🔗 Singly vs. Doubly Linked S113

    • Singly: next pointer only. Minimal memory overhead.
    • Doubly: next and prev pointers. Costs twice the memory for links, but allows bi-directional navigation (crucial for InsertionSort).

    ๐Ÿ› ๏ธ Splice & Extensions

    • Splice(a,b,t): Moves sublist $(a \dots b)$ after node $t$. **$O(1)$ constant time**, even for large sublists!
    • Size Attribute: Efficiently cached in $O(1)$ by updating on insert/delete.
    • โš ๏ธ Splice Catch: Maintaining $O(1)$ size is impossible during an inter-list splice if the sublist length is unknown.
    • Attributes: last pointer (enables $O(1)$ pushBack).

    ๐Ÿ—๏ธ Abstract Data Structures (ADS)

    An **ADS** is defined by its interface (supported operations), not by its implementation. The implementation choice (e.g., Array vs. List) determines the time/space complexity of those operations. ADS contrasts with concrete data structures.

    🥞 Stack (ADS) ⭐ S125

    A LIFO (Last-In, First-Out) structure defined by its core interface:

    • push(T): Add element at top.
    • pop(): Remove and return top element (modifier).
    • top(): Return top element without removing (non-modifier).

    Implementation: Possible with SList or Array (keep a top index; $O(1)$ push/pop at the end).

    Applications: Undo operations, function calls (call stack), browser back button, expression parsing.

    ๐ŸŽŸ๏ธ Queue (ADS) โญ S126

    A FIFO (First-In, First-Out) structure.

    • inject(T): Add element at tail.
    • out(): Remove and return head element (modifier).
    • front(): Return head element without removing.

    Implementation: Best with SList (with tail pointer) or Cyclic Array (CArray).

    Why not standard Array? Removing from the front of a standard array requires $O(n)$ shifting of all subsequent elements.

    ๐Ÿ—ƒ๏ธ Deque (ADS) โญ S127

    Double Ended Queue (pronounced "deck"). A generalization of both stack and queue.

    • first(), last(): View ends.
    • pushFront/Back(T): Add to ends.
    • popFront/Back(): Remove from ends.

    Implementation: Best with DList or CArray.

    Constraints:
    • Why not SList? popBack() is $O(n)$ because we must traverse the list to find the predecessor of the last node.
    • Why not Array? $O(n)$ shifting required for any front-end operation.

    โš–๏ธ Linked Lists vs. Arrays

    Linked Lists (+)
    • Fast relative ops ($O(1)$ insert-after)
    • Unbounded size (dynamic)
    Linked Lists (-)
    • Memory overhead for pointers
    • Slow absolute access ($\Theta(n)$)

    Note: Arrays have **bounded** size and fast $O(1)$ absolute access but slow insertions.

    🔄 Cyclic Structures

    • Cyclic List: Last node points to first.
      Doubly invariant: (n.next.prev) == (n.prev.next) == n
    • Cyclic Array (CArray): Uses modulo n arithmetic to wrap indexing. If MAX is capacity, indices move as: idx = (idx + 1) % MAX.
      Allows $O(1)$ access to both ends without shifting.
    🗃️ Abstract Data Types

    L7
    ADT Access Pattern Operations Efficient Implementation
    Stack LIFO (Last In First Out) push(), pop(), top() Array or Linked List - O(1)
    Queue FIFO (First In First Out) inject(), out(), first() Circular array or Linked List - O(1)
    Deque Both ends pushFront/Back(), popFront/Back() Doubly linked list - O(1)
    Dynamic Ordered Set S183 Order-based Dictionary min(), max(), succ(), pred() Balanced BST (AVL/RB) - O(log n)

    Amortized Analysis (Unbounded Array)

    Doubling strategy: When full, create a 2× array & copy. A single insert can be O(n), but amortized cost = O(1) per insert.

    🌳 Priority Queue & Binary Heap S147

    L8 ⭐ COMPULSORY

    Heap Property (Min-Heap)

    For every node: priority(parent) ≤ priority(child). Root has minimum. Height = O(log n).

    Operation Complexity Description
    Insert (sift-up) O(log n) Add at end, bubble up
    DeleteMin (sift-down) O(log n) Remove root, replace with last, sift down
    FindMin O(1) Return root
    BuildHeap O(n) Heapify from n/2 down to 0
    HeapSort O(n log n) BuildHeap + n × deleteMin

    Array Representation

    For node at index i: Parent = ⌊(i-1)/2⌋, Left child = 2i+1, Right child = 2i+2

    📖 Dictionary & Tree Structures S168

    L9 ⭐ COMPULSORY

    Dictionary ADT S168

    Abstract data structure mapping keys (type K) to values (type V). Operations:

    • search(K key) — returns the value associated with the key (or a special value if absent)
    • insert(K key, V value) — adds/updates a key-value pair
    • delete(K key) — removes the element with the given key

    Simple (Naive) Implementations S171

    Data size: $n$ = number of elements. Dominating operation: key comparison. Space: $\Theta(n)$ for all simple implementations.

    Variant Search Insert Delete Notes
    Unordered (Array/List) O(n) O(1) O(n) Append to end/head
    Ordered Array O(log n) O(n) O(n) Binary search; shift on insert/delete
    Ordered Linked List O(n) O(n) O(n) Keeping sorted does NOT help!

    ⚠ Ordered linked list gains nothing: no random access → no binary search possible.

    Direct Addressing S172

    Keys come from universe $U \subseteq \mathbb{N}$. Store element with key $k$ at index $k$ in a $|U|$-element array.

    Operations
    • Search: O(1)
    • Insert: O(1)
    • Delete: O(1)
    Trade-off

    Space: $O(|U|)$

    Extremely fast but wastes memory when $|U| \gg n$. Only practical when key universe is small.
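Direct addressing fits in a few lines (illustrative sketch; the universe size and names are my assumptions):

```python
# Direct-address table for a small universe U = {0, ..., 9}.
U_SIZE = 10
table = [None] * U_SIZE            # one slot per possible key: O(|U|) space

def insert(k, v): table[k] = v     # O(1): key is the index
def search(k):    return table[k]  # O(1)
def delete(k):    table[k] = None  # O(1)

insert(3, "three")
insert(7, "seven")
```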

    Hash Tables S173

    Elements stored in an $m$-element array $[0, \ldots, m-1]$, where $m \ll |U|$. Position computed by hash function $h : U \to [0..m-1]$.

    Hash Function Properties S175

    An ideal hash function $h : U \to [0, \ldots, m-1]$ should have:

    • Uniform load: each index $0 \le i < m$ equally likely for a random key
    • Fast computation: $O(1)$ time
    • Avalanche: different values even for very similar keys

    Example: $h(k) = k \bmod m$ (usually $m$ is a prime number)

    Non-integer keys must first be transformed to integers (e.g., strings โ†’ sum of char codes). This pre-hash should also satisfy hash function properties.
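A small illustration of why the plain sum-of-char-codes pre-hash is weak (anagrams collide, violating the avalanche property) and how a polynomial variant avoids it; `prehash_poly` and the base 31 are my choices, not from the lecture:

```python
M = 101                                   # table size: a prime

def h(k):                                 # h(k) = k mod m
    return k % M

def prehash(s):                           # naive: sum of character codes
    return sum(ord(c) for c in s)

def prehash_poly(s, base=31):             # better avalanche: order matters
    code = 0
    for c in s:
        code = code * base + ord(c)
    return code

# Anagrams defeat the naive pre-hash but not the polynomial one:
# prehash("stop") == prehash("pots"), while prehash_poly tells them apart.
```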

    Collisions S176

    Collision: when $h(k) = h(j)$ for two distinct keys $k \ne j$. Two resolution strategies:

    1. Chaining Method

    Key $k$ is added to a linked list $l(h(k))$ stored at position $h(k)$.

    2. Open Hashing (Open Addressing)

    Other indices are probed in a repeatable sequence until a free slot is found.

    Chain Method โ€” Operations S177

    $n$ = elements stored, computing $h(k)$ takes $O(1)$:

    Operation Cost Why
    insert(k) O(1) Compute $h(k)$, prepend to list at $h(k)$
    search(k) $O(1 + |l(h(k))|)$ Compute $h(k)$, scan list $l(h(k))$
    delete(k) $O(1 + |l(h(k))|)$ Compute $h(k)$, scan list to find & remove

    Worst case: all $n$ keys map to same slot โ†’ $|l(h(k))| = n$ โ†’ $\Theta(n)$ comparisons (no better than naive!).

    Average Case Analysis โ€” Chain Method S178

    Under uniform load assumption, define load factor $\alpha = n/m$.

    Proof sketch: Let $X$ = length of list $l(h(k))$. For each element $e \in S$, define indicator $X_e = [h(k) = h(e)]$, so $X = \sum_{e \in S} X_e$.

    $$E[X] = E\left[\sum_{e \in S} X_e\right] = \sum_{e \in S} P(X_e = 1) = |S| \cdot \frac{1}{m} = \frac{n}{m} = \alpha$$

    Result: Average cost of all operations is $O(1 + \alpha)$. If $m = O(n)$, then $\alpha = O(1)$ and all operations take O(1) average time.
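The chain method can be sketched as follows (illustrative; Python lists stand in for linked lists, and the class is my naming):

```python
# Minimal chained hash table sketch.
class ChainedHash:
    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one chain l(i) per slot

    def _h(self, k):
        return hash(k) % self.m               # stand-in for h(k)

    def insert(self, k, v):                   # O(1): prepend (assumes k absent)
        self.slots[self._h(k)].insert(0, (k, v))

    def search(self, k):                      # O(1 + |l(h(k))|): scan chain
        for key, v in self.slots[self._h(k)]:
            if key == k:
                return v
        return None

    def delete(self, k):                      # O(1 + |l(h(k))|): find & remove
        chain = self.slots[self._h(k)]
        self.slots[self._h(k)] = [p for p in chain if p[0] != k]

d = ChainedHash()
d.insert("alpha", 1)
d.insert("beta", 2)
```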

    Universal Hashing S179

    A family $H$ of hash functions into range $\{0, \ldots, m-1\}$ is called $c$-universal (for $c > 0$) if for a randomly chosen $h \in H$, any two distinct keys $i, j$ collide with probability:

    $$P(h(i) = h(j)) \le \frac{c}{m}$$
    • Family is universal if $c = 1$
    • Purpose: avoid "malicious" data patterns โ€” hash function is randomly picked from the family at initialization
    • Guarantee: With $c$-universal family in chain method, average time of operations is $O(1 + cn/m)$

    Open Hashing (Open Addressing) S180

    Exactly one element per slot. On collision, probe a sequence $\pi(k) = (h(k,0), h(k,1), \ldots, h(k,m-1))$ until a free slot is found. Requires $n < m$ (i.e., $\alpha < 1$).

    Probing Strategy Formula Problem
    Linear $h(k,i) = (h(k) + i) \bmod m$ Primary clustering
    Quadratic $h(k,i) = (h(k) + c_1 i + c_2 i^2) \bmod m$ Secondary clustering
    Double hashing $h(k,i) = (h_1(k) + i \cdot h_2(k)) \bmod m$ Best โ€” "more random" permutations

    For double hashing: e.g., $h_1(k) = k \bmod m$, $h_2(k) = 1 + (k \bmod m')$ with $m' = m-1$. $h_2(k)$ must be non-zero (so the probe sequence visits every slot) and should be independent of $h_1$.

    Note: delete in open hashing requires special handling โ€” must restore the hash table (e.g., lazy deletion with tombstone markers).
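One way to sketch linear probing with tombstone deletion (the sentinel objects, class, and test keys are my illustration):

```python
# Open addressing with linear probing and lazy deletion (tombstones).
EMPTY, DEAD = object(), object()          # sentinel slot states

class OpenHash:
    def __init__(self, m=11):
        self.m = m
        self.slots = [EMPTY] * m

    def _probe(self, k):                  # h(k, i) = (h(k) + i) mod m
        for i in range(self.m):
            yield (hash(k) + i) % self.m

    def insert(self, k):
        for j in self._probe(k):
            if self.slots[j] is EMPTY or self.slots[j] is DEAD:
                self.slots[j] = k
                return
        raise RuntimeError("table full")  # must keep n < m

    def search(self, k):
        for j in self._probe(k):
            if self.slots[j] is EMPTY:    # true empty ends the probe sequence
                return False
            if self.slots[j] is not DEAD and self.slots[j] == k:
                return True
        return False

    def delete(self, k):
        for j in self._probe(k):
            if self.slots[j] is EMPTY:
                return
            if self.slots[j] is not DEAD and self.slots[j] == k:
                self.slots[j] = DEAD      # tombstone keeps later probes alive
                return

t = OpenHash()
t.insert(3)
t.insert(14)   # 3 and 14 collide: both hash to slot 3 (mod 11)
t.delete(3)
```

Without the tombstone, deleting 3 would leave a true empty slot and `search(14)` would stop early and wrongly report absence.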

    Average Case Analysis โ€” Open Hashing S181

    Under assumption that all probe orders are equally probable, with load factor $\alpha = n/m < 1$:

    Unsuccessful Search (key absent)

    $$\text{Avg comparisons} = \frac{1}{1 - \alpha}$$

    Successful Search (key present)

    $$\text{Avg comparisons} = \frac{1}{\alpha} \ln \frac{1}{1-\alpha} + \frac{1}{\alpha}$$

    Worst case: $O(n)$ โ€” linear. When $n$ approaches $m$, open hashing degrades to naive linear scan. Must maintain $n < m$.

    (*) Perfect Hashing S182

    Previous methods guarantee expected constant time. Perfect hashing guarantees worst-case $O(1)$ search time for a static set.

    • Can be constructed for a given set of $n$ elements in expected $O(n)$ time
    • Based on 2-universal hash function families (Fredman, Komlรณs, Szemerรฉdi 1984)
    • Useful when keys are known in advance and don't change

    Hash Table โ€” Space Complexity

    Method Space Constraint
    Chaining $\Theta(m + n)$ $m$ slots + $n$ list nodes
    Open Addressing $\Theta(m)$ Must have $m > n$ ($\alpha < 1$)

    Dynamic Ordered Set S183

    Extension of Dictionary where keys have a linear order. Adds order-based operations:

    Dictionary Ops
    • search(K key)
    • insert(K key, V value)
    • delete(K key)
    Order Ops
    • minimum(), maximum()
    • predecessor(K key)
    • successor(K key)

    โš  Hash tables support dictionary ops efficiently but do NOT efficiently support order operations.

    Binary Search Tree (BST) S184

    BST property: For each node, all keys in left subtree $\le$ node's key $\le$ all keys in right subtree.

    • Min key: leftmost node. Max key: rightmost node.
    • All operations run in $O(h)$, where $h$ = height
    • Worst case: $h = O(n)$ (degenerate/skewed tree)

    BST: Search S185

    searchRecursive(node, key):     // called with node == root
      if(node == null or node.key == key) return node
      if(key < node.key) return searchRecursive(node.left, key)
      else return searchRecursive(node.right, key)
    
    searchIterative(node, key):     // called with node == root
      while(node != null and node.key != key):
        if(key < node.key) node = node.left
        else node = node.right
      return node

    BST: Min, Max, Successor, Predecessor S186

    minimum(node):                  // called with node == root
      while(node.left != null) node = node.left
      return node
    
    maximum(node):
      while(node.right != null) node = node.right
      return node
    
    successor(node):
      if(node.right != null) return minimum(node.right)
      p = node.parent
      while(p != null and node == p.right):
        node = p; p = p.parent
      return p
    
    predecessor(node):              // mirror of successor
      if(node.left != null) return maximum(node.left)
      p = node.parent
      while(p != null and node == p.left):
        node = p; p = p.parent
      return p

    All run in $O(h)$.

    BST: Insert S187

    insert(node, key):              // recursive version
      if(key < node.key) then
        if(node.left == null):
          n = create new node with key
          node.left = n
        else: insert(node.left, key)
      else:                         // key >= node.key
        if(node.right == null):
          n = create new node with key
          node.right = n
        else: insert(node.right, key)
    
    insert(T, z):                   // iterative version
      y = null; x = T.root
      while(x != null):
        y = x
        if(z.key < x.key) x = x.left
        else x = x.right
      z.parent = y
      if(y == null) T.root = z      // tree was empty
      else if(z.key < y.key) y.left = z
      else y.right = z

    BST: Delete & Delete1 S188-189

    delete(node, key):              // find node by key, then remove
      if(key < node.key):
        delete(node.left, key)
      else if(key > node.key):
        delete(node.right, key)
      else:                         // key == node.key → found!
        if node is a leaf:
          deletesimple(node)        // just remove it
        else if(node.left != null):
          find x = rightmost node in node.left  // predecessor
          node.key = x.key
          delete1(x)                // x has at most 1 child
        else:
          find x = leftmost node in node.right  // successor
          node.key = x.key
          delete1(x)
    
    delete1(node):                  // helper: node has at most 1 child
      subtree = null
      parent = node.parent
      if(node.left != null)
        subtree = node.left
      else
        subtree = node.right
      if(parent == null)
        root = subtree
      else if(parent.left == node)  // node is a left son
        parent.left = subtree
      else                          // node is a right son
        parent.right = subtree

    Delete handles 3 cases: leaf (remove), one child (delete1 bypasses), two children (replace key with predecessor/successor, then delete1).

    BST: Average Case Analysis S190-191

    Under the random permutation model (every insertion order equally likely):

    Average Height: Expected height of BST is $O(\log n)$ โ€” can be proved by analogy to QuickSort (proof omitted in lecture).

    Average Depth: For a sequence of keys $\langle k_i \rangle$ inserted into a BST, define for key $k_j$:

    • $G_j = \{k_i : 1 \le i < j $ and $k_l > k_i > k_j$ for all $l < i$ such that $k_l > k_j\}$ — "closest greater" keys (running minima among earlier keys greater than $k_j$)
    • $L_j = \{k_i : 1 \le i < j $ and $k_l < k_i < k_j$ for all $l < i$ such that $k_l < k_j\}$ — "closest smaller" keys (running maxima among earlier keys less than $k_j$)

    The path from the root to $k_j$ consists exactly of $G_j \cup L_j$, so $d(k_j) = |G_j| + |L_j|$. Among the earlier keys greater than $k_j$, the $i$-th one is a new minimum with probability $1/i$, so $E[|G_j|] \le H_n = \sum_{i=1}^{n} 1/i = O(\log n)$; analogously for $L_j$. Thus $E[d(k_j)] = O(\log n)$.

    Note: under the alternative model where every $n$-element BST shape is equally likely, average height is $\Theta(\sqrt{n})$ โ€” this model is less natural.
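The $O(\log n)$ average-depth claim is easy to check empirically under the random permutation model (a sanity-check simulation; parameters and names are my choice):

```python
# Empirical check: mean node depth in a random BST is roughly 2 ln n.
import math, random

def avg_depth(keys):
    """Insert keys in order into a BST; return the mean node depth."""
    tree = {keys[0]: [None, None]}         # key -> [left child, right child]
    root, total = keys[0], 0
    for k in keys[1:]:
        node, d = root, 0
        while True:
            side = 0 if k < node else 1
            nxt = tree[node][side]
            d += 1
            if nxt is None:                # found the insertion point at depth d
                tree[node][side] = k
                tree[k] = [None, None]
                total += d
                break
            node = nxt
    return total / len(keys)

random.seed(1)
n = 1000
trials = [avg_depth(random.sample(range(10 * n), n)) for _ in range(20)]
mean = sum(trials) / len(trials)
# mean is near 2 ln(1000) ~ 13.8, far below the O(sqrt(n)) of the shape model
```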

    BST: Complexities of Operations S192

    Data size: $n$ = number of elements. Dominating operation: comparison of keys.

    Operation Average Worst Case
    search $\Theta(\log n)$ $O(n)$
    insert $\Theta(\log n)$ $O(n)$
    delete $\Theta(\log n)$ $O(n)$
    minimum / maximum $\Theta(\log n)$ $O(n)$
    successor / predecessor $\Theta(\log n)$ $O(n)$

    AVL Tree (Adelson-Velskij, Landis) S193

    Simplest tree data structure guaranteeing $O(\log n)$ worst-case height.

    Definition: A BST where for every node, the difference in height of left and right subtrees is at most 1:

    $$BF(v) = h(v.\text{left}) - h(v.\text{right}) \in \{-1, 0, 1\}$$

    AVL: Maximum Height Proof S194

    Let $T_h$ = minimum number of nodes in an AVL tree of height $h$:

    • $T_0 = 1$, $T_1 = 2$
    • $T_h = 1 + T_{h-1} + T_{h-2}$ (root + minimal left subtree + minimal right subtree)

    Thus $T_h \ge F_h$ (Fibonacci numbers). Since Fibonacci grows exponentially in $h$, the minimum node count grows exponentially โ†’ $h = O(\log n)$.

    Conclusion: Height of an $n$-element AVL tree is at most $O(\log n)$.
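The recurrence is easy to tabulate (illustrative; the function name is mine):

```python
# Minimal AVL sizes T_h = 1 + T_{h-1} + T_{h-2} grow exponentially in h,
# so the height of an n-node AVL tree is O(log n).
def min_avl_nodes(h):
    T = [1, 2]                    # T_0 = 1, T_1 = 2
    for i in range(2, h + 1):
        T.append(1 + T[i - 1] + T[i - 2])
    return T[h]

# e.g. an AVL tree of height 10 needs at least 232 nodes
```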

    AVL: Operations & Rotations S195-196

    Operations are the same as BST, but after each modifying operation:

    • Balance factors ($bf$) are recomputed bottom-up
    • If any $|bf| = 2$, a rotation is performed on that node

    Rotation types:

    Type When Cost
    Single rotation (LL or RR) Same-direction imbalance (bf of child matches parent) O(1)
    Double rotation (LR or RL) Opposite-direction imbalance O(1)

    Rotations preserve BST order and restore the subtree's height balance: after rebalancing, every node again satisfies $|bf| \le 1$.
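A single right (LL) rotation can be sketched on tuple-encoded trees `(key, left, right)` (the encoding and names are mine, not the lecture's):

```python
# LL case: the left child becomes the new subtree root; BST order is preserved.
def rotate_right(node):
    key, left, right = node
    lkey, lleft, lright = left
    return (lkey, lleft, (key, lright, right))   # old root becomes right child

# Left-heavy chain 1 <- 2 <- 3 (bf = 2 at the root) rebalances in O(1):
t = (3, (2, (1, None, None), None), None)
balanced = rotate_right(t)
# balanced == (2, (1, None, None), (3, None, None))
```

The RR case is the mirror image, and the double rotations LR/RL are two such single rotations in sequence.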

    AVL: Worst-Case Analysis S197

    • Each rotation: $O(1)$
    • Operations bounded by tree height: $O(\log n)$
    • Height is at most logarithmic โ†’ all dictionary operations: $O(\log n)$ worst case

    Note: a single delete may trigger $O(\log n)$ rotations (e.g., on Fibonacci trees). Insert triggers at most 1 rotation.

    Self-Organising BST (Splay Trees) S198

    Guarantee amortised $O(\log n)$ for all ordered dictionary operations. Any sequence of $m$ operations has total cost $O(m \log n)$.

    Core idea: Every operation uses a helper splay(k) that moves key $k$ (or its predecessor/successor) to the root via zig-zig and zig-zag rotations.

    splay(k)

    Bring $k$ (or its closest key) to root via rotations.

    insert(k)

    splay(k) โ†’ brings successor/predecessor $k'$ to root โ†’ make $k'$ a child of new node $k$.

    delete(k)

    splay(k) โ†’ $k$ becomes root โ†’ remove $k$ (two subtrees) โ†’ splay(k) on left subtree โ†’ brings predecessor to root โ†’ attach right subtree.

    Linked List Operations: Print & Splice S120

    print(head):
      curr = head
      while(curr != null):
        print(curr.data); curr = curr.next
    
    splice(p, b, e):  // Move [b..e] after p โ€” O(1)
      b.prev.next = e.next; e.next.prev = b.prev
      e.next = p.next; b.prev = p
      p.next.prev = e; p.next = b

    ๐Ÿ“Š Complete Dictionary Implementation Comparison

    Structure Search Insert Delete Space Order Ops Notes
    Unordered (Arr/List) O(n) O(1) O(n) $\Theta(n)$ O(n) Simplest
    Ordered Array O(log n) O(n) O(n) $\Theta(n)$ O(log n) Binary search
    Ordered List O(n) O(n) O(n) $\Theta(n)$ O(n) Sorting doesn't help
    Direct Addressing O(1) O(1) O(1) $O(|U|)$ โ€” Wastes memory
    Hash Table (chaining) O(1) avg O(1) avg O(1) avg $\Theta(m+n)$ โ€” $O(n)$ worst
    Hash Table (open addr.) O(1) avg O(1) avg O(1) avg* $\Theta(m)$ โ€” Requires $\alpha < 1$
    Perfect Hashing O(1) W โ€” โ€” $O(n)$ โ€” Static set only
    BST โญ O(h) O(h) O(h) $\Theta(n)$ O(h) $h$: O(n) W, O(log n) avg
    AVL Tree O(log n) O(log n) O(log n) $\Theta(n)$ O(log n) Worst-case guaranteed
    Splay Tree O(log n) am. O(log n) am. O(log n) am. $\Theta(n)$ O(log n) am. Self-adjusting
    Red-Black Tree O(log n) O(log n) O(log n) $\Theta(n)$ O(log n) Fewer rotations than AVL

    W = worst case, avg = average case, am. = amortised, * = delete in open addressing needs special handling

    Binary Tree Properties

    Max nodes at height h: $2^{h+1} - 1$. Height of n nodes: $\lfloor \log_2 n \rfloor$ (complete tree).

    โœ…

    Correctness & Complexity Analysis Examples

    L1-L2, L7

    Example 1: Sum of Array โญ

    1. Specification:

    Initial condition: Arr is an integer array, len is a natural number.
    Final condition: Result is the sum of all elements: $\sum_{j=0}^{len-1} Arr[j]$.

    sum(Arr, len):
      sum = 0
      i = 0
      while(i < len):
        sum += Arr[i]
        i++
      return sum
    2. Stop Property:

    $i$ starts at 0, increases by 1 each loop, stops when $i \ge len$. Since $len$ is finite, the loop terminates.

    3. Loop Invariant:

    $sum = \sum_{j=0}^{i-1} Arr[j]$. At termination ($i=len$), $sum = \sum_{j=0}^{len-1} Arr[j]$.

    4. Complexity:

    $W(n) = \Theta(n)$ time, $S(n) = \Theta(1)$ space.

    Example 2: Maximum in Array

    1. Specification:

    Initial condition: Arr of length $len > 0$.
    Final condition: Result $x = \max \{Arr[j] \mid 0 \le j < len\}$.

    findMax(Arr, len):
      i = 1
      x = Arr[0]
      while(i < len):
        if (Arr[i] > x) x = Arr[i]
        i++
      return x
    2. Stop Property:

    Finite loop from $i=1$ to $len$. Each step $i$ increases by 1.

    3. Loop Invariant:

    $x = \max \{Arr[j] \mid 0 \le j < i\}$. At termination, $i=len \implies x=\text{global max}$.

    Example 3: Fast Exponentiation (Power)

    Efficiently computes $x^n$ using Divide & Conquer by squaring $x$.

    power(x, n):
      if (n == 0) return 1
      if (n % 2 == 0) 
        return power(x*x, n/2)
      else 
        return x * power(x*x, (n-1)/2)
    1. Specification:

    Pre: $x \in \mathbb{R}$, $n \in \mathbb{N}_0$.
    Post: Result $= x^n$.

    2. Complexity:

    $W(n) = \Theta(\log n)$ multiplications.
    $S(n) = \Theta(\log n)$ recursion stack.

    Amortized Case: Stack with multiPop S135

    1. Operation Description:

    multiPop(k) calls pop() $k$ times or until the stack is empty. While a single call is $O(k)$, the average cost over a sequence is small.

    2. Aggregate Method:

    Total $pop$s $\le$ total $push$es. In $m$ operations, total cost is $O(m) \implies O(1)$ amortized.

    3. Accounting Method:

    Pay 2 for push (1 real + 1 credit). pop (and multiPop) uses stored credits.

    4. Potential Method:

    $\Phi(S) = size(S)$; let $s$ be the stack size before the operation.
    $a_{push} = 1 + (s+1) - s = 2$.
    $a_{multipop} = k + (s-k) - s = 0$ (for $k \le s$).
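The resulting bound — total real cost at most 2 per operation — can be checked by simulation (a sketch; cost model assumed: 1 per push, 1 per element actually popped):

```python
# Random push / multiPop sequence; verify total real cost <= 2 * #operations.
import random

random.seed(0)
stack, real_cost, ops = [], 0, 1000
for _ in range(ops):
    if random.random() < 0.7:
        stack.append(1)                    # push: real cost 1
        real_cost += 1
    else:
        k = random.randint(1, 5)
        popped = min(k, len(stack))        # multiPop(k): pop k times or to empty
        for _ in range(popped):
            stack.pop()
        real_cost += popped
# Every popped element was pushed once, so real_cost <= 2 * pushes <= 2 * ops.
```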

    Amortized Analysis: Formal Methods S135-142

    1. Aggregate Method:

    Show that for any sequence of $n$ operations, the total time $T(n)$ is bounded. Amortized cost $a = T(n)/n$ (absolute worst-case average).

    2. Accounting Method:

    Assign an amortized cost $\hat{c}_i$ to each operation. If $\hat{c}_i > c_i$ (real cost), store the surplus as credit for that element. Requirement: $\sum_{i=1}^n \hat{c}_i \ge \sum_{i=1}^n c_i$ (the total credit must always be non-negative).

    3. Potential Method:

    Assign a potential function $\Phi_i$ to state $i$, with $\Phi_0 = 0$ and $\Phi_i \ge 0$.

    Amortised cost: $a_i = t_i + \Phi_i - \Phi_{i-1}$ (where $t_i$ is real cost).
    Total: $\sum a_i = \sum t_i + (\Phi_m - \Phi_0)$.
    Since $\Phi_m - \Phi_0 \ge 0$, then $\sum t_i \le \sum a_i$.