COMP 2711H: Lecture 16

Date: 2024-10-07 18:00:56

Reviewed:

Topic / Chapter:

summary

❓Questions

Notes

Extra Problems

Problems
- the following problems are easy on trees, but hard on general graphs
- explanation: in 3721
  - 👨‍🏫 for now, trust me :p
Problem 1: Longest path
- input: graph $G = (V, E)$
- output: longest path in $G$
- lemma: if $G$ is a tree, then a longest path in $G$
  - is a $(u, v)$-path where $u, v$ are leaves
  - if an endpoint is not a leaf, it has a neighbor that is not on the path
    - as a tree can't have a cycle
    - then, we can extend the path to that neighbor, so it was not the longest
- algorithm:
  - pick any non-leaf vertex $r$ as root
  - turn the tree into a rooted tree
  - we can then do case analysis on the highest point of the path
    - 👨‍🎓 or: the lowest common ancestor of its endpoints
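- assign depth to all nodes
  - depth: maximum number of edges one can go down from $v$ until reaching a leaf
    - $depth[v] = \begin{cases} 0 & v \text{ is a leaf} \\ 1 + \max_{i} depth[c_{i}] & \text{otherwise} \end{cases}$, where $c_{i}$ are the children of $v$
  - = dynamic programming in 3711
  - but for now, it's just another way of computing things
  - $lp[v]$ : length of the longest path whose highest vertex is $v$
  - longest path with $v$ as the highest point: either
    - goes down through at most one child: $depth[v]$
    - goes down through the two deepest children: $2 + \max_{i} depth[c_{i}] + \text{2nd-}\max_{i} depth[c_{i}]$
  - answer: $\max_{v} lp[v]$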
- 👨‍🏫 "try" to solve it on a general graph
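A minimal Python sketch of the tree algorithm above, assuming the tree is given as an edge list and rooted at a caller-chosen vertex `root` (the function name `longest_path_in_tree` is hypothetical). It fills `depth[v]` bottom-up and, at each vertex, takes `lp[v]` as either `depth[v]` or 2 plus the two largest child depths; the answer is the maximum `lp[v]`, counted in edges.

```python
from collections import defaultdict


def longest_path_in_tree(edges, root):
    """Number of edges on a longest path of a tree, via depth[v] and lp[v]."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Iterative DFS from the root: record each vertex's parent and a
    # discovery order in which every parent appears before its children.
    parent = {root: None}
    order = [root]
    stack = [root]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                order.append(w)
                stack.append(w)

    depth = {}
    best = 0
    for v in reversed(order):  # children are processed before their parent
        child_depths = sorted(
            (depth[c] for c in adj[v] if c != parent[v]), reverse=True
        )
        if not child_depths:  # leaf: depth 0, path of length 0
            depth[v] = 0
            lp = 0
        else:
            depth[v] = 1 + child_depths[0]
            # highest point v: go down one child, or down the two deepest children
            lp = depth[v]
            if len(child_depths) >= 2:
                lp = max(lp, 2 + child_depths[0] + child_depths[1])
        best = max(best, lp)
    return best
```

For example, on the tree with edges 1-2, 2-3, 3-4, 2-5, 5-6, 6-7 rooted at 2, the path 4-3-2-5-6-7 has highest point 2 and the function returns 5. The DFS is iterative so deep trees don't hit Python's recursion limit.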
Problem 2: Maximum independent set
- independent set: a set $I \subseteq V$
  - s.t. $\forall e \in E, e \not\subseteq I$
  - i.e. no edge has both endpoints in $I$
- input: graph $G = (V, E)$
- output: largest possible size of an independent set
- if input is a tree, algorithm:
  - pick an arbitrary $r \in V$ as root
  - $T_{v}$ : subtree of $T$ rooted at $v$
    - 👨‍🏫 but this alone doesn't solve it directly
  - $is_{0}[v]$ : size of the largest independent set $I \subseteq T_{v}$
    - s.t. $v \notin I$
  - $is_{1}[v]$ : size of the largest independent set $I \subseteq T_{v}$
    - s.t. $v \in I$
  - output: $\max(is_{0}[r], is_{1}[r])$
    - but how can we compute each case?
- for a leaf $l$ :
  - $is_{0}[l] = 0$
  - $is_{1}[l] = 1$
- for a non-leaf $v$ with children $c_{i}$:
  - $is_{0}[v] = \sum_{i} \max(is_{0}[c_{i}], is_{1}[c_{i}])$ (each child is free to be in $I$ or not)
  - $is_{1}[v] = 1 + \sum_{i} is_{0}[c_{i}]$ (no child can be in $I$)
- strategy: compute all leaves first
  - once all of a node's children are computed
    - compute the node (i.e. bottom-up)
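A minimal Python sketch of the $is_{0}$ / $is_{1}$ recurrences, assuming the tree is given as an edge list plus a chosen `root` (the function name `max_independent_set_tree` is hypothetical). Processing vertices in reverse discovery order guarantees every child is finished before its parent, which is the "compute leaves first" strategy above.

```python
from collections import defaultdict


def max_independent_set_tree(edges, root):
    """Size of a maximum independent set of a tree, using is0[v] and is1[v]."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Iterative DFS: parent pointers and an order with parents before children.
    parent = {root: None}
    order = [root]
    stack = [root]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                order.append(w)
                stack.append(w)

    is0, is1 = {}, {}
    for v in reversed(order):  # every child is finished before its parent
        children = [c for c in adj[v] if c != parent[v]]
        # v not in I: each child may or may not be in I (empty sum = 0 for a leaf)
        is0[v] = sum(max(is0[c], is1[c]) for c in children)
        # v in I: no child may be in I
        is1[v] = 1 + sum(is0[c] for c in children)
    return max(is0[root], is1[root])
```

On the path 1-2-3-4-5 rooted at 3, this returns 3 (the set $\{1, 3, 5\}$); on a star with three leaves it returns 3 (the leaves).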
Problem 3: (Huffman) Coding
- given: a file consisting of words $a_{1}, a_{2}, \dots, a_{n}$
- a coding is a map $c : a_{i} \mapsto c_{i}$, where each $c_{i}$ is a binary string
- file: "abcaaabcaaa"
  - a: repeated many times
  - let's say, we map:
    - $a \mapsto 0$
    - $b \mapsto 1$
    - $c \mapsto 01$
- but this code cannot be decoded uniquely
  - e.g. 0101 can be
    - abab
    - abc
    - cab
    - cc
- we want: a prefix-free code
  - no $c_{i}$ is a prefix of any $c_{j}$ ($j \neq i$)
  - e.g.
    - $a \mapsto 10$
    - $b \mapsto 01$
    - $c \mapsto 00$
  - 100100: only maps to abc
- or, we can compress it better
  - $a \mapsto 0$
  - $b \mapsto 10$
  - $c \mapsto 11$
  - more compact, since the frequent word a gets a 1-bit code
- what is the most cost-effective code?
  - how can we find it?
  - 👨‍🏫 codes can be represented as a tree
- example tree (for $a \mapsto 0$, $b \mapsto 10$, $c \mapsto 11$)

graph TD
    1((ε))
    2((0))
    3(( ))
    4((10))
    5((11))
    1--0-->2
    1--1-->3
    3--0-->4
    3--1-->5
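The decoding examples above can be checked by brute force. A minimal Python sketch (the helper names `all_decodings`, `is_prefix_free`, and `encode` are hypothetical) that lists every way to split a bit string into codewords, assuming every codeword is non-empty: the first code gives four decodings of 0101, while the prefix-free codes give exactly one.

```python
def all_decodings(bits, code):
    """Every way to split `bits` into codewords; `code` maps word -> bit string.

    Assumes all codewords are non-empty (otherwise the recursion never ends).
    """
    if bits == "":
        return [""]
    results = []
    for word, cw in code.items():
        if bits.startswith(cw):
            for rest in all_decodings(bits[len(cw):], code):
                results.append(word + rest)
    return results


def is_prefix_free(code):
    """True if no word's codeword is a prefix of a different word's codeword."""
    items = list(code.items())
    return not any(
        a != b and cb.startswith(ca) for a, ca in items for b, cb in items
    )


def encode(text, code):
    """Concatenate the codewords of the words (single characters here) in `text`."""
    return "".join(code[ch] for ch in text)
```

For instance, with `{"a": "0", "b": "1", "c": "01"}` the call `all_decodings("0101", ...)` yields abab, abc, cab and cc. With the compact code, "abcaaabcaaa" (7 a's, 2 b's, 2 c's) encodes to 15 bits, versus 22 bits with the 2-bit code $a \mapsto 10, b \mapsto 01, c \mapsto 00$.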

as we want to minimize the total length of the encoded file
- what should we do? decide based on frequency: frequent words should get shorter codes
- 👨‍🏫 Huffman's algorithm!
Huffman's algorithm
- suppose: word $a_{i}$ appears $w_{i}$ times
- sort words based on $w_{i}$
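  - merge the two least frequent words $a_{i}, a_{j}$ into one new word of weight $w_{i} + w_{j}$ (they become siblings in the tree)
  - create a Huffman tree on the resulting $n - 1$ words
  - i.e. repeat recursively until only one word (the root) remains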
example process
- $a : 12, b : 2, c : 5, d : 1, e : 1$
- $a : 12, b : 2, c : 5, d e : 2$
- $a : 12, c : 5, b d e : 4$
- $a : 12, c b d e : 9$
- now: at each merge, assign 0 to one child and 1 to the other
- finally

graph TD
      0(( ))
      1((a))
      2(( ))
      3((c))
      4(( ))
      5((b))
      6(( ))
      7((d))
      8((e))
      0--0-->1
      0--1-->2
      2--0-->3
      2--1-->4
      4--0-->5
      4--1-->6
      6--0-->7
      6--1-->8
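A minimal Python sketch of Huffman's algorithm, using a binary heap (`heapq`) to find the two least frequent words at each step instead of re-sorting; the function name `huffman_code` is hypothetical and at least two words are assumed. On the example above it produces the same code lengths as the final tree (a: 1, c: 2, b: 3, d and e: 4), though which child gets 0 and which gets 1 may differ.

```python
import heapq
from itertools import count


def huffman_code(weights):
    """Map each word to a bit string by repeatedly merging the two lightest words.

    weights: dict word -> frequency, with at least two words.
    """
    tiebreak = count()  # keeps heap entries comparable when weights tie
    # Heap entries: (weight, tiebreak, subtree); a subtree is a word or a (left, right) pair.
    heap = [(w, next(tiebreak), word) for word, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # least frequent
        w2, _, t2 = heapq.heappop(heap)  # second least frequent
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (t1, t2)))
    _, _, tree = heap[0]

    codes = {}

    def assign(node, prefix):
        if isinstance(node, tuple):  # internal node: label children 0 and 1
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:  # leaf: a word
            codes[node] = prefix

    assign(tree, "")
    return codes
```

With `{"a": 12, "b": 2, "c": 5, "d": 1, "e": 1}` the total encoded length $\sum_{i} w_{i} |c_{i}|$ is $12 \cdot 1 + 5 \cdot 2 + 2 \cdot 3 + 1 \cdot 4 + 1 \cdot 4 = 36$. The counter only breaks ties between equal weights; any tie-breaking rule still yields an optimal code, possibly with a different tree shape.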

proof of correctness: is this optimal?
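- suppose we are given an optimal code (tree)
- the least frequent words: we want them as far down as possible
- lemma: take a leaf farthest from the root; it has a sibling, which is also a leaf
  - or else: i.e. the farthest leaf's parent has only one child
    - then we can simply replace the parent with the leaf, shortening its code, which contradicts optimality
  - the sibling can't be an internal node either, since its leaves would be even farther from the root
  - so we can put the two least frequent words at these two sibling leaves (moving a less frequent word deeper never increases the cost)
  - and that is enough: the two siblings get codes ending in 0 and 1
    - path to the siblings' parent
    - +1 bit (0 or 1) for each child
  - actually:
    - induction on no. of words
    - base case: 2 words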
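The "path to the parent, +1 for each child" step can be written as a cost identity. A short derivation, assuming the notation $\mathrm{cost}(T) = \sum_{i} w_{i} \, |c_{i}|$ for the total encoded length of a code tree $T$ (this notation is introduced here, not in the lecture): let $x, y$ be sibling leaves, and let $T'$ be $T$ with $x, y$ merged into a single leaf $z$ of weight $w_{z} = w_{x} + w_{y}$ and code $c'_{z}$, so that $|c_{x}| = |c_{y}| = |c'_{z}| + 1$.

$$
\begin{aligned}
\mathrm{cost}(T) &= \mathrm{cost}(T') - w_{z} \, |c'_{z}| + (w_{x} + w_{y}) \, (|c'_{z}| + 1) \\
&= \mathrm{cost}(T') + w_{x} + w_{y}
\end{aligned}
$$

Since $w_{x} + w_{y}$ does not depend on the shape of $T'$, given the lemma, minimizing $\mathrm{cost}(T)$ reduces to minimizing $\mathrm{cost}(T')$ on $n - 1$ words, which is the inductive step. On the example above, the tree with $d, e$ merged ($a : 12, b : 2, c : 5, de : 2$) costs $12 \cdot 1 + 5 \cdot 2 + 2 \cdot 3 + 2 \cdot 3 = 34$, and $34 + 1 + 1 = 36 = 12 \cdot 1 + 5 \cdot 2 + 2 \cdot 3 + 1 \cdot 4 + 1 \cdot 4$.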
❓ can we parallelize the unpacking (decoding) process?
- 👨‍🎓 simply: put separators every $x$ bytes, so each chunk can be decoded independently

COMP 2711H: Honors Discrete Mathematical Tools for Computer Science