COMP 2711H: Lecture 16

Date: 2024-10-07 18:00:56

Reviewed:

Topic / Chapter:

summary

❓Questions

Notes

Extra Problems
  • Problems

    • following problems: easy on trees, hard on general graphs
    • explanation: in COMP 3721
      • 👨‍🏫 for now, trust me :p
  • Problem 1: Longest path

    • input: graph $G$
    • output: longest path in $G$
    • lemma: given a tree $T$, the longest path in $T$
      • = a $u$-$v$ path where $u, v$ are leaves
      • if an endpoint is not a leaf, it must have another neighbor that is not on the path
        • as a tree can't have a cycle
        • then, we can extend the path to that neighbor (and onward to a leaf)
    • algorithm:
      • pick any non-leaf vertex as root
      • turn tree into a rooted tree
      • we can: do case work on highest point
        • 👨‍🎓 or: lowest common ancestor
    • assign depth to all nodes
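      • depth[u]: maximum number of edges one can go down from $u$ until reaching a leaf (so a leaf has depth 0)
      • = dynamic programming in COMP 3711
      • but it's just another way of computing for now
      • $L(u)$: length of the longest path whose highest vertex is $u$
      • longest path w/ $u$ as the highest point, over children $v$ of $u$: either
        • $1 + \max \text{depth}[v]$ (path goes down into one child)
        • $2 + \max \text{depth}[v] + \text{2nd-max depth}[v]$ (path goes down into two different children)
      • answer: $\max_u L(u)$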
    • 👨‍🏫 "try" to solve it on a general graph
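The depth-based computation above can be sketched in Python. This is a minimal sketch under assumptions that are not in the notes: vertices are numbered `0..n-1`, the tree is given as an edge list, path length is counted in edges, and the names `longest_path_in_tree`, `down1`, and `down2` are made up for illustration. It works for any root, so picking a non-leaf root is not required here.

```python
def longest_path_in_tree(n, edges, root=0):
    """Length (in edges) of the longest path in a tree on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Root the tree: iterative DFS, parents appear before their children.
    parent = [-1] * n
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v != parent[u]:
                parent[v] = u
                stack.append(v)

    down1 = [0] * n  # depth[u]: longest downward path from u
    down2 = [0] * n  # second longest, through a different child
    best = 0
    # Reversed order visits every child before its parent (leaves first).
    for u in reversed(order):
        # Longest path whose highest vertex is u.
        best = max(best, down1[u] + down2[u])
        p = parent[u]
        if p != -1:
            cand = down1[u] + 1
            if cand > down1[p]:
                down1[p], down2[p] = cand, down1[p]
            elif cand > down2[p]:
                down2[p] = cand
    return best
```

For example, `longest_path_in_tree(6, [(0, 1), (1, 2), (1, 3), (3, 4), (0, 5)])` considers every vertex as the highest point and takes the best; here the winning path is 5-0-1-3-4.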
  • Problem 2: Maximum independent set

    • set $S \subseteq V$
      • s.t. $\forall u, v \in S$: $(u, v) \notin E$
      • i.e. a set of vertices without any edge connecting them
    • input: graph $G = (V, E)$
    • output: largest possible size of an independent set
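    • if input is a tree $T$, algorithm:
      • pick an arbitrary vertex $r$ as root
      • $T_u$: subtree of $T$ rooted at $u$
        • 👨‍🏫 but the largest independent set of $T_u$ alone doesn't help solving it directly
      • $\text{in}(u)$: size of the largest independent set $S$ of $T_u$
        • s.t. $u \in S$
      • $\text{out}(u)$: size of the largest independent set $S$ of $T_u$
        • s.t. $u \notin S$
      • output: $\max(\text{in}(r), \text{out}(r))$
        • but, how can we compute each case?
    • for a leaf $u$: $\text{in}(u) = 1$, $\text{out}(u) = 0$
    • for a non-leaf $u$ with children $v$: $\text{in}(u) = 1 + \sum_v \text{out}(v)$, $\text{out}(u) = \sum_v \max(\text{in}(v), \text{out}(v))$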
    • strategy: compute for all leaves first
      • once all of a node's children are computed
        • then compute the node
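The leaves-first strategy can be sketched as follows. This is only an illustrative sketch: the vertex numbering `0..n-1`, the edge-list input, and the names `max_independent_set_size`, `take` (root of the subtree is in the set), and `skip` (root of the subtree is not in the set) are assumptions, not notation from the lecture.

```python
def max_independent_set_size(n, edges, root=0):
    """Size of a largest independent set in a tree on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Root the tree: iterative DFS, parents appear before their children.
    parent = [-1] * n
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v != parent[u]:
                parent[v] = u
                stack.append(v)

    take = [1] * n  # best set in the subtree of u that contains u
    skip = [0] * n  # best set in the subtree of u that avoids u
    # Reversed order finishes all children of a node before the node itself.
    for u in reversed(order):
        p = parent[u]
        if p != -1:
            take[p] += skip[u]                 # p chosen: child u must be left out
            skip[p] += max(take[u], skip[u])   # p left out: child u is free
    return max(take[root], skip[root])
```

Leaves keep their initial values (`take = 1`, `skip = 0`), and each non-leaf is complete once all of its children have pushed their contributions up, which mirrors the order described above.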
  • Problem 3: (Huffman) Coding

    • exists: a file consisting of words
    • a coding is: a map from each word to a binary string
    • file: "abcaaabcaaa"
      • a: being repeated multiple times
      • let's say, we map: a → 0, b → 1, c → 01
    • but at the current stage: it cannot be recovered uniquely
      • e.g. 0101 can be
        • abab
        • abc
        • cab
        • cc
    • we want: prefix-free code
      • no codeword is a prefix of the codeword of a different word
      • e.g. a → 10, b → 01, c → 00
      • 100100: only maps to abc
    • or, we can compress it better
      • e.g. a → 0, with b and c taking the 2-bit codes 10 and 11
      • more compact, thanks to 1-bit a
    • what is the most cost-effective code?
      • how can we find it?
      • 👨‍🏫 codes can be represented as a tree
    • example tree
graph TD
    1((ε))
    2((0))
    3(( ))
    4((10))
    5((11))
    1--0-->2
    1--1-->3
    3--0-->4
    3--1-->5
  • as we want to minimize the total encoded length
    • what should we do? choose code lengths based on frequency
    • 👨‍🏫 Huffman's algorithm!
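To make the ambiguity and the length comparison concrete, here is a small Python sketch. The three code tables are assumptions reconstructed from the examples in these notes (0101 having four readings, 100100 decoding only to abc, and a 1-bit code for a); the function names are hypothetical.

```python
def is_prefix_free(code):
    """True if no codeword is a prefix of another word's codeword."""
    words = list(code.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )

def count_decodings(bits, code):
    """Number of ways to split `bits` into codewords of `code`."""
    ways = [0] * (len(bits) + 1)
    ways[0] = 1  # the empty string has exactly one (empty) decoding
    for i in range(1, len(bits) + 1):
        for cw in code.values():
            if i >= len(cw) and bits[i - len(cw):i] == cw:
                ways[i] += ways[i - len(cw)]
    return ways[-1]

def encode(text, code):
    return "".join(code[ch] for ch in text)

def decode(bits, code):
    """Left-to-right decoding; only valid when `code` is prefix-free."""
    inverse = {cw: w for w, cw in code.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:  # a complete codeword has been read
            out.append(inverse[cur])
            cur = ""
    if cur:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

# Assumed code tables, reconstructed from the examples in the notes.
ambiguous = {"a": "0", "b": "1", "c": "01"}
fixed = {"a": "10", "b": "01", "c": "00"}
short = {"a": "0", "b": "10", "c": "11"}
text = "abcaaabcaaa"
```

With these tables, `count_decodings("0101", ambiguous)` counts the readings abab, abc, cab, and cc, while `len(encode(text, fixed))` and `len(encode(text, short))` show how much the 1-bit code for a saves on this file.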
  • Huffman's algorithm
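    • suppose: word $w_i$ appears $f_i$ times
    • sort words based on $f_i$
      • merge the two least frequent words (as siblings in a tree), treating them as one word with the sum of their frequencies
      • create a Huffman tree on the remaining $n - 1$ words
      • then repeat recursively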
  • example process
    • now: assign 0 to one, and 1 to another
    • finally
graph TD
      0(( ))
      1((a))
      2(( ))
      3((c))
      4(( ))
      5((b))
      6(( ))
      7((d))
      8((e))
      0--0-->1
      0--1-->2
      2--0-->3
      2--1-->4
      4--0-->5
      4--1-->6
      6--0-->7
      6--1-->8
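The merge-and-repeat procedure can be sketched with a priority queue. This is a minimal sketch, not the lecture's own code: it assumes words are strings, uses an iterative loop in place of the recursion, and the frequencies in the usage note are hypothetical values chosen so that the resulting tree has the same shape as the one drawn above.

```python
import heapq
from itertools import count

def huffman_code(freq):
    """Map each word to a bitstring by repeatedly merging the two least frequent subtrees.

    `freq` maps words (assumed to be strings) to positive frequencies.
    """
    if len(freq) == 1:
        return {w: "0" for w in freq}
    tie = count()  # tie-breaker so the heap never has to compare subtrees
    heap = [(f, next(tie), w) for w, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # least frequent
        f2, _, t2 = heapq.heappop(heap)  # second least frequent
        # The merged pair acts as a single word with the summed frequency.
        heapq.heappush(heap, (f1 + f2, next(tie), (t1, t2)))
    _, _, root = heap[0]

    # Walk the tree: 0 for one child, 1 for the other.
    codes = {}
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            left, right = node
            stack.append((left, prefix + "0"))
            stack.append((right, prefix + "1"))
        else:
            codes[node] = prefix
    return codes
```

With the hypothetical frequencies `{"a": 16, "c": 8, "b": 4, "d": 2, "e": 1}`, the merges are d+e, then b, then c, then a, so the code lengths come out as 1, 2, 3, 4, 4 for a, c, b, d, e. Which sibling gets 0 and which gets 1 is arbitrary, so the exact bits may differ from the drawing.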
  • proof of correctness: is this optimal?
    • let's say, we are given an optimal coding
    • the least frequent words: we want them to be as far down as possible
    • lemma: take the two leaves that are farthest from the root
      • show that two such nodes must be siblings
      • or else: i.e. the farthest leaf's parent has only one child
        • then we can simply replace the parent with the leaf node (shortening its code)
      • and that is enough: the optimal way to code the siblings is 0, 1
        • path to the least-frequent siblings' parent
        • +1 (for child)
      • actually:
        • induction on no. of words
        • base case: 2 words
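The "+1 (for child)" step can be written out as a cost identity, which is what the induction uses. The notation here is an assumption for illustration: $f_w$ is the frequency of word $w$, $d_T(w)$ is the depth of its leaf in code tree $T$, and $T'$ is the tree obtained from $T$ by deleting sibling leaves $x, y$ and treating their parent $z$ as a leaf with frequency $f_x + f_y$.

```latex
\begin{align*}
\mathrm{cost}(T) &= \sum_{w} f_w \, d_T(w) \\
d_T(x) &= d_T(y) = d_{T'}(z) + 1 \\
\mathrm{cost}(T) &= \mathrm{cost}(T') - (f_x + f_y)\, d_{T'}(z) + (f_x + f_y)\,\bigl(d_{T'}(z) + 1\bigr) \\
                 &= \mathrm{cost}(T') + f_x + f_y
\end{align*}
```

Since $f_x + f_y$ does not depend on the shape of the tree, minimizing $\mathrm{cost}(T)$ over trees in which $x, y$ are siblings is the same as minimizing $\mathrm{cost}(T')$ on one fewer word, which is the inductive step.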
  • ❓ can we parallelize the unpacking process?
    • 👨‍🎓 simply: putting separators every $k$ bytes