Homework 3

Skeleton Code

Some skeleton code has been provided for you in the hw3 folder and has been pushed to the Github repository resources. If you already have this repository locally cloned, just perform a git pull. Otherwise you’ll need to clone it.

Written Portion

Problem 1 - Heap Tracing (8%)

You will answer several questions about Heaps in this problem.

For this problem, you must draw this by hand or using some form of drawing app and submit the results as a PDF. Name your file q1.pdf.

Part a (3%)

Perform the buildHeap (aka makeHeap) algorithm on the following array to create a min-Heap from the arbitrary array shown below. Show the state of the array as a binary tree after each iteration (call to heapify()) of the algorithm. (If that does not make sense review the lecture materials to review the buildHeap algorithm.)

Part b (5%)

Draw the tree representation of the following binary Min Heap in its initial configuration, and after each operation. Make sure to clearly indicate each of your final answers.

Problem 2 - A* Algorithm Tracing (7%)

A-Star

You are given the above unweighted graph, and want to find the shortest path from node L to node A, using A* Search. Your algorithm has the following properties:

Place your answer in a file q2.txt which lists the order that nodes are explored and discovered (start from L) showing the g, h, and f values for each node as it is discovered. Use the format shown below (where the values are fictitious and intended for demonstrating the format):

- explore L
  - discover H (g=1, h=9, f=10)
  - discover P (g=1, h=7, f=8)
- explore P
...

Programming Portion

Problem 3 - Linked List Recursion Coding (20%)

Write a program that reads two lists of integers specified in a text file with integers separated by spaces on two lines (and only two lines) of the file into two singly linked lists. Then use recursion to:

  1. Recursively remove consecutive integers that are equal from the first list (only keeping one)
  2. Makes a new list (i.e. copying the items in the list) such that the new list contains the concatenation of the first and second list using recursion

Thus if the input file was:

1 3 3 3 5 9 7 7
2 4 4 4

you would output:

1 3 5 9 7 2 4 4 4

Each of the two operations above should be implemented as separate recursive functions (you can feel free to define helper function if you want a different signature) that should be called in succession as in:

removeDuplicates(head1);
head3 = concatenateLists(head1, head2);

These recursive functions should not contain any loops. If you do use loops in these functions you will receive a 0.

Using the Item definition and prototypes in the provided rem_dup_lib.h from the resources/hw3 folder.

Write the definitions for the two functions in a file rem_dup_lib.cpp. You are welcome to define recursive helper functions if you like in this file. Again, no loops are allowed in these function nor any helpers you define.

Finally, complete the test program (in rem_dup.cpp) that will read in the integers from a file into the two lists, call removeDuplicates on the 1st list and the call concatenate on the resulting 1st and original 2nd list. We have completed the main() which will make the appropriate sequence of calls. You only need to complete the function to read in the 2 input lists from a file: readLists. You may use loops for reading in the numbers. Do NOT define any classes or use STL structures. Note: readLists will need to produce 2 outputs (head1 for the first list and head2 for the second)…and you can’t return 2 values from a function in C++. Thus you’d need to pass in the pointers by reference as either:

void readLists(const char* filename, Item*& head1, Item*& head2);

or

void readLists(const char* filename, Item** head1, Item** head2);

Feel free to add other arguments to these functions if you need; we’re just showing how you’d have to pass the head pointer.

We will test rem_dup_lib.cpp/.h separately but please use the rem_dup.cpp program to test your code. The usage for rem_dup.cpp (after you compile to a rem_dup executable) is:

$ ./rem_dup input.txt output.txt

Your input should be read from the filename given as the 1st command line argument. Your output show be stored in the filename given as the 2nd command line argument.

There will be no formatting errors in the input file, though each of the two lines may be blank (have no numbers) which indicates an empty list.

Memory should not be leaked when you remove consecutive equal items and all items should be deleted at the end of the program.

Ensure your Makefile must have a rule rem_dup to compile this program and that rule should also be listed under your all rule.

Problem 4 - Templated Stack Class (10%)

Implement a templated Stack class, Stack<T>.

It must:

Problem 5 - Dirty Laundry (10%)

Consider a gym with gray and black towels which patrons discard into a can and employees come and collect some number of towels to wash from the top of the pile. Given a file whose contents record the sequence in which towels are discarded along with when and how many towels the employee picks up to wash, please output the sequence of towels the employee picks up to wash each time he visits the can (Note: towel discards and employee pick ups can come in any order).

The input file consists of integers separated by spaces. 0 = black towel, -1 = gray towel, and any positive number bigger than 0 represents the employee collecting that many towels from the top of the pile. If there are less towels than the employee tries to pick up, he will just get the available towels.

Sample file contents (e.g. laundry.in). There will be no format errors in this file so don’t worry about that.

0  0  0  -1  2  -1  3

Output file contents (e.g. laundry.out). There should be one line per employee pickup of towels.

gray black 
gray black black

We will run your program as:

./laundry laundry.in laundry.out

and examine the contents of laundry.out to check your work.

You must use your Stack<T> class to help solve this problem.

Your solution should run in O(n) where n is the number of integers in the input file. Failure to meet this requirement will result in a loss of half of the available points of this problem.

Ensure your Makefile has a rule laundry to compile this program and that rule should also be listed under your all rule.

Problem 6 - Comparator Functor Background (0%)

The following is background info that will help you understand how to do the next step.

If you saw the following:

 int x = f();

You’d think f is a function. But with the magic of operator overloading, we can make f an object and f() a member function call to operator() of the instance, f as shown in the following code:

  struct RandObjGen {
    int operator() { return rand(); }
  };
  
  RandObjGen f;
  int r = f(); // translates to f.operator() which returns a random number by calling rand()

An object that overloads the operator() is called a functor and they are widely used in C++ STL to provide a kind of polymorphism.

We will use functors to make a sort algorithm be able to use different sorting criteria (e.g., if we are sorting strings, we could sort either lexicographically/alphabetically or by length of string). To do so, we supply a functor object that implements the different comparison approach.

  struct AlphaStrComp {
    bool operator()(const string& lhs, const string& rhs) 
	{ // Uses string's built in operator< 
	  // to do lexicographic (alphabetical) comparison
	  return lhs < rhs; 
	}
  };

  struct LengthStrComp {
    bool operator()(const string& lhs, const string& rhs) 
	{ // Uses string's built in operator< 
	  // to do lexicographic (alphabetical) comparison
	  return lhs.size() < rhs.size(); 
	}
  };
  
  string s1 = "Blue";
  string s2 = "Red";
  AlphaStrComp comp1;
  LengthStrComp comp2;
  
  cout << "Blue compared to Red using AlphaStrComp yields " << comp1(s1, s2) << endl;
  // notice comp1(s1,s2) is calling comp1.operator() (s1, s2);
  cout << "Blue compared to Red using LenStrComp yields " << comp2(s1, s2) << endl;
  // notice comp2(s1,s2) is calling comp2.operator() (s1, s2);

This would yield the output

1  // Because "Blue" is alphabetically less than "Red"
0  // Because the length of "Blue" is 4 which is NOT less than the length of "Red (i.e. 3)

We can now make a templated function (not class, just a templated function) that lets the user pass in which kind of comparator object they would like:

template <class Comparator>
void DoStringCompare(const string& s1, const string& s2, Comparator comp)
{
  cout << comp(s1, s2) << endl;  // calls comp.operator()(s1,s2);
}

  string s1 = "Blue";
  string s2 = "Red";
  AlphaStrComp comp1;
  LengthStrComp comp2;

  // Uses alphabetic comparison
  DoStringCompare(s1,s2,comp1);
  // Use string length comparison
  DoStringCompare(s1,s2,comp2);

In this way, you could define a new type of comparison in the future, make a functor struct for it, and pass it in to the DoStringCompare function and the DoStringCompare function never needs to change.

These comparator objects are use by the C++ STL map and set class to compare keys to ensure no duplicates are entered.

template < class T,                        // set::key_type/value_type
           class Compare = less<T>,        // set::key_compare/value_compare
           class Alloc = allocator<T>      // set::allocator_type
           > class set;

You could pass your own type of Comparator object to the class, but it defaults to C++’s standard less-than functor less<T> which is simply defined as:

template <class T> 
struct less 
{
  bool operator() (const T& x, const T& y) const {return x<y;}
};

For more reading on functors, search the web or try this link

Problem 7 - m-ary Heaps (25%)

Now that you have learned the basics of heaps, build your own m-ary Heap class whose public definition and skeleton code are provided in the heap.h skeleton file. Rather than specifying a specific type of heap (Min- or Max-Heap) we will pass in a Comparator object so that if the comparator object functor implements a less-than check of two items in the heap then we will have a min-heap. If the Comparator object functor implements a greater-than check of two items in the heap, we will have a max-heap.

template <typename T, typename Comparator = std::less<T> >
class Heap
{
 public:
   /// Constructs an m-ary heap for any m >= 2
   Heap(int m = 2, Comparator c = Comparator() );

   // other stuff
   
};

You may use the STL vector<T> container if you wish. Of course, you MUST NOT use the STL priority_queue class or make_heap, push_heap, etc. algorithms.

Notice we can provide a default template type for the Comparator so that the client doesn’t have to specify it if they are happy with the std::less (i.e. which assumes a built-in operator< is available for type T and calls it).

Notice the constructor also provides default value for m and the Comparator which is a default-constructed Comparator. Default parameter values will be used if the client does not supply them. Also if a default parameter is used then all parameters after it must also have default values (i.e. default parameters must be at the end of the declaration). The last thing to note is you only supply the default value in the declaration of the function. When you define the function (i.e. write the implementation below) you would not specify the default value again but just leave the argument types/names. For example, when implementing the function you’d just define:

template <typename T, typename Comparator>
Heap<T,Comparator>::Heap(int m, Comparator c /* Don't specify the default value here, only above */ ) 
// Your code here

So with these defaults in place, the client that wants a binary min-heap with a type that has a less-than operator need only write:

Heap<string> h1;  // calls the default constructor with default args:    
// m=2 and std::less<string> is used as the default template type for Comparator
// and std::less<string>() (i.e. the default constructor) will be the
// comparator object created by the constructor

If the client wants some custom method of comparison so that they can implement a max-heap or some other alternative using a custom comparator can write:

ObjAComparator c1( /* some constructor arguments as desired */ );
Heap<ObjA, ObjAComparator> h1(2, c1);

Before using our tests, we recommend you write a simple test program of your own to check some simple cases that you come up with. You could push a few integers in random order and then pop them all out and ensure they are in order. You may want to try heaps of different types like int and/or string. Also, be sure it works as a min-heap and as a max-heap using different comparators. For reference if your type T has a < operator, then C++ defines a less functor which will compare the type T items using the operator<(). Similar there is a greater functor already defined by C++ that will compare using the operator>(). They are defined in the functional header (#include <functional>) and you can look up their documentation online for further information. This is meant to save you time writing functors for types that can easily be compared using built in comparison operators.

Once you feel like your heap seems to be on the right track, you can use a subset of basic tests that we have provided (hw3_heap_checker folder in resources) to check your heap. They are not necessarily exhaustive and we may run more complete tests for grading but this should help you. Just copy them to your hw3 folder, cd hw3_heap_checker, and run the normal cmake ., make and run the tests.

Note: You may use this Heap object in future homeworks, so test it well now!

Problem 8 - Huffman Codes (20%)

Background

One potential application for a priority queue is Huffman coding. Huffman coding is a form of lossless compression (no information is lost / the operation is invertible to the original). Huffman coding is a key component in other compression algorithms such as the JPG and MPG image and video compression algorithms.

Huffman coding is a variable-length coding scheme that determines the frequency of occurrence of each symbol in the input data (how often each symbol occurs). It then assigns the symbols that appear the most often to the shortest codes (and those that occur least often with the longest codes). How those variable-length codes are generated is the crux of the algorithm and is explained below. But once generated, we simply scan through the input replacing each occurence of a symbol with its corresponding code which generally reduces the storage required. Note: we need to also store the dictionary of keys and codes to that we can invert the operation (decompress) to recover the original.

Algorithm

There are many websites that will give an overview and even actual C++ code. You may review those for the overview of the algorithm but you are NOT ALLOWED to copy code from any online source. Instead, use them for understanding the general approach.

Below is the pseudocode for the compression algorithm that we require you to implement:

A step by step visualization of how the Huffman tree is built is shown below.

Most online references construct a tree using pointers, etc. We will not do that (and if you do so, it will be an indicator that you inappropriately used online code). Instead, we will take a potentially less efficient approach and use nodes that store the collection of all keys that would be in the leaves of the subtree rooted at that node, and we will build up the code strings as we join two nodes (sub-trees).

Example 1

Here is how the Huffman codes would be generated using the string mississippi.

Note: In these images we are not explicitly showing the heap but the construction of the “Huffman” tree. To understand what is happening the heap, we color code the items below. The two yellow nodes are the nodes popped from the heap in that iteration, and the green nodes are what remains at the end of that iteration.

drawing

Example 2

Here is how the Huffman codes would be generated using the string i boo big bruins. drawing

Using the Huffman Codes

Returning to our example of the string mississippi we could count the frequency of each symbol to obtain:

A valid Huffman coding would be:

The resulting compressed string would be: 100011110111101011010

Decompression

To decompress we simply scan one character at a time from the compressed data, appending each character to a code word (string) and as soon as the code word matches any code in the key/code dictionary we can substitute the original character.

Given the compressed string for mississippi and the codes we found above the decompression process would be:

I/O Routines

We can compress any data (binary/ASCII/etc.) we wish but must determine the set of possible “symbols” or the size of each symbol (e.g. bytes [0-255]) in the input. For this assignment, we will use strings that correspond to the hexadecimal values of UTF-16 (2-byte) characters. UTF-16 is a character coding scheme that represents each possible character as a 16-bit (2-bytes = 4 hex characters) value (i.e. 0000 - ffff in hex). It is a superset of ASCII (i.e. 16-bit codes 0000-007f represent the same characters as ASCII 00-7f hex). The input files for your program will be the ASCII representation (so you can easily use std::strings) of the hexadecimal for these files.

For example, given a file (mississippi.txt) that contains the characters: mississippi, we could save that as a UTF-16 formatted file where each character in the input text would be represented as a 2-byte code (e.g. m is ASCII 6d hex and, thus, in UTF-16 would be 006d hex) and then convert each of those 2 bytes to its corresponding 4-hex digits. For your convenience in reading and making sense of these files, we will then store those 4 hex digits as ASCII strings and save them to a text file so you can use normal file I/O routines that work with ASCII.

Thus mississippi will be coded in an input file (mississippi.aschex) as:

006d 0069 0073 0073 0069 0073 0073 0069
0070 0070 0069 

We have provide a Python script that takes normal ASCII text files and converts to this ASCII-UTF16-hex format for you. You can run it as follows:

$ python3 conv-to-utf16-aschex.py mississippi.txt mississippi.aschex

mississippi.txt can be replaced with any normal ASCII text file and the resulting output will be written to the filename provided as the last command line argument.

The Code Base

We provide a COMPLETED class with static methods to read and write these ASCII hex (of UTF-16 codes) files:

class AsciiHexIO {
public:
    static RawTextVector read(const char* filename);
    static void write(const char* filename, const RawTextVector&  text);

};

These operations are the inverse of each other (i.e. calling read on an .aschex file and then passing the returned vector to write will result in a new file that is a copy of the original)

Next, we have a partially completed class that provides routines to read and write compressed files.

typedef std::vector<std::string> RawTextVector;
typedef std::string CompressedData;
typedef std::map<std::string, std::string> CodeKeyMap;

class AsciiHuffmanIo {
public:
    static void writeCodedText(const char* filename, const CompressedData& inText, const CodeKeyMap& code);
    static void readCodedText(const char* filename, CompressedData& outText, CodeKeyMap& code);
};

Finally, a class HuffmanCoder is started for you to take the various data structures and compress or decompress the data.

class HuffmanCoder {
public:
    typedef std::map<std::string, size_t> KeyFrequencyMap;
    typedef std::map<std::string, std::string> KeyCodeMap;

    static void compress(const RawTextVector& inText, CompressedData& outText, CodeKeyMap& codes);
    static void decompress(const CompressedData& inText, const CodeKeyMap& codes, RawTextVector& outText);
    static double ratio(const RawTextVector& inText, const CompressedData& outText, const CodeKeyMap& codes)
;
};

Note: ratio is a function that computes the compression ratio (how much space was saved) and displays that info.

Your Task

  1. Complete the AsciiHuffmanIO::readCodedText function to read in compressed files and return the string of compressed data and the code/key map.
  2. Write the HuffmanCoder::compress function
    • Use the pseudocode at the top of this problem statement to guide your implementation. You MUST USE your heap implementation (heap.h) for the priority queue.
    • You must maintain appropriate data structures (i.e. maps) to avoid expensive linear searches when replacing the input data symbols with the Huffman codes as well as avoiding expensive linear search when searching for the code of a symbol as you build up the Huffman codes.
  3. Write the HuffmanCoder::decompress function
    • Use the decompression process described above to develop your algorithm and code for converting the compressed text back to the original sequence of symbols

Testing

We have provided a COMPLETE test program in huffman-test.cpp. This program is a utility that will use the various I/O and HuffmanCoder functions to either compress or decompress files. You can use it one of two ways:

To compress a file, you could run:

$ ./huffman-test c samples/mississippi.aschex samples/mississippi.huf

The first argument c indicates the program will compress a file. It then takes the ASCII coded UTF-16 hex file in the samples folder: samples/mississippi.aschex and compress it, saving the compressed file to samples/mississippi.buf (or whatever path/filename you provide).

To decompress a file, you could run:

$ ./huffman-test d samples/mississippi.huf samples/mississippi.dec

Here, d indicates the program will decompress a file. It then takes a file that shold be the output of a compression operation (similar to the example above) and decompresses it to a new file given by the last command line argument.

After decompression, the resulting file (samples/mississippi.dec) should be a match for the original input (samples/mississippi.aschex).

You could use the diff utility on Linux/Unix/MacOS to compare the files:

$ diff samples/mississippi.aschex samples/mississippi.dec

If the two files have differences, the diff utility will display them. If they are the same, no output will be generated (a good thing).

You can use this flow of compressing, decompressing, and then using diff to ensure your program works. We have provided a few files for you to test with and you can create others by using the conv-to-utf16-aschex.py script described above:

```
Original size (bytes): 754
Compressed size (bytes): 433
Compression ratio: 1.74
```
```
Original size (bytes): 6.58734e+06
Compressed size (bytes): 1.87401e+06
Compression ratio: 3.52
```

Checkpoint

For checkpoint credit, submit your working code for the linked list recursion problem. Ensure you add/commit/push your hw-username repo with a hw3 subfolder that contains:

Submission Files

Ensure you add/commit/push all your source code files, Makefile, and written problem files. Do NOT commit/push any test suite folder/files that we provide from any folder other than the resources/hw3 repo.

WAIT You aren’t done yet. Complete the last section below to ensure you’ve committed all your code.

Commit then Re-clone your Repository

Be sure to add, commit, and push your code in your hw3 directory to your hw-username repository. Now double-check what you’ve committed, by following the directions below (failure to do so may result in point deductions):

  1. In your terminal, cd to the folder that has your resources and hw-username
  2. Create a verify-hw3 directory: $ mkdir verify-hw3
  3. Go into that directory: $ cd verify-hw3
  4. Clone your hw_username repo: $ git clone git@github.com:usc-csci104-summer2022/hw-username.git
  5. Go into your hw3 folder $ cd hw-username/hw3
  6. Switch over to a docker shell, navigate to the same verify-hw3/hw-username/hw3 folder.
  7. Recompile and rerun your programs and tests to ensure that what you submitted works. You may need to copy over a test-suite folder from the resources repo, if one was provided.