CSCI 104 - Fall 2019 Data Structures and Design

Problem 0 (Course Evaluations, 0%)

Please make sure to fill out the online course evaluations for this course. The instructor takes these seriously, and you can have a positive impact on how this class is taught in future semesters. Pay it forward!

Problem 1: Making sure you understand Trees (10%)

AVL Trees

Consider the following initial configuration of an AVL Tree:

AVL Tree

Draw the tree representation of the AVL tree after each of the following operations, using the method presented in class. Submit it as hw6.txt, or any other file extension you prefer, but indicate the filename in your README.

Your operations are done in sequence, so your tree should have 11 values in it when you are done. Make sure to clearly indicate each of your final answers.

Splay Trees

Consider the following initial configuration of a Splay Tree:

Splay Tree

Draw the tree representation of the Splay tree after each of the following operations, using the method presented in class. Continue your answers in hw6.txt.

Your operations are done in sequence, so your tree should have 7 values in it when you are done. Make sure to clearly indicate each of your final answers.

We highly recommend you try to solve these by hand before using any tools to verify your answers.

Problem 2: Implementing Binary Search Trees (15%)

We are providing for you a half-finished file bst.h (in the homework-resources repository) which implements a simple binary search tree. You will need to fill in the implementation for the following functions inside BinarySearchTree:

Note: For the insert() method, you should handle duplicate entries by overwriting the current value with the updated value.

Problem 3: Implementing AVL Trees (35%)

We are providing you a half-finished file avlbst.h (in the homework-resources repository) for implementing an AVL Tree. It builds on the file you completed for the previous question.

Complete this file by implementing the insert() and remove() functions for AVL Trees. You are strongly encouraged to use private/protected helper functions, and to put some of them in the BinarySearchTree file, so that your Splay Tree for problem 4 can access them (such as a BST::remove() function).

Notes and Requirements

Problem 4 Implementing Splay Trees (25%)

Complete the file splay.h for implementing a Splay Tree. You will need to implement the insert() and remove() functions.

Recall that when you delete from a Splay Tree, you should take the parent of the deleted node (which is likely to be the predecessor or successor of the value you deleted) and splay it to the top.

Problem 5 (Analyze the Performance of your Data Structures, 15%)

Create a program analysis that should be created using make analysis, and will be called as follows:

./analysis [input] [output]

Here, [input] will be an input text file, which you will run your program on. Your program should take each string in the text file (a sequence of characters separated by 1 or more newlines, tabs, or spaces) and insert them into your data structures. [output] is an output text file, into which you will store your results.

Here are the steps that your program should perform:

  1. Read all of the strings into a vector<string>.
  2. Create an AVL Tree and populate it (insert) with all the strings.
  3. Create a Splay Tree and populate it with all the strings.
  4. Report timing information to the output file.

Your program should separately time and report the duration for 2 and 3. Information on how to time your code is given below. The reason we make you read/write everything into arrays first is that file access is really really slow compared to memory access. Thus, if you ran steps 2 and 3 directly from the input file, or wrote directly to the output file, the time for file access would almost certainly completely drown out the differences between AVL Trees and Splay Trees. So please don't try to take the shortcut of interacting with files while you're timing.

Note that you don't really need the Value of the (key, value) pair for this application, so feel free to place whatever value you like here.

Your output file should report the following:

Report these statistics as in the following example (all these numbers are completely made up, so don't read anything into them at all). Please follow this format so that our script can match.

1000 insertions
AVL: 0.1417 seconds
Splay: 0.1243 seconds
Splay expensive: 42

Timing Your Code

For more detailed information on how to time code, check the course coding guide. Here is a briefer summary. Basically, you do something like the following:

#include <iostream>
#include <ctime>

int main() {
    clock_t start;
    double duration;

    /* Preprocessing here that you don't want to time */

    start = clock();

    /* Your algorithm here */

    duration = ( clock() - start ) / (double) CLOCKS_PER_SEC;

    std::cout << duration/r << endl;
}

The problem with that is if the algorithm runs fast enough, duration will always be 0, and you can't distinguish 0.000001 seconds from 0.0000000001 seconds, which is a huge difference. The solution is to repeat the algorithm many times, then take the average. See the course coding guide if you need more detail.

What to Report

In your README file, report on the following:

Chocolate Problem (1 Chocolate Bar): Merging Startups, Revisited

Remember the merging startups from Homework 3? We will modify the problem a little bit now. Companies will not split any more. And you don't need to worry about any destructors. So what you want to be able to do is just merge the companies of two given students (assuming that they aren't already in the same company), and to query whether two given students are currently in the same company or not. That's it. But now, every operation has to run in time O(log n), where n is the total number of students.

Design a data structure that supports these two operations to run in worst-case time O(log n), and prove that this is in fact the worst-case running time. You do not need to provide actual C++ code - clear pseudo-code is enough. The important part is the proof.

If you are really ambitious, you can try to design a data structure in which the worst-case amortized cost per operation is less than O(log n). One can get it down to almost constant, definitely much less than O(log log n). But this is quite a lot harder.

You can submit your homework here. Please make sure you have read and understood the submission instructions.

WAIT! You aren't done yet. Complete the last section below to ensure you've committed all your code.

Commit then Re-clone your Repository

Be sure to add, commit, and push your code in your directory to your hw-usc-username repository. Now double-check what you've committed, by following the directions below (failure to do so may result in point deductions):

  1. Please make sure you have read and understood the submission instructions.
  2. Go to your home directory: $ cd ~
  3. Create a verify directory: $ mkdir verify
  4. Go into that directory: $ cd verify
  5. Clone your hw-username repo: $ git clone git@github.com:usc-csci104-fall2019/hw-username.git
  6. Go into your folder $ cd hw-username/
  7. Recompile and rerun your programs and tests to ensure that what you submitted works.
  8. Go to the Assignments page and click on the submission link.
  9. Find your full commit SHA (not just the 7 digit summary) and paste the full commit SHA into the textbox on the submission page.
  10. Click Submit via Github
  11. Click Check My Submission to ensure what you are submitting compiles and passes any basic tests we have provided.