CSCI 104 - Fall 2019 Data Structures and Design

Class Design & Coding Practice

This lab is going to be a little different. Instead of going over a new topic, we're going to talk about general best practices for writing code.

At this point in the class, you might have heard your instructor telling you to use good coding practice, or you might have lost a few points on HW1 or HW2 for "bad coding practice". What do we mean when we say that, and how can you improve your code to follow good coding practices?

We've also identified some common mistakes we've seen in your code so far, and how to fix them.

1 - Why It Matters

After finishing three assignments, you're all capable of writing working code. However, there's more to writing code than just getting the right output. Often times, when you work on a project of a larger scale, you aren't the only one who needs to read and understand your code. Additionally, you may need to revisit your own code many times after it's done. It's important to make your code readable and maintainable.

2 - Class Design

When you do object oriented programming, a good class design can often be the foundation of good coding practice.

2.1 - Deque Review

Deques, or double-ended queues allow for efficient (fast) additions and removals from either 'end' (front or back) of the list.

If we were to write a deque class to store integers, where do we begin? Usually when you write a class, the first thing you need to decide is what constitutes a class' public interface. The public interface of a class determines how other classes interact with it. In C++ terms, the public functions and member variables will be a class' public interface.

2.2 - Deque Interface

Since we are implementing a data structure, the public interface is largely determined by its properties. For deque, we allow for efficient addition and removals from either end. That means we need a function to remove/add elements to either end of the deque:

// Adds a new integer to the front
void push_front(int val);

// Adds a new integer to the back
void push_back(int val);

// Removes the front item
void pop_front();

// Removes the back item
void pop_back();

We chose these function names for two reasons: 1. they clearly describe what the functions do and 2. they are the consistent with the C++ STL library.

In deque, stack, and queue, pop usually means removing an element from the list, and push means adding an element. In general, it's good to conform to the conventions when you choose function names. While insert_front and remove_back also describe what the functions do, other people who use your code will find it less intuitive.

Another important feature about following convention is that a code that currently uses C++ deque can easily switch to use your code, without having to modify each function call.

We still have some more functions to add — what about reading from the deque? Since a deque allows fast access to front and back of the list, we should also provide a getter for each:

// returns front item or
int front();

// returns back item or
int back();

What would happen if we call front() on an empty deque? The proper way to handle this is to throw an exception. However, it's important that there should be a way to let your users know if the deque is empty, so they can avoid making the function call in the first place.

// retrns true if the list is empty
bool empty();

// returns number of items in the list
int size();

Do we need the empty function if we already have the size function? If not, why might we choose to have an empty function anyway?

At this point, we have created all the functions that we need based on the properties of deque. However, we are not done yet. We need a way to construct and destruct the list. We don't need any additional information to construct the list, so the constructor takes in no argument.

Deque();
~Deque();

Here is the public interface of our deque:

Deque();
~Deque();
bool empty();
int size();
int front();
int back();
void push_front(int val);
void push_back(int val);
void pop_front();
void pop_back();

2.3 - Interface vs. Implementation

Everything we have discussed is the interface of the deque class. It does not tell us anything about how the functions should be implemented and that is for good reason. A class interface only determines how other classes interact with it. Changing the underlying implementation of a class should not affect its behavior.

With that in mind, let's talk about some potential ways we can implement deque.

In doubly-linked lists, each item tracks the previous element and the next element. In circular doubly-linked lists, the head's previous element is the tail and the tail's next element is the head.

One possible implementation of deque uses doubly-linked list. We can track that list's head, tail and size.

class Deque {
private:
  struct Item {
    int val;
    Item* prev;
    Item* next;
  };

  Item* head_;
  Item* tail_;
  int size_;
};

This will allow an O(1) runtime for all the functions in our interface except for one. Which one?

Notice that head_ and size_ are both private members. That means when your users create Deque objects, they cannot access size_ or head_ directly. The only way for your users to interact with the deque is through its public functions.

This is for good reason, other than that "you lose points on homework if you have public members". Imagine that you first learned about doubly linked list and implented your deque. Later, you discovered that you can eliminate the tail_ member if you use circular linked list to implement your deque instead.

If you kept all your members private, you can simply remove the tail_ member and update your implementation. Your users should not be affected at all.

What would happen if you kept tail_ as a public member? Your users might have seen it as public, and assumed that it will be safe to access it directly. That means your user might be using mydeque.tail_, which will not compile as soon as you remove the tail_ member!

Hiding internal data from your users is called encapsulation, and it's an important concept of object oriented programming. Keep this in mind when you design classes in the future.

3 - What Consists of Good Coding Practice

In addition to class design, there are many good coding practices that do not seem to affect functionality of your program, but can certainly make your code more readable, scalable, and maintainable.

If you decide to work in industry, you will find that most companies have their own coding guidelines. Most likely, the standards differ from company to company and from industry to industry. When that happens, you need to make sure you follow your company's coding style guide. Before that happens, we are going to teach you some basic guidelines to writing good code.

Side note: Here are a few links to company style guides:

3.1 - Don't Repeat Yourself

This rule is simple: whenever you feel the need to repeat yourself (i.e. you are tempted to copy and paste your code), it should raise a red flag. There are three common ways to eliminate duplicate code:

  1. Change your logic to group similar tasks together
  2. Move the code into a function
  3. Template the function or class

We'll talk about templating soon. For now, we will focus on changing your logic and moving code into a fucntion.

Lets take a look at part1.cpp. Here we can see that in our main function, our program prints out the sum of the first 10, 100, and 1000 numbers respectively. Notice that there's a reoccurring pattern here:

int sum = 0;
for(int i = 0; i < 10; i++)
{
  sum += i;
}

This block of code seems to repeat mutliple times, with only one of the numbers changing every time. Having code like this tends to make it long and very unfriendly to potential readers. It is possible to group all three for loops into one, and use if statements inside the for loop:

// Method 1: Change Logic
int sum = 0;
for (int i = 0; i < 1000; i++)
{
  sum += i;
  if (i == 9 || i == 99 || i == 999)
  {
    std::cout << "Sum of the first " << i + 1 << " numbers: " << sum << std::endl;
  }
}

Alternatively, we can fix this by making a function like so:

int sumFirstNumbers(int n)
{
  int sum = 0;
  for(int i = 0; i < n; i++)
  {
    sum += i;
  }
  return sum;
}

void printSum(int sum, int n)
{
  std::cout << "Sum of the first " << n << " numbers: ";
  std::cout << sumFirstNumbers(n) << std::endl;
}

Now, we can replace every segment of similar code with a call to this function and drastically increase readibility.

Moving code into a function has the benefit that it can be called from different places, whereas changing your logic allows you to access all variables in the current scope without having to pass everything into a function. Writing a function is generally more applicable than changing logic. If changing the logic will make the code unintuitive and hard to understand, creating a function is more preferable style-wise. However, which method you choose will depend on the context.

Bottom line: If you feel like copying and pasting code, think twice, then think a third time, then think a fourth time. Keep thinking until you are convinced that copying and pasting that code is a bad idea.

3.2 - Writing Modular Code

Code modularity is the idea that your program is composed of multiple small, independent modules. Ideally, changing implementation of one module should not affect another module, as long as the public interface is not changed.

You have seen how when we create the Deque class, we were careful to not expose implementation details to users. Thus, we are free to change our implementation of deque without our users needing to know. This idea applies outside of class designs as well.

When you are writing a long program, it's often a good idea to separate out some logic into separate functions. This is another reason why we write functions, other than to eliminate duplicate code.

3.3 - Write Self-Documenting Code

You might have remember us telling you to write comments to document your code. In addition to adding comments, there's another way to make your code easier to understand - by writing self-documenting code using sensible variable and function names.

Let's take a look at part2.cpp. Here, we have two functions, calculate1 and calculate2.

unsigned int calculate2(unsigned int n) {
  if (n == 0) {
    return 1;
  }
  return n * calculate2(n - 1);
}

unsigned int calculate1(unsigned int n) {
  if (n <= 1) {
    return n;
  }
  return calculate1(n - 1) + calculate1(n - 2);
}

Here, we can only tell that these functions are trying to calculate something using the argument n, but the function names fail to tell us what exactly it is trying to calculate. Since these two functions share similar names, we might think that these two functions are related.

However, if we take a closer look at the functions, you will see that calculate1 computes the nth Fibonacci number and calculate2 computes the nth factorial. Let's go ahead and rename the functions to fibonacci and factorial instead. Now when we revisit these functions in the future, it will be easier for us to figure out what these functions do.

In the main function of part2.cpp, we notice that there are a few variables.

unsigned int a = calculate1(10) - 13;
unsigned int b = calculate2(8) + 0x629be;
unsigned int magic = ~(((a + b - 0x629b1) >> 8) ^ 0x88) & 0x3f;

Again, we can only tell that from these variables that hold the results of some important values that we gained from using the 'fibonacci' and the 'factorial' functions, but if this program was a hundred lines longer, we may forget what these variables are supposed to represent. Let's change a to be fibonacciResult and b to be factorialResult, so that a few hundred lines later, we still remember what these variables hold.

4 - Assignment Part 1: Clean-up

In assign1.cpp, we have a program that randomly generates two numbers, divide the bigger number by the smaller one, and print the quotient and the remainder. Identify a common pattern and eliminate duplicate code either by changing the logic or by creating a function.

In assign2.cpp, we've included a completed but badly written program that uses the quadratic equation to compute one of the two roots of a few preset quadratic equations. Your job is to clean up existing code.

5 - Assignment Part 2: GTests

Docker users: If this part doesn't work for you, partner up with a classmate, and pair program to find the bugs!

We hope this is useful information that you'll use for future assignments in this course, and in the rest of your CS career at school and beyond.

But you're not quite done after you write good code — you should also be writing tests. A lot of you gave the feedback that the gtest lab was useful, but that you wanted more practice actually writings tests. So, you're going to do just that for checkoff today!

In the gtest folder, we have provided an implementation of ArrayList for you - there is a header file arraylist.h, with the specification of the expected function in the comments, and a pre-compiled the ArrayList implementation. You do not have access to the source code. We do this because again, we do not want you to think about test cases in terms of the source code, but in a wider perspective of "how things should work".

There are three bugs in the source code, and it is your task to find two of the problems by writing a good test suite. Write a set of tests for each of add(), set(), and remove() according to the specification provided in the header file, and thinking about edge cases. Find two of the three bugs and you're good to go.

We have written a sample test case for you for the get() function, although it is by no means an exhaustive test case. There is also a simple Makefile included for you, simply compile and run the test with make tests.

Some tips:

Check off:

To get checked off:

  1. Eliminate duplicate code in assign1.cpp.
  2. Move repeated code in assign2.cpp into function(s), and name the function(s) properly!
  3. Modify variable names so that they're meaningful.
  4. Compile the code with make assign1 and make assign2. Make sure that none of the code functionality has changed by ensuring that you get the expected outputs.
  5. Write additional test cases to find 2/3 bugs for the ArrayList