Written by: Kyle Oliver and Milad Fatenejad

C++ Boot Camp - Object-Oriented Programming

C++ was designed to support object-oriented programming. This programming paradigm makes certain kinds of modeling very rich and natural, and it allows for complex code that is still relatively easy to maintain and collaborate on with others. While powerful, object-oriented programming requires a different mindset from that used in writing procedural or function-based scripts and programs. Don't let that scare you; just remember that when using C++, it's best to try to develop an object-oriented mindset. Hopefully, this will change the way you program and help make your code much more modular and reusable.

Objects

The most important thing to remember about objects is that they have both state and behavior. Their state is determined by member data, their behavior by member functions, or methods. For instance, you can imagine that an object representing a student might have member data representing a GPA, one or more majors, etc. Student behavior might include member functions such as finishHomework(), sleep(8) (or, more realistically, sleep(4)), or attendHackerWithin(). We define the possible state and behavior of some type of object using a class.

Classes

A class is like a blueprint for building a certain type of object. This is the programming construct that allows you to define what state and behavior your objects will have. In today's lesson, we'll learn about classes by spending some time defining an Organism class for a population dynamics simulation. Our eventual goal is to run a simulation of the plants and animals living in a forest. This simulation could help us gain some insights about how their respective populations depend on each other and fluctuate over time, subject to some assumptions about the mechanisms by which they live, reproduce, and die.

Please note that Kyle and Milad are nuclear engineers, not ecologists. We chose an example outside our field because we wanted to be sure everyone would have some shared intuition about the system. Our goal here is not to accurately model the way organisms live and die in the wild but to, if you will, model the modeling process itself. We hope that in doing so we can teach by example something of what it takes to think like an object-oriented programmer.

Member Declarations

To help separate the design of a class from its implementation, the standard practice is to declare its member data (state) and methods (behavior) in a header file separate from the actual definition of those member data and methods. In C++ syntax, we can define a class in a header file as follows:

// Organism.hpp

class Organism
{
  ... // declarations of member data and methods go here
};

If you discipline yourself to "sketch out" your class in a header file before actually implementing it, you set yourself up to anticipate "big picture" design flaws before you've wasted any time writing code based on those flaws. In that spirit, let's think a bit about what our Organisms need to "know about themselves" (i.e., their state) and be able to do (i.e., their behavior). First of all, we need to keep track of what kind of organism each object (or instance) of this class will be. Next, we assume for now that they have infinite food, so they will live to some given life expectancy and then simply die of old age. Finally, since our population wouldn't be very "dynamic" if that's all that we modelled, let's also incorporate a means for them to reproduce.

We ought to be able to capture all of those behaviors by giving each object four pieces of information about itself and two functions to help it do what it needs to do. Thus, we declare four variables' worth of member data and two methods in our blueprint for how to build Organisms:

// Organism.hpp

#include <string>

class Organism
{
  // MEMBER DATA

  std::string name; // what type of organism am I?

  int age; // how old am I?

  int maxAge; // how old will I be when I die?

  double reproProb; // what's my probability of reproduction?

  // MEMBER FUNCTIONS (METHODS)

  // creates a new organism object with a certain state
  Organism(const std::string& n, double rProb, int max);

  // simulates the Organism's life for one time step and returns 
  // a value that indicates what happened to it
  int advance(); 
};

Constructors

You may have noticed that one of the two methods we declare in our class definition has no return type. That's because it's a special method called a constructor. A constructor is the function you call when you want to instantiate (build an instance of) an object. Strictly speaking, it initializes the new object rather than returning anything, but it's helpful to think of it as handing that new object back to you. You can always tell that a function is a constructor because it has the same name as the class itself. We've actually already seen some constructors, including the one for the string class. The following code calls the string class constructor with an argument that initializes the string object it creates. Both forms are legitimate C++ syntax; the second makes the function-call nature of the constructor explicit. The type of the object that gets created is just the name of the class:

#include <string>

int main()
{
  std::string str1("Hello, World!"); // initializes a string to "Hello, World!"
  std::string str2 = std::string("Hello, World!"); // equivalent, shows the function call explicitly
  
}

Thus, you can probably guess what our constructor for the Organism class does; it assigns the arguments given in the function call to the appropriate variables that serve as member data. Our implementation of the Organism class constructor looks like this:

#include "Organism.hpp"

Organism::Organism(const std::string& n, double rProb, int max)
{
  name = n;
  reproProb = rProb;
  maxAge = max;
  age = 0;
}

Aside: Scope Resolution Operator
We're now in a better position to understand the double-colon symbol we've been using in conjunction with the namespace std. It's called the "scope resolution operator," and it's used "to qualify hidden names so that you can still use them" (thanks IBM). So when we write std::cout, we're telling the compiler "I know I haven't defined anything called cout in this scope (or maybe I have but that's not what I'm trying to refer to), so please know that I'm talking about the cout defined in the namespace called std."

In the context of object-oriented programming, we use the scope resolution operator to tell the compiler that we're defining a method for a particular class, not just some unattached global function. That's why we have this strange constructor notation above:

Organism::Organism(const std::string& n, double rProb, int max)

The Organism:: part tells the compiler "OK, I'm about to define a function that belongs to the Organism class." As we just learned, that function happens to also be called Organism, because it's a constructor. So if we were defining the method advance(), we'd write Organism::advance().

Encapsulation and Interfaces

Before we go any further, we need to introduce the object-oriented idea of encapsulation. Also called "information hiding," this concept is key to enabling many programmers to work together on a complex piece of code while introducing as few bugs as possible. Wikipedia currently describes the concept as follows:

In computer science, encapsulation is the hiding of the internal mechanisms and data structures of a software component behind a defined interface, in such a way that users of the component (other pieces of software) only need to know what the component does, and cannot make themselves dependent on the details of how it does it.

Even if all the components are part of the same piece of software, encapsulation allows you to say to your collaborators "I'll take care of this Organism class; here's what I promise it will be able to do." But that last part of the definition (and the reason why "information hiding" is a pretty apt description of this concept) is the most important part. The internal workings of your classes may be (indeed, probably are) in a constant state of flux. You do not want people who use your class to write a bunch of code that depends on some piece of data that you might need to change. Hiding the implementation of your class also makes your code much more modular and flexible. If you decide at some point that an integrate() method you've written for some kind of mathematical function class should use a lookup table rather than some more complex numerical integration scheme, you don't need to tell users of the class that you've changed anything; you simply change the hidden implementation.

Here's a more concrete example. In addition to being "born" (constructed), our Organisms need to be able to advance through a single time step, possibly reproducing or dying in the process. In our header file, we've declared that Organism::advance() returns an integer that indicates which of the three possibilities actually occurred. That's all the user of an Organism needs to know, even though we know that (for now), the method works like this:

int Organism::advance()
{
  // if we're too old, die:
  if(age == maxAge)
    return -1;
        
  // otherwise we're one year older
  age ++;

  // figure out if we reproduce

  // get a double between 0 and 1, making sure to avoid integer division:
  double randNum = rand() / (static_cast<double>(RAND_MAX)); // rand() calls the built-in C RNG (declared in <cstdlib>)

  if (randNum < reproProb) {
    return 1;
  }

  // neither died nor reproduced
  return 0;
}

In the more complex model that we'll elaborate soon, we've actually changed how Organisms decide if they live, die, or reproduce. But any code that uses Organisms (we'll see some soon) wouldn't need to be re-written; only the implementation has changed, not the interface itself.

The mechanisms for encapsulation in languages like C++ and Java are the keywords public and private. Data and methods that are declared public can be accessed from outside the class; private data and methods are hidden from the user of your class. Thus, we're now ready to see the full source code for our Organism class (for now, don't worry about the copy constructor, Organism(const Organism& org)):

Organism.hpp

#ifndef _ORGANISM_HPP_
#define _ORGANISM_HPP_

#include <string>

class Organism
{

private: // hidden data and methods go here

  /**
   * Reproduction probability for this species each time step.
   */
  double reproProb;

  /**
   * The maximum age for this organism
   */
  int maxAge;

  /**
   * This Organism's current age.
   */
  int age;

  /**
   * The type of organism.
   */
  std::string name;

public:

  /**
   * Creates a new Organism of the given name, reproduction
   * probability and maximum age.
   */
  Organism(const std::string& n, double rProb, int max);

  /**
   * Copy an organism.
   */
  Organism(const Organism& org);

  /**
   * Advances this Organism through a time step. Returns -1 if the organism
   * died, 1 if the organism reproduced, zero otherwise.
   */
  int advance();

};

#endif

Organism.cpp

#include <cstdlib>
#include "Organism.hpp"

Organism::Organism(const std::string& n, double rProb, int max)
{
  name = n;
  reproProb = rProb;
  maxAge = max;
  age = 0;
}

int Organism::advance()
{
  // if we're too old, die:
  if(age == maxAge)
    return -1;

  // otherwise we're one year older
  age ++;

  // figure out if we reproduce

  // get a double between 0 and 1, making sure to avoid integer division:
  double randNum = rand() / (static_cast<double>(RAND_MAX));

  if (randNum < reproProb) {
    return 1;
  }

  // neither died nor reproduced
  return 0;
}

Organism::Organism(const Organism& org)
{
  name = org.name;
  reproProb = org.reproProb;
  maxAge = org.maxAge;
  age = 0; // the copy starts at age zero rather than copying org.age
}

Aside: Documenting Interfaces
We can loosely associate header files with interfaces. In fact, automatic documentation software like doxygen usually assumes that it's generating an interface for users to reference. Consequently, private data and methods are usually left out of the documentation by default. They wouldn't do the user any good anyway, unless he or she could actually change the encapsulated implementation of the class by changing its source code.

Those funny-looking comments in our header file,

/**
 * the ones that look like this,
 */

are specially formatted so that tools like doxygen recognize them and include them in the automatically generated documentation.

Aside: An Important Pre-Processor Trick
If you think carefully about Organism.hpp, you'll realize that we've declared a bunch of member data and methods for the Organism class. But the act of declaring its member data and methods is how you actually define a class. And remember what we said on the first day of class: you can only define something once.

Now imagine we have some other piece of code, say main.cpp below. This file #includes both Organism.hpp and another header file, Time.hpp. We'll talk more about what the Time class does in a moment, but for now let it suffice to say that Time.hpp also includes Organism.hpp. Thus, unless we do something tricky, when the compiler sees these two lines in main.cpp

#include "Organism.hpp"
#include "Time.hpp"

it's going to freak out. Can you see why?

Remember, all #include does is tell the preprocessor to glue in some code from another file. So in the code above, it glues in the text from both header files. But if Time.hpp also #includes Organism.hpp, then the second line will cause the preprocessor to glue in a second copy of the definition of the Organism class.

We prevent the compiler from freaking out by adding the first, second, and last lines of Organism.hpp. These are preprocessor directives that say, in effect, "If you haven't already pasted the code from this file into the text that the compiler's going to read, go ahead and do it now. But remember that you did, because I might #include this file again later, and if I do, I don't want you to paste it again, lest the compiler freak out."

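You get the preprocessor to "remember" whether it's added this header yet by defining some symbol (in this case, _ORGANISM_HPP_). The exact name doesn't matter much, but it's helpful to choose a consistent convention, such as the file name in capitals. (Strictly speaking, names that begin with an underscore followed by a capital letter are reserved for the compiler and standard library, so a name like ORGANISM_HPP is the safer choice.)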

Using Objects: Population Dynamics, Take 1

OK, now that we've created a blueprint for building Organisms, let's build some and let them do something interesting. Let's say some hypothetical colleagues have written for us another class, called Time. The Time class has the following public interface:

  /**
   * Creates a new Time object to run a simulation from some start
   * time to some end time.
   */
  Time(int start, int end);

  /**
   * Takes the simulation through a single timestep. Returns true if
   * there are more timesteps to step through, false otherwise.
   */
  bool step(std::list<Organism>& orgs);

  /**
   * Returns the current time.
   */
  int getTime() {return t;}

Aside: Inline Functions
The third function in the Time class interface deserves a couple of words of explanation. This method is an example of an "accessor method" (sometimes called a "get" method). Its obvious function is to give the caller access to some aspect of the state of the object. It's also possible to give the user access to that information by making the state data public, but you may not want the user to know how you're recording that state, and you probably don't want the user to try to change that state in a way you didn't intend.

However, accessor methods are very simple, they usually get called a lot, and they don't usually "give away" anything of interest about your implementation. That makes them good candidates for inlining, which is where you both declare and define the method in the same place. There are some space and speed benefits to inlining that we don't want to have to go into here, but note also that there are some drawbacks. In this course, we just wanted you to have seen an inline function, because you'll certainly run across them.

Notice that just by reading this interface, we have a pretty good idea of what to do in order to use the Time class together with our Organism class to run a simulation. We can use the Time constructor to build a new Time object that will control our simulation for us. Then we simply create some collection of Organisms (in this case a list, which is like a vector but has different methods and a different implementation) and pass it by reference repeatedly to the Time object's step() method until it returns false, which means the simulation has ended. That's exactly what our program in main.cpp does. Have a look at the program for yourself to try to understand what's going on, and then we'll offer a bit of explanation below.

 1  #include "Organism.hpp"
 2  #include "Time.hpp"
 3  #include <iostream>
 4  #include <list>
 5
 6  int main()
 7  {
 8
 9    const int YEAR = 12;
10
11    Organism rabbit("rabbit", 1.0/YEAR, YEAR);
12
13    std::list<Organism> orgs;
14    int i;
15    for(i = 0; i < 1000; i++) orgs.push_back(rabbit);
16
17    Time time(0, 2000*YEAR);
18    while(time.step(orgs)) {
19      std::cout << time.getTime() << "\t" << orgs.size() << std::endl;
20    }
21
22
23    return 0;
24  }

As you can probably tell, the main points of the program are as follows:

  • Line 11: Call the Organism constructor to instantiate an Organism object representing a rabbit. It will live for one year and have a 1-in-12 chance of reproducing each month.
  • Line 15: Put 1,000 copies of that rabbit into a big list.
  • Lines 17-20: Create a Time object and use it to simulate the life of those 1,000 rabbits and their offspring for 2,000 years. We'll also print out the population each month.

Note that to evaluate the condition in the while loop, we call the time object's step() method. Thus, the loop continues to execute until this method returns false, which means the simulation is complete. While we don't know for sure what step() actually does, we can guess that it somehow calls each Organism's advance() method each time step and kills them off or creates copies depending on what values advance() returns.

Keeping in mind that all the biology in our model is made up and hyper-simplistic, it's still fun to take a look at the results of this program. Let's go to our ~/cpp-bootcamp/oop/organism-simple directory. If we simply compile and run it as follows:

g++ *.cpp
./a.out > data.txt

then we can plot the data we redirected into the text file. It looks like this:

[Figure: Plot of first run with simple model]

The way we implemented reproduction means we should expect highly stochastic behavior; each rabbit has one offspring on average, but they can have as many as twelve and as few as zero. After a run of bad luck around t = 1,800 years, they go extinct. Don't worry, they had a good run.

Inheritance

So now what? How can we improve our model? Well, one improvement is to note that species' populations are always coupled to the populations of other species. However, the mechanisms for that coupling differ significantly. At the risk of overgeneralizing, let's make a primary distinction between plants and animals:

  • Animals eat other animals, plants, or both. If there aren't enough of the things they eat, they die.
  • Plants don't eat anything per se, but they compete with each other for physical space, nutrients in the ground, and sunlight. If there are too many other plants growing in the same area already, new seedlings probably won't survive.

So it would help our model if we could make some organisms that behave like plants and others that behave like animals. That should allow us to capture some of the interplay between their relative populations. But imagine how complex and bug-prone our code would get if we had to fully differentiate the behavior of many different types of organisms using conditionals within the relevant methods of the Organism class. For instance, we'd have to alter our advance() method to allow both kinds of behavior:

int Organism::advance()
{
  if (name == "grass" || name == "tree" || name == "bush" || name == "flower" || name == ...) {

    // behave like a plant
  }

  if (name == "rabbit" || name == "deer" || name == "hawk" || name == "bigfoot" || name == ...) {
    
    // behave like an animal
  } 
}

Of course, you could save yourself some work by writing a function that allows you to classify Organisms on the fly:

int Organism::advance()
{
  if (isAPlant(*this)) {

    // behave like a plant
  }
  if (isAnAnimal(*this)) {

    // behave like an animal
  }
}

But that's still pretty inelegant, bug-prone, and (most of all) difficult to maintain (you have to remember to keep updating your isA...() functions each time you add support for a new species). Enter derived classes or, to use the fancy computer-science generalization, inheritance. Inheritance allows you to make new blueprints for more specialized objects. These derived classes, also called subclasses, inherit the state and abilities of the class from which they derive (their parent- or superclass), but they also have specialized abilities and may override the behavior of their parents if appropriate. Depending on the detail of your model, then, you might choose to "subclass" every species you want to include. For simplicity, we chose to create two subclasses: one for plants, and one for animals. The syntax for public inheritance (the only kind we'll discuss here) in C++ begins by tagging the class definition of the derived class with the name of the parent class.

//Plant.hpp
#include "Organism.hpp"
...
class Plant : public Organism {

private :
  
public :

  /**
   * Copies a plant.
   */
  Plant(const Plant& plant);

  /**
   * Creates a plant with the given name, reproduction rate, and
   * maximum age (see Organism class), as well as a maximum total
   * number of plants this one can coexist with.
   */
  Plant(const std::string& name, double rep, int maxage, int maxnum);

  /**
   * Maximum number of plants this one can coexist with.
   */
  int maxNum;

  /**
   * Plants can also die if there are too many other plants
   * around. This method checks for that condition.
   */
  virtual bool specialAdvance(Forest& forest);

  /**
   * Creates a copy of this Plant and returns it as a pointer to an
   * Organism.
   */
  Organism* clone() const { return new Plant(*this); }
};

Subclass Constructors

We say that a subclass has an is-a relationship to its parent class, as in "a Plant is an Organism." Since a Plant is an Organism, it has the data and capabilities of an Organism, and so to construct a Plant, you have to call the Organism constructor. There's a specific syntax for doing so: you add a colon after the function signature and then call the relevant superclass constructor. This colon-prefixed list is called a member initializer list, and it works for initializing member data in classes without inheritance as well.

//Plant.cpp
#include "Plant.hpp"
...

Plant::Plant(const std::string& name, double rep, int maxage, int maxnum) :
  Organism(name, rep, maxage), // initialize parent class
  maxNum(maxnum) // initialize member data
{
  // you could also do maxNum = maxnum; here
}

Private vs. Protected Data

One of the reasons we need this fancy parent class constructor notation is that your derived class may or may not have direct access to the member data of the parent class. (Another is that Organism has no default constructor, so the compiler has to be told which Organism constructor to call, and with which arguments.) Remember, in our original version of the Organism class, the age and maxAge members were private. So even though a Plant is an Organism, it can't access them. If a Plant's plant-specific behaviors need access to data that's declared in the Organism class, there are two options:

  • Only allow the derived class to access parent class data via public methods.
  • Change the tag on the private data to protected.

The latter option makes all the protected members of the parent class accessible to derived classes while keeping them hidden from everyone else. It's not as attractive from an encapsulation perspective, but it's a little less cumbersome and is ultimately what we elected to go with for this example.

Polymorphism

To understand the last couple of major concepts related to inheritance, it's helpful to see an example of how derived classes get used. A powerful consequence of inheritance is polymorphism, the idea that you can in some sense treat different inherited types the same way. For reasons that will become clear, we've added to our model a Forest class that, among other things, stores the collection of Organisms:

//Forest.hpp
#include <list>
#include "Organism.hpp"
...
class Forest {

public:

  std::list<Organism*> ecosystem; // stores all the organisms
...

};

Polymorphism allows us to treat Plants, Animals, and anything else that derives from Organism as if they were Organisms. Thus, we can put data of type Plant* and Animal* into this collection without fear of the compiler complaining.

#include "Forest.hpp"
#include "Plant.hpp"
#include "Animal.hpp"
#include <vector>

int main() {

  ...
  const int YEAR = 12;

  Forest forest;
  Organism mushroom("mushroom", 1.1/3.0, 3);
  Plant clover("clover", 1.1/3.0, 3, 3000);
  Animal squirrel("squirrel", 1.05/YEAR, YEAR, food);

  forest.ecosystem.push_back(&mushroom); 
  forest.ecosystem.push_back(&clover); // OK, Plants are Organisms
  forest.ecosystem.push_back(&squirrel); // OK, Animals are Organisms
  ...

}

But that's only the half of it...

Virtual Vs. Non-Virtual Functions

We can also, if we've designed our classes carefully, call some method on each Organism in the collection and be confident that each object will perform its specialized behavior as appropriate. But some thought is required to get those function calls to work right.

Let's assume for the moment that the public member function specialAdvance() has been defined in the Organism, Plant, and Animal classes to simply print "Organisms rule!", "Plants rule!", or "Animals rule!", as appropriate. If that were the case, then the following code would produce the following output.

code:

...
const int YEAR = 12;
Plant clover("clover", 1.1/3.0, 3, 3000);
Animal squirrel("squirrel", 1.05/YEAR, YEAR, food);
Organism mushroom("mushroom", 1.1/3.0, 3);

clover.specialAdvance();
squirrel.specialAdvance();
mushroom.specialAdvance();

output:

Plants rule!
Animals rule!
Organisms rule!

So far so good. But watch what happens when we call the specialized function via Organism pointers. That's how we'd be calling them if we put them in the Forest class's ecosystem member:

code:

...
const int YEAR = 12;
Plant clover("clover", 1.1/3.0, 3, 3000);
Organism* cloverOrgPtr = &clover;

Animal squirrel("squirrel", 1.05/YEAR, YEAR, food);
Organism* squirrelOrgPtr = &squirrel;

Organism mushroom("mushroom", 1.1/3.0, 3);
Organism* mushroomOrgPtr = &mushroom;

cloverOrgPtr->specialAdvance();
squirrelOrgPtr->specialAdvance();
mushroomOrgPtr->specialAdvance();

output:

Organisms rule!
Organisms rule!
Organisms rule!

Because specialAdvance() is not virtual, the compiler decides which version to call at compile time, using only the declared type of the pointer. Consequently, it gives us the Organism class's implementation, because we're calling it through pointers to Organisms.

If we don't like this outcome (and we certainly don't like it if specialAdvance() actually provides meaningful Plant- or Animal-specific behaviors), we need to use virtual functions. If you use the keyword virtual before declaring specialAdvance() in the Organism class, then you will always get the more specialized behavior, if it exists (that is, if you've decided the derived class should override the parent class's implementation).

code:

...
const int YEAR = 12;
Plant clover("clover", 1.1/3.0, 3, 3000);
Organism* cloverOrgPtr = &clover;

Animal squirrel("squirrel", 1.05/YEAR, YEAR, food);
Organism* squirrelOrgPtr = &squirrel;

Organism mushroom("mushroom", 1.1/3.0, 3);
Organism* mushroomOrgPtr = &mushroom;

// Now Organism::specialAdvance() is declared virtual!!
cloverOrgPtr->specialAdvance(); 
squirrelOrgPtr->specialAdvance();
mushroomOrgPtr->specialAdvance();

output:

Plants rule!
Animals rule!
Organisms rule!

Pure Virtual Functions and Abstract Classes

As we prepare to create a new subclass, we need to think about whether it still makes sense to create an object of the parent class. If not, we treat the parent class as abstract. An abstract class is any class that declares a pure virtual function, signified by adding = 0 between the declaration and the semicolon:

//Organism.hpp
...
virtual bool specialAdvance(Forest& forest) = 0; // pure virtual function
...

If a class declares a pure virtual function, the compiler won't let you instantiate an object of that class directly; you must instead construct an object of a derived class that overrides every pure virtual function. This is useful if you want to force the user of your class to define their own subclass that makes use of the parent class's member data and methods in some intelligent way.

...
const int YEAR = 12;
Plant clover("clover", 1.1/3.0, 3, 3000);
Animal squirrel("squirrel", 1.05/YEAR, YEAR, food);
Organism mushroom("mushroom", 1.1/3.0, 3); // ERROR if Organism has pure virtual function(s)

Using Inheritance: Population Dynamics, Take 2

If you navigate to your ~/cpp-bootcamp/oop/organism-inheritance directory, you should now have enough knowledge of inheritance to figure out most of the features we've added to the model. These include the Forest class as well as the specialized Plant and Animal behavior we discussed at the beginning of this section (plants start dying early when too many other plants of the same type already exist; animals can eat plants and/or other animals). Notice that the bulk of the complexity required to support this more sophisticated behavior is in the population "bookkeeping" methods that we've placed in the Forest class. The Organism class has barely changed at all, and the Plant and Animal classes only need to implement Plant- and Animal-specific data and methods (specialAdvance()).

Feel free to compile from this directory as before and to play around with more complex simulations by changing main.cpp. If you decide you don't like the "biology" we've implemented, feel free to modify our source code as you see fit.
