Wednesday, 30 January 2013


Encapsulate what varies

This article is motivated by a technical discussion I took part in a couple of days back. It explores the application of a very fundamental design principle: "Identify the aspects of your application that vary and separate them from what stays the same". This principle is at the heart of several other design principles and design patterns.

The real-world problem we discussed was implementing a function that maps student marks to grades. A student may score marks from 0 to 100 in an exam. The grade scheme is as follows:

Marks       Grade
00 - 09        F
10 - 19        F
20 - 29        F
30 - 39        F
40 - 49        F
50 - 59        E
60 - 69        D
70 - 79        C
80 - 89        B
90 - 100       A

The requirement is to write a C++ function that accepts a student's marks as its argument and returns the corresponding grade.

std::string GetGrades(int Marks)

There are many ways of doing this. One approach is to write an if-else-if chain as shown. An alternative approach uses switch-case statements.

#include <iostream>
#include <string>
 
std::string GetGrades(int Marks)
{
   std::string Grade = "Grade Not Found";
 
   if(Marks >= 0 and Marks < 10) Grade = "F";
   else if(Marks >= 10 and Marks < 20) Grade = "F";
   else if(Marks >= 20 and Marks < 30) Grade = "F";
   else if(Marks >= 30 and Marks < 40) Grade = "F";
   else if(Marks >= 40 and Marks < 50) Grade = "F";
   else if(Marks >= 50 and Marks < 60) Grade = "E";
   else if(Marks >= 60 and Marks < 70) Grade = "D";
   else if(Marks >= 70 and Marks < 80) Grade = "C";
   else if(Marks >= 80 and Marks < 90) Grade = "B";
   else if(Marks >= 90 and Marks <= 100) Grade = "A";
 
   return Grade;
}
 
int main()
{
   std::cout << GetGrades(99) << std::endl;
}

What is your initial thought on this implementation? What are its strengths and weaknesses? Specifically, does this design comply with the 'Encapsulate What Varies' principle? What do you think about the chained 'if else' logic implementing the mapping algorithm?

Immediately, one thing that is apparent is that the function 'GetGrades' can be considered to be doing two things. It determines whether the input 'Marks' falls within a certain range, and it also determines which 'Grade' to return. This means that the function must change whenever either of these requirements (a.k.a. business rules) changes.

To appreciate the inherent problem in this code, let's consider that the mapping of Marks to Grades changes, and Marks between 40-49 are also considered as "E". Obviously our 'GetGrades' function changes. If the 'Mark' ranges are now required to be in steps of 20 instead of 10 (with corresponding adjustments in the 'Grades'), our 'GetGrades' function needs modification.

Whenever a module changes for more than one reason, it strongly indicates a suboptimal design.

Is it possible to make the logic inside of 'GetGrades' more robust in the event of change of the mapping of Marks to Grades? Can we make 'GetGrades' more maintainable?

Let's look at the implementation more carefully.

First, let's look at what varies.

Here, there are two things which vary.

-The Mark Range identified by beginning of a mark range and end of the mark range, and
-The mapping of the mark range to Grade

Now, let's look at what does not vary.

The logic to map Marks to a Grade does not vary. The logic simply checks if the input Mark falls in a certain range and returns the appropriate Grade.

So, now let us look at how to separate them out to comply with 'Encapsulate what varies' principle.

Firstly, let us use std::pair<int, int> to denote the concept of a Mark Range.

typedef std::pair<int, int> MarkRange;

and secondly, let's denote the concept of Grades by a simple typedef as shown

typedef std::string Grade;

So our problem statement now boils down to 'Given a MarkRange, retrieve the Grade'. This strongly calls for an association between MarkRange and Grade, which can be expressed in our code using an associative STL container such as std::map.

typedef std::map<MarkRange, Grade> MarkToGrade;

The client code can initialize an object of type MarkToGrade statically, as shown below (C++11), or dynamically, for example in a for loop.

MarkToGrade Grademap {
   make_pair(std::make_pair(0, 9), "F"),
   make_pair(std::make_pair(10, 19), "F"),
   make_pair(std::make_pair(20, 29), "F"),
   make_pair(std::make_pair(30, 39), "F"),
   make_pair(std::make_pair(40, 49), "F"),
   make_pair(std::make_pair(50, 59), "E"),
   make_pair(std::make_pair(60, 69), "D"),
   make_pair(std::make_pair(70, 79), "C"),
   make_pair(std::make_pair(80, 89), "B"),
   make_pair(std::make_pair(90, 100), "A"),
};

We are now almost done. All that is needed is to rewrite our immutable logic to determine the 'Grade' from the 'MarkRange' (C++11):

Grade GetGrades(int Marks)
{
   Grade g = "Grade Not Found";
   
   for(auto const &entry : Grademap)
   {
      if(Marks >= entry.first.first && Marks <= entry.first.second)
      {
         g = entry.second;
         break;
      }
   }
 
   return g;
}

Notice that the implementation is now much more robust in the event of a change. The function 'GetGrades' does not change if the Mark-to-Grade mapping is altered; only 'Grademap' changes. We say that 'GetGrades' is open for extension but closed for modification.

By encapsulating what varies and separating it from what stays the same, we have effectively achieved better organized code. We comply strongly with the Single Responsibility Principle and with the Open Closed Principle.

Here is the full listing of the code.

#include <iostream>
#include <map>
#include <string>
 
typedef std::string Grade;
typedef std::pair<int, int> MarkRange;
typedef std::map<MarkRange, Grade> MarkToGrade;
 
MarkToGrade Grademap {
   make_pair(std::make_pair(0, 9), "F"),
   make_pair(std::make_pair(10, 19), "F"),
   make_pair(std::make_pair(20, 29), "F"),
   make_pair(std::make_pair(30, 39), "F"),
   make_pair(std::make_pair(40, 49), "F"),
   make_pair(std::make_pair(50, 59), "E"),
   make_pair(std::make_pair(60, 69), "D"),
   make_pair(std::make_pair(70, 79), "C"),
   make_pair(std::make_pair(80, 89), "B"),
   make_pair(std::make_pair(90, 100), "A"),
};
      
Grade GetGrades(int Marks)
{
   Grade g = "Grade Not Found";
   
   for(auto const &entry : Grademap)
   {
      if(Marks >= entry.first.first && Marks <= entry.first.second)
      {
         g = entry.second;
         break;
      }
   }
 
   return g;
}
 
int main()
{
   std::cout << GetGrades(99) << std::endl;
   std::cout << GetGrades(9) << std::endl;   
}

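Obviously, this implementation is not perfect and has lots of scope for improvement. For example, the loop in 'GetGrades' is a linear scan that does not exploit the ordering std::map already maintains; keying the map on the lowest mark of each range and using std::map::upper_bound would make the lookup logarithmic. A hashed container such as C++11's std::unordered_multimap is less suitable here than it may seem: it offers fast lookup only by exact key, whereas we look up the range that contains a mark, and the standard library provides no std::hash specialization for std::pair. Nevertheless, the solution does highlight the refactorings that are relevant to the topic of this discussion, 'Encapsulate what varies'.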

Let me know your thoughts and comments.


Friday, 25 January 2013

Rvalue References - Learning 1


This post represents my first encounter with rvalue references. The following are noteworthy:

1. Function 'increment' can be overloaded on lvalue and rvalue reference parameters. The main function calls 'increment', and the appropriate overload is selected based on the value category (lvalue or rvalue) of the argument.
2. Rvalues prefer binding to rvalue references and lvalues prefer binding to lvalue references. In this example each argument can bind to only one of the two overloads anyway, since a non-const lvalue reference cannot bind to an rvalue and an rvalue reference cannot bind to an lvalue. Note that postfix operator ++ returns an unnamed temporary (the old value) and is therefore an rvalue, whereas prefix ++ returns the variable itself, an lvalue.
3. Lambda expressions can also be used at namespace scope and can be declared to accept lvalue and rvalue references, just like ordinary functions.
4. However, to invoke a lambda by name, we need to store the value of the lambda expression in a variable. Since the type of a lambda (its closure type) is unnamed and cannot be spelled out, the 'auto' keyword is convenient: it lets the compiler deduce the type of the variable from the initializing lambda expression.
5. Lambdas, however, cannot be overloaded, so the caller needs to use the appropriate lambda variable depending on whether the argument is an lvalue or an rvalue, for example:


#include <iostream>

void increment(int &x) {
   std::cout << "Lvalue reference" << std::endl;
   x = x + 1;
}

void increment(int &&x) {
   std::cout << "Rvalue reference" << std::endl;
   x = x + 1;
}

auto r1 = [](int &x) {
   std::cout << "Lvalue Lambda reference" << std::endl;
   x = x + 1;
};

auto r2 = [](int &&x) {
   std::cout << "Rvalue Lambda reference" << std::endl;
   x = x + 1;
};


int main()
{
   int x = 2;
   increment(x);        // calls Lvalue version
   increment(x++);      // calls Rvalue version
   increment(++x);      // calls Lvalue version
   increment(x + 1);    // calls Rvalue version

   r1(x);               // calls Lvalue version Lambda
   r2(x++);             // calls Rvalue version Lambda
   r1(++x);             // calls Lvalue version Lambda
   r2(x + 1);           // calls Rvalue version Lambda

   std::cout << x << std::endl;
}

The output of the code shown is

Lvalue reference
Rvalue reference
Lvalue reference
Rvalue reference
Lvalue Lambda reference
Rvalue Lambda reference
Lvalue Lambda reference
Rvalue Lambda reference
10

Thursday, 24 January 2013

C++ Templates - POI

In this post, I will briefly discuss some of the key concepts related to templates.

#include <iostream>
#include <type_traits>

// Primary Template
template <typename T = bool, typename U = int, typename V = long>
class Bark
{
public:
    void bark() {
        std::cout << 0;
    }
};

// Class template deriving from a specialization of Bark
template <typename U, typename V>
class Derived : public Bark<bool, U, V>     // <----Line1
{
public:
    void bark() {
        this->Bark<bool, U, V>::bark();
        std::cout << 1;
    }
};

#if 1
// Partial Specialization
template <typename U, typename V>           // <----Line2
class Bark<bool, U, V>
{
public:
    void bark() {
        static_assert(std::is_same<U, int>::value, "Types don't match");
        static_assert(std::is_same<V, long>::value, "Types don't match");
        std::cout << 2;
    }
};
#endif

// <----POIs (2) (3) (1)                    // <----Line3
int main()
{
    Derived<int, long> d;
    d.bark();      // This instantiates the partial specialization
                   // of the base class Bark; prints 21
    Bark<bool> b;
    b.bark();      // prints 2
}

In the code snippet above, we define the primary class template Bark that has three type parameters each having a default value.

The source code also defines a partial specialization of the primary class template Bark in which the type parameter T is fixed as 'bool'. There is nothing esoteric about this, except that this partial specialization can be used without specifying type arguments for U and V. This is because the missing arguments are filled in from the default template arguments of the primary class template: Bark<bool> means Bark<bool, int, long>, which then matches the partial specialization. Notice the type Bark<bool> used to define 'b' in main, which specifies only the first argument.

This important principle is used in several applications of templates. One notable example is the class template std::enable_if.

The second concept illustrated in the code is the Point of Instantiation. There are points in the source of template clients where a C++ compiler must have access to the declaration or the definition of a template entity. These are called POIs (Points of Instantiation).

When the compiler sees the declaration of 'd' in main, it must instantiate Derived<int, long>, which in turn requires its base class Bark<bool, int, long>. However, the partial specialization of Bark for T = bool, declared on Line2, is not visible at Line1, where Derived names its base class. If the compiler instantiated Bark<bool, U, V> at that point, it could only do so from the primary template, and that instantiation would then conflict with the partial specialization declared later on Line2.

The C++ standard has clear, if tersely worded, rules for the point of instantiation of templates. Let's look at the rules that apply to our code example.

There are three POIs in our code sample. These are

1) POI for the class template Derived<U, V>. Note that Derived is a class template

2) POI for the specialization of Bark<bool, U, V> and,

3) POI for the specialization of the member function Bark<bool, U, V>::bark

The C++ standard specifies that the
 
1) POI for Derived<U, V> is just before the beginning of the namespace scope function 'main' which refers to the instantiation of Derived<U, V>. This can be considered as the line marked as Line3, just before the 'main' function.

2) POI for specialization of Bark<bool, U, V> is just before the POI of 1) above

3) POI for specialization of the member function Bark<bool, U, V>::bark is the same as that of 2) above

The comment on the line just before 'main' indicates that the POIs for (2) and (3) are the same, and that the POI for (1) follows them.

The guiding clauses for the above observations are as follows:

a) §14.6.4.1/2 - "....Otherwise, the point of instantiation for such a specialization immediately precedes the namespace scope declaration or definition that refers to the specialization."

b) §14.6.4.1/2 - "For a class template specialization...if the context from which the specialization is referenced depends on a template parameter,...the point of instantiation is immediately before the point of instantiation of the enclosing template."

c) §14.6.4.1/1 - "...or a specialization for a member function...the point of instantiation of the specialization is the point of instantiation of the enclosing specialization."

With the POIs clearly defined, the compiler now sees the partial specialization of the primary template 'Bark' on Line2 while instantiating Derived<int, long>. This is because the user-specified partial specialization Bark<bool, U, V> is visible from the POI of Derived<int, long>, which is (1) on Line3.

Note that the specialization Bark<bool> (that is, Bark<bool, int, long>) is not instantiated again while defining 'b' in 'main'. This is because it has already been instantiated from the partial specialization Bark<bool, U, V> while instantiating Derived<int, long>.

It is important to realize that if the code within '#if 1' is put under '#if 0' (thereby suppressing the definition of the user-specified partial specialization), the resulting code is still valid. In this case, all the POIs are again on Line3, but since no user-provided partial specialization of Bark is visible, the compiler implicitly instantiates Bark<bool, int, long> from the primary class template Bark<T, U, V>, and the program prints 010 instead of 212.

Wednesday, 23 January 2013

Determining size of a C style array

Determining the size of a C style array at compile time is a requirement in many situations. Here are some of the possible ways to do this in C++11.


#include <iostream>

template<int N> struct ARRAYSIZEHELPER
{
    enum { mysize = N };
};
template<class T, int N> ARRAYSIZEHELPER<N> SizeHelper(T (&)[N]);


template<class T, int N> constexpr int SizeHelper2(T (&)[N])
{
    return N;
}

template<class T, int N> T (&SizeHelper3(T (&)[N]))[N];

// C++03 and C++11
#define ARRAYSIZE1(arg) (sizeof(arg)/sizeof(arg[0]))

// C++11 only
#define ARRAYSIZE2(arg) (decltype(SizeHelper(arg))::mysize)

// C++11 only
#define ARRAYSIZE3(arg) (SizeHelper2(arg))

// C++03 and C++11
#define ARRAYSIZE4(arg) (sizeof(SizeHelper3(arg))/sizeof(arg[0]))

int main()
{
    struct S {
        S() {}
        virtual void foo() {}
    private:
        int x;
    };

    // C style array of chars
    char buf1[10];

    // Using first method
    std::cout << ARRAYSIZE1(buf1) << std::endl;

    // using second method
    std::cout << ARRAYSIZE2(buf1) << std::endl;

    // using third method
    // constexpr facilitates determining size of buf2 at compile
    // time
    char buf2[2 * ARRAYSIZE3(buf1)];
    std::cout << ARRAYSIZE3(buf2) << std::endl;

    // constexpr facilitates determining size of buf2 at compile
    // time
    char buf3[2 * ARRAYSIZE3(buf2)];

    // using fourth method
    std::cout << ARRAYSIZE4(buf3) << std::endl;

    S buf4[60];
    // using fourth method
    std::cout << ARRAYSIZE4(buf4) << std::endl;
}

The first approach, ARRAYSIZE1, is the traditional approach used in lots of C and C++ code. However, it lacks type safety: it can be applied to a pointer argument (e.g. char *) and yield a wrong result without any indication whatsoever.

The second approach, ARRAYSIZE2, requires instantiation of ARRAYSIZEHELPER and relies on the decltype feature of C++11. In a realistic project this could pose a problem, as it could trigger many distinct instantiations of ARRAYSIZEHELPER whose only purpose is to report the size of their argument. This is arguably overkill.

The third approach, ARRAYSIZE3, requires instantiation of the function template SizeHelper2 and relies on the constexpr feature of C++11. Once again, in a realistic project this could trigger many distinct instantiations of SizeHelper2 whose only purpose is to return the size of their argument. This, too, is arguably overkill.

The last approach is my favourite. It translates the non-type template parameter 'N', which represents the size of the array argument, into the return type of the function template. Because sizeof does not evaluate its operand, SizeHelper3 only needs to be declared, never defined or called, which isn't much of a strain on the build process. Like ARRAYSIZE2 and ARRAYSIZE3, and unlike ARRAYSIZE1, it also refuses to compile when given a pointer.