Random learnings on C++

Monday 18 February 2013

FizzBuzz program

In this post, we will look at my implementation of the FizzBuzz program. Here is a brief background and writeup behind the very core idea of this seemingly simple programming assignment.

Very briefly the problem statement is to "Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz".

As is mentioned, "this sort of question won’t identify great programmers, but it will identify the weak ones. And that’s definitely a step in the right direction."

Here is what I could get in a short period of time after a couple of iterations. The nice thing which I like about my implementation is that it does not use any division or multiplication. It uses simple addition instead and there is no check for divisibility by 15 at all.

#include <iostream>

int main()
{
   int three {3}, five {5};
   std::string s, threes {"Fizz"}, fives{"Buzz"};
   for(int i = 1; i <= 100; i++)
   {
      s = "";
      if(i == three)
      {
         s = threes;
         three += 3;
      }
      
      if(i == five)
      {
         s += fives;
         five += 5;     
      } 
      
      if(!s.empty()) std::cout << s << std::endl;
      else std::cout << i << std::endl;
   }
}

Incidentally, the solution to this programming assignment in a variety of different languages here. Have a look for yourself.

Back to basics: Palindrome

In this post, we will look at the implementation of a program to determine if a given sequence of values is a palindrome.

Wikipedia defines palindrome as "... a word, phrase, number, or other sequence of units that may be read the same way in either direction, with general allowances for adjustments to punctuation and word dividers."

Note that palindrome applies not only to strings (strings of characters) but to sequences of units of other types as well. For the purpose of this article, we will ignore 'adjustments to punctuation and word dividers'

Here is a short program in C++11 that determines if a given sequence is a palindrome.

#include <iostream>
#include <string>
#include <vector>

class PalTester
{
public:
   template<class IT>
   bool IsPalindrome(IT itb, IT ite)
   {
      return isPalindrome(itb, ite);
   }
   
private:
   template<class IT>
   bool isPalindrome(IT itb, IT ite);
};

template<class IT>
bool PalTester::isPalindrome(IT itb, IT ite)
{
   --ite;
   
   while(itb < ite)
   {
      if(*ite != *itb)
      {
         break;
      }
      else
      {
         --ite;
         ++itb;
      }
   }

   return (ite <= itb);
}

int main()
{
   PalTester t;
   
   std::wstring s(L"MON");
   std::cout << t.IsPalindrome(s.begin(), s.end());
   
   std::vector<int> v{1, 2, 3, 4, 3, 2, 1};
   std::cout << t.IsPalindrome(v.begin(), v.end());
}

The important design highlights are as follows:

a) The interface of the class PalTester has a single member function template IsPalindrome which accepts an interator to the start and one past the end of any arbitrary sequence (similar to the way end of range iterators are handled in STL)

b) This makes the design to be consistent with the definition of a palidrome. Such an interface is also consistent with the design of several STL containers and algorithms which achieve container independence by working on appropriate iterator ranges. Note, how our example determines that the sequence of values (units of integer type) form a palindrome.

c) The function PalTester::IsPalindrome is a member function template. It requires bidirectional iterators which support prefix opertaor ++ and --. Also, the iterator needs to be deferenceable.

d) The elements of the sequence need to provide operator !=, opertaor < and operator <=.

c) Notice that the v is initialized using list initialization which is introduced in C++11.

Wednesday 30 January 2013

Encapsulate what varies

This article is motivated by a technical discussion that I happened to be part of, a couple of days back. It is about exploring the application of a very fundamental design principle which is "Identify the aspects of your application that vary and separate them from what stays the same". This is one principle that is at the heart of several other design principles and design patterns.

The real world problem that we discussed is about implementing a function to map student marks to grades. A student may score marks from 0 to 100 in an exam. The grade scheme is as follows:

Marks Grade

00 - 09 F

10 - 19 F

20 - 29 F

30 - 39 F

40 - 49 F

50 - 59 E

60 - 69 D

70 - 79 C

80 - 89 B

90 - 100 A

The requirement is to write a C++ function that accepts the students Mark as argument and returns the corresponding Grade.

std::string GetGrades(int Marks)

There are many many ways of doing this. One of the approaches involves writing a 'if then else' loop as shown. An alternate approach uses switch-case statements.

#include <iostream>
#include <string>
 
std::string GetGrades(int Marks)
{
   std::string Grade = "Grade Not Found";
 
   if(Marks >= 0 and Marks < 10) Grade = "F";
   else if(Marks >= 10 and Marks < 20) Grade = "F";
   else if(Marks >= 20 and Marks < 30) Grade = "F";
   else if(Marks >= 30 and Marks < 40) Grade = "F";
   else if(Marks >= 40 and Marks < 50) Grade = "F";
   else if(Marks >= 50 and Marks < 60) Grade = "E";
   else if(Marks >= 60 and Marks < 70) Grade = "D";
   else if(Marks >= 70 and Marks < 80) Grade = "C";
   else if(Marks >= 80 and Marks < 90) Grade = "B";
   else if(Marks >= 90 and Marks <= 100) Grade = "A";
 
   return Grade;
}
 
int main()
{
   std::cout << Grades(99) << std::endl;
}

What is your initial thought on this implementation? What are its strengths and weaknesses? Specifically, does this design comply with 'Encapsulate What Varies principle'? What do you think about the nested 'if else' logic implementing the mapping algorithm?

Immediately, one thing that is apparent is that the function 'GetGrades' can be considered to be doing two things. It determines if the input 'Marks' falls within a certain range. It also determines what 'Grade' to return. This means that this function changes when either of the above requirements (a.k.a business rules) change.

To appreciate the inherent problem in this code, let's consider that the mapping of Marks to Grades changes, and Marks between 40-49 are also considered as "E". Obviously our 'GetGrades' function changes. If the 'Mark' ranges are now required to be in steps of 20 instead of 10 (with corresponding adjustments in the 'Grades'), our 'GetGrades' function needs modification.

Whenever a module changes for more than one reason, it strongly indicates a suboptimal design.

Is it possible to make the logic inside of 'GetGrades' more robust in the event of change of the mapping of Marks to Grades? Can we make 'GetGrades' more maintainable?

Let's look at the implementation more carefully.

First, let's look at what varies.

Here, there are two things which vary.

-The Mark Range identified by beginning of a mark range and end of the mark range, and

-The mapping of the mark range to Grade

Now, let's look at what does not vary?

The logic to map Marks to a Grade does not vary. The logic simply checks if the input Mark falls in a certain range and returns the appropriate Grade.

So, now let us look at how to separate them out to comply with 'Encapsulate what varies' principle.

Firstly, let us use std::pair<int, int> to denote the concept of a Mark Range.

typedef std::pair<int, int> MarkRange;

and secondly, let's denote the concept of Grades by a simple typedef as shown

typedef std::string Grade;

So, now our problem statement now boils down to 'Given a MarkRange, Retrieve the Grade'. This very strongly calls out for an association between MarkRange and Grade. This can be depicted in our code using an associative STL container e.g. std::map.

typedef std::map<MarkRange, Grade> MarkToGrade;

The client code can statically (or dynamically in a for loop e.g.) initialize an object of type MarkToGrade as shown below (C++11).

MarkToGrade Grademap {
   make_pair(std::make_pair(0, 9), "F"),
   make_pair(std::make_pair(10, 19), "F"),
   make_pair(std::make_pair(20, 29), "F"),
   make_pair(std::make_pair(30, 39), "F"),
   make_pair(std::make_pair(40, 49), "F"),
   make_pair(std::make_pair(50, 59), "E"),
   make_pair(std::make_pair(60, 69), "D"),
   make_pair(std::make_pair(70, 79), "C"),
   make_pair(std::make_pair(80, 89), "B"),
   make_pair(std::make_pair(90, 100), "A"),
};

We are now almost done. All that is need is to rewrite our immutable logic to determine the 'Grade' from 'MarkRange' (C++11)

Grade GetGrades(int Marks)
{
   Grade g = "Grade Not Found";
   
   for(auto const &entry : Grademap)
   {
      if(Marks >= entry.first.first && Marks <= entry.first.second)
      {
         g = entry.second;
         break;
      }
   }
 
   return g;
}

Notice, that the implementation is now much more robust in the event of a change. The function 'GetGrades' does not change if the Mark to Grade mapping is altered. What changes is just the 'Grademap'. We say that 'GetGrades' is open for extension but closed for modification.

By encapsulating what varies and separating it out from what remains same, we have effectively achieved better organized code. We comply strongly with Single Responsibility Principle and with Open Closed principle.

Here is the full listing of the code.

#include <iostream>
#include <map>
#include <string>
 
typedef std::string Grade;
typedef std::pair<int, int> MarkRange;
typedef std::map<MarkRange, Grade> MarkToGrade;
 
MarkToGrade Grademap {
   make_pair(std::make_pair(0, 9), "F"),
   make_pair(std::make_pair(10, 19), "F"),
   make_pair(std::make_pair(20, 29), "F"),
   make_pair(std::make_pair(30, 39), "F"),
   make_pair(std::make_pair(40, 49), "F"),
   make_pair(std::make_pair(50, 59), "E"),
   make_pair(std::make_pair(60, 69), "D"),
   make_pair(std::make_pair(70, 79), "C"),
   make_pair(std::make_pair(80, 89), "B"),
   make_pair(std::make_pair(90, 100), "A"),
};
      
Grade GetGrades(int Marks)
{
   Grade g = "Grade Not Found";
   
   for(auto const &entry : Grademap)
   {
      if(Marks >= entry.first.first && Marks <= entry.first.second)
      {
         g = entry.second;
         break;
      }
   }
 
   return g;
}
 
int main()
{
   std::cout << GetGrades(99) << std::endl;
   std::cout << GetGrades(9) << std::endl;   
}

Obviously, this implementation is not perfect and has lots of scope of improvement. As an example, the design can be refactored to use an associative container such as the C++11 std::unordered_multimap which is more suited for faster searches (as we can expect more searches than insertion/deletion in our problem domain). Nevertheless, the solution does highlight the refactorings which are relevant to the topic of this discussion 'Encapsulate what varies'.

Let me know your thoughts and comments.

Friday 25 January 2013

Rvalue References - Learning 1

This post represents my first encounter with rvalue references. The following are noteworthy:

1. Function 'increment' can be overload based on lvalue and rvalue reference. The main function calls 'increment' and based on the type of the operand, the appropriate overload is selected.
2. Rvalues prefer binding to Rvalue reference and Lvalues prefer binding to Lvalue reference. Note that postfix operator ++ returns an unnamed temporary and is an Rvalue
3. Lambda functions can be declared in namespace scope and can also be declared to accept Lvalue and Rvalue references, just like ordinary functions
4. However, to invoke the lamda function by name, we need to store the value of the lambda expression in a variable. Since it is very difficult to spell out the exact type of the lambda function, the 'auto' keyword is convenient, which enables the compiler to auto deduce the type of the variable based on the type of the initializer lambda expression
5. Using lambda functions however does not allow overloading, and the caller needs to use the appropriate lambda expression variable based on whether the argument is Lvalue or Rvalue e.g.

#include <iostream>

void increment(int &x) {
   std::cout << "Lvalue reference" << std::endl;
   x = x + 1;
}

void increment(int &&x) {
   std::cout << "Rvalue reference" << std::endl;
   x = x + 1;
}

auto r1 = [](int &x) {   std::cout << "Lvalue Lambda reference" << std::endl;
   x = x + 1;
};


auto r2 = [](int &&x) {   std::cout << "Rvalue Lambda reference" << std::endl;
   x = x + 1;
};


int main()
{
   int x = 2;
   increment(x);        // calls Lvalue version
   increment(x++);      // calls Rvalue version
   increment(++x);      // calls Lvalue version
   increment(x + 1);    // calls Rvalue version

   r1(x);               // calls Lvalue version Lambda
   r2(x++);             // calls Rvalue version Lambda
   r1(++x);             // calls Lvalue version Lambda
   r2(x + 1);           // calls Lvalue version Lambda

   std::cout << x << std::endl;
}

The output of the code shown is

Lvalue reference
Rvalue reference
Lvalue reference
Rvalue reference
Lvalue Lambda reference
Rvalue Lambda reference
Lvalue Lambda reference
Rvalue Lambda reference
10

Thursday 24 January 2013

C++ Templates - POI

In this post, I will briefly discuss about some of the key concepts related to templates.

// Primary Template

template <typename T = bool, typename U = int, typename V = long>
class Bark
{
public:
    void bark() {
        std::cout << 0;
    }
};

// Partial Specialization
template <typename U, typename V>
class Derived : public Bark<bool, U, V>    // <----Line1
{
public:
    void bark() {
        this->Bark<bool, U, V>::bark();
        std::cout << 1;
    }
};

#if 1
template<typename U, typename V>           // <----Line2
class Bark<bool, U, V>
{
public:
    void bark() {
        static_assert(std::is_same<U, int>::value, "Types don't match");
        static_assert(std::is_same<V, long>::value, "Types don't match");
        std::cout << 2;
    }
};
#endif

// <----POIs (2) (3) (1)                   // <----Line3
int main()
{
    Derived<int, long> d;
    d.bark();
    // This instantiates the partial specialization
    // of class Base
    Bark<bool> b;
    b.bark();   // prints 2
}

In the code snippet above, we define the primary class template Bark that has three type parameters each having a default value.

The source code also defines a partial template specialization of the primary class template Bark where the type parameter T is fixed as 'bool'. There is nothing esoteric about this except for noticing the fact that this partial class template specialization can be instantiated without specifying the type arguments for U and V. This is because, the partial class template specialization for Bark picks up the default parameters for U and V from the primary class template. Notice the type Bark<bool> used to define 'b' in main which just specifies only one argument (the first argument) while instantiating the partial class template specialization.

This important principle is used in several applications of templates. One notable use of this property is in the usage of the class template enable_if.

The second concept illustrated in the code pertains to the concept of what is known as the Point of Instantiation. There are points in the source of template clients where a C++ compiler must have access to the declaration or the definition of a template entity. These are called POI (Point of Instantiation).

When the compiler sees the declaration of 'd' in main, it tries to instantiate the primary class template Bark. Instantiation of this implicitly requires instantiating a specialization of Bark with T = bool. However, the explicitly declared partial specialization of Bark for T = bool on Line 2 is not visible at Line1. The compiler can however try to explicitly generate a partial specialization of Bark for T = bool. However, that would then lead to duplicate declaration later on Line2.

The C++ standard has clear rules (very terse to read) for Point of instantiation of templates. Let's understand some of the rules applicable in our code example.

There are three POIs in our code sample. These are

1) POI for the class template Derived<U, V>. Note that Derived is a class template

2) POI for the specialization of Bark<bool, U, V> and,

3) POI for the specialization of the member function Bark<bool, U, V>::bark

The C++ standard specifies that the

1) POI for Derived<U, V> is just before the beginning of the namespace scope function 'main' which refers to the instantiation of Derived<U, V>. This can be considered as the line marked as Line3, just before the 'main' function.

2) POI for specialization of Bark<bool, U, V> is just before the POI of 1) above

3) POI for specialization of the member function Bark<bool, U, V>::bark is the same as that of 2) above

The commented line just before 'main' indicates that the POI for (2) and (3) are the same, and the POI for (1) follows the POI for (2) and (3).

The guiding clauses for the above observations are as follows:

a) $14.6.4.1/2 - "....Otherwise, the point of instantiation for such a specialization immediately precedes the namespace scope declaration or definition that refers to the specialization."

b) $14.6.4.1/2 - "For a class template specialization...if the context from which the specialization is referenced depends on a template parameter,...the point of instantiation is immediately before the point of instantiation of the enclosing template."

c) $14.6.4.1/1 - "...or a specialization for a member function...the point of instantiation of the specialization is the point of instantiation of the enclosing specialization."

With the POIs clearly defined, the compiler now sees the partial specialization of the primary template 'Bark' on Line2 while instantiating Derived<U, V>. This is because the user specified partial specialization Bark<bool, U, V> is visible from the POI of Derived<U, V> which is (c) on Line3.

Note that the declaration Bark<bool> partial specialization is not instantiated again while defining 'b' in 'main'. This is because, the partial specialization Bark<bool, U, V> has already been instantiated while instantiating the Derived<int, long>.

It is important to realize that if the code within #if1 is put under #if 0 (thereby suppressing the definition of the user specified partial class specialization), the resulting code is still valid. In this case, once again all the POIs are on Line3, but since no user provided partial specialization of Bark is visible, the compiler generates the partial specialization of Bark for T = bool from the primary class template Bark<T, U, V>

Wednesday 23 January 2013

Determining size of a C style array

Determining the size of a C style array at compile time is a requirement in many situations. Here are some of the possible ways to do this in C++11.

#include <iostream>

template<int N> struct ARRAYSIZEHELPER
{
    enum { mysize = N };
};
template<class T, int N> ARRAYSIZEHELPER<N> SizeHelper(T (&)[N]);


template<class T, int N> constexpr int SizeHelper2(T (&)[N])
{
    return N;
}

template<class T, int N> T (&SizeHelper3(T (&)[N]))[N];

// C++03 and C++11
#define ARRAYSIZE1(arg) (sizeof(arg)/sizeof(arg[0]))

// C++11 only
#define ARRAYSIZE2(arg) (decltype(SizeHelper(arg))::mysize)

// C++11 only
#define ARRAYSIZE3(arg) (SizeHelper2(arg))

// C++03 and C++11
#define ARRAYSIZE4(arg) (sizeof(SizeHelper3(arg))/sizeof(arg[0]))

int main()
{
    struct S {
        S() {}
        virtual void foo() {}
    private:
        int x;
    };

    // C style array of chars
    char buf1[10];

    // Using first method
    std::cout << ARRAYSIZE1(buf1) << std::endl;

    // using second method
    std::cout << ARRAYSIZE2(buf1) << std::endl;

    // using third method
    // constexpr facilitates determining size of buf2 at compile
    // time
    char buf2[2 * ARRAYSIZE3(buf1)];
    std::cout << ARRAYSIZE3(buf2) << std::endl;

    // constexpr facilitates determining size of buf2 at compile
    // time
    char buf3[2 * ARRAYSIZE3(buf2)];

    // using fourth method
    std::cout << ARRAYSIZE4(buf3) << std::endl;

    S buf4[60];
    // using fourth method
    std::cout << ARRAYSIZE4(buf4) << std::endl;
}

The first approach using ARRAYSIZE1 is the traditional approach used by lots of C and C++ code. It however lacks type safety since it can be used with a pointer argument (e.g. char *) and yield a wrong result without any indication whatsoever.

The second approach using ARRAYSIZE2 requires instantiation of ARRAYSIZEHELPER. It relies on the decltype feature of C++11. In any realistic project, this could pose a problem as it could trigger several unique instantiations of ARRAYSIZEHELPER which serve no other purpose other than returning the size of their argument. This is surely an overkill.

The third approach using ARRAYSIZE3 requires instantiation of the function template SizeHelper2 and relies on the constexpr feature of C++11. Once again, in any realistic project, this could pose a problem as it could trigger several unique instantiations of SizeHelper2 function template, which serve no other purpose other than returning the size of their argument. This is surely an overkill.

The last approach is my favourite. It translates the non type template parameter 'N' representing the size of the array function argument into the return type of the function template instantiation. Also, it just triggers the declaration of the prototype of the function template SizeHelper3 which isn't so much of a strain on the build process.