Design patterns – Victor Laskin's Blog (http://vitiy.info): programming, architecture and design (C++, QT, .Net/WPF, Android, iOS, NoSQL, distributed systems, mobile development, image processing, etc.)

Separating constraints, iterations and data (C++11) (http://vitiy.info/separating-constraints-iterations-and-data-cpp11/) – Sun, 17 May 2015

Two recent posts in Bartosz's programming cafe describe a nice application of the list monad to solve the following puzzle:

SEND + MORE = MONEY. Each letter corresponds to a single digit. There are a lot of ways to solve this. Bartosz uses the list monad, which is very similar to the list comprehension methods described here. While this may not be the fastest way to solve this specific puzzle, his approach shows how to solve a large cluster of similar but smaller problems which we meet at the everyday "production" level. The SEND+MORE problem is perhaps not the best example of the power of the list monad, because of one very important problem I want to discuss in this post.

Let's rephrase the puzzle: we have 8 different variables and have to find all combinations which satisfy some constraints.

The straightforward solution is to form all possible combinations of variables and filter them using the constraint conditions. To form such combinations we use general iteration over the list of possible values.

The problem: when the list of values is not small, or the count of variables is greater than 3, we can face a performance problem as the iterations become too long.

The SEND+MORE puzzle is close enough in size to meet this problem, but a modern CPU can still handle it the straightforward way.

SLIGHTLY DIFFERENT WAY

The main idea is to separate iterations, data and constraints. And most importantly: we need to split the one global constraint into smaller constraints and apply them as early as possible.

And while doing that I want to maintain the simplicity of Bartosz's solution.

To make it all work I will use some tools from my previous post. The main parts are currying of functions and functional piping.

DATA:

// 
using sInt = std::shared_ptr<int>;

// the list of possible values
vector<int> digits = {0,1,2,3,4,5,6,7,8,9};

// variables to find
sInt s,e,n,d,m,o,r,y;
        
// additional vars (described further)
sInt r0,r1,r2,r3;

// fill variables (0)
for_each_argument_reference([](sInt& i){ make(i,0); }, s,e,n,d,m,o,r,y,r0,r1,r2,r3);

I use shared_ptr to access data here – that's not important in this particular example; even raw pointers would show the idea. The important parts are the list of possible values, called digits, and the pointers to all the variables.

Next – let’s define CONSTRAINTS:

// No constraint
auto any = to_fn([](sInt x){ return true; });

A constraint is a function which returns true if the given value passes the condition. So the constraint any gives a green light to every integer. The to_fn function here just converts a lambda into std::function.

// This is how we add numbers digit by digit:
// 0  + d + e = y + r1 * 10
// r1 + n + r = e + r2 * 10
// r2 + e + o = n + r3 * 10
// r3 + s + m = o + m * 10

auto fn_constraint = to_fn([](sInt r0, sInt x, sInt y, sInt z, sInt r){
     // r0 + x + y = z + r * 10
     *r = *r0 + *x + *y - *z;
     if (*r == 0) return true;
     if (*r == 10) { *r = 1; return true; };
     return false;
});
    
const auto constraint = fn_to_universal(fn_constraint);

Instead of trying to invent some tricky constraints we go a very simple and logical way – our constraint just defines how we add decimal numbers, nothing more and nothing less. r0, r1, r2, r3 are the carry digits which go to the next column during addition.

The only 'not so nice' step here is setting r through a pointer. This is done so that it can be used in the following, deeper constraints.

After the definition I wrap the function into a universal class which can handle currying and piping – see this post for details.

The last column of digits is an exception, so we have to define a separate constraint for it:

auto fn_last_constraint = to_fn([](sInt r0, sInt x, sInt y, sInt z){
    // r0 + x + y = z + y * 10
    return (*y != 0) && (*r0 + *x + *y == *z + *y * 10);
});
const auto last_constraint = fn_to_universal(fn_last_constraint);

Note that we also check that the first digit is non-zero.

So finally, instead of one global constraint, we have 4 smaller constraints and can apply them sooner to decrease the number of iterations.

ITERATIONS

Functional iterator is simple:

void fn_pick(sInt x, function<bool(sInt)> constraint, function<void(vector<int>)> process, vector<int> list)
{
    for (auto item : list)
    {
        *x = item;
        if (constraint(x))
            process(list | filter >> [&](int el){ return (el != item); });
    }
}
  
fn_make_universal(pick, fn_pick);

This function just picks every possible value from the list, applies the constraint, and if the check passes, calls the process method with the reduced list of values (which no longer contains the picked value).

The last piece is the function to print the result:

auto printResult = [&](vector<int> list){ printf("RESULT %i%i%i%i + %i%i%i%i = %i%i%i%i%i \n", *s,*e,*n,*d,*m,*o,*r,*e,*m,*o,*n,*e,*y); };

FINALLY

digits | pick << d << any <<
        (pick << e << any <<
        (pick << y << (constraint << r0 << d << e >> r1) <<
        (pick << n << any <<
        (pick << r << (constraint << r1 << n >> e >> r2) <<
        (pick << o << (constraint << r2 << e >> n >> r3) <<
        (pick << s << any <<
        (pick << m << (last_constraint << r3 << s >> o )
        << printResult )))))));

// RESULT 9567 + 1085 = 10652

Sorry that I'm using my '<<' notation for currying here, as it might not be the ideal solution; I hope it will not prevent you from understanding the idea of segregation. Of course, the operator overloading could be changed to some other notation. Note that I use left and right currying together inside the constraints.

This is compact enough to show the main idea – decomposing iterations and constraints.

My debug build solves this puzzle in 7 ms.

PS. What I don't like about this solution is that it uses pointers too much. We could change the design to pass data along the functional chain without pointers, but that would make the solution a bit more complicated. Maybe I will fix it later. I'm also looking for a way to get rid of the ')))))))' stuff.

PS2: Whole puzzle solution together:

using sInt = std::shared_ptr<int>;
    
// ITERATIONS
void fn_pick(sInt x, function<bool(sInt)> constraint, function<void(vector<int>)> process, vector<int> list){
    for (auto item : list)
    {
        *x = item;
        if (constraint(x))
           process(list | filter >> [&](int el){ return (el != item); });
    }
}

fn_make_universal(pick, fn_pick);

// DATA
vector<int> digits = {0,1,2,3,4,5,6,7,8,9};
sInt s,e,n,d,m,o,r,y;
sInt r0,r1,r2,r3;
for_each_argument_reference([](sInt& i){ make(i,0); }, s,e,n,d,m,o,r,y,r0,r1,r2,r3);

// CONSTRAINTS
auto any = to_fn([](sInt x){ return true; });

auto fn_constraint = to_fn([](sInt r0, sInt x, sInt y, sInt z, sInt r){
    // r0 + x + y = z + r * 10
    *r = *r0 + *x + *y - *z;
    if (*r == 0) return true;
    if (*r == 10) { *r = 1; return true; };
    return false;
});
const auto constraint = fn_to_universal(fn_constraint);

auto fn_last_constraint = to_fn([](sInt r0, sInt x, sInt y, sInt z){
     // r0 + x + y = z + y * 10
     return (*y != 0) && (*r0 + *x + *y == *z + *y * 10);
});
const auto last_constraint = fn_to_universal(fn_last_constraint);

// print out the result
auto printResult = [&](vector<int> list){ printf("RESULT %i%i%i%i + %i%i%i%i = %i%i%i%i%i \n", *s,*e,*n,*d,*m,*o,*r,*e,*m,*o,*n,*e,*y); };
     
// ROCK&ROLL
digits | pick << d << any <<
        (pick << e << any <<
        (pick << y << (constraint << r0 << d << e >> r1) <<
        (pick << n << any <<
        (pick << r << (constraint << r1 << n >> e >> r2) <<
        (pick << o << (constraint << r2 << e >> n >> r3) <<
        (pick << s << any <<
        (pick << m << (last_constraint << r3 << s >> o )
      << printResult )))))));

PS3. Bartosz’s programming cafe is a very good place to visit.

 

 

 

 

 

Templates as first-class citizens in C++11 (http://vitiy.info/templates-as-first-class-citizens-in-cpp11/) – Wed, 04 Mar 2015

C++11 treats functions as first-class citizens, and this gives us the ability to construct a lot of nice things, such as design patterns borrowed from functional languages. Meanwhile, C++ has very powerful template metaprogramming. This post aims to bring templated functions closer to first-class citizens, and to show some of the simplicity and beauty you can get from that.

There will also be an implementation of currying for such functions! If you don't know what currying is, just think of std::bind.

And to make it shine we'll add piping (this article improves some ideas from the post about functional piping). This step is optional and you can replace such piping with commas and function calls.

TEMPLATE TO SIMPLE FUNCTOR

Ok, let's start from a very simple template function.

template <typename T>
void print(T t)
{
    cout << t << endl;
}

I will use different styles of code colouring here to indicate the difference between example code and the inner implementation, which is similar for all cases.

To pass it to some function we need to wrap it into a functor:

class tfn_print { 
public: 
   template <typename... Args> 
   auto operator()(Args&&... args) const ->decltype(print(std::forward<Args>(args)...))  
   { 
       return print(std::forward<Args>(args)...); 
   } 
};

Here operator() is overloaded to pass all arguments to the print template. As we will modify this functor to cover all cases, the list of arguments is provided using variadic templates. Note: you could define all functions inside such wrappers from the start, but there is also a way to do it through a macro:

#define make_citizen(X) class tfn_##X { public: template <typename... Args> auto operator()(Args&&... args) const ->decltype(X(std::forward<Args>(args)...))  { return X(std::forward<Args>(args)...); } }

make_citizen(print);

This small macro creates a class from a templated function.

An important note for macro haters: yes, macros should be avoided when you can substitute them with variadic templates and other new features. But there are still cases where you can't. So when a couple of very small, obvious macro definitions make your code considerably smaller, more readable and more maintainable – use them. Such cases are rare, and when they occur the C++ committee should look at them and add some means to the language itself.

Example:  

// testing print...
{
    tfn_print print;
    print(5);
    print("hello");
}

We redefined print in a smaller scope as an instance of an object. Now we can use it as a function and pass it as an argument to another function. Note that the same function is used with arguments of different types, so there is no need to create distinct functors.

Next, we will create some additional instruments to work with such 'templated' functors more effectively.

PASS TUPLE TO FUNCTION AS ARGUMENT LIST

As we will need this ability later, let's write a simple function which expands an std::tuple and feeds the results into a given function.

// apply tuple to function...
    
namespace fn_detail {
        
        template<int ...>
        struct int_sequence {};
        
        template<int N, int ...S>
        struct gen_int_sequence : gen_int_sequence<N-1, N-1, S...> {};
        
        template<int ...S>
        struct gen_int_sequence<0, S...> {
            typedef int_sequence<S...> type;
        };
        
        template <typename F, typename... Args, int... S>
        inline auto fn_tuple_apply(int_sequence<S...>, const F& f, const std::tuple<Args...>& params) -> decltype( f((std::get<S>(params))...) )
        {
            return f((std::get<S>(params))...);
        }
        
}
    
template <typename F, typename... Args> 
inline auto fn_tuple_apply(const F& f, const std::tuple<Args...>& params) -> decltype( f(std::declval<Args>()...) )
{
    return fn_detail::fn_tuple_apply(typename fn_detail::gen_int_sequence<sizeof...(Args)>::type(), f, params);
}

In C++14 this could be done a bit more briefly using std::integer_sequence, but at the moment I'm forced to use plain C++11 (to compile for Android, for example). The sequence of integer indices S is constructed through template recursion, and then passed as an additional argument for unpacking. This is just one of the possible implementations.

Usage:

auto f = [](int x, int y, int z) { return x + y - z; };
auto params = make_tuple(1,2,3);
auto res = fn_tuple_apply(f, params);
print(res);

// Result: 0

The next step is the idea that you can combine function input parameters into a tuple in several steps instead of one std::make_tuple call.

TUPLE CONCATENATION

For concatenation of tuples C++11 has a function named std::tuple_cat. To make things more compact we can overload the << operator to add a new parameter to a tuple.

// tuple concatenation via << operator
template<typename... OldArgs, typename NewArg>
tuple<OldArgs...,NewArg> operator<<(const tuple<OldArgs...> & t, const NewArg& arg)
{
    return std::tuple_cat(t, std::make_tuple(arg));
}

Usage:

auto list = make_tuple(1,4);
auto res2 = fn_tuple_apply(f, list << 4); // f(1,4,4)
print(res2);

// Result: 1

FUNCTIONAL PIPELINE 

See the post about functional piping to get the idea (we use a different implementation for piping here). Anyway, this step is optional and you can replace such piping with commas and function calls, but piping looks very nice here.

The first simple overload provides the ability to pipe a single argument into a function:

// pipe single argument into function via | operator
template<typename T, class F>
auto operator|(T&& param, const F& f) -> decltype(f(param)) 
{
    return f(std::forward<T>(param));
}

Usage:

// Return count of elements as templated operator
template <typename T>
int count(const T& container)
{
    return container.size();
}
    
make_citizen(count);

{
    tfn_count count;
    vector<string> slist = {"one", "two", "three"};
    slist | count | print;
}

// Result: 3

The new sample template function count() just returns the size() of the provided collection. We feed the list of three strings into the count function and then pipe the result into print().

This is nice, but the application seems rather limited, as functions tend to have more than one argument. Let's solve this using some curry…

CURRYING


Currying in C++11 is usually done using std::bind, which is flexible and useful in a lot of cases. But we can't use it for templates. So let's create a wrapper class which implements currying and provides the simplest way to work with it inside pipelines.

The general way (as in Haskell) is to curry from the left argument to the right, so the call f(1,2,3) is equivalent to f(1)(2)(3). But in the piping example everything is slightly different: f(1,2,3) is equivalent to 1 | f(2,3). In other words, I want to curry both ways – left and right. Yes, std::bind gives the ability to specify any order, but at the price of a somewhat longer syntax than is required in the general case. And Haskell's syntax is not so good either, imho, because it's hard for a programmer to see the difference between currying and a function call. So here I'm using a different syntax (and this decision is optional) where there are separate operators for function call and argument currying.

The next class is called fn_universal, as a representation of some function which is polymorphic (in the sense that it can accept arguments of different types) and can be used with currying and piping. This class may be extended further later. You might want to rename it to fn_curried or something similar.

Of course, to hold the arguments we will use tuples. The implementation:

// universal function / extended function wrapper !
template<typename F, typename TPackBefore = std::tuple<>, typename TPackAfter = std::tuple<>>
class fn_universal  {
    private:
        F f;                            ///< main functor
        TPackAfter after;               ///< arguments curried to the right
        TPackBefore before;             ///< arguments curried to the left
    public:
        
        fn_universal(F && f) : f(std::forward<F>(f)), after(std::tuple<>()), before(std::tuple<>()) {}
        
        fn_universal(const F & f, const TPackBefore & before, const TPackAfter & after) : f(f), after(after), before(before) {}
        
       
        template <typename... Args>
        auto operator()(Args... args) const -> decltype(
            fn_tuple_apply(f, std::tuple_cat(before, make_tuple(args...), after))
        ) {
            // execute via tuple
            return fn_tuple_apply(f, std::tuple_cat(before, make_tuple(std::forward<Args>(args)...), after));
        }
        
        
        // curry
        
        template <typename T>
        auto curry(T && param) const -> decltype(fn_universal<F,decltype(std::tuple_cat(before, std::make_tuple(param))),TPackAfter>(f, std::tuple_cat(before, std::make_tuple(param)), after))
        {
            return fn_universal<F,decltype(std::tuple_cat(before, std::make_tuple(param))),TPackAfter>(f, std::tuple_cat(before, std::make_tuple(std::forward<T>(param))), after);
        }
        
        
        template <typename T>
        auto curry_right(T && param) const -> decltype(fn_universal<F, TPackBefore, decltype(std::tuple_cat(after, std::make_tuple(param)))>(f, before, std::tuple_cat(after, std::make_tuple(param))))
        {
            return fn_universal<F, TPackBefore, decltype(std::tuple_cat(after, std::make_tuple(param)))>(f, before, std::tuple_cat(after, std::make_tuple(std::forward<T>(param))));
        }
    
};

Note that the class is immutable.

The main nice thing about this class: there is no restriction on the types or the number of arguments. All it does is combine the provided arguments into two tuples – left and right parameters. And when the time comes to execute the function, we just combine everything into a single tuple and feed this tuple into the function. So you can even curry a function which takes a variadic number of arguments!

And to provide an easy interface for currying I use the following operator overloads:

// left curry by << operator
template<typename UF, typename Arg>
auto operator<<(const UF & f, Arg && arg) -> decltype(f.template curry<Arg>(std::forward<Arg>(arg)))
{
    return f.template curry<Arg>(std::forward<Arg>(arg));
}
    
// right curry by >> operator
template<typename UF, typename Arg>
auto operator>>(const UF & f, Arg && arg) -> decltype(f.template curry_right<Arg>(std::forward<Arg>(arg)))
{
    return f.template curry_right<Arg>(std::forward<Arg>(arg));
}

Let's add a small builder helper. Also (very optionally) I'll add a definition macro to make the examples a bit shorter.

template <typename F>
auto fn_to_universal(F && f) -> fn_universal<F>
{
    return fn_universal<F>(std::forward<F>(f));
}

#define make_universal(NAME, F) make_citizen(F); const auto NAME = fn_to_universal(tfn_##F());

This line just defines a new 'universal' function from a given template function. You could change this to a more convenient form.

EXAMPLES FOR CURRYING

Trivial examples:

// currying....
auto f = [](int x, int y, int z) { return x + y - z; };
auto uf = fn_to_universal(f);
auto uf1 = uf << 1;
auto uf2 = uf1 << 2 << 5;
uf2() | print;
// result: -2

// Piping:
      
1 | (uf << 4 << 6) | print; // 4+6-1 = 9
        
3 | (uf >> 6 >> 7) | print; // 3+6-7 = 2

Note the order of arguments. Not so complicated to read.

Now let's write a couple of template functions to get some realistic examples. They will look like functional operators. I could even mark these functions as inner implementation, as they are so common, but to underline that you have the ability to control this behaviour yourself I leave them marked as samples:

// MAP
template <typename T, typename... TArgs, template <typename...>class C, typename F>
auto fn_map(const C<T,TArgs...>& container, const F& f) -> C<decltype(f(std::declval<T>()))>
{
        using resultType = decltype(f(std::declval<T>()));
        C<resultType> result;
        for (const auto& item : container)
            result.push_back(f(item));
        return result;
}
    
// REDUCE (FOLD)
template <typename TResult, typename T, typename... TArgs, template <typename...>class C, typename F>
TResult fn_reduce(const C<T,TArgs...>& container, const TResult& startValue, const F& f)
{
        TResult result = startValue;
        for (const auto& item : container)
            result = f(result, item);
        return result;
}
    
// FILTER
template <typename T, typename... TArgs, template <typename...>class C, typename F>
C<T,TArgs...> fn_filter(const C<T,TArgs...>& container, const F& f)
{
        C<T,TArgs...> result;
        for (const auto& item : container)
            if (f(item))
                result.push_back(item);
        return result;
}

    
make_universal(fmap, fn_map);
make_universal(reduce, fn_reduce);
make_universal(filter, fn_filter);

It's obvious that such primitives can be reused, avoiding a lot of code duplication.

And one minor function – the sum of all arguments:

template <typename T, typename... Args>
T sum_impl(T arg, Args... args)
{
    T result = arg;
    [&result](...){}((result += args, 0)...);
    return result;
}
    
make_universal(sum, sum_impl);

A more trivial implementation with a couple of overloads would do here, but to show that there is no limitation on the number of arguments I leave it like this.

We can also modify print to handle any number of arguments:

template <typename... Args>
void print(Args... args)
{
    (void)(int[]){((cout << args), 0)...}; cout << endl;
}

auto uprint = fn_to_universal(print);

AND NOW: Let’s try this in action:

vector<string> slist = {"one", "two", "three"};

// all strings as one 
slist | (reduce >> string("") >> sum) | (uprint << "All: ");
// All: onetwothree

// sum of elements of array
vector<int>{1,2,3} | (reduce >> 0 >> sum) | (uprint << "Sum: ");
// Sum: 6

// count sum length of all strings in the list
slist | (fmap >> count) | (reduce >> 0 >> sum) | (uprint << "Total: " >> " chars");
// Total: 11 chars

This is quite readable, functional-style code – not what one would expect from C++.

So the idea is that now we can write small universal templates and combine them in pipelines as we like. Looks great!

Consider small templates as small universal building blocks.

More examples…

Let's assume we are working with some business data, like a collection of users.

template <typename T, typename TName>
bool isNameEqualImpl(const T& obj, const TName& name)
{
    return (obj->name == name);
}
    
make_universal(isName, isNameEqualImpl);
    
template <typename T, typename TId>
bool isIdEqualImpl(const T& obj, const TId& id)
{
    return (obj->id == id);
}
    
make_universal(isId, isIdEqualImpl);
    
template <typename T>
bool isNotNullImpl(const T& t)
{
    return (t != nullptr);
}
    
make_universal(isNotNull, isNotNullImpl);
    
    
template <typename F, typename TKey, typename T, typename... TArgs, template <typename...>class C>
T findOneImpl(const C<T,TArgs...>& container, const F& f, const TKey& key)
{
    for (const auto& item : container)
       if (f(item, key))
           return item;
    return nullptr;
}
    
make_universal(ffind, findOneImpl);

The first 3 methods are trivial – checking that the name or id fields are equal to given values, and that the whole object is not null.

The last one is a find method over a container (we assume here that business data items are represented as immutable structures inside smart pointers). It receives a function as a validation functor and a key argument to pass to this validator. Let's see how to use it:

// example data
vector<User> users {make<User>(1, "John", 0), make<User>(2, "Bob", 1), make<User>(3, "Max", 1)};
        
users | (filter >> (isName >> "Bob")) | ucount | uprint; // 1
        
users | (filter >> (isId >> 13)) | ucount | uprint; // 0
        
vector<int>{1,2,6} | (fmap >> (ffind << users << isId)) | (filter >> isNotNull) | ucount | uprint; // 2

Such examples hardly require an explanation, I suppose!

We can convert user names to xml like this:

string xmlWrapImpl(string name, string item)
{
    return "<" + name + ">" + item + "</" + name + ">";
}
    
make_universal(xmlWrap, xmlWrapImpl);

// Produce XML
users | fmap >> [](User u){ return u->name; } | fmap >> (xmlWrap << "name") | reduce >> string("") >> sum | xmlWrap << "users" | print;

// result: <users><name>John</name><name>Bob</name><name>Max</name></users>

As you can see, we can still use lambdas inside expressions.

FUNCTIONAL CHAINS


Now let's add one more important element to the proposed scheme. I want to be able to compose a pipeline of functions, store it under some name, and then call it multiple times as a single function. In other words, I want this:

auto countUsers = chain((fmap >> (ffind << users << isId)) | (filter >> isNotNull) | ucount);
        
vector<int>{1,2,6} | countUsers | (uprint << "count of users: ");

In fact this is simple functional composition (like the one discussed in the appendix here), and it can be implemented with ease for ordinary functions. But when we are talking about templates, things become a bit more complicated. Still, it is possible!

// -------------------- chain of functors --------------------->
    
// The chain of functors ... is actually just a tuple of functors
template <typename... FNs>
class fn_chain {
private:
        const std::tuple<FNs...> functions;
        
        template <std::size_t I, typename Arg>
        inline typename std::enable_if<I == sizeof...(FNs) - 1, decltype(std::get<I>(functions)(std::declval<Arg>())) >::type
        call(Arg arg) const
        {
            return std::get<I>(functions)(std::forward<Arg>(arg));
        }
        
        template <std::size_t N, std::size_t I, typename Arg>
        struct final_type : final_type<N-1, I+1, decltype(std::get<I>(functions)(std::declval<Arg>())) > {};
        
        template <std::size_t I, typename Arg>
        struct final_type<0, I, Arg> {
            using type = decltype(std::get<I>(functions)(std::declval<Arg>()));
        };
        
        
        template <std::size_t I, typename Arg>
        inline typename std::enable_if<I < sizeof...(FNs) - 1, typename final_type<sizeof...(FNs) - 1 - I, I, Arg>::type >::type
        call(Arg arg) const
        {
            return this->call<I+1>(std::get<I>(functions)(std::forward<Arg>(arg)));
        }
        
public:
        fn_chain() {}
        fn_chain(std::tuple<FNs...> functions) : functions(functions) {}
        
        // add function into chain
        template< typename F >
        inline auto add(const F& f) const -> fn_chain<FNs..., F>
        {
            return fn_chain<FNs..., F>(std::tuple_cat(functions, std::make_tuple(f)));
        }
        
        
        // call whole functional chain
        template <typename Arg>
        inline auto operator()(Arg arg) const -> decltype(this->call<0,Arg>(arg))
        
        {
            return call<0>(std::forward<Arg>(arg));
        }
        
};

How does this work? When constructing such a chain we just put the functors into a tuple. The tricky part comes when we need to call the function: we have to construct a universal caller without knowing the function signatures. This is possible using compile-time recursion together with decltype/std::declval. By creating dummy arguments we go through the whole chain recursively and detect the final result type.

And while making it work under GCC I hit one of the compiler bugs: inside the call method recursion during type detection you need to add 'this->' to make it compile (bug).

And to chain functors into the list we overload the | operator again:

template<typename... FNs, typename F>
inline auto operator|(fn_chain<FNs...> && chain, F&& f) -> decltype(chain.add(f))
{
    return chain.add(std::forward<F>(f));
}

Now we can use this whole concept like:

// Use functional chain:
auto f1 = [](int x){ return x+3; };
auto f2 = [](int x){ return x*2; };
auto f3 = [](int x) { return (double)x / 2.0; };
auto f4 = [](double x) { return SS::toString(x); };
auto f5 = [](string s) { return "Result: " + s; };
auto testChain = fn_chain<>() | f1 | f2 | f3 | f4 | f5;
// execution:
testChain(3) | print;
        
auto countUsers = fn_chain<>() | (fmap >> (ffind << users << isId)) | (filter >> isNotNull) | ucount;
vector<int>{1,2,6} | countUsers | (uprint << "count of users: ");

Note that we use different types during the chain execution.

So now we have the ability to store and combine pipelines as we like. Nice.

FINAL TOUCH – MONADIC PIPING (OPTIONAL)

This part is optional, but it fits so well… because here there will be a very smooth shift to monads. Smooth and easy.

What if we change the direction of piping? Let's pipe functions into data instead of piping data into functions! Of course, we can't do this with raw data – we need some object to wrap it and call the received functions for us. This object can do some additional manipulations with the functions along the way. And yes, that's a kind of monad.

Why do we need this kind of stuff? Because we gain additional control over the evaluation of the functional chain.

As an example I'll use the maybe monad, which has very simple logic: if any function in the chain outputs nullptr, we don't execute further and report the empty state. This is a kind of implicit error protection.

The implementation is a bit extended to support pointer and non-pointer types.

Also, this is not a 'true' monad, as a true monadic bind operation requires functions which return a monad as their result. But with the following implementation you can apply normal functions without any modifications.

// ------------------ maybe -------------------------->

enum class maybe_state { normal, empty };
    
template <typename T>
typename std::enable_if< std::is_object<decltype(T()) >::value, T>::type
set_empty() { return T(); }
    
template<> int set_empty<int>() { return 0; }
template<> string set_empty<string>() { return ""; }
    
template<typename T>
class maybe {
private:
        const maybe_state state;
        const T x;
        
        template <typename R>
        maybe<R> fromValue(R&& result) const
        {
            return maybe<R>(std::forward<R>(result));
        }
        
        template <typename R>
        maybe<std::shared_ptr<R>> fromValue(std::shared_ptr<R>&& result) const
        {
            if (result == nullptr)
                return maybe<std::shared_ptr<R>>();
            else
                return maybe<std::shared_ptr<R>>(std::forward<std::shared_ptr<R>>(result));
        }
        
       
public:
        // monadic return
        maybe(T&& x) : x(std::forward<T>(x)), state(maybe_state::normal) {}
        maybe() : x(set_empty<T>()), state(maybe_state::empty) {}
        
        // monadic bind
        template <typename F>
        auto operator()(F f) const -> maybe<decltype(f(std::declval<T>()))>
        {
            using ResultType = decltype(f(std::declval<T>()));
            if (state == maybe_state::empty)
                return maybe<ResultType>();
            return fromValue(f(x));
        }
         
        // extract value
        T getOr(T&& anotherValue) const { return (state == maybe_state::empty) ? anotherValue : x; };
};
    
template<typename T, typename F>
inline auto operator|(maybe<T> && monad, F&& f) -> decltype(monad(f))
{
    return monad(std::forward<F>(f));
}
    
    
template<typename T, typename TDefault>
inline T operator||(maybe<T> && monad, TDefault&& t)
{
    return monad.getOr(std::forward<TDefault>(t));
}
    
template <typename T>
maybe<T> just(T&& t)
{
    return maybe<T>(std::forward<T>(t));
}

This class can be extended to handle error messages and so on.

I use the same pipe | operator here because its ‘unix’ meaning is perfectly applicable here.

Examples:

maybe<int>(2) | (ffind << users << isId) | [](User u){ return u->name; } | [](string s){ cout << s << endl; return s; };

// Bob
        
(maybe<int>(6) | (ffind << users << isId) | [](User u){ return u->name; }).getOr("Not found") | (uprint << "User: ");

// User: Not found        

just(vector<int>{1,2,6}) | countUsers | [&](int count){ count | (uprint << "Count: "); return count; };

// Count: 2

So here we execute the processing chain on a non-existing user and nothing bad happens. Also, we can pipe our saved chain countUsers into maybe as expected.

I could produce a lot of other examples here, but maybe this is a nice topic for another post.

CONCLUSION

Using compile-time recursion and type detection you can create powerful tools for combining templated functional blocks. Using currying and piping together with functional chains gives a very flexible instrument for building compact functional-style methods in C++11. Monadic execution can be added to this scheme without any problems.

Full working sample on github:    gist

PS. About templating – I think the usage of templates in production should be very minimalistic, because template overuse can lead to very unmaintainable code. So when using the proposed scheme, keep in mind that all blocks should be very small and logically reasonable.

 

10 ways to not shoot yourself in the foot using C++11 http://vitiy.info/ten-ways-to-not-shoot-yourself-in-the-foot-using-cpp11/ http://vitiy.info/ten-ways-to-not-shoot-yourself-in-the-foot-using-cpp11/#comments Tue, 13 Jan 2015 18:17:06 +0000 http://vitiy.info/?p=425

Within C++, there is a much smaller and cleaner language struggling to get out (Stroustrup)

The following text could be modified – current rev.1.0

Very often I hear from Java/Erlang/etc. people that C++ is so bad that it is very unwise to use such an old-school language now, when we have more ‘safe’ higher-level languages. Everybody has heard about foot-shooting using C++. But what about C++11?

Bjarne said that C++11 feels like a whole new language and, at first, I did not take it seriously, as the modifications looked more like minor additions (especially for boost users). Now I have changed my mind – using the new features combined together can transform your way of coding into a new form. I’m talking not about adding new features to your code, but about changing your coding style.

How not to shoot yourself in the foot? Here is the list of my rules to make C++ coding life sweet and easy. It is a simple convention to follow and can be adopted very fast. Not only does it give a more stable implementation, but also a cleaner and more understandable design.

This convention is a composition of Scott Meyers’ rules, functional programming ideas and the reducing-complexity ideology of Steve McConnell.

This convention adds minor overhead to the implementation, so when you really need to optimise critical parts of your code you can skip some rules. Also, there are some cases where you can skip the rules because they would produce less maintainable solutions (see descriptions).

Some rules have a strong form – more profit and a bit more overhead. And if you follow the strong forms, C++ will truly feel like a different language.

All this is discussable, as it is only my own solution and can be corrected and improved. You are welcome to add some comments below.

RULES

Rule 1 – Use RAII and smart pointers only! (Strong: use only std::shared_ptr and RAII) 

Memory management is one of the most referenced problems of C++ – working with raw pointers is like dancing with blades. To solve this automatically we have two approaches: GC from Java/C# or Objective-C’s reference counting. I prefer reference counting as the more controllable solution – so use smart pointers everywhere (or no pointers at all, using RAII).

Strong version: use shared_ptr for every pointer to prevent mistakes with passing of ownership. This adds some copy overhead, but that is the price for safety. Only when you need to optimise some bottlenecks can you bypass it.

You have to keep shared pointer specifics in mind. The first problem with reference counting is the situation where there is an ownership cycle between two classes. This situation should be reproducible inside special data structures only and should not happen under normal conditions. Typically, if you have such a pointer ring, it is an indication of bad design. The second problem is how not to keep some data around as a zombie. To avoid this, just don’t store pointers to data inside long-living objects (as a cache or similar). So don’t treat smart pointers as magic – the logic behind shared_ptr is quite simple.

One last thing – remember that in a multithreaded environment, only the composition of immutable/thread-safe data and shared pointers leads to a safe implementation.

Rule 2 – Use const where it’s possible (Strong: use immutable data only)

Making things const reduces entropy inside your code. If a method requires an input parameter to be const – mark it so. If a method does not affect class state – mark it const. If a class field can be marked const – do so. This not only gives the compiler more space for optimisations, but reduces the chance of unwanted modifications to zero.

Common mistakes when working with modified object state can be prevented using an approach from functional programming – make all your data immutable. This even solves the problems of concurrent access from different threads.

By an immutable data class I mean a class with const fields (which are initialised during creation) and no default empty constructor.

class UserData {
public:
    const int id;
    const int value;
    const string name;

    UserData(const int id, const int value, const string name) : id(id), value(value), name(name) {}
};

Such a declaration can be generated by a macro which can also contain copy constructors, serializers/deserializers, etc. Note that move semantics are not applicable to immutable structures, as a move modifies the source object and an immutable object can’t be modified.

You can use shared_ptr as a container for your immutable data to save space and resources while passing data through a processing chain. You can even wrap all business classes into shared pointers and adopt the convention that your data classes are wrapped by default.

using User = std::shared_ptr<UserData>;

If you are familiar with the convention where you put Ptr at the end of smart pointer class names to make coders understand that it’s a pointer, you might argue that this will reduce readability. But actually, if you do it for the whole business domain, it will on the contrary make things more compact and understandable when you work with functional processing.

There are some cases where an algorithm can be expressed in a much more compact way when some variable inside is mutable. Even if you decide to go that way for readability or speed, make sure to cover this part with tests with additional caution.

Rule 3 – Use make_shared() instead of ‘new‘ for shared pointers

To get rid of new/delete notation completely, use make_shared for smart pointers. About the benefits of make_shared relating to memory allocation you can read in the documentation.

As make_shared requires the class type as a template parameter, one can add some sugar as an optional step:

template <class T, class... P>
inline auto make(P&&... args) -> T 
{
    return std::make_shared<typename T::element_type>(std::forward<P>(args)...);
}
    
template <class T, class... P>
inline void make(shared_ptr<T> & sharedValue, P&&... args)
{
    sharedValue = make<shared_ptr<T> >(std::forward<P>(args)...);
}

And init objects like:

make(a,10);
make(b,{50,10,35});

Note that with this scheme you should have no explicit destruction calls at all.

Rule 4 – Use for(const auto& x : X) instead of old school loops (Strong: use indexed loops in simple pure functions only)

Instead of old indexed ‘for’ loops, use the C++11 foreach approach: for(const auto& x : X). When you have no explicit loop index, you can make no range mistakes. And this notation has a more readable form. Instead of auto you can use the class name if this is needed to improve readability. Also note the optional keyword const here and, of course, you can skip the & sign where a copy is intended. For more details read Scott Meyers.

Your custom collections can be extended to support for(:) with ease (items and mCount below are members of the enclosing collection class):

/// Custom iterator for c++11 cycles
    class Iterator {
        T* data;
        int position;
    public:
        Iterator(T* _data, int _position):data(_data),position(_position) {}
        
        T& operator*() { return data[position]; }
        Iterator& operator++() { ++position; return *this; }
        bool operator!=(const Iterator& it) const { return position != it.position; }
    };

    Iterator begin() { return { items, 0 }; }
    Iterator end()   { return { items, mCount }; }
    
    /// Custom iterator for c++11 cycles (Const version)
    class IteratorConst {
        const T* data;
        int position;
    public:
        IteratorConst(const T* _data, int _position):data(_data),position(_position) {}
        
        const T& operator*() const { return data[position]; }
        IteratorConst& operator++() { ++position; return *this; }
        bool operator!=(const IteratorConst& it) const { return position != it.position; }
    };
    
    IteratorConst begin() const { return { items, 0 }; }
    IteratorConst end()   const { return { items, mCount }; }

Strong version: all algorithms which require indexed loops should be implemented as low-level functions and called in functional style – like map(), filter(), reduce(), etc. This step requires a bit of a functional approach, as you have to separate the iteration algorithm from the loop body. The body should remain in the business domain, while the iteration itself is moved down to the utility level. The next rule is linked to this.

Rule 5 – Use functional approach to work with your data

First, if you are not yet familiar with functional programming, read some books on it – like Thinking Functional (the books don’t even have to be related directly to C++). You don’t have to master functional programming – just get the basic foundation principles and an understanding of the benefits of functional data processing in modern concurrent environments. My position is that you have to combine the old OOP paradigm and FP to get the best parts of both.

C++11 has std::function, lambdas, closures, etc. Use them! Forget about pointers to functions. Forget about structures you pass to events. Forget about that old stuff which looked more like workarounds. Yes, C++11 does not give you the most minimalistic functional code to have fun with, but it’s enough to implement all the stuff. And even more – you can create your own container class extension methods for working with arrays to make things a lot more compact and shiny. For example, here is the code to get square images from a collection:

return images.filter([](Image image){ return (image->isSquare()); }).notDeleted();

Or you can add each() method to the list class to call function for each element:

/// calls function for each element
inline MVList<T> each(std::function<void(T)> f) const
{
    if (mCount == 0) return *this;
    for (int i = 0; i < mCount; i++)
        f(items[i]);
    return *this;
}

Here I used an old indexed loop to show that this iteration can be encapsulated. When working with such a collection, you call only functional-style methods and don’t need to know anything about how exactly the iteration is done inside. This reduces complexity.

If you are not familiar with lambdas, there is one point to know: you have to control the way you do closures. If you have experience with JS, you have probably seen situations where a lambda is inside a lambda inside a lambda, and so on, with data passed through this madness. Such unreadable and unmaintainable situations should be avoided just as much as a large number of nested loops.

Data processing with FP leads to a lot of copy-constructing overhead. A good idea is to use shared pointers to immutable data here.

One last thing: there are cases where FP is not so suitable. Don’t try to use the functional paradigm as a hammer everywhere.

Rule 6 – Use auto to compact code and prevent type casting mistakes

But use it with care. When I first heard about auto, I thought it was a bad thing, because it would produce unreadable code, and it seemed a bit of a violation of strong typing. But in practice it turned out to be quite usable, because there are a lot of cases where you understand what type you have. Often the name of your variable is almost the same as the name of the class.

In the rare cases where you have a lot of autos grouped together and you feel readability suffers – don’t use it.

Unexpectedly, auto can also reveal some type casting issues. This is an additional benefit. You can read Scott Meyers’ latest book to get the details.

Rule 7 – Use uniform initialization instead of ‘=’

There are pros and cons in Scott Meyers’ book for so-called uniform initialisation, but I see it as one of the steps forward. It contains strict checks and has a more flexible notation. When needed, you can define constructors from std::initializer_list to get more code beauty.

MVList<int> set{5, 129, 14, 130, 33, 132};

Initialisation looks more logical with immutable structures when in fact you can’t reassign them using ‘=’, as they are untouchable.

Rule 8 – Use nullptr instead of NULL and don’t forget to check pointers

Looks like the simplest rule in the list? No. One important linked problem should be discussed here. When you work with shared pointers and the result of your function can contain nullptr elements, you can get a crash when you try to access null-pointer data. How to avoid this?

First, you can add a notNull() method to the list class to filter out null elements inside your processing chain. But this will not cover all cases. Unfortunately, the old way to check things ( if (a != nullptr) { … } ) is still alive and in a large range of cases is the most compact way to go. The problem is that the functional approach from math is not so compatible with if-branches. Structures like Either in FP are not perfect, imho. So this is a place where some modifications could be added to this list.

Rule 9 – Use templates instead of MACRO when its possible (Strong: use functional factorization instead of template hell)

As C++11 now has variadic templates, you can cover a lot more cases where previously we had to use macros.

But there is an important thing about template programming! Every time you want to use generics, think twice whether it is the only way to make the decomposition. One of the most hated things about C++ is its complexity. You might read a couple of books like Alexandrescu’s and feel some kind of new flexibility using templates everywhere. But in practice, when you use more than one template argument, things become less maintainable. I suggest using templates only when you can’t implement your task without them. The best solution would be pure factorization in functional style, but that is not possible in a large set of cases (even in math). If you have to use generics, do it in a way that fully encapsulates the composite structure and keeps template arguments to a minimum (preferably only one argument). Also, a class should have an obvious purpose for being templated (a nice example is an object collection of a specific type). Remember that your fellow programmers shouldn’t have to make guesses and WTFs looking at the code. So avoid overtemplating.

Rule 10 – MINOR RULES

  • Use features from std lib (threads, etc) instead of third party libs
  • Write your classes to avoid destructors (Rule of zero)
  • Use “using” instead of “typedef” to improve readability (and it can be templated)
  • Use scoped enums to improve readability and to create more type safe code
  • Use =delete; for obsolete and unwanted functions
  • Use override/final

Conclusion

Using the new features of C++11 together, plus changing your habits and way of thinking to a more functional style, gives you a whole new C++ perspective. Keeping in mind the advantages of C++ as a fast cross-platform solution, and that quite simple convention rules can shift it close to functional languages in terms of stability, makes C++11 a good player on the market. So now it is your choice whether to shoot yourself in the foot or not, given you have a nice C++11 gun.

PS. I’m not saying here that C++11 is the best. No, no, no. But it is definitely not the ugly monster most people think it is. Especially if you wash it a bit.

This list can be extended and modified as my vision might be changed. If you have some suggestions feel free to comment.

How to make simple CQRS implementation without Event Sourcing http://vitiy.info/how-to-make-simple-cqrs-implementation-without-event-sourcing/ http://vitiy.info/how-to-make-simple-cqrs-implementation-without-event-sourcing/#comments Thu, 06 Nov 2014 13:45:47 +0000 http://vitiy.info/?p=347 Here I will try to describe the concept of CQRS without event sourcing, based on some other principle instead – a concept of organising your data and messages to create a scalable and reliable CQRS/messaging-based architecture without the additional complexity of an event store.

CQRS - MVC

At first, here is a small introduction on what CQRS is. I suppose you are familiar with the MVC model and use it on a daily basis. And I suppose you understand the benefits of separating your code into separate modules.

CQRS – Command-Query Responsibility Segregation. CQRS just separates the model into 2 separate parts – a READS model and a WRITES model. They can also be referenced as the Query model and the Command model.

You might find some more complex definitions all over the internet, but actually all other patterns are additional. Just separating the model into queries and commands makes it CQRS. The segregation must be clean, so commands can’t query data.

More links to read about CQRS: Clarified CQRS / The list of CQRS resources

MESSAGING

The most natural addition to CQRS is the eventual model (though even this is not an obligation). When you send a command, you should expect to dive into the asynchronous world, because processing could take some time on the server. And besides ‘async’, let’s remember some other words – distributed systems, cloud synchronisation, sharing, eventual consistency, concurrent access. Even if you think that nothing from this list is close to your application’s field right now – very soon it will be there.

So I say yes to messaging: you can add persistent queues of messages into the CQRS scheme.

CQRS Events queues scheme

Serialized commands go into message queues to different locations – it can be a local cache or several services on the server side. Message queues can have local journaling to prevent message loss. For example, if you send data to the server and there was no connection – the messages will be stored locally. And finally, when you restart the application, the queue will be re-executed.

The cache db / reporting database (this could be some key-value storage or just a memory cache inside the client application) is the primary source for the query interface. When you pass some command for execution, you can inform the cache simultaneously (also through the event queue) that you have some changes coming.

EVENT SOURCING – CQRS/ES

Most documents about CQRS refer to Event Sourcing as a must-have feature in addition to CQRS. I don’t think so. The concept fits well into the eventual model, but if we affect the ‘client state’ of the application (UI, inner cache, etc.) only through an event listener – it is not yet Event Sourcing. Even more – “event sourcing is a completely orthogonal concept to CQRS” (source). The definition is the following:

Storing all the changes (events) to the system, rather than just its current state.

So this is a method of storing business data as a sequence of modification events. For some cases it is very useful, as you can restore object state at any point in history, but generally it is not a must-have feature. This is actually a question of how to store your business data, and the decision to use it or not comes from the business domain. There is considerable overhead in using it, as you have to restore the final state for each object every time, you need more storage to keep all events, and processing inside the query model becomes more complex to maintain. Often you need to make snapshots to make it run fast enough.

At first you might think that event sourcing is a simple and easy model – but in most cases it is an ugly way (imho). When you interact with objects in the real world, you don’t need the whole history of each of them. The one exception is the financial transaction list from your credit card – but that is a rare case. Most of your objects don’t require stored modification events as separate instances (we can still store this information in some logs).

There was some discussion on the Distributed Podcast suggesting CQRS might bring +30% of development time, and sure, this is critical if there are no really large benefits from it. BUT, imho, all these talks are more about EVENT SOURCING. Implementation of basic CQRS is not so complicated (and you don’t even have to use messaging), but the switch to the ‘data combined from events’ concept is much more complex for your developers to handle. So the idea is to replace the event sourcing principle with some other simple concept to make life easy.

But what instead of event sourcing? I’m going to present a solution here, but first let’s review what problems we are going to solve. I see now (2014) that the basic scheme of interaction for a client-server application is evolving into this:

Evolution of client-server scheme

Now, instead of a simple client-server pair, we have several client devices with the ability to sync data, and several nodes of a distributed server system.

Here we can make one more crucial decision – decide where your system stands in the CAP theorem – AP or CP. This is a business choice. You can still do CQRS in a fully CP way, but I think that today eventually consistent systems are more attractive, so here we are talking about them. Note that on the server side of the cluster there can still be CP parts – like etcd syncing of cluster configuration or a MongoDB sharded cluster.

I will go even further – let’s assume we have 2 datacenters and replicate data between them (you can just create 2 droplets in Digital Ocean to emulate this situation without any effort). Let’s assume you have a mongo cluster as storage in each of them to hold your data.

CQRS_Cross_DC

After submitting some command, we need to pass it as a message through various channels to the local cache, server 1 and server 2. As we do this asynchronously, the following problems may arise:

  • Message loss
  • Message duplication
  • Message reorder
  • Concurrent access conflicts

These are common problems when you are talking about messaging. Let’s introduce additional requirements:

  1. After offline work, the changes should be synchronised with servers and other clients
  2. Multiple clients should be notified in the fastest possible way
  3. Messages from a client should go to the server side through 2 different routes simultaneously
  4. The local cache should receive some updates immediately, and if a command fails on the server side, an additional notification message should cancel it properly

You can imagine the set of problems which may arise here if you handle database updates the usual way. Plus, you have to keep in mind the following picture – in the real world the situation might get worse…

messagingasitbocomes

Pretty scary… Let’s introduce a concept which will handle most of the declared problems implicitly.

DATA SEGREGATION BY RESPONSIBILITY or SIGNED-DOCUMENTS MESSAGING CONCEPT for CQRS

The next part is a concept which may not be new, but I failed to find it as common knowledge related to CQRS.

As a solution, we need to use some messaging principle similar to event sourcing (and combine it with a messaging transport layer based on ZeroMQ, NanoMsg or similar).

Event sourcing principle – each object is composition of modification events.

Alternative principle – each object is a composition of immutable parts exclusively signed (owned) by specified users. Each immutable part has ‘a sign’ – a time and user signature indicating ownership. Each message replaces the corresponding immutable part completely (and of course there are no delete operations – only an is_deleted flag inside the structure).

Object contains owned signed parts

An analogy from real life – you represent some business decisions as a series of signed documents from various departments. At any moment you can replace one document with another, but you can’t modify a document once it has been signed. You can send any number of copies of such documents to anyone without any collision. Missing or reordered docs are also not a problem in such a scheme – to perform some action you need the list of actual documents, and if some of them are missing or invalid, we can request them again from the owner.

signature

In real life we have documents which are signed by several persons. Actually, such documents can either be split into separately signed parts, or ownership can be transferred from one person to another when the first person can no longer access the object. So we get: one document – single person responsibility. (Here we talk of a document as an object part.)

This concept is similar to functional programming, where you deal with immutable objects. You can also treat the sign as a revision marker of a version control system. If the existing sign is newer, you can’t overwrite the document with a previous version by mistake. So before each database write interaction you check the signs.

You could also treat parts as svn files. And as you might guess, the main problem is how to resolve concurrent commits. In the ownership scheme you can successfully commit only by grabbing ownership (and signing the document with another signature). In the simple case you just use the latest validated version of the committed document.

There are special cases when a group of users can modify the document at the same time (the case you prefer to avoid in real life, because merging conflicting changes is a complex problem). Here we have 2 subcases:

  • The document is big and modification time is considerable. Imagine Google Docs here. The simple solution is to have an exclusive temporary lock on the document. During the edit process the lock can be extended. Additionally, the group of users can be notified of changes if needed.
  • Second case – you have a large set of data and a set of users who can operate on such data simultaneously. Here you can set additional restrictions for collection updates. At the start of modification, the client must request a new cache (this can be done in an async way), and when you send a message with a modified version of the data, it contains the sign of the previous source cache. Trying to overwrite more recent data will produce an error message. Trying to make overwrites based on an outdated cache will also produce an error message.

If during your business process some objects pass from one user to another, you can add additional inner parts to the object, signed by different users, instead of signing the whole object data. For example, if your product needs verification from someone, instead of an is_verified flag inside the main object data you add a new part inside the object composition: verification + sign.

The whole concept can be called: treat your data as a composition of signed documents. But keep in mind that what is signed here are parts of business objects. And they can be very small. Even if it is only one integer field, but by business process it should be assigned by a different person, there is sense in separating it into another part to make things clear. It also makes sense in terms of security.

The last thing is the additional ‘second-order objects‘, as I call them. These are the results of minor map-reduce operations on first-order data. Counters, statistics, summaries, distributions, short representations, views, caches… all such objects are a kind of cache. They have no signs (which is logical, as they are the result of a composition of signed objects).

EVENT MODIFICATIONS / REFACTORING

I think the approach where you clone events and give a new name to each new version is ugly (like EVENT_MAKE_PURCHASE_V14 -> EVENT_MAKE_PURCHASE_V15, etc.). But you have to use such an approach only when you are working with event sourcing. If you are not limited to event sourcing, you can just make your message handler aware of possible missing fields (or even add a version inside the message structure). In other words, in the most common cases the handler should just convert the message to the most recent known version.

So there are no additional issues here.

SAGAS

One more term you often hear when CQRS is mentioned: Saga – a business process manager handling a reversible transaction chain. Inside a saga there is some state machine which controls the workflow (in the ideal case). This solution provides additional protection from message ordering/missing/duplication problems in the world of messages.

In the simple approach the saga stays on the client side, and related messages contain the saga’s id to route messages into it. It is like a mini-controller containing internal saga state. In some cases this approach is good, but I prefer to embed the state into the message when possible, to allow more functional-style processing. Of course, for some cases you have to check the state transition on the server side to prevent exploits.

I think it is not so easy to predict whether your business processes are sagas or will become sagas in the future – but describing them as finite state machines is a good idea. Anyway, this is also an addition to CQRS and is not required. And the implementation may vary too – you don’t have to create a separate class for each saga, you just have to make the message chains obvious.

DOCUMENT ORIENTED STORAGE

As you treat your data as a composition of documents, some NoSQL document-oriented storage is the most natural solution here. And you can keep all signed parts inside one NoSQL document. The signs can be stored in expanded form to simplify filter/search operations. As a primitive example, you could have the following for a user:

user:
{
    info: {
        name: "Victor",
        email: "some@mail.com",
        sign: {
            name: "Victor",
            user_id: 783275082375032503872503,
            time: 147812470128927,
            device: "hgio234803223943"
        }
    },
    roles: {
        role: "writer",
        sign: {
            name: "Jack (Admin 145)",
            user_id: 3284903275037095732,
            time: 173248937248397,
            device: "394324h3uik2432h"
        }
    }
}

You can see two parts here – user data entered by the user, and a role given by some administrator.

Disadvantages of signed-documents approach

  • A bit more traffic is required (not in comparison to event sourcing, but to a general 3-tier architecture)
  • A slightly different view of data storage is required from your developers

Advantages of signed-documents approach

  • Implicit solution for all basic ‘messaging’ problems
  • Simple principle of data management which is close to actual business flow (signed documents)
  • Immutable parts give more consistency and less message implementation chaos
  • Sign validation and responsibility segregation give ability to track user influence and maintain data order (access restrictions from design)
  • High-concurrency access problems are solved with sign locks and cache actualisation settings.
  • After implementing the functions related to signing and sign verification/locks, the approach induces minimal additional complexity – splitting your documents into parts gives you only benefits (especially when you can still grab them whole using a NoSQL solution)

SUMMARY – WHAT TO DO

To summarise it all – instead of event sourcing we can use a different concept – the document-signing approach and the following rules:

  • Split your database objects not into a composition of past events, but into a composition of signed documents (divide objects by ownership and responsibility).
  • Treat object parts as immutable records which can only be replaced completely.
  • Send the whole signed immutable part inside each message which can affect it.
  • Each immutable part has a sign which contains time/user/sender/lock information and is used to resolve all message collisions.
  • Extract counters and other second-order data into a cache and regenerate it if the composition of immutable parts has changed (or upon request).
  • Think of your data as a composition of signed business documents.
  • When high-concurrency access is needed, impose additional sign and cache actualisation requirements.

I could also name this Data Segregation by Responsibility – CQRS/DSR. This approach makes CQRS messaging easy.

This document can be modified to reflect my new experience and can be extended with new samples. Any thoughts are welcome in comments.
