Encapsulation of asynchronous behaviour in distributed system of scripted autotests

This is an example of how to create continuation-like monadic behaviour in C++11 (encapsulating async execution), applied to a specific task: writing asynchronous scripts for a system of distributed autotests.

Async scenarios for distributed autotests

I needed to add autotests to the continuous integration of a cross-platform client-side product. The usual way to build an autotesting system is to grab one of the well-known frameworks (especially if we are talking about web solutions) and test the product locally. A local testing environment simulates button clicks inside the UI and gets the results right away. This is a simple and good approach, except when you have a truly cross-platform solution (win, mac, web, iOS, android, etc.), a custom rich UI framework, and you want to run tests on various kinds of systems at the same time.

So I came up with a remote distributed scheme for autotests, and this post describes some ways to make this scheme shine.

This post contains 3 main points:

  1. what a distributed system of autotests is and why one would need it (advantages / disadvantages)
  2. how to implement asynchronous scenarios of distributed autotests using continuation monadic style in C++11/14 (how to encapsulate asynchronous execution in the most compact way)
  3. how to integrate ChaiScript scripting into such a system to make it more flexible

PART 1 – DISTRIBUTED SYSTEM OF AUTOTESTS

If you are interested only in the ways of encapsulating async interfaces in C++, you can skip this chapter and move on to PART 2.

Let’s build a server which controls clients by sending events through a separate web-socket connection. It is like testing a car using a remote control. The next diagram shows the main structural differences:

Distributed scheme for autotesting

Let’s discuss advantages at first:

Cover a lot of devices

In such a scheme we can launch clients on any device (as we no longer need to set up a local testing environment). It’s also possible to run tests on clients under a debugger and see what happens.

Write once – run on device park

We can launch tests on nightly builds using the whole available range of devices. When a new test is written it will work on all devices (except when the test uses some platform-specific aspect like push notifications, etc.). When a new device is added to the zoo there is almost zero overhead to set up the testing environment.

Analyze results

Compare test runs

We can export statistics from all tests into one server-side storage and analyse the whole variety of test cases inside one analytical system. In my case it’s ElasticSearch + Kibana, but this could be any tools/storages you like to work with. One can also use autotests as a profiling system: you can compare performance results measured during the same tests (using different devices or different versions of the target application).

Fast write, Fast run

Once again, you need to write a test only once to support the whole variety of platforms, operating systems, devices, etc. This is a huge gain in time. Also, as there is no need for a custom setup of the testing environment on the device, we can analyse more cases or spend more time writing new tests. Developers can run tests right on their builds without the need to upload them to some testing environment.

Finally – More stable CI

Performing tests on a large zoo of devices, analysing not only green lights from tests but fine-grained performance measurements, and increased speed of test development and integration – all this leads to more stable continuous integration.

Disadvantages

Are there any disadvantages? The first one is that you have to spend some time implementing such a system. There are no boxed solutions as far as I know. But actually this is not the main problem. On the application side it’s enough to create a simple interface for accepting external commands received through the socket. It could be only UI control commands (like "click at x,y coords" or "is this element visible" requests), or it could be some complex interaction with the inner application model. Either way, this is a simple request-answer interface, which can be extended with periodic performance reports, sending the last chains of inner events, or even passing crash log reports to the server.
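As a rough sketch (the command names, message layout and class below are illustrative assumptions, not the product’s actual API), such a client-side interface can boil down to a small dispatcher that maps command names to handlers:

// Hypothetical sketch of the client-side command interface: the test server
// sends named commands over the web socket, the client maps them to handlers
// and sends an answer back. Names and message layout are assumptions.
#include <functional>
#include <map>
#include <string>

struct Message {
    std::string name;                           // e.g. "tests.ui.click element"
    std::map<std::string, std::string> args;    // e.g. {{"id", "42"}}
};

class CommandInterface {
public:
    using Handler = std::function<Message(const Message&)>;

    void on(const std::string& name, Handler handler) {
        handlers[name] = std::move(handler);
    }

    // called for every message received from the test server
    Message dispatch(const Message& request) const {
        auto it = handlers.find(request.name);
        if (it == handlers.end())
            return {"error", {{"reason", "unknown command"}}};
        return it->second(request);             // simple request-answer pattern
    }

private:
    std::map<std::string, Handler> handlers;
};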

What really is the problem: how should we write tests when the test target is separated from the test itself – how do we write asynchronous tests?

For example, in the usual local setup, when you need to push some interface button you just call a method like button1.click() and that’s all. But in a distributed scheme you need to send a message to the client – “push button1” – and wait for the answer. And we need to take into account that the client might not answer at all, or our UI button might already be hidden and the message could fail to execute.

Finite state machine

The first idea to solve this is to implement the test on the server side as a finite state machine (link). Let’s assume the test has a state which can be represented as some simple enumeration (or an item from some set). Each state has listeners for the client’s answers, and on each async operation we jump from one state to another. For example, for the “click button” case we have two states – first: we send the message and switch to the second state; second: if we receive the answer that the button was successfully clicked, we move on to the next action.

If we have a simple test of linear sequential actions we can just keep the state as an int and increment it, performing each ‘step’ from the current state to the next one.

STEP("start",
      send("tests.ui.click element", {{"id", bLogin}});
      state++;
     );
 
STEP("tests.ui.click element",
     if (answer->isOk())
     {
         state++;
         // .... next action here ....
     } else 
     { 
         // try something else
     }
     );

Everything could be implemented in this manner, but the problem is that if your tests are not trivial the code becomes an unreadable mess of state jumps. It also becomes hard to extract functionality into separate methods because the state logic overlaps: the finishing handler for the last action inside some function block has to move on to the next state, which is outside its scope. You might look for ways to organise this structure and solve such ‘overlap’ problems, but there is a better solution that handles all of it at once.

PART 2 – C++14 CORE FOR WRITING ASYNCHRONOUS TESTS

IT’S POSSIBLE TO WRITE ASYNCHRONOUS CODE IN A SYNCHRONOUS MANNER

In an ideal world I just want to keep writing the code of my test in the old ‘synchronous‘ way, as it is simpler and more readable. But I want it to work asynchronously. To achieve this we need all the power of C++11/14, because this will look like a continuation monad in C++. You can read about continuations here.

I want to write code using the usual sequential notation – but implicitly this code will produce a lot of functional handlers and totally encapsulate the whole state machine inside. Yes, this is possible with modern C++!

Warm up example – simple login test:

// THIS IS ASYNCHRONOUS!
    
client->getCurrentView()->assertValue("Start view:", "login");
auto tbEmail = client->getElement("login", "tbEmail")->assertOk();
client->setTextInTextbox(tbEmail, email);
auto tbPsw = client->getElement("login", "tbPassword")->assertOk();
client->setTextInTextbox(tbPsw, password);
auto bLogin = client->getElement("login", "bLogin");
bLogin->assertOk();
client->clickElement(bLogin);
    
client->getCurrentView()->makePeriodic(300)->successOnValue("main");

This looks like a normal local test, but in reality under the hood this code creates a state machine with message handlers which will be executed asynchronously. Not only is this much shorter than defining states and handlers manually, it is also the usual way of doing things, familiar to any auto tester.

The problem is that not all constructs can be transferred completely unchanged. Loops, conditions and so on cannot be written in such a clean way unless we introduce some preprocessing or a DSL that transforms them into asynchronous form. But let’s go step by step.

IMPLEMENTATION DETAILS

The first big help here is the auto keyword. When we execute a request method like getElement in the previous example, the result is actually not a value but a future. And as we need the most compact syntax possible and a lot of additional customisations, a custom implementation fits here. (There are a lot of ways to implement this – the specific choices are optional and you can tune them as you like.)

Here is part of such an asynchronous value which has not yet been acquired:

template <typename T>
class ATRequestValue : public std::enable_shared_from_this<ATRequestValue<T>> {
public:
    
    std::shared_ptr<T> value;        // main value - is empty at start
    
    ATTestState state;               // state of test when request has to be made

    std::function<void(Ptr)> request;                       // make request
    std::function<decltype(value)(MVMessage)> response;     // parse response
    MVMessage answer;                                       // client's answer [optional]
    //....

The main fields here: value – a pointer to the value which will be filled when the answer from the client is received by the server; state – the corresponding state of the test; request – a function which sends the actual request to the client; response – a function which parses the client’s answer.
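The rest of the class is omitted above; as an assumed sketch (only value, delay, name and checks appear in later snippets of the post, the exact layout is my guess), the remaining members that later examples rely on could look like this:

// Continuation of ATRequestValue<T> (assumed sketch - the post omits these members,
// but later snippets use them: checks in assertValue, delay in getElement,
// get() in value-provider lambdas such as [=](){ return text.get(); }).
    string name;                                  // request name, e.g. "tests.ui.get element"
    int delay = 0;                                // optional delay before sending
    std::vector<std::function<void()>> checks;    // checks added by assertOk / assertValue / ...

    // By the time the next scope item runs, the response handler has already
    // filled `value`, so the accessor can simply dereference it.
    T get() const { return *value; }
};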

And so the getElement function in the previous code piece does not actually perform any actions – it creates such an ATRequestValue and fills it with the corresponding request/response handlers as lambda functions:

auto getElement(const ValueProvider<string>& prefix, const ValueProvider<string>& name){
        return addRequest(std::make_shared<ATId>(test,
            "tests.ui.get element",
            [=](ATIdPtr value){
                 test->send("tests.ui.get element", {{"prefix", prefix()}, {"name", name()}}, value->delay);
            },
            [=](MVMessage msg){
                 return std::make_shared<id>(toID(msg->getString("id")));
            }
        ));
}

In this example we try to find some UI element by name and some prefix (which is also a string), and the result is an identifier of type id.

The addRequest function looks something like this:

/// Add custom request (async stage) into chain
    template<typename T>
    T addRequest(T request)
    {
        // add item to current scope
        scopes.currentScope->addItem(request->name, [=](){
            jumpTo(request); 
            return true;
        });
        
        // add to list of requests
        requests.push_back(make_at_variant(request));
        return request;
    }

Before we discuss scopes, jumpTo, etc., let’s first talk about why the arguments of getElement are not just strings, but ValueProvider<string>. We could have kept them as strings if all parameters were predefined before the test. We can’t do that because in reality most of the inputs are computed from previous results. That’s why we need value providers – functors which will return (provide) some value of a predefined type.

// Provide value through evaluation of inner function
template <typename T>
class ValueProvider {
private:
    std::function<T()> f;
public:
    ValueProvider(const std::function<T()>& f) : f(f) {}
    ValueProvider(T value) : f([value](){ return value; }) {}
    
    // construct from any functor
    template <typename F>
    ValueProvider(F f) : f([f](){ return f(); }) {}
    
    // get operator
    T operator()() const { return f(); }
};

Now we can pass a lambda as a parameter to getElement which will be executed later – for example:

auto bLogin = client->getElement("login", [](){ 
    // here could be complex logic
    return "bLog"s + "in"s; 
});

Now let’s go back to the place where getElement is called. The details of how exactly we form the request and read the response are not so important – what matters is how to make a chain of such calls. This is where we need scopes, elements, requests and jumps between them.

Let’s look first at the simple case when we have only one scope (a simple test with no branches or loops). In that case we just need a basic jump from one request to another (and to gather them inside one scope’s storage to keep them alive).

template <typename T>
void jumpTo(std::shared_ptr<ATRequestValue<T>> value)
{
    test->state = value->state;
    test->state++; // just increment state - simple case
    value->request(value);
}

In the simple case, to perform a jump we just increment the test’s state and perform the request of the next RequestValue. To finish the magic we just need to assemble() things together into one state machine.

// Final assemble
void assemble(){     
    ATTestState s = 1;
    for (auto request : requests)
         addHandlersForRequest(s, request);
}

/// Main constructor of stages
template <typename T>
void addHandlersForRequest(ATTestState& state, std::shared_ptr<ATRequestValue<T>> value)
{
     value->state = state;
     
     // go to next stage   
     state++;
        
     // add answer
     test->actions[state].set(value->name, [value, this](MVMessage msg){
            value->answer = msg;
            value->value = value->response(msg);
           
            scopes.runNextItemInScope();
        });
        
}

I’m skipping the variadic part here – ATRequestValue<T> can contain different types of values, so the list of requests should be able to hold values of different types – but maybe this will be the topic of a separate post.
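One possible way to do it (a sketch of my own, not necessarily what make_at_variant does in the real code) is plain type erasure: wrap the typed request into a small holder that exposes only the operation assemble() needs.

// Assumed sketch of type-erased storage for requests with different value types.
// The names mirror the snippets above; the real make_at_variant may differ.
struct ATRequestVariant {
    std::function<void(ATTestState&)> addHandlers;   // forwards to addHandlersForRequest<T>
};

template <typename T>
ATRequestVariant make_at_variant(std::shared_ptr<ATRequestValue<T>> request)
{
    // the typed call is captured inside the lambda, so the container
    // std::vector<ATRequestVariant> itself stays homogeneous
    return { [request](ATTestState& state) {
        addHandlersForRequest(state, request);       // a member of the test builder in the post
    } };
}

With such a wrapper, assemble() simply iterates over the variants and calls addHandlers(s) on each of them.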

In the trivial case when we have only one scope, running the next item can be pretty simple:

bool runNextItemInScope(){
    scope->position++;
    return scope->children[scope->position].run();
}

And this is enough to implement simple sequential chains without any branching. But we need more.

SCOPES AND BRANCHING

The first problem arises when we want to add the first conditional branch into our test’s logic, when we write something like:

auto text = getElementText(editBox1);
if (text.get() == "5"){
    auto element = getElement("main", "button1");
    client->clickElement(element);
}

This will not work because the condition is evaluated synchronously and the text has not been received yet, so this code will crash. The solution is to create an async_if implementation which accepts a predicate and evaluates it at the right moment, when the text has been received. We also need scopes here. They can be implemented as simple lists of run functions, but with some additional enter / exit handlers.

/// Item inside scope
class ATScopeItem {
public:
    string name;                        /// for debugging
    std::function<bool()> run;          /// run item - returns false if there is nothing to run
};

// type of scope
enum class ATScopeType { normal, cycle };

/// Async scope
class ATScope {
public:
    ATScopeType scopeType;
    ATScope() : position(-1), 
       scopeType(ATScopeType::normal), 
       enter([this](){ position = -1; return !isEmpty(); }), 
       exit([](){ return false; }) {}

    vector<ATScopeItem> children;             ///< the list of scope items
    
    int position;                             ///< current position in scope
    
    std::function<bool()> enter;              ///< on scope enter
    std::function<bool()> exit;               ///< on scope exit
    
    
    void addItem(string name, std::function<bool()> item){
        children.push_back({name, item});
    }
    
    bool isEmpty() { return (children.size() == 0); }
};

Optional stuff here: names for scope elements (for debugging). There are also scope types – we will need them later.

The enter and exit functions return boolean values: when true is returned it means some asynchronous action was called inside and we should wait for its result; if false is returned we can continue execution and move to the next item in the scope without waiting.

We also need some structure which will contain all scopes and organise movement between them. We can use a tree or a stack here.

// Stack of scopes
class ATScopeStack {
public:
    
    ATScopePtr currentScope;  ///< pointer to current scope

    // main push/pop operations 
    void pushScope(ATScopePtr scope) { scopes.push(scope); currentScope = scope; }
    void pop(){ scopes.pop(); currentScope = scopes.top(); }
    
    bool enterScope(ATScopePtr scope){
        scope->position = -1;
        if (currentScope != scope)
            pushScope(scope);
        if (scope->enter())
            return runNextItemInScope();
        return false;
    }
    
    bool isRootScope(){ return (scopes.size() == 1); }
    
    // call run for next element (recursive, could go through stack of scopes)
    bool runNextItemInScope(){
        
        auto scope = currentScope;
        
        while (true)
        {
            // if we have more items in current scope
            if (scope->position < (int)scope->children.size() - 1)
            {
                scope->position++;
                if (!scope->children[scope->position].run())
                {
                    continue;
                }
                else
                {
                    return true;
                }
            }
            else
            {
                if (scope->exit())
                    return true; // exit() called some block
                
                if (isRootScope())
                    return false; // do nothing
                
                pop();
                scope = currentScope;
            }
        }
        return false;
    }
    
    std::vector<ATScopePtr> allScopes;     ///< all scopes as list
private:
    std::stack<ATScopePtr> scopes;    ///< current state of scope stack
};

The enterScope function here also returns a boolean so it can stop synchronous execution and wait for the next result in the asynchronous chain. Our main function runNextItemInScope, which goes to the next item, now becomes slightly more complicated: it’s a loop which calls elements in the current scope one by one until one of them signals that we have to wait by returning true from its run(). When there are no more items in the scope we pop the upper scope from the scope stack and continue execution there using the same pattern. And finally, when the scope has no parent (the stack has only 1 item) we just stop.

It’s great that in such a scheme we can implement not only the “if” branch but loops too. But let’s start with async_if:

auto async_if(std::function<bool()> condition, std::function<void()> thenBody, std::function<void()> elseBody){
        
        // then
        auto thenScope = make<ATScope>();
        scopes.pushScope(thenScope);
        scopes.allScopes.push_back(thenScope);
        thenBody();
        scopes.pop();
        
        // else
        auto elseScope = make<ATScope>();
        scopes.pushScope(elseScope);
        scopes.allScopes.push_back(elseScope);
        elseBody();
        scopes.pop();
        
        // create scope item
        scopes.currentScope->addItem("async_if", [=](){
            if (condition())
                return scopes.enterScope(thenScope);
            else
                return scopes.enterScope(elseScope);
        });
    }

So this function actually executes both the then/else branches right away! But this execution only creates functional handlers which will be called later. We pass both branches and the predicate as parameters here. Thanks to C++11’s lambda syntax it’s possible to write code like this:

auto text = getElementText(editBox1);
async_if([=](){ return (text.get() == "5"); },
    [=](){
        // more async logic
        auto element = getElement("main", "button1");
        client->clickElement(element);
    },[=](){
        // do something else async way
    }
);

I have to admit this is not as pretty as a plain if, and that is a pity. Of course this is much better than writing async handlers the straightforward way, but it is still not perfect. One option is to add some preprocessing that eliminates the boilerplate functional wrapping, or even to make a DSL for autotests. Using an additional scripting layer may ease the pain a bit (discussed in part 3).

WHILE / CONTINUE

In the same way we can define a while loop.

auto async_while(std::function<bool()> condition, std::function<void()> body){
        // body
        auto scope = make<ATScope>();
        scope->scopeType = ATScopeType::cycle;
        scopes.pushScope(scope);
        scopes.allScopes.push_back(scope);
        body();
        scopes.pop();
        
        scope->exit = [=](){
            if (condition())
                return scopes.enterScope(scope);
            return false;
        };
        
        // create scope item
        scopes.currentScope->addItem("async_while", [=](){
            if (condition())
                return this->scopes.enterScope(scope);
            return false;
        });
    }

Here we set a custom exit procedure which checks the condition and re-enters the scope if it is still satisfied. A usage example could be the following:

async_while([](){ return true; }, [](){
    client->clickElement(someButton).setDelay(2000);
});

We can also introduce a continue operator. This is where we need to know the type of the scope. The logic is simple: we go up through the scope stack until we find a cycle scope, and then we re-enter that scope.

auto async_continue(){
        // create scope item
        scopes.currentScope->addItem("async_continue", [=](){
            
            // we go back to first cycle scope
            while (scopes.currentScope->scopeType != ATScopeType::cycle)
            {
                scopes.pop();
                
                if (scopes.isRootScope())
                    return false; // do nothing
            }
            
            return scopes.enterScope(scopes.currentScope);
        });
}

Now we can have any kind of nested structure, like an async_continue inside an async_if which is inside an async_while.
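A quick illustrative sketch (assumed usage of my own; the element names are made up, and getElementVisibility is the request used later in the script example):

// Sketch: keep clicking "next" forever, but skip iterations while a busy
// indicator is visible (async_continue inside async_if inside async_while).
async_while([](){ return true; }, [=](){
    auto busy = getElementVisibility(getElement("main", "busyIndicator"));
    async_if([=](){ return busy.get(); },
        [=](){ async_continue(); },                        // still busy - start next iteration
        [=](){ clickElement(getElement("main", "bNext")); });
});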

FOREACH

Async foreach is a bit tricky, as you have to iterate over something which has not been obtained yet. The whole trick is that you have to use a data provider instead of the data itself. The provider is a function which returns an array – and you can access it by index or iterator. There is a lot of freedom here, so I only give a basic starting example for a vector with indexed access:

template <typename T>
auto async_foreach(ValueProvider<vector<T>> listProvider, std::function<void(ValueProvider<T>)> body){
        
        // we create new provider for element of list
        std::shared_ptr<int> index = make_shared<int>(0);
        ValueProvider<T> elementProvider([=](){
            return listProvider()[*index];
        });
        
        // body
        auto scope = make<ATScope>();
        scope->scopeType = ATScopeType::cycle;
        scopes.pushScope(scope);
        scopes.allScopes.push_back(scope);
        body(elementProvider);
        scopes.pop();
        
        scope->enter = [=](){
            scope->position = -1;
            if (listProvider().size() == 0) return false;
            *index = 0;
            return !scope->isEmpty();
        };
        
        scope->exit = [=](){
            *index += 1;
            if ((*index >= listProvider().size()) || (scope->isEmpty()))
                return false;
            scope->position = -1;
            return scopes.runNextItemInScope();
        };
        
        // create scope item
        scopes.currentScope->addItem("async_foreach", [=](){
            return this->scopes.enterScope(scope);
        });
        
    }

Here we redefine both enter/exit handlers to create an iterator. The loop body is called with a parameter which is, once again, a value provider.

Usage example:

global tabs = getElementChildren(barWithTabs);
 async_foreach(
      provideVectorId([](){ return tabs.get(); }),
      [](auto x){
           auto tab = checkElement(x);
           touchElement(tab, 5.0, 5.0).setDelay(1000);
      });

This sample clicks all tabs inside some menu bar.

INSERT ANY SYNCHRONOUS ACTION INTO THE CHAIN – ASYNC_DO

There are a lot of cases where, between asynchronous requests, we need to perform some synchronous action (like printing something to the log). Of course it cannot be written the usual way, because then it would be executed at the test’s assemble time, when we are only setting handlers and have no data yet. One way to solve this is to add an additional function – async_do.

auto async_do(std::function<void()> body){
        
        // create scope item
        scopes.currentScope->addItem("async_do", [=](){
            body();
            return false;
        });
         
}

So we just wrap the functional body into a scope element and insert it into the scope. We could introduce a lot more helper functions here, such as async_set or async_copy, which could assign the result to some variable or do something else.
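For instance, an async_set helper (a sketch of my own, following the same pattern as async_do) could copy a provider’s current value into a variable at exactly this point of the chain:

// Sketch of a possible async_set: evaluate the provider and store the result
// when this point of the chain is reached (assumed helper, same mechanism as async_do).
template <typename T>
void async_set(std::shared_ptr<T> target, ValueProvider<T> source)
{
    scopes.currentScope->addItem("async_set", [=](){
        *target = source();   // evaluated at run time, not at assemble time
        return false;         // nothing asynchronous happened - keep going
    });
}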

Usage example of async_do could be found in part 3.

EXTENDING ASYNC VALUES

A second way to perform synchronous actions is to insert some functional checks right inside the request values.

The first one to add is a special assert which checks that the async request went well:

bTotalPrice = getElement("", "bTotalPrice").assertOk("Something went wrong");

// or we could check some value
auto text = getElementText(bTotalPrice).assertNotEqual("$0", "Balance should not be equal to zero in this test");

// or we could even check some functional condition
text.assertCondition("Balance should be in USD", [](auto x){
    return (x.get().find("$") != string::npos); 
});

// or we could just do something after value was received
text.andDo([](){ log("Data was received"); });

To implement such functions we only need to create an array of functional checks inside ATRequestValue and add methods like:

/// Assert that value is equal to given value
Ptr assertValue(string message, T shouldBeValue){
        checks.emplace_back([this, shouldBeValue, message](){
            if (*value != shouldBeValue) 
                 test->fail(message + " expected: " + request_value_to_string(shouldBeValue) + " got: " + request_value_to_string(*value));
        });
        return this->shared_from_this();
}

I use fluent interface pattern here – this is optional of course.

You can find usage examples of such checks in part 3.

We could also add a waitFor(interval, condition) function which makes periodic requests until the provided condition is fulfilled.
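A possible shape for it (my sketch only – the post just names the function; repeatCheck is an assumed extra member that the answer handler would consult before calling runNextItemInScope):

// Sketch of waitFor on ATRequestValue<T>: when an answer arrives, evaluate the
// condition; if it is not satisfied yet, re-send the same request after
// `interval` ms instead of letting the chain advance. (Assumed implementation.)
Ptr waitFor(int interval, std::function<bool(const T&)> condition)
{
    auto self = this->shared_from_this();
    repeatCheck = [self, interval, condition]() {
        if (condition(*self->value))
            return false;          // condition met - let the chain continue
        self->delay = interval;    // wait before re-sending
        self->request(self);       // issue the same request again
        return true;               // still waiting - do not advance the scope
    };
    return self;
}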

I think you have already got a lot of ideas on how to expand this approach. So let’s move on to the last chapter.

PART 3 – INTEGRATION OF SCRIPTING LANGUAGE INSIDE C++

The final step to make our life sweet enough is the addition of a scripting language. The advantages?

  • the ability to add new tests without rebuilding the server (we don’t even have to restart it)
  • we can write new tests faster
  • the whole thing becomes more flexible
  • automation testers feel more comfortable as it’s not C++ (yes, that’s a plus 🙂 )

ChaiScript

Here comes our new hero – ChaiScript. At first I was thinking about Lua, but after reading ChaiScript’s docs I decided that the language has more than enough flexibility to cover autotesting needs. The main advantage of ChaiScript is that it is made for C++ integration, so one can expect a very fast way to add scripting into the app. ChaiScript is relatively new but mature enough, and it has reasonably good documentation.

It took me only one day to integrate it into the testing environment.

The whole ‘scripting part’ implementation is just 300 lines of code at the moment, and it gave the ability to write all asynchronous tests entirely inside separate script files.

The base syntax of ChaiScript is very similar to C++ or JS. In the simple case there are only two things one should change to make a C++ test work as a script:

// Lambda functions have different declaration
[](){ ... }  

// is replaced with 
fun(){ ... }   

// And -> operator is replaced with .
value.get() // not value->get()

Actually the syntax for lambdas is even a bit nicer, as too many square brackets can make things less readable.

Example of test:

// This is script

def start(){

    getCurrentView().assertValue("Start view:", "main");

    async_while(fun(){ return true; }, fun(){

        auto plotter = getElement("main", "plotter").assertOk("Can't find main plotter");
        
        // replot if plot button is visible
        auto bPlot = getElement(plotter, "bPlot");
        global isPlotBtnVisible = getElementVisibility(bPlot);
        async_if(fun(){ return isPlotBtnVisible.get(); },
                 fun(){ clickElement(bPlot); }, 
                 fun(){});

        // select min or max value (random) by clicking on it
        auto bExtremum = getElement(plotter, provide(fun(){ return (random(2) == 0) ? "bMax" : "bMin"; })).assertOk("Can't find min/max button");
        clickElement(bExtremum);

        // get selected value
        auto bSelectedValue = getElement(plotter, "bValue").assertOk("Can't find selected value");
        global valueText = getElementText(bSelectedValue).logValue("Value:").assertCondition("Not zero:", fun(x){ return (x.get().size() > 0); });
        async_do(fun(){
            // here we strip $ sign, convert text into double and print it to log
            auto b = replaceInString(valueText.get(), "\$", "");
            auto balance = to_double(b);
            log("Balance: " + to_string(balance));
        });

        // wait 3 secs
        getCurrentView().setDelay(3000).assertValue("Current view is still:", "main");
    });

};

This is a test which every 3 seconds selects the min or max value on some plotter, performs some checks and prints it to the log. I hope you can now feel the benefits of encapsulating async requests.

As for the interface between the scripting and C++ – it is pretty simple. Not only can you export functions and object methods to ChaiScript, you can also export custom type conversions, complex types and so on. I will not provide implementation details here as it would make the post too big, but you can get the idea from this cheatsheet.
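To give a flavour of how small the bridge can be (a rough sketch of my own – ATTest stands in for the test-builder class from part 2, and the exact bindings in the post may differ), registering C++ functions with ChaiScript is a matter of a few add() calls:

#include <chaiscript/chaiscript.hpp>

// Sketch of exposing the test API to ChaiScript. ATTest is an assumed name for
// the test-builder class described in part 2; the real bindings may differ.
void bindTestApi(chaiscript::ChaiScript& chai, ATTest* test)
{
    using namespace chaiscript;

    // member functions bound to a concrete test instance
    chai.add(fun(&ATTest::async_do, test), "async_do");
    chai.add(fun(&ATTest::async_while, test), "async_while");

    // lambdas work as well, so thin adapter functions are easy to add
    chai.add(fun([test](const std::string& prefix, const std::string& name) {
        return test->getElement(prefix, name);
    }), "getElement");
}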

One minor disadvantage: as far as I know there are no default parameter capture modes for lambda functions in ChaiScript at the moment. If you work with a lot of small functions, syntax like fun[=](){ …. } would be an improvement, but it is not available yet – or it could even be made the default behaviour. I hope @lefticus will add this in the future.

Anyway, ChaiScript looks like a nice solution to me at the moment.

SHORT CONCLUSION

Encapsulation of async execution using modern C++ gives the ability to create distributed solutions. This post showed how such a technique can be used to build a custom system of distributed autotests, and how scripting can be added to such a system to increase flexibility.

Small presentation of my cross-platform engine for mobile and desktop applications

I made a small presentation about my cross-platform engine for mobile and desktop applications, codename Kobald. Click on the image to play it in a new window (use arrows and space to move through):


This is a not-so-technical presentation; the main info about the engine will come later as a separate post.

Writing custom protocol for nanomsg

Nanomsg is the successor of the ZeroMQ library (from one of its original authors), providing smart cross-platform sockets for implementing distributed architectures. Here you can find basic examples of the included protocols (communication patterns). The lib is simple (written in pure C) and does not have any dependencies like boost. And as this is at least the 3rd iteration from the same author, you can expect some quality and performance here.

This is a kind of cure for the hell of writing your own serious socket server. If you have ever had such experience, you understand the range of problems which are not so obvious at the start. Here we can skip all such problems and go straight to processing messages. The lib handles automatic reconnection in case of link disconnects, non-blocking receiving/sending, sockets which can handle a large set of clients, etc. All this seems like a perfect solution for the server-side inner transport of fast distributed architectures.

But I also want to try it on the outside. The basic communication patterns (PAIR, BUS, REQREP, PUBSUB, PIPELINE, SURVEY) may fit a large set of inner server transport schemes, but there are some minor limits in the current implementation for client-side applications. I mean limits of the protocols, not of the lib itself.

Unfortunately, the current version of the PUBSUB protocol does the filtering on the subscriber side, so ‘subscribing clients’ receive the whole message flow – and this is unbearable for me.

The BUS protocol requires a fully-linked scheme:

nanomsg bus protocol

I expect a BUS-like protocol to work in more sparse conditions.

As nanomsg is an open-source lib under a nice licence (MIT/X11), the first thought was to extend one of the existing protocols to meet my needs.

Why new protocol?

As I wanted to try these smart sockets for external clients, to meet today’s reality I assume each client has a set of devices which are connected simultaneously to some network service.

At first I aimed to create some complex routing protocol, but then came up with a simpler approach: I want to create a custom protocol as a fusion of the BUS and SUB/PUB protocols (here I refer to it as SUBBUS).

Scheme:

SUBBUS Protocol Scheme 1

 

The black lines are initial connections; the coloured lines are messages. This scheme contains 2 clients, Bob and John. John has 2 devices and Bob is a geek, so he has 4 devices simultaneously connected to the server node. Each message from a client device goes to the other devices of the same client. You can look at this scheme as two BUS protocols separated by subscription.

This gives the ability to perform instant cloud synchronisation, simultaneous operation from multiple devices and various other fun stuff.

Possible inner structure (there can be other ways):

  • Each node has a list of subscriptions (a socket option as a list of strings), e.g. /users/john/ or /chats/chat15/.
  • Subscription filtering is done on the sending side. (This is important if you have a large number of clients – each of them doesn’t have to receive all messages, not only to save bandwidth but also for security reasons.) So the client should somehow send its subscription list to the server (subscription forwarding). In case of reconnect this information should be re-sent. While subscriptions have not been sent, the client should receive nothing.
  • Each message should contain a routing prefix (header), e.g. /users/john/ or /chats/chat15/.
  • Each node should have a tree of connected client subscriptions which contains pipe lists as leaves. The sending operation uses this tree to send to the subscribed range of clients.
  • Each message from a client node should be transmitted to the other nodes within the same subscription (forwarding). This is done before server-side processing and is aimed at speeding up message propagation between devices. Some optional filters can be added here.
  • [Optional] SSL-like encryption for each pipe
  • All this stuff should be as simple as possible

It’s not too complicated to start writing your own protocol for nanomsg. The only problem is that the lib is written in pure C – so you must be a bit ready for it. Go to the src/protocols folder. It contains all the protocol sources you can explore. Mostly they simply implement the list of given methods, which are described inside src/protocol.h:

/*  To be implemented by individual socket types. */
struct nn_sockbase_vfptr {

    /*  Ask socket to stop. */
    void (*stop) (struct nn_sockbase *self);

    /*  Deallocate the socket. */
    void (*destroy) (struct nn_sockbase *self);

    /*  Management of pipes. 'add' registers a new pipe. The pipe cannot be used
        to send to or to be received from at the moment. 'rm' unregisters the
        pipe. The pipe should not be used after this call as it may already be
        deallocated. 'in' informs the socket that pipe is readable. 'out'
        informs it that it is writable. */
    int (*add) (struct nn_sockbase *self, struct nn_pipe *pipe);
    void (*rm) (struct nn_sockbase *self, struct nn_pipe *pipe);
    void (*in) (struct nn_sockbase *self, struct nn_pipe *pipe);
    void (*out) (struct nn_sockbase *self, struct nn_pipe *pipe);

    /*  Return any combination of event flags defined above, thus specifying
        whether the socket should be readable, writable, both or none. */
    int (*events) (struct nn_sockbase *self);

    /*  Send a message to the socket. Returns -EAGAIN if it cannot be done at
        the moment or zero in case of success. */
    int (*send) (struct nn_sockbase *self, struct nn_msg *msg);

    /*  Receive a message from the socket. Returns -EAGAIN if it cannot be done
        at the moment or zero in case of success. */
    int (*recv) (struct nn_sockbase *self, struct nn_msg *msg);

    /*  Set a protocol specific option. */
    int (*setopt) (struct nn_sockbase *self, int level, int option,
        const void *optval, size_t optvallen);

    /*  Retrieve a protocol specific option. */
    int (*getopt) (struct nn_sockbase *self, int level, int option,
        void *optval, size_t *optvallen);
};

So you can just clone some protocol as the base foundation for your own – I took the bus folder and cloned it to subbus, and renamed everything inside from ‘bus’ to ‘subbus’ using find/replace. In the root src folder there is a bus.h file which contains only the list of constants for protocol access. You also need to clone it under your new protocol name (subbus.h in my case). The next steps are to add the new protocol to the makefile and to the socket types list.

Add to Makefile.am:

NANOMSG_PROTOCOLS = \
    $(PROTOCOLS_BUS) \
    $(PROTOCOLS_SUBBUS) \ .....

PROTOCOLS_SUBBUS = \
    src/protocols/subbus/subbus.h \
    src/protocols/subbus/subbus.c \
    src/protocols/subbus/xsubbus.h \
    src/protocols/subbus/xsubbus.c

Add the protocol to /core/symbol.c:

{NN_BUS, "NN_BUS", NN_NS_PROTOCOL,
        NN_TYPE_NONE, NN_UNIT_NONE},
{NN_SUBBUS, "NN_SUBBUS", NN_NS_PROTOCOL,
        NN_TYPE_NONE, NN_UNIT_NONE},

Add the protocol’s socket types to the supported list inside /core/global.c (don’t forget the includes):

/*  Plug in individual socktypes. */
  
...
    nn_global_add_socktype (nn_bus_socktype);
    nn_global_add_socktype (nn_xbus_socktype);
    nn_global_add_socktype (nn_subbus_socktype);
    nn_global_add_socktype (nn_xsubbus_socktype);

After that I grabbed one of the examples for the bus protocol from here and changed the socket creation part:

#include "../nanomsg/src/subbus.h"

int node (const int argc, const char **argv)
{
  int sock = nn_socket (AF_SP, NN_SUBBUS);
  if (sock < 0)
  {
    printf ("nn_socket failed with error code %d\n", nn_errno ());
    if (errno == EINVAL) printf("%s\n", "Unknown protocol");
  }

...

After that the sample should compile and work. If you failed to add your protocol copy to the socket types list you will get an ‘Unknown protocol’ error.

Here is the complete Dockerfile I use to build & run a simple test. It gets the latest nanomsg from github, modifies the sources to include the new protocol, copies the protocol source from the host, and builds the lib and the protocol test.

# THIS DOCKERFILE COMPILES Custom Nanomsg protocol + sample under Ubuntu
 
FROM ubuntu

MAINTAINER Victor Laskin "victor.laskin@gmail.com"

# Install compilation tools

RUN apt-get update && apt-get install -y \
    automake \
    build-essential \
    wget \
    p7zip-full \
    bash \
    curl \
    git \
    sed \
    libtool

# Get latest Nanomsg build from github

RUN mkdir /nanomsg && cd nanomsg
WORKDIR /nanomsg

RUN git clone https://github.com/nanomsg/nanomsg.git && ls


# Modify nanomsg files to register new protocol

RUN cd nanomsg && sed -i '/include "..\/bus.h"/a #include "..\/subbus.h"' src/core/symbol.c && \
	sed -i '/"NN_BUS", NN_NS_PROTOCOL,/a NN_TYPE_NONE, NN_UNIT_NONE}, \n\
    {NN_SUBBUS, "NN_SUBBUS", NN_NS_PROTOCOL,' src/core/symbol.c && \
	cat src/core/symbol.c && \
	sed -i '/#include "..\/protocols\/bus\/xbus.h"/a #include "..\/protocols\/subbus\/subbus.h" \n\#include "..\/protocols\/subbus\/xsubbus.h"' src/core/global.c && \
	sed -i '/nn_global_add_socktype (nn_xbus_socktype);/a nn_global_add_socktype (nn_subbus_socktype); \n\
    nn_global_add_socktype (nn_xsubbus_socktype);' src/core/global.c && \
	cat src/core/global.c | grep nn_global_add_socktype

# Modify Makefile.am 

RUN cd nanomsg && sed -i '/xbus.c/a \\n\
PROTOCOLS_SUBBUS = \\\n\
    src/protocols/subbus/subbus.h \\\n\
    src/protocols/subbus/subbus.c \\\n\
    src/protocols/subbus/xsubbus.h \\\n\
    src/protocols/subbus/xsubbus.c \n\
    \\
    ' Makefile.am && \
    sed -i '/$(PROTOCOLS_BUS)/a $(PROTOCOLS_SUBBUS) \\\
    ' Makefile.am && cat Makefile.am 


# This is temporal fix - DISABLE STATS
RUN sed -i '/nn_global_submit_statistics ();/i if (0)' nanomsg/src/core/global.c

# Get custom protocol source (copy from host)

RUN mkdir nanomsg/src/protocols/subbus
COPY subbus.h /nanomsg/nanomsg/src/
COPY subbus/*.c /nanomsg/nanomsg/src/protocols/subbus/
COPY subbus/*.h /nanomsg/nanomsg/src/protocols/subbus/

# Build nanomsg lib

RUN cd nanomsg && ./autogen.sh && ./configure && make && ls .libs

# Get and build custom protocol test

RUN mkdir test
COPY testsubbus.c /nanomsg/test/
COPY test.sh /nanomsg/test/
RUN cd test && ls && gcc -pthread testsubbus.c ../nanomsg/.libs/libnanomsg.a -o testbus -lanl && ls

# Set port and entry point

EXPOSE 1234 1235 1236 1237 1238 1239 1240
ENTRYPOINT cd /nanomsg/test/ && ./test.sh

Note: the lib is still beta (0.5-beta, released on November 14th, 2014), so you can expect some parts that are not yet polished. Inside the script you can find the line which disables statistics, as it has a blocking bug at the moment – but I expect it to be fixed very soon as the fix has already been pulled.

Docker is an optional way to build this, of course, and you can modify this Dockerfile into a simple build script. Don’t forget to change the name of your protocol.

Modifications I made

I will not paste cuts of the source code here as it would make the post too messy. This is plain old C, so even simple things tend to be a bit longer. So I will note the main steps of my implementation. Keep in mind that this is only my approach and everything could be done another way.

I modified nn_xsubbus_setopt to set subscriptions (I use a linked list to store the list of local subscriptions).

I have two trees to speed up the whole process of communication routing. The first tree contains descriptions of client subscriptions by pipe id (nn_pipe*). It also contains a flag showing whether this node’s subscriptions have already been sent to this pipe. To make this tree more balanced I use a hash of the pointer to the pipe as the binary tree key.

This tree is used in the nn_xsubbus_add, nn_xsubbus_rm and nn_xsubbus_out functions to synchronise the subscription lists. nn_xsubbus_add is called when a new pipe is connected, and there we add a new leaf into the tree. nn_xsubbus_out tells us that the pipe is writable, so we can send our list of subscriptions to the other side (if we have not already done so). nn_xsubbus_rm – the pipe was removed.

The second tree is used for the main sending operation and gives the list of pipes by subscription string key. As a starting point I took a triple tree where each node contains the actual list of connected pipes. The nn_xsubbus_send method splits the header from each message and sends it to the corresponding part of the tree.

When a new message arrives inside nn_xsubbus_recv there is a check of the header, and if it starts with the special mark of the list of subscriptions – we add this list into the second tree. If the message is ‘normal’, it is sent to the other pipes of the same subscription (message forwarding as the BUS protocol wants).

Note that the trees should work as persistent trees in a multithreaded environment; I prefer non-locking structures here. The current implementation does not clean up the chains of disconnected leaves (it just removes the pipes) to achieve this in a simple way. Some tree rebalancing algorithm would be nice to add in the future.

As a test I slightly modified the bus test sample to set the subscription from argv[2] as a socket option and prepend each message with the current subscription.
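The relevant change on the node side looks roughly like this (NN_SUBBUS_SUBSCRIBE is a made-up option name here; use whatever option id nn_xsubbus_setopt was actually taught to handle):

/* Sketch, inside the node() function from the sample above: set the
   subscription before connecting the socket. The option constant is
   hypothetical; the rest is the standard nanomsg API. */
int sock = nn_socket (AF_SP, NN_SUBBUS);
const char *subscription = argv[2];                 /* e.g. "/USER/JOHN/" */
nn_setsockopt (sock, NN_SUBBUS, NN_SUBBUS_SUBSCRIBE,
               subscription, strlen (subscription));

The test nodes are then launched like this: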

./testbus node0 / tcp://127.0.0.1:1234 & node0=$!
./testbus node1 /USER/JOHN/ tcp://127.0.0.1:1235 tcp://127.0.0.1:1234 & node1=$!
./testbus node2 /USER/BOB/ tcp://127.0.0.1:1236 tcp://127.0.0.1:1234 & node2=$!
./testbus node3 /USER/JOHN/ tcp://127.0.0.1:1237 tcp://127.0.0.1:1234 & node3=$!
./testbus node4 /USER/BOB/ tcp://127.0.0.1:1238 tcp://127.0.0.1:1234 & node4=$!
./testbus node5 /USER/BOB/ tcp://127.0.0.1:1239 tcp://127.0.0.1:1234 & node5=$!
./testbus node6 /USER/BOB/ tcp://127.0.0.1:1240 tcp://127.0.0.1:1234 & node6=$!

Here is the part of test output (for Bob):

node5: RECEIVED '/USER/BOB/=node2 18' 20 FROM BUS
node4: RECEIVED '/USER/BOB/=node2 18' 20 FROM BUS
node6: RECEIVED '/USER/BOB/=node2 18' 20 FROM BUS
node2: RECEIVED '/USER/BOB/=node5 18' 20 FROM BUS
node5: RECEIVED '/USER/BOB/=node6 18' 20 FROM BUS
node4: RECEIVED '/USER/BOB/=node5 18' 20 FROM BUS
node6: RECEIVED '/USER/BOB/=node5 18' 20 FROM BUS
node2: RECEIVED '/USER/BOB/=node6 18' 20 FROM BUS
node5: RECEIVED '/USER/BOB/=node4 18' 20 FROM BUS
node2: RECEIVED '/USER/BOB/=node4 18' 20 FROM BUS
node4: RECEIVED '/USER/BOB/=node6 18' 20 FROM BUS
node6: RECEIVED '/USER/BOB/=node4 18' 20 FROM BUS

As you can see, there is a bus between node2, node4, node5 and node6.

I will post the sources here after I perform some tests with a large set of clients, some stress tests and so on.
