C++11: Implementation of list comprehension in SQL-like form

list comprehension

List comprehension in functional languages is the name of list constructor syntax, which is similar to set-builder notation from math.

What are the benefits of using list comprehension? One is readability, and the other one is the fact of decoupling iteration from actual construction. We could even hide parallel execution under the hood of list comprehension. Also by adding additional options to such declaration we could make the list construction a lot shorter.

If we look closer at list comprehension’s syntax, it will remind of one another very familiar thing – SQL select! Output expression, input set, predicates are equivalent to select, from, where sequence (of course not exactly, but they are very alike). Ok, let’s implement such syntax sugar using C++11 (without boost and LINQ-like libs).

You can skip implementation details and scroll to examples to see the result first.

SHORT IMPLEMENTATION

Let’s define whole operation as simple function which will produce vector<> of something.

Feel the power of variadic templates. First argument is simple function and its declaration defines output type and input variables. The next argument is tuple of source containers and their number should be equal to number of output expression’s arguments. Third argument is composition of ‘where’ filters. The rest parameters are optional flags.

You would expect a lot of code as implementation but the core will be quite compact.

Let’s write main processing cycle:

This is quite strait-forward approach – we will go through all combinations of input elements performing simple ranged for cycles. So when using list comprehension the one should keep in mind that providing too vast input arrays will result in too slow computation time. For example, if we have 3 sources, complexity will be O(n^3). So i decided to add some minor protection here – limit of requested data (it is like LIMIT instruction inside SQL-request syntax). When the limit is reached – cycles will be aborted (but this anyway will not solve all possible conditions).

As for main working part – here we use template recursion over tuple of sources, performing a cycle on each step. If you are not familiar with such approach here is working sample of simple iteration over tuple elements.

All other stuff below is optional!

First additional thing to add is set of optional flags to setup query limit, sorting options and so on. For implementation of simple bool flags we would just use class enum from C++11, but if we want to set complex options (like manual sorting with compare function, etc) we could go with the following approach:

Now our select has 3 additional options – limit, district and sort. Of course, we assume our result items to be comparable if we expect sort to work.

We can add additional declaration to be able to call select function without filters when we do not need them:

And to have fun let’s do some evil thing

Aside from quite obvious shortcuts for function calls we are making two important conversions here. First – we are converting first function into std::function using to_fn(). This is equivalent to to_function() from my previous post about functional decomposition in C++ and AOP. In short, it’s equal to casting to std::function<..> with automatic detection of arguments types and return type.

Second thing is more interesting – fn_logic_and(…). This method is making one function from bunch of filter functions (to be provided as third argument into main list comprehension function). This is not classic composition of functions as we don’t have chained calls, but instead it’s like logical operation over list of functions. For implementation of fn_logic_and() look at post bottom. This function is optional – you can pass the filter functions as tuple and iterate over it or store them as vector.

Note that if we don’t want to make copies of sources (as std::make_tuple does it) we can switch to std::forward_as_tuple as the solution for passing tuple of references. But this might not be so safe. Anyway the approach itself does not require to copy the sources – you can use references or smart-pointers to pass sources (after minor modifications).

As final touch before getting something that works, we need source class for range of natural numbers (as the majority of list comprehension examples use integers). To use ranged for over this source, we need to implement Iterator inside.

Note that by default the range is only [1..1000].

Ok, let’s have some fun.

EXAMPLES OF LIST COMPREHENSION 

Let’s start from something trivial:

We can implement map operation as list comprehension:

Ok, time for filters. Let’s find matching elements inside two sets of integers (and sort them):

Another example using DISTINCT:

Next example from Erlang guide on list comprehension – pythagorean triplets:

Result:

We could run select using custom data classes ( select id and name from users where id in [….] )

We also can perform nested selects:

Sure in haskell this will be a lot more compact, but, as you can see, it’s not as scary as expected from C++.

LAZY EVALUATION

If we want to make whole select lazy this can be done pretty easily:

Notice that function contains capture by value inside. This is optional choice. Anyway, if your ‘where‘ filters contain capture by reference of some additional parameters this could lead to undefined behaviour when is used out of scope. But this is general practice with lambdas.

There is nice post from Bartosz here about laziness, list comprehension and C++’s way to implement it from different angle. Here is described more strait-forward and thus more simple and clear way (it does not mean it’s better). We don’t wrap data and sources into something and so on, but implementation of laziness from Bartosz has one nice feature – it gives you ability to request results of query part by part as true lazy evaluation should do. Let’s add this feature into current scheme.

Let’s add new option, which will contain information about current request state. For more compact example it will just contain array of integer indexes (you can change it to Iterators if you want).

When this option is provided, iterations over tuple of sources should be slightly altered:

We just changed cycle to indexed version. Function stores last iteration positions inside indexes[] array and starts from it when iteration is called again. One of options would be encapsulating this state inside select functor, but here we store state it inside outer option object. This is just one of possible ways.

Let’s change Pythagorean Triplets example… (this is not final version)

Looks nice? But there is one major problem. Lazy evaluation could work with very large ranges of input data (even infinite ranges are fine in Haskell). If we provide Naturals(1,100000000) as input for our 3 variables in current example, the cycles will run forever… What to do to solve this? Many ways:

  • we could recreate input sources before each iteration depending on current indexes of upper cycles
  • we could wrap sources into classes which will provide method narrowing iteration range before each cycle
  • we could provide additional restricting functions as options which will break inner cycles conditionally

The most flexible way is providing functions which will produce sources instead of forwarding sources themselves (as expected in functional world…). There is no problem to change implementation to handle such input, but i want add one more feature – i want to support both object sources and functions which will produces sources simultaneously. And even to be able to use them mixed in single request. And again, to implement this we could add relatively tiny layer of code (for C++):

Inside main cycles we add additional call to getSource() function to provide the source on each iteration. So now it looks like:

And getSource() has conditional implementation depending on which source type it gets. The main problem here is how to identify that provided class is lambda function?

To do this we can use std::enable_if in couple with std::declval and std::is_object:

First variant checks that object has begin() method. If it has it – it’s raw object. Second variant just attempts to get result type of possible function call. Luckily we can just pass the list of arguments as variadic template.

Let’s test it:

We had to change the order of parameters and now it works fast enough (enough for not optimised example code).

Note: the same way you can extend select to support different types of sources. For example, wrapped inside shared_ptr<>, etc.

Also note that after all modifications this is still just simple function. All macros are optional and you could use select() as normal function.

HOW ABOUT CONCURRENCY?

Making acceptable assumptions that our sources are safe to iterate over at the same time, there will be no problem to extend our sample to support concurrent execution! We can slightly modify continuation option, which we used for laziness by adding special constructors. This will allow us to manually set iteration start indexes as job parameters.

To be able to call select function in concurrent manner let’s add alias:

That’s all! Next is the example how run select concurrently:

This approach can be extended to support additional border conditions and automatic split into parts.

You also could implement implicit concurrency inside main iterations of list comprehension by splitting upper cycle into several equal parts.

CONCLUSION

It’s possible to implement list comprehension in C++11 without using LINQ or similar libs. It will not be haskell-short but still you can gain some profits from it, as more readable and maintainable code in some cases and so on. Combined with custom self-defined additional options you can make such declaration even more profitable.

Lazy and concurrent evaluation could be added with no big effort.

But be aware of inner implementation – as there can be cases when naive list comprehension can lead to high cpu-consumption. Also in some cases list comprehension can have longer declaration than other ways of functional processing possible in modern C++ (like the one discussed here).

If you don’t like SQL syntax or macros you can still use list comprehension as function without any macro. If you invent more short names for options, add short function from() which will just make tuple from arguments, you can already use it like:

and it’s also possible to get rid of to_fn() there.

Working example can be found on ideone / gist

APPENDIX: LOGIC OPERATIONS OVER FUNCTION LIST (OPTIONAL)

We could pass filters just as vector of functions slightly changing ‘checking’ part of cycle, so the next part is totally optional. But the following code might be useful somewhere else.

This part contains two functions: fn_compose(…) and fn_logic_add(…).

The first one is making normal mathematical composition: if you have input functions f1(),f2(),f3() it will return f(..) = f3(f2(f1(..))).

fn_logic_add will return f(..) = f1(..) && f2(..) && f3(..) as single std::function. Of cause, all input functions should have the same list of arguments and bool return type. This can be considered as logical and operation over the list of functions.

This can be extended to support wide range of other operations over the list of functors. As soon as i expand it to size of some considerable value i will publish this part coupled with other additional handy conversion functions as tiny lib or something on github.

Any thoughts and corrections are welcome at comments.