Array family interface tour

This documentation builds on the scitbx-tour, adding some trivial functions and the wrappers needed for joint Python/C++ array access.

Introduction

This tour includes

  • a general overview
  • simple example C++ code
  • its Python interface wrappers
  • examples of joint Python/C++ access to the C++ data

It also contains Python I/O examples to further illustrate the benefits of using a Python/C++ hybrid system.
Version array-family-tour-0_1 of voea contains the complete corresponding code.

Any module providing functionality for both C++ and Python consists of three parts: the main C++ files, the wrapper files, and a Python initializer for the wrapper.

Customization of the conversions is thus handled by the wrapper files, separate from the primary C++ code.

C++ array facilities

To accommodate a broad range of needs within C++, there are several ways of handling the same underlying array type. Briefly, the types involved are tiny, small, shared, and versa.

All types have an array interface following std::vector as much as possible, e.g.

.begin(), .end(), .size(), []

but there are tradeoffs between efficiency (space and time) and flexibility. Because the interface follows std::vector, refer to documentation of the latter when in doubt.

Automatic types with memory management

None of these types is allocated with new -- they manage their own memory. All exceptions (size violations, memory problems, etc.) propagate to Python. [] is available for all types.

  • tiny
    On the stack. Size specified at compile time. Strictly 1d, zero-offset array. Full name is scitbx::af::tiny.
    Example C++:
    tiny<double, 3> a;
    a[1];   // OK: index within bounds.
    a[3];   // Error, but no bounds check, no exceptions.  Fast.
    tiny<double, 3> b(1,2,3);  // Constructors take up to ~10 args.
  • small
    On the stack. Capacity at compile time (think IWORK), size at runtime, resizeable to capacity. Strictly 1d, zero-offset array. Full name is scitbx::af::small.
  • shared
    On the heap. Strictly 1d, zero-offset array. Runtime size, resizeable beyond the initial size (reallocation and pointer handling are internal). The interface is taken from std::vector, but copies of a shared are by reference, with reference counting. Full name is scitbx::af::shared.
  • versa
    Inherited from shared, but multidimensional via a policy for indexing through accessors (AccessorType). Thus, multi-dimensional access to the internal contiguous one-dimensional data is customized via accessors. The full name is scitbx::af::versa. Resizing of versa arrays is subtle; see the example below. Accessors can be customized for special needs, such as the padding needed for FFTs. Padding is handled by c_grid_padded in scitbx/include/scitbx/array_family/accessors.

    Example:
    // C indexing, homogeneous data.
    // Initial 11 by 22 array
    versa<float, c_grid<2> > va(c_grid<2>(11,22));
// Same memory, now 2 by 5 array.
va.resize( c_grid<2>(2,5) );
    

Automatic type examples

Using

namespace array = scitbx::af;

for clarity, the easiest to use (and slowest) array type is

array::shared< type >

The following forms a sequence of doubles from m up to, but not including, n.

array::shared<double>
range_inefficient(int m, int n)
{
    array::shared<double> result;
    for(unsigned i = m; i < n; i++) {
        result.push_back( static_cast<double>(i) );
    }
    return result;
}

Here, the size of the array is known; by pre-allocating the space for the array, some work is saved:

array::shared<double>
range_better(int m, int n)
{
    array::shared<double> result;
    result.reserve(n - m);      /* Avoid reallocation for known size. */
    for (unsigned i = m; i < n; i++) {
        result.push_back( static_cast<double>(i) );
    }
    return result;
}

Internally, there is still much checking going on. For most arrays, random access is desirable, so the indexing operator [] can be used. Also, initializing memory can take time and is not needed here, so the fastest array::shared version is

array::shared<double>
range_fast(int m, int n)
{
    /* Use correct initial size, do not initialize memory. */
    array::shared<double> result(n - m, array::init_functor_null<double>());

    for (unsigned i = m; i < n; i++) {
        /* Direct memory access; result indices start at 0. */
        result[i - m] = static_cast<double>(i);
    }
    return result;
}

The use of array::init_functor_null<double>() is needed to convince C++ not to initialize the memory.

Raw types, no memory handling at all

The fastest access for heap-based arrays is offered via the ref and const_ref types.

  • ref
    Used for direct, pointer-style read/write access to array data, for efficiency. In particular, there is no indirection as in shared or versa. The full name is scitbx::af::ref.
  • const_ref
    As ref, but for read-only access. The full name is scitbx::af::const_ref.

Raw type examples

The automatic types (shared, versa) still dereference memory on every access -- this is needed for cooperation between multiple "owners" of the array. For the most direct, C-like access, the range example can be done as follows

array::shared<double>
range_fastest(int m, int n)
{
    array::shared<double> result(n - m, array::init_functor_null<double>());

    /* Local raw ref. */
    array::ref<double> R = result.ref();
    for (unsigned i = m; i < n; i++) {
        R[i - m] = static_cast<double>(i);
    }

    return result;
}

Now, the array starting memory is dereferenced only once.

Python connection and usage

The two C++/Python array conversion mechanisms are

  1. copy semantics (pass-by-value)
  2. reference semantics (pass-by-reference)

Convenience operators are available in Python for both types.
For small objects, the content is copied into a Python tuple or list, independent of the original. For larger objects, a handle to the original is made available in Python.
Return values produced by C++ correspond one-to-one to Python types; values passed from Python to C++ convert many-to-one.

Pass-by-value types

Behavior

In pass-by-value, conversion to Python is always as tuple, while conversion to C++ is done for tuples, lists, ranges, and iterators -- most sequence types.

The tiny and small types use pass-by-value. C++ instances of these types do not exist as such in Python. They are all converted to a tuple, losing their type. E.g., the C++

tiny<double, 3> a(1,2,3);  // Up to ~10 args.

becomes

(1,2,3)

in Python; the type tiny is lost. Generally,

tiny<int, 3> a;

becomes a tuple

(int, int, int)

in python.
In the other direction, the C++ function signature determines the tuple/list conversion's target type. E.g., given

foo(tiny<int, 3> x)

the Python list [1,2,3] is converted to a new

tiny<int, 3>(1,2,3)

instance. Given

foo(small<float, 2> x)

the Python list [2,3] is converted to a new

small<float, 2>(2,3)

Examples using pre-wrapped types

The trivial function

int
tiny_to_cpp(array::tiny<int, 2> t)
{
    return (t[0] + t[1]);
}

has an equally simple wrapper

def("tiny_to_cpp", voea::tiny_to_cpp );

which, along with other wrappers for the module, is embedded in boilerplate code like

BOOST_PYTHON_MODULE(voea_ext)
{
    using namespace boost::python;
    ...
    def("tiny_to_cpp", voea::tiny_to_cpp );
    ...
}

On the Python side,

print voea.tiny_to_cpp( [1,2] )

then gives 3.

Introducing new template instantiations

The previous examples involve new functions using types already wrapped in the array toolbox.
When introducing new types, say

array::tiny<float, 2>

new wrappers may be needed. This type is likely to be useful to many users, and wrapping a single type in multiple places causes problems.
While the global boost::python registry can be queried to avoid overlaps/contradictions as needed, it is simpler and cleaner to centralize the wrapping of native types -- all modules will have consistent access to them.

Here are the steps needed to add a new type to the central registry.

As these wrappers are new, they can reside anywhere; for now, they are defined locally. Later, the main package owner (Ralf) may include them centrally as needed for conflict resolution.

Following

$SCITBX_DIST/array_family/boost_python/flex_ext.cpp

the extension module voea_ext.cpp additionally needs

#include <scitbx/boost_python/container_conversions.h>

and

scitbx::boost_python::container_conversions::tuple_mapping_fixed_size<
  scitbx::af::tiny<float, 2>
  >();

This makes the type array::tiny<float, 2> available for use via Python's tuple-to-tiny conversion.

Overloaded C++ functions

Overloaded operators and functions are very common. To continue the previous example, we want the function tiny_to_cpp to also handle floats. Thus, in addition to the previous

int tiny_to_cpp(array::tiny<int, 2> t)

we define

float
tiny_to_cpp(array::tiny<float, 2> t)
{
    return (t[0] + t[1]);
}

for C++. The new tiny<float, 2> type is already wrapped. The wrappers for this pair of functions need to be distinguished and must include explicit type casts. Thus, instead of

def("tiny_to_cpp", voea::tiny_to_cpp );

they are now

    def("tiny_to_cpp",
        (float (&)(array::tiny<float, 2>) ) voea::tiny_to_cpp );

    def("tiny_to_cpp",
        (int (&)(array::tiny<int, 2>) ) voea::tiny_to_cpp );

With these, the Python session becomes

>>> print voea.tiny_to_cpp( [1,2] )
3
>>> print voea.tiny_to_cpp( [1.1, 2] )
3.10000002384

Pass-by-reference types

In pass-by-reference, exchanges in both directions use a wrapped C++ object.
The pass-by-reference types are shared and versa. However, these do not show up in Python directly. Instead, a C++ wrapper module, flex, is used to interface to these. E.g., an instance of the C++ type

array::shared< scitbx::vec3<double> >

shows up as

<scitbx_array_family_flex_ext.vec3_double object>

in Python.
The Python interface strongly resembles that of the Python list type; see list examples and documentation for details.

C++ view

On the C++ side, flex is a module only; there is no C++ type flex. Only the file

scitbx_array_family_flex_ext.cpp

exists; there is no scitbx::array_family::flex namespace or class. All main C++ code thus uses shared and versa, never flex.

Python view

On the Python side, there is only

scitbx.array_family.flex

The types shared and versa do not show up at all, so all main Python code uses the flex interface (and ignores the underlying versa).
To provide the type

flex.double

in Python from an array of doubles in C++, the flex module defines a Python type double as a wrapper of the C++ type

versa<double, flex_grid<> >

Hence the naming flex.double, flex.vec3_double, etc.

A note on implementation

The scitbx.array_family.flex Python module handles specific versa template instantiations.
The internal Python array type is

scitbx::af::versa< ELEMENTTYPE, flex_grid< af::small<INDEXTYPE,10> > >

where

INDEXTYPE is usually long

Python instances of

scitbx::af::versa<double>

can be used by C++ functions with signatures

scitbx::af::shared<double>

In C++, this relies on the interplay of boost.python, inheritance, and the flex extension module.
Python's accessors are ALL in the flex module. This means using

typedef scitbx::af::versa< scitbx::af::tiny<double, 3 > > euler_angles;

in C++ is fine, but the simple boost wrapper

class_<voea::euler_angles > ("euler_angles");

will NOT HAVE ANY of the af::flex accessors available in Python. However, the wrapper mechanism in flex_wrapper.h creates the boost.python wrapper for a particular completed versa type: a chosen element type, using flex_grid as the accessor.

A wrapper for shared

The previously written

array::shared<double>
range_inefficient(int m, int n)
{
    array::shared<double> result;
    for(unsigned i = m; i < n; i++) {
        result.push_back( static_cast<double>(i) );
    }
    return result;
}

uses already wrapped types; only the binding

def("range_inefficient", voea::range_inefficient);

is needed.

Using a shared in Python

Basic access as usual:

arr = voea.range_inefficient(2,10)
arr
<scitbx_array_family_flex_ext.double object at 0x4016e86c>

By itself, this may not justify the connection effort. But some extra functionality, obtained without repeated programmer effort, does. First, there is direct conversion to Python types which makes vast functionality available immediately. Consider the printing problem. Instead of starting with

for i in arr:
  print i
2.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0

and improving this in innumerable ways (pretty-printing, using a re-readable format, etc.), one can simply use

list(arr)
[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

which could later be re-used via

new_arr = eval('[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]')
new_arr
[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

if the string was saved in a file.
Of course, data serialization is equally short:

ps = pickle.dumps(arr)
unpickled_arr = pickle.loads(ps)

This is also used for writing to and reading from files in binary form. Then there is Matlab-style slicing:

print list(unpickled_arr[1:5])
print list(unpickled_arr[1:5:2])

For complex structures, re-using existing facilities is mandatory; as a preview, the call

angles = voea.py(15.0,
                 [0.0, 90.0],
                 [0.0, 359.0])

produces a vector of triples. Now, the single command

pprint(list(angles))

provides a properly indented, delimited, trivially re-readable printout:

from pprint import pprint
angles
<scitbx_array_family_flex_ext.vec3_double object at 0x407e35cc>
pprint(list(angles))
[(0.0, 0.0, 0.0),
 (0.0, 15.0, 0.0),
 (0.0, 15.0, 71.799999999999997),
 ...
 (0.0, 90.0, 163.18181818181816),
 (0.0, 90.0, 179.49999999999997)]