Array family interface tour

Minimal framework add-on, Python use

Version $Id: array-family-tour.txt,v 1.3 2005/02/17 22:12:11 hohn Exp $


This builds on the scitbx-tour, adding some trivial functions and the wrappers needed for joint Python/C++ array access.

This tour includes

It also contains Python I/O examples to further illustrate the benefits of using a Python/C++ hybrid system.

The version array-family-tour-0_1 of voea contains the complete corresponding code.

Any module providing functionality for both C++ and Python consists of three parts: the main C++ files, the wrapper files, and a Python initializer for the wrapper.

Customization of the conversions are thus handled by, and separate from, the primary C++ code.

C++ array facilities

To accomodate a broad range of needs within C++, there are several ways of handling the same underlying array type. Briefly, the types involved are tiny, small, shared, and versa.

All types have an array interface following std::vector as much as possible, e.g.

.begin(), .end(), .size(), []
but there are tradeoffs between efficiency (space and time) and flexibility.

Because the interface follows std::vector, refer to documentation of the latter when in doubt.

Automatic types with memory management.

No use of new for allocation of any of these -- they handle their own memory. All exceptions (size violations, memory problems, etc.) propagate to Python. [] is available for all types.

On the stack. Size specified at compile time. Strictly 1d, zero-offset array. Full name is scitbx::af::tiny.
Example C++: 
tiny<double, 3> a;
a[3];   // Error, but no bounds check, no exceptions.  Fast.
tiny<double, 3> a(1,2,3);  // Up to ~10 args.
On the stack. Capacity at compile time (think IWORK), size at runtime, resizeable to capacity. Strictly 1d, zero-offset array. Full name scitbx::af::small
On the heap Strictly 1d, zero-offset array. Run time size, resizeable beyond initial size (internal realloc pointer handling). Taken from std::vector, but copies of shared are by reference, with count. Full name scitbx::af::shared
Inherited from shared, but multidimensional via policy for indexing through accessors ( AccessorType ). Thus, multi-dimensional access to the internal contiguous one-dimensional data is customized via accessors. The full name is scitbx::af::versa. Shared versa arrays resizing is subtle, see the example below. Accessors can be customized for special needs, such as the padding needed for FFTs. Padding is handled by c_grid_padded in scitbx/include/scitbx/array_family/accessors.
        // C indexing, homogeneous data. 
        // Initial 11 by 22 array
        versa<float, c_grid<2> > va(c_grid<2>(11,22));
        // Same memory, now 2 by 5 array.
        va.resize( c_grid<2>(2,array) );

Automatic type examples


namespace array = scitbx::af;
for clarity, the easiest to use (and slowest) array type is
array::shared< type > 
The following forms a sequence of doubles from m to n.
range_inefficient(int m, int n)
    array::shared<double> result;
    for(unsigned i = m; i < n; i++) {
        result.push_back( static_cast<double>(i) );
    return result;

Here, the size of the array is known; by pre-allocating the space for the array, some work is saved:

range_better(int m, int n)
    array::shared<double> result;
    result.reserve(n - m);      /* Avoid reallocation for known size. */
    for (unsigned i = m; i < n; i++) {
        result.push_back( static_cast<double>(i) );
    return result;

Internally, there is still much checking going on. For most arrays, random access is desirable, so the indexing operator [] can be used. Also, initializing memory can take time and is not needed here, so the fastest array::shared version is

range_fast(int m, int n)
    /* Use correct initial size, do not initialize memory. */
    array::shared<double> result(n - m, array::init_functor_null<double>());

    for (unsigned i = m; i < n; i++) {
        /* Direct memory access */
        result[i] = static_cast<double>(i);
    return result;
The use of array::initfunctornull<double>() is needed to convince C++ not to initialize the memory.

Raw types, no memory handling at all.

The fastest access for heap-based arrays is offered via the ref and const_ref types.

Used for direct pointer-style access read/write access to array data for efficiency. In particular, there is no indirection as in shared or versa. The full name is scitbx::af::ref.
As ref, but for read only access. The full name is scitbx::af::const_ref

Raw type examples

The automatic types (shared, versa) still dereference memory on every access -- this is needed for cooperation between multiple "owners" of the array. For the most direct, C-like access, the range example can be done as follows

range_fastest(int m, int n)
    array::shared<double> result(n - m, array::init_functor_null<double>());

    /* Local raw ref. */
    array::ref<double> R = result.ref();
    for (unsigned i = m; i < n; i++) {
        R[i] = static_cast<double>(i);

    return result;
Now, the array starting memory is dereferenced only once.

Python connection and usage

The two C++/Python array conversion mechanisms are

  1. copy semantics (pass-by-value)
  2. reference semantics (pass-by-reference)
Convenience operators are available in Python for both types.
For small objects, the content is used in forming a Python tuple or list, independent of the original. For larger objects, a handle to the original is made available in Python.

C++ produced return values correspond 1-1 to Python types; values passed from Python to C++ are many-to-1.

Pass-by-value types


In pass-by-value, conversion to Python is always as tuple, while conversion to C++ is done for tuples, lists, ranges, and iterators -- most sequence types.

The tiny and small types use pass-by-value. C++ instances of these types do not exist as such in Python. They are all converted to a tuple, losing their type. E.g., the C++

tiny<double, 3> a(1,2,3);  // Up to ~10 args.
in Python; the type tiny is lost. Generally,
tiny<int, 3> a;
becomes a tuple
(int, int, int)
in python.

In the other direction, the C++ function signature determines the tuple/list conversion's target type. E.g., given

foo(tiny<int, 3> x)
the list
is converted to a new
tiny<int, 3>(1,2,3) 
instance. Given
foo(small<float, 2> x)
the list
is converted to a new
small<float, 2>(2,3)

Examples using pre-wrapped types

The trivial function

tiny_to_cpp(array::tiny<int, 2> t)
    return (t[0] + t[1]);
has an equally simple wrapper
def("tiny_to_cpp", voea::tiny_to_cpp );
which, along with other wrappers for the module, is embedded in boilerplate code like
    using namespace boost::python;
    def("tiny_to_cpp", voea::tiny_to_cpp );

On the Python side,

print voea.tiny_to_cpp( [1,2] )
then gives 3.

Introducing new template instantiations

The previous examples involve new functions using types already wrapped in the array toolbox.

When introducing new types, say

array::tiny<float, 2>
new wrappers may be needed. This type is likely to be useful to many users, and introducing multiple wrappers for a single type introduces problems.

While the global boost::python registry can be queried to avoid overlaps/contradictions as needed, it is simpler and cleaner to centralize the wrapping of native types -- all modules will have consistent access to them.

Here are the steps needed to add a new type to the central registry.

As these wrappers are new, they can reside anywhere; for now, they are defined locally. Later, the main package owner (Ralf) can/may include them as needed for conflict resolution.


the extension module, voea_ext.cpp additionally needs
#include <scitbx/boostpython/containerconversions.h>
  scitbx::af::tiny<float, 2>

This makes the type array::tiny<float, 2> available for use via Python's tuple-to-tiny conversion.

Overloaded C++ functions

Overloaded operators and functions are very common. To continue the previous sample, we want the function tinytocpp to also handle floats. Thus, in addition to the previous

int tiny_to_cpp(array::tiny<int, 2> t)
we define
tiny_to_cpp(array::tiny<float, 2> t)
    return (t[0] + t[1]);
for C++. The new tiny<float,2> type is already wrapped. The wrappers for this pair of function need to be distinguished, and include full type casts. Thus, instead of
def("tiny_to_cpp", voea::tiny_to_cpp );
they are now
        (float (&)(array::tiny<float, 2>) ) voea::tiny_to_cpp );

        (int (&)(array::tiny<int, 2>) ) voea::tiny_to_cpp );

With these, the Python session becomes

>>> print voea.tiny_to_cpp( [1,2] )
>>> print voea.tiny_to_cpp( [1.1, 2] )

Pass-by-reference types

In pass-by-reference, exchanges in both directions use a wrapped C++ object.

The pass-by-reference types are shared and versa. However, these do not show up in Python directly. Instead, a C++ wrapper module, flex, is used to interface to these. E.g., an instance of the C++ type

array::shared< scitbx::vec3<double> >
shows up as
<scitbx_array_family_flex_ext.vec3_double object>
in Python.

The Python interface strongly resembles that of the Python list type; see list examples and documentation for details.

C++ view

On the C++ side, flex is a module only; there is no C++ type flex. Only the file

exists; there is no scitbx::array_family::flex namespace or class. All main C++ code thus uses shared and versa, never flex.

Python view

On the Python side, there is only

The types shared and versa do not show up at all, so all main Python code uses the flex interface (and ignored the underlying versa).

To provide the type

in Python from an array of doubles in C++, the flex module defines a Python type double as a wrapper of the C++ type
versa<double, flex_grid<> >
Hence the naming flex::double, flex::vec3_double etc.

A note on implementation

The scitbx.array_family.flex Python module handles specific versa template instantiations.

Internal python array type is

scitbx::af::versa< ELEMENTTYPE, flex_grid< af::small<INDEXTYPE,10> > >
INDEXTYPE is usually long

Python instances of

can be used by C++ functions with signatures

In C++, this causes interference between boost, inheritance, and the (boost) flex extension module.

Python's accessors are ALL in the flex module. This means using

typedef scitbx::af::versa< scitbx::af::tiny<double, 3 > > euler_angles;
in C++ is fine, but the simple boost wrapper
class_<voea::euler_angles > ("euler_angles");
will NOT HAVE ANY of the af::flex accessors available in Python. However, the wrapper mechanism in flex_wrapper.h creates the boost python wrapper for a particular completed versa type: a chosen element type, and using flex_grid as accessor.

A wrapper for shared

The previously written

range_inefficient(int m, int n)
    array::shared<double> result;
    for(unsigned i = m; i < n; i++) {
        result.push_back( static_cast<double>(i) );
    return result;
uses already wrapped types; only the binding
def("range_inefficient", voea::range_inefficient);
is needed.

Using a shared in Python

Basic access as usual:

>>> arr = voea.range_inefficient(2,10)
>>> arr
<scitbx_array_family_flex_ext.double object at 0x4016e86c>

By itself, this may not justify the connection effort. But some extra functionality, obtained without repeated programmer effort, does. First, there is direct conversion to Python types which makes vast functionality available immediately. Consider the printing problem. Instead of starting with

>>> for i in arr:
...         print i
and improving this in innumerable ways (pretty-printing, using a re-readable format, etc.), one can simply use
>>> list(arr)
[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
which could later be re-used via
>>> new_arr = eval('[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]')
>>> new_arr
[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
if the string was saved in a file.

Of course, data serialization is equally short:

ps = pickle.dumps(arr)
unpickled_arr = pickle.loads(ps)
This is also used for writing to and reading from files in binary form.

Then, there is Matlab style slicing:

print list(unpickled_arr[1:5])
print list(unpickled_arr[1:5:2])

For complex structures, re-using existing facilities is mandatory; as a preview, the call

    angles =,
                     [0.0, 90.0],
                     [0.0, 359.0])
produces a vector of triples. Now, the single command
>>> pprint(list(angles))
provides a properly indented, delimited, trivially re-readable printout:
    >>> from pprint import pprint
    >>> pprint(list(arr))
    >>> angles
    <scitbx_array_family_flex_ext.vec3_double object at 0x407e35cc>
    >>> pprint(list(angles))
    [(0.0, 0.0, 0.0),
     (0.0, 15.0, 0.0),
     (0.0, 15.0, 71.799999999999997),
     (0.0, 90.0, 163.18181818181816),
     (0.0, 90.0, 179.49999999999997)]

This document was generated using AFT v5.095