This documentation builds on the scitbx-tour, adding some trivial functions and the wrappers needed for joint Python/C++ array access.
This tour includes
It also contains Python I/O examples to further illustrate the benefits of using a Python/C++ hybrid system.
The version array-family-tour-0_1 of voea contains the complete corresponding code.
Any module providing functionality for both C++ and Python consists of three parts: the main C++ files, the wrapper files, and a Python initializer for the wrapper.
Customization of the conversions are thus handled by, and separate from, the primary C++ code.
To accomodate a broad range of needs within C++, there are several ways of handling the same underlying array type. Briefly, the types involved are tiny, small, shared, and versa.
All types have an array interface following std::vector
as much as possible, e.g.
.begin(), .end(), .size(), []
but there are tradeoffs between efficiency (space and time) and flexibility.
Because the interface follows std::vector
, refer to documentation of the latter when in doubt.
No use of new for allocation of any of these -- they handle their own memory. All exceptions (size violations, memory problems, etc.) propagate to Python. [] is available for all types.
tiny
scitbx::af::tiny
.tiny<double, 3> a;
a[1];
a[3]; // Error, but no bounds check, no exceptions. Fast.
tiny<double, 3> a(1,2,3); // Up to ~10 args.
small
scitbx::af::small
.
shared
std::vector
, but copies of shared are by reference, with count. Full name scitbx::af::shared
versa
scitbx::af::versa
. Shared versa arrays resizing is subtle, see the example below. Accessors can be customized for special needs, such as the padding needed for FFTs. Padding is handled by c_grid_padded in scitbx/include/scitbx/array_family/accessors.// C indexing, homogeneous data.
// Initial 11 by 22 array
versa<float, c_grid<2> > va(c_grid<2>(11,22));
// Same memory, now 2 by 5 array.
va.resize( c_grid<2>(2,array) );
Using
namespace array = scitbx::af;
for clarity, the easiest to use (and slowest) array type is
array::shared< type >
The following forms a sequence of doubles from m
to n
.
array::shared
range_inefficient(int m, int n)
{
array::shared<double> result;
for(unsigned i = m; i < n; i++) {
result.push_back( static_cast<double>(i) );
}
return result;
}
Here, the size of the array is known; by pre-allocating the space for the array, some work is saved:
array::shared
range_better(int m, int n)
{
array::shared<double> result;
result.reserve(n - m); /* Avoid reallocation for known size. */
for (unsigned i = m; i < n; i++) {
result.push_back( static_cast<double>(i) );
}
return result;
}
Internally, there is still much checking going on. For most arrays, random access is desirable, so the indexing operator []
can be used. Also, initializing memory can take time and is not needed here, so the fastest array::shared
version is
array::shared<double>
range_fast(int m, int n)
{
/* Use correct initial size, do not initialize memory. */
array::shared<double> result(n - m, array::init_functor_null<double>());
for (unsigned i = m; i < n; i++) {
/* Direct memory access */
result[i] = static_cast<double>(i);
}
return result;
}
The use of array::initfunctornull<double>()
is needed to convince C++ not to initialize the memory.
The fastest access for heap-based arrays is offered via the ref
and const_ref
types.
scitbx::af::ref.
scitbx::af::const_ref
The automatic types (shared, versa) still dereference memory on every access -- this is needed for cooperation between multiple "owners" of the array. For the most direct, C-like access, the range example can be done as follows
array::shared<double>
range_fastest(int m, int n)
{
array::shared<double> result(n - m, array::init_functor_null<double>());
/* Local raw ref. */
array::ref<double> R = result.ref();
for (unsigned i = m; i < n; i++) {
R[i] = static_cast<double>(i);
}
return result;
}
Now, the array starting memory is dereferenced only once.
The two C++/Python array conversion mechanisms are
Convenience operators are available in Python for both types.
For small objects, the content is used in forming a Python tuple or list, independent of the original. For larger objects, a handle to the original is made available in Python.
C++ produced return values correspond 1-1 to Python types; values passed from Python to C++ are many-to-1.
In pass-by-value, conversion to Python is always as tuple, while conversion to C++ is done for tuples, lists, ranges, and iterators -- most sequence types.
The tiny and small types use pass-by-value. C++ instances of these types do not exist as such in Python. They are all converted to a tuple, losing their type. E.g., the C++
tiny<double, 3> a(1,2,3); // Up to ~10 args.
becomes
(1,2,3)
in Python; the type tiny
is lost. Generally,
tiny<int, 3> a;
becomes a tuple
(int, int, int)
in python.
In the other direction, the C++ function signature determines the tuple/list conversion's target type. E.g., given
foo(tiny<int, 3> x)
the list 1,2,3
is converted to a new
tiny<int, 3>(1,2,3)
instance. Given
foo(small<float, 2> x)
the list 2,3
is converted to a new
small<float, 2>(2,3)
The trivial function
int
tiny_to_cpp(array::tiny<int, 2> t)
{
return (t[0] + t[1]);
}
has an equally simple wrapper
def("tiny_to_cpp", voea::tiny_to_cpp );
which, along with other wrappers for the module, is embedded in boilerplate code like
BOOST_PYTHON_MODULE(voea_ext)
{
using namespace boost::python;
...
def("tiny_to_cpp", voea::tiny_to_cpp );
...
}
On the Python side,
print voea.tiny_to_cpp( [1,2] )
then gives 3
.
The previous examples involve new functions using types already wrapped in the array toolbox.
When introducing new types, say
array::tiny<float, 2>
new wrappers may be needed. This type is likely to be useful to many users, and introducing multiple wrappers for a single type introduces problems.
While the global boost::python registry can be queried to avoid overlaps/contradictions as needed, it is simpler and cleaner to centralize the wrapping of native types -- all modules will have consistent access to them.
Here are the steps needed to add a new type to the central registry.
As these wrappers are new, they can reside anywhere; for now, they are defined locally. Later, the main package owner (Ralf) can/may include them as needed for conflict resolution.
Following
$SCITBX_DIST/array_family/boost_python/flex_ext.cpp
the extension module, voea_ext.cpp
additionally needs
#include <scitbx/boostpython/containerconversions.h>
and
scitbx::boost_python::container_conversions::tuple_mapping_fixed_size<
scitbx::af::tiny<float, 2>
>();
This makes the type array::tiny<float, 2>
available for use via Python's tuple-to-tiny conversion.
Overloaded operators and functions are very common. To continue the previous sample, we want the function tinytocpp to also handle floats. Thus, in addition to the previous
int tiny_to_cpp(array::tiny<int, 2> t)
we define
float
tiny_to_cpp(array::tiny<float, 2> t)
{
return (t[0] + t[1]);
}
for C++. The new tiny<float, 2>
type is already wrapped. The wrappers for this pair of function need to be distinguished, and include full type casts. Thus, instead of
def("tiny_to_cpp", voea::tiny_to_cpp );
they are now
def("tiny_to_cpp",
(float (&)(array::tiny<float, 2>) ) voea::tiny_to_cpp );
def("tiny_to_cpp",
(int (&)(array::tiny<int, 2>) ) voea::tiny_to_cpp );
With these, the Python session becomes
>>> print voea.tiny_to_cpp( [1,2] )
3
>>> print voea.tiny_to_cpp( [1.1, 2] )
3.10000002384
In pass-by-reference, exchanges in both directions use a wrapped C++ object.
The pass-by-reference types are shared and versa. However, these do not show up in Python directly. Instead, a C++ wrapper module, flex, is used to interface to these. E.g., an instance of the C++ type
array::shared< scitbx::vec3<double> >
shows up as
<scitbx_array_family_flex_ext.vec3_double object>
in Python.
The Python interface strongly resembles that of the Python list type; see list examples and documentation for details.
On the C++ side, flex is a module only; there is no C++ type flex. Only the file
scitbx_array_family_flex_ext.cpp
exists; there is no scitbx::array_family::flex namespace or class. All main C++ code thus uses shared and versa, never flex.
On the Python side, there is only
scitbx.array_family.flex
The types shared and versa do not show up at all, so all main Python code uses the flex interface (and ignored the underlying versa).
To provide the type
flex.double
in Python from an array of doubles in C++, the flex module defines a Python type double as a wrapper of the C++ type
versa<double, flex_grid<> >
Hence the naming flex::double
, flex::vec3_double
etc.
The scitbx.array_family.flex Python module handles specific versa template instantiations.
Internal python array type is
scitbx::af::versa< ELEMENTTYPE, flex_grid< af::small<INDEXTYPE,10> > >
where
INDEXTYPE is usually long
Python instances of
scitbx::af::versa<double>
can be used by C++ functions with signatures
scitbx::af::shared<double>
In C++, this causes interference between boost, inheritance, and the (boost) flex extension module.
Python's accessors are ALL in the flex module. This means using
typedef scitbx::af::versa< scitbx::af::tiny<double, 3 > > euler_angles;
in C++ is fine, but the simple boost wrapper
class_<voea::euler_angles > ("euler_angles");
will NOT HAVE ANY of the af::flex
accessors available in Python. However, the wrapper mechanism in flex_wrapper.h
creates the boost python wrapper for a particular completed versa type: a chosen element type, and using flex_grid
as accessor.
The previously written
array::shared<double>
range_inefficient(int m, int n)
{
array::shared<double> result;
for(unsigned i = m; i < n; i++) {
result.push_back( static_cast<double>(i) );
}
return result;
}
uses already wrapped types; only the binding
def("range_inefficient", voea::range_inefficient);
is needed.
Basic access as usual:
arr = voea.range_inefficient(2,10)
arr
<scitbx_array_family_flex_ext.double object at 0x4016e86c>
By itself, this may not justify the connection effort. But some extra functionality, obtained without repeated programmer effort, does. First, there is direct conversion to Python types which makes vast functionality available immediately. Consider the printing problem. Instead of starting with
for i in arr:
print i
2.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0
and improving this in innumerable ways (pretty-printing, using a re-readable format, etc.), one can simply use
list(arr)
[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
which could later be re-used via
new_arr = eval('[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]')
new_arr
[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
if the string was saved in a file.
Of course, data serialization is equally short:
ps = pickle.dumps(arr)
unpickled_arr = pickle.loads(ps)
This is also used for writing to and reading from files in binary form. Then, there is Matlab style slicing:
print list(unpickled_arr[1:5])
print list(unpickled_arr[1:5:2])
For complex structures, re-using existing facilities is mandatory; as a preview, the call
angles = voea.py(15.0,
[0.0, 90.0],
[0.0, 359.0])
produces a vector of triples. Now, the single command
pprint(list(angles))
provides a properly indented, delimited, trivially re-readable printout:
from pprint import pprint
pprint(list(arr))
angles
<scitbx_array_family_flex_ext.vec3_double object at 0x407e35cc>
pprint(list(angles))
[(0.0, 0.0, 0.0),
(0.0, 15.0, 0.0),
(0.0, 15.0, 71.799999999999997),
...
(0.0, 90.0, 163.18181818181816),
(0.0, 90.0, 179.49999999999997)]