Simple structured binary i/o in python

author:Alejandro López Correa
license:MIT License

A python module to simplify reading and writing structured binary files or other file-like objects; it is also useful, for example, for reading and writing network packets. This module is a convenience layer on top of the standard python module "struct".
Tested with python 2.7 and 3.2

Quick example:
import binio
from binio.types import *
rgbColourIO = binio.new( [
        (1, t_float, 'red'),
        (1, t_float, 'green'),
        (1, t_float, 'blue')
    ] )

rgb = { 'red':1.0, 'green':1.0, 'blue':1.0 }
f = open( 'rgb.bin', 'w+b' )
rgbColourIO.write_dict( f, rgb )
f.seek( 0 )
rgb2 = rgbColourIO.read_dict( f )
assert rgb == rgb2
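
For comparison, here is a rough sketch of the same round trip written directly against the standard "struct" module, which binio builds on. It is not binio's implementation; the format string '=3f' and the use of io.BytesIO in place of a real file are assumptions made for illustration.

```python
import io
import struct

# Three 32-bit floats, native byte order, no padding ('=' disables alignment).
RGB_FORMAT = '=3f'

rgb = {'red': 1.0, 'green': 1.0, 'blue': 1.0}

f = io.BytesIO()  # in-memory stand-in for open('rgb.bin', 'w+b')
f.write(struct.pack(RGB_FORMAT, rgb['red'], rgb['green'], rgb['blue']))
f.seek(0)  # rewind before reading back

size = struct.calcsize(RGB_FORMAT)
red, green, blue = struct.unpack(RGB_FORMAT, f.read(size))
rgb2 = {'red': red, 'green': green, 'blue': blue}
assert rgb == rgb2
```

The field names and their order have to be repeated by hand at every pack and unpack call, which is the bookkeeping the definition list is meant to remove.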

A more complex example:
import binio

# definition using DSL
object3DIO = binio.new( """
            1 : uint32          : objectID
           32 : string          : objectName
            1 : uint32          : tpsz
         tpsz : string@utf8     : texturePath
            1 : uint32          : vCount
     vCount*3 : float           : vertexBuf
            1 : uint32          : tCount
     tCount*3 : uint32          : triangleBuf
    """ )

# generate a test file using dict
test = { 'objectID':0, 'objectName':'test', 'tpsz':0, 'texturePath':''
       , 'vCount':1, 'vertexBuf':[1,2,3], 'tCount':1, 'triangleBuf':[1,1,1] }
object3DIO.write_dict( open( "a.mesh", "wb" ), test )

# read from a file
obj = object3DIO.read_struct( open( "a.mesh", "rb" ) )

# modify texture path and update its size in bytes
obj.texturePath = "textures/car.tga"
obj.tpsz = len( obj.texturePath.encode( 'utf8' ) )

# store changes in another file
object3DIO.write_struct( open( "b.mesh", "wb" ), obj )
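
To make the dependency between a count item and the buffer that follows it more concrete, the sketch below handles just the vCount / vertexBuf pair with plain "struct" calls. The helper names write_vertices and read_vertices are hypothetical and are not part of binio.

```python
import io
import struct

def write_vertices(f, vertex_buf):
    """Write vCount (uint32) followed by vCount*3 floats (hypothetical helper)."""
    v_count = len(vertex_buf) // 3
    f.write(struct.pack('=I', v_count))
    f.write(struct.pack('=%df' % (v_count * 3), *vertex_buf))

def read_vertices(f):
    """Read vCount first, then use it to size the float buffer."""
    (v_count,) = struct.unpack('=I', f.read(4))
    fmt = '=%df' % (v_count * 3)
    return list(struct.unpack(fmt, f.read(struct.calcsize(fmt))))

f = io.BytesIO()
write_vertices(f, [1.0, 2.0, 3.0])
total_bytes = len(f.getvalue())  # 4 bytes for the count + 3 * 4 bytes of floats
f.seek(0)
vertices = read_vertices(f)
```

The record size is only known once vCount has been read, which is why the size of such a structure depends on the data.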

Definition format

A DSL is a "domain-specific language". This DSL is a text block that contains item definitions per row. Empty lines are ignored, as well as white-space.
Each item definition contains three values separated by colons: count expression, data type and item name.
The lines may contain python-style comments.

rgbIO = binio.new( """
        # rgb structure, three 32 bit floats
        1: float    : red       # red value
        1: float    : green     # green value
        1: float    : blue      # blue value
    """ )

list of triplets

A list of tuples with three elements each: count expression, type and item name. The count expression can be either an integer >= 0 or a string containing the expression.

rgbIO = binio.new( [
        (         1, binio.types.t_uint32, 'pcount'),
        ('pcount*3', binio.types.t_float,  'positions')
    ] )

Count expression
This can be an integer or a simple arithmetic expression using previous integer item names, constant integers and the following operators: ()+-*/%.
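
As an illustration of what evaluating such an expression involves, here is a minimal sketch of an evaluator restricted to the operators listed above. It is not binio's own code: eval_count is a hypothetical helper, it needs Python 3.8 or later (ast.Constant), and treating "/" as integer division is an assumption of the sketch.

```python
import ast
import operator

# Operators named in the text. '/' is mapped to integer division here,
# which is an assumption of this sketch.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.floordiv,
    ast.Mod: operator.mod,
}

def eval_count(expr, items):
    """Evaluate a count expression against previously read integer items."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Name):
            return int(items[node.id])  # KeyError if the item is unknown
        if (isinstance(node, ast.Constant) and isinstance(node.value, int)
                and not isinstance(node.value, bool)):
            return node.value
        raise ValueError('unsupported element in count expression: %r' % expr)
    return walk(ast.parse(expr.strip(), mode='eval'))

positions = eval_count('pcount*3', {'pcount': 4})
```

Walking the parsed tree instead of calling eval() keeps the expression limited to names, integer constants and the listed operators.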

Supported data types

DSL types
byte, int8, i8, uint8, u8, short, int16, i16, ushort, uint16, u16, int, int32, i32, uint, uint32, u32, int64, i64, uint64, u64, float, float32, float64, double, char, string, string@utf8, string@utf16

list of triplets types
The submodule binio.types contains the type objects. The names are the same as in DSL types but starting with a 't_' to avoid collisions with some default types. Also, string types with codecs replace the @ with a _.

byte --> t_byte
float --> t_float
string@utf8 --> t_string_utf8

In python 2 this is a string, in python 3 a byte array.

integer types
signed and unsigned integer types, for example: int8, i8, uint32, int64
supported sizes: 8, 16, 32, 64
special values: short (int16), ushort (uint16), int (int32), uint (uint32)

floating point types
float, float32: 32 bits
double, float64: 64 bits
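
The sizes above line up with the standard-size format characters of the "struct" module. The table below is a sketch of one possible mapping, not binio's internal one; STRUCT_CODES, ALIASES and size_in_bits are hypothetical names.

```python
import struct

# Hypothetical mapping from DSL type names to struct format characters.
STRUCT_CODES = {
    'int8': 'b', 'uint8': 'B',
    'int16': 'h', 'uint16': 'H',
    'int32': 'i', 'uint32': 'I',
    'int64': 'q', 'uint64': 'Q',
    'float32': 'f', 'float64': 'd',
}

# Aliases named in the text.
ALIASES = {
    'short': 'int16', 'ushort': 'uint16',
    'int': 'int32', 'uint': 'uint32',
    'float': 'float32', 'double': 'float64',
}

def size_in_bits(type_name):
    """Return the size of a numeric DSL type, using struct's standard sizes."""
    name = ALIASES.get(type_name, type_name)
    return 8 * struct.calcsize('=' + STRUCT_CODES[name])

sizes = [size_in_bits(n) for n in ('int8', 'short', 'uint', 'int64', 'float', 'double')]
```

The '=' prefix selects struct's standard sizes, so the result does not depend on the platform's C types.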

text types
char, string: ascii-encoded string
string@utf8: utf8-encoded string
string@utf16: utf16-encoded string
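
When a string length is stored in a separate count item, the encoding determines how many bytes that is. The sketch below uses only Python's built-in codecs; whether binio's string@utf16 writes a byte order mark is not stated here, so both variants are shown.

```python
path = u'textures/car.tga'
name = u'L\u00f3pez'  # five characters, one of them non-ASCII

ascii_len = len(path.encode('ascii'))      # one byte per character
utf8_len = len(name.encode('utf8'))        # the accented letter takes two bytes
utf16_len = len(name.encode('utf-16-le'))  # two bytes per character, no BOM
utf16_bom_len = len(name.encode('utf16'))  # Python's 'utf16' codec prepends a 2-byte BOM
```

For anything other than ASCII text, the byte length and the character count differ, so the count item should be computed from the encoded bytes.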

Item names
The name must follow the naming rules for python variables, with the extra requirement that it must start with a letter. It will be a member of the binio object being created, initialised with an appropriate default value.

Byte order
The module contains the following byte order constants:
BYTE_ORDER_NATIVE (native to the machine running the code)
No alignment is applied.

Read and write methods accept an optional byte order value, with default value "BYTE_ORDER_NATIVE".
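
The effect of byte order and of the absence of alignment can be seen with the underlying "struct" module. This sketch uses struct's own prefix characters ('<', '>', '='), not binio's constants.

```python
import struct
import sys

value = 1
little = struct.pack('<I', value)  # least significant byte first
big = struct.pack('>I', value)     # most significant byte first
native = struct.pack('=I', value)  # byte order of the machine running the code

assert native == (little if sys.byteorder == 'little' else big)

# With '=' no padding is inserted between items: 1 byte + 4 bytes.
unaligned_size = struct.calcsize('=BI')
```

A file written with one byte order and read with another yields wrong values without raising an error, so the same byte order has to be used on both sides.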

Public interface of the object returned by new()
    get_size_in_bytes() --> returns integer
    read( f, byteOrder=BYTE_ORDER_NATIVE )
    write( f, byteOrder=BYTE_ORDER_NATIVE ) --> returns bytes written
    read_struct( f, byteOrder=BYTE_ORDER_NATIVE ) --> returns object with items read as member vars
    write_struct( f, x, byteOrder=BYTE_ORDER_NATIVE ) --> returns bytes written
    read_dict( f, byteOrder=BYTE_ORDER_NATIVE ) --> returns dict with items read
    write_dict( f, x, byteOrder=BYTE_ORDER_NATIVE ) --> returns bytes written

The module contains a "new" function that accepts a definition string and returns a "codec" object that encapsulates the code to read and write the structure specified. The read/write methods expect a file-like object and an optional "byteOrder" parameter. In case of reading errors the file read cursor is restored (using tell() and seek() methods). There is also a clear() method that resets all data members to default values, and a get_size_in_bytes() method that returns size in bytes. This size might depend on actual data if count expressions are not constant.
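
The cursor-restoring behaviour on read errors can be pictured with the following sketch, which uses tell() and seek() around a plain "struct" read. read_uint32_pair is a hypothetical helper, and raising EOFError is an assumption of the sketch, not necessarily what binio does.

```python
import io
import struct

def read_uint32_pair(f):
    """Read two native uint32 values; restore the cursor if data is short."""
    start = f.tell()
    size = struct.calcsize('=2I')
    data = f.read(size)
    if len(data) != size:
        f.seek(start)  # put the read cursor back where it was
        raise EOFError('expected %d bytes, got %d' % (size, len(data)))
    return struct.unpack('=2I', data)

f = io.BytesIO(struct.pack('=3I', 1, 2, 3))
first = read_uint32_pair(f)  # consumes the first 8 bytes
try:
    read_uint32_pair(f)      # only 4 bytes are left
    failed = False
except EOFError:
    failed = True
position_after_failure = f.tell()  # unchanged by the failed read
```

This only works on file-like objects that support tell() and seek(), which excludes, for example, unbuffered sockets.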

The object can be reused. Calls to read() just replace existing data. For example, the same object may be used to read several records in sequence. It can be used to just write data as well.
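
The same reuse pattern can be sketched with a precompiled struct.Struct standing in for the codec object. The record layout (three floats) and the in-memory file are assumptions for illustration.

```python
import io
import struct

RECORD = struct.Struct('=3f')  # one rgb record, reused for every read

f = io.BytesIO()
for colour in [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]:
    f.write(RECORD.pack(*colour))
f.seek(0)

colours = []
while True:
    chunk = f.read(RECORD.size)
    if len(chunk) < RECORD.size:
        break  # end of file or truncated trailing record
    colours.append(RECORD.unpack(chunk))
```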

While read() places data into the codec object itself, and write() gets data from that object, the methods read_struct() and write_struct() use external data taken from members of an object, and read_dict() and write_dict() use dicts.
read_struct() returns an object with data items as members, and methods clear() and get_size_in_bytes().
write_struct() extracts data item values from the given external object 'x'.
read_dict() returns data items in a dict.
write_dict() takes data item values from the given dict 'x'.
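
The difference between the dict and struct variants is only in how the values are carried. The sketch below mimics both styles on top of "struct" for a fixed three-float record; Record and the four helper functions are hypothetical and simplified, and are not binio's code.

```python
import io
import struct

FIELDS = ('red', 'green', 'blue')
FORMAT = '=3f'

class Record(object):
    """Plain attribute container, standing in for what read_struct() returns."""
    pass

def read_as_dict(f):
    values = struct.unpack(FORMAT, f.read(struct.calcsize(FORMAT)))
    return dict(zip(FIELDS, values))

def read_as_struct(f):
    obj = Record()
    for name, value in read_as_dict(f).items():
        setattr(obj, name, value)  # items become member variables
    return obj

def write_from_dict(f, x):
    data = struct.pack(FORMAT, *[x[name] for name in FIELDS])
    f.write(data)
    return len(data)  # bytes written

def write_from_struct(f, x):
    return write_from_dict(f, dict((name, getattr(x, name)) for name in FIELDS))

f = io.BytesIO()
written = write_from_dict(f, {'red': 1.0, 'green': 0.5, 'blue': 0.0})
f.seek(0)
obj = read_as_struct(f)
written_again = write_from_struct(f, obj)
```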

download / installation
Option 1: pip install binio
Option 2: easy_install binio
Option 3: download source, unpack and execute python setup.py install
to-do list