Protocol Buffers DataReader Extensions for .NET

.NET, as a mostly statically typed platform, has a lot of really good options for serializing statically typed objects. Protocol Buffers, MessagePack, JSON, BSON, XML, SOAP, and the BCL’s own proprietary binary serialization are all great for CLR objects, where the fields can be determined at runtime.

However, for data that is tabular in nature, there are far fewer options. In my past two jobs I’ve needed to serialize data:

  • That is tabular – not necessarily CLR DTOs.
  • Where the schema is unknown before it is deserialized – each data set can have totally different columns.
  • In a way that is streamable, so entire data sets do not have to be buffered in memory at once.
  • That can be as large as hundreds of thousands of rows/columns.
  • In a reasonably performant manner.
  • In a way that could potentially be read by different platforms.
  • Into as small a number of bytes as possible.

Protocol Buffers DataReader Extensions for .NET was born out of these needs. It’s powered by Marc Gravell’s excellent Google Protocol Buffers library, protobuf-net, and it packs data faster and into fewer bytes than writing the equivalent DataTable out as XML (DataTable.WriteXml).

Usage is very easy. Serializing a data reader to a stream:

    DataTable dt = ...;
    using (Stream stream = File.OpenWrite("C:foo.dat"))
    using (IDataReader reader = dt.CreateDataReader())
    {
        DataSerializer.Serialize(stream, reader);
    }

Loading a data table from a stream:

    DataTable dt = new DataTable();
    using (Stream stream = File.OpenRead("C:foo.dat"))
    using (IDataReader reader = DataSerializer.Deserialize(stream))
    {
        dt.Load(reader);
    }

It works with IDataReaders, DataTables and DataSets (even nested DataTables). You can download the protobuf-net-data package from NuGet, or grab the source from the GitHub project page.
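The DataSet case can be sketched the same way, using only the two calls shown above plus the BCL's own DataSet.CreateDataReader() and DataSet.Load(). This is a rough illustration rather than the library's documented recipe. It assumes that DataSerializer lives in the ProtoBuf.Data namespace and that Serialize writes every result set the reader exposes. The table names, columns and the DataSetRoundTrip class are made up for the example.

```csharp
using System;
using System.Data;
using System.IO;
using ProtoBuf.Data; // assumption: namespace of DataSerializer in protobuf-net-data

public static class DataSetRoundTrip
{
    // Builds a small two-table DataSet purely for illustration (hypothetical schema).
    public static DataSet BuildSample()
    {
        var customers = new DataTable("Customers");
        customers.Columns.Add("Id", typeof(int));
        customers.Columns.Add("Name", typeof(string));
        customers.Rows.Add(1, "Alice");
        customers.Rows.Add(2, "Bob");

        var orders = new DataTable("Orders");
        orders.Columns.Add("CustomerId", typeof(int));
        orders.Columns.Add("Total", typeof(decimal));
        orders.Rows.Add(1, 9.99m);

        var ds = new DataSet();
        ds.Tables.Add(customers);
        ds.Tables.Add(orders);
        return ds;
    }

    // Serializes a DataSet to an in-memory stream and loads it back into a new DataSet.
    public static DataSet RoundTrip(DataSet source)
    {
        using (var stream = new MemoryStream())
        {
            // DataSet.CreateDataReader() exposes each table as a separate result set.
            using (IDataReader reader = source.CreateDataReader())
            {
                DataSerializer.Serialize(stream, reader);
            }

            // Rewind so the deserializer starts reading from the first byte.
            stream.Position = 0;

            var copy = new DataSet();
            using (IDataReader reader = DataSerializer.Deserialize(stream))
            {
                // The caller supplies the table names here, one per result set.
                copy.Load(reader, LoadOption.OverwriteChanges, "Customers", "Orders");
            }
            return copy;
        }
    }

    public static void Main()
    {
        DataSet copy = RoundTrip(BuildSample());
        foreach (DataTable table in copy.Tables)
        {
            Console.WriteLine("{0}: {1} rows", table.TableName, table.Rows.Count);
        }
    }
}
```

A MemoryStream keeps the sketch self-contained. For large data sets you would pass a file or network stream instead, so that rows are written as they are read rather than buffered.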