separated advanced features from core functionality

fraillt
2017-09-19 16:10:49 +03:00
parent ad7090539e
commit f0508025f6
36 changed files with 1194 additions and 964 deletions

@@ -7,22 +7,25 @@ Library design:
* `serializer/deserializer functions overloads`
* `extending library functionality`
* `errors handling`
* `forward/backward compatibility via growable`
* `forward/backward compatibility via Growable extension`
Core Serializer/Deserializer functions (alphabetical order):
* `align`
* `boolByte`
* `boolBit`
* `container`
* `extend`
* `getContext`
* `object`
* `text`
* `value`
Advanced Serializer/Deserializer functions (alphabetical order):
* `align`
* `boolBit`
* `entropy`
* `extend`
* `growable`
* `range`
Serializer/Deserializer extensions via `extend` method (alphabetical order):
* `ContainerMap`
* `Entropy`
* `Growable`
* `Optional`
* `ValueRange`
BasicBufferWriter/Reader functions:
* `writeBits/readBits`
@@ -35,14 +38,12 @@ BasicBufferWriter/Reader functions:
* `getError (reader only)`
* `isCompletedSuccessfully (reader only)`
Tips and tricks:
* if you're getting the static assert "please define 'serialize' function", most likely your *serialize* function is not defined in the same namespace as the object.
Limitations:
* max **text** or **container** size is 2^(n-2) - 1 (where n = sizeof(std::size_t) * 8); for 32-bit systems it is 1073741823 (0x3FFFFFFF).
* when using **growable**, the serialized buffer cannot be greater than 2^(n-2) (where n = sizeof(std::size_t) * 8).
* when using the **Growable** extension, the serialized buffer size in bytes cannot be greater than 2^(n-2) (where n = sizeof(std::size_t) * 8).
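The 32-bit figure follows directly from the formula. The limit is the largest value below 2^(n-2), so it equals 2^(n-2) - 1. Below is a small compile-time check. It uses a hypothetical helper, `maxEncodableSize`, which is not part of bitsery, and it assumes the bound is exclusive, as the quoted 1073741823 implies.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper (not a bitsery function): the largest size that fits
// when the top two bits of an n-bit unsigned type are reserved, i.e. 2^(n-2) - 1.
template <typename T>
constexpr T maxEncodableSize() {
    static_assert(std::is_unsigned<T>::value, "T must be an unsigned type");
    return static_cast<T>((T{1} << (std::numeric_limits<T>::digits - 2)) - 1);
}

// With a 32-bit std::size_t the limit is 2^30 - 1.
static_assert(maxEncodableSize<std::uint32_t>() == 1073741823u, "2^30 - 1");
static_assert(maxEncodableSize<std::uint32_t>() == 0x3FFFFFFFu, "same value in hex");
```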
Other:
* [Contributing](../CONTRIBUTING.md)

@@ -1,28 +1,70 @@
## Motivation
The inspiration to create **bitsery** came mainly from the lack of good alternatives for C++.
I wanted a serializer that is as easy to use as [cereal](http://uscilab.github.io/cereal/)
Most well-known serialization libraries are *too fat* and try to solve too many things by supporting multiple data formats (binary, json, xml) and multiple languages (C++, C#, Javascript, etc.); in the process they become hard to use, and memory and/or speed inefficient.
The best alternative that I was able to find is [flatbuffers](https://google.github.io/flatbuffers/).
It is fast, memory efficient, and, [compared with other alternatives](https://google.github.io/flatbuffers/flatbuffers_benchmarks.html), looks like the *de facto* choice for games.
While flatbuffers is designed to support multiple programming languages, bitsery is designed specifically for C++.
I wanted a serializer that is as easy to use as [cereal](http://uscilab.github.io/cereal/), is cross-platform compatible, supports forward/backward compatibility like [flatbuffers](https://google.github.io/flatbuffers/), is safe to use with untrusted (malicious) data, and, most importantly, is fast and has a small binary footprint.
Furthermore, I wanted full serialization control and the ability to work at the bit level, so I can reduce data size even further. For example, when serializing a container of [quaternions](https://en.wikipedia.org/wiki/Quaternion), I can reduce its size by a large amount. *The size of an orientation quaternion can be reduced from 128 bits (4 floats) down to 29 bits using the "smallest three" technique, while still retaining decent precision*.
Most well-known serialization libraries sacrifice memory and speed efficiency by supporting multiple data formats (binary, json, xml) and multiple languages (C++, C#, Javascript, etc.); these features also add library complexity.
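To make the "smallest three" numbers concrete, here is a minimal, generic sketch of the technique. It is not code from bitsery. It takes a unit quaternion and drops its largest component, which can be recovered from the unit-length constraint. It stores the index of that dropped component in 2 bits. Each of the other three components is bounded by 1/sqrt(2) and is quantized to 9 bits, giving 2 + 3 * 9 = 29 bits in total. The 9-bit split and all names are illustrative choices.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Illustrative "smallest three" packing (not bitsery code).
using Quat = std::array<float, 4>;                   // unit quaternion (x, y, z, w)

constexpr int kBits = 9;                             // bits per stored component
constexpr std::uint32_t kMaxQ = (1u << kBits) - 1;   // 511 quantization steps
constexpr float kRange = 0.70710678f;                // 1/sqrt(2): bound on non-largest components

std::uint32_t quantize(float v) {
    float t = (v + kRange) / (2.0f * kRange);        // map [-kRange, kRange] to [0, 1]
    t = std::min(std::max(t, 0.0f), 1.0f);           // guard against rounding noise
    return static_cast<std::uint32_t>(std::lround(t * kMaxQ));
}

float dequantize(std::uint32_t level) {
    return static_cast<float>(level) / kMaxQ * (2.0f * kRange) - kRange;
}

std::uint32_t packSmallestThree(Quat q) {
    int largest = 0;
    for (int i = 1; i < 4; ++i)
        if (std::fabs(q[i]) > std::fabs(q[largest])) largest = i;
    // q and -q encode the same rotation, so make the dropped component positive.
    if (q[largest] < 0.0f)
        for (float& c : q) c = -c;
    std::uint32_t bits = static_cast<std::uint32_t>(largest);
    for (int i = 0; i < 4; ++i)
        if (i != largest) bits = (bits << kBits) | quantize(q[i]);
    return bits;                                     // uses the low 2 + 3 * 9 = 29 bits
}

Quat unpackSmallestThree(std::uint32_t bits) {
    assert(bits < (1u << (2 + 3 * kBits)));
    Quat q{};
    const int largest = static_cast<int>(bits >> (3 * kBits));
    int shift = 2 * kBits;
    float sumSq = 0.0f;
    for (int i = 0; i < 4; ++i) {
        if (i == largest) continue;
        q[i] = dequantize((bits >> shift) & kMaxQ);
        sumSq += q[i] * q[i];
        shift -= kBits;
    }
    // Rebuild the dropped component from the unit-length constraint.
    q[largest] = std::sqrt(std::max(0.0f, 1.0f - sumSq));
    return q;
}
```

The unpacked quaternion may come back negated, which represents the same rotation. With 9 bits per component, the reconstruction error stays well below 0.01 per component.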
## A word about JSON
People use C++ because they want speed and memory efficiency, and JSON is not on the list of efficient serialization formats.
Oftentimes people use C++ because they want speed and memory efficiency, and JSON is not on the list of efficient serialization formats.
Admittedly, JSON is very readable and very convenient when used together with dynamically typed languages (such as JavaScript).
When serializing data from statically typed languages, however, JSON not only has the obvious drawback of runtime inefficiency, but also forces you to write more code to access data (counterintuitively) due to its dynamic-typing serialization system.
It's also a text format: human-readable, but space-inefficient.
Adding optional support for JSON doesn't come for free either.
Without multi-language support, we no longer need an IDL (interface definition language) to define schemas that keep the interface consistent across multiple languages.
And without code generation, it becomes impossible to support JSON *for free* without defining additional metadata, because C++ doesn't have a reflection system (although static reflection was recently proposed for the standard).
To support JSON, additional metadata is required.
In C++ it can be achieved in two ways:
* with macros that generate additional types, as [cereal](http://uscilab.github.io/cereal/) does.
* with code generation from an IDL (interface definition language), as [flatbuffers](https://google.github.io/flatbuffers/) does.
So, to avoid unnecessary library complexity, it is best to forget about JSON and stick with what machines and C++ are good at: binary format.
Both solutions add complexity to the library. C++ may get a reflection system in the future; static reflection has already been proposed for the standard.
Bitsery is what you get when you sacrifice multi-language support and the JSON format, but keep the other *goodies*.
So, to avoid unnecessary library complexity, it is best to forget about JSON and stick with what machines and C++ are good at: binary format.
# Bitsery
*todo*
Bitsery is designed to be a lightweight, simple-to-use, yet powerful and extendable library.
To ensure it works as intended, it is unit tested and has 100% code coverage.
Now let's review features in more detail.
* **Cross-platform compatible.** If the same code compiles on Android, a PS3 console, and your PC (x64 or x86 architecture), you can be 100% sure it works.
To achieve this, bitsery requires you to specify the size of the underlying data, hence the syntax *value\<2\>* (alias function *value2b*) instead of *value*, or *container2b* for 16-bit element types, e.g. int16_t.
Bitsery also applies endianness transformations if necessary.
**If**, however, you don't like this verbose syntax, you can write *serialize* functions for fundamental types and forget about *value\<N\>*, *container\<N\>*, etc.
But do it at your own risk, or add static asserts.
* **Optimized for speed and space.** The library itself doesn't do any allocations (except when you use forward/backward compatibility), so data writing/reading is as fast as memcpy to/from your buffer.
It also doesn't serialize any type information; all the information needed is written in your code!
* **No code generation required: no IDL or metadata.** Since it supports no format other than binary, it doesn't need any metadata.
* **Runtime error checking on deserialization.** The library is designed to be safe with untrusted network data. That's why all overloads that work on containers take a *maxSize* value (unless the container has a static size, like *std::array*). This way bitsery ensures that malicious data will not crash your program.
* **Supports forward/backward compatibility for your types.** The library has optional forward/backward compatibility for types. It is implemented in *BasicBufferReader/BasicBufferWriter* by allowing inner data sessions inside the buffer.
This is the only functionality that requires dynamic memory allocation.
The *Growable* extension uses these sessions to add compatibility support for your types in its most basic form.
You can implement your own extensions if you want to be able to add default values.
* **2-in-1 declarative control flow, same code for serialization and deserialization.** There is only one function to define for both serialization and deserialization, in the same manner as *cereal*.
It might be handy to have separate *load* and *save* functions, but bitsery explicitly doesn't support them. This avoids any differences between the serialization and deserialization paths, because a bug in one of those functions is very hard to catch.
The only way around this is through extensions: write your custom flow once, and reuse it wherever you need it.
* **Allows fine-grained serialization control.** This is a feature that no other library provides.
Bitsery lets you use bit-level operations and has two extensions that use them:
* *ValueRange*: if you have an *int16_t* data type, but you know that your object only stores values in the \[0..1000\] range, it will write 10 bits instead of 16 bits. *ValueRange* also works with floats.
* *Entropy*: short for *entropy encoding*. When you have a most common value (or several common values), it writes just a few bits instead of the full object.
E.g., imagine that you have `struct Person { int32_t Id; std::string Profession; };`.
You know that most persons are young, so the most common values will be "Student", "Child" and "NoProfession". In this case you'll pay 2 bits for each record, but write no string data if the profession matches one of them.
Using these bit-level operations and extensions, you can compose your own extensions for vectors, matrices, or any other types.
Furthermore, other operations don't align data automatically, so data is packed as tightly as possible.
One more advanced, and dangerous, feature is the ability to have a serialization context, so you can control your serialization flow at runtime. Just make sure that the contexts are in sync between the serializer and deserializer.
One possible use case for a serialization context is passing min/max ranges to *ValueRange* when they change at runtime.
* **Easily extendable.** The library is designed to be easily extendable for any type and flow.
Want to support your custom container? There is *ContainerTraits* for this, and only a few methods need to be implemented.
To use the same container as a buffer for writing/reading, add a specialization of *BufferContainerTraits*.
Want to customize the serialization flow? Use extensions. There are only two methods to define, plus *ExtensionTraits* to further customize usage.
* **Configurable endianness support.** The default is *Little Endian*, but if your primary target is a PowerPC architecture, e.g. PlayStation 3, just change your configuration to *Big Endian*.
* **No macros.** Not much to say; if you are like me, then it's a feature :)
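The *ValueRange* arithmetic can be sketched with plain integers. A value known to lie in a range is stored as its offset from the range minimum, so it only needs enough bits to hold the largest offset. The helpers below (`bitsForRange`, `toOffset`, `fromOffset`) are hypothetical illustrations of that idea. They are not bitsery's API, and float ranges are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not bitsery's ValueRange API).
// Bits needed to store any offset in [0, maxValue - minValue].
constexpr unsigned bitsForRange(std::int64_t minValue, std::int64_t maxValue) {
    std::uint64_t span = static_cast<std::uint64_t>(maxValue - minValue);
    unsigned bits = 0;
    while (span > 0) {   // count the significant bits of the largest offset
        ++bits;
        span >>= 1;
    }
    return bits;         // a single-value range needs 0 bits
}

constexpr std::uint64_t toOffset(std::int64_t value, std::int64_t minValue) {
    return static_cast<std::uint64_t>(value - minValue);
}

constexpr std::int64_t fromOffset(std::uint64_t offset, std::int64_t minValue) {
    return minValue + static_cast<std::int64_t>(offset);
}

// The [0..1000] example: 10 bits instead of the 16 bits of an int16_t.
static_assert(bitsForRange(0, 1000) == 10, "1000 < 2^10");
```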
*A performance benchmark will be added as a separate GitHub project; I'll give you a link to it when it's done.*
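The *Entropy* example in the feature list can be illustrated with a small, hypothetical sketch. It is not bitsery's implementation. Each record stores an index into the list of common values, where 0 means "not a common value, full data follows". With three common values the index has four states, so it fits in 2 bits. `fullValueBits` is a stand-in for whatever writing the full string would cost.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of entropy encoding (not bitsery code).
// 1-based index of `value` in the common-values list; 0 means "not common".
std::size_t entropyIndex(const std::string& value,
                         const std::vector<std::string>& common) {
    for (std::size_t i = 0; i < common.size(); ++i)
        if (common[i] == value) return i + 1;
    return 0;
}

// Bits needed to store an index in [0, commonCount].
unsigned entropyIndexBits(std::size_t commonCount) {
    unsigned bits = 0;
    for (std::size_t states = commonCount; states > 0; states >>= 1) ++bits;
    return bits;
}

// Bits written for one record: always the index, plus the full value on a miss.
std::size_t recordBits(const std::string& value,
                       const std::vector<std::string>& common,
                       std::size_t fullValueBits) {
    std::size_t bits = entropyIndexBits(common.size());
    if (entropyIndex(value, common) == 0) bits += fullValueBits;
    return bits;
}
```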

@@ -1 +1 @@
int char (except bool)
ints chars floats (except bool)

@@ -1,14 +1,8 @@
The grand plan for this tutorial is to learn how to serialize/deserialize any object efficiently in time and space, so you can focus on other, more interesting things.
This tutorial will cover these main topics:
* [Hello World](hello_world.md) serialize a simple struct.
* [2 in 1](two_in_one.md) write one control flow for both: serialization and deserialization.
* [Composer](composition.md) efficiently compose complex serialization flows.
* [Squeeze Me!](compression.md) compress your data when you know what it stores.
* [Anything is Possible](extensions.md) extend library for custom container, compress geometry and more.
* [Little or Big](endianness.md) change endianness if you want the best performance on PowerPC.
In order to use the library successfully, you need a C++14-compatible compiler. In theory you could also use a C++11-compatible compiler, but C++14 generic lambdas really change the way you can work with this library, so all tutorial sections assume that you use a C++14-compatible compiler.
So without further ado, let's start with [hello world](hello_world.md).
* `Hello World` write one control flow for both: serialization and deserialization.
* `Composer` efficiently compose complex serialization flows.
* `Squeeze Me!` compress your data when you know what it stores.
* `Anything is Possible` extend library for custom container, compress geometry and more.
* `Little or Big` change endianness if you want the best performance on PowerPC.

@@ -1,3 +1,3 @@
*document in progress*
* explain why *value* and *object* is fundamental functions.
* write about **growable** and **customize**
* write about *Growable* extension

@@ -1,167 +1,128 @@
# The problem
# Quick Start
You want to serialize *Player* structure efficiently into buffer.
This is a quick guide to get **bitsery** up and running in a matter of minutes.
The only prerequisite for running bitsery is a modern C++11 compliant compiler, such as GCC 4.9.4, clang 3.4, MSVC 2015, or newer.
Older versions might work, but they are not tested.
## Get bitsery
bitsery can be directly included in your project or installed anywhere you can access header files.
Grab the latest version and add the `bitsery_base_dir/include/` directory to your project's include paths.
There's nothing to build or make - **bitsery** is header only.
## Add serialization methods for your types
**bitsery** needs to know which data members to serialize in your classes.
Let it know by implementing a serialize method for your type:
```cpp
struct Vector3f {
float x;
float y;
float z;
};
struct Player {
Vector3f pos;
char name[50];
struct MyStruct {
uint32_t i;
char str[6];
std::vector<float> fs;
};
template <typename S>
void serialize(S& s, MyStruct& o) {
s.value4b(o.i);
s.text1b(o.str);
s.container4b(o.fs, 100);
};
```
# Poor man's implementation
**bitsery** can also serialize private class members: just move the *serialize* function inside the structure and make it a *friend* (*friend void serialize(.....)*).
Since you don't want to waste any space by using a text serialization library like json or xml, one of the easiest and most obvious solutions is to simply write the memory representation of the structure directly to a buffer.
**bitsery** has a verbose syntax, because it is cross-platform compatible by default and gives you full control over how data is serialized (read more about it in [motivation](../design/README.md)).
This example contains core functionality that you'll use all the time, so let's go through it:
* **s.value4b(o.i);** serializes fundamental types (ints, floats, enums). value**4b** means that the data type is 4 bytes. If the same code compiles on different machines, it is compatible.
* **s.text1b(o.str);** serializes null-terminated text of *char* type. For 2-byte characters (e.g. *char16_t*, or *wchar_t* on Windows), you would write *text2b*.
* **s.container4b(o.fs, 100);** serializes any container of 4-byte fundamental types; **100** is the maximum container size.
**Bitsery** is designed to be safe with untrusted (malicious) data from the network. For dynamic containers you must therefore always provide the maximum allowed size, to avoid buffer-overflow attacks.
**text** didn't need a max size, because it was serializing a fixed-size container.
External serialization functions should be placed either in the same namespace as the types they serialize or in the **bitsery** namespace so that the compiler can find them properly.
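How can text be stored compactly, e.g. "Yolo" in only 5 bytes in the 17-byte hello world example? One plausible layout is a size prefix followed by the characters, with no terminating null. The sketch below assumes a variable-length prefix whose top bits say how many bytes it occupies (1, 2 or 4). This is a hypothetical scheme, chosen to be consistent with the 2^(n-2) size limit. It is not necessarily bitsery's exact wire format, and `writeSize`, `readSize` and `writeText` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical size prefix (not bitsery's documented format):
// 0xxxxxxx -> 1 byte, 10xxxxxx -> 2 bytes, 11xxxxxx -> 4 bytes.
void writeSize(std::vector<std::uint8_t>& out, std::uint32_t size) {
    assert(size < 0x40000000u); // the two top bits are reserved for the length tag
    if (size < 0x80u) {
        out.push_back(static_cast<std::uint8_t>(size));
    } else if (size < 0x4000u) {
        out.push_back(static_cast<std::uint8_t>(0x80u | (size >> 8)));
        out.push_back(static_cast<std::uint8_t>(size & 0xFFu));
    } else {
        out.push_back(static_cast<std::uint8_t>(0xC0u | (size >> 24)));
        out.push_back(static_cast<std::uint8_t>((size >> 16) & 0xFFu));
        out.push_back(static_cast<std::uint8_t>((size >> 8) & 0xFFu));
        out.push_back(static_cast<std::uint8_t>(size & 0xFFu));
    }
}

std::uint32_t readSize(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    const std::uint32_t first = in.at(pos++);
    if ((first & 0x80u) == 0) return first;                   // 1-byte form
    if ((first & 0x40u) == 0)                                 // 2-byte form
        return ((first & 0x3Fu) << 8) | in.at(pos++);
    std::uint32_t v = first & 0x3Fu;                          // 4-byte form
    for (int i = 0; i < 3; ++i) v = (v << 8) | in.at(pos++);
    return v;
}

// Text as size prefix + characters, without the terminating null.
void writeText(std::vector<std::uint8_t>& out, const char* text) {
    const std::size_t len = std::strlen(text);
    writeSize(out, static_cast<std::uint32_t>(len));
    out.insert(out.end(), text, text + len);
}
```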
## Serialization and deserialization
### Create serializer
Create a serializer and send the data you want to serialize to it.
A possible implementation could look like this.
```cpp
#include <iostream>
#include <vector>
#include <cassert>
#include <cstdint>
#include <algorithm>
struct Vector3f {
float x;
float y;
float z;
};
struct Player {
Vector3f pos;
char name[50];
};
void serialize(std::vector<uint8_t>& buf, const Player& data) {
auto ptr = reinterpret_cast<const uint8_t*>(&data);
std::copy_n(ptr, sizeof(data), std::back_inserter(buf));
}
int main() {
Player data;
std::vector<uint8_t> buf;
serialize(buf, data);
assert(buf.size() == sizeof(data));
std::cout << "size: " << buf.size() << std::endl;
return 0;
}
std::vector<uint8_t> buffer;
BufferWriter bw{buffer};
Serializer ser{bw};
```
```bash
size: 64
```
The serialization process consists of three independent parts:
* **std::vector<uint8_t> buffer;** the core object that stores the data for serialization and deserialization.
* **BufferWriter bw{buffer};** the writer knows how to write bytes to the buffer, and how to resize the buffer or use a fixed-size buffer. It also applies endianness transformations if necessary.
* **Serializer ser{bw};** the serializer is a high-level wrapper that knows how to convert an object to a stream of bytes and write them to the buffer.
Although it is simple and fast (it could be even faster if we reserved the buffer before writing), it has a lot of limitations.
* char[50] always occupies 50 bytes in the buffer, even if the player name is *Yolo*.
* you can't replace char[50] with std::string.
* you can't use this solution if you need to support systems with different endianness, and you should be extra careful if different systems have different sizes for fundamental types like int, long int, etc.
* you pay for structure field alignment, hence the size is 64, not 62 (3*4 + 50).
The serializer doesn't store any state; it only holds a reference to the buffer writer, so it is safe to create many of them if necessary.
You can improve name serialization in various ways, but then your serialization and deserialization code gets complicated and error-prone. We can do better than this.
BufferWriter doesn't own the buffer either, but it stores state: the write position and the container size.
# Bitsery solution
One important note: when using bit-level operations, don't forget to flush the buffer writer with **bw.flush()**; otherwise, some data might not be written to the buffer.
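Why is *flush* needed? Bit-level writes collect bits in a scratch byte, and that byte only reaches the buffer once it is full. The hypothetical `BitWriter` below is not bitsery's *BufferWriter*, and its least-significant-bit-first order is an assumption. It shows that a 10-bit value leaves 2 bits pending until *flush* pads the last byte and writes it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal bit writer (not bitsery's BufferWriter).
class BitWriter {
public:
    explicit BitWriter(std::vector<std::uint8_t>& buf) : buf_(buf) {}

    // Writes the lowest `count` bits of `value`, least significant bit first.
    void writeBits(std::uint32_t value, unsigned count) {
        assert(count <= 32);
        for (unsigned i = 0; i < count; ++i) {
            scratch_ |= static_cast<std::uint8_t>(((value >> i) & 1u) << used_);
            if (++used_ == 8) emitScratch();   // a full byte goes to the buffer
        }
    }

    // Writes any partially filled byte; its unused high bits are zero.
    void flush() {
        if (used_ > 0) emitScratch();
    }

private:
    void emitScratch() {
        buf_.push_back(scratch_);
        scratch_ = 0;
        used_ = 0;
    }

    std::vector<std::uint8_t>& buf_;
    std::uint8_t scratch_ = 0;   // bits waiting to be written
    unsigned used_ = 0;          // how many bits of scratch_ are filled
};
```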
### Serialize object
```cpp
MyStruct data{8941, "hello", {15.0f, -8.5f, 0.045f}};
ser.object(data); // serializes data
```
**ser.object(data)** is the last core function, alongside **value, text, container**.
This function is actually equivalent to calling *serialize(ser, data)* directly, but it displays a friendly static assert message if it cannot find a *serialize* function for your type.
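A friendly static assert like the one *object* gives can be built with the detection idiom. The sketch below shows the general technique under that assumption; it is not bitsery's source. `HasSerialize` checks whether an unqualified call `serialize(s, obj)` compiles. That call is resolved through argument-dependent lookup, which also explains why *serialize* must live in the same namespace as your type. `CountingArchive`, `Counter` and `NoSerialize` are illustrative types.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// C++14 has no std::void_t, so define an equivalent.
template <typename...> struct make_void { using type = void; };
template <typename... Ts> using void_t = typename make_void<Ts...>::type;

// HasSerialize<S, T>::value is true if `serialize(s, obj)` compiles,
// with `serialize` found through argument-dependent lookup.
template <typename S, typename T, typename = void>
struct HasSerialize : std::false_type {};

template <typename S, typename T>
struct HasSerialize<S, T,
    void_t<decltype(serialize(std::declval<S&>(), std::declval<T&>()))>>
    : std::true_type {};

// object()-style wrapper: same as calling serialize directly, but with a
// readable compile-time message when no overload exists.
template <typename S, typename T>
void object(S& s, T& obj) {
    static_assert(HasSerialize<S, T>::value,
                  "please define 'serialize' function for your type");
    serialize(s, obj);
}

// Illustrative types: a stand-in archive, one type with serialize, one without.
struct CountingArchive { int calls = 0; };
struct Counter { int value = 0; };
struct NoSerialize {};

void serialize(CountingArchive& s, Counter& c) {
    ++s.calls;
    c.value += 1;
}
```

Calling `object(ar, noSerialize)` would stop compilation with the message above instead of a long overload-resolution error.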
### Deserialize object
```cpp
BufferReader br{bw.getWrittenRange()};
Deserializer des{br};
MyStruct res{};
des.object(res); //deserializes data
```
The deserialization process mirrors serialization, except that *BufferReader* has a *getError()* method that returns the deserialization state.
## Full example code
Let's solve the same problem with the library.
```cpp
#include <vector>
#include <bitsery/bitsery.h>
#include <cstring>
#include <iostream>
struct Vector3f {
float x;
float y;
float z;
};
struct Player {
Vector3f pos;
char name[50];
};
using namespace bitsery;
void serialize(Serializer<BufferWriter>& s, const Player& data) {
s.value4b(data.pos.x);
s.value4b(data.pos.y);
s.value4b(data.pos.z);
s.text1b(data.name);
}
struct MyStruct {
uint32_t i;
char str[6];
std::vector<float> fs;
};
template <typename S>
void serialize(S& s, MyStruct& o) {
s.value4b(o.i);
s.text1b(o.str);
s.container4b(o.fs, 100);
};
int main() {
Player data;
strcpy(data.name,"Yolo");
std::vector<uint8_t> buffer;
BufferWriter bw{buffer};
Serializer ser{bw};
std::vector<uint8_t> buf;
BufferWriter bw{buf};
Serializer<BufferWriter> ser{bw};
MyStruct data{8941, "hello", {15.0f, -8.5f, 0.045f}};
ser.object(data); // serializes data
serialize(ser, data);
BufferReader br{bw.getWrittenRange()};
Deserializer des{br};
bw.flush();
auto range = bw.getWrittenRange();
std::cout << "size: " << std::distance(range.begin(), range.end()) << std::endl;
return 0;
MyStruct res{};
des.object(res); //deserializes data
}
```
```bash
size: 17
```
First of all, the buffer size dropped from 64 down to 17 bytes: 12 bytes (3*4) for floats and only 5 bytes for the name "Yolo".
In the process you also got rid of all the limitations of the naive solution, and even gained some features for free:
* endianness support.
* more readable serialization code.
Let's look at the code to see how we did this.
There are three distinct parts that participate in the serialization process.
* Buffer - the container that stores our serialized data, in our case *vector<uint8_t>*.
* BufferWriter - responsible for writing bytes and bits to *Buffer*; it also makes sure that the data is portable across Little and Big Endian systems.
* Serializer - an extendable interface that converts any type to bytes or bits and uses *BufferWriter* to write them. The Serializer object does not store any state; it only forwards all calls to BufferWriter. Furthermore, it ensures at compile time that the code is portable: if your serialization code compiles on another platform, it will be 100% correct.
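Fixed-size, endian-neutral writes are what make the *value4b*-style calls portable. Below is a minimal sketch of the idea. The `writeLittleEndian`/`readLittleEndian` helpers are hypothetical, handle integral types only, and are not bitsery's *BufferWriter*/*BufferReader*. The byte count is a template argument checked against `sizeof(T)`. Bytes are produced with shifts rather than by copying memory, so the output is the same on Little and Big Endian hosts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Writes exactly N bytes of v, least significant byte first.
template <std::size_t N, typename T>
void writeLittleEndian(std::vector<std::uint8_t>& out, T v) {
    static_assert(std::is_integral<T>::value, "integral types only in this sketch");
    static_assert(sizeof(T) == N, "declared size does not match sizeof(T) on this platform");
    using U = typename std::make_unsigned<T>::type;
    U u = static_cast<U>(v);
    for (std::size_t i = 0; i < N; ++i) {
        out.push_back(static_cast<std::uint8_t>(u & 0xFFu));
        u = static_cast<U>(u >> 8);
    }
}

// Reads N bytes written by writeLittleEndian back into a T.
template <std::size_t N, typename T>
T readLittleEndian(const std::uint8_t* in) {
    static_assert(std::is_integral<T>::value, "integral types only in this sketch");
    static_assert(sizeof(T) == N, "declared size does not match sizeof(T) on this platform");
    using U = typename std::make_unsigned<T>::type;
    U u = 0;
    for (std::size_t i = N; i-- > 0;)   // most significant byte first
        u = static_cast<U>((u << 8) | in[i]);
    return static_cast<T>(u);
}
```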
```cpp
std::vector<uint8_t> buf;
BufferWriter bw{buf};
Serializer<BufferWriter> ser{bw};
```
The serialization function is very readable and explicitly expresses what to serialize and how:
* *value4b* serializes a [fundamental type](../design/fundamental_types.md) (ints, floats, chars, enums) of 4 bytes.
* *text1b* efficiently serializes text whose underlying character type is 1 byte per letter.
```cpp
s.value4b(data.pos.x);
s.value4b(data.pos.y);
s.value4b(data.pos.z);
s.text1b(data.name);
```
> learn more about why you need to write [value4b instead of value](../design/function_n.md).
Finally, before getting the serialized data you must *flush* the BufferWriter. It writes any remaining bits to the buffer, plus additional data for types that require forward/backward compatibility. In our case it is not required, because we only worked with whole bytes, but it is good practice to always call it after finishing serialization.
To actually get the written data you must call *getWrittenRange*, which returns begin/end iterators to our buffer (*std::vector<uint8_t> buf*). For performance reasons, BufferWriter always resizes the underlying buffer to its *capacity*, so it can update data through the container's iterators instead of inserting it through a back_insert_iterator.
```cpp
bw.flush();
auto range = bw.getWrittenRange();
```
# Summary
You have learned how to serialize a simple structure to a buffer so that it occupies no unnecessary bytes and is portable across any system. You also learned that the serialization process consists of three independent parts: buffer, buffer writer, and serializer. You used the serializer to explicitly declare what and how to serialize, and learned that you should always call BufferWriter.flush() before using buffer data.
In the [next chapter](two_in_one.md) you'll learn how to use this expressive, declarative serialization function to deserialize a buffer into an object, and in the process gain runtime error checking for free!
**Documentation and tutorials are currently in progress; for more usage examples, see the examples folder.**

@@ -1,138 +0,0 @@
# The Problem
The deserialization process is the same as serialization in the sense that all operations happen in the same order, except that instead of writing to the buffer you read from it. So it is very desirable to have the same code express both, but is it really possible? Let's find out!
To achieve this, *Deserializer* has exactly the same interface as *Serializer*, EXCEPT that all methods in *Deserializer* accept data as *T&*, while *Serializer* accepts it as *const T&*.
So one way to make this happen is to make *Serializer/Deserializer* a template parameter and accept the actual object as *T&*, like this:
```cpp
template <typename S>
void serialize(S& s, Player& o) {
s.value4b(o.pos.x);
s.value4b(o.pos.y);
s.value4b(o.pos.z);
s.text1b(o.name);
}
```
You can use this function for serialization and deserialization, but you can't pass a *const T&*, which is a huge limitation.
# Bitsery solution
In order to fix this *const T&* issue, all we need to do is use the [SFINAE](http://en.cppreference.com/w/cpp/language/sfinae) technique to enable this function only if T is the object type or its const version (here *Player* or *const Player*), like this:
```cpp
template <typename S, typename T, typename std::enable_if<std::is_same<T, Player>::value || std::is_same<T, const Player>::value>::type* = nullptr>
void serialize (S& s, T& o) {
...
}
```
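The SFINAE version can be tried without the library. In this self-contained sketch, `Writer` and `Reader` are hypothetical stand-ins for *Serializer* and *Deserializer*: one takes `const float&`, the other `float&`, and their method name only mirrors *value4b*. `Point` stands in for *Player*. A single *serialize* body then serves both directions.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins: Writer takes const data (like Serializer),
// Reader fills mutable data (like Deserializer).
struct Writer {
    std::vector<float> out;
    void value4b(const float& v) { out.push_back(v); }
};

struct Reader {
    std::vector<float> in;
    std::size_t pos = 0;
    void value4b(float& v) { v = in.at(pos++); }
};

struct Point {
    float x = 0.0f;
    float y = 0.0f;
};

// Enabled only for Point and const Point, so one body handles both
// serialization (const object) and deserialization (mutable object).
template <typename S, typename T,
          typename std::enable_if<std::is_same<T, Point>::value ||
                                  std::is_same<T, const Point>::value>::type* = nullptr>
void serialize(S& s, T& o) {
    s.value4b(o.x);
    s.value4b(o.y);
}
```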
Let's modify our [hello world](hello_world.md) example and add deserialization to it.
```cpp
#include <vector>
#include <bitsery/bitsery.h>
#include <cstring>
#include <iostream>
struct Vector3f {
float x;
float y;
float z;
bool operator == (const Vector3f& o) const {
return x == o.x && y == o.y && z == o.z;
}
};
struct Player {
Vector3f pos;
char name[50];
};
using namespace bitsery;
SERIALIZE(Player) {
s.value4b(o.pos.x);
s.value4b(o.pos.y);
s.value4b(o.pos.z);
s.text1b(o.name);
}
Player createData() {
Player data;
data.pos.x = 0.45f;
data.pos.y = 50.9f;
data.pos.z = -15687.87f;
strcpy(data.name,"Yolo");
return data;
}
int main() {
const Player data = createData();
Player res{};
std::vector<uint8_t> buf;
BufferWriter bw{buf};
Serializer<BufferWriter> ser{bw};
serialize(ser, data);
bw.flush();
BufferReader br{bw.getWrittenRange()};
Deserializer<BufferReader> des{br};
serialize(des, res);
std::cout << "buffer completed successfully: " << br.isCompletedSuccessfully() << std::endl
<< "pos equals: " << (res.pos == data.pos) << std::endl
<< "name equals: " << (strcmp(res.name, data.name) == 0);
return 0;
}
```
```bash
buffer completed successfully: 1
pos equals: 1
name equals: 1
```
We created a *Deserializer* and modified the *serialize* function to accept both *Serializer* and *Deserializer*.
Deserialization is very similar to serialization; it also consists of three separate components:
* Buffer - container that we read data from, in our case *vector<uint8_t>*.
* BufferReader - reads bytes and bits from *Buffer*. It makes sure that reading is portable across Little and Big Endian systems. It also checks for errors at runtime, because data might come from an untrusted source and could terminate the program with a buffer overflow or segmentation fault if we are not careful.
* Deserializer - the same interface as *Serializer*, but forwards all calls to *BufferReader* to read bits and bytes.
Since deserialization involves error checking, there are two additional functions to check whether everything is correct after deserialization.
* [BufferReader.isCompletedSuccessfully()](../reference/buf_is_completed_successfully.md) - returns true if the whole buffer was read during deserialization and no errors were found.
* [BufferReader.getError()](../reference/buf_get_error.md) - returns the current buffer reader state. Useful when the buffer contains more than one object and you want to check each object's deserialization state separately.
One thing to note about BufferReader is that it doesn't have a constructor that accepts a buffer directly. Instead, it only accepts begin/end iterators, because it needs to know the precise data length to correctly implement *isCompletedSuccessfully*.
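The runtime checks described above can be sketched as a tiny bounds-checked reader. Everything here is hypothetical: `CheckedReader`, `ReadError` and the 1-byte count prefix are assumptions, not bitsery's *BufferReader*. Only the *getError*/*isCompletedSuccessfully* naming mirrors the functions listed above. Each read is bounds-checked, and a count above *maxSize* is rejected before anything is allocated. The first error sticks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical error states for this sketch.
enum class ReadError { NoError, DataOverflow, InvalidData };

class CheckedReader {
public:
    CheckedReader(const std::uint8_t* begin, const std::uint8_t* end)
        : pos_(begin), end_(end) {}

    bool readByte(std::uint8_t& v) {
        if (error_ != ReadError::NoError) return false;   // errors are sticky
        if (pos_ == end_) { error_ = ReadError::DataOverflow; return false; }
        v = *pos_++;
        return true;
    }

    // Reads a 1-byte element count followed by that many bytes, rejecting
    // counts above maxSize before resizing the output container.
    bool readBytes(std::vector<std::uint8_t>& out, std::size_t maxSize) {
        std::uint8_t count = 0;
        if (!readByte(count)) return false;
        if (count > maxSize) { error_ = ReadError::InvalidData; return false; }
        out.resize(count);
        for (auto& b : out)
            if (!readByte(b)) return false;
        return true;
    }

    ReadError getError() const { return error_; }

    // True only if no error occurred and every byte was consumed.
    bool isCompletedSuccessfully() const {
        return error_ == ReadError::NoError && pos_ == end_;
    }

private:
    const std::uint8_t* pos_;
    const std::uint8_t* end_;
    ReadError error_ = ReadError::NoError;
};
```

This is also why a reader needs exact begin/end bounds: without `end_`, neither the overflow check nor *isCompletedSuccessfully* would be possible.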
```cpp
BufferReader br{bw.getWrittenRange()};
Deserializer<BufferReader> des{br};
```
To reduce the boilerplate of writing a *serialize* function with the *SFINAE* technique, **bitsery** provides the *SERIALIZE* macro. With this macro the code looks much cleaner, and the function can accept both *Player* and *const Player*.
```cpp
SERIALIZE(Player) {
s.value4b(o.pos.x);
s.value4b(o.pos.y);
s.value4b(o.pos.z);
s.text1b(o.name);
}
...
serialize(ser, data); //ser-> Serializer, data-> const Player
...
serialize(des, res); //des-> Deserializer, res-> Player
```
# Summary
You have learned how to write a *serialize* function for your type that works for both serialization and deserialization. You also learned that deserialization is very similar to serialization, but has runtime error checking.
In [next chapter](composition.md) you'll learn how to compose complex serialization/deserialization flows efficiently.