Serialization Overview

Introduction

NimYAML tries hard to make transforming YAML characters streams to native Nim types and vice versa as easy as possible. In simple scenarios, you might not need anything else than the two procs dumping: dump and loading: load. On the other side, the process should be as customizable as possible to allow the user to tightly control how the generated YAML character stream will look and how a YAML character stream is interpreted.

An important thing to remember in NimYAML is that unlike in interpreted languages like Ruby, Nim cannot load a YAML character stream without knowing the resulting type beforehand. For example, if you want to load this piece of YAML:

%YAML 1.2
--- !nim:system:seq(nim:system:int8)
- 1
- 2
- 3

You would need to know that it will load a seq[int8] at compile time. This is not really a problem because without knowing which type you will load, you cannot do anything useful with the result afterwards in the code. But it may be unfamiliar for programmers who are used to the YAML libraries of Python or Ruby.

Supported Types

NimYAML supports a growing number of types of Nim's system module and standard library, and it also supports user-defined object, tuple and enum types out of the box. A complete list of explicitly supported types is available in Schema.

Important: NimYAML currently does not support polymorphism. This may be added in the future.

This also means that NimYAML is generally able to work with object, tuple and enum types defined in the standard library or a third-party library without further configuration, assuming that all fields of the object are accessible at the code point where NimYAML's facilities are invoked.

Scalar Types

The following integer types are supported by NimYAML: int, int8, int16, int32, int64, uint8, uint16, uint32, uint64. Note that the int type has a variable size dependent on the target operation system. To make sure that it round-trips properly between 32-bit and 64-bit operating systems, it will be converted to an int32 during loading and dumping. This will raise an exception for values outside of the range int32.low .. int32.high! If you define the types you serialize yourself, always consider using an integer type with explicit length. The same goes for uint.

The floating point types float, float32 and float64 are also supported. There is currently no problem with float, because it is always a float64.

string is supported and one of the few Nim types which directly map to a standard YAML type. NimYAML is able to handle strings that are nil, they will be serialized with the special tag !nim:nil:string. char is also supported.

To support new scalar types, you must implement the constructObject() and representObject() procs on that type (see below).

Container Types

NimYAML supports Nim's array, set, seq, Table, OrderedTable and Option types out of the box. While YAML's standard types !!seq and !!map allow arbitrarily typed content, in Nim the contained type must be known at compile time. Therefore, Nim cannot load !!seq and !!map.

However, it doesn't need to. For example, if you have a YAML file like this:

%YAML 1.2
---
- 1
- 2

You can simply load it into a seq[int]. If your YAML file contains differently typed values in the same collection, you can use an implicit variant object, see below.

A special case is Option[T]: This type will either contain a value or not. NimYAML maps !!null YAML scalars to the option's none(T) value. This also works for ref types because Option for those types will use nil as its none(T) value.

By default, Option fields must be given even if they are none(T). You can circumvent this by putting the annotation {.sparse.} on the type containing the Option field.

Reference Types

A reference to any supported non-reference type (including user defined types, see below) is supported by NimYAML. A reference type will be treated like its base type, but NimYAML is able to detect multiple references to the same object and dump the structure properly with anchors and aliases in place. It is possible to dump and load cyclic data structures without further configuration. It is possible for reference types to hold a nil value, which will be mapped to the !!null YAML scalar type.

ptr types are not supported because it seems dangerous to automatically allocate memory which the user must then manually deallocate.

Anchors and aliases are not supported when calling NimYAML at compile time.

User Defined Types

For an object or tuple type to be directly usable with NimYAML, the following conditions must be met:

Every type contained in the object/tuple must be supported
All fields of an object type must be accessible from the code position where you call NimYAML. If an object has non-public member fields, it can only be processed in the module where it is defined.
The object must not have a generic parameter

NimYAML will present enum types as YAML scalars, and tuple and object types as YAML mappings. Some of the conditions above may be loosened in future releases.

Variant Object Types

A variant object type is an object type that contains one or more case clauses. NimYAML supports variant object types. Only the currently accessible fields of a variant object type are dumped, and only those may be present when loading.

The value of a discriminator field must be loaded before any value of a field that depends on it. Therefore, a YAML mapping cannot be used to serialize variant object types - the YAML specification explicitly states that the order of key-value pairs in a mapping must not be used to convey content information. So, any variant object type is serialized as a list of key-value pairs.

For example, this type:

type
  AnimalKind = enum
    akCat, akDog
  
  Animal = object
    name: string
    case kind: AnimalKind
    of akCat:
      purringIntensity: int
    of akDog:
      barkometer: int

will be serialized as:

%YAML 1.2
--- !nim:custom:Animal
- name: Bastet
- kind: akCat
- purringIntensity: 7

You can also use variant object types for processing heterogeneous data sets. For example, if you have a YAML document which contains differently typed values in the same list like this:

%YAML 1.2
---
- 42
- this is a string
- !!null

You can define a variant object type that can hold all types that occur in this list in order to load it:

import yaml

type
  ContainerKind = enum
    ckInt, ckString, ckNone
  Container {.implicit.} = object
    case kind: ContainerKind
    of ckInt:
      intVal: int
    of ckString:
      strVal: string
    of ckNone:
      discard

var
  list: seq[Container]
  s = newFileStream("in.yaml")
load(s, list)

{.implicit.} tells NimYAML that you want to use the type Container implicitly, i.e. its fields are not visible in YAML, and are set dependent on the value type that gets loaded into it. The type Container must fullfil the following requirements:

It must contain exactly one case clause, and nothing else.
Each branch of the case clause must contain exactly one field, with one exception: There may be at most one branch that contains no field at all.
It must not be a derived object type (this is currently not enforced)

When loading the sequence, NimYAML writes the value into the first field that can hold the value's type. All complex values (i.e. non-scalar values) must have a tag in the YAML source, because NimYAML would otherwise be unable to determine their type. The type of scalar values will be guessed if no tag is available, but be aware that 42 can fit in both int8 and int16, so in the case you have fields for both types, you should annotate the value.

When dumping the sequence, NimYAML will always annotate a tag to each value it outputs. This is to avoid possible ambiguity when loading. If a branch without a field exists, it is represented as a !!null value.

Customizing Field Handling

NimYAML allows the user to specify special handling of certain object fields via annotation pragmas.

Transient Fields

It may happen that certain fields of an object type are transient, i.e. they are used in a way that makes (de)serializing them unnecessary. Such fields can be marked as transient. This will cause them not to be serialized to YAML. They will also not be accepted when loading the object.

Example:

type MyObject: object
  storable: string
  temporary {.transient.}: string

Default Values

When you load YAML, you might want to allow for the omission certain fields, which should then be filled with a default value. You can do that like this:

type MyObject: object
  required: string
  optional {.defaultVal: "default value".}: string

Whenever a value of type MyObject now is loaded and the input stream does not contain the field optional, that field will be set to the value "default value".

Customize Serialization

It is possible to customize the serialization of a type. For this, you need to implement two procs, constructObject̀ and representObject. If you only need to process the type in one direction (loading or dumping), you can omit the other proc.

constructObject

proc constructObject*(
  ctx   : var ConstructionContext,
  result: var MyObject,
) {.raises: [YamlConstructionError, YamlStreamError].}

This proc should construct the type from the YamlStream in ctx.input. Follow the following guidelines when implementing a custom constructObject proc:

For constructing a value from a YAML scalar, consider using the constructScalarItem template, which will automatically catch exceptions and wrap them with a YamlConstructionError, and also will assure that the item you use for construction is a yamlScalar. See below for an example.
For constructing a value from a YAML sequence or map, you must use the constructChild proc for child values if you want to use their constructObject implementation. This will check their tag and anchor. Always try to construct child values that way.
For non-scalars, make sure that the last value you remove from the stream is the object's ending event (yamlEndMap or yamlEndSequence)
Use peek for inspecting the next event in the YamlStream without removing it.
Never write a constructObject proc for a ref type. ref types are always handled by NimYAML itself. You can only customize the construction of the underlying object.

The following example for constructing from a YAML scalar value is the actual implementation of constructing bool types:

proc constructObject*(
  ctx   : var ConstructionContext,
  result: var bool,
) {.raises: [YamlConstructionError, YamlStreamError].} =
  ## constructs a bool value from a YAML scalar
  ctx.input.constructScalarItem(item, bool):
    case guessType(item.scalarContent)
    of yTypeBoolTrue: result = true
    of yTypeBoolFalse: result = false
    else:
      raise ctx.input.constructionError(
        item.startPos,
        "Cannot construct to bool: " & escape(item.scalarContent)
      )

The following example for constructing from a YAML non-scalar is the actual implementation of constructing seq types:

proc constructObject*[T](
  ctx   : var ConstructionContext,
  result: var seq[T],
) {.raises: [YamlConstructionError, YamlStreamError].} =
  ## constructs a Nim seq from a YAML sequence
  let event = ctx.input.next()
  if event.kind != yamlStartSeq:
    raise ctx.input.constructionError(event.startPos, "Expected sequence start")
  result = newSeq[T]()
  while ctx.input.peek().kind != yamlEndSeq:
    var item: T
    ctx.constructChild(item)
    result.add(move(item))
  discard ctx.input.next()

representObject

proc representObject*(
  ctx  : var SerializationContext,
  value: MyObject,
  tag  : Tag,
): {.raises: [YamlSerializationError].}

This proc should push a list of tokens that represent the type into the serialization context via ctx.put. Follow the following guidelines when implementing a custom representObject proc:

Always output the first token with a yAnchorNone. Anchors will be set automatically by ref type handling.
When outputting non-scalar types, you should use representChild for contained values.
Always use the tag parameter as tag for the first token you generate.
Never write a representObject proc for ref types, instead write the proc for the ref'd type.

The following example for representing to a YAML scalar is the actual implementation of representing int types:

proc representObject*[T: int8|int16|int32|int64](
  ctx  : var SerializationContext,
  value: T,
  tag  : Tag,
) {.raises: [].} =
  ## represents an integer value as YAML scalar
  ctx.put(scalarEvent($value, tag, yAnchorNone))

The following example for representing to a YAML non-scalar is the actual implementation of representing seq and set types:

proc representObject*[T](
  ctx  : var SerializationContext,
  value: seq[T]|set[T],
  tag  : Tag,
) {.raises: [YamlSerializationError].} =
  ## represents a Nim seq as YAML sequence
  ctx.put(startSeqEvent(tag = tag))
  for item in value: ctx.representChild(item)
  ctx.put(endSeqEvent())