rtyaml: Round-trippable YAML

Primary author: Joshua Tauberer https://razor.occams.info

This module is a wrapper around pyyaml to read and write YAML files with some improvements:

  • Round-tripping YAML files is possible by preserving the order of keys. In Python 3.7+, just use regular dicts. In prior versions of Python, use collections.OrderedDict.
  • Saner defaults are set for how strings are written out to YAML (see below).
  • Other sane defaults are chosen, such as using the "safe" loader/dumper.
  • A comment block found at the very beginning of a stream when loading YAML is preserved when writing it back out.

What do you mean by round-tripping?

Round-tripping means that when you load a file and then save it unchanged, the bytes on disk do not change. This isn't possible with PyYAML, which makes it difficult to use YAML files with version control like git, because every time you load and save the file, things can get rearranged. Keys can change order, string quoting styles can change, #-comments are removed, and so on.

Although rtyaml can't provide round-tripping for all files, it does set some sane defaults on PyYAML so that it's easier to achieve. For instance, if you load this file with PyYAML:

and then save it back out unchanged:

import yaml

with open('example.yaml') as f:
    print(yaml.dump(yaml.safe_load(f)))

you get this mess:

Notice how the comment is gone, the keys zz, yy, xx changed order, the strings are inconsistently formatted, nulls use a confusing keyword, and mappings are condensed into single lines.
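The key reordering on its own can be reproduced with plain PyYAML. The sketch below uses the key names zz, yy, xx from the text, with made-up values, and relies on PyYAML's own `sort_keys` option (available in PyYAML 5.1 and later). It only illustrates the default behavior; it is not a claim about how rtyaml is implemented internally.

```python
import yaml

# Made-up document using the key names mentioned above.
doc = "zz: 1\nyy: 2\nxx: 3\n"
data = yaml.safe_load(doc)

# By default PyYAML sorts mapping keys alphabetically when dumping.
sorted_out = yaml.safe_dump(data)

# With sort_keys=False the dict's insertion order is kept instead.
ordered_out = yaml.safe_dump(data, sort_keys=False)

print(sorted_out)
print(ordered_out)
```

On Python 3.7+ the loaded dict keeps the order in which keys appeared in the document, so only the dump step rearranges them.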

With rtyaml, you actually get the original file back! That's basically the whole point of this library.

Installation and usage

Install:

pip install rtyaml

Usage:

import rtyaml

with open("myfile.yaml") as f:
  stuff = rtyaml.load(f)

# ...do things to stuff...

with open("myfile.yaml", "w") as f:
  rtyaml.dump(stuff, f)

As in the underlying pyyaml library, load accepts a string or byte string containing YAML, or an open file object (binary or text). The second argument to dump is optional; if it is omitted, the function returns the YAML as a string.

load_all and dump_all are also supported, which load and save lists of documents using YAML's --- document separator.

Dependencies

  • pyyaml (in Ubuntu, the python-yaml or python3-yaml package)
  • libyaml (in Ubuntu, the libyaml-0-2 package plus, at install time only, libyaml-dev)

Details

This library does the following:

  • Uses the native libyaml CSafeLoader and CDumper for both speed and safe operation.
  • Preserves the order of keys in dicts rather than alphabetizing the keys (Python >=3.7).
  • Allows you to use collections.OrderedDicts with dump to preserve key order (useful before Python 3.7).
  • Writes multi-line strings in block mode (rather than quoted with ugly escaped newline characters), choosing between the literal or folded mode depending on what looks better for the length of the lines in the string.
  • Writes mappings and lists in the expanded (one per line) format, which is nice when the output is going in version control.
  • Modifies the flow string quoting rules so that any string made up of digits is serialized with quotes. (The default settings serialize the string "01" with quotes but the string "09" without quotes! (Can you figure out why?))
  • None is serialized as the tilde rather than as null, which is less confusing.
  • If a block comment appears at the start of the file (i.e. one or more lines starting with a '#'), write it back out if the same object is written with rtyaml.dump().

For Python 3.6 and earlier:

  • Loads mappings as collections.OrderedDict so that the key order remains the same when dumping the file later. (This is no longer needed in Python 3.7 because key order is preserved in regular dicts now.)
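As a hint toward the "01" versus "09" puzzle in the list above, the following sketch uses plain PyYAML (not rtyaml) to show how the two strings are treated when read as unquoted YAML scalars and when dumped with default settings. The variable names are only for illustration.

```python
import yaml

samples = ("01", "09")

# How PyYAML interprets each text when it appears unquoted in YAML.
as_loaded = {text: yaml.safe_load(text) for text in samples}

# How PyYAML writes each Python string back out by default.
dumped = {text: yaml.safe_dump(text) for text in samples}

for text in samples:
    print(repr(text), "loads as", repr(as_loaded[text]),
          "and dumps as", repr(dumped[text]))
```

An unquoted 01 is read as an integer (PyYAML follows the YAML 1.1 rule that a leading zero marks an octal number), so the string "01" has to be quoted to stay a string. 09 is not a valid octal number, so it is read as a string either way and PyYAML sees no need to quote it.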

With-Block Helper for Editing Files In-Place

The rtyaml.edit class is a utility for use in with blocks that makes it easier to edit YAML files in-place. For example:

with rtyaml.edit("path/to/data.yaml", default={}) as data:  
   data["hello"] = "world"

The file is opened for editing ("r+" mode, or "w+" mode if it doesn't exist and a default value is given) and its contents are parsed and returned as the data with-block variable. The file is kept open while the with-block is executing. When the with-block exits, the with-block variable is written back to the file as YAML, and then the file is closed.

This will, of course, only work if the file contains an array or object (dict), and you cannot assign a new value to the with-block variable (that's just how Python with blocks work). You can only call its methods, i.e., you can edit the list (append, pop, sort, etc.) and dict (get/set keys), but you can't replace the value with an entirely new list or dict.
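The difference between mutating and rebinding the with-block variable can be seen without any YAML at all. The sketch below uses a hypothetical stand-in, edit_sketch, which is not part of rtyaml: it hands out an object and then "saves" that same object into a list when the block exits.

```python
from contextlib import contextmanager

saved = []  # stands in for the file that would be written on exit

@contextmanager
def edit_sketch(initial):
    """Hypothetical stand-in for an edit-style helper: yields an object,
    then records that same object when the with-block exits."""
    obj = initial
    yield obj
    saved.append(obj)

# Mutating the yielded object is visible to the helper.
with edit_sketch({}) as data:
    data["hello"] = "world"

# Rebinding the name only changes the local variable; the helper
# still holds, and saves, the original (empty) dict.
with edit_sketch({}) as data:
    data = {"hello": "world"}

# To replace everything, mutate the object in place instead.
with edit_sketch({"old": 1}) as data:
    data.clear()
    data.update({"hello": "world"})

print(saved)
```

The second block saves an empty dict because the assignment never touched the object the helper was holding.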

If the default parameter is not given, or is None, the file must exist. Otherwise, if the file doesn't exist, it's created and the with-block variable will start you off with the default value.

You can also pass a stream as the first argument if you want to open the file yourself. The stream must support seek, truncate, and close. If you open a file, you should use the "r+" or "w+" mode.

Public domain dedication

This project is dedicated to the public domain, as indicated in the LICENSE file:

The project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication. http://creativecommons.org/publicdomain/zero/1.0/

All contributions to this project must be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.