HDTree

columnar, ragged data with a dynamic, run-time defined schema

A TTree-like data organization schema stored within the HDF5 data format.

Table of Contents

  • cpp : C++ API for serial read/write (tests: cpp api test)
  • python : planned Python API for parallel/serial read

HDTree Meta-Format Definition

HDTree is a specific metaformat on top of a HDF5 Group.

The tree has a few HDF5 attributes of its own to help interface.

  • __size__ : the number of entries in the tree
  • __version__ : the version of the HDTree meta-format
  • __api__ : the API that was used to write the HDTree
  • __api_version__ : the version of that API

Besides these attributes, all child groups of the tree are the branches. Each branch can have an arbitrary number of child branches itself.

Each branch has two attributes

  • __type__ : the name of the type of this branch
  • __version__ : the version of the type

A variable-length container is "flattened" into two sub-branches:

  • __size__ : is a branch storing the successive sizes of the containers
  • data : is a branch storing the entries in the containers
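As a rough illustration of this flattening, here is a minimal in-memory sketch. It does not use the HDTree API or HDF5 at all; the `FlatContainer` struct and `flatten` function are hypothetical names invented for this example, and plain `std::vector`s stand in for the two sub-branches.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical in-memory stand-in for the two sub-branches of a
// flattened variable-length container (not part of the HDTree API).
struct FlatContainer {
  std::vector<std::size_t> size;  // plays the role of "__size__": one length per entry
  std::vector<double> data;       // plays the role of "data": all elements back to back
};

// Flatten a sequence of ragged entries into the two arrays above.
FlatContainer flatten(const std::vector<std::vector<double>>& entries) {
  FlatContainer flat;
  for (const auto& entry : entries) {
    flat.size.push_back(entry.size());
    flat.data.insert(flat.data.end(), entry.begin(), entry.end());
  }
  return flat;
}
```

Three entries holding 2, 0, and 1 values would give a `size` array of length three and a `data` array of length three; the empty entry contributes only a zero to `size`.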

Similarly, a variable-length mapping is flattened into three sub-branches:

  • __size__ : is a branch storing the successive sizes of the mappings
  • keys : is a branch storing the keys in the mappings
  • vals : is a branch storing the values in the mappings
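The mapping case can be sketched the same way. Again, this is only an in-memory model under my own assumptions: `FlatMapping` and `flatten_maps` are hypothetical names, and the key and value types (`std::string` and `int`) were picked just for the example.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for the three sub-branches of a
// flattened variable-length mapping (not part of the HDTree API).
struct FlatMapping {
  std::vector<std::size_t> size;  // "__size__": number of key/value pairs per entry
  std::vector<std::string> keys;  // "keys": all keys, entry after entry
  std::vector<int> vals;          // "vals": all values, in the same order as the keys
};

// Flatten a sequence of mappings into the three arrays above.
FlatMapping flatten_maps(const std::vector<std::map<std::string, int>>& entries) {
  FlatMapping flat;
  for (const auto& entry : entries) {
    flat.size.push_back(entry.size());
    for (const auto& [key, val] : entry) {
      flat.keys.push_back(key);
      flat.vals.push_back(val);
    }
  }
  return flat;
}
```

The keys and values stay aligned because they are appended together, so the i-th key always belongs to the i-th value.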

The recursive process of sub-branching continues until atomic data types (booleans, integers, floats, and strings) are reached. These are the only actual HDF5 DataSets, and they are stored in chunked and compressed one-dimensional DataSets.

Design Principles

This page outlines the qualitative goals of the HDTree meta-format and thus also any API interacting with it. These are not concrete requirements of any API; however, they are helpful to keep in mind when diving into deep development associated with HDTree.

Accessible Along Both Axes

In the "DataFrame" vocabulary (popularized by the R language and the Python package pandas), each table of data has two axes: along the rows and along the columns. HDTree is focused on making sure the data is accessible along both of these axes because both access patterns are useful in different situations.

In HDTree vocabulary, each "branch" is a "column" and each "event" is a "row". While HDTree allows for entries within one "cell" (an intersection between a single row and column) to be an arbitrarily complex data type, this organization is still at the foundation of its development.
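To make the two access patterns concrete, here is a small sketch on plain vectors, assuming a variable-length branch that has already been flattened into a list of per-entry sizes and one long data array. The function names `read_entry` and `column_sum` are made up for this illustration and are not part of any HDTree API.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Row access: recover the values belonging to a single entry.
// The entry starts at the sum of the sizes of all earlier entries.
std::vector<double> read_entry(const std::vector<std::size_t>& sizes,
                               const std::vector<double>& data,
                               std::size_t i_entry) {
  // at() throws std::out_of_range if i_entry is not a valid entry index
  const std::size_t n = sizes.at(i_entry);
  const std::size_t begin =
      std::accumulate(sizes.begin(), sizes.begin() + i_entry, std::size_t{0});
  return std::vector<double>(data.begin() + begin, data.begin() + begin + n);
}

// Column access: the flattened data array already is the whole column,
// so a column-wise operation never needs the per-entry sizes.
double column_sum(const std::vector<double>& data) {
  return std::accumulate(data.begin(), data.end(), 0.0);
}
```

Row access needs the sizes to find where an entry begins, while column access can work on the data array directly.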

In-File Version Control

Not only will HDTree store the version of the meta-format in the objects it writes to HDF5 files, it will also store the version of the API and the version of any user-defined data structures. This allows users to receive the benefits of a flexible schema without losing track of how their schema has evolved.
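One way to picture why a stored version number is useful is the sketch below. It is purely illustrative and does not reflect how any HDTree API implements schema evolution: the `Point` type, its two versions, and the `read_point` function are all hypothetical, and a `std::map` stands in for whatever was found on disk.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical user type: version 1 stored only x and y, version 2 added z.
struct Point {
  double x{0.};
  double y{0.};
  double z{0.};
};

// `stored_version` plays the role of the version recorded next to the data;
// `fields` is a stand-in for the values read from disk, keyed by name.
Point read_point(int stored_version, const std::map<std::string, double>& fields) {
  Point p;
  switch (stored_version) {
    case 1:
      p.x = fields.at("x");
      p.y = fields.at("y");
      p.z = 0.;  // z did not exist in version 1, so fall back to a default
      break;
    case 2:
      p.x = fields.at("x");
      p.y = fields.at("y");
      p.z = fields.at("z");
      break;
    default:
      throw std::invalid_argument("unknown Point version");
  }
  return p;
}
```

Because the version travels with the data, old files remain readable after the in-memory type has changed.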

Concrete Acknowledgement of Data Organization

When defining user data types for serialization, HDTree APIs require acknowledgement of where the data is going in the HDTree in-file structure. This may not be literally required in some dynamic language APIs; however, it is important for the user to be aware of how their data will be organized and how it will end up on disk.

Not Limited to Specific Language

HDTree is the meta-format. While originating in C++, the format of data-on-disk should not be limited to a specific language, since such a limitation would hinder evolution of the meta-format and its APIs. This informs development of the meta-format itself by requiring any new features implemented in one language API to have plausible equivalents in other languages.

Coming From ROOT

Page is Work in Progress

Since the HDTree meta-format is directly comparable to (and inspired by) ROOT's TTree class, many users of HDTree are expected to be familiar with the ROOT ecosystem. This page is focused on providing guidance towards HDF5-related tools that would allow for similar interaction with HDTrees that ROOT's ecosystem provides for TTrees.

Graphical Browsing

  • TBrowser -> HDFView and/or JupyterLab extension

Plotting Branches

  • TTree::Draw -> h5py and matplotlib.hist
  • scikit-hep.hist

Serialization of Histogram Objects

  • pickle/h5py in python
  • HighFive in C++

Merging HDTrees

  • Simple, small example using h5py
  • Reference open issue for writing a C++ program
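Until those examples exist, the idea behind a merge can at least be sketched on an in-memory model. This is not code from the repository and it does not touch HDF5; `FlatBranch`, `ToyTree`, and `merge` are hypothetical names, and the tree is assumed to hold a single flattened branch of doubles.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical in-memory model of one flattened variable-length branch.
struct FlatBranch {
  std::vector<std::size_t> size;  // per-entry sizes
  std::vector<double> data;       // all values back to back
};

// Hypothetical in-memory model of a tree with a single branch.
struct ToyTree {
  std::size_t entries{0};  // plays the role of the tree-level __size__ attribute
  FlatBranch rand_nums;
};

// Merging appends the entries of `b` after the entries of `a`:
// every flattened array is concatenated and the entry counts are added.
ToyTree merge(const ToyTree& a, const ToyTree& b) {
  ToyTree out = a;
  out.entries += b.entries;
  out.rand_nums.size.insert(out.rand_nums.size.end(),
                            b.rand_nums.size.begin(), b.rand_nums.size.end());
  out.rand_nums.data.insert(out.rand_nums.data.end(),
                            b.rand_nums.data.begin(), b.rand_nums.data.end());
  return out;
}
```

Since sizes are stored per entry rather than as absolute offsets, concatenation needs no re-indexing in this model.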

awkward and pandas interface

  • Issue #11 is aiming to define a HDTree Python API modeled after uproot's interface for ROOT TTrees

Contributing to HDTree

All contributing is helpful! Anything from correcting a spelling mistake in the documentation, adding a new example, patching bugs, adding features, or as big as starting an API in a new language is highly encouraged. Below, I've collected some notes on these various levels of contribution.

Documentation Updates

If you are writing more detailed explanation or adding in a new example, please git clone the repository and make sure the updated documentation can be built into a website by mdbook and has the format you expect. You can build and view the documentation locally with the help of a container runner like docker to aid in this development.

New Examples

As far as I'm concerned, the more the merrier! If you are writing an example, please be detailed about which API and which version of that API you are using so that future readers can check if anything has changed since the example was written.

Patching Bugs or Adding Features

If you find a bug or think of a new feature to add, please open a GitHub Issue to start the discussion. This allows all collaborators to see what you plan to work on as well as potentially offer some insight on how to get going.

New API

If your favorite language does not have an API represented, feel free to start writing one! A first API does not have to be super powered. Even a simple one only focused on reading without parallelization can be a good start and open the door to other contributors to expand on it.

Again, similar to patching bugs or adding features, please create a GitHub issue to start a discussion and outline a plan for what you want to implement.

As you get closer to a functional API, integration tests will also be requested. So keep in mind that you may need to be able to run one of the other APIs to help make sure your API is reading and/or writing a correct form of the HDTree meta-format.

Building the Docs

Offering documentation edits while using HDTree is incredibly helpful.

The automatically generated documentation from source code is usually more detailed and is done differently depending on the language being written, so the "API Reference" is kept in a separate site for each API. Manually written documentation is also separated by API, but it is all written here and processed by mdbook.

Launching Local Version of Docs

After installing mdbook, you can use it to build and serve the doc website locally on your computer while you are writing documentation. It will then automatically refresh the website when it detects that files have changed.

Note: Some of the links in the mdbook point to the reference manuals that are generated differently in order to conform to language-specific standards, so without generating those manuals those links will be broken.

Reference Manuals

The different APIs have different methods of generating reference manuals from the comments in the code. Besides editing the manual documentation and having it processed by jekyll, these files are copied onto the gh-pages branch into a subdirectory so that they are not modified by jekyll but still hosted at the same website.

C++

The C++ reference is generated using doxygen and the doxygen-awesome theme. The theme is kept in a git submodule, so you will need to make sure the submodules are downloaded for the local version to be the same theme as the online version.

git submodule update --init

After installing doxygen, it is expected to be run from the root of this repository.

doxygen cpp/docs/doxyfile

This produces the HTML documentation in the cpp/docs/html directory. You can view the HTML files generated by doxygen by opening them in your favorite browser. For example,

firefox cpp/docs/html/index.html

Structure of the Mono-Repo

In order to ensure uniformity of the HDTree meta-format, the various API implementations are kept within this mono-repo. Each implementation can cater to its language's strengths; nevertheless, the meta-format itself should unite all of the APIs.

For this reason, the different APIs will also be tested to make sure files written by one API can be read by others. Each API has its own subdirectory and it has full control over the organization within that subdirectory to conform to language conventions. So, in general, the structure of the mono-repo's root directory is very simple:

  • .github/ : GitHub workflows, templates, and other GitHub-related files
  • test/ : Integration tests to make sure files from one API can be read by another
  • cpp/ : CPP API Implementation
  • Xlang/ : some language X API implementation
  • metaformat/ : documentation about the meta-format/schema itself, not specific to any language
  • docs/ : general documentation about the HDTree project
  • README.md : GitHub README
  • SUMMARY.md : outline of mdBook-based documentation website
  • book.toml : configuration file for mdBook-based site

hdtree-cpp

C++ API for the HDTree data organization structure.

hdtree-cpp is a C++17 library with support for

  • serial read/write of an HDTree
  • schema evolution of user-defined structures stored in branches of the HDTree

Online Reference Manual

Installation

Dependencies

  • HDF5
  • HighFive
  • Boost (for demangling, plans to make optional)
cmake -B build -S . \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX=<prefix>
cmake --build build --target test
cmake --build build --target install

Usage

Below are just code snippets; please look into the examples for complete programs that compile and run with HDTree.

Write

examples/save.cxx

auto tree = hdtree::Tree::save("my-file.hdf5","/path/to/tree");
auto& i_entry = tree.branch<int>("i_entry");
for (std::size_t i{0}; i < 5; i++) {
  *i_entry = i;
  // same as
  //i_entry.update(i);
  tree.save();
}

Read and Write

examples/transform.cxx

{ // read and write (separate source/dest)
  auto tree = hdtree::Tree::transform("one.hdf5","/tree1","two.hdf5","/tree2");
  // read this branch
  auto& i_entry = tree.get<int>("i_entry");
  // write this branch
  auto& my_cool_new_var = tree.branch<double>("coolio");
  for (std::size_t i{0}; i < tree.entries(); i++) {
    tree.load();
    
    *my_cool_new_var = *i_entry * 4.2;

    tree.save();
  }
}

{ // read and write (same source/dest)
  auto tree = hdtree::Tree::inplace("one.hdf5","/tree1");
  // read this branch
  auto& i_entry = tree.get<int>("i_entry");
  // write this branch
  auto& my_cool_new_var = tree.branch<double>("coolio");
  for (std::size_t i{0}; i < tree.entries(); i++) {
    tree.load();
    
    *my_cool_new_var = *i_entry * 4.2;

    tree.save();
  }
}

Read

examples/load.cxx

{ // read
  auto tree = hdtree::Tree::load("my-file.hdf5","/path/to/tree");
  // & required
  auto& i_entry = tree.get<int>("i_entry");
  for (std::size_t i{0}; i < tree.entries(); i++) {
    tree.load();
    assert(i == *i_entry);
  }
}

Benefits

  • resets the value in a Branch to an empty state after each save (i.e. treat the Branch reference as a for-loop-local variable)
  • hdtree juggles the memory addresses
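The "for-loop-local variable" behaviour can be pictured with a toy class. This is a simplified sketch under my own assumptions, not the real hdtree::Branch: `ToyBranch` is a hypothetical name, it only handles a vector of doubles, and it keeps everything in memory instead of writing to HDF5.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy, not the hdtree::Branch class: it owns a buffer that the
// user fills through pointer-like syntax and that is emptied on every save.
class ToyBranch {
 public:
  std::vector<double>* operator->() { return &buffer_; }
  std::vector<double>& operator*() { return buffer_; }

  // Store the current buffer as one entry, then reset the buffer so the
  // next loop iteration starts from an empty state.
  void save() {
    sizes_.push_back(buffer_.size());
    data_.insert(data_.end(), buffer_.begin(), buffer_.end());
    buffer_.clear();
  }

  const std::vector<std::size_t>& sizes() const { return sizes_; }
  const std::vector<double>& data() const { return data_; }

 private:
  std::vector<double> buffer_;      // the value the user sees and fills
  std::vector<double> data_;        // stand-in for the on-disk data
  std::vector<std::size_t> sizes_;  // stand-in for the on-disk sizes
};
```

With this pattern, values pushed during one iteration cannot leak into the next entry, even though the same handle is reused for every entry.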

Table of Contents

  • include: the headers for the HDTree C++ API
  • src: source files compiled into the hdtree-cpp library
  • test: source files for testing hdtree-cpp
  • examples: simple, example programs showing hdtree-cpp's various abilities
    • compilation and running of examples are always included in the build so that developers of hdtree-cpp can keep them up-to-date

Getting Started with HDTree C++

Installing the HDTree C++ API

CMake and a C++17 compatible C++ compiler are required, both of which are readily available via your system software repositories.

  • On Ubuntu derivatives: sudo apt update && sudo apt install cmake gcc g++

The HDF5 library exists in many Unix repositories, so look there first for installing it. As always, you can fall back to building the latest release from source.

  • On Ubuntu derivatives: sudo apt update && sudo apt install libhdf5-dev
  • On MacOS: brew install hdf5

The HighFive C++ Wrapper is used by HDTree so it is also required.

  • Download the source
wget https://github.com/BlueBrain/HighFive/archive/refs/tags/v2.7.0.tar.gz
  • Unpack the source
tar xzf v2.7.0.tar.gz
cd HighFive-2.7.0
  • Configure the build. Use the CMAKE_INSTALL_PREFIX if you wish to install HighFive somewhere besides /usr/local.
cmake -DHIGHFIVE_EXAMPLES=OFF -DHIGHFIVE_UNIT_TESTS=OFF -B build -S .
  • Install the interface. May require administrative (sudo) privileges if installing to /usr/local.
cmake --build build --target install

Building HDTree is similar to HighFive but a separate compilation step will be helpful since, unlike HighFive, HDTree is not a header-only C++ library.

  • Download the source
wget https://github.com/tomeichlersmith/hdtree/archive/refs/tags/cpp/v0.4.5.tar.gz
  • Unpack the source
tar xzf v0.4.5.tar.gz
cd hdtree-v0.4.5/cpp
  • Configure the build (again, use CMAKE_INSTALL_PREFIX if you wish to change the install location).
cmake -B build -S .
  • Build the library.
cd build
make
  • Install
make install

First Steps

There are four ways to access a HDTree with the C++ API. They are mainly separated by different stages of processing the data. We start with save since you will first need to write a HDF5 file with an HDTree in it in order to be able to go further.

The code below is copied in from the examples directory within the C++ API source. This means the code "snippets" are pretty long, but I've tried to include explanatory comments within them. The examples are compiled alongside the HDTree C++ API so you can try it out immediately after building it.

write-only (save)

First, we are just going to write some example data to a file. This shows an example of a write-only process. After compilation, run by providing a name for the file and the tree.

hdtree-eg-save my-first-hdtree.h5 the-tree
/**
 * @file save.cxx
 * Example of saving a new HDTree into a file
 */

// for printing error messages
#include <iostream>

// for generating random data
#include <random>

// for interacting with HDTrees
#include "hdtree/Tree.h"

// utility functions for example programs
#include "examples.h"

int main(int argc, char** argv) try {
  /**
   * parse command line for arguments
   */
  std::string filename, treename;
  int rc = hdtree::examples::parse_single_file_args(argc, argv, filename, treename);
  if (rc != 0) return rc;

  /**
   * Create a tree by defining what file it is in
   * and where it resides within that file
   */
  auto tree = hdtree::Tree::save(filename, treename);

  /**
   * Create branches to define what type of information will
   * go into the HDTree. The hdtree::Tree::branch function
   * returns a handle to the created hdtree::Branch object.
   * This object can (and should) be used to interact with
   * the values that will be stored in the HDTree on disk
   * in order to reduce the number of in-memory copies that
   * need to happen. Here, we use `auto&` to avoid typing
   * out all the C++ template nonsense that hdtree::Branch
   * does under-the-hood.
   *
   * Each branch handle can be treated as a pointer
   * to the underlying type.
   *
   * **Note**: Branch handles are invalid after the tree they
   * were created from is deleted.
   */
  auto& i_entry = tree.branch<std::size_t>("i_entry");
  auto& rand_nums = tree.branch<std::vector<double>>("rand_nums");

  /**
   * Initialization of random number generation.
   * Not really applicable to HDTree, just used here to
   * show that varying length vectors can be serialized
   * with ease
   */
  std::mt19937 rng;  // no argument -> no seed
  std::uniform_real_distribution<double> norm(0., 1.);
  std::uniform_int_distribution<std::size_t> uniform(1, 100);

  /**
   * Actual update and filling of the HDTree.
   *
   * You can see here how we can treat `i_entry` 
   * as if it was a properly initialized `std::size_t *` 
   * and `rand_nums` as if it was a properly 
   * initialized `std::vector<double> *`.
   */
  for (std::size_t i{0}; i < 100; ++i) {
    *i_entry = i;
    std::size_t size = uniform(rng);
    for (std::size_t j{0}; j < size; j++) {
      rand_nums->push_back(norm(rng));
    }

    /**
     * We choose to save each value of the loop into the tree.
     */
    tree.save();
  }

  /**
   * The final flushing of the data to disk as well as handle
   * cleanup procedures will all be handled automatically by
   * deconstruction.
   */
  return 0;
} catch (const hdtree::HDTreeException& e) {
  std::cerr << "ERROR " << e << std::endl;
  return 1;
}

read and write (transform or inplace)

Another common task is to perform calculations on some input data and save those calculations into the tree as well. This raises the question of what should be done with the original data: should we (a) copy the original data and write it to a new file with the new data or (b) write the new data into the input file alongside the original data? In the HDTree C++ API, option (a) is achieved with transform and option (b) is done with inplace. Both can be run from the same executable; the choice depends on whether you give a new file and tree name or not.

# this will use hdtree::Tree::transform
hdtree-eg-transform my-first-hdtree.h5 the-tree my-second-hdtree.h5 the-second-tree
# this will use hdtree::Tree::inplace
hdtree-eg-transform my-first-hdtree.h5 the-tree
/**
 * @file transform.cxx
 * Example of transforming an HDTree by adding more branches
 * 
 * This example determines whether a tree should be copied into
 * a new file or simply transformed in its current file by what
 * arguments are provided to the program. We assume the input
 * tree was generated by the hdtree-eg-save example program
 * defined in @ref save.cxx (i.e. we look for specific branches).
 */

// for printing error messages
#include <iostream>

// for std::reduce
#include <numeric>

// for interacting with HDTrees
#include "hdtree/Tree.h"

// utility functions for example programs
#include "examples.h"

int main(int argc, char** argv) try {
  /**
   * parse command line for arguments
   */
  std::pair<std::string,std::string> src, dest;
  int rc = hdtree::examples::parse_two_file_args(argc, argv, src, dest);
  if (rc != 0) return rc;

  /**
   * Wrap an existing on-disk HDTree
   *
   * Here is where we make the decision on whether to copy a tree
   * into a new file or not. We choose to copy the tree into
   * a new file if a destination file and tree are provided on
   * the command line. We use the slightly-ugly ternary operator
   * in order to avoid unnecessary copying from an if-else tree.
   */
  auto tree = dest.first.empty() ?
    hdtree::Tree::inplace(src.first, src.second) :
    hdtree::Tree::transform(src, dest);

  /**
   * We are going to calculate the average of the random
   * numbers within each tree entry, so we create a new
   * branch to store that result as well as retrieve
   * the branch with the numbers we will use.
   */
  auto& rand_nums = tree.get<std::vector<double>>("rand_nums");
  auto& avg = tree.branch<double>("avg");

  /**
   * Actual update and filling of the HDTree.
   *
   * We use a tree helper that will make sure we go through
   * each entry in the tree, calling the hdtree::Tree::load
   * at the beginning and hdtree::Tree::save at the end of
   * each run in the loop. This code is essentially equivalent to
   * ```cpp
   * for (std::size_t i{0}; i < tree.entries(); ++i) {
   *   tree.load();
   *   // the code inside the lambda function below
   *   if (rand_nums->size() > 0) {
   *     *avg = (std::reduce(rand_nums->begin(), rand_nums->end()))/rand_nums->size();
   *   } else {
   *     *avg = -1;
   *   }
   *   // 
   *   tree.save();
   * }
   * ```
   * Just using this example to show off some potentially-helpful
   * features - if lambda functions are causing you difficulty, 
   * feel free to avoid them. Just make sure to remember to call
   * the load and save functions!
   */
  tree.for_each([&]() {
        if (rand_nums->size() > 0) {
          *avg = (std::reduce(rand_nums->begin(), rand_nums->end()))/rand_nums->size();
        } else {
          *avg = -1;
        }
      });

  /**
   * The final flushing of the data to disk as well as handle
   * cleanup procedures will all be handled automatically by
   * deconstruction.
   */
  return 0;
} catch (const hdtree::HDTreeException& e) {
  std::cerr << "ERROR " << e << std::endl;
  return 1;
}

read-only (load)

Finally, the last common task is reading in the data from the tree and using it to do some other task (e.g. making a plot or fitting the data with some model). In this API, that is called loading and the example program included prints a simple histogram of the averages of the original data generated earlier.

Fun Fact: This is an example of the central limit theorem!

# this will error-out if you didn't run step two!
hdtree-eg-load my-first-hdtree.h5 the-tree
# the below is example output, it may change since the random data may change!
0.X | Num Entries
< 0 |
0.0 |
0.1 |*
0.2 |
0.3 |***
0.4 |********************************************
0.5 |*************************************************
0.6 |**
0.7 |
0.8 |*
0.9 |
> 1 |
/**
 * @file load.cxx
 * Example of loading an existing HDTree and reading its branches
 * 
 * This example reads the averages stored in a tree and prints
 * a simple histogram of them. We assume the input tree was
 * generated by the hdtree-eg-save example program defined in
 * @ref save.cxx and then updated by the hdtree-eg-transform
 * example program defined in @ref transform.cxx (i.e. we look
 * for specific branches).
 */

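// for floor
#include <cmath>

// for printf
#include <cstdio>

// for printing messages
#include <iostream>

// for interacting with HDTrees
#include "hdtree/Tree.h"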

// utility functions for example programs
#include "examples.h"

int main(int argc, char** argv) try {
  /**
   * parse command line for arguments
   */
  std::string file_name, tree_name;
  int rc = hdtree::examples::parse_single_file_args(argc, argv, file_name, tree_name);
  if (rc != 0) return rc;

  /**
   * Wrap an existing on-disk HDTree
   */
  auto tree = hdtree::Tree::load(file_name, tree_name);

  std::cout << "This is what a missing branch exception looks like:" << std::endl;
  try {
    tree.get<double>("dne");
  } catch (const hdtree::HDTreeException& e) {
    // demonstrate what exceptions look like.
    std::cout << e << std::endl;
  }
  std::cout << "--- end of example exception ---" << std::endl;

  /**
   * We want to study the average of the random data
   * in each entry. This average was calculated in
   * the examples/transform.cxx program so this part
   * will fail if running on a file that wasn't updated
   * by transform!
   */
  const auto& avg = tree.get<double>("avg");

  /**
   * Our very simple histogram is going to be 10 bins with
   * an underflow bin (everything below 0) and an overflow bin
   * (everything above 1).
   *
   * Since the random data is between 0 and 1, we can calculate
   * the bin index very quickly 
   *
   *  floor(avg * 10)+1
   *
   * We will include the value of exactly 1 in the last bin
   * and have a special bin for the entries without any data
   * from which to calculate an average.
   */
  std::vector<unsigned int> hist_bins(12, 0);

  /**
   * Actual loop over the tree.
   *
   * We use a tree helper that will make sure we go through
   * each entry in the tree, calling the hdtree::Tree::load
   * at the beginning of each run in the loop.
   * This code is essentially equivalent to
   * ```cpp
   * for (std::size_t i{0}; i < tree.entries(); ++i) {
   *   tree.load();
   *   // the code in the lambda function below
   * }
   * ```
   * Just using this example to show off some potentially-helpful
   * features - if lambda functions are causing you difficulty, 
   * feel free to avoid them. Just make sure to remember to call
   * the load and save functions!
   */
  tree.for_each([&]() {
        std::size_t i_bin{0};
        if (*avg < 0) {
          i_bin = 0; 
        } else if (*avg > 1) {
          i_bin = 11;
        } else {
          i_bin = floor(*avg * 10) + 1;
        }
        ++hist_bins[i_bin];
      });

  printf("0.X | Num Entries\n");
  for (std::size_t i_bin{0}; i_bin < 12; ++i_bin) {
    std::string x;
    if (i_bin == 0) {
      x = "< 0";
    } else if (i_bin == 11) {
      x = "> 1";
    } else {
      x = "0."+std::to_string(i_bin-1);
    }
    printf("%s |", x.c_str());
    for (std::size_t c{0}; c < hist_bins.at(i_bin); ++c) printf("*");
    printf("\n");
  }

  /**
   * The final flushing of the data to disk as well as handle
   * cleanup procedures will all be handled automatically by
   * deconstruction.
   */
  return 0;
} catch (const hdtree::HDTreeException& e) {
  std::cerr << "ERROR " << e << std::endl;
  return 1;
}
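The binning rule used in the example above can be isolated into a small function, which makes it easier to check by hand. The sketch below is my own variation and not code from the repository: `bin_index` is a hypothetical name, and it makes one deliberate difference by explicitly sending a value of exactly 1 into the last regular bin, as the comment in the example describes, since the plain formula floor(avg * 10) + 1 evaluates to 11 for that value.

```cpp
#include <cmath>
#include <cstddef>

// Map an average onto one of 12 bins:
//   bin 0      : underflow (avg < 0, e.g. the -1 used for entries without data)
//   bins 1-10  : ten equal-width bins covering [0, 1]
//   bin 11     : overflow (avg > 1)
std::size_t bin_index(double avg) {
  if (avg < 0.) return 0;
  if (avg > 1.) return 11;
  if (avg == 1.) return 10;  // keep exactly 1 in the last regular bin
  return static_cast<std::size_t>(std::floor(avg * 10.)) + 1;
}
```

For instance, an average of 0.25 gives floor(2.5) + 1 = 3, which is the bin labelled "0.2" in the printed histogram.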

User-Defined Data Structures

User-defined objects can also be serialized within HDTree. Simplified schema evolution (a la ROOT's ClassDef macro) is also available; however, this example merely shows the required boiler-plate.

HDTree's C++ API has chosen to avoid automatically deducing the on-disk naming from the in-memory class member names. This introduces more boilerplate, but, in my opinion, is helpful for essentially documenting how on-disk data was generated.

/**
 * @file user_classes.cxx
 * Example of saving and loading user-defined C++ classes 
 */

// for sqrt
#include <cmath>

// for printing error messages
#include <iostream>

// for generating random data
#include <random>

// for interacting with HDTrees
#include "hdtree/Tree.h"

// utility functions for example programs
#include "examples.h"

/**
 * Example user class
 */
class MyData {
  float x_, y_, z_;
  // grant hdtree access so we can keep the `attach` method private
  friend class hdtree::access;
  // this is where the name of data on disk is assigned to the
  // variable name of data in memory
  template <typename Branch>
  void attach(Branch& b) {
    b.attach("x", x_);
    b.attach("y", y_);
    b.attach("z", z_);
  }
 public:
  MyData() = default;
  MyData(float x, float y, float z)
    : x_{x}, y_{y}, z_{z} {}
  // HDTree also requires classes to have a `clear` method
  // for resetting the instance to a "non-assigned" state
  void clear() {
    x_ = 0.;
    y_ = 0.;
    z_ = 0.;
  }
  // helper function since we know what this data means
  float mag() const {
    return sqrt(x_*x_+y_*y_+z_*z_);
  }
};

int main(int argc, char** argv) try {
  /**
   * parse command line for arguments
   */
  std::string filename, treename;
  int rc = hdtree::examples::parse_single_file_args(argc, argv, filename, treename);
  if (rc != 0) return rc;

  { // write a simple file with some random data points
    auto tree = hdtree::Tree::save(filename, treename);
    /**
     * Once the MyData::attach method is written, it can be put
     * into STL containers (or as a member of other user classes)
     * like any other serializable class
     */
    auto& my_data = tree.branch<std::vector<MyData>>("my_data");
    // initialization of random number generator
    std::mt19937 rng;  // no argument -> no seed
    std::uniform_real_distribution<double> norm(0., 1.);
    std::uniform_int_distribution<std::size_t> uniform(1, 100);

    for (std::size_t i{0}; i < 100; ++i) {
      std::size_t size = uniform(rng);
      for (std::size_t j{0}; j < size; ++j) {
        my_data->emplace_back(norm(rng), norm(rng), norm(rng));
      }

      tree.save();
    }

    // final flushing accomplished when tree and its branches
    // go out of scope and are destructed
  }

  { // load back from same file and write the average mag as a new branch
    auto tree = hdtree::Tree::inplace(filename, treename);
    auto& my_data = tree.get<std::vector<MyData>>("my_data");
    auto& avg_mag = tree.branch<float>("avg_mag");
    tree.for_each([&]() {
        if (my_data->size() > 0) {
          float tot_mag = 0.;
          for (const MyData& d : *my_data) {
            tot_mag += d.mag();
          }
          *avg_mag = tot_mag/my_data->size();
        } else {
          *avg_mag = -1;
        }
    });

    // final flushing accomplished when tree and its branches
    // go out of scope and are destructed
  }

  return 0;
} catch (const hdtree::HDTreeException& e) {
  std::cerr << "ERROR " << e << std::endl;
  return 1;
}
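To see what the `attach` boilerplate buys you, it can help to call it with a stand-in object that only records names. The sketch below is an illustration under my own assumptions and does not use hdtree: `NameRecorder` and `Position` are hypothetical, and `attach` is public here only to keep the example short.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the object passed to `attach`: instead of
// creating sub-branches, it only remembers the names it was given.
struct NameRecorder {
  std::vector<std::string> names;
  template <typename T>
  void attach(const std::string& name, T& /*member*/) {
    names.push_back(name);
  }
};

// A small user class following the same pattern as MyData above.
class Position {
  float x_{0.f}, y_{0.f};

 public:
  // The strings decide the names used for the stored data;
  // the member names x_ and y_ never leave the class.
  template <typename Branch>
  void attach(Branch& b) {
    b.attach("x", x_);
    b.attach("y", y_);
  }
};
```

Passing a `NameRecorder` to `Position::attach` leaves it holding "x" and "y", which shows that renaming the private members would not change the recorded names.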

More Intense Use Case

The C++ HDTree API is mainly implemented through its various Branch classes. The Tree class is mainly there to be a helpful interface for handling a set of Branches. I point this out because if you are interested in building a larger data processing framework around the C++ HDTree API, I would suggest focusing on writing your own version of Tree to accommodate your needs rather than attempting to use the Tree that is a part of this repository.

Performance

Since ROOT is written in C++, using the C++ API for HDTree is the closest to an apples-to-apples comparison we can have between the two formats.

This page details a comparison between the two attempting to isolate the serialization performance of the two libraries.

Writing

Reading

Generating hdtree-cpp docs

The hdtree-cpp documentation is generated with Doxygen using the fancy doxygen-awesome theme. In order to obtain the same styling as the online documentation, you must make sure the doxygen-awesome submodule is downloaded. You can do this with

git submodule update --init

You can generate a local copy of the documentation after installing doxygen and sphinx. We assume that doxygen is run from the root directory of the fire repository.

doxygen docs/doxyfile

The online documentation includes hyperlinks that jump between the C++ documentation generated by doxygen and the Python documentation generated by sphinx. These hyperlinks refer to the root directory of the destination github site and so they will not function when building the documentation locally.

Diagrams

Specialized diagrams were created with diagrams.net and then exported to a SVG file for inclusion in the generated HTML. Files ending in .drawio are versions of these diagrams that can be loaded by diagrams.net in order to continue with a current version of the diagram. Files of the same name but ending in .svg are the images actually included in the docs.