Format Specification

This document specifies the TACO (Trajectory and Compressed Object) file format.

File Structure

A TACO file consists of:

  1. Magic bytes: Identify the file as a TACO file

  2. Header: Contains metadata about the trajectory

  3. Frame index: A list of offsets to each frame

  4. Frames: The actual trajectory data

Magic Bytes

TACO files begin with the four ASCII magic bytes “TACO”, followed by a single version byte.
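As a minimal sketch, a reader could validate these first five bytes before parsing anything else. The function name read_magic_and_version is hypothetical and not part of any TACO API; the check only assumes what is stated above.

```python
import io

MAGIC = b"TACO"

def read_magic_and_version(stream):
    """Return the version byte, or raise ValueError if the magic does not match."""
    prefix = stream.read(5)  # 4 magic bytes + 1 version byte
    if len(prefix) < 5 or prefix[:4] != MAGIC:
        raise ValueError("not a TACO file")
    return prefix[4]  # indexing bytes yields an int

# Usage with an in-memory stand-in for a file on disk
version = read_magic_and_version(io.BytesIO(b"TACO\x01rest of file"))
```

Rejecting a file at this point avoids interpreting arbitrary bytes as a header size or frame count.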

Frame Index

The frame index is a list of 8-byte offsets, one per frame, each pointing to the start of that frame in the file. As shown in the binary layout, it is preceded by an 8-byte frame count. This allows random access to any frame without reading the entire file.
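The random-access pattern can be sketched as follows, assuming the offsets have already been decoded into a Python list. The helper read_frame_blob is hypothetical, and the sketch further assumes that frames are stored contiguously in index order, so that a frame ends where the next one begins (or at the end of the file for the last frame).

```python
import io

def read_frame_blob(stream, offsets, k, file_size):
    """Return the raw bytes of frame k without reading any other frame."""
    start = offsets[k]
    # Assumption: frames are contiguous, so the next offset marks the end.
    end = offsets[k + 1] if k + 1 < len(offsets) else file_size
    stream.seek(start)
    return stream.read(end - start)

# Toy file: 10 bytes standing in for everything before the frames,
# followed by three frame blobs of different sizes.
blobs = [b"aaa", b"bb", b"cccc"]
data = b"\x00" * 10 + b"".join(blobs)
offsets = [10, 13, 15]

frame_1 = read_frame_blob(io.BytesIO(data), offsets, 1, len(data))
```

Only one seek and one read are needed per frame, regardless of how many frames precede it.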

Frames

Each frame contains:

  • Frame number: Monotonically increasing frame number

  • Time: Simulation time in picoseconds

  • Box dimensions: Optional periodic box dimensions

  • Positions: Atom positions (potentially delta-encoded)

  • Velocities: Optional atom velocities (potentially delta-encoded)

  • Forces: Optional atom forces (potentially delta-encoded)

  • Energy components: Optional potential and kinetic energies

  • Thermostat data: Optional temperature and pressure
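One way to picture a decoded frame in memory is a record whose optional fields default to None. The class and field names below are illustrative assumptions, not TACO's actual API, and the sketch says nothing about how the fields are encoded on disk.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Frame:
    frame_number: int                        # monotonically increasing
    time_ps: float                           # simulation time in picoseconds
    positions: List[Vec3]                    # one entry per atom
    box: Optional[Tuple[float, ...]] = None  # periodic box dimensions, if stored
    velocities: Optional[List[Vec3]] = None
    forces: Optional[List[Vec3]] = None
    potential_energy: Optional[float] = None
    kinetic_energy: Optional[float] = None
    temperature: Optional[float] = None
    pressure: Optional[float] = None

# A frame that carries only the required fields
frame = Frame(frame_number=0, time_ps=0.0, positions=[(0.0, 0.0, 0.0)])
```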

Delta Compression

TACO uses delta compression to reduce file size:

  1. Full frames (“keyframes”) are stored at regular intervals

  2. Intermediate frames store only the differences from the previous frame

  3. The full_frame_interval setting controls how often full frames are stored

This approach is particularly effective for molecular dynamics trajectories, where consecutive frames are typically similar and the stored differences are therefore small.
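The keyframe-plus-delta scheme can be illustrated with a small sketch. It assumes each frame is a flat list of integers (for example, already quantized coordinates) and uses hypothetical encode and decode helpers; TACO's actual on-disk encoding may differ.

```python
def encode(frames, full_frame_interval):
    """Emit ("key", values) at every interval, ("delta", diffs) otherwise."""
    records = []
    for i, frame in enumerate(frames):
        if i % full_frame_interval == 0:
            records.append(("key", list(frame)))
        else:
            prev = frames[i - 1]
            records.append(("delta", [c - p for c, p in zip(frame, prev)]))
    return records

def decode(records):
    """Rebuild full frames by adding each delta to the previous frame."""
    frames = []
    for kind, values in records:
        if kind == "key":
            frames.append(list(values))
        else:
            prev = frames[-1]
            frames.append([p + d for p, d in zip(prev, values)])
    return frames

frames = [[1000, 2000], [1001, 1999], [1003, 1998], [1004, 2001]]
records = encode(frames, full_frame_interval=3)
restored = decode(records)
```

With an interval of 3, frames 0 and 3 are stored in full and frames 1 and 2 as small differences. In a scheme like this, reconstructing an intermediate frame means starting from the most recent keyframe, so a smaller interval trades file size for cheaper random access.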

Precision Control

TACO allows control over numerical precision:

  • position_precision: Controls position quantization (default: 0.001 Å)

  • velocity_precision: Controls velocity quantization (default: 0.001 Å/ps)

  • force_precision: Controls force quantization (default: 0.01 eV/Å)

Setting a value to 0 disables quantization for the corresponding quantity, so it is stored losslessly.
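To show what a precision setting implies, here is a hedged sketch of quantization by rounding to the nearest multiple of the precision. Rounding to nearest is an assumption, as are the helper names quantize and dequantize; the source only states the default step sizes and that 0 means no quantization.

```python
def quantize(values, precision):
    """Map floats to integer multiples of precision; 0 leaves values unchanged."""
    if precision == 0:
        return list(values)  # lossless: no quantization applied
    # Assumption: round to the nearest multiple of precision.
    return [round(v / precision) for v in values]

def dequantize(stored, precision):
    """Invert quantize, up to an error of at most precision / 2 per value."""
    if precision == 0:
        return list(stored)
    return [q * precision for q in stored]

positions = [1.23456, -0.50049]           # in Å
stored = quantize(positions, 0.001)       # default position_precision
restored = dequantize(stored, 0.001)
```

Under this assumption the reconstruction error per coordinate is bounded by half the precision, here 0.0005 Å.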

Binary Layout

The binary layout of a TACO file is as follows:

+----------------+
| Magic "TACO"   | 4 bytes
+----------------+
| Version        | 1 byte
+----------------+
| Header size    | 4 bytes
+----------------+
| Header data    | Variable size
+----------------+
| Frame count    | 8 bytes
+----------------+
| Frame offsets  | 8 bytes × frame count
+----------------+
| Frame 0 data   | Variable size
+----------------+
| Frame 1 data   | Variable size
+----------------+
| ...            | ...
+----------------+
| Frame N data   | Variable size
+----------------+

Each frame is stored as its own compressed blob containing all of that frame's data.
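The layout above can be walked with Python's struct module. This is a sketch under stated assumptions: the specification does not give a byte order or signedness here, so little-endian unsigned integers are assumed, and the function name read_preamble is hypothetical. The header is returned as raw bytes because its encoding is not described in this section.

```python
import io
import struct

def read_preamble(stream):
    """Parse everything before the frame data: version, header bytes, offsets."""
    if stream.read(4) != b"TACO":
        raise ValueError("bad magic")
    # Assumption: all integers are little-endian and unsigned.
    (version,) = struct.unpack("<B", stream.read(1))
    (header_size,) = struct.unpack("<I", stream.read(4))
    header = stream.read(header_size)
    (frame_count,) = struct.unpack("<Q", stream.read(8))
    offsets = struct.unpack(f"<{frame_count}Q", stream.read(8 * frame_count))
    return version, header, list(offsets)

# Build a toy file with a 2-byte header and two frame blobs.
header = b"{}"
blobs = [b"f0", b"f1!"]
first = 4 + 1 + 4 + len(header) + 8 + 8 * len(blobs)  # where frame 0 starts
data = (
    b"TACO"
    + struct.pack("<B", 1)
    + struct.pack("<I", len(header))
    + header
    + struct.pack("<Q", len(blobs))
    + struct.pack("<2Q", first, first + len(blobs[0]))
    + b"".join(blobs)
)

version, parsed_header, offsets = read_preamble(io.BytesIO(data))
```

After this call the stream is positioned at the start of frame 0, and the offsets can be used to seek directly to any later frame.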