Format Specification
This document specifies the TACO (Trajectory and Compressed Object) file format.
File Structure
A TACO file consists of:
Magic bytes: Identify the file as a TACO file
Header: Contains metadata about the trajectory
Frame index: A list of offsets to each frame
Frames: The actual trajectory data
Magic Bytes
TACO files begin with the four magic bytes “TACO”, followed by a single version byte.
Header
The header contains:
Format version: Version of the TACO format
Number of atoms: Number of atoms in each frame
Time step: Time step between frames in picoseconds
Periodic boundary conditions: Boolean flags for PBC in each dimension
Full frame interval: How often full frames are stored (for delta compression)
Compression settings: Zstd compression level and precision settings
Simulation metadata: Optional information about the simulation
Atom metadata: Optional information about atoms (elements, masses, etc.)
Frame Index
The frame index is a list of 8-byte offsets, one per frame, each pointing to the start of that frame in the file. The list is preceded by an 8-byte frame count (see Binary Layout). This allows random access to any frame without reading the entire file.
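As an illustration of how the index enables random access, here is a minimal Python sketch. It rests on assumptions this document does not state: offsets are absolute positions from the start of the file, frames are stored back to back in index order, and the last frame runs to the end of the data. The function name read_frame_blob is hypothetical and not part of any TACO API.

```python
import io

def read_frame_blob(stream, offsets, index, end_of_data):
    """Return the raw (still compressed) bytes of frame `index`."""
    if not 0 <= index < len(offsets):
        raise IndexError("frame index out of range")
    start = offsets[index]
    # Assumption: a frame ends where the next one starts, and the last
    # frame ends at `end_of_data`.
    stop = offsets[index + 1] if index + 1 < len(offsets) else end_of_data
    stream.seek(start)
    return stream.read(stop - start)

# Toy data: 6 bytes standing in for everything before the frames,
# followed by three frame blobs of different sizes.
data = b"HEADER" + b"aaaa" + b"bb" + b"cccccc"
offsets = [6, 10, 12]
stream = io.BytesIO(data)

second = read_frame_blob(stream, offsets, 1, end_of_data=len(data))
last = read_frame_blob(stream, offsets, 2, end_of_data=len(data))
```

Only the requested blob is read; the bytes of the other frames are never touched.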
Frames
Each frame contains:
Frame number: Monotonically increasing frame number
Time: Simulation time in picoseconds
Box dimensions: Optional periodic box dimensions
Positions: Atom positions (potentially delta-encoded)
Velocities: Optional atom velocities (potentially delta-encoded)
Forces: Optional atom forces (potentially delta-encoded)
Energy components: Optional potential and kinetic energies
Thermostat data: Optional temperature and pressure
Delta Compression
TACO uses delta compression to reduce file size:
Full frames (“keyframes”) are stored at regular intervals
Intermediate frames store only the differences from the previous frame
The full_frame_interval setting controls how often full frames are stored
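This approach is particularly effective for molecular dynamics trajectories, because consecutive frames typically differ only slightly. The trade-off is that reconstructing an intermediate frame requires starting from the most recent full frame and applying each stored difference in turn, so a smaller full_frame_interval makes random access cheaper at the cost of a larger file.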
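The scheme can be sketched in a few lines of Python. This is an illustration, not the actual TACO encoder. It assumes that frame 0 and every full_frame_interval-th frame after it are full frames, and that coordinates have already been quantized to integers so that differences are exact. The names encode and decode_frame are hypothetical.

```python
def encode(frames, full_frame_interval):
    """Return a list of ("full", values) or ("delta", differences) entries."""
    encoded = []
    previous = None
    for i, frame in enumerate(frames):
        if i % full_frame_interval == 0:
            # Keyframe: store the values as they are.
            encoded.append(("full", list(frame)))
        else:
            # Intermediate frame: store the change since the previous frame.
            encoded.append(("delta", [c - p for c, p in zip(frame, previous)]))
        previous = frame
    return encoded

def decode_frame(encoded, index, full_frame_interval):
    """Rebuild one frame, starting from the nearest preceding keyframe."""
    keyframe = index - index % full_frame_interval
    kind, values = encoded[keyframe]
    if kind != "full":
        raise ValueError("expected a full frame at the keyframe position")
    current = list(values)
    for i in range(keyframe + 1, index + 1):
        _, delta = encoded[i]
        current = [c + d for c, d in zip(current, delta)]
    return current

# Five frames of two (already quantized) coordinates each.
frames = [[100, 200], [101, 199], [103, 199], [104, 198], [104, 197]]
encoded = encode(frames, full_frame_interval=3)
restored = [decode_frame(encoded, i, 3) for i in range(len(frames))]
```

With an interval of 3, frames 0 and 3 are stored in full, and decoding frame 4 touches only frames 3 and 4. The small differences are what make the subsequent compression step effective.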
Precision Control
TACO allows control over numerical precision:
position_precision: Controls position quantization (default: 0.001 Å)
velocity_precision: Controls velocity quantization (default: 0.001 Å/ps)
force_precision: Controls force quantization (default: 0.01 eV/Å)
Setting these values to 0 enables lossless storage (no quantization).
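To make the effect of a precision setting concrete, the sketch below quantizes values by rounding to the nearest integer multiple of the precision. That rounding rule is an assumption, since this document does not say how TACO quantizes internally, and the names quantize and dequantize are hypothetical.

```python
def quantize(values, precision):
    """Map floats to integer multiples of `precision` (round to nearest)."""
    if precision <= 0:
        # A precision of 0 means lossless storage, so there is nothing to quantize.
        raise ValueError("precision must be positive")
    return [round(v / precision) for v in values]

def dequantize(quantized, precision):
    """Recover approximate floats from the stored integers."""
    return [q * precision for q in quantized]

positions = [1.23456, -0.00049, 10.0]          # in Å
stored = quantize(positions, 0.001)            # default position_precision
restored = dequantize(stored, 0.001)
max_error = max(abs(a - b) for a, b in zip(positions, restored))
```

Under this rounding rule the reconstruction error is at most half the precision, so the default position_precision of 0.001 Å would bound the error at 0.0005 Å per coordinate.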
Binary Layout
The binary layout of a TACO file is as follows:
+----------------+
| Magic "TACO"   | 4 bytes
+----------------+
| Version        | 1 byte
+----------------+
| Header size    | 4 bytes
+----------------+
| Header data    | Variable size
+----------------+
| Frame count    | 8 bytes
+----------------+
| Frame offsets  | 8 bytes × frame count
+----------------+
| Frame 0 data   | Variable size
+----------------+
| Frame 1 data   | Variable size
+----------------+
| ...            | ...
+----------------+
| Frame N data   | Variable size
+----------------+
Each frame is stored as its own compressed blob, which contains all of that frame's data.
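The layout above can be walked with Python's struct module. The sketch below is an illustration under stated assumptions: all integers are taken to be unsigned and little-endian, which this document does not specify, and the header is returned as opaque bytes because its internal encoding is not described here. The function name parse_preamble is hypothetical.

```python
import io
import struct

def parse_preamble(stream):
    """Read everything up to the first frame: version, header bytes, offsets."""
    if stream.read(4) != b"TACO":
        raise ValueError("missing TACO magic bytes")
    # "<" = little-endian (assumption); B, I, Q = unsigned 1, 4, 8 bytes.
    (version,) = struct.unpack("<B", stream.read(1))
    (header_size,) = struct.unpack("<I", stream.read(4))
    header = stream.read(header_size)
    (frame_count,) = struct.unpack("<Q", stream.read(8))
    offsets = struct.unpack(f"<{frame_count}Q", stream.read(8 * frame_count))
    return version, header, list(offsets)

# Build a tiny synthetic file that follows the layout, then parse it back.
header = b"opaque-header"
blobs = [b"frame0", b"frame1!"]
first_offset = 4 + 1 + 4 + len(header) + 8 + 8 * len(blobs)
offsets = [first_offset, first_offset + len(blobs[0])]
data = (
    b"TACO"
    + struct.pack("<B", 1)
    + struct.pack("<I", len(header))
    + header
    + struct.pack("<Q", len(blobs))
    + struct.pack("<2Q", *offsets)
    + b"".join(blobs)
)

version, parsed_header, parsed_offsets = parse_preamble(io.BytesIO(data))
```

The synthetic file treats offsets as absolute positions from the start of the file, which is also an assumption. A real reader would go on to decompress each blob and decode the header fields.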