Format Specification
====================

This document describes the TACO (Trajectory and Compressed Object) file format specification.

File Structure
--------------

A TACO file consists of:

1. **Magic bytes**: Identify the file as a TACO file
2. **Header**: Contains metadata about the trajectory
3. **Frame index**: A list of offsets to each frame
4. **Frames**: The actual trajectory data

Magic Bytes
-----------

TACO files begin with the magic bytes "TACO" followed by a version byte.

Header
------

The header contains:

- **Format version**: Version of the TACO format
- **Number of atoms**: Number of atoms in each frame
- **Time step**: Time step between frames in picoseconds
- **Periodic boundary conditions**: Boolean flags for PBC in each dimension
- **Full frame interval**: How often full frames are stored (for delta compression)
- **Compression settings**: Zstd compression level and precision settings
- **Simulation metadata**: Optional information about the simulation
- **Atom metadata**: Optional information about atoms (elements, masses, etc.)

Frame Index
-----------

The frame index is a list of 8-byte offsets pointing to the start of each frame in the file. This allows random access to any frame without reading the entire file.

Frames
------

Each frame contains:

- **Frame number**: Monotonically increasing frame number
- **Time**: Simulation time in picoseconds
- **Box dimensions**: Optional periodic box dimensions
- **Positions**: Atom positions (potentially delta-encoded)
- **Velocities**: Optional atom velocities (potentially delta-encoded)
- **Forces**: Optional atom forces (potentially delta-encoded)
- **Energy components**: Optional potential and kinetic energies
- **Thermostat data**: Optional temperature and pressure

Delta Compression
-----------------

TACO uses delta compression to reduce file size:

1. Full frames ("keyframes") are stored at regular intervals
2. Intermediate frames store only the differences from the previous frame
3. The ``full_frame_interval`` setting controls how often full frames are stored

This approach is particularly effective for molecular dynamics trajectories where consecutive frames are similar.

Precision Control
-----------------

TACO allows control over numerical precision:

- **position_precision**: Controls position quantization (default: 0.001 Å)
- **velocity_precision**: Controls velocity quantization (default: 0.001 Å/ps)
- **force_precision**: Controls force quantization (default: 0.01 eV/Å)

Setting these values to 0 enables lossless storage (no quantization).

Binary Layout
-------------

The binary layout of a TACO file is as follows:

.. code-block:: text

    +----------------+
    | Magic "TACO"   |  4 bytes
    +----------------+
    | Version        |  1 byte
    +----------------+
    | Header size    |  4 bytes
    +----------------+
    | Header data    |  Variable size
    +----------------+
    | Frame count    |  8 bytes
    +----------------+
    | Frame offsets  |  8 bytes × frame count
    +----------------+
    | Frame 0 data   |  Variable size
    +----------------+
    | Frame 1 data   |  Variable size
    +----------------+
    | ...            |  ...
    +----------------+
    | Frame N data   |  Variable size
    +----------------+

Each frame has its own compressed blob that contains all frame data.
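
As an illustration of the layout above, the following sketch reads the fixed-size fields and the frame offset table from a byte stream. It is not the reference reader. The specification text does not state byte order, signedness, or whether offsets are relative to the start of the file, so the sketch assumes little-endian unsigned integers and absolute offsets. The names ``read_container`` and ``build_example`` are hypothetical, and the header and frame payloads in the example are placeholders rather than real TACO data.

.. code-block:: python

    import io
    import struct

    MAGIC = b"TACO"


    def _read_exact(stream, size):
        """Read exactly `size` bytes or raise if the stream is truncated."""
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated TACO file")
        return data


    def read_container(stream):
        """Return (version, raw_header_bytes, frame_offsets).

        ASSUMPTION: all integers are little-endian and unsigned.
        """
        if _read_exact(stream, 4) != MAGIC:
            raise ValueError("missing TACO magic bytes")
        # 1-byte version followed by a 4-byte header size.
        version, header_size = struct.unpack("<BI", _read_exact(stream, 5))
        header = _read_exact(stream, header_size)
        # 8-byte frame count, then one 8-byte offset per frame.
        (frame_count,) = struct.unpack("<Q", _read_exact(stream, 8))
        raw_offsets = _read_exact(stream, 8 * frame_count)
        offsets = list(struct.unpack(f"<{frame_count}Q", raw_offsets))
        return version, header, offsets


    def build_example():
        """Build a tiny synthetic file with placeholder header and frames."""
        header = b"{}"                        # placeholder, not a real header
        frames = [b"frame-0", b"frame-one"]   # placeholder frame blobs
        first = 4 + 1 + 4 + len(header) + 8 + 8 * len(frames)
        offsets, pos = [], first
        for blob in frames:
            # ASSUMPTION: offsets are absolute positions in the file.
            offsets.append(pos)
            pos += len(blob)
        data = (
            MAGIC
            + struct.pack("<BI", 1, len(header))
            + header
            + struct.pack("<Q", len(frames))
            + struct.pack(f"<{len(frames)}Q", *offsets)
            + b"".join(frames)
        )
        return data, frames


    data, frames = build_example()
    version, header, offsets = read_container(io.BytesIO(data))
    # Random access: slice a frame directly from its offset.
    second = data[offsets[1]:offsets[1] + len(frames[1])]

Because the offset table precedes the frame data, a reader under these assumptions can seek straight to any frame after parsing only the prefix of the file.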
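
The interaction between precision control and delta compression can also be sketched in a few lines. The exact scheme TACO uses is not spelled out in this document, so the following is only one plausible reading: values are rounded to integer multiples of the precision, a full frame is emitted every ``full_frame_interval`` frames, and every other frame stores integer differences from the previous frame. The function names are hypothetical. The lossless case (precision 0) and the Zstd compression step are deliberately left out.

.. code-block:: python

    def quantize(values, precision):
        """Round each value to the nearest integer multiple of `precision`."""
        if precision <= 0:
            raise ValueError("this sketch covers only the quantized case")
        return [round(v / precision) for v in values]


    def encode(frames, precision=0.001, full_frame_interval=3):
        """Encode frames as ('full', ints) keyframes or ('delta', ints)."""
        encoded = []
        previous = None
        for index, frame in enumerate(frames):
            current = quantize(frame, precision)
            if index % full_frame_interval == 0:
                # Keyframe: store the quantized values themselves.
                encoded.append(("full", current))
            else:
                # Intermediate frame: store differences from the previous frame.
                deltas = [c - p for c, p in zip(current, previous)]
                encoded.append(("delta", deltas))
            previous = current
        return encoded


    def decode(encoded, precision=0.001):
        """Invert `encode`, returning values accurate to half a precision step."""
        frames = []
        previous = None
        for kind, data in encoded:
            if kind == "full":
                current = list(data)
            else:
                current = [p + d for p, d in zip(previous, data)]
            frames.append([q * precision for q in current])
            previous = current
        return frames


    trajectory = [
        [1.000, 2.000],
        [1.001, 2.002],
        [1.003, 2.001],
        [1.500, 2.500],
    ]
    encoded = encode(trajectory)
    restored = decode(encoded)

Small integer deltas between similar consecutive frames are what make the subsequent compression step effective. Decoding an intermediate frame in this sketch requires walking forward from the nearest preceding keyframe, which is the trade-off a larger ``full_frame_interval`` would imply.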