How To Install OpenZL on Ubuntu 24.04 LTS
OpenZL represents a significant advancement in data compression technology. Meta released this open-source format-aware compression framework in October 2025, offering developers and system administrators a powerful tool for optimizing storage and data transfer. Unlike traditional compression algorithms that treat all data uniformly, OpenZL generates specialized compressors tailored to specific data formats while maintaining compatibility with a universal decompressor. This tutorial provides comprehensive instructions for installing OpenZL on Ubuntu 24.04 LTS, covering multiple installation methods, configuration options, and practical usage scenarios.
The framework demonstrates superior compression ratios compared to industry-standard tools like gzip, zstd, and xz. Linux administrators working with large datasets, machine learning models, or homogeneous data structures will find OpenZL particularly valuable. Ubuntu 24.04 LTS provides an ideal platform for deploying OpenZL due to its stability, long-term support commitment, and compatibility with modern development tools.
What is OpenZL?
OpenZL functions as both a core library and a comprehensive toolset for generating specialized data compressors. The architecture centers on format-aware compression, which analyzes data structure patterns to create optimized compression algorithms rather than applying generic compression techniques. This approach yields substantially better compression ratios for structured data types.
The framework excels in several specific use cases. Numeric array compression benefits dramatically from OpenZL’s format-aware approach, as the system recognizes patterns in numerical sequences. PyTorch model compression represents another key application, where OpenZL optimizes the storage of machine learning model parameters. Homogeneous datasets—collections of similarly structured data—see the most significant compression improvements.
OpenZL introduces the concept of profiles and training for specialized compression. Profiles define data format expectations, such as 64-bit little-endian unsigned integers. Training involves analyzing sample data to generate a specialized compressor configuration that captures format-specific patterns. Meta developed OpenZL internally before releasing it as open-source software with comprehensive documentation hosted on GitHub. The project includes example datasets, benchmarks, and detailed API references for both C++ and Python implementations.
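To make the profile idea concrete, here is a minimal Python sketch (the file name sample.bin is an arbitrary choice for illustration) that writes data in exactly the layout the le-u64 profile expects: a packed array of 64-bit little-endian unsigned integers.

```python
import struct

# Pack a sequence of values as 64-bit little-endian unsigned integers,
# the layout the le-u64 profile targets ("<" = little-endian,
# "Q" = unsigned 64-bit).
values = [0, 1, 2, 3, 1_000_000, 2**63]
payload = struct.pack(f"<{len(values)}Q", *values)

# Each integer occupies exactly 8 bytes in the output.
assert len(payload) == 8 * len(values)

with open("sample.bin", "wb") as f:
    f.write(payload)
```

A file produced this way can later be compressed with ./zli compress --profile le-u64 sample.bin --output sample.zl once the CLI tool is built.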
Prerequisites and System Requirements
Ubuntu 24.04 LTS serves as the foundation for this installation guide. Verify the Ubuntu version by executing lsb_release -a in the terminal. The system requires an amd64 architecture, which can be confirmed with uname -m. Allocate at least 2GB of free disk space for OpenZL, its dependencies, and compilation artifacts.
Update the package index and upgrade existing packages before proceeding:
sudo apt update
sudo apt upgrade -y
Several build dependencies must be present on the system. The build-essential package provides fundamental compilation tools including gcc, g++, and make. Git enables repository cloning from GitHub. CMake facilitates advanced build configuration and dependency management. Python development headers support the optional Python extension.
Check for existing installations of critical tools:
gcc --version
g++ --version
make --version
cmake --version
git --version
If any command returns “command not found,” proceed with dependency installation. The compilation process requires sufficient RAM, with at least 4GB recommended for parallel builds. Verify available memory with free -h.
Installing Build Dependencies on Ubuntu 24.04
Install all required dependencies with a single command:
sudo apt install build-essential gcc g++ make cmake git -y
Each component serves a specific purpose in the build process. The build-essential meta-package bundles essential compilation utilities. GCC and G++ compile C and C++ source code respectively. Make automates the build process by executing instructions from Makefiles. CMake generates build configurations and manages complex dependency relationships. Git clones the OpenZL repository from GitHub.
Verify successful installation by checking version numbers:
gcc --version
g++ --version
make --version
cmake --version
git --version
Each command should display version information. GCC and G++ version 11 or higher work optimally with OpenZL. CMake version 3.16 or newer ensures compatibility with OpenZL’s build configuration.
Optional dependencies include zlib1g-dev for additional compression format support. Install if needed:
sudo apt install zlib1g-dev -y
Dependency conflicts occasionally arise on systems with mixed package sources. Resolve them by refreshing the package index with sudo apt update and repairing any broken installations with sudo apt --fix-broken install. Remove problematic packages and reinstall if necessary.
Method 1: Installing OpenZL from Source Using Make
The Make-based installation provides the quickest path to a working OpenZL CLI tool. Begin by cloning the OpenZL repository from GitHub:
git clone --depth 1 -b release https://github.com/facebook/openzl.git
The --depth 1 flag performs a shallow clone, downloading only the latest commit rather than the entire repository history, which significantly reduces download time and disk usage. The -b release flag selects the release branch, which contains stable, tested code suitable for production use.
Navigate to the cloned repository:
cd openzl
Compile the OpenZL CLI tool:
make zli
The make process downloads dependencies, compiles source code, and links the final executable. Compilation typically takes 3-10 minutes depending on CPU performance and core count. The system automatically downloads and builds googletest and zstd as part of the compilation process.
Upon successful completion, the zli executable appears in the current directory. Verify installation:
./zli --help
This command displays available subcommands and usage information. The zli tool serves as the main command-line interface, bundling multiple OpenZL utilities including compress, decompress, train, and list-profiles.
List available compression profiles:
./zli list-profiles
This command enumerates predefined profiles such as le-u64 (little-endian 64-bit unsigned integers) and other format-specific configurations. Each profile targets specific data types for optimal compression.
Method 2: Installing OpenZL Using CMake Build System
CMake offers superior flexibility for advanced configurations and system-wide installations. The CMake approach supports customization of installation paths, build types, and optional features.
Create a dedicated build directory:
mkdir -p cmakebuild
Configure the build with CMake:
cmake -S . -B cmakebuild
The -S . flag specifies the source directory (the current location), while -B cmakebuild designates the build directory. CMake analyzes dependencies, checks compiler capabilities, and generates appropriate build files.
Compile the zli target:
cmake --build cmakebuild --target zli
This command builds only the CLI tool without compiling unnecessary components. Multi-core systems benefit from parallel compilation:
cmake --build cmakebuild --target zli -j $(nproc)
The -j $(nproc) flag utilizes all available CPU cores for faster compilation.
Create a symbolic link to the compiled binary:
ln -sf cmakebuild/cli/zli .
This link provides convenient access to zli from the repository root directory.
For system-wide installation with custom prefix:
mkdir build-cmake
cd build-cmake
cmake .. -DCMAKE_BUILD_TYPE=Release -DOPENZL_BUILD_TESTS=ON -DCMAKE_INSTALL_PREFIX=/usr/local
make -j$(nproc)
make install
The -DCMAKE_BUILD_TYPE=Release flag enables compiler optimizations for maximum performance. -DOPENZL_BUILD_TESTS=ON compiles the test suite for verification purposes. -DCMAKE_INSTALL_PREFIX specifies the installation destination; /usr/local places binaries in standard system paths.
Downloading and Compiling OpenZL Core Library
Complete library compilation includes the core library, CLI tools, and comprehensive test suite. Start from the repository root:
mkdir build-cmake
cd build-cmake
Configure with full options:
cmake .. -DCMAKE_BUILD_TYPE=Release -DOPENZL_BUILD_TESTS=ON -DCMAKE_INSTALL_PREFIX=install
Each CMake flag controls a specific build aspect. -DCMAKE_BUILD_TYPE=Release applies optimization flags for production performance. -DOPENZL_BUILD_TESTS=ON includes unit tests and integration tests in the build. -DCMAKE_INSTALL_PREFIX=install creates a local installation directory rather than modifying system paths.
Compile with parallel execution:
make -j$(nproc)
Pinning the job count to the core count keeps memory use predictable; a bare make -j would spawn an unlimited number of parallel jobs and can exhaust RAM on large builds. The process downloads and compiles googletest for unit testing and zstd for compression primitives. Compilation typically requires 5-15 minutes on modern hardware.
Run the test suite:
ctest . -j 10
The test command executes 10 parallel test processes. Tests verify compression algorithms, decompression accuracy, training functionality, and edge case handling. All tests should pass for a successful build.
Install to the specified prefix:
make install
Installation copies binaries, libraries, and header files to the prefix directory. The zstd library installs alongside OpenZL as it provides foundational compression primitives.
Installing the Python Extension (Optional)
Python developers benefit from native OpenZL bindings for seamless integration with data science workflows. The Python extension enables compression and decompression directly from Python code without shell commands.
Create a virtual environment:
python3 -m venv openzl-virtualenv
Activate the environment:
source ./openzl-virtualenv/bin/activate
Virtual environments isolate Python packages, preventing conflicts with system packages.
Navigate to the Python extension directory:
cd py
Install the extension with pip:
pip install .
Pip compiles the C++ extension and installs the Python bindings. The process requires Python development headers; install them with sudo apt install python3-dev if errors occur.
Verify the installation:
python3 -c "import openzl; print(openzl.__version__)"
Successful execution prints the OpenZL version number. The Python API documentation provides usage examples and function references. Python integration supports NumPy arrays, PyTorch tensors, and raw byte arrays.
Verifying OpenZL Installation
Comprehensive verification ensures the installation functions correctly. Start with basic command availability:
./zli --help
The output displays available commands including compress, decompress, train, and list-profiles.
List available compression profiles:
./zli list-profiles
Predefined profiles appear in the output, confirming proper installation of compression configurations.
Perform a practical compression test using sample data. Download the sra0 test file:
wget https://github.com/facebook/openzl/releases/download/openzl-sample-artifacts/sra0.zip
Alternative download with curl:
curl -L -O https://github.com/facebook/openzl/releases/download/openzl-sample-artifacts/sra0.zip
Extract the archive:
unzip sra0.zip
Remove the zip file:
rm sra0.zip
Compress the test file using the le-u64 profile:
./zli compress --profile le-u64 sra0 --output sra0.zl
The command displays compression statistics including original size, compressed size, compression ratio, and processing speed. Expect a significant size reduction, typically 40-60%, for numeric data.
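The reported ratio is simply original size divided by compressed size. A small helper makes that explicit and converts it into percent saved; compression_ratio and percent_saved are illustrative helper names, not part of OpenZL:

```python
import os

def compression_ratio(original_path: str, compressed_path: str) -> float:
    """Return original_size / compressed_size for two files on disk."""
    return os.path.getsize(original_path) / os.path.getsize(compressed_path)

def percent_saved(ratio: float) -> float:
    """Convert a ratio like 2.0 into the percent of space saved (50.0)."""
    return (1 - 1 / ratio) * 100

# A 2.0x ratio halves the storage cost.
print(f"{percent_saved(2.0):.0f}% smaller")  # prints "50% smaller"
```

Calling compression_ratio("sra0", "sra0.zl") after the step above reproduces the ratio zli reports.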
Decompress to verify integrity:
./zli decompress sra0.zl --output sra0.decompressed
Compare the original and decompressed files:
cmp sra0 sra0.decompressed
No output indicates identical files. Alternative verification with checksums:
md5sum sra0 sra0.decompressed
Matching checksums confirm successful round-trip compression and decompression.
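The same round-trip check can be scripted for automated pipelines. A minimal sketch, assuming the original and decompressed file paths are supplied by the caller (file_digest and verify_roundtrip are illustrative helper names):

```python
import hashlib

def file_digest(path: str, algo: str = "sha256") -> str:
    """Hash a file in 1MB chunks so large datasets need not fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_roundtrip(original: str, decompressed: str) -> bool:
    """True when both files hash identically, i.e. the round-trip was lossless."""
    return file_digest(original) == file_digest(decompressed)
```

After the steps above, verify_roundtrip("sra0", "sra0.decompressed") should return True.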
Basic OpenZL Usage and Commands
OpenZL operates through profile-based compression workflows. Profiles define expected data formats, enabling format-aware optimization. The le-u64 profile targets little-endian 64-bit unsigned integer arrays, common in scientific computing and numerical analysis.
Compress with a predefined profile:
./zli compress --profile le-u64 input_file --output compressed.zl
Decompression requires no profile specification:
./zli decompress compressed.zl --output output_file
The universal decompressor automatically detects the compression algorithm from file metadata.
Training generates specialized compressors optimized for specific datasets:
./zli train --profile le-u64 sample_data --output custom_compressor.zlc
Training analyzes sample data patterns and constructs a compressor configuration file (.zlc). The process utilizes multiple threads automatically, so machines with more CPU cores complete training sooner.
Compress using a trained compressor:
./zli compress input_file --compressor custom_compressor.zlc --output compressed.zl
Trained compressors often achieve 130% or greater improvement in compression ratio compared to generic profiles.
Generate multiple optimal compressors with pareto-frontier training:
./zli train --profile le-u64 sample_data --output compressors/ --pareto-frontier
This command creates a directory containing multiple .zlc files representing different compression/speed tradeoffs. The included benchmark.csv file tabulates compression ratios and speeds for each configuration.
Benchmark comparisons demonstrate OpenZL’s advantages. Generic profiles typically achieve 1.50x compression on numeric data, outperforming gzip -9 (1.19x), zstd -19 (1.33x), and xz -9 (1.39x). Trained compressors push ratios beyond 3.47x on homogeneous datasets.
Training Custom Compression Profiles
Custom training unlocks maximum compression performance for specialized datasets. The training process analyzes sample data structure, identifies recurring patterns, and constructs optimized compression algorithms.
Basic training command:
./zli train --profile le-u64 sample_file --output trained_compressor.zlc
Select representative sample data reflecting production datasets. Training quality depends directly on sample representativeness—diverse samples capture broader patterns.
Control training duration with time limits:
./zli train --profile le-u64 sample_file --output trained_compressor.zlc --max-time-secs 300
This flag limits training to 300 seconds (5 minutes). Longer training generally produces better compressors, though improvements diminish beyond certain durations.
Training exhibits non-deterministic behavior—multiple runs produce different compressors with varying performance characteristics. Run several training sessions and benchmark results to identify optimal configurations.
Pareto-frontier training generates multiple compressors spanning the performance spectrum:
./zli train --profile le-u64 sample_file --output compressor_directory/ --pareto-frontier
The output directory contains multiple .zlc files plus benchmark.csv documenting performance metrics. Review benchmark.csv to select appropriate compressors for different use cases—maximum compression for archival storage, balanced compression for general use, or fast compression for real-time applications.
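Selecting from benchmark.csv can be automated. The column names below (compressor, ratio, compression_mbps) are illustrative assumptions, as are the numbers; match them to the headers your training run actually produces:

```python
import csv
import io

# Illustrative CSV in the shape of a pareto-frontier benchmark file;
# real column names and values may differ, so adapt the keys accordingly.
sample_csv = """compressor,ratio,compression_mbps
a.zlc,3.10,120
b.zlc,2.40,480
c.zlc,1.80,900
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Archival storage: maximize ratio regardless of speed.
best_ratio = max(rows, key=lambda r: float(r["ratio"]))

# Real-time use: fastest compressor that still clears a 2x ratio floor.
fast_enough = max(
    (r for r in rows if float(r["ratio"]) >= 2.0),
    key=lambda r: float(r["compression_mbps"]),
)

print(best_ratio["compressor"], fast_enough["compressor"])  # a.zlc b.zlc
```

Swap sample_csv for open("compressor_directory/benchmark.csv") once the real headers are known.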
Compression ratio improvements vary by data type. Highly structured, homogeneous data yields the most dramatic gains. Scientific datasets, sensor readings, genomic sequences, and machine learning model parameters represent ideal candidates. Mixed-format or random data shows minimal improvement over generic compression.
Troubleshooting Common Installation Issues
Missing dependencies generate compilation errors. Install missing packages:
sudo apt install build-essential gcc g++ make cmake git -y
Verify dependency availability before attempting compilation.
Compiler version incompatibility manifests as cryptic build errors. Ubuntu 24.04 ships with gcc 13, which works seamlessly with OpenZL. Older Ubuntu versions may require compiler updates:
sudo apt install gcc-11 g++-11
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 100
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-11 100
Git clone failures occur due to network issues or rate limiting. Retry with verbose output:
git clone --depth 1 -b release https://github.com/facebook/openzl.git --verbose
Check network connectivity and GitHub accessibility. Use HTTP URLs instead of git:// if firewall restrictions exist.
CMake configuration errors indicate missing libraries or incorrect paths. Review CMake output for specific missing dependencies. Clean build directories before retrying:
rm -rf build-cmake
mkdir build-cmake
cd build-cmake
cmake .. -DCMAKE_BUILD_TYPE=Release
Build failures mid-compilation suggest insufficient memory or disk space. Reduce parallel jobs:
make -j2
This limits compilation to 2 parallel processes, reducing memory pressure.
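Rather than guessing, the job count can be derived from available memory before retrying. The sketch below reads /proc/meminfo (Linux-specific) and assumes roughly 2GB of RAM per compile job; that figure is a conservative rule of thumb, not an OpenZL requirement, and safe_job_count is a hypothetical helper name:

```python
import os

def safe_job_count(gb_per_job: float = 2.0) -> int:
    """Cap parallel make jobs by both CPU core count and available memory."""
    cores = os.cpu_count() or 1
    try:
        with open("/proc/meminfo") as f:
            meminfo = dict(line.split(":", 1) for line in f if ":" in line)
        # MemAvailable is reported in kB; convert to GB.
        avail_gb = int(meminfo["MemAvailable"].split()[0]) / (1024 * 1024)
    except (OSError, KeyError):
        return cores  # fall back to core count on non-Linux systems
    return max(1, min(cores, int(avail_gb / gb_per_job)))

print(f"make -j{safe_job_count()}")
```

The printed command can then be used directly for the retry.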
Symbolic link errors occur when linking from incorrect locations. Remove broken links:
rm zli
Recreate with correct path:
ln -sf cmakebuild/cli/zli .
Verify the symbolic link points to the actual binary location.
Runtime errors when executing zli often stem from missing shared libraries. Check library dependencies:
ldd ./zli
Any libraries marked “not found” require installation. Common missing libraries include libstdc++.so.6 and libgcc_s.so.1.
Consult GitHub Issues for community support on persistent problems. Search existing issues before creating new reports. Include system information, error messages, and compilation output when requesting assistance.
Performance Benchmarking and Testing
Benchmark OpenZL against standard compression tools to quantify improvements. Create a test dataset representative of production data. Compress with multiple tools:
# OpenZL with generic profile
./zli compress --profile le-u64 testdata --output testdata.zl
# gzip maximum compression
gzip -9 -k testdata
# zstd maximum compression
zstd -19 testdata -o testdata.zst
# xz maximum compression
xz -9 -k testdata
Compare compressed file sizes:
ls -lh testdata*
Calculate compression ratios by dividing original size by compressed size. OpenZL typically achieves 1.50x compression with generic profiles versus 1.19-1.39x for traditional tools.
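To put those figures side by side, a short sketch can project compressed sizes from the ratios quoted above; the 100MB input is hypothetical, and the ratios are the representative numbers cited in this section, not fresh measurements:

```python
# Representative ratios from the comparison above (illustrative figures).
ratios = {"openzl": 1.50, "gzip -9": 1.19, "zstd -19": 1.33, "xz -9": 1.39}

original_mb = 100.0  # hypothetical 100MB test file
for tool, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    compressed = original_mb / ratio
    print(f"{tool:>8}: {compressed:6.1f} MB ({ratio:.2f}x)")
```

Substituting your own measured sizes for original_mb and the ratios reproduces the comparison for your dataset.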
Train a custom compressor and retest:
./zli train --profile le-u64 testdata --output trained.zlc
./zli compress testdata --compressor trained.zlc --output testdata_trained.zl
Trained compressors frequently achieve 3.47x or better compression on homogeneous datasets.
Measure compression and decompression speeds using time:
time ./zli compress --profile le-u64 testdata --output testdata.zl
time ./zli decompress testdata.zl --output testdata.decompressed
OpenZL displays throughput in MB/s during operation. Record these metrics for comparison.
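Throughput can also be computed externally for any tool, which keeps comparisons consistent across zli, gzip, zstd, and xz. A minimal wrapper, assuming the command reads the file at input_path; timed_mbps is a hypothetical helper name:

```python
import os
import subprocess
import time

def timed_mbps(cmd: list[str], input_path: str) -> float:
    """Run a command and report throughput as input MB processed per second."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    return os.path.getsize(input_path) / (1024 * 1024) / elapsed
```

For example, timed_mbps(["./zli", "compress", "--profile", "le-u64", "testdata", "--output", "testdata.zl"], "testdata") yields the compression throughput for the test file.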
Pareto-frontier training produces benchmark.csv containing comprehensive performance data. The CSV includes columns for compressor ID, compression ratio, compression speed, and decompression speed. Plot this data to visualize tradeoffs between compression efficiency and processing speed.
Factors affecting performance include CPU core count, memory bandwidth, data characteristics, and dataset size. Multi-core systems benefit from automatic parallelization. SSDs provide faster data access during compression. Highly structured data compresses better than random data.
Best Practices and Recommendations
Deploy OpenZL from the release branch for production systems. The release branch receives stability fixes while avoiding experimental features. Development branches may contain untested code unsuitable for critical applications.
Train custom profiles for homogeneous datasets to maximize compression benefits. Single-file compression with generic profiles shows modest improvements. Directory-level or archive-level compression of similar files demonstrates dramatic gains. Machine learning model repositories, scientific datasets, and log archives represent ideal use cases.
Choose between predefined profiles and custom training based on dataset characteristics. Predefined profiles work adequately for quick compression without sample data availability. Custom training requires representative samples but delivers superior results.
Keep training samples aligned with production data. Dataset drift—gradual changes in data characteristics—degrades compressor effectiveness. Periodically retrain compressors using recent data samples to maintain optimal performance.
Use pareto-frontier training for versatile compressor collections. Different scenarios require different tradeoffs. Archive storage prioritizes maximum compression regardless of speed. Real-time processing demands fast compression even at the cost of ratio. Pareto-frontier training generates options covering this spectrum.
Back up critical data before implementing compression workflows. Verify decompression integrity thoroughly in test environments before production deployment. Maintain uncompressed copies during initial rollout phases.
System-wide installations benefit multi-user environments. Use CMAKE_INSTALL_PREFIX=/usr/local for standard system locations. Update PATH variables if installing to custom prefixes.
Monitor decompression compatibility across OpenZL versions. The framework maintains backward compatibility—newer versions decompress files from older versions. Forward compatibility requires matching decompressor versions with compressor versions used for file creation.
Document compression configurations for reproducibility. Record profile names, training parameters, and sample data sources. This documentation enables consistent recompression if data migration or format changes occur.
Congratulations! You have successfully installed OpenZL. Thank you for using this tutorial to install the OpenZL open-source compression framework on your Ubuntu 24.04 LTS system. For additional help or useful information, we recommend you check the official OpenZL website.