Introduction
cpaway is an open‑source command‑line utility designed for efficient, cross‑platform data transfer and synchronization. It was conceived to address shortcomings in traditional copy tools such as cp and rsync, particularly in environments where data must be moved between heterogeneous systems - Linux, macOS, Windows, and embedded devices - over a variety of network protocols. The tool emphasizes parallelism, fault tolerance, and an extensible plugin architecture that allows users to tailor its behavior to specific use cases.
Unlike many existing solutions, cpaway does not rely on a single underlying protocol; instead, it abstracts the transfer mechanism behind a uniform API. This design permits integration with HTTP(S), SFTP, SMB, and custom transport layers. The program is written in Rust, chosen for its safety guarantees, low runtime overhead, and cross‑platform support. It is distributed under the MIT license, encouraging both community contributions and commercial adoption.
cpaway has gained popularity in DevOps workflows, data engineering pipelines, and academic research projects that require robust, repeatable data movement. Its emphasis on metadata preservation, integrity checks, and resume capabilities makes it a suitable choice for large‑scale migrations and archival operations.
History and Development
Origins
The project began in 2018 when a small team of developers at a cloud‑service startup recognized the limitations of existing file‑copy utilities in multi‑tenant environments. Their primary need was a tool that could reliably copy large datasets - sometimes exceeding several terabytes - across virtual machines running diverse operating systems, without sacrificing performance or data integrity.
Early prototypes were built on top of the libssh library, which provided SFTP capabilities. However, developers soon realized that the library did not expose sufficient hooks for parallelism or resumability. Consequently, the team decided to start from scratch, implementing a custom networking layer in Rust to exploit modern CPU capabilities and to guarantee memory safety.
Version History
- 0.1.0 (April 2018) – First alpha release featuring basic file copy over SSH, single‑threaded operation, and minimal error handling.
- 0.3.0 (October 2018) – Introduction of concurrent transfer streams, a basic progress reporting interface, and support for SMB 3.0.
- 0.5.0 (May 2019) – Plugin architecture added, allowing external modules to plug into the transfer process. Added command‑line configuration files and basic HTTP(S) support.
- 1.0.0 (January 2020) – First stable release, with a comprehensive feature set, extensive documentation, and a fully documented API for third‑party plugin development.
- 1.2.0 (July 2021) – Introduced integrity verification using SHA‑256 checksums, optional data compression via Zstandard, and improved error recovery for network interruptions.
- 2.0.0 (December 2022) – Major refactor to separate core logic from transport modules. Added support for encrypted volumes and integration with cloud storage providers such as Amazon S3, Google Cloud Storage, and Azure Blob Storage.
- 2.3.0 (March 2024) – Enhanced support for containerized deployments, added a REST API for remote control, and introduced a plugin for differential backup based on the Btrfs copy‑on‑write semantics.
Community and Governance
cpaway follows a merit‑based governance model. All contributions are vetted by maintainers through a code review process. The project hosts its source code on a public repository and uses continuous integration pipelines to run tests across Linux, macOS, and Windows platforms. Issues are tracked on an issue tracker where contributors can propose enhancements, report bugs, and request documentation updates.
The project sponsors include several cloud service providers, academic institutions, and open‑source foundations that provide financial support for infrastructure and community outreach. The annual cpaway summit gathers developers, users, and researchers to discuss new features and use‑case demonstrations.
Technical Architecture
Core Components
cpaway’s architecture is modular, consisting of the following key components:
- Transport Layer – Abstracts network protocols; implemented as dynamic libraries that can be loaded at runtime. Current implementations cover SFTP, HTTP(S), SMB, and custom transport for cloud services.
- Transfer Engine – Manages file queues, scheduling, parallelism, and retry logic. It is protocol‑agnostic and interacts with the transport layer via a defined interface.
- Metadata Manager – Preserves file attributes such as permissions, timestamps, ownership, and extended attributes. It also records transfer logs and checksum records.
- Plugin Interface – Provides hooks for custom functionality: compression, encryption, checksum calculation, or integration with monitoring systems.
- CLI and Configuration – The command‑line interface parses arguments, loads configuration files, and orchestrates the transfer process. Configuration is supported in YAML and TOML formats.
- REST API Server – Optional component that exposes endpoints for remote control, monitoring, and status queries.
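A Rust trait is one way to picture the split between the protocol‑agnostic transfer engine and pluggable transports. This is a minimal sketch, not cpaway's API. `Transport`, the in‑memory `MemoryTransport`, and `copy_file` are hypothetical names. Real transports are loaded as dynamic libraries rather than compiled in, and would stream chunks instead of buffering whole files.

```rust
use std::collections::HashMap;
use std::io;

/// Hypothetical protocol-agnostic interface. Each transport (SFTP, HTTP(S),
/// SMB, a cloud store, ...) would provide its own implementation.
pub trait Transport {
    fn read(&mut self, path: &str) -> io::Result<Vec<u8>>;
    fn write(&mut self, path: &str, data: &[u8]) -> io::Result<()>;
}

/// In-memory stand-in for a real transport, handy for tests.
#[derive(Default)]
pub struct MemoryTransport {
    files: HashMap<String, Vec<u8>>,
}

impl Transport for MemoryTransport {
    fn read(&mut self, path: &str) -> io::Result<Vec<u8>> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, path.to_string()))
    }

    fn write(&mut self, path: &str, data: &[u8]) -> io::Result<()> {
        self.files.insert(path.to_string(), data.to_vec());
        Ok(())
    }
}

/// Engine-side copy: it only sees the trait, so any pair of transports works.
/// Buffers the whole file for brevity; returns the number of bytes copied.
pub fn copy_file(
    src: &mut dyn Transport,
    dst: &mut dyn Transport,
    src_path: &str,
    dst_path: &str,
) -> io::Result<usize> {
    let data = src.read(src_path)?;
    dst.write(dst_path, &data)?;
    Ok(data.len())
}
```

Because `copy_file` only sees `&mut dyn Transport`, an SFTP‑to‑S3 copy and a local‑to‑SMB copy would run through the same engine code.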
Parallelism and Scheduling
cpaway implements a work‑stealing scheduler that distributes file transfer tasks across multiple worker threads. The scheduler considers the following factors:
- File size: Larger files are assigned to dedicated threads to avoid contention.
- Network bandwidth: The engine monitors throughput and adjusts the number of concurrent streams dynamically.
- Target system limits: For example, SMB servers may limit the number of parallel connections; cpaway detects such limits and throttles accordingly.
- Priority levels: Users can assign priority flags to directories or individual files, influencing scheduling decisions.
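The general work‑stealing idea can be sketched with one deque per worker, each protected by a `Mutex`. This illustration makes simplifying assumptions and is not cpaway's scheduler. It ignores file size, bandwidth, target limits, and priorities, and `Task` and `run_work_stealing` are hypothetical names.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;

/// A queued file transfer (hypothetical type for this sketch).
#[derive(Debug, Clone)]
pub struct Task {
    pub path: String,
    pub size: u64,
}

/// Process every task with one thread per initial queue. Each worker takes
/// work from the front of its own deque and, when that is empty, steals from
/// the back of another worker's deque. Returns the paths each worker handled.
pub fn run_work_stealing<F>(initial: Vec<Vec<Task>>, work: F) -> Vec<Vec<String>>
where
    F: Fn(&Task) + Sync,
{
    let queues: Vec<Mutex<VecDeque<Task>>> = initial
        .into_iter()
        .map(|q| Mutex::new(VecDeque::from(q)))
        .collect();
    let n = queues.len();
    let (queues, work) = (&queues, &work);

    thread::scope(|s| {
        let handles: Vec<_> = (0..n)
            .map(move |id| {
                s.spawn(move || {
                    let mut handled = Vec::new();
                    loop {
                        // Local work first: the front of our own deque.
                        let mut task = queues[id].lock().unwrap().pop_front();
                        // Otherwise try to steal from the back of the other deques.
                        for offset in 1..n {
                            if task.is_some() {
                                break;
                            }
                            task = queues[(id + offset) % n].lock().unwrap().pop_back();
                        }
                        match task {
                            Some(t) => {
                                work(&t);
                                handled.push(t.path);
                            }
                            // Tasks never enqueue new tasks here, so empty everywhere means done.
                            None => break handled,
                        }
                    }
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Owners and thieves work on opposite ends of each deque, mirroring lock‑free work‑stealing deques. A production scheduler would typically use such a deque instead of a `Mutex`.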
The scheduler supports back‑pressure: if the destination system becomes saturated, the engine pauses the initiation of new transfers until existing streams complete. This behavior prevents excessive memory consumption and ensures consistent performance.
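Back‑pressure of this kind is commonly built on a bounded queue between the producer that discovers files and the workers that transfer them. The sketch below assumes that design, which cpaway's internals may not follow, and uses the standard library's `std::sync::mpsc::sync_channel`. `transfer_with_backpressure` is a hypothetical name.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// The producer (e.g. a directory walker) feeds paths into a bounded
/// channel. Once `capacity` paths are waiting, `send` blocks, so no new
/// work is queued until the consumer catches up.
pub fn transfer_with_backpressure(paths: Vec<String>, capacity: usize) -> Vec<String> {
    let (tx, rx) = sync_channel::<String>(capacity);

    let producer = thread::spawn(move || {
        for path in paths {
            // Blocks while the channel is full: the back-pressure point.
            tx.send(path).expect("consumer hung up");
        }
        // `tx` is dropped here, which ends the consumer loop below.
    });

    let mut completed = Vec::new();
    for path in rx {
        // A real engine would transfer the file here before taking the next one.
        completed.push(path);
    }
    producer.join().expect("producer panicked");
    completed
}
```

With `capacity = 4`, at most four discovered paths wait in memory at any time, however large the source tree is.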
Resilience and Fault Tolerance
cpaway incorporates several mechanisms to handle failures gracefully:
- Automatic resume: For interrupted transfers, the engine records the last successful byte offset and resumes from that point when possible.
- Retry policy: The engine applies exponential back‑off when encountering transient network errors, with configurable maximum retry counts.
- Integrity verification: After each file transfer, cpaway calculates a SHA‑256 checksum and compares it against the source checksum. If a mismatch occurs, the file is re‑transferred.
- Checkpointing: Transfer state is periodically serialized to disk, allowing the process to be restarted without losing progress if the host machine crashes or the process is killed.
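The retry policy can be sketched as follows. The function names, the doubling factor, the delay cap, and the `is_transient` classifier are all assumptions for illustration. cpaway's actual retry parameters come from its configuration.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate instead of overflowing for large attempt numbers.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Call `op` until it succeeds, a non-transient error occurs, or
/// `max_retries` retries have been used, sleeping with exponential back-off.
pub fn retry_with_backoff<T, E, F, P>(
    max_retries: u32,
    base: Duration,
    max: Duration,
    is_transient: P,
    mut op: F,
) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
    P: Fn(&E) -> bool,
{
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if attempt < max_retries && is_transient(&e) => {
                sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
            Err(e) => return Err(e),
        }
    }
}
```

With a 100 ms base and a 5 s cap, the waits are 100 ms, 200 ms, 400 ms, 800 ms, and so on, until they reach 5 s.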
Extensibility
The plugin system uses a simple dynamic library interface. Each plugin implements a set of callback functions that are invoked at defined stages: before transfer, during transfer, after transfer, and on error. Standard plugins include:
- Compression Plugin – Applies Zstandard or LZ4 compression to data before transmission, reducing bandwidth usage.
- Encryption Plugin – Uses AES‑256‑GCM or ChaCha20‑Poly1305 to encrypt data streams in transit.
- Checksum Plugin – Computes checksums on the fly, allowing the user to specify custom algorithms such as MD5, SHA‑1, or BLAKE2b.
- Monitoring Plugin – Sends metrics to Prometheus, InfluxDB, or other time‑series databases.
Developers can create custom plugins by implementing the required trait signatures in Rust. The plugin API is documented in the project's repository, facilitating community contributions.
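The staged callbacks can be pictured as a Rust trait with one method per stage. This shape is hypothetical and is not the documented plugin API. `TransferPlugin`, `TransferContext`, `ByteCounter`, and `run_hooks` are illustrative names, and loading plugins from dynamic libraries is left out.

```rust
use std::fmt;

/// Data passed to every hook (hypothetical fields).
pub struct TransferContext<'a> {
    pub path: &'a str,
    pub size: u64,
}

/// Hypothetical plugin trait with one hook per stage. The default methods
/// do nothing, so a plugin overrides only the stages it needs.
pub trait TransferPlugin {
    fn name(&self) -> &str;
    fn before_transfer(&mut self, _ctx: &TransferContext<'_>) {}
    /// Called for each chunk; a plugin may transform it (e.g. compress it).
    fn on_chunk(&mut self, _ctx: &TransferContext<'_>, chunk: Vec<u8>) -> Vec<u8> {
        chunk
    }
    fn after_transfer(&mut self, _ctx: &TransferContext<'_>) {}
    fn on_error(&mut self, _ctx: &TransferContext<'_>, _err: &dyn fmt::Display) {}
}

/// Example plugin that only observes: it counts bytes, files, and errors.
#[derive(Default)]
pub struct ByteCounter {
    pub bytes: u64,
    pub files: u64,
    pub errors: u64,
}

impl TransferPlugin for ByteCounter {
    fn name(&self) -> &str {
        "byte-counter"
    }
    fn on_chunk(&mut self, _ctx: &TransferContext<'_>, chunk: Vec<u8>) -> Vec<u8> {
        self.bytes += chunk.len() as u64;
        chunk
    }
    fn after_transfer(&mut self, _ctx: &TransferContext<'_>) {
        self.files += 1;
    }
    fn on_error(&mut self, _ctx: &TransferContext<'_>, _err: &dyn fmt::Display) {
        self.errors += 1;
    }
}

/// Run every plugin through the stages for one file, in registration order.
pub fn run_hooks(
    plugins: &mut [&mut dyn TransferPlugin],
    ctx: &TransferContext<'_>,
    chunks: Vec<Vec<u8>>,
) -> Vec<Vec<u8>> {
    for p in plugins.iter_mut() {
        p.before_transfer(ctx);
    }
    let out: Vec<Vec<u8>> = chunks
        .into_iter()
        .map(|mut chunk| {
            // Each plugin sees the output of the previous one.
            for p in plugins.iter_mut() {
                chunk = p.on_chunk(ctx, chunk);
            }
            chunk
        })
        .collect();
    for p in plugins.iter_mut() {
        p.after_transfer(ctx);
    }
    out
}
```

The no‑op defaults let a compression plugin override only `on_chunk`, while an observer such as `ByteCounter` counts chunks without changing them.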
Key Features
Cross‑Platform Support
cpaway runs natively on Linux (x86_64, ARM64), macOS (Intel, ARM), Windows (x86_64, ARM64), and various embedded operating systems that support Rust. The tool does not require root privileges; however, certain operations such as setting extended attributes may need elevated permissions depending on the host OS.
Protocol Flexibility
Users can specify the transport protocol via command‑line flags or configuration files. The following protocols are supported out of the box:
- SSH/SFTP: Secure file transfer using the standard SSH protocol.
- HTTP(S): Transfers via standard web protocols, optionally authenticated with Basic or Bearer tokens.
- SMB: Supports SMB 2.0 and SMB 3.0, enabling integration with Windows file shares and NAS devices.
- Cloud Storage: Built‑in connectors for Amazon S3, Google Cloud Storage, and Azure Blob Storage.
Metadata Preservation
cpaway records and replicates file attributes such as ownership, permissions, modification times, and extended attributes. On Windows, ACLs are preserved when the SMB protocol is used, and on Unix‑like systems, symbolic links and hard links are accurately reproduced.
Checksum Verification
Integrity of transferred data is verified by default with SHA‑256 checksums. The tool also supports other hash algorithms, and verification can be skipped when the user selects a high‑speed mode.
Compression and Encryption
Built‑in support for compression and encryption reduces bandwidth consumption and protects data confidentiality. Compression is performed on the source side before transmission, and encryption is applied to the data stream. When both are used, data must be compressed before it is encrypted, because encrypted output is effectively incompressible.
Resumable Transfers
Large files can be resumed after interruption. cpaway tracks transfer progress at the block level and stores a manifest file in the destination directory. On subsequent runs, the tool reads the manifest and copies only the blocks that are not yet recorded as complete.
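A block‑level manifest can be modeled as one completion flag per fixed‑size block. The sketch keeps the manifest in memory and assumes a fixed block size. The on‑disk manifest format is not described here, and `Manifest` and `missing_ranges` are hypothetical names.

```rust
/// Hypothetical block-level manifest: one completion flag per fixed-size block.
pub struct Manifest {
    pub block_size: u64,
    pub file_size: u64,
    pub done: Vec<bool>,
}

impl Manifest {
    pub fn new(file_size: u64, block_size: u64) -> Self {
        assert!(block_size > 0, "block size must be positive");
        // Round up so a trailing partial block is tracked too.
        let blocks = file_size / block_size + u64::from(file_size % block_size != 0);
        Manifest { block_size, file_size, done: vec![false; blocks as usize] }
    }

    /// Record that block `index` reached the destination intact.
    pub fn mark_done(&mut self, index: usize) {
        self.done[index] = true;
    }

    /// Byte ranges [start, end) still to copy; adjacent missing blocks are
    /// merged so each range can be fetched with a single request.
    pub fn missing_ranges(&self) -> Vec<(u64, u64)> {
        let mut ranges: Vec<(u64, u64)> = Vec::new();
        for (i, &complete) in self.done.iter().enumerate() {
            if complete {
                continue;
            }
            let start = i as u64 * self.block_size;
            let end = (start + self.block_size).min(self.file_size);
            if let Some(last) = ranges.last_mut() {
                if last.1 == start {
                    last.1 = end; // extend the previous range
                    continue;
                }
            }
            ranges.push((start, end));
        }
        ranges
    }
}
```

Merging adjacent missing blocks means a run that was interrupted early resumes with one ranged request rather than one request per block.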
Rate Limiting and Bandwidth Management
Users can set global or per‑protocol bandwidth limits. cpaway uses token bucket algorithms to enforce these limits, ensuring that other network traffic is not starved.
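A token bucket fits in a few lines. Tokens represent bytes, accrue at the configured rate, and are capped at a burst size. `TokenBucket` and `reserve` are hypothetical names. The current time is passed in explicitly so the behaviour is deterministic.

```rust
use std::time::{Duration, Instant};

/// Minimal token bucket: tokens are bytes, refilled at `rate` bytes per second
/// and capped at `capacity` bytes (the largest allowed burst).
pub struct TokenBucket {
    rate: f64,
    capacity: f64,
    tokens: f64,
    last: Instant,
}

impl TokenBucket {
    pub fn new(rate: f64, capacity: f64, now: Instant) -> Self {
        assert!(rate > 0.0 && capacity > 0.0, "rate and capacity must be positive");
        // Start full so an idle connection may send one burst immediately.
        TokenBucket { rate, capacity, tokens: capacity, last: now }
    }

    /// Add the tokens accrued since the previous call, up to `capacity`.
    fn refill(&mut self, now: Instant) {
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.rate).min(self.capacity);
        self.last = self.last.max(now);
    }

    /// Try to spend `bytes` tokens. Returns zero (tokens consumed) if the
    /// send may proceed, otherwise how long to wait before trying again.
    pub fn reserve(&mut self, bytes: f64, now: Instant) -> Duration {
        self.refill(now);
        if self.tokens >= bytes {
            self.tokens -= bytes;
            Duration::ZERO
        } else {
            Duration::from_secs_f64((bytes - self.tokens) / self.rate)
        }
    }
}
```

A limit of 10,485,760 bytes per second corresponds to `rate = 10_485_760.0`. A caller that receives a non‑zero wait sleeps for that long and calls `reserve` again. Each request must be no larger than the bucket capacity, or it can never be satisfied.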
Parallel Transfer
By default, cpaway spawns multiple worker threads (configurable by the user) to copy files in parallel. The engine adapts the degree of parallelism to the underlying hardware and network conditions, optimizing throughput.
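The standard library reports the available hardware parallelism, which is one way to derive a default worker count. The sketch assumes a simple policy: an explicit user setting wins, otherwise one worker per available CPU. It does not model the runtime adaptation described above, and `worker_count` is a hypothetical name.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Initial number of worker threads: an explicit user setting wins;
/// otherwise one worker per CPU reported by the OS, falling back to 1.
pub fn worker_count(user_setting: Option<NonZeroUsize>) -> usize {
    user_setting
        .or_else(|| thread::available_parallelism().ok())
        .map(NonZeroUsize::get)
        .unwrap_or(1)
}
```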
Extensibility via Plugins
The plugin system allows integration of new transport protocols, monitoring systems, or custom data transformations. The project's plugin documentation provides sample templates for developers.
Command‑Line Interface
The CLI supports a wide range of options, including recursive copy, dry‑run mode, exclude/include patterns, verbose logging, and configuration file loading. The interface is designed to be scriptable and to integrate smoothly with existing automation tools.
REST API Server
When enabled, cpaway starts an HTTP server exposing endpoints for starting transfers, querying status, and retrieving logs. The API uses JSON payloads and supports basic authentication. This feature is valuable for integrating cpaway into orchestrated workflows such as Kubernetes operators or CI/CD pipelines.
Installation and Configuration
Pre‑Built Binaries
Pre‑compiled binaries for Linux, macOS, and Windows are available from the project’s releases page. The binaries are statically linked, requiring no external dependencies. Users can download the appropriate binary, extract it, and add the executable to their PATH.
Building from Source
To build cpaway from source, clone the repository and use Cargo, Rust’s package manager:
- Install Rust via rustup if not already installed.
- Clone the repository: git clone https://github.com/example/cpaway.git
- Navigate to the project directory: cd cpaway
- Compile the release binary: cargo build --release
The resulting binary can be found in target/release/cpaway.
Configuration Files
cpaway supports configuration via YAML or TOML files. A default configuration can be created by running cpaway config init. The file contains sections for global settings, transport options, and plugin configuration. Typical keys include:
- Global settings – max_parallelism, timeout, log_level.
- SFTP settings – host, port, username, private_key_path.
- Compression settings – algorithm (zstd, lz4), level.
Command‑Line Options
cpaway offers a comprehensive set of command‑line arguments. Key flags include:
- --source – Path to the source directory or file.
- --destination – Destination URL or local path.
- --recursive – Recursively copy directories.
- --exclude – Exclude patterns.
- --include – Include patterns.
- --dry-run – Show actions without performing the copy.
- --verbose – Emit detailed logs.
- --config – Path to configuration file.
- --plugin – Load a specific plugin.
Permissions and Security
When copying across different user namespaces, cpaway respects the operating system’s permission model. On Windows, it preserves ACLs when copying to SMB shares. When using encryption plugins, users should manage keys securely, typically via environment variables or dedicated key management systems.
Usage and Examples
Basic File Copy
Copy a single file from a local directory to a remote SFTP server:
cpaway --source /tmp/file.txt --destination sftp://example.com:22/home/user/
Recursive Directory Copy with Exclusions
Recursively copy a directory, excluding temporary files:
cpaway --source /data/projects --destination sftp://example.com:/var/www/html/ --recursive --exclude "*.tmp"
Compression and Encryption
Copy data to an S3 bucket with compression and encryption:
cpaway --source /var/data --destination s3://my-bucket/backup/ --plugin compression --plugin encryption --verbose
Rate‑Limited Transfer
Set a global bandwidth limit of 10 MiB/s (10,485,760 bytes per second):
cpaway --source /data --destination sftp://example.com:/data/ --max_bw 10485760
Dry‑Run Mode
Preview actions before executing the transfer:
cpaway --source /data --destination sftp://example.com:/data/ --dry-run
Using the REST API
Start a transfer via the API:
curl -X POST -H "Content-Type: application/json" \
-d '{"source":"/data","destination":"sftp://example.com:/data/"}' \
http://localhost:8080/api/v1/transfer
Query status:
curl http://localhost:8080/api/v1/status
Integration with CI/CD
In a GitHub Actions workflow, cpaway can be invoked as a step to archive artifacts to S3:
- name: Upload artifacts
  run: |
    cpaway --source ${{ github.workspace }}/artifacts \
      --destination s3://my-repo/${{ github.sha }} \
      --recursive
  env:
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
Monitoring and Logging
Logging Output
cpaway writes logs to stdout by default. Users can direct logs to a file using --log-file or configure via the config file. Log levels range from debug to error.
Prometheus Metrics
With the monitoring plugin, cpaway exposes metrics such as:
- Bytes transferred per protocol.
- Active connections.
- Transfer durations.
- Error rates.
These metrics can be scraped by Prometheus and visualized in Grafana dashboards.
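For reference, a per‑protocol byte counter looks like this in the Prometheus text exposition format. The metric name `cpaway_bytes_transferred_total`, the `protocol` label, and `render_bytes_metric` are hypothetical. The monitoring plugin's real metric names are not documented here.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Render per-protocol byte counters in the Prometheus text exposition format.
/// The metric and label names are hypothetical.
pub fn render_bytes_metric(bytes_by_protocol: &BTreeMap<String, u64>) -> String {
    let name = "cpaway_bytes_transferred_total";
    let mut out = String::new();
    // writeln! into a String cannot fail, so the unwraps are safe.
    writeln!(out, "# HELP {name} Bytes transferred, by protocol.").unwrap();
    writeln!(out, "# TYPE {name} counter").unwrap();
    for (protocol, bytes) in bytes_by_protocol {
        // Label values must escape backslash, double quote, and newline.
        let label = protocol
            .replace('\\', "\\\\")
            .replace('"', "\\\"")
            .replace('\n', "\\n");
        writeln!(out, "{name}{{protocol=\"{label}\"}} {bytes}").unwrap();
    }
    out
}
```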
Status Reports
During operation, cpaway displays progress bars for each transfer. After completion, a summary report lists total data transferred, time taken, and any errors. Users may request a detailed report via cpaway report --destination /path/to/dest.
Advanced Features
Multi‑Host Transfers
cpaway can perform transfers across multiple hosts simultaneously. By specifying a list of destination URLs, the tool distributes tasks across these hosts, maintaining load balancing. Example:
cpaway --source /data --destination sftp://host1:/data/ sftp://host2:/data/ --recursive
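The article does not say how cpaway balances load across destinations. One common heuristic, sketched here as an assumption, is longest‑processing‑time‑first: sort files by size and give each one to the host with the fewest bytes assigned so far. `HostPlan` and `assign_to_hosts` are hypothetical names.

```rust
/// Files assigned to one destination host (hypothetical type).
#[derive(Debug)]
pub struct HostPlan {
    pub host: String,
    pub files: Vec<String>,
    pub bytes: u64,
}

/// Longest-processing-time-first greedy: visit files from largest to
/// smallest and give each to the host with the fewest bytes so far.
pub fn assign_to_hosts(files: &[(String, u64)], hosts: &[String]) -> Vec<HostPlan> {
    assert!(!hosts.is_empty(), "need at least one destination host");
    let mut plan: Vec<HostPlan> = hosts
        .iter()
        .map(|h| HostPlan { host: h.clone(), files: Vec::new(), bytes: 0 })
        .collect();

    let mut by_size: Vec<&(String, u64)> = files.iter().collect();
    by_size.sort_by(|a, b| b.1.cmp(&a.1)); // largest first

    for (path, size) in by_size {
        // min_by_key returns the first host on ties, keeping the result deterministic.
        let target = plan.iter_mut().min_by_key(|p| p.bytes).unwrap();
        target.files.push(path.clone());
        target.bytes += size;
    }
    plan
}
```

Take four files of 500, 300, 200, and 100 bytes spread over two hosts. This assigns 600 bytes (the 500‑ and 100‑byte files) to the first host and 500 bytes to the second.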
Kubernetes Operator
The project's cpaway-operator provides a Kubernetes Custom Resource Definition (CRD) that orchestrates cpaway transfers within a cluster. The operator watches CpawayTransfer resources and initiates the REST API call to start transfers.
File System Watcher
cpaway can watch a source directory for changes using the inotify API on Linux or FSEvents on macOS. When a new file appears, cpaway automatically starts copying it to the destination. This feature is useful for real‑time backups.
Hybrid Transfer Modes
For heterogeneous networks, cpaway can switch protocols mid‑transfer. For example, it may use SFTP for the first 100 MB of a large file, then switch to HTTP(S) for the remaining data if the SFTP server becomes overloaded.
File Deduplication
By analyzing checksums before transfer, cpaway can detect duplicate files and skip copying them. This behavior reduces unnecessary data traffic and storage consumption.
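Checksum‑based duplicate detection can be sketched with a hash set of digests already known at the destination. The sketch makes three assumptions. First, checksums are already computed (for example as SHA‑256 hex strings). Second, repeated content within one run is also skipped. Third, how a skipped file is materialized at its own destination path is out of scope. `plan_dedup` is a hypothetical name.

```rust
use std::collections::HashSet;

/// Split source files into those to copy and those to skip because their
/// content (by checksum) already exists at the destination or was already
/// queued earlier in this run. Input pairs are (path, checksum).
pub fn plan_dedup<'a>(
    source: &'a [(String, String)],
    destination_checksums: &HashSet<String>,
) -> (Vec<&'a str>, Vec<&'a str>) {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut to_copy = Vec::new();
    let mut skipped = Vec::new();
    for (path, sum) in source {
        // `insert` returns false when this checksum was already queued.
        if destination_checksums.contains(sum) || !seen.insert(sum.as_str()) {
            skipped.push(path.as_str());
        } else {
            to_copy.push(path.as_str());
        }
    }
    (to_copy, skipped)
}
```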
Performance Benchmarking
Test Setup
Benchmarking cpaway involved copying a 10 GB dataset (comprising 10,000 files ranging from 1 kB to 500 MB) from a local Ubuntu machine to an Amazon S3 bucket over a 1 Gbps network. The host machine had an Intel Xeon CPU with 8 cores and 32 GB RAM.
Results
Default cpaway configuration achieved a sustained throughput of 980 Mbps, completing the transfer in 1 h 12 min. When compression (Zstandard, level 3) was enabled, effective throughput (measured on uncompressed data) increased to 1.1 Gbps because less data had to cross the 1 Gbps link. Encryption reduced throughput slightly to 900 Mbps, but the combination of compression and encryption yielded a net bandwidth reduction of 30 % compared to plain SFTP.
Comparison with Rsync
Using the same dataset and network conditions, rsync achieved a throughput of 700 Mbps when running with --whole-file and --compress. cpaway outperformed rsync by 40 % due to its work‑stealing scheduler and adaptive concurrency.
Resource Utilization
CPU usage averaged 30 % during peak transfer periods, with peak memory usage of 250 MB per worker thread. The engine’s dynamic rate limiting prevented memory pressure on the network stack.
Security Considerations
Transport Security
Encryption in transit is used by default where the protocol provides it: SSH/SFTP, HTTPS, and SMB 3.0. Plain HTTP and SMB 2.0 do not encrypt traffic. Users may disable encryption for low‑latency environments, but this is discouraged when sensitive data is transmitted.
Authentication
cpaway supports key‑based SSH authentication, HTTP token authentication, and Basic auth. For SFTP, users should manage private keys securely, and avoid embedding passwords in configuration files.
Key Management
When using the encryption plugin, cpaway requires a symmetric key. Keys should be stored in secret management services such as HashiCorp Vault, AWS KMS, or Azure Key Vault. The tool can read key identifiers from environment variables and fetch the actual key via API calls.
Access Controls
When the REST API server is enabled, it supports Basic authentication and can be configured to enforce HTTPS. The tool logs all API requests with timestamps and source IP addresses for audit purposes.
Audit Logging
cpaway logs all operations to a structured log file. The logs include timestamps, file paths, transfer status, and error messages. Users can rotate logs via logrotate or the built‑in --log-rotate option.
Compliance
cpaway’s encryption and integrity features support compliance with regulations such as GDPR, HIPAA, and PCI‑DSS. Users can customize the encryption plugin to use specific algorithms mandated by their industry.
Limitations and Future Work
Dependency on Underlying OS
While cpaway preserves metadata, the fidelity of certain attributes depends on the target operating system. For example, preserving ACLs on non‑SMB file systems requires additional wrappers, which are currently under development.
File System Specific Features
Support for hard links on Windows via SMB is limited; cpaway currently recreates hard links on the destination only when using SMB 3.0. Future versions aim to add support for NTFS hard links via custom APIs.
Plugin Compatibility
Plugins must be compiled with the same Rust compiler version as cpaway. Because Rust does not guarantee a stable ABI across compiler versions, mismatched builds may cause runtime failures. The project provides a plugin compatibility matrix in the documentation.
Performance Overheads
Checksum verification and compression introduce CPU overhead. In high‑performance scenarios, users may disable checks or use lower compression levels to reduce CPU usage. Benchmarking has shown a trade‑off of up to 10 % CPU usage increase for a 5 % bandwidth reduction with compression.
Security Audits
While cpaway has undergone internal security reviews, third‑party audits are ongoing. The project encourages users to run static analysis and dependency‑auditing tools such as Clippy, cargo-audit (which checks the RustSec advisory database), and Snyk for added assurance.
Future Enhancements
Planned features include:
- Support for NFS and CIFS metadata preservation.
- Dynamic policy‑based transfer routing.
- Integration with additional cloud storage providers beyond Amazon S3, Google Cloud Storage, and Azure Blob Storage.
- Native support for deduplication based on block‑level checksums.
- Further improvements to error handling, including more configurable retry and back‑off strategies.
- Enhanced multi‑tenant security for the REST API with OAuth2 support.
Conclusion
cpaway represents a robust, flexible, and secure solution for modern data transfer and backup needs. Its adaptive concurrency, built‑in integrity verification, and support for a wide range of protocols provide significant advantages over traditional tools. With ongoing development and community contributions, cpaway aims to become a standard tool for secure, high‑performance data replication across diverse environments.