registry-backup
Command line utilities for backup, export, and migration of a Rust private crate registry.
Use cases:
- Backup: retrieve a registry server's files for backup storage
- Export: pull the files so you can host them at another registry server
- Migration: publish downloaded .crate files to a new private registry, including modifying the Cargo.toml manifests of each published crate version to make it compatible with the destination registry
Tools
There are two binaries in the repo:
- registry-backup: for downloading all .crate files hosted by a Cargo registry server
- publish: for publishing the .crate files downloaded by registry-backup to a different registry
registry-backup
registry-backup is a tool to download all of the .crate files hosted by a Cargo registry server.
Example Usage
Specify the registry index either as a local path (--index-path)...
$ git clone https://github.com/rust-lang/crates.io-index.git
$ RUST_LOG=info registry-backup \
--index-path crates.io-index \
--output-path crates.io-crate-files \
--requests-per-second 10
...or as an --index-url instead:
$ RUST_LOG=info registry-backup \
--index-url ssh://git@ssh.shipyard.rs/shipyard-rs/crate-index.git \
--output-path shipyard-rs-crate-files \
--auth-token ${AUTH_TOKEN} # for private registry, need auth
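Whichever way the index is specified, downloading every .crate file comes down to walking the index and expanding its download URL template. The sketch below follows Cargo's documented registry index format (the per-crate file path scheme, the newline-delimited JSON entries, and the dl field of the index's config.json). The function names are hypothetical and this is not taken from registry-backup's source, which may work differently; the HTTP requests and the --requests-per-second throttling are left out.

```python
import json

# Sketch of Cargo's registry index layout (see the Cargo book's
# "Registry Index" chapter). Function names are hypothetical.


def name_prefix(name: str) -> str:
    """Directory prefix for a crate name (casing is preserved)."""
    if len(name) == 1:
        return "1"
    if len(name) == 2:
        return "2"
    if len(name) == 3:
        return f"3/{name[0]}"
    return f"{name[:2]}/{name[2:4]}"


def index_file_path(name: str) -> str:
    """Path of a crate's file relative to the index root (always lowercase)."""
    lower = name.lower()
    return f"{name_prefix(lower)}/{lower}"


def index_entries(text: str):
    """Yield (name, version, sha256 checksum) for each line of an index file."""
    for line in text.splitlines():
        if line.strip():
            entry = json.loads(line)
            yield entry["name"], entry["vers"], entry["cksum"]


def download_url(dl: str, name: str, version: str, cksum: str) -> str:
    """Expand the dl template from the index's config.json."""
    markers = {
        "{crate}": name,
        "{version}": version,
        "{prefix}": name_prefix(name),
        "{lowerprefix}": name_prefix(name.lower()),
        "{sha256-checksum}": cksum,
    }
    if not any(marker in dl for marker in markers):
        # No markers present: Cargo appends /{crate}/{version}/download.
        return f"{dl}/{name}/{version}/download"
    for marker, value in markers.items():
        dl = dl.replace(marker, value)
    return dl
```

For example, with a dl value of https://example.com/dl, the serde 1.0.0 archive would be fetched from https://example.com/dl/serde/1.0.0/download.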
Install
$ cargo install registry-backup --git https://git.shipyard.rs/jstrong/registry-backup.git
Runtime Options
$ ./target/release/registry-backup --help
{{ cli_menu }}
Configuration File
A toml configuration file may be used instead of command line flags. A sample file (config.toml.sample) is included. From the example file:
{{ config_sample }}
Build From Source
$ git clone https://git.shipyard.rs/jstrong/registry-backup.git
$ cd registry-backup
$ just release-build # alternatively, cargo build --bin registry-backup --release
# ./target/release/registry-backup --help
# cp target/release/registry-backup ~/.cargo/bin/
publish
publish is a tool to publish all of the crate versions from a source registry to a second, destination registry.
Usage Overview
publish is different from registry-backup in that it requires several steps, including the use of a Python script.
In general, migrating all of the crate versions to another registry is relatively complex, compared to just downloading the .crate files. Migrating to a new registry involves the following (big picture) steps:
- extracting the order that crate versions were published to the source registry from the git history of the crate index repository
- extracting the source files, including Cargo.toml manifests, from the downloaded .crate files
- modifying the Cargo.toml manifests for each crate version so the crate will be compatible with the destination registry
- publishing the crate versions, in the right order and using the modified Cargo.toml manifests, to the destination registry
Background Context: cargo publish, .crate Files, and Cargo.toml.orig
When you run the cargo publish command to publish a crate version to a registry server, it generates an alternate Cargo.toml manifest based on the contents of the original Cargo.toml in combination with the configured settings with which the command was invoked.
For example, if you had configured a private registry in ~/.cargo/config.toml:
# ~/.cargo/config.toml
[registries.my-private-registry]
index = "ssh://git@ssh.shipyard.rs/my-private-registry/crate-index.git"
And then added a dependency from that registry to a crate's Cargo.toml:
# Cargo.toml
[package]
name = "foo"
publish = ["my-private-registry"]
[dependencies]
bar = { version = "1.0", registry = "my-private-registry" }
...cargo publish would convert the dependency into one with a hard-coded registry-index field that points to the specific index URL that was configured at the time it was invoked:
# cargo publish-generated Cargo.toml
[package]
name = "foo"
publish = ["my-private-registry"]
[dependencies]
bar = { version = "1.0", registry-index = "ssh://git@ssh.shipyard.rs/my-private-registry/crate-index.git" }
cargo publish includes the original Cargo.toml file at the path Cargo.toml.orig in the .crate file (actually a .tar.gz archive).
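Because a .crate file is just a gzipped tarball, pulling Cargo.toml.orig back out needs nothing beyond Python's standard library. A minimal sketch, assuming the usual archive layout in which every file sits under a top-level name-version directory (e.g. foo-0.1.0/); read_cargo_toml_orig is a hypothetical helper, not part of publish.

```python
import io
import tarfile


def read_cargo_toml_orig(crate_bytes: bytes, name: str, version: str) -> str:
    """Return the text of Cargo.toml.orig from the raw bytes of a .crate file."""
    # Assumed layout: all files live under "<name>-<version>/" in the archive.
    member = f"{name}-{version}/Cargo.toml.orig"
    with tarfile.open(fileobj=io.BytesIO(crate_bytes), mode="r:gz") as tar:
        # extractfile raises KeyError if the archive has no such member.
        extracted = tar.extractfile(member)
        if extracted is None:
            raise ValueError(f"{member} is not a regular file")
        return extracted.read().decode("utf-8")
```

Usage, with an illustrative file name: open foo-0.1.0.crate in binary mode and pass f.read() along with "foo" and "0.1.0".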
Since the registry-index entries generated by cargo publish point to the specific URL of the source registry, just publishing the .crate file as-is to the destination registry will not suffice. To resolve this problem, publish takes the Cargo.toml.orig file contained in the .crate file, modifies its dependency entries according to the settings of the destination registry, and publishes the crate version to the destination registry using cargo publish (i.e. it discards the cargo publish-generated Cargo.toml, relying instead on the modified Cargo.toml.orig in combination with runtime settings provided to cargo as env vars).
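The dependency-rewriting step can be sketched on a manifest that has already been parsed into a dict (for example with tomllib, in Python 3.11+); writing the result back out as TOML needs a third-party serializer, since the standard library can only read TOML. retarget_dependencies and its parameters are hypothetical, publish's actual rules may differ, and the older dev_dependencies/build_dependencies spellings are ignored here. The sketch also updates package.publish, because cargo publish refuses to publish to a registry that a publish list does not include.

```python
DEP_TABLES = ("dependencies", "dev-dependencies", "build-dependencies")


def retarget_dependencies(manifest: dict, src_name: str, src_index_url: str, dst_name: str) -> dict:
    """Point dependencies on the source registry at the destination registry (in place)."""
    tables = [manifest.get(key, {}) for key in DEP_TABLES]
    # Platform-specific tables, e.g. [target.'cfg(unix)'.dependencies]
    for platform in manifest.get("target", {}).values():
        tables.extend(platform.get(key, {}) for key in DEP_TABLES)
    for table in tables:
        for dep in table.values():
            if not isinstance(dep, dict):
                continue  # bare version strings refer to the default registry
            if dep.get("registry") == src_name or dep.get("registry-index") == src_index_url:
                dep.pop("registry-index", None)
                dep["registry"] = dst_name
    # cargo publish refuses registries that are missing from package.publish
    package = manifest.get("package", {})
    if isinstance(package.get("publish"), list):
        package["publish"] = [dst_name if r == src_name else r for r in package["publish"]]
    return manifest
```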
The Global Dependency Graph of a Registry and publish-log.csv
Once we have solved how to take a .crate file from the source registry and publish it to the destination registry, there is still the question of the order in which the crate versions should be published. If crate a version 1.2.3 depends on crate b version 2.3.4, then crate b version 2.3.4 needs to have already been published to the registry at the time crate a version 1.2.3 is published; otherwise, it will depend on a crate that does not (yet) exist (in the destination registry, at least). If you try to publish crates using cargo publish without respecting this global dependency graph, it will exit with an error (and doing so would not be a good idea even if it didn't).
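The ordering constraint can be stated as a simple check over a proposed publish order. To keep it short, this sketch assumes each dependency is pinned to one exact (name, version) pair; real dependency requirements are semver ranges, which is part of what makes building the full graph tedious. find_order_violations is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Release = Tuple[str, str]  # (crate name, exact version)


def find_order_violations(
    publish_order: List[Release],
    dependencies: Dict[Release, List[Release]],
) -> List[Tuple[Release, Release]]:
    """Return (release, dependency) pairs where the dependency is not published first."""
    published = set()
    violations = []
    for release in publish_order:
        for dep in dependencies.get(release, []):
            if dep not in published:
                violations.append((release, dep))
        published.add(release)
    return violations
```

With the example from the text, publishing b 2.3.4 before a 1.2.3 yields no violations, while the reverse order reports a 1.2.3 depending on the not-yet-published b 2.3.4.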
Building a dependency graph for the entire registry is certainly possible in theory. In practice, however, it is tedious to do, mainly because it requires mirroring cargo's dependency resolution process just to identify the full set of dependencies that would end up in the Cargo.lock file. That, in turn, requires using cargo (i.e. via the cargo metadata command), which is slow for large registries (only a single cargo metadata command can run at a time due to the use of lock files) and quite involved in terms of parsing the programmatically generated outputs (wow, it is amazing how many different forms crate metadata takes in various cargo/registry contexts!).
To sidestep these complexities, publish relies on a Python script that extracts the order in which crate versions were published to a registry from the git history of the crate index repository.
The tool (script/get-publish-history.py) was based on an open source script that uses the GitPython library to traverse the commit history of a repo. With a few minutes' work, we were able to modify the script to extract the publish order of all the crate versions appearing in the crate index repository. And, as much as we love Rust (and do not share the same passion for Python), porting the code to Rust using the git2 crate looked like quite a tedious project in itself.
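The idea behind reading the publish order out of git history can be shown without GitPython: each line of an index file is a JSON object describing one crate version, so the first commit in which a given (name, vers) pair appears marks when that version was published. This is a simplified, hypothetical sketch, not get-publish-history.py itself; it assumes something else has already collected the contents of the index files touched by each commit, oldest commit first.

```python
import json
from typing import Dict, Iterable, List, Tuple

# (commit id, {index file path: file text after that commit})
Snapshot = Tuple[str, Dict[str, str]]


def publish_order(snapshots: Iterable[Snapshot]) -> List[Tuple[str, str, str]]:
    """Return (commit, name, version) in the order versions first appear."""
    seen = set()
    order = []
    for commit, files in snapshots:
        # Sort paths so versions added in the same commit come out deterministically.
        for path in sorted(files):
            for line in files[path].splitlines():
                if not line.strip():
                    continue
                entry = json.loads(line)
                key = (entry["name"], entry["vers"])
                if key not in seen:  # later edits (e.g. yanking) are not new publishes
                    seen.add(key)
                    order.append((commit, entry["name"], entry["vers"]))
    return order
```

A later edit to an existing line, such as flipping yanked to true, is not counted as a new publish, because that (name, vers) pair has already been seen.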
To generate a .csv file with the order in which crates were published, first clone the crate index repository, e.g.:
$ git clone ssh://git@ssh.shipyard.rs/my-private-registry/crate-index.git
Then run the script (it has two dependencies, GitPython and pandas, both of which can be pip-installed or otherwise acquired using whatever terrible Python package manager you want):
$ python script/get-publish-history.py path/to/crate-index > publish-log.csv
You will need a publish-log.csv generated from the source registry to use publish.
(You might be wondering why we are relying on git history to reconstruct the publishing order. The primary reason is that the crate index metadata (or any other metadata universally available from a crate registry) does not include any information about when each crate version was published.)
Detailed Usage Example
1) Clone the source registry crate index repository:
$ mkdir source-registry
$ git clone <source registry crate index repo url> source-registry/crate-index
2) Use registry-backup to download all the .crate files from the source registry:
$ cargo install registry-backup --git https://git.shipyard.rs/jstrong/registry-backup.git # or build from source
$ RUST_LOG=info registry-backup \
--index-path source-registry/crate-index \
--output-path source-registry/crate-files
3) Use the get-publish-history.py script to extract the crate version publish history:
$ . ../virtualenvs/my-env/bin/activate # or whatever you use
$ pip install GitPython
$ pip install pandas
$ python3 script/get-publish-history.py source-registry/crate-index > source-registry/publish-log.csv
4) Create a configuration file:
# publish-config.toml
# source registry config
[src]
index-dir = "source-registry/crate-index" # <- see step 1
crate-files-dir = "source-registry/crate-files" # <- see step 2
publish-history-csv = "source-registry/publish-log.csv" # <- see step 3
registry-name = "my-old-registry" # <- whatever label the source registry was given in Cargo.toml files
index-url = "https://github.com/my-org/crate-index.git" # <- index url, i.e. same as one provided in ~/.cargo/config.toml
# destination registry config
[dst]
index-url = "ssh://git@ssh.shipyard.rs/my-new-registry/crate-index.git"
registry-name = "my-new-registry" # can be same as old name or a different name
auth-token = "xxx" # auth token for publishing to the destination registry
5) Build publish:
$ cargo build --bin publish --features publish --release
6) Validate your config file (optional):
$ ./target/release/publish --config publish-config.toml --validate
7) Publish to the destination registry using publish:
$ RUST_LOG=info ./target/release/publish --config publish-config.toml
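When publish runs cargo publish against the destination registry, the [dst] settings from step 4 have to reach cargo; the Background Context section notes that this happens through env vars. Cargo reads registry settings from environment variables named CARGO_REGISTRIES_<NAME>_INDEX and CARGO_REGISTRIES_<NAME>_TOKEN, where <NAME> is the registry name uppercased with dashes turned into underscores. The sketch below builds such an environment from the step 4 values; the helper names are hypothetical, and we have not confirmed exactly which variables or flags publish itself passes.

```python
import os
from typing import List, Mapping, Optional


def cargo_registry_env(
    registry_name: str,
    index_url: str,
    auth_token: str,
    base_env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Environment for running cargo against a named registry."""
    # Cargo's env var naming: uppercase, dashes -> underscores.
    key = registry_name.upper().replace("-", "_")
    env = dict(os.environ if base_env is None else base_env)
    env[f"CARGO_REGISTRIES_{key}_INDEX"] = index_url
    env[f"CARGO_REGISTRIES_{key}_TOKEN"] = auth_token
    return env


def publish_command(registry_name: str) -> List[str]:
    """cargo publish invocation targeting the named registry."""
    return ["cargo", "publish", "--registry", registry_name]
```

These could then be passed to subprocess.run(publish_command(name), env=cargo_registry_env(...), cwd=crate_dir, check=True), where crate_dir is a hypothetical directory holding one extracted crate version with its rewritten manifest.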
Expected Runtime
As an example, using publish, it took us about 50 minutes to migrate a registry with 77 crates and 937 versions. Results may vary based on the machine used to run publish as well as the performance of the destination registry server.
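Averaged over that whole run, those figures work out to roughly three seconds per crate version:

```latex
\frac{50 \times 60\ \text{s}}{937\ \text{versions}} = \frac{3000\ \text{s}}{937\ \text{versions}} \approx 3.2\ \text{s per version}
```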
Building publish (Full Example)
$ git clone https://git.shipyard.rs/jstrong/registry-backup.git
$ cd registry-backup
$ just release-build-publish # alternatively, cargo build --bin publish --features publish --release
Note: --release really is quite a bit faster, at least for larger registries.
Configuration File
Annotated example configuration file:
{{ publish_config_sample }}
Runtime Options
$ ./target/release/publish --help
{{ publish_cli_menu }}
Running Tests
$ just test # alternatively, cargo test
Justfile
The repository includes a justfile with functionality for building, testing, etc.
Included commands:
$ just --list
{{ just_commands }}
The commands that mirror cargo commands (e.g. just test) are included for convenience, so that various options (e.g. RUSTFLAGS='-C target-cpu=native') can be included without typing them out each time.
Generating README.md
This file is generated using a template (doc/README.tera.md) rendered using updated outputs of the CLI menu, config sample, and other values.
This version of README.md was generated at {{ generation_time }} based on git commit {{ git_commit }}.
To (re-)generate the README.md file, use the justfile command:
$ just generate-readme
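The .tera extension suggests the template is rendered with Tera, a Rust template engine. As a rough illustration of the simplest part of that rendering, the hypothetical sketch below fills double-brace placeholders (such as git_commit or generation_time) from a dict and leaves unknown ones untouched; Tera itself supports much more, including filters, conditionals, and loops.

```python
import re

# Matches a double-brace placeholder around a simple identifier.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template: str, values: dict) -> str:
    """Substitute known placeholders; leave unknown ones as they are."""

    def substitute(match):
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)

    return PLACEHOLDER.sub(substitute, template)
```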