Command line utilities for backup, export, and migration of a Rust private crate registry.
A command line utility for downloading all .crate files hosted by a Cargo registry server.
Use cases:
- **Backup:** retrieve a registry server's files for backup storage
- **Export:** pull the files so you can host them at another registry server
- **Migration:** publish downloaded .crate files to a new private registry, including modifying the `Cargo.toml` manifests of each published crate version to make it compatible with the destination registry
## Tools
There are two binaries in the repo:
- `registry-backup`: for downloading all .crate files hosted by a Cargo registry server
- `publish`: for publishing the .crate files downloaded by `registry-backup` to a different registry
## `registry-backup`
`registry-backup` is a tool to download all of the .crate files hosted by a Cargo registry server.
### Example Usage
## Example Usage:
Specify the registry index either as a local path (`--index-path`)...
`publish` is a tool to publish all of the crate versions from a *source registry* to second *destination registry*.
### Usage Overview
`publish` is different from `registry-backup` in that in requires several steps, including the use of a Python script.
In general, migrating all of the crate versions to another registry is relatively complex, compared to just downloading the .crate files. Migrating to a new registry involves the following (big picture) steps:
1) extracting the order that crate versions were published to the source registry from the git history of the crate index repository
2) extracting the source files, including `Cargo.toml` manifests, from the downloaded `.crate` files
3) modifying the `Cargo.toml` manifests for each crate version so the crate will be compatible with the destination registry
4) publishing the crate versions, in the right order and using the modified `Cargo.toml` manifests, to the destination registry
### Background Context: `cargo publish`, `.crate` Files, and `Cargo.toml.orig`
When you run the `cargo publish` command to publish a crate version to a registry server, it generates an alternate `Cargo.toml` manifest based on the contents of the original `Cargo.toml` in combination with the configured settings with which the command was invoked.
For example, if you had configured a private registry in `~/.cargo/config.toml`:
```toml
# ~/.cargo/config.toml
[registries.my-private-registry]
index = "ssh://git@ssh.shipyard.rs/my-private-registry/crate-index.git"
```
And then added a dependency from that registry in a `Cargo.toml` for a crate:
```toml
# Cargo.toml
[package]
name = "foo"
publish = ["my-private-registry"]
[dependencies]
bar = { version = "1.0", registry = "my-private-registry" }
```
...`cargo publish` would convert the dependency into one with a hard-coded `registry-index` field that points to the specific index URL that was configured at the time it was invoked:
```
# cargo publish-generated Cargo.toml
[package]
name = "foo"
publish = ["my-private-registry"]
[dependencies]
bar = { version = "1.0", registry-index = "ssh://git@ssh.shipyard.rs/my-private-registry/crate-index.git" }
```
`cargo publish` includes the original `Cargo.toml` file at the path `Cargo.toml.orig` in the `.crate` file (actually a `.tar.gz` archive).
Since the `registry-index` entries generated by `cargo publish` point to the specific URL of the source registry, just publishing the `.crate` file as is to the destination registry will not suffice. To resolve this problem, `publish` uses the `Cargo.toml.orig` file contained in the `.crate` file, modifies the dependency entries according to the settings of the destination registry, and publishes them to the destination registry using `cargo publish` (i.e. discard the `cargo publish`-generated `Cargo.toml`, relying instead on the modified `Cargo.toml.orig` in combination with runtime settings provided as env vars to `cargo`).
### The Global Dependency Graph of a Registry and `publish-log.csv`
Once we have solved how to take a `.crate` file from the source registry and publish it to the destination registry, there is still the issue of which order the crate versions should be published. If crate `a` version 1.2.3 depends on crate `b` version 2.3.4, then crate `b` version 2.3.4 needs to have already been published to the registry at the time crate `a` version 1.2.3 is published, otherwise it will depend on a crate that does not (yet) exist (in the destination registry, at least). If you try to publish crates without respecting this global dependency graph using `cargo publish`, it will exit with an error, and it's not a good idea otherwise, either.
Building a dependency graph for the entire registry is certainly possible, theoretically. However, in practice it is tedious to do, mainly because it requires mirroring `cargo`'s dependency resolution process, just to be able to identify the full set of dependencies that would end up in the `Cargo.lock` file. That, in turn, requires using `cargo` (i.e. via the `cargo metadata` command), which is slow for large registries (only a single `cargo metadata` command can run at a time due to the use of lock files), and quite involved in terms of parsing the programmatically-generated outputs (wow it is amazing how many different forms crate metadata is represented in various `cargo`/registry contexts!).
To shortcut these complexities, `publish` relies on the use of a Python script to extract the order in which crate versions were published to a registry using the git history of the crate index repository.
The tool (`script/get-publish-history.py`) was based on an open source script that utilizes the `GitPython` library to traverse the commit history of a repo. In a few minutes work, we were able to modify the script to extract the publish order of all the crate versions appearing in the crate index repository. And, as much as we love Rust (and do not share the same passion for Python), porting the code to Rust using the `git2` crate appeared like quite a tedious project itself.
To generate a `.csv` file with the order in which crates were published, first clone the crate index repository, e.g.:
Then run the script (it has two dependencies `GitPython` and `pandas`, both of which can be `pip install`ed or otherwise acquired using whatever terrible Python package manager you want):
You will need a `publish-log.csv` generated from the source registry to use `publish`.
(You might be wondering why we are relying on git history to reconstruct the publishing order. The primary reason is the crate index metadata (or any other metadata universally available from a crate registry) does not include any information about when each crate version was published.)
### Detailed Usage Example
##### 1) Clone the source registry crate index repository:
As an example, using `publish`, it took us about 50 minutes to migrate a registry with 77 crates and 937 versions. Results may vary based on the machine used to run `publish` as well as the performance of the destination registry server.
Command line utilities for backup, export, and migration of a Rust private crate registry.
A command line utility for downloading all .crate files hosted by a Cargo registry server.
Use cases:
- **Backup:** retrieve a registry server's files for backup storage
- **Export:** pull the files so you can host them at another registry server
- **Migration:** publish downloaded .crate files to a new private registry, including modifying the `Cargo.toml` manifests of each published crate version to make it compatible with the destination registry
## Tools
There are two binaries in the repo:
- `registry-backup`: for downloading all .crate files hosted by a Cargo registry server
- `publish`: for publishing the .crate files downloaded by `registry-backup` to a different registry
## `registry-backup`
`registry-backup` is a tool to download all of the .crate files hosted by a Cargo registry server.
### Example Usage
## Example Usage:
Specify the registry index either as a local path (`--index-path`)...
`publish` is a tool to publish all of the crate versions from a *source registry* to second *destination registry*.
### Usage Overview
`publish` is different from `registry-backup` in that in requires several steps, including the use of a Python script.
In general, migrating all of the crate versions to another registry is relatively complex, compared to just downloading the .crate files. Migrating to a new registry involves the following (big picture) steps:
1) extracting the order that crate versions were published to the source registry from the git history of the crate index repository
2) extracting the source files, including `Cargo.toml` manifests, from the downloaded `.crate` files
3) modifying the `Cargo.toml` manifests for each crate version so the crate will be compatible with the destination registry
4) publishing the crate versions, in the right order and using the modified `Cargo.toml` manifests, to the destination registry
### Background Context: `cargo publish`, `.crate` Files, and `Cargo.toml.orig`
When you run the `cargo publish` command to publish a crate version to a registry server, it generates an alternate `Cargo.toml` manifest based on the contents of the original `Cargo.toml` in combination with the configured settings with which the command was invoked.
For example, if you had configured a private registry in `~/.cargo/config.toml`:
```toml
# ~/.cargo/config.toml
[registries.my-private-registry]
index = "ssh://git@ssh.shipyard.rs/my-private-registry/crate-index.git"
```
And then added a dependency from that registry in a `Cargo.toml` for a crate:
```toml
# Cargo.toml
[package]
name = "foo"
publish = ["my-private-registry"]
[dependencies]
bar = { version = "1.0", registry = "my-private-registry" }
```
...`cargo publish` would convert the dependency into one with a hard-coded `registry-index` field that points to the specific index URL that was configured at the time it was invoked:
```
# cargo publish-generated Cargo.toml
[package]
name = "foo"
publish = ["my-private-registry"]
[dependencies]
bar = { version = "1.0", registry-index = "ssh://git@ssh.shipyard.rs/my-private-registry/crate-index.git" }
```
`cargo publish` includes the original `Cargo.toml` file at the path `Cargo.toml.orig` in the `.crate` file (actually a `.tar.gz` archive).
Since the `registry-index` entries generated by `cargo publish` point to the specific URL of the source registry, just publishing the `.crate` file as is to the destination registry will not suffice. To resolve this problem, `publish` uses the `Cargo.toml.orig` file contained in the `.crate` file, modifies the dependency entries according to the settings of the destination registry, and publishes them to the destination registry using `cargo publish` (i.e. discard the `cargo publish`-generated `Cargo.toml`, relying instead on the modified `Cargo.toml.orig` in combination with runtime settings provided as env vars to `cargo`).
### The Global Dependency Graph of a Registry and `publish-log.csv`
Once we have solved how to take a `.crate` file from the source registry and publish it to the destination registry, there is still the issue of which order the crate versions should be published. If crate `a` version 1.2.3 depends on crate `b` version 2.3.4, then crate `b` version 2.3.4 needs to have already been published to the registry at the time crate `a` version 1.2.3 is published, otherwise it will depend on a crate that does not (yet) exist (in the destination registry, at least). If you try to publish crates without respecting this global dependency graph using `cargo publish`, it will exit with an error, and it's not a good idea otherwise, either.
Building a dependency graph for the entire registry is certainly possible, theoretically. However, in practice it is tedious to do, mainly because it requires mirroring `cargo`'s dependency resolution process, just to be able to identify the full set of dependencies that would end up in the `Cargo.lock` file. That, in turn, requires using `cargo` (i.e. via the `cargo metadata` command), which is slow for large registries (only a single `cargo metadata` command can run at a time due to the use of lock files), and quite involved in terms of parsing the programmatically-generated outputs (wow it is amazing how many different forms crate metadata is represented in various `cargo`/registry contexts!).
To shortcut these complexities, `publish` relies on the use of a Python script to extract the order in which crate versions were published to a registry using the git history of the crate index repository.
The tool (`script/get-publish-history.py`) was based on an open source script that utilizes the `GitPython` library to traverse the commit history of a repo. In a few minutes work, we were able to modify the script to extract the publish order of all the crate versions appearing in the crate index repository. And, as much as we love Rust (and do not share the same passion for Python), porting the code to Rust using the `git2` crate appeared like quite a tedious project itself.
To generate a `.csv` file with the order in which crates were published, first clone the crate index repository, e.g.:
Then run the script (it has two dependencies `GitPython` and `pandas`, both of which can be `pip install`ed or otherwise acquired using whatever terrible Python package manager you want):
You will need a `publish-log.csv` generated from the source registry to use `publish`.
(You might be wondering why we are relying on git history to reconstruct the publishing order. The primary reason is the crate index metadata (or any other metadata universally available from a crate registry) does not include any information about when each crate version was published.)
### Detailed Usage Example
##### 1) Clone the source registry crate index repository:
As an example, using `publish`, it took us about 50 minutes to migrate a registry with 77 crates and 937 versions. Results may vary based on the machine used to run `publish` as well as the performance of the destination registry server.