Skip to content

[WIP] feat: unstable SHA256 support#1206

Draft
weihanglo wants to merge 10 commits intorust-lang:masterfrom
weihanglo:sha256-git2
Draft

[WIP] feat: unstable SHA256 support#1206
weihanglo wants to merge 10 commits intorust-lang:masterfrom
weihanglo:sha256-git2

Conversation

@weihanglo
Copy link
Copy Markdown
Member

@weihanglo weihanglo commented Jan 12, 2026

This adds an unstable-sha256 Cargo feature, as a follow-up of #1201.
Also adds some smoke tests for affected operations/types.

Part of #1090

Insta-stable

  • NEW Index::with_object_format to create with different format
  • NEW Oid::object_format to access underlying oid type

Behind unstable-sha256

  • NEW ObjectFormat::Sha256 enum variant
  • NEW RepositoryInitOptions::object_format() method to set hash algo
  • NEW Remote::object_format method to get hash algo on a remote
  • NEW Oid::raw_bytes to return the full underlying bytes
  • CHANGED Diff::from_buffer to accept an extra object format argument
  • CHANGED Index::open to accept an extra object format argument
  • CHANGED Indexer::new to accept an extra object format argument
  • CHANGED Oid::from_str to accept an extra object format argument
  • CHANGED Oid::hash_{object,file} to accept an extra object format argument
  • CHANGED Oid::as_bytes to return the logic bytes (SHA1 -> 20 bytes; SHA256 -> 32 bytes)
  • CHANGED impl Hash for Oid to hash git_oid->kind
  • REMOVED Index::new to avoid misuse.
  • REMOVED impl std::FromStr for Oid to avoid misuse
  • REMOVED impl std::FromStr for Oid to avoid misuse

libgit2 1.9 compatibility changes

This PR also includes fixes for libgit2 1.9 breaking changes:

  • Renamed build macros (GIT_SHA1_COLLISIONDETECT -> GIT_SHA1_BUILTIN, HTTPS backend macros)
  • Added refdb_type field to git_repository_init_options
  • Updating sha256 bindings accordingly

See libgit2/libgit2#6994 and libgit2/libgit2#7102 for upstream changes.

These changes can be split out if needed.

Comment thread src/oid.rs Outdated
Comment on lines 183 to 189
/// View this OID as a byte-slice.
///
/// * 20 bytes in length if the feature `unstable-sha256` is not enabled.
/// * 32 bytes in length if the feature `unstable-sha256` is enabled.
pub fn as_bytes(&self) -> &[u8] {
&self.raw.id
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should return 20 bytes if unstable-sha256 is enabled but the object format is SHA-1.

One reason for this is that crate features should be additive; turning on the feature shouldn't unnecessarily affect how the crate behaves with already-supported repositories.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for calling this out. That's a fair point, but I'm not sure "feature additivity" is the core issue here.

I think the real question is what Oid::as_bytes() is supposed to mean in this crate: the logical object ID bytes, or the raw backing storage from libgit2.

I'm leaning toward the logical interpretation. That seems like the less surprising API for callers, as you mentioned.

The reason I wasn’t fully convinced yet is that this type has historically been a fairly thin wrapper over libgit2, so I wanted to check whether anyone might rely on the raw representation or the backing buffer. Or maybe we should just mark this with a big ‼️ in the changelog, so we can transit to your proposal. Do you know any use of this functions in the wild?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is the option to have Oid::as_bytes() give you the logical representation and introduce a new Oid::raw_bytes() for the (probably few) users that actually want the entire buffer.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed the as_bytes part and also added the raw_bytes method. Thanks for the feedback, folks!

* GIT_SHA1_COLLISIONDETECT -> GIT_SHA1_BUILTIN
* GIT_WINHTTP -> GIT_HTTPS_WINHTTP
* GIT_SECURE_TRANSPORT -> GIT_HTTPS_SECURETRANSPORT
* GIT_OPENSSL -> GIT_HTTPS_OPENSSL

See <libgit2/libgit2#6994>
* Add `GIT_REPOSITORY_INIT_RELATIVE_GITLINK`
  at libgit2/libgit2@bc737620d
* Remove `GIT_REPOSITORY_INIT_NO_DOTGIT_DIR`
  at libgit2/libgit2@ca2a241e4
* Remove `GIT_OBJECT_OFS_DELTA`
* Remove `GIT_OBJECT_REF_DELTA`

See libgit2/libgit2@23da3a8f3
libgit2 simplified SHA256 API to use `_ext` suffixes instead of
changing base function signatures. This avoids breaking changes
for users who don't need SHA256 support.

Updated bindings:

- git_oid_from_raw
- git_oid_from_prefix
- git_oid_from_string
- git_diff_from_buffer_ext
- git_index_new_ext
- git_index_open_ext
- git_odb_new_ext
- git_odb_open_ext
- git_repository_new_ext

See libgit2/libgit2@56e2a85643f (libgit2/libgit2#6975)
This adds an `unstable-sha256` Cargo feature,
as a follow-up of rust-lang#1201

Also adds some smoke tests for affected operations/types.

## Insta-stable

- **NEW** `Index::with_object_format` to create with different format
- **NEW** `Oid::object_format` to access underlying oid type

## Behind `unstable-sha256`

- **NEW** `ObjectFormat::Sha256` enum variant
- **NEW** `RepositoryInitOptions::object_format()` method to set hash algo
- **NEW** `Remote::object_format` method to get hash algo on a remote
- **NEW** `Oid::raw_bytes` to return the full underlying bytes
- **CHANGED** `Diff::from_buffer` to accept an extra object format argument
- **CHANGED** `Index::open` to accept an extra object format argument
- **CHANGED** `Indexer::new` to accept an extra object format argument
- **CHANGED** `Oid::from_str` to accept an extra object format argument
- **CHANGED** `Oid::hash_{object,file}` to accept an extra object format argument
- **CHANGED** `Oid::as_bytes` to return the logic bytes (SHA1 -> 20 bytes; SHA256 -> 32 bytes)
- **CHANGED** `impl Hash for Oid` to hash `git_oid->kind`
- **REMOVED** `Index::new` to avoid misuse.
- **REMOVED** `impl std::FromStr for Oid` to avoid misuse
This also run systest first,
so we can dicover bindding mismatch before
@lorenzleutgeb
Copy link
Copy Markdown

lorenzleutgeb commented Apr 25, 2026

I believe to have found something that you missed in this PR. Note that git2::util::zeroed_raw_oid does not take any arguments:

git2-rs/src/util.rs

Lines 271 to 275 in e6919b7

/// Creates a zeroed git_oid structure.
#[inline]
pub(crate) fn zeroed_raw_oid() -> raw::git_oid {
unsafe { std::mem::zeroed() }
}

I believe that this means that the kind: c_uchar (where c_uchar is defined as u8) member of the returned libgit2_sys::git_oid will have the value 0u8.

The function git2::util::zeroed_raw_oid is called by git2::Oid::zero, which also takes no arguments. If the function git2::Oid::object_format is called on the result of git2::Oid::zero, the call

unsafe { Binding::from_raw(self.raw.kind as raw::git_oid_t) }

will hit the panic! at

_ => panic!("Unknown git oid type"),

This is because 0u8 as u32 (which we got from std::mem::zeroed) is neither GIT_OID_SHA1 = 1, nor GIT_OID_SHA256 = 2.

I discovered this experimenting in code that is downstream of git2, building against this PR. In one of my tests I did essentially git2::Oid::zero().object_format().

Then I started to think about how much unsafe really is necessary here. Maybe zeroed_raw_oid could look like

#[inline]
pub(crate) fn zeroed_raw_oid(format: ObjectFormat) -> raw::git_oid {
  raw::git_oid {
    kind: format.raw(),
    id: [0; raw::GIT_OID_MAX_SIZE],
  }
}

In fact one might even ask why not to have

impl Oid {
  pub const ZERO_SHA1: Self = Self {
    raw: raw::git_oid {
      kind: raw::GIT_OID_SHA1,
      id: [0; raw::GIT_OID_MAX_SIZE],
    }
  };
  
  pub const ZERO_SHA256: Self = Self {
    raw: raw::git_oid {
      kind: raw::GIT_OID_SHA256,
      id: [0; raw::GIT_OID_MAX_SIZE],
    }
  };
  
  // NOTE: Maybe call this `zero_with_object_format` to not break `zero`.
  pub fn zero(format: ObjectFormat) -> Self {
    // TODO: Match on `format` and return `Oid::ZERO_SHA1` or `Oid::ZERO_SHA256`.
  }
}

This way, users that know at compile time which kind of zero they want can just use the constant, while those that decide at run time would call zero. And there is no unsafe involved.

If you do not want git2::Oid::zero to take an argument for backwards compatibility, then you could have it return a SHA-1 object identifier always, and mark it deprecated in favor of these constants and a new zero_with_object_format (if I have not missed a reason why the constants cannot be used).

For oid.rs I also had the gut feeling that some unsafe could be removed, but I do not have any constructive comments here, sorry.

@lorenzleutgeb
Copy link
Copy Markdown

lorenzleutgeb commented Apr 25, 2026

Another comment regarding FromStr: I understand you opted to remove it to "avoid misuse", but I would argue that having FromStr actually is quite convenient. Instead of removing it, you could consider implementing FromStr in a way that it checks the length of the argument and then calls into the more specialized function that takes a string and an object format. The caller can then still inspect the object format of the result. You leave all options open for your users, and keep backwards compatibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants