deps(deps): bump datasets from 4.4.2 to 4.7.0 #47

Closed
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/datasets-4.7.0

@dependabot dependabot bot commented on behalf of github Mar 16, 2026

Bumps datasets from 4.4.2 to 4.7.0.

Release notes

Sourced from datasets's releases.

4.7.0

Datasets Features

  • Add Json() type by @​lhoestq in huggingface/datasets#8027
    • JSON Lines files that contain arbitrary JSON objects, such as tool calling datasets, are now supported. When a field or subfield contains mixed types (e.g. a mix of str/int/float/dict/list, or dictionaries with arbitrary keys), the Json() type is used to store data that would normally not be supported in Arrow/Parquet
    • Use the Json() type in Features() for any dataset; it is supported in any function that accepts features=, like load_dataset(), .map(), .cast(), .from_dict(), .from_list()
    • Use on_mixed_types="use_json" to automatically set the Json() type on mixed types in .from_dict(), .from_list() and .map()

Examples:

You can use on_mixed_types="use_json" or specify features= with a [Json] type:

>>> ds = Dataset.from_dict({"a": [0, "foo", {"subfield": "bar"}]})
Traceback (most recent call last):
  ...
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not convert 'foo' with type str: tried to convert to int64
>>> features = Features({"a": Json()})
>>> ds = Dataset.from_dict({"a": [0, "foo", {"subfield": "bar"}]}, features=features)
>>> ds.features
{'a': Json()}
>>> list(ds["a"])
[0, "foo", {"subfield": "bar"}]
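
The ArrowInvalid error above comes from a column whose values do not share a single type. The following is a minimal pure-Python sketch of that idea, not the library's implementation: the helpers `has_mixed_types`, `encode_column`, and `decode_column` are hypothetical names, and serializing each value to a JSON string is only an assumed way to illustrate how mixed values could be stored in one uniformly typed column.

```python
import json


def has_mixed_types(values):
    """Return True when the non-null values do not all share one Python type."""
    kinds = {type(v) for v in values if v is not None}
    return len(kinds) > 1


def encode_column(values):
    """Hypothetical fallback: store every value as a JSON string when types are mixed."""
    if has_mixed_types(values):
        return [json.dumps(v) for v in values]
    return list(values)


def decode_column(encoded):
    """Turn the JSON strings back into the original Python objects."""
    return [json.loads(s) for s in encoded]


column = [0, "foo", {"subfield": "bar"}]
encoded = encode_column(column)   # every entry is now a str
decoded = decode_column(encoded)  # round-trips to the original values
```

A column of plain integers passes through `encode_column` unchanged, so the fallback in this sketch only applies where a single type cannot be inferred.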

This is also useful for lists of dictionaries with arbitrary keys and values, to avoid filling missing fields with None:

>>> ds = Dataset.from_dict({"a": [[{"b": 0}, {"c": 0}]]})
>>> ds.features
{'a': List({'b': Value('int64'), 'c': Value('int64')})}
>>> list(ds["a"])
[[{'b': 0, 'c': None}, {'b': None, 'c': 0}]]  # missing fields are filled with None
>>> features = Features({"a": List(Json())})
>>> ds = Dataset.from_dict({"a": [[{"b": 0}, {"c": 0}]]}, features=features)
>>> ds.features
{'a': List(Json())}
>>> list(ds["a"])
[[{'b': 0}, {'c': 0}]]  # OK
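
The None values in the first output appear because a struct type has a fixed set of fields, so each dictionary is padded to the union of all keys. A minimal pure-Python sketch of that padding, assuming a hypothetical helper `fill_missing_fields` (this illustrates the observed behavior and is not the library's code):

```python
def fill_missing_fields(records):
    """Give every dict the union of all keys, filling the gaps with None."""
    keys = []
    for record in records:
        for key in record:
            if key not in keys:
                keys.append(key)  # keep first-seen key order
    return [{key: record.get(key) for key in keys} for record in records]


rows = [{"b": 0}, {"c": 0}]
filled = fill_missing_fields(rows)
```

With List(Json()), as in the second half of the example above, no such padding takes place and each dictionary keeps only its own keys.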

Another example with tool calling data and the on_mixed_types="use_json" argument (useful to not have to specify features= manually):

>>> messages = [
...     {"role": "user", "content": "Turn on the living room lights and play my electronic music playlist."},
...     {"role": "assistant", "tool_calls": [
...         {"type": "function", "function": {

... (truncated)

Commits
  • ac9c452 release: 4.7.0 (#8058)
  • bd4fb05 Limit dataset listing to first 20 entries in readme (#8057)
  • 4de29bf Fix unstable tokenizer fingerprinting (enables map cache reuse) (#7982)
  • fdd8a65 fix: handle nested null types in feature alignment for multi-proc map (#8047)
  • 0751557 fix(iterable_dataset): preserve features when chaining filter() on typed Iter...
  • 1bd0a5c Don't extract bad files (#8056)
  • 6ef54e7 Fix silent data loss in push_to_hub when num_proc > num_shards (#8044)
  • 38511fc Use num_examples instead of len(self) for iterable_dataset's SplitInfo (#8041)
  • c410be5 Fix non-deterministic by sorting metadata extensions (#8034) (#8039)
  • 70f7474 Fix typos in iterable_dataset.py (#8049)
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [datasets](https://github.com/huggingface/datasets) from 4.4.2 to 4.7.0.
- [Release notes](https://github.com/huggingface/datasets/releases)
- [Commits](huggingface/datasets@4.4.2...4.7.0)

---
updated-dependencies:
- dependency-name: datasets
  dependency-version: 4.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot bot added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels Mar 16, 2026

dependabot bot commented on behalf of github Mar 23, 2026

Superseded by #49.

@dependabot dependabot bot closed this Mar 23, 2026
@dependabot dependabot bot deleted the dependabot/pip/datasets-4.7.0 branch March 23, 2026 11:06