-
Notifications
You must be signed in to change notification settings - Fork 412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add format_as
, push_to_huggingface
, & from_huggingface
in FeedbackDataset
#2949
Add format_as
, push_to_huggingface
, & from_huggingface
in FeedbackDataset
#2949
Conversation
e.g. `hf_hub_download` cannot be mocked if imported as a top-level import
It seems that the mock for `load_dataset` is not working fine, since it's being ignored for some reason
Here's an example on how to use the implemented features: import argilla as rg
fds = rg.FeedbackDataset(fields=[...], questions=[...])
fds.add_records([...])
fds.push_to_huggingface(repo_id="argilla/feedback-dataset", private=True, ...) # args and **kwargs correspond to the original `push_to_hub` method from 🤗HuggingFace import argilla as rg
fds = rg.FeedbackDataset.from_huggingface(repo_id="argilla/feedback-dataset", ...) # args and **kwargs correspond to the original `load_dataset` function from 🤗HuggingFace |
I wanted to clarify what do we do when the dataset to be pushed to the HuggingFace Hub has no records, should we still upload the configuration and the dataset card? Or should we show a warning that the dataset cannot be created since there are no records? cc @frascuchon @dvsrepo |
Thats
That's a very good question. On the one hand, I think is great to be able to push and read just the config. On the other hand, what happens when people use the load_dataset (instead of our own from_huggingface)? I wonder if they'll get a weird error. If the error is not weird or there's no error, then I'd say we are fine using warnings on our side as discussed below. From our side we can always warn users both when pushing and doing from_huggingface. We could even raise an exception first time and recommend the user to use a param (e.g., force_whatever) to push without records (maybe a dirty option) All in all, the thing I find interesting is to be able to push and read configs so I let you think of we can cover this with these methods or there's a complete different workflow we can provide in the future. |
Thanks for the quick response! Then I guess for the moment we're safer with an exception being raised in case there are no records, and we can look into it in more detail when we tackle the |
Sounds good! |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## feat/2615-instructions-task #2949 +/- ##
===============================================================
- Coverage 92.86% 92.83% -0.04%
===============================================================
Files 204 204
Lines 10571 10657 +86
===============================================================
+ Hits 9817 9893 +76
- Misses 754 764 +10
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice!
Just some code refactor suggestions. Feel free to apply them or not.
Co-authored-by: Francisco Aranda <[email protected]>
Description
This PR adds some methods to the
rg.FeedbackDataset
as follows:format_as
to format the records of the existing dataset as 🤗datasets
format (i.e.datasets.Dataset
) to be extended to support other formats likepandas
ornumpy
push_to_huggingface
to push the existing records and dataset configuration to the HuggingFace Hub (including an auto-generated dataset card! 🎉)from_huggingface
is a classmethod to load arg.FeedbackDataset
from the HuggingFace Hub, pulling the dataset itself (records and annotations) and the configuration (guidelines, fields, and questions)Closes #2765
Type of change
How Has This Been Tested
push_to_huggingface
andfrom_huggingface
format_as("datasets")
Checklist