Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: add HuggingFaceDatasetMixIn under integrations #3326

Merged
merged 23 commits into from
Jul 4, 2023

Conversation

alvarobartt
Copy link
Member

@alvarobartt alvarobartt commented Jul 4, 2023

Description

To avoid the increasing size of argilla/client/feedback/dataset.py, I've decided to detach the integrations from the FeedbackDataset class and create a MixIn class to contain all those methods specific to the integrations within the FeedbackDataset, in this case for 🤗 Datasets.

Besides that, I've also renamed the FeedbackDatasetConfig to DatasetConfig, and included some methods to dump a YAML file from now on, instead of a JSON file, since the YAML file is more readable. So now we upload argilla.yaml when pushing a FeedbackDataset to the HuggingFace Hub via push_to_huggingface.

Type of change

  • Refactor (change restructuring the codebase without changing functionality)
  • Improvement (change adding some improvement to an existing functionality)

How Has This Been Tested

  • Re-run unit tests
  • Catch DeprecationWarnings

Checklist

  • I added relevant documentation
  • follows the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I filled out the contributor form (see text above)
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

@alvarobartt alvarobartt requested a review from gabrielmbmb July 4, 2023 09:02
@codecov
Copy link

codecov bot commented Jul 4, 2023

Codecov Report

Patch coverage: 88.43% and project coverage change: +0.07 🎉

Comparison is base (384a69e) 90.27% compared to head (d9dfd20) 90.35%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3326      +/-   ##
===========================================
+ Coverage    90.27%   90.35%   +0.07%     
===========================================
  Files          234      236       +2     
  Lines        12642    12695      +53     
===========================================
+ Hits         11413    11471      +58     
+ Misses        1229     1224       -5     
Impacted Files Coverage Δ
...ack/integrations/huggingface/card/_dataset_card.py 100.00% <ø> (ø)
.../feedback/integrations/huggingface/card/_parser.py 100.00% <ø> (ø)
...lient/feedback/integrations/huggingface/dataset.py 85.36% <85.36%> (ø)
src/argilla/client/feedback/config.py 92.59% <91.66%> (-7.41%) ⬇️
src/argilla/client/feedback/dataset.py 80.32% <100.00%> (-1.90%) ⬇️
...ient/feedback/integrations/huggingface/__init__.py 100.00% <100.00%> (ø)
...feedback/integrations/huggingface/card/__init__.py 100.00% <100.00%> (ø)
src/argilla/client/feedback/schemas.py 97.29% <100.00%> (+0.19%) ⬆️

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@alvarobartt alvarobartt added this to the v1.13 milestone Jul 4, 2023
@alvarobartt alvarobartt merged commit 851c14f into develop Jul 4, 2023
@alvarobartt alvarobartt deleted the refactor/huggingface-mixin branch July 4, 2023 12:10
leiyre pushed a commit that referenced this pull request Jul 4, 2023
* develop:
  refactor: add `HuggingFaceDatasetMixIn` under `integrations` (#3326)
  feat: add list user workspaces endpoint (#3308)
  ci: Stop linking issues to team work project
  chore: add missing `greenlet` dependency in `server` extra (#3330)
  fix: unit test failing if not local db (#3307)
  ci: Optimize build + test pipeline (#3300)
  refactor: simplify old bulk endpoints to avoid create datasets if does not exists (#3306)
  📝 Update doc site link (#3299)
  🚑 Fix dependencies (#3302)
  feat: migrate to async SQLAlchemy engine (#3162)
leiyre pushed a commit that referenced this pull request Jul 5, 2023
* develop:
  fix: return all workspaces in system for owner users (#3343)
  fix: `rg.init` with argilla user using quickstart images raise an unexpected error (#3341)
  feat: add `Suggestion` endpoints (#3304)
  feat: add `list_user_workspaces` and `User.workspaces` property (#3334)
  refactor: add `HuggingFaceDatasetMixIn` under `integrations` (#3326)
  feat: add list user workspaces endpoint (#3308)
  ci: Stop linking issues to team work project
  chore: add missing `greenlet` dependency in `server` extra (#3330)
  docs: update developer docs (#3314)
  Docs/3312 docs 112 is not building correctly (#3313)
  fix: unit test failing if not local db (#3307)
  ci: Optimize build + test pipeline (#3300)
  refactor: simplify old bulk endpoints to avoid create datasets if does not exists (#3306)
gabrielmbmb added a commit that referenced this pull request Jul 24, 2023
…k.schemas` (#3427)

# Description

This PR starts off with the refactoring effort to make sure that
everything's more maintainable and scalable.

So on, this PR refactors the `argilla/feedback/schemas.py` to be split
in different files in a more organised way as
`argilla/feedback/schemas/*.py` so that we have `fields.py`,
`questions.py` and `records.py` to contain the main
`pydantic.BaseModel`s for those. Also all the docstrings have been
rewritten from scratch to be clearer and provide more information.

Additionally, this PR also adds the `ArgillaDatasetMixin` to detach the
Argilla-related functionality from the `FeedbackDataset` itself, as we
recently did for the HuggingFace Hub integration (i.e. #3326)

**Type of change**

- [X] Refactor (change restructuring the codebase without changing
functionality)
- [X] Improvement (change adding some improvement to an existing
functionality)

**How Has This Been Tested**

(Please describe the tests that you ran to verify your changes. And
ideally, reference `tests`)

- [X] Add unit tests for every `pydantic.BaseModel` under
`argilla/feedback/schemas`

**Checklist**

- [ ] I added relevant documentation
- [X] follows the style guidelines of this project
- [X] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [X] I have added tests that prove my fix is effective or that my
feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK)
(see text above)
- [ ] I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)

---------

Co-authored-by: Gabriel Martin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants