feat: add RemoteFeedbackDataset.filter_by and FilteredRemoteFeedbackDataset #3610

Merged
alvarobartt merged 38 commits into develop from feat/filter-by-status on Aug 29, 2023

Conversation

alvarobartt
Member

@alvarobartt alvarobartt commented Aug 22, 2023

Description

This PR adds a filter_by method to RemoteFeedbackDataset, so that the records in a dataset can be filtered by their response_status. It also refactors how RemoteFeedbackDataset is structured, detaching the base functionality from it. Additionally, it introduces FilteredRemoteFeedbackDataset, which stays linked to Argilla but applies the specified filter, and which restricts what users can do with it.
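
To illustrate the intended shape of the feature, here is a minimal, self-contained sketch that does not use Argilla at all. The classes ToyRecord, ToyRemoteDataset and ToyFilteredDataset are hypothetical stand-ins, the status values are illustrative, and the specific constraint shown (a filtered view rejecting add_records) is an assumption rather than a statement of what FilteredRemoteFeedbackDataset actually forbids.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Union


@dataclass
class ToyRecord:
    text: str
    response_status: str  # illustrative values, e.g. "submitted" or "discarded"


class ToyFilteredDataset:
    """Hypothetical restricted view over a parent dataset."""

    def __init__(self, parent: "ToyRemoteDataset", statuses: Sequence[str]) -> None:
        self._parent = parent
        self._statuses = set(statuses)

    def __iter__(self) -> Iterator[ToyRecord]:
        # Read from the parent on every iteration, so the view stays linked to it.
        return (r for r in self._parent.records if r.response_status in self._statuses)

    def __len__(self) -> int:
        return sum(1 for _ in self)

    def add_records(self, records: List[ToyRecord]) -> None:
        # Assumed constraint: writing through a filtered view is not allowed.
        raise NotImplementedError("cannot add records through a filtered view")


class ToyRemoteDataset:
    """Hypothetical stand-in for a dataset whose records live on a server."""

    def __init__(self, records: List[ToyRecord]) -> None:
        self.records = list(records)

    def add_records(self, records: List[ToyRecord]) -> None:
        self.records.extend(records)

    def filter_by(self, response_status: Union[str, List[str]]) -> ToyFilteredDataset:
        # Accept either a single status or a list of statuses.
        if isinstance(response_status, str):
            statuses = [response_status]
        else:
            statuses = list(response_status)
        return ToyFilteredDataset(self, statuses)


dataset = ToyRemoteDataset(
    [
        ToyRecord("first", "submitted"),
        ToyRecord("second", "discarded"),
        ToyRecord("third", "submitted"),
    ]
)
submitted = dataset.filter_by(response_status="submitted")
submitted_texts = [record.text for record in submitted]
```

Because the view reads from its parent lazily, records added to the parent afterwards show up in the filtered view as well, which mirrors the idea of the filtered dataset remaining linked to the remote one.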

Type of change

  • New feature (non-breaking change which adds functionality)
  • Refactor (change restructuring the codebase without changing functionality)

How Has This Been Tested

  • Added unit and integration tests for the new functionality, and refactored the existing tests to match the module refactor applied to remote.py (now living under remote/dataset.py)

Checklist

  • I added relevant documentation
  • My code follows the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I filled out the contributor form (see text above)
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

@alvarobartt alvarobartt added the type: enhancement (Indicates new feature requests) and client labels Aug 22, 2023
@alvarobartt alvarobartt added this to the v1.15.0 milestone Aug 22, 2023
@alvarobartt alvarobartt self-assigned this Aug 22, 2023

codecov bot commented Aug 22, 2023

Codecov Report

Patch coverage: 75.30% and project coverage change: +0.22% 🎉

Comparison is base (fec57d7) 90.56% compared to head (b437061) 90.79%.
Report is 3 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3610      +/-   ##
===========================================
+ Coverage    90.56%   90.79%   +0.22%     
===========================================
  Files          264      267       +3     
  Lines        14188    14260      +72     
===========================================
+ Hits         12850    12948      +98     
+ Misses        1338     1312      -26     
| Files Changed | Coverage Δ |
|---|---|
| ...c/argilla/client/feedback/dataset/remote/mixins.py | 60.00% <60.00%> (ø) |
| .../argilla/client/feedback/dataset/remote/dataset.py | 74.13% <74.13%> (ø) |
| src/argilla/client/feedback/dataset/remote/base.py | 88.88% <85.18%> (ø) |
| src/argilla/client/feedback/dataset/mixins.py | 79.85% <100.00%> (ø) |
| ...argilla/client/feedback/dataset/remote/filtered.py | 100.00% <100.00%> (ø) |
| src/argilla/client/sdk/v1/datasets/api.py | 90.47% <100.00%> (+0.13%) ⬆️ |
| src/argilla/client/sdk/v1/datasets/models.py | 100.00% <100.00%> (ø) |

... and 5 files with indirect coverage changes


@alvarobartt
Member Author

This is pending the merge of #3613. Once that lands, the response_status arg used to filter a RemoteFeedbackDataset can be included directly in the HTTP requests sent to the Argilla Server when fetching records.
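
As a rough sketch of what including the filter in the request could look like, the snippet below builds a record-listing URL with response_status passed as a repeated query parameter. The function name build_records_url, the /datasets/{id}/records path, and the offset/limit parameters are all hypothetical and are not taken from the Argilla server API; only the response_status name comes from this PR.

```python
from typing import List, Optional
from urllib.parse import urlencode


def build_records_url(
    base_url: str,
    dataset_id: str,
    response_status: Optional[List[str]] = None,
    offset: int = 0,
    limit: int = 50,
) -> str:
    """Build a (hypothetical) URL for fetching records, optionally filtered."""
    params = [("offset", offset), ("limit", limit)]
    # Each status becomes its own response_status=... entry in the query string.
    for status in response_status or []:
        params.append(("response_status", status))
    return f"{base_url.rstrip('/')}/datasets/{dataset_id}/records?{urlencode(params)}"


url = build_records_url(
    "http://localhost:6900/api",  # placeholder base URL
    "abc",  # placeholder dataset id
    response_status=["submitted", "discarded"],
)
```

Filtering on the server side like this means only the matching records are transferred, rather than every record being fetched and filtered on the client.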

@alvarobartt alvarobartt marked this pull request as ready for review August 23, 2023 14:09
@github-actions

The URL of the deployed environment for this PR is https://argilla-quickstart-pr-3610-ki24f765kq-no.a.run.app

@alvarobartt alvarobartt requested a review from frascuchon August 28, 2023 15:41
Member

@frascuchon frascuchon left a comment

It's okay for me. Just one thing. Should we add some sections in docs apart from the Python client refs? Maybe we can refactor an existing tutorial applying filters and start using the filter_by method.

cc  @davidberenstein1957

@davidberenstein1957
Member

davidberenstein1957 commented Aug 29, 2023

@frascuchon I agree. I think every added functionality to the Python API should be represented in the docs and that this needs to be done before wrapping up the PR.

I also see that there doesn't seem to be any reference to the inner workings of the RemoteFeedbackDataset, besides declaring variables as remote_dataset.

@alvarobartt Could you add:

  • A reference to the existence and usage of the RemoteFeedbackDataset wherever push/pull (and later clone) is used in the docs under /guides/llms/practical_guides/*.md. These can also link to the URL mentioned in the point below. Similarly, we could add this to the lower/higher tabs in the docs.
  • A more concrete definition of each of the pull/push/clone methods to /guides/llms/practical_guides/export_dataset.md. For example, the pull in this page still references a normal FeedbackDataset. Is that correct?
  • A more explicit definition of filter_by, and a note on the lack of a sort method, in /guides/llms/practical_guides/export_dataset.md.
  • filter_by can also be referenced in /guides/llms/practical_guides/collect_responses.md and /guides/llms/practical_guides/create_dataset.html
  • If possible, update the tutorials https://docs.argilla.io/en/latest/guides/llms/examples/*.ipynb

I will add a work item about representing the change above and adding it to the ArgillaTrainer

@davidberenstein1957
Member

Thanks! LGTM. Not sure whether we want to change "Feedback Dataset" to "FeedbackDataset" everywhere in the docs?

@davidberenstein1957 davidberenstein1957 self-requested a review August 29, 2023 09:30
@alvarobartt alvarobartt merged commit 347dd9c into develop Aug 29, 2023
@alvarobartt alvarobartt deleted the feat/filter-by-status branch August 29, 2023 09:40
Labels
type: enhancement Indicates new feature requests
4 participants