Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add validation layer for metadata_properties and metadata_filters #3860

Conversation

alvarobartt
Copy link
Member

@alvarobartt alvarobartt commented Sep 29, 2023

Description

This PR adds a validation layer for both the metadata_properties when metadata is provided via FeedbackRecord when adding that record to either a local or existing FeedbackDataset, as well as validating the metadata_filters provided to the filter_by method for RemoteFeedbackDataset.

These validation layers would help reduce the API workload as the validation will be done client-side, which implies that the client will handle the validation first.

In this case, as we have the following properties: TermsMetadataProperty, IntegerMetadataProperty and FloatMetadataProperty; we'll need to ensure that the values provided for those are correct, both on add_records and also on filter_by.

The approach is similar to the one already implemented for the fields of a FeedbackRecord, but extending it to the metadata_properties to also include a validation layer that ensures not just the type of the value provided, but whether it's compliant with the pre-defined values for TermMetadataProperty or the pre-defined boundaries for both IntegerMetadataProperty and FloatMetadataProperty. The core difference is that besides the pydantic validation, there's also a method in each property that validates the values provided in the filters related to those metadata properties.

Closes #3859

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested

  • Add unit tests for the generate_pydantic_schema_for_metadata method
  • Add unit tests for both _validate_filter and _pydantic_field_with_validator in each property i.e. TermsMetadataProperty, IntegerMetadataProperty, and FloatMetadataProperty
  • Add unit tests for add_records with both correct and wrong metadata
  • Add integration tests for add_records and filter_by to ensure that the validation is correct both before and after applying those ops

Checklist

  • I added relevant documentation
  • follows the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I filled out the contributor form (see text above)
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

@alvarobartt alvarobartt added type: enhancement Indicates new feature requests client labels Sep 29, 2023
@alvarobartt alvarobartt added this to the v1.17.0 milestone Sep 29, 2023
@alvarobartt alvarobartt self-assigned this Sep 29, 2023
@alvarobartt alvarobartt marked this pull request as ready for review September 30, 2023 14:37
@alvarobartt
Copy link
Member Author

FYI @frascuchon I had to revert the return 0 in FilteredRemoteRecords.__len__ magic method as returning 0 is misleading IMO, and we cannot return e.g. -1, anyway I assume this is temporarily as before the release we can just add an API call to get the total number of records for filtered datasets.

@codecov
Copy link

codecov bot commented Sep 30, 2023

Codecov Report

Attention: 11 lines in your changes are missing coverage. Please review.

Files Coverage Δ
src/argilla/client/feedback/constants.py 100.00% <100.00%> (ø)
...argilla/client/feedback/dataset/remote/filtered.py 100.00% <100.00%> (+8.33%) ⬆️
src/argilla/client/feedback/dataset/base.py 91.57% <93.75%> (+2.54%) ⬆️
src/argilla/client/feedback/utils.py 82.00% <92.85%> (+2.51%) ⬆️
.../argilla/client/feedback/dataset/remote/dataset.py 84.05% <76.47%> (+3.70%) ⬆️
src/argilla/client/feedback/schemas/metadata.py 94.26% <83.33%> (-3.64%) ⬇️

... and 4 files with indirect coverage changes

📢 Thoughts on this report? Let us know!.

@github-actions
Copy link

The URL of the deployed environment for this PR is https://argilla-quickstart-pr-3860-ki24f765kq-no.a.run.app

@alvarobartt alvarobartt force-pushed the feat/add-metadata-validation branch from b3e0015 to a0e1c0a Compare October 6, 2023 17:22
@alvarobartt alvarobartt merged commit 37eb163 into feature/support-for-metadata-filtering-and-sorting Oct 9, 2023
@alvarobartt alvarobartt deleted the feat/add-metadata-validation branch October 9, 2023 10:56
@alvarobartt alvarobartt linked an issue Oct 11, 2023 that may be closed by this pull request
@frascuchon frascuchon modified the milestones: v1.17.0, v1.18.0 Oct 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement Indicates new feature requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Add metadata validation in add_records and filter_by
3 participants