feat: add `PATCH /api/v1/records/{record_id}` endpoint #3920
Merged: gabrielmbmb merged 10 commits into `feature/support-for-metadata-filtering-and-sorting` from `feature/add-patch-record-endpoint` on Oct 11, 2023
jfcalvo reviewed Oct 11, 2023
jfcalvo reviewed Oct 11, 2023
…o feature/add-patch-record-endpoint
The URL of the deployed environment for this PR is https://argilla-quickstart-pr-3920-ki24f765kq-no.a.run.app
bde500e to d77b7de
04893b5 into `feature/support-for-metadata-filtering-and-sorting`
9 tasks
gabrielmbmb added a commit that referenced this pull request Oct 17, 2023
# Description

This PR adds the following:

- Updates the `PATCH /api/v1/records/{record_id}` endpoint added in #3920 so that it can also update the suggestions of a record. The suggestions in the input payload will **replace** the old suggestions.
- Adds a new `PATCH /api/v1/datasets/{dataset_id}/records` endpoint that allows batch/bulk updates of the records of a dataset. The endpoint allows updating the same record attributes as the `PATCH /api/v1/records/{record_id}` endpoint.
- Slightly modifies the `SearchDocument` getter dict so that it does not try to populate the `SearchDocument.responses` attribute if the relationship has not been loaded (this means `Record.responses` does not have to be loaded when updating the record document in the `SearchEngine` using `add_records`).
- Removes the `SearchEngine.update_record_metadata` method, as the same logic is covered by the `SearchEngine.add_records` method, which can also be used to update the fields of an existing document.
- Renames the `SearchEngine.add_records` method to `SearchEngine.index_records`, as it can be used to both add and update records.

**Type of change**

- [x] New feature (non-breaking change which adds functionality)

**How Has This Been Tested**

I made a small benchmark to test the latency of the new endpoint. I created a dataset with 100000 records and all the possible questions and metadata properties. Then I built batches of 1000 records, updating all the responses and metadata fields, and sent them to the API. The average response time of the bulk `PATCH` endpoint was approximately 0.8 seconds.

<details>
<summary>Code used for benchmark</summary>

```python
import uuid
import random

import argilla as rg

LABELS = ["a", "b", "c"]
RANKS = ["top-1", "top-2", "top-3"]

# Dataset with every question type and metadata property type
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[
        rg.TextQuestion(name="text"),
        rg.RatingQuestion(name="rating", values=[1, 2, 3, 4, 5]),
        rg.LabelQuestion(name="label", labels=LABELS),
        rg.MultiLabelQuestion(name="multi-label", labels=LABELS),
        rg.RankingQuestion(name="ranking", values=RANKS),
    ],
    metadata_properties=[
        rg.TermsMetadataProperty(name="label", values=LABELS),
        rg.IntegerMetadataProperty(name="integer", min=0, max=10),
        rg.FloatMetadataProperty(name="float", min=0, max=10),
    ],
)
dataset.add_records([rg.FeedbackRecord(fields={"text": "Hello"}, metadata={"extra": "yes"}) for _ in range(100000)])
remote = dataset.push_to_argilla(name=f"benchmark-{uuid.uuid4()}", workspace="gabriel")


def random_rank_order():
    # Shuffle the ranking values and assign ranks 1..n
    ranks = RANKS.copy()
    ranks.sort(key=lambda x: random.random())
    return [{"value": rank, "rank": i + 1} for i, rank in enumerate(ranks)]


def build_update_payload(record):
    return {
        "id": str(record.id),
        "external_id": str(uuid.uuid4()),
        "metadata": {
            "label": random.choice(["a", "b", "c"]),
            "integer": random.randint(0, 10),
            "float": random.uniform(0, 10),
        },
        "suggestions": [
            {"question_id": str(remote.questions[0].id), "value": "hello world" * random.randint(1, 15)},
            {"question_id": str(remote.questions[1].id), "value": random.randint(1, 5)},
            {"question_id": str(remote.questions[2].id), "value": random.choice(["a", "b", "c"])},
            {"question_id": str(remote.questions[3].id), "value": [random.choice(["a", "b", "c"])]},
            {"question_id": str(remote.questions[4].id), "value": random_rank_order()},
        ],
    }


http_client = rg.active_client().http_client

elapseds = []
batch = []
for record in remote.records:
    batch.append(build_update_payload(record))
    # Send the batch once it reaches 1000 records
    if len(batch) == 1000:
        response = http_client.httpx.patch(f"/api/v1/datasets/{remote.id}/records", json={"items": batch})
        elapseds.append(response.elapsed.total_seconds())
        batch = []

average_elapsed_time = sum(elapseds) / len(elapseds)
print("Average elapsed time", average_elapsed_time)
```

</details>

**Checklist**

- [ ] I added relevant documentation
- [x] I followed the style guidelines of this project
- [x] I did a self-review of my code
- [x] I made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK) (see text above)
- [x] I have added relevant notes to the `CHANGELOG.md` file (See https://keepachangelog.com/)

---------

Co-authored-by: frascuchon <[email protected]>
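
The benchmark loop above sends a batch only when it reaches exactly 1000 records, which works because 100000 divides evenly. As a minimal sketch, assuming the same `{"items": [...]}` body shape used in the benchmark, the batching step could be written so that a trailing partial batch is also emitted. The helper names `iter_batches` and `build_bulk_payloads` are hypothetical and not part of Argilla.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield consecutive lists of at most `batch_size` items (hypothetical helper)."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def build_bulk_payloads(record_updates, batch_size=1000):
    """Wrap each batch in the {"items": [...]} body shape used by the benchmark."""
    return [{"items": batch} for batch in iter_batches(record_updates, batch_size)]


# Illustrative update dicts; real ones would come from something like build_update_payload
updates = [{"id": str(i), "metadata": {"integer": i % 11}} for i in range(2500)]
payloads = build_bulk_payloads(updates)
print([len(payload["items"]) for payload in payloads])  # [1000, 1000, 500]
```

Each resulting payload could then be sent with the same `PATCH /api/v1/datasets/{dataset_id}/records` call that the benchmark uses.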
Labels
area: api
Indicates that an issue or pull request is related to the Fast API server or REST endpoints
type: enhancement
Indicates new feature requests
Description
This PR adds a new endpoint, `PATCH /api/v1/records/{record_id}`, that allows partially updating a Feedback Dataset record. A new method called `SearchEngine.update_record_metadata` has been added so that the record metadata can also be updated in the `SearchEngine`.
Type of change
How Has This Been Tested
Unit tests covering the additions have been added.
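
For illustration, a request to the endpoint described above might be built as follows. This is a sketch under assumptions rather than the server's documented contract: the `{"metadata": ...}` body reflects only the metadata update mentioned in the description, while the base URL, record id, API key value, and the `X-Argilla-Api-Key` header name are placeholders assumed for the example. The function `build_patch_record_request` is hypothetical, and it builds the request without sending it.

```python
import json
import urllib.request


def build_patch_record_request(base_url, record_id, metadata, api_key):
    """Build (but do not send) a PATCH request for a single record.

    Assumptions: the body carries only a "metadata" object, and the API key
    travels in an "X-Argilla-Api-Key" header.
    """
    body = json.dumps({"metadata": metadata}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/records/{record_id}",
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json", "X-Argilla-Api-Key": api_key},
    )


# Placeholder values for illustration only
request = build_patch_record_request(
    base_url="http://localhost:6900",
    record_id="00000000-0000-0000-0000-000000000000",
    metadata={"label": "a"},
    api_key="my-api-key",
)
print(request.get_method(), request.full_url)
# To actually send it against a running server: urllib.request.urlopen(request)
```

Only the fields present in the body would be candidates for update, which is what makes the request a partial update rather than a full replacement of the record.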
Checklist
`CHANGELOG.md` file (See https://keepachangelog.com/)