feat: add PATCH /api/v1/records/{record_id} endpoint #3920

gabrielmbmb
Member

@gabrielmbmb gabrielmbmb commented Oct 10, 2023

Description

This PR adds a new endpoint, PATCH /api/v1/records/{record_id}, that allows partially updating a Feedback Dataset record. A new method called SearchEngine.update_record_metadata has been added so that the record metadata can also be updated in the SearchEngine.
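
As an illustration, a partial update request could be assembled as shown below. This is a minimal sketch, not code from this PR: the `build_patch_request` helper is hypothetical, and the body shape (a top-level `metadata` key) is an assumption based on the description above rather than a confirmed schema.

```python
import json
import uuid


def build_patch_request(record_id, metadata):
    """Build the pieces of a partial-update request for one record.

    Hypothetical helper: the body shape (a top-level "metadata" key) is an
    assumption, not something confirmed by this PR's description.
    """
    return {
        "method": "PATCH",
        "url": f"/api/v1/records/{record_id}",
        "body": json.dumps({"metadata": metadata}),
    }


# Fixed UUID so the example is reproducible.
request = build_patch_request(uuid.UUID(int=1), {"source": "manual-review"})
print(request["method"], request["url"])
print(request["body"])
```

Only the attributes present in the body are meant to change, which is what distinguishes a PATCH from a full replacement of the record.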

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested

Unit tests covering the additions have been added.

Checklist

  • I added relevant documentation
  • I followed the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I filled out the contributor form (see text above)
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

@gabrielmbmb gabrielmbmb added the labels type: enhancement (Indicates new feature requests) and area: api (Indicates that an issue or pull request is related to the Fast API server or REST endpoints) Oct 10, 2023
@gabrielmbmb gabrielmbmb added this to the v1.17.0 milestone Oct 10, 2023
@gabrielmbmb gabrielmbmb self-assigned this Oct 10, 2023
@gabrielmbmb gabrielmbmb changed the title from "Feature/add patch record endpoint" to "feat: add PATCH /api/v1/records/{record_id} endpoint" Oct 10, 2023
@gabrielmbmb gabrielmbmb marked this pull request as ready for review October 11, 2023 10:05
@github-actions

The URL of the deployed environment for this PR is https://argilla-quickstart-pr-3920-ki24f765kq-no.a.run.app

@gabrielmbmb gabrielmbmb force-pushed the feature/add-patch-record-endpoint branch from bde500e to d77b7de October 11, 2023 12:41
@codecov

codecov bot commented Oct 11, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

| Files | Coverage Δ |
|---|---|
| src/argilla/server/apis/v1/handlers/records.py | 98.64% <100.00%> (+0.16%) ⬆️ |
| src/argilla/server/contexts/datasets.py | 98.71% <100.00%> (+0.37%) ⬆️ |
| src/argilla/server/models/database.py | 98.63% <100.00%> (ø) |
| src/argilla/server/policies.py | 97.56% <100.00%> (+0.04%) ⬆️ |
| src/argilla/server/schemas/v1/records.py | 100.00% <100.00%> (ø) |
| src/argilla/server/search_engine/commons.py | 91.15% <100.00%> (+0.31%) ⬆️ |
| src/argilla/server/schemas/v1/datasets.py | 99.03% <75.00%> (-0.32%) ⬇️ |
| src/argilla/server/search_engine/base.py | 81.56% <66.66%> (-0.33%) ⬇️ |

@gabrielmbmb gabrielmbmb merged commit 04893b5 into feature/support-for-metadata-filtering-and-sorting Oct 11, 2023
@gabrielmbmb gabrielmbmb deleted the feature/add-patch-record-endpoint branch October 11, 2023 14:58
gabrielmbmb added a commit that referenced this pull request Oct 17, 2023
# Description

This PR adds the following:

- Updates `PATCH /api/v1/records/{record_id}`, added in #3920, so that it
can also update the suggestions of a record. The suggestions in the
input payload will **replace** the old suggestions.
- Adds a new `PATCH /api/v1/datasets/{dataset_id}/records` endpoint for
batch/bulk updating the records of a dataset. It can update the same
record attributes as the `PATCH /api/v1/records/{record_id}` endpoint.
- Slightly modifies the `SearchDocument` getter dict so that it does not
try to populate the `SearchDocument.responses` attribute if the
relationship has not been loaded (this way `Record.responses` does not
have to be loaded when updating the record document in the
`SearchEngine` using `add_records`).
- Removes the `SearchEngine.update_record_metadata` method, as the same
logic is covered by the `SearchEngine.add_records` method, which can
also be used to update the fields of an existing document.
- Renames the `SearchEngine.add_records` method to
`SearchEngine.index_records`, as it can be used to both add and update
records.
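
The update semantics described in the list above can be sketched with plain dictionaries. This is only an illustration under stated assumptions, not the server implementation: `apply_update` and `apply_bulk_update` are hypothetical names, records are modelled as dicts, and treating `metadata` as overwritten (rather than merged) is an assumption. Only the replacement of suggestions is stated in the description.

```python
from copy import deepcopy


def apply_update(record, update):
    """Return a copy of `record` with the attributes present in `update` applied."""
    updated = deepcopy(record)
    if "metadata" in update:
        # Assumption: metadata is overwritten as a whole rather than merged.
        updated["metadata"] = deepcopy(update["metadata"])
    if "suggestions" in update:
        # Per the description: the new suggestions replace the old ones.
        updated["suggestions"] = deepcopy(update["suggestions"])
    return updated


def apply_bulk_update(records_by_id, items):
    """Apply every item of a bulk payload to the matching stored record."""
    changed = {item["id"]: apply_update(records_by_id[item["id"]], item) for item in items}
    # Records not mentioned in the payload are carried over unchanged.
    return {**records_by_id, **changed}


stored = {
    "r1": {"metadata": {"label": "a"}, "suggestions": [{"question_id": "q1", "value": "old"}]},
    "r2": {"metadata": {"label": "b"}, "suggestions": []},
}
items = [{"id": "r1", "suggestions": [{"question_id": "q2", "value": "new"}]}]

result = apply_bulk_update(stored, items)
print(result["r1"]["suggestions"])
```

Attributes missing from an item are left alone, so the update for `r1` changes its suggestions while its metadata stays as it was.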

**Type of change**

- [x] New feature (non-breaking change which adds functionality)

**How Has This Been Tested**

I ran a small benchmark to test the latency of the new endpoint. I
created a dataset with 100000 records and all the possible questions and
metadata properties. Then I built batches of 1000 records, updating all
the suggestions and metadata fields, and sent them to the API. The
average response time of the bulk `PATCH` endpoint was approximately 0.8
seconds.
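
For a rough sense of scale, the figures above can be combined as follows. This is back-of-the-envelope arithmetic only: it assumes the requests are sent one after another and ignores any client-side time spent building the payloads.

```python
TOTAL_RECORDS = 100_000
BATCH_SIZE = 1_000
AVG_SECONDS_PER_REQUEST = 0.8  # approximate figure reported above

# Number of bulk requests needed to cover the whole dataset.
requests_needed = TOTAL_RECORDS // BATCH_SIZE
# Time spent waiting on responses if the requests run sequentially.
total_seconds = requests_needed * AVG_SECONDS_PER_REQUEST
# Average server-side cost attributed to a single record.
ms_per_record = AVG_SECONDS_PER_REQUEST / BATCH_SIZE * 1000

print(f"{requests_needed} requests, ~{total_seconds:.0f} s in total, ~{ms_per_record:.1f} ms per record")
```

That works out to 100 requests and roughly 80 seconds of response time for the whole dataset.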

<details>
  <summary>Code used for benchmark</summary>

  ```python
import random
import uuid

import argilla as rg

LABELS = ["a", "b", "c"]
RANKS = ["top-1", "top-2", "top-3"]

# Dataset with one question and one metadata property of every type.
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[
        rg.TextQuestion(name="text"),
        rg.RatingQuestion(name="rating", values=[1, 2, 3, 4, 5]),
        rg.LabelQuestion(name="label", labels=LABELS),
        rg.MultiLabelQuestion(name="multi-label", labels=LABELS),
        rg.RankingQuestion(name="ranking", values=RANKS),
    ],
    metadata_properties=[
        rg.TermsMetadataProperty(name="label", values=LABELS),
        rg.IntegerMetadataProperty(name="integer", min=0, max=10),
        rg.FloatMetadataProperty(name="float", min=0, max=10),
    ],
)

dataset.add_records(
    [rg.FeedbackRecord(fields={"text": "Hello"}, metadata={"extra": "yes"}) for _ in range(100000)]
)

remote = dataset.push_to_argilla(name=f"benchmark-{uuid.uuid4()}", workspace="gabriel")


def random_rank_order():
    # Shuffle a copy so the module-level RANKS list stays untouched.
    ranks = RANKS.copy()
    random.shuffle(ranks)
    return [{"value": rank, "rank": i + 1} for i, rank in enumerate(ranks)]


def build_update_payload(record):
    # Randomize the metadata and give every question a fresh suggestion.
    return {
        "id": str(record.id),
        "external_id": str(uuid.uuid4()),
        "metadata": {
            "label": random.choice(LABELS),
            "integer": random.randint(0, 10),
            "float": random.uniform(0, 10),
        },
        "suggestions": [
            {"question_id": str(remote.questions[0].id), "value": "hello world" * random.randint(1, 15)},
            {"question_id": str(remote.questions[1].id), "value": random.randint(1, 5)},
            {"question_id": str(remote.questions[2].id), "value": random.choice(LABELS)},
            {"question_id": str(remote.questions[3].id), "value": [random.choice(LABELS)]},
            {"question_id": str(remote.questions[4].id), "value": random_rank_order()},
        ],
    }


http_client = rg.active_client().http_client

elapseds = []

batch = []
for record in remote.records:
    batch.append(build_update_payload(record))

    # Send the batch once it holds 1000 records and record the response time.
    if len(batch) == 1000:
        response = http_client.httpx.patch(f"/api/v1/datasets/{remote.id}/records", json={"items": batch})
        elapseds.append(response.elapsed.total_seconds())
        batch = []

average_elapsed_time = sum(elapseds) / len(elapseds)

print("Average elapsed time", average_elapsed_time)
  ```
</details>
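
The benchmark loop only sends full batches, which is fine here because 100000 is a multiple of 1000. For a record count that is not a multiple of the batch size, the trailing records would never be sent. A minimal sketch of a chunking helper that also yields the final, shorter batch is shown below; `batched` is a hypothetical helper written for this example and is not part of Argilla.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items, including a final shorter one."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    # islice returns an empty list once the iterator is exhausted, ending the loop.
    while batch := list(islice(iterator, size)):
        yield batch


batches = list(batched(range(2500), 1000))
print([len(batch) for batch in batches])
```

With a helper like this, the loop body reduces to building one payload per batch and sending it, with no special handling for the remainder.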

**Checklist**

- [ ] I added relevant documentation
- [x] I followed the style guidelines of this project
- [x] I did a self-review of my code
- [x] I made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my
feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK)
(see text above)
- [x] I have added relevant notes to the `CHANGELOG.md` file (See
https://keepachangelog.com/)

---------

Co-authored-by: frascuchon <[email protected]>
@frascuchon frascuchon modified the milestones: v1.17.0, v1.18.0 Oct 19, 2023