Releases: argilla-io/argilla
v1.13.0
🔆 Highlights
✨ Suggestions
[Image: Dataset with suggestions]
All question types in the Feedback task support suggestions, but you can only add one suggestion per question.
Learn more about this feature in our docs.
🗄️ List workspaces
We've added functionality to list all the workspaces that a user has access to. From the Python client, you can list all workspaces of the current user with rg.Workspace.list(), and in the UI you can see the list of workspaces on the user settings page.
[Image: User settings page with workspace list]
Read more in the docs.
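For readers calling the REST API directly instead of the Python client, the sketch below builds (but does not send) a request for the GET /api/v1/me/workspaces endpoint listed in the changelog for this release. The server URL, the API key value and the X-Argilla-Api-Key header name are assumptions made for illustration; check them against your deployment.

```python
import urllib.request


def build_list_workspaces_request(api_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the current user's workspaces (nothing is sent)."""
    url = api_url.rstrip("/") + "/api/v1/me/workspaces"
    # Assumption: the server authenticates through an X-Argilla-Api-Key header.
    headers = {"X-Argilla-Api-Key": api_key}
    return urllib.request.Request(url, headers=headers, method="GET")


# Hypothetical local server and key, for illustration only.
request = build_list_workspaces_request("http://localhost:6900", "my-api-key")
```

Passing the request to urllib.request.urlopen would perform the call against a running server.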
🏋️‍♂️ Extended training support
We are extending our support for preparing data from Feedback datasets for training. This release includes strategies to unify responses to RankingQuestions and a task mapping for text classification, TrainingTaskMapping.for_text_classification.
Read more about how to use these methods to train models with Feedback collected in Argilla here.
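To make the idea of unifying ranking responses concrete, here is a minimal plain-Python sketch of one plausible strategy: ordering options by their mean rank across annotators. It does not use Argilla's RankingQuestionStrategy; the function name and the response shape (a dict mapping option to rank) are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def unify_rankings_by_mean(responses: List[Dict[str, int]]) -> List[str]:
    """Order options by their mean rank across annotators (lower is better)."""
    ranks = defaultdict(list)
    for response in responses:
        for option, rank in response.items():
            ranks[option].append(rank)
    # Sort by mean rank; break ties by option name so the result is deterministic.
    return sorted(ranks, key=lambda option: (mean(ranks[option]), option))


# Three annotators ranking the same three replies (1 = best).
responses = [
    {"reply-1": 1, "reply-2": 2, "reply-3": 3},
    {"reply-1": 2, "reply-2": 1, "reply-3": 3},
    {"reply-1": 1, "reply-2": 3, "reply-3": 2},
]
unified = unify_rankings_by_mean(responses)
```

Other strategies, such as taking the majority vote per position, would replace only the sort key.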
Changelog 1.13.0
Added
- Added GET /api/v1/users/{user_id}/workspaces endpoint to list the workspaces to which a user belongs (#3308 and #3343).
- Added HuggingFaceDatasetMixin for internal usage, to detach the FeedbackDataset integrations from the class itself, and use Mixins instead (#3326).
- Added GET /api/v1/records/{record_id}/suggestions API endpoint to get the list of suggestions for the responses associated to a record (#3304).
- Added POST /api/v1/records/{record_id}/suggestions API endpoint to create a suggestion for a response associated to a record (#3304).
- Added support for RankingQuestionStrategy, RankingQuestionUnification and the .for_text_classification method for the TrainingTaskMapping (#3364).
- Added PUT /api/v1/records/{record_id}/suggestions API endpoint to create or update a suggestion for a response associated to a record (#3304 & #3391).
- Added suggestions attribute to FeedbackRecord, and allow adding and retrieving suggestions from the Python client (#3370).
- Added allowed_for_roles Python decorator to check whether the current user has the required role to access the decorated function/method for User and Workspace (#3383).
- Added API and Python Client support for workspace deletion (Closes #3260).
- Added GET /api/v1/me/workspaces endpoint to list the workspaces of the current active user (#3390).
Changed
- Updated output payload for GET /api/v1/datasets/{dataset_id}/records, GET /api/v1/me/datasets/{dataset_id}/records, POST /api/v1/me/datasets/{dataset_id}/records/search endpoints to include the suggestions of the records based on the value of the include query parameter (#3304).
- Updated POST /api/v1/datasets/{dataset_id}/records input payload to add suggestions (#3304).
- The POST /api/datasets/:dataset-id/:task/bulk endpoints don't create the dataset if it does not exist (Closes #3244).
- Added Telemetry support for ArgillaTrainer (closes #3325).
- User.workspaces is no longer an attribute but a property, and calls list_user_workspaces to list all the workspace names for a given user ID (#3334).
- Renamed FeedbackDatasetConfig to DatasetConfig and export/import from YAML as default instead of JSON (just used internally on push_to_huggingface and from_huggingface methods of FeedbackDataset) (#3326).
- The protected metadata fields support other than textual info - existing datasets must be reindexed. See docs for more detail (Closes #3332).
- Updated Dockerfile parent image from python:3.9.16-slim to python:3.10.12-slim (#3425).
- Updated quickstart.Dockerfile parent image from elasticsearch:8.5.3 to argilla/argilla-server:${ARGILLA_VERSION} (#3425).
Removed
- Removed support for non-prefixed environment variables. All valid env vars start with ARGILLA_ (See #3392).
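As an illustration of what the prefix rule means in practice, the sketch below keeps only the variables that start with ARGILLA_ from an environment mapping. The helper name is hypothetical and this is not how the server itself loads its settings.

```python
from typing import Dict, Mapping

PREFIX = "ARGILLA_"


def select_argilla_settings(environ: Mapping[str, str]) -> Dict[str, str]:
    """Return only the variables whose name starts with the ARGILLA_ prefix."""
    return {name: value for name, value in environ.items() if name.startswith(PREFIX)}


# A non-prefixed variable such as DATABASE_URL is ignored.
example_env = {"ARGILLA_DATABASE_URL": "postgresql://db/argilla", "DATABASE_URL": "ignored"}
settings = select_argilla_settings(example_env)
```

Calling select_argilla_settings(os.environ) applies the same filter to the real process environment.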
Fixed
- Fixed GET /api/v1/me/datasets/{dataset_id}/records endpoint always returning the responses for the records even if responses was not provided via the include query parameter (#3304).
- Values for protected metadata fields are not truncated (Closes #3331).
- Big number ids are properly rendered in UI (Closes #3265).
- Fixed ArgillaDatasetCard to include the values/labels for all the existing questions (#3366).
Deprecated
- Integer support for record id in text classification, token classification and text2text datasets.
As always, thanks to our amazing contributors
- @manijhariya made their first contribution in #3295
Full Changelog: v1.12.1...1.13.0
v1.12.1
v1.12.0
🔆 Highlights
New RankingQuestion in Feedback Task datasets
Now you will be able to include RankingQuestions in your Feedback datasets. These are specially designed to gather feedback on labelers' preferences by providing a set of options that labelers can order.
Here's how you can add a RankingQuestion to a FeedbackDataset:
import argilla as rg

dataset = rg.FeedbackDataset(
    fields=[
        rg.TextField(name="prompt"),
        rg.TextField(name="reply-1", title="Reply 1"),
        rg.TextField(name="reply-2", title="Reply 2"),
        rg.TextField(name="reply-3", title="Reply 3"),
    ],
    questions=[
        rg.RankingQuestion(
            name="ranking",
            title="Order replies based on your preference",
            description="1 = best, 3 = worst. Ties are allowed.",
            required=True,
            # values may also be a plain list: ["reply-1", "reply-2", "reply-3"]
            values={"reply-1": "Reply 1", "reply-2": "Reply 2", "reply-3": "Reply 3"},
        ),
    ],
)
More info in our docs.
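The question description above says that ranks run from 1 (best) to 3 (worst) and that ties are allowed. A minimal sketch of checking a single response against those rules could look like this; the response shape (a dict mapping option to rank) and the function name are hypothetical and not part of the Argilla client.

```python
from typing import Dict, List


def validate_ranking(response: Dict[str, int], options: List[str]) -> None:
    """Raise ValueError unless every option has a rank between 1 and len(options)."""
    missing = sorted(set(options) - set(response))
    unknown = sorted(set(response) - set(options))
    if missing or unknown:
        raise ValueError(f"missing options: {missing}, unknown options: {unknown}")
    for option, rank in response.items():
        if not 1 <= rank <= len(options):
            raise ValueError(f"rank {rank} for {option!r} is out of range")


options = ["reply-1", "reply-2", "reply-3"]
# Ties are allowed, so two replies may share rank 1.
validate_ranking({"reply-1": 1, "reply-2": 1, "reply-3": 3}, options)
```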
Extended training support
You can now format responses from RatingQuestion, LabelQuestion and MultiLabelQuestion for your preferred training framework using the prepare_for_training method.
Also, we've added support for spacy-transformers in our Argilla Trainer.
Here's an example code snippet:
import argilla.feedback as rg

# Load a Feedback dataset from the Hugging Face Hub
dataset = rg.FeedbackDataset.from_huggingface(
    repo_id="argilla/stackoverflow_feedback_demo"
)

# Map a field and a question to the text and label of a text classification task
task_mapping = rg.TrainingTaskMapping.for_text_classification(
    text=dataset.field_by_name("question"),
    label=dataset.question_by_name("tags"),
)

trainer = rg.ArgillaTrainer(
    dataset=dataset,
    task_mapping=task_mapping,
    framework="spacy-transformers",
    fetch_records=False,
)
trainer.update_config(num_train_epochs=2)
trainer.train(output_dir="my_awesome_model")
To learn more about how to use Argilla Trainer check our docs.
Changelog 1.12.0
Added
- Added RankingQuestionSettings class allowing to create ranking questions in the API using POST /api/v1/datasets/{dataset_id}/questions endpoint (#3232).
- Added RankingQuestion in the Python client to create ranking questions (#3275).
- Added Ranking component in feedback task question form (#3177 & #3246).
- Added FeedbackDataset.prepare_for_training method for generating a framework-specific dataset with the responses provided for RatingQuestion, LabelQuestion and MultiLabelQuestion (#3151).
- Added ArgillaSpaCyTransformersTrainer class for supporting the training with spacy-transformers (#3256).
Changed
- All docker related files have been moved into the docker folder (#3053).
- release.Dockerfile has been renamed to Dockerfile (#3133).
- Updated rg.load function to raise a ValueError with an explanatory message for the cases in which the user tries to use the function to load a FeedbackDataset (#3289).
- Updated ArgillaSpaCyTrainer to allow re-using tok2vec (#3256).
Fixed
- Check available workspaces on Argilla on rg.set_workspace (Closes #3262)
New Contributors
- @garimau made their first contribution in #3255
- @adurante92 made their first contribution in #3242
Full Changelog: v1.11.0...v1.12.0
v1.11.0
🔆 Highlights
New owner role and user update command
We've added a new user role, owner, that has permissions over all users, workspaces and datasets in Argilla (like the admin role in earlier versions). From this version, the admin role will only have permissions over datasets and users in workspaces assigned to them.
You can change a user from admin to owner using a simple CLI command: python -m argilla users update argilla --role owner.
Improved user and workspace management
You can now get lists of users and workspaces, create new ones and give users access to workspaces directly from the Python SDK. Note that only owners will have permissions for all these actions. Admins will be able to give users access to workspaces where they have access.
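The scoping described above (owners act everywhere, admins only in the workspaces assigned to them) can be summarised with a small sketch. This is an illustration of the rule as stated in these notes, not the server's permission code; the function name and arguments are hypothetical.

```python
from typing import Set


def can_manage_workspace(role: str, assigned_workspaces: Set[str], workspace: str) -> bool:
    """Return True if a user with this role may manage the given workspace."""
    if role == "owner":
        # Owners have permissions over all workspaces.
        return True
    if role == "admin":
        # Admins are limited to the workspaces assigned to them.
        return workspace in assigned_workspaces
    # Any other role (for example annotator) has no management permissions here.
    return False


owner_ok = can_manage_workspace("owner", set(), "team-a")
admin_ok = can_manage_workspace("admin", {"team-a"}, "team-a")
admin_denied = can_manage_workspace("admin", {"team-a"}, "team-b")
```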
Metadata fields for Feedback records
You can now add metadata information to your records. This is useful to store information that's not needed for the labeling UI but important for downstream usage (e.g., prompt id, model IDs, etc.)
Changelog 1.11.0
Fixed
- Replaced np.float alias by float to avoid AttributeError when using find_label_errors function with numpy>=1.24.0 (#3214).
- Fixed format_as("datasets") when no responses or optional responses in FeedbackRecord, to set their value to what 🤗 Datasets expects instead of just None (#3224).
- Fixed push_to_huggingface() when generate_card=True (default behaviour), as we were passing a sample record to the ArgillaDatasetCard class, and UUIDs introduced in 1.10.0 (#3192), are not JSON-serializable (#3231).
- Fixed from_argilla and push_to_argilla to ensure consistency on both field and question re-construction, and to ensure UUIDs are properly serialized as str, respectively (#3234).
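The UUID fixes above come down to a general Python behaviour: the standard json module cannot serialize uuid.UUID objects unless they are converted to strings first. The sketch below reproduces the failure and one common workaround using only the standard library; the record dict is a made-up example and not the structure the Argilla client actually sends.

```python
import json
from uuid import UUID, uuid4

record = {"id": uuid4(), "fields": {"text": "hello"}}

try:
    json.dumps(record)
    failed = False
except TypeError:
    # UUID objects are not JSON-serializable by default.
    failed = True

# default=str converts any object json cannot handle, here the UUID, to its string form.
payload = json.dumps(record, default=str)
restored = json.loads(payload)
```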
Added
- Added metadata attribute to the Record of the FeedbackDataset (#3194).
- New users update command to update the role for an existing user (#3188).
- New Workspace class to allow users to manage their Argilla workspaces and the users assigned to those workspaces via the Python client (#3180).
- Added User class to let users manage their Argilla users via the Python client (#3169).
- Added an option to display tqdm progress bar to FeedbackDataset.push_to_argilla when looping over the records to upload (#3233).
Changed
- The role system now supports three different roles: owner, admin and annotator (#3104).
- admin role is scoped to workspace-level operations (#3115).
- The owner user is created among the default pool of users in the quickstart, and the default user in the server now has the owner role (#3248), reverting (#3188).
Deprecated
- As of Python 3.7 end-of-life (EOL) on 2023-06-27, Argilla will no longer support Python 3.7 (#3188). More information at https://peps.python.org/pep-0537/
As always, thanks to our amazing contributors!
- @damianpumar made their first contribution in #2950
- @MedAmine-SUDO made their first contribution in #3204
- @manulpatel made their first contribution in #3233
v1.10.0
🔆 Highlights
Search records in Feedback Task
We've added a search bar in the Feedback Task UI so you can filter records based on specific words or phrases.
Extended markdown support
Annotation guidelines are now rendered as markdown text to make them easier to read and have a more flexible format.
Train button in Feedback Task
Admin users have access to a Train </> button in the Feedback Task UI with quick links to all the information needed to train a model with the feedback gathered in Argilla.
Changelog 1.10.0
Added
- Added search component for feedback datasets (#3138)
- Added markdown support for feedback dataset guidelines (#3153)
- Added Train button for feedback datasets (#3170)
Changed
- Updated SearchEngine and POST /api/v1/me/datasets/{dataset_id}/records/search to return the total number of records matching the search query (#3166)
Fixed
- Replaced Enum for string value in URLs for client API calls (Closes #3149)
- Resolve breaking issue with ArgillaSpanMarkerTrainer for Named Entity Recognition with span_marker v1.1.x onwards.
- Move ArgillaDatasetCard import under @requires_version decorator, so that the ImportError on huggingface_hub is handled properly (#3174)
- Allow flow FeedbackDataset.from_argilla -> FeedbackDataset.push_to_argilla under different dataset names and/or workspaces (#3192)
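The first fix in this list, replacing an Enum with its string value in URLs, reflects a common Python pitfall: formatting a plain Enum member into a string yields its qualified name, not its value. A minimal sketch follows; the TaskType enum, its value and the helper name are hypothetical, and the path pattern is the bulk endpoint pattern quoted elsewhere in these notes.

```python
from enum import Enum


class TaskType(Enum):
    # Hypothetical enum standing in for the client's task identifiers.
    text_classification = "TextClassification"


def bulk_url(dataset: str, task: TaskType) -> str:
    """Build the bulk endpoint path using the enum's string value."""
    return f"/api/datasets/{dataset}/{task.value}/bulk"


# Interpolating the member itself produces its qualified name, not the value.
broken = f"/api/datasets/my-dataset/{TaskType.text_classification}/bulk"
fixed = bulk_url("my-dataset", TaskType.text_classification)
```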
As always, thanks to our amazing contributors!
- @hjain5164 made their first contribution in #3146
- @Fancman made their first contribution in #3150
- @preetgami made their first contribution in #3196
Full Changelog: v1.9.0...v1.10.0
v1.9.0
🔆 Highlights
New question types in Feedback Datasets
[Image: Screenshot of a Feedback Dataset with the new Label and MultiLabel questions and markdown support]
We've included two new question types in Feedback Datasets: LabelQuestion and MultiLabelQuestion. These are especially useful for applying one or multiple labels to a record, for example, for text classification tasks. In this new view, you can add multiple classification questions and even combine them with the other question types available in Feedback Datasets: RatingQuestion and TextQuestion.
Markdown support in Feedback Fields and Text Questions
You can now add the use_markdown=True tag to a TextField or a TextQuestion to have the UI render the text as markdown. You can use this to read and write code, tables or even add images.
[Image: Screenshot of a Feedback Dataset with rendered markdown in a record field and a text question]
Further improvements in Feedback Datasets
We continue to add improvements to our new Feedback Datasets:
- We've added checks to avoid having fields and questions with repeated names.
- Dataset cards generated using FeedbackDataset.push_to_huggingface(generate_card=True) now follow the official Hugging Face template.
Changelog 1.9.0
Added
- Added boolean use_markdown property to TextFieldSettings model (#3000).
- Added boolean use_markdown property to TextQuestionSettings model (#3000).
- Added new status draft for the Response model (#3033).
- Added LabelSelectionQuestionSettings class allowing to create label selection (single-choice) questions in the API (#3005).
- Added MultiLabelSelectionQuestionSettings class allowing to create multi-label selection (multi-choice) questions in the API (#3010).
- Added POST /api/v1/me/datasets/{dataset_id}/records/search endpoint (#3068).
- Added new components in feedback task Question form: MultiLabel (#3064) and SingleLabel (#3016).
- Added docstrings to the pydantic.BaseModels defined at argilla/client/feedback/schemas.py (#3137).
Changed
- Updated GET /api/v1/me/datasets/:dataset_id/metrics output payload to include the count of responses with draft status (#3033).
- Database setup for unit tests. Now the unit tests use a different database than the one used by the local Argilla server (Closes #2987).
- Updated alembic setup to be able to autogenerate revision/migration scripts using SQLAlchemy metadata from Argilla server models (#3044).
- Improved DatasetCard generation on FeedbackDataset.push_to_huggingface when generate_card=True, following the official HuggingFace Hub template, but suited to FeedbackDatasets from Argilla (#3110).
Fixed
- Disallow fields and questions in FeedbackDataset with the same name (#3126).
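A check of this kind can be sketched in a few lines of plain Python. The version below assumes that names must be unique within the list of fields and within the list of questions; the helper names are hypothetical and this is not the validation code used by FeedbackDataset.

```python
from collections import Counter
from typing import Iterable, List


def find_repeated_names(names: Iterable[str]) -> List[str]:
    """Return the names that appear more than once, sorted alphabetically."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


def check_unique_names(field_names: List[str], question_names: List[str]) -> None:
    """Raise ValueError if any field name or any question name is repeated."""
    for kind, names in (("field", field_names), ("question", question_names)):
        repeated = find_repeated_names(names)
        if repeated:
            raise ValueError(f"repeated {kind} names: {repeated}")


check_unique_names(["prompt", "reply-1"], ["rating", "comment"])
```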
As always, thanks to our amazing contributors!
- @gitrock made their first contribution in #3091
- @ChadDa3mon made their first contribution in #3092
v1.8.0
🔆 Highlights
New Feedback Task 🎉
[Image: snapshot-feedback-demo]
In addition, these datasets support multiple annotations: all users with access to the dataset can give their responses.
The FeedbackDataset has an enhanced integration with the Hugging Face Hub, so that saving a dataset to the Hub or pushing a FeedbackDataset from the Hub directly to Argilla is seamless.
Check all the things you can do with Feedback Tasks in our docs
New LLM section in our docs
We've added a new section in our docs that covers:
- Useful concepts around working with LLMs
- How-to guides that cover all the functionalities of the new Feedback Task
- End-to-end examples
More training integrations
We've added new frameworks for the ArgillaTrainer: ArgillaPeftTrainer for Text and Token Classification and ArgillaAutoTrainTrainer for Text Classification.
Changelog 1.8.0
Added
- /api/v1/datasets new endpoint to list and create datasets (#2615).
- /api/v1/datasets/{dataset_id} new endpoint to get and delete datasets (#2615).
- /api/v1/datasets/{dataset_id}/publish new endpoint to publish a dataset (#2615).
- /api/v1/datasets/{dataset_id}/questions new endpoint to list and create dataset questions (#2615)
- /api/v1/datasets/{dataset_id}/fields new endpoint to list and create dataset fields (#2615)
- /api/v1/datasets/{dataset_id}/questions/{question_id} new endpoint to delete a dataset question (#2615)
- /api/v1/datasets/{dataset_id}/fields/{field_id} new endpoint to delete a dataset field (#2615)
- /api/v1/workspaces/{workspace_id} new endpoint to get workspaces by id (#2615)
- /api/v1/responses/{response_id} new endpoint to update and delete a response (#2615)
- /api/v1/datasets/{dataset_id}/records new endpoint to create and list dataset records (#2615)
- /api/v1/me/datasets new endpoint to list user visible datasets (#2615)
- /api/v1/me/dataset/{dataset_id}/records new endpoint to list dataset records with user responses (#2615)
- /api/v1/me/datasets/{dataset_id}/metrics new endpoint to get the dataset user metrics (#2615)
- /api/v1/me/records/{record_id}/responses new endpoint to create record user responses (#2615)
- showing new feedback task datasets in datasets list (#2719)
- new page for feedback task (#2680)
- show feedback task metrics (#2822)
- user can delete dataset in dataset settings page (#2792)
- Support for FeedbackDataset in Python client (parent PR #2615, and nested PRs: #2949, #2827, #2943, #2945, #2962, and #3003)
- Integration with the HuggingFace Hub (#2949)
- Added ArgillaPeftTrainer for text and token classification #2854
- Added predict_proba() method to ArgillaSetFitTrainer
- Added ArgillaAutoTrainTrainer for Text Classification #2664
- New database revisions command showing database revisions info
Fixes
Changed
- The database migrate command accepts a --revision param to provide specific revision id
- tokens_length metrics function returns empty data (#3045)
- token_length metrics function returns empty data (#3045)
- mention_length metrics function returns empty data (#3045)
- entity_density metrics function returns empty data (#3045)
Deprecated
- Using argilla with python 3.7 runtime is deprecated and support will be removed from version 1.9.0 (#2902)
- tokens_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- token_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- mention_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- entity_density metrics function has been deprecated and will be removed in 1.10.0 (#3045)
Removed
- Removed mention density, tokens_length and chars_length metrics from token classification metrics storage (#3045)
- Removed token char_start, char_end, tag, and score metrics from token classification metrics storage (#3045)
- Removed tags-related metrics from token classification metrics storage (#3045)
As always, thanks to our amazing contributors!
v1.7.0
🔆 Highlights
OpenAI fine-tuning support
Use your data in Argilla to fine-tune OpenAI models. You can do this by getting your data in the specific format through the prepare_for_training method or train directly using ArgillaTrainer.
Argilla Trainer improvements
We’ve added CLI support for Argilla Trainer and two new frameworks for training: OpenAI & SpanMarker.
Logging and loading enhancements
We’ve improved the speed and robustness of rg.log and rg.load methods.
typer CLI
A more user-friendly command line interface with typer that includes argument suggestions and colorful messages.
Changelog 1.7.0
Added
- add max_retries and num_threads parameters to rg.log to run data logging request concurrently with backoff retry policy. See #2458 and #2533
- rg.load accepts include_vectors and include_metrics when loading data. Closes #2398
- Added settings param to prepare_for_training (#2689)
- Added prepare_for_training for openai (#2658)
- Added ArgillaOpenAITrainer (#2659)
- Added ArgillaSpanMarkerTrainer for Named Entity Recognition (#2693)
- Added ArgillaTrainer CLI support. Closes (#2809)
Changed
- Argilla quickstart image dependencies are externalized into quickstart.requirements.txt. See #2666
- bulk endpoints will upsert data when record id is present. Closes #2535
- moved from click to typer CLI support. Closes (#2815)
- Argilla server docker image is built with PostgreSQL support. Closes #2686
- The rg.log computes all batches and raises an error for all failed batches.
- The default batch size for rg.log is now 100.
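The rg.log changes in this release (concurrent requests, retries with backoff, processing every batch before reporting failures, a default batch size of 100) describe a general pattern that can be sketched with the standard library. The code below is an illustration of that pattern only, not Argilla's implementation; log_in_batches, send and base_delay are hypothetical names, while max_retries and num_threads mirror the parameter names mentioned above.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def log_in_batches(
    records: Sequence,
    send: Callable[[Sequence], None],
    batch_size: int = 100,
    num_threads: int = 2,
    max_retries: int = 3,
    base_delay: float = 0.01,
) -> int:
    """Send all batches concurrently, retrying each with exponential backoff."""
    batches = [records[i : i + batch_size] for i in range(0, len(records), batch_size)]

    def send_with_retry(batch: Sequence) -> Optional[Exception]:
        for attempt in range(max_retries + 1):
            try:
                send(batch)
                return None
            except Exception as error:
                if attempt == max_retries:
                    # Give up on this batch but let the other batches continue.
                    return error
                time.sleep(base_delay * 2**attempt)
        return None

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        errors: List[Exception] = [e for e in pool.map(send_with_retry, batches) if e is not None]
    if errors:
        # Report once, after every batch has been attempted.
        raise RuntimeError(f"{len(errors)} of {len(batches)} batches failed")
    return len(batches)
```

The function returns the number of batches sent, so 250 records with the default batch size produce three batches.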
Fixed
- argilla.training bugfixes and unification (#2665)
- Resolved several small bugs in the ArgillaTrainer.
Deprecated
- The rg.log_async function is deprecated and will be removed in next minor release.
As always, thanks to our amazing contributors!
- docs: Fix broken links in README.md (#2759) by @stephantul
- Update how_to.ipynb by @chainyo
- Update log_load_and_prepare_data.ipynb by @ignacioct
v1.6.0
🔆 Highlights
User roles & settings page
We've introduced two user roles to help you manage your annotation team: admin and annotator. admin users can create, list and delete other users, workspaces and datasets. The annotator role is specifically designed for users who focus solely on annotating datasets.
We've also added a page to see your user's settings in the Argilla UI. To access it, click on your user avatar at the top right corner and then select My settings.
Argilla Trainer
The new argilla.training module deals with all data transformations and basic default configurations to train a model with annotations from Argilla using popular NLP frameworks. It currently supports spacy, setfit and transformers.
Additionally, admin users can access ready-made code snippets to copy-paste directly from the Argilla UI. Just go to the dataset you want to use, click the </> Train button in the top banner and select your preferred framework.
Learn more about argilla.training in our docs.
Database support
Argilla will now create a default SQLite database to store users and workspaces. PostgreSQL is also officially supported. Simply set a custom value for the ARGILLA_DATABASE_URL environment variable pointing to your PostgreSQL instance.
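The fallback described above (use ARGILLA_DATABASE_URL when set, otherwise a local SQLite database) can be sketched as follows. The default home directory, the argilla.db file name and the idea that the SQLite file lives under ARGILLA_HOME_PATH are assumptions made for this illustration; the server's real defaults may differ.

```python
import os
from typing import Mapping


def resolve_database_url(environ: Mapping[str, str], default_home: str = "~/.argilla") -> str:
    """Prefer ARGILLA_DATABASE_URL; otherwise fall back to a local SQLite file."""
    custom = environ.get("ARGILLA_DATABASE_URL")
    if custom:
        return custom
    # Assumption: the SQLite file is stored under the Argilla home directory.
    home = os.path.expanduser(environ.get("ARGILLA_HOME_PATH", default_home))
    return "sqlite:///" + os.path.join(home, "argilla.db")


postgres_url = resolve_database_url({"ARGILLA_DATABASE_URL": "postgresql://user:pass@db:5432/argilla"})
sqlite_url = resolve_database_url({"ARGILLA_HOME_PATH": "/data"})
```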
Changelog 1.6.0
Added
- ARGILLA_HOME_PATH new environment variable (#2564).
- ARGILLA_DATABASE_URL new environment variable (#2564).
- Basic support for user roles with admin and annotator (#2564).
- id, first_name, last_name, role, inserted_at and updated_at new user fields (#2564).
- /api/users new endpoint to list and create users (#2564).
- /api/users/{user_id} new endpoint to delete users (#2564).
- /api/workspaces new endpoint to list and create workspaces (#2564).
- /api/workspaces/{workspace_id}/users new endpoint to list workspace users (#2564).
- /api/workspaces/{workspace_id}/users/{user_id} new endpoint to create and delete workspace users (#2564).
- argilla.tasks.users.migrate new task to migrate users from old YAML file to database (#2564).
- argilla.tasks.users.create new task to create a user (#2564).
- argilla.tasks.users.create_default new task to create a user with default credentials (#2564).
- argilla.tasks.database.migrate new task to execute database migrations (#2564).
- release.Dockerfile and quickstart.Dockerfile now create a default argilladata volume to persist data (#2564).
- Add user settings page. Closes #2496
- Added argilla.training module with support for spacy, setfit, and transformers. Closes #2504
Fixes
- Now the prepare_for_training method is working when multi_label=True. Closes #2606
Changed
- ARGILLA_USERS_DB_FILE environment variable is now only used to migrate users from YAML file to database (#2564).
- full_name user field is now deprecated and first_name and last_name should be used instead (#2564).
- password user field now requires a minimum of 8 and a maximum of 100 characters in size (#2564).
- quickstart.Dockerfile image default users from team and argilla to admin and annotator including new passwords and API keys (#2564).
- Datasets to be managed only by users with admin role (#2564).
- The list of rules is now accessible while metrics are computed. Closes #2117
- Style updates for weak labelling and adding feedback toast when delete rules. See #2626 and #2648
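The new password constraint in this list (between 8 and 100 characters) is easy to express as a standalone check. The sketch below is a plain-Python illustration with hypothetical names; it is not the validation the server performs.

```python
MIN_PASSWORD_LENGTH = 8
MAX_PASSWORD_LENGTH = 100


def validate_password(password: str) -> str:
    """Return the password unchanged if its length is within the allowed range."""
    if not MIN_PASSWORD_LENGTH <= len(password) <= MAX_PASSWORD_LENGTH:
        raise ValueError(
            f"password must have between {MIN_PASSWORD_LENGTH} "
            f"and {MAX_PASSWORD_LENGTH} characters"
        )
    return password


accepted = validate_password("12345678")
```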
Removed
- email user field (#2564).
- disabled user field (#2564).
- Support for private workspaces (#2564).
- ARGILLA_LOCAL_AUTH_DEFAULT_APIKEY and ARGILLA_LOCAL_AUTH_DEFAULT_PASSWORD environment variables. Use python -m argilla.tasks.users.create_default instead (#2564).
- The old headers for API Key and workspace from python client
- The default value for old API Key constant. Closes #2251
As always, thanks to our amazing contributors!
- feat: add ArgillaSpaCyTrainer for both TokenClassification and TextClassification (#2604) by @alvarobartt
- Move dataset dump to train, ignored unnecessary imports, & remove _required_fields attribute (#2642) by @alvarobartt
- fix: update field name in metadata for image url (#2609) by @burtenshaw
- fix Install doc spell error by @PhilipMay
- fix: broken README.md link (#2616) by @alvarobartt
v1.5.1
Fixes
- Copying datasets between workspaces with proper owner/workspace info. Closes #2562
- Copy dataset with empty workspace to the default user workspace. See #2618
- Using elasticsearch config to request backend version. Closes #2311
- Remove sorting by score in labels. Closes #2622
Changed
- Update field name in metadata for image url. See #2609