[IX] Add support for "empty" being the labeled data #6979

Open
Tracked by #6791
aphilop opened this issue Jul 5, 2024 · 6 comments

@aphilop

aphilop commented Jul 5, 2024

This should be displayed in the Suggestions column and also in the Stats & Filter Panel.

[Screenshots: SCR-20240705-htci, SCR-20240705-hteh]

FIX BUG:
When accepting a SELECT type suggestion, if the suggestion is empty, the server returns the error "Id is invalid: name_of_the_tesauri" because it tries to look up the suggested value, which is empty, in the thesaurus. This bug can now only happen when bulk accepting.

The UI should allow users to individually accept SELECT, TEXT and NUMERIC suggestions with empty values.
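
A minimal sketch of the kind of guard that would avoid the error described above, assuming the empty suggestion arrives as an empty string. The names `Thesaurus` and `resolve_select_value` are hypothetical and are not taken from the Uwazi codebase; the point is only that an empty suggestion should bypass the thesaurus lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Thesaurus:
    # Hypothetical stand-in for a thesaurus: a name plus the ids of its values.
    name: str
    value_ids: set = field(default_factory=set)


def resolve_select_value(suggested_value: str, thesaurus: Thesaurus) -> list:
    """Return the value ids to store when a SELECT suggestion is accepted."""
    # Assumption: an empty suggestion is represented by an empty string.
    # It is accepted as "no value" without looking anything up.
    if suggested_value == "":
        return []
    # Non-empty suggestions must still exist in the thesaurus.
    if suggested_value not in thesaurus.value_ids:
        raise ValueError(f"Id is invalid: {thesaurus.name}")
    return [suggested_value]


countries = Thesaurus(name="countries", value_ids={"es", "fr"})
accepted_empty = resolve_select_value("", countries)
accepted_value = resolve_select_value("es", countries)
```

The same guard would need to apply to both the individual and the bulk accept paths, since the bug is reported for bulk accepting.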

@txau
Collaborator

txau commented Jul 5, 2024

In the stats and filters, "empty" becomes labeled data. So:

  • When a row in the table is labeled as "empty" and the model returns "empty", it will be a "Match".
  • When a row in the table is labeled as "empty" and the model returns suggestions, it will be a "Mismatch".

So I guess we don't need to modify the stats UI, but just make sure that the backend is placing the right data in the right place.
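
The two rules above could be sketched as follows, assuming "empty" is represented by an empty string and that rows without labeled data are handled separately. The function name `match_state` and the "Unlabeled" state are hypothetical, used only for illustration.

```python
def match_state(is_labeled: bool, labeled_value: str, suggested_value: str) -> str:
    """Classify a row by comparing its labeled value with the model's suggestion."""
    # Assumption: rows with no labeled data at all are neither Match nor Mismatch.
    if not is_labeled:
        return "Unlabeled"  # hypothetical state name
    # "empty" is treated like any other labeled value: "" equals "" is a Match,
    # "" against a non-empty suggestion is a Mismatch.
    return "Match" if labeled_value == suggested_value else "Mismatch"


labeled_empty_model_empty = match_state(True, "", "")
labeled_empty_model_value = match_state(True, "", "2024-07-05")
```

Because "empty" goes through the same comparison as any other value, the existing Match and Mismatch counters would not need a separate case.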

In the table, since the option to accept a suggestion already exists there, maybe we should also offer the option to accept an "empty" value as the correct value and labeled data. For the sake of simplicity, we could skip this change for now and allow setting the "empty" value only from the side panel.

Where we definitely need to be able to set the "empty" value is in the side panel.

@juanmnl

juanmnl commented Jul 12, 2024

@txau given that both can have an empty state, I guess the only way to filter through them is by adding a nested filter to Match and Mismatch. That opens the possibility of adding more granular filtering options, but it will also mean adding a "partial" state for the checkboxes and maybe a "collapse/expand" action to the main one at some point.

[Screenshots: 2024-07-12 at 11 41 57 AM, 2024-07-12 at 11 42 24 AM]

@txau
Collaborator

txau commented Jul 12, 2024

I think we don't need to modify the filters. They are OK as they are: if the property is LABELED as empty, it may match or mismatch, and we don't need to differentiate the empty status in filters.

What we need is a way to set and check this value in the form.

It was suggested to have an extra, fixed option in the multiselect saying "empty". But this is too hidden. I think we need a more obvious and quick way of handling the empty value.

@juanmnl

juanmnl commented Jul 24, 2024

We are adding an action to the bottom bar of the hub when viewing a PDF that will allow users to quickly label the whole document as [empty].

If any values are selected (checkbox) this action should deselect all.

When the action has been triggered, the button becomes inactive and a message appears explaining that the document has been [labeled as empty]. This state lasts until a new value is selected.
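
The behaviour described above can be modelled as a small piece of state. This is only a sketch of the interaction, not the actual UI code; the class name `SuggestionForm` and its members are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SuggestionForm:
    # Hypothetical model of the form state behind the bottom bar action.
    selected: set = field(default_factory=set)
    labeled_as_empty: bool = False

    def label_as_empty(self) -> None:
        # Triggering the action deselects every checked value.
        self.selected.clear()
        self.labeled_as_empty = True

    def select(self, value: str) -> None:
        # Selecting a new value ends the "labeled as empty" state.
        self.selected.add(value)
        self.labeled_as_empty = False

    @property
    def action_enabled(self) -> bool:
        # The button is inactive while the document is labeled as empty.
        return not self.labeled_as_empty

    @property
    def message(self) -> str:
        return "labeled as empty" if self.labeled_as_empty else ""


form = SuggestionForm(selected={"a", "b"})
form.label_as_empty()
state_after_action = (sorted(form.selected), form.action_enabled, form.message)
form.select("c")
state_after_select = (sorted(form.selected), form.action_enabled, form.message)
```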

[Images: suggestions - label as empty; suggestions - labeled as empty]

Designs

@aphilop added the "Frontend 😎", "Priority: High" and "Feature" (Brand new functionality to be added to UWAZI) labels on Jul 24, 2024
@aphilop added this to the Information Extraction milestone on Aug 18, 2024
@gabriel-piles
Member

This change also affects the contract with the service. We want to be explicit about whether a value is empty and whether the suggested prediction is empty. My suggestion for passing this information back and forth to the service is as follows:

LabeledData:

tenant: str = ""
id: str = ""
xml_file_name: str = ""
entity_name: str = ""
language_iso: str = ""

--> empty_value: bool = False

label_text: str = ""
values: list[Option] = list()
source_text: str = ""
page_width: float = 0
page_height: float = 0
xml_segments_boxes: list[SegmentBox] = list()
label_segments_boxes: list[SegmentBox] = list()

Suggestion:

tenant: str
id: str
xml_file_name: str = ""
entity_name: str = ""

--> empty_suggestion: bool = False

text: str = ""
values: list[Option] = list()
segment_text: str = ""
page_number: int = 1
segments_boxes: list[SegmentBox] = list()
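
As an illustration of how the new flags might be used, here is a sketch built on a reduced version of `LabeledData`, written as a standard-library dataclass. The shape of `Option` (`id`, `label`) and the helper `label_as_empty` are assumptions made for the example; the segment box fields are omitted for brevity, and the real service may define its models differently.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Option:
    # Assumed shape; the thread does not define Option.
    id: str = ""
    label: str = ""


@dataclass
class LabeledData:
    # Reduced version of the proposed contract (segment boxes omitted).
    tenant: str = ""
    id: str = ""
    xml_file_name: str = ""
    entity_name: str = ""
    language_iso: str = ""
    empty_value: bool = False  # the new, explicit flag
    label_text: str = ""
    values: list = field(default_factory=list)


def label_as_empty(tenant: str, id: str, xml_file_name: str, entity_name: str) -> LabeledData:
    """Hypothetical helper: build labeled data that is explicitly empty."""
    return LabeledData(
        tenant=tenant,
        id=id,
        xml_file_name=xml_file_name,
        entity_name=entity_name,
        empty_value=True,
    )


payload = asdict(label_as_empty("tenant_a", "extractor_1", "doc.xml", "entity_1"))
```

With the explicit flag, the service can tell "labeled as empty" (`empty_value` is true, no text or values) apart from a record whose text and values are empty simply because they were left at their defaults.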

@txau
Collaborator

txau commented Sep 9, 2024

@gabriel-piles I think this contract is accepted by the backenders.
