It seems that the annotated original text is not provided? #11

Open
waylonli opened this issue Oct 3, 2024 · 5 comments

waylonli commented Oct 3, 2024

For long dependency QA, I suppose the 'S' key under 'qa_pairs' should be the annotated original text? But I found that most of the time the context in 'S' cannot be found in 'input'. Am I misunderstanding what 'S' is?

Author

waylonli commented Oct 3, 2024

For example:
loogle_data[1]['qa_pairs'][0]['S'][0]
is

'Ciutat Vella:\n[Population (2015): 100,685;\nArea: 4.49 km2;\nDensity = 100,685 / 4.49 = 22,431 people/km2.]\nEixample:\n[Population: 263,565;\nArea: 7.46 km2;\nDensity = 263,565 / 7.46 = 35,338 people/km2.]\nSants-Montju?c:\n[Population: 180,824;\nArea: 21.35 km2;\nDensity = 180,824 / 21.35 = 8,473 people/km2.]\nLes Corts:\n[Population: 81,200;\nArea: 6.08 km2;\nDensity = 81,200 / 6.08 = 13,355 people/km2.]\nSarrià-Sant Gervasi:\n[Population: 145,761;\nArea: 20.09 km2;\nDensity = 145,761 / 20.09 = 7,254 people/km2.]\nGràcia:\n[Population: 120,273;\nArea: 4.19 km2;\nDensity = 120,273 / 4.19 = 28,710 people/km2.]\nHorta-Guinardó:\n[Population: 166,950;\nArea: 11.96 km2;\nDensity = 166,950 / 11.96 = 13,959 people/km2.]\nNou Barris:\n[Population: 164,516;\nArea: 8.04 km2;\nDensity = 164,516 / 8.04 = 20,466 people/km2.]\nSant Andreu:\n[Population: 145,983;\nArea: 6.56 km2;\nDensity = 145,983 / 6.56 = 22,246 people/km2.]\nSant Martí:\n[Population: 232,629;\nArea: 10.80 km2;\nDensity = 232,629 / 10.80 = 21,540 people/km2.]'

I cannot find this paragraph, or any of its sentences, anywhere in the original input.
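
The check described above can be sketched roughly as follows. This is only an illustration: it assumes each record is a dict with an 'input' string and a 'qa_pairs' list whose items hold a list of evidence strings under 'S'. The record shown is a made-up stand-in, not real dataset content, and `missing_evidence` is a hypothetical helper name.

```python
def missing_evidence(record):
    """Return (qa_index, s_index, text) for every 'S' entry not found verbatim in 'input'."""
    missing = []
    for qa_idx, qa in enumerate(record["qa_pairs"]):
        for s_idx, evidence in enumerate(qa["S"]):
            # Plain substring test: no normalization of whitespace or punctuation
            if evidence not in record["input"]:
                missing.append((qa_idx, s_idx, evidence))
    return missing


# Made-up record for illustration only (assumed layout, not real data)
toy_record = {
    "input": "The city has ten districts. Eixample is the most populous.",
    "qa_pairs": [
        {"S": ["Eixample is the most populous.", "Density = population / area"]},
    ],
}

print(missing_evidence(toy_record))
```

On a real record, an empty result would mean every 'S' entry is an exact substring of 'input'.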

@lijiaqijane
Collaborator

Which task (subset of the data) did you use for this example, loogle_data[1]['qa_pairs'][0]['S'][0]? Could you please provide more details? Thanks.

Author

waylonli commented Oct 24, 2024

Hi, I was using the longdep_qa subset.

waylonli reopened this Oct 24, 2024

@lijiaqijane
Collaborator

Hi. In the multiple information retrieval task, 'S' contains evidence that can be extracted directly from the original text. The other long-dependency tasks are more challenging and require further calculation or reasoning over the evidence to reach the final answer. For those tasks, the annotators were asked to provide more detailed explanations of the reasoning process, showing how the evidence is used, as in the case you provided here.

Author

waylonli commented Oct 25, 2024

Thanks for your reply. However, I think this is actually a formatting issue in the dataset: the sentences in 'S' do not strictly match the original context in 'input'. Please see the following example, which I extracted from the dataset:

From 'S':

"His father was Alvaro Picardo de\nCelis and his mother's family name was Castellón. He had four brothers,one of brothers died in infancy."

From 'input':

His father was Alvaro Picardo de Celis and his mother's family name was Castellón. He had four brothers, one of whom died in infancy.

Here 'S' contains an unexpected \n, is missing the space after the comma, and reads "one of brothers" where the original context has "one of whom". Typos of this kind make it hard for us to process the dataset and cite your work. I ran a quick test on longdep_qa: after replacing \n with whitespace, only 921 of the 3429 sentences in 'S' can be found in the original context by exact matching. The remaining 2508 cannot, and it is time-consuming for us to check those 2508 samples manually, one by one, to figure out what the issue is in each case.
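
For reference, the exact-matching test described above could look roughly like the sketch below. It is not the script used to produce the 921 / 3429 count. `normalize`, `count_exact_matches` and `longest_shared_span` are hypothetical helper names, and the two strings are the example quoted above. Unlike a plain \n replacement, `normalize` collapses every run of whitespace. The `difflib` step is an added diagnostic that shows where a non-matching sentence starts to diverge from the source.

```python
import difflib
import re


def normalize(text):
    """Collapse each run of whitespace (newlines included) into a single space."""
    return re.sub(r"\s+", " ", text).strip()


def count_exact_matches(context, evidences):
    """Count the evidence strings found verbatim in the context after normalization."""
    norm_context = normalize(context)
    return sum(1 for e in evidences if normalize(e) in norm_context)


def longest_shared_span(context, evidence):
    """Return the longest substring shared by the normalized evidence and context.

    The point where the span stops hints at where the evidence diverges.
    """
    a, b = normalize(evidence), normalize(context)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


# The example from this comment: 'input' excerpt and the corresponding 'S' entry
context = (
    "His father was Alvaro Picardo de Celis and his mother's family name "
    "was Castellón. He had four brothers, one of whom died in infancy."
)
evidence = (
    "His father was Alvaro Picardo de\nCelis and his mother's family name "
    "was Castellón. He had four brothers,one of brothers died in infancy."
)

print(count_exact_matches(context, [evidence]))
print(longest_shared_span(context, evidence))
```

`SequenceMatcher` can be slow on very long inputs, so in practice it would probably make sense to run it only on the sentences that fail the exact match.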
