It seems that the annotated original text is not provided? #11

waylonli · 2024-10-03T23:23:30Z

For the long dependency QA task, I assume the 'S' key under 'qa_pairs' holds the annotated original text? However, most of the time the text in 'S' cannot be found in 'input'. Am I misunderstanding what 'S' is?

waylonli · 2024-10-03T23:30:02Z

For example:
loogle_data[1]['qa_pairs'][0]['S'][0]
is

'Ciutat Vella:\n[Population (2015): 100,685;\nArea: 4.49 km2;\nDensity = 100,685 / 4.49 = 22,431 people/km2.]\nEixample:\n[Population: 263,565;\nArea: 7.46 km2;\nDensity = 263,565 / 7.46 = 35,338 people/km2.]\nSants-Montjuïc:\n[Population: 180,824;\nArea: 21.35 km2;\nDensity = 180,824 / 21.35 = 8,473 people/km2.]\nLes Corts:\n[Population: 81,200;\nArea: 6.08 km2;\nDensity = 81,200 / 6.08 = 13,355 people/km2.]\nSarrià-Sant Gervasi:\n[Population: 145,761;\nArea: 20.09 km2;\nDensity = 145,761 / 20.09 = 7,254 people/km2.]\nGràcia:\n[Population: 120,273;\nArea: 4.19 km2;\nDensity = 120,273 / 4.19 = 28,710 people/km2.]\nHorta-Guinardó:\n[Population: 166,950;\nArea: 11.96 km2;\nDensity = 166,950 / 11.96 = 13,959 people/km2.]\nNou Barris:\n[Population: 164,516;\nArea: 8.04 km2;\nDensity = 164,516 / 8.04 = 20,466 people/km2.]\nSant Andreu:\n[Population: 145,983;\nArea: 6.56 km2;\nDensity = 145,983 / 6.56 = 22,246 people/km2.]\nSant Martí:\n[Population: 232,629;\nArea: 10.80 km2;\nDensity = 232,629 / 10.80 = 21,540 people/km2.]'

I cannot find this passage, or any of its sentences, anywhere in the original input.
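A minimal sketch of that check, assuming each record is a Python dict with an 'input' string and a 'qa_pairs' list whose items carry an 'S' list of evidence strings (the shape implied by the indexing above). The helper name `find_missing_evidence` is hypothetical, and the toy record is invented for illustration, not taken from LooGLE.

```python
from typing import Any


def find_missing_evidence(record: dict[str, Any]) -> list[tuple[int, int]]:
    """Return (qa_index, evidence_index) pairs whose 'S' string does not
    occur verbatim in record['input'] (plain exact substring test)."""
    context = record["input"]
    missing = []
    for qa_idx, qa in enumerate(record["qa_pairs"]):
        for ev_idx, evidence in enumerate(qa["S"]):
            if evidence not in context:
                missing.append((qa_idx, ev_idx))
    return missing


# Hypothetical toy record with the assumed shape and invented content.
toy_record = {
    "input": "Alice moved to Paris in 2001. She later studied chemistry.",
    "qa_pairs": [
        {"S": ["Alice moved to Paris in 2001."]},   # verbatim span
        {"S": ["Paris:\n[Arrival year: 2001]"]},    # annotator-written summary
    ],
}

print(find_missing_evidence(toy_record))  # → [(1, 0)]
```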

lijiaqijane · 2024-10-07T07:19:32Z

Which task (subset of the data) are you using in this example, loogle_data[1]['qa_pairs'][0]['S'][0]? Could you please provide more details? Thanks.

waylonli · 2024-10-24T15:53:26Z

Hi, I was using the longdep_qa subset.

lijiaqijane · 2024-10-25T01:45:23Z

Hi, in the multiple information retrieval task, the entries in 'S' are evidence that can be extracted directly from the original text. The other long dependency tasks are more challenging and require further calculation or reasoning over the evidence to obtain the final answer. For those tasks, annotators were asked to provide more detailed explanations and the reasoning process for using the evidence, as in the case you provided here.

waylonli · 2024-10-25T11:45:47Z

Thanks for your reply. However, I think this is actually a formatting issue in the dataset: the sentences in 'S' do not strictly match the original context in 'input'. Here is an example I extracted from the dataset:

From 'S':

"His father was Alvaro Picardo de\nCelis and his mother's family name was Castellón. He had four brothers,one of brothers died in infancy."

From 'input':

His father was Alvaro Picardo de Celis and his mother's family name was Castellón. He had four brothers, one of whom died in infancy.

Here, 'S' contains an unexpected \n, is missing the space after "brothers,", and has "one of brothers" where the original context reads "one of whom". Typos like these make it hard for us to process the dataset and cite your work. In a quick test on longdep_qa, after replacing \n with whitespace, only 921 of the 3429 sentences in 'S' could be found in the original context by exact matching. That leaves 2508 sentences that cannot, and manually checking them one by one to figure out the cause is time-consuming for us.
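The exact-match test described above could be sketched as follows, assuming the same record shape ('input' string, 'qa_pairs' list, 'S' lists) and assuming the \n replacement is applied to the evidence only. For evidence that still fails, `difflib.SequenceMatcher` from the Python standard library gives a rough similarity score against the stretch of context that best lines up with it, which may help separate near-misses such as the "one of brothers" typo from annotator-written explanations like the Barcelona density example. The helper names `count_exact_matches` and `near_miss_score` are hypothetical, and any cutoff for calling something a near-miss would need tuning on real data.

```python
from difflib import SequenceMatcher
from typing import Any, Iterable


def count_exact_matches(records: Iterable[dict[str, Any]]) -> tuple[int, int]:
    """Count 'S' strings found verbatim in 'input' after replacing '\n'
    with a space in the evidence. Returns (found, total)."""
    found = total = 0
    for record in records:
        context = record["input"]
        for qa in record["qa_pairs"]:
            for evidence in qa["S"]:
                total += 1
                if evidence.replace("\n", " ") in context:
                    found += 1
    return found, total


def near_miss_score(evidence: str, context: str) -> float:
    """Similarity in [0, 1] between the evidence and the context window
    aligned with their longest common block; high values suggest typos
    rather than annotator-written reasoning."""
    evidence = evidence.replace("\n", " ")
    # autojunk=False so frequent characters in a long document are not ignored.
    matcher = SequenceMatcher(None, evidence, context, autojunk=False)
    block = matcher.find_longest_match(0, len(evidence), 0, len(context))
    if block.size == 0:
        return 0.0
    # Start the window where the evidence would begin if it were aligned.
    start = max(0, block.b - block.a)
    window = context[start:start + len(evidence)]
    return SequenceMatcher(None, evidence, window, autojunk=False).ratio()


# The example from this comment, wrapped in the assumed record shape.
record = {
    "input": ("His father was Alvaro Picardo de Celis and his mother's family "
              "name was Castellón. He had four brothers, one of whom died in infancy."),
    "qa_pairs": [{"S": [
        "His father was Alvaro Picardo de\nCelis and his mother's family name "
        "was Castellón. He had four brothers,one of brothers died in infancy."
    ]}],
}

print(count_exact_matches([record]))  # → (0, 1)
print(round(near_miss_score(record["qa_pairs"][0]["S"][0], record["input"]), 2))  # → 0.96
```

`SequenceMatcher` compares character by character, so on full-length LooGLE documents this can be slow; it is meant only for triaging the sentences that fail the exact-match test.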

waylonli closed this as completed Oct 24, 2024

waylonli reopened this Oct 24, 2024

