[book] indicate l. if only lines differ #284

Open
tuurma opened this issue Mar 9, 2020 · 19 comments

Member

tuurma commented Mar 9, 2020

Same book, same text, line numbers indicated

IGLS XXI.5 29, 1 -> IGLS XXI.5 29, 1
IGLS XXI.5 29, 2 -> ib. l. 2

image

MT: ... detecting line numbers in the details field can be problematic. I would suggest creating a separate line field to hold them explicitly. Then I could try to automate the line number extraction on a per-abbreviation basis, similar to what we did when systematizing volume numbers. Would you have any suggestions for common patterns? The final number after the comma often seems to be the line number, but I've also seen entries like p. 39 no. 3, so I wonder if the number after no. is also the line? Then there are entries like A Pers. 29, 302, 972, where I don't suppose 972 is a line number?

RC: In the past, there were very strict rules governing the use of commas, essentially for distinguishing line numbers. That is no longer the case, so I am rather at a loss to suggest how to resolve this problem. I think it is probably true that the vast majority of commas relate to line numbers, but there are also strings of numbers referring to chapters separated by commas. It would have been better in retrospect if semi-colons had been used instead. Is there any way of generating a list which would not totally overwhelm us with irrelevant entries? In the two examples you cite, what follows the comma in each case is in fact a line number. But you are likely to find others which are not (e.g. J., +BJ or J., +AJ).

Member Author

tuurma commented Mar 9, 2020

Ordering the abbreviations by number of references, there are:

  • 9 abbreviations with > 1000 references (IGLS, SEG, PDura, CIIP, ChLA, RE, IG, Meimaris_Chronological_Systems)
  • 84 > 100
  • 128 > 50
  • 235 > 20

I'd suggest concentrating on the most common abbreviations to figure out what the predominant patterns are.
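As an illustration of how such a breakdown can be produced, here is a minimal Python sketch (not the project's own code). It assumes a flat list with one abbreviation per reference; the function name `bucket_counts` and the sample data are hypothetical. The buckets are cumulative, which is how the figures above appear to be counted.

```python
from collections import Counter

def bucket_counts(abbreviations, thresholds=(1000, 100, 50, 20)):
    """Count how many distinct abbreviations have more than N references.

    `abbreviations` holds one item per reference (hypothetical input shape).
    Buckets are cumulative: an abbreviation with 1500 references is counted
    under > 1000, > 100, > 50 and > 20.
    """
    per_abbreviation = Counter(abbreviations)
    return {
        threshold: sum(1 for n in per_abbreviation.values() if n > threshold)
        for threshold in thresholds
    }

# Made-up sample: "A" has 25 references, "B" has 60, "C" has 5.
sample = ["A"] * 25 + ["B"] * 60 + ["C"] * 5
print(bucket_counts(sample))
```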

Initial results for IGLS show that the predominant pattern is , (\d)+$ (the entry ends with a comma followed by a number): a bit under 3k of the ~9k total IGLS references match it and could be converted automatically.
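A minimal Python sketch of the final comma-number split, for illustration only (the actual conversion was presumably run in XQuery against the database). The function name `split_final_line` is hypothetical, and the regular expression is my reading of the pattern quoted above.

```python
import re

# Assumed reading of the pattern quoted above: a comma, a space and a run of
# digits at the very end of the string.
FINAL_COMMA_NUMBER = re.compile(r"^(?P<details>.*), (?P<line>\d+)$")

def split_final_line(reference):
    """Return (details, line); line is None when the pattern does not match."""
    match = FINAL_COMMA_NUMBER.match(reference)
    if match is None:
        return reference, None
    return match.group("details"), match.group("line")

print(split_final_line("IGLS XXI.5 29, 2"))
print(split_final_line("p. 39 no. 3"))
```

Note that an entry such as A Pers. 29, 302, 972 also matches this pattern, with 972 taken as the line, so the pattern alone cannot decide whether the final number is a line. That is the reason for applying it per abbreviation.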

Member Author

tuurma commented Apr 21, 2020

As a preparatory step, I extended our XML template to store the line number explicitly:

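declare namespace tei="http://www.tei-c.org/ns/1.0";

(: every non-volume bibl that does not yet have a line note :)
for $bibl in collection('/db/apps/lgpn-data/data/persons')//tei:bibl[not(@type='volume')][not(tei:note[@type='line'])]
(: empty placeholder, to be filled in by the later conversions :)
let $add := <note xmlns="http://www.tei-c.org/ns/1.0" type="line"/>
return
    (: eXist-db update extension: insert the note right after the ref element :)
    update insert $add following $bibl/tei:ref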

I also adjusted the input form accordingly. Please note that the Linking field has been moved up and is now placed in the same row as Line.

image

Member Author

tuurma commented Apr 22, 2020

@michaelzellmann I have prepared a conversion list, in the first instance tackling just the most popular entries and the simple cases that end with the , number pattern. Could you have a glance at the conversion suggestions below and let me know whether they look reasonable?

IGLS

SEG

IG

Collaborator

michaelzellmann commented Apr 22, 2020 via email

Member Author

tuurma commented Apr 22, 2020

Thanks, I've fixed the link so it leads to the person input form.

I will run the conversion now for IGLS, SEG and IG and attach the logs here.

singlecomma-log.zip

Member Author

tuurma commented Apr 22, 2020

After running the conversion, other cases remain that contain a comma but do not match the final comma-and-number pattern:

SEG

  1. SEG XLVIII 1868, [1] Μαρώνις (comma and [number])
  2. SEG XLI 1530, 8, 75 Ζώη (multiple commas)
  3. SEG LV 1053 A, 9; B, 15 Οὐεττινιανός
  4. SEG XLIII 1026B, D Μαρῖνος

Could you please confirm whether the following handling is appropriate:

  1. treat number in [] as a line number -> l. [1]
  2. treat final comma-separated numbers as line number -> l. 8, 75
  3. split into two bibl. entries? LV 1053 A l. 9 and LV 1053 B l. 15
  4. leave as is, I suspect B and D are not line numbers?
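The handling proposed for cases 1 and 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the conversion that was actually run: the names `TRAILING_SERIES` and `split_line_series` are hypothetical, a series item is assumed to be a plain or bracketed number, and references containing a semicolon (case 3) are deliberately left alone.

```python
import re

# One series item: a number, optionally in square brackets, e.g. 8 or [1].
ITEM = r"(?:\d+|\[\d+\])"
# One or more ", <item>" groups at the very end of the reference.
TRAILING_SERIES = re.compile(rf"^(?P<details>.*?)(?P<lines>(?:, {ITEM})+)$")

def split_line_series(reference):
    """Return (details, line_label); line_label is None if nothing converts."""
    if ";" in reference:
        # Case 3 (several parts, each with its own line) needs splitting first.
        return reference, None
    match = TRAILING_SERIES.match(reference)
    if match is None:
        # Case 4: trailing letters such as "B, D" are left as they are.
        return reference, None
    lines = match.group("lines")[2:]  # drop the leading ", "
    return match.group("details"), "l. " + lines

for ref in ["SEG XLVIII 1868, [1]", "SEG XLI 1530, 8, 75", "SEG XLIII 1026B, D"]:
    print(split_line_series(ref))
```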

Member Author

tuurma commented Apr 22, 2020

IGLS

  1. IGLS II 466, [2] -> same as SEG case 1
  2. IGLS XVI (1) 289, 1, 3 -> same as SEG case 2
  3. IGLS XVII (1) 477 a, 1; b, 2 -> same as SEG case 3
  4. IGLS III (2) 1183, 3, 21, 31 -> multiple line numbers, variant of case 2
  5. IGLS XVII (1) 536 a, 1; b, 1; c, 2 -> multiple entries, variant of case 3
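For the multi-part references (SEG case 3 and its IGLS variants), the proposed split into separate entries could look like the Python sketch below. It is only an illustration: whether splitting is the right handling was still an open question at this point in the thread, the function name `split_parts` is hypothetical, and parts are assumed to be single letters followed by a comma and a line number.

```python
import re

# First chunk carries the shared base, e.g. "IGLS XVII (1) 477 a, 1".
FIRST = re.compile(r"^(?P<base>.+) (?P<part>[A-Za-z]), (?P<line>\d+)$")
# Later chunks only carry "<part>, <line>", e.g. "b, 2".
PART = re.compile(r"^(?P<part>[A-Za-z]), (?P<line>\d+)$")

def split_parts(reference):
    """Split "<base> a, 1; b, 2" into [(details, line), ...], or return None."""
    chunks = [chunk.strip() for chunk in reference.split(";")]
    if len(chunks) < 2:
        return None
    first = FIRST.match(chunks[0])
    if first is None:
        return None
    base = first.group("base")
    entries = [(f"{base} {first.group('part')}", first.group("line"))]
    for chunk in chunks[1:]:
        match = PART.match(chunk)
        if match is None:
            return None  # unexpected shape: leave for manual handling
        entries.append((f"{base} {match.group('part')}", match.group("line")))
    return entries

print(split_parts("IGLS XVII (1) 477 a, 1; b, 2"))
```

Returning None for anything that does not fit the assumed shape keeps unusual references out of the automatic conversion so they can be handled manually.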

Member Author

tuurma commented Apr 22, 2020

IG: very few remaining cases, like IG XI (4) 772, 3, 15 (same as SEG case 2); the rest could be handled manually.

Collaborator

michaelzellmann commented Apr 22, 2020 via email

Member Author

tuurma commented Apr 24, 2020

As we're slowly converting database entries, I'm now working on the LaTeX generating scripts.

Here's a test case for Γέμελλα: in Heliopolis we should have

(2) IGLS vi 2751, 3
(3) ib. l. 4

Original bibl. entry for (3) is IGLS vi 2751, 4
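The intended output rule can be sketched in Python as below. This is not the LaTeX generating script itself, only an illustration of the rule from the issue title: when a reference repeats the previous source and only the line differs, print ib. with the line. The function name `render_references` and the (source, line) input shape are assumptions.

```python
def render_references(refs):
    """Render (source, line) pairs, abbreviating a repeated source to ib.

    `refs` is a list of (source, line) tuples in output order; `line` may be
    None or "" when the reference has no line number.
    """
    rendered = []
    previous_source = None
    for source, line in refs:
        if source == previous_source and line:
            # Same book and text as the previous reference: only show the line.
            rendered.append(f"ib. l. {line}")
        elif line:
            rendered.append(f"{source}, {line}")
        else:
            rendered.append(source)
        previous_source = source
    return rendered

print(render_references([("IGLS vi 2751", "3"), ("IGLS vi 2751", "4")]))
# prints ['IGLS vi 2751, 3', 'ib. l. 4']
```

Storing the line in its own field is what makes the comparison `source == previous_source` possible; with the line still embedded in the details string, the two references would never compare equal.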

image

Collaborator

michaelzellmann commented Apr 24, 2020 via email

Member Author

tuurma commented Apr 24, 2020

Yes, I saw you were working in the Google doc, many thanks!

Meanwhile, I've made some progress with presenting ib. with lines, but I need to test that there are no regressions in other cases.

image

Collaborator

michaelzellmann commented Apr 24, 2020 via email

Member Author

tuurma commented Apr 24, 2020

Thanks, fixed

image

Member Author

tuurma commented Apr 28, 2020

Thanks to Michael's list I could convert further entries matching the final comma-number pattern for the following abbreviations (log file attached):

 "IPalTertia", "ISyrie", "AAES", "ITyr", "IGerasa", "MUSJ", "ZDPV", "IWadi_Haggag", "YCS", "Nessana", "IJO", "Hajjar", "IPalTertia_west", "Dussaud_Macler_Mission", "IMSoueida", "SEMA", "INegev", "Lörincz", "PEQ", "DainIGLouvre", "MFO",  "Mouterde_Limes", "BCH", "ILS", "IIasos", "CIJ", "IDR", "Ovadiah_MPI", "Resafa", "FroehnerInscrLouvre", "SBF", "PMasada", "Topoi", "PferdehirtMilitärdiplome", "IGR", "KayserRecueil", "Mittmann_Beiträge", "ISmyrna", "RMD", "Clermont_Ganneau_RAO", "DOP", "IAntMaroc", "BAAL", "IAquil", "RA", "JIWE", "Pall", "Brünnow_Domaszewski_PA", "IEJ", "MendelCat", "CrowfootObjectsfromSamaria", "Old_Syriac_Inscriptions"

singlecomma-Michaelslist-log.html.zip

Here are the counts of entries, per abbreviation, that currently have the line field filled:

  • IGLS 3599
  • SEG 2952
  • IG 661
  • CIIP 265
  • IGerasa 260
  • PDura 232
  • IWadi_Haggag 188
  • ITyr 170
  • Nessana 149
  • AAES 141
  • IMSoueida 106
  • ISyrie 104
  • IPalTertia_west 103
  • SEMA 61
  • PEQ 49
  • IIasos 46
  • DainIGLouvre 44
  • IDR 38
  • INegev 37
  • Dussaud_Macler_Mission 34
  • MUSJ 34
  • YCS 32
  • Mouterde_Limes 31
  • MFO 30
  • BCH 27
  • KayserRecueil 24
  • PferdehirtMilitärdiplome 23
  • ISmyrna 22
  • RMD 21
  • FroehnerInscrLouvre 21
  • CIJ 18
  • IAquil 15
  • IPalTertia 14
  • MendelCat 13
  • IAntMaroc 13
  • Mittmann_Beiträge 12
  • JIWE 12
  • PMasada 11
  • Clermont_Ganneau_RAO 11
  • ZDPV 10
  • SBF 10
  • CrowfootObjectsfromSamaria 9
  • Brünnow_Domaszewski_PA 8
  • DOP 8
  • IGR 7
  • RA 7
  • Ovadiah_MPI 6
  • Resafa 5
  • IEJ 5
  • IJO 5
  • ILS 4
  • ChLA 4
  • BAAL 2
  • Lörincz 1
  • Topoi 1
  • Old_Syriac_Inscriptions 1
  • Hajjar 1
  • Pall 1
  • Meimaris_Chronological_Systems 1

Member Author

tuurma commented Apr 29, 2020

After converting the single comma-number pattern matches for selected abbreviations yesterday, today I've prepared the conversion for patterns where there are multiple comma-separated numbers at the end and/or some numbers are in brackets (cases 1 and 2 as discussed here)

I've done a dry run of the conversion (generating the new values without applying them) for a handful of the most common abbreviations:
biblLines.pdf

Looking at these results, I'd suggest that we

  1. go ahead and apply this pattern for "IGLS", "SEG", "CIIP", "IG", "TEAD", "ISyrie", "IMnBeyrouth", "AAES"
  2. but refrain from doing so for "PDura", "PNess", "J"

There are no matches for the other most common abbreviations: "ChLA", "RE", "Meimaris_Chronological_Systems", "FRA", "SchiefferACOIndexProsopogr", "DCB", "IPalTertia", "PLRE", "Justi", "IMoab", "PIR2"
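The dry-run workflow (generate proposed values, review them, apply only for an agreed list of abbreviations) can be sketched in Python like this. It is an illustration only: the names `APPLY_TO`, `SERIES` and `dry_run` are hypothetical, the input is assumed to be (abbreviation, details) pairs with the abbreviation stored separately from the details, and nothing is written back anywhere.

```python
import re

# Abbreviations for which applying the pattern is suggested above.
APPLY_TO = {"IGLS", "SEG", "CIIP", "IG", "TEAD", "ISyrie", "IMnBeyrouth", "AAES"}

# Trailing comma-separated numbers, each optionally in square brackets.
SERIES = re.compile(
    r"^(?P<details>.*?), (?P<lines>(?:\d+|\[\d+\])(?:, (?:\d+|\[\d+\]))*)$"
)

def dry_run(entries):
    """Yield (abbreviation, old_details, new_details, line) proposals.

    Nothing is modified; the caller reviews the proposals before applying.
    """
    for abbreviation, details in entries:
        if abbreviation not in APPLY_TO:
            continue  # e.g. PDura, PNess, J: trailing numbers are not reliable
        if ";" in details:
            continue  # multi-part references are handled separately
        match = SERIES.match(details)
        if match:
            yield abbreviation, details, match.group("details"), match.group("lines")

sample = [("SEG", "XLI 1530, 8, 75"), ("PDura", "12, 3, 4"), ("IG", "XI (4) 772")]
for proposal in dry_run(sample):
    print(proposal)
```

Keeping the allow-list explicit means that a pattern that works for the epigraphic corpora is never applied by accident to abbreviations where trailing numbers mean something else.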

Collaborator

michaelzellmann commented Apr 29, 2020 via email

Member Author

tuurma commented Apr 29, 2020

Thanks for the super-fast response. I will run it this evening then (after 6pm in Oxford and after triggering a backup, as usual).

Member Author

tuurma commented Apr 29, 2020

I've just run the conversion for "IGLS", "SEG", "CIIP", "IG", "TEAD", "ISyrie", "IMnBeyrouth", "AAES"; the logs are attached.

Current numbers for entries with the line field filled:

  • IGLS 3660
  • SEG 3011
  • IG 696
  • TEAD 573
  • IMnBeyrouth 339
  • CIIP 266
  • IGerasa 260
  • IWadi_Haggag 188
  • ITyr 171
  • Nessana 149
  • AAES 142
  • ISyrie 106
  • IMSoueida 106
  • IPalTertia_west 103
  • SEMA 61
  • PEQ 49
  • IIasos 46
  • DainIGLouvre 44
  • IDR 38
  • INegev 37
  • Dussaud_Macler_Mission 34
  • MUSJ 34
  • YCS 32
  • Mouterde_Limes 31
  • MFO 31
  • BCH 27
  • KayserRecueil 24
  • PferdehirtMilitärdiplome 23
  • ISmyrna 22
  • RMD 21
  • FroehnerInscrLouvre 21
  • CIJ 18
  • IAquil 15
  • IPalTertia 14
  • MendelCat 13
  • IAntMaroc 13
  • Mittmann_Beiträge 12
  • JIWE 12
  • PMasada 11
  • Clermont_Ganneau_RAO 11
  • ZDPV 10
  • SBF 10
  • CrowfootObjectsfromSamaria 9
  • Brünnow_Domaszewski_PA 8
  • DOP 8
  • IGR 7
  • RA 7
  • Ovadiah_MPI 6
  • Resafa 5
  • IEJ 5
  • IJO 5
  • ILS 4
  • ChLA 4
  • BAAL 2
  • Lörincz 1
  • Topoi 1
  • Old_Syriac_Inscriptions 1
  • Hajjar 1
  • Pall 1
  • Meimaris_Chronological_Systems 1

finalcommaseries.pdf

finalcommaseries-log.zip
