[book] indicate l. if only lines differ #284

tuurma · 2020-03-09T09:36:55Z

Same book, same text, line numbers indicated

IGLS XXI.5 29, 1 -> IGLS XXI.5 29, 1
IGLS XXI.5 29, 2 -> ib. l. 2

MT: ... detecting line numbers in the details field can be problematic. I would suggest creating a separate line field to hold them explicitly. Then I could try to automate the line number extraction on a per-abbreviation basis, similar to what we did when systematizing volume numbers. Would you have any suggestions for common patterns? The final number after the comma often seems to be the line number, but I've also seen entries like p. 39 no. 3, so I wonder if the number after no. is also a line? Then there are entries like A Pers. 29, 302, 972, where I don't suppose 972 is a line number?

RC: In the past, there were very strict rules governing the use of commas, essentially for distinguishing line numbers. That is no longer the case, so I am rather at a loss to suggest how to resolve this problem. I think it is probably true that the vast majority of commas relate to line numbers, but there are also strings of numbers referring to chapters separated by commas. It would have been better in retrospect if semi-colons had been used instead. Is there any way of generating a list which would not totally overwhelm us with irrelevant entries? In the two examples you cite, what follows the comma in each case is in fact a line number. But you are likely to find others which are not (e.g. J., +BJ or J., +AJ).

tuurma · 2020-03-09T12:50:49Z

Ordering the abbreviations by number of references there are:

9 abbreviations with > 1000 references (IGLS, SEG, PDura, CIIP, ChLA, RE, IG, Meimaris_Chronological_Systems)
84 > 100
128 > 50
235 > 20

I'd suggest to concentrate on the most common abbreviations to figure out what the predominant patterns are.
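A minimal sketch of how such a threshold summary could be produced, assuming the abbreviation of every reference has already been pulled out into a flat list. The function name and the toy data are hypothetical and are not taken from the LGPN code base.

```python
from collections import Counter

def threshold_summary(abbreviations, thresholds=(1000, 100, 50, 20)):
    """Count how many distinct abbreviations have more than N references."""
    per_abbrev = Counter(abbreviations)
    return {t: sum(1 for n in per_abbrev.values() if n > t) for t in thresholds}

# Toy data only: one list item per reference.
refs = ["IGLS"] * 5 + ["SEG"] * 3 + ["IG"]
print(threshold_summary(refs, thresholds=(4, 2, 0)))  # {4: 1, 2: 2, 0: 3}
```

The counts are cumulative, as in the figures above: an abbreviation with more than 1000 references is also counted under > 100.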

Initial results for IGLS show that the majority of entries match the , (\d)+$ pattern (ending with , number): a bit below 3k cases out of ~9k total IGLS references could be converted automatically.

all IGLS: ending with , number ~4k total, single comma ~3k
all SEG: ending with , number ~6.3k total, single comma ~2.9k
all PDura: ending with , number ~2k total, with comma only about 300 but much more variation; may require some manual checks first
all CIIP: ending with , number ~2k total, not many with comma; check the dot in entries like CIIP I (2) 842.15 Αβιδελλα
ChLA: very few with commas
IG: with commas, the majority are simple to convert (650 with the single-comma pattern); some with dots, some with no.
all Meimaris: the majority have no., e.g. Meimaris, +Chronological +Systems p. 189 no. 103 Ιδδος
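The distinction between "ending with , number" and "single comma" can be sketched with two regular expressions. This is only an illustration in Python, assuming the reference is available as a plain string without the trailing name; the names SINGLE_COMMA, ENDS_WITH_NUMBER and split_single_comma are hypothetical and do not reflect the actual conversion script.

```python
import re

# The simple case: exactly one comma, followed by a space and a final number.
SINGLE_COMMA = re.compile(r"^(?P<details>[^,]+), (?P<line>\d+)$")
# The broader case: ends with ", <number>", whatever commas come before.
ENDS_WITH_NUMBER = re.compile(r", \d+$")

def split_single_comma(ref):
    """Return (details, line) for the simple case, otherwise None."""
    m = SINGLE_COMMA.match(ref)
    if m is None:
        return None
    return m.group("details"), m.group("line")

print(split_single_comma("IGLS XXI.5 29, 2"))               # ('IGLS XXI.5 29', '2')
print(split_single_comma("SEG XLI 1530, 8, 75"))            # None
print(bool(ENDS_WITH_NUMBER.search("SEG XLI 1530, 8, 75")))  # True
```

Entries such as p. 39 no. 3 match neither expression, so they would be left for a separate rule or for manual review.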

tuurma · 2020-04-21T15:04:11Z

As a preparatory step I extended our XML template to store the line number explicitly:

declare namespace tei="http://www.tei-c.org/ns/1.0";

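(: add an empty line-number note to every non-volume bibl that does not have one yet :)
for $bibl in collection('/db/apps/lgpn-data/data/persons')//tei:bibl[not(@type='volume')][not(tei:note[@type='line'])]
let $add := <note xmlns="http://www.tei-c.org/ns/1.0" type="line"/>
return
    (: the new note is placed directly after the bibl's ref element :)
    update insert $add following $bibl/tei:ref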

I also adjusted the input form accordingly. Please note that the Linking field has been moved up and is now placed in the same row as Line.

tuurma · 2020-04-22T11:13:19Z

@michaelzellmann I have prepared a conversion list, in the first instance tackling just the most popular abbreviations and the simple cases that end with the , number pattern. Could you have a glance at the conversion suggestions below and let me know if they look reasonable?

IGLS

SEG

IG

michaelzellmann · 2020-04-22T11:22:28Z

Many thanks, Magdalena, the three lists look ok to me. Should I be able to see anything by clicking on the links at right? Right now I see only this error: [image]

tuurma · 2020-04-22T12:29:02Z

Thanks, I've fixed the link so it leads to the person input form.

I will run the conversion now for IGLS, SEG and IG and attach the logs here.

singlecomma-log.zip

tuurma · 2020-04-22T12:58:19Z

After running the conversion, other cases remain that contain a comma but do not match the pattern of a final comma and number:

SEG

1. SEG XLVIII 1868, [1] Μαρώνις (comma and [number])
2. SEG XLI 1530, 8, 75 Ζώη (multiple commas)
3. SEG LV 1053 A, 9; B, 15 Οὐεττινιανός
4. SEG XLIII 1026B, D Μαρῖνος

Could you please confirm whether the following handling is appropriate:

1. treat number in [] as a line number -> l. [1]
2. treat final comma-separated numbers as line number -> l. 8, 75
3. split into two bibl. entries? LV 1053 A l. 9 and LV 1053 B l. 15
4. leave as is, I suspect B and D are not line numbers?

tuurma · 2020-04-22T13:03:59Z

IGLS

IGLS II 466, [2] -> same as SEG case 1
IGLS XVI (1) 289, 1, 3 -> same as SEG case 2
IGLS XVII (1) 477 a, 1; b, 2 -> same as SEG case 3
IGLS III (2) 1183, 3, 21, 31 -> multiple line numbers, variant of case 2
IGLS XVII (1) 536 a, 1; b, 1; c, 2 -> multiple entries, variant of case 3

tuurma · 2020-04-22T13:06:03Z

IG: very few remaining cases, like IG XI (4) 772, 3, 15 (same as SEG case 2); the rest could be handled manually.

michaelzellmann · 2020-04-22T13:08:33Z

Please see below for answers between lines.

1. treat number in [] as a line number -> l. [1]: Correct
2. treat final comma-separated numbers as line number -> l. 8, 75: Correct
3. split into two bibl. entries? LV 1053 A l. 9 and LV 1053 B l. 15: Correct
4. leave as is, I suspect B and D are not line numbers? Correct, B and D are part of the “details” and not the line number
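The four confirmed cases can be sketched roughly as follows. This is an illustrative Python version under stated assumptions, not the conversion that was actually run: the reference is taken as a plain string without the trailing name, and the helper names extract_lines and convert are hypothetical.

```python
import re

# A line token is a plain number or a number in square brackets: 8 or [1].
LINE = r"(?:\d+|\[\d+\])"
# One or more line tokens at the very end, each preceded by ", ".
FINAL_SERIES = re.compile(rf"^(?P<details>.*?)(?P<lines>(?:, {LINE})+)$")

def extract_lines(ref):
    """Cases 1 and 2: return (details, lines), or None if nothing matches."""
    m = FINAL_SERIES.match(ref)
    if m is None:
        return None
    return m.group("details"), m.group("lines")[2:]  # drop the leading ", "

def convert(ref):
    """Return a list of (details, line) pairs; line is None when left as is."""
    unchanged = [(ref, None)]
    if ";" in ref:
        # Case 3: "LV 1053 A, 9; B, 15" becomes one entry per part.
        first, *rest = [part.strip() for part in ref.split(";")]
        head = extract_lines(first)
        if head is None or " " not in head[0]:
            return unchanged
        base = head[0].rsplit(" ", 1)[0]  # details without the part letter
        entries = [head]
        for part in rest:
            parsed = extract_lines(part)
            if parsed is None:
                return unchanged
            entries.append((f"{base} {parsed[0]}", parsed[1]))
        return entries
    # Case 4 (e.g. "1026B, D") matches nothing and is left untouched.
    parsed = extract_lines(ref)
    return [parsed] if parsed else unchanged

print(convert("SEG LV 1053 A, 9; B, 15"))
# [('SEG LV 1053 A', '9'), ('SEG LV 1053 B', '15')]
```

A reference such as A Pers. 29, 302, 972 would also match the final-series pattern, which is why the pattern should only be applied to abbreviations that have been checked.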

tuurma · 2020-04-24T10:56:22Z

As we're slowly converting database entries, I'm now working on the LaTeX generating scripts.

Here's a test case for Γέμελλα, in Heliopolis we should have

(2) IGLS vi 2751, 3
(3) ib. l.4

Original bibl. entry for (3) is IGLS vi 2751, 4

michaelzellmann · 2020-04-24T10:59:01Z

Correct, thanks. I am still working through your list of the Yes / Maybe / No entries.

tuurma · 2020-04-24T11:42:08Z

Yes, I saw you were working in the Google doc, many thanks!

Meanwhile I have made some progress with presenting ib. with lines, but I need to test that there are no regressions in other cases.

michaelzellmann · 2020-04-24T11:53:13Z

Might be worth checking with Richard, but I believe there should be a space after l., i.e. here “ib. l. 4”

tuurma · 2020-04-24T15:09:23Z

Thanks, fixed
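The output rule from the Γέμελλα test case, with the space after l., can be sketched as below. This is a simplified Python illustration rather than the LaTeX generating script itself. The function name is hypothetical, and repeating the full reference when both reference and line are identical is an assumption that the thread does not settle.

```python
def format_refs(refs):
    """refs: (reference, line) pairs in print order; line may be None."""
    out = []
    prev = None
    for ref, line in refs:
        same_text = prev is not None and ref == prev[0]
        if same_text and line and line != prev[1]:
            # Same book and same text, only the line differs.
            out.append(f"ib. l. {line}")
        else:
            out.append(f"{ref}, {line}" if line else ref)
        prev = (ref, line)
    return out

print(format_refs([("IGLS vi 2751", "3"), ("IGLS vi 2751", "4")]))
# ['IGLS vi 2751, 3', 'ib. l. 4']
```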

tuurma · 2020-04-28T17:25:24Z

Thanks to Michael's list I could convert further entries matching the final comma-number pattern for the following abbreviations (log file attached):

 "IPalTertia", "ISyrie", "AAES", "ITyr", "IGerasa", "MUSJ", "ZDPV", "IWadi_Haggag", "YCS", "Nessana", "IJO", "Hajjar", "IPalTertia_west", "Dussaud_Macler_Mission", "IMSoueida", "SEMA", "INegev", "Lörincz", "PEQ", "DainIGLouvre", "MFO",  "Mouterde_Limes", "BCH", "ILS", "IIasos", "CIJ", "IDR", "Ovadiah_MPI", "Resafa", "FroehnerInscrLouvre", "SBF", "PMasada", "Topoi", "PferdehirtMilitärdiplome", "IGR", "KayserRecueil", "Mittmann_Beiträge", "ISmyrna", "RMD", "Clermont_Ganneau_RAO", "DOP", "IAntMaroc", "BAAL", "IAquil", "RA", "JIWE", "Pall", "Brünnow_Domaszewski_PA", "IEJ", "MendelCat", "CrowfootObjectsfromSamaria", "Old_Syriac_Inscriptions"

singlecomma-Michaelslist-log.html.zip

Here are the current counts of entries with the line field filled, per abbreviation:

IGLS 3599
SEG 2952
IG 661
CIIP 265
IGerasa 260
PDura 232
IWadi_Haggag 188
ITyr 170
Nessana 149
AAES 141
IMSoueida 106
ISyrie 104
IPalTertia_west 103
SEMA 61
PEQ 49
IIasos 46
DainIGLouvre 44
IDR 38
INegev 37
Dussaud_Macler_Mission 34
MUSJ 34
YCS 32
Mouterde_Limes 31
MFO 30
BCH 27
KayserRecueil 24
PferdehirtMilitärdiplome 23
ISmyrna 22
RMD 21
FroehnerInscrLouvre 21
CIJ 18
IAquil 15
IPalTertia 14
MendelCat 13
IAntMaroc 13
Mittmann_Beiträge 12
JIWE 12
PMasada 11
Clermont_Ganneau_RAO 11
ZDPV 10
SBF 10
CrowfootObjectsfromSamaria 9
Brünnow_Domaszewski_PA 8
DOP 8
IGR 7
RA 7
Ovadiah_MPI 6
Resafa 5
IEJ 5
IJO 5
ILS 4
ChLA 4
BAAL 2
Lörincz 1
Topoi 1
Old_Syriac_Inscriptions 1
Hajjar 1
Pall 1
Meimaris_Chronological_Systems 1

tuurma · 2020-04-29T11:19:20Z

After converting the single comma-number pattern matches for selected abbreviations yesterday, today I've prepared the conversion for patterns where there are multiple comma-separated numbers at the end and/or some numbers are in brackets (cases 1 and 2 as discussed above).

I've run the would-be conversion (generating new values but without applying) for a handful of most common abbreviations
biblLines.pdf

Looking at these results, I'd suggest to

1. go ahead and apply this pattern for "IGLS", "SEG", "CIIP", "IG", "TEAD", "ISyrie", "IMnBeyrouth", "AAES"
2. but refrain from doing so for "PDura", "PNess", "J"

There are no matches for other most common abbreviations: "ChLA", "RE", "Meimaris_Chronological_Systems", "FRA", "SchiefferACOIndexProsopogr", "DCB", "IPalTertia", "PLRE", "Justi", "IMoab", "PIR2"

michaelzellmann · 2020-04-29T11:24:08Z

Thanks, this looks ok for 1. Definitely not “J” in 2., as that is a literary text and has no line numbers. PDura and PNess will be mostly long strings with many line numbers separated by commas, which can be done manually if not automated.
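A would-be conversion restricted to the agreed abbreviations might look roughly like this. It is a hedged Python sketch: the abbreviation set comes from the suggestion above, but the function name, the (abbreviation, details) input shape and the sample entry for J are illustrative assumptions, not real data or the real script.

```python
import re

# Final series of line tokens (plain or bracketed numbers) after the details.
FINAL_SERIES = re.compile(
    r"^(?P<details>.*?), (?P<lines>(?:\d+|\[\d+\])(?:, (?:\d+|\[\d+\]))*)$"
)

# Abbreviations agreed for automatic conversion; PDura, PNess and J are excluded.
APPLY_TO = {"IGLS", "SEG", "CIIP", "IG", "TEAD", "ISyrie", "IMnBeyrouth", "AAES"}

def dry_run(entries, apply_to=APPLY_TO):
    """Propose (abbrev, old details, new details, line) without changing data."""
    proposals = []
    for abbrev, details in entries:
        if abbrev not in apply_to:
            continue  # left for manual handling
        m = FINAL_SERIES.match(details)
        if m:
            proposals.append((abbrev, details, m.group("details"), m.group("lines")))
    return proposals

sample = [
    ("IGLS", "XVI (1) 289, 1, 3"),
    ("J", "BJ 2, 5"),            # made-up literary reference, must be skipped
    ("SEG", "XLIII 1026B, D"),   # B and D are details, no match
]
for row in dry_run(sample):
    print(row)
# ('IGLS', 'XVI (1) 289, 1, 3', 'XVI (1) 289', '1, 3')
```

Keeping the proposal step separate from the update step allows the list to be reviewed before anything is written to the database.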

tuurma · 2020-04-29T11:26:30Z

Thanks for the super-fast response, I will run it in the evening then (after 6pm in Oxford and after triggering a backup, as usual).

tuurma · 2020-04-29T17:18:05Z

I've just run the conversion for "IGLS", "SEG", "CIIP", "IG", "TEAD", "ISyrie", "IMnBeyrouth", "AAES"; logs are attached.

Current numbers for entries with line field filled

IGLS 3660
SEG 3011
IG 696
TEAD 573
IMnBeyrouth 339
CIIP 266
IGerasa 260
IWadi_Haggag 188
ITyr 171
Nessana 149
AAES 142
ISyrie 106
IMSoueida 106
IPalTertia_west 103
SEMA 61
PEQ 49
IIasos 46
DainIGLouvre 44
IDR 38
INegev 37
Dussaud_Macler_Mission 34
MUSJ 34
YCS 32
Mouterde_Limes 31
MFO 31
BCH 27
KayserRecueil 24
PferdehirtMilitärdiplome 23
ISmyrna 22
RMD 21
FroehnerInscrLouvre 21
CIJ 18
IAquil 15
IPalTertia 14
MendelCat 13
IAntMaroc 13
Mittmann_Beiträge 12
JIWE 12
PMasada 11
Clermont_Ganneau_RAO 11
ZDPV 10
SBF 10
CrowfootObjectsfromSamaria 9
Brünnow_Domaszewski_PA 8
DOP 8
IGR 7
RA 7
Ovadiah_MPI 6
Resafa 5
IEJ 5
IJO 5
ILS 4
ChLA 4
BAAL 2
Lörincz 1
Topoi 1
Old_Syriac_Inscriptions 1
Hajjar 1
Pall 1
Meimaris_Chronological_Systems 1

finalcommaseries.pdf

finalcommaseries-log.zip
