-
-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SF-3184 Add USJ support to the draft API #2988
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2988 +/- ##
==========================================
+ Coverage 82.04% 82.23% +0.18%
==========================================
Files 545 553 +8
Lines 31705 32061 +356
Branches 5149 5193 +44
==========================================
+ Hits 26013 26365 +352
- Misses 4925 4926 +1
- Partials 767 770 +3 ☔ View full report in Codecov by Sentry. |
69da47c
to
b31f804
Compare
I've processed about half the files so far. |
b31f804
to
b2a5f25
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for all of your work on this! I'm sorry that I have so many comments.
I have reviewed the Usj-Usx conversion code, but not yet the code that teaches SF to use it. But I am posting my comments so far.
I'm trying some icons in this PR, with these meanings:
✏️ I would like a change or am concerned that something needs to change, before I'm ready to approve.
❓ I want to hear more about or discuss this before I'm ready to approve.
🎨 I prefer a change, but it is not holding up my approval.
💬 I have something to say, something to ask, am thinking out loud, or noticed something that I think should be changed that is not part of your PR's scope. But it is not holding up my approval.
Reviewed 16 of 23 files at r1, 1 of 1 files at r2.
Reviewable status: 16 of 23 files reviewed, 28 unresolved discussions (waiting on @pmachapman)
src/SIL.Converters.Usj/UsjMarker.cs
line 21 at r2 (raw file):
✏️ I had at first thought you meant that "start ID" was not included in the USJ spec. I think it would be helpful to to re-write this as something like
Milestone end ID, which matches a start ID.
Note that end ID is not currently included in the USJ spec.
src/SIL.Converters.Usj/UsjToUsx.cs
line 107 at r3 (raw file):
❓
|| (parentElement.Name == "para" && isLastItem))
Now, is this so that we can handle USX that doesn't use verse EIDs? If we are in a paragraph, a verse had begun, and we are finishing the paragraph and no verse EID was found, then we assume we are at the end of the verse and create our own verse EID. Is that right?
src/SIL.Converters.Usj/IUsj.cs
line 3 at r1 (raw file):
using System.Collections; namespace SIL.Converters.Usj
✏️ One of the things that could have been improved in the earlier stages of Scripture Forge was that we didn't seem to build in very comprehensive tests of roundtripping Scripture data. Later on, DeltaUsxMapperTests.cs was modified to do a number of roundtrip tests on small pieces of data, as well as successful demonstrations of roundtripping whole books of the Bible.
I'm glad that the Usj-Usx converter that is proposed here has some tests to show that it converts between USX and Usj for a small example of GEN 1. But I would feel more comfortable seeing this library (1) correctly handle a variety of permutations of data that it will encounter, and (2) correctly process and roundtrip real and larger data.
Some examples I'll point to in DeltaUsxMapperTests.cs are
ToDelta_TableInMiddle()
This method does its Usx to Delta test, and then converts the result back to USX and checks that the USX matches what it started with.Roundtrip_NestedCharsInTable()
This method specifies some USFM, which a helper method then converts to a USX XDocument, converts that to ChapterDeltas, converts that back to a USX XDocument, and checks that the roundtripped USX matches the earlier USX.RoundTrip_Asv()
This method checks roundtripping on the two USFM files in a zip file, thus showing successful roundtripping of two books of the Bible.
What do you think about including tests like RoundTrip_Asv()
and RoundTrip_Hebrew()
, and showing successful roundtripping of USX to USJ to USX of several books using real data?
Further, I think it would help to have some tests like Roundtrip_NestedCharsInTable()
that specify various permutations of input that the library may encounter, and show correct (or at least expected) handling. Some of my ideas of those permutations are in other comments in this review.
I suspect some of the USX inputs we might want to be prepared for are not ones that we can get from ParatextData by feeding it USFM. And so some (or all) such tests may benefit from specifying USX to start with rather than USFM, as was often done in DeltaUsxMapperTests.cs.
src/SIL.Converters.Usj/Usj.cs
line 12 at r2 (raw file):
/// The supported USJ spec type. /// </summary> public const string UsjType = "USJ";
✏️ : (Regarding the 2 const
keywords in this file.) My understanding has been that in C#, when you declare something as const
, the constant value sort of seeps into your program and is hard to change, and so it should only be used for things that are truly constant. For example, a mathematical constant. I learned that readonly
is preferable for specifying a value that will not be changed in the running program, because you can change the value in the source code, recompile, and expect the program and its various liked DLLs to pick up on the change. Whereas for const
it seemed like this wouldn't be so easy.
I may be wrong in this understanding. And I could do some googling to try to support or disprove this.
Are you aware of a potential pitfall in using const
for values that we still want to be able to change each time we compile?
src/SIL.Converters.Usj/UsjBase.cs
line 16 at r2 (raw file):
/// For <see cref="Usj"/>, this is the USJ spec type. /// For <see cref="UsjMarker"/>, this is the kind/category of node or element this is, /// corresponding the USFM marker and USX node.
💬 (Argh, using the same field for two different meanings! :)
src/SIL.Converters.Usj/UsjBase.cs
line 30 at r2 (raw file):
/// <summary> /// Additional attributes that are not a part of the USJ specification.
💬 Hmm. Do you think the additional attributes should be part of the USJ specification? What kinds of things might we be expecting to bump into that fall into this category?
I think that our teams are at the beginning of making and using a new data specification that will be used for many years, and it will be great for it to begin with as many good design decisions as possible at the beginning :)
src/SIL.Converters.Usj/UsjMarker.cs
line 18 at r2 (raw file):
/// </summary> /// <remarks>Nullable.</remarks> public string Sid { get; set; }
✏️ After spending a lot of time reviewing this PR, I think of this as Start ID. But when just looking at this model file I wasn't sure if this was "Scripture ID" or "Milestone Start ID". It might help the next guy to stick in "start ID" somewhere in the comment. Maybe like
/// Start ID indicates the Book-chapter-verse value in the paragraph based structure.
src/SIL.Converters.Usj/UsjMarker.cs
line 45 at r2 (raw file):
/// <summary> /// Published character of chapter or verse.
✏️ By "character" do you sort of mean "status"? And yet it can be represented by a character, such as "I". I guess I don't have to entirely understand what this means. But tossing in an example or two will probably help. Would it be accurate to describe the field with a comment like the following?
/// The publishing status of the chapter or verse. For example, "I" or "II".
src/SIL.Converters.Usj/UsjToUsx.cs
line 105 at r1 (raw file):
throw new ArgumentOutOfRangeException(nameof(markerContent)); }
💬 Here begins a sprinkling of comments regarding SID and EID. You might be able to clear me up just by responding to some of these questions. But I suspect it will help to talk on the phone.
test/SIL.Converters.Usj.Tests/TestData.cs
line 52 at r2 (raw file):
{ type: "verse", marker: "v", number: "15", altnumber: "3", sid: "GEN 1:15" }, "Tell the Israelites that I, the ", { type: "char", marker: "nd", content: ["Lord"] },
💬 I'd like to see some sample data that also has the equivalent of \nd Lord\nd* \nd God\nd*
. We had a bug in how we processed that before.
test/SIL.Converters.Usj.Tests/TestData.cs
line 110 at r2 (raw file):
/// - Reorder attributes to match UsxUsfmParserSink output. /// </remarks> public const string UsxGen1V1 = $"""
✏️ (I don't think this, or other properties in this class, should use const
, but pending the outcome of a discussion in another comment.)
test/SIL.Converters.Usj.Tests/TestData.cs
line 114 at r2 (raw file):
<book code="GEN" style="id">Some Scripture Version</book> <chapter number="1" style="c" pubnumber="I" sid="GEN 1" /> <para style="p">
✏️ It looks like this line, through the lines before <chapter eid="GEN 1" />
were mistakenly indented by 2 spaces?
test/SIL.Converters.Usj.Tests/TestData.cs
line 189 at r2 (raw file):
<chapter number="1" style="c" sid="GEN 1" /> <para style="p"> <verse number="1" style="v" sid="GEN 1:1" />the <char style="nd" closed="false">first verse <verse eid="GEN 1:1" />
❓ Can you explain closed="false"
? I found some examples but they had </char>
tags after the inner text, going against my assumption of what closed="false"
meant.
test/SIL.Converters.Usj.Tests/TestData.cs
line 289 at r2 (raw file):
<book code="GEN" style="id">Some Scripture Version</book> <chapter number="1" style="c" pubnumber="I" sid="GEN 1" /> <para style="p" vid="GEN 1:1-3" status="not standard">
✏️ (Indented too much?)
test/SIL.Converters.Usj.Tests/TestData.cs
line 307 at r2 (raw file):
<book code="GEN" style="id">Some Scripture Version</book> <chapter number="1" style="c" sid="GEN 1" /> <table>
✏️ (Indented too much?)
src/SIL.Converters.Usj/UsjToUsx.cs
line 61 at r3 (raw file):
// Add the non-standard attributes foreach (KeyValuePair<string, object> nonStandardAttribute in markerContent.AdditionalData)
❓ I'm interested to see that the type of Value is object
, and we only allow in (in a few lines below) Values that are string
. Is it the case that we expect values to always be strings, but we allow them to be object
just so we can process any incoming USJ and omit the parts that we have no idea what to do with, without choking, and at least do something sensible with converting the data?
src/SIL.Converters.Usj/UsjToUsx.cs
line 64 at r3 (raw file):
{ string key = nonStandardAttribute.Key; if (nonStandardAttribute.Value is string value && key != "type" && key != "marker" && key != "content")
✏️ When reading this, I think it would benefit from a comment explaining why we aren't adding the key and value if it's specifically one of these three things. Are they already added or accounted for somewhere else? Are they not useful information and so we discard them? Are they redundant?
src/SIL.Converters.Usj/UsjToUsx.cs
line 106 at r3 (raw file):
} // Create chapter and verse end elements from SID attributes.
❓ Actually, I'm confused about why we are doing this. When type is "verse", a dozen lines above we XmlElement element = usxDoc.CreateElement(type);
Why do we also specifically here create a verse element and get it added into the document?
src/SIL.Converters.Usj/UsjToUsx.cs
line 107 at r3 (raw file):
// Create chapter and verse end elements from SID attributes. if (_verseEid != null && (type == "verse" || (parentElement.Name == "para" && isLastItem)))
💬 One thing that occurs to me here, is that if we have a document like
<paragraph>
<verse 1 />In the beginning.
<verse 2 /></paragraph>
, then when we process <verse 1 />
, we will be setting _verseEid
. Then when we do this if
line, we will be making a verse end element for verse 1. Down below, it looks like we may AppendChild
the start verse 2 element before AppendChild
the end verse 1 element. The verse id processing is a bit complex here. A bunch of permutations of test data will prove its correctness.
src/SIL.Converters.Usj/UsjToUsx.cs
line 110 at r3 (raw file):
{ eidElement = CreateVerseEndElement(usxDoc); _verseEid = null;
✏️ Oh, this should probably be called _pendingVerseEid
, shouldn't it. Or some name that communicates that we are in and processing a particular verse. Okay, probably better would be _currentVerseId
, then.
src/SIL.Converters.Usj/UsjToUsx.cs
line 113 at r3 (raw file):
} if (type == "verse" && usjMarker?.Sid != null)
❓ I understand that not all USX files have sid
attributes on verse markers. For example, <verse number="1" style="v" />
. If this sort of USX is processed, won't we here fail to pick up on the verse start, at least for the sid and eid processing code here? If so, is that a problem? We will benefit from more permutations of test data.
src/SIL.Converters.Usj/UsjToUsx.cs
line 160 at r3 (raw file):
} private void UsjToUsxDom(IUsj usj, XmlDocument usxDoc)
💬 Hmm. Because of the whitespaces nuances of getting USX into an XDocument object correctly, I might have thought the Usj-Usx conversion library would allow a client program to feed it Usj, and that the conversion library would then create and provide to the client program, an XDocument (and possibly also alternatively an XmlDocument) that corresponds to the inputted Usj.
Returning an XML string
of the USX content to a client requires each client to correctly implement (and perhaps test the round-tripping of) processing the XML into a C# XML object.
Hmm, unless we return it not pretty-printed. Then the consumer could rely on the all the content of the returned XML being part of the document's actual content.
I've noticed with interest that your code hasn't seemed to wrestle as much with whitespaces and XML as I would have expected. And so I wonder if the code here just hasn't done the adequate wrestling yet, or if XmlDocument requires less wrangling than XDocument, or if I made oversights in parsing USX XML into an XDocument.
Ahh, ya know, there are a couple of places in UsxToUsj.cs that look for and respond to whitespace. I wonder if that is the secret sauce.
src/SIL.Converters.Usj/Usx.cs
line 12 at r2 (raw file):
/// The USX spec type. /// </summary> public const string UsxType = "usx";
✏️ (The two const
keywords here are relevant to a discussion in another comment.)
src/SIL.Converters.Usj/UsxToUsj.cs
line 115 at r1 (raw file):
outObj.Content.Add(child.NextSibling.Value); } }
❓ Hey, if right here we omit processing and adding of certain whitespace nodes, that might be good to capture in a comment. Like
// Skip any whitespace nodes with multiple spaces, newlines, or tabs.
(If that is happening here?)
src/SIL.Converters.Usj/UsxToUsj.cs
line 22 at r2 (raw file):
{ string type = usxElement.Name; string text = null;
✏️ Oh, this is a good time to specify that this project is nullable-able. Is there a reason you didn't go in that direction?
src/SIL.Converters.Usj/UsxToUsj.cs
line 36 at r2 (raw file):
.ToDictionary(attrib => attrib.Name, attrib => attrib.Value); // If style is present, make that the market
✏️ Looks like "market" may be a typo for "marker"?
src/SIL.Converters.Usj/UsxToUsj.cs
line 81 at r2 (raw file):
if ( usxElement.FirstChild?.NodeType == XmlNodeType.Text && !string.IsNullOrWhiteSpace(usxElement.FirstChild.Value)
💬 Thinking about XML and whitespace makes me glad to use USJ :).
src/SIL.Converters.Usj/UsxToUsj.cs
line 110 at r2 (raw file):
if ( child.NextSibling?.NodeType == XmlNodeType.Text || (child.NextSibling?.NodeType == XmlNodeType.Whitespace && child.NextSibling.Value == " ")
❓ What is the significance of finding a node with a single space? Is it a particular kind of thing that we will routinely expect? Is it an edge case? Would we just as likely expect to find multiple spaces (but without a newline character)?
src/SIL.Converters.Usj/UsxToUsj.cs
line 125 at r2 (raw file):
{ // Ignore append = false;
✏️ Now that I've read more of the PR, I think we are setting append = false
here because we don't write the verse EID element to USJ. I had been confused about what was being "// Ignore"d before. It might be helpful to be a bit more explanatory here with something like
// Omit verse EID marker from output
(If that is why we are setting append = false
?)
src/SIL.Converters.Usj/UsxToUsj.cs
line 136 at r2 (raw file):
if (usxDom == null) { outputJson = new Usj { Content = new ArrayList() };
💬 I was surprised to see this set to an empty and non-null ArrayList, given the documentation comments about Content being nullable, and sometimes setting Content = null
. Perhaps the behaviour is that when we convert USX to USJ, our root object will always have a non-null Content field, but the child objects under the root might have a null Content field.
src/SIL.Converters.Usj/UsxToUsj.cs
line 164 at r2 (raw file):
PreserveWhitespace = true, // Whitespace inside nodes is important }; xmlDocument.LoadXml(usx);
💬 We'll see if there are further nuances that need to be accounted for. For example, various pre-processing seemed to be needed when creating an XDocument from USX in ParatextSyncRunner.UsxToXDocument()
.
src/SIL.Converters.Usj/UsxToUsj.cs
line 174 at r2 (raw file):
/// <returns>The USJ.</returns> /// <remarks> /// The <see cref="XmlDocument"/> should have <see cref="XmlDocument.PreserveWhitespace"/> set to <c>true</c>.
💬 We could throw here if xmlDocument.PreserveWhitespace==false
. But I'm thinking it makes sense to let the client code format their data as they want and have the library here return json based on what the library was given.
src/SIL.Converters.Usj/UsxToUsj.cs
line 176 at r2 (raw file):
/// The <see cref="XmlDocument"/> should have <see cref="XmlDocument.PreserveWhitespace"/> set to <c>true</c>. /// </remarks> public static Usj UsxXmlDocumentToUsj(XmlDocument xmlDocument) => UsxDomToUsj(xmlDocument?.DocumentElement);
❓ I have been hearing that XDocument is recommended over XmlDocument. I failed to quickly find a statement on MSDN about that, so I may be mistaken about what is recommended. Is there a reason you chose to use XmlDocument over XDocument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Okay, I have looked over all the files now.)
Reviewed 6 of 23 files at r1, 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, 28 unresolved discussions (waiting on @pmachapman)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dismissed @Github-advanced-security[bot] from 2 discussions.
Reviewable status: all files reviewed, 26 unresolved discussions (waiting on @marksvc)
src/SIL.Converters.Usj/IUsj.cs
line 3 at r1 (raw file):
Previously, marksvc wrote…
✏️ One of the things that could have been improved in the earlier stages of Scripture Forge was that we didn't seem to build in very comprehensive tests of roundtripping Scripture data. Later on, DeltaUsxMapperTests.cs was modified to do a number of roundtrip tests on small pieces of data, as well as successful demonstrations of roundtripping whole books of the Bible.
I'm glad that the Usj-Usx converter that is proposed here has some tests to show that it converts between USX and Usj for a small example of GEN 1. But I would feel more comfortable seeing this library (1) correctly handle a variety of permutations of data that it will encounter, and (2) correctly process and roundtrip real and larger data.
Some examples I'll point to in DeltaUsxMapperTests.cs are
ToDelta_TableInMiddle()
This method does its Usx to Delta test, and then converts the result back to USX and checks that the USX matches what it started with.Roundtrip_NestedCharsInTable()
This method specifies some USFM, which a helper method then converts to a USX XDocument, converts that to ChapterDeltas, converts that back to a USX XDocument, and checks that the roundtripped USX matches the earlier USX.RoundTrip_Asv()
This method checks roundtripping on the two USFM files in a zip file, thus showing successful roundtripping of two books of the Bible.What do you think about including tests like
RoundTrip_Asv()
andRoundTrip_Hebrew()
, and showing successful roundtripping of USX to USJ to USX of several books using real data?Further, I think it would help to have some tests like
Roundtrip_NestedCharsInTable()
that specify various permutations of input that the library may encounter, and show correct (or at least expected) handling. Some of my ideas of those permutations are in other comments in this review.
I suspect some of the USX inputs we might want to be prepared for are not ones that we can get from ParatextData by feeding it USFM. And so some (or all) such tests may benefit from specifying USX to start with rather than USFM, as was often done in DeltaUsxMapperTests.cs.
I have added roundtrip tests for all of the test data From USJ -> USX -> USJ and from USX -> USJ -> USX.
Doing a full round trip is a good idea, but can we make that a separate PR?
I did a quick test with some USX from the DBL, but I found that end verse markers were sometimes missing. USX 3.1 requires these, but USX 2.0-3.0 is loose about it. I would need to modify a test dataset to be USX 3.1 "complete" (i.e. restore the missing verse markers).
src/SIL.Converters.Usj/Usj.cs
line 12 at r2 (raw file):
Previously, marksvc wrote…
✏️ : (Regarding the 2
const
keywords in this file.) My understanding has been that in C#, when you declare something asconst
, the constant value sort of seeps into your program and is hard to change, and so it should only be used for things that are truly constant. For example, a mathematical constant. I learned thatreadonly
is preferable for specifying a value that will not be changed in the running program, because you can change the value in the source code, recompile, and expect the program and its various liked DLLs to pick up on the change. Whereas forconst
it seemed like this wouldn't be so easy.I may be wrong in this understanding. And I could do some googling to try to support or disprove this.
Are you aware of a potential pitfall in using
const
for values that we still want to be able to change each time we compile?
Done. You are right - these should be static readonly
as if the underlying logic in the library changes, these values will reflect that.
A const
is set in the client libraries at compile time - we don't want libraries to recompile if the USX or USJ version changes.
src/SIL.Converters.Usj/UsjBase.cs
line 16 at r2 (raw file):
Previously, marksvc wrote…
💬 (Argh, using the same field for two different meanings! :)
I know!
src/SIL.Converters.Usj/UsjBase.cs
line 30 at r2 (raw file):
Previously, marksvc wrote…
💬 Hmm. Do you think the additional attributes should be part of the USJ specification? What kinds of things might we be expecting to bump into that fall into this category?
I think that our teams are at the beginning of making and using a new data specification that will be used for many years, and it will be great for it to begin with as many good design decisions as possible at the beginning :)
I talked with Ira about this. The original USJ spec was a "pure" spec - choosing purity over compatibility. His TypeScript implementation and my C# implementation favor compatibility with USX over purity - hence this grab-bag of any other attributes, and our codes attempt to preserve them when the original Python implementation stripped them.
A much more complete USJ spec is being created in a PR at https://github.com/BiblioNexus-Foundation/scripture-editors/pull/221/files#diff-c9217f9d310b415bbecbbe826bd6c725bd8f9885bb9ad6e95e7afe8fedd67a41
I think we should keep an eye on this updated spec, and adjust out model to match this spec when it is published.
src/SIL.Converters.Usj/UsjMarker.cs
line 18 at r2 (raw file):
Previously, marksvc wrote…
✏️ After spending a lot of time reviewing this PR, I think of this as Start ID. But when just looking at this model file I wasn't sure if this was "Scripture ID" or "Milestone Start ID". It might help the next guy to stick in "start ID" somewhere in the comment. Maybe like
/// Start ID indicates the Book-chapter-verse value in the paragraph based structure.
Done. Good idea.
src/SIL.Converters.Usj/UsjMarker.cs
line 21 at r2 (raw file):
Previously, marksvc wrote…
✏️ I had at first thought you meant that "start ID" was not included in the USJ spec. I think it would be helpful to to re-write this as something like
Milestone end ID, which matches a start ID.
Note that end ID is not currently included in the USJ spec.
Done. Let me know if it is clearer now.
src/SIL.Converters.Usj/UsjMarker.cs
line 45 at r2 (raw file):
Previously, marksvc wrote…
✏️ By "character" do you sort of mean "status"? And yet it can be represented by a character, such as "I". I guess I don't have to entirely understand what this means. But tossing in an example or two will probably help. Would it be accurate to describe the field with a comment like the following?
/// The publishing status of the chapter or verse. For example, "I" or "II".
Done. I have updated to comment to be a bit clearer on the value and use.
src/SIL.Converters.Usj/UsjToUsx.cs
line 61 at r3 (raw file):
Previously, marksvc wrote…
❓ I'm interested to see that the type of Value is
object
, and we only allow in (in a few lines below) Values that arestring
. Is it the case that we expect values to always be strings, but we allow them to beobject
just so we can process any incoming USJ and omit the parts that we have no idea what to do with, without choking, and at least do something sensible with converting the data?
In Json.Net, a property decorated with the [JsonExtensionData] attribute should technically be a Dictionary<string, JToken>
type. I made this a Dictionary<string, object>
to allow compatibility with System.Text.Json.Serialization, in case that is needed in future, and because I don't like embedding third party library specific types in classes that will form a public API.
If I change it to Dictionary<string, string>
, all tests fail because of Json.NET's restrictions on this attribute.
I test to see if it is a string because it could theoretically be another type of JSON object if the user specifies invalid USJ, which is totally not supported, and these will be discarded in this case.
src/SIL.Converters.Usj/UsjToUsx.cs
line 64 at r3 (raw file):
Previously, marksvc wrote…
✏️ When reading this, I think it would benefit from a comment explaining why we aren't adding the key and value if it's specifically one of these three things. Are they already added or accounted for somewhere else? Are they not useful information and so we discard them? Are they redundant?
Done. FYI, this logic was copied from https://github.com/BiblioNexus-Foundation/scripture-editors/blob/main/packages/utilities/src/converters/usj/usj-to-usx.ts#L30
src/SIL.Converters.Usj/UsjToUsx.cs
line 106 at r3 (raw file):
Previously, marksvc wrote…
❓ Actually, I'm confused about why we are doing this. When type is "verse", a dozen lines above we
XmlElement element = usxDoc.CreateElement(type);
Why do we also specifically here create a verse element and get it added into the document?
USX doesn't nest its XML as neatly. For example, one would think it would be:
<verse number="1" style="v" sid="GEN 1:1">
...contents here...
</verse>
Instead it is:
<verse number="1" style="v" sid="GEN 1:1" />
...contents here...
<verse eid="GEN 1:1" />
In the dozen lines above, you will notice after the verse is created, the contents of it are recursively added via ConvertUsjRecurse
. This logic here closes off the verse, so that either the next verse can be added, or the chapter closed off if we are at the end of that.
src/SIL.Converters.Usj/UsjToUsx.cs
line 107 at r3 (raw file):
Previously, marksvc wrote…
❓
|| (parentElement.Name == "para" && isLastItem))
Now, is this so that we can handle USX that doesn't use verse EIDs? If we are in a paragraph, a verse had begun, and we are finishing the paragraph and no verse EID was found, then we assume we are at the end of the verse and create our own verse EID. Is that right?
USJ does not usually include the eid, so yes, if we are at the last "para" element in a verse, we can safely say the verse is closed.
src/SIL.Converters.Usj/UsjToUsx.cs
line 107 at r3 (raw file):
Previously, marksvc wrote…
💬 One thing that occurs to me here, is that if we have a document like
<paragraph> <verse 1 />In the beginning. <verse 2 /></paragraph>, then when we process
<verse 1 />
, we will be setting_verseEid
. Then when we do thisif
line, we will be making a verse end element for verse 1. Down below, it looks like we mayAppendChild
the start verse 2 element beforeAppendChild
the end verse 1 element. The verse id processing is a bit complex here. A bunch of permutations of test data will prove its correctness.
You are right - this is a bug. I have added a tests for it (ShouldConvertFromUsxToUsjBlankVerse
and ShouldConvertFromUsxToUsjBlankChapter
), and have fixed the bug.
src/SIL.Converters.Usj/UsjToUsx.cs
line 110 at r3 (raw file):
Previously, marksvc wrote…
✏️ Oh, this should probably be called
_pendingVerseEid
, shouldn't it. Or some name that communicates that we are in and processing a particular verse. Okay, probably better would be_currentVerseId
, then.
Done. Excellent idea.
src/SIL.Converters.Usj/UsjToUsx.cs
line 113 at r3 (raw file):
Previously, marksvc wrote…
❓ I understand that not all USX files have
sid
attributes on verse markers. For example,<verse number="1" style="v" />
. If this sort of USX is processed, won't we here fail to pick up on the verse start, at least for the sid and eid processing code here? If so, is that a problem? We will benefit from more permutations of test data.
If the sid
's are missing, then the only real result is that no eid
's will be generated. This will keep a degree of compatibility between the USX and USJ. I've added a test to demonstrate: ShouldConvertFromUsjToUsxWithNoSids
.
Also, according to the spec they are required: https://ubsicap.github.io/usx/elements.html
src/SIL.Converters.Usj/UsjToUsx.cs
line 160 at r3 (raw file):
Previously, marksvc wrote…
💬 Hmm. Because of the whitespaces nuances of getting USX into an XDocument object correctly, I might have thought the Usj-Usx conversion library would allow a client program to feed it Usj, and that the conversion library would then create and provide to the client program, an XDocument (and possibly also alternatively an XmlDocument) that corresponds to the inputted Usj.
Returning an XML
string
of the USX content to a client requires each client to correctly implement (and perhaps test the round-tripping of) processing the XML into a C# XML object.Hmm, unless we return it not pretty-printed. Then the consumer could rely on the all the content of the returned XML being part of the document's actual content.
I've noticed with interest that your code hasn't seemed to wrestle as much with whitespaces and XML as I would have expected. And so I wonder if the code here just hasn't done the adequate wrestling yet, or if XmlDocument requires less wrangling than XDocument, or if I made oversights in parsing USX XML into an XDocument.
Ahh, ya know, there are a couple of places in UsxToUsj.cs that look for and respond to whitespace. I wonder if that is the secret sauce.
PreserveWhitespace = true
does most of the heavy lifting. It is only whitespace inside element contents that we care about, as USJ does not store the inter-element whitespace, but does store in-element whitespace. If you know other details of whitespace oddities in USX, please let me know.
Also, I've created a method to output an XmlDocument.
Yes, there are bugs in the TypeScript implementation that the C# implementation does not suffer from (see the xit
'ed tests cases in the TypeScript project. In particular, this piece of code in UsxToUsj.cs fixes the whitespace bugs I encountered:
if (
child.NextSibling?.NodeType == XmlNodeType.Text
|| (child.NextSibling?.NodeType == XmlNodeType.Whitespace && child.NextSibling.Value == " ")
)
src/SIL.Converters.Usj/Usx.cs
line 12 at r2 (raw file):
Previously, marksvc wrote…
✏️ (The two
const
keywords here are relevant to a discussion in another comment.)
Done. Changed to static readonly
.
src/SIL.Converters.Usj/UsxToUsj.cs
line 115 at r1 (raw file):
Previously, marksvc wrote…
❓ Hey, if right here we omit processing and adding of certain whitespace nodes, that might be good to capture in a comment. Like
// Skip any whitespace nodes with multiple spaces, newlines, or tabs.
(If that is happening here?)
Is the comment I added above helpful to your question?
src/SIL.Converters.Usj/UsxToUsj.cs
line 22 at r2 (raw file):
Previously, marksvc wrote…
✏️ Oh, this is a good time to specify that this project is nullable-able. Is there a reason you didn't go in that direction?
As this is a .NET Standard 2.0 project, nullable properties (i.e. string?
) are not supported. One of the downsides of my writing with a view to future publishing in libpalaso or Paratext.
src/SIL.Converters.Usj/UsxToUsj.cs
line 36 at r2 (raw file):
Previously, marksvc wrote…
✏️ Looks like "market" may be a typo for "marker"?
Done. Thanks!
src/SIL.Converters.Usj/UsxToUsj.cs
line 110 at r2 (raw file):
Previously, marksvc wrote…
❓ What is the significance of finding a node with a single space? Is it a particular kind of thing that we will routinely expect? Is it an edge case? Would we just as likely expect to find multiple spaces (but without a newline character)?
A single space would be a space between two elements, i.e. a user inputted space. Paratext automatically converts multiple spaces to just one, so we should only check for one. Any more and those spaces will be padding between elements, and should be ignored.
src/SIL.Converters.Usj/UsxToUsj.cs
line 125 at r2 (raw file):
Previously, marksvc wrote…
✏️ Now that I've read more of the PR, I think we are setting
append = false
here because we don't write the verse EID element to USJ. I had been confused about what was being "// Ignore"d before. It might be helpful to be a bit more explanatory here with something like// Omit verse EID marker from output
(If that is why we are setting
append = false
?)
Done.
src/SIL.Converters.Usj/UsxToUsj.cs
line 136 at r2 (raw file):
Perhaps the behaviour is that when we convert USX to USJ, our root object will always have a non-null Content field, but the child objects under the root might have a null Content field.
Yes, that's it. I've updated the property comment.
src/SIL.Converters.Usj/UsxToUsj.cs
line 174 at r2 (raw file):
Previously, marksvc wrote…
💬 We could throw here if
xmlDocument.PreserveWhitespace==false
. But I'm thinking it makes sense to let the client code format their data as they want and have the library here return json based on what the library was given.
I've added a throw. If PreserveWhitespace
is false, data loss (the single spaces between some elements) will be lost.
src/SIL.Converters.Usj/UsxToUsj.cs
line 176 at r2 (raw file):
Previously, marksvc wrote…
❓ I have been hearing that XDocument is recommended over XmlDocument. I failed to quickly find a statement on MSDN about that, so I may be mistaken about what is recommended. Is there a reason you chose to use XmlDocument over XDocument?
I chose XmlDocument as that what ParatextData supports. XDocument is better - way better - but my goal was compatibility with ParatextData and the USX it outputs.
test/SIL.Converters.Usj.Tests/TestData.cs
line 52 at r2 (raw file):
Previously, marksvc wrote…
💬 I'd like to see some sample data that also has the equivalent of
\nd Lord\nd* \nd God\nd*
. We had a bug in how we processed that before.
Done. Oh yeah, great idea!
test/SIL.Converters.Usj.Tests/TestData.cs
line 110 at r2 (raw file):
Previously, marksvc wrote…
✏️ (I don't think this, or other properties in this class, should use
const
, but pending the outcome of a discussion in another comment.)
I've updated these to static readonly
.
test/SIL.Converters.Usj.Tests/TestData.cs
line 114 at r2 (raw file):
Previously, marksvc wrote…
✏️ It looks like this line, through the lines before
<chapter eid="GEN 1" />
were mistakenly indented by 2 spaces?
Done.
test/SIL.Converters.Usj.Tests/TestData.cs
line 189 at r2 (raw file):
Previously, marksvc wrote…
❓ Can you explain
closed="false"
? I found some examples but they had</char>
tags after the inner text, going against my assumption of whatclosed="false"
meant.
``closed="false"` was added to be a non-standard, to ensure it was kept as-is. I've updated the comment to reflect this.
test/SIL.Converters.Usj.Tests/TestData.cs
line 289 at r2 (raw file):
Previously, marksvc wrote…
✏️ (Indented too much?)
Is it better now? The indentation is a pain because Visual Studio keeps trying to make it ultra-indented.
test/SIL.Converters.Usj.Tests/TestData.cs
line 307 at r2 (raw file):
Previously, marksvc wrote…
✏️ (Indented too much?)
Is it better now?
b2a5f25
to
7342edb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your patience :)
Reviewed 9 of 9 files at r4, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @pmachapman)
src/SIL.Converters.Usj/IUsj.cs
line 3 at r1 (raw file):
Thank you for adding in roundtripping tests.
Doing a full round trip is a good idea, but can we make that a separate PR?
Sounds good.
I did a quick test with some USX from the DBL, but I found that end verse markers were sometimes missing. USX 3.1 requires these, but USX 2.0-3.0 is loose about it. I would need to modify a test dataset to be USX 3.1 "complete" (i.e. restore the missing verse markers).
Hmm. Would we say then that this library is only going to support a certain version of the USX spec? There are a lot of USFM files online. I don't know how many old USX files may be online, or whether to expect that people have old USX laying around that they would want to convert to USJ. And perhaps that idea is too broad a scope if this tool is meant to be connected up to ParatextData and read and interpret the USX it gets from ParatextData.
src/SIL.Converters.Usj/UsjMarker.cs
line 21 at r2 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
Done. Let me know if it is clearer now.
Yes, that is clearer.
src/SIL.Converters.Usj/UsjToUsx.cs
line 64 at r3 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
Done. FYI, this logic was copied from https://github.com/BiblioNexus-Foundation/scripture-editors/blob/main/packages/utilities/src/converters/usj/usj-to-usx.ts#L30
Thank you. Hmm, I would think it odd to find "type" or "content" in the AdditionalData
field.
src/SIL.Converters.Usj/UsjToUsx.cs
line 106 at r3 (raw file):
after the verse is created, the contents of it are recursively added via
ConvertUsjRecurse
Thank you. In USJ, it appears that verses work in a similar way as USX, where they function as markers rather than as containers. For example, in TestData.cs I see
{ type: "verse", marker: "v", number: "1", sid: "GEN 1:1" },
"In the beginning ",
{ type: "verse", marker: "v", number: "2", sid: "GEN 1:2" },
where there is a verse 1 demarcation, followed by some text, followed by a verse 2 demarcation. But the "In the beginning " text does not appear to be in or the contents of the verse 1 object.
So the verse won't have contents, to be added recursively via ConvertUsjRecurse
will it?
I stepped through in the debugger and I the answer to what I was confused about is that USJ has verseSid items but not verseEid items. And so when we encounter a verseSid item, we need to do special handling to also make a verseEid item, for output as USX.
src/SIL.Converters.Usj/UsjToUsx.cs
line 110 at r3 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
Done. Excellent idea.
I think maybe this was lost?
_verseEid
might be improved as _currentVerseId
and
lastVerseEid
might be improved as lastVerseId
. I see that "Eid" is used on some of these variable names, but I don't think it's significant that the id is for the eid, right. But the id is for the verse that we are in the midst of, for which an eid would be later created using that id. If that's the right way to think about it.
src/SIL.Converters.Usj/UsxToUsj.cs
line 115 at r1 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
Is the comment I added above helpful to your question?
👍
src/SIL.Converters.Usj/UsxToUsj.cs
line 22 at r2 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
As this is a .NET Standard 2.0 project, nullable properties (i.e.
string?
) are not supported. One of the downsides of my writing with a view to future publishing in libpalaso or Paratext.
Ahh, okay.
src/SIL.Converters.Usj/UsxToUsj.cs
line 136 at r2 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
Perhaps the behaviour is that when we convert USX to USJ, our root object will always have a non-null Content field, but the child objects under the root might have a null Content field.
Yes, that's it. I've updated the property comment.
Thank you!
src/SIL.Converters.Usj/UsxToUsj.cs
line 176 at r2 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
I chose XmlDocument as that what ParatextData supports. XDocument is better - way better - but my goal was compatibility with ParatextData and the USX it outputs.
That makes sense.
test/SIL.Converters.Usj.Tests/TestData.cs
line 289 at r2 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
Is it better now? The indentation is a pain because Visual Studio keeps trying to make it ultra-indented.
Yes. It probably doesn't make a difference in how the current processing is happening. But bringing in the indenting as we've done here I think makes it closer to actual world data.
If VS keeps flipping it back to how it was before I suppose that's okay.
test/SIL.Converters.Usj.Tests/TestData.cs
line 307 at r2 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
Is it better now?
👍
src/SIL.Converters.Usj/UsjToUsx.cs
line 198 at r4 (raw file):
/// <param name="usj">The USJ object.</param> /// <returns>The XML Document.</returns> public XmlDocument UsjToUsxXmlDocument(IUsj usj)
👍
test/SIL.Converters.Usj.Tests/TestData.cs
line 456 at r4 (raw file):
/// <summary> /// Genesis 1:1 in USJ (with no sids).
✏️ Oops, USX not USJ
src/SIL.Converters.Usj/UsjBase.cs
line 27 at r4 (raw file):
/// <remarks> /// Nullable. /// If there are no contents, this will be null for <see cref="UsjMarker"/>, or empty for <see cref="Usj"/>.
❓ I notice that all three of UsjBase's fields are handled differently by its two subclasses.
Another way we could design this is to have Usj and UsjMarker be unrelated classes. For example, something like
public class Usj : IUsj
{
/// <summary>
/// This is the USJ spec type.
/// </summary>
public string Type { get; set; }
/// <summary>
/// The JSON representation of scripture contents from USFM/USX.
/// </summary>
/// <value>This will be a <see cref="string"/></value>
/// <remarks>
/// NOT Nullable.
/// If there are no contents, this will be empty.
/// The contents will be laid out in order.
/// </remarks>
[JsonConverter(typeof(UsjContentConverter))]
public ArrayList Content { get; set; }
public static readonly string UsjType = "USJ";
public static readonly string UsjVersion = "3.1";
public string Version { get; set; }
}
public class UsjMarker
{
/// <summary>
/// This is the kind/category of node or element this is,
/// corresponding the USFM marker and USX node.
/// </summary>
/// <example><c>para</c>, <c>verse</c>, <c>char</c>.</example>
public string Type { get; set; }
/// <summary>
/// The JSON representation of scripture contents from USFM/USX.
/// </summary>
/// <value>This will be a <see cref="UsjMarker"/>.</value>
/// <remarks>
/// Nullable.
/// If there are no contents, this will be null.
/// The contents will be laid out in order.
/// </remarks>
[JsonConverter(typeof(UsjContentConverter))]
public ArrayList Content { get; set; }
/// <summary>
/// Additional attributes that are not a part of the USJ specification.
/// </summary>
/// <remarks>
/// These are typically <c>closed</c>, <c>colspan</c>, etc.
/// </remarks>
[JsonExtensionData]
public Dictionary<string, object> AdditionalData { get; } = new Dictionary<string, object>();
public string Marker { get; set; }
public string Sid { get; set; }
...
That would require some adjustment in how the conversion classes work.
Does this seem preferable? Or maybe keeping UsjBase around is giving us a bunch of help?
It is nice that in the current design, methods like UsxDomToUsjRecurse
can handle both Usj and UsjMarker, though it sort of needs to cook in some of the differences between Usj and UsjMarker that could have been enforced by type if Usj and UsjMarker were unrelated classes.
test/SIL.Converters.Usj.Tests/UsxToUsjTests.cs
line 3 at r4 (raw file):
using System; using System.Xml; using NUnit.Framework;
💬 Somewhere in my memory I have the idea that USFM empty verses can get processed such that they have a space after them. But I'm not finding something to show that right now, except that when processing empty USX verses we make ChapterDeltas with a verse insert followed by a blank insert (DeltaUsxMapperTests.cs ToDelta_BlankLine()). If there's still anything that needs done about this sort of thing I expect we will quickly find it.
test/SIL.Converters.Usj.Tests/UsxToUsjTests.cs
line 8 at r4 (raw file):
[TestFixture] public class UsxToUsjTests
❓ I was playing with the conversion and whitespace. Check out these three tests. Note also how the roundtripping is not catching the problem.
I may also be misinterpreting the documentation for what should be expected.
I'm thinking this may mean we need to adjust the wihtespace handling. What do you think?
[Test]
public void A()
{
// Suppose we have two char elements separated by a single space.
string inputUsx = $"""
<usx version="{Usx.UsxVersion}">
<book code="GEN" style="id">Some Scripture Version</book>
<chapter number="1" style="c" sid="GEN 1" />
<para style="p">
<verse number="1" style="v" sid="GEN 1:1" />the first verse <char style="nd">Lord</char> <char style="nd">God</char>.<verse eid="GEN 1:1" />
</para>
<chapter eid="GEN 1" />
</usx>
""";
Usj intermediateUsj = UsxToUsj.UsxStringToUsj(inputUsx);
var usjToUsx = new UsjToUsx();
string outputUsx = usjToUsx.UsjToUsxString(intermediateUsj);
outputUsx = TestData.RemoveXmlWhiteSpace(outputUsx);
Assert.That(outputUsx, Is.EqualTo(TestData.RemoveXmlWhiteSpace(inputUsx)));
// It roundtrips. Does the Usj look like we expect, with a single space between?
// <char style="nd">Lord</char>
Assert.That(((intermediateUsj.Content[2] as UsjMarker).Content[2] as UsjMarker).Type, Is.EqualTo("char"));
// " "
Assert.That((intermediateUsj.Content[2] as UsjMarker).Content[3], Is.EqualTo(" "));
// <char style="nd">God</char>
Assert.That(((intermediateUsj.Content[2] as UsjMarker).Content[4] as UsjMarker).Type, Is.EqualTo("char"));
// Yes.
}
[Test]
public void B()
{
// Suppose we have two char elements separated by 2 spaces.
string inputUsx = $"""
<usx version="{Usx.UsxVersion}">
<book code="GEN" style="id">Some Scripture Version</book>
<chapter number="1" style="c" sid="GEN 1" />
<para style="p">
<verse number="1" style="v" sid="GEN 1:1" />the first verse <char style="nd">Lord</char> <char style="nd">God</char>.<verse eid="GEN 1:1" />
</para>
<chapter eid="GEN 1" />
</usx>
""";
Usj intermediateUsj = UsxToUsj.UsxStringToUsj(inputUsx);
var usjToUsx = new UsjToUsx();
string outputUsx = usjToUsx.UsjToUsxString(intermediateUsj);
outputUsx = TestData.RemoveXmlWhiteSpace(outputUsx);
Assert.That(outputUsx, Is.EqualTo(TestData.RemoveXmlWhiteSpace(inputUsx)));
// It roundtrips. Does the Usj look like we expect? What _do_ we expect? Perhaps from https://docs.usfm.bible/usfm/latest/whitespace.html we might expect it to be reduced to a single space.
// <char style="nd">Lord</char>
Assert.That(((intermediateUsj.Content[2] as UsjMarker).Content[2] as UsjMarker).Type, Is.EqualTo("char"));
// " "
Assert.That((intermediateUsj.Content[2] as UsjMarker).Content[3], Is.EqualTo(" "));
// <char style="nd">God</char>
Assert.That(((intermediateUsj.Content[2] as UsjMarker).Content[4] as UsjMarker).Type, Is.EqualTo("char"));
// No. The Usj does not have any space between char items.
}
[Test]
public void C()
{
// Suppose we have two char elements separated by a newline and other whitespace.
string inputUsx = $"""
<usx version="{Usx.UsxVersion}">
<book code="GEN" style="id">Some Scripture Version</book>
<chapter number="1" style="c" sid="GEN 1" />
<para style="p">
<verse number="1" style="v" sid="GEN 1:1" />the first verse <char style="nd">Lord</char>
<char style="nd">God</char>.<verse eid="GEN 1:1" />
</para>
<chapter eid="GEN 1" />
</usx>
""";
Usj intermediateUsj = UsxToUsj.UsxStringToUsj(inputUsx);
var usjToUsx = new UsjToUsx();
string outputUsx = usjToUsx.UsjToUsxString(intermediateUsj);
outputUsx = TestData.RemoveXmlWhiteSpace(outputUsx);
Assert.That(outputUsx, Is.EqualTo(TestData.RemoveXmlWhiteSpace(inputUsx)));
// It roundtrips. Does the Usj look like we expect? What _do_ we expect? Perhaps from https://docs.usfm.bible/usfm/latest/whitespace.html "A newline before a character marker is converted to a space. That space is content.", we might expect it to be a single space.
// <char style="nd">Lord</char>
Assert.That(((intermediateUsj.Content[2] as UsjMarker).Content[2] as UsjMarker).Type, Is.EqualTo("char"));
// "\n "
Assert.That((intermediateUsj.Content[2] as UsjMarker).Content[3], Is.EqualTo(" "));
// <char style="nd">God</char>
Assert.That(((intermediateUsj.Content[2] as UsjMarker).Content[4] as UsjMarker).Type, Is.EqualTo("char"));
// No. The Usj does not have any space between char items.
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @pmachapman)
test/SIL.Converters.Usj.Tests/TestData.cs
line 385 at r4 (raw file):
/// Genesis 1:1 in USX (with additional blank chapters). /// </summary> public static readonly string UsxGen1V1WithBlankChapters = $"""
❓ This is interesting. USJ-to-USX doesn't produce a verseEid for GEN 1:2. Should it?
test/SIL.Converters.Usj.Tests/TestData.cs
line 429 at r4 (raw file):
<verse number="2" style="v" sid="GEN 1:2" /><verse eid="GEN 1:2" /> <verse number="3" style="v" sid="GEN 1:3" /><verse eid="GEN 1:3" /> <verse number="4" style="v" sid="GEN 1:4" />
❓ And here, USX to USX doesn't produce a verse 4 EID. Should it?
test/SIL.Converters.Usj.Tests/TestData.cs
line 444 at r4 (raw file):
content: [ { type: "book", marker: "id", code: "GEN" }, { type: "chapter", marker: "c", number: "1"},
💬
I find that the EID handling seems to dominate and complicate the ConvertUsjRecurse
method. I experimented with simplifying it (in the linked commit, below). By the time I handled the various cases, it was also a lot of lines :).
But, FWIW, (1) this code produces the missing verse EID elements that I ask about in a couple comments above. And
(2) This implementation is a different, tho not less verbose, implementation; you can use some, all, or none as you like.
This PR adds support for conversion between USJ and USX, and an initial implementation for the download draft API.
This PR is to form the basis of future support for storing USJ as JSON0 documents, and to provide initial working support for USJ in C#.
The branch has been split off feature/SF-3133 as it is a discrete piece of work, in the hope it will make code review easier, instead of one very large PR for SF-3133.
I have separated the USJ converter into its own NET Standard 2.0 project to enable this code at sometime in the future to potentially be integrated with libpalaso or Paratext. The currently plan is to have this code run in Scripture Forge during the development of SF-3133, so that any bugs can be ironed out quickly.
As this adds an API endpoint that is not yet being utilized by Paratext 10 or Scripture Forge, this PR only requires developer testing.
This change is