Use commonmark-java instead of markdown-it via GraalJS #28

Merged 84 commits on Sep 24, 2024

Commits
591f4ce
Try out java commonmark parser
zampino Apr 3, 2024
f42219a
Todo lists
zampino Apr 3, 2024
e1cadf8
Custom InlineFormula type
zampino Apr 3, 2024
3aee4b7
Tight vs Loose lists
zampino Apr 4, 2024
0357fe4
Safer inline formula
zampino Apr 4, 2024
6953c34
ftnotes wip
zampino Apr 4, 2024
bf518d9
rm unused bb/fs
zampino Apr 4, 2024
a0b6495
First take at footnotes
zampino Apr 4, 2024
4ab0440
Code blocks and monospace
zampino Apr 4, 2024
60f5236
Blockquote
zampino Apr 4, 2024
567ca17
Prepare for footnote refs
zampino Apr 5, 2024
adee38d
Use open inline parsing
zampino Apr 10, 2024
4a3fdef
Source nj commonmark-java fork from jitpack
zampino Apr 10, 2024
135ae6f
Run clj tests against both parsers
zampino Apr 10, 2024
f79ebac
First step toward block formulas
zampino Apr 10, 2024
99ae048
Cleanup
zampino Apr 10, 2024
fe4ba1c
Use new commonmark custom inline parsing
zampino May 2, 2024
7da275c
Parse Block formulas
zampino May 2, 2024
1a9c40e
First take at footnotes
zampino Jul 8, 2024
f577be1
Validate footnote definitions against existing references
zampino Jul 8, 2024
50514a6
heading level
zampino Jul 8, 2024
53df7e2
More edge cases for duplicate refs
zampino Jul 8, 2024
9940a66
Adjust processing of footnotes into sidenotes
zampino Jul 8, 2024
8daff68
Remove redundant ref
zampino Jul 8, 2024
e06b55a
Parse inline footnotes
zampino Jul 22, 2024
5290d90
Allow to add parsed text to same context
zampino Aug 1, 2024
10baed7
First step toward code reorganization
zampino Aug 13, 2024
020a93e
Rename parser2 ns
zampino Aug 13, 2024
b73dd6f
Cleanup classes
zampino Aug 13, 2024
204c9eb
Rebuild customizable leaf text parsing
zampino Aug 13, 2024
cdec3d4
Use latest commit available on jitpack (temporarily disable footnotes)
zampino Aug 13, 2024
30e5095
Fix parsing extensibility notebook
zampino Aug 14, 2024
6097e09
Restore GraalJS based implementation to new ns
zampino Aug 15, 2024
c663b70
Introduce parse* to keep parsing ctx open
zampino Aug 15, 2024
f336a18
Fix building notebooks
zampino Aug 15, 2024
9d9cb4c
Remove last occurrence of n.m.parser in new impl
zampino Aug 15, 2024
792d0ab
Parse emoji in JVM impl
zampino Aug 15, 2024
1c39257
Implement ToC handling JVM side
zampino Aug 15, 2024
6d113cc
Equalize code and links + add tests
zampino Aug 15, 2024
d86d8fd
Equalize ToC
zampino Aug 15, 2024
59788e3
First take at tables
zampino Aug 15, 2024
8eb07b9
Emit table header nodes
zampino Aug 16, 2024
f6f16b8
Support strikethrough syntax
zampino Aug 16, 2024
e5758a5
Flatter project layout
zampino Aug 16, 2024
7412fd0
Fix heading id disambiguation
zampino Aug 16, 2024
be6e58d
Less zipping
zampino Aug 16, 2024
960b9b3
Parse [[TOC]] placeholder
zampino Aug 16, 2024
c464430
Benchmark warmup
zampino Aug 16, 2024
55c5005
Move graalJS impl to dev, compare benchmarks
zampino Aug 16, 2024
887b9cf
Saner public API
zampino Aug 16, 2024
a67078f
Use Clerk using latest markdown API
zampino Aug 16, 2024
c4a53cb
Fix Clerk render deps
zampino Aug 16, 2024
4605558
Fix build notebooks
zampino Aug 16, 2024
dd904b2
Use util for opening zipper node
zampino Aug 16, 2024
3556ac3
Use Clerk with fixed render deps
zampino Aug 19, 2024
435389f
Move more stuff to shared utils
zampino Aug 19, 2024
a74eb7d
Implement footnotes via zippers in cljs
zampino Aug 19, 2024
3bd3c17
Unify more
zampino Aug 19, 2024
16d84b0
Add old vs new implementation comparison
zampino Sep 3, 2024
d3b1fc2
Use new commonmark release with footnotes
zampino Sep 17, 2024
ed5b27e
CI: Update actions/upload-artifact to v4
mk Sep 17, 2024
4cfd835
Bump Clerk
mk Sep 17, 2024
fd2b093
Bump clerk.render as well
mk Sep 17, 2024
83548ca
Fix inline note container
zampino Sep 17, 2024
f8c6848
Fix notebooks
zampino Sep 17, 2024
37aa2ee
Add auto-link extension
zampino Sep 17, 2024
09216e4
Remove precompiled classes
mk Sep 23, 2024
d7cec58
Try adding require
mk Sep 23, 2024
3ce92ed
Don't use AOT-ed classes
borkdude Sep 23, 2024
697b0df
Merge branch 'main' into commonmark-java
borkdude Sep 23, 2024
3a5686d
fix cljs test?
borkdude Sep 23, 2024
3d02be6
Bump clerk, which probably doesn't help
borkdude Sep 23, 2024
44f50b1
Bump clerk.render as well
mk Sep 24, 2024
f514378
Go back to correct clerk version from commonmark-java branch
mk Sep 24, 2024
00f48c2
definterface workaround
borkdude Sep 24, 2024
c4785a2
middle ground
borkdude Sep 24, 2024
6d03031
add compiled class
borkdude Sep 24, 2024
3481c7a
compile using 8
borkdude Sep 24, 2024
474e8d2
improve instruction
borkdude Sep 24, 2024
459a73b
First pass at Readme and Changelog
zampino Sep 24, 2024
16b97a8
Reimplement table cell alignment
zampino Sep 24, 2024
6a94cbd
Avoid runtime reflection
borkdude Sep 24, 2024
706f23b
cleanup
borkdude Sep 24, 2024
5ae6323
revert types stuff
borkdude Sep 24, 2024
2 changes: 2 additions & 0 deletions .dir-locals.el
@@ -0,0 +1,2 @@
((clojure-mode
(cider-clojure-cli-aliases . ":test:repl")))
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -1,7 +1,8 @@
# Changelog

## Unreleased
## 0.6 Unreleased

* We're swapping out GraalJS in favour of [commonmark-java](https://github.com/commonmark/commonmark-java) on the JVM side. The cljs implementation stays the same.
* Comply with commonmark's suggested rendering of images by default ([#18](https://github.com/nextjournal/markdown/issues/18)). This is a breaking change.

## 0.5.148
3 changes: 1 addition & 2 deletions README.md
@@ -4,13 +4,12 @@

A cross-platform clojure library for [Markdown](https://en.wikipedia.org/wiki/Markdown) parsing and transformation.


🚧 _ALPHA_ status, subject to frequent change. For a richer reading experience [read this readme as a clerk notebook](https://nextjournal.github.io/markdown/README).

## Features

* _Focus on data_: parsing yields an AST ([à la Pandoc](https://nextjournal.github.io/markdown/notebooks/pandoc)) of nested data representing a structured document.
* _Cross Platform_: clojurescript native, we target the JVM using [Graal's Polyglot Library](https://www.graalvm.org/22.1/reference-manual/js/JavaInteroperability/#polyglot-context).
* _Cross Platform_: using [commonmark-java](https://github.com/commonmark/commonmark-java) on the JVM and [markdown-it](https://github.com/markdown-it/markdown-it) for clojurescript
* _Configurable [Hiccup](https://github.com/weavejester/hiccup) conversion_.

## Try
4 changes: 2 additions & 2 deletions bb.edn
@@ -36,12 +36,12 @@
build:notebooks
{:doc "builds a Clerk static with notebooks specified in deps.edn given a specified git SHA"
:depends [cljs:sci:release]
:task (clojure (str "-X:nextjournal/clerk :git/sha '\"" (first *command-line-args*) "\"' :browse? false"))}
:task (clojure (str "-X:dev:nextjournal/clerk :git/sha '\"" (or (first *command-line-args*) "SHASHASHA") "\"' :browse? false"))}

dev
{:doc "Boots and watches both shadow browser test and sci builds"
:depends [yarn-install]
:task (clojure "-M:test:nextjournal/clerk:sci watch sci browser-test")}
:task (clojure "-M:dev:test:nextjournal/clerk:sci watch sci browser-test")}

cljs:sci
{:doc "watches cljs build"
23 changes: 18 additions & 5 deletions deps.edn
@@ -1,12 +1,18 @@
{:paths ["src" "resources"]
{:paths ["src" "resources" "classes"]
:mvn/repos {"jitpack.io" {:url "https://jitpack.io"}}
:deps {applied-science/js-interop {:mvn/version "0.3.3"}
org.clojure/data.json {:mvn/version "2.4.0"}
org.graalvm.js/js {:mvn/version "21.3.2.1"}}
org.commonmark/commonmark {:mvn/version "0.23.0"}
org.commonmark/commonmark-ext-autolink {:mvn/version "0.23.0"}
org.commonmark/commonmark-ext-footnotes {:mvn/version "0.23.0"}
org.commonmark/commonmark-ext-task-list-items {:mvn/version "0.23.0"}
org.commonmark/commonmark-ext-gfm-tables {:mvn/version "0.23.0"}
org.commonmark/commonmark-ext-gfm-strikethrough {:mvn/version "0.23.0"}}

:aliases
{:nextjournal/clerk
{:extra-paths ["notebooks"]
:extra-deps {io.github.nextjournal/clerk {:git/sha "e8f275b5cf077ec9441e404c1885ff0b6ee0aef9"
{:extra-paths ["notebooks" "dev"]
:extra-deps {io.github.nextjournal/clerk {:git/sha "f7b47ebb639ea157a0f72d6a63e7f263179c72a9"
:exclusions [io.github.nextjournal/markdown]}}
:jvm-opts ["-Dclojure.main.report=stderr"
"-Dclerk.resource_manifest={\"/js/viewer.js\" \"js/viewer.js\"}"] ;;
@@ -24,8 +30,15 @@
:quiet
{:jvm-opts ["-Dpolyglot.engine.WarnInterpreterOnly=false"]}

:dev
{:extra-paths ["dev"]
:extra-deps {org.babashka/http-client {:mvn/version "0.3.11"}
org.clojure/test.check {:mvn/version "1.1.1"}
org.graalvm.js/js {:mvn/version "21.3.2.1"}}}

:test
{:extra-paths ["test"]
:jvm-opts ["-Dclojure.main.report=stderr"]
:extra-deps {nubank/matcher-combinators {:mvn/version "3.8.3"}}
:exec-fn test-runner/run}

@@ -40,7 +53,7 @@
:main-opts ["-m" "shadow.cljs.devtools.cli"]
:jvm-opts ["-Dclerk.resource_manifest={\"/js/viewer.js\" \"http://localhost:8021/viewer.js\"}"]
:extra-deps {io.github.nextjournal/clerk.render {:git/url "https://github.com/nextjournal/clerk"
:git/sha "e8f275b5cf077ec9441e404c1885ff0b6ee0aef9"
:git/sha "f7b47ebb639ea157a0f72d6a63e7f263179c72a9"
:deps/root "render"}}}

:build
@@ -1,4 +1,4 @@
(ns nextjournal.markdown
(ns nextjournal.markdown.graaljs
"Facility functions for handling markdown conversions"
(:require [clojure.java.io :as io]
[clojure.data.json :as json]
@@ -67,6 +67,13 @@
- [x] ~~thing~~
")

(parse "some text[^note] and some other[^other] but again[^note]

[^other]: some other
[^note]: some story
")


(->hiccup "# Hello Markdown

* What's _going_ on?
@@ -22,7 +22,7 @@
(:require [clojure.string :as str]
[clojure.zip :as z]
[nextjournal.markdown.transform :as md.transform]
[nextjournal.markdown.parser.emoji :as emoji]
[nextjournal.markdown.utils.emoji :as emoji]
#?@(:cljs [[applied-science.js-interop :as j]
[cljs.reader :as reader]])))

@@ -101,7 +101,9 @@

(defn parse-fence-info [info-str]
(try
(when (string? info-str)
;; NOTE: this fix is backported
;; from the new implementation 👇
(when (and (string? info-str) (seq info-str))
(let [tokens (-> info-str
str/trim
(str/replace #"[\{\}\,]" "") ;; remove Pandoc/Rmarkdown brackets and commas
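The backported guard in this hunk matters because an empty string still satisfies `string?`, whereas `(seq "")` returns `nil`. A minimal sketch of just that check follows; `non-empty-info?` is a hypothetical helper name used for illustration and is not part of the library.

```clojure
;; Hypothetical helper, for illustration only: mirrors the guard
;; `(and (string? info-str) (seq info-str))` from the hunk above.
(defn non-empty-info? [info-str]
  (boolean (and (string? info-str) (seq info-str))))

(non-empty-info? "clojure {:a 1}") ; non-empty string: passes the guard
(non-empty-info? "")               ; empty string: `seq` returns nil, guard fails
(non-empty-info? nil)              ; not a string: fails on `string?`
```

Note that a whitespace-only string still passes this check, since `seq` only tests for emptiness.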
@@ -417,6 +419,8 @@ end"
(let [new-loc (-> loc (z/replace {:type :sidenote-container :content []})
(z/append-child node)
(z/append-child {:type :sidenote-column
;; TODO: broken in the old implementation
;; should be :content (mapv #(footnote->sidenote (get footnotes %)) (distinct refs))}))]
:content (mapv #(footnote->sidenote (get footnotes %)) refs)}))]
(recur (z/right new-loc) (z/up new-loc)))
(recur (z/right loc) parent))
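The TODO in this hunk points at a duplication problem: when the same footnote is referenced more than once, mapping over `refs` directly yields one sidenote per reference rather than one per footnote. A minimal sketch of the difference, assuming `footnotes` is a vector and `refs` holds indices into it (both shapes are assumptions made for illustration, and the `footnote->sidenote` conversion is left out):

```clojure
;; Assumed data shapes, for illustration only.
(def footnotes [{:type :footnote :label "note"}
                {:type :footnote :label "other"}])

;; "note" is referenced twice, as in "[^note] ... [^other] ... [^note]".
(def refs [0 1 0])

;; Without `distinct`: one entry per reference, so "note" appears twice.
(def per-reference (mapv #(get footnotes %) refs))

;; With `distinct`: one entry per footnote, in order of first reference.
(def per-footnote (mapv #(get footnotes %) (distinct refs)))

(mapv :label per-reference) ; three labels, "note" repeated
(mapv :label per-footnote)  ; two labels, no repetition
```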
59 changes: 59 additions & 0 deletions dev/old_vs_new.clj
@@ -0,0 +1,59 @@
(ns old-vs-new
(:require [clojure.string :as str]
[nextjournal.markdown.graaljs :as md-old]
[nextjournal.markdown :as md]
[babashka.http-client :as http]
[clojure.test.check :as tc]
[clojure.test.check.generators :as gen]
[clojure.test.check.properties :as prop]
[clojure.test.check.clojure-test :refer [defspec]]))

(comment
(= (md/parse "https://github.com")
(md-old/parse "https://github.com"))

(:body (http/get "https://jaspervdj.be/lorem-markdownum/markdown.txt?fenced-code-blocks=on"))

(let [sample (:body (http/get "https://jaspervdj.be/lorem-markdownum/markdown.txt?num-blocks=1000&fenced-code-blocks=on"))]
[(with-out-str (time (md-old/parse sample)))
(with-out-str (time (md/parse sample)))]))

(def feature-keys
(list "no-headers"
"no-code"
"no-quotes"
"no-lists"
"no-inline-markup"
"no-external-links"
"underline-headers"
"underscore-em"
"underscore-strong"
"no-wrapping"
"fenced-code-blocks"
"reference-links"))

(defn opts->query [opts]
(str/join "&" (keep (fn [[k v]] (when v (str k "=on"))) opts)))

(defn size+opts->random-md-str [size opts]
(:body (http/get (format "https://jaspervdj.be/lorem-markdownum/markdown.txt?%s&num-blocks=%s" (opts->query opts) size))))

(def md-generator
(gen/sized (fn [size]
#_(prn :size size)
(gen/fmap #(size+opts->random-md-str size %)
(gen/fmap (partial zipmap feature-keys) (gen/vector gen/boolean 12))))))

#_(gen/sample md-generator)

(def compare-old-vs-new-parse-implementations
(prop/for-all [s md-generator]
(= (md/parse s)
(md-old/parse s))))

#_(tc/quick-check 100 compare-old-vs-new-parse-implementations)

(defspec test-old-vs-new-implem 100
compare-old-vs-new-parse-implementations)

#_(test-old-vs-new-implem)
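The generator in this file builds each request URL by turning a map of feature flags into a query string. The sketch below exercises `opts->query` on its own, with no HTTP call; the definition is copied from the file, and the sample flags are arbitrary values chosen for illustration.

```clojure
(require '[clojure.string :as str])

;; Same definition as in dev/old_vs_new.clj.
(defn opts->query [opts]
  (str/join "&" (keep (fn [[k v]] (when v (str k "=on"))) opts)))

;; A vector of pairs keeps the order deterministic for this example;
;; the generator passes a map, whose entry order is not guaranteed.
(opts->query [["no-headers" true]
              ["no-code" false]
              ["fenced-code-blocks" true]])
;; => "no-headers=on&fenced-code-blocks=on"
```

Flags set to `false` are dropped rather than sent as `=off`, so an all-false sample yields an empty query string.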
36 changes: 20 additions & 16 deletions notebooks/benchmarks.clj
@@ -1,29 +1,37 @@
;; # ⏱ Some Naïve Benchmarks
(ns ^:nextjournal.clerk/no-cache benchmarks
(ns benchmarks
{:nextjournal.clerk/no-cache true}
(:require [clojure.test :refer :all]
[nextjournal.clerk :as clerk]
[nextjournal.clerk.eval :as clerk.eval]
[nextjournal.markdown :as md]
parsing-extensibility
[nextjournal.markdown.parser :as md.parser]))
[nextjournal.markdown.graaljs :as old-md]
[nextjournal.markdown.utils :as u]
[parsing-extensibility]))

(def reference-text (slurp "notebooks/reference.md"))

(defmacro time-ms [& expr]
`(-> (clerk.eval/time-ms (dotimes [_# 100] ~@expr)) :time-ms (/ 100)))

(comment
(macroexpand '(time-ms do-this)))

;; Compare with different set of tokenizers
(defn parse
([text] (parse [] text))
([extra-tokenizers text]
(md.parser/parse (update md.parser/empty-doc :text-tokenizers concat extra-tokenizers)
(md/tokenize text))))
(md/parse* (assoc u/empty-doc :text-tokenizers extra-tokenizers)
text)))

;; Default set of tokenizers
(time-ms (parse reference-text))
(-> (parse reference-text) :content count)

;; Default set of tokenizers, warmup
[(time-ms (parse reference-text))
(time-ms (parse reference-text))
(time-ms (parse reference-text))]

;; GraalJS based implementation
[(time-ms (old-md/parse reference-text))
(time-ms (old-md/parse reference-text))
(time-ms (old-md/parse reference-text))]

;; With an extra brace-brace parser
(time-ms (parse [{:regex #"\{\{([^\{]+)\}\}"
@@ -39,14 +47,10 @@

;; With hashtags and internal links
(time-ms
(parse [md.parser/hashtag-tokenizer
md.parser/internal-link-tokenizer
(parse [u/hashtag-tokenizer
u/internal-link-tokenizer
{:regex #"\{\{([^\{]+)\}\}"
:handler (fn [m] {:type :var :text (m 1)})}
{:tokenizer-fn parsing-extensibility/losange-tokenizer-fn
:handler (fn [data] {:type :losange :data data})}]
reference-text))

^{::clerk/visibility {:code :hide :result :hide}}
(comment
(clerk/serve! {:port 8888}))
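The notebook's `time-ms` macro runs its body 100 times and divides the total by 100, and the new "warmup" cells repeat each measurement three times. A rough sketch of the same idea without Clerk is shown below; `avg-time-ms` is a hypothetical name, and it uses `System/nanoTime` in place of `clerk.eval/time-ms`.

```clojure
;; Hypothetical stand-in for the notebook's `time-ms`, using only the JDK clock.
(defmacro avg-time-ms
  "Evaluates body `n` times and returns the mean wall-clock time per run, in milliseconds."
  [n & body]
  `(let [n# ~n
         start# (System/nanoTime)]
     (dotimes [_# n#] ~@body)
     (/ (- (System/nanoTime) start#) 1e6 n#)))

;; Repeat the measurement: early runs include JIT warmup and are usually slower.
[(avg-time-ms 100 (reduce + (range 10000)))
 (avg-time-ms 100 (reduce + (range 10000)))
 (avg-time-ms 100 (reduce + (range 10000)))]
```

Comparing the three numbers gives a rough sense of how much of the first measurement is warmup rather than steady-state cost.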
13 changes: 5 additions & 8 deletions notebooks/pandoc.clj
@@ -9,7 +9,7 @@
[nextjournal.clerk :as clerk]
[nextjournal.clerk.viewer :as v]
[nextjournal.markdown :as md]
[nextjournal.markdown.parser :as md.parser]
[nextjournal.markdown.utils :as u]
[nextjournal.markdown.transform :as md.transform]))

;; From the [docs](https://pandoc.org/MANUAL.html#description):
@@ -161,20 +161,20 @@ this _is_ a
{:type :list-item
:content (keep pandoc->md li)}) (second (:c node)))})

:Math (fn [node] (let [[_meta latex] (:c node)] (md.parser/block-formula latex)))
:Math (fn [node] (let [[_meta latex] (:c node)] (u/block-formula latex)))
:Code (fn [node]
(let [[_meta code] (:c node)]
{:type :monospace :content [(md.parser/text-node code)]}))
{:type :monospace :content [(u/text-node code)]}))
:CodeBlock (fn [node]
(let [[[_id classes _meta] code] (:c node)]
{:type :code
:content [(md.parser/text-node code)]}))
:content [(u/text-node code)]}))
:SoftBreak (constantly {:type :softbreak})
:RawBlock (constantly nil)
:RawInline (fn [{:keys [c]}]
(cond
(and (vector? c) (= "latex" (first c)))
(md.parser/formula (second c))))})
(u/formula (second c))))})

^{::clerk/visibility {:result :hide}}
(defn pandoc->md [{:as node :keys [t pandoc-api-version blocks]}]
@@ -226,9 +226,6 @@ this _is_ a

^{::clerk/visibility {:result :hide :code :hide}}
(comment
(clerk/serve! {:port 9999})
(clerk/clear-cache!)
(-> *e ex-cause ex-data)
(json/read-str
(:out
(shell/sh "pandoc" "-f" "markdown" "-t" "json" :in markdown-text))