@dependabot dependabot bot commented on behalf of github Aug 26, 2025

Bumps org.jsoup:jsoup from 1.16.1 to 1.21.2.

Release notes

Sourced from org.jsoup:jsoup's releases.

jsoup 1.21.2

jsoup 1.21.2 is out now, adding support for custom SSLContext in HTTP/2 connections, and improving consistency in how user data is handled in attributes. It also brings performance gains in DOM manipulation and fragment parsing, and fixes several edge cases in stream parsing, traversal, cloning, and concurrent reads.

jsoup is a Java library for working with real-world HTML and XML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.

Changes

  • Deprecated internal (yet visible) methods Normalizer#normalize(String, bool) and Attribute#shouldCollapseAttribute(Document.OutputSettings). These will be removed in a future version.
  • Deprecated Connection#sslSocketFactory(SSLSocketFactory) in favor of the new Connection#sslContext(SSLContext). Using sslSocketFactory will force the use of the legacy HttpUrlConnection implementation, which does not support HTTP/2. #2370

Improvements

  • When pretty-printing, if there are consecutive text nodes (via DOM manipulation), the non-significant whitespace between them will be collapsed. #2349.
  • Updated Connection.Response#statusMessage() to return a simple loggable string message (e.g. "OK") when using the HttpClient implementation, which doesn't otherwise return any server-set status message. #2356
  • Attributes#size() and Attributes#isEmpty() now exclude any internal attributes (such as user data) from their count. This aligns with the attributes' serialized output and iterator. #2369
  • Added Connection#sslContext(SSLContext) to provide a custom SSL (TLS) context to requests, supporting both the HttpClient and the legacy HttpUrlConnection implementations. #2370
  • Performance optimizations for DOM manipulation methods, including when repeatedly removing an element's first child (element.child(0).remove()), and when using Parser#parseBodyFragment() to parse a large number of direct children. #2373.
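For reviewers who currently call the deprecated Connection#sslSocketFactory(SSLSocketFactory), the following is a minimal sketch of building an SSLContext with the JDK alone. The class and method names (SslContextSketch, defaultTlsContext) are hypothetical, and the jsoup call appears only as a comment because it has not been exercised here; it assumes the Connection#sslContext(SSLContext) method described in the notes above.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;

// Hypothetical helper for illustration; not part of jsoup or of this PR.
final class SslContextSketch {

    // Builds a TLS context backed by the JDK's default key and trust managers.
    static SSLContext defaultTlsContext() throws GeneralSecurityException {
        SSLContext context = SSLContext.getInstance("TLS");
        // Passing nulls selects the default key managers, trust managers and SecureRandom.
        context.init(null, null, null);
        return context;
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SSLContext context = defaultTlsContext();
        System.out.println("Protocol: " + context.getProtocol());
        // Assumption: with jsoup 1.21.2 on the classpath, the context would be supplied as
        // Jsoup.connect("https://example.com/").sslContext(context).get();
    }
}
```

A project that needs a custom trust store would pass its own TrustManager array to init() instead of null; that part depends on the application and is left out here.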

Bug Fixes

  • When parsing from an InputStream and a multibyte character happened to straddle a buffer boundary, the stream would not be completely read. #2353.
  • In NodeTraversor, if a last child element was removed during the head() call, the parent would be visited twice. #2355.
  • Cloning an Element that has an Attributes object would add an empty internal user-data attribute to that clone, which would cause unexpected results for Attributes#size() and Attributes#isEmpty(). #2356
  • In a multithreaded application where multiple threads are calling Element#children() on the same element concurrently, a race condition could happen when the method was generating the internal child element cache (a filtered view of its child nodes). Since concurrent reads of DOM objects should be threadsafe without external synchronization, this method has been updated to execute atomically. #2366
  • Malformed HTML could throw an IndexOutOfBoundsException during the adoption agency algorithm. #2377.
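To make the first fix in this list (#2353) more concrete, here is a minimal JDK-only sketch of the general technique for decoding a byte stream in fixed-size chunks when a multibyte character may be split across two chunks. It is not jsoup's reader code; the class name, method name and buffer sizes are assumptions chosen for illustration. The key step is compact(), which carries any incomplete trailing bytes over to the next read instead of dropping them or stopping early.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is not how jsoup implements its input handling.
final class ChunkedDecodeSketch {

    // Decodes UTF-8 input while feeding the decoder at most chunkSize new bytes per step.
    static String decodeInChunks(byte[] input, int chunkSize) {
        if (input.length == 0) return "";
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        // Extra room for up to 3 bytes of an incomplete sequence carried over from the previous chunk.
        ByteBuffer in = ByteBuffer.allocate(chunkSize + 4);
        CharBuffer out = CharBuffer.allocate(chunkSize + 4);
        StringBuilder result = new StringBuilder();
        int pos = 0;
        while (pos < input.length) {
            int n = Math.min(chunkSize, input.length - pos);
            in.put(input, pos, n);
            pos += n;
            in.flip();
            boolean endOfInput = pos == input.length;
            decoder.decode(in, out, endOfInput);
            in.compact(); // keep bytes of a character that straddles the chunk boundary
            out.flip();
            result.append(out);
            out.clear();
        }
        decoder.flush(out);
        out.flip();
        result.append(out);
        return result.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "a\u00e9\u20ac z".getBytes(StandardCharsets.UTF_8);
        // A chunk size of 2 splits the 3-byte euro sign across two reads.
        System.out.println(decodeInChunks(bytes, 2));
    }
}
```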

jsoup 1.21.1

jsoup 1.21.1 is out now, featuring powerful new node selection capabilities that let you target specific DOM nodes like comments and text nodes using CSS selectors, dynamic tag customization through the new TagSet callback system, and improved defense against mutation XSS attacks with simplified attribute escaping. This release also brings HTTP/2 support by default, numerous API improvements for better developer experience, and fixes for several edge-case parsing issues.

jsoup is a Java library for working with real-world HTML and XML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.

Changes

  • Removed previously deprecated methods. #2317
  • Deprecated the :matchText pseudo-selector due to its side effects on the DOM; use the new ::textnode selector and the Element#selectNodes(String css, Class<T> type) method instead. #2343
  • Deprecated Connection.Response#bufferUp() in lieu of Connection.Response#readFully() which can throw a checked IOException.
  • Deprecated internal methods Validate#ensureNotNull(Object) (replaced by typed Validate#expectNotNull(T)); protected HTML appenders from Attribute and Node.
  • If you happen to be using any of the deprecated methods, please take the opportunity now to migrate away from them, as they will be removed in a future release.

Improvements

  • Enhanced the Selector to support direct matching against nodes such as comments and text nodes. For example, you can now find an element that follows a specific comment: ::comment:contains(prices) + p will select p elements immediately after a <!-- prices: --> comment. Supported types include ::node, ::leafnode, ::comment, ::text, ::data, and ::cdata. Node contextual selectors like ::node:contains(text), :matches(regex), and :blank are also supported. Introduced Element#selectNodes(String css) and Element#selectNodes(String css, Class<T> nodeType) for direct node selection. #2324
  • Added TagSet#onNewTag(Consumer<Tag> customizer): register a callback that’s invoked for each new or cloned Tag when it’s inserted into the set. Enables dynamic tweaks of tag options (for example, marking all custom tags as self-closing, or everything in a given namespace as preserving whitespace). #2330
  • Made TokenQueue and CharacterReader autocloseable, to ensure that they will release their buffers back to the buffer pool, for later reuse.
  • Added Selector#evaluatorOf(String css), as a clearer way to obtain an Evaluator from a CSS query. An alias of QueryParser.parse(String css).
  • Custom tags (defined via the TagSet) in a foreign namespace (e.g. SVG) can be configured to parse as data tags.
  • Added NodeVisitor#traverse(Node) to simplify node traversal calls (vs. importing NodeTraversor).
  • Updated the default user-agent string to improve compatibility. #2341
  • The HTML parser now allows the specific text-data type (Data, RcData) to be customized for known tags. (Previously, that was only supported on custom tags.) #2326
  • Added Connection.Response#readFully() as a replacement for Connection.Response#bufferUp() with an explicit IOException. Similarly, added Connection.Response#readBody() over Connection.Response#body(). Deprecated Connection.Response#bufferUp(). #2327
  • When serializing HTML, the < and > characters are now escaped in attributes. This helps prevent a class of mutation XSS attacks. #2337
  • Changed Connection to prefer using the JDK's HttpClient over HttpUrlConnection, if available, to enable HTTP/2 support by default. Users can disable via -Djsoup.useHttpClient=false. #2340
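The attribute-escaping change (#2337) can be pictured with a small sketch. This is a hypothetical helper written against the JDK only, not jsoup's serializer, and the exact set of characters jsoup escapes is not taken from its source; the point is simply that once < and > are written as entities, an attribute value can no longer be reinterpreted as markup if the serialized output is parsed again.

```java
// Hypothetical illustration of escaping an attribute value; not jsoup's implementation.
final class AttributeEscapeSketch {

    // Escapes characters that could end the attribute or be re-read as markup.
    static String escapeAttributeValue(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '<': sb.append("&lt;"); break;  // escaped in attributes as of the change above
                case '>': sb.append("&gt;"); break;  // escaped in attributes as of the change above
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String risky = "</p><img src=x onerror=alert(1)>";
        System.out.println("<a title=\"" + escapeAttributeValue(risky) + "\">link</a>");
    }
}
```

Code in this repository that compares serialized HTML against stored strings may see different output after the upgrade for attributes containing < or >, so snapshot-style tests are worth re-running.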

Bug Fixes

  • The contents of a script in a svg foreign context should be parsed as script data, not text. #2320

... (truncated)

Changelog

Sourced from org.jsoup:jsoup's changelog.

1.21.2 (2025-Aug-25)

Changes

  • Deprecated internal (yet visible) methods Normalizer#normalize(String, bool) and Attribute#shouldCollapseAttribute(Document.OutputSettings). These will be removed in a future version.
  • Deprecated Connection#sslSocketFactory(SSLSocketFactory) in favor of the new Connection#sslContext(SSLContext). Using sslSocketFactory will force the use of the legacy HttpUrlConnection implementation, which does not support HTTP/2. #2370

Improvements

  • When pretty-printing, if there are consecutive text nodes (via DOM manipulation), the non-significant whitespace between them will be collapsed. #2349.
  • Updated Connection.Response#statusMessage() to return a simple loggable string message (e.g. "OK") when using the HttpClient implementation, which doesn't otherwise return any server-set status message. #2356
  • Attributes#size() and Attributes#isEmpty() now exclude any internal attributes (such as user data) from their count. This aligns with the attributes' serialized output and iterator. #2369
  • Added Connection#sslContext(SSLContext) to provide a custom SSL (TLS) context to requests, supporting both the HttpClient and the legacy HttpUrlConnection implementations. #2370
  • Performance optimizations for DOM manipulation methods, including when repeatedly removing an element's first child (element.child(0).remove()), and when using Parser#parseBodyFragment() to parse a large number of direct children. #2373.

Bug Fixes

  • When parsing from an InputStream and a multibyte character happened to straddle a buffer boundary, the stream would not be completely read. #2353.
  • In NodeTraversor, if a last child element was removed during the head() call, the parent would be visited twice. #2355.
  • Cloning an Element that has an Attributes object would add an empty internal user-data attribute to that clone, which would cause unexpected results for Attributes#size() and Attributes#isEmpty(). #2356
  • In a multithreaded application where multiple threads are calling Element#children() on the same element concurrently, a race condition could happen when the method was generating the internal child element cache (a filtered view of its child nodes). Since concurrent reads of DOM objects should be threadsafe without external synchronization, this method has been updated to execute atomically. #2366
  • Malformed HTML could throw an IndexOutOfBoundsException during the adoption agency algorithm. #2377.
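The Element#children() fix (#2366) describes a lazily built cache that is now generated atomically. The sketch below shows one common way to get that property, using a synchronized accessor; it is an assumption for illustration and may not match how jsoup implements it. All types here (CachedChildrenSketch, Node, Elem) are hypothetical stand-ins, not jsoup classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a DOM parent node with a lazily cached, filtered child view.
final class CachedChildrenSketch {
    static class Node {}
    static final class Elem extends Node {}

    private final List<Node> childNodes = new ArrayList<>();
    private List<Elem> cachedElements; // null means "not built yet"

    synchronized void appendChild(Node child) {
        childNodes.add(child);
        cachedElements = null; // invalidate the cached view on mutation
    }

    // Synchronized so concurrent readers never observe a half-built list.
    synchronized List<Elem> children() {
        if (cachedElements == null) {
            List<Elem> filtered = new ArrayList<>();
            for (Node n : childNodes) {
                if (n instanceof Elem) filtered.add((Elem) n);
            }
            cachedElements = Collections.unmodifiableList(filtered);
        }
        return cachedElements;
    }

    // Starts several reader threads and counts how many saw the expected element count.
    static int countCorrectReaders(CachedChildrenSketch parent, int threads, int expected) {
        AtomicInteger correct = new AtomicInteger();
        List<Thread> readers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                if (parent.children().size() == expected) correct.incrementAndGet();
            });
            readers.add(t);
            t.start();
        }
        for (Thread t : readers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return correct.get();
    }
}
```

Without the synchronization, two threads could both find the cache unset and one could read the list while the other is still filling it, which is the kind of race the note describes.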

1.21.1 (2025-Jun-23)

Changes

  • Removed previously deprecated methods. #2317
  • Deprecated the :matchText pseudo-selector due to its side effects on the DOM; use the new ::textnode selector and the Element#selectNodes(String css, Class<T> type) method instead. #2343
  • Deprecated Connection.Response#bufferUp() in lieu of Connection.Response#readFully() which can throw a checked IOException.
  • Deprecated internal methods Validate#ensureNotNull (replaced by typed Validate#expectNotNull); protected HTML appenders from Attribute and Node.
  • If you happen to be using any of the deprecated methods, please take the opportunity now to migrate away from them, as they will be removed in a future release.

Improvements

  • Enhanced the Selector to support direct matching against nodes such as comments and text nodes. For example, you can now find an element that follows a specific comment: ::comment:contains(prices) + p will select p elements immediately after a <!-- prices: --> comment. Supported types include ::node, ::leafnode, ::comment, ::text, ::data, and ::cdata. Node contextual selectors like ::node:contains(text), :matches(regex), and :blank are also supported. Introduced Element#selectNodes(String css) and Element#selectNodes(String css, Class<T> nodeType) for direct node selection. #2324
  • Added TagSet#onNewTag(Consumer<Tag> customizer): register a callback that’s invoked for each new or cloned Tag when it’s inserted into the set. Enables dynamic tweaks of tag options (for example, marking all custom tags as self-closing, or everything in a given namespace as preserving whitespace).
  • Made TokenQueue and CharacterReader autocloseable, to ensure that they will release their buffers back to the buffer pool, for later reuse.
  • Added Selector#evaluatorOf(String css), as a clearer way to obtain an Evaluator from a CSS query. An alias of QueryParser.parse(String css).
  • Custom tags (defined via the TagSet) in a foreign namespace (e.g. SVG) can be configured to parse as data tags.
  • Added NodeVisitor#traverse(Node) to simplify node traversal calls (vs. importing NodeTraversor).
  • Updated the default user-agent string to improve compatibility. #2341
  • The HTML parser now allows the specific text-data type (Data, RcData) to be customized for known tags. (Previously, that was only supported on custom tags.) #2326.
  • Added Connection.Response#readFully() as a replacement for Connection.Response#bufferUp() with an explicit IOException. Similarly, added Connection.Response#readBody() over Connection.Response#body(). Deprecated Connection.Response#bufferUp(). #2327
  • When serializing HTML, the < and > characters are now escaped in attributes. This helps prevent a class of mutation XSS attacks. #2337
  • Changed Connection to prefer using the JDK's HttpClient over HttpUrlConnection, if available, to enable HTTP/2 support by default. Users can disable via -Djsoup.useHttpClient=false. #2340

Bug Fixes

  • The contents of a script in a svg foreign context should be parsed as script data, not text. #2320
  • Tag#isFormSubmittable() was updating the Tag's options. #2323
  • The HTML pretty-printer would incorrectly trim whitespace when text followed an inline element in a block element. #2325
  • Custom tags with hyphens or other non-letter characters in their names now work correctly as Data or RcData tags. Their closing tags are now tokenized properly. #2332
  • When cloning an Element, the clone would retain the source's cached child Element list (if any), which could lead to incorrect results when modifying the clone's child elements. #2334
  • When parsing HTML with svg:script elements in SVG elements, don't enter the Text insertion mode, but continue to parse as foreign content. Otherwise, misnested HTML could then cause an IndexOutOfBoundsException. #2374

... (truncated)

Commits
  • b02837b [maven-release-plugin] prepare release jsoup-1.21.2
  • 1f0c207 v1.21.2 release date
  • b093463 Use central-publishing-maven-plugin
  • 615b959 Updating sonatype deploy URLs
  • 6961720 Bump org.apache.maven.plugins:maven-javadoc-plugin from 3.11.2 to 3.11.3 (#2386)
  • 82864b2 Bump jetty.version from 9.4.57.v20241219 to 9.4.58.v20250814 (#2385)
  • 71f963e Fix for HTML that breaks the select scope
  • 6b20f6e Removed effective recursion closing </select>
  • eb2957a Bump actions/checkout from 4 to 5 (#2382)
  • 3a9a6c7 Fix ProxyTest in CI
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [org.jsoup:jsoup](https://github.com/jhy/jsoup) from 1.16.1 to 1.21.2.
- [Release notes](https://github.com/jhy/jsoup/releases)
- [Changelog](https://github.com/jhy/jsoup/blob/master/CHANGES.md)
- [Commits](jhy/jsoup@jsoup-1.16.1...jsoup-1.21.2)

---
updated-dependencies:
- dependency-name: org.jsoup:jsoup
  dependency-version: 1.21.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot added the dependencies (Pull requests that update a dependency file) and java (Pull requests that update java code) labels Aug 26, 2025
@dependabot dependabot bot requested a review from bmc08gt August 26, 2025 02:31