
Snippets: don't parse the entire article: 1000+maxSymbols is enough
edwardspec committed Dec 30, 2024
1 parent 1828250 commit 4a55c2a
Showing 2 changed files with 7 additions and 0 deletions.
1 change: 1 addition & 0 deletions ChangeLog
@@ -4,6 +4,7 @@ List of changes between releases of Extension:JsCalendar.

Features:
* Support PostgreSQL.
+* Removal of images from HTML snippets is now done with a proper HTML parser instead of regexes.

== JsCalendar 0.4.0 ==

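The ChangeLog entry above mentions replacing regexes with a proper HTML parser for image removal. A minimal sketch of that idea, assuming PHP's built-in DOM extension; `removeImages()` is a hypothetical helper, not the actual `HtmlSanitizer::sanitizeSnippet()` code, which this commit does not show:

```php
<?php
// Hypothetical helper (not the extension's HtmlSanitizer): removes <img>
// elements from an HTML fragment using PHP's DOM extension instead of regexes.
function removeImages( string $html ): string {
	$doc = new DOMDocument();

	// Wrap the fragment in a <div> so it has one known root element.
	// The leading XML declaration makes libxml treat the input as UTF-8.
	$wrapped = '<?xml encoding="UTF-8"><div>' . $html . '</div>';

	// Collect (and then discard) parser warnings about HTML5 tags or
	// malformed markup instead of emitting them as PHP warnings.
	$previous = libxml_use_internal_errors( true );
	$doc->loadHTML( $wrapped );
	libxml_clear_errors();
	libxml_use_internal_errors( $previous );

	// The node list is live, so copy it before removing nodes from the tree.
	foreach ( iterator_to_array( $doc->getElementsByTagName( 'img' ) ) as $img ) {
		$img->parentNode->removeChild( $img );
	}

	// The wrapper is the first <div> in document order; serialize only its children.
	$root = $doc->getElementsByTagName( 'div' )->item( 0 );
	$result = '';
	foreach ( $root->childNodes as $child ) {
		$result .= $doc->saveHTML( $child );
	}
	return $result;
}

echo removeImages( '<p>Hi <img src="a.png"> there</p>' ), "\n";
```

Unlike a regex, the parser also copes with attributes that contain `>` or with uppercase `<IMG>` tags, because it works on the element tree rather than on raw text.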
6 changes: 6 additions & 0 deletions includes/EventCalendar.php
@@ -232,6 +232,12 @@ public function findEvents( array $opt, Parser $recursiveParser ) {
// NOTE: we can't use getParserOutput() here, because we are already inside Parser::parse().
$snippet = $recursiveParser->recursiveTagParseFully( $row->text );

+// Truncate to the maximum allowed length PLUS some extra (in case some HTML tags get removed),
+// so that we don't have to sanitize the whole article (potentially 10-80 KB),
+// but also so that the snippet doesn't become shorter than $maxSymbols after the tags are removed.
+$extraSymbols = 1000;
+$snippet = mb_substr( $snippet, 0, $maxSymbols + $extraSymbols );
+
// Remove the image tags: in 99.9% of cases they are too wide to be included in the calendar.
$snippet = HtmlSanitizer::sanitizeSnippet( $snippet );

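The hunk above bounds the sanitizer's work by truncating before sanitizing. A sketch of that order of operations, where `buildSnippet()` and the `$sanitize` callback are hypothetical stand-ins, and the final cut to `$maxSymbols` is an assumption (it happens outside the lines shown in this diff):

```php
<?php
// Hypothetical sketch of the flow in the hunk above. $sanitize stands in
// for HtmlSanitizer::sanitizeSnippet(), whose code is not part of this commit.
function buildSnippet( string $html, int $maxSymbols, callable $sanitize ): string {
	// Cut early, leaving headroom for markup that the sanitizer will strip,
	// so the sanitizer never has to parse the whole (10-80 KB) article.
	$extraSymbols = 1000;
	$html = mb_substr( $html, 0, $maxSymbols + $extraSymbols );

	// Strip unwanted markup such as <img> tags.
	$html = $sanitize( $html );

	// Assumption: the final cut to $maxSymbols happens after sanitizing.
	return mb_substr( $html, 0, $maxSymbols );
}

$snippet = buildSnippet( str_repeat( 'a', 50000 ), 200, fn ( string $s ): string => $s );
echo mb_strlen( $snippet ), "\n";
```

With `$maxSymbols = 200`, the sanitizer sees at most 1,200 characters regardless of article length. The trade-off is that if the markup removed from those 1,200 characters exceeds 1,000 characters, the result can still come out shorter than `$maxSymbols` even when the article has more text.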