Commit Graph

211 Commits

Author SHA1 Message Date
Alex Vandiver
6a40c17ccf markdown: CSS-escape preview links.
This adds `soupsieve` as an explicit dependency, but intentionally
does not adjust the provision version, as it was already an indirect
dependency.
2021-10-26 18:17:23 -07:00
Alex Vandiver
52f74bbd9b markdown: Run URL preview links through camo.
Not proxying these requests through camo is a security concern.
Furthermore, on the desktop client, any embed image which is hosted on
a server with an expired or otherwise invalid certificate will trigger
a blocking modal window with no clear source and a confusing error
message; see zulip/zulip-desktop#1119.

Rewrite all `message_embed_image` URLs through camo, if it is enabled.
2021-10-26 18:17:23 -07:00
Anders Kaseorg
58920affd4 python: Remove re.UNICODE flag (redundant in Python 3).
https://docs.python.org/3/library/re.html#re.A

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-10-22 13:42:29 -07:00
Alex Vandiver
9381a3bd45 linkifiers: Support URL percent-encoded bytes.
Supporting URL percent-encoded bytes is possible using `%%20`, but this
is not necessarily very understandable to end-users, even those that
understand percent encoding.

Allow `%20` in linkifier URL format strings, and transform them into
`%%20` in the pattern just before they are applied in markdown
translation.  Care must be taken here, such that already-escaped `%`s
are not escaped an extra time.

We do this before rendering, and not before storage, as
a simplification; the JS-side linkifier at present only understands
`%(foo)s` and thus needs no changes, and to avoid an un-escaping pass
before showing in the admin UI.
2021-10-22 13:00:20 -07:00
Anders Kaseorg
4839b7ed27 url_preview: Interpret og:image relative to full page URL.
og:image is supposed to be an absolute URL, but some sites incorrectly
provide a relative URL.  In this case, it makes more sense to
interpret it relative to the full page URL after redirects, rather
than relative to just the domain part of the page URL before
redirects.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-10-21 12:20:37 -07:00
Alex Vandiver
db934be064 CVE-2021-41115: Use re2 for user-supplied linkifier patterns.
Zulip attempts to validate that the regular expressions that admins
enter for linkifiers are well-formatted, and only contain a specific
subset of regex grammar.  The process of checking these
properties (via a regex!) can cause denial-of-service via
backtracking.

Furthermore, this validation itself does not prevent the creation of
linkifiers which themselves cause denial-of-service when they are
executed.  As the validator accepts literally anything inside of a
`(?P<word>...)` block, any quadratic backtracking expression can be
hidden therein.

Switch user-provided linkifier patterns to be matched in the Markdown
processor by the `re2` library, which is guaranteed constant-time.
This somewhat limits the possible features of the regular
expression (notably, look-head and -behind, and back-references);
however, these features had never been advertised as working in the
context of linkifiers.

A migration removes any existing linkifiers which would not function
under re2, after printing them for posterity during the upgrade; they
are unlikely to be common, and are impossible to fix automatically.

The denial-of-service in the linkifier validator was discovered by
@erik-krogh and @yoff, as GHSL-2021-118.
2021-10-04 21:26:24 +00:00
Tim Abbott
545911b051 markdown: Remove useless locless_schemes check.
This check was copied from upstream python-markdown's "safe mode"
before they removed that feature.  The upstream history is that they
introduced this check in
2db5d1c8e4,
which was not a complete security check, and then added the
immediately following check (with an allowlist of schemes) in
0b4ffbb60e.

Their first, incomplete check provides no security benefit and makes
the code hard to reason about, so we remove it.
2021-09-09 09:03:40 -07:00
rht
c24ab8c4d3 markdown: Expand list of safelisted URL schemes to match HTML spec. 2021-09-09 09:03:40 -07:00
Anders Kaseorg
66ad6a4583 docs: Inline code spans are not blocks.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-09-07 16:12:39 -07:00
Anders Kaseorg
646c04eff2 Rename default branch to ‘main’.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-09-06 12:56:35 -07:00
Alex Vandiver
4d428490fd outgoing_http: Use OutgoingSession subclasses in more places.
This adds the X-Smokescreen-Role header to proxy connections, to track
usage from various codepaths, and enforces a timeout.  Timeouts were
kept consistent with their previous values, or set to 5s if they had
none previously.
2021-09-01 05:34:13 -07:00
Priyansh Garg
1e51c23494 markdown: Remove unnecessary checks for zulip_message.
This commits removes some unnecessary checks for `self.md.zulip_message`,
which were put there historically, as earlier we used to add the additional
properties like mentions_user_ids, alert_words, etc. to Message dict
only. These were later moved to MessageRenderingResult class in commit
75cea329b but the checks weren't removed.

This is important because while rendering the messages imported from
other chat tools (like Rocket.Chat), the Message dict is not passed to
the markdown, due to which the checks for `self.md.zerver_message` fails
and hence, things like user mentions, stream/topic mentions are not
rendered in the imported messages properly.
2021-08-31 16:53:42 -07:00
Anders Kaseorg
4206e5f00b python: Remove locally dead code.
These changes are all independent of each other; I just didn’t feel
like making dozens of commits for them.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-19 01:51:37 -07:00
Anders Kaseorg
806494da06 markdown: Stream and parse incrementally in fetch_open_graph_image.
This way we can stop reading as soon as we get to the body.  Also,
send an Accept header, check that the request was actually successful,
use lxml.etree.iterparse instead of a broken hand-rolled state
machine, and support XHTML, all for negative 28 lines of code.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-05 09:17:32 -07:00
Priyansh Garg
0a875c1c4c markdown: Fix jpeg extension in IMAGE_EXTENSIONS. 2021-08-05 08:54:02 -07:00
Anders Kaseorg
42fa62e563 Revert "time_widget: Make the generated time string more readable."
This reverts commit 1965584eec.

This syntax has a bad interaction with table syntax and needs to be
rethought.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-03 16:45:31 -07:00
Ganesh Pawar
1965584eec time_widget: Make the generated time string more readable.
Before: <time:2021-07-14T00:14:00-07:00>
After: <time:2021-07-14|00:14:00|UTC-07:00>

Fixes #19205
2021-08-02 23:17:01 -07:00
Anders Kaseorg
3665deb93a python: Remove unnecessary intermediate lists.
Generated automatically by pyupgrade.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-02 15:53:52 -07:00
Anders Kaseorg
162e9d6c0b fenced_code: Optimize FENCE_RE to fix cubic worst-case complexity.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-07-22 16:40:44 -07:00
Anders Kaseorg
c56440ded0 requirements: Upgrade Python requirements.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-07-05 12:23:06 -07:00
Priyansh Garg
94a2be06f3 markdown: Use a shared variable for IMAGE_EXTENSION. 2021-07-02 11:22:55 -07:00
akshatdalton
44a298b671 minor: Use OUTER_CAPTURE_GROUP variable instead of string value. 2021-06-25 17:43:27 -07:00
akshatdalton
490f6b6880 markdown: Extract regex in local variables. 2021-06-25 17:43:01 -07:00
PIG208
75cea329b4 markdown: Refactor out additional properties added to Message.
This adds a new class called MessageRenderingResult to contain the
additional properties we added to the Message object (like alert_words)
as well as the rendered content to ensure typesafe reference. No
behavioral change is made except changes in typing.

This is a preparatory change for adding django-stubs to the backend.

Related: #18777
2021-06-24 18:14:53 -07:00
akshatdalton
c507931ac8 refactor: Export non-markdown logic in mention.py. 2021-06-14 13:26:30 -07:00
Wesley Aptekar-Cassels
d5ba94082a markdown: Increase max rendered message length to 1MB.
This should help with #17425, where messages with lots of LaTeX are
lost, due to the large expansion factor.

This isn't a total fix for this - large messages with lots of LaTeX
can still end up larger than 1MB, and rendering could timeout, but
this fix should help significantly.

1MB is still small enough that I don't expect we'll run into any DOS
problems - my testing didn't show any problems rendering messages that
contain ~1MB of LaTeX.
2021-06-03 10:10:35 -07:00
akshatdalton
7df62ebbaf settings: Make MAX_MESSAGE_LENGTH a server-level setting.
This will offer users who are self-hosting to adjust
this value. Moreover, this will help to reduce the
overall time taken to test `test_markdown.py` (since
this can be now overridden with `override_settings`
Django decorator).

This is done as a prep commit for #18641.
2021-06-03 09:26:28 -07:00
akshatdalton
832c763c38 minor: Remove unnecessary __init__ method in InlineInterestingLinkProcessor.
Subclass `Treeprocessor` takes care of the `__init__` method.
2021-05-26 17:13:03 -07:00
Anders Kaseorg
bac96cae80 markdown: Fix Dropbox image previews.
?dl=1 causes Dropbox to send Content-Type: application/binary, which
can’t be interpreted by Camo.  Use ?raw=1 instead.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-05-25 13:42:29 -07:00
akshatdalton
503247ebfa refactor: Add class CompiledInlineProcessor to de-duplicate code. 2021-05-23 14:30:22 -07:00
akshatdalton
78f26b6031 minor: Use super to initialize subclass. 2021-05-23 14:30:22 -07:00
akshatdalton
18203d8af3 markdown: Silence user group mention inside blockquotes. 2021-05-18 17:31:25 -07:00
akshatdalton
0245b590e9 markdown: Add support for user group silent mention.
Prior to this, we only supported direct mention to
the user groups. This commit extends that support
to silent mention for the user groups.
A related test case is also added.

Fixes: #11711.
2021-05-18 17:31:25 -07:00
akshatdalton
f56fca308a mention: Refactor USER_GROUP_MENTIONS_RE and simplify its related code path.
Earlier, USER_GROUP_MENTIONS_RE was:
r"(?<![^\s\'\"\(,:<])@(\*[^\*]+\*)"

For the syntax: *foo*, this was unnecessarily capturing it as
*foo* and the extraction of `foo` was done using another helper
function: `extract_user_group`.

This is now changed as:
r"(?<![^\s\'\"\(,:<])@(\*(?P<match>[^\*]+)\*)"

and extraction of `foo` can be done just by using the named capture
group `match`.

This change also helps to simplify its related code path.
2021-05-18 17:31:25 -07:00
akshatdalton
d5a36ac5e2 mention: Refactor MENTIONS_RE and simplify its related code path.
Earlier, MENTIONS_RE was:
r"(?<![^\s\'\"\(,:<])@(?P<silent>_?)(?P<match>\*\*[^\*]+\*\*)"

For the syntax: **foo**, this was unnecessarily capturing it as
**foo** and adding extra operation for the extraction of `foo`.

This is now changed as:
r"(?<![^\s\'\"\(,:<])@(?P<silent>_?)(\*\*(?P<match>[^\*]+)\*\*)"

and extraction of `foo` can be done just by using the named capture
group `match`.

This change also helps to simplify its related code path.
2021-05-18 17:31:25 -07:00
akshatdalton
a9d89b3c56 minor: Convert unicode_emoji_regex to uppercase.
Following the convention, we use uppercase for
regex. Also, `unicode_emoji_regex` is given a
conventional name ending with `*_RE`: `UNICODE_EMOJI_RE`.
2021-05-18 17:31:25 -07:00
akshatdalton
ffc4724287 minor: Convert emoticon_regex to uppercase.
Following the convention, we use uppercase for
regex. Also, `emoticon_regex` is given a
conventional name ending with `*_RE`: `EMOTICON_RE`.
2021-05-18 17:31:25 -07:00
akshatdalton
9f6e6709d3 minor: Convert user_group_mentions to uppercase.
Following the convention, we use uppercase for
regex. Also, `user_group_mentions` is given a
conventional name ending with `*_RE`: `USER_GROUP_MENTIONS_RE`.
2021-05-18 17:31:25 -07:00
akshatdalton
0a01b1b28e minor: Convert find_mentions to uppercase.
Following the convention, we use uppercase for
regex. Also, `find_mentions` is given a
conventional name ending with `*_RE`: `MENTIONS_RE`.
2021-05-18 17:31:25 -07:00
Ganesh Pawar
529f72fa3f markdown: Add support for sms and tel links.
Fixes #18390
2021-05-10 15:15:34 -07:00
akshatdalton
55f4996f16 markdown: Fix silent wildcard mentions bug.
A message containing wildcard mention when quoted (which
is turned into a silent mention) or message with silent
wildcard mention notifies the users by sending desktop,
sound, and missed message email notifications. This
is clearly a bug which is fixed by this commit.

Fixes: #18354.
2021-05-10 12:19:40 -07:00
Anders Kaseorg
d0c6f4f400 python: Strip leading and trailing spaces from docstrings.
This is enforced by Black ≥ 21.4b0.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-05-07 22:42:39 -07:00
Anders Kaseorg
995389b4c1 markdown: Don’t apply further Markdown processing to KaTeX output.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-05-01 15:43:35 -07:00
Wesley Aptekar-Cassels
6b7a3fb74a markdown: Rewrite all external images to use Camo.
Requesting external images is a privacy risk, so route all external
images through Camo.

Tweaked by tabbott for better test coverage, more comments, and to fix
bugs.
2021-04-30 10:36:16 -07:00
Anders Kaseorg
e3f2ffa681 docs: Capitalize “Markdown” consistently.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-04-26 09:31:08 -07:00
Anders Kaseorg
178736c8eb docs: Fix spelling errors caught by codespell.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-04-26 09:31:08 -07:00
Tim Abbott
2e928a0853 markdown: Remove logic for creating markdown engines for all realms.
This logic likely never ran due to a combination of bugs.

* Running `maybe_update_markdown_engines` unconditionally meant that
  `if md_engine_key in md_engines` was likely always true.
* Introduced in 65838bb: DEFAULT_MARKDOWN_KEY could never be in
  md_engines, so should we have ever reached that code path, we'd have
  tried to rebuild all markdown engines every time.

And it also wasn't clearly helpful -- because we fetch all linkifiers
for a realm on every request anyway, we don't really save database
queries by doing a bulk fetch on startup, and doing so would likely
result in a material regression to Zulip's overall startup time that
we were creating markdown engines for large numbers of realms in bulk
during process startup.
2021-04-13 09:18:18 -07:00
Abhijeet Prasad Bodas
52a86d9604 linkifiers: Use dictionaries for internal structures.
This change does not affect the API in anyway.
All internal code now uses dictionaries to
denote a linkifier, instead of tuples.
2021-04-05 18:16:08 -07:00
Abhijeet Prasad Bodas
68fe912c63 refactor: Rename most of "filter" to "linkifier".
After this only the database table, events,
and API endpoints remain.
2021-04-05 18:14:07 -07:00
Abhijeet Prasad Bodas
f896a7667f refactor: Update some uses of "filter" to "linkifier".
This updates some comments and local variables
which could be changed without breaking other
stuff.
2021-04-05 18:14:07 -07:00