mirror of
https://github.com/zulip/zulip.git
synced 2025-11-13 10:26:28 +00:00
The transforms called from `build_message_payload` use `lxml.html.fromstring` to parse (and stringify, and re-parse) the HTML generated by Markdown. However, this function fails if it is passed an empty document. "Empty" is broader than just the empty string; it also includes any document made up entirely of control characters, spaces, unpaired surrogates, U+FFFE, U+FFFF, and so forth. Such documents fail to parse and raise a `ParserError`.

Using `lxml.html.fragment_fromstring` handles these cases, but does so by wrapping the contents in a `<div>` every time it is called. As such, replacing each `fromstring` with `fragment_fromstring` would nest another layer of `<div>` with each call.

Instead of having each helper function re-parse, modify, and stringify the HTML, parse it once with `fragment_fromstring` and pass the parsed document to each helper, which modifies it in place. This adds one outer `<div>`, requiring minor changes to tests and the prepend-sender functions.

The modification to add the sender still uses BeautifulSoup, as that sort of transform is much less readable, and more fiddly, in raw lxml.

Partial fix for #19559.
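The parse-once, mutate-in-place shape described above can be sketched as follows. This is an illustration only, not the code from this commit: it uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml so that it runs without third-party packages (unlike lxml's HTML parser, it accepts only well-formed markup), and the names `parse_fragment`, `relative_to_full_links`, `add_image_alt`, and `build_payload` are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_fragment(html: str) -> ET.Element:
    """Parse the markup once, wrapped in a single outer <div>.

    Stand-in for a fragment parser that adds a parent element. Because of
    the wrapper, an empty or whitespace-only input still yields a valid
    (empty) tree instead of raising a parse error.
    """
    return ET.fromstring(f"<div>{html}</div>")


def relative_to_full_links(fragment: ET.Element, base_url: str) -> None:
    """Hypothetical helper: rewrite root-relative hrefs in place."""
    for link in fragment.iter("a"):
        href = link.get("href", "")
        if href.startswith("/"):
            link.set("href", base_url + href)


def add_image_alt(fragment: ET.Element, default_alt: str) -> None:
    """Hypothetical helper: give every <img> an alt attribute, in place."""
    for img in fragment.iter("img"):
        if img.get("alt") is None:
            img.set("alt", default_alt)


def build_payload(html: str, base_url: str) -> str:
    fragment = parse_fragment(html)  # parsed exactly once
    # Each helper receives the parsed tree and mutates it; none of them
    # re-parses or re-serializes, so no extra <div> layers accumulate.
    relative_to_full_links(fragment, base_url)
    add_image_alt(fragment, "image")
    return ET.tostring(fragment, encoding="unicode")  # serialized exactly once


if __name__ == "__main__":
    print(build_payload('<p><a href="/x">hi</a></p>', "https://chat.example.com"))
    # An empty document no longer raises; it serializes to an empty <div>.
    print(build_payload("", "https://chat.example.com"))
```

Because the wrapper is added in one place only, callers that need the original markup back have exactly one outer `<div>` to account for, which matches the test and prepend-sender adjustments mentioned above.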