☂️ Handle remaining unimplemented content parsing features #921

Open
13 tasks
PIG208 opened this issue Aug 30, 2024 · 2 comments
Labels
a-content Parsing and rendering Zulip HTML content, notably message contents
Milestone

Comments

@PIG208
Member

PIG208 commented Aug 30, 2024

This is a follow-up to #190 and #917, to serve as an umbrella issue for unimplemented features to be handled.

Current features:

Legacy features/behaviors that have been removed:

  • <div class="inline-preview-twitter">: No longer useful since the Twitter API change, but there are some existing messages (example).
  • <span class="timestamp">: Added -> Removed (example)
  • <img class="message_body_gravatar">: Added -> Removed (example)
  • <span class="inline-subscribe">: Added -> Removed (example)
  • <div class="message_inline_image">: message_inline_image is still in use, but something specific about how some older messages were structured may cause the parser to fail (example)
Old/New
  <p>
-   <a href="https://www.dropbox.com/s/7vb5y14vr73lt3r/2018-07-27%2018.51.30.jpg?dl=0" target="_blank" title="https://www.dropbox.com/s/7vb5y14vr73lt3r/2018-07-27%2018.51.30.jpg?dl=0">https://www.dropbox.com/s/7vb5y14vr73lt3r/2018-07-27%2018.51.30.jpg?dl=0</a>
+   <a href="https://www.dropbox.com/s/7vb5y14vr73lt3r/2018-07-27%2018.51.30.jpg?dl=0">https://www.dropbox.com/s/7vb5y14vr73lt3r/2018-07-27%2018.51.30.jpg?dl=0</a>
  </p>
  <div class="message_inline_image">
-   <a href="https://www.dropbox.com/s/7vb5y14vr73lt3r/2018-07-27%2018.51.30.jpg?dl=0" target="_blank" title="2018-07-27 18.51.30.jpg">
+   <a href="https://www.dropbox.com/s/7vb5y14vr73lt3r/2018-07-27%2018.51.30.jpg?dl=0" title="2018-07-27 18.51.30.jpg">
-     <img src="https://www.dropbox.com/s/7vb5y14vr73lt3r/2018-07-27%2018.51.30.jpg?dl=1">
+     <img src="/external_content/82ccbf2c8cb498e0ef6d5a78d18dca92d208c0eb/68747470733a2f2f7777772e64726f70626f782e636f6d2f732f37766235793134767237336c7433722f323031382d30372d323725323031382e35312e33302e6a70673f7261773d31">
    </a>
  </div>

Unfinished features from one-off CZO experiments:


Full Output

Found unimplemented features in 33742 out of 1295577 public messages:

  • <span class="topic-mention">
    Oldest message: 1609467; newest message: 1925998 (15/33742)

  • <table>
    Oldest message: 33947; newest message: 1931762 (348/33742)

  • <div class="codehilite">
    Oldest message: 3444; newest message: 1845456 (10175/33742)

  • <img>
    Oldest message: 1792633; newest message: 1792683 (3/33742)

  • <span class="tex-error">
    Oldest message: 176408; newest message: 1768950 (25/33742)

  • <span class="topic-mention silent">
    Oldest message: 1609468; newest message: 1704065 (2/33742)

  • <div class="inline-preview-twitter">
    Oldest message: 29260; newest message: 1574645 (22176/33742)

  • <span class="katex-display">
    Oldest message: 202662; newest message: 1355972 (19/33742)

  • <span class="timestamp-error">
    Oldest message: 925908; newest message: 1267388 (10/33742)

  • <div class="message_inline_ref">
    Oldest message: 61290; newest message: 945000 (26/33742)

  • <img class="message_body_gravatar">
    Oldest message: 15312; newest message: 927237 (55/33742)

  • <span class="timestamp">
    Oldest message: 882554; newest message: 908075 (38/33742)

  • <p>
    Oldest message: 176412; newest message: 908053 (65/33742)

  • <div class="message_inline_image">
    Oldest message: 4324; newest message: 751747 (709/33742)

  • <span class="katex">
    Oldest message: 308073; newest message: 426840 (3/33742)

  • <div class="message_embed">
    Oldest message: 192764; newest message: 193181 (28/33742)

  • <span class="inline-subscribe">
    Oldest message: 4297; newest message: 97826 (45/33742)
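
To help rank the entries above, here is a minimal sketch that parses this report format and sorts features by message count. It is not the script that produced the output; the names `REPORT`, `ENTRY`, and `parse_report` are hypothetical, and the sample is a three-entry excerpt copied from the listing above.

```python
import re

# Three entries copied from the report above; the full listing has the same shape.
REPORT = """\
  • <div class="codehilite">
    Oldest message: 3444; newest message: 1845456 (10175/33742)

  • <div class="inline-preview-twitter">
    Oldest message: 29260; newest message: 1574645 (22176/33742)

  • <div class="message_inline_image">
    Oldest message: 4324; newest message: 751747 (709/33742)
"""

# One entry: a bullet line holding the element, then a line with IDs and counts.
ENTRY = re.compile(
    r"•\s*(?P<feature><[^\n]+>)\s*\n"
    r"\s*Oldest message: (?P<oldest>\d+); newest message: (?P<newest>\d+) "
    r"\((?P<count>\d+)/(?P<total>\d+)\)"
)


def parse_report(text):
    """Return one dict per feature, sorted by affected-message count (descending)."""
    rows = []
    for match in ENTRY.finditer(text):
        count = int(match["count"])
        total = int(match["total"])
        rows.append({
            "feature": match["feature"],
            "oldest": int(match["oldest"]),
            "newest": int(match["newest"]),
            "count": count,
            "share": count / total,  # fraction of the affected messages
        })
    return sorted(rows, key=lambda row: row["count"], reverse=True)


if __name__ == "__main__":
    for row in parse_report(REPORT):
        print(f'{row["feature"]}: {row["count"]} ({row["share"]:.1%})')
```

The per-feature counts in the full output add up to exactly 33742, which suggests each affected message is attributed to a single feature, so the shares can be read as a partition of the affected messages.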


These features are categorized to help us determine the priorities.

  • The current features should be supported before launch. There can also be potential bugs in the content parser affecting current features.
  • Most of the legacy features (such as inline-subscribe) are irrelevant enough that we probably just need to acknowledge them and render them as plain text.
  • The one-off experiments can likely be addressed by just removing the messages from CZO, without us handling them.
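
The "acknowledge them and render them as plain text" option for legacy features could look roughly like the sketch below. It is a language-agnostic illustration written in Python, not the app's actual parser; `LEGACY_CLASSES`, `plain_text_fallback`, and the sample HTML in the usage note are hypothetical, with the class names taken from the legacy list in this issue.

```python
from html.parser import HTMLParser

# Assumption: class names from the legacy list in this issue that carry no
# behavior worth reimplementing.
LEGACY_CLASSES = {"inline-subscribe", "timestamp", "message_body_gravatar"}


class _TextCollector(HTMLParser):
    """Records the root element's classes and all text content."""

    def __init__(self):
        super().__init__()
        self.root_classes = None
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first (outermost) element decides whether this is legacy.
        if self.root_classes is None:
            self.root_classes = set((dict(attrs).get("class") or "").split())

    def handle_data(self, data):
        self.parts.append(data)


def plain_text_fallback(fragment):
    """Return the fragment's text if its root is a known legacy element, else None."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()
    if collector.root_classes and collector.root_classes & LEGACY_CLASSES:
        return "".join(collector.parts)
    return None  # not a recognized legacy feature; let the normal parser decide
```

For a made-up fragment such as `<span class="inline-subscribe">Subscribe to <b>design</b></span>`, this returns the inner text with the markup dropped, while an element whose class is not in the set returns `None` so it still surfaces as unimplemented.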

We might add more unimplemented features here as we find more of them later. We tested on all public messages from CZO (1295577 messages).

Also related:

@PIG208 PIG208 added the a-content Parsing and rendering Zulip HTML content, notably message contents label Aug 30, 2024
@chrisbobbe

This comment was marked as resolved.

@PIG208

This comment was marked as resolved.
