
Cache #4

Open
pawamoy opened this issue May 9, 2022 · 7 comments


pawamoy commented May 9, 2022

Is your feature request related to a problem? Please describe.
Generating text/images/svg can slow down rendering a lot.

Describe the solution you'd like
Allow caching things either during serve (in memory), or across multiple builds (on the filesystem).
A cache option could be added. If given a boolean value, use a hash of the code block's contents as the ID.
Otherwise, use the cache option's value as the ID (only useful for a cross-build cache).
Items can then be deleted from the cache by deleting the files in /tmp named after the ID.
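
A rough sketch of how the ID could be resolved (the `cache` option and the hashing scheme are illustrative, nothing is implemented yet):

```python
import hashlib

def cache_id(code: str, cache: bool | str) -> str | None:
    """Resolve the cache ID for a code block, per the proposal above."""
    if cache is True:
        # Boolean value: derive the ID from the block's contents.
        return hashlib.sha256(code.encode()).hexdigest()
    if cache:
        # String value: a stable, user-chosen ID for cross-build caching.
        return cache
    return None  # Caching disabled for this block.
```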

pawamoy self-assigned this Jun 13, 2024
pawamoy added the feature (New feature or request), fund (Issue priority can be boosted), and insiders (Candidate for Insiders) labels Oct 25, 2024

15r10nk commented Dec 5, 2024

How about going one step further and using the cached output to test your documentation?
The cached output could be versioned in Git and tested in a separate step (a pytest plugin, or a markdown-exec-test command).

You could maybe write two files to disk: one output.approved and one output.new.
output.new would be generated and used by mkdocs serve. mkdocs build would only read output.approved (useful because you want to serve deterministic documentation).

The test could check if there are any differences between both files and provide a way to approve the changes (copy output.new -> output.approved).

It might be difficult to decide in which cases you want to regenerate the cache (the *.new files).
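
Something like this, roughly (file layout from above; the `--approve` flag and the standalone script are illustrative, not an existing tool):

```python
import shutil
import sys
from pathlib import Path

def check_outputs(root: str = ".", approve: bool = False) -> int:
    """Compare *.new files against their *.approved counterparts."""
    status = 0
    for new in Path(root).rglob("*.new"):
        approved = new.with_suffix(".approved")
        if approved.exists() and approved.read_text() == new.read_text():
            continue  # Output unchanged, nothing to do.
        if approve:
            shutil.copy(new, approved)  # Accept the new output.
        else:
            print(f"output changed: {new}")
            status = 1
    return status

if __name__ == "__main__":
    sys.exit(check_outputs(approve="--approve" in sys.argv))
```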


pawamoy commented Dec 5, 2024

That would be useful to prevent involuntary changes, yes. Reproducibility is an important aspect, even though I was more interested in the performance one. I suppose we could hit two (virtual) birds with one stone and provide options that would allow users to do that themselves, easily? For example by configuring the cache folder to be docs/assets or something like that. This way generated files can be tracked. If they exist, Markdown-Exec uses them. A global option or env var would tell Markdown-Exec not to use the cache, and users would be able to run git diff after a build to assert "no changes".
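
In CI that assertion could look something like this (the env var name is made up for illustration; no such option exists yet):

```bash
# Rebuild everything, bypassing the cache, then fail if any tracked
# generated file under docs/assets changed.
MARKDOWN_EXEC_NO_CACHE=1 mkdocs build
git diff --exit-code docs/assets
```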

15r10nk commented Dec 6, 2024

But you probably don't want to regenerate your docs just to test the docs. A dedicated tool/pytest plugin would be useful to execute just the tests and compare the output with the last output on disk.
Is it possible to run the core part of markdown-exec which generates the code, without MkDocs?


pawamoy commented Dec 6, 2024

Is it possible to run the core part of markdown-exec which generates the code, without MkDocs?

Not currently. Markdown-Exec is based on PyMDown-Extensions' SuperFences, which is a Python-Markdown extension, so we need to run markdown.convert(...) on each page with these extensions enabled. Pages can be autogenerated through MkDocs plugins, etc. Additionally, code blocks might use things that are only available when running through MkDocs, typically the MKDOCS_CONFIG_DIR environment variable that Markdown-Exec sets when enabled as a MkDocs plugin (though this env var can easily be set manually when testing too).
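
For reference, running it through Python-Markdown directly looks roughly like this (registration mirrors the documented SuperFences setup; treat the exact names and configuration as indicative):

```python
from markdown import Markdown
from markdown_exec import formatter, validator

md = Markdown(
    extensions=["pymdownx.superfences"],
    extension_configs={
        "pymdownx.superfences": {
            "custom_fences": [
                # Let Markdown-Exec handle `python` fences with exec="true".
                {"name": "python", "class": "python", "validator": validator, "format": formatter},
            ],
        },
    },
)

html = md.convert('```python exec="true"\nprint("hello")\n```')
```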


pawamoy commented Dec 6, 2024

I wonder if testing the code blocks shouldn't be done separately. If you have a lot of them, it's probably a good idea anyway to write them in their own Python (or other) files, and inject them with pymdownx-snippets, which also allows easier testing since those files can now be imported/used anywhere.
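
Something along these lines (the snippet path is illustrative):

````md
```python exec="true"
--8<-- "docs/snippets/example.py"
```
````

The snippets extension injects the file's contents into the fence before Markdown-Exec runs it, and the same example.py can be imported from your test suite.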

15r10nk commented Dec 6, 2024

Some context:
I build the tests for my documentation myself. This is what my tests usually look like:

https://github.com/15r10nk/inline-snapshot/blob/69aa4e6daff81c57cd07ccc6379649b536125693/docs/in_snapshot.md?plain=1#L8-L19

The HTML comments are evaluated here: https://github.com/15r10nk/inline-snapshot/blob/main/tests/test_docs.py

I started to use markdown-exec in my documentation:

https://github.com/15r10nk/inline-snapshot/blob/69aa4e6daff81c57cd07ccc6379649b536125693/docs/pytest.md?plain=1#L42-L65

Which looks great 👍, but caused two issues so far:

15r10nk/inline-snapshot#98
15r10nk/inline-snapshot#122

The problem is that the code is always evaluated again, and that it is difficult to check (manually) if the output is correct.

I would like to have the same safety with the markdown-exec documentation that I have with my own tests.

Saving the output in Git and using exactly this output for mkdocs build would already be really useful.

Rerunning the tests in CI would also be nice.
Maybe this could be possible if you wrote not only the output to disk but also the input (this could solve the issue of code blocks which are dynamically generated).

If you have a lot of them, it's probably a good idea anyway to write them in their own Python (or other) files

I don't like indirection in my documentation (and in general). I want the tests in the code blocks to be part of my documentation.

Another idea is that you not only save the output to disk but the input too:

```
=== input
code block content
...
=== output
generated output
...
```

This would make it easier to understand ... because you have less indirection 😃


pawamoy commented Dec 10, 2024

Thank you for the context @15r10nk, this is super helpful!

I believe the two issues you mention could be solved by making sure your code blocks actually fail when something goes wrong, in which case markdown-exec logs a warning, which will make strict MkDocs builds fail too.

For example, your Bash code block (https://github.com/15r10nk/inline-snapshot/blob/69aa4e6daff81c57cd07ccc6379649b536125693/docs/pytest.md?plain=1#L42-L65) could use set -e so that the absence of a pytest command would make the script fail, and markdown-exec would report that as a warning.

Same thing for the issue where pytest is found but dirty-equals is missing: pytest would fail with code 1, and Bash's set -e would propagate that. The alternative is to store $? in your run function right after "$@", and return it again after the last echo; this would make the script exit with that value, since the run line is last. set -e is probably easier and more robust.
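
A minimal sketch of both variants (the run function here is modeled on the linked block, not copied from it):

```bash
set -e  # Abort on the first failing command, so errors are never silent.

run() {
  echo "\$ $*"  # Print the command as it should appear in the docs.
  "$@"          # Run it; with set -e, a non-zero status stops the script.
}

# Without set -e, you would capture and re-return the exit code instead:
# run() { echo "\$ $*"; "$@"; local status=$?; echo; return $status; }

run pytest --version
```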

In short, make sure that errors in code blocks are not silent (especially in Bash code blocks) 🙂

I don't like indirection in my documentation (and in general). I want the tests in the code blocks to be part of my documentation.

Fair concern! No IDE will understand the indirection and let you Ctrl-click the filepath, I believe, so that makes it harder indeed. However, with inlined code blocks you don't enjoy the same level of editor support (auto-completion, linting, etc.), and you need specialized tools to format code in Markdown (I think Ruff still doesn't do that yet?). So yeah, that's a tradeoff I suppose.

Another idea is that you not only save the output to disk but the input too.

Interesting! So, IIUC:

  • upon building docs
  • (optional: with a special markdown-exec toggle)
  • markdown-exec writes both code block inputs (the code itself?) and outputs (as in captured output, which could be stdout/stderr/both/nothing) into a user-configured location
  • user can track these files in Git
  • some utility can re-execute these inputs and compare the outputs in CI (see the sketch after this list)
  • this utility could be provided by markdown-exec as well (since it already knows how to execute code)
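
Roughly, such a utility could do this (the .exec_cache layout and the file naming are made up for illustration):

```python
import pathlib
import subprocess
import sys

cache = pathlib.Path(".exec_cache")  # Hypothetical dump location.
ok = True
for source in sorted(cache.glob("*.in.py")):
    expected = source.with_name(source.name.replace(".in.py", ".out"))
    # Re-execute the stored input and compare against the stored output.
    result = subprocess.run([sys.executable, str(source)], capture_output=True, text=True)
    if result.stdout != expected.read_text():
        print(f"output changed: {source.name}")
        ok = False
sys.exit(0 if ok else 1)
```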

I can see the value of such a feature. I'd just like to note that in most cases you only want the code block to "succeed", without really checking the output, since this output could change (yet still be valid) for many reasons (for example: a new pytest version slightly updating the output format), and validating it manually every time could be a lot of maintenance work.

I'd love to see a few examples where we would actually want to assert the exact value of the output, if you can think of some!
