Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow use of guard pages without requiring a signal handler #2029

Merged
merged 1 commit into from
Nov 2, 2022

Conversation

shravanrn
Copy link
Collaborator

The Firefox use case of sandboxing untrusted libraries with Wasm is incompatible with use of signal handlers in wasm2c. This is because Firefox has its own signal handler, with a crash report, stack walking, telemetry handling etc. However, for speed, we still need to use the guard page (8gb model) to trap OOB linear memory accesses from sandboxed libraries (versus memory bounds/range checks). Given that Firefox also does not attempt to recover from any linear memory OOB, the easiest option is to give a macro that disables the install and use of signal handlers.

@shravanrn shravanrn changed the title Allow us of guard pages without requiring a signal handler Allow use of guard pages without requiring a signal handler Oct 27, 2022
@keithw
Copy link
Member

keithw commented Oct 28, 2022

This seems totally reasonable for Firefox to do -- I think the example runtime in the top-level wasm2c directory (especially the runtime "impl") is meant to be customized by the embedder to meet its environment and needs. It seems to me that Firefox should make this change in their runtime implementation and go for it, since they already have their own signal handler, etc. It's meant to be customized like that.

I'm more nervous about WABT taking it as an option for the example runtime implementation. That seems like a bit of a different question -- these files are demonstrating how to embed wasm2c-generated output when linked into a process on a POSIXy OS, and in that setting, turning off this code can let stack overflows or OOB turn in to a real segfault or illegal access/undefined behavior.

I know there have been strong feelings about this in the past (#1432), so it seems easier just to not pick a side and say, hey, the runtime implementation is up to the embedder; we provide an example demonstrating how to do this correctly on a POSIXy OS, but if you have some other environment, go wild and you do you... And if Firefox wants to do this in their runtime implementation, that seems great. WDYT?

@shravanrn
Copy link
Collaborator Author

shravanrn commented Oct 30, 2022

we provide an example demonstrating how to do this correctly on a POSIXy OS, but if you have some other environment, go wild and you do you...

Assuming we intend to ensure wasm2c can be considered a "production ready" implementation of Wasm (which is certainly something we want to do from our end, since it is being used in Firefox), I think there is a reasonable expectation that wasm2c provides a compiler and runtime that are more than just examples. I think its probably fair to say that the current runtime is not there yet, but this is part of the work we have scheduled on our end for the next few months (These include numerous memory and performance optimizations, a number of security bugs we have found in our audits, debugging aids etc. in addition to the expected tasks such as support for Wasm standards like SIMD etc.).

My thinking here is that any Wasm runtime is expected to provide knobs relevant to the production needs of their host applications --- this is something that many other wasm runtimes already do. The signal handler is one of several such knobs that is important in real settings. I think we are on the same page that this should not be on by default, and indeed this is hidden behind an extra macro which the embedder has to explicitly enable. I can add a second macro here called "WASM_RT_ALLOW_NON_STANDARDS_COMPLIANT" which is also disabled by default if that makes you less nervous.

It seems to me that Firefox should make this change in their runtime implementation and go for it, since they already have their own signal handler, etc.

Regarding the specific point about whether this can be maintained in a separate runtime --- I think while this is technically possible in this minor example, there are many additional changes (coming in future PRs that we can discuss specifically once those PRs are submitted) that are not maintainable in a minor runtime fork (including changes to "wasm2c.declarations.c"). Again, placing these as a non-default option in wasm2c, in my view, is the sensible trade-off.

A related but separate point here is that our motivation and effort to upstream is precisely to avoid the fragmentation when forking the runtime or compiler; our take is that maintaining a fork expends dev-cycles in a way that cannot be shared among those who have similar use cases. Our current dev resources certainly permit us to maintain both a separate wasm2c compiler and runtime fork indefinitely, but my belief here is that this should certainly not be our default option given a choice.

turning off this code can let stack overflows or OOB turn in to a real segfault or illegal access/undefined behavior.

The above would result in implementation defined behavior (it would not be undefined). The OOB memory access will hit a guard page which raises a signal to the host. This follows the path taken by all SFI tools prior to Wasm, which is that an out of bounds access results in a control transfer to the host who is free to do as they choose --- either terminate the program or choose to recover by some alternate means. The default behavior for a host without a signal handler is that the program terminates. To be clear I am not arguing that this is should be the default (this will be behind a macro flag), but I do not believe there is any security concern here (not withstanding any potential pre-existing security bugs previously mentioned --- happy to share some of these in a private thread if you are curious). If you believe there are specific security concerns, happy to discuss more.

I have some minor cleanups to do with this PR; I'll add the change to also include "WASM_RT_ALLOW_NON_STANDARDS_COMPLIANT" here and resubmit shortly.

@keithw
Copy link
Member

keithw commented Oct 31, 2022

I think there is a reasonable expectation that wasm2c provides a compiler and runtime that are more than just examples.

Compiler, strongly agreed.

Runtime, I'm less-sure. It seems to me that the embedders are so diverse that trying to accommodate everything is a fool's errand. So my tentative view would be that we'd be happier if we:

  • keep the runtime API surface small and make it easy for embedders to implement runtimes,
  • when implementing something, if we think reasonable embedders will have different preferences, then try to delegate that behavior to the runtime, or if not practical, then as a last resort implement it in the wasm2c-generated code but with a hook for the embedder to control it, and
  • keep the "distributed" runtime pretty minimalist, geared towards demonstrating correct execution of wasm2c-generated output on a bog-standard POSIXish or Win32 environment.

It sounds like you have doubts about whether this abstraction boundary is practical, e.g.:

there are many additional changes (coming in future PRs that we can discuss specifically once those PRs are submitted) that are not maintainable in a minor runtime fork (including changes to "wasm2c.declarations.c").

You may be right, but I think it will be easier to make good decisions on this when your team or Firefox wants a concrete change to the wasm2c output. Maybe we can figure out a way to delegate whatever it is to the runtime entirely, or maybe we'll need to add a hook to the wasm2c output, etc. For this particular one, it seems 100% already within the full discretion of the runtime.

The above would result in implementation defined behavior (it would not be undefined). The OOB memory access will hit a guard page which raises a signal to the host.

Well, fair enough. I'm not sure what the exact POSIX (or Linux) line is for a SEGSEGV delivered on stack overflow when the stack has grown to its limit and the process hasn't set up a sigaltstack to run the signal handler on... but I think nothing great.

I'll add the change to also include "WASM_RT_ALLOW_NON_STANDARDS_COMPLIANT" here and resubmit shortly.

:-/ Hey, I'm not the boss around here, but my sense is that WABT's future benefits from having various constituencies in the WebAssembly world all hold fond feelings for this project. Given the strong disagreement in #1432 about providing this kind of hook, I would rather not re-pick that scab if we can avoid it...

@deian
Copy link

deian commented Nov 1, 2022

Frankly this is just gatekeeping for no good technical reason and you're imposing your sense of style under the guise of WABT.

@kripken
Copy link
Member

kripken commented Nov 1, 2022

It's reasonable to disagree on stuff like this, but please let's all make sure to come to the discussion assuming everyone else also wants what is best for the project and the ecosystem.

There are definitely tradeoffs here and I agree with several of the points of both @shravanrn and @keithw . It's not obvious to me what the best path is.

But I do think it's very valuable to have a production-ready runtime in this repo, so that new users can benefit from it. And this particular flag seems like a necessary feature not just for Firefox but for any browser or other large application that wants to use wasm2c in this way. If we do not do that then there is a risk of fragmentation. I've already seen some possible indications of that (that is, more use of out-of-tree forks than is best - not just Firefox). I think the approach in this PR would reduce that risk. So overall I think it is a good direction.

Note that I think this is different enough from that old discussion that I don't think we'd end up with similar arguments here, as this lets the user customize the signal handler but not remove checks. I guess we can see if there are complaints, but I'm optimistic.

@deian
Copy link

deian commented Nov 1, 2022

+1 on all accounts, thanks for chiming in!

@sbc100
Copy link
Member

sbc100 commented Nov 1, 2022

I think this approach seems reasonable for now. And I don't think we need the extra WASM_RT_ALLOW_NON_STANDARDS_COMPLIANT flag, at least not yet.

I think the value of having Firefox changes upstream (unforking) is high enough that it can warrant some amount of this kind of added complexity/cofigurabilty. Obviously we should try to limit the number and scope of these build time options, but as long as we constantly strive to minimize them over time I think this kind of change is ok.

@sbc100
Copy link
Member

sbc100 commented Nov 1, 2022

It seems like this options will always be needed in any kind of complex embedding such as a browser where there is already global signal handler installed? i.e. if we ever get around to using wasm2c in chrome we would want this too I think.

@sbc100 sbc100 closed this Nov 1, 2022
@sbc100 sbc100 reopened this Nov 1, 2022
@keithw
Copy link
Member

keithw commented Nov 1, 2022

Yeah, I mean, we certainly also want this (we use an external handler too); I think it's a question of how much runtime complexity do we include in the "upstream" runtime we include in WABT. And how much further burden are we taking on to accommodate everybody's use case upstream, if we also care about abstracting the runtime from the generated code to enable creative runtimes.

If there's a consensus to do this, my preference would be to do it in a way that doesn't enlarge the runtime API surface (by creating more preprocessor macros in wasm-rt.h and threading them through). This can be a small addition to wasm-rt-impl.h and .c to provide an alternate init function for the embedder to call; e.g. something like https://github.com/WebAssembly/wabt/compare/wasm2c-ext-handler?expand=1

@sbc100
Copy link
Member

sbc100 commented Nov 1, 2022

Yeah, I mean, we certainly also want this (we use an external handler too); I think it's a question of how much runtime complexity do we include in the "upstream" runtime we include in WABT. And how much further burden are we taking on to accommodate everybody's use case upstream, if we also care about abstracting the runtime from the generated code to enable creative runtimes.

I understand the tension, and I agree that some things may be better left to downstream.

This seems like something that will be useful enough to a variety of downstreams, and is hard to do without forking, so I think its probably worth the upstream complexity. We can always re-asses and change our policy/posture as more changes form the firefox branch are upstreamed, and I will probably end up being the one who pushed back against some of them.

If there's a consensus to do this, my preference would be to do it in a way that doesn't enlarge the runtime API surface (by creating more preprocessor macros in wasm-rt.h and threading them through). This can be a small addition to wasm-rt-impl.h and .c to provide an alternate init function for the embedder to call; e.g. something like https://github.com/WebAssembly/wabt/compare/wasm2c-ext-handler?expand=1

I guess another approach would be to weakly define all these wasm-rt symbols such that an implementation could override them with a strong definition. The downside of that approach is that overriding wasm_rt_init would only work as long as signal handler installation was the only thing it did.

I think there may be some some value the ifdef approach though. Signal handlers are notoriously hard to get right, and if I was a security reviewer I think I would rather see that code simply not compiled into the binary rather than trying to figure out if it was ever called or not.

@keithw
Copy link
Member

keithw commented Nov 2, 2022

Fair enough -- I'm skeptical that we'll be able to accommodate all the production needs of a Firefox or Chrome in the "upstream"-distributed runtime, and so it seemed inevitable that these megaprojects will be carrying their own implementation of the runtime API. But maybe I'm wrong on that. In any event, the nice thing about a runtime-only change is that nobody has to use the upstream runtime implementation if they don't want to. I expect the harder issues will come in the future when it's about changing wasm2c itself.

@deian
Copy link

deian commented Nov 2, 2022

I'm more optimistic and would love to see us be welcoming to more projects and use cases even beyond library sandboxing.

@shravanrn
Copy link
Collaborator Author

Sounds good to me. Based on the above discussions, I will go ahead and land this. (I have some additional changes to handle the Windows case, but I will open a separate PR for that)

@shravanrn shravanrn merged commit 64941b6 into WebAssembly:main Nov 2, 2022
@sbc100
Copy link
Member

sbc100 commented Nov 2, 2022

Fair enough -- I'm skeptical that we'll be able to accommodate all the production needs of a Firefox or Chrome in the "upstream"-distributed runtime, and so it seemed inevitable that these megaprojects will be carrying their own implementation of the runtime API. But maybe I'm wrong on that. In any event, the nice thing about a runtime-only change is that nobody has to use the upstream runtime implementation if they don't want to. I expect the harder issues will come in the future when it's about changing wasm2c itself.

I'm also concerned about the increasing the testing matrix with each new build option, but I think we can monitor the situation and add more testing as needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants