Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix invalid url escaping - replaced url parsing from url.parse to new… #178

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 7 additions & 8 deletions src/rules/requests/request-handlers.ts
Original file line number Diff line number Diff line change
@@ -1,5 +1,4 @@
import _ = require('lodash');
import url = require('url');
import type dns = require('dns');
import net = require('net');
import tls = require('tls');
Expand Down Expand Up @@ -405,7 +404,7 @@ export class PassThroughHandler extends PassThroughHandlerDefinition {

// Capture raw request data:
let { method, url: reqUrl, rawHeaders } = clientReq as OngoingRequest;
let { protocol, hostname, port, path } = url.parse(reqUrl);
let { protocol, hostname, port, pathname, search } = new URL(reqUrl);

// Check if this request is a request loop:
if (isSocketLoop(this.outgoingSockets, (<any> clientReq).socket)) {
Expand All @@ -430,7 +429,7 @@ export class PassThroughHandler extends PassThroughHandlerDefinition {
hostname,
clientReq.remoteIpAddress,
getDnsLookupFunction(this.lookupOptions)
);
) || "";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? Currently it looks like we previously assumed hostname really could be null here, and accepted that (and getClientRelativeHostname preserves that). That's not great - I think in reality it is always set, since req.url is always an absolute URL in our case (we preprocess to ensure this in MockttpServer).

However, setting it to "" isn't very good either and might do all sorts of weird things.

In reality, with new URL() I think hostname is now always set, and that's recognized in the types, so really we should change getClientRelativeHostname here - either to remove the null support entirely so it's always set, or to improve the types so we know that string hostname in => string hostname out.


if (this.forwarding) {
const { targetHost, updateHostHeader } = this.forwarding;
Expand All @@ -439,7 +438,7 @@ export class PassThroughHandler extends PassThroughHandlerDefinition {
[hostname, port] = targetHost.split(':');
} else {
// We're forwarding to a fully specified URL; override the host etc, but never the path.
({ protocol, hostname, port } = url.parse(targetHost));
({ protocol, hostname, port } = new URL(targetHost));
}

const hostHeaderName = isH2Downstream ? ':authority' : 'host';
Expand All @@ -465,7 +464,7 @@ export class PassThroughHandler extends PassThroughHandlerDefinition {
hostHeader[1] = updateHostHeader;
} // Otherwise: falsey means don't touch it.

reqUrl = new URL(`${protocol}//${hostname}${(port ? `:${port}` : '')}${path}`).toString();
reqUrl = new URL(`${protocol}//${hostname}${(port ? `:${port}` : '')}${pathname}${search}`).toString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, there are other differences here where the url.parse output is preferable... One notable example is a simple ? at the end of the path:

console.log(url.parse('https://example.com/?').search);
// '?'

console.log(new URL('https://example.com/?').search)
// ''

These might be semantically equivalent in theory, but they're not actually the same in practice (a server will literally receive a different string, and can legitimately do different things based on that if it so chooses). With this change, when proxying a request to that URL through Node, the client will send a slightly different request to the server that it expected. For more context, there's a WHATWG discussion on this that you might find interesting at whatwg/url#779 (I've just added a comment there with my perspective).

This is not great, but of course you do make a good argument about the other differences where url.parse is wrong too! I think maybe we might want to make our own URL parser that wraps the two to use WHATWG parsing as standard but with a few extra checks for details like this that get lost otherwise. What do you think?

}

// Override the request details, if a transform or callback is specified:
Expand Down Expand Up @@ -645,7 +644,7 @@ export class PassThroughHandler extends PassThroughHandlerDefinition {
// Reparse the new URL, if necessary
if (modifiedReq?.url) {
if (!isAbsoluteUrl(modifiedReq?.url)) throw new Error("Overridden request URLs must be absolute");
({ protocol, hostname, port, path } = url.parse(reqUrl));
({ protocol, hostname, port, pathname, search } = new URL(reqUrl));
}

rawHeaders = objectHeadersToRaw(headers);
Expand Down Expand Up @@ -748,7 +747,7 @@ export class PassThroughHandler extends PassThroughHandlerDefinition {
hostname,
port,
family,
path,
path: pathname + search,
headers: shouldTryH2Upstream
? rawHeadersToObjectPreservingCase(rawHeaders)
: flattenPairedRawHeaders(rawHeaders) as any,
Expand Down Expand Up @@ -1110,7 +1109,7 @@ export class PassThroughHandler extends PassThroughHandlerDefinition {
protocol: protocol!.replace(/:$/, ''),
hostname,
port,
path,
path: pathname + search,
rawHeaders
});

Expand Down