Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement header "From" rewriting #96

Closed
wants to merge 2 commits into from

Conversation

jjakob
Copy link
Contributor

@jjakob jjakob commented Apr 12, 2021

This adds header From rewriting when enabled in dma.conf with REWRITEFROM.
The original From header is added as "X-Original-From".
The maximum length of the From header is limited to 10 lines as defined
by MSG_HDR_FROM_MAX due to static variable allocation.

The rewrite function fails if the address part of the From header is
not in a single line as it uses a simple strncmp. This could be improved
by refactoring parse_addrs() into a more generic address parsing
function that returns pointers to the start and end of a valid address
in the passed string, so the rewrite function could just replace the
string between those 2 pointers (inserting line continuations before and
after the address if the address happened to be split into multiple
lines). Nevertheless, the current simplified rewriting function will
fail gracefully in this case.

If the rewrite fails to find the From address in the header line,
the header is not rewritten, but X-Original-From is still added.
This can be changed via a later commit to on failure replace it with the
envelope-from address and discard the original From line. This will ensure
rewriting always takes place, but will not preserve the From address comment
sent by the client on failure to rewrite.

jjakob added 2 commits April 12, 2021 18:21
This adds header From rewriting when enabled in dma.conf with REWRITEFROM.
The original From header is added as "X-Original-From".
The maximum length of the From header is limited to 10 lines as defined
by MSG_HDR_FROM_MAX due to static variable allocation.

The rewrite function fails if the address part of the From header is
not in a single line as it uses a simple strncmp. This could be improved
by refactoring parse_addrs() into a more generic address parsing
function that returns pointers to the start and end of a valid address
in the passed string, so the rewrite function could just replace the
string between those 2 pointers (inserting line continuations before and
after the address if the address happened to be split into multiple
lines). Nevertheless, the current simplified rewriting function will
fail gracefully in this case.

If the rewrite fails to find the From address in the header line,
the header is not rewritten, but X-Original-From is still added.
This can be changed via a later commit to on failure replace it with the
envelope-from address and discard the original From line. This will ensure
rewriting always takes place, but will not preserve the From address comment
sent by the client on failure to rewrite.
Instead of passing the original From unchanged if rewriting fails,
replace it with envelope-from anyway, to guarantee the message will get
delivered.
The original From will be included as X-Original-From so it can be
checked why the rewrite failed later.
@jjakob
Copy link
Contributor Author

jjakob commented Apr 14, 2021

Added a commit that addresses this issue:

If the rewrite fails to find the From address in the header line,
the header is not rewritten, but X-Original-From is still added.
This can be changed via a later commit to on failure replace it with the
envelope-from address and discard the original From line. This will ensure
rewriting always takes place, but will not preserve the From address comment
sent by the client on failure to rewrite.

This commit will make it replace the From with envelope-from even when the rewrite fails.

@corecode
Copy link
Owner

Why do we have to store the headers in the queue structure?

@jjakob
Copy link
Contributor Author

jjakob commented Apr 17, 2021

You're right, it's a global variable so it can be accessed from any function.

I was looking at issues #16 and #21, it seems like what's needed is a more generalized rewrite function. rewrite_header_from could easily be rewritten to work on any header. The bigger issue I see is that by using static variable allocation like I used here, the from_lines variable needs to be initialized to the maximum possible from lines size, which uses quite some memory (10k bytes from_lines + 10k bytes rewritten_from_lines = 20k bytes). That would increase linearly as more variables are needed for To, Cc, Bcc. The better way would be to use dynamic allocation, but would require rewriting most of the rewriting code as it is in this PR.

@corecode
Copy link
Owner

corecode commented Apr 17, 2021 via email

@jjakob
Copy link
Contributor Author

jjakob commented Apr 17, 2021

It might be possible to rewrite it on the fly, I'd need to try implementing it. It needs to be stored at least to add the X-Original-From header, which is good to have if there's a bug in the rewrite code, we can see what the original header was.

@corecode
Copy link
Owner

corecode commented Apr 17, 2021 via email

@jjakob
Copy link
Contributor Author

jjakob commented Apr 17, 2021

A header can be split into multiple lines by inserting a blank character at the beginning of the next line, for example:

From: "foobar"
 <foo@bar>

The main reason being for cases where a header would exceed the maximum line length, which is 998 characters according to https://tools.ietf.org/html/rfc2822#section-2.2.3 (this is called "folding").
So we can't simply take one line in, rewrite it, and dump 2 lines out.
The current parse_addrs function correctly parses the address out of a multi-line header so I just wrote the rewriting around it, still using it to extract the address out of the header. Parsing the raw message by myself is not something I'd be comfortable doing as it needs to account for many different things that the standard allows so it can get very complex and error-prone. The parse_addrs function is an example.
So in order to possibly do a on-the-fly rewrite, one could watch parse_state to see when the header line is the one containing the address, then either pass it through (a line without an address) or rewrite it (a line where an address has been found). I'd still save the original header as-is to then later add X-Original-From (which can also be multi-line, as it can be in my implementation - up to 10k chars)
The other possibility being a header where the address itself is split into 2 lines, we need to account for that too. The current rewrite function fails in this case. A possible workaround would be, in the on-the-fly rewrite possibility, to watch parse_state, see when it detects the start of an address, strip it out from that line, insert "\n<rewritten_addr>\n", then on the next line, strip the ending part of the address out of it, and pass the rest unchanged.
Of course, a proper rewrite function would take the incoming header, strip all newlines and folding from it, rewrite it, then re-fold it before the 998-char limit. Without doing that, any rewriting has the possibility of exceeding the 998-char limit if for example, the original address was much shorter than the rewritten one. We basically rely on the incoming header being that much shorter than the maximum, or we'll produce non-compliant messages. Whether modern mail software can handle them is another question, they possibly may.

@jjakob
Copy link
Contributor Author

jjakob commented Apr 17, 2021

To clarify the last paragraph - it would be possible to not exceed the 998-char limit even with rewriting as we do it here, just by always folding before the rewritten address. For example:

From: "someone"
 <foo@bar>
X-Original-From:
 "someone" <foo@bar>

(imagine that the original header is right at 998 chars, we need to fold both the rewritten address and X-Original-From into their own lines, as "X-Original-" is 11 additional characters that would make the line exceed the limit)

@jjakob
Copy link
Contributor Author

jjakob commented Apr 17, 2021

And another example:

From: "this
 is
 a
 totally
 legit
 address" <wow
 @what
 .a
 .mess>

Could be rewritten to:

From: "this
 is
 a
 totally
 legit
 address"
 <newaddress>

by passing the non-address lines through unchanged, stripping <wow from the line with the partial-address (where the < may or may not be there), inserting our address as \n<newaddress>\n, then stripping everything the parse_addrs function still considers to be an address, that means everything else, including a possible > right after the address.

@corecode
Copy link
Owner

I still don't get the complication. When looking at the From: header, I would insert a new line, From: $envelope_from, and then insert a X-Original-From:\n\t, and then insert the original line, plus continuation lines. Why do we need to retain all kinds of stuff of the original from line if we rewrite it?

@jjakob
Copy link
Contributor Author

jjakob commented Apr 17, 2021

Because the point of rewriting is to rewrite just the address part of the headers, not the comments. The comment can be anything the sender wants, and it would be bad to strip it out, as it can have the sender's name, the application's name in it, etc.

@jjakob
Copy link
Contributor Author

jjakob commented Apr 17, 2021

If we replaced From: "My Name" <myname> with From: <server@domain> we would remove that critical piece of information, which is the sender's name. MUA's don't display anything from X-Original-From. It also wouldn't technically be considered "rewriting", but "replacing", so it could only be considered "header replacing".

@jjakob
Copy link
Contributor Author

jjakob commented Apr 17, 2021

#21 would also be impossible to implement without actually storing the whole header and rewriting just the address part. Okay, possible just by blindly discarding everything that isn't the address itself, but that would be very rude, and hardly satisfactory IMO.

@corecode
Copy link
Owner

Doing the whole rewriting is too complicated and will likely introduce bugs. I think we either replace the From: address, or not do it at all.

@jjakob
Copy link
Contributor Author

jjakob commented Apr 17, 2021

I'll come up with a working on-the-fly rewrite branch in any case when I get to it. Parsing the address from the header (with all the weirdness aroud folding etc.) is already in the code in parse_addrs, so there's not much possibility of going wrong there, and I won't be touching it. So this branch can stay as a WIP/PoC (or someone can already use it on his systems as I am).

@jjakob jjakob closed this Jun 7, 2023
@jjakob jjakob deleted the rewriteheaderfrom branch June 7, 2023 12:50
@matthijskooijman
Copy link

If you ever get around to implementing this, consider also applying the same treatment to the "To" header (I'm currently seeing rejected mails because mails are generated with "To: root" which dma passes without modification and which my exim rejects.

@jjakob
Copy link
Contributor Author

jjakob commented Mar 22, 2024

If you ever get around to implementing this, consider also applying the same treatment to the "To" header (I'm currently seeing rejected mails because mails are generated with "To: root" which dma passes without modification and which my exim rejects.

If it's for mails from cron, you can set the MAILTO variable in the crontab (man 5 crontab), maybe you need to do that in all crontabs (/etc/crontab and user crontabs). I have an alias added for root in /etc/aliases and I get all mails, even if there's a To: root in the header, the envelope To is correct. Maybe some MTA and AS/AV are more strict.
I don't think I'll be working on it any time soon, because the branch I'm using that is a fork with all the patches for rewriting From applied works for all my needs, I rebased it on latest master some time ago.

@matthijskooijman
Copy link

If it's for mails from cron,

It was for cron, sudo, netdata and maybe some others.

I have an alias added for root in /etc/aliases and I get all mails, even if there's a To: root in the header, the envelope To is correct.

Yeah, I have the same alias and I had picked DMA because it would rewrite the envelope to be correct.

Maybe some MTA and AS/AV are more strict.

I'm using Exim with verify = header_syntax, which rejects unqualified addresses in the To/From/etc headers (I tried to config an exception for this on server-side exim, but could not make that work the way I wanted to).

I don't think I'll be working on it any time soon, because the branch I'm using that is a fork with all the patches for rewriting From applied works for all my needs, I rebased it on latest master some time ago.

No worries, for now I switched to nullmailer which does support qualifying plain usernames in the headers (it only supports adding the mailname, not applying aliases on them for full rewriting, but I ended up doing the rewriting on the exim side instead).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants