Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conflicts at the end of files that don't end in a newline are not materialized correctly #3968

Open
ilyagr opened this issue Jun 26, 2024 · 16 comments · May be fixed by #4088, #5181 or #5186
Open

Conflicts at the end of files that don't end in a newline are not materialized correctly #3968

ilyagr opened this issue Jun 26, 2024 · 16 comments · May be fixed by #4088, #5181 or #5186
Assignees
Labels
🐛bug Something isn't working

Comments

@ilyagr
Copy link
Contributor

ilyagr commented Jun 26, 2024

jj new
echo -n "aa" > qq
jj new
echo -n "bb" > qq
jj new @-
echo -n "cc" > qq
jj new all:@-+
cat qq

Actual Result

<<<<<<< Conflict 1 of 1
%%%%%%% Changes from base to side #1
-aa+cc+++++++ Contents of side #2
bb>>>>>>> Conflict 1 of 1 ends

If the file is subsequently edited, jj cannot parse the conflict correctly.

Expected Behavior

I'd materialize the conflict as though the files ended in newlines, if we decide this is acceptable. We could perhaps notify the user about this in the labels:

<<<<<<< Conflict 1 of 1 (extra newlines inserted)
%%%%%%% Changes from base to side #1
-aa
+cc
+++++++ Contents of side #2
bb
>>>>>>> Conflict 1 of 1 ends

If this is not acceptable, I'm not sure what is best to do.

@ilyagr
Copy link
Contributor Author

ilyagr commented Jun 26, 2024

I think pretending that the file had newlines is my preference. This will only matter if the user edits the conflicts manually, in which case the file was probably meant to end in a newline anyway.

There is one alternative on my mind, but I don't like it:

The fish shell prints ⏎\n when a command's output does not end in a newline:

$ echo -n aa
aa⏎
$ # Next command here

We could in principle use a trick like that (either with ⏎\n or with \nJJ: NO NEWLINE\n), but I'm not sure it's worth it. Also, we then lose the ability to represent a file that ends with ⏎\n. It's also a curious question what to do if a conflict not at the end of file includes ⏎\n.

@martinvonz
Copy link
Member

Mercurial adds "\ No newline at end of file" in diffs. We could copy that. I'm fine with either that or adding just a newline as you suggested.

@emilazy
Copy link
Contributor

emilazy commented Jun 26, 2024

I like the idea of using the existing label syntax; that avoids introducing any more ambiguity with change contents that isn’t already present. The two sides could disagree on the presence of the newline, right? If putting it at the end of the label line attached to the relevant text is too subtle, maybe another label line could be added?

@arxanas
Copy link
Contributor

arxanas commented Jun 26, 2024

IIUC the main idea in the thread so far is to only render the lack of a newline, but not necessarily be able to operate on it in a principled fashion?

Shouldn't the lack of a newline be encoded in the conflict marker syntax itself? It seems syntactically important. Wouldn't you want to preserve the invariant that you can reconstruct any of the file sides directly from the conflicted contents, rather than having to consult an out-of-band store?

@PhilipMetzger PhilipMetzger added the 🐛bug Something isn't working label Jun 26, 2024
@ilyagr
Copy link
Contributor Author

ilyagr commented Jun 26, 2024

UPDATE: I decided to promote this to a separate issue, #3975.



Wouldn't you want to preserve the invariant that you can reconstruct any of the file sides directly from the conflicted contents, rather than having to consult an out-of-band store?

This issue has been nerd-sniping me for a while; in addition to trailing newlines, it's also a matter of encoding conflict markers inside files


TODO: find link. I remember discussing it with Martin recently in some PR, haven't found it yet. TLDR: jj currently can't handle conflicts in out own tutorial.md because of

https://github.com/martinvonz/jj/blob/638649b435795bbb991d840e9ad1750648d9b74a/docs/tutorial.md?plain=1#L284-L291


In the shortest term, I'll probably ignore both issues and just add newlines to the end of files.

However, I think I finally figured out a syntax that will work, will not require a complicated parser, and even be somewhat readable. Before computing conflicts (and only if a file is conflicted), we'd encode file contents as follows:

  1. A mildly pathological file
JJ Verbatim Line:<<<<<<<<<<<
blahblah
JJ EOF: No newline at the end of file
  1. A moderately pathological file
JJ Verbatim Line:<<<<<<<<<<<
JJ Verbatim Line:JJ Verbatim Line:<<<<<<<<<<<
blahblah
JJ Verbatim Line:JJ EOF: No newline at the end of file
  1. Another fun file
blah
JJ Verbatim Line:JJ EOF: No newline at the end of file
JJ EOF: No newline at the end of file

Fun, huh?


This relies on conflict markers having to start at the beginning of the line. If we ever have a notion of conflicts that happen in the middle of lines, we'd need a different syntax, of course. I think that would be nice, but also would probably need a whole different UI to be usable; I'm not sure it can be very usable in the terminal or as a text file.

@ilyagr ilyagr self-assigned this Jun 26, 2024
ilyagr added a commit to ilyagr/jj that referenced this issue Jun 26, 2024
This happens for non-empty files only; the materialization code can
handle files and hunks with 0 or more lines, but each line should
terminate in a newline.

This is an imperfect fix to jj-vcs#3968, a better fix is TODO; see jj-vcs#3975.
ilyagr added a commit to ilyagr/jj that referenced this issue Jun 26, 2024
This happens for non-empty files only; the materialization code can
handle files and hunks with 0 or more lines, but each line should
terminate in a newline.

This is an imperfect fix to jj-vcs#3968, a better fix is TODO; see jj-vcs#3975.
ilyagr added a commit to ilyagr/jj that referenced this issue Jun 27, 2024
This happens for non-empty files only; the materialization code can
handle files and hunks with 0 or more lines, but each line should
terminate in a newline.

This is an imperfect fix to jj-vcs#3968, a better fix is TODO; see jj-vcs#3975.

This also implements FromIterator for Merge. I could split it into a
separate commit, but then it'd have no uses.
ilyagr added a commit to ilyagr/jj that referenced this issue Jun 27, 2024
This happens for non-empty files only; the materialization code can
handle files and hunks with 0 or more lines, but each line should
terminate in a newline.

This is an imperfect fix to jj-vcs#3968, a better fix is TODO; see jj-vcs#3975.

This also implements FromIterator for Merge. I could split it into a
separate commit, but then it'd have no uses.
ilyagr added a commit to ilyagr/jj that referenced this issue Jun 27, 2024
This happens for non-empty files only; the materialization code can
handle files and hunks with 0 or more lines, but each line should
terminate in a newline.

This is an imperfect fix to jj-vcs#3968, a better fix is TODO; see jj-vcs#3975.

This also implements FromIterator for Merge. I could split it into a
separate commit, but then it'd have no uses.
ilyagr added a commit to ilyagr/jj that referenced this issue Jun 27, 2024
This happens for non-empty files only; the materialization code can
handle files and hunks with 0 or more lines, but each line should
terminate in a newline.

This is an imperfect fix to jj-vcs#3968, a better fix is TODO; see jj-vcs#3975.

This also implements FromIterator for Merge. I could split it into a
separate commit, but then it'd have no uses.
ilyagr added a commit to ilyagr/jj that referenced this issue Jul 15, 2024
ilyagr added a commit to ilyagr/jj that referenced this issue Jul 15, 2024
ilyagr added a commit to ilyagr/jj that referenced this issue Jul 15, 2024
ilyagr added a commit to ilyagr/jj that referenced this issue Jul 15, 2024
ilyagr added a commit to ilyagr/jj that referenced this issue Jul 15, 2024
ilyagr added a commit to ilyagr/jj that referenced this issue Jul 15, 2024
ilyagr added a commit to ilyagr/jj that referenced this issue Jul 15, 2024
ilyagr added a commit to ilyagr/jj that referenced this issue Jul 15, 2024
ilyagr added a commit to ilyagr/jj that referenced this issue Jul 15, 2024
ilyagr added a commit to ilyagr/jj that referenced this issue Jul 15, 2024
ilyagr added a commit to ilyagr/jj that referenced this issue Aug 7, 2024
@ilyagr
Copy link
Contributor Author

ilyagr commented Oct 6, 2024

Somebody asked a few days ago whether this will handle the similar problem with diffs, but then seemed to delete their comment. Shamelessly stealing their example from my email, it was (after minor edits):

$ jj diff
Modified regular file .gitignore:
   1    1: build
        2: ignoreAdded regular file .vscode/settings.json:
        1: {
        2:     "spellright.language": [
        3:         "en"
        4:     ],
        5: }

I don't actually have an answer (it may or may not be part of the same solution, and it should look similar), but we certainly shouldn't forget about that.

@bwinton
Copy link

bwinton commented Oct 7, 2024

Hi, that was me, and I deleted it because I noticed I was a version behind, so I upgraded and 0.21.0 handles it just fine, so it seems like there's no extra work to be done here!

@ilyagr
Copy link
Contributor Author

ilyagr commented Oct 8, 2024

I think @yuja may have fixed it without me noticing. Thank you, Yuya!

@ilyagr
Copy link
Contributor Author

ilyagr commented Oct 8, 2024

It seems that our colored_words diff is currently a bit lossy in this case. This wouldn't often come up, but it might be confusing occasionally.

We could add a message (e.g. "Added a newline at the end of file" in green or "Removed a newline at the end of file" in red), perhaps, to clarify when there is a diff in whether there is a newline. I don't know whether I'd get to it anytime soon, though.

I also think the jj diff --git output is not the clearest, but it's identical to git diff, so we should probably keep it.

image

@ilyagr
Copy link
Contributor Author

ilyagr commented Oct 8, 2024

Another alternative for the colored words diff would be to do what the fish shell does, and represent a lack of a newline at EOL by the symbol:

image

I'm not sure how clear this is, but at least there is precedent.

(Also, this entire problem is less important than the original problem of malformed diff)

@yuja
Copy link
Contributor

yuja commented Oct 8, 2024

It seems that our colored_words diff is currently a bit lossy in this case.

Yes, maybe we can add some marker or header/footer message. Its output doesn't have to be parsable, so we can freely choose the syntax.

ilyagr added a commit to ilyagr/jj that referenced this issue Oct 13, 2024
@scott2000
Copy link
Contributor

I have another possible solution that I just thought of, but I'm not sure if it's too complicated. Basically, we introduce a new conflict marker \\\\\\\ that indicates that the immediate previous conflict's trailing newlines should be deleted. It would always come immediately after the end of a conflict, so that it's clear to the user that it's a special marker that they will need to delete along with the conflict markers. If a non-conflicted line is missing a newline at the end of the file, we would just leave it as-is, since it's not part of a conflict.

Here's an example of what a conflict might look like, where the base and side 1 don't have a newline at the end of the file, but side 2 adds a newline:

<<<<<<< Conflict 1 of 1
%%%%%%% Changes from base to side #1
-base
+left
+++++++ Contents of side #2
right

>>>>>>> Conflict 1 of 1 ends
\\\\\\\ No newline after conflict

Or if the base has a newline at the end of the file, but both sides remove it:

<<<<<<< Conflict 1 of 1
+++++++ Contents of side #1
left
%%%%%%% Changes from base to side #2
-base
-
+right
>>>>>>> Conflict 1 of 1 ends
\\\\\\\ No newline after conflict

So basically if any non-empty side/base of the conflict is missing a trailing newline, then we add a newline to every side/base in the conflict before we materialize it, and we add a \\\\\\\ marker after we materialize it to indicate that these added newlines should be ignored when parsing.

I think this has some nice benefits:

  • It's consistent with the other markers we add to the file, so the user will recognize it as something they need to remove while resolving the conflict.
  • Just like the other conflict markers, it can be expanded to have a longer length if there's already a line in the file which starts with \\\\\\\, so we don't have to worry about any special escaping logic.
  • I think it's a bit easier to read, since if I see a diff where a blank line is added/removed, it has a pretty clear meaning. Whereas if I see a diff where \JJ EOF: No newline at end of file is added/removed, it's a bit confusing since removing \JJ EOF: No newline at end of file would mean adding a newline.

@yuja
Copy link
Contributor

yuja commented Nov 25, 2024

I don't have a strong opinion, but I think abusing comment part might be also good, something like +++++++ Contents of side #1 (noeol) or %%%%%%% Changes from base to side #2 (+noeol).

Maybe we can also restore the EOL state from the original conflict contents (without encoding it in the conflict marker.) It's unlikely that user wants to modify EOL in conflicts without resolving it.

@arxanas
Copy link
Contributor

arxanas commented Nov 25, 2024

@scott2000: I think it's a bit easier to read, since if I see a diff where a blank line is added/removed, it has a pretty clear meaning. Whereas if I see a diff where \JJ EOF: No newline at end of file is added/removed, it's a bit confusing since removing \JJ EOF: No newline at end of file would mean adding a newline.

How is that different in the \\\\\\\ No newline after conflict scheme? Don't you still add/remove a line that corresponds to the inverse of the content that's present?

@yuja: I don't have a strong opinion, but I think abusing comment part might be also good, something like +++++++ Contents of side #1 (noeol) or %%%%%%% Changes from base to side #2 (+noeol).

I think it might be nice to modify the conflict marker format in the same line, because we can add any number of metadata parts without needing a corresponding number of extra metadata lines. Some other attributes we could include in this way:

  • Line ending format (LF vs CRLF)
  • Diff format (line, word, whitespace handling, amount of context, etc.)
    • Perhaps we could somehow produce/consume native Git diff formats as well, if we marked things explicitly in this way?
  • Original commit/tree ID

Of course, if we choose to store structured metadata in this way, we should establish a real specification for the format.

@scott2000
Copy link
Contributor

I think it's a bit easier to read, since if I see a diff where a blank line is added/removed, it has a pretty clear meaning. Whereas if I see a diff where \JJ EOF: No newline at end of file is added/removed, it's a bit confusing since removing \JJ EOF: No newline at end of file would mean adding a newline.

How is that different in the \\\\\\\ No newline after conflict scheme? Don't you still add/remove a line that corresponds to the inverse of the content that's present?

I meant for within the conflicts markers themselves under the %%%%%%% diff section. Like if a side added a trailing newline, I think that this:

%%%%%%% Changes from base to side #1
-base
+left
+

Is clearer than this:

%%%%%%% Changes from base to side #1
-base
-\JJ EOF: No newline at end of file
+left

I think @yuja was suggesting also it could be displayed like this:

%%%%%%% Changes from base to side #1 (-noeol)
-base
+left

Which I also think is a bit clearer, but it suffers from the same double-negative (to me, -noeol means "side 1 removed not-EOL", which actually means "side 1 added EOL"). That may be unavoidable though, since the default is to have an EOL, so it does make sense to only mark the cases where the default isn't happening. Maybe it would be better for me to think of it as "the removed term has the property 'noeol'", which could be more useful in general if we support other properties in the future.

@scott2000
Copy link
Contributor

I've been thinking about this a bit more, and now I'm thinking @yuja's idea of just restoring the "noeol" state from the existing conflict data is the best approach, rather than encoding it in the conflict markers and parsing it in some way. Here's my thoughts:

I think parsing part of the "note" section of the conflict markers could be surprising to users (since usually the "note" part of the conflict markers doesn't matter), and it would reduce compatibility with existing tools that generate Git-style conflict markers (since these tools wouldn't emit the "noeol" note in the way the we expect, so they would end up discarding the "noeol" state by accident). I don't think accidentally losing this information would be a big issue, but it definitely seems nicer if we can avoid it.

If we were to parse the "noeol" state from the conflict markers, then we would also have to decide how to handle the case where the "noeol" markers appear in the middle of a file rather than at the end, which seems more complicated than it needs to be.

Also, we already rely on the current conflict state for other things (determining the number of sides to expect and distinguishing deleted sides from empty sides), so it seems reasonable to also resolve this the same way (it seems especially similar to the deleted file case to me). If we want to come up with a way to encode all of that information in the conflict markers directly, then that seems like a separate issue to me.

If we restore the "noeol" state from the tree, we can still add a note to the conflict markers indicating that the conflict involved a "noeol" side, just to make it easier to understand for the user. I've implemented this approach in a draft PR here: #5186.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment