Re: diff files

To: debian-user@lists.debian.org
Subject: Re: diff files
From: Greg Wooledge <greg@wooledge.org>
Date: Thu, 2 Oct 2025 08:03:24 -0400
Message-id: <20251002120324.GJ24603@wooledge.org>
Mail-followup-to: debian-user@lists.debian.org
In-reply-to: <3e494e9f97c2a26207029378c87446f6@gmail.com>
References: <3e494e9f97c2a26207029378c87446f6.ref@gmail.com> <3e494e9f97c2a26207029378c87446f6@gmail.com>

On Thu, Oct 02, 2025 at 12:13:11 +0100, mick.crane wrote:
> Does anybody know how the system/syntax for diff files for Bookworm can be
> explained?

If you mean how diff(1) and patch(1) are used, it's not specific to
Debian or Bookworm.

diff has three main output modes: legacy (the GNU documentation calls
it "normal"), context, and unified.
The easiest way to explain the difference is simply to show you.

Let's say we have the following file:

hobbit:~$ cat foo
#!/bin/sh
# this is a shell script
echo StuFf | tr [:upper:] [:lower] | cat
echo Did it work?
read answer
if $answer = yes
then
    echo hooray!!
fi

Next, let's say we do a code review and we find a few mistakes in this
shell script.  So we make a backup copy, and then we edit it.

hobbit:~$ cp foo foo.bak
hobbit:~$ vi foo
hobbit:~$ cat foo
#!/bin/sh
# this is a shell script
echo StuFf | tr '[:upper:]' '[:lower:]'
printf 'Did it work? '
read -r answer
if [ "$answer" = yes ]
then
    echo 'hooray!!'
fi

Now, we can use diff(1) to show us what changes were made in the edit.
Here's the legacy mode output:

hobbit:~$ diff foo.bak foo
3,6c3,6
< echo StuFf | tr [:upper:] [:lower] | cat
< echo Did it work?
< read answer
< if $answer = yes
---
> echo StuFf | tr '[:upper:]' '[:lower:]'
> printf 'Did it work? '
> read -r answer
> if [ "$answer" = yes ]
8c8
<     echo hooray!!
---
>     echo 'hooray!!'

This format shows us which lines were changed, identified by line numbers
(lines 3-6, and line 8; the "c" in "3,6c3,6" stands for "change"), and it
shows us the original lines (preceded with "< ") and the changed lines
(preceded with "> ").  If lines were added or deleted, we'd see those as
well, marked with "a" or "d" instead of "c".
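
My example only contains changed lines, so here's a small sketch of what
an addition and a deletion look like in this format.  The scratch
directory and the file names old.txt and new.txt are made up for the
demonstration.

```shell
# Scratch directory and file names are assumptions for this sketch.
work=$(mktemp -d)
cd "$work" || exit 1

# The old file has a, b, c; the new file drops "b" and gains "d".
printf '%s\n' a b c > old.txt
printf '%s\n' a c d > new.txt

# diff exits with status 1 when the files differ, so that is not an error.
diff old.txt new.txt > normal.diff || true
cat normal.diff
# Output:
# 2d1
# < b
# 3a3
# > d
```

"2d1" means that line 2 of the old file was deleted, and that it would
have followed line 1 of the new file.  "3a3" means that after line 3 of
the old file, a line was added, which is line 3 of the new file.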

One problem with this format is that it doesn't show you any of the
surrounding lines.  If a line like "echo hooray!!" appears multiple times
in the original file, then we might not be sure the instance on line 8
is the one we're supposed to change.

This becomes a real issue when you're trying to apply a patch to a file
that has had *other* changes made to it.  Maybe a blank line was inserted,
so the echo command we're changing is actually on line 9.

The second diff format is "context", and it shows you a few surrounding
lines.

hobbit:~$ diff -c foo.bak foo
*** foo.bak	Thu Oct  2 07:28:11 2025
--- foo	Thu Oct  2 07:33:06 2025
***************
*** 1,9 ****
  #!/bin/sh
  # this is a shell script
! echo StuFf | tr [:upper:] [:lower] | cat
! echo Did it work?
! read answer
! if $answer = yes
  then
!     echo hooray!!
  fi
--- 1,9 ----
  #!/bin/sh
  # this is a shell script
! echo StuFf | tr '[:upper:]' '[:lower:]'
! printf 'Did it work? '
! read -r answer
! if [ "$answer" = yes ]
  then
!     echo 'hooray!!'
  fi

Here, the lines are still identified by number, but in this case, we're
looking at lines 1 through 9, because we want the surrounding context
for the changed lines.  The top part shows the original lines, with
unchanged lines preceded by "  " (two spaces) and changed lines
preceded by "! ".  The bottom part shows the new lines, with the same
prefixes.

The context format is a great improvement in many ways, but it's also
a bit lengthy.  There's a fair amount of redundancy there.  So, the
third format, "unified", was created to address that.  It's very much
like the "context" form, just condensed:

hobbit:~$ diff -u foo.bak foo
--- foo.bak	2025-10-02 07:28:11.278317645 -0400
+++ foo	2025-10-02 07:33:06.126231502 -0400
@@ -1,9 +1,9 @@
 #!/bin/sh
 # this is a shell script
-echo StuFf | tr [:upper:] [:lower] | cat
-echo Did it work?
-read answer
-if $answer = yes
+echo StuFf | tr '[:upper:]' '[:lower:]'
+printf 'Did it work? '
+read -r answer
+if [ "$answer" = yes ]
 then
-    echo hooray!!
+    echo 'hooray!!'
 fi

Here, we're still looking at lines 1-9, but instead of repeating the
block twice, it's only shown once, with unchanged lines preceded by
a space, old lines preceded by "-", and new lines preceded by "+".
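
The "@@" line gives the starting line number and the line count of the
block in the old file and in the new file.  By default, diff -u includes
three lines of context on each side; the -U option lets you pick a
different number.  Here's a rough sketch with made-up file names, where a
single line of context makes the header easy to check by hand:

```shell
# Scratch directory and file names are assumptions for this sketch.
work=$(mktemp -d)
cd "$work" || exit 1

# Two five-line files that differ only on line 3.
printf '%s\n' one two three four five > old.txt
printf '%s\n' one two THREE four five > new.txt

# -U1 asks for one line of context instead of the default three.
# diff exits with status 1 when the files differ, so that is not an error.
diff -U1 old.txt new.txt > u1.diff || true

# Skip the two file name lines, whose timestamps vary from run to run.
tail -n +3 u1.diff
# Output:
# @@ -2,3 +2,3 @@
#  two
# -three
# +THREE
#  four
```

The block starts at line 2 and covers 3 lines in both files: one line of
context, the changed line, and one more line of context.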

So, those are the three basic diff formats.  Now, how does patch work?

Let's save the unified diff into a file, and copy the original file
and the diff to a new working directory:

hobbit:~$ diff -u foo.bak foo > foo.diff
hobbit:~$ mkdir patchdemo
hobbit:~$ cp foo.bak foo.diff patchdemo
hobbit:~$ cd patchdemo
hobbit:~/patchdemo$ 

Now, we can "apply" the diff to the original file, to generate the
modified file:

hobbit:~/patchdemo$ patch -p0 < foo.diff
patching file foo.bak
hobbit:~/patchdemo$ ls -l
total 8
-rwxr-xr-x 1 greg greg 164 Oct  2 07:40 foo.bak*
-rw-rw-r-- 1 greg greg 392 Oct  2 07:39 foo.diff
hobbit:~/patchdemo$ cat foo.bak
#!/bin/sh
# this is a shell script
echo StuFf | tr '[:upper:]' '[:lower:]'
printf 'Did it work? '
read -r answer
if [ "$answer" = yes ]
then
    echo 'hooray!!'
fi

Remember, our unified diff began with two lines that showed the original
and modified file name.  The patch command looks at that to decide which
file(s) to edit.  Since we have a file named "foo.bak" in the working
directory, and that was the original file name in the diff, that's what
was edited.

The -p0 argument tells patch to strip zero path components from the
file names in order to find the file to edit.  You may need to read
the patch and look at the pathnames and decide whether to use -p0 or -p1.
It's really, really unlikely you'll ever need a different number of
stripped path components besides 0 or 1.
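
The usual case for -p1 is a diff whose file names carry one leading
directory, such as the a/ and b/ prefixes that git puts on its diffs.
Here's a minimal sketch of that situation.  The directory names and
greeting.txt are made up for the demonstration.

```shell
# Scratch directory, directory names and file names are assumptions.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir a b target

# "a" holds the original file, "b" holds the edited version.
printf 'hello\n' > a/greeting.txt
printf 'hello, world\n' > b/greeting.txt

# The diff's header lines will name a/greeting.txt and b/greeting.txt.
# diff exits with status 1 when the files differ, so that is not an error.
diff -u a/greeting.txt b/greeting.txt > change.diff || true

# The person applying the patch has greeting.txt with no "a/" directory,
# so one leading path component has to be stripped.
cp a/greeting.txt target/
cd target || exit 1
patch -p1 < ../change.diff
cat greeting.txt
```

With -p0, patch would look for a/greeting.txt relative to the working
directory, which doesn't exist inside "target".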

After finding the file, patch simply follows all of the instructions in
the diff file, replacing the old lines with the new lines.

patch is also a little bit clever, and if there are some slight
differences in the input file, it can adapt to them.  Unfortunately,
my example doesn't really give me the chance to demonstrate that.
Let's make another example:

hobbit:~$ cat grocery
My grocery list:
(Don't forget to bring the coupons!)

apples
bananas
cherries
deodorant
eggs
Fritos
grapes
hobbit:~$ cat grocery.diff
--- grocery	2025-10-02 07:47:04.941361665 -0400
+++ grocery.new	2025-10-02 07:47:26.957653717 -0400
@@ -2,7 +2,7 @@
 
 apples
 bananas
-cherries
+cherry pies
 deodorant
 eggs
 Fritos

What I did here was create a grocery list file, then make a copy of it,
then edit the copy, then create the diff, and then edit the original
file.

So, the diff was generated against an older version of the grocery
list, and the line numbers don't exactly match up.  But because the
part we're looking at is still intact in the newer version of the
grocery list, we can still apply the diff:

hobbit:~$ patch -p0 < grocery.diff
patching file grocery
Hunk #1 succeeded at 3 (offset 1 line).

patch is able to find the line it needs to modify, even though it was
on a different line number.  The surrounding context gives us the
confidence that the correct line is being changed.
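
If you'd like to reproduce this yourself, the steps could be scripted
roughly as follows.  This is only a sketch: the sed and head/tail
commands stand in for editing the files by hand, and the scratch
directory is an assumption.

```shell
# Scratch directory is an assumption for this sketch.
work=$(mktemp -d)
cd "$work" || exit 1

# 1. Create the original grocery list (without the coupon reminder).
printf '%s\n' 'My grocery list:' '' apples bananas cherries \
    deodorant eggs Fritos grapes > grocery

# 2. Make an edited copy, and 3. create the diff.
# diff exits with status 1 when the files differ, so that is not an error.
sed 's/^cherries$/cherry pies/' grocery > grocery.new
diff -u grocery grocery.new > grocery.diff || true
rm grocery.new

# 4. Edit the original afterwards: insert a new second line, which
# shifts everything below it down by one.
{
    head -n 1 grocery
    echo "(Don't forget to bring the coupons!)"
    tail -n +2 grocery
} > grocery.tmp
mv grocery.tmp grocery

# 5. Apply the diff, which was made against the older version.
patch -p0 < grocery.diff
cat grocery
```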

In software development, this allows a developer to distribute an
original source file, and then a set of patches which can be applied
to the source file to fix bugs or add new features.  The end user
who receives the patches may have made some of their own changes to
the source file.  But as long as the user's changes don't overlap with
the part of the file that's being patched, the patch can still be
applied successfully.

If the patch can't be applied, because the surrounding context has been
altered, or because the changed line itself has already been altered,
then patch will print an error message and will store the failed
part of the patch in a new file (usually with a ".rej" suffix).  The
end user will then have to read the rejected part of the patch and figure
out how to merge it with their own changes.
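
Here's a small sketch of a patch being rejected, assuming GNU patch.  The
file names are made up, and the target file is named on the command line
so there's no doubt about which file patch will try to edit.

```shell
# Scratch directory and file names are assumptions for this sketch.
work=$(mktemp -d)
cd "$work" || exit 1

# Create a diff that turns "cherries" into "cherry pies".
# diff exits with status 1 when the files differ, so that is not an error.
printf '%s\n' apples bananas cherries > list.old
printf '%s\n' apples bananas 'cherry pies' > list.new
diff -u list.old list.new > list.diff || true

# The end user's copy has already changed that same line differently.
printf '%s\n' apples bananas 'cherry tarts' > mylist

# patch can't find the "cherries" line it was told to replace, so it
# reports a failed hunk and exits with a nonzero status.
patch mylist < list.diff || true

# The part that couldn't be applied is saved next to the target file.
cat mylist.rej
```

The target file is left with the user's "cherry tarts" line, and
mylist.rej holds the hunk that still needs to be merged by hand.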