paragraph conversion (was Re: Which Diff tool could I use for visually comparing two text files where Word Wrap is possible?)
On Wed, 5 Apr 2023 davidson wrote:
On Wed, 5 Apr 2023 Susmita/Rajib wrote:
On 04/04/2023, davidson <davidson@freevolt.org> wrote:
[trim]
Attached (unless the listserv software has nuked it) is a sed script
"flow" (with verbose comments) which might serve your needs. (Since
you have not exhibited here any of the text you are working with, I
can only play the role of speculative optimist.)
For trial purposes make a new, empty directory. Here we'll pretend
that directory is called "testing". Put "flow" in that directory. Then
do
$ cd testing # Make testing your current directory
$ chmod u+x flow # Make flow executable
$ PATH="$PATH:$PWD" # Now "flow" means something, for this session
$ icdiff-flow () { icdiff <( flow <"$1" ) <( flow <"$2" ) ; }
and then you should be able to test it out in that same shell session:
$ flow document # see if flow works as intended with a single document
$ icdiff-flow document1 document2 # see if it works well with icdiff
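If icdiff is not at hand, a rough sanity check along the same lines is to flow two copies of one text that differ only in where the lines are wrapped, and confirm that the results come out identical. What follows is only a sketch: flow_sketch is a hypothetical stand-in that inlines the commands of the attached script so the snippet runs on its own (GNU sed is assumed), and the sample text and the temporary directory are made up for illustration.

```shell
# Hypothetical stand-in for the attached "flow" script (assumes GNU sed).
flow_sketch () {
  sed '
    1 { h; d; }
    /^[^[:blank:]]/ { H; $ { g; s/\(.\)\n\(.\)/\1 \2/g; q; }; d; }
    /^\([[:blank:]]\|$\)/ { x; s/\(.\)\n\(.\)/\1 \2/g; $ { p; g; }; }
  ' "$@"
}

# Two made-up documents: same words, different line wrapping.
tmp=$(mktemp -d)
printf '%s\n' 'The quick brown' 'fox jumps over' 'the lazy dog.' '' \
              'Second paragraph' 'here.' > "$tmp/document1"
printf '%s\n' 'The quick brown fox' 'jumps over the' 'lazy dog.' '' \
              'Second' 'paragraph here.' > "$tmp/document2"

out1=$(flow_sketch "$tmp/document1")
out2=$(flow_sketch "$tmp/document2")

# After flowing, the wrapping differences should have disappeared.
if [ "$out1" = "$out2" ]; then
  echo "same text after flowing"
else
  echo "texts differ"
fi
rm -r "$tmp"
```

With the real script on your PATH, replacing flow_sketch with flow should behave the same way.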
Attached is a more adequate version of "flow", for converting plain
text paragraphs, in flush or plain style*, to single lines. Unlike the
previous version, version 2.0 does not fumble on the last line of the
document and fail to print material before quitting.
* A "plain" paragraph begins with its first line indented, whereas a
"flush" paragraph is separated from its neighbors by blank
lines.
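To make the two styles concrete, here is a rough heuristic for guessing which one a file uses. It is only an illustration of the definitions above and is not part of "flow": paragraph_style is a hypothetical helper, and the two sample files are made up.

```shell
# Hypothetical helper: guess the paragraph style of a plain-text file.
paragraph_style () {
  if grep -q '^[[:blank:]]*$' "$1"; then
    echo flush    # some empty or whitespace-only line separates paragraphs
  elif grep -q '^[[:blank:]]' "$1"; then
    echo plain    # no blank lines, but at least one line starts indented
  else
    echo single   # neither marker found: looks like one unbroken paragraph
  fi
}

# Made-up samples, one in each style.
tmp=$(mktemp -d)
printf '%s\n' 'One paragraph,' 'wrapped.' '' 'Another one.' > "$tmp/flush.txt"
printf '%s\n' '   One paragraph,' 'wrapped.' '   Another one.' > "$tmp/plain.txt"

style_flush=$(paragraph_style "$tmp/flush.txt")
style_plain=$(paragraph_style "$tmp/plain.txt")
echo "flush.txt: $style_flush"
echo "plain.txt: $style_plain"
rm -r "$tmp"
```

A file that mixes both conventions is reported as flush here, which is a simplification of this sketch.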
--
Sometimes it pays to have squirrels in your head running around making
you question everything. -- Clive Robinson
#!/usr/bin/env -S sed -f
# Flow text. (Remove intra-paragraph newlines.)
# Version 2.0
# First line of document initialises storage.
1 {
h # 1. A copy goes to storage.
d # 2. The original (still on the workbench) is discarded and a new cycle begins.
}
# When a line starts with a non-whitespace character,
# we assume it belongs to the paragraph accumulating in storage.
/^[^[:blank:]]/ {
H # 1. A copy goes to storage.
$ { # In case this line terminates the document...
g # ...Get everything out of storage.
s/\(.\)\n\(.\)/\1 \2/g # ...Replace every interstitial newline with a space.
q # ...Print and quit NOW.
}
d # 2. Toss out the original (the one still on the workbench) and begin new cycle
}
# When a line does not start with a non-whitespace character (i.e., it is empty or begins with whitespace),
# we assume it begins a new paragraph.
# We further assume that whatever is in storage may now be formatted (and printed) as a paragraph.
/^\([[:blank:]]\|$\)/ {
x # 1. Swap: A copy goes to storage, and what was in storage lands on the workbench.
s/\(.\)\n\(.\)/\1 \2/g # 2. Format: Replace every interstitial newline with a space. (Then print it.)
$ { # In case this line terminates the document...
p # ...The stuff we just formatted gets printed,
g # ...and then retrieve the line we just stored, and print it too before we quit.
}
}