
Re: Editing a text file with sed



Joe Hart wrote:
On Wednesday 29 August 2007 17:01:20 Adam W wrote:
Single quotes go around the whole sed script unless you are using a
separate sed script file.
Try: sed 's/\n//' 1.txt > 2.txt

 - Adam
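A note on why that command leaves the file unchanged: sed reads its input one line at a time and strips the trailing newline before running the script, so the pattern space never contains a `\n` for `s/\n//` to match. Below is a minimal sketch of a sed approach that works around this. It assumes paragraphs are separated by a single, truly empty line, so lines holding only spaces or a carriage return will not count as paragraph breaks. It collects each paragraph in the hold space with `H`, swaps it back with `x`, joins the lines with `s/\n/ /g`, and re-adds the blank line with `G`. The function name `join_paras_sed` is made up for illustration.

```sh
# join_paras_sed is a hypothetical helper name, not a standard command.
# Assumes paragraphs are separated by exactly one empty line.
join_paras_sed() {
  # Non-empty line: append it to the hold space (H) and, unless it is the
  #   last line, delete it so nothing is printed yet ($!d).
  # Empty line, or the last line: swap the collected paragraph into the
  #   pattern space (x), drop the leading newline that H added, turn the
  #   remaining newlines into spaces, and, unless this is the last line,
  #   append the hold space (now the empty line) with G so that a blank
  #   line follows the paragraph.
  sed -e '/^$/!{H;$!d;}' -e 'x;s/^\n//;s/\n/ /g;$!G' "$@"
}
```

Usage would be `join_paras_sed 1.txt > 2.txt`. Runs of several empty lines come out with extra blank lines, so this is only a starting point for messier input.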

On 8/29/07, Florian Kulzer <florian.kulzer+debian@icfo.es> wrote:
On Wed, Aug 29, 2007 at 15:17:46 +0200, Joe Hart wrote:
I am having trouble using sed to edit text files. Here's a good example
of what I am looking for:

<begin 1.txt>
This is a test
file, what I am
trying to do is get the lines to join.

It isn't a complicated thing,
but I also want to keep the paragraphs
separate.
</end 1.txt>
[...]

Ideally I'd just like to have a script do it, but I cannot figure
out how to go about it, as sed doesn't seem to be working.
Why not use Perl?

$ perl -p0e '$_=~s/(.)\n(.)/$1 $2/g' < 1.txt
This is a test file, what I am trying to do is get the lines to join.

It isn't a complicated thing, but I also want to keep the paragraphs
separate.

$ perl -p0e '$_=~s/(.)\n(.|\n)/$1 $2/g;$_=~s/ \n/\n/g' < 1.txt
This is a test file, what I am trying to do is get the lines to join.
It isn't a complicated thing, but I also want to keep the paragraphs
separate.

--
Regards,            | http://users.icfo.es/Florian.Kulzer
          Florian   |

Both are very good solutions for the example I gave, although the first Perl snippet seems to skip a line when I try it. On the real files, however, I'm afraid they don't work. What I am actually trying to do is reformat books that I downloaded from Project Gutenberg. Many of them have a hard return at the end of every line, either because the OCR software was poorly written or because they were typed by people used to old-fashioned typewriters, which require you to hit the Return/Enter key at the end of every line. That practice is frowned upon today.


I just reformat them:
fmt -s -w 120 The\ Day\ of\ the\ Jackal\ by\ Frederick\ Forsyth.txt > The_Day_of_the_Jackal.txt

Hugo
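One caveat on the `fmt` route, per the GNU coreutils documentation: `-s` (`--split-only`) only splits long lines and does not join short ones, so on hard-wrapped text it may leave the line breaks where they were. Without `-s`, `fmt` refills each paragraph (paragraphs being separated by blank lines) up to the `-w` width. A quick way to see the difference, assuming GNU `fmt`:

```sh
# Assumes GNU coreutils fmt.
# -s (split only): short lines are left as they are.
printf 'a\nb\n\nc\n' | fmt -s -w 120
# Without -s: lines within a paragraph are joined; the blank line stays.
printf 'a\nb\n\nc\n' | fmt -w 120
```

`fmt` still wraps at the given width, so a paragraph ends up on a single line only if it is shorter than `-w`.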




If I have to, I'll write a program that reads character by character and looks for the line breaks, but as I said before, there should already be tools that can do it. It seems that a regex \n, or even a $, should be enough, but alas, that doesn't give me decent output.
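On why `\n` and `$` do not help here: sed and awk hand the script one line at a time with its newline already removed, and `$` only anchors the end of that line rather than matching the newline itself. The program Joe describes can be written line by line instead of character by character: buffer non-blank lines and print the buffer whenever a blank line, or the end of input, arrives. The sketch below also strips a trailing carriage return, on the assumption that some downloaded files may use DOS (CRLF) line endings; `file book.txt` reports "with CRLF line terminators" in that case. `join_paras_awk` and `book.txt` are hypothetical names.

```sh
# join_paras_awk is a hypothetical helper name.
join_paras_awk() {
  awk '
    # Drop a DOS carriage return, if present (assumes CRLF input is possible).
    { sub(/\r$/, "") }
    # Blank line (empty or whitespace only): flush the paragraph, keep the blank.
    NF == 0 { if (buf != "") print buf; buf = ""; print ""; next }
    # Text line: append it to the current paragraph, separated by a space.
    { buf = (buf == "") ? $0 : (buf " " $0) }
    # End of input: flush whatever is left.
    END { if (buf != "") print buf }
  ' "$@"
}
```

Usage would be `join_paras_awk book.txt > book_joined.txt`. Unlike the regex one-liners, runs of blank lines are passed through unchanged, and whitespace-only lines count as paragraph breaks.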

At least I have been pointed in the right direction, and I have learned some regex in the process. Always good to learn new things.

Joe




