
(visual) diff for large files



At work we've been discussing (below) 'diff' running out of memory. I've
tried to see if 'rdiff' can help, but I have no idea how to recover the
differences from the "delta" output file. I've seen that FreeBSD has a
diff utility (e.g. 2bsd-diff) that does NOT read in all of the files in
one go and thus has fewer problems with 'memory exhausted', but I cannot
find a Debian equivalent. Therefore:

Does anybody know of a FreeBSD->Debian ports site?

I did try downloading the 2.11bsd-diff source but it doesn't build on my
system (unsurprisingly!)
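
For outputs whose lines stay aligned (same line count, values changed in
place), a constant-memory comparison can be sketched in Python. This is
only a minimal sketch under that assumption and NOT a replacement for
diff: it does no resynchronisation, so a single inserted or deleted line
makes every later line look different. The names lockstep_diff and
max_report are made up for illustration.

```python
from itertools import zip_longest


def lockstep_diff(path_a, path_b, max_report=20):
    """Yield (line_number, line_a, line_b) for each pair of lines that differ.

    Both files are read one line at a time, so memory use does not grow
    with file size. If one file is shorter, its missing lines are
    reported as None. Stops after max_report differences.
    """
    reported = 0
    with open(path_a, errors="replace") as fa, \
         open(path_b, errors="replace") as fb:
        # zip_longest pulls one line from each file per iteration.
        for number, (a, b) in enumerate(zip_longest(fa, fb), start=1):
            if a != b:
                yield number, a, b
                reported += 1
                if reported >= max_report:
                    return
```

Looping over lockstep_diff("run1.out", "run2.out") (hypothetical file
names) and printing each tuple gives the line numbers to cut out and
pass to a visual tool, rather than segmenting the files blindly.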

Thanks, Michael


-------- Forwarded Message --------
> From: Michael Bane <michael.bane@manchester.ac.uk>
> To: michael <michael.bane@manchester.ac.uk>
> Subject: Re: [MAN-UNIX-GROUP] (visual) diff for large files
> Date: Thu, 01 May 2008 12:50:22 +0100
> 
> On Wed, 2008-04-30 at 17:23 +0100, michael wrote:
> > Like many, I now generate large files (for the purposes of this
> > discussion even 1.5GB is large). They are ASCII and I wish to 'diff'
> > them visually.
> > 
> > For reasonably sized files (up to a few GB) 'diff' will tell me where
> > the differences are for given lines but sometimes it's a bit tricky to
> > interpret. (And does it take a while (and SO SO SO much memory!) for
> > large files!)
> > 
> > 'sdiff' seems to do the job, producing side-by-side output, again for
> > reasonably sized files... however it seems to show only the first 50 (if
> > that) cols of output, which isn't much help for me (each of my rows is
> > about 150 chars wide).
> > 
> > My favourite tool, to date, has been 'xxdiff' (as per SGI's 'xdiff') but
> > that falls over on files over about 1.5GB. It seems as though xdiff calls
> > diff and the memory usage is somehow doubled compared to 'diff' alone...
> > 
> > I've quickly tried 'tkdiff' and that falls over even more often.
> > 
> > So, my question is, given the >>GB files that are in common use today,
> > especially on high performance computing machines (with say 16GB RAM and
> > 32GB swap), how do people compare their outputs? I'm presuming most of
> > the utilities fall over since they try and keep everything in memory and
> > once that's full they fall over (with the side effect of bringing that
> > machine painfully
> > S...L.......O...........W..............L..............Y to its knees)
> > 
> > What do people on horace use for large files? Or on your own HPC
> > clusters or even desktops? I've spent way too many hours this month
> > cutting files into segments in order to pass to xxdiff...
> > 
> 
> It appears that FreeBSD has solved the problem of diff starting off by
> reading the files in full (and thus exhausting memory)...
> http://www.freebsdsoftware.org/textproc/2bsd-diff.html
> but does anybody know of FreeBSD to Debian (or Fedora) ports?
> 
> Thanks, Michael
> 
> 
> > Many thanks, Michael

