
Re: ps2pdf: why differences in different instances using unchanged PS source file



On Tue 15 Dec 2020 at 11:06:33 (+1300), David Warring wrote:
> On Mon, Dec 14, 2020 at 10:14 AM Tom Browder <tom.browder@gmail.com> wrote:
> > On Sun, Dec 13, 2020 at 11:03 The Wanderer <wanderer@fastmail.fm> wrote:
> > > On 2020-12-13 at 11:40, Tom Browder wrote:
> > > > personalized calendar for my wife. This year I have been cleaning up
> > > > my old Perl generating code (in preparation for converting it to Raku
> > > > [https://raku.org]) and noticed I am getting a different pdf output
> > > > for each run, even when the PS output source file is unchanged!
> > ...
> > > What observation leads you to notice that the files are different?
> >
> > When I have both files under git revision control and committed, a
> > rerun shows no change in the PS but a change in the PDF.
> >
> > I read the rest of this post and your additional post. Thank you for
> > your forensic information. As long as I know what has changed, thanks
> > to you, I am content for the moment.
> > I have a Raku online friend (in CC above) who is an expert on PDF and
> > has built voluminous tools with his PDF modules in Raku. I think he
> > will either know about the problem or possibly know how to fix it
> > since he deals with the binary output and modifying it after the fact.
> >

> Yes, quite right. As The Wanderer points out, the CreationDate and ModDate
> differ, as do the UUIDs in the Catalog Metadata and the trailer ID.
> 
> I've found that deleting the optional dates from the Info dictionary, the
> Metadata entry from the Catalog and resetting the ID does seem to be enough
> to make output PDFs identical. In the simple case at least.
> 
> This gist https://gist.github.com/dwarring/1e4e056d84d6fe125262bba1da1f58fb
> does this using the Raku PDF module. Usage is:
> 
> p2s2pdf-strip.raku in.pdf [out.pdf]     # post-process ps2pdf output
> 
> I've used the lower-level PDF::Reader interface, which doesn't often get
> used directly. In this case, though, we need to stop the Raku PDF module
> from updating the same fields itself. The PDF also needs to be rewritten
> in full, rather than being incrementally updated.

Another way of tackling this is to use faketime when creating the PDF
file, which deals with the times and uuids. The only difference left is
the pair of ID strings (each 32 hex chars) at the end, which I assume
are random (when generated by ps2pdf).
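
As a rough sketch of the faketime approach (assuming faketime and
ps2pdf are both on the PATH; the timestamp, file names and function
names here are only illustrative), the same PS file can be converted
twice under the same fake start time:

```python
import subprocess


def faketime_ps2pdf_cmd(ps_path, pdf_path, timestamp="2020-12-15 00:00:00"):
    """Build the argument list: faketime TIMESTAMP ps2pdf IN.ps OUT.pdf.

    The default timestamp is arbitrary; any fixed value will do.
    """
    return ["faketime", timestamp, "ps2pdf", ps_path, pdf_path]


def convert_twice(ps_path, out_a="a.pdf", out_b="b.pdf"):
    """Convert the same PS file twice, each run starting at the same fake time."""
    for out in (out_a, out_b):
        subprocess.run(faketime_ps2pdf_cmd(ps_path, out), check=True)
```

Calling convert_twice("calendar.ps") would leave a.pdf and b.pdf ready
for comparison.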

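One way of checking that the IDs are the only remaining difference is
to run   cmp -bl a.pdf b.pdf   and subtract the byte offset at the
beginning of the first line of output from that in the last line. The
difference should not exceed 65 (2×32+2-1), i.e. two 32-char strings
plus the two delimiter characters between them, less one.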

If writing a script to perform this check, one might parameterise this
number, as I did notice that convert (from ImageMagick) must take a
different approach to generating its IDs. The strings are 64 chars
long, but identical from run to run. So although just using faketime
can produce a perfect match in this case, it's a warning that the IDs
may have different properties when generated by other conversion tools.
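
A minimal sketch of such a check, written in Python rather than by
parsing cmp output, might look like the following. The function names
and the max_span parameter are my own invention; the default of 65 is
the ps2pdf figure from above, and it assumes the two files are the
same length.

```python
def diff_span(a: bytes, b: bytes):
    """Return (first, last) 1-based offsets of differing bytes, or None.

    Offsets are 1-based to match what cmp -l reports.
    """
    if len(a) != len(b):
        raise ValueError("files differ in length")
    diffs = [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if not diffs:
        return None
    return diffs[0], diffs[-1]


def only_ids_differ(a: bytes, b: bytes, max_span: int = 65) -> bool:
    """True if the data is identical, or every difference lies within
    a window of max_span + 1 bytes (assumed to be the trailer IDs)."""
    span = diff_span(a, b)
    return span is None or span[1] - span[0] <= max_span


def check_files(path_a: str, path_b: str, max_span: int = 65) -> bool:
    """Read both files and apply only_ids_differ to their contents."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return only_ids_differ(fa.read(), fb.read(), max_span)
```

For example, check_files("a.pdf", "b.pdf") applies the ps2pdf limit,
and max_span can be changed for a tool whose IDs have a different
length. Note that this only checks where the differences fall, not
that they really are inside the ID strings.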

Cheers,
David.

