Re: [Alug] Acrobat/Ghostscript printing

7 Feb 2002


On 07-Feb-02 MJ Ray wrote:
...
Ted Harding:
...
However, after 'ps2pdf' this PS file converts to a PDF
file of size 40667 bytes -- actually smaller than the
original! (of course, there is compression involved here;
if I turn off compression I end up with 207810
I notice that ps.gz is missing from your figures?
On purpose. The compression referred to is the 'Flate'
compression (implemented as gzip when you use ps2pdf)
internal to the PDF file. Only chunks of the PDF file
are internally compressed in this way, not the whole file.
Since Flate compression is the norm for PDF, that's why
I made a point of mentioning the uncompressed outcome
as well.
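
To illustrate the point about internal compression, here is
a minimal Python sketch. It assumes a made-up page content
stream and a hypothetical helper, pdf_stream(); it is not
what ps2pdf does internally, only the general shape of a PDF
stream object. The dictionary in front of the stream stays
readable text, and only the bytes between "stream" and
"endstream" are deflated (Python's zlib module produces the
format that the /FlateDecode filter expects).

```python
import zlib

# Made-up page content: the same text-drawing operators repeated,
# standing in for a real (and highly repetitive) content stream.
content = b"BT /F1 12 Tf 72 720 Td (Hello, world) Tj ET\n" * 200


def pdf_stream(data: bytes, flate: bool) -> bytes:
    """Wrap data as the body of a PDF stream object (hypothetical helper).

    With flate=True only the stream bytes are compressed; the
    dictionary that describes them is left as plain text.
    """
    body = zlib.compress(data) if flate else data
    filt = b" /Filter /FlateDecode" if flate else b""
    header = b"<< /Length " + str(len(body)).encode("ascii") + filt + b" >>\n"
    return header + b"stream\n" + body + b"\nendstream\n"


plain = pdf_stream(content, flate=False)
packed = pdf_stream(content, flate=True)

print("uncompressed stream object:", len(plain), "bytes")
print("Flate-compressed stream object:", len(packed), "bytes")
```

Because each stream is compressed on its own, running gzip
over the finished PDF as well usually gains little.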
...
Anyway, PDF is a good idea poorly executed, I think.
By introducing multiple incompatible versions of the
format and making implementers have to play "chase the
document trail" if they want to do a full version,
they've pretty much guaranteed it's always going to
cause pain.  (The document trail comment is based on
comment from the CL-PDF library authors, not personal
experience.)
Yes, I've also heard comment to this effect (and others).
But in my experience such pain is only rarely encountered.
...
Oh, remember that xpdf (or even pdf2ps and ps2ascii in
some cases) recovers you the original text for editing.
[I think you mean pdftotext here? Out of the same stable
 as xpdf of course].
You can grab small quantities of text from xpdf by
cut&paste with the mouse, quite successfully. There
seems to be no way (at least in the xpdf which I have)
of saving to text a whole PDF file opened with xpdf.
While pdftotext will (usually) save the text content,
often there is so much garbage as well (including masses
of space characters) that cleaning this up prior to
editing would be a horrible pain.
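
For the "masses of space characters" part at least, a first
pass can be automated. The following is only a sketch: the
function name tidy() is made up, and it assumes you do not
need the column alignment that the padding was trying to
preserve (for tables, you probably do, which is where the
pain starts).

```python
import re


def tidy(text: str) -> str:
    """Collapse runs of spaces/tabs and runs of blank lines.

    Hypothetical clean-up pass for text extracted from a PDF;
    it throws away any column alignment in the input.
    """
    lines = [re.sub(r"[ \t]+", " ", ln).strip() for ln in text.splitlines()]
    out = []
    for ln in lines:
        # Keep a blank line only if the previous kept line was not blank.
        if ln or (out and out[-1]):
            out.append(ln)
    return "\n".join(out).strip()


sample = "  Name      Qty   \n\n\n\n  Widget     3  \n"
print(tidy(sample))
```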
pdf2ps and ps2ascii can do a fair job when they work --
which, on the whole, is when the files are very simple.
All sorts of things go wrong with more complicated layouts
-- chunks missing, spurious "text", etc. I have some PS
examples (mostly docs consisting mainly of tables) where,
out of say 40K of text characters (i.e. characters that
get printed and are meant to be read), maybe a few hundred
are extracted by ps2ascii.
The point to remember about PS (and from this point of view
it applies also to PDF) is that it is primarily designed
to place marks at precise places on a page. These can be
placed in any order; in an extreme example, a PS file
could be constructed which rendered a page of print by
taking the characters in random order, each with the
coordinates of its position, and planting them as they
come. The printed result would be the same; but ps2ascii
would make nothing whatever sensible of it.
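
The extreme example can be simulated in a few lines of
Python. This is a toy model, not PostScript: a "page" is
just a list of (x, y, character) marks on an assumed fixed
grid, and the names place_marks(), render() and
naive_extract() are invented for the illustration.
render() stands in for the printer, which only cares about
coordinates; naive_extract() stands in for a tool that reads
characters in the order they appear in the file.

```python
import random

X0, Y0, DX, DY = 72, 720, 7, 14  # assumed origin and grid spacing


def place_marks(lines):
    """Turn lines of text into (x, y, char) marks, in reading order."""
    marks = []
    for row, line in enumerate(lines):
        for col, ch in enumerate(line):
            if ch != " ":
                marks.append((X0 + col * DX, Y0 - row * DY, ch))
    return marks


def render(marks):
    """Rebuild the page purely from coordinates, ignoring file order."""
    rows = {}
    for x, y, ch in marks:
        rows.setdefault(y, {})[(x - X0) // DX] = ch
    page = []
    for y in sorted(rows, reverse=True):  # highest y is the top line
        cols = rows[y]
        page.append("".join(cols.get(c, " ") for c in range(max(cols) + 1)))
    return page


def naive_extract(marks):
    """Read characters in file order, as a simple extractor would."""
    return "".join(ch for _, _, ch in marks)


page = ["The printed result", "would be the same"]
marks = place_marks(page)
shuffled = marks[:]
random.Random(0).shuffle(shuffled)  # fixed seed so runs are repeatable

print(render(shuffled))         # same page as before the shuffle
print(naive_extract(shuffled))  # the same characters, in a useless order
```

A smarter extractor could sort the marks by position, as
render() does here, but real pages with varying fonts,
kerning and multi-column or tabular layouts make that much
harder than on this fixed grid.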
In summary: if such tools work for a particular file,
fine. But, in my direct experience, there are so many
catastrophic exceptions to this that I don't consider
them as serious options to be relied on.
...
In many cases, xhtml is a better format for document
interchange.
Horses for courses, of course; and where xhtml may be useful,
XML may be even better. If you're not fussed about how
the person at the other end will format the layout, then
it's probably OK (depending on what kind of document it
is). If what you want them to see is precisely what you see,
then it can get very hairy.
Cheers,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) Ted.Harding@nessie.mcc.ac.uk
Fax-to-email: +44 (0)870 167 1972
Date: 07-Feb-02                                       Time: 17:22:02
------------------------------ XFMail ------------------------------
