I have a few PDFs which are scanned newsletters.
Printed, each is a sheet of A3 folded to create a 4-page A4 newsletter. Scanned, what I have is a PDF comprising two A3 pages (each therefore having two A4 pages on it).
As a result, the PDF when viewed effectively shows pages in the order 4 (back page), 1 (front page), 2, 3. (That's one A3 sheet containing pages 4&1, and another containing 2&3, as a two-page PDF.)
I want to turn this into a new PDF comprising four A4 pages in the obvious 1,2,3,4 page order.
Suggestions?
If the pages were A4 I could use pdftk: pdftk in.pdf cat 2-4 1 output out.pdf .. but pdftk can currently only "see" two A3 pages so that won't work.
Another consideration: The scan appears to have included an OCR layer which makes much of the text content searchable and I'd like to not lose that.
The original newsletters are lost to time so I can't rescan. (This exercise is for a website that's creating a historical archive of which these PDFs will form a part.)
On 18 February 2018 at 10:58, Mark Rogers mark@more-solutions.co.uk wrote:
> If the pages were A4 I could use pdftk: pdftk in.pdf cat 2-4 1 output out.pdf .. but pdftk can currently only "see" two A3 pages so that won't work.
As is so often the way, formulating the question in a way suitable to post to a mailing list also prompted new phrases to Google for, so I solved this myself:
    sudo apt install pdfposter pdftk
    pdfposter in.pdf -s1 tmp.pdf
    pdftk tmp.pdf cat 2 4 5 1 output a4.pdf
(For some reason, pdfposter gives me a 6-page result, where pages 1, 2, 4 and 5 have my document on them and pages 3 and 6 are blank. I could probably solve that if it mattered, but as it's just an interim step easily fixed by pdftk I didn't bother.)
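For anyone adapting this to a different layout, here is a rough sketch of how the "2 4 5 1" page list falls out of the intermediate file. It assumes tmp.pdf holds newsletter pages 4, 1, blank, 2, 3, blank in that order, which is inferred from the description above rather than checked against pdfposter itself. The cat_order function is a hypothetical helper, not part of pdftk or pdfposter, and the script only prints the pdftk command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: derive the page list for pdftk's "cat" operation.
# Arguments: for each page of tmp.pdf, in order, the newsletter page
# number it shows, or 0 if that page is blank.
cat_order() {
    pos=0
    for label in "$@"; do
        pos=$((pos + 1))
        # Skip blank pages; emit "newsletter-page tmp-page" pairs.
        [ "$label" -eq 0 ] || echo "$label $pos"
    done | sort -n | awk '{ printf "%s%s", sep, $2; sep = " " } END { print "" }'
}

# Assumed content of tmp.pdf: pages 4, 1, blank, 2, 3, blank.
order=$(cat_order 4 1 0 2 3 0)

# Print (rather than run) the resulting command.
echo "pdftk tmp.pdf cat $order output a4.pdf"
```

If a different scan comes out in another order, only the argument list passed to cat_order should need changing.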
The OCR layer appears to have been retained correctly by this process.