On 26/06/12 16:52, Bev Nicolson wrote:
OK. Say I have a 300+ page pdf document. Someone else I know needs to read one or two chapters of it but I would like to extract not just the text, but the layout and photos too.
OK!
Well, you can do it with pdf-shuffler, pdf chain or pdftk (but read on before attempting).
Using pdf-shuffler:
1. Open pdf-shuffler.
2. Use the button at the bottom to open a copy of the PDF document.
3. Click on a page you want to delete.
4. Hold the Ctrl key and click on another page you want to delete. Keep selecting pages until you have a bunch, or until you are finished.
5. Click on the Delete Page(s) button. Repeat as necessary.
6. Press Export pdf to save the document.
Note: if you are doing this for chapters, you'll have a lot to delete, and it will take a long time.
Using pdf chain:
1. Open pdf chain.
2. Put a copy of the document you want to edit into a directory.
3. Go to the Split tab.
4. Press the Add button and tell the program where the file is.
5. Leave Prefix as "Sheet" and Counter as Auto.
6. Press Save and enter the folder you want to save the result to. This splits the document into one pdf file for every page.
7. Go to the Merge tab.
8. Press the plus button.
9. Click on the first sheet you want to add, press and hold the Shift key, then click on the last sheet you want to add. Press OK.
10. Press the Save button, then enter a filename for the re-assembled file.
You have now merged the individual pdf files into one new document.
Using pdftk: at a prompt, type
pdftk ORIGINAL_FILE_NAME.PDF cat 28-47 output OUTPUT_FILE_NAME.PDF
ORIGINAL_FILE_NAME is the name of the main document. OUTPUT_FILE_NAME is the name you want to give the result, e.g. Chapter5.pdf. Replace 28-47 with the appropriate page numbers; note that the range is inclusive, so 28-47 will give you pages 28 to 47.
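If you want one file per chapter, the command can be wrapped in a small script. This is only a sketch: big-document.pdf, the chapter names and the page ranges are made up for illustration, and extract_cmd is just a helper name I've chosen. It uses cat, which is pdftk's operation for pulling out page ranges, and it only prints the commands so you can check them before running them.

```shell
#!/bin/sh
# Sketch only: prints one pdftk command per chapter instead of running it.
# "big-document.pdf" and the page ranges below are made-up examples.
INPUT="big-document.pdf"

# Print the pdftk command that extracts page range $2 (e.g. 28-47) into $1.pdf.
extract_cmd() {
    printf 'pdftk "%s" cat %s output "%s.pdf"\n' "$INPUT" "$2" "$1"
}

extract_cmd Chapter5 28-47
extract_cmd Chapter7 80-102
```

Once the printed commands look right, paste them at the prompt, or pipe the script's output to sh.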
I'd recommend using pdftk and creating a file for each chapter you want to extract. You can then combine these with pdf chain, or if you're feeling adventurous, work out how to do it with pdftk. Alternatively, just supply the separate chapter files.
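For the adventurous route, here is a rough sketch of combining with pdftk itself. The file names and page ranges are hypothetical, and merge_cmd is a helper name of my own. As far as I know, cat with no page ranges joins the input files in the order given, and you can also list several ranges in one command to skip the intermediate chapter files. As before, this only prints the commands for you to check.

```shell
#!/bin/sh
# Sketch only: file names and page ranges are made-up examples.
# Prints the commands rather than running them, so they can be checked first.

# Print a pdftk command that joins the given PDFs, in order, into $1.
merge_cmd() {
    out="$1"
    shift
    printf 'pdftk'
    for f in "$@"; do
        printf ' "%s"' "$f"
    done
    printf ' cat output "%s"\n' "$out"
}

# Join two chapter files that were extracted earlier.
merge_cmd Chapters.pdf Chapter5.pdf Chapter7.pdf

# Or pull both page ranges straight from the original in a single step.
printf 'pdftk "%s" cat %s output "%s"\n' big-document.pdf "28-47 80-102" Chapters.pdf
```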
Hope this helps. Steve