For years I've used "catdoc" to cat old MS Word docs. Is there an equivalent command for docx? (I don't want to have to open LibreOffice or whatever.)
--
Steve Mynott <steve.mynott@gmail.com>
cv25519/ECF8B611205B447E091246AF959E3D6197190DD5
Steve Mynott <steve.mynott@gmail.com> wrote:
> For years I've used "catdoc" to cat old MS Word docs.
> Is there an equivalent command for docx?
I am not aware of anything ready-made, but you can handle 90% of them by using unzip to extract content.xml (I think... or it may be document.xml), stripping the XML tags and running the result through fmt. Hope that helps!
MJ Ray <mjr@phonecoop.coop> wrote:
> I am not aware of anything ready-made, but you can handle 90% of them by using unzip to extract content.xml (I think... or it may be document.xml), stripping the XML tags and running the result through fmt.
Or I may have been misled by the mention of LibreOffice, and that is for ODF files! As you were.
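MJ Ray's unzip-and-strip idea works for both formats once the right archive member is chosen: ODF text documents keep their body in content.xml, while .docx files normally keep it in word/document.xml. Below is a minimal sketch under those assumptions, using awk for the tag stripping instead of w3m. `catoffice` and `xml_to_text` are hypothetical names; it assumes unzip, awk and fmt are installed and a shell that supports `local` (bash, dash, zsh). It is a rough filter, not an XML parser: tags split across lines, tables, footnotes and headers get no special treatment.

```shell
# Sketch of the "unzip, strip the XML tags, fmt" approach.
# Hypothetical helper names; assumes unzip, awk and fmt are installed.

# Read document XML on stdin; print plain text with a blank line
# after each paragraph so that fmt keeps paragraphs apart.
xml_to_text() {
  awk '{
    gsub(/<w:tab\/>/, "\t")                       # Word tab element -> tab
    gsub(/<\/(w:p|text:p|text:h)>/, "\n\n")       # paragraph/heading end -> blank line
    gsub(/<[^>]*>/, "")                           # drop all remaining tags
    gsub(/&lt;/, "<"); gsub(/&gt;/, ">")          # decode predefined XML entities
    gsub(/&quot;/, "\""); gsub(/&apos;/, "\047")
    gsub(/&amp;/, "\\&")                          # &amp; last so "&amp;lt;" -> "&lt;"
    print
  }'
}

# Choose the archive member holding the body text, then extract and clean it.
catoffice() {
  local member
  case "$1" in
    *.docx) member=word/document.xml ;;  # Office Open XML (Word)
    *.odt)  member=content.xml ;;        # OpenDocument Text (LibreOffice)
    *) echo "catoffice: unsupported file type: $1" >&2; return 1 ;;
  esac
  unzip -p "$1" "$member" | xml_to_text | fmt
}
```

Usage would be along the lines of `catoffice report.docx | less`. Emitting a blank line at each paragraph end matters because fmt refills consecutive non-blank lines into one paragraph.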
On Fri, Jan 22, 2021 at 10:35:30AM +0000, MJ Ray wrote:
> MJ Ray <mjr@phonecoop.coop> wrote:
>> I am not aware of anything ready-made, but you can handle 90% of them by using unzip to extract content.xml (I think... or it may be document.xml), stripping the XML tags and running the result through fmt.
> Or I may have been misled by the mention of LibreOffice, and that is for ODF files!
Pretty sure you're right, and a docx is a zip file too. I haven't tried it, but docx2txt (http://docx2txt.sourceforge.net/) looks like it might do what the original poster wants?
J.
--
Evil is as evil does, but evil doesn't wear shoes.
On Fri, 22 Jan 2021 at 10:43, Jonathan McDowell <noodles@earth.li> wrote:
> I haven't tried it, but docx2txt (http://docx2txt.sourceforge.net/) looks like it might do what the original poster wants?
I'll try it and see if it works any better for formatting than:
catdocx () { unzip -p "$1" word/document.xml | w3m -B -T text/html | fmt; }
--
Steve Mynott <steve.mynott@gmail.com>
cv25519/ECF8B611205B447E091246AF959E3D6197190DD5
On Fri, 22 Jan 2021 at 10:46, Steve Mynott <steve.mynott@gmail.com> wrote:
> On Fri, 22 Jan 2021 at 10:43, Jonathan McDowell <noodles@earth.li> wrote:
>> I haven't tried it, but docx2txt (http://docx2txt.sourceforge.net/) looks like it might do what the original poster wants?
> I'll try it and see if it works any better for formatting than:
> catdocx () { unzip -p "$1" word/document.xml | w3m -B -T text/html | fmt; }
docx2txt works very well (better than my shell function) and is exactly what I wanted!
--
Steve Mynott <steve.mynott@gmail.com>
cv25519/ECF8B611205B447E091246AF959E3D6197190DD5
On Fri, Jan 22, 2021 at 10:56:53AM +0000, Steve Mynott wrote:
> On Fri, 22 Jan 2021 at 10:46, Steve Mynott <steve.mynott@gmail.com> wrote:
>> On Fri, 22 Jan 2021 at 10:43, Jonathan McDowell <noodles@earth.li> wrote:
>>> I haven't tried it, but docx2txt (http://docx2txt.sourceforge.net/) looks like it might do what the original poster wants?
>> I'll try it and see if it works any better for formatting than:
>> catdocx () { unzip -p "$1" word/document.xml | w3m -B -T text/html | fmt; }
> docx2txt works very well (better than my shell function) and is exactly what I wanted!
There's more than one docx2txt; I've found a Perl one and a Python one.
--
Chris Green
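Because more than one docx2txt exists, it can help to check which flavour is first on your PATH. A minimal sketch, assuming the installed command is a script whose `#!` line names its interpreter; `docx2txt_flavour` is a hypothetical helper name, and if a distribution installs a shell wrapper instead of the script itself, it will report the wrapper's interpreter rather than Perl or Python.

```shell
# Hypothetical helper: guess which docx2txt is first on $PATH by
# reading the interpreter named on its "#!" line.
docx2txt_flavour() {
  local exe first
  exe=$(command -v docx2txt) || { echo "docx2txt not found on PATH" >&2; return 1; }
  IFS= read -r first < "$exe" || true   # first line only; a missing newline is fine
  case "$first" in
    '#!'*perl*)   echo perl ;;
    '#!'*python*) echo python ;;
    '#!'*)        echo "other: ${first#'#!'}" ;;
    *)            echo "not a script (possibly a compiled binary)" ;;
  esac
}
```

Reading the `#!` line is only a heuristic; it says nothing about which upstream project the script came from, just which language it is written in.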
On Fri, 22 Jan 2021 at 12:02, Chris Green <cl@isbd.net> wrote:
> There's more than one docx2txt; I've found a Perl one and a Python one.
I just installed the one in the Debian repos, which is the Perl one (http://docx2txt.sourceforge.net). It mentions possible use in data recovery of corrupted files, which is useful.
--
Steve Mynott <steve.mynott@gmail.com>
cv25519/ECF8B611205B447E091246AF959E3D6197190DD5
Participants (4): Chris Green, Jonathan McDowell, MJ Ray, Steve Mynott