For years I've used "catdoc" to cat old msword docs.
Is there an equivalent command for docx?
(I don't want to have to open LibreOffice or whatever.)
Steve Mynott steve.mynott@gmail.com wrote:
> For years I've used "catdoc" to cat old msword docs.
> Is there an equivalent command for docx?
I am not aware of anything ready made but you can handle 90% of them by using unzip to extract content.xml (I think... Or it may be document.xml) and stripping the xml tags and fmt-ing it.
Hope that helps!
MJ Ray mjr@phonecoop.coop wrote:
> I am not aware of anything ready made but you can handle 90% of them by using unzip to extract content.xml (I think... Or it may be document.xml) and stripping the xml tags and fmt-ing it.
Or I may have been misled by mention of libreoffice and that is for ODF files!
As you were.
On Fri, Jan 22, 2021 at 10:35:30AM +0000, MJ Ray wrote:
> MJ Ray mjr@phonecoop.coop wrote:
>> I am not aware of anything ready made but you can handle 90% of them by using unzip to extract content.xml (I think... Or it may be document.xml) and stripping the xml tags and fmt-ing it.
> Or I may have been misled by mention of libreoffice and that is for ODF files!
Pretty sure you're right and a docx is a zip file too.
I haven't tried it, but docx2txt (http://docx2txt.sourceforge.net/) looks like it might do what the original poster wants?
J.
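The point that a docx is a zip file can be checked from the shell. This is a minimal sketch, not something posted in the thread: the helper name looks_like_zip is hypothetical, and it only tests that the file begins with the two bytes "PK" that zip archives start with, so it cannot tell a docx from any other zip file.

```shell
# Hypothetical helper (not from the thread): succeed if the file
# starts with "PK", the first two bytes of a zip archive.
looks_like_zip() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Example use (commented out so the block runs without a real file):
# looks_like_zip report.docx && echo "zip container"
```

If the check passes, unzip -l on the same file should list the archive members, and for a docx one of them is expected to be word/document.xml.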
On Fri, 22 Jan 2021 at 10:43, Jonathan McDowell noodles@earth.li wrote:
> I haven't tried it, but docx2txt (http://docx2txt.sourceforge.net/) looks like it might do what the original poster wants?
I'll try it and see if it works any better for formatting than
catdocx () { unzip -p "$1" word/document.xml | w3m -B -T text/html | fmt; }
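The function above leans on w3m to discard the markup. The "unzip, strip the tags, fmt" idea suggested earlier in the thread can also be sketched with only tr and sed. The names docx_xml_to_text and catdocx_sketch are hypothetical, and the sketch makes simplifying assumptions: it treats each closing </w:p> tag as a paragraph break, decodes only the five predefined XML entities, and ignores tabs, tables, footnotes and everything else.

```shell
# Hypothetical helper: read WordprocessingML on stdin, write plain text.
docx_xml_to_text() {
    # Join the input into one line so no tag is split across lines,
    # turn each paragraph end into a blank line, drop all remaining
    # tags, then decode the predefined XML entities (&amp; last, so
    # that "&amp;lt;" becomes "&lt;" rather than "<").
    tr -d '\r\n' |
    sed -e 's/<\/w:p>/\
\
/g' \
        -e 's/<[^>]*>//g' \
        -e 's/&lt;/</g' -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' -e "s/&apos;/'/g" \
        -e 's/&amp;/\&/g'
}

# Hypothetical wrapper: extract the main document part and reflow it.
catdocx_sketch() {
    unzip -p "$1" word/document.xml | docx_xml_to_text | fmt
}

# Example use (commented out so the block runs without a real file):
# catdocx_sketch report.docx
```

The blank line between paragraphs is there so that fmt reflows each paragraph separately instead of merging them into one.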
On Fri, 22 Jan 2021 at 10:46, Steve Mynott steve.mynott@gmail.com wrote:
> On Fri, 22 Jan 2021 at 10:43, Jonathan McDowell noodles@earth.li wrote:
>> I haven't tried it, but docx2txt (http://docx2txt.sourceforge.net/) looks like it might do what the original poster wants?
> I'll try it and see if it works any better for formatting than
> catdocx () { unzip -p "$1" word/document.xml | w3m -B -T text/html | fmt; }
docx2txt works very well (better than my shell function) and is exactly what I wanted!
On Fri, Jan 22, 2021 at 10:56:53AM +0000, Steve Mynott wrote:
> On Fri, 22 Jan 2021 at 10:46, Steve Mynott steve.mynott@gmail.com wrote:
>> On Fri, 22 Jan 2021 at 10:43, Jonathan McDowell noodles@earth.li wrote:
>>> I haven't tried it, but docx2txt (http://docx2txt.sourceforge.net/) looks like it might do what the original poster wants?
>> I'll try it and see if it works any better for formatting than
>> catdocx () { unzip -p "$1" word/document.xml | w3m -B -T text/html | fmt; }
> docx2txt works very well (better than my shell function) and is exactly what I wanted!
There's more than one docx2txt; I've found a Perl one and a Python one.
On Fri, 22 Jan 2021 at 12:02, Chris Green cl@isbd.net wrote:
> There's more than one docx2txt; I've found a Perl one and a Python one.
I just installed the one in the Debian repos, which is the Perl one at http://docx2txt.sourceforge.net
It mentions possible use in recovering data from corrupted files, which is useful.