I have some Excel files that contain some useful information (hard to believe I know!). I want to extract just that information.
Somewhere in the file it will say something like "Location: Blah" and I want to capture just "Blah".
I can use xls2csv to turn the XLS into text and then run that through (eg) awk. Because the XLS field in question will now be in quotes, the following gets me the whole "Location: Blah" value: xls2csv myfile.xls | grep -oE '"Location:([^"]+)"' What I want is something like this: myfile: Blah (the filename prefix is because I plan on processing about 100 files this way).
I'm sure this can all be done pretty simply within awk but the rate I'm going I'm going to end up with quite a complex script, so any guidance would be very welcome!
On 03/12, Mark Rogers wrote:
I can use xls2csv to turn the XLS into text and then run that through (eg) awk. Because the XLS field in question will now be in quotes, the following gets me the whole "Location: Blah" value: xls2csv myfile.xls | grep -oE '"Location:([^"]+)"' What I want is something like this: myfile: Blah
Would summat like this do?
echo $(basename -s .xls $myfile): $(xls2csv $myfile | grep -oE '"Location:([^"]+)"' | cut -d ':' -f 2)
Steve
On 3 December 2014 at 11:53, Steve Engledow steve@offend.me.uk wrote:
Would summat like this do?
echo $(basename -s .xls $myfile): $(xls2csv $myfile | grep -oE '"Location:([^"]+)"' | cut -d ':' -f 2)
Yes, for the example case (which is all I asked, so well done!), however I'm finding that the not all fields are formatted the same (eg ":" might be "-"), so something based on selecting a specific match from a regex would be better.
I've got some way closer using gawk (note: not awk) and "match()", however I have cme up against another problem: sometimes the fied will contain carriage returns: "foo", "bar", 123, "Location: This is a location", "etc" .. and my pattern matching in gawk is obviously line based.
Of-course maybe I'm looking at this the wrong way; I'm sure there are more flexible XLS parsers that could be told to just give me what ever is in a certain cell (that looks fairly consistent so far, but I'll no doubt find out otherwise soon), and maybe even search the XLS file for a matching cell and extract that. But I have nothing yet that will do that, Google here I come...
Mark