Given the following:

<?xml version="1.0" encoding="utf-8"?>
<Project>
  <Network>
    <Unit>
      <UnitType>Type1</UnitType>
      <SerialNumber>123456</SerialNumber>
    </Unit>
    <Unit>
      <UnitType>Type2</UnitType>
      <SerialNumber>abcdef</SerialNumber>
    </Unit>
    [... snip...]
  </Network>
</Project>

...I need to extract all the serial numbers where UnitType is "Type2".
I have never managed to get my head around XPath etc., and all the examples I find seem either too general or too complex for me to make much progress. I always end up writing code that parses it as plain text, which is a massive cop-out.
How best can I do this correctly using XML tools, and which tools should I use?
Mark
I like XPath! The w3schools tutorial is OK.
xmllint xmltest.xml --xpath '//Unit[UnitType="Type2"]/SerialNumber/text()'
will do it. It's a bit 'lazy': the leading // matches every Unit element anywhere in the document that has a UnitType child equal to 'Type2', not just the ones under <Network>. If that's not a problem, keep it simple. The trailing text() returns just the text content, so the output doesn't include the <SerialNumber> tags.
The xmllint command comes from the libxml2 package (libxml2-utils on Debian and Ubuntu).
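If the looser // match ever does become a problem, one option is to spell out the full path from the root. The sketch below assumes Unit elements only ever appear directly under /Project/Network, as in the sample. It writes the two sample units to xmltest.xml (the snipped part is left out) and needs xmllint on the PATH.

```shell
#!/bin/sh
# Write the two sample units from the post to a scratch file
# (the "[... snip...]" part of the real file is omitted here).
cat > xmltest.xml <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<Project>
  <Network>
    <Unit>
      <UnitType>Type1</UnitType>
      <SerialNumber>123456</SerialNumber>
    </Unit>
    <Unit>
      <UnitType>Type2</UnitType>
      <SerialNumber>abcdef</SerialNumber>
    </Unit>
  </Network>
</Project>
EOF

# Anchored path: only Unit elements directly under /Project/Network are
# considered, unlike //Unit, which matches a Unit element anywhere.
xmllint --xpath '/Project/Network/Unit[UnitType="Type2"]/SerialNumber/text()' xmltest.xml
```

Because the path is anchored, a Unit that turns up somewhere else in the document is simply ignored.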
Cheers Neil
On 07/11/2013 10:35, Mark Rogers wrote:
On 7 November 2013 11:18, Neil Sedger <alug@moley.org.uk> wrote:
> I like XPath! The w3schools tutorial is OK.

Thanks, I'll take a look.

> xmllint xmltest.xml --xpath '//Unit[UnitType="Type2"]/SerialNumber/text()'
That's almost perfect, thanks! It certainly finds all the data, but at the moment it strings all the results together into one long string (no whitespace of any kind between them). How do I get each result on its own line?
Mark
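One portable workaround avoids printing the whole node-set in one go. First ask xmllint for the count() of the matching units, then fetch each serial number separately with string() and add the newline yourself. The sketch below assumes only xmllint and a POSIX shell. The third Type2 unit in the sample file is made up purely so there is more than one match. It runs xmllint once per result, which is fine for small files but slow for large ones.

```shell
#!/bin/sh
# Sample input: the two units from the post plus one made-up Type2 unit
# (standing in for the snipped part) so that there are several matches.
cat > xmltest.xml <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<Project>
  <Network>
    <Unit><UnitType>Type1</UnitType><SerialNumber>123456</SerialNumber></Unit>
    <Unit><UnitType>Type2</UnitType><SerialNumber>abcdef</SerialNumber></Unit>
    <Unit><UnitType>Type2</UnitType><SerialNumber>ghijkl</SerialNumber></Unit>
  </Network>
</Project>
EOF

# XPath for the matching Unit elements (the expression from the thread,
# without the SerialNumber/text() tail).
units='//Unit[UnitType="Type2"]'

# count() yields a number, e.g. 2 for the sample above.
n=$(xmllint --xpath "count($units)" xmltest.xml)

# Fetch the i-th serial number on its own and print it with an explicit
# newline; $(...) drops any trailing newline xmllint itself may emit.
i=1
while [ "$i" -le "$n" ]; do
  serial=$(xmllint --xpath "string(($units)[$i]/SerialNumber)" xmltest.xml)
  printf '%s\n' "$serial"
  i=$((i + 1))
done
```

If xmlstarlet happens to be installed, it can do the same in a single call: xmlstarlet sel -t -m '//Unit[UnitType="Type2"]' -v SerialNumber -n xmltest.xml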