I have some relatively large* XML files (about 3MB) which I need to work with (view certainly, edit would be nice).
They're not formatted - there are no line breaks in the files - which means most of the editors I've tried to open them with have struggled, and Firefox took about half an hour to open one of them.
What are good (preferably GUI) tools for this kind of thing?
I'm on Ubuntu so Gnome-biased is great, but to be honest anything that does the job is great enough. The application which creates them can comfortably handle a valid reformatted file, so if the lack of formatting is lost on a save that's not an issue. I don't really want to have to preprocess the files to tidy them before opening, though, as the files change frequently.
[*] Personally I find it hard to believe that 3MB is "large" in today's world, but I guess it is quite a long line if you're expecting line breaks.
On Tuesday 16 October 2007 16:33:31 Mark Rogers wrote:
They're not formatted - there are no line breaks in the files -
[...]
I don't really want to have to preprocess the files to tidy them before opening, though, as the files change frequently.
Therefore this may not be all that helpful, but I use this to make unreadable XML readable:
$ xmllint --format file.xml [1]
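For anyone who would rather script the same reformatting step, here is a minimal sketch using only the Python standard library. It does not call xmllint; the function name pretty_print and the sample string are made up for illustration, and minidom reads the whole document into memory, which should be acceptable for files of a few MB.

```python
import xml.dom.minidom


def pretty_print(xml_text: str, indent: str = "  ") -> str:
    """Return xml_text re-indented so that elements sit on their own lines."""
    document = xml.dom.minidom.parseString(xml_text)
    return document.toprettyxml(indent=indent)


if __name__ == "__main__":
    # Hypothetical one-line input standing in for an unformatted file.
    sample = "<config><item id='1'>a</item><item id='2'>b</item></config>"
    print(pretty_print(sample))
```

The result could be written to a temporary file and opened in whichever editor copes best with it.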
What are good (preferably GUI) tools for this kind of thing?
Again, it doesn't quite fit your requirements, but I use nxml-mode[2] in emacs and I haven't yet managed to make it struggle with large files.
Another thing you could consider (but, again, straying a bit far from your question) is using XPath to get the bits you want. Lots of tools do that including xmlstarlet[3], XML::XPath[4] (includes an xpath command), and (my favourite) Python lxml[5].
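As a rough illustration of the XPath idea, here is a sketch using the limited XPath subset in Python's standard xml.etree.ElementTree (lxml offers fuller XPath through its xpath() method, but is a separate install). The element and attribute names (plant, line, motor, kw) are invented for the example and have nothing to do with the files under discussion.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the real files will have different element names.
SAMPLE = (
    '<plant>'
    '<line name="A"><motor id="m1" kw="5"/><motor id="m2" kw="11"/></line>'
    '<line name="B"><motor id="m3" kw="5"/></line>'
    '</plant>'
)


def motor_ids(xml_text, kw):
    """Return the id of every <motor> element whose kw attribute equals kw."""
    root = ET.fromstring(xml_text)
    # ".//motor" searches at any depth; "[@kw='...']" filters on an attribute.
    return [motor.get("id") for motor in root.findall(f".//motor[@kw='{kw}']")]


if __name__ == "__main__":
    print(motor_ids(SAMPLE, "5"))
```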
Cheers, Richard
[1] http://xmlsoft.org/xmllint.html
[2] http://www.thaiopensource.com/nxml-mode/
[3] http://xmlstar.sourceforge.net/
[4] http://search.cpan.org/dist/XML-XPath/
[5] http://codespeak.net/lxml/
Richard Lewis wrote:
Therefore this may not be all that helpful, but I use this to make unreadable XML readable:
$ xmllint --format file.xml
xmllint is installed so I'll play with it - a useful workaround so thanks.
Another thing you could consider (but, again, straying a bit far from your question) is using XPath to get the bits you want.
Knowing what I want would be the hard part!
Mostly I just need to open the file for "browsing", maybe making occasional (and small scale, non-structural) changes. If a text editor opened it and displayed it formatted (having it all on one line isn't much good for browsing!) it would go far enough for my needs.
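When it is not yet clear what to look for, one possible first step is to summarise the structure rather than read the whole file. The sketch below is an assumption about what such a "browsing aid" might look like, not a tool mentioned in this thread: it streams through the document with ElementTree's iterparse and counts how often each element path occurs. The name path_counts is made up.

```python
import io
from collections import Counter
import xml.etree.ElementTree as ET


def path_counts(source):
    """Count occurrences of each element path (e.g. 'a/b/c') in an XML source.

    source may be a filename or a binary file object.
    """
    counts = Counter()
    stack = []  # tags of the elements currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            counts["/".join(stack)] += 1
        else:
            stack.pop()
            elem.clear()  # drop the finished element's content to limit memory use
    return counts


if __name__ == "__main__":
    # Tiny hypothetical document standing in for a real file.
    demo = io.BytesIO(b"<a><b><c/></b><b/></a>")
    for path, count in sorted(path_counts(demo).items()):
        print(count, path)
```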
Thanks for the pointers though.
Mark Rogers wrote:
xmllint is installed so I'll play with it - a useful workaround so thanks.
I'm starting to think that I'm missing the point here...
I have formatted the file with xmllint, and it still takes a long time to load in any of the text editors (or Firefox) that I've tried. The formatted file is 76000 lines long, 3.7MB in size.
Obviously the line length was not the problem after all. Any suggestions for a good GUI editor that can handle large files?
I never had problems with large files in (eg) UltraEdit on Windows, but I've not succeeded in getting that running under Wine. (And I still don't think 4MB is large!)
Maybe I should give Eclipse another go; it eats memory for breakfast :-)
Richard Lewis richardlewis@fastmail.co.uk wrote:
On Tuesday 16 October 2007 16:33:31 Mark Rogers wrote:
What are good (preferably GUI) tools for this kind of thing?
Again, it doesn't quite fit your requirements, but I use nxml-mode[2] in emacs and I haven't yet managed to make it struggle with large files.
Outside Emacs, Conglomerate has recently changed maintainer http://lists.copyleft.no/pipermail/conglomerate-devel/2007-July/003749.html - its last one reportedly uses nXML-mode now http://lists.copyleft.no/pipermail/conglomerate-devel/2007-July/003744.html - so watch/pester http://www.conglomerate.org/ for a release soon.
Regards,
For XML editing, I've used <oXygen /> XML Editor in the past ... it's a Java-based commercial tool based on the Apache Xerces library. I don't know how well it copes with large documents but it was very good for the smaller uses I put it to - and I believe it's well-regarded in the area.
Perhaps try the trial version?
Peter.
samwise wrote:
For XML editing, I've used <oXygen /> XML Editor in the past ... it's a Java-based commercial tool based on the Apache Xerces library.
I just took a quick look at this, thanks for the pointer. I didn't get as far as the trial though as the price for the product is a bit steep for what I need to use it for (it's probably worthy of the price for its target audience, but at least for now I can't justify the cost purely to look at some large(-ish) XML files).
When I start doing more with XML (which I really should do) I'll look at <oXygen /> again.
I'm a bit surprised my quests for a good GUI text editor and XML viewer have struggled under Linux so far, one area I thought I'd be spoilt for choice! Maybe that's the problem, too much choice, too little focus. Or do "real programmers" still use text-mode editors?
On Thursday 18 October 2007 09:05:48 Mark Rogers wrote:
When I start doing more with XML (which I really should do)
XML has its place.
I'll look at <oXygen /> again.
I have some XML geek acquaintances (mainly the TEI[1] people) and I know they use <oXygen /> a lot.
I'm a bit surprised my quests for a good GUI text editor and XML viewer have struggled under Linux so far, one area I thought I'd be spoilt for choice! Maybe that's the problem, too much choice, too little focus. Or do "real programmers" still use text-mode editors?
Far be it from me to speak for real programmers, but I use ordinary emacs in a terminal emulator. And I know that I frequently get slated by other ALUG members for this; they argue that vi is a much better choice. A superficial glance at the Web seems to suggest that most people who know what a text editor is for prefer (or like to claim that they prefer) vi [2]. Interestingly, I've just spent a few minutes Googling for similar surveys but haven't found anything citable. Perhaps text editors are just too fundamental a tool for people either to make a big fuss about them (though emacs/vi flame wars would suggest otherwise) or to spend time developing new ones.
Cheers, Richard
[1] http://www.tei-c.org/
[2] http://linuxhelp.blogspot.com/2007/08/poll-shows-majority-favor-vi-as-their....
Richard Lewis wrote:
Perhaps text editors are just too fundamental a tool for people either to make a big fuss about them (though emacs/vi flame wars would suggest otherwise) or to spend time developing new ones.
Personally I do a lot from terminal windows (often via SSH) and frequently fire up a textmode editor within the GUI.
But for a lot of things I think a good GUI editor ought to be better (anything presentational, like syntax highlighting, printing, etc works better in a GUI even where it can be done from textmode, imho). After all, lynx (and others like it) are really good browsers when you need them, but I still use Firefox for day to day browsing! I also have one eye on a desire to convince others to move away from Windows and they're not likely to be tempted by textmode.
I should probably try to get my head around emacs, having never learnt to love vi (from a terminal I tend to use joe, having had exposure to Wordstar compatible editors in the dim and distant past).
Richard Lewis wrote:
Perhaps text editors are just too fundamental a tool for people either to make a big fuss about them (though emacs/vi flame wars would suggest otherwise) or to spend time developing new ones.
Personally I do a lot from terminal windows (often via SSH) and frequently fire up a textmode editor within the GUI.
But for a lot of things I think a good GUI editor ought to be better (anything presentational, like syntax highlighting, printing, etc works better in a GUI even where it can be done from textmode, imho). After all, lynx (and others like it) are really good browsers when you need them, but I still use Firefox for day to day browsing! I also have one eye on a desire to convince others to move away from Windows and they're not likely to be tempted by textmode.
I should probably try to get my head around emacs, having never learnt to love vi (from a terminal I tend to use joe, having had exposure to Wordstar compatible editors in the dim and distant past).
I have to admit that I'm a die-hard fan of the text-mode editor for everyday use. So as not to give the game away, I'll only mention that I tend to use one beginning with 'v' and ending in 'i'. Having said that, large and complex XML files are not fit for human consumption and might use the help of a good GUI ...
It has always amazed me that, considering the prevalence of XML, open source 'industrial quality' XML editors are scarce. I came across one recently at http://xml-copy-editor.sourceforge.net/. I've not had the opportunity to give it a good test drive, but so far it looks promising ...
Safe
On Thursday 18 October 2007 11:16:02 Safe Hammad wrote:
It has always amazed me that, considering the prevalence of XML, open source 'industrial quality' XML editors are scarce. I came across one recently at http://xml-copy-editor.sourceforge.net/. I've not had the opportunity to give it a good test drive, but so far it looks promising ...
It would be interesting to find out just how prevalent XML really is. Where is XML really being applied? I always get the impression (though I have no evidence to back this up) that XML has become an industry buzz-word and is discussed with enthusiasm by managers but with skepticism bordering on disdain[1] by IT professionals. The availability of XML tools could (given my clear bias for attempting to strengthen my argument) reflect this dichotomy: Assuming a preference amongst management for commercial (therefore accountable) tools and IT professionals (particularly the more geeky ones) for open source tools, notice how it's Microsoft[2] and Oracle[3] who are pushing the most up-to-date XML technologies such as XQuery, while open source tools (with the exceptions of Saxon[4], which, let's face it, is written in Java, and eXist[5], again written in Satan's language and not aimed at the business market) are currently sticking to the earlier standards (e.g. libxml, PostgreSQL[6], Xalan/Xerces). (Also note, however, Qt's recent addition[7] of XQuery to its framework. Where does this sit?) Is this because the open source community simply doesn't care about XSLT 2 and XQuery? Certainly Torvalds' famous rant[8] would seem to back this position up. And notice how AJAX is now becoming AJAJ[9].
XML has its place, but I'm not sure it's in business information systems.
Cheers, Richard
[1] http://burningbird.net/writing/the-parable-of-the-languages/
[2] http://blogs.msdn.com/mrorke/
[3] http://www.oracle.com/technology/tech/xml/xquery/index.html
[4] http://saxon.sourceforge.net/saxon7.9/using-xquery.html
[5] http://exist.sourceforge.net/
[6] http://developer.postgresql.org/index.php/XML_Support
[7] http://doc.trolltech.com/main-snapshot/qtxquery.html
[8] http://mail.bitmover.com/pipermail/lmbench-users/2003-November/000076.html
[9] http://www.json.org/xml.html
On Thu, Oct 18, 2007 at 12:13:35PM +0100, Richard Lewis wrote:
On Thursday 18 October 2007 11:16:02 Safe Hammad wrote:
It has always amazed me that, considering the prevalence of XML, open source 'industrial quality' XML editors are scarce. I came across one recently at http://xml-copy-editor.sourceforge.net/. I've not had the opportunity to give it a good test drive, but so far it looks promising ...
It would be interesting to find out just how prevalent XML really is. Where is XML really being applied? I always get the impression (though I have no evidence to back this up) that XML has become an industry buzz-word and is discussed with enthusiasm by managers but with skepticism bordering on disdain[1] by IT professionals. The availability of XML tools could (given my clear bias for attempting to strengthen my argument) reflect this dichotomy: Assuming a preference amongst management for commercial (therefore accountable) tools and IT professionals (particularly the more geeky ones) for open source tools, notice how it's Microsoft[2] and Oracle[3] who are pushing the most up-to-date XML technologies such as XQuery, while open source tools (with the exceptions of Saxon[4], which, let's face it, is written in Java, and eXist[5], again written in Satan's language and not aimed at the business market) are currently sticking to the earlier standards (e.g. libxml, PostgreSQL[6], Xalan/Xerces). (Also note, however, Qt's recent addition[7] of XQuery to its framework. Where does this sit?) Is this because the open source community simply doesn't care about XSLT 2 and XQuery? Certainly Torvalds' famous rant[8] would seem to back this position up. And notice how AJAX is now becoming AJAJ[9].
XML has its place, but I'm not sure it's in business information systems.
Our systems where I work use XML for just about everything that's passed between systems. It's very heavily used here as our product runs on multiple platforms running different OS and much of its raison d'être is moving data about. We are Oracle based for our databases so that's also a factor I guess.
I agree somewhat with your sentiments about it though, it's very user unfriendly as a medium for outputting to logs for debugging etc.
Richard Lewis wrote:
It would be interesting to find out just how prevalent XML really is. Where is XML really being applied?
I find it frequently, although rarely where it is "necessary" - usually a sign that the person who wrote the application thinks XML is great and used it. However it's infinitely better than proprietary formats, and I can't help thinking of the fact that an OpenOffice .odt file is basically an XML file and its inserted objects (images etc) packed up in a .zip file.
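The .odt point is easy to check from a script. The sketch below is only illustrative: odt_root_tag and make_stand_in_odt are made-up names, and the stand-in archive holds a trivial content.xml rather than a real document. In a genuine .odt the main document body lives in a member called content.xml, and its root tag will carry an OpenDocument namespace prefix.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def odt_root_tag(odt_file):
    """Open an .odt (a zip archive) and return the root tag of its content.xml."""
    with zipfile.ZipFile(odt_file) as archive:
        with archive.open("content.xml") as content:
            return ET.parse(content).getroot().tag


def make_stand_in_odt():
    """Build a tiny in-memory zip that mimics the layout of an .odt (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", "<document-content><body/></document-content>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(odt_root_tag(make_stand_in_odt()))
```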
What frustrates me is that understanding XML is easy at the basic level, but there seems to me to be a big jump between that and doing anything useful with it. Once you "get it" the world is probably your oyster.
The XML files which led me to start this thread are from an industrial software package, the previous version of which used a proprietary format. We've just completed a very large high profile job without difficulty because the XML file format allowed us to auto-create a huge amount of the application that would have been hand-coded otherwise. (I did all that without really needing to get to grips with XML and not touching XSLT etc, it was sufficient to just treat the XML as a big text file).
And notice how AJAX is now becoming AJAJ[9].
Yes, I'd noticed this too.
Hi Richard
On Thursday 18 October 2007 12:13, Richard Lewis wrote:
It would be interesting to find out just how prevalent XML really is. Where is XML really being applied?
StepNC is one area where XML is being pushed as the future of CAD/CAM/CAE (a subject of interest to me).
Personally, I prefer XML style for configuration files without the overhead of DTD, XSLT, and all the other bloat/buzzwords that often go with it.
XML has its place, but I'm not sure it's in business information systems.
As a way to store structured data in a human readable format, it has its uses. As a solution to all/any problem, very doubtful.
Regards, Paul.
Richard Lewis richardlewis@fastmail.co.uk wrote:
It would be interesting to find out just how prevalent XML really is. Where is XML really being applied? [...]
I use it extensively for inter-system communication. One problem is that some systems which claim to be providing XML don't provide XML (even Wordpress seems to get it wrong, especially if there's a plugin or two) so you do need to handle the failure case, which I think upsets some people, but it's something programmers should do anyway. You can't trust any system these days - not even yours.
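Handling the failure case might look something like the following sketch, which uses Python's standard ElementTree parser. The function name parse_or_none and the decision to return None are assumptions made for the example; a real system might log, retry or reject the message instead.

```python
import xml.etree.ElementTree as ET


def parse_or_none(payload):
    """Return the root element of payload, or None if it is not well-formed XML."""
    try:
        return ET.fromstring(payload)
    except ET.ParseError as error:
        # error.position is a (line, column) tuple pointing at the problem.
        line, column = error.position
        print(f"not well-formed XML at line {line}, column {column}: {error}")
        return None


if __name__ == "__main__":
    # Hypothetical payloads: one well-formed, one with a mismatched tag.
    for payload in ("<feed><entry/></feed>", "<feed><entry></feed>"):
        root = parse_or_none(payload)
        print("ok" if root is not None else "rejected")
```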
It's also quite a nice way to pass S-expressions to things that don't use Lisp ;-)
Regards,
Safe Hammad wrote:
It has always amazed me that, considering the prevalence of XML, open source 'industrial quality' XML editors are scarce. I came across one recently at http://xml-copy-editor.sourceforge.net/. I've not had the opportunity to give it a good test drive, but so far it looks promising ...
Thanks for that pointer; I've just installed it to trial.
It opened the ~4MB reformatted XML file in about 4 seconds and looked really good. I then tried the unformatted file (the one without any line breaks) and it pretty much killed it :-( I'll see if I can file a bug report.
Mark Rogers wrote:
I'll see if I can file a bug report.
Well, it's using the standard SourceForge bug tracker, so I posted a bug and XML Copy Editor's author is already looking into it. So this project gets lots of bonus points for quality of support!
If only commercial applications could support their paying customers so well.
On Thu, Oct 18, 2007 at 09:05:48AM +0100, Mark Rogers wrote:
I'm a bit surprised my quests for a good GUI text editor and XML viewer have struggled under Linux so far, one area I thought I'd be spoilt for choice! Maybe that's the problem, too much choice, too little focus. Or do "real programmers" still use text-mode editors?
To some extent - yes.
For anyone who uses a wide variety of languages and file formats it's easier to use a very familiar editor (i.e. use the same one for everything) and live with the resulting 'rough edges', than it is to use an unfamiliar editor dedicated to the one use. Well, that's what I find anyway.
I use vile/xvile which is a vi clone with Emacs aspirations. It has a very complete set of syntax colour highlighting filters which recognise just about every language and format I have thrown at it. This gives at least some help in things like XML and HTML and my fingers can work the same way that they do when writing C or Python or when composing E-Mails and writing news postings (since I use text based programs for those too, for much the same reason, a familiar editor).
A further advantage (for me) of using a vi clone is that when I connect to one of our target systems at work to look at error logs or whatever I can use 'real' vi without too much pain. They are Solaris systems so vi is all that you've got 'out of the box' at least on the command line.