I'm trying out edbrowse. I've been looking for an easy to script browser with full javascript support for a long time, and it looks like it may be the ticket.
I want to script it like this, but I don't know how to echo a new line:
echo "e http://yahoo.com\n1,$n" | edbrowse
How do I properly echo a new line? I'm sure that the answer will no doubt make me slap my own head...
Thanks, Richard
On Wed, Jul 07, 2010 at 07:20:03PM +0100, Richard Parsons wrote:
> I'm trying out edbrowse. I've been looking for an easy to script browser with full javascript support for a long time, and it looks like it may be the ticket.
> I want to script it like this, but I don't know how to echo a new line:
> echo "e http://yahoo.com\n1,$n" | edbrowse
> How do I properly echo a new line? I'm sure that the answer will no doubt make me slap my own head...
echo -e "e http://yahoo.com\n1,$n" | edbrowse
(You might want \$n instead of just $n as well - I don't know if n is actually a variable or $n a command.)
J.
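A portable alternative to echo -e, sketched under the assumption of a plain POSIX sh: -e is not universal (dash's built-in echo, for example, prints the "-e" itself), whereas printf interprets \n the same way in every shell. Single quotes also keep $n away from the shell, so no backslash is needed. The helper name edbrowse_cmds is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print an edbrowse command script, one command
# per line. printf expands \n portably, unlike echo -e. The single
# quotes keep $n literal, so edbrowse receives 1,$n exactly as written.
edbrowse_cmds() {
  printf 'e %s\n1,$n\n' "$1"
}

# Only run edbrowse if it is actually installed.
if command -v edbrowse >/dev/null 2>&1; then
  edbrowse_cmds http://yahoo.com/ | edbrowse
fi
```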
On 07 Jul 19:50, Jonathan McDowell wrote:
> echo -e "e http://yahoo.com\n1,$n" | edbrowse
> (You might want \$n instead of just $n as well - I don't know if n is actually a variable or $n a command.)
Which would work... but I'm now bemused as to why the echo of the first line...
echo '1,$n' | edbrowse http://yahoo.com/
Would work just as well...
Or:
edbrowse <<EOF
e http://yahoo.com/
1,n$
EOF
Which removes the echo entirely, and makes it obvious what to run...
Cheers,
On Thu, 8 Jul 2010, Brett Parker wrote:
> but I'm now bemused as to why the echo of the first line...
> echo '1,$n' | edbrowse http://yahoo.com/
> Would work just as well...
> Or:
> edbrowse <<EOF
> e http://yahoo.com/
> 1,n$
> EOF
> Which removes the echo entirely, and makes it obvious what to run...
Yes, the "$n" is a literal string to be passed to edbrowse. "$" means the last line of the file and so "1,$n" means "print the lines from the first line to the last line including the line numbers".
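edbrowse's line commands are modelled on ed, so the 1,$n range described above can be tried on a local file with plain ed. This is a sketch that assumes ed is installed (the demo is skipped otherwise); show_numbered is a hypothetical helper name. The n command prints each addressed line preceded by its line number and a tab.

```shell
#!/bin/sh
# Hypothetical helper: print every line of a file with its line number,
# using the ed command 1,$n (1,$ = first through last line, n = number).
# -s suppresses ed's byte-count messages; q quits cleanly afterwards.
show_numbered() {
  printf '1,$n\nq\n' | ed -s "$1"
}

if command -v ed >/dev/null 2>&1; then
  tmp=$(mktemp)
  printf 'alpha\nbeta\ngamma\n' > "$tmp"
  show_numbered "$tmp"   # 1<TAB>alpha, 2<TAB>beta, 3<TAB>gamma
  rm -f "$tmp"
fi
```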
I like your final version best with the EOF. It's very clear, thanks. I'm quite new to bash scripting.
By the way, I'm very excited to have found a browser that is easily scriptable. Suddenly it seems a breeze to write a shell script that will web scrape. Has everyone always been able to do this easily with other programs and I just never noticed?
Richard
On 08 Jul 12:33, Richard Parsons wrote:
> Yes, the "$n" is a literal string to be passed to edbrowse. "$" means the last line of the file and so "1,$n" means "print the lines from the first line to the last line including the line numbers".
> I like your final version best with the EOF. It's very clear, thanks. I'm quite new to bash scripting.
> By the way, I'm very excited to have found a browser that is easily scriptable. Suddenly it seems a breeze to write a shell script that will web scrape. Has everyone always been able to do this easily with other programs and I just never noticed?
I must admit I tend to use python's mechanize module for it instead, and not worry about javascript at all... But then, I tend to assume that people aren't building non-accessible websites, and when they are, I tend to not want to use 'em ;)
(Oh, and in that example, I should have put a \ in front of the $, but as there was nothing following the dollar, it wasn't resolved as a variable :)
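The backslash point is easy to check. A minimal sketch in POSIX sh, where n is a throwaway variable set only so the expansion is visible: with an unquoted EOF the shell expands $n, \$ protects a single dollar sign, and quoting the delimiter ('EOF') turns expansion off for the whole body.

```shell
#!/bin/sh
# Throwaway variable, set only so the expansion is visible.
n=SURPRISE

# Unquoted delimiter: the shell expands $n inside the body.
unquoted() {
cat <<EOF
1,$n
EOF
}

# Escaped dollar: \$ stops expansion of that one character.
escaped() {
cat <<EOF
1,\$n
EOF
}

# Quoted delimiter: the body is passed through literally.
quoted() {
cat <<'EOF'
1,$n
EOF
}

unquoted   # prints 1,SURPRISE
escaped    # prints 1,$n
quoted     # prints 1,$n
```

So writing edbrowse <<'EOF' would hand 1,$n to edbrowse untouched, with no backslashes needed anywhere in the body.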
I have, before, scripted a fair few things using urllib + urllib2 and HTMLParser in the python base, but that's for when you need something with very few dependencies... and the newer httplib is quite nice. By far, though, mechanize's Browser 'just works' for scripting.
The usual shell type way would be to use one of:
wget -O - http://that.place/
w3m -dump_source http://that.place/
w3m -dump http://that.place/
lynx -dump http://that.place/
The first two will actually just dump the HTML. The last two do slightly more interesting things :)
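The tools above can be folded into a small fallback function. This is a sketch only: the name dump_text and the order of preference are illustrative assumptions. It uses the -dump flag for w3m and lynx as listed above, and for wget the -O - form plus -q to silence the progress output.

```shell
#!/bin/sh
# Hypothetical helper: dump a page as rendered text using whichever of
# w3m or lynx is installed, falling back to raw HTML via wget.
dump_text() {
  url=$1
  if command -v w3m >/dev/null 2>&1; then
    w3m -dump "$url"
  elif command -v lynx >/dev/null 2>&1; then
    lynx -dump "$url"
  elif command -v wget >/dev/null 2>&1; then
    wget -q -O - "$url"   # raw HTML, not rendered text
  else
    echo "dump_text: no w3m, lynx or wget found" >&2
    return 1
  fi
}

# Usage: dump_text http://that.place/ | less
```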
But, it does look like edbrowse is quite sensible in its handling of javascript, so I'll make a note of it. Thanks.
Cheers,
On 08/07/10 12:33, Richard Parsons wrote:
> By the way, I'm very excited to have found a browser that is easily scriptable. Suddenly it seems a breeze to write a shell script that will web scrape. Has everyone always been able to do this easily with other programs and I just never noticed?
I only have one script that reads a web page and I use w3m for that. I must look into edbrowse to see what it offers.
nev
On Thu, 8 Jul 2010, nev young wrote:
> I only have one script that reads a web page and I use w3m for that. I must look into edbrowse to see what it offers.
Thanks Nev, and Brett, for your replies.
I really like the unix philosophy: "Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface."
It seems such a step backwards to me to have a program that cannot easily be controlled from the commandline.
Currently I use alpine for email, but I want to start using edbrowse. It seems a real boon to me if learning to *use* a program is the same as learning to *script* it.
Programs like Firefox and Evolution are clearly marvellous achievements, but what a shame that the rise of the GUI necessitates the fall of interoperability between programs.
iDunno@sommitrealweird.co.uk wrote:
> I have, before, scripted a fair few things using urllib + urllib2 and HTMLParser in the python base, but that's for when you need something with very few dependencies... and the newer httplib is quite nice. By far, though, mechanize's Browser 'just works' for scripting.
> The usual shell type way would be to use one of:
> wget -O - http://that.place/
> w3m -dump_source http://that.place/
> w3m -dump http://that.place/
> lynx -dump http://that.place/
> The first two will actually just dump the HTML. The last two do slightly more interesting things :)
> But, it does look like edbrowse is quite sensible in its handling of javascript, so I'll make a note of it. Thanks.
Another thing to make note of for people who need a middle ground between a whole browser and HTMLParser would be BeautifulSoup, which is wonderful for scraping web content, especially if you're used to using HTMLParser. I say middle ground because while it's not a standard library, it covers lots of bases like bad markup and so on, without the overhead of running a browser.
> Programs like Firefox and Evolution are clearly marvellous achievements, but what a shame that the rise of the GUI necessitates the fall of interoperability between programs.
Out of interest, the rise of the GUI means no such thing. Two of the large desktop suites, GNOME and KDE, have had scripting interfaces for a while to allow users and apps to communicate with each other.
In fact, before KDE and its apps were completely ruined, I used the superlatively useful DCOP for years to make desktop and development apps do whatever I wished, including scraping content derived using Konqueror, running an automatic spider for broken links on a website when an in-place edit over ssh had just been done with Kate, changing all sorts of things when my bluetooth-enabled handset was near the machine, managing playlists, pausing torrents and stuff if youtube was opened (on a very old machine with a bad connection), alerting me (really sending a contact via bluetooth called "alert") when something was finished, shutting down the machine when a DVD finished playing, etc. etc. Pretty much every KDE app was scriptable (although some of that stuff will have involved me working around any limitations). In fact, it was the principal thing that made that DE so much better than anything else I have used before or since.
Of course, DCOP had its own problems, which is part of the reason for KDE moving on from it, and KDE itself is far from being recommendable again, but there is a healthy number of weirdos like myself who like to be able to do that kind of thing, so I can't see it disappearing soon.
> Another thing to make note of for people who need a middle ground between a whole browser and HTMLParser would be BeautifulSoup, which is wonderful for scraping web content, especially if you're used to using HTMLParser. I say middle ground because while it's not a standard library, it covers lots of bases like bad markup and so on, without the overhead of running a browser.
I've just had a look at BeautifulSoup. Looks good. Thanks.
>> Programs like Firefox and Evolution are clearly marvellous achievements, but what a shame that the rise of the GUI necessitates the fall of interoperability between programs.
> Out of interest, the rise of the GUI means no such thing. Two of the large desktop suites, GNOME and KDE, have had scripting interfaces for a while to allow users and apps to communicate with each other.
> In fact, before KDE and its apps were completely ruined, I used the superlatively useful DCOP for years to make desktop and development apps do whatever I wished, including scraping content derived using Konqueror, running an automatic spider for broken links on a website when an in-place edit over ssh had just been done with Kate, changing all sorts of things when my bluetooth-enabled handset was near the machine, managing playlists, pausing torrents and stuff if youtube was opened (on a very old machine with a bad connection), alerting me (really sending a contact via bluetooth called "alert") when something was finished, shutting down the machine when a DVD finished playing, etc. etc. Pretty much every KDE app was scriptable (although some of that stuff will have involved me working around any limitations). In fact, it was the principal thing that made that DE so much better than anything else I have used before or since.
> Of course, DCOP had its own problems, which is part of the reason for KDE moving on from it, and KDE itself is far from being recommendable again, but there is a healthy number of weirdos like myself who like to be able to do that kind of thing, so I can't see it disappearing soon.
That sounds great, and just what I'm looking for. So what's the best way of doing this these days? I'm running Ubuntu, is there a Gnome version of DCOP?
Richard
On 07-Jul-10 18:20:03, Richard Parsons wrote:
> I'm trying out edbrowse. I've been looking for an easy to script browser with full javascript support for a long time, and it looks like it may be the ticket.
> I want to script it like this, but I don't know how to echo a new line:
> echo "e http://yahoo.com\n1,$n" | edbrowse
> How do I properly echo a new line? I'm sure that the answer will no doubt make me slap my own head...
> Thanks, Richard
It may well ... :)
$ echo "e http://yahoo.com\n1,$n"
e http://yahoo.com\n1,
$ echo -e "e http://yahoo.com\n1,$n"
e http://yahoo.com
1,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) Ted.Harding@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 07-Jul-10                                       Time: 19:59:09
------------------------------ XFMail ------------------------------