Greetings! I'm looking for a way to find out what files matching a particular wild-card pattern exist at a certain directory depth on a website.
For example:
www.some.web.page/*/*.png
for all PNG files 1 below the top level. The results to be listed (stored in a file) along the lines of what you would get from
ls */*.png
if you were at the top level on the server.
A browser won't do it (won't accept wild-cards). I've looked at wget, but this doesn't seem to have a simple listing option (except under ftp mode, which the remote site won't respond to).
Any suggestions? With thanks, Ted.
On Sun, 07 Jun 2009 08:41:36 +0100 (BST) (Ted Harding) Ted.Harding@manchester.ac.uk allegedly wrote:
Greetings! I'm looking for a way to find out what files matching a particular wild-card pattern exist at a certain directory depth on a website.
For example:
www.some.web.page/*/*.png
for all PNG files 1 below the top level. The results to be listed (stored in a file) along the lines of what you would get from
ls */*.png
if you were at the top level on the server.
A browser won't do it (won't accept wild-cards). I've looked at wget, but this doesn't seem to have a simple listing option (except under ftp mode, which the remote site won't respond to).
Any suggestions?
Ted
If I've understood you correctly then find should do it. Try
find . -maxdepth 2 -name "*.png" -print
(You can redirect output to a file of course)
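For example, to save the list to a file (the file name here is just an illustration):

find . -maxdepth 2 -name "*.png" -print > png-list.txt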
Mick
On 07-Jun-09 10:36:02, mick wrote:
On Sun, 07 Jun 2009 08:41:36 +0100 (BST) (Ted Harding) Ted.Harding@manchester.ac.uk allegedly wrote:
Greetings! I'm looking for a way to find out what files matching a particular wild-card pattern exist at a certain directory depth on a website.
For example:
www.some.web.page/*/*.png
for all PNG files 1 below the top level. The results to be listed (stored in a file) along the lines of what you would get from
ls */*.png
if you were at the top level on the server.
A browser won't do it (won't accept wild-cards). I've looked at wget, but this doesn't seem to have a simple listing option (except under ftp mode, which the remote site won't respond to).
Any suggestions?
Ted

If I've understood you correctly then find should do it. Try

find . -maxdepth 2 -name "*.png" -print

(You can redirect output to a file of course)

Mick
Thanks, Mick, but I can't execute 'find' on a remote website! This can only be accessed by http.
Specifically: go to http://journal.sjdm.org/ and then click on, say, "1" in "Vol. 4 (2009): 1". You will see a list of articles, of which the first has the URL: http://journal.sjdm.org/8816/jdm8816.pdf
Note the "/8816/" -- this is the number of the article. Now go to that directory: http://journal.sjdm.org/8816 and note the listing of files with extensions .html, .pdf, .tex, .gif.
Now do the same with the second article, which has URL: http://journal.sjdm.org/81125/jdm81125.pdf and whose directory is: http://journal.sjdm.org/81125 in which, as well as .html, .pdf, .tex and .gif files, there is a file: fig1.R
It is similar throughout: the directories for the articles are
http://journal.sjdm.org/*/
where * is a 4- or 5-digit number (the article number), and some of these directories have one or more ".R" files in them while others have none.
What I want is to list (with directory paths) all ".R" files on that site. In other words, in regexp language,
http://journal.sjdm.org/[0-9]+/*.R
However, it seems that the HTTP protocol won't accept "wild cards" (or so wget tells me), and the website won't accept FTP access (where I could use wild cards).
Ted.
At Sun, 07 Jun 2009 12:05:29 +0100 (BST), (Ted Harding) wrote:
On 07-Jun-09 10:36:02, mick wrote:
On Sun, 07 Jun 2009 08:41:36 +0100 (BST) (Ted Harding) Ted.Harding@manchester.ac.uk allegedly wrote:
Greetings! I'm looking for a way to find out what files matching a particular wild-card pattern exist at a certain directory depth on a website.
For example:
www.some.web.page/*/*.png
for all PNG files 1 below the top level. The results to be listed (stored in a file) along the lines of what you would get from
ls */*.png
if you were at the top level on the server.
A browser won't do it (won't accept wild-cards). I've looked at wget, but this doesn't seem to have a simple listing option (except under ftp mode, which the remote site won't respond to).
Any suggestions?
Ted

If I've understood you correctly then find should do it. Try

find . -maxdepth 2 -name "*.png" -print

(You can redirect output to a file of course)

Mick
Thanks, Mick, but I can't execute 'find' on a remote website! This can only be accessed by http.
Specifically: go to http://journal.sjdm.org/ and then click on, say, "1" in "Vol. 4 (2009): 1". You will see a list of articles, of which the first has the URL: http://journal.sjdm.org/8816/jdm8816.pdf
Note the "/8816/" -- this is the number of the article. Now go to that directory: http://journal.sjdm.org/8816 and note the listing of files with extensions .html, .pdf, .tex, .gif.
Now do the same with the second article, which has URL: http://journal.sjdm.org/81125/jdm81125.pdf and whose directory is: http://journal.sjdm.org/81125 in which, as well as .html, .pdf, .tex and .gif files, there is a file: fig1.R
It is similar throughout: the directories for the articles are
http://journal.sjdm.org/*/
where * is a 4- or 5-digit number (the article number), and some of these directories have one or more ".R" files in them while others have none.
What I want is to list (with directory paths) all ".R" files on that site. In other words, in regexp language,
http://journal.sjdm.org/[0-9]+/*.R
However, it seems that the HTTP protocol won't accept "wild cards" (or so wget tells me), and the website won't accept FTP access (where I could use wild cards).
Er, yes. You can't do anything equivalent to a kind of 'httpsh' (a 'shell over HTTP') unless someone actually implements an HTTP-speaking server which accepts UNIX-shell-like URIs and returns UNIX-shell-like responses.
Please take a moment to consider the security implications of http allowing users to do things like directory listing, file renaming, indiscriminate PUTting, etc. ...
The closest you might get is to use wget in recursive (mirroring) mode to download a complete copy of the web site in question, then execute the shell commands of interest against your local copy:
$ mkdir sjdm
$ cd sjdm
$ wget -r -k -np -nH http://journal.sjdm.org/
should do it. Please see man wget for details.
Then you can use find in your new sjdm directory.
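For example, once the mirror has finished, from inside the new sjdm directory (the output file name is just an illustration):

$ find . -maxdepth 2 -name "*.R" -print > R-files.txt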
On Sun, 2009-06-07 at 12:39 +0100, Richard Lewis wrote:
Please take a moment to consider the security implications of http allowing users to do things like directory listing, file renaming, indiscriminate PUTting, etc. ...
There is a tool in BackTrack that does a sort of brute-force search for files over HTTP; I forget its name, but it found a scary amount of "hidden" stuff I had stuck on my hosting at various times for specific people to download.
Although the webmaster of the site in question won't thank you for running it.
Needless to say I am more careful about what I put up now :)
On Sun, 07 Jun 2009 12:05:29 +0100 (BST) (Ted Harding) Ted.Harding@manchester.ac.uk allegedly wrote:
What I want is to list (with directory paths) all ".R" files on that site. In other words, in regexp language,
http://journal.sjdm.org/[0-9]+/*.R
However, it seems that the HTTP protocol won't accept "wild cards" (or so wget tells me), and the website won't accept FTP access (where I could use wild cards).
Ted
Sorry - I hadn't spotted that you had no shell access. But maybe curl can help. You might be able to grab all the files from the directories you want and save them locally so that you can search them later.
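A rough sketch of that idea, assuming you have already gathered the article numbers into a file (say article-numbers.txt, one number per line, collected by hand or from the volume index pages) and that the server returns ordinary directory indexes with relative href links (both of these are assumptions, untested):

# article-numbers.txt is assumed to hold one article number per line
while read dir; do
  # Fetch each article's directory index and print any .R links,
  # prefixed with the full URL of that directory
  curl -s "http://journal.sjdm.org/$dir/" |
    grep -oE 'href="[^"]*\.R"' |
    sed -e 's/^href="//' -e 's/"$//' |
    sed "s|^|http://journal.sjdm.org/$dir/|"
done < article-numbers.txt > R-files.txt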
Mick