On 07-Jun-09 10:36:02, mick wrote:
On Sun, 07 Jun 2009 08:41:36 +0100 (BST) (Ted Harding) Ted.Harding@manchester.ac.uk allegedly wrote:
Greetings! I'm looking for a way to find out what files for a particular wild-card form exist at a certain directory depth on a website.
For example:
www.some.web.page/*/*.png
for all PNG files one level below the top level. The results should be listed (stored in a file) along the lines of what you would get from
ls */*.png
if you were at the top level on the server.
A browser won't do it (won't accept wild-cards). I've looked at wget, but this doesn't seem to have a simple listing option (except under ftp mode, which the remote site won't respond to).
Any suggestions?
Ted,
If I've understood you correctly then find should do it. Try
find . -maxdepth 2 -name "*.png" -print
(You can redirect output to a file, of course.)
Mick
Thanks, Mick, but I can't execute 'find' on a remote website! This can only be accessed by http.
Specifically: Go to http://journal.sjdm.org/ and then click on, say, "1" in "Vol. 4 (2009): 1". You will see a list of articles, of which the first has URL: http://journal.sjdm.org/8816/jdm8816.pdf
Note the "/8816/" -- this is the number of the article. Now go to that directory: http://journal.sjdm.org/8816 and note the listing of files with extensions .html, .pdf, .tex, .gif.
Now do similarly with the second article, which has URL: http://journal.sjdm.org/81125/jdm81125.pdf and whose directory is: http://journal.sjdm.org/81125 in which, as well as .html, .pdf, .tex and .gif files, there is a file: fig1.R
It is similar throughout: the directories for the articles are
http://journal.sjdm.org/*
where * is a 4- or 5-digit number (of the article), and some of these directories have one or more ".R" files in them, others have none.
What I want is to list (with directory paths) all ".R" files on that web-page. In other words, in regexp language,
http://journal.sjdm.org/[0:9]+/*.R
However, it seems that the HTTP protocol won't accept "wild cards" (or so wget tells me), and the website won't accept FTP access (where I could use wild cards).
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) Ted.Harding@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 07-Jun-09                                       Time: 12:05:24
------------------------------ XFMail ------------------------------
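
Since HTTP has no wildcard request, one possible workaround is to read the links out of pages the site does serve and filter them on the client side. What follows is only a rough sketch in Python, under assumptions not confirmed anywhere in this thread: that the pages passed in as start_pages (for instance the front page and the issue pages) contain ordinary links to the article files, and that each article directory returns an HTML listing with one link per file, as the listings described above suggest. All function names here are made up for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return every link in html as an absolute URL."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]


def article_dirs(links, site):
    """Keep the directories whose name is a 4- or 5-digit number."""
    root = site.rstrip("/")
    pattern = re.compile(re.escape(root) + r"/(\d{4,5})(?:/|$)")
    dirs = set()
    for link in links:
        match = pattern.match(link)
        if match:
            dirs.add(root + "/" + match.group(1) + "/")
    return sorted(dirs)


def r_files(links):
    """Keep only the links that end in .R (case-sensitive)."""
    return sorted({link for link in links if link.endswith(".R")})


def fetch(url):
    """Download one page over HTTP and return it as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def list_r_files(site, start_pages):
    """Visit start_pages, then every article directory they link to."""
    links = []
    for page in start_pages:
        links.extend(extract_links(fetch(page), page))
    found = []
    for directory in article_dirs(links, site):
        found.extend(r_files(extract_links(fetch(directory), directory)))
    return found
```

Calling list_r_files("http://journal.sjdm.org/", [...]) with the relevant issue pages filled in would then return the ".R" URLs with their directory paths, which can be written to a file one per line. Only the parsing and filtering functions can be checked without network access; whether the site's pages actually have the assumed layout would need to be verified by hand.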