http://slashdot.org/article.pl?sid=03/02/13/2132221
An anonymous coward asks "Looking to serve files for downloading (typically 1MB-6MB), I'm confused about whether I should provide an FTP server instead of / as well as HTTP. According to a rapid Google search, the experts say 1) HTTP is slower and less reliable than FTP and 2) HTTP is amateur and will make you look a wimp. But a) FTP is full of security holes, and b) FTP is a crumbling legacy protocol and will make you look a dinosaur. Surely some contradiction... Should I make the effort to implement FTP or take desperate steps to avoid it?"
So what's the general opinion about FTP vs. HTTP in the GNU/Linux environment?
Keith
--
"The Mind is the slayer of the Real" - The Voice of the Silence
On Fri, 14 Feb 2003, Keith Watson wrote:
So what's the general opinion about FTP vs. HTTP in the GNU/Linux environment?
I'll take it...
Being more of a client than a server, I like FTP because I can choose a front-end to suit my mood: when I'm in a geeky mood, I run the command-line ftp client and issue Unix-like commands for doing things to files, otherwise I use my web browser to pretend those directories are web pages. I suspect the former would be tricky (but not impossible?) with HTTP.
On Friday 14 Feb 2003 3:22 pm, Keith Watson wrote:
http://slashdot.org/article.pl?sid=03/02/13/2132221
An anonymous coward asks "Looking to serve files for downloading (typically 1MB-6MB), I'm confused about whether I should provide an FTP server instead of / as well as HTTP. According to a rapid Google search, the experts say 1) HTTP is slower and less reliable than FTP and 2) HTTP is amateur and will make you look a wimp. But a) FTP is full of security holes, and b) FTP is a crumbling legacy protocol and will make you look a dinosaur. Surely some contradiction... Should I make the effort to implement FTP or take desperate steps to avoid it?"
So what's the general opinion about FTP vs. HTTP in the GNU/Linux environment?
When given the choice, personally I always go for the HTTP download for one reason - everyone else goes for FTP because it's a faster protocol. In reality, though, I'm one of the few on the HTTP link, so I get a faster download overall.
Matt
Matt Parker matt200@ntlworld.com writes:
When given the choice, personally I always go for the HTTP download for one reason - everyone else goes for FTP because it's a faster protocol. In reality, though, I'm one of the few on the HTTP link, so I get a faster download overall.
I'm curious as to why you believe FTP is a "faster protocol".
"Keith Watson" Keith.Watson@Kewill.com writes:
http://slashdot.org/article.pl?sid=03/02/13/2132221
An anonymous coward asks "Looking to serve files for downloading (typically 1MB-6MB), I'm confused about whether I should provide an FTP server instead of / as well as HTTP. According to a rapid Google search, the experts say 1) HTTP is slower and less reliable than FTP and 2) HTTP is amateur and will make you look a wimp. But a) FTP is full of security holes, and b) FTP is a crumbling legacy protocol and will make you look a dinosaur. Surely some contradiction... Should I make the effort to implement FTP or take desperate steps to avoid it?"
So what's the general opinion about FTP vs. HTTP in the GNU/Linux environment?
FTP is certainly not inherently faster or less reliable than HTTP; for large files I would expect it to turn out about the same, as most of the time both are just shifting raw data straight from a file on disk to a TCP connection.
For small files FTP has the overhead of creating a new TCP connection for every individual file transferred. HTTP used to have this too, but HTTP/1.1's persistent connections mean modern versions don't necessarily suffer this limitation.
Furthermore, FTP has the additional overhead of setting up the control connection and logging in, which could quite plausibly double the number of round trips required to fetch a single small file.
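For illustration only, here's a rough Python sketch (host and paths invented) of fetching several small files over one persistent HTTP/1.1 connection, so that only the first request pays the connection-setup cost; FTP would additionally need the USER/PASS/TYPE/PASV exchange on the control connection before any data flowed:

    # Hypothetical host and paths; a sketch, not a benchmark.
    import http.client

    conn = http.client.HTTPConnection("ftp.example.org")
    for path in ("/pub/file1.tar.gz", "/pub/file2.tar.gz"):
        conn.request("GET", path)
        resp = conn.getresponse()
        data = resp.read()   # drain the body so the connection can be reused
        print(path, resp.status, len(data))
    conn.close()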
"HTTP is amateur" doesn't seem to mean anything at all. I might as well say "FTP is evil and must die" (which is indeed my opinion but is probably not a very convincing argument for anyone who doesn't share it).
FTP is not inherently full of security holes, but many implementations have had security problems in the past. But then, exactly the same is true of HTTP. I don't think there's any good motive for choosing one over the other here.
Richard Kettlewell rjk@terraraq.org.uk wrote:
Furthermore, FTP has the additional overhead of setting up the control connection and logging in, which could quite plausibly double the number of round trips required to fetch a single small file.
I'm not sure this part holds: nothing requires you to wait for an answer to the login before sending the commands, although it is "intended to be an alternating dialogue". If the login fails, you've just given the server some garbage, but hey, you're not a human who mistyped, you're a dumb downloader program who will now fail anyway. Therefore you can pipeline commands, and compared with HTTP headers and possible download of robots.txt in each direction, it's positively minimalist.
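A minimal socket-level sketch of that sort of pipelining (anonymous login, host invented; a server that really does insist on an alternating dialogue may of course reject it):

    import socket

    s = socket.create_connection(("ftp.example.org", 21))  # hypothetical host
    s.recv(4096)                          # 220 greeting
    s.sendall(b"USER anonymous\r\n"
              b"PASS guest@example.org\r\n"
              b"TYPE I\r\n")              # three commands in one burst
    print(s.recv(4096).decode(errors="replace"))  # 331/230/200 replies, possibly batched
    s.sendall(b"QUIT\r\n")
    s.close()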
In general, though, more people have HTTP servers running already, they don't need the rich features of FTP, and there's little to choose between them.
My Opinion Only.
MJR
MJ Ray markj@cloaked.freeserve.co.uk writes:
Richard Kettlewell rjk@terraraq.org.uk wrote:
Furthermore, FTP has the additional overhead of setting up the control connection and logging in, which could quite plausibly double the number of round trips required to fetch a single small file.
I'm not sure this part holds: nothing requires you to wait for an answer to the login before sending the commands, although it is "intended to be an alternating dialogue". If the login fails, you've just given the server some garbage, but hey, you're not a human who mistyped, you're a dumb downloader program who will now fail anyway. Therefore you can pipeline commands
You can probably pipeline to an extent, yes - but see the difficulties with pipelining in SMTP and NNTP. Authentication strikes me as one of the biggest danger spots for pipelining, as sometimes authentication involves handing off the connection to another process for a bit.
In passive mode (quite popular for clients behind firewalls) you have to wait for the response to the PASV command before you can create the data connection.
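That's because the client can't open the data connection until it has parsed the host and port out of the 227 reply; a small sketch of that forced round trip (reply format per RFC 959, values invented):

    import re

    def parse_pasv(reply):
        # "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)."
        nums = re.search(r"\((\d+,\d+,\d+,\d+,\d+,\d+)\)", reply).group(1)
        h1, h2, h3, h4, p1, p2 = (int(n) for n in nums.split(","))
        return "%d.%d.%d.%d" % (h1, h2, h3, h4), p1 * 256 + p2

    print(parse_pasv("227 Entering Passive Mode (192,168,0,10,19,136)."))
    # -> ('192.168.0.10', 5000)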
and compared with HTTP headers and possible download of robots.txt in each direction, it's positively minimalist.
If you just want to download a single file I'm not sure I see why one would look at robots.txt. If you want many files then I'd expect the cost of the control connection to rapidly beat a robots.txt lookup.
The headers probably fit in a single packet and don't require any back-and-forth between the client and the server, so aren't going to contribute much to the time taken.
Perhaps we should stop speculating and start profiling l-)
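Something like this crude timing harness would do as a starting point (URLs invented; real-world caching, server load and network conditions will swamp any single run):

    import time
    import urllib.request

    URLS = [
        "http://ftp.example.org/pub/sample-2MB.bin",  # hypothetical file
        "ftp://ftp.example.org/pub/sample-2MB.bin",
    ]

    for url in URLS:
        start = time.monotonic()
        with urllib.request.urlopen(url) as resp:
            size = len(resp.read())
        print("%s: %d bytes in %.2fs" % (url, size, time.monotonic() - start))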
Richard Kettlewell rjk@terraraq.org.uk wrote:
the biggest danger spots for pipelining, as sometimes authentication involves handing off the connection to another process for a bit.
Surely that delay must not drop any data from the socket, else the server is in error. I've done some POP3 pipelining and the only problem has been some non-compliant servers that try to rely on an alternating dialogue.
[...]
If you just want to download a single file I'm not sure I see why one would look at robots.txt.
If you're doing it automatically, you're supposed to, IIRC.
If you want many files then I'd expect the cost of the control connection to rapidly beat a robots.txt lookup.
You're quite possibly correct there, but you have to do it.
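For what it's worth, a minimal sketch of the check an automated HTTP client is expected to make (host, path and user-agent invented):

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser("http://ftp.example.org/robots.txt")
    rp.read()   # one extra HTTP fetch before any real download
    ok = rp.can_fetch("my-downloader/0.1", "http://ftp.example.org/pub/file1.tar.gz")
    print("allowed" if ok else "disallowed by robots.txt")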
The headers probably fit in a single packet and don't require any back-and-forth between the client and the server, so aren't going to contribute much to the time taken.
I'm not so sure. Headers seem to be getting more and more verbose. Are they just adding things back in to emulate FTP's richness?
Perhaps we should stop speculating and start profiling l-)
Oh, the real world will screw up all the numbers, for sure ;-)
MJ Ray markj@cloaked.freeserve.co.uk writes:
Richard Kettlewell rjk@terraraq.org.uk wrote:
the biggest danger spots for pipelining, as sometimes authentication involves handing off the connection to another process for a bit.
Surely that delay must not drop any data from the socket, else the server is in error. I've done some POP3 pipelining and the only problem has been some non-compliant servers that try to rely on an alternating dialogue.
It's nothing to do with delays. The problem arises when something on the server side uses stdio or some other buffered I/O library. Sendmail and INN are real examples of this.
[...]
If you just want to download a single file I'm not sure I see why one would look at robots.txt.
If you're doing it automatically, you're supposed to, IIRC.
I must have missed the bit where we said we were only talking about automated requests. (I bet the majority of downloads are manual...)
The headers probably fit in a single packet and don't require any back-and-forth between the client and the server, so aren't going to contribute much to the time taken.
I'm not so sure. Headers seem to be getting more and more verbose. Are they just adding things back in to emulate FTP's richness?
The headers on my home page come to under 500 bytes (including three lines of rubbish from proxies).
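Anyone who wants to check their own pages can weigh the headers easily enough (host invented):

    import http.client

    conn = http.client.HTTPConnection("www.example.org")  # hypothetical host
    conn.request("HEAD", "/")
    resp = conn.getresponse()
    headers = resp.getheaders()
    size = sum(len("%s: %s\r\n" % (k, v)) for k, v in headers)
    print("%d headers, roughly %d bytes" % (len(headers), size))
    conn.close()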
Richard Kettlewell rjk@terraraq.org.uk wrote:
MJ Ray markj@cloaked.freeserve.co.uk writes:
Surely that delay must not drop any data from the socket, else the server is in error. [...]
It's nothing to do with delays. The problem arises when something on the server side uses stdio or some other buffered I/O library.
If they're connecting buffered I/O directly to the network socket and collecting more data than they need from it, then surely they are in error and need fixing.
Sendmail and INN are real examples of this.
If that's what they did, I'd probably write that last word with a different letter order.
[...]
If you just want to download a single file I'm not sure I see why one would look at robots.txt.
If you're doing it automatically, you're supposed to, IIRC.
I must have missed the bit where we said we were only talking about automated requests. (I bet the majority of downloads are manual...)
I must have missed the bit where I didn't say "possible" for the download of robots.txt. Whether it's one file or many is irrelevant. Whether it's manual or automatic is relevant. That's what I said.
[...]
The headers on my home page come to under 500 bytes (including three lines of rubbish from proxies).
Congratulations. Do you have the "light headers award" PNG?
MJ Ray markj@cloaked.freeserve.co.uk writes:
Richard Kettlewell rjk@terraraq.org.uk wrote:
MJ Ray markj@cloaked.freeserve.co.uk writes:
Surely that delay must not drop any data from the socket, else the server is in error. [...]
It's nothing to do with delays. The problem arises when something on the server side uses stdio or some other buffered I/O library.
If they're connecting buffered I/O directly to the network socket and collecting more data than they need from it, then surely they are in error and need fixing.
Why? SMTP and NNTP (for instance) have mechanisms for negotiating the use of pipelining; why should a client that doesn't use those mechanisms expect to be able to pipeline safely?
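SMTP at least makes the negotiation explicit: the server advertises PIPELINING in its EHLO response (RFC 2920), and a client can test for it before batching commands (host invented):

    import smtplib

    with smtplib.SMTP("mail.example.org") as smtp:  # hypothetical host
        smtp.ehlo()
        print("pipelining allowed:", smtp.has_extn("pipelining"))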
Richard Kettlewell rjk@terraraq.org.uk wrote:
MJ Ray markj@cloaked.freeserve.co.uk writes:
If they're connecting buffered I/O directly to the network socket and collecting more data than they need from it, then surely they are in error and need fixing.
Why? SMTP and NNTP (for instance) have mechanisms for negotiating the use of pipelining; why should a client that doesn't use those mechanisms expect to be able to pipeline safely?
IIRC, SMTP requires an alternating dialogue, but the NNTP spec (like FTP) does not. So, a client should be able to pipeline NNTP or FTP safely with a conformant server, as long as it is happy to handle state change failures earlier in the pipeline when they occur.
MJ Ray markj@cloaked.freeserve.co.uk writes:
Richard Kettlewell rjk@terraraq.org.uk wrote:
Why? SMTP and NNTP (for instance) have mechanisms for negotiating the use of pipelining; why should a client that doesn't use those mechanisms expect to be able to pipeline safely?
IIRC, SMTP requires an alternating dialogue, but the NNTP spec (like FTP) does not.
The best known implementation of NNTP does indeed require strictly alternating dialogue at certain points. For instance, you can't safely pipeline near MODE READER or AUTHINFO GENERIC.
RFC977 and RFC2980 are silent on the subject both as a general principle and in those two specific cases. Sometimes you have to accept that the spec is really an incomplete description, not a specification.
So, a client should be able to pipeline NNTP or FTP safely with a conformant server, as long as it is happy to handle state change failures earlier in the pipeline when they occur.
I think it's optimistic to expect to use pipelining where it's not explicitly allowed.
MJ Ray markj@cloaked.freeserve.co.uk writes:
Richard Kettlewell rjk@terraraq.org.uk wrote:
MJ Ray markj@cloaked.freeserve.co.uk writes:
If you just want to download a single file I'm not sure I see why one would look at robots.txt.
If you're doing it automatically, you're supposed to, IIRC.
I must have missed the bit where we said we were only talking about automated requests. (I bet the majority of downloads are manual...)
I must have missed the bit where I didn't say "possible" for the download of robots.txt. Whether it's one file or many is irrelevant. Whether it's manual or automatic is relevant. That's what I said.
Why so defensive? If we're discussing the relative performance of FTP and HTTP then concentrating on unusual usage cases tells you nothing of any use; rather, one must look at the mainstream.
Given that the OP didn't specify what they were trying to do in any particular detail, that means interactive, non-automatic downloads: largely the "Save link as" option in a web browser, plus some command-line tool used interactively in a small set of cases.
The number of people doing automatic downloads, for instance mirroring, will be tiny in comparison.
Richard Kettlewell rjk@terraraq.org.uk wrote:
Why so defensive? If we're discussing the relative performance of FTP and HTTP then concentrating on unusual usage cases tells you nothing of any use; rather, one must look at the mainstream.
It wasn't defensive; it was merely imitating your tone in reply. Maybe it gives you the same feeling it gave me? I thought the earlier message's wording already adequately covered the possibility that robots.txt wouldn't be downloaded.
As I think you mentioned elsewhere in other words, in the absence of numbers or OP input, appealing to "the mainstream" case is meaningless.