L10N Forum: Characters and encoding
FTP transfer type encoding

Richard Major
Posted: Mon Jun 18, 2007 8:55 am
Joined: 07 Jun 2007 Posts: 1 Location: London, UK
Please can someone explain the difference between the two FTP transfer types, Binary and ASCII?
Guy
Posted: Mon Jun 18, 2007 11:18 am
Site Admin Joined: 19 Feb 2007 Posts: 21 Location: Alameda, CA
I'm working from long term memory here, so don't expect 100% accuracy.

Back in the Bad Old Days, certain computers stored file data differently depending on whether it was ASCII or binary. Binary data was typically handled as 8-bit bytes, though some stubborn computer makers who had 16-, 32-, and 36-bit computers defined it differently.

ASCII was treated as a 7-bit data type: the high-order bit was discarded, as it was not used for English characters and not well defined for other uses.
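
To make the 7-bit point concrete, here is a minimal Python sketch of what discarding the high-order bit does to a byte stream. The function name strip_high_bit is made up for illustration; it is not part of FTP or of any library.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear the high-order bit of every byte, as a 7-bit channel would."""
    return bytes(b & 0x7F for b in data)


# Plain English ASCII survives untouched, because every byte is below 0x80.
print(strip_high_bit(b"hello"))      # b'hello'

# A byte with the high bit set is silently changed: 0xE9 & 0x7F == 0x69 ('i').
print(strip_high_bit(b"caf\xe9"))    # b'cafi'
```

This is why 8-bit data pushed through a 7-bit text path came out damaged.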

To make matters worse for ASCII, different computer makers and operating systems had different conventions for using carriage returns and line feeds at the end of lines/paragraphs.

So when FTP was being designed, it had to at least agree on a bag-of-bytes that should not be altered (binary) and human readable text that needs to be semi-formatted (ASCII), and those two designations were made.

The CR/LF remains important. If you do a binary transfer of a text file between a Windows and a UNIX computer, the line endings are copied unchanged, so the file will not display the same on both.

Finally, hooks started to appear to help transfers translate character encoding sets from real ASCII to EBCDIC and back.
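
As a rough illustration of that kind of character set translation, the Python sketch below converts text between ASCII and EBCDIC. It assumes code page cp037, which is only one of several EBCDIC variants, and the helper names are hypothetical.

```python
EBCDIC_CODEC = "cp037"  # assumption: one common EBCDIC code page among several


def ascii_to_ebcdic(data: bytes) -> bytes:
    """Re-encode ASCII bytes as EBCDIC bytes."""
    return data.decode("ascii").encode(EBCDIC_CODEC)


def ebcdic_to_ascii(data: bytes) -> bytes:
    """Re-encode EBCDIC bytes as ASCII bytes."""
    return data.decode(EBCDIC_CODEC).encode("ascii")


original = b"HELLO, FTP"
converted = ascii_to_ebcdic(original)

print(converted.hex())                 # same text, completely different byte values
print(ebcdic_to_ascii(converted))      # b'HELLO, FTP'
```

The text is the same in both forms, but none of the byte values match, which is why such files cannot simply be copied bit for bit.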

_________________
-----
Guy Smith
Silicon Strategies Marketing
630 Taylor Avenue
Alameda, CA 94501
510-521-4477
guy.smith@SiliconStrat.com
ian.henderson
Posted: Fri Jun 29, 2007 7:49 am
Joined: 20 Apr 2007 Posts: 16 Location: Edinburgh
In a nutshell it is to do with how line endings are encoded in text files.

In a UNIX file an end of line is indicated by a LF (Line feed), whereas on DOS/Windows an end of line is indicated by CR LF (Carriage return and Line feed). Strictly speaking DOS/Windows is more logical, as the CR is explicit. But UNIX being UNIX, the CR was eliminated as it saved one byte for each line!

To view this effect, try opening a UNIX text file in Windows Notepad. The entire file will appear on one line.

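The FTP ASCII transfer converts line endings: the sending side converts each end of line to CR LF for the trip over the network, and the receiving side converts CR LF to its own convention. The net effect is that CR characters are added or removed when transferring a file between a UNIX system and a DOS/Windows system. In other words there is a change made to the file transferred.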
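
The conversion can be sketched in a few lines of Python. This is a simplified model, not the code of any real FTP client or server, and the names to_network and from_network are invented for the example.

```python
UNIX_EOL = b"\n"
WINDOWS_EOL = b"\r\n"
NETWORK_EOL = b"\r\n"   # ASCII-mode FTP sends every end of line as CR LF


def to_network(data: bytes, local_eol: bytes) -> bytes:
    """Sender side: turn the local end of line into CR LF."""
    return data.replace(local_eol, NETWORK_EOL)


def from_network(data: bytes, local_eol: bytes) -> bytes:
    """Receiver side: turn CR LF into the local end of line."""
    return data.replace(NETWORK_EOL, local_eol)


unix_file = b"first line\nsecond line\n"

# UNIX -> Windows: a CR is added to every line.
on_windows = from_network(to_network(unix_file, UNIX_EOL), WINDOWS_EOL)
print(on_windows)   # b'first line\r\nsecond line\r\n'

# Windows -> UNIX: the CRs are removed again.
back_on_unix = from_network(to_network(on_windows, WINDOWS_EOL), UNIX_EOL)
print(back_on_unix == unix_file)   # True
```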

The FTP binary transfer makes no change to the file transferred.
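
For anyone scripting this, here is a hedged sketch of the two modes using Python's standard ftplib module. The helper names, and the host and file names in the commented usage, are made up for illustration.

```python
from ftplib import FTP


def download_text(ftp: FTP, remote_name: str, local_path: str) -> None:
    """ASCII-mode download: ftplib delivers each line with the CR LF removed."""
    # Text mode lets Python write the line ending native to this machine.
    with open(local_path, "w") as f:
        ftp.retrlines(f"RETR {remote_name}", lambda line: f.write(line + "\n"))


def download_binary(ftp: FTP, remote_name: str, local_path: str) -> None:
    """Binary-mode download: bytes are written exactly as received."""
    with open(local_path, "wb") as f:
        ftp.retrbinary(f"RETR {remote_name}", f.write)


# Hypothetical usage (server and file names are placeholders):
# ftp = FTP("ftp.example.com")
# ftp.login()
# download_text(ftp, "notes.txt", "notes.txt")
# download_binary(ftp, "photo.jpg", "photo.jpg")
# ftp.quit()
```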

So why not use ASCII every time?
If a binary file by chance contains the byte sequence CR LF, or a lone LF, those bytes will get transformed, thereby breaking the binary file.
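
A small Python sketch shows the damage. The payload bytes are arbitrary, and the one-line conversion is a simplified stand-in for what an ASCII transfer from UNIX to Windows does, not real client code.

```python
def ascii_mode_unix_to_windows(data: bytes) -> bytes:
    """Simplified ASCII-mode effect: every LF byte becomes CR LF."""
    return data.replace(b"\n", b"\r\n")


# Arbitrary binary data that happens to contain 0x0A (LF) twice.
payload = bytes([0x89, 0x50, 0x0A, 0x00, 0x0D, 0x0A, 0xFF])
mangled = ascii_mode_unix_to_windows(payload)

print(len(payload), len(mangled))   # 7 9
print(mangled == payload)           # False
```

Two extra bytes have appeared, and everything after the first LF has shifted, which is enough to ruin an image, archive, or executable.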

So why not use binary every time?
The files won't break, but they will look odd on the target machine. As mentioned above, a UNIX text file in Windows Notepad will run entirely on one line, whereas a Windows text file viewed in a UNIX text editor will typically show a stray ^M (the CR character) at the end of each line.

There is often an Auto option in addition to the ASCII/binary options in FTP clients. This option will try to determine whether the file is a binary file or an ASCII file, typically from the file extension.
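
As a guess at how such an Auto option might decide, here is a Python sketch. The extension list, the NUL-byte check, and the function name are all assumptions for illustration, and real clients differ in their rules.

```python
import os

# Assumed list of extensions treated as text; real clients make this configurable.
TEXT_EXTENSIONS = {".txt", ".htm", ".html", ".css", ".csv", ".xml"}


def guess_transfer_type(filename: str, sample: bytes = b"") -> str:
    """Return "ASCII" or "binary" from the extension and an optional data sample."""
    ext = os.path.splitext(filename)[1].lower()
    # A NUL byte almost never appears in plain text, so treat it as a binary marker.
    if ext in TEXT_EXTENSIONS and b"\x00" not in sample:
        return "ASCII"
    return "binary"   # when in doubt, binary never alters the file


print(guess_transfer_type("readme.TXT"))             # ASCII
print(guess_transfer_type("photo.jpg"))              # binary
print(guess_transfer_type("odd.txt", b"\x00\x01"))   # binary
```

Falling back to binary is the safer default, since a text file sent as binary only looks odd while a binary file sent as ASCII is broken.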


Last edited by ian.henderson on Fri Jun 29, 2007 7:52 am; edited 1 time in total

All times are GMT