Line endings

From WhyNotWiki

Applies to text (ASCII) files only

Aliases: Line ending differences, Line ending styles, Different line endings, Line terminators, File formats (line endings)

Name                         Abbreviation  Escaped string  ASCII (decimal)  ASCII (hex)  Used by
Line Feed                    LF            \n              010              0A           GNU/Linux / Unix / Mac (>= X)
Carriage Return              CR            \r              013              0D
Carriage Return + Line Feed  CRLF          \r\n            013 010          0D 0A        Windows / DOS
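
The codes in the table above can be checked with a few lines of Python. This is only a sketch for illustration; the names TERMINATORS and describe are made up here and do not come from any library.

```python
# Terminator bytes from the table, keyed by abbreviation.
TERMINATORS = {"LF": b"\n", "CR": b"\r", "CRLF": b"\r\n"}


def describe(abbrev):
    """Return (decimal codes, hexadecimal codes) for a terminator's bytes."""
    data = TERMINATORS[abbrev]
    decimal = " ".join(f"{byte:03d}" for byte in data)
    hexadecimal = " ".join(f"{byte:02X}" for byte in data)
    return decimal, hexadecimal


for name in TERMINATORS:
    print(name, *describe(name))
# LF 010 0A
# CR 013 0D
# CRLF 013 010 0D 0A
```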

How to determine what line ending / file format a file has

Using the file command:

man file:

In addition, file will attempt to determine other characteristics of text-type files. If the lines of a file are terminated by CR, CRLF, or NEL, instead of the Unix-standard LF, this will be reported. Files that contain embedded escape sequences or overstriking will also be identified.

Example:

> file *
init.rb:     ASCII text, with CRLF line terminators
lib:         directory
patch:       RCS/CVS diff output text
patch2:      ASCII text, with CRLF, LF line terminators
Rakefile:    ASCII text
Readme:      ASCII English text

As you can see...

  • Rakefile is a "normal" ASCII text file, with Unix-style (LF) line endings.
  • init.rb is a "Windows/DOS"-formatted ASCII text file, with CRLF line endings.
  • patch2 is unusual: the file command detected two different line endings in it (see "Files with multiple different line ending styles").
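
If the file command is not available, a similar check can be sketched in Python by counting terminator bytes. This is an assumption-laden approximation, not how file is actually implemented: the function names are hypothetical, and unlike file it also names LF explicitly instead of treating it as the silent default.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone CR, and lone LF terminators in raw bytes."""
    crlf = data.count(b"\r\n")
    # Each CRLF contains one CR and one LF, so subtract them
    # to get the terminators that stand alone.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": cr, "LF": lf}


def describe_line_endings(data: bytes) -> str:
    """Summarize which terminator styles occur, in the order CRLF, CR, LF."""
    counts = count_line_endings(data)
    found = [name for name in ("CRLF", "CR", "LF") if counts[name]]
    if not found:
        return "no line terminators"
    return "with " + ", ".join(found) + " line terminators"


sample = b"first\r\nsecond\r\nthird\n"  # mostly CRLF, one stray LF
print(count_line_endings(sample))     # {'CRLF': 2, 'CR': 0, 'LF': 1}
print(describe_line_endings(sample))  # with CRLF, LF line terminators
```

To check a real file, read it in binary mode (for example with open(path, "rb")) so that Python does not translate the line endings before you count them.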


In Vim:

You can type :set fileformat? and it will report one of the following values:

fileformat=unix # LF
fileformat=dos  # CRLF
fileformat=mac  # CR

Also, if you ever see one of these strange symbols, this is what they mean:

^M    = CR = \r
^J    = LF = \n

You can insert a ^M (carriage return) by pressing Ctrl-v Enter.

By inspecting the bytes

By using a hex editor, for instance.

> shed patch2
0160:  -  2D  045 055 00101101
0161:  c  63  099 143 01100011
0162:     0D  013 015 00001101
0163:     0A  010 012 00001010

Or the hexdump command:

     -c          One-byte character display.  Display the input offset in hexadecimal, followed by sixteen space-separated, three column,
                 space-filled, characters of input data per line.
> hexdump -c patch2
00000a0   -   c  \r  \n       h  \r  \n       i  \r  \n

     -C          Canonical hex+ASCII display.  Display the input offset in hexadecimal, followed by sixteen space-separated, two column,
                 hexadecimal bytes, followed by the same sixteen bytes in %_p format enclosed in "|" characters.
> hexdump -C patch2
000000a0  2d 63 0d 0a 20 68 0d 0a  20 69 0d 0a              |-c.. h.. i..|

I don't recommend using the default format. It displays "two-byte quantities of input data", so on a little-endian machine (such as x86) the two bytes of every pair are printed in swapped order, which is very confusing to people like me who expect the bytes to appear in file order.


     If no format strings are specified, the default display is equivalent to specifying the -x option.
     -x          Two-byte hexadecimal display.  Display the input offset in hexadecimal, followed by eight, space separated, four column,
                 zero-filled, two-byte quantities of input data, in hexadecimal, per line.
> hexdump patch2
00000a0 632d 0a0d 6820 0a0d 6920 0a0d

Within each two-byte group the bytes appear swapped: 632d is really 2d 63 in the file, and 0a0d is really 0d 0a.
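
The swapping can be reproduced in Python by reading the same bytes as little-endian 16-bit integers. This is a sketch of the effect, not of hexdump's actual code; the function names are hypothetical, and padding an odd-length input with a zero byte is an assumption made to keep the example simple.

```python
import struct


def dump_bytes(data: bytes) -> str:
    """Hex of each byte in file order, like the byte columns of hexdump -C."""
    return " ".join(f"{byte:02x}" for byte in data)


def dump_words_le(data: bytes) -> str:
    """Hex of each two-byte group read as a little-endian 16-bit integer."""
    if len(data) % 2:
        data += b"\x00"  # assumption: pad an odd trailing byte with zero
    words = struct.unpack(f"<{len(data) // 2}H", data)
    return " ".join(f"{word:04x}" for word in words)


# The twelve bytes at offset a0 of patch2, taken from the hexdump -C output.
sample = bytes.fromhex("2d630d0a20680d0a20690d0a")
print(dump_bytes(sample))     # 2d 63 0d 0a 20 68 0d 0a 20 69 0d 0a
print(dump_words_le(sample))  # 632d 0a0d 6820 0a0d 6920 0a0d
```

The second line matches the default hexdump output shown above: nothing in the file is reversed, only the way each pair is printed.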

Files with multiple different line ending styles

This rarely happens in practice, but it's possible.

To create a file with all 3 known line ending styles, just do this:

echo -ne "LF\nCRLF\r\nCR\r" > weird

And then use one or more of the ways we've just discussed to confirm the results:

> file weird
weird: ASCII text, with CRLF, CR, LF line terminators
> hexdump -c weird
0000000   L   F  \n   C   R   L   F  \r  \n   C   R  \r

> hexdump -C weird
00000000  4c 46 0a 43 52 4c 46 0d  0a 43 52 0d              |LF.CRLF..CR.|
0000000c

Vim will show:

LF
CRLF^M
CR^M
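
The same per-line view can be sketched in Python, which is handy when you want to know exactly which lines carry which terminator. The function name line_terminators is made up for this example; it relies on bytes.splitlines(keepends=True), which splits on LF, CR, and CRLF.

```python
def line_terminators(data: bytes) -> list:
    """Return (line content, terminator name) pairs for raw file bytes."""
    names = {b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}
    result = []
    for line in data.splitlines(keepends=True):
        # Check CRLF first so it is not mistaken for a lone LF.
        for terminator in (b"\r\n", b"\r", b"\n"):
            if line.endswith(terminator):
                result.append((line[: -len(terminator)], names[terminator]))
                break
        else:
            # Last line of a file that does not end with a terminator.
            result.append((line, "none"))
    return result


weird = b"LF\nCRLF\r\nCR\r"  # same bytes as the weird file created above
for content, terminator in line_terminators(weird):
    print(content, terminator)
# b'LF' LF
# b'CRLF' CRLF
# b'CR' CR
```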