Scott Hanselman

Carriage Returns and Line Feeds will ultimately bite you - Some Git Tips

June 06, 2018 Posted in Linux | Win10

What's a Carriage and why is it Returning? Carriage Return Line Feed WHAT DOES IT ALL MEAN!?!

The paper on a typewriter rides horizontally on a carriage. The Carriage Return or CR was a non-printable control character that would reset the typewriter to the beginning of the line of text.

However, a Carriage Return moves the carriage back but doesn't advance the paper by one line. The carriage moves on the X axis...

And Line Feed or LF is the non-printable control character that turns the Platen (the main rubber cylinder) by one line.

Hence, Carriage Return and Line Feed. Two actions, and for years, two control characters.

Every operating system seems to encode an EOL (end of line) differently. Operating systems in the late 70s all used CR LF together literally because they were interfacing with typewriters/printers on the daily.

Windows uses CRLF because DOS used CRLF because CP/M used CRLF because history.

Mac OS used CR for years until OS X switched to LF.

Unix used just a single LF over CRLF and has since the beginning, likely because systems like Multics started using just LF around 1965. Saving a single byte EVERY LINE was a huge deal for both storage and transmission.

Fast-forward to 2018 and it's maybe time for Windows to also switch to just using LF as the EOL character for Text Files.

Why? For starters, Microsoft finally updated Notepad to handle text files that use LF.

BUT

Would such a change be possible? Likely not, it would break the world. Here's NewLine on .NET Core.

public static String NewLine {
    get {
        Contract.Ensures(Contract.Result<String>() != null);
#if !PLATFORM_UNIX
        return "\r\n";
#else
        return "\n";
#endif // !PLATFORM_UNIX
    }
}

Regardless, if you regularly use Windows and WSL (Linux on Windows) and Linux together, you'll want to be conscious and aware of CRLF and LF.
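If you're not sure which endings a file actually has, you can count them in the raw bytes. Here's a minimal sketch in Python; the function name count_line_endings and the sample bytes are made up for illustration.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone LFs, and lone CRs in raw bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LFs that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # CRs that are not part of a CRLF pair
    return {"crlf": crlf, "lf": lf, "cr": cr}


# Hypothetical sample that mixes all three styles.
sample = b"windows line\r\nunix line\nclassic mac line\rlast\r\n"
print(count_line_endings(sample))  # {'crlf': 2, 'lf': 1, 'cr': 1}
```

Read the file in binary mode (for example with open(path, "rb")) before counting, because text mode in Python may translate the line endings for you.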

I ran into an interesting situation recently. First, let's review what Git does.

You can configure .gitattributes to tell Git how to treat files, either individually or by extension.

When

git config --global core.autocrlf true

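is set, Git will quietly convert line endings for you: text files are stored with LF in the repository and checked out with CRLF in your working directory. That's the usual setting on Windows. On Linux the usual choice is input (or leaving it unset), so a checkout there gives you LF, while on Windows you'll get CRLF.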
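Conceptually, the conversion is a pair of byte replacements. The sketch below is not Git's implementation (Git also has to decide whether a file is text at all, and has safety checks); the function names to_repo and to_working_tree are mine.

```python
def to_repo(data: bytes) -> bytes:
    """Normalize CRLF to LF, roughly what happens when a text file is committed."""
    return data.replace(b"\r\n", b"\n")


def to_working_tree(data: bytes, eol: bytes = b"\r\n") -> bytes:
    """Convert LF to the requested line ending, roughly what happens on checkout."""
    normalized = data.replace(b"\r\n", b"\n")  # normalize first so we never emit \r\r\n
    return normalized.replace(b"\n", eol)


original = b"echo one\r\necho two\r\n"
stored = to_repo(original)             # b"echo one\necho two\n"
checked_out = to_working_tree(stored)  # back to CRLF
print(stored, checked_out == original)
```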

Viola on Twitter offers an important clarification:

"gitattributes controls line ending behaviour for a repo, git config (especially with --global) is a per user setting."

99% of the time this system and the options available work great.

Except when you are sharing file systems between Linux and Windows. I use Windows 10 and Ubuntu (via WSL) and keep stuff in /mnt/c/github.

However, if I pull from Windows 10 I get CRLF and if I pull from Linux I get LF, so my shell scripts MAY OR MAY NOT WORK while in Ubuntu.
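Why would a shell script break? With CRLF endings the first line becomes "#!/bin/bash" plus a carriage return, and Linux goes looking for an interpreter whose name ends in that stray CR. Here's a small sketch that spots the problem; the helper name shebang_has_cr is hypothetical.

```python
def shebang_has_cr(script: bytes) -> bool:
    """Return True if the first line is a shebang that ends with a carriage return."""
    first_line = script.split(b"\n", 1)[0]
    return first_line.startswith(b"#!") and first_line.endswith(b"\r")


print(shebang_has_cr(b"#!/bin/bash\r\necho hi\r\n"))  # True
print(shebang_has_cr(b"#!/bin/bash\necho hi\n"))      # False
```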

I've chosen to create a .gitattributes file that sets both shell scripts and PowerShell scripts to LF. This way those scripts can be used and shared and RUN between systems.

*.sh eol=lf
*.ps1 eol=lf

You've got lots of choices. Again, 99% of the time autocrlf is the right thing.

From the GitHub docs:

You'll notice that files are matched--*.c, *.sln, *.png--, separated by a space, then given a setting--text, text eol=crlf, binary. We'll go over some possible settings below.

  • text=auto
    • Git will handle the files in whatever way it thinks is best. This is a good default option.
  • text eol=crlf
    • Git will always convert line endings to CRLF on checkout. You should use this for files that must keep CRLF endings, even on OSX or Linux.
  • text eol=lf
    • Git will always convert line endings to LF on checkout. You should use this for files that must keep LF endings, even on Windows.
  • binary
    • Git will understand that the files specified are not text, and it should not try to change them. The binary setting is also an alias for -text -diff.
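To make the "pattern, then settings" shape concrete, here's a rough sketch of how such a file could be read and applied to a path. It's a simplification, not how Git does it: real matching follows Git's own pattern rules, and binary is a macro rather than a plain flag. The sample rules combine the lines from this post with the extensions named in the docs, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Sample rules, assumed for illustration.
RULES = """\
* text=auto
*.sh eol=lf
*.ps1 eol=lf
*.sln text eol=crlf
*.png binary
"""


def parse_rules(text):
    """Split each non-empty, non-comment line into (pattern, [settings])."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *attrs = line.split()
        rules.append((pattern, attrs))
    return rules


def attributes_for(path, rules):
    """Collect settings for a path; later lines override earlier ones per attribute."""
    name = PurePosixPath(path).name  # simplification: match on the file name only
    result = {}
    for pattern, attrs in rules:
        if fnmatchcase(name, pattern):
            for attr in attrs:
                key, _, value = attr.partition("=")
                result[key] = value or True  # bare names like "text" become True
    return result


rules = parse_rules(RULES)
print(attributes_for("scripts/build.sh", rules))  # {'text': 'auto', 'eol': 'lf'}
print(attributes_for("App.sln", rules))           # {'text': True, 'eol': 'crlf'}
```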

Again, the defaults are probably correct. BUT - if you're doing weird stuff, sharing files or file systems across operating systems then you should be aware.

Edward Thomson, a co-maintainer of libgit2, has this to say and points us to his blog post on Line Endings.

I would say this more strongly. Because `core.autocrlf` is configured in a scope that's per-user, but affects the way the whole repository works, `.gitattributes` should _always_ be used.

If you're having trouble, it's probably line endings. Edward's recommendation is that ALL projects check in a .gitattributes.

The key to dealing with line endings is to make sure your configuration is committed to the repository, using .gitattributes. For most people, this is as simple as creating a file named .gitattributes at the root of your repository that contains one line:
* text=auto

Hope this helps!

* Typewriter by Matunos used under Creative Commons



About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

June 06, 2018 4:11
I believe that source control should store my files.
And should not edit.
For editing I have good editors.
June 06, 2018 10:21
@Roman Agreed ... except that that fails spectacularly when you have multiple contributors to a codebase using different OSes and tools.
June 06, 2018 11:18
Changing that would probably break some of my code :)

I'm pretty sure I've got some replace( crlf, <br>) code floating about in legacy to get pieces of text working on both paper and web page.

Probably some other dirty tricks left and right

The next y2cr!!! :D
June 06, 2018 13:00
You can also set "files.eol" to "\n" at workspace level. There is an issue open to allow for language specific settings.
https://github.com/Microsoft/vscode/issues/39051
June 06, 2018 17:05
I can't believe we're still on this tired typewriter analogy.

Instead, just think of CR as "Cursor Return."

A line feed should take me from 1:10 to 2:10 in the document.

A CR should take me from 2:10 to 2:0 (or 2:1 if you prefer).
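That cursor model is easy to sketch in Python. This is a toy renderer under stated assumptions (a fixed-width grid, zero-based columns, characters past the right edge dropped); the name render is made up for illustration.

```python
def render(text, width=20):
    """Render text on a grid where CR and LF act independently."""
    rows = [[" "] * width]
    row = col = 0
    for ch in text:
        if ch == "\r":
            col = 0                         # CR: back to column 0, same row
        elif ch == "\n":
            row += 1                        # LF: next row, same column
            if row == len(rows):
                rows.append([" "] * width)
        elif col < width:
            rows[row][col] = ch             # a screen replaces whatever was in the cell
            col += 1
    return ["".join(r).rstrip() for r in rows]


# A lone LF keeps the column, so "def" starts under the end of "abc".
print(render("abc\ndef\r\nghi"))  # ['abc', '   def', 'ghi']
```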
June 06, 2018 19:30
Would such a change be possible? Likely not, it would break the world.


Sometimes, the best way to light the path to your future is by burning the bridges of your past!
June 06, 2018 21:23
Anon- It's not an Analogy. It's literal history. It's exactly and demonstrably why it's a CR. Not sure why you're tired of it. It's reality.
June 06, 2018 22:07
Anon, I suggest finding a copy of Neal Stephenson's "In The Beginning... Was The Command Line." It's a thin little book and a quick read. It's probably fairly dated itself by now, but it's got a ton of great history, including the insight that, fundamentally, Unix *still* thinks it's talking to a teletype machine.
June 06, 2018 22:14
Very good topic Scott. I agree, it is the time to adopt the single-character line break across all platforms. It has been an area of concern for far too long.
The first step would be to ignore CR[\r] in Windows-based string parsers and change Environment.NewLine to LF[\n] in the next version of DotNet Core. Then more radical changes can follow...
June 07, 2018 1:22
Another reason why Windows/DOS/CPM etc uses CRLF -- According to ASCII/Ansi, it's the correct way of doing it.
June 07, 2018 1:26
Also, in the history department, The Radio Shack TRS-80 (which predated the Mac by 7 years, and the Apple 2 by a couple months), also used a single CR as end-of-line.
June 07, 2018 1:34
One of the things we do in our project to mitigate the line feed issue is to run a check script at the start of each build. The check script has sensible defaults for every file type; if the EOLs at the time of commit are different, the build is rejected. In addition to this, we have a fix-line-endings script which devs can run before pushing their changes.

This way, we have a consistent EOL across the solution in source control and we are no longer dependent on .gitattributes.
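A check like that could look something like the sketch below. This is not the commenter's script; the EXPECTED_EOL policy table and the function names are assumptions for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical policy: the line ending each file type is expected to use.
EXPECTED_EOL = {".sh": b"\n", ".ps1": b"\n", ".sln": b"\r\n"}


def has_wrong_eol(data: bytes, expected: bytes) -> bool:
    """Return True if data contains a line ending other than the expected one."""
    if expected == b"\r\n":
        leftovers = data.replace(b"\r\n", b"")   # anything left is a lone CR or LF
        return b"\n" in leftovers or b"\r" in leftovers
    return b"\r" in data                          # LF expected: any CR is a violation


def find_violations(root):
    """List files under root whose line endings don't match the policy."""
    violations = []
    for path in sorted(Path(root).rglob("*")):
        expected = EXPECTED_EOL.get(path.suffix)
        if expected is None or not path.is_file():
            continue
        if has_wrong_eol(path.read_bytes(), expected):
            violations.append(path)
    return violations


# Small demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "ok.sh").write_bytes(b"#!/bin/sh\necho ok\n")
    Path(tmp, "bad.sh").write_bytes(b"#!/bin/sh\r\necho bad\r\n")
    print([p.name for p in find_violations(tmp)])  # ['bad.sh']
```

A build step could then fail whenever find_violations returns a non-empty list.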
June 07, 2018 2:59
The old Typewriter and for that matter the dot matrix printer. We used to have to learn, know, all this stuff. This reminds me of when I used to have to send printer control codes to printers to get them to behave a standard way. Whenever a customer got a different printer I would have to work out what to send it to behave as the others did. Same with Modems. Everything is automatically abstracted these days.

ATH0

June 07, 2018 17:40
Regarding autocrlf, I ran into this interesting tidbit recently.

Normally, I set (global) autocrlf to true on Windows. That's normal for most folks. Recently, I had to set up a machine without autocrlf set to true.

So, the interesting behavior is that when (global) autocrlf is not set, git defaults to false (as documented), but Visual Studio defaults to true (wha...?). I had great fun fixing my branch after using both VS and git for various checkins.
June 07, 2018 19:40
I find it fascinating that you have to explain this physical paradigm, it's similar to understanding the 'en' space and 'em' space and how they define hyphens and various dashes. Then what descenders, ascenders, kerning and leading and gutters are and the physical paradigm that gives us that language. I am getting old...

Nice explanation, thanks!
June 08, 2018 12:43
I have never understood why we have CR and LF, but this makes sense. Could you change Windows systems to accept LF or CRLF? Therefore being backwards compatible and make .Net code more portable.
June 10, 2018 7:36
Most Internet protocols, e.g. HTTP or SMTP, use CRLF too.

EditorConfig also defines which line endings are used, but from the editor side. It also controls other whitespace like indentation, trailing spaces, and the final line ending. Using both that and .gitattributes can be helpful as long as they are in sync.
June 11, 2018 9:24
Nice article!
June 11, 2018 13:07
Great information. thanks
June 12, 2018 10:14
One fatal flaw in combining the CR and LF functions into a single operation is that it is useful to be able to perform CR and LF separately. Performing a CR without an LF used to let you overstrike an entire line (or an infinite number of lines) on the paper (and, on Tektronix CRT terminals, even on the screen). That, in turn, let you create a potentially infinite number of custom characters, on the fly, without having to fool with new glyphs, bitmaps, or fonts. For instance, you could create a custom line-item "bullet" by overstriking 'o' and '+' to make a circle-with-crosshair symbol. In the classic artform of raster-image "ASCII art," the available contrast range was greatly extended by the ability to print two, three, or four "passes" of each line, overlaying characters that placed ink in more space than a single character alone could ever achieve (I have a set of files that uses four-pass overstrike to print a wonderful classic Star Trek poster).

I'm sure there are, or were, other applications for overstriking, back "when overstriking was easy", and for 25 years now I've bemoaned the fact that the very concept seems to have gotten forgotten -- probably because GUI apps were designed to emulate the behavior of CRT terminals (such as the DEC VT series) that didn't happen to support overstrike. If only we had skipped directly from paper to GUIs, this might not have happened! Anyway, over those same 25 years I've also been wishing that MS Word's "overstrike" mode really was overstrike mode; what it really is, though -- and what it ought to be called! -- is replace mode, because in that mode, when you type where there's already text, the former text isn't overlaid by what you type, so that both are visible -- instead, the former text disappears and is replaced by what you type. It would be great for Word to have true overstrike mode so you could cheaply pull the abovementioned "custom bullets" and other tricks, easily, as used to be the case. I'm not a big fan of taking away capabilities in the course of "improving" a technology.

Thank goodness output onto paper, at least, is now performed by mechanisms that don't rely on CR and LF to control the position of the "next character to be printed." That lets us work around the problem, in software. In a pinch, one can learn application programming, and/or PostScript, and write one's own tools to do these things-that-once-were-easy. I created my own JPEG-to-raster-image-ASCII-art tool, for instance, but had to write it in Perl/Tkx because that's the only platform I know (of, or how to use) where I could easily get the desired, overstruck, printed output: I "create a Text object" in a Canvas widget for every character in up to three complete pagesful of characters overstruck as a single pageful -- and can then render the entire content of the Canvas to a PostScript file with a single function call. Of course, the resulting PostScript can then be sent to a printer, where it bypasses the entire notion of CR / LF -based character positioning and places each character exactly where my program wanted it.

If it were up for a vote, my vote would therefore be for CR and LF to remain separate, and for each to perform, or emulate, only-and-exactly the behavior it had in the original days of paper teletype. CR moves back to the left margin without advancing to the next line, and LF advances to the next line without changing horizontal position. The two can, of course, still always be combined, to retain the behavior almost everyone is used to.

Moreover, I would argue that, in this day and age, when all the resources that once had to be carefully rationed are now insanely cheap, for maximum flexibility on all platforms the "standard" line-ending should not be Unix's "plain LF," which in the new, corrected regime outlined above would simply advance the line, but instead should be Windows' "CR LF", so that advancing the line and returning-to-the-left would take place independently, and only when explicitly commanded -- as it should be. Text files could then freely mix lone CRs (so as to perform true overstriking), lone LFs (purpose unclear, but consider that at one time anyway a reverse line feed was defined in ASCII), and CR LF sequences, to perform everything that can be performed today plus resurrect the long-lost overstrike capability. I suspect this might actually simplify text-output drivers in most cases: there'd be no need to explicitly parse e.g. LF to do both functions; just "do whatever the character you just received says to do."

(Incidentally, the original-mechanical-typewriter operation we think we're emulating might not have been "CR LF" per se. On the mechanical typewriter I grew up with, the same lever performed both functions but, since more force was required to start the carriage moving than to advance the platen (roller) one line, when you pushed that lever the "LF" action happened first, and then the "CR" action. So I could argue that we've been doing it wrong all this time anyway -- terminating Windows textfile lines with "CR LF" when in at least some cases it should have been "LF CR"!)
June 14, 2018 3:59
I recently learned that gitconfig allows conditional includes. I keep all my Windows projects in a single root folder /mnt/c/src, so I have added this at the bottom of my ~/.gitconfig

[includeIf "gitdir:/mnt/c/src/"]
path = autocrlf.inc


and autocrlf.inc contains

[core]
autocrlf = true


All my linuxy things like oh-my-zsh and vim plugins etc. live outside that folder. So now I have autocrlf = false everywhere except /mnt/c/src/.

This has improved my WSL experience substantially.
June 16, 2018 12:02
Like in the latest insiders of Windows 10 where you can set up filename case sensitivity, there should be an option per folder to define what the default should be: \r\n or \n.
