Carriage Returns and Line Feeds will ultimately bite you - Some Git Tips
What's a Carriage and why is it Returning? Carriage Return Line Feed WHAT DOES IT ALL MEAN!?!
The paper on a typewriter rides horizontally on a carriage. The Carriage Return or CR was a non-printable control character that would reset the typewriter to the beginning of the line of text.
However, a Carriage Return moves the carriage back but doesn't advance the paper by one line. The carriage moves on the X axis...
And Line Feed or LF is the non-printable control character that turns the Platen (the main rubber cylinder) by one line.
Hence, Carriage Return and Line Feed. Two actions, and for years, two control characters.
Every operating system seems to encode an EOL (end of line) differently. Operating systems in the late 70s all used CR LF together literally because they were interfacing with typewriters/printers on the daily.
Windows uses CRLF because DOS used CRLF because CP/M used CRLF because history.
Mac OS used CR for years until OS X switched to LF.
Unix used just a single LF over CRLF and has since the beginning, likely because systems like Multics started using just LF around 1965. Saving a single byte EVERY LINE was a huge deal for both storage and transmission.
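To see which convention a given file follows, you can count the three possibilities in its raw bytes. Here's a minimal Python sketch; the function name and the sample strings are made up for illustration.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs, and lone LFs in a byte string."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one CR and one LF, so subtract the pairs
    # to get the characters that appear on their own.
    lone_cr = data.count(b"\r") - crlf
    lone_lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": lone_cr, "LF": lone_lf}


windows_text = b"one\r\ntwo\r\n"      # DOS/Windows style
unix_text = b"one\ntwo\n"             # Unix style
classic_mac_text = b"one\rtwo\r"      # pre-OS X Mac style

print(count_line_endings(windows_text))      # {'CRLF': 2, 'CR': 0, 'LF': 0}
print(count_line_endings(unix_text))         # {'CRLF': 0, 'CR': 0, 'LF': 2}
print(count_line_endings(classic_mac_text))  # {'CRLF': 0, 'CR': 2, 'LF': 0}
```

A file that reports more than one non-zero count has mixed line endings, which is usually the first sign that two tools disagreed about the convention.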
Fast-forward to 2018 and it's maybe time for Windows to also switch to just using LF as the EOL character for Text Files.
Why? For starters, Microsoft finally updated Notepad to handle text files that use LF.
BUT
Would such a change be possible? Likely not, it would break the world. Here's NewLine on .NET Core.
public static String NewLine {
    get {
        Contract.Ensures(Contract.Result<String>() != null);
#if !PLATFORM_UNIX
        return "\r\n";
#else
        return "\n";
#endif // !PLATFORM_UNIX
    }
}
Regardless, if you regularly use Windows and WSL (Linux on Windows) and Linux together, you'll want to be conscious and aware of CRLF and LF.
I ran into an interesting situation recently. First, let's review what Git does.
You can configure .gitattributes to tell Git how to treat files, either individually or by extension.
When
git config --global core.autocrlf true
is set, Git quietly converts line endings for you: LF becomes CRLF on checkout and CRLF becomes LF again on commit. That's the usual setting on Windows. On Linux or macOS you'd typically use core.autocrlf input (or leave it unset), so a checkout gives you LF.
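As a rough sketch of the two conversions core.autocrlf true performs, here is the round trip in Python. The helper names are hypothetical and this is not Git's actual implementation; real Git also has to decide whether a file is text at all before touching it.

```python
def on_checkout(repo_bytes: bytes) -> bytes:
    """Hypothetical helper: give the working copy CRLF line endings."""
    # Normalize first so an existing CRLF does not become CR CR LF.
    return repo_bytes.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def on_commit(working_bytes: bytes) -> bytes:
    """Hypothetical helper: store LF line endings in the repository."""
    return working_bytes.replace(b"\r\n", b"\n")


stored = b"echo one\necho two\n"
working = on_checkout(stored)

print(working)                       # b'echo one\r\necho two\r\n'
print(on_commit(working) == stored)  # True
```

The point of the round trip is that the repository always holds LF, and only the working copy changes.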
Viola on Twitter offers an important clarification:
"gitattributes controls line ending behaviour for a repo, git config (especially with --global) is a per user setting."
99% of the time this system and the options available work great.
Except when you are sharing file systems between Linux and Windows. I use Windows 10 and Ubuntu (via WSL) and keep stuff in /mnt/c/github.
However, if I pull from Windows 10 I get CRLF and if I pull from Linux I get LF, so then my shell scripts MAY OR MAY NOT WORK while in Ubuntu.
I've chosen to create a .gitattributes file that sets both shell scripts and PowerShell scripts to LF. This way those scripts can be used and shared and RUN between systems.
*.sh eol=lf
*.ps1 eol=lf
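Why does a CRLF shell script fail on Linux? The interpreter path is read from the first line, and the stray CR ends up as part of that path, which is where the familiar "bad interpreter: /bin/bash^M" error comes from. A minimal Python sketch of the idea (the function name is made up, and this only approximates how the shebang line is really parsed):

```python
def interpreter_from_shebang(script: bytes) -> bytes:
    """Return the text after '#!' on the first line, or b'' if there is no shebang."""
    first_line = script.split(b"\n", 1)[0]
    if not first_line.startswith(b"#!"):
        return b""
    # Spaces and tabs are trimmed here, but a trailing CR is left in place.
    return first_line[2:].strip(b" \t")


crlf_script = b"#!/bin/bash\r\necho hello\r\n"
lf_script = b"#!/bin/bash\necho hello\n"

print(interpreter_from_shebang(crlf_script))  # b'/bin/bash\r'
print(interpreter_from_shebang(lf_script))    # b'/bin/bash'
```

There is no program called "/bin/bash" followed by a carriage return, so the CRLF version can't start.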
You've got lots of choices. Again 99% of the time autocrlf is the right thing.
From the GitHub docs:
You'll notice that files are matched--*.c, *.sln, *.png--, separated by a space, then given a setting--text, text eol=crlf, binary. We'll go over some possible settings below.
text=auto
- Git will handle the files in whatever way it thinks is best. This is a good default option.
text eol=crlf
- Git will always convert line endings to CRLF on checkout. You should use this for files that must keep CRLF endings, even on OSX or Linux.
text eol=lf
- Git will always convert line endings to LF on checkout. You should use this for files that must keep LF endings, even on Windows.
binary
- Git will understand that the files specified are not text, and it should not try to change them. The binary setting is also an alias for -text -diff.
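To make the "pattern, space, settings" shape concrete, here is a small Python sketch that reads lines in that format and looks up the settings for a filename. Treat it as an illustration only: the function names are invented, it uses Python's fnmatch rather than Git's own pattern rules, and it lets the last matching line win wholesale, whereas Git overrides attribute by attribute.

```python
from fnmatch import fnmatch


def parse_gitattributes(text: str) -> list:
    """Split each non-comment line into (pattern, [attributes])."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *attrs = line.split()
        rules.append((pattern, attrs))
    return rules


def attributes_for(filename: str, rules: list) -> list:
    """Return the attributes from the last rule whose pattern matches (simplified)."""
    matched = []
    for pattern, attrs in rules:
        if fnmatch(filename, pattern):
            matched = attrs
    return matched


sample = """
# hypothetical sample file
* text=auto
*.sh text eol=lf
*.png binary
"""
rules = parse_gitattributes(sample)

print(attributes_for("build.sh", rules))   # ['text', 'eol=lf']
print(attributes_for("logo.png", rules))   # ['binary']
print(attributes_for("README.md", rules))  # ['text=auto']
```

To ask Git itself what it thinks, git check-attr -a followed by a path reports the attributes that actually apply to a file.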
Again, the defaults are probably correct. BUT - if you're doing weird stuff, sharing files or file systems across operating systems then you should be aware.
Edward Thomson, a co-maintainer of libgit2, has this to say and points us to his blog post on Line Endings.
I would say this more strongly. Because `core.autocrlf` is configured in a scope that's per-user, but affects the way the whole repository works, `.gitattributes` should _always_ be used.
If you're having trouble, it's probably line endings. Edward's recommendation is that ALL projects check in a .gitattributes.
The key to dealing with line endings is to make sure your configuration is committed to the repository, using .gitattributes. For most people, this is as simple as creating a file named .gitattributes at the root of your repository that contains one line:
* text=auto
Hope this helps!
I hope Microsoft bought Github so they can fix this CRLF vs LF issue.
— Scott Hanselman (@shanselman) June 4, 2018
* Typewriter by Matunos used under Creative Commons
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
I'm pretty sure I've got some replace( crlf, <br>) code floating about in legacy to get pieces of text working on both paper and web page.
Probably some other dirty tricks left and right
The next y2cr!!! :D
https://github.com/Microsoft/vscode/issues/39051
Instead, just think of CR as "Cursor Return."
A line feed should take me from 1:10 to 2:10 in the document.
A CR should take me from 2:10 to 2:0 (or 2:1 if you prefer).
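That "cursor" reading of the two characters is easy to sketch in Python. This is only an illustration of the idea in the comment above, with a made-up function name and row:column positions passed in by hand:

```python
def move_cursor(text: str, row: int = 1, col: int = 0) -> tuple:
    """Apply CR and LF as pure cursor movements; any other character advances the column."""
    for ch in text:
        if ch == "\r":
            col = 0      # CR: back to the start of the current line
        elif ch == "\n":
            row += 1     # LF: down one line, same column
        else:
            col += 1
    return row, col


print(move_cursor("\n", row=1, col=10))    # (2, 10)
print(move_cursor("\r", row=2, col=10))    # (2, 0)
print(move_cursor("\r\n", row=1, col=10))  # (2, 0)
```

Under this reading only the CR LF pair starts a fresh line at the left margin, which is exactly the behavior the two-character Windows convention spells out.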
Would such a change be possible? Likely not, it would break the world.
Sometimes, the best way to light the path to your future is by burning the bridges of your past!
The first step would be to ignore CR[\r] in Windows-based string parsers and change Environment.NewLine to LF[\n] in the next version of DotNet Core. Then more radical changes can follow...
This way, we have a consistent EOL across the solution in source control and we are no longer dependent on .gitattributes.
ATH0
Normally, I set (global) autocrlf to true on Windows. That's normal for most folks. Recently, I had to set up a machine without autocrlf set to true.
So, the interesting behavior is that when (global) autocrlf is not set, git defaults to false (as documented), but Visual Studio defaults to true (wha...?). I had great fun fixing my branch after using both VS and git for various checkins.
Nice explanation, thanks!
EditorConfig also defines which line endings are used, but from the editor side. It also controls other whitespace like indentation, trailing spaces, and the final line ending. Using both that and .gitattributes can be helpful as long as they are in sync.
I'm sure there are, or were, other applications for overstriking, back "when overstriking was easy", and for 25 years now I've bemoaned the fact that the very concept seems to have gotten forgotten -- probably because GUI apps were designed to emulate the behavior of CRT terminals (such as the DEC VT series) that didn't happen to support overstrike. If only we had skipped directly from paper to GUIs, this might not have happened! Anyway, over those same 25 years I've also been wishing that MS Word's "overstrike" mode really was overstrike mode; what it really is, though -- and what it ought to be called! -- is replace mode, because in that mode, when you type where there's already text, the former text isn't overlaid by what you type, so that both are visible -- instead, the former text disappears and is replaced by what you type. It would be great for Word to have true overstrike mode so you could cheaply pull the abovementioned "custom bullets" and other tricks, easily, as used to be the case. I'm not a big fan of taking away capabilities in the course of "improving" a technology.
Thank goodness output onto paper, at least, is now performed by mechanisms that don't rely on CR and LF to control the position of the "next character to be printed." That lets us work around the problem, in software. In a pinch, one can learn application programming, and/or PostScript, and write one's own tools to do these things-that-once-were-easy. I created my own JPEG-to-raster-image-ASCII-art tool, for instance, but had to write it in Perl/Tkx because that's the only platform I know (of, or how to use) where I could easily get the desired, overstruck, printed output: I "create a Text object" in a Canvas widget for every character in up to three complete pagesful of characters overstruck as a single pageful -- and can then render the entire content of the Canvas to a PostScript file with a single function call. Of course, the resulting PostScript can then be sent to a printer, where it bypasses the entire notion of CR / LF -based character positioning and places each character exactly where my program wanted it.
If it were up for a vote, my vote would therefore be for CR and LF to remain separate, and for each to perform, or emulate, only-and-exactly the behavior it had in the original days of paper teletype. CR moves back to the left margin without advancing to the next line, and LF advances to the next line without changing horizontal position. The two can, of course, still always be combined, to retain the behavior almost everyone is used to.
Moreover, I would argue that, in this day and age, when all the resources that once had to be carefully rationed are now insanely cheap, for maximum flexibility on all platforms the "standard" line-ending should not be Unix's "plain LF," which in the new, corrected regime outlined above would simply advance the line, but instead should be Windows' "CR LF", so that advancing the line and returning-to-the-left would take place independently, and only when explicitly commanded -- as it should be. Text files could then freely mix lone CRs (so as to perform true overstriking), lone LFs (purpose unclear, but consider that at one time anyway a reverse line feed was defined in ASCII), and CR LF sequences, to perform everything that can be performed today plus resurrect the long-lost overstrike capability. I suspect this might actually simplify text-output drivers in most cases: there'd be no need to explicitly parse e.g. LF to do both functions; just "do whatever the character you just received says to do."
(Incidentally, the original-mechanical-typewriter operation we think we're emulating might not have been "CR LF" per se. On the mechanical typewriter I grew up with, the same lever performed both functions but, since more force was required to start the carriage moving than to advance the platen (roller) one line, when you pushed that lever the "LF" action happened first, and then the "CR" action. So I could argue that we've been doing it wrong all this time anyway -- terminating Windows textfile lines with "CR LF" when in at least some cases it should have been "LF CR"!)
[includeIf "gitdir:/mnt/c/src/"]
path = autocrlf.inc
and autocrlf.inc contains
[core]
autocrlf = true
This lets my linuxy things like oh-my-zsh and vim plugins live outside that folder. So now I have autocrlf = false everywhere except /mnt/c/src/.
This has improved my WSL experience substantially.
And should not edit.
For editing I have good editors.