Computer things they didn't teach you in school #2 - Code Pages, Character Encoding, Unicode, UTF-8 and the BOM
OK, fine, maybe they DID teach you this in class. But you'd be surprised how many people think they know something but don't know the background or the etymology of a term. I find these things fascinating. In a world of bootcamp graduates, community college attendees (myself included!), and self-taught learners, I think it's fun to explore topics like the ones I plan to cover in my new YouTube Series "Computer things they didn't teach you."
BOOK RECOMMENDATION: I think of this series as being in the same vein as the wonderful "Imposter's Handbook" series from Rob Conery (I was also involved, somewhat). In Rob's excellent words: "Learn core CS concepts that are part of every CS degree by reading a book meant for humans. You already know how to code build things, but when it comes to conversations about Big-O notation, database normalization and binary tree traversal you grow silent. That used to happen to me and I decided to change it because I hated being left out. I studied for 3 years and wrote everything down and the result is this book."
In the first video I covered the concept of Carriage Returns and Line Feeds. But do you know WHY it's called a Carriage Return? What's a carriage? Where did it go? Where is it returning from? Who is feeding it lines?
In this second video I talk about Code Pages, Character Encoding, Unicode, UTF-8 and the BOM. I thought it went very well.
What would you like to hear about next?
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
Same goes for encoding issues. Depending on how people connect to their servers (PuTTY with the wrong encoding) and how server-side editors are configured (vi with the wrong encoding), you might have to handle multiple layers of bad encoding.
These are the real trenches I like to throw j.devs in.
Another great video! Thank you!!
Is the BOM always a specific length, and does it always start with a certain piece of data?
Thanks,
Matt
This reminds me of an ancient (but awesome) post from Joel Spolsky, which explains it a bit better for me.
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
However, the idea of a BOM is new to me, and although it makes sense, what I'm wondering is: how do you know if the first few bytes of a file are a BOM or just part of the content?
The BOM is part of the content, and is always at the very start of the content. You only have to read the first few bytes to know if the BOM is present, and what encoding it indicates.
Scott mentioned UTF-8 encoding, but Unicode also defines some more standard encodings. One is UTF-16, which typically uses 2 bytes per code point but will sometimes use 4 bytes. Another is UTF-32, which always uses 4 bytes per code point. The order of these bytes can change depending on the CPU, which can be big-endian (BE, rare) or little-endian (LE, common, used by x86/x64).
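To make the byte counts concrete, here is a small Python sketch (Python 3.8+ assumed, for `bytes.hex` with a separator). The sample code points and the `byte_lengths` helper are my own picks for illustration, not anything from the video.

```python
# Sample code points: U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign),
# U+1F600 (an emoji outside the Basic Multilingual Plane).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def byte_lengths(ch):
    """Return the encoded length of one code point in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # An explicit byte order means Python does not prepend a BOM.
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }

for ch in samples:
    print(f"U+{ord(ch):04X}", byte_lengths(ch))

# The same code point (U+20AC) written in both byte orders.
print("\u20ac".encode("utf-16-be").hex(" "))  # 20 ac
print("\u20ac".encode("utf-16-le").hex(" "))  # ac 20
```

The emoji is the interesting row: it needs 4 bytes in UTF-16 (a surrogate pair), while the other three fit in 2.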
Note that I'm saying "code point" rather than "character". This is because each Unicode value isn't necessarily a single character. Some values are modifiers, such as the accent mark above "a" in the letter "á". Some values aren't shown at all but have a special meaning, such as the zero width space and zero width joiner. "Code point" is the correct name that covers all of these values.
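A quick way to see the difference between code points and characters is to count them. This is only a sketch using Python's standard `unicodedata` module; the variable names are mine.

```python
import unicodedata

precomposed = "\u00e1"  # the letter as a single code point
decomposed = "a\u0301"  # "a" followed by a combining accent

# Both display as one character, but they hold different numbers of code points.
print(len(precomposed), len(decomposed))            # 1 2
print(unicodedata.name(decomposed[1]))              # COMBINING ACUTE ACCENT
print(unicodedata.combining(decomposed[1]) != 0)    # True: it is a combining mark

# Invisible code points still count towards the length.
zwsp, zwj = "\u200b", "\u200d"
print(unicodedata.name(zwsp))    # ZERO WIDTH SPACE
print(unicodedata.name(zwj))     # ZERO WIDTH JOINER
print(len("ab" + zwsp + "cd"))   # 5
```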
Anyway, back to the BOM. The value of the BOM is always the encoded representation of the code point U+FEFF. It comes out to the following values under the various Unicode encodings:
UTF-8 = EF BB BF
UTF-16-BE = FE FF
UTF-16-LE = FF FE
UTF-32-BE = 00 00 FE FF
UTF-32-LE = FF FE 00 00
This makes it really easy to identify one of the Unicode character encodings. You only need to read the first 4 bytes of the file, and if you find one of the patterns above, you know which encoding to use.
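As a rough sketch of that check in Python: the `detect_bom` function and the `BOMS` table are hypothetical names of mine, while the `codecs.BOM_*` constants come from the standard library. One detail to watch is the order of the checks, because the UTF-16-LE pattern is a prefix of the UTF-32-LE one.

```python
import codecs

# Longest signatures first: the UTF-32-LE BOM (FF FE 00 00) starts with the
# UTF-16-LE BOM (FF FE), so testing UTF-16 first would misreport UTF-32-LE.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

def detect_bom(head):
    """Return (encoding, bom_length) for the first bytes of a file, or (None, 0)."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name, len(bom)
    return None, 0

# Each signature is just U+FEFF encoded in the matching encoding.
for bom, name in BOMS:
    assert "\ufeff".encode(name) == bom

print(detect_bom(b"\xef\xbb\xbfhello"))  # ('utf-8', 3)
print(detect_bom(b"hello"))              # (None, 0)
```

In practice you would pass in the first 4 bytes read from the file in binary mode. No BOM does not mean "not Unicode": plenty of UTF-8 files have none. In theory a UTF-16-LE file whose first real character is U+0000 would look like UTF-32-LE to this check, which is rare but worth knowing.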
It's a lot better than the pre-Unicode days. There was no standard way to indicate which encoding was used, so you had to guess. It was very easy to guess the wrong encoding and you'd end up with gibberish.
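For anyone who hasn't seen that gibberish first hand, here is a tiny Python illustration. The sample word and the choice of Windows-1252 as the wrong guess are mine.

```python
text = "caf\u00e9"            # "cafe" with an accented e
raw = text.encode("utf-8")    # b'caf\xc3\xa9': the accented e takes two bytes

# Wrong guess: Windows-1252 treats each of those two bytes as its own character.
wrong = raw.decode("cp1252")
right = raw.decode("utf-8")

print(wrong)  # cafÃ©
print(right)  # café
```

Nothing fails here, which is the nasty part: the wrong decode succeeds and quietly produces the wrong text.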
https://en.wikipedia.org/wiki/File_(command)
Also seems weird that when you write out binary you have to write it out backwards to get it to be read correctly forwards. Reading and writing binary might be a good topic to cover in a future video. Keep these videos coming.
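A small Python sketch of what "backwards" means here, assuming the comment is about little-endian byte order. The sample value is arbitrary. Note that it is the order of the bytes that flips, not the bits inside each byte.

```python
value = 0x12345678

big = value.to_bytes(4, "big")        # most significant byte first
little = value.to_bytes(4, "little")  # least significant byte first

print(big.hex(" "))     # 12 34 56 78
print(little.hex(" "))  # 78 56 34 12

# Reading back with the matching byte order recovers the value...
assert int.from_bytes(little, "little") == value
# ...while the wrong byte order silently gives a different number.
print(hex(int.from_bytes(little, "big")))  # 0x78563412
```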
I was tasked to work on a text processing library. This library had to be able to take any random text file from anywhere on the internet, detect which encoding it used, convert it to Unicode, and normalize it (i.e. combine modifier characters where possible and put the rest in a known order). After all of this, the text could then be processed.
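The normalization step can be sketched with Python's standard `unicodedata` module. I'm assuming NFC here because it matches the description (compose where possible, canonical order for the rest); the library described above may well have used a different form.

```python
import unicodedata

a = "\u00e1"   # precomposed: one code point
b = "a\u0301"  # decomposed: "a" + COMBINING ACUTE ACCENT

print(a == b)  # False: same text on screen, different code points
print(unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b))  # True

# Marks that cannot be combined into one code point are put in a known order.
s1 = "q\u0307\u0323"  # q + dot above + dot below
s2 = "q\u0323\u0307"  # q + dot below + dot above
print(s1 == s2)  # False
print(unicodedata.normalize("NFC", s1) == unicodedata.normalize("NFC", s2))  # True
```

Without this step, two strings that look identical can fail an equality check or sort differently.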
I had some knowledge of character encodings before this, but working on this library really opened my eyes. I didn't realise quite how many encodings there were out there, and how widely varied they can be. There are single byte encodings, multi-byte encodings, variable byte length encodings, surrogate characters... Never mind the encodings that are almost identical except for a small range of values (e.g. Windows code page 1252 vs ISO-8859-1).
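To show how close those two are, this Python sketch lists the byte values where the `cp1252` and `latin-1` codecs disagree. It reflects how Python's codecs behave (a few bytes are undefined in its `cp1252` codec and raise an error); other implementations may map those bytes differently.

```python
# Collect the byte values that decode differently under the two encodings.
differing = []
for i in range(256):
    b = bytes([i])
    latin1 = b.decode("latin-1")  # latin-1 maps every byte to a code point
    try:
        cp1252 = b.decode("cp1252")
    except UnicodeDecodeError:
        cp1252 = None  # byte is undefined in Python's cp1252 codec
    if cp1252 != latin1:
        differing.append(i)

print(f"{differing[0]:#04x}..{differing[-1]:#04x}", len(differing))  # 0x80..0x9f 32
print(b"\x80".decode("cp1252"))  # € (ISO-8859-1 has an invisible control character here)
```

Everything outside 0x80 to 0x9F decodes identically, which is why a wrong guess between these two often goes unnoticed until someone types a euro sign or curly quotes.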
I'm quite happy that the world is moving towards Unicode (UTF-8 and UTF-16) and getting away from the myriad of crazy encodings out there.