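Learn how to work with Unicode in JavaScript, what emojis are made of, the ES6 improvements, and some pitfalls of handling Unicode in JS.
- Unicode encoding of source files
- How JavaScript uses Unicode internally
- Using Unicode in a string
- Normalization
- Emojis
- Get the proper length of a string
- ES6 Unicode code point escapes
- Encoding ASCII chars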
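Unicode encoding of source files
Unless otherwise specified, the browser assumes the source code of a program is written in the local charset. This varies by country and can cause unexpected issues. For this reason, it's important to set the charset of any JavaScript document.
How do you specify another encoding, in particular UTF-8, the most common file encoding on the web?
If the file contains a BOM character, that takes priority in determining the encoding. You can read many different opinions online: some say a BOM in UTF-8 is discouraged, and some editors won't even add it.
This is what the Unicode standard says:
… Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature.
This is what the W3C says:
In HTML5 browsers are required to recognize the UTF-8 BOM and use it to detect the encoding of the page, and recent versions of major browsers handle the BOM as expected when used for UTF-8 encoded pages. – https://www.w3.org/International/questions/qa-byte-order-mark
If the file is fetched using HTTP (or HTTPS), the Content-Type header can specify the encoding: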
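As a small illustration of what a BOM looks like in practice, here is a minimal sketch that checks whether a byte sequence starts with the UTF-8 BOM (the bytes EF BB BF). The helper name hasUtf8Bom and the sample arrays are made up for this example.

```javascript
// Hypothetical helper: returns true when the bytes start with the UTF-8 BOM (EF BB BF).
function hasUtf8Bom(bytes) {
  return bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf
}

// The letter "a" preceded by a BOM, and the same content without it
const withBom = new Uint8Array([0xef, 0xbb, 0xbf, 0x61])
const withoutBom = new Uint8Array([0x61])

hasUtf8Bom(withBom) //true
hasUtf8Bom(withoutBom) //false
```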
Content-Type: application/javascript; charset=utf-8
If this is not set, the fallback is to check the charset attribute of the script tag:
<script src="./app.js" charset="utf-8">
If this is not set, the document charset meta tag is used:
...
<head>
<meta charset="utf-8" />
</head>
...
The charset attribute is case insensitive in both cases (see the spec).
All this is defined in RFC 4329 “Scripting Media Types”.
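The lookup order described above can be sketched as a small function. This is only an illustration of the precedence as this article describes it, not how any browser is implemented. The function name and every parameter name are assumptions made for the example.

```javascript
// Hypothetical helper: picks a charset following the order described in this article:
// UTF-8 BOM, then the Content-Type header, then the script tag, then the document.
function resolveScriptCharset({ hasUtf8Bom, contentType, scriptCharset, documentCharset }) {
  if (hasUtf8Bom) return 'utf-8'
  // Extract the charset parameter from a header such as
  // "application/javascript; charset=utf-8"
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || '')
  if (match) return match[1].toLowerCase()
  // charset values are case insensitive, so normalize them to lowercase
  if (scriptCharset) return scriptCharset.toLowerCase()
  if (documentCharset) return documentCharset.toLowerCase()
  return null // nothing declared: the browser falls back to its own default
}

resolveScriptCharset({ contentType: 'application/javascript; charset=UTF-8' }) // 'utf-8'
resolveScriptCharset({ scriptCharset: 'ISO-8859-1', documentCharset: 'utf-8' }) // 'iso-8859-1'
```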
Public libraries should generally avoid using characters outside the ASCII set in their code. Otherwise the file might be loaded by users with an encoding different from the one it was written in, which would corrupt those characters.
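One way to keep a file ASCII-only is to write non-ASCII text as escape sequences. Here is a minimal sketch, assuming the text will end up inside a string literal. The helper name escapeNonAscii is made up for this example.

```javascript
// Hypothetical helper: rewrites every non-ASCII UTF-16 code unit as a \uXXXX escape.
function escapeNonAscii(text) {
  return text.replace(/[^\x00-\x7F]/g, (unit) =>
    '\\u' + unit.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0')
  )
}

escapeNonAscii('caf\u00E9') // 'caf\\u00E9'
escapeNonAscii('plain ascii') // 'plain ascii'
```

Because the regular expression does not use the u flag, it works on code units, so a character outside the BMP comes out as two escapes.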
How JavaScript uses Unicode internally
While a JavaScript source file can have any kind of encoding, JavaScript will then convert it internally to UTF-16 before executing it.
JavaScript strings are all UTF-16 sequences, as the ECMAScript standard says:
When a String contains actual textual data, each element is considered to be a single UTF-16 code unit.
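To see those code units directly, here is a minimal sketch that lists them as numbers. The helper name toCodeUnits is made up for this example.

```javascript
// Hypothetical helper: lists the UTF-16 code units of a string as numbers.
function toCodeUnits(str) {
  const units = []
  for (let i = 0; i < str.length; i++) {
    // charCodeAt returns the 16-bit code unit stored at index i
    units.push(str.charCodeAt(i))
  }
  return units
}

toCodeUnits('JS') // [74, 83]
String.fromCharCode(74, 83) // 'JS'
```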
Using Unicode in a string
A Unicode sequence can be added inside any string using the format \uXXXX:
const s1 = '\u00E9' //é
A sequence can be created by combining two unicode sequences:
const s2 = '\u0065\u0301' //é
Notice that while both generate an accented e, they are two different strings, and s2 is considered to be 2 characters long:
s1.length //1
s2.length //2
And when you try to select that character in a text editor, you need to go through it 2 times: the first time you press the arrow key, it only selects half of the element.
You can write a string combining a unicode character with a plain char, as internally it’s actually the same thing:
const s3 = 'e\u0301' //é
s3.length === 2 //true
s2 === s3 //true
s1 !== s3 //true
Normalization
Unicode normalization is the process of removing ambiguities in how a character can be represented, to aid in comparing strings, for example.
Like in the example above:
const s1 = '\u00E9' //é
const s3 = 'e\u0301' //é
s1 !== s3
ES6/ES2015 introduced the normalize() method on the String prototype, so we can do:
s1.normalize() === s3.normalize() //true
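normalize() accepts the name of a normalization form and defaults to NFC (composed). NFD is the decomposed counterpart. A short sketch follows; the variable names and the sameText helper are made up for this example.

```javascript
const composed = '\u00E9' // é as a single code point
const decomposed = 'e\u0301' // e followed by a combining acute accent

composed.normalize('NFD') === decomposed //true
decomposed.normalize('NFC') === composed //true
composed.normalize('NFD').length //2
decomposed.normalize().length //1, NFC is the default form

// Hypothetical helper: compares two strings regardless of how accents are encoded.
function sameText(a, b) {
  return a.normalize('NFC') === b.normalize('NFC')
}

sameText(composed, decomposed) //true
```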
Emojis
Emojis are fun, and they are Unicode characters, and as such they are perfectly valid to be used in strings:
const s4 = '🐶'
Emojis are part of the astral planes, outside of the first Basic Multilingual Plane (BMP), and since code points outside the BMP cannot be represented in 16 bits, JavaScript needs to use a combination of 2 UTF-16 code units to represent them.
The 🐶 symbol, which is U+1F436, is traditionally encoded as \uD83D\uDC36 (called a surrogate pair). There is a formula to calculate this, but it’s a rather advanced topic.
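For the curious, the calculation can be sketched in a few lines: subtract 0x10000 from the code point, then put the top 10 bits in the high surrogate and the bottom 10 bits in the low surrogate. The helper name toSurrogatePair is made up for this example.

```javascript
// Hypothetical helper: splits a code point outside the BMP into its surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError('Expected a code point between U+10000 and U+10FFFF')
  }
  const offset = codePoint - 0x10000
  const high = 0xd800 + (offset >> 10) // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff) // bottom 10 bits
  return [high, low]
}

const [high, low] = toSurrogatePair(0x1f436)
high.toString(16) // 'd83d'
low.toString(16) // 'dc36'
String.fromCharCode(high, low) === '\uD83D\uDC36' //true
```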
Some emojis are also created by combining together other emojis. You can find those by looking at this list https://unicode.org/emoji/charts/full-emoji-list.html and notice the ones that have more than one item in the unicode symbol column.
👩❤️👩 is created by combining 👩 (\uD83D\uDC69), ❤️ wrapped in two zero-width joiners (\u200D\u2764\uFE0F\u200D) and another 👩 (\uD83D\uDC69) in a single string: \uD83D\uDC69\u200D\u2764\uFE0F\u200D\uD83D\uDC69
With length and with code point based counting, there is no way to make this emoji be counted as 1 character.
Get the proper length of a string
If you try to perform
'👩❤️👩'.length
You’ll get 8 in return, as length counts UTF-16 code units, not characters.
Also, iterating over it is kind of funny:
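As a rough illustration (the variable names are made up for this sketch, and the emoji is written with escapes so the invisible joiners are explicit), looping by index walks the 8 code units, while for..of walks the 6 code points:

```javascript
const couple = '\uD83D\uDC69\u200D\u2764\uFE0F\u200D\uD83D\uDC69'

// Index-based loop: one entry per UTF-16 code unit
const byIndex = []
for (let i = 0; i < couple.length; i++) {
  byIndex.push(couple.charCodeAt(i).toString(16))
}
byIndex // ['d83d', 'dc69', '200d', '2764', 'fe0f', '200d', 'd83d', 'dc69']

// for..of loop: one entry per code point
const byCodePoint = []
for (const char of couple) {
  byCodePoint.push(char.codePointAt(0).toString(16))
}
byCodePoint // ['1f469', '200d', '2764', 'fe0f', '200d', '1f469']
```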

And curiously, when you paste this emoji in a password field it’s counted as 8 characters, possibly making it a valid password in some systems.
How to get the “real” length of a string containing unicode characters?
One easy way in ES6+ is to use the spread operator:
;[...'🐶'].length //1
You can also use the Punycode library by Mathias Bynens:
require('punycode').ucs2.decode('🐶').length //1
(Punycode is also great to convert Unicode to ASCII)
Note that emojis that are built by combining other emojis will still give a bad count:
require('punycode').ucs2.decode('👩❤️👩').length //6
[...'👩❤️👩'].length //6
If the string has combining marks however, this still will not give the right count. Check this Glitch https://glitch.com/edit/#!/node-unicode-ignore-marks-in-length as an example.
(you can generate your own weird text with marks here: https://lingojam.com/WeirdTextGenerator)
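Where the runtime provides Intl.Segmenter, another option is to count grapheme clusters, which handles both combined emojis and combining marks. This is a minimal sketch that assumes Intl.Segmenter is available. The helper name graphemeLength is made up for this example.

```javascript
// Hypothetical helper: counts user-perceived characters (grapheme clusters).
// Assumes the runtime provides Intl.Segmenter.
function graphemeLength(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' })
  return [...segmenter.segment(str)].length
}

graphemeLength('\uD83D\uDC36') //1, the dog emoji
graphemeLength('e\u0301') //1, e plus a combining accent
graphemeLength('\uD83D\uDC69\u200D\u2764\uFE0F\u200D\uD83D\uDC69') //1, the couple emoji
```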
Length is not the only thing to pay attention to. Reversing a string is also error prone if not handled correctly.
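Here is a short sketch of the reversing problem, with made-up variable names. Splitting on the empty string works on code units, so it tears the surrogate pair apart, while spreading the string works on code points:

```javascript
const text = 'a\uD83D\uDC36b' // a🐶b

// Naive reverse: splits on code units and swaps the two surrogates
const naive = text.split('').reverse().join('')
naive === 'b\uDC36\uD83Da' //true, the dog is now two lone surrogates

// Reversing by code point keeps the surrogate pair together
const byCodePoint = [...text].reverse().join('')
byCodePoint === 'b\uD83D\uDC36a' //true
```

Reversing by code point still breaks combining marks and emojis built from several code points, since those span more than one code point.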
ES6 Unicode code point escapes
ES6/ES2015 introduced a way to represent Unicode points in the astral planes (any Unicode code point requiring more than 4 hexadecimal digits), by wrapping the code in curly braces:
'\u{XXXXX}'
The dog 🐶 symbol, which is U+1F436, can be represented as \u{1F436} instead of having to combine the two surrogates we showed before: \uD83D\uDC36.
But length calculation still does not work correctly, because internally it’s converted to the surrogate pair shown above.
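A quick sketch to confirm that both notations produce the same string, using the ES6 codePointAt() and String.fromCodePoint() methods (the variable name is made up for this example):

```javascript
const dog = '\u{1F436}'

dog === '\uD83D\uDC36' //true
dog.length //2
dog.codePointAt(0).toString(16) // '1f436'
String.fromCodePoint(0x1f436) === dog //true
```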
Encoding ASCII chars
Characters up to U+00FF can be encoded using the special escape sequence \x, which accepts exactly 2 hexadecimal digits:
'\x61' // a
'\x2A' // *
This will only work from \x00 to \xFF. The ASCII characters are the first half of that range, from \x00 to \x7F.
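A few equivalences as a sketch, plus a helper that builds the escape for a given character. The helper name toHexEscape is made up for this example.

```javascript
'\x61' === 'a' //true
'\x61' === '\u0061' //true
'\xE9' === '\u00E9' //true, é is above the ASCII range but still fits in 2 hex digits

// Hypothetical helper: builds the \xNN escape for a single character up to U+00FF.
function toHexEscape(char) {
  const code = char.charCodeAt(0)
  if (code > 0xff) throw new RangeError('\\x escapes only cover U+0000 to U+00FF')
  return '\\x' + code.toString(16).toUpperCase().padStart(2, '0')
}

toHexEscape('a') // '\\x61'
toHexEscape('*') // '\\x2A'
```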