What happens when you type a URL in the browser and press Enter, from start to finish
- HTTP protocol
- I only analyze URL requests
- Everything here refers to macOS/Linux
- DNS lookup phase
- TCP request handshake
- Sending the request
- Parsing the HTML
This article describes how the browser uses the HTTP/1.1 protocol to perform page requests
If you have ever done a job interview, you may have been asked: "What happens when you type something in the Google search box and press Enter?"
This is one of the most popular interview questions. Interviewers simply want to see whether you can explain some fairly basic concepts and whether you have any understanding of how the Internet actually works.
In this article, I will analyze what happens when you type a URL in the address bar of your browser and press Enter.
It's a very interesting topic to dissect in a blog post, because it touches on many technologies, each of which I could explore in a separate article.
This technology rarely changes, and it powers one of the most complex and extensive ecosystems humans have ever built.
HTTP protocol
First of all, I specifically mention HTTP, and not HTTPS, because HTTPS connections work differently.
I only analyze URL requests
Modern browsers can tell whether what you type in the address bar is an actual URL or a search term; if it is not a valid URL, they send it to the default search engine.
I assume you type an actual URL.
After entering the URL and pressing Enter, the browser will first construct the complete URL.
If you only entered the domain, for example flaviocopes.com, the browser prepends http:// by default, because HTTP is the default protocol.
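To make this step concrete, here is a minimal Python sketch of completing what you typed into a full URL. It assumes the only rules are "prepend http:// when no scheme is given" and "use / when no path is given"; the function name complete_url is made up for illustration, and real browsers apply many more heuristics.

```python
from urllib.parse import urlsplit, urlunsplit


def complete_url(typed: str, default_scheme: str = "http") -> str:
    """Turn what the user typed into a complete URL (simplified sketch)."""
    typed = typed.strip()
    # No scheme given: prepend the default one (HTTP, as in this article).
    if "://" not in typed:
        typed = f"{default_scheme}://{typed}"
    parts = urlsplit(typed)
    # An empty path means the root document "/".
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(complete_url("flaviocopes.com"))  # http://flaviocopes.com/
```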
Everything here refers to macOS/Linux
This is just for reference: some things may work slightly differently on Windows.
DNS lookup phase
The browser starts a DNS lookup to get the server's IP address.
The domain name is a convenient shortcut for us humans, but the Internet is organized so that a computer finds the exact location of a server through its IP address, which is a set of numbers.
First, it checks the local DNS cache to see if the domain has been resolved recently.
Chrome has a handy DNS cache visualization tool, available at chrome://net-internals/#dns
If nothing is found there, the browser uses the DNS resolver, through the gethostbyname POSIX function, to retrieve the host information.
gethostbyname first looks in the local hosts file, located at /etc/hosts on macOS and Linux, to see whether the system provides the information locally.
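Here is a rough Python sketch of that lookup order, assuming the standard hosts file format (an address followed by one or more host names, with # starting a comment). lookup_hosts and resolve are hypothetical helpers: in reality gethostbyname already consults /etc/hosts on its own, according to the system configuration, so the explicit first step here only makes the order visible.

```python
import socket
from typing import Optional


def lookup_hosts(name: str, hosts_text: str) -> Optional[str]:
    """Return the address mapped to `name` in hosts-file text, or None."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *hostnames = line.split()
        if name.lower() in (h.lower() for h in hostnames):
            return address
    return None


def resolve(name: str, hosts_path: str = "/etc/hosts") -> str:
    """Hypothetical helper: hosts file first, then the system resolver."""
    try:
        with open(hosts_path, encoding="utf-8", errors="replace") as f:
            found = lookup_hosts(name, f.read())
    except OSError:  # missing or unreadable hosts file
        found = None
    if found is not None:
        return found
    # Fall back to the C library's gethostbyname (IPv4 only), which may
    # in turn query the configured DNS server.
    return socket.gethostbyname(name)
```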
If the hosts file provides no information about the domain, the system sends a request to a DNS server.
The address of the DNS server is stored in the system preferences.
Two popular public DNS servers are:
- 8.8.8.8: Google Public DNS
- 1.1.1.1: Cloudflare DNS
Most people use the DNS server provided by their internet provider.
The browser uses the UDP protocol to perform DNS requests.
TCP and UDP are the two basic protocols of computer networks. They are at the same conceptual level, but TCP is connection-oriented, while UDP is a connectionless protocol, which is lighter and uses less overhead for sending messages.
How UDP requests are carried out is outside the scope of this tutorial.
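Even though UDP itself is out of scope, it may help to see roughly what a DNS request looks like on the wire. This Python sketch builds a minimal query for an A (IPv4) record following the DNS message format of RFC 1035, and sends it as a single UDP datagram to port 53. The function names and the default server (8.8.8.8) are my own choices for illustration, and parsing the reply is left out.

```python
import random
import socket
import struct


def build_dns_query(domain: str, query_id: int) -> bytes:
    """Build a minimal DNS query message for the A (IPv4) record of `domain`."""
    # Header: ID, flags (0x0100 = "recursion desired"), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A record), QCLASS = 1 (IN, the Internet class).
    return header + qname + struct.pack("!HH", 1, 1)


def send_dns_query(domain: str, server: str = "8.8.8.8", timeout: float = 2.0) -> bytes:
    """Send the query as one UDP datagram to port 53 and return the raw reply."""
    query = build_dns_query(domain, random.randint(0, 0xFFFF))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # No handshake: UDP just sends the datagram and waits for an answer.
        sock.sendto(query, (server, 53))
        reply, _ = sock.recvfrom(512)  # 512 bytes: classic DNS-over-UDP limit
    return reply
```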
The DNS server may have the domain's IP in its cache. If not, it asks a root DNS server. The root servers are the system that drives the entire Internet: there are 13 named root servers, each of which is actually operated as many machines distributed around the globe.
The root DNS server does not know the address of every domain name on the planet.
What it knows is where the top-level domain (TLD) DNS servers are.
A top-level domain is a domain extension: .com, .pizza and many more.
When the root DNS server receives the request, it refers the resolver to the appropriate top-level domain (TLD) DNS server.
Say you are looking for flaviocopes.com: the root DNS server returns the IP of the .com TLD server.
Now, our DNS resolver will cache the IP of the TLD server, so it does not have to ask the root DNS server again.
The TLD DNS server will have the IP address of the authoritative name server for the domain we are looking for.
How does it know? When you purchase a domain name, the domain registrar sends its name servers to the corresponding TLD. When you update your name servers (for example, when you change your hosting provider), your domain registrar updates this information automatically.
These name servers are the DNS servers of your hosting provider. There is usually more than one, so that the others can serve as backups.
The DNS resolver starts with the first one and asks it for the IP of the domain you are looking for (subdomains included).
That server is the ultimate source of truth for the IP address.
Now that we have an IP address, we can move on.
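The whole walk from the root to the authoritative server can be simulated in a few lines of Python. This is a toy model, not real DNS: the server names (root, tld-com, auth-flaviocopes, auth-example) and the IP addresses (taken from the ranges reserved for documentation) are invented, and only referrals are cached, as described above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, hard-coded name servers. Each maps a zone either to a
# referral ("NS", next server) or to a final answer ("A", IPv4 address).
SERVERS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "root": {"com": ("NS", "tld-com")},
    "tld-com": {
        "flaviocopes.com": ("NS", "auth-flaviocopes"),
        "example.com": ("NS", "auth-example"),
    },
    "auth-flaviocopes": {"flaviocopes.com": ("A", "203.0.113.7")},
    "auth-example": {"example.com": ("A", "198.51.100.20")},
}

# Resolver-side cache of referrals: zone -> server responsible for it.
referral_cache: Dict[str, str] = {}


def suffixes(domain: str) -> List[str]:
    """'a.b.com' -> ['a.b.com', 'b.com', 'com'] (most specific first)."""
    labels = domain.split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def ask(server: str, domain: str) -> Optional[Tuple[str, str, str]]:
    """Ask one server; it answers for the most specific zone it knows."""
    for zone in suffixes(domain):
        if zone in SERVERS[server]:
            kind, value = SERVERS[server][zone]
            return zone, kind, value
    return None


def resolve(domain: str) -> Tuple[str, List[str]]:
    """Iteratively resolve `domain`; return the IP and the servers asked."""
    # Start from the most specific cached referral, falling back to the root.
    server = next(
        (referral_cache[z] for z in suffixes(domain) if z in referral_cache), "root"
    )
    asked: List[str] = []
    for _ in range(10):  # guard against referral loops
        asked.append(server)
        reply = ask(server, domain)
        if reply is None:
            raise LookupError(f"{server} knows nothing about {domain}")
        zone, kind, value = reply
        if kind == "A":
            return value, asked
        referral_cache[zone] = value  # remember who handles this zone
        server = value
    raise LookupError("too many referrals")
```

Resolving flaviocopes.com asks root, tld-com and auth-flaviocopes in turn. A following lookup for example.com starts directly at the cached .com TLD server and never contacts the root.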
TCP request handshake
With the server's IP address available, the browser can now initiate a TCP connection with it.
Before the TCP connection is fully established and data can be sent, the client and the server must complete a handshake, known as the three-way handshake: a SYN segment, a SYN-ACK segment and an ACK segment.
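The handshake can be sketched as a small simulation in Python, assuming arbitrary initial sequence numbers (1000 for the client and 5000 for the server). The Segment type and the function name are made up for illustration. In practice you never do this by hand: the operating system performs the handshake when a program calls connect(), for example through Python's socket.create_connection.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    flags: str  # "SYN", "SYN-ACK" or "ACK"
    seq: int    # sender's sequence number
    ack: int    # next sequence number the sender expects (0 on the first SYN)


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Simulate the three segments exchanged to open a TCP connection."""
    # Client -> server: "I want to connect, my sequence starts at client_isn."
    syn = Segment("SYN", seq=client_isn, ack=0)
    # Server -> client: acknowledges the SYN (which consumes one sequence
    # number) and announces its own initial sequence number.
    syn_ack = Segment("SYN-ACK", seq=server_isn, ack=syn.seq + 1)
    # Client -> server: acknowledges the server's SYN; the connection is open.
    ack = Segment("ACK", seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```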
Sending the request
Once the connection is established, the browser can send the request.
The request is a plain text document constructed in a precise manner determined by the communication protocol.
It consists of 3 parts:
- Request line
- Request header
- Request body
The request line is a single line made of:
- HTTP method
- Resource location
- Protocol version
GET / HTTP/1.1
The request header is a set of field: value pairs that set certain values.
In HTTP/1.1 the only required field is Host; Connection is also commonly sent, while all other fields are optional. For example:
Host: flaviocopes.com
Connection: close
Host indicates the domain name we want to reach, and Connection is set to close unless the connection must be kept open.
Many other header fields are in common use as well.
The header part is terminated by a blank line.
The request body is optional. It is not used in GET requests, but it is widely used in POST requests and sometimes with other verbs too, and it can contain data in formats such as JSON.
Since we are analyzing a GET request, the body is empty, so we won't look at it further.
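Putting the request line, headers and blank line together, here is a minimal Python sketch of building this GET request and sending it over a plain TCP socket. build_get_request and fetch are hypothetical helper names; lines are separated by CRLF (\r\n) as HTTP/1.1 requires, and the sketch assumes the server still answers plain HTTP on port 80.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Assemble a raw HTTP/1.1 GET request (no body, since it is a GET)."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, resource, protocol version
        f"Host: {host}",         # the required Host header
        "Connection: close",     # ask the server to close the connection afterwards
        "",  # these two empty strings make join() end the text with
        "",  # CRLF CRLF: the blank line that terminates the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send the request over TCP and read the raw response until the server closes."""
    # create_connection resolves the host name and completes the TCP handshake.
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # empty bytes: the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Because Connection is set to close, the server closes the socket once the response has been sent, which is how fetch knows it has read everything.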
After sending the request, the server will process it and send back a response.
The response starts with a status line made of the protocol version, a status code and a status message. If the request is successful, the status code is 200 and the response starts with:
HTTP/1.1 200 OK
The request may return other status codes and messages, such as:
- 404 Not Found
- 403 Forbidden
- 301 Moved Permanently
- 500 Internal Server Error
- 304 Not Modified
- 401 Unauthorized
Then the response contains a list of HTTP headers and the response body (which, since we are making the request in the browser, will be HTML).
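A simplified Python sketch of splitting such a response into status line, headers and body might look like this. parse_response is a made-up helper: it assumes the whole response is already in memory, lowercases header names for lookup, and ignores details a real client must handle, such as chunked transfer encoding or repeated header fields.

```python
from typing import Dict, Tuple


def parse_response(raw: bytes) -> Tuple[str, int, str, Dict[str, str], bytes]:
    """Split a raw HTTP/1.1 response into version, code, message, headers, body."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: protocol version, status code, status message.
    version, code, message = status_line.split(" ", 2)
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), message, headers, body
```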
Parsing the HTML
The browser has now received the HTML and starts parsing it. It repeats the exact same process we just went through for every resource the page requires:
- CSS file
- Website icon
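To illustrate how the browser discovers those resources, here is a Python sketch using the standard library's html.parser. It only looks at a few tags (stylesheet and icon link elements, script and img), an assumption made for brevity; the browser's real HTML parser is far more complete. The class name ResourceCollector is hypothetical. Each collected URL would then go through the same DNS, TCP and HTTP steps described above.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect the URLs of stylesheets, icons, scripts and images in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.resources: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attributes = dict(attrs)
        url = None
        if tag == "link":
            # rel may hold several tokens, e.g. "shortcut icon".
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel or "icon" in rel:
                url = attributes.get("href")
        elif tag in ("script", "img"):
            url = attributes.get("src")
        if url:
            # Relative URLs are resolved against the page's own URL.
            self.resources.append(urljoin(self.base_url, url))


page = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="shortcut icon" href="favicon.ico"/>
</head><body><img src="https://cdn.example.com/logo.png"><script src="app.js"></script></body></html>"""
collector = ResourceCollector("http://flaviocopes.com/")
collector.feed(page)
print(collector.resources)
```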
How the browser then renders the page is out of scope, but it is important to understand that the process I described applies not only to HTML pages but to any item served over HTTP.