How HTTP requests work

What happens when you type a URL in the browser, from start to finish

This article describes how the browser performs page requests using the HTTP/1.1 protocol.

If you have ever had a job interview, you may have been asked: "What happens when you type something into the Google search box and press Enter?"

It's one of the most popular interview questions. Interviewers want to see whether you can explain some fairly basic concepts and whether you have any understanding of how the Internet actually works.

In this article, I will analyze what happens when you type a URL in the address bar of your browser and press Enter.

It's a very interesting topic to dissect in a blog post, because it touches many technologies, each of which I could explore in a separate article.

This technology rarely changes, and it powers one of the most complex and extensive ecosystems humans have ever built.

HTTP protocol

First of all, I specifically mention HTTP and not HTTPS, because HTTPS connections work differently.

I only analyze URL requests

Modern browsers can tell whether what you type in the address bar is an actual URL or a search term; if it is not a valid URL, they pass it to the default search engine.

I assume you type an actual URL.

After entering the URL and pressing Enter, the browser will first construct the complete URL.

If you just entered a domain, for example flaviocopes.com, the browser will by default prepend http:// to it, using the HTTP protocol.
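Real browsers use far more elaborate heuristics, but the basic completion rule can be sketched like this. It's a minimal sketch assuming only the rule above (no scheme typed means http://, no path means /); complete_url is a hypothetical helper, not a browser API.

```python
from urllib.parse import urlsplit

def complete_url(typed: str) -> str:
    """Hypothetical helper: turn what the user typed into a full URL."""
    if "://" not in typed:
        typed = "http://" + typed  # no scheme typed: default to HTTP
    parts = urlsplit(typed)
    path = parts.path or "/"  # an empty path means the root resource
    url = f"{parts.scheme}://{parts.netloc}{path}"
    if parts.query:
        url += "?" + parts.query
    # The #fragment is never sent to the server, so it is dropped here.
    return url

print(complete_url("flaviocopes.com"))  # http://flaviocopes.com/
```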

Things relate to macOS/Linux

Just for reference: Windows may do some things slightly differently.

DNS lookup phase

The browser starts a DNS lookup to get the server's IP address.

The domain name is a convenient shortcut for us humans, but the Internet is organized so that computers find the exact location of a server through its IP address, which is a set of numbers such as 222.124.3.1 (IPv4).

First, the browser checks its local DNS cache to see if the domain has been resolved recently.
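DNS answers come with a TTL (time to live), so a cache only trusts an entry until it expires. Here's a minimal sketch of that idea; DnsCache is a hypothetical class (not how Chrome implements its cache), and the clock is injectable so expiry can be demonstrated without waiting.

```python
import time

class DnsCache:
    """Hypothetical in-memory DNS cache: each entry expires after its TTL."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}  # domain -> (ip, expiry time)
        self._clock = clock

    def put(self, domain: str, ip: str, ttl: float) -> None:
        self._entries[domain] = (ip, self._clock() + ttl)

    def get(self, domain: str):
        entry = self._entries.get(domain)
        if entry is None:
            return None  # never resolved: a real lookup is needed
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[domain]  # stale: forget it and look up again
            return None
        return ip
```

On a miss (get returns None) the browser moves on to the resolver, as described next.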

Chrome has a convenient DNS cache visualizer, which you can find at chrome://net-internals/#dns

If nothing is found there, the browser uses the DNS resolver, calling the gethostbyname POSIX function to retrieve the host information.
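You can try the same kind of lookup from Python: socket.gethostbyname goes through the operating system's resolver, so it sees the same hosts file and DNS configuration the browser would. This is a small sketch; resolve is just a wrapper name I picked, and getaddrinfo is the modern replacement if you also need IPv6.

```python
import socket

def resolve(host: str) -> str:
    # Asks the OS resolver (hosts file first, then the configured DNS
    # servers) and returns a single IPv4 address as a string.
    return socket.gethostbyname(host)

print(resolve("localhost"))  # usually 127.0.0.1, straight from /etc/hosts
```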

gethostbyname

gethostbyname first looks in the local hosts file, which on macOS and Linux is located at /etc/hosts, to see if the system provides the information locally.
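Each line of the hosts file is an IP address followed by one or more host names, and # starts a comment. A minimal parser for that format might look like this; parse_hosts is hypothetical, and keeping the first match for a name is a simplifying assumption of this sketch.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map each host name to the first IP listed for it in hosts-file text."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # first match wins
    return table

sample = """
127.0.0.1     localhost
::1           localhost
192.168.1.10  devbox devbox.local  # my test machine
"""
print(parse_hosts(sample)["devbox.local"])  # 192.168.1.10
```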

If this does not provide any information about the domain, the system will make a request to the DNS server.

The address of the DNS server is stored in the system preferences.

These are 2 popular DNS servers:

  • 8.8.8.8: Google public DNS server
  • 1.1.1.1: Cloudflare DNS server

Most people use the DNS server provided by their internet provider.

The browser uses the UDP protocol to perform DNS requests.

TCP and UDP are the two basic protocols of computer networks. They are at the same conceptual level, but TCP is connection-oriented, while UDP is a connectionless protocol, which is lighter and uses less overhead for sending messages.
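To make "connectionless" concrete: a DNS query is a single self-contained datagram, with no handshake before it. Here's a minimal sketch that builds such a query in the RFC 1035 wire format (12-byte header, then the question); build_dns_query is a hypothetical name and the query ID is arbitrary.

```python
import struct

def build_dns_query(domain: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query message asking for the A record of a domain."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question,
    # 0 answer, 0 authority and 0 additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A (IPv4 address), QCLASS 1 = IN (Internet).
    return header + qname + struct.pack("!HH", 1, 1)

query = build_dns_query("flaviocopes.com")
print(len(query))  # 33
```

Sending it would be one sendto call on a socket.SOCK_DGRAM socket to port 53 of a server such as 8.8.8.8; I don't do that here.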

How the UDP request is actually performed is outside the scope of this tutorial.

The DNS server might have the domain's IP in its cache. If not, it asks the root DNS server, the system that drives the entire Internet (made up of 13 logical root servers, each replicated on many machines around the world).

The root DNS server does not know the address of every domain name on the planet.

What it knows is where the top-level domain (TLD) DNS servers are.

A top-level domain is the domain extension: .com, .it, .pizza and so on.

When the root DNS server receives the request, it refers the resolver to the top-level domain (TLD) DNS server responsible for that extension.

Say you are looking for flaviocopes.com. The root DNS server returns the IP of the .com TLD server.

Now, our DNS resolver will cache the IP of the TLD server, so it does not have to ask the root DNS server again.

The TLD DNS server will have the IP address of the authoritative name server for the domain we are looking for.

How does that work? When you buy a domain, the domain registrar sends its name servers to the appropriate TLD. When you update the name servers (for example, when you change hosting provider), your domain registrar updates this information automatically.

These are the DNS servers of the hosting provider. There is usually more than one, so the others can serve as backups.

For example:

  • ns1.dreamhost.com
  • ns2.dreamhost.com
  • ns3.dreamhost.com

The DNS resolver starts with the first one and asks it for the IP of the domain you are looking for (subdomain included).

That is the ultimate source of truth for the IP address.
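The whole chain (root, then TLD, then authoritative name server) boils down to "follow referrals until someone gives an answer". Here's a toy model of that loop; the server names, the ZONES table and the IP 203.0.113.7 are all made up for illustration, and a real resolver would also cache each referral.

```python
# Toy model of the DNS hierarchy: each "server" maps a name suffix to
# either a referral (the next server to ask) or a final answer.
ZONES = {
    "root": {"com": ("referral", "com-tld")},
    "com-tld": {"flaviocopes.com": ("referral", "ns1.dreamhost.com")},
    "ns1.dreamhost.com": {"flaviocopes.com": ("answer", "203.0.113.7")},
}

def ask(server: str, domain: str):
    """Ask one server about a domain, matching the longest known suffix."""
    zone = ZONES[server]
    labels = domain.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in zone:
            return zone[suffix]
    raise LookupError(f"{server} knows nothing about {domain}")

def resolve_iteratively(domain: str) -> tuple[str, list[str]]:
    """Follow referrals from the root until a server returns the IP."""
    server, path = "root", []
    while True:
        path.append(server)
        kind, value = ask(server, domain)
        if kind == "answer":
            return value, path
        server = value  # referral: go one level down the hierarchy

print(resolve_iteratively("flaviocopes.com"))
# ('203.0.113.7', ['root', 'com-tld', 'ns1.dreamhost.com'])
```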

Now that we have an IP address, we can move on.

TCP request handshake

With the server's IP address available, the browser can now initiate a TCP connection with it.

Before the TCP connection is fully established, the two sides must complete a handshake (the client sends SYN, the server replies with SYN-ACK, the client answers with ACK); only then can data be sent.
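In practice the operating system performs the handshake for you: connecting a socket returns once it is done. This sketch shows that on localhost, with a throwaway listening socket standing in for the web server (handshake_demo is a hypothetical name).

```python
import socket

def handshake_demo() -> bytes:
    # Listening socket on localhost; port 0 asks the OS for any free port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        # create_connection() returns once the kernel has finished the
        # SYN, SYN-ACK, ACK exchange; our code never sees those packets.
        with socket.create_connection(server.getsockname()) as client:
            conn, _ = server.accept()
            with conn:
                client.sendall(b"ping")  # data flows only after the handshake
                return conn.recv(4)

print(handshake_demo())  # b'ping'
```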

Once the connection is established, we can send the request.

Send the request

The request is a plain text document constructed in a precise manner determined by the communication protocol.

It consists of 3 parts:

  • Request line
  • Request header
  • Request body

Request line

The request line contains, on a single line:

  • HTTP method
  • Resource location
  • Protocol version

For example:

GET / HTTP/1.1
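Because the three parts are separated by single spaces, splitting a request line is straightforward. A minimal sketch (parse_request_line is hypothetical, and it performs only a basic sanity check on the version):

```python
def parse_request_line(line: str) -> dict[str, str]:
    """Split an HTTP/1.1 request line into method, target and version."""
    method, target, version = line.split(" ")  # ValueError if not 3 parts
    if not version.startswith("HTTP/"):
        raise ValueError(f"not an HTTP request line: {line!r}")
    return {"method": method, "target": target, "version": version}

print(parse_request_line("GET / HTTP/1.1"))
# {'method': 'GET', 'target': '/', 'version': 'HTTP/1.1'}
```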

Request header

The request header is a set of field: value pairs that set certain values.

In HTTP/1.1 the Host field is required, and Connection is commonly sent as well, while all other fields are optional:

Host: flaviocopes.com
Connection: close

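Host indicates the domain name we want to target, while Connection: close asks the server to close the connection once the response is sent. HTTP/1.1 connections are persistent by default, so leave it out if you need to keep the connection open.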

Some of the most commonly used header fields are:

  • Origin
  • Accept
  • Accept-Encoding
  • Cookie
  • Cache-Control
  • DNT

But there are many more.

The header part is terminated by a blank line.
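Putting the pieces together: every line of an HTTP/1.1 request ends with CRLF (\r\n), and an empty line closes the headers. Here's a minimal sketch that assembles the GET request above; build_get_request is a hypothetical helper that only handles bodiless GET requests.

```python
def build_get_request(host: str, path: str = "/", extra_headers=None) -> bytes:
    """Assemble an HTTP/1.1 GET request: request line, headers, blank line."""
    headers = {"Host": host, "Connection": "close"}
    headers.update(extra_headers or {})
    lines = [f"GET {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # The final CRLF pair ends the last header line and adds the blank
    # line that terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

print(build_get_request("flaviocopes.com"))
# b'GET / HTTP/1.1\r\nHost: flaviocopes.com\r\nConnection: close\r\n\r\n'
```

These bytes are exactly what would be written to the TCP connection, for example with sendall.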

Request body

The request body is optional: it's not used in GET requests, but it is used in POST requests and sometimes in other verbs too, and it can contain data in JSON format, among others.

Since we are analyzing a GET request, the body is blank, so we won't look into it further.

Response

After sending the request, the server will process it and send back a response.

The response starts with a status line: the protocol version, a status code and a status message. If the request is successful and returns 200, it starts with:

HTTP/1.1 200 OK

The request may return different status codes and messages, such as one of the following:

404 Not Found
403 Forbidden
301 Moved Permanently
500 Internal Server Error
304 Not Modified
401 Unauthorized

After that, the response contains a list of HTTP headers and the response body (which, since we are making the request in the browser, is going to be HTML).
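A raw response therefore has the same shape as the request: a status line such as HTTP/1.1 200 OK, header lines, a blank line, then the body. A minimal sketch of splitting it apart (parse_response is hypothetical; it ignores chunked transfer encoding and doesn't check Content-Length, both of which a real client must handle):

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.1 response into status, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, message = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return int(code), message, headers, body

raw = (b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n"
       b"Content-Length: 13\r\n\r\n<h1>Hi!</h1>\n")
code, message, headers, body = parse_response(raw)
print(code, message, headers["content-type"])  # 200 OK text/html
```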

Parse HTML

The browser has now received the HTML and starts parsing it, repeating the exact same process we just went through for every resource the page requires:

  • CSS files
  • images
  • the favicon
  • JavaScript files
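Finding those resources means scanning the HTML for the tags that reference them. Here's a minimal sketch using Python's built-in html.parser; ResourceFinder is a hypothetical class and only looks at img, script and link (stylesheet or icon) tags, far fewer cases than a real browser handles.

```python
from html.parser import HTMLParser

class ResourceFinder(HTMLParser):
    """Collect URLs of stylesheets, icons, scripts and images in a page."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # value is None for bare attributes like defer
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel or "icon" in rel:
                self.resources.append(attrs["href"])

page = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="icon" href="/favicon.ico">
<script src="/app.js" defer></script>
</head><body><img src="/logo.png" alt=""></body></html>"""

finder = ResourceFinder()
finder.feed(page)
print(finder.resources)  # ['/style.css', '/favicon.ico', '/app.js', '/logo.png']
```

Relative URLs like these would first be resolved against the page URL (urllib.parse.urljoin does that), and each one then goes through the DNS, TCP and request steps again.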

How the browser then renders the page is out of scope, but it's important to understand that the process I described applies not only to HTML pages, but to any item served over HTTP.
