Crio Bytes - How browsers serve URL requests | Crio.Do | Project-Based Learning Platform for Developers

Background

Typical Interview

image alt text

Does the above interview sound familiar? "What happens when you type google.com?" is a wide-open question that can touch upon a wide variety of concepts.

The discussion can touch upon:

HTTP vs HTTPS and their respective port numbers
HTTP redirects, responses and status codes
HTTP methods and RESTful web services
DNS, load balancing
and more..

The more detailed your answer => the better your interview performance. The question is: how do you prepare to answer this question in detail?

Let’s find out - the Crio way.


I hear and I forget

I see and I remember

I do and I understand

Primary goals

At Crio, you learn by doing. In this spirit, let’s start this activity to explore what goes on behind the scenes when web requests are served.

Note

This activity requires a laptop for optimal learning. If you are on your mobile device, you are missing out :)

Getting started with Chrome Developer Tools

Let us use Chrome Developer Tools to get an inside view of all the network requests made by the browser when you visit google.com. The following video should help you get a quick preview of what you will be doing in this milestone!

Open Chrome Developer Tools

Open Developer Tools in Google Chrome using the keyboard shortcut for your operating system.

Windows/Linux: CTRL+SHIFT+I (or F12)
Mac: option + command + i

image alt text

Switch to the Network tab

In Developer Tools, switch to the Network tab as shown below to inspect all network requests made by Google Chrome.
In the filter section, make sure All is selected.

image alt text

We are now ready for some HTTP action!

Type http://google.com in the browser navigation bar and hit ENTER.
And, voila! Google’s home page loads on the left side.
In the Developer Tools window, you can see all the network requests that were made to serve Google’s home page. (see screenshot below for a sample, each row represents a network request)

image alt text

Initial observations on loading 'http://google.com'

Before we start digging into the details of each network request, here are a few quick questions for you. Try to answer them by observing Developer Tools on your browser.

How many HTTP requests were made to serve Google’s home page?
What is the size of Google’s home page?
1. How much of this was rendered from cache?
2. And how much was transferred over the network?
How much time did it take to load Google’s homepage?

The answers to these questions are in the status bar at the bottom of the Developer Tools Window.

image alt text

40 HTTP requests were made to serve Google’s homepage. This is not a constant number. It might have taken a few additional (or fewer) requests on your laptop.
The size of a page is the sum of the sizes of all the resources required to render it completely.
1. 2.1 MB of resources were loaded from cache.
2. 68.3 kB was transferred over the network. These will primarily include resources that are not cacheable or those that the web page has explicitly asked the browser not to cache.
It took 397 ms to load Google’s home page. This includes the time it took to download all the resources required to render the page. The basic structure of the web page - the Document Object Model (read more about DOM) - was loaded first (which took 249 ms), and the remaining resources (including JavaScript libraries, images and other resources) took the remaining time (397 - 249 = 148 ms).

Curious Cats

If the speed of your internet connection doubled, which of the above answers would change?
If you accessed Google’s home page from an incognito window, which of the above answers would change? Will the values go up or down? Try it out and compare the values in each case.
How does the browser know what resources to cache?
Assume that the logo image of a website was changed after it was cached by your browser. How would the browser know to fetch the new logo image and not its cached version?
How does the browser know which resources to cache and which ones to fetch afresh when you visit the same web page multiple times? Can a web page explicitly tell the browser what parts are cacheable?

Resolving the URL

We have taken a cursory look at some of the details available inside Developer Tools when network requests are made. Let’s inspect the network requests now to understand what goes on behind the scenes.

image alt text

Click on the first network request and try to answer the following questions:

When you typed http://google.com in the browser, what Request URL did it translate to?
Is this a secure request? Or an insecure request?

You will notice the following upon inspecting the Headers tab in Developer Tools:

The browser automatically adds http:// (or http://www.) to the request URL if you leave it out. It assumes that all web requests are http requests by default. Hence, google.com => http://google.com.
This is an insecure request because it uses http and not https.
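
The defaulting rule described above can be sketched in a few lines of Python. This is only an illustration built on the standard library's urllib.parse; the helper name resolve_typed_url is made up for this example, and real browsers apply many more rules than this.

```python
from urllib.parse import urlsplit


def resolve_typed_url(typed: str) -> dict:
    """Mimic the simple rule above: assume http:// when no scheme is typed.

    Hypothetical helper for illustration; not how Chrome is implemented.
    """
    if "://" not in typed:
        typed = "http://" + typed
    parts = urlsplit(typed)
    return {
        "url": typed,
        "scheme": parts.scheme,
        "host": parts.hostname,
        "secure": parts.scheme == "https",  # only https counts as secure
    }


print(resolve_typed_url("google.com"))
```

Calling it with https://www.google.com/ instead flips the secure flag to True, which is the same check you just did by eye in the Headers tab.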

Underneath HTTP lies TCP

HTTP is an application (layer-7) protocol. HTTP requests are made over TCP connections (at the transport layer). So, in order to make an HTTP request, you must first establish a TCP connection.

Q. What information must you know to establish a TCP connection between two hosts?

A. Their IP addresses and port numbers (or at least the IP address and port number of the destination host).

In our case, to access Google’s home page, we must make an HTTP request to Google’s backend web server. And for that, we must know its IP address.

Q. And how do you get the IP address of google.com?

A. DNS, of course.

Stepping into DNS

DNS (Domain Name System) is used to convert domain names (eg: google.com) to their respective IP addresses. You can use several tools to find the IP address of any domain: e.g. dig, ping.

Let’s try to do this now. Here is an online version of the dig tool hosted by Google:

https://toolbox.googleapps.com/apps/dig/

Use the above tool to find the IP address of google.com. Right now. Go ahead. Below is a sample response.

image alt text
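
If you would rather do the lookup from code, here is a minimal Python sketch. It asks your operating system's resolver (via socket.getaddrinfo) rather than querying a DNS server directly the way dig does, so it does not show TTLs. The function name lookup is an assumption made for this example.

```python
import socket


def lookup(host: str, port: int = 80, family: int = socket.AF_INET) -> list[str]:
    """Return the distinct IP addresses the system resolver gives for host."""
    infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    addresses = []
    for info in infos:
        ip = info[4][0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

With a network connection, lookup("google.com") returns IPv4 addresses and lookup("google.com", family=socket.AF_INET6) returns IPv6 addresses. The values will probably differ from the screenshot and from one run to the next.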

Curious Cats

Try accessing Google’s home page by directly using the IP address. Here’s a sample URL you could use based on the screenshot above:


http://142.250.9.101

Do web requests made using IP addresses directly take less time to load?
Try both variants in Chrome and measure the time difference between them.

Analyzing the DNS response

Here are some questions based on the dig results:

A list of IP addresses was returned for google.com (instead of just one). Why not just one?
Did you also get the same list of IP addresses? Or are they different?
Are these IPv4 or IPv6 addresses?
What does the 299 refer to?

Discuss the above questions with a friend and compare your answers.

For (1) and (2) above, read this interesting and relevant discussion on Stackoverflow.
(3) These are all IPv4 addresses. (read more)
(4) 299 is the TTL (time to live) in seconds: how long a resolver may cache this answer before asking again. (Deep dive: What is DNS TTL?)
Want to see a sample IPv6 address? Here is an example: 2001:0db8:85a3:0000:0000:8a2e:0370:7334. Click the AAAA button on the toolbox to see Google’s IPv6 addresses.
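
A quick way to tell the two address families apart in code is Python's standard ipaddress module. The sketch below uses the two sample addresses from this page; the helper name ip_version is made up for this example.

```python
import ipaddress


def ip_version(text: str) -> int:
    """Return 4 for an IPv4 address and 6 for an IPv6 address."""
    return ipaddress.ip_address(text).version


for sample in ["142.250.9.101", "2001:0db8:85a3:0000:0000:8a2e:0370:7334"]:
    print(f"{sample} is IPv{ip_version(sample)}")
```

Printing str(ipaddress.ip_address(...)) for the IPv6 sample also shows its shortened form, where leading zeros and one run of all-zero groups are dropped.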

Curious Cats

Why does google.com have both IPv4 and IPv6 addresses?
How does the browser decide whether it should use IPv4 or IPv6 addresses to communicate with google.com? Does it always use one of these?
Read more about how DNS works

Summary

The domain name is first resolved to fetch the corresponding IP address (which is required to make the HTTP/TCP connection, remember?).
Depending on how popular the website is, a DNS load balancer will return different IP addresses from a pool of IP addresses that have been assigned to the domain. The load balancing decision could be based on a combination of geolocation, server load, and other details. (Read more about DNS load balancing)
Each IP address returned in the DNS response will take you to a different backend web server, hence enabling services like Google to serve millions of user requests simultaneously through multiple backend servers.
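
To make the load balancing idea concrete, here is a toy sketch of the simplest possible policy: round robin, where each query gets the same pool of addresses rotated by one. This is an assumption chosen for illustration; it ignores geolocation and server load, and it is not a description of how Google's DNS works. The class name and the 192.0.2.x addresses (a range reserved for documentation) are made up.

```python
from collections import deque


class RoundRobinDNS:
    """Toy resolver that rotates a fixed pool of addresses on every query."""

    def __init__(self, pool):
        self._pool = deque(pool)

    def resolve(self):
        answer = list(self._pool)
        self._pool.rotate(-1)  # the next query starts from the next address
        return answer


dns = RoundRobinDNS(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
for _ in range(4):
    # Clients usually try the first address in the answer.
    print(dns.resolve()[0])
```

Because most clients try the first address in the answer, rotating the list spreads consecutive visitors across the servers in the pool.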

Back to Developer Tools

Q. What IP address was used to establish a connection with google.com?

Let us review the screenshot again.

image alt text

Remote Address: 216.58.200.142:80 refers to the IP address of Google’s server we are connecting to.

Curious Cats

Do you get the same IP addresses when you run the dig google.com command from a Linux or Windows terminal?

TCP port numbers

But what is :80 doing at the end of the Remote Address?

80 is a special TCP port. It is reserved for use by the HTTP server to accept incoming requests. You can try to access google.com as google.com:80. The 80 is redundant here because, even if you leave it out, the browser uses it as the default port for all http:// requests.

As you will see later, if you access a website using https, the default port number is 443. For instance, https://google.com is the same as https://google.com:443. TCP port 443 is reserved for use by the HTTPS server to accept incoming requests. It is used implicitly for HTTPS even if we don’t specify it in the URL.
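
The default port rule can be written down as a small Python sketch. The helper name effective_port and the DEFAULT_PORTS table are made up for this example; only the two schemes discussed here are covered.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the port in the URL, or the scheme's default if none is given."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


for url in ["http://google.com", "https://google.com", "https://google.com:443"]:
    print(url, "->", effective_port(url))
```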

With this, the browser has all the info it needs to make an HTTP request to google.com. We will cover this in the next section.

Curious Cats

Is it really a TCP connection underneath the HTTP request? How do you confirm this?
Hint: have you used Wireshark before? Use it to snoop on the packets exchanged when you access google.com (sample below).

image alt text

Making HTTP requests

The browser made a DNS request to resolve google.com to a relevant IP address. It has used this information to create a TCP connection to Google’s backend server. It is now time to make the HTTP request using this TCP connection.

image alt text

As you can see in the screenshot, a GET request was made from the browser to fetch google.com’s home page and a response was received. In HTTP lingo, below is the description of the request being made:


GET / HTTP/1.1
Host: google.com

If you were accessing a specific path (e.g. google.com/shopping), it would have looked like this instead:


GET /shopping HTTP/1.1
Host: google.com
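
To see that these few lines of text really are the whole request, here is a minimal Python sketch that builds the same bytes and sends them over a plain TCP socket. The function names are made up for this example, and the Connection: close header is an added assumption so that the server closes the connection after one response.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # assumption: ask the server to close when done
        "",  # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(host: str, port: int = 80, path: str = "/") -> bytes:
    """Open a TCP connection, send the request, and read the full response."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

With a network connection, print(fetch_raw("google.com").decode("iso-8859-1")) shows the raw response, starting with the status line.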

HTTP Response: 301 Moved Permanently

The response returned by Google’s server was Status Code: 301 Moved Permanently. Hmm, that’s weird, right? Did Google move or something? Has it found a new (online) home?

301 Moved Permanently indicates that the resource we are trying to access (GET / or GET /shopping) has moved to a new location. But which location?

The Location specified in the response headers: http://www.google.com/.

image alt text

By sending this status code, the google.com server is asking the browser to update its cache and existing resources referring to the requested domain (google.com) to always point to the new location instead (www.google.com).

In simple words, the web server is saying:


Don’t call me 'google.com'. Call me 'www.google.com'
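
Here is a rough Python sketch of how a client might read the status code and the Location header out of such a response. The SAMPLE bytes are hand-written to resemble the response described above, not captured from Google, and the function name is made up for this example.

```python
def parse_response_head(raw: bytes):
    """Split a raw HTTP response into (status code, reason, headers)."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return int(code), reason, headers


# Hand-written sample that resembles the 301 response described above.
SAMPLE = (
    b"HTTP/1.1 301 Moved Permanently\r\n"
    b"Location: http://www.google.com/\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"\r\n"
)

code, reason, headers = parse_response_head(SAMPLE)
if code in (301, 302, 307, 308):
    print("redirect to", headers["location"])
```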

Note

Did you get a 200 OK instead of 301 Moved Permanently? Type the full URL in the address bar:


http://google.com

If you’re wondering why Chrome translated google.com directly to https://www.google.com, it is because Chrome has received (and correctly processed) a 301 Moved Permanently in the last month.

Examine the response headers in the screenshot above and look for Date and Expires fields just above the Location field. This is Google’s server asking Chrome to remember this 301 Moved Permanently for at least a month (as specified in the Expires value).
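
You can work out the "at least a month" yourself by subtracting the Date value from the Expires value. Below is a small Python sketch; the two header values are made up (exactly 30 days apart) because the dates in your own response will differ, and the function name is an assumption for this example.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def cache_lifetime(date_header: str, expires_header: str) -> timedelta:
    """Return how long after Date the response may be reused, per Expires."""
    return parsedate_to_datetime(expires_header) - parsedate_to_datetime(date_header)


# Made-up header values; substitute the ones from your own Developer Tools.
lifetime = cache_lifetime(
    "Mon, 01 Jan 2024 10:00:00 GMT",
    "Wed, 31 Jan 2024 10:00:00 GMT",
)
print(lifetime.days, "days")
```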

The second HTTP request

The Chrome browser interprets this and uses the Location field to make a new request; this time to http://www.google.com/. Click on the second request in the network tab to see all its details.

image alt text

In HTTP lingo, this is the request the browser is making this time:


GET / HTTP/1.1
Host: www.google.com

Another HTTP Redirect Response: 307 Internal Redirect

We receive yet another redirect response: 307 Internal Redirect. Did you receive the same status code or a different one?

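307 signifies a temporary redirect; its standard name is 307 Temporary Redirect. Chrome labels it 307 Internal Redirect when the browser generates the redirect itself without contacting the server, typically because it already knows (through HSTS) that the site must be reached over HTTPS. Either way, browsers are not required to update their internal resources mapping http://www.google.com to the new location.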

So, what is the new location this time? https://www.google.com. Google wants you to access their homepage using an HTTPS connection for security purposes. When you access a web page over HTTPS, all data exchanged with it is encrypted to prevent data theft through man-in-the-middle (MITM) attacks. (Read more about MITM attacks)

The third HTTP request: will it succeed this time?

The Chrome browser follows the Location https://www.google.com/ returned in the earlier response and makes the third request.

In HTTP lingo, this is the request the browser is making this time:


GET / HTTP/1.1
Host: www.google.com

This time, we finally got a Status Code: 200 response. This indicates that the browser has finally made a successful request. (Read more about HTTP Status Code 200).
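
The three requests so far form a redirect chain. The Python sketch below shows the general idea of following such a chain until a non-redirect response arrives. It does not touch the network: CANNED is a made-up table standing in for the three responses observed in this activity, and follow_redirects is a hypothetical helper, not Chrome's logic.

```python
def follow_redirects(url, fetch, max_hops=5):
    """Follow Location headers until a non-redirect response is returned.

    fetch(url) must return a (status_code, location_or_None) pair.
    """
    chain = []
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((url, status))
        if status in (301, 302, 303, 307, 308) and location:
            url = location  # try again at the new location
            continue
        return chain
    raise RuntimeError("too many redirects")


# Canned responses standing in for the three requests observed above.
CANNED = {
    "http://google.com/": (301, "http://www.google.com/"),
    "http://www.google.com/": (307, "https://www.google.com/"),
    "https://www.google.com/": (200, None),
}

for hop_url, status in follow_redirects("http://google.com/", CANNED.__getitem__):
    print(status, hop_url)
```

The max_hops limit is there so that two pages redirecting to each other cannot trap the client in an endless loop.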

image alt text

Look at the headers and answer the following.

Is this a secure request?
What TCP port number was used to establish the HTTPS connection?
What was the format of the response received from Google?

If you recall the discussion on secure web connections, you will notice that https:// implies that a secure connection is being established. The reserved port number for HTTPS (443) is used to make this connection.
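
For the curious, this is roughly what "establishing a secure connection on port 443" looks like in Python: an ordinary TCP connection wrapped in TLS. It is a minimal sketch using the standard ssl module; the function names are made up for this example and this is not how Chrome does it internally.

```python
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Create a client context that verifies the server certificate and hostname."""
    return ssl.create_default_context()


def open_https_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TCP connection to host:port and wrap it in TLS."""
    raw = socket.create_connection((host, port), timeout=5)
    # server_hostname is used to check the certificate against the host name.
    return make_tls_context().wrap_socket(raw, server_hostname=host)
```

With a network connection, open_https_connection("www.google.com").version() returns the negotiated TLS version as a string. Remember to close the socket when you are done.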

You’ve got HTML

Google returns an HTML page in response which is rendered by Chrome. The header gives us the info through the following setting:


Content-type: text/html; charset=UTF-8

You can see the actual HTML response by clicking on the Response tab. The <!doctype html> declaration at the top indicates that the response is an HTML document.
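
A client has to split the Content-type value into the media type and its parameters before it can decide how to handle the body. Here is a small Python sketch that does this with the standard email.message module; the helper name parse_content_type is made up for this example.

```python
from email.message import Message


def parse_content_type(value: str):
    """Return (media type, charset) from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()


media_type, charset = parse_content_type("text/html; charset=UTF-8")
print(media_type, charset)
```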

image alt text

And if you want to see how the browser renders this HTML page, check the Preview tab.

image alt text

Something is missing

Do you notice something weird in the preview above?

Right! The images are missing. Where did they go? The images are seen clearly in the browser tab. However, they are missing in the Preview pane of Developer Tools.

Any idea why they are missing? (Hint: do you remember how many HTTP requests it took to load Google’s homepage?)

It took nearly 40 requests to load Google’s home page, didn’t it? Images are missing in the preview of request #3 because they have not yet been loaded. The HTML version of Google’s homepage has links to the Google logo. Chrome notices the link and makes subsequent requests to fetch and load it.
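
The idea of "noticing the links" can be sketched with Python's built-in HTML parser. The page below is a made-up stand-in with placeholder paths, not Google's real HTML, and the class name is an assumption for this example. A real browser discovers many more kinds of resources than these three tags.

```python
from html.parser import HTMLParser


class LinkedResourceFinder(HTMLParser):
    """Collect the URLs of images, scripts and stylesheets referenced by a page."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.resources.append(attrs["href"])


# Made-up page; the paths are placeholders, not Google's real asset URLs.
PAGE = """<!doctype html>
<html><head>
<link rel="stylesheet" href="/style.css">
<script src="/app.js"></script>
</head><body><img src="/logo.png" alt="logo"></body></html>"""

finder = LinkedResourceFinder()
finder.feed(PAGE)
print(finder.resources)
```

Each entry in the list corresponds to one more HTTP request the browser would have to make after receiving the HTML.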

You can see the actual request made in the screenshot below.

image alt text

If you click on the HTTP request made to fetch Google’s logo, you can see the actual image being downloaded like in the screenshot below.

image alt text

After all the images and other assets (JavaScript libraries, CSS files, and more) are downloaded, the Google homepage is rendered in its final form as shown in the browser.

The Google Homepage, at last!

image alt text

We finally have Google's homepage loaded and all ready to use. We can summarise the sequence of HTTP requests in layman's terms as follows:

Chrome: knocks on http://google.com/ with a GET request.

Google server: "Welcome, we’re glad you’re here. This is the wrong gate, though. Can you please go to http://www.google.com instead?"

Chrome: "Ok, thank you."

Chrome: knocks on http://www.google.com/ with a GET request.

Google server: "Why hello! Thank you for visiting us. This is a very open lobby and everyone can hear what we’re about to discuss. Care for some privacy? Can you please check in at our other gate at https://www.google.com? There we can exchange keys and speak in a language that nobody else can understand."

Chrome: "Ooh, nice. Will see you there."

Chrome: knocks on https://www.google.com/ and exchanges security keys.

Google server: "alskdf)(@#E*()&IOyhfkakjsfhfsd3e23(&(&(&^(&"

Chrome: "flaskj-(&)&^%()&^hbkJHVjfiyrt&^%&YUFKHJVKJHLV(^%(&OUGLHJVLKJ"

...and they communicated happily ever after =)

Summary

And that’s a wrap! It took 3 requests to successfully access Google’s homepage (and many subsequent requests to load all the linked assets). Let us summarise everything we have learned so far.

The first thing the browser does with a URL (e.g. google.com) is perform a DNS lookup and get its IP address (e.g. 216.58.200.142 ).
The IP address returned by the DNS server could be an IPv4 or IPv6 address.
Popular websites use DNS load balancers to distribute incoming web traffic across multiple servers (based on geolocation, server load and other parameters).
The browser establishes a TCP connection with the web server.
For insecure web requests (http), the default port is 80. For secure web requests (https), the default port is 443.
Once the underlying TCP connection is successfully established, HTTP requests are sent to the web server.
There are different types of HTTP requests to perform four basic CRUD operations on resources stored on the server: Create, Read, Update and Delete.
Web servers can force browsers to use https by using HTTP redirect responses (like 301 Moved Permanently or 307 Temporary Redirect)
When a successful HTTP request is made, the web server performs the requested CRUD operation and returns a success response (200 OK).
The resource returned by the web server could be in any of several supported formats (e.g. HTML). The content-type field in the response headers is used to specify the type used (e.g. text/html).
The browser renders the page using the HTML response. If there are linked images, scripts and/or stylesheets, they are fetched using subsequent HTTP requests until all resources have been loaded.
The browser is done rendering the requested URL when all resources have been loaded successfully.
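
As a memory aid for the CRUD point in the list above, here is the conventional mapping between HTTP methods and CRUD operations written as a Python dictionary. This is a common convention in REST-style APIs rather than something HTTP itself enforces, and the names below are made up for this example.

```python
# Conventional mapping used by many REST-style APIs (a convention, not a rule).
CRUD_BY_METHOD = {
    "POST": "Create",
    "GET": "Read",
    "PUT": "Update",
    "PATCH": "Update",
    "DELETE": "Delete",
}


def crud_operation(method: str) -> str:
    """Return the CRUD operation usually associated with an HTTP method."""
    return CRUD_BY_METHOD[method.upper()]


print(crud_operation("GET"))
```

Every request you inspected in this activity was a GET, that is, a Read.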

Phew - that’s quite a list! But guess what, you learned all of that in under an hour. Not by watching a video or reading a boring old tutorial, but by actually doing it yourself.

Will you ever forget what you have just learned?

The next time you’re in a job interview and the interviewer asks: "What goes on behind the scenes when you type google.com in the browser?", we know you’ll be smiling in your heart as you ace the interview! =)

Welcome to Crio

You have just scratched the surface of how you learn on Crio. You have seen what you can accomplish in under an hour.

Imagine how much you will learn in our signature learning tracks that pack 160 - 240 hours of curriculum!

Take our FREE trial

Our FREE trial includes the following activities:

Workshop on System design
Get a preview of Crio Micro-Experience and Independent projects
Complete career Plan and Program fitness questionnaire

After you complete this, you will also be able to access the first week of our curriculum, which has 20 hours of content on Web Developer Essentials. We will cover the following topics that every serious web developer is expected to know:

image alt text

Getting started with HTTP (a natural extension of this activity)
Introduction to REST APIs
A hands-on introduction to the Linux command line
Deploying a web application in a cloud-based server
A practical roadmap to crack a development job
..and more!

Complete the FREE trial to qualify for exciting scholarships towards our signature learning programs.

Next Steps

You will receive an email with details of how to start your FREE trial. Until then, feel free to interact with the Crio community on Slack (check email for details).

Welcome to your web developer journey!

We can’t wait to see what you will build!

Contact

Email: devsprint@criodo.com

image alt text
