Byte Introduction

Get started with HTTP

Skills:

HTTP

Objective

Learn about HTTP protocol and how it’s used

Background

HTTP is the most widely used application protocol on the internet; it is what makes actions like visiting web pages work. Communication starts with a client machine sending a request in the HTTP format. The server machine receives the request, interprets it and takes the appropriate action. The response must in turn follow the format defined by the HTTP protocol so that the client can make sense of it.


Primary goals

  1. Get a clear understanding of how HTTP works.

  2. Use tools like cURL & Postman to perform HTTP requests and analyse responses


Open a Linux terminal

Note

If you are taking this Byte on your personal Linux machine, open a new terminal and proceed to the Setup Dependencies section below. You may skip the rest of this milestone.

Instructions for the Crio Workspace

To open a terminal in the Crio Workspace, follow the steps below:

  1. Click on the workspace icon on the left sidebar and click on the Start button to create your workspace.

  2. Once your workspace is ready, click on Launch online IDE.

  3. Click on View > Open in New Tab option (see image below).

image alt text

  4. To open a new terminal, click on Menu > View > Terminal. (Note: The menu button, shown as three horizontal bars, is at the top-left.)

image alt text

Setup Dependencies

  • Using Google Chrome is recommended

  • (Optional) Download & install Postman application on your local computer from here

Construct a simple HTTP request on TCP protocol

Let’s try to understand the components of an HTTP request & response by actually doing one. The telnet client helps us connect to other computers on the internet. The format is telnet hostname port. (You can use this online telnet client)

image alt text

The default port for HTTP is 80. The telnet command connects us to the HTTP port on the data.pr4e.org server, and we can now start sending HTTP requests.

Now, how do we create an HTTP request? Let’s check the HTTP protocol definition doc here to get an idea of how to frame an HTTP request.

image alt text

Properties in brackets are optional, so a basic Request need only include a Request-Line. For our use case, CRLF will be added by hitting Enter and SP is space.


For the Request-Line, we will use GET, which is an HTTP method used to "get" data. We’ll fetch the file located at http://data.pr4e.org/romeo.txt and use the HTTP/1.0 keyword to signify that version 1.0 of the HTTP protocol will be used. Enter the line below and hit Enter twice. Why twice? (Hint: Protocol definition)


GET http://data.pr4e.org/romeo.txt HTTP/1.0
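The same request can also be put together in a few lines of Python, which makes the CRLF characters that telnet hides visible. This is only a minimal sketch: the helper names build_request and send_request are made up for illustration, and send_request needs network access to be useful.

```python
import socket

CRLF = "\r\n"  # carriage return + line feed, what pressing Enter sends in telnet


def build_request(method: str, target: str, version: str = "HTTP/1.0") -> bytes:
    """Build a bare request: the Request-Line, then an empty line.

    The first CRLF ends the Request-Line; the second CRLF is the empty
    line that tells the server there are no (more) headers.
    """
    request_line = f"{method} {target} {version}"
    return (request_line + CRLF + CRLF).encode("ascii")


def send_request(host: str, port: int, payload: bytes, timeout: float = 10.0) -> bytes:
    """Open a TCP connection, send the payload and read until the server closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
        while True:
            data = conn.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, send_request("data.pr4e.org", 80, build_request("GET", "http://data.pr4e.org/romeo.txt")) should return the same raw bytes that telnet prints, assuming the server is reachable from your machine.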

You’ll get the HTTP response back from the server. Similar to the Request format we saw earlier, HTTP response has a format as well. See here


It starts with the Status line, which contains the HTTP version used by the server & a status code denoting what happened to the request. A 200 status code tells us the request completed successfully. We also have a number of response headers like Date, Server, Last-Modified etc. Each of these has a meaning. The actual file content comes after all the response headers.

image alt text

From the above figure, different parts of the HTTP communication are:

  1. Request Line (HTTP Request)

  2. Status Line & Response Header (HTTP Response)

  3. Response Body (HTTP Response)
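To make the three parts of a response concrete, here is a rough Python sketch that splits a raw response into status line, headers and body. The function name parse_response and the SAMPLE bytes are illustrative assumptions; SAMPLE is hand-written and is not the actual output of the data.pr4e.org server.

```python
def parse_response(raw: bytes) -> dict:
    """Split a raw HTTP response into status line, headers and body."""
    # The first empty line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: HTTP-Version SP Status-Code SP Reason-Phrase
    parts = lines[0].split(" ", 2)
    version, status_code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    # Each header line is "Name: value"; header names are case-insensitive.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return {
        "version": version,
        "status": status_code,
        "reason": reason,
        "headers": headers,
        "body": body,
    }


# Hand-written sample response, for illustration only.
SAMPLE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
```

Calling parse_response(SAMPLE) gives a dictionary whose "headers" entry maps content-length to "5", which matches the length of the body in this sample.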


Try to figure out what some of these response headers mean & what their uses are - for starters, see Last-Modified, Content-Length, Content-Type


If we analyse the network packets transferred to/from our computer during the above communication, we can observe the following (192.168.43.197 is the client computer & 192.241.136.170 is the server):

  1. Client initiates a TCP connection request to the server (Line 1) - this is performed when we execute the telnet command

  2. HTTP communication happens using this established TCP connection (see the bottom part that lists the protocols used for the resource transfer)

  3. Client sends an HTTP Request line to the server (Line 6), to which the server responds with an HTTP Status code & data as we saw earlier in the telnet output

image alt text

Network packets analysed using Wireshark

Browsers to the rescue!

Imagine having to frame an HTTP request by hand every time we needed to fetch a resource from the web. Loading most websites involves a large number of HTTP requests.


Browsers use HTTP requests to fetch us web pages. When we enter a website URL, the browser creates an HTTP request on our behalf and sends it to the server on which the website is hosted. The browser reads the HTTP response from the server and renders it as a web page instead of showing the raw HTML actually returned. Let’s look into what constitutes an HTTP request & response.

  1. Open a new Incognito window in Chrome and navigate to https://www.flipkart.com/ (Incognito avoids inconsistencies due to caching).

  2. Open Chrome Developer Tools with Ctrl + Shift + I (Windows/Linux) or Cmd + Option + I (macOS) in the browser window and select the Network tab.

  3. Refresh the page to start recording network activity from Chrome and observe the HTTP requests made to load the website.

  • How many HTTP requests were made?

  • How much data was transferred over the network?

  4. You will see many HTTP requests being made. Scroll to the top of the network activity and click on the first request made. (In the Name column, look for the entry www.flipkart.com with Type document.)

image alt text

image alt text

  • Observe the following details for this HTTP request:

    1. Request URL

    2. Request Method

    3. Status Code (this is the response status code received)

  • Remote Address - The port number used is 443. Is this a special port number? Is there any relationship between the port number used and the Request URL? Why is there a lock icon in the address bar of your browser?

image alt text

  • Compare the Response Headers with what we got earlier when fetching data from http://data.pr4e.org/romeo.txt using telnet (see screenshot above), e.g. Content-Type, Server

  • Find out the HTTP request-line sent (Hint: Click on view source)

image alt text

  • Go to https://www.flipkart.com/television-store in a new tab. What do you think would change in the HTTP request-line? Verify by checking the request-line sent for retrieving the new HTML page, like you did in the previous step. (Filter by Doc to easily find HTML requests)

image alt text


The request line will now be asking for the resource at /television-store instead of the resource at root (/) when you visited https://www.flipkart.com/. The Host header tells where to fetch this resource from.


GET /television-store HTTP/1.1

Host: www.flipkart.com
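The way a URL maps onto the request-line and the Host header can be sketched in Python using the standard urllib.parse module. The function name request_line_and_host is made up for this illustration, and a real browser sends many more headers than just Host.

```python
from urllib.parse import urlsplit


def request_line_and_host(url: str, method: str = "GET", version: str = "HTTP/1.1"):
    """Derive the request-line and Host header a client would send for a URL."""
    parts = urlsplit(url)

    # The path (plus any query string) goes into the request-line.
    # An empty path means the root resource, "/".
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    # The host part of the URL goes into the Host header instead.
    return f"{method} {target} {version}", f"Host: {parts.netloc}"


for line in request_line_and_host("https://www.flipkart.com/television-store"):
    print(line)
```

Running this prints the same two lines shown above; passing https://www.flipkart.com/ instead changes only the request-line, to GET / HTTP/1.1.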

Curious Cats

  • Suppose Chrome versions below 80.0 don't support GIF images. We need our server to return a corresponding PNG image if any unsupported browser asks for the GIF image. How would the server know the Chrome version from which the request was made? (Hint: See Request Headers)

  • Open a browser tab in Incognito. Visit https://crio.do/ after opening the Network tab in DevTools. Observe the size of data transferred. Open a new tab and do the same. Is there a difference in the size of data transferred now? Inspect the request & response headers in both situations to find out what’s happening.

  • We looked at how requesting an HTML file in turn triggers new HTTP requests to fetch resources like scripts & images within it. Visit a couple of websites & inspect the resources loaded. Is there any order in which the resources are loaded? Does HTTP mandate this?

Answers to these Curious Cats questions will be available in the Takeaways milestone at the end.

What are HTTP Request Methods?

We saw how HTTP can be used to fetch data in Milestone 1. How would we use HTTP to:

  • Upload data to the server, e.g. add a profile picture to Facebook

  • Update some data present on the server, e.g. change your Facebook display name

  • Delete some data present on the server, e.g. remove contact information from Facebook


As listed above there are a variety of use cases and HTTP provides different request methods for each. Let’s look at the most frequently used methods.

GET

GET requests are used to "get" resources from a server. By definition, GET requests should only fetch data from the server and shouldn’t change the data stored on the server.


Check the requests made when you visit https://gitlab.crio.do/users/sign_in in Incognito. If we check the first request made, it’s for a resource of type document which is the HTML file. Use the Preview tab to see the HTML rendered. Is it missing something?


Go to the Response tab & you’ll be able to see the raw HTML data, including <img> tags. Why aren't the images showing up in the Preview tab then?


As we saw earlier, an HTTP request-line can specify only a single resource at a time. This means a separate HTTP request is needed to fetch each related file (image, CSS, JavaScript) that the HTML needs.


Find an image included in the HTML. Can you see an HTTP request for that resource?

image alt text

POST

POST requests are used to send some data to the server. Some use cases are to submit data from a web form or to upload a file to the server.


Assuming you’re still at https://gitlab.crio.do/users/sign_in, try to Sign in using some invalid credentials. Inspect the request sent now. How does it differ from what we had when we visited the web page?


Scroll down in the Headers tab to find the form data you filled which was sent along with the HTTP request.
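As a rough illustration of what a form submission looks like on the wire, the sketch below encodes form fields the way a plain HTML form does by default and attaches them to a request object. The field names, the values and the https://example.com/login URL are hypothetical and are not the ones GitLab uses; nothing is sent over the network here.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(url: str, fields: dict) -> Request:
    """Prepare (but do not send) a POST request carrying URL-encoded form data."""
    # Form fields become "name=value" pairs joined by "&" in the request body.
    body = urlencode(fields).encode("ascii")
    return Request(
        url,
        data=body,  # supplying a body makes urllib use POST instead of GET
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical field names and URL, for illustration only.
request = build_form_post(
    "https://example.com/login",
    {"username": "alice", "password": "not a real password"},
)
print(request.get_method())
print(request.data.decode("ascii"))
```

Unlike a GET request, the data travels in the request body rather than in the request-line, which is why DevTools shows it separately from the URL.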

image alt text

PUT

PUT requests are used to update data on the server side. This could be for actions like changing your Facebook relationship status, updating a student’s marks on the college server after improvement exams etc.


Visit https://gitlab.crio.do/ and log in using Google sign-in if you haven’t already. Open the DevTools Network tab and ensure the Preserve log option is checked.

image alt text

Try updating your account status (Yep, even Gitlab allows you to set status :)) and monitor the network requests. Are there any PUT requests made?

(Only change the text part of the status, let the emoji stay the same)

image alt text

image alt text

Try the same process again with the Preserve log option unchecked. Are there any differences? Why so?


We looked at some of the commonly used HTTP request methods. There are a few more that we can use to perform tasks like deleting data, finding the request methods supported for a particular endpoint etc.

image alt text

Curious Cats

  • Is it possible to send form data using a GET request? Why or why not?

  • Are there any limitations in using a GET request to send data to the server?

Answers to these Curious Cats questions will be available in the Takeaways milestone at the end.

What are HTTP Status Codes?

HTTP status codes are part of the HTTP response. They help the client understand what happened to the request. Status codes are 3-digit numbers (201, 304 etc.) and are categorised into 5 different families based on their first digit. Along with the status code, a Reason-Phrase is also present (OK, Moved Permanently etc.), which gives a short description of the status code. The Status Code is intended for machines whereas the Reason-Phrase is for humans.

image alt text

Status codes - 2xx

The 2xx family of status codes (200-299) signifies that the HTTP request was successfully received, understood & accepted by the server. We’ve been seeing 200 status codes all along; that’s what we get when the server returns the resource we requested.

Status codes - 3xx

The 3xx family of status codes denotes that further action must be taken to complete the HTTP request made.

  1. Try navigating to http://www.flipkart.com instead of https://www.flipkart.com (http instead of https).

  2. Observe the headers of the first HTTP request again this time.

image alt text

  3. Note the response status code: 301 Moved Permanently. This is Flipkart asking the browser to redirect the request from the unsecure URL (http://) to the secure URL (https://) instead. The browser will oblige and send the request accordingly.

  4. Inspect the Remote Address again. Is the IP address the same as before? What about the port number?

  5. What is the difference between port 80 and port 443?

  6. How does the browser know to redirect to https://www.flipkart.com? (Hint: Response Headers)

  7. In summary, requests to http://www.flipkart.com get redirected to https://www.flipkart.com.

Status codes - 4xx

Getting a 4xx status code tells us that there was an error in the HTTP request sent by the client - that would be the browser if we are visiting web pages.


Check the status code on trying to fetch some random resource from a website eg: https://www.flipkart.com/crio-do


There are a couple more HTTP status code families - 1xx & 5xx. 1xx is for information purposes while 5xx signifies there was a server error.
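The family of a status code can be worked out from its first digit alone. The sketch below does that and looks up the standard reason phrase from Python's built-in http.HTTPStatus enum. The FAMILIES table and the describe function are illustrative names, not part of any library.

```python
from http import HTTPStatus

# First digit of the status code -> family, as described in this milestone.
FAMILIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe(code: int) -> str:
    """Return '<code> <reason phrase>: <family>' for a status code."""
    family = FAMILIES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase  # e.g. "Not Found" for 404
    except ValueError:
        # Not every number in a family is a registered status code.
        phrase = "(no standard reason phrase)"
    return f"{code} {phrase}: {family}"


for code in (200, 301, 404, 500):
    print(describe(code))
```

A client that does not recognise a particular code can still fall back on the family, which is why the first digit matters more than the exact number.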

Curious Cats

  • When you try to access a resource that requires logging in, like the LinkedIn feed at https://www.linkedin.com/feed, you get redirected to the login screen. That should be a 301, right? Can you verify?

  • At some point, you’d have come across the below pop-up when trying to reload a web page containing a form. Why does this happen? Is there any way to avoid it?

image alt text

  • Find out example situations that result in a 4xx or 5xx response code.

Answers to these Curious Cats questions will be available in the Takeaways milestone at the end.

The cURL utility

cURL is like a web browser, but for the command line. You can make HTTP requests using cURL just like in a web browser. The response can be seen on the command line or redirected to a file.


Open a new terminal window and enter the following commands. Take your time and observe the output of each step closely.

  1. Type the following curl command, then inspect the contents of ~/flipkart.html:
     curl -X GET https://www.flipkart.com -o ~/flipkart.html

  2. In the above curl request, -X allows you to specify the HTTP method to be used and -o writes the response to the given file.

  3. You don’t have to use the -X GET switch, as GET is the default behaviour. Try the following command:
     curl https://www.flipkart.com

  4. curl can give you the same details that you were looking at in the Chrome Developer Tools.


Try the below exercises and inspect the content of the flipkart.html file

  1. curl -X GET http://www.flipkart.com -o flipkart.html
     You should see the 301 Moved Permanently response, just like in Chrome Developer Tools.

  2. We can also print the HTTP Response Headers using the -i switch:
     curl -i -X GET http://www.flipkart.com -o flipkart.html
     Now, you should see the full Response Headers as well.

  3. If you also want the full details, try the following command:
     curl -v -X GET http://www.flipkart.com -o flipkart.html
     A verbose log is printed. You can get the IP address and the port number from the console output this time.

  4. You can also instruct curl to follow the redirect URL automatically using the -L switch (the command still uses http, not https):
     curl -v -L -X GET http://www.flipkart.com -o flipkart.html
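Following a redirect boils down to reading the Location response header and requesting that URL instead. The sketch below shows the idea in Python; it is not how curl is implemented. The names next_url, follow and fake_responses are made up, and the responses are a hand-written table rather than real server output, so the example runs without network access.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def next_url(current_url: str, status: int, headers: dict):
    """Return the URL to request next, or None if this is not a redirect."""
    location = headers.get("Location")
    if status not in REDIRECT_CODES or location is None:
        return None
    # Location may be relative, so resolve it against the current URL.
    return urljoin(current_url, location)


def follow(start_url: str, fetch, max_hops: int = 10) -> list:
    """Follow redirects and return the chain of URLs visited.

    fetch(url) must return (status, headers); it is passed in so the
    sketch can be tried with canned responses instead of real requests.
    """
    chain = [start_url]
    url = start_url
    for _ in range(max_hops):
        status, headers = fetch(url)
        target = next_url(url, status, headers)
        if target is None:
            return chain
        chain.append(target)
        url = target
    raise RuntimeError("too many redirects")


# Hand-written responses mirroring the http -> https redirect seen earlier.
fake_responses = {
    "http://www.flipkart.com/": (301, {"Location": "https://www.flipkart.com/"}),
    "https://www.flipkart.com/": (200, {}),
}
print(follow("http://www.flipkart.com/", fake_responses.__getitem__))
```

The max_hops limit is there so that two URLs redirecting to each other cannot keep the client looping forever.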


Can you find out other capabilities of cURL?

Postman (Optional)

Postman is a powerful GUI based application that lets us make HTTP requests easily. To start with, create a new Request in Postman.

image alt text

You’ll be able to see this view

image alt text

Using Postman to perform GET requests is straightforward. Just add the request URL and hit Send. Try performing a GET request for https://www.flipkart.com.

image alt text

The returned data is shown in the Body tab and the response headers in the Headers tab. Check the headers out.

Curious Cats

  • Postman has a cool feature presenting us with commands to perform requests using cURL, Java, Python & multiple other languages. Find out how to do that.

Answers to these Curious Cats questions will be available in the Takeaways milestone at the end.

Summary

  • HTTP is an application layer protocol that allows transfer of data between machines. The most common use of HTTP is in loading web pages, where HTML documents are fetched along with the other resources they use, like images, CSS & JavaScript.

  • HTTP Request Method - used by the system requesting the resource to specify the type of request

    • GET - can be used to fetch web pages

    • POST - can be used to submit login data

    • PUT - can be used to update data

    • Other HTTP methods are HEAD, DELETE, OPTIONS, CONNECT, TRACE & PATCH

  • HTTP Response Status Code - seen in the response message. Used by the system receiving the request, to specify the result of the request.

    • 1xx - Informational responses

    • 2xx - Successful responses eg: 200 OK - Request successfully completed

    • 3xx - Redirects eg: 301 Moved Permanently - Requested resource was moved permanently to a different location

    • 4xx - Client errors eg: 404 Not Found - Requested resource wasn’t found

    • 5xx - Server errors

  • See pointers on Curious Cats questions here

  • Further reading

Newfound Superpowers

  • Knowledge of the HTTP protocol and how it rules the internet

Now you can

  • Explain how a browser speaks to an HTTP server to fetch a particular web page

  • Explain HTTP methods like GET & POST and how they are used

  • Explain HTTP status codes

  • Use tools like cURL & Postman to send and analyse HTTP requests to servers