The Hypertext Transfer Protocol (HTTP) is an application-level protocol that is being widely used over the Web. HTTP is a request/response protocol, which means, the client sends a request to the server in the form of a request method, URI, protocol version, followed by headers, and possible body content. The server responds with a status code line, a success or error code, followed by server headers information, and possible entity-body content.
Under the hood, HTTP requests and responses are sent over a TCP socket with default port 80 (on the server side). The HTTP client first initiates a TCP connection with the server, once the connection is established, the client and the server access TCP through their respective socket interfaces.
Here we see one of the great advantages of the layered architecture of the OSI model — HTTP doesn’t need to worry about data loss and integrity. This is the job of TCP and other protocols in lower layers.
HTTP is said to be a stateless protocol. The server sends the requested content to clients without storing any information about the client. If a particular client sends the same request twice in a period of a few seconds, the server does not respond by saying that it just served the same request to the client. Instead, the server re-sends the data, as it has completely forgotten what it did earlier.
Nevertheless, although HTTP itself is stateless by design, most modern servers have a complex backend logic that stores information about logged-in clients, and by this means, we can say that those servers are stateful.
We can learn a lot by taking a closer look on a raw HTTP request and response that sent over the network:
curl -v http://httpbin.org/html
Below is the actual raw HTTP request sent by curl
to the server:
GET /html HTTP/1.1
Host: httpbin.org
User-Agent: curl/7.58.0
Accept: */*
The server response is:
HTTP/1.1 200 OK
Date: Wed, 05 Apr 2023 12:43:42 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 3741
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
<!DOCTYPE html>
<html>
<head>
....
The MDN web docs specify the core components of request and response objects, review this resource.
HTTP response status codes indicate how a specific HTTP request was completed.
Responses are grouped in five classes:
Try yourself to perform the below two HTTP requests, and read about the meaning of each status code.
curl -i httpbin.org/status/0
curl -v -X PUT httpbin.org/path/to/nowhere
So far we’ve seen how to communicate with a server as clients.
The below exercise demonstrates how a server might look and operate.
We’ll run a simple Flask Python server, and communicate with the server locally from the same machine. The client will be the old good curl
.
The Python server is based on a Python package called Flask. Let’s install this package.
venv
:(venv) myuser@hostname:~/path/to/project$ which python
/home/myuser/path/to/project/venv/bin/python
^^^^^^^^^^^^^^^^^^^^--------------- path to venv dir in your project files
pip install flask
.In our shared repo, review the code in simple_flask_webserver/app.py
. Above every function, you can find the server endpoint (e.g. /upload
) and the curl
command to request this endpoint.
In your terminal, cd simple_flask_webserver
, and run the server by: python app.py
The default endpoint /
can be accessed using your browser. Use curl
to trigger the other endpoints as specified in app.py
.
Take a closer look at the output you’ve got from the /api/upload
endpoint, this is the so-called JSON format.
JavaScript Object Notation (JSON) is a standard text-based format for representing structured data based on JavaScript object syntax. It is commonly used for transmitting data in web applications. JSON can be used independently from JavaScript, and many programming environments feature the ability to read and generate JSON.
An API (Application Programming Interface) can be thought of as a collection of all server endpoints that together define the functionality that is exposed to the client. Each endpoint typically accepts input parameters in a specific format and returns output data in a standard format such as JSON or XML.
For example, a web API for a social media platform might include endpoints for retrieving a user’s profile information, posting a new status update, or searching for other users. Each endpoint would have a unique URL and a specific set of input parameters and output data.
Many platforms that we will be learning during the course, expose a rich API as well as GUI.
We mentioned that an HTTP server is stateless. This simplifies server design and has permitted engineers to develop high-performance Web servers that can handle thousands of simultaneous TCP connections. However, it is often desirable for a website to remember stateful information of users. For this purpose, HTTP uses cookies.
An HTTP cookie is a small piece of data that a server sends to a user’s web browser. The browser may store the cookie and send it back to the same server with later requests. Most major commercial websites use cookies today.
As shown in the figure above, a cookie has four components:
Set-Cookie
header line in the HTTP response message, usually in the form of key=value
.Cookie
header line in the HTTP request message.Suppose Berta contacts Amazon.com for the first time. When the request comes into the Amazon server, the server creates a unique id and creates an entry in its back-end database that is indexed by the id. Amazon server then responds to Berta’s browser, including in the HTTP response a Set-cookie
header, which contains the id value. For example, the header line might be:
Set-cookie: ITEM=12345
When Berta’s browser receives the HTTP response message, it sees the Set-cookie
header, and appends a line to the special cookie file that it manages locally.
As Berta continues to browse the Amazon site, each time she requests a web page, her browser consults her cookie file, extracts her id value for this site, and puts a cookie header line that includes the id value in the HTTP request. Specifically, each of her HTTP requests to the Amazon server includes the header line:
Cookie: ITEM=12345
In this manner, Amazon server is able to track Berta’s activity, it knows exactly which pages she visited, in which order, and at what times!
Although cookies often simplify and improve user experience, they are controversial because they can also be considered as an invasion of privacy. As we just saw, using a combination of cookies and user-supplied account information, a website can learn a lot about a user and potentially sell this information to a third party. The GDPR website includes extensive information on cookies compliance requirements.
Postman is a powerful and user-friendly tool for testing, debugging, and documenting APIs.
https://learning.postman.com/docs/getting-started/overview/
Enter the interactive self-check page
Use curl to perform an HTTP GET request to http://httpbin.org/image
. Add an Accept header to your requests to indicate to the server that you anticipate a png image. Read carefully the Warning message written by curl at the end of the server response, follow the instructions to save the image on the file system. Execute another curl to save the image in the file system. Which animal appears in the served image?
The server of http://httpbin.org
uses keep-alive
connection by default, indicating that the server would like to keep the TCP connection open, so further requests to the server will use the same underlying TCP socket.
Perform an HTTP POST request to the /anything
endpoint and tell the server that the client (you) would like to close the connection immediately after the server has responded.
Make sure the server’s response contains the Connection: close
header which means that the TCP connection used to serve the request was closed.
Perform an HTTP POST request to http://httpbin.org/anything. You should send the following json to the server
{"test": "me"}
Upon success request, the response body will be a json format with a “json” key and your data as a value, as follows:
"json": {
"test": "me"
}
Bad requests would be responded by:
"json": null
Using curl
, perform an HTTP GET request to http://httpbin.org/cookies/set
, while adding the following query parameters to the URL:
color-theme=light
visitor-id=59201456801
lang=en
Query parameters is the set of key-value pairs that go after the ? in a URL, separated by & characters.
e.g.:
http://127.0.0.1:8000/some/endpoint?k1=value1&k2=value2
The /cookies/set
is a designated endpoint that will add Set-Cookie
headers according to the specified URL query parameters. Use the --cookie-jar my-cookies.txt
option in order to save the cookies in the file my-cookies.txt
.
By default, your browser will send cookies to the corresponding servers. We will use curl
with the option --cookie my-cookies.txt
to send cookies to the server. Perform an HTTP GET request to http://httpbin.org/cookies
. Use the -v
option to see how the Cookie
header was set in the request. Upon success, the server response body-entity should be:
{
"cookies": {
"color-theme": "light",
"lang": "en",
"visitor-id": "59201456801"
}
}
Note: It is uncommon for a server to attach the user cookies in the response body. Keep in mind that httpbin.org
is a toy server that helps you learn the HTTP protocol. The /cookies endpoint was designed specifically to respond with your cookies for debugging purposes.
Use Postman to retrieve data from some interesting free API:
https://github.com/public-apis/public-apis