HTTP query generation
Client and server
There are no miracles in the world, especially in the world of programming! If a program doesn’t work, it is written incorrectly or contains mistakes. So how does a browser ask a server to send it something? Very simple! All you have to do is relax and enjoy the process.
Writing our first HTTP query
It is not as complicated as it may seem. We have a browser and a web server, and the browser initiates the data exchange. The simplest HTTP query can look like this:
GET http://www.php.net/ HTTP/1.0\r\n\r\n
- GET is the query type. There are several other query types, for example POST, HEAD, PUT and DELETE.
- http://www.php.net/ is a URL we want to receive information from.
- HTTP/1.0 is a type and version of the protocol that will be used
- \r\n is a line end
You can execute that query yourself. Run telnet.exe, connect to the host www.php.net on port 80, type the query and press Enter twice. As a result you’ll see the HTML code of the www.php.net main page.
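The telnet session above can also be reproduced from a program. The following is a minimal sketch in Python, which is used here only for illustration because the article itself contains no program code. The function names build_simple_get and fetch are hypothetical, and a present-day server may answer such a bare query with a redirect or an error rather than the page.

```python
import socket


def build_simple_get(url: str) -> bytes:
    """Return the simplest HTTP/1.0 GET query: a Request-Line and an empty line."""
    return f"GET {url} HTTP/1.0\r\n\r\n".encode("ascii")


def fetch(host: str, url: str, port: int = 80, timeout: float = 10.0) -> bytes:
    """Send the query to host:port and return everything the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_simple_get(url))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # an HTTP/1.0 server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("www.php.net", "http://www.php.net/") does the same as the telnet session: it opens a connection to port 80, sends the query and reads the answer until the server closes the connection.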
Query structure
So, let’s look at the structure of an HTTP query. We will consider HTTP/1.0:
Request-Line [General-Header | Request-Header | Entity-Header]\r\n[ Entity-Body ]
- Request-Line
- Format: "Method Request-URI HTTP-Version\r\n"
- Method – the method applied to the Request-URI resource; it can be GET, POST, PUT, DELETE or HEAD.
- Request-URI – a relative or absolute link to the page, optionally with a set of parameters, for example: /index.html or http://www.myhost.com/index.html or /index.html?a=1&b=qq
- HTTP-Version – HTTP protocol version, in our case it is "HTTP/1.0".
We are interested in the GET and POST methods. With GET we can pass parameters to a script; with POST we can emulate form submission.
For the GET method the Request-URI can look like: "/index.html?param1=1&param2=2".
- General-Header is the general part of the header
- Request-Header is the part of the header that describes the query.
- Allow sets the allowable processing methods
- From is the e-mail address of the query sender
- If-Modified-Since asks the server to return the resource only if it has been modified since the specified time
- Referer is an absolute link to the page the query was initiated from.
- User-Agent is the browser type
- Entity-Header is the part of the header that describes the Entity-Body data
- Allow is a parameter similar to Allow in the Request-Header
- Content-Encoding is the encoding type of the Entity-Body data.
- Content-Length is the number of bytes sent in the Entity-Body. Allowed values are integers from 0 upwards. For example, "Content-Length: 26457\r\n".
- Content-Type is the type of the transferred data. For example, "Content-Type: text/html\r\n".
- Expires is the date when the page should be removed from the browser cache.
- Last-Modified is the date of the last change to the data
- extension-header is a part of the header that can be used, for instance, for processing by a browser or another program that accepts the document. In that part you can describe your own parameters in the "ParameterName: parametervalue\r\n" format. A parameter is ignored if the client program doesn’t know how to process it.
General-Header format: [Date: value\r\n | Pragma: no-cache\r\n]
General-Header has only two parameters: Date and Pragma. Date is a date in GMT, for example "Tue, 15 Nov 1994 08:12:31 GMT". Pragma has only one value, no-cache, which prohibits page caching.
Request-Header can take the following parameters: Allow, Authorization, From, If-Modified-Since, Referer, User-Agent. In this article we won’t talk about the Authorization parameter, because it is used only for accessing closed resources.
Format: "Allow: GET | HEAD\r\n".
This parameter is ignored when the POST method is specified in the Request-Line. It sets the allowable query processing methods. Proxy servers don’t modify the Allow parameter.
Format: "From: address\r\n".
For example, "From: myname@mailserver.com\r\n".
Format: "If-Modified-Since: date\r\n"
It is used only with the GET method. The date is set in the same format as in the General-Header.
Format: "Referer: url\r\n".
For example, "Referer: http://www.host.com/index.html\r\n".
For example, "User-Agent: Mozilla/4.0\r\n"
This part of the query sets the parameters that describe the page body. Entity-Header can have the following parameters: Allow, Content-Encoding, Content-Length, Content-Type, Expires, Last-Modified, extension-header.
Format: "Content-Encoding: x-gzip | x-compress | other type\r\n".
Format: "Expires: date\r\n". The date is set in the same format as in the General-Header.
Format: "Last-Modified: date\r\n". The date is set in the same format as in the General-Header.
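The structure described above (a Request-Line, then header lines, then an empty line, then an optional Entity-Body) can be assembled mechanically. The sketch below is one possible way to do it in Python; build_request and http_date are hypothetical helper names, and the standard-library function email.utils.formatdate is used here only because it produces the date format shown for the Date parameter.

```python
from email.utils import formatdate


def http_date(timestamp: float) -> str:
    """Format a Unix timestamp as an HTTP date, e.g. for Date or If-Modified-Since."""
    return formatdate(timestamp, usegmt=True)


def build_request(method: str, uri: str, headers: list, body: bytes = b"") -> bytes:
    """Join the Request-Line and header lines with CRLF, then append the body."""
    lines = [f"{method} {uri} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers]
    # The empty line after the headers separates them from the Entity-Body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

For example, build_request("GET", "/index.html", [("If-Modified-Since", http_date(784887151))]) yields a query whose header line reads "If-Modified-Since: Tue, 15 Nov 1994 08:12:31 GMT".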
Method GET
Let’s write our query:
GET http://www.site.com/news.html HTTP/1.0\r\n
Host: www.site.com\r\n
Referer: http://www.site.com/index.html\r\n
Cookie: income=1\r\n
\r\n
This query means that we want to get the content of the page http://www.site.com/news.html using the GET method. The Host field says that the page is located on the www.site.com server. The Referer field says that we came from the main page. Why are the Host, Referer and Cookie fields so important? Because programmers of dynamic sites often check these fields, which appear in scripts as variables, to protect the site from being broken into.
Now let’s imagine that we have to fill in the form fields on a page and send the query. Our form has two fields, login and password:
GET http://www.site.com/news.html?login=Peter%20Budd&password=qq HTTP/1.0\r\n
Host: www.site.com\r\n
Referer: http://www.site.com/index.html\r\n
Cookie: income=1\r\n
\r\n
Our login is Peter Budd. Why should we write Peter%20Budd? Because the server can interpret some characters, such as a space, & or =, as delimiters of a new parameter. That’s why parameter names and values are encoded: each such character is replaced by % followed by its hexadecimal code. In PHP, the rawurlencode and rawurldecode functions perform the encoding and decoding respectively.
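The encoding step can be sketched as follows. The article names PHP’s rawurlencode and rawurldecode; this illustration assumes Python instead and uses urllib.parse.quote and unquote from its standard library, which behave similarly for this purpose. The helper name build_query is hypothetical.

```python
from urllib.parse import quote, unquote


def build_query(params: dict) -> str:
    """Percent-encode every name and value and join them as name=value&name=value."""
    return "&".join(
        f"{quote(name, safe='')}={quote(value, safe='')}"
        for name, value in params.items()
    )


query = build_query({"login": "Peter Budd", "password": "qq"})
request_line = f"GET http://www.site.com/news.html?{query} HTTP/1.0\r\n"
decoded_login = unquote("Peter%20Budd")  # the reverse operation
```

Passing safe='' makes quote encode every reserved character, so a value that itself contains & or = cannot be mistaken for the start of a new parameter.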
Method POST
In a POST query there are two ways of transferring fields from HTML forms: the application/x-www-form-urlencoded and multipart/form-data encodings. The differences between them are considerable, so let’s look at each.
Content-Type: application/x-www-form-urlencoded.
Let’s write a query similar to the one described earlier for the GET method.
POST http://www.site.com/news.html HTTP/1.0\r\n
Host: www.site.com\r\n
Referer: http://www.site.com/index.html\r\n
Cookie: income=1\r\n
Content-Type: application/x-www-form-urlencoded\r\n
Content-Length: 30\r\n
\r\n
login=Peter%20Budd&password=qq
Here we use the Content-Type and Content-Length header fields. Content-Length gives the size of the data area in bytes. The parameters that were in the Request-URI in the GET query are now in the Entity-Body. You can also put parameters with other names in the Request-URI together with the parameters in the Entity-Body. For example:
POST http://www.site.com/news.html?type=user HTTP/1.0\r\n
.....
\r\n
login=Peter%20Budd&password=qq
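Because Content-Length must match the Entity-Body exactly, it is safer to compute it than to type it by hand. A minimal sketch in Python, with the hypothetical helper name build_urlencoded_post and only the Host header included for brevity:

```python
def build_urlencoded_post(uri: str, host: str, body: str) -> bytes:
    """Build a POST query whose Content-Length is computed from the encoded body."""
    payload = body.encode("ascii")  # the body is already percent-encoded
    head = (
        f"POST {uri} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + payload


request = build_urlencoded_post(
    "http://www.site.com/news.html?type=user",
    "www.site.com",
    "login=Peter%20Budd&password=qq",
)
```

For the body login=Peter%20Budd&password=qq the computed length is 30 bytes. Parameters placed in the Request-URI, such as type=user here, are not counted.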
Content-Type: multipart/form-data
How does this type differ from application/x-www-form-urlencoded? The main difference is that the Entity-Body can be divided into parts separated by boundaries. Every part can have its own header describing its data, so one query can transmit data of different types. Let’s take the same example of transmitting a login and a password, but in this format:
POST http://www.site.com/news.html HTTP/1.0\r\n
Host: www.site.com\r\n
Referer: http://www.site.com/index.html\r\n
Cookie: income=1\r\n
Content-Type: multipart/form-data; boundary=1BEF0A57BE110FD467A\r\n
Content-Length: 186\r\n
\r\n
--1BEF0A57BE110FD467A\r\n
Content-Disposition: form-data; name="login"\r\n
\r\n
Peter Budd\r\n
--1BEF0A57BE110FD467A\r\n
Content-Disposition: form-data; name="password"\r\n
\r\n
qq\r\n
--1BEF0A57BE110FD467A--\r\n
So, let’s look at the query above. As you can see, there is a boundary field after Content-Type. It sets the separator between parts, the boundary. A boundary can be a string composed of Latin letters and digits. In the query body, each boundary is preceded by '--', and the final boundary is also followed by '--'. There are two parts in our query: the first describes the login field, the second the password field. Content-Disposition says that the data comes from a form, and its name field holds the form field’s name.
There is no need to use Content-Length in the part headers; it should be used in the query header, where its value is the size of the whole Entity-Body.
Now let’s write a query that transfers a file:
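The rules for the boundary and for Content-Length can be illustrated with a short sketch. It assumes Python and simple text fields; build_multipart is a hypothetical helper name, not part of any library.

```python
def build_multipart(fields: list, boundary: str) -> bytes:
    """Build a multipart/form-data Entity-Body from (name, value) text fields."""
    parts = []
    for name, value in fields:
        parts.append(
            f"--{boundary}\r\n"  # every part starts with '--' plus the boundary
            f'Content-Disposition: form-data; name="{name}"\r\n'
            "\r\n"  # empty line between the part header and its data
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # the closing boundary ends with '--'
    return "".join(parts).encode("utf-8")


body = build_multipart(
    [("login", "Peter Budd"), ("password", "qq")],
    "1BEF0A57BE110FD467A",
)
content_length = len(body)  # the value for the Content-Length query header
```

The length is taken from the finished body, so it covers all boundaries, part headers and line ends, which is what the server expects in the query header.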
POST http://www.site.com/postnews.html HTTP/1.0\r\n
Host: www.site.com\r\n
Referer: http://www.site.com/news.html\r\n
Cookie: income=1\r\n
Content-Type: multipart/form-data; boundary=1BEF0A57BE110FD467A\r\n
Content-Length: 491\r\n
\r\n
--1BEF0A57BE110FD467A\r\n
Content-Disposition: form-data; name="news_header"\r\n
\r\n
News example\r\n
--1BEF0A57BE110FD467A\r\n
Content-Disposition: form-data; name="news_file"; filename="news.txt"\r\n
Content-Type: application/octet-stream\r\n
Content-Transfer-Encoding: binary\r\n
\r\n
News in the file .txt\r\n
--1BEF0A57BE110FD467A--\r\n
In this example the first part transfers the news header and the second part transfers the news.txt file. The filename field sets the name of the transferred file; the Content-Type field sets the type of the file; application/octet-stream indicates a generic data stream; Content-Transfer-Encoding: binary indicates that the data is binary and not encoded.
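A file part differs from a plain field only in its header lines, and its data is passed through as raw bytes. The following sketch assumes Python; file_part is a hypothetical helper name, and the file content is given inline instead of being read from disk.

```python
def file_part(boundary: str, field: str, filename: str, content: bytes,
              content_type: str = "application/octet-stream") -> bytes:
    """Build one multipart/form-data part that carries a file."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "Content-Transfer-Encoding: binary\r\n"
        "\r\n"
    )
    # The file bytes are appended unchanged; only the part header is text.
    return head.encode("ascii") + content + b"\r\n"


part = file_part("1BEF0A57BE110FD467A", "news_file", "news.txt",
                 b"News in the file .txt")
```

The complete Entity-Body is then the field parts, followed by this part, followed by the closing boundary line.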
Most CGI scripts are written by smart programmers who like to check the type of the received file given in Content-Type. Why? Most uploaded files are users’ images. The browser tries to determine the file’s type and inserts the corresponding Content-Type into the query. The script checks it and, for example, ignores the file if it is not gif or jpeg. Content-Type values for common image formats:
| image/gif | for gif |
| image/jpeg | for jpeg |
| image/png | for png |
| image/tiff | for tiff |
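The check that such a script performs can be sketched in a few lines. This assumes Python rather than a CGI language from the article, and the names ALLOWED_IMAGE_TYPES and is_accepted are hypothetical. Keep in mind that the value is supplied by the client, so it is only a hint about the real file content.

```python
ALLOWED_IMAGE_TYPES = {"image/gif", "image/jpeg"}


def is_accepted(content_type_header: str) -> bool:
    """Return True if the part's Content-Type names an allowed image type."""
    # Drop optional parameters after ';' and compare case-insensitively.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in ALLOWED_IMAGE_TYPES
```

With this set, a part sent as image/gif or image/jpeg is accepted, while image/png or application/octet-stream is ignored.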
P.S.
I think there is no need to describe how to transfer the query to the server. It is enough to read about the socket functions or the CURL module functions in the official PHP documentation. There is one more query type, Content-Type: multipart/mixed, but I hope you will understand it after reading this article.



