Working with data base. Logs’ analysis.
There are four reasons why you are not to proceed to creation of a large supplement to learn a language.
I’d like to praise my ‘creation’.
In December I described the simplest way to collect logs. Now I’ll show how you could analyze them.
What was in the table? Date, address from the virtual tail’s root ($PHP_SELF), browser, referrer, user’s ip-address and the name of the tail. The line was inserted so:
@mysql_query("INSERT INTO logs (date, ip, host, address, referer, browser) VALUES (NOW(),
"$REMOTE_ADDR", "". gethostbyaddr($REMOTE_ADDR). "", "$PHP_SELF", "$HTTP_REFERER", "$HTTP_USER_AGENT")");
And what we’d like to see in the statistics? Just treat different variants: attendance on days, distribution of the intendances upon the time. Then repeat these selections for the main page only. Distribution upon the week, intendance in the last weeks and months. Number of the customers from other sites. It may be interesting to observe, how many customers came from definite reference in different days. Distribution upon the time on definite day (week, month). Addresses to which they come from some page.
Systemizing this is not as difficult as it seems to be. All described before comes to eight variants of a table grouping. Address, referrer, browser (unluckily here we compare lines "HTTP_USER_AGENT" including versions and operation system but not the separate browsers. Here processing of the data by input can’t be omitted) date, week, month, day of a week, time.
We don’t deal with statistics upon countries, cities and users’ roots – here additional processing is necessary.
Add to the selection of the grouping selection of days’ limitation (last n days) and selection conditions for the field WHERE which can be input in the text field, and we’ll get a system in which all described by me selections and distributions go.
In fact, it’s not so easy to set a list of selections and make it suitable for the processing in the program. At last I placed all the questions in the massive which element looks so:
$selection[0] = array(
"name" => "days schedule",
"select" => array("date_format(date,"%e.%m.%Y") as dday", "count(date) as visits"),
"group" => array("dday"),
"order" => array("date DESC"),
"type" => 1
);
Element ‘name’ is the name of a roller; ‘type’ is a type of a table. There are two table types – simple list and list with ‘columns’. If the list is sorted on the quantity of intendance, we don’t need any columns, and for the convenience of perception (for example for graphic of intendance upon the day) it would be better to build a graphic. The rest of the elements don’t need any comment.
In the request temporary limitations and conditions introduced by the user are also taken in account. Variable $type is the number of requested selection.
$days = intval($days);
if ($days>0)
$selection[$type]["where"][] = "date>DATE_SUB(NOW(),INTERVAL $days DAY)";
$where = stripslashes(trim($where));
if (strlen($where)>0)
$selection[$type]["where"][] = "($where)";
After that the form is drawn. In the rollers selected meanings are output in the first lines. Then a request is built by means of which the general quantity of lines is found out. This, should be mentioned, is a slippery place cause there aren’t any simplifications in it, and unlike basic request, there is a simple lack of sorting here. But the quantity of the lines is found out ‘directly’ – the same is selected and then mysql_num_rows is done. If someone has any ideas, you may send them to me or publish your analisator (only you have to put a reference to me).
$amount_request = "SELECT ". implode(", ", $selection[$type]["select"]). " FROM logs ";
if (sizeof($selection[$type]["where"])>0)
$amount_request .= " WHERE ". implode(" AND ", $selection[$type]["where"]);
$amount_request .= " GROUP BY ". implode(", ", $selection[$type]["group"]);
$request = "SELECT ". implode(", ", $selection[$type]["select"]). " FROM logs ";
if (sizeof($selection[$type]["where"])>0)
$request .= " WHERE ". implode(" AND ", $selection[$type]["where"]);
$request .= " GROUP BY ". implode(", ", $selection[$type]["group"]).
" ORDER BY ". implode(", ", $selection[$type]["order"]). " ".
get_limit($page, $amount, $in_page);
As you can see, I use module described in the previous issue of the paginal output of requests (get_limit).
But giving unlimited opportunities for the requests building isn’t enough. For example, I’m looking through the referrers’ popularity statistics and would like to find out how people came to me from hackzone.com. What should I do? I don’t intend to select necessary rollers’ settings and write in the text field "referer like '%hackzone%'"!
It proves to be easy. Two hours of mental efforts and text edition and there is some kind of opportunity for the detailed selection. I click ‘referrers’ list’ in the layers (as the parameter "referer not like 'http://php.com%'" is transmitted in address), get the table, then in the line "hackzone.com" I click the reference and here is the distributions of the customers from this site upon the dates. You may click another reference and get the distribution upon the time of a day, as you like.
What other selections to do depends on your imagination. If the written isn’t enough, your may add something of your own. Promise to update and improve the program as much as I can.
Having logs on your office or home computer you may do with them everything you like to. You have only to write a processor.



