Parser on PHP. It is possible.
In that article I want to show that PHP can cope with expression syntax analysis function. I hope that that article will be interesting even for those who have never touched on that problem.
I’ll start with the statement that programming method with using finite automat is very simple because the major part of the program is situated inside the automat that you prepare as a matrix and use in your program.
What is the automat?
For example we have a discrete function with two arguments Ft(d, Ft-1). Our first argument is a final countable set (data array). Only one figure comes to the function from that array on every step. Second argument is a value of the function on the previous step. Here is one more condition. The codomain of that function is the final countable set.
What is the advantage of that function? The main advantage is that it can be used in the form of the matrix where line numbers will set incoming data and column numbers will set the codomain. So, if we write to the cell (line, column) figure from the function range we’ll get the matrix that shows function dependence on the input data and value spectrum. Figure from the function range is CONDITION and function is AUTOMAT.
Let’s view the simple example. For instance, we have to analyze a common arithmetic expression. We have to create two automats. First one will choose “words” from the data buffer (scanner); second one will check grammatical order of the words in the expression.
Let’s start with scanner. Words in our case are operators +, -, *, / and symbol strings. Scanner automat will be as follows:
// conditions 0, 1, 2
"0" => array( 0, -1, -1),// delimiter
"1" => array( 2, -1, -1),//one symbol word
"2" => array( 1, 1, -1),//symbol
Line numbers set the symbol type because we need to extract operators to the individual words. Conditions (column numbers) mean as follows:
-1 word is ready, it’s time to return
0 scanning
1 got the symbol and we have to store it up
2 got the predefined word from one symbol
In the 1 condition we will store symbols up to return them as a word in the condition -1. Our scanner will be called from the parser and finish the work when it identifies any “word”. That’s why there is no need to enter condition -1 into the automat table. Automat for parser will have the following shape:
// conditions 0, 1, 2, 3, 4, 5
"0" => array( 1, -1, 1, 1, 1, 1), // operator
"1" => array( 2, 4, -1, 2, -1, -1), // operand
"2" => array( 3, 3, -1, 3, -1, -1), // left bracket
"3" => array(-1, -1, 5, -1, 5, 5), // right bracket
The conditions are:
-1 Error
The beginning of the analysis
1 Got the operator, expecting right operand or left bracket
2 Got left operand, expecting operator or right bracket
3 Got left bracket, expecting operator or right bracket
4 Got right operand, expecting operator or right bracket
5 Got right bracket, expecting operator
Parser will stop working when scanner shows FALSE or there appears an error (condition -1).
Here is an example of the program with detailed comments:
<?php
class ExpressionParser {
var $pos, // Position in the buffer
$length, // Buffer’s length
$line, // Number of the current line
$column, // Current number of the column in the line
$data, // Data buffer
$brackets, // The amount of the opened brackets
$state, // The current condition of the parser
$errorstr, // Error diagnostics line
$instates, // Word input code
$prevstate, // Previous condition of the parser
$automat; // Parser automat table
/**********************************************************************
* Constructor *
**********************************************************************/
function ExpressionParser($str) {
$this->data=$str;
$this->length=strlen($str);
$this->pos=0;
$this->line=1;
$this->column=0;
$this->brackets=0;
// Codes of the words given by scaner
// Other words have code 1
$this->instates = array("+" => 0,
"*" => 0,
"-" => 0,
"/" => 0,
"(" => 2,
")" => 3);
// Parser’s automat
$this->automat=array(
/*
-1 Error
0 The beginning of the analysis
1 Got the operator, expecting right operand or left bracket
2 Got left operand, expecting operator or right bracket
3 Got left bracket, expecting operator or right bracket
4 Got right operand, expecting operator or right bracket
5 Got right bracket, expecting operator
*/
//conditions 0, 1, 2, 3, 4, 5
"0"=>array( 1, -1, 1, 1, 1, 1),//operator
"1"=>array( 2, 4, -1, 2, -1, -1),//operand
"2"=>array( 3, 3, -1, 3, -1, -1),//left bracket
"3"=>array(-1, -1, 5, -1, 5, 5),//right bracket
);
$this->state=$this->prevstate=0;
}
/**********************************************************************
* Scanner *
**********************************************************************/
function Scan() {
// delimiters that are ignored
$delimiters=array(" ","\t","\r","\n");
// Words from one symbol
$words=array("+","-","*","/","(",")");
// Scaner’s automat
$automat=array(
/*
-1 word is ready, it’s time to return
0 scanning
1 got the symbol and we have to store it up
2 got the predefined word from one symbol
*/
//conditions 0, 1, 2
"0"=>array( 0, -1, -1),//delimiter
"1"=>array( 2, -1, -1),//word from one symbol
"2"=>array( 1, 1, -1),//symbol
);
$state=0;
$word="";
// Scanning cycle
while ($this->pos<$this->length) {
// Set the code sent to the input of the automat symbol
if (in_array($this->data[$this->pos],$delimiters))
$instate=0;
elseif (in_array($this->data[$this->pos],$words))
$instate=1;
else
$instate=2;
// Get the condition of the automat
$state=$automat[$instate][$state];
switch($state) {
case 0: // Beginning of the scanning
if ($this->data[$this->pos]=="\n") {
$this->line++;
$this->column=0;
}
$word="";
break;
case -1: // word is ready, it’s time to return
if (strlen($word)) return $word;
break;
case 1: // got the symbol and we have to store it up
$word.=$this->data[$this->pos];
break;
case 2: // got the predefined word from one symbol
$word=$this->data[$this->pos];
break;
}
$this->pos++;
$this->column++;
if ($this->pos==$this->length && strlen($word)) return $word;
}
return false;
}
/**********************************************************************
* Parser *
**********************************************************************/
function Parse() {
// Variable $first = 0, if function was called first time
$first=$this->pos;
// Cycle of conditions
while(1) {
// Get the word from the scanner
$word=$this->Scan();
// If there are no words interrupt the cycle
if ($word===false) break;
// Set the code sent to the input of the automat
$instate=isset($this->instates[$word]) ? $this->instates[$word] : 1;
// Get the condition of the parser’s automat
$this->state=$this->automat[$instate][$this->state];
// If there is an error interrupt the cycle
if ($this->state==-1) {
$this->errorstr="Error in the line: $this->line, column: $this->column<br>";
break;
}
switch($this->state) {
case 1: // Got the operator, expecting right operand or left bracket
// If first word is operator, it can be only "+" or "-"
if (($this->prevstate==3 || $this->prevstate==0) && $word!="-" && $word!="+") {
$this->errorstr="Error in the line: $this->line, column: $this->column<br>";
return false;
}
break;
case 2: // Got operand, expecting operator or right bracket
// Check whether it is figure or not
if (!preg_match("/^[0-9]+(\.[0-9]+)?$/",$word)) {
$this->errorstr=" Error in the line: $this->line, column: $this->column<br>";
return false;
}
break;
case 3: // Got left bracket, expecting operator or left bracket
// Increase amount of opened brackets by 1
$this->brackets++;
if (!$this->Parse()) return false;
break;
case 4: // Got right operand, expecting operator or right bracket
// Check whether it is figure or not
if (!preg_match("/^[0-9]+(\.[0-9]+)?$/",$word)) {
$this->errorstr=" Error in the line: $this->line, column: $this->column<br>";
return false;
}
break;
case 5: // Got right bracket, expecting the operator
// decrement amount of the opened brackets by 1
$this->brackets--;
return true;
} // end switch
$this->prevstate=$this->state;
} // end while
if (!$first && ($this->brackets || $this->state!=4 && $this->state!=5)) return false;
return true;
}
}
$p=new ExpressionParser("-4.25*((2+3)*4+1)/5");
print $p->data."<br>";
if ($p->Parse())
print "Expression is correct.<br>";
else
print $p->errorstr;
?>



