Parser on PHP. It is possible.

Parser on PHP. It is possible.

In that article I want to show that PHP can cope with expression syntax analysis function. I hope that that article will be interesting even for those who have never touched on that problem.

I’ll start with the statement that programming method with using finite automat is very simple because the major part of the program is situated inside the automat that you prepare as a matrix and use in your program.

What is the automat?

For example we have a discrete function with two arguments Ft(d, Ft-1). Our first argument is a final countable set (data array). Only one figure comes to the function from that array on every step. Second argument is a value of the function on the previous step. Here is one more condition. The codomain of that function is the final countable set.

What is the advantage of that function? The main advantage is that it can be used in the form of the matrix where line numbers will set incoming data and column numbers will set the codomain. So, if we write to the cell (line, column) figure from the function range we’ll get the matrix that shows function dependence on the input data and value spectrum. Figure from the function range is CONDITION and function is AUTOMAT.

Let’s view the simple example. For instance, we have to analyze a common arithmetic expression. We have to create two automats. First one will choose “words” from the data buffer (scanner); second one will check grammatical order of the words in the expression.

Let’s start with scanner. Words in our case are operators +, -, *, / and symbol strings. Scanner automat will be as follows:

// conditions   0,  1,  2
                 "0" => array( 0, -1, -1),// delimiter 
                 "1" => array( 2, -1, -1),//one symbol word
                 "2" => array( 1,  1, -1),//symbol

Line numbers set the symbol type because we need to extract operators to the individual words. Conditions (column numbers) mean as follows:

-1 word is ready, it’s time to return
                 0 scanning 
                 1 got the symbol and we have to store it up
                 2 got the predefined word from one symbol

In the 1 condition we will store symbols up to return them as a word in the condition -1. Our scanner will be called from the parser and finish the work when it identifies any “word”. That’s why there is no need to enter condition -1 into the automat table. Automat for parser will have the following shape:

// conditions  0,  1,  2,  3,  4,  5
                 "0" => array( 1, -1,  1,  1,  1,  1), // operator
                 "1" => array( 2,  4, -1,  2, -1, -1), // operand
                 "2" => array( 3,  3, -1,  3, -1, -1), // left bracket
                 "3" => array(-1, -1,  5, -1,  5,  5), // right bracket

The conditions are:

-1 Error

The beginning of the analysis

1 Got the operator, expecting right operand or left bracket

2 Got left operand, expecting operator or right bracket

3 Got left bracket, expecting operator or right bracket

4 Got right operand, expecting operator or right bracket

5 Got right bracket, expecting operator

Parser will stop working when scanner shows FALSE or there appears an error (condition -1).

Here is an example of the program with detailed comments:

<?php 
class ExpressionParser 
  var 
$pos,    // Position in the buffer 
      
$length,    // Buffer’s length 
      
$line,      // Number of the current line 
      
$column,    // Current number of the column in the line 
      
$data,      // Data buffer 
      
$brackets,  // The amount of the opened brackets 
      
$state,     // The current condition of the parser
      
$errorstr,  // Error diagnostics line
      
$instates,  // Word input code  
      
$prevstate// Previous condition of the parser 
      
$automat;   // Parser automat table 

 /********************************************************************** 
  *  Constructor                      * 
  **********************************************************************/ 
  
function ExpressionParser($str) { 
    
$this->data=$str
    
$this->length=strlen($str); 
    
$this->pos=0
    
$this->line=1
    
$this->column=0
    
$this->brackets=0
     
    
// Codes of the words given by scaner 
    // Other words have code 1 
    
$this->instates = array("+" => 0
                            
"*" => 0
                            
"-" => 0,
                            
"/" => 0
                            
"(" => 2
                            
")" => 3); 
     
    
// Parser’s automat 
    
$this->automat=array( 
    
/* 
  -1 Error
  0 The beginning of the analysis
  1 Got the operator, expecting right operand or left bracket 
  2 Got left operand, expecting operator or right bracket
  3 Got left bracket, expecting operator or right bracket
  4 Got right operand, expecting operator or right bracket
  5 Got right bracket, expecting operator
    */ 
     //conditions 0,  1,  2,  3,  4,  5 
     
"0"=>array( 1, -1,  1,  1,  1,  1),//operator 
     
"1"=>array( 2,  4, -1,  2, -1, -1),//operand 
     
"2"=>array( 3,  3, -1,  3, -1, -1),//left bracket 
     
"3"=>array(-1, -1,  5, -1,  5,  5),//right bracket 
    
); 
    
$this->state=$this->prevstate=0
  } 

 
/********************************************************************** 
  *  Scanner                     * 
  **********************************************************************/ 
  
function Scan() { 
    
// delimiters that are ignored
    
$delimiters=array(" ","\t","\r","\n"); 

    
//  Words from one symbol 
    
$words=array("+","-","*","/","(",")"); 

    
// Scaner’s automat 
    
$automat=array( 
    
/* 
    -1 word is ready, it’s time to return
    0 scanning 
    1 got the symbol and we have to store it up
    2 got the predefined word from one symbol
    */ 
    //conditions  0,  1,  2 
     
"0"=>array( 0, -1, -1),//delimiter 
     
"1"=>array( 2, -1, -1),//word from one symbol 
     
"2"=>array( 1,  1, -1),//symbol 
    
); 
    
$state=0
    
$word=""
     
    
// Scanning cycle 
    
while ($this->pos<$this->length) { 

      
// Set the code sent to the input of the automat symbol
      
if (in_array($this->data[$this->pos],$delimiters))  
     
$instate=0
      elseif (
in_array($this->data[$this->pos],$words))  
     
$instate=1
      else 
     
$instate=2
       
      
// Get the condition of the automat 
      
$state=$automat[$instate][$state];  
      switch(
$state) { 
     case 
0// Beginning of the scanning
    
if ($this->data[$this->pos]=="\n") { 
      
$this->line++; 
      
$this->column=0
    } 
    
$word=""
    break; 
     case -
1// word is ready, it’s time to return
    
if (strlen($word)) return $word
    break; 
     case 
1// got the symbol and we have to store it up 
    
$word.=$this->data[$this->pos]; 
    break; 
     case 
2// got the predefined word from one symbol
    
$word=$this->data[$this->pos]; 
    break; 
      } 
      
$this->pos++; 
      
$this->column++; 
      if (
$this->pos==$this->length && strlen($word)) return $word
    } 
    return 
false
  } 

 
/********************************************************************** 
  *  Parser                     * 
  **********************************************************************/ 
  
function Parse() { 
    
// Variable $first = 0, if function was called first time 
    
$first=$this->pos

    
// Cycle of conditions 
    
while(1) { 
       
      
// Get the word from the scanner 
      
$word=$this->Scan(); 
       
      
// If there are no words interrupt the cycle 
      
if ($word===false) break; 
       
      
// Set the code sent to the input of the automat 
      
$instate=isset($this->instates[$word]) ? $this->instates[$word] : 1
       
      
// Get the condition of the parser’s automat 
      
$this->state=$this->automat[$instate][$this->state]; 
       
      
// If there is an error interrupt the cycle 
      
if ($this->state==-1) { 
     
$this->errorstr="Error in the line: $this->line, column: $this->column<br>"
     break; 
      }
      switch(
$this->state) { 

     case 
1// Got the operator, expecting right operand or left bracket 
      
    // If first word is operator, it can be only "+" or "-" 
    
if (($this->prevstate==|| $this->prevstate==0) && $word!="-" && $word!="+") { 
      
$this->errorstr="Error in the line: $this->line, column: $this->column<br>"
      return 
false
    } 
    break; 

     case 
2// Got operand, expecting operator or right bracket

    // Check whether it is figure or not 
    
if (!preg_match("/^[0-9]+(\.[0-9]+)?$/",$word)) { 
      
$this->errorstr=" Error in the line: $this->line, column: $this->column<br>"
      return 
false
    } 
    break; 

     case 
3// Got left bracket, expecting operator or left bracket 

    // Increase amount of opened brackets by 1 
    
$this->brackets++; 
    if (!
$this->Parse()) return false
    break; 

     case 
4// Got right operand, expecting operator or right bracket 

    // Check whether it is figure or not
    
if (!preg_match("/^[0-9]+(\.[0-9]+)?$/",$word)) { 
      
$this->errorstr=" Error in the line: $this->line, column: $this->column<br>"
      return 
false
    } 
    break; 

     case 
5// Got right bracket, expecting the operator 
      
    // decrement amount of the opened brackets by 1 
    
$this->brackets--; 
    return 
true

      } 
// end switch 
      
$this->prevstate=$this->state

    } 
// end while 
   
    
if (!$first && ($this->brackets || $this->state!=&& $this->state!=5)) return false
     
    return 
true
  } 
   


$p=new ExpressionParser("-4.25*((2+3)*4+1)/5"); 
print 
$p->data."<br>"
if (
$p->Parse()) 
  print 
"Expression is correct.<br>"
else 
  print 
$p->errorstr
?>

 

  • Top