Read this post if you don't know what Bison is.
Today I'll try to parse an Nginx config into an AST.
To test the parser, I took a real Nginx config from the official Symfony documentation.
nginx.conf
```nginx
server {
    server_name domain.tld www.domain.tld;
    root /var/www/project/public;

    location / {
        # try to serve file directly, fallback to index.php
        try_files $uri /index.php$is_args$args;
    }

    location /bundles {
        try_files $uri =404;
    }

    location ~ ^/index\.php(/|$) {
        fastcgi_pass unix:/var/run/php/php-fpm.sock;
        fastcgi_split_path_info ^(.+\.php)(/.*)$;
        include fastcgi_params;

        # optionally set the value of the environment variables used in the application
        fastcgi_param APP_ENV prod;
        fastcgi_param APP_SECRET <app-secret-id>;
        fastcgi_param DATABASE_URL "mysql://db_user:db_pass@host:3306/db_name";

        fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
        fastcgi_param DOCUMENT_ROOT $realpath_root;

        internal;
    }

    location ~ \.php$ {
        return 404;
    }

    error_log /var/log/nginx/project_error.log;
    access_log /var/log/nginx/project_access.log;
}
```
First, we need to install PHP dependencies.
```bash
composer require --dev mrsuh/php-bison-skeleton
composer require mrsuh/tree-printer
composer require doctrine/lexer
```
- mrsuh/php-bison-skeleton – to build a PHP parser with Bison
- mrsuh/tree-printer – to print the AST
- doctrine/lexer – to split text into tokens
We will store our files like this:
```
.
├── /ast-parser
    ├── /bin
    │   └── parse.php  # entry point to parse nginx configs
    ├── /lib
    │   └── parser.php # generated file
    ├── /src
    │   ├── Lexer.php
    │   └── Node.php   # AST node
    └── grammar.y
```
The `Node` class must implement `Mrsuh\Tree\NodeInterface` so that the AST can be printed.
src/Node.php
```php
<?php

namespace App;

use Mrsuh\Tree\NodeInterface;

class Node implements NodeInterface
{
    private string $name;
    /** @var array<string, mixed> */
    private array $attributes;
    /** @var Node[] */
    private array $children;

    public function __construct(string $name, array $attributes = [], array $children = [])
    {
        $this->name = $name;
        $this->attributes = $attributes;
        $this->children = $children;
    }

    public function getChildren(): array
    {
        return $this->children;
    }

    public function __toString(): string
    {
        $line = $this->name;
        if (!empty($this->attributes)) {
            $line .= ' {';
            foreach ($this->attributes as $key => $value) {
                $line .= sprintf(
                    " %s: '%s'",
                    $key,
                    is_array($value) ? implode(', ', $value) : $value
                );
            }
            $line .= ' }';
        }

        return $line;
    }
}
```
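The actual printing is done by mrsuh/tree-printer, which presumably walks the tree through `getChildren()` and `__toString()`. Here is a rough, self-contained sketch of what such a walk produces. `SketchNode` copies the `Node` formatting but drops the library interface so it runs without Composer. `renderTree()` is a hypothetical printer of my own, not the library's code, so its layout differs slightly from the real output at the end of this post.

```php
<?php

// Hypothetical stand-in for App\Node: same __toString() format, but no
// Mrsuh\Tree\NodeInterface, so it runs without Composer packages (PHP 8+).
final class SketchNode
{
    /**
     * @param array<string, mixed> $attributes
     * @param SketchNode[] $children
     */
    public function __construct(
        private string $name,
        private array $attributes = [],
        private array $children = [],
    ) {
    }

    /** @return SketchNode[] */
    public function getChildren(): array
    {
        return $this->children;
    }

    public function __toString(): string
    {
        $line = $this->name;
        if (!empty($this->attributes)) {
            $line .= ' {';
            foreach ($this->attributes as $key => $value) {
                $line .= sprintf(" %s: '%s'", $key, is_array($value) ? implode(', ', $value) : $value);
            }
            $line .= ' }';
        }

        return $line;
    }
}

// Hypothetical printer (not mrsuh/tree-printer): a depth-first walk that draws
// box-drawing connectors using only getChildren() and __toString().
function renderTree(SketchNode $node, string $prefix = '', bool $isLast = true): string
{
    $out = $prefix . ($isLast ? '└── ' : '├── ') . $node . "\n";
    $children = $node->getChildren();
    // Children of a non-last node get a vertical bar so the parent branch stays visible.
    $childPrefix = $prefix . ($isLast ? '    ' : '│   ');
    foreach ($children as $i => $child) {
        $out .= renderTree($child, $childPrefix, $i === count($children) - 1);
    }

    return $out;
}

$ast = new SketchNode('T_SERVER', [], [
    new SketchNode('T_SERVER_NAME', ['names' => ['domain.tld', 'www.domain.tld']]),
    new SketchNode('T_LOCATION', ['regexp' => '', 'path' => '/'], [
        new SketchNode('T_TRY_FILES', ['paths' => ['$uri', '/index.php$is_args$args']]),
    ]),
]);

echo ".\n" . renderTree($ast);
```

It prints:

```
.
└── T_SERVER
    ├── T_SERVER_NAME { names: 'domain.tld, www.domain.tld' }
    └── T_LOCATION { regexp: '' path: '/' }
        └── T_TRY_FILES { paths: '$uri, /index.php$is_args$args' }
```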
This time I'll use the Doctrine Lexer library, which makes it easier to tokenize more complex text.
src/Lexer.php
```php
<?php

namespace App;

use Doctrine\Common\Lexer\AbstractLexer;

class Lexer extends AbstractLexer implements LexerInterface
{
    public function __construct($resource)
    {
        $this->setInput(stream_get_contents($resource));
        $this->moveNext(); // load the first token into the lookahead
    }

    protected function getCatchablePatterns(): array
    {
        return [';'];
    }

    protected function getNonCatchablePatterns(): array
    {
        return [' ', '[\n]+', '#[^\n]+']; // skip spaces, EOLs, and comments
    }

    protected function getType(&$value): int
    {
        switch ($value) {
            case 'server':
                return LexerInterface::T_SERVER;
            case 'server_name':
                return LexerInterface::T_SERVER_NAME;
            // ...
        }

        return ord($value); // single characters such as '{', '}', ';'
    }

    public function yyerror(string $message): void
    {
        printf("%s\n", $message);
    }

    public function getLVal()
    {
        return $this->token->value;
    }

    public function yylex(): int
    {
        if (!$this->lookahead) {
            return LexerInterface::YYEOF;
        }
        $this->moveNext();

        return $this->token->type;
    }
}
```
For example, `Lexer` will translate the Nginx config below

```nginx
server {
    server_name domain.tld www.domain.tld;
    root /var/www/project/public;
    location / {
        # try to serve file directly, fallback to index.php
        try_files $uri /index.php$is_args$args;
    }
}
```

into this:
word | token |
---|---|
server | LexerInterface::T_SERVER (258) |
{ | ASCII (123) |
server_name | LexerInterface::T_SERVER_NAME (259) |
domain.tld | LexerInterface::T_SERVER_NAME_VALUE (260) |
www.domain.tld | LexerInterface::T_SERVER_NAME_VALUE (260) |
; | ASCII (59) |
root | LexerInterface::T_SERVER_ROOT (261) |
/var/www/project/public | LexerInterface::T_SERVER_ROOT_PATH (262) |
; | ASCII (59) |
location | LexerInterface::T_LOCATION (263) |
/ | LexerInterface token (264) |
{ | ASCII (123) |
try_files | LexerInterface::T_TRY_FILES (283) |
$uri | LexerInterface::T_TRY_FILES_PATH (284) |
/index.php$is_args$args | LexerInterface::T_TRY_FILES_PATH (284) |
; | ASCII (59) |
} | ASCII (125) |
} | ASCII (125) |
(end of input) | LexerInterface::YYEOF (0) |
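The word column comes from splitting the text with the lexer's patterns. As far as I understand Doctrine's `AbstractLexer`, it joins the catchable patterns (as capture groups) and the non-catchable ones (as plain alternatives) into one regular expression and runs `preg_split()` with `PREG_SPLIT_DELIM_CAPTURE`. Here is a standalone sketch of that idea. `splitWords()` is a hypothetical helper, and it skips the type lookup that `getType()` performs.

```php
<?php

// Sketch of regex-based splitting in the style of Doctrine's AbstractLexer:
// catchable patterns become capture groups (kept as words), non-catchable
// patterns are plain alternatives (matched as separators and dropped).
function splitWords(string $input, array $catchable, array $nonCatchable): array
{
    $regex = sprintf('/(%s)|%s/u', implode(')|(', $catchable), implode('|', $nonCatchable));

    // DELIM_CAPTURE keeps the captured ';', NO_EMPTY drops the gaps between separators.
    return preg_split($regex, $input, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
}

$config = <<<'NGINX'
server {
    server_name domain.tld www.domain.tld;
    root /var/www/project/public;
    location / {
        # try to serve file directly, fallback to index.php
        try_files $uri /index.php$is_args$args;
    }
}
NGINX;

// Same patterns as Lexer::getCatchablePatterns() and getNonCatchablePatterns().
$words = splitWords($config, [';'], [' ', '[\n]+', '#[^\n]+']);
echo implode(PHP_EOL, $words), PHP_EOL;
```

Spaces, newlines, and the comment disappear, while `;` survives because it is captured. `{` and `}` come out as separate words only because they are surrounded by whitespace. With these patterns, `location /{` would produce the single word `/{`.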
Now it's time to create the `grammar.y` file and build `lib/parser.php`.
We use the `%code parser` block to add a property and methods to the `Parser` class that store the AST.
You can find the full grammar file here.
grammar.y
```
%define api.parser.class {Parser}
%define api.namespace {App}

%code parser {
    private Node $ast;

    public function setAst(Node $ast): void { $this->ast = $ast; }

    public function getAst(): Node { return $this->ast; }
}

%token T_SERVER
%token T_SERVER_NAME
%token T_SERVER_NAME_VALUE
%token T_SERVER_ROOT
%token T_SERVER_ROOT_PATH
...
%token T_TRY_FILES
%token T_TRY_FILES_PATH

%%

server:
  T_SERVER '{' server_body_list '}' { self::setAst(new Node('T_SERVER', [], $3)); }
;

server_name_values:
  T_SERVER_NAME_VALUE                    { $$ = [$1]; }
| server_name_values T_SERVER_NAME_VALUE { $$ = $1; $$[] = $2; }
;

server_body:
  T_SERVER_NAME server_name_values ';' { $$ = new Node('T_SERVER_NAME', ['names' => $2]); }
| T_SERVER_ROOT T_SERVER_ROOT_PATH ';'  { $$ = new Node('T_SERVER_ROOT', ['path' => $2]); }
| T_ERROR_LOG T_ERROR_LOG_PATH ';'      { $$ = new Node('T_ERROR_LOG', ['path' => $2]); }
| T_ACCESS_LOG T_ACCESS_LOG_PATH ';'    { $$ = new Node('T_ACCESS_LOG', ['path' => $2]); }
;

...
```
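To make these rules more concrete, here is a hand-written recursive-descent sketch that mirrors them, one method per rule. Bison generates a table-driven LALR(1) parser instead, so this is a different technique. It only shows which token sequences the rules accept and what each action builds. Everything in it is hypothetical:

- tokens are `[type, value]` pairs with string types;
- nodes are plain `[name, attributes, children]` arrays instead of `Node`;
- `server_body_list` is assumed to be one or more `server_body` entries, because its rule isn't shown.

```php
<?php

// Hypothetical recursive-descent mirror of the grammar rules above, one method
// per rule. Bison builds an LALR(1) table-driven parser instead.
final class MiniParser
{
    private int $pos = 0;

    /** @param list<array{string, string}> $tokens [type, value] pairs */
    public function __construct(private array $tokens)
    {
    }

    // server: T_SERVER '{' server_body_list '}'
    public function parseServer(): array
    {
        $this->expect('T_SERVER');
        $this->expect('{');
        $body = [];
        do { // assumption: server_body_list is one or more server_body
            $body[] = $this->parseServerBody();
        } while ($this->peek() !== '}');
        $this->expect('}');
        if ($this->peek() !== null) {
            throw new RuntimeException('syntax error: unexpected ' . $this->peek());
        }

        return ['T_SERVER', [], $body];
    }

    // server_body: T_SERVER_NAME server_name_values ';' | T_SERVER_ROOT T_SERVER_ROOT_PATH ';' | ...
    private function parseServerBody(): array
    {
        $type = $this->peek();
        switch ($type) {
            case 'T_SERVER_NAME':
                $this->expect($type);
                // server_name_values is left-recursive in Bison; here it becomes a loop.
                $names = [$this->expect('T_SERVER_NAME_VALUE')];
                while ($this->peek() === 'T_SERVER_NAME_VALUE') {
                    $names[] = $this->expect('T_SERVER_NAME_VALUE');
                }
                $this->expect(';');

                return ['T_SERVER_NAME', ['names' => $names], []];
            case 'T_SERVER_ROOT':
            case 'T_ERROR_LOG':
            case 'T_ACCESS_LOG':
                $this->expect($type);
                $path = $this->expect($type . '_PATH'); // e.g. T_SERVER_ROOT_PATH
                $this->expect(';');

                return [$type, ['path' => $path], []];
        }

        throw new RuntimeException('syntax error: unexpected ' . ($type ?? 'end of input'));
    }

    private function peek(): ?string
    {
        return $this->tokens[$this->pos][0] ?? null;
    }

    // Consumes one token of the given type and returns its value (the $1, $2 of an action).
    private function expect(string $type): string
    {
        $token = $this->tokens[$this->pos] ?? null;
        if ($token === null || $token[0] !== $type) {
            throw new RuntimeException(sprintf('syntax error: expected %s, got %s', $type, $token[0] ?? 'end of input'));
        }
        $this->pos++;

        return $token[1];
    }
}

$tokens = [
    ['T_SERVER', 'server'], ['{', '{'],
    ['T_SERVER_NAME', 'server_name'], ['T_SERVER_NAME_VALUE', 'domain.tld'],
    ['T_SERVER_NAME_VALUE', 'www.domain.tld'], [';', ';'],
    ['T_SERVER_ROOT', 'root'], ['T_SERVER_ROOT_PATH', '/var/www/project/public'], [';', ';'],
    ['}', '}'],
];
print_r((new MiniParser($tokens))->parseServer());
```

Here a missing `;` throws an exception. The generated parser presumably reports it through `Lexer::yyerror()` and returns `false` from `parse()`.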
Then generate the parser with Bison:

```bash
bison -S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4 -o lib/parser.php grammar.y
```
Command options:
- `-S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4` – path to the skeleton file
- `-o lib/parser.php` – output parser file
- `grammar.y` – our grammar file
The final PHP file is the entry point for the parser.
bin/parse.php
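```php
<?php

require_once __DIR__ . '/../vendor/autoload.php';

use App\Lexer;
use App\Parser;
use Mrsuh\Tree\Printer;

if ($argc < 2) {
    fwrite(STDERR, "Usage: php bin/parse.php <nginx.conf>\n");
    exit(1);
}

$resource = fopen($argv[1], 'r');
if ($resource === false) {
    exit(1);
}

$lexer = new Lexer($resource);
$parser = new Parser($lexer);
if (!$parser->parse()) {
    exit(1);
}

$printer = new Printer();
$printer->print($parser->getAst());
```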
Next, add the generated `lib/parser.php` file to the Composer autoloader.
composer.json
```json
{
    "autoload": {
        "psr-4": {
            "App\\": "src/"
        },
        "files": ["lib/parser.php"]
    },
    ...
}
```
Finally, we can test our parser.
```bash
php bin/parse.php nginx.conf
```

```
.
├── T_SERVER
    ├── T_SERVER_NAME { names: 'domain.tld, www.domain.tld' }
    ├── T_SERVER_ROOT { path: '/var/www/project/public' }
    ├── T_LOCATION { regexp: '' path: '/' }
    │   └── T_TRY_FILES { paths: '$uri, /index.php$is_args$args' }
    ├── T_LOCATION { regexp: '' path: '/bundles' }
    │   └── T_TRY_FILES { paths: '$uri, =404' }
    ├── T_LOCATION { regexp: '~' path: '^/index\.php(/|$)' }
    │   ├── T_FAST_CGI_PATH { path: 'unix:/var/run/php/php-fpm.sock' }
    │   ├── T_FAST_CGI_SPLIT_PATH_INFO { path: '^(.+\.php)(/.*)$' }
    │   ├── T_INCLUDE { path: 'fastcgi_params' }
    │   ├── T_FAST_CGI_PARAM { APP_ENV: 'prod' }
    │   ├── T_FAST_CGI_PARAM { APP_SECRET: '<app-secret-id>' }
    │   ├── T_FAST_CGI_PARAM { DATABASE_URL: '"mysql://db_user:db_pass@host:3306/db_name"' }
    │   ├── T_FAST_CGI_PARAM { SCRIPT_FILENAME: '$realpath_root$fastcgi_script_name' }
    │   ├── T_FAST_CGI_PARAM { DOCUMENT_ROOT: '$realpath_root' }
    │   └── T_INTERNAL
    ├── T_LOCATION { regexp: '~' path: '\.php$' }
    │   └── T_RETURN { code: '404' body: '' }
    ├── T_ERROR_LOG { path: '/var/log/nginx/project_error.log' }
    └── T_ACCESS_LOG { path: '/var/log/nginx/project_access.log' }
```
It works!
You can get the parser source code here and test it yourself.
Some useful links: