Yesterday, I wrote about the why of making a new syntax highlighter. Today I want to write about the how.
Let's explain how `tempest/highlight` works by implementing a new language: Blade is a good candidate. It looks something like this:
    @if(! empty($items))
        <div class="container">
            Items: {{ count($items) }}.
        </div>
    @endif
In order to build such a new language, you need to understand three concepts of how code is highlighted: patterns, injections, and languages.
A pattern represents part of code that should be highlighted. A pattern can target a single keyword like `return` or `class`, or it could be any part of code, like for example a comment: `/* this is a comment */` or an attribute: `#[Get(uri: '/')]`.
Each pattern is represented by a simple class that provides a regex pattern, and a `TokenType`. The regex pattern is used to match relevant content to this specific pattern, while the `TokenType` is an enum value that will determine how that specific pattern is colored.
Here's an example of a simple pattern to match the namespace of a PHP file:
    use Tempest\Highlight\IsPattern;
    use Tempest\Highlight\Pattern;
    use Tempest\Highlight\Tokens\TokenType;

    final readonly class NamespacePattern implements Pattern
    {
        use IsPattern;

        public function getPattern(): string
        {
            return 'namespace (?<match>[\w\\\\]+)';
        }

        public function getTokenType(): TokenType
        {
            return TokenType::TYPE;
        }
    }
Note that each pattern must include a regex capture group that's named `match`. The content that matched within this group will be highlighted.
For example, this regex `namespace (?<match>[\w\\\\]+)` says that any occurrence of `namespace` followed by a space and a name should be taken into account, but only the part within the named group `(?<match>…)` will actually be colored. In practice that means that the namespace name matching `[\w\\\\]+` will be colored.
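To see the named group in isolation, here is a minimal sketch in plain PHP that runs the same regex through `preg_match`. The `/…/` delimiters and the sample input are assumptions made for this illustration; how the highlighter wraps and applies the regex internally isn't covered here.

```php
<?php

// The regex from NamespacePattern, wrapped in "/" delimiters for preg_match.
// The delimiters are an assumption for this standalone demo.
$regex = '/namespace (?<match>[\w\\\\]+)/';

// Hypothetical input line.
$code = 'namespace App\Http\Controllers;';

preg_match($regex, $code, $matches, PREG_OFFSET_CAPTURE);

// Index 0 holds the full match, including the "namespace " prefix.
[$full, $fullOffset] = $matches[0];

// The named group holds only the part that would be colored.
[$name, $nameOffset] = $matches['match'];

echo $full . PHP_EOL;       // namespace App\Http\Controllers
echo $name . PHP_EOL;       // App\Http\Controllers
echo $nameOffset . PHP_EOL; // 10
```

The offset of the named group is what tells a highlighter where the colored span starts, which is why the full match alone isn't enough.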
Yes, you'll need some basic knowledge of regex. Head over to https://regexr.com/ if you need help, or take a look at the existing patterns in this repository.
In summary:
- Pattern classes provide a regex pattern that matches part of your code.
- That regex must contain a group named `match`, which is written like so: `(?<match>…)`. This group represents the code that will actually be highlighted.
- A pattern also provides a `TokenType`, which is used to determine the highlight style for the specific match.
Once you've understood patterns, the next step is to understand injections. Injections are used to highlight different languages within one code block. For example: HTML could contain CSS, which should be styled properly as well.
An injection will tell the highlighter that it should treat a block of code as a different language. For example:
    <div>
        <x-slot name="styles">
            <style>
                body {
                    background-color: red;
                }
            </style>
        </x-slot>
    </div>
Everything within `<style></style>` tags should be treated as CSS. That's done by injection classes:
    use Tempest\Highlight\Highlighter;
    use Tempest\Highlight\Injection;
    use Tempest\Highlight\IsInjection;
    use Tempest\Highlight\ParsedInjection;

    final readonly class CssInjection implements Injection
    {
        use IsInjection;

        public function getPattern(): string
        {
            return '<style>(?<match>(.|\n)*)<\/style>';
        }

        public function parseContent(string $content, Highlighter $highlighter): ParsedInjection
        {
            return new ParsedInjection(
                content: $highlighter->parse($content, 'css')
            );
        }
    }
Just like patterns, an injection must provide a pattern. This pattern, for example, will match anything between style tags: `<style>(?<match>(.|\n)*)<\/style>`.
The second step in providing an injection is to parse the matched content into another language. That's what the `parseContent()` method is for. In this case, we'll get all code between the style tags that was matched with the named `(?<match>…)` group, and parse that content as CSS instead of whatever language we're currently dealing with.
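Conceptually, an injection finds a region, hands it to another parser, and puts the result back. The sketch below mimics that flow with `preg_replace_callback`; it is not the library's implementation. `fakeCssHighlight()` is a hypothetical stand-in for `$highlighter->parse($content, 'css')`, and the `/…/` delimiters are assumed.

```php
<?php

// Hypothetical stand-in for $highlighter->parse($content, 'css'):
// it only wraps the content in markers so the effect is visible.
function fakeCssHighlight(string $content): string
{
    return '[css]' . $content . '[/css]';
}

// The regex from CssInjection, wrapped in "/" delimiters (an assumption).
$regex = '/<style>(?<match>(.|\n)*)<\/style>/';

$html = '<div><style>body { color: red; }</style></div>';

// Only the named group is handed to the "other language";
// the surrounding <style> tags are put back unchanged.
$result = preg_replace_callback(
    $regex,
    function (array $matches): string {
        return '<style>' . fakeCssHighlight($matches['match']) . '</style>';
    },
    $html,
);

echo $result . PHP_EOL;
// <div><style>[css]body { color: red; }[/css]</style></div>
```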
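In summary:
- An injection provides a regex pattern that matches a block of code written in another language.
- Just like patterns, that regex must contain a group named `match`, which is written like so: `(?<match>…)`.
- The injection's `parseContent()` method uses the highlighter to parse the matched content as that other language.
The last concept to understand: languages are classes that bring patterns and injections together. Take a look at the `HtmlLanguage`, for example: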
    class HtmlLanguage extends BaseLanguage
    {
        public function getName(): string
        {
            return 'html';
        }

        public function getAliases(): array
        {
            return ['htm', 'xhtml'];
        }

        public function getInjections(): array
        {
            return [
                ...parent::getInjections(),
                new PhpInjection(),
                new PhpShortEchoInjection(),
                new CssInjection(),
                new CssAttributeInjection(),
            ];
        }

        public function getPatterns(): array
        {
            return [
                ...parent::getPatterns(),
                new OpenTagPattern(),
                new CloseTagPattern(),
                new TagAttributePattern(),
                new HtmlCommentPattern(),
            ];
        }
    }
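This `HtmlLanguage` class specifies the following things:
- PHP can be injected within HTML, both with the short echo tag `<?=` and longer `<?php` tags.
- CSS can be injected as well (`CssInjection` and `CssAttributeInjection`).
- A handful of patterns highlight HTML tags, attributes, and comments.
On top of that, it extends from `BaseLanguage`. This is a language class that adds a bunch of cross-language injections, such as blurs and highlights. Your language doesn't need to extend from `BaseLanguage` and could implement `Language` directly if you want to.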
With these three concepts in place, let's bring everything together to explain how you can add your own languages.
So we're adding Blade support. We could create a new language class and start from scratch, but it'd probably be easier to extend an existing language; `HtmlLanguage` is probably the best fit. Let's create a new `BladeLanguage` class that extends from `HtmlLanguage`:
    class BladeLanguage extends HtmlLanguage
    {
        public function getName(): string
        {
            return 'blade';
        }

        public function getAliases(): array
        {
            return [];
        }

        public function getInjections(): array
        {
            return [
                ...parent::getInjections(),
            ];
        }

        public function getPatterns(): array
        {
            return [
                ...parent::getPatterns(),
            ];
        }
    }
With this class in place, we can start adding our own patterns and injections. Let's start with a pattern that matches all Blade keywords, which are always prefixed with the `@` sign:
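    use Tempest\Highlight\IsPattern;
    use Tempest\Highlight\Pattern;
    use Tempest\Highlight\Tokens\TokenType;

    // Matches Blade keywords such as @if or @endif.
    final readonly class BladeKeywordPattern implements Pattern
    {
        use IsPattern;

        public function getPattern(): string
        {
            return '(?<match>\@[\w]+)\b';
        }

        public function getTokenType(): TokenType
        {
            return TokenType::KEYWORD;
        }
    }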
And register it in our `BladeLanguage` class:
    public function getPatterns(): array
    {
        return [
            ...parent::getPatterns(),
            new BladeKeywordPattern(),
        ];
    }
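As a quick sanity check, the keyword regex can be tried on a sample template outside the highlighter. This is a standalone sketch: the `/…/` delimiters and the template are assumptions for the demo, and it says nothing about how the library applies the pattern internally.

```php
<?php

// The regex from BladeKeywordPattern, wrapped in "/" delimiters (an assumption).
$regex = '/(?<match>\@[\w]+)\b/';

// Sample Blade template for the demo.
$blade = <<<'BLADE'
@if(! empty($items))
    <div class="container">
        Items: {{ count($items) }}.
    </div>
@endif
BLADE;

// Collect every occurrence of the named group.
preg_match_all($regex, $blade, $matches);

print_r($matches['match']); // contains '@if' and '@endif'
```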
Next, there are a couple of places within Blade where you can write PHP code: within the `@php` keyword, as well as within keyword brackets: `@if (count(…))`. Let's write two injections for that:
    use Tempest\Highlight\Highlighter;
    use Tempest\Highlight\Injection;
    use Tempest\Highlight\IsInjection;
    use Tempest\Highlight\ParsedInjection;

    // Treats everything between @php and @endphp as PHP.
    final readonly class BladePhpInjection implements Injection
    {
        use IsInjection;

        public function getPattern(): string
        {
            return '\@php(?<match>(.|\n)*?)\@endphp';
        }

        public function parseContent(string $content, Highlighter $highlighter): ParsedInjection
        {
            return new ParsedInjection(
                content: $highlighter->parse($content, 'php')
            );
        }
    }

    // Treats the arguments of a Blade keyword, such as @if (…), as PHP.
    final readonly class BladeKeywordInjection implements Injection
    {
        use IsInjection;

        public function getPattern(): string
        {
            return '(\@[\w]+)\s?\((?<match>.*)\)';
        }

        public function parseContent(string $content, Highlighter $highlighter): ParsedInjection
        {
            return new ParsedInjection(
                content: $highlighter->parse($content, 'php')
            );
        }
    }
Let's add these to our `BladeLanguage` class as well:
    public function getInjections(): array
    {
        return [
            ...parent::getInjections(),
            new BladePhpInjection(),
            new BladeKeywordInjection(),
        ];
    }
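To check what these two injections would hand over to the PHP parser, the regexes can be run on a small template with `preg_match`. This is a sketch in plain PHP; the `/…/` delimiters and the sample template are assumptions, and only the first match of each regex is inspected.

```php
<?php

// Regexes from BladePhpInjection and BladeKeywordInjection,
// wrapped in "/" delimiters (an assumption for this demo).
$phpBlock = '/\@php(?<match>(.|\n)*?)\@endphp/';
$keywordArgs = '/(\@[\w]+)\s?\((?<match>.*)\)/';

// Sample Blade template for the demo.
$blade = <<<'BLADE'
@php
    $total = count($items);
@endphp
@if(! empty($items))
BLADE;

preg_match($phpBlock, $blade, $block);
preg_match($keywordArgs, $blade, $args);

// Everything between @php and @endphp; trim() drops the surrounding whitespace.
echo trim($block['match']) . PHP_EOL; // $total = count($items);

// Everything between the outermost parentheses of the keyword call.
echo $args['match'] . PHP_EOL; // ! empty($items)
```

The greedy `.*` in the keyword regex is what lets it reach the last closing parenthesis on the line, so nested calls like `empty($items)` stay intact.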
Next, you can write `{{ … }}` and `{!! … !!}` to echo output. Whatever is between these brackets is also considered PHP, so, one more injection:
    // Uses the same imports as the other injection classes.
    final readonly class BladeEchoInjection implements Injection
    {
        use IsInjection;

        public function getPattern(): string
        {
            // Lazy `.*?` so that two echoes on one line are matched separately.
            return '({{|{!!)(?<match>.*?)(}}|!!})';
        }

        public function parseContent(string $content, Highlighter $highlighter): ParsedInjection
        {
            return new ParsedInjection(
                content: $highlighter->parse($content, 'php')
            );
        }
    }
And, finally, you can write Blade comments like so: `{{-- --}}`. This can be a simple pattern:
    final readonly class BladeCommentPattern implements Pattern
    {
        use IsPattern;

        public function getPattern(): string
        {
            return '(?<match>\{\{\-\-(.|\n)*?\-\-\}\})';
        }

        public function getTokenType(): TokenType
        {
            return TokenType::COMMENT;
        }
    }
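The comment regex relies on a lazy `(.|\n)*?` so that each comment ends at its own `--}}`, even when it spans several lines. A small standalone check, assuming `/…/` delimiters and a made-up template:

```php
<?php

// The regex from BladeCommentPattern, wrapped in "/" delimiters (an assumption).
$regex = '/(?<match>\{\{\-\-(.|\n)*?\-\-\}\})/';

// Sample template: two comments, the second one spanning two lines.
$blade = <<<'BLADE'
{{-- first --}}
<p>{{ $name }}</p>
{{-- second,
     spanning two lines --}}
BLADE;

preg_match_all($regex, $blade, $matches);

echo count($matches['match']) . PHP_EOL; // 2
echo $matches['match'][0] . PHP_EOL;     // {{-- first --}}
```

The regular `{{ $name }}` echo is left alone because it doesn't start with `{{--`.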
With all of that in place, and the last two classes registered in `BladeLanguage` just like the earlier ones, the only thing left to do is to add our language to the highlighter:
$highlighter->addLanguage(new BladeLanguage());
And we're done! Blade support with just a handful of patterns and injections!