Shake Lexer Spec
§ 1 Definition
§ 1.1 Lexer
The lexer (also called a tokenizer) is the first step in the compilation process. It takes the source code as input and outputs a list of tokens, removing comments and whitespace along the way.
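As a rough illustration of this contract (source text in, token list out, comments and whitespace dropped), here is a minimal sketch in Python. It is not the Shake implementation: the names `TOKEN`, `SKIP` and `tokenize` are hypothetical, tokens are represented as plain strings, and only a tiny subset of the language is recognized. The newline is kept as a token because the ranking in § 2.1 lists a line separator.

```python
import re

# Hypothetical patterns for a tiny subset, for illustration only.
# Skipped input: spaces, tabs, carriage returns, // line comments, /* block comments */.
SKIP = re.compile(r"[ \t\r]+|//[^\n]*|/\*.*?\*/", re.DOTALL)
# Emitted input: numbers, identifiers, or any other single character (including "\n").
TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|.", re.DOTALL)


def tokenize(source: str) -> list[str]:
    """Turn source text into a list of token strings, dropping comments and whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        skipped = SKIP.match(source, pos)
        if skipped:
            # Nothing is emitted for whitespace and comments.
            pos = skipped.end()
            continue
        match = TOKEN.match(source, pos)  # always matches: "." accepts any character
        tokens.append(match.group())
        pos = match.end()
    return tokens


print(tokenize("a = 1 // set a\n/* done */ b"))
# prints ['a', '=', '1', '\n', 'b']
```

Both comments and all spaces disappear from the output, while the line separator survives as a token of its own.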
§ 1.2 Tokens
A token is a sequence of characters that forms a meaningful unit in a program. For the information a token holds, see the tokens documentation.
§ 2 Lexer Implementation
The lexer is implemented in the lexer package. The main lexing logic is implemented in ShakeLexingBase.
§ 2.1 Token Ranking
The order in which tokens are checked matters for the lexer implementation: a longer token must be checked before any shorter token it starts with, e.g. *= must be checked before *. The ranking below is not the only possible one, but it is the one used in the lexer implementation.
- Skip whitespace (if existing) (e.g. space, \t)
- Line Separator (\n)
- Semicolon (;)
- Comma (,)
- Colon (:)
- Dot (.)
- Numbers (floating point, integer, binary, hexadecimal) (e.g. 1, 1.0)
  Implemented by checking whether the first character is a digit (0-9). Does not capture signs (+, -).
- Identifier (e.g. a, b)
  Implemented by checking whether the first character is a letter (a-z, A-Z) or an underscore (_). The identifier check is allowed, but not required, to accept a digit (0-9) as the first character, because input starting with a digit is already captured by the number token above. JavaScript's "\w" regex class would therefore be fine for this token.
- Identifier 2 (e.g. `a`, `b`)
  Implemented by checking whether the first character is a backtick (`).
- String (e.g. "abc")
  Implemented by checking whether the first character is a double quote (").
- Character (e.g. 'a')
  Implemented by checking whether the first character is a single quote (').
- Skip single line comment (e.g. // abc)
  Implemented by checking whether the first two characters are a double slash (//). Then skip until the next \n. (This also works with Windows line endings, as \r is skipped as well.)
- Skip multi line comment (e.g. /* abc */)
  Implemented by checking whether the first two characters are a slash and a star (/*). Then skip until the next */.
- Pow Assignment (**=)
- Mod Assignment (%=)
- Div Assignment (/=)
- Mul Assignment (*=)
- Sub Assignment (-=)
- Add Assignment (+=)
- Increment (++)
- Decrement (--)
- Power (**)
- Modulo (%)
- Division (/)
- Multiplication (*)
- Subtraction (-)
- Addition (+)
- Logical OR (||)
- Logical AND (&&)
- Logical XOR (^^)
- Equals (==)
- Greater Than Or Equal (>=)
- Less Than Or Equal (<=)
- Not Equal (!=)
- Greater Than (>)
- Less Than (<)
- Not (!)
- Bitwise NAND (~&)
- Bitwise NOR (~|)
- Bitwise XNOR (~^)
- Bitwise NOT (~)
- Bitwise AND (&)
- Bitwise OR (|)
- Bitwise XOR (^)
- Assignment (=)
- LParen (()
- RParen ())
- LCurl ({)
- RCurl (})
- LBracket ([)
- RBracket (])
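
The effect of the ranking on operator matching can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the code in ShakeLexingBase: the token names, the `RANKED_OPERATORS` table and the `lex` function are hypothetical, only the operator texts and their order are taken from the ranking above, and line separators, punctuation, brackets, strings, comments and non-integer numbers are left out for brevity.

```python
# Hypothetical token names; the operator texts and their order follow the ranking above.
RANKED_OPERATORS = [
    ("POW_ASSIGN", "**="), ("MOD_ASSIGN", "%="), ("DIV_ASSIGN", "/="),
    ("MUL_ASSIGN", "*="), ("SUB_ASSIGN", "-="), ("ADD_ASSIGN", "+="),
    ("INCR", "++"), ("DECR", "--"), ("POW", "**"), ("MOD", "%"),
    ("DIV", "/"), ("MUL", "*"), ("SUB", "-"), ("ADD", "+"),
    ("LOGICAL_OR", "||"), ("LOGICAL_AND", "&&"), ("LOGICAL_XOR", "^^"),
    ("EQ", "=="), ("GE", ">="), ("LE", "<="), ("NE", "!="),
    ("GT", ">"), ("LT", "<"), ("NOT", "!"),
    ("BIT_NAND", "~&"), ("BIT_NOR", "~|"), ("BIT_XNOR", "~^"), ("BIT_NOT", "~"),
    ("BIT_AND", "&"), ("BIT_OR", "|"), ("BIT_XOR", "^"), ("ASSIGN", "="),
]


def is_digit(ch: str) -> bool:
    return "0" <= ch <= "9"


def is_ident_start(ch: str) -> bool:
    return ch == "_" or "a" <= ch <= "z" or "A" <= ch <= "Z"


def lex(source: str) -> list[tuple[str, str]]:
    """Return (name, text) pairs, trying each rule in ranked order."""
    tokens = []
    pos = 0
    while pos < len(source):
        ch = source[pos]
        start = pos
        if ch in " \t\r":  # skip whitespace first
            pos += 1
            continue
        if is_digit(ch):  # numbers are decided by their first character
            while pos < len(source) and is_digit(source[pos]):
                pos += 1
            tokens.append(("INTEGER", source[start:pos]))
            continue
        if is_ident_start(ch):  # identifiers: letter or underscore, then letters/digits
            while pos < len(source) and (
                is_ident_start(source[pos]) or is_digit(source[pos])
            ):
                pos += 1
            tokens.append(("IDENTIFIER", source[start:pos]))
            continue
        # Operators: the first entry that matches wins, so "**=" is tried
        # before "**", and "**" before "*".
        for name, text in RANKED_OPERATORS:
            if source.startswith(text, pos):
                tokens.append((name, text))
                pos += len(text)
                break
        else:
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    return tokens


print(lex("a **= b * 2"))
```

Because the table is scanned top to bottom, `**=` is emitted as a single token. If `*` were listed before `**=`, the same input would be split into two multiplication tokens followed by an assignment.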