Shake Parser Structures Specification
§ 1 Definitions
§ 1.1 Tokens
A token is a sequence of characters that form a meaningful unit in a program. It holds the following information:
- The type of the token
- The position of the token in the source code (Start and End index in the source code)
- The value of the token (if applicable)
§ 1.1.1 Token Types
There are two kinds of token types. Types that always have the same value (e.g. SEMICOLON or any keyword) do not need to store the value in the token. Types that can have different values (e.g. INTEGER or IDENTIFIER) store the value in the token. Token types are implemented as an enum in the lexer package.
§ 1.1.2 Token Values
Not every token needs to hold a value. Only tokens whose type can have different values (e.g. INTEGER or IDENTIFIER) hold one. The value is stored as a string in the token.
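The token model described in § 1.1 can be sketched as follows. This is a minimal illustration, not the lexer package's actual code: the class names, the reduced set of enum members, the `has_value` helper and the inclusive end index are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class TokenType(Enum):
    # Fixed-value types: the type alone determines the text.
    SEMICOLON = auto()
    KEYWORD_VAR = auto()
    # Variable-value types: the token must carry its text.
    INTEGER = auto()
    IDENTIFIER = auto()

    @property
    def has_value(self) -> bool:
        # Hypothetical helper: tells whether tokens of this type store a value.
        return self in (TokenType.INTEGER, TokenType.IDENTIFIER)


@dataclass(frozen=True)
class Token:
    type: TokenType
    start: int  # index of the first character in the source
    end: int    # index of the last character (assumed inclusive)
    value: Optional[str] = None  # stored as a string, only where applicable

    def __post_init__(self) -> None:
        # A value is present exactly when the token type needs one.
        if self.type.has_value != (self.value is not None):
            raise ValueError(f"{self.type.name}: value presence does not match type")
```

Rejecting a mismatch at construction time keeps the rule from § 1.1.2 in one place instead of spreading it across the lexer.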
§ 1.2 Structure
A structure is a sequence of tokens that form a meaningful unit in a program.
§ 2 Structures
In this section we will define all the different structures that we refer to in the parser.
When we describe structures, we will wrap token types in angle brackets (<>) to indicate that they are tokens.
We will refer to other structures by their name and paragraph number and wrap them in curly brackets ({}).
§ 2.0 Help Structures
§ Access Modifier
An access modifier is a single keyword token that defines the accessibility of a structure. It can be either public, private, or protected.
Sample code:
public
Token Definition:
<KEYWORD_PUBLIC> | <KEYWORD_PRIVATE> | <KEYWORD_PROTECTED>
§ Type
A type is a sequence of tokens that names the type of a structure. It can be one of the primitive types byte, short, int, long, float, double, boolean or char. The integer types (byte, short, int and long) also have unsigned variants.
It can also be an identifier that references an object-oriented type.
Sample code:
int
Token Definition:
<IDENTIFIER>
There are also some special identifiers reserved for the primitive types:
Identifier | Type |
---|---|
byte | Byte (int8) |
short | Short (int16) |
int | Int (int32) |
long | Long (int64) |
float | Float (float32) |
double | Double (float64) |
boolean | Boolean |
char | Char |
ubyte | UByte (uint8) |
ushort | UShort (uint16) |
uint | UInt (uint32) |
ulong | ULong (uint64) |
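The table above can be expressed as a lookup that a parser might consult when it sees an identifier in type position. This is only a sketch: `Primitive`, `PRIMITIVES` and `resolve_type` are hypothetical names, and the width and signedness fields are filled in only where the table states them.

```python
from typing import NamedTuple, Optional


class Primitive(NamedTuple):
    name: str
    bits: Optional[int]     # None where the table gives no width
    signed: Optional[bool]  # None for non-integer types


PRIMITIVES = {
    "byte": Primitive("Byte", 8, True),
    "short": Primitive("Short", 16, True),
    "int": Primitive("Int", 32, True),
    "long": Primitive("Long", 64, True),
    "float": Primitive("Float", 32, None),
    "double": Primitive("Double", 64, None),
    "boolean": Primitive("Boolean", None, None),
    "char": Primitive("Char", None, None),
    "ubyte": Primitive("UByte", 8, False),
    "ushort": Primitive("UShort", 16, False),
    "uint": Primitive("UInt", 32, False),
    "ulong": Primitive("ULong", 64, False),
}


def resolve_type(identifier: str) -> str:
    """Return the primitive type name, or the identifier itself for object types."""
    prim = PRIMITIVES.get(identifier)
    return prim.name if prim is not None else identifier
```

Any identifier that is not in the table falls through unchanged, which matches the rule that a type may also reference an object-oriented type.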
§ 2.1 Statements
A statement refers to a structure that can be executed.
§ 2.2 Values
A value refers to a structure that can be used as a value.
§ 2.3 Expressions
Let's talk about all the different structures that we refer to as expressions.
§ Number
A number is a sequence of digits that form a number. It can be either an integer or a floating point number. It is represented by either an INTEGER
or a FLOAT
token.
Sample code:
1
1.0
Token Definition:
<INTEGER> | <FLOAT>
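How a lexer might decide between the two token types can be sketched with two patterns. The exact lexical shapes are assumptions: this document only shows `1` and `1.0`, so the sketch accepts plain digit runs and digit runs with a fractional part, and nothing else (no signs, exponents or suffixes).

```python
import re

# Assumed shapes, based only on the two samples in this section.
INTEGER_RE = re.compile(r"[0-9]+")
FLOAT_RE = re.compile(r"[0-9]+\.[0-9]+")


def classify_number(text: str) -> str:
    """Return the token type name for a number literal."""
    if FLOAT_RE.fullmatch(text):
        return "FLOAT"
    if INTEGER_RE.fullmatch(text):
        return "INTEGER"
    raise ValueError(f"not a number literal: {text!r}")
```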
§ String
A string is a sequence of characters that form a string. It is represented by a STRING
token.
Sample code:
"Hello World"
Token Definition:
<STRING>
§ Identifier
An identifier is a sequence of characters that form a name. It is represented by an IDENTIFIER
token.
Sample code:
hello
Token Definition:
<IDENTIFIER>
§ Identifier 2
If an identifier contains special characters, wrap it in backticks.
Sample code:
`hello world`
Token Definition:
`<IDENTIFIER>`
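One way to treat both identifier forms uniformly is to strip the backticks and store only the inner text as the token value. The sketch below assumes exactly that, plus a character set for plain identifiers and no escape sequences inside backticks. None of these details are stated in this document.

```python
import re

PLAIN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")  # assumed character set
QUOTED_RE = re.compile(r"`([^`]+)`")              # assumed: no escapes inside backticks


def identifier_value(text: str) -> str:
    """Return the value an IDENTIFIER token would hold for the given source text."""
    if PLAIN_RE.fullmatch(text):
        return text
    quoted = QUOTED_RE.fullmatch(text)
    if quoted:
        return quoted.group(1)
    raise ValueError(f"not an identifier: {text!r}")
```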
§ Group
A group is a sequence of tokens that are surrounded by parentheses.
Sample code:
(...)
Token Definition:
<LPAREN> {§2.2 Values} <RPAREN>
§ Operators
§ Addition
An addition is a sequence of two values that are separated by a +
token.
Sample code:
1 + 2
Token Definition:
{§2.2 Values} <ADD> {§2.2 Values}
§ Subtraction
A subtraction is a sequence of two values that are separated by a -
token.
Sample code:
1 - 2
Token Definition:
{§2.2 Values} <SUB> {§2.2 Values}
§ Multiplication
A multiplication is a sequence of two values that are separated by a *
token.
Sample code:
1 * 2
Token Definition:
{§2.2 Values} <MUL> {§2.2 Values}
§ Division
A division is a sequence of two values that are separated by a /
token.
Sample code:
1 / 2
Token Definition:
{§2.2 Values} <DIV> {§2.2 Values}
§ Modulo
A modulo is a sequence of two values that are separated by a %
token.
Sample code:
1 % 2
Token Definition:
{§2.2 Values} <MOD> {§2.2 Values}
§ Power
A power is a sequence of two values that are separated by a **
token.
Sample code:
1 ** 2
Token Definition:
{§2.2 Values} <POW> {§2.2 Values}
§ Bitwise And
A bitwise and is a sequence of two values that are separated by a &
token.
Sample code:
1 & 2
Token Definition:
{§2.2 Values} <BIT_AND> {§2.2 Values}
§ Bitwise Or
A bitwise or is a sequence of two values that are separated by a |
token.
Sample code:
1 | 2
Token Definition:
{§2.2 Values} <BIT_OR> {§2.2 Values}
§ Bitwise Xor
A bitwise xor is a sequence of two values that are separated by a ^
token.
Sample code:
1 ^ 2
Token Definition:
{§2.2 Values} <BIT_XOR> {§2.2 Values}
§ Bitwise NAND
A bitwise nand is a sequence of two values that are separated by a ~&
token.
Sample code:
1 ~& 2
Token Definition:
{§2.2 Values} <BIT_NAND> {§2.2 Values}
§ Bitwise NOR
A bitwise nor is a sequence of two values that are separated by a ~|
token.
Sample code:
1 ~| 2
Token Definition:
{§2.2 Values} <BIT_NOR> {§2.2 Values}
§ Bitwise XNOR
A bitwise xnor is a sequence of two values that are separated by a ~^
token.
Sample code:
1 ~^ 2
Token Definition:
{§2.2 Values} <BIT_XNOR> {§2.2 Values}
§ Bitwise Not
A bitwise not is a single value that is preceded by a ~ token.
Sample code:
~1
Token Definition:
<BIT_NOT> {§2.2 Values}
§ Bitwise Shift Left (shl)
A bitwise shift left is a sequence of two values that are separated by a <<
token.
Sample code:
1 << 2
Token Definition:
{§2.2 Values} <BIT_SHIFT_LEFT> {§2.2 Values}
§ Bitwise Shift Right (shr)
A bitwise shift right is a sequence of two values that are separated by a >>
token.
Sample code:
1 >> 2
Token Definition:
{§2.2 Values} <BIT_SHIFT_RIGHT> {§2.2 Values}
§ Bitwise Unsigned Shift Right (ushr)
A bitwise unsigned shift right is a sequence of two values that are separated by a >>>
token.
Sample code:
1 >>> 2
Token Definition:
{§2.2 Values} <BIT_UNSIGNED_SHIFT_RIGHT> {§2.2 Values}
§ Unary Plus
A unary plus is a single value that is preceded by a + token.
Sample code:
+1
Token Definition:
<ADD> {§2.2 Values}
§ Unary Minus
A unary minus is a single value that is preceded by a - token.
Sample code:
-1
Token Definition:
<SUB> {§2.2 Values}
§ Logical And
A logical and is a sequence of two values that are separated by a &&
token.
Sample code:
1 && 2
Token Definition:
{§2.2 Values} <LOGICAL_AND> {§2.2 Values}
§ Logical Or
A logical or is a sequence of two values that are separated by a ||
token.
Sample code:
1 || 2
Token Definition:
{§2.2 Values} <LOGICAL_OR> {§2.2 Values}
§ Logical Xor
A logical xor is a sequence of two values that are separated by a ^^
token.
Sample code:
1 ^^ 2
Token Definition:
{§2.2 Values} <LOGICAL_XOR> {§2.2 Values}
§ Logical Not
A logical not is a single value that is preceded by a ! token.
Sample code:
!1
Token Definition:
<LOGICAL_NOT> {§2.2 Values}
§ Equals
An equals is a sequence of two values that are separated by a ==
token.
Sample code:
1 == 2
Token Definition:
{§2.2 Values} <EQUALS> {§2.2 Values}
§ Not Equals
A not equals is a sequence of two values that are separated by a !=
token.
Sample code:
1 != 2
Token Definition:
{§2.2 Values} <NOT_EQUALS> {§2.2 Values}
§ Greater Than
A greater than is a sequence of two values that are separated by a >
token.
Sample code:
1 > 2
Token Definition:
{§2.2 Values} <GREATER_THAN> {§2.2 Values}
§ Greater Than Or Equals
A greater than or equals is a sequence of two values that are separated by a >=
token.
Sample code:
1 >= 2
Token Definition:
{§2.2 Values} <GREATER_THAN_OR_EQUALS> {§2.2 Values}
§ Less Than
A less than is a sequence of two values that are separated by a <
token.
Sample code:
1 < 2
Token Definition:
{§2.2 Values} <LESS_THAN> {§2.2 Values}
§ Less Than Or Equals
A less than or equals is a sequence of two values that are separated by a <=
token.
Sample code:
1 <= 2
Token Definition:
{§2.2 Values} <LESS_THAN_OR_EQUALS> {§2.2 Values}
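Several operator symbols are prefixes of others (`>`, `>=`, `>>`, `>>>`), so a lexer has to pick between them somehow. A common approach is longest match, sketched below. The symbol-to-token mapping follows the definitions in this section, but `BIT_XNOR` is an assumed name chosen by analogy with `BIT_NAND` and `BIT_NOR`, and the longest-match strategy and the function name are assumptions as well.

```python
OPERATORS = {
    "+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV", "%": "MOD", "**": "POW",
    "&": "BIT_AND", "|": "BIT_OR", "^": "BIT_XOR",
    "~&": "BIT_NAND", "~|": "BIT_NOR",
    "~^": "BIT_XNOR",  # assumed name, by analogy with BIT_NAND and BIT_NOR
    "~": "BIT_NOT",
    "<<": "BIT_SHIFT_LEFT", ">>": "BIT_SHIFT_RIGHT",
    ">>>": "BIT_UNSIGNED_SHIFT_RIGHT",
    "&&": "LOGICAL_AND", "||": "LOGICAL_OR", "^^": "LOGICAL_XOR",
    "!": "LOGICAL_NOT",
    "==": "EQUALS", "!=": "NOT_EQUALS",
    ">": "GREATER_THAN", ">=": "GREATER_THAN_OR_EQUALS",
    "<": "LESS_THAN", "<=": "LESS_THAN_OR_EQUALS",
}

# Try longer symbols first so that ">>>" is not read as ">>" followed by ">".
_BY_LENGTH = sorted(OPERATORS, key=len, reverse=True)


def match_operator(source: str, pos: int):
    """Return (token_type, length) for the operator starting at pos, or None."""
    for symbol in _BY_LENGTH:
        if source.startswith(symbol, pos):
            return OPERATORS[symbol], len(symbol)
    return None
```

Note that `+` and `-` map to a single token type each. Whether they act as unary or binary operators is decided by the parser, as the token definitions for Unary Plus and Unary Minus indicate.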
§ 2.4 Variable Declaration [Local]
Var
A variable declaration is a sequence of tokens that declares a variable. It consists of a name, an optional type and an optional value.
Sample code:
var a: int
var b = 1
Token Definition:
<KEYWORD_VAR> <IDENTIFIER> (<COLON> {Type})? (<ASSIGN> {§2.2 Values})?
Val
A value declaration is a sequence of tokens that declare a value. It consists of a type, a name and a value.
Sample code:
val a: int = 1
Token Definition:
<KEYWORD_VAL> <IDENTIFIER> <COLON> {Type} <ASSIGN> {§2.2 Values}
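The two token definitions above differ only in which parts are optional, which a small parser sketch makes explicit. Tokens are modelled here as `(type, value)` tuples and a single INTEGER token stands in for {§2.2 Values}. Both choices, along with the function name and the shape of the result, are simplifications for illustration.

```python
from typing import List, Optional, Tuple

Tok = Tuple[str, Optional[str]]  # (token type, value)


def parse_local(tokens: List[Tok]) -> dict:
    """Parse a local var or val declaration from a complete token list."""
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos][0] if pos < len(tokens) else None

    def take(expected: str) -> Optional[str]:
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected}, found {peek()}")
        value = tokens[pos][1]
        pos += 1
        return value

    keyword = peek()
    if keyword not in ("KEYWORD_VAR", "KEYWORD_VAL"):
        raise SyntaxError("expected var or val")
    take(keyword)
    is_val = keyword == "KEYWORD_VAL"
    name = take("IDENTIFIER")

    type_name = None
    if is_val or peek() == "COLON":  # the type is mandatory for val
        take("COLON")
        type_name = take("IDENTIFIER")  # {Type} is an <IDENTIFIER>

    value = None
    if is_val or peek() == "ASSIGN":  # the value is mandatory for val
        take("ASSIGN")
        value = take("INTEGER")  # stand-in for {§2.2 Values}

    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return {"kind": "val" if is_val else "var", "name": name,
            "type": type_name, "value": value}
```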
§ 2.5 Field Declaration [Global]
Var
A field declaration is a sequence of tokens that declares a field. It consists of an optional access modifier, a name, an optional type and an optional value.
Sample code:
var a: int
var b = 1
Token Definition:
{Access}? <KEYWORD_VAR> <IDENTIFIER> (<COLON> {Type})? (<ASSIGN> {§2.2 Values})?
Val
A value declaration is a sequence of tokens that declares a value. It consists of an optional access modifier, a name, a type and a value.
Sample code:
val a: int = 1
Token Definition:
{Access}? <KEYWORD_VAL> <IDENTIFIER> <COLON> {Type} <ASSIGN> {§2.2 Values}
§ 2.6 Function Declaration
A function declaration is a sequence of tokens that declare a function. It consists of an optional access modifier, a type, a name, a list of parameters and a body.
Sample code:
public int add(int a, int b) {
return a + b
}
Token Definition:
{Access}?
{Type}
<IDENTIFIER>
<LPAREN>
(
{§2.4 Variable Declaration [Local]}
(
<COMMA>
{§2.4 Variable Declaration [Local]}
)*
)?
<RPAREN> {Block}
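The parameter part of this definition follows the pattern "optional first item, then any number of comma-prefixed items". A sketch of that pattern is shown below. Tokens are `(type, value)` tuples and the item parser is passed in as a callback, both of which are assumptions for illustration. `parse_name` is a hypothetical stand-in that accepts a single IDENTIFIER instead of a full parameter declaration.

```python
from typing import Callable, List, Tuple

Tok = Tuple[str, str]  # (token type, value)


def parse_parameters(tokens: List[Tok], pos: int,
                     parse_item: Callable[[List[Tok], int], Tuple[object, int]]):
    """Parse <LPAREN> (item (<COMMA> item)*)? <RPAREN> starting at pos."""

    def expect(kind: str, at: int) -> int:
        if at >= len(tokens) or tokens[at][0] != kind:
            raise SyntaxError(f"expected {kind} at token {at}")
        return at + 1

    pos = expect("LPAREN", pos)
    items = []
    if pos < len(tokens) and tokens[pos][0] != "RPAREN":
        item, pos = parse_item(tokens, pos)  # first item
        items.append(item)
        while pos < len(tokens) and tokens[pos][0] == "COMMA":
            item, pos = parse_item(tokens, pos + 1)  # skip the comma
            items.append(item)
    pos = expect("RPAREN", pos)
    return items, pos


def parse_name(tokens: List[Tok], pos: int):
    # Hypothetical stand-in for a full parameter declaration: one IDENTIFIER.
    if pos >= len(tokens) or tokens[pos][0] != "IDENTIFIER":
        raise SyntaxError(f"expected IDENTIFIER at token {pos}")
    return tokens[pos][1], pos + 1
```

Because a comma must always be followed by another item, a trailing comma such as `(a, )` is rejected, which is what the `( ... (<COMMA> ...)* )?` shape implies.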
§ 2.7 Class Declaration
A class declaration is a sequence of tokens that declares a class. It consists of an optional access modifier, a name and a body that can contain fields, functions, constructors and nested classes.
Sample code:
public class Test {
int a
int b = 1
public int add(int a, int b) {
return a + b
}
}
Token Definition:
{Access}?
<KEYWORD_CLASS>
<IDENTIFIER>
<LCURLY>
({§2.5 Field Declaration [Global]} | {§2.6 Function Declaration} | {§2.7 Class Declaration} | {§2.7.1 Constructor Declaration})*
<RCURLY>
§ 2.7.1 Constructor Declaration
A constructor declaration is a sequence of tokens that declare a constructor. It consists of an optional access modifier, the constructor keyword, optionally a name, a list of parameters and a body.
Sample code:
constructor() {
this.a = 1
}
constructor test(int a) {
this.a = a
}
Token Definition:
{Access}?
<KEYWORD_CONSTRUCTOR>
<IDENTIFIER>?
<LPAREN>
(
{§2.4 Variable Declaration [Local]}
(
<COMMA>
{§2.4 Variable Declaration [Local]}
)*
)?
<RPAREN> {Block}
§ 2.8 Block
A block groups a list of statements. It either starts with a { token, contains a list of statements and ends with a } token, or it is a single statement.
Sample code:
{
int a = 1
int b = 2
}
Token Definition:
<LCURLY>
({§2.1 Statements})*
<RCURLY>
§ 2.9 If Statement
An if statement is a sequence of tokens that form an if statement. It consists of a condition and a body. It can also have an else branch.
Sample code:
if (a == 1) {
return 1
} else {
return 2
}
Token Definition:
<KEYWORD_IF> <LPAREN> {§2.2 Values} <RPAREN> {Block} (<KEYWORD_ELSE> {Block})?
§ 2.10 While Statement
A while statement is a sequence of tokens that form a while statement. It consists of a condition and a body.
Sample code:
while (a == 1) {
return 1
}
Token Definition:
<KEYWORD_WHILE> <LPAREN> {§2.2 Values} <RPAREN> {Block}
§ 2.11 For Statement
A for statement is a sequence of tokens that form a for statement. It consists of a variable declaration, a condition, an increment and a body.
Sample code:
for (int i = 0; i < 10; i = i + 1) {
return i
}
Token Definition:
<KEYWORD_FOR> <LPAREN> {§2.1 Statements} <SEMICOLON> {§2.2 Values} <SEMICOLON> {§2.1 Statements} <RPAREN> {Block}
§ 2.12 Return Statement
A return statement is a sequence of tokens that form a return statement. It consists of a value.
Sample code:
return 1
Token Definition:
<KEYWORD_RETURN> {§2.2 Values}
§ 2.13 Break Statement
A break statement is a sequence of tokens that form a break statement.
Sample code:
break
Token Definition:
<KEYWORD_BREAK>
§ 2.14 Continue Statement
A continue statement is a sequence of tokens that form a continue statement.
Sample code:
continue
Token Definition:
<KEYWORD_CONTINUE>
§ 2.15 Import Statement
An import statement is a sequence of tokens that form an import statement. It consists of a dot-separated list of identifiers, optionally followed by a dot and a star token.
Sample code:
import io.github.shakelang.shake
Token Definition:
<KEYWORD_IMPORT> <IDENTIFIER> (<DOT> <IDENTIFIER>)* (<DOT> <STAR>)?
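The import definition above can be checked against sample lines with a single pattern. The sketch works on raw text rather than on tokens, and the identifier character set is an assumption. `parse_import` and its return shape are hypothetical.

```python
import re

# <KEYWORD_IMPORT> <IDENTIFIER> (<DOT> <IDENTIFIER>)* (<DOT> <STAR>)?
IMPORT_RE = re.compile(
    r"import\s+([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)(\.\*)?"
)


def parse_import(line: str):
    """Return (path segments, ends_with_star) for an import statement."""
    match = IMPORT_RE.fullmatch(line.strip())
    if match is None:
        raise SyntaxError(f"not an import statement: {line!r}")
    return match.group(1).split("."), match.group(2) is not None
```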
§ 2.16 Package Statement
A package statement is a sequence of tokens that form a package statement. It consists of a dot-separated list of identifiers.
Sample code:
package io.github.shakelang.shake
Token Definition:
<KEYWORD_PACKAGE> <IDENTIFIER> (<DOT> <IDENTIFIER>)*