
Shake Parser Structures Specification

§ 1 Definitions

§ 1.1 Tokens

A token is a sequence of characters that form a meaningful unit in a program. It holds the following information:

  • The type of the token
  • The position of the token in the source code (its start and end index)
  • The value of the token (if applicable)
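The token layout above could be modelled in Python as follows. This is a sketch, not the Shake lexer's class: the field names, the `text` helper and the choice of an exclusive end index are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    """A lexer token as described in § 1.1 (illustrative sketch)."""
    type: str                    # token type, e.g. "INTEGER" or "SEMICOLON"
    start: int                   # start index in the source code
    end: int                     # end index, assumed exclusive
    value: Optional[str] = None  # only set for value-carrying types

    def text(self, source: str) -> str:
        """Return the slice of the source code this token covers."""
        return source[self.start:self.end]
```

For example, in the source `a = 42;` the integer literal would be `Token("INTEGER", 4, 6, "42")`, while the semicolon needs no value: `Token("SEMICOLON", 6, 7)`.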

§ 1.1.1 Token Types

There are two kinds of token types. Some types always have the same value and therefore do not need to store it in the token (e.g. SEMICOLON or any keyword). Others can have different values (e.g. INTEGER or IDENTIFIER) and store their value in the token. Token types are implemented as an enum in the lexer package.

§ 1.1.2 Token Values

Not every token needs to hold a value. Only tokens whose type can have different values (e.g. INTEGER or IDENTIFIER) hold one. The value is stored as a string in the token.
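As a sketch of the two kinds of token types, the Python enum below marks which types carry a value. Only four example members are shown; their names come from this section, but the enum layout, the `has_value` property and the helper `token_value` are hypothetical.

```python
from enum import Enum, auto
from typing import Optional


class TokenType(Enum):
    """A hypothetical subset of token types, not the real lexer enum."""
    SEMICOLON = auto()    # always ";"
    KEYWORD_VAR = auto()  # always "var"
    INTEGER = auto()      # e.g. "1", "42"
    IDENTIFIER = auto()   # e.g. "hello"

    @property
    def has_value(self) -> bool:
        # Only types whose text can differ need to store a value.
        return self in _VALUE_TYPES


_VALUE_TYPES = {TokenType.INTEGER, TokenType.IDENTIFIER}


def token_value(token_type: TokenType, lexeme: str) -> Optional[str]:
    """Return the string to store in the token, or None for fixed-value types."""
    return lexeme if token_type.has_value else None
```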

§ 1.2 Structure

A structure is a sequence of tokens that form a meaningful unit in a program.

§ 2 Structures

In this section we define all the structures that the parser refers to. When describing structures, we wrap token types in angle brackets (<>) to indicate that they are tokens. We refer to other structures by their name and paragraph number and wrap them in curly brackets ({}). Parentheses group elements, ? marks an optional element, * marks an element that may repeat zero or more times, and | separates alternatives.

§ 2.0 Helper Structures

§ Access Modifier

An access modifier is a single token that defines the visibility of a structure. It is one of public, private or protected.

Sample code:

public

Token Definition:

<KEYWORD_PUBLIC> | <KEYWORD_PRIVATE> | <KEYWORD_PROTECTED>
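A parser can treat the access modifier as a single optional token. The hypothetical `parse_access` below represents tokens by their type names only and returns the modifier together with the position after it.

```python
from typing import List, Optional, Tuple

# Access-modifier token types from the definition above, mapped to names.
ACCESS_MODIFIERS = {
    "KEYWORD_PUBLIC": "public",
    "KEYWORD_PRIVATE": "private",
    "KEYWORD_PROTECTED": "protected",
}


def parse_access(tokens: List[str], pos: int) -> Tuple[Optional[str], int]:
    """Consume an optional access modifier at tokens[pos].

    Returns (modifier or None, next position); tokens are type names here.
    """
    if pos < len(tokens) and tokens[pos] in ACCESS_MODIFIERS:
        return ACCESS_MODIFIERS[tokens[pos]], pos + 1
    return None, pos
```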

§ Type

A type specifies the type of a structure. It can be one of the primitive types byte, short, int, long, float, double, boolean or char; byte, short, int and long can additionally be unsigned. It can also be an identifier that references an object-oriented type.

Sample code:

int

Token Definition:

<IDENTIFIER>

There are also some special identifiers reserved for the primitive types:

Identifier | Type
---------- | ---------------
byte       | Byte (int8)
short      | Short (int16)
int        | Int (int32)
long       | Long (int64)
float      | Float (float32)
double     | Double (float64)
boolean    | Boolean
char       | Char
ubyte      | UByte (uint8)
ushort     | UShort (uint16)
uint       | UInt (uint32)
ulong      | ULong (uint64)
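Because a type is syntactically just an <IDENTIFIER>, the distinction between primitive and object types can be made after parsing by looking the name up in the table above. The Python sketch below does this; `ResolvedType` and `resolve_type` are illustrative names, not part of the Shake codebase.

```python
from dataclasses import dataclass

# Reserved identifiers from the table above, mapped to their type names.
PRIMITIVE_TYPES = {
    "byte": "Byte", "short": "Short", "int": "Int", "long": "Long",
    "float": "Float", "double": "Double", "boolean": "Boolean", "char": "Char",
    "ubyte": "UByte", "ushort": "UShort", "uint": "UInt", "ulong": "ULong",
}


@dataclass(frozen=True)
class ResolvedType:
    name: str
    primitive: bool  # False means the identifier names an object type


def resolve_type(identifier: str) -> ResolvedType:
    """Resolve the <IDENTIFIER> of a {Type} to a primitive or object type."""
    if identifier in PRIMITIVE_TYPES:
        return ResolvedType(PRIMITIVE_TYPES[identifier], True)
    return ResolvedType(identifier, False)
```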

§ 2.1 Statements

A statement refers to a structure that can be executed.

§ 2.2 Values

A value refers to a structure that can be used as a value.

§ 2.3 Expressions

This section covers all structures that are referred to as expressions.

§ Number

A number is a numeric literal. It can be either an integer or a floating-point number and is represented by an INTEGER or a FLOAT token.

Sample code:

1
1.0

Token Definition:

<INTEGER> | <FLOAT>
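One way a lexer could decide between INTEGER and FLOAT is to scan digits and switch to FLOAT only when a '.' is followed by another digit. The sketch below assumes plain decimal literals: exponents, suffixes, hexadecimal forms and literals starting with '.' are not handled, since this section does not describe them. `lex_number` is a hypothetical helper.

```python
from typing import Tuple

DIGITS = "0123456789"


def lex_number(source: str, start: int) -> Tuple[str, str, int]:
    """Scan a number literal that starts at source[start].

    Returns (token type, value, end index). A '.' followed by a digit makes
    the literal a FLOAT; otherwise it stays an INTEGER.
    """
    pos = start
    while pos < len(source) and source[pos] in DIGITS:
        pos += 1
    if pos == start:
        raise ValueError("number must start with a digit")
    token_type = "INTEGER"
    if pos + 1 < len(source) and source[pos] == "." and source[pos + 1] in DIGITS:
        token_type = "FLOAT"
        pos += 1  # skip the '.'
        while pos < len(source) and source[pos] in DIGITS:
            pos += 1
    return token_type, source[start:pos], pos
```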

§ String

A string is a sequence of characters enclosed in double quotes. It is represented by a STRING token.

Sample code:

"Hello World"

Token Definition:

<STRING>

§ Identifier

An identifier is a sequence of characters that form a name. It is represented by an IDENTIFIER token.

Sample code:

hello

Token Definition:

<IDENTIFIER>

§ Identifier 2

If an identifier contains special characters (such as spaces), it can be wrapped in backticks:

`hello world`

Token Definition:

`<IDENTIFIER>`
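The lexer has to recognise both identifier forms. In this sketch, the characters allowed in a plain identifier (letters, digits and '_', not starting with a digit) are an assumption, as is the rule that a quoted identifier runs until the next backtick; in both cases the token value is the bare name. `lex_identifier` is hypothetical.

```python
from typing import Tuple


def lex_identifier(source: str, start: int) -> Tuple[str, int]:
    """Scan an identifier at source[start]; returns (name, end index)."""
    if source[start] == "`":
        # Quoted form: everything up to the closing backtick is the name.
        end = source.find("`", start + 1)
        if end == -1:
            raise ValueError("unterminated quoted identifier")
        return source[start + 1:end], end + 1
    # Plain form (assumed character set): letters, digits and underscores.
    pos = start
    while pos < len(source) and (source[pos].isalnum() or source[pos] == "_"):
        pos += 1
    if pos == start or source[start].isdigit():
        raise ValueError("not an identifier")
    return source[start:pos], pos
```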

§ Group

A group is a value surrounded by parentheses.

Sample code:

(...)

Token Definition:

<LPAREN> {§2.2 Values} <RPAREN>
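Since a group contains a value, which may itself be a group, a recursive-descent parser handles it naturally. To keep this Python sketch short, a value is assumed to be only an <INTEGER> or another group, and tokens are (type, value) pairs rather than full tokens; `Group` and `parse_value` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

Token = Tuple[str, str]  # (type, value); a simplification of the real token


@dataclass
class Group:
    inner: object  # the value between the parentheses


def parse_value(tokens: List[Token], pos: int) -> Tuple[object, int]:
    """Parse an <INTEGER> or a group <LPAREN> value <RPAREN>.

    Returns (parsed value, position after it).
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, text = tokens[pos]
    if kind == "INTEGER":
        return int(text), pos + 1
    if kind == "LPAREN":
        inner, pos = parse_value(tokens, pos + 1)  # recurse into the group
        if pos >= len(tokens) or tokens[pos][0] != "RPAREN":
            raise SyntaxError("expected <RPAREN>")
        return Group(inner), pos + 1
    raise SyntaxError(f"unexpected <{kind}>")
```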

§ Operators
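Several operator symbols share a prefix (for example >, >=, >> and >>>), so a lexer has to decide how many characters belong to one token. A common approach, assumed here rather than taken from the Shake lexer, is maximal munch: always take the longest symbol that matches. The Python sketch maps the symbols from the entries below to their token types; `lex_operators` is a hypothetical helper that only handles operators and whitespace.

```python
from typing import List, Tuple

# Operator symbols from the entries below, mapped to their token types.
# BIT_XNOR is an assumed name, chosen by analogy with BIT_NAND and BIT_NOR.
OPERATORS = {
    "+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV", "%": "MOD", "**": "POW",
    "&": "BIT_AND", "|": "BIT_OR", "^": "BIT_XOR", "~": "BIT_NOT",
    "~&": "BIT_NAND", "~|": "BIT_NOR", "~^": "BIT_XNOR",
    "<<": "BIT_SHIFT_LEFT", ">>": "BIT_SHIFT_RIGHT", ">>>": "BIT_UNSIGNED_SHIFT_RIGHT",
    "&&": "LOGICAL_AND", "||": "LOGICAL_OR", "^^": "LOGICAL_XOR", "!": "LOGICAL_NOT",
    "==": "EQUALS", "!=": "NOT_EQUALS",
    ">": "GREATER_THAN", ">=": "GREATER_THAN_OR_EQUALS",
    "<": "LESS_THAN", "<=": "LESS_THAN_OR_EQUALS",
}
LONGEST = max(len(symbol) for symbol in OPERATORS)


def lex_operators(source: str) -> List[Tuple[str, str]]:
    """Tokenize a string made only of operators and whitespace.

    At each position the longest matching symbol wins, so ">>>" becomes
    one token instead of ">>" followed by ">".
    """
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        for length in range(LONGEST, 0, -1):
            candidate = source[pos:pos + length]
            if candidate in OPERATORS:
                tokens.append((OPERATORS[candidate], candidate))
                pos += len(candidate)
                break
        else:  # no symbol of any length matched
            raise ValueError(f"unexpected character {source[pos]!r}")
    return tokens
```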

§ Addition

An addition is a sequence of two values that are separated by a + token.

Sample code:

1 + 2

Token Definition:

{§2.2 Values} <ADD> {§2.2 Values}

§ Subtraction

A subtraction is a sequence of two values that are separated by a - token.

Sample code:

1 - 2

Token Definition:

{§2.2 Values} <SUB> {§2.2 Values}

§ Multiplication

A multiplication is a sequence of two values that are separated by a * token.

Sample code:

1 * 2

Token Definition:

{§2.2 Values} <MUL> {§2.2 Values}

§ Division

A division is a sequence of two values that are separated by a / token.

Sample code:

1 / 2

Token Definition:

{§2.2 Values} <DIV> {§2.2 Values}

§ Modulo

A modulo is a sequence of two values that are separated by a % token.

Sample code:

1 % 2

Token Definition:

{§2.2 Values} <MOD> {§2.2 Values}

§ Power

A power is a sequence of two values that are separated by a ** token.

Sample code:

1 ** 2

Token Definition:

{§2.2 Values} <POW> {§2.2 Values}

§ Bitwise And

A bitwise and is a sequence of two values that are separated by a & token.

Sample code:

1 & 2

Token Definition:

{§2.2 Values} <BIT_AND> {§2.2 Values}

§ Bitwise Or

A bitwise or is a sequence of two values that are separated by a | token.

Sample code:

1 | 2

Token Definition:

{§2.2 Values} <BIT_OR> {§2.2 Values}

§ Bitwise Xor

A bitwise xor is a sequence of two values that are separated by a ^ token.

Sample code:

1 ^ 2

Token Definition:

{§2.2 Values} <BIT_XOR> {§2.2 Values}

§ Bitwise NAND

A bitwise nand is a sequence of two values that are separated by a ~& token.

Sample code:

1 ~& 2

Token Definition:

{§2.2 Values} <BIT_NAND> {§2.2 Values}

§ Bitwise NOR

A bitwise nor is a sequence of two values that are separated by a ~| token.

Sample code:

1 ~| 2

Token Definition:

{§2.2 Values} <BIT_NOR> {§2.2 Values}

§ Bitwise XNOR

A bitwise xnor is a sequence of two values that are separated by a ~^ token.

Sample code:

1 ~^ 2

Token Definition:

{§2.2 Values} <BIT_XNOR> {§2.2 Values}

§ Bitwise Not

A bitwise not is a single value prefixed by a ~ token.

Sample code:

~1

Token Definition:

<BIT_NOT> {§2.2 Values}

§ Bitwise Shift Left (shl)

A bitwise shift left is a sequence of two values that are separated by a << token.

Sample code:

1 << 2

Token Definition:

{§2.2 Values} <BIT_SHIFT_LEFT> {§2.2 Values}

§ Bitwise Shift Right (shr)

A bitwise shift right is a sequence of two values that are separated by a >> token.

Sample code:

1 >> 2

Token Definition:

{§2.2 Values} <BIT_SHIFT_RIGHT> {§2.2 Values}

§ Bitwise Unsigned Shift Right (ushr)

A bitwise unsigned shift right is a sequence of two values that are separated by a >>> token.

Sample code:

1 >>> 2

Token Definition:

{§2.2 Values} <BIT_UNSIGNED_SHIFT_RIGHT> {§2.2 Values}

§ Unary Plus

A unary plus is a single value prefixed by a + token.

Sample code:

+1

Token Definition:

<ADD> {§2.2 Values}

§ Unary Minus

A unary minus is a single value prefixed by a - token.

Sample code:

-1

Token Definition:

<SUB> {§2.2 Values}

§ Logical And

A logical and is a sequence of two values that are separated by a && token.

Sample code:

1 && 2

Token Definition:

{§2.2 Values} <LOGICAL_AND> {§2.2 Values}

§ Logical Or

A logical or is a sequence of two values that are separated by a || token.

Sample code:

1 || 2

Token Definition:

{§2.2 Values} <LOGICAL_OR> {§2.2 Values}

§ Logical Xor

A logical xor is a sequence of two values that are separated by a ^^ token.

Sample code:

1 ^^ 2

Token Definition:

{§2.2 Values} <LOGICAL_XOR> {§2.2 Values}

§ Logical Not

A logical not is a single value prefixed by a ! token.

Sample code:

!1

Token Definition:

<LOGICAL_NOT> {§2.2 Values}

§ Equals

An equals is a sequence of two values that are separated by a == token.

Sample code:

1 == 2

Token Definition:

{§2.2 Values} <EQUALS> {§2.2 Values}

§ Not Equals

A not equals is a sequence of two values that are separated by a != token.

Sample code:

1 != 2

Token Definition:

{§2.2 Values} <NOT_EQUALS> {§2.2 Values}

§ Greater Than

A greater than is a sequence of two values that are separated by a > token.

Sample code:

1 > 2

Token Definition:

{§2.2 Values} <GREATER_THAN> {§2.2 Values}

§ Greater Than Or Equals

A greater than or equals is a sequence of two values that are separated by a >= token.

Sample code:

1 >= 2

Token Definition:

{§2.2 Values} <GREATER_THAN_OR_EQUALS> {§2.2 Values}

§ Less Than

A less than is a sequence of two values that are separated by a < token.

Sample code:

1 < 2

Token Definition:

{§2.2 Values} <LESS_THAN> {§2.2 Values}

§ Less Than Or Equals

A less than or equals is a sequence of two values that are separated by a <= token.

Sample code:

1 <= 2

Token Definition:

{§2.2 Values} <LESS_THAN_OR_EQUALS> {§2.2 Values}
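The entries above define each operator in isolation and do not say how mixed expressions such as 1 + 2 * 3 group. A parser needs precedence and associativity for that; the table in this Python sketch is an assumption loosely modelled on C and Java (with ** binding tightest and grouping to the right), not something this specification defines. It uses precedence climbing over (type, value) token pairs, treats <ADD>, <SUB>, <BIT_NOT> and <LOGICAL_NOT> as prefix operators when they appear before an operand, and builds nested tuples. `Parser` is a hypothetical class.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (type, value); a simplification of the real token

# Binding power of each binary operator token (higher binds tighter).
# ASSUMPTION: the spec defines no precedence; this order follows C/Java.
BINARY = {
    "LOGICAL_OR": 1, "LOGICAL_XOR": 2, "LOGICAL_AND": 3,
    "BIT_OR": 4, "BIT_NOR": 4, "BIT_XOR": 5, "BIT_XNOR": 5,  # BIT_XNOR name assumed
    "BIT_AND": 6, "BIT_NAND": 6,
    "EQUALS": 7, "NOT_EQUALS": 7,
    "LESS_THAN": 8, "LESS_THAN_OR_EQUALS": 8,
    "GREATER_THAN": 8, "GREATER_THAN_OR_EQUALS": 8,
    "BIT_SHIFT_LEFT": 9, "BIT_SHIFT_RIGHT": 9, "BIT_UNSIGNED_SHIFT_RIGHT": 9,
    "ADD": 10, "SUB": 10,
    "MUL": 11, "DIV": 11, "MOD": 11,
    "POW": 13,
}
RIGHT_ASSOC = {"POW"}                 # 2 ** 3 ** 2 groups as 2 ** (3 ** 2)
PREFIX = {"ADD", "SUB", "BIT_NOT", "LOGICAL_NOT"}
PREFIX_POWER = 12                     # tighter than *, looser than **


class Parser:
    def __init__(self, tokens: List[Token]):
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def parse(self, min_power: int = 0):
        """Parse an operand, then fold in binary operators stronger than min_power."""
        left = self.parse_prefix()
        while self.peek() in BINARY and BINARY[self.peek()] > min_power:
            op = self.peek()
            self.pos += 1
            power = BINARY[op]
            # Left-associative operators stop at the same level on the right;
            # right-associative ones may recurse into the same level.
            right = self.parse(power - 1 if op in RIGHT_ASSOC else power)
            left = (op, left, right)
        return left

    def parse_prefix(self):
        """Parse a prefix operator, an <INTEGER> or a parenthesised group."""
        kind = self.peek()
        if kind in PREFIX:
            self.pos += 1
            return (kind, self.parse(PREFIX_POWER))
        if kind == "INTEGER":
            self.pos += 1
            return int(self.tokens[self.pos - 1][1])
        if kind == "LPAREN":
            self.pos += 1
            inner = self.parse()
            if self.peek() != "RPAREN":
                raise SyntaxError("expected <RPAREN>")
            self.pos += 1
            return inner
        raise SyntaxError(f"unexpected <{kind}>")
```

With these assumed powers, `Parser(tokens).parse()` on the tokens of `1 + 2 * 3` returns `("ADD", 1, ("MUL", 2, 3))`, and `-2 ** 2` parses as `("SUB", ("POW", 2, 2))`.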

§ 2.4 Variable Declaration [Local]

Var

A variable declaration declares a variable. It consists of the var keyword, a name, an optional type and an optional value.

Sample code:

var a: int
var b = 1

Token Definition:

<KEYWORD_VAR> <IDENTIFIER> (<COLON> {Type})? (<ASSIGN> {§2.2 Values})?
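The optional type and value map directly to two optional checks in a recursive-descent parser. In this sketch, {Type} is read as a single <IDENTIFIER> and the value is simplified to a single <INTEGER> token; `VarDeclaration`, `expect` and `parse_var` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (type, value); a simplification of the real token


@dataclass
class VarDeclaration:
    name: str
    type: Optional[str]   # None when the type is omitted
    value: Optional[str]  # None when no value is assigned


def expect(tokens: List[Token], pos: int, kind: str) -> str:
    """Return the value of tokens[pos], failing unless it has type `kind`."""
    if pos >= len(tokens) or tokens[pos][0] != kind:
        raise SyntaxError(f"expected <{kind}> at token {pos}")
    return tokens[pos][1]


def parse_var(tokens: List[Token], pos: int) -> Tuple[VarDeclaration, int]:
    expect(tokens, pos, "KEYWORD_VAR")
    name = expect(tokens, pos + 1, "IDENTIFIER")
    pos += 2
    var_type = value = None
    if pos < len(tokens) and tokens[pos][0] == "COLON":
        var_type = expect(tokens, pos + 1, "IDENTIFIER")  # {Type}
        pos += 2
    if pos < len(tokens) and tokens[pos][0] == "ASSIGN":
        value = expect(tokens, pos + 1, "INTEGER")  # stand-in for {§2.2 Values}
        pos += 2
    return VarDeclaration(name, var_type, value), pos
```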

Val

A value declaration declares a value. It consists of the val keyword, a name, a type and a value.

Sample code:

val a: int = 1

Token Definition:

<KEYWORD_VAL> <IDENTIFIER> <COLON> {Type} <ASSIGN> {§2.2 Values}
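Every part of a val declaration is required, so the token types can be matched against a fixed pattern. In this sketch {Type} is simplified to one <IDENTIFIER> and the value to one <INTEGER>; `VAL_PATTERN` and `parse_val` are hypothetical names, not the Shake parser.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (type, value); a simplification of the real token

# Expected token types of `val a: int = 1`, in order.
VAL_PATTERN = ["KEYWORD_VAL", "IDENTIFIER", "COLON", "IDENTIFIER", "ASSIGN", "INTEGER"]


def parse_val(tokens: List[Token], pos: int):
    """Match the fixed val pattern; returns ((name, type, value), next pos)."""
    matched = tokens[pos:pos + len(VAL_PATTERN)]
    if [kind for kind, _ in matched] != VAL_PATTERN:
        raise SyntaxError("malformed val declaration")
    _, (_, name), _, (_, val_type), _, (_, value) = matched
    return (name, val_type, value), pos + len(VAL_PATTERN)
```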

§ 2.5 Field Declaration [Global]

Var

A field declaration declares a field. It consists of an access modifier, the var keyword, a name, an optional type and an optional value.

Sample code:

var a: int
var b = 1

Token Definition:

{Access} <KEYWORD_VAR> <IDENTIFIER> (<COLON> {Type})? (<ASSIGN> {§2.2 Values})?

Val

A value field declaration declares a field that holds a value. It consists of an access modifier, the val keyword, a name, a type and a value.

Sample code:

val a: int = 1

Token Definition:

{Access} <KEYWORD_VAL> <IDENTIFIER> <COLON> {Type} <ASSIGN> {§2.2 Values}

§ 2.6 Function Declaration

A function declaration declares a function. It consists of an optional access modifier, a return type, a name, a list of parameters and a body.

Sample code:

public int add(int a, int b) {
    return a + b
}

Token Definition:

{Access}?
{Type}
<IDENTIFIER>
<LPAREN>
(
{§2.4 Variable Declaration [Local]}
(
<COMMA>
{§2.4 Variable Declaration [Local]}
)*
)?
<RPAREN> {Block}
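The parameter list follows the usual pattern of an optional, comma-separated list between parentheses. The grammar uses {§2.4 Variable Declaration [Local]} for parameters while the sample writes `int a`, so this sketch takes the parameter parser as an argument. `parse_parameters` and the example `type_and_name` parser (which reads the `int a` form from the sample) are hypothetical.

```python
from typing import Callable, List, Tuple, TypeVar

Token = Tuple[str, str]  # (type, value); a simplification of the real token
T = TypeVar("T")


def parse_parameters(
    tokens: List[Token],
    pos: int,
    parse_param: Callable[[List[Token], int], Tuple[T, int]],
) -> Tuple[List[T], int]:
    """Parse <LPAREN> (param (<COMMA> param)*)? <RPAREN>."""
    def kind_at(i: int) -> str:
        return tokens[i][0] if i < len(tokens) else "EOF"

    if kind_at(pos) != "LPAREN":
        raise SyntaxError("expected <LPAREN>")
    pos += 1
    params: List[T] = []
    if kind_at(pos) != "RPAREN":  # the list itself is optional
        param, pos = parse_param(tokens, pos)
        params.append(param)
        while kind_at(pos) == "COMMA":
            param, pos = parse_param(tokens, pos + 1)
            params.append(param)
    if kind_at(pos) != "RPAREN":
        raise SyntaxError("expected <COMMA> or <RPAREN>")
    return params, pos + 1


def type_and_name(tokens: List[Token], pos: int) -> Tuple[Tuple[str, str], int]:
    """Hypothetical parameter parser for the `int a` form of the sample."""
    return (tokens[pos][1], tokens[pos + 1][1]), pos + 2
```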

§ 2.7 Class Declaration

A class declaration declares a class. It consists of an optional access modifier, the class keyword, a name and a body that can contain fields, functions, nested classes and constructors.

Sample code:

public class Test {
    int a
    int b = 1

    public int add(int a, int b) {
        return a + b
    }
}

Token Definition:

{Access}?
<KEYWORD_CLASS>
<IDENTIFIER>
<LCURLY>
({§2.5 Field Declaration [Global]} | {§2.6 Function Declaration} | {§2.7 Class Declaration} | {§2.7.1 Constructor Declaration})*
<RCURLY>
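Each class member can be recognised from its first token once an optional access modifier has been skipped. This Python sketch follows the grammar, where fields start with var or val (§2.5); the `int a` fields in the sample would need more lookahead to tell them apart from `int add(...)`. `member_kind` is a hypothetical helper.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (type, value); a simplification of the real token
ACCESS = {"KEYWORD_PUBLIC", "KEYWORD_PRIVATE", "KEYWORD_PROTECTED"}


def member_kind(tokens: List[Token], pos: int) -> str:
    """Decide which class member starts at tokens[pos] (one-token lookahead)."""
    if pos < len(tokens) and tokens[pos][0] in ACCESS:
        pos += 1  # skip the optional {Access}
    kind = tokens[pos][0] if pos < len(tokens) else "EOF"
    if kind in ("KEYWORD_VAR", "KEYWORD_VAL"):
        return "field"
    if kind == "KEYWORD_CLASS":
        return "class"
    if kind == "KEYWORD_CONSTRUCTOR":
        return "constructor"
    if kind == "IDENTIFIER":
        return "function"  # {Type} <IDENTIFIER> <LPAREN> ...
    raise SyntaxError(f"unexpected <{kind}> in class body")
```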

§ 2.7.1 Constructor Declaration

A constructor declaration is a sequence of tokens that declare a constructor. It consists of an optional access modifier, the constructor keyword, optionally a name, a list of parameters and a body.

Sample code:

constructor() {
    this.a = 1
}

constructor test(int a) {
    this.a = a
}

Token Definition:

{Access}?
<KEYWORD_CONSTRUCTOR>
<IDENTIFIER>?
<LPAREN>
(
{§2.4 Variable Declaration [Local]}
(
<COMMA>
{§2.4 Variable Declaration [Local]}
)*
)?
<RPAREN> {Block}
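The optional name is what distinguishes a constructor head: after <KEYWORD_CONSTRUCTOR>, the next token is either an <IDENTIFIER> or the opening parenthesis. The hypothetical `parse_constructor_head` below handles this part, including the optional access modifier, and leaves the parameter list and body to the caller; tokens are (type, value) pairs.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (type, value); a simplification of the real token
ACCESS = {"KEYWORD_PUBLIC": "public", "KEYWORD_PRIVATE": "private",
          "KEYWORD_PROTECTED": "protected"}


def parse_constructor_head(tokens: List[Token], pos: int):
    """Parse {Access}? <KEYWORD_CONSTRUCTOR> <IDENTIFIER>? up to the <LPAREN>.

    Returns (access or None, name or None, position of the <LPAREN>).
    """
    def kind_at(i: int) -> str:
        return tokens[i][0] if i < len(tokens) else "EOF"

    access: Optional[str] = None
    if kind_at(pos) in ACCESS:
        access = ACCESS[kind_at(pos)]
        pos += 1
    if kind_at(pos) != "KEYWORD_CONSTRUCTOR":
        raise SyntaxError("expected <KEYWORD_CONSTRUCTOR>")
    pos += 1
    name: Optional[str] = None
    if kind_at(pos) == "IDENTIFIER":  # the name is optional
        name = tokens[pos][1]
        pos += 1
    if kind_at(pos) != "LPAREN":
        raise SyntaxError("expected <LPAREN>")
    return access, name, pos
```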

§ 2.8 Block

A block groups statements. It either starts with a { token, contains a list of statements and ends with a } token, or it consists of a single statement.

Sample code:

{
    int a = 1
    int b = 2
}

Token Definition:

(
<LCURLY>
({§2.1 Statements})*
<RCURLY>
)
| {§2.1 Statements}
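Both forms of a block can be told apart by checking for an opening { token. In this sketch, statement parsing is passed in as a function, and both forms return a list of statements so that later stages need not distinguish them; `parse_block` and the stand-in statement parser `one_token` are hypothetical.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (type, value); a simplification of the real token
StatementParser = Callable[[List[Token], int], Tuple[object, int]]


def parse_block(tokens: List[Token], pos: int, parse_statement: StatementParser):
    """Parse a {Block}; returns (list of statements, next position)."""
    def kind_at(i: int) -> str:
        return tokens[i][0] if i < len(tokens) else "EOF"

    if kind_at(pos) != "LCURLY":
        # Unbraced form: exactly one statement.
        statement, pos = parse_statement(tokens, pos)
        return [statement], pos
    pos += 1
    statements = []
    while kind_at(pos) != "RCURLY":
        if kind_at(pos) == "EOF":
            raise SyntaxError("expected <RCURLY>")
        statement, pos = parse_statement(tokens, pos)
        statements.append(statement)
    return statements, pos + 1


def one_token(tokens: List[Token], pos: int):
    """Hypothetical statement parser: treats every token as one statement."""
    return tokens[pos][1], pos + 1
```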

§ 2.9 If Statement

An if statement is a sequence of tokens that form an if statement. It consists of a condition and a body. It can also have an else branch.

Sample code:

if (a == 1) {
    return 1
} else {
    return 2
}

Token Definition:

<KEYWORD_IF> <LPAREN> {§2.2 Values} <RPAREN> {Block} (<KEYWORD_ELSE> {Block})?
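Because a {Block} may be a single statement (§2.8), nested ifs such as `if (a) if (b) x else y` are ambiguous: the else could belong to either if. The grammar does not settle this; the Python sketch below assumes the common rule of binding else to the nearest if, which follows from matching the else greedily. Conditions are simplified to one token and blocks to one statement; `parse_statement` is hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (type, value); a simplification of the real token


def parse_statement(tokens: List[Token], pos: int):
    """Parse an if statement or a single-token statement (simplified).

    The optional else is consumed as soon as it is seen, so it binds to the
    innermost if that is still open.
    """
    kind, text = tokens[pos]
    if kind != "KEYWORD_IF":
        return text, pos + 1
    if tokens[pos + 1][0] != "LPAREN" or tokens[pos + 3][0] != "RPAREN":
        raise SyntaxError("expected <LPAREN> value <RPAREN>")
    condition = tokens[pos + 2][1]  # stand-in for {§2.2 Values}
    then_branch, pos = parse_statement(tokens, pos + 4)
    else_branch = None
    if pos < len(tokens) and tokens[pos][0] == "KEYWORD_ELSE":
        else_branch, pos = parse_statement(tokens, pos + 1)
    return ("if", condition, then_branch, else_branch), pos
```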

§ 2.10 While Statement

A while statement is a sequence of tokens that form a while statement. It consists of a condition and a body.

Sample code:

while (a == 1) {
    return 1
}

Token Definition:

<KEYWORD_WHILE> <LPAREN> {§2.2 Values} <RPAREN> {Block}

§ 2.11 For Statement

A for statement is a sequence of tokens that form a for statement. It consists of a variable declaration, a condition, an increment and a body.

Sample code:

for (int i = 0; i < 10; i = i + 1) {
    return i
}

Token Definition:

<KEYWORD_FOR> <LPAREN> {§2.1 Statements} <SEMICOLON> {§2.2 Values} <SEMICOLON> {§2.1 Statements} <RPAREN> {Block}

§ 2.12 Return Statement

A return statement is a sequence of tokens that form a return statement. It consists of a value.

Sample code:

return 1

Token Definition:

<KEYWORD_RETURN> {§2.2 Values}

§ 2.13 Break Statement

A break statement is a sequence of tokens that form a break statement.

Sample code:

break

Token Definition:

<KEYWORD_BREAK>

§ 2.14 Continue Statement

A continue statement is a sequence of tokens that form a continue statement.

Sample code:

continue

Token Definition:

<KEYWORD_CONTINUE>

§ 2.15 Import Statement

An import statement consists of the import keyword followed by a dot-separated list of identifiers, optionally ending with a wildcard (.*).

Sample code:

import io.github.shakelang.shake

Token Definition:

<KEYWORD_IMPORT> <IDENTIFIER> (<DOT> <IDENTIFIER>)* (<DOT> <STAR>)?
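The optional trailing <DOT> <STAR> means a parser cannot tell a wildcard import from a longer path until it looks past each dot. The hypothetical `parse_import` below returns the path segments and a flag for the wildcard; a package statement could be parsed the same way without the <STAR> branch.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (type, value); a simplification of the real token


def parse_import(tokens: List[Token], pos: int):
    """Parse <KEYWORD_IMPORT> <IDENTIFIER> (<DOT> <IDENTIFIER>)* (<DOT> <STAR>)?

    Returns ((path segments, is_wildcard), next position).
    """
    def kind_at(i: int) -> str:
        return tokens[i][0] if i < len(tokens) else "EOF"

    if kind_at(pos) != "KEYWORD_IMPORT" or kind_at(pos + 1) != "IDENTIFIER":
        raise SyntaxError("expected import <IDENTIFIER>")
    path = [tokens[pos + 1][1]]
    pos += 2
    while kind_at(pos) == "DOT":
        if kind_at(pos + 1) == "IDENTIFIER":
            path.append(tokens[pos + 1][1])
            pos += 2
        elif kind_at(pos + 1) == "STAR":
            return (path, True), pos + 2  # .* can only come last
        else:
            raise SyntaxError("expected <IDENTIFIER> or <STAR> after <DOT>")
    return (path, False), pos
```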

§ 2.16 Package Statement

A package statement consists of the package keyword followed by a dot-separated list of identifiers.

Sample code:

package io.github.shakelang.shake

Token Definition:

<KEYWORD_PACKAGE> <IDENTIFIER> (<DOT> <IDENTIFIER>)*