LearningQuery ::=
      ExpQuantifier '(' Expression ')' '[' BoundType ']' Features? ':' PathType Expression Subjection
    | ExpQuantifier '[' BoundType ']' Features? ':' PathType Expression Subjection
    | ExpPrQuantifier '[' BoundType ']' Features? ':' PathType Expression Subjection
ExpQuantifier ::= ( minE | maxE )
ExpPrQuantifier ::= ( minPr | maxPr )
Features ::= '{' List '}' '->' '{' List '}'
Subjection ::=
      // empty for no subjection
    | under StrategyName
See the railroad diagram for the entire LearningQuery syntax.
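The examples below use the expectation quantifiers minE and maxE; the grammar also admits the probability quantifiers minPr and maxPr (third production above). A minimal sketch, assuming a boolean goal predicate declared in the model:

    maxPr [<=10] : <> goal

This asks for a strategy maximizing the probability that goal becomes true within 10 time units; minPr minimizes that probability instead.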
minE(cost) [<=10] : <> goal creates a strategy which minimizes the expected value of the cost expression within 10 time units or when the goal predicate becomes true, given that the entire system state is observable.

maxE(gain) [<=10] : <> goal creates a strategy which maximizes the expected value of the gain expression within 10 time units or when the goal predicate becomes true, given that the entire system state is observable.

The goal predicate is deprecated; for best results use a predicate which stops together with the simulation bound, like t>=10, where t is a clock that is never reset.
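For example, a minimal sketch of the recommended pattern, assuming the model declares a global clock t that is never reset:

    // model declarations (assumed): clock t;
    minE(cost) [<=10] : <> t>=10

The predicate t>=10 becomes true exactly when the simulation bound of 10 time units is reached, so the property and the simulation stop together.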
minE(cost) [<=10] { i, j } -> { d, f } : <> goal creates a strategy which minimizes the expected value of the cost expression within 10 time units or when the goal predicate becomes true, where only the expressions i, j, d and f are observable. The {..} -> {..} syntax controls what is observable.
On one hand, by observing only a partial state learning times can be significantly reduced and the strategy structure simplified.
On the other hand, the resulting strategy is not guaranteed to converge to an optimal solution under partial observability.
There are two types of observable state expressions: discrete and continuous.
The discrete expressions are specified in the first set of braces and the continuous in the second: {discrete expressions} -> {continuous expressions}.
By default the entire state is considered during learning. Discrete expressions are observed as they are, i.e. the query minE(cost) [<=10] { i, j } -> { } : <> goal creates a strategy by observing only the values of i and j.
Continuous expressions are discretized using online partition refinement (see Teaching Stratego to Play Ball).
The query minE(cost) [<=10] { } -> { d, f } : <> goal learns a strategy based on the discretized expressions d and f.
Integers, clocks, floating-point numbers or even arbitrary expressions can be used in either type of observability. However, we suggest caution when using floating-point numbers or clocks in discrete observability.
Process locations are ignored when specifying observability unless they are listed explicitly using the location keyword. For example, Cat.location and Mouse.location refer to the locations of the Cat and Mouse processes.
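For example, a sketch of a query observing process locations, assuming the Cat and Mouse processes above, a hypothetical continuous expression x, and the never-reset clock t from earlier:

    minE(cost) [<=10] { Cat.location, Mouse.location } -> { x } : <> t>=10

Here the discrete locations of Cat and Mouse are observed as they are, while x is discretized by online partition refinement.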
Learning queries are usually used together with the strategy assignment and refinement explained in Strategy Queries.
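For example, a common pattern (a sketch only; the exact strategy assignment syntax is described in Strategy Queries) first names a permissive safety strategy and then optimizes under it, where bad is a hypothetical predicate:

    strategy Safe = control: A[] !bad
    strategy Opt = minE(cost) [<=10] : <> t>=10 under Safe

The under Safe subjection restricts learning to the choices permitted by Safe, and the learned result is stored as the strategy Opt.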