Lab 5: Name Analysis
Overview
In this lab, you define name bindings and corresponding constraints for MiniJava in NaBL2. The concepts you are going to use in NaBL2 are described in the following papers:
- P. Neron, A. Tolmach, E. Visser, G. Wachsmuth: A Theory of Name Resolution, ESOP 2015
- H. van Antwerpen, P. Neron, A. Tolmach, E. Visser, G. Wachsmuth: A Constraint Language for Static Semantic Analysis based on Scope Graphs, PEPM 2016
Documentation for NaBL2 can be found online at NaBL2 Documentation. For a complete overview of the language, use the NaBL2 Language Reference. The NaBL2 Configuration documentation describes how to configure logging and inspect the results of analysis.
Overview
Objectives
Specify name analysis for MiniJava in NaBL2, by specifying a scope graph and resolution constraints. The specification should include:
- Scope graph constraints for
- class declarations
- class references
- field declarations
- parameter declarations
- variable declarations
- field, parameter, and variable references
- Resolution constraints for
- class references
- parameter references
- field references
- variable references
- Constraints for
- duplicate definitions of classes, fields, parameters, and variables
- variable declarations which hide parameter or field declarations
- Properties on
- variable declarations to indicate whether they are fields, parameters, or local variables
Submission
You need to submit your MiniJava project with a pull request against branch assignment5
on GitHub.
The Git documentation explains how to file such a request.
The deadline for submissions is October 27th 2017, 23:59.
Grading
You can earn up to 100 points for the correctness of your name analysis. Therefore, we run several test cases against your implementation. You earn points, when your implementation passes test cases. The total number of points depends on how many test cases you pass in each of the following groups:
- name binding (45 points)
- class declarations and references (6 points)
- field declarations (6 points)
- parameter declarations (6 points)
- variable declarations and references (6 points)
- scopes of program, classes, and methods (12 points)
- variable origins (3 points)
- constraints (45 points)
- unresolved references (16 points)
- duplicate definitions (20 points)
- hiding variables and fields (9 points)
- challenge (10 points)
Early Feedback
We provide early feedback for your language implementation. This feedback gives you an indication which parts of your implementation might still be wrong. It includes a summary on how many tests you pass and how many points you earn by passing them. You have 3 early feedback attempts.
Detailed Instructions
Preliminaries
GitHub Repository
You continue with your work from the previous assignment. See the
Git documentation on
how to create the assignment5
branch from your previous work.
Constraint generation rules
Name binding is specified as constraint generation rules in .nabl2
files. NaBL2 files must go in
the trans/analysis
directory. The module name at the top of the file should match the filename relative to trans
. For example, the file trans/analysis/minijava.nabl2
starts as:
module analysis/minijava
The rules match on an AST constructor, and get one or more scopes as an argument. The initial
project contains an init
rule that creates the initial global scope. The general schema for these
rules looks like this:
rules
[[ <Pattern> ^ (<{Scope ","}*>) ]] :=
<Constraint>.
Multiple constraints are separated by commas. The constraint true
always succeeds. Inspect the
signature files in reference/src-gen/signatures/
for the available constructors.
There can only be one rule that matches on a certain constructor. It is possible and recommended to split the rules over multiple files.
There is no implicit traversal of the AST. This means the rules should be complete: there must be a rule for every constructor you want to visit.
Name Binding
Namespaces
Different kinds of names can be kept separate by putting them in a different namespace. For
example, variables live in the Var
namespace. A name with a namespace and an (implicit) AST
position is called an occurrence, and written as Var{x}
, where Var
is the namespace, and x
the name.
Namespaces are declared as part of the name of the name resolution parameters. Put the following in
one of your *.nabl2
files:
signature
name resolution
namespaces Var // ... space separated list of namespaces ...
For this assignment it is required to use the Var
namespace for fields, method parameters, and
local variables.
Declarations and references
References and declarations are specified as occurrences with arrows going to and from scopes respectively. Reference resolution can be thought of as finding a path in the scope graph from the reference to a declaration with the same name. For example the rules for variable declarations and references, both in the current scope, look like this:
rules
[[ Var(_,x) ^ (s) ]] :=
Var{x} <- s.
[[ VarRef(x) ^ (s) ]] :=
Var{x} -> s.
Note that specifying declarations and references like this does not imply that references must resolve. To ensure this, you must add resolution constraints, described below.
Introducing new scopes
New scopes can be created and passed down to children in the AST using new
and recursive calls to
the constraint generation rules. The following example creates a new scope s'
, adds an edge to the
current scope s
, and passes it as the current scope in the recursive call for the subterm e
.
rules
[[ Method(_, _, _, _, _, e) ^ (s) ]] :=
new s',
s' ---> s,
[[ e ^ (s') ]].
The term in the recursive call must be a variable from your match pattern.
There are some default rules to traverse a list of terms. For example to traverse all class definition in a program, you can write:
rules
[[ Program(_,cs) ^ (s) ]] :=
// ...
Map1 [[ cs ^ (s) ]].
The rule Map1
takes one scope parameter. There is also a rule Map2
that takes two
parameters. These rules are implemented in NaBL2, so it is possible to write such named rules
yourself as well. See
stdlib/map.nabl2
for the implementation of these rules.
Importing scopes by name
Scopes can also be imported by name, instead of using parent edges. Imports always require a pair of constraints, one that associates a name with a scope, and one the imports a reference into a scope. The constraints are written as:
<Occurrence> ===> <Scope>
associates the scope with the given declaration.<Occurrence> <=== <Scope>
imports a reference into the scope. The declarations in the associated scope of the resolved reference are now visible in the importing scope.
The references and declarations used in the previous two constraints still need to be introduced explicitly in the graph.
Constraints
Name resolution
A reference in the scope graph must resolve to a declaration. This is specified with a resolution
constraint. Usually the right hand side of the constraint is a variable that will get the value of
the declaration that the reference refers to. In the following example, we capture the declaration
in the variable d
. Note that d
here is a meta-variable, that is, it is a variable in the rule,
not a variable in MiniJava.
rules
[[ VarRef(x) ^ (s) ]] :=
Var{x} -> s,
Var{x} |-> d.
Custom errors and warnings
It is possible to control the errors and warnings that are generated if a constraint fails.
rules
[[ VarRef(x) ^ (s) ]] :=
Var{x} -> s,
Var{x} |-> d | <Severity> <Message> <Location>.
The severity is one of error
, warning
, or note
. The message is optional and can be 1) a
literal string "variable not found"
, or 2) a formatted message $[Cannot find variable [x]]
.
The default error location is the matched term, but it can be specified explicitly using @t
,
where t
is a variable from your match pattern.
Sets of names
MiniJava requires errors and warnings in a few cases where hiding occurs. Fields are not allowed to hide fields in super classes. Local variables and parameters are allowed to hide, but get a warning if they do.
The following constraints can be used to restrict the names that can appear in scopes:
distinct <Nameset>
anddistinct/name <Nameset>
<Nameset> subseteq <Nameset>
and<Nameset> subseteq/name <Nameset>
The constraint <Nameset> seteq <Nameset>
is desugared to two subseteq
constraints.
The following sets of names are available (for a given scope s
):
- all declarations
D(<Scope>)/<Namespace>
, - all references
R(<Scope>)/<Namespace>
, - all (transitively) visible declarations
V(<Scope>)/<Namespace>
, - and all reachable (including shadowed) declarations
W(<Scope>)/<Namespace>
.
For example, D(s)/Var
would give all variable declarations in s
.
You can write expressions to get the set of names you are interested in, using the following operators:
- union
(<Nameset> union <Nameset>)
, - intersection
(<Nameset> isect <Nameset>)
, - and set different
(<Nameset> minus <Nameset>)
.
Note that sets of names behave like multisets. For example, X diff Y
where X
contains two x
s
and Y
contains one x
will result in a set with one x
.
Error messages on subseteq
and distinct
constraints can use two special keywords. The position
can be @NAMES
, in which case the error will appear on the relevant names. In a formatted message
NAME
can be used to insert the name in the message.
See the reference documentation for a more complete description of these constraints.
Note that if you use set expressions in your constraint, the order in which you do the operations can be important to get the error on the right name.
Origin properties on variable declarations
Properties can be set on declarations with a constraint <Namespace>{<Term>}.<NAME> := <Term>
. In
this lab you must set a property origin
on Var
declarations.
- Fields must get an origin
Field()
- Parameters must get an origin
Param()
- Local variables must get an origin
Local()
It is important for grading to use these exact property and constructor names.
You will need to define these constructors by adding a signature
section of a Stratego file,
e.g. trans/analysis.str
. Stratego signatures define new constructors that you can use, and are
written as:
module analysis
// ...
signature
constructors
<ConstructorName> : <SortName>
// ...
The sort name is not important, but if you do not know what to use, use Origin
.
Challenge
In lab 7 we need to test that an overriding method has the same type as the overridden method in the parent class. For every method declaration, we want to introduce a reference, that resolves to its overridden method. For example, consider the following situation:
class Foo {
public int m() {
return 1;
}
}
class Bar extends Foo {
public int m() {
return 1;
}
}
The declaration of Bar.m()
should introduce a reference to Foo.m()
. Of course just introducing a
reference in the parent scope is not enough, because this will result in an error for Foo.m()
which does not override any method. The challenge is to create a reference that resolves to the
overridden method if it exists, and to its own method declaration otherwise.
It is possible to encode this logic in the scope graph by modifying edge labels, the specificity order of labels, and the well-formedness regular expression on paths. By default the parameters for name binding are the following:
signature
name resolution
labels
P I
order
D < I,
D < P,
I < P
well-formedness
P* . I*
The label P
is the default label for edges between scope. This can be written explicitly with the
constraint s' -P-> s
. The label I
is the default label for imports, and can also be written
explicitly as d =I=> s
and s =I=> r
. Note that scopes can have multiple outgoing edges, which
can have different labels.
The well-formedness is a regular expression on path labels, that rules out paths that do not match it. In the default case, it means that after an import, you cannot resolve in a parent scope anymore. The regular expressions on labels have the following syntax:
e
is the empty string0
is the empty language (i.e., it matches nothing)<Label>
is a literal label~<Regexp>
is the complement (e.g.,~0
matches anything)<Regexp>*
is the closure (i.e., match zero or more)<Regexp> . <Regexp>
is concatenation (e.g.,P . I
matches one parent edge followed by an import)<Regexp> | <Regexp>
is a logical or (e.g.,P | I
matches a parent or an import edge)<Regexp> & <Regexp>
is a logical and (i.e., both expressions must match)(<Regexp>)
brackets can be used for grouping