Frequently Asked Questions about Sather
Sather has garbage collection, statically-checked strong typing, multiple inheritance, separate implementation and type inheritance, parameterized classes, dynamic dispatch, iteration abstraction, higher-order routines and iters, exception handling, assertions, preconditions, postconditions, and class invariants. Sather code can be compiled into C code and can efficiently link with C object files.
Sather has a very unrestrictive license aimed at encouraging
contribution to the public library without precluding the use of
Sather for proprietary projects.
Where can I get information on Sather?
The best way is to check out the Sather WWW page at
http://www.icsi.berkeley.edu/~sather.
There is a newsgroup "comp.lang.sather"
that is devoted to discussion of Sather issues.
There is a Sather mailing list maintained at the International
Computer Science Institute (ICSI). Since the formation of the newsgroup,
this list has been used primarily for announcements. To be added to or
deleted from the Sather list, send a message to
sather-request@icsi.berkeley.edu
If you have problems with Sather or related questions that are not of
general interest, mail to
sather-bugs@icsi.berkeley.edu
This is also where you want to send bug reports and suggestions for
improvements.
Is Sather a subset or superset of Eiffel?
Neither. Valid Eiffel programs are not Sather programs, nor vice
versa. Sather 0.2 was closer to being a subset of Eiffel 2.0 but even
then introduced several distinct constructs, primarily to improve
computational performance. Eiffel 3.0 has expanded significantly in a
different direction. Sather 1.0 has introduced several new constructs
(e.g. iteration abstraction, higher-order routines, object
constructors, routine and iter overloading, contravariant class
interfaces, typecase), which make the two languages quite distinct
now.
Where does the name ``Sather'' come from? How do I pronounce it?
The Sather language gets its name from the Sather Tower (popularly
known as the Campanile), the best-known landmark of the University of
California at Berkeley. A symbol of the city and the University, it is
the Berkeley equivalent of the Golden Gate Bridge. Erected in 1914,
the tower is modeled after St. Mark's Campanile in Venice, Italy. It is
smaller and a bit younger than the Eiffel Tower, and closer to most
Americans - and lovers of Venice of course. The way most people say
the name of the language rhymes with "bather".
What does the ``Hello World'' program look like?
class MAIN is
   main is
      #OUT + "Hello World!\n";
   end; -- main
end; -- class MAIN
Are there freely available implementations of Sather?
The ICSI Sather 1.1 compiler can be easily downloaded from the
get-Sather-page.
It can also be loaded from these ftp servers:
ftp.icsi.berkeley.edu: /pub/sather
These sites mirror the Sather distribution:
ftp.sterling.com: /programming/languages/sather
ftp.uni-muenster.de: /pub/languages/sather
maekong.ohm.york.ac.uk: /pub/csp/sather (no longer maintained, but still running)
ftp.th-darmstadt.de: /pub/programming/languages/sather
ftp.sra.co.jp: /pub/lang/sather
I am looking for reliable sites on other continents to mirror the Sather distribution and be included in this FAQ. If you can help with this, please send me mail.
The ICSI implementation includes a browser, class libraries (basic types; math structures such as matrices, vectors, and rational numbers; associative data structures; file manipulation; etc.), the compiler source (in Sather), and a large library of contributed but unofficial code. Contributed binaries for some systems are also available. ICSI maintains but does not support a library of donated code, some of it of a tutorial nature, at
http://www.icsi.berkeley.edu/Sather/Contrib/contrib.html
The ICSI compiler generates C. `gdb' or other C debuggers can be used for debugging in combination with a compiler flag. However, there is not yet a debugger which uses the Sather syntax and namespace.
There is another dialect of Sather called Sather-K that is being developed at the University of Karlsruhe, where it has been used in undergraduate instruction. The library of Sather-K is Karla, the KARlsruhe Library of Algorithms, and it has been used in graduate courses on algorithms and object-oriented design.
The Sather-K compiler and library are available at
i44ftp.info.uni-karlsruhe.de: /pub/sather and /pub/Karla
as well as at the ICSI ftp site in pub/sather/Sather-K.
Parallel Sather (pSather) is a parallel version of the language, developed and in use at ICSI. pSather addresses non-uniform-memory-access multiprocessor architectures but presents a shared memory model to the programmer. It extends serial Sather with threads, synchronization and data distribution. Unlike actor languages, multiple threads can execute in one object. A distinguished class GATE combines various dependent low-level synchronization mechanisms efficiently: locks, futures, and conditions. The new version of the pSather compiler is being integrated into the serial Sather 1.1 compiler. More information on pSather is available at:
http://www.icsi.berkeley.edu/~sather/psather.html
Ultimately there will be a better development environment; we envision
an interpreter/on-the-fly compiler. This won't be too hard to do
because the compiler already emits an abstract machine representation
that is appropriate for interpretation. There are presently students
working on extensions to the compiler as class projects.
How portable is the compiler?
The compiler generates ANSI C and has very few Unix dependencies,
which can be fixed if we find out what they are; if your machine runs
gcc, you should be able to port the compiler.
So far it has been ported to the following systems, that I know of:
The compiler itself is quite slow. Small programs compile quite
quickly. The bottleneck is in generating and compiling the C code.
The compiler is an example of a very large Sather program. It takes
about 40 secs for the compiler to generate optimized C on a modern
workstation (with a *lot* of memory, though!), and then another couple
of minutes to compile the C code using parallel make on our local
network. We ship the generated C and executables for common platforms.
Most importantly, have enough physical RAM. 32 MB is far
better than 16 MB, and 32 MB should be enough for most things, except
perhaps recompiling the compiler itself. Netscape is a memory hog.
Quit it. Besides, you have to do real work sometime.
Secondly, the default compiler flags are not tuned to make the
fastest compilation, but to generate the fastest or safest
executables. Do not use "-O_fast" until the final production
stages: the loop and subexpression optimizations are surprisingly slow
and memory-consuming.
To get extra debugging information, such as a full backtrace
of the stack whenever a crash occurs, use the option
A heuristic approach is used to determine which routines and iterators
are simple enough to be inlined. Complexity is computed by traversing
the abstract machine representation of the function body. Weights are
assigned to all expressions and statements. Calls are replaced if the
computed function weights are less than a specified threshold value.
Some statements or expressions are never inlined, including "raise" and
"loop".
The "-O_fast" and "-O" options provide default inlining for
routines and iterators. The "-inline" option turns on inlining
without affecting anything else. By default, inlining uses a
threshold of 16 (statements plus expressions), which was
experimentally found to be optimal for a few applications including the
compiler running on a Sparcstation 10.
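The weight-and-threshold heuristic described above can be sketched roughly as follows. This is an illustration in Python, not the compiler's actual code: the `Node` type, the uniform weight of 1 per statement or expression, and the function names are all assumptions made for the example. Only the default threshold of 16 and the rule that "raise" and "loop" are never inlined come from this FAQ.

```python
from dataclasses import dataclass, field

NEVER_INLINED = {"raise", "loop"}  # constructs the FAQ says are never inlined
DEFAULT_THRESHOLD = 16             # default threshold quoted in the FAQ


@dataclass
class Node:
    """Hypothetical stand-in for one statement or expression of a routine body."""
    kind: str
    children: list = field(default_factory=list)


def weight(node):
    """Sum the weights of a body, or return None if it can never be inlined."""
    if node.kind in NEVER_INLINED:
        return None
    total = 1  # assumption: every statement or expression weighs 1
    for child in node.children:
        child_weight = weight(child)
        if child_weight is None:
            return None
        total += child_weight
    return total


def should_inline(body, threshold=DEFAULT_THRESHOLD):
    """Replace the call only when the computed weight is below the threshold."""
    body_weight = weight(body)
    return body_weight is not None and body_weight < threshold


small = Node("return", [Node("plus", [Node("arg"), Node("arg")])])
raising = Node("seq", [Node("assign"), Node("raise")])
large = Node("seq", [Node("assign") for _ in range(20)])

print(should_inline(small))    # weight 4, below the threshold
print(should_inline(raising))  # contains "raise", never inlined
print(should_inline(large))    # weight 21, not below the threshold
```

Raising the threshold in this sketch makes `large` eligible as well, which mirrors the trade-off between call overhead and code size discussed in this section.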
The optimal inlining threshold depends on the application and the
underlying architecture, and for exotic machines it may differ somewhat
from the default. To inline with a threshold other than the default,
use the "-O_inline_routines <threshold>" or "-O_inline_iters <threshold>"
option.
The reported performance improvements are between 10% and 40%. I/O
intensive applications are less affected by inlining. The default
threshold inlines 40% of all calls in the compiler itself.
Surprisingly, moderate levels of inlining have not appeared to have
negative consequences on the executable size. In fact, default inlining
might even slightly reduce the size of the generated executable. The
benefits depend on the underlying architecture and parameter passing
conventions.
Iterators are as efficient as standard loops when they are built-in
or inlined (they are converted into standard C loops). In other cases
they are probably significantly better than the "iterator objects"
i.e. cursors, that might otherwise be required. Simple iterators are
inlined. In order to be inlined, an iterator must have at most one
yield, and the yield must not be in a conditional statement. i.e. the
iterator definition must be of the form:
Routines are inlined if they contain only a single
return statement at the end of the routine, and do not
contain any "raise" statement.
See the next section for some other options to produce fast code.
Some preliminary experiments on inlining, and information on the
performance impact of these optimizations, can be found at
http://www.icsi.berkeley.edu/~fleiner/benchmarks
Sather is contravariant. That means that it isn't possible to get type
errors at runtime. It also means that some ways of doing
object-oriented programming will require that you, the Sather
programmer, insert explicit type checks (using a typecase) in places
where a covariant compiler would have inserted an implicit check for
you. We chose contravariance because it eliminates a potential source
of bugs that can't be discovered at compile time; other language
designers have chosen the opposite to allow more expressiveness.
Eiffel says toMAHto, we say toMAYto.
A frequently cited reason for not specifying an order of evaluation is
to allow the compiler to choose an order of evaluation which leads to
the most efficient code; for example, simple arguments can be evaluated
after complicated ones to relieve register pressure. This can also be
done for ordered Sather arguments in the absence of side effects.
While the order is unspecified in C, the evaluations of arguments must
appear to occur in _some_ order, not interleaved in execution.
(Permitting interleaving would, in the extreme, allow C compilers to
fork threads when evaluating arguments, a practice which would break
most existing code.) Since a
compiler capable of taking advantage of the parallelism made available
by unordered arguments must do dependency analysis to make sure the
generated instructions appear to evaluate the arguments in some order,
such a compiler would of course be able to do the same dependency
analysis and instruction reordering on arguments required to be
observed evaluating left to right. The generated code would only be
different if there were side effects in an argument evaluation which
would make the order of evaluation important; and such code would
clearly be in error if the argument order was unspecified.
It's really a question about what the language does with erroneous code
that depends on the order of evaluation. It would be nice to detect
such situations, but this is very hard. By leaving the order
unspecified one allows bugs (which usually appear only when changing
compilers). Sather chooses to just eliminate the possibility.
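As an illustration of what a defined order buys, here is a small sketch in Python, which (like Sather, and unlike C) evaluates call arguments left to right. The helper names `traced`, `add3`, and `demo` are made up for the example. Each argument expression has a side effect, yet the recorded order is the same on every conforming implementation.

```python
def traced(label, value, log):
    """Record that this argument expression ran, then produce its value."""
    log.append(label)
    return value


def add3(a, b, c):
    return a + b + c


def demo():
    log = []
    total = add3(traced("first", 1, log),
                 traced("second", 2, log),
                 traced("third", 3, log))
    return total, log


total, order = demo()
print(total, order)
```

The equivalent C program could legally record the three labels in any order, which is exactly the kind of compiler-dependent behaviour the left-to-right rule rules out.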
Languages that operate only over immutable objects are called functional
languages; operations defined over immutable types are side-effect free
and therefore referentially transparent (any given expression always
evaluates to the same result). There are many ways to implement immutable
objects. Immutable objects may be implemented as actual values (primitive
or composite) or as references to actual values or even as applied
closures yielding actual values, but in all cases the value of the
immutable object is the same and never changes for as long as it exists.
In contrast, languages like Sather also provide reference objects,
which are best used to model entities that have an identity plus a
current state. The idea of an object identity bound to a modifiable
state introduces side effects into the language, which can make
expressions referentially opaque (an expression involving a reference
object may evaluate to a different result each time that it is
invoked).
Sather distinguishes between reference and immutable objects at the level
of types. Instances of immutable types have value semantics: once created
they never change, and there is no such thing as a "reference" to an
immutable object. Reference objects have an identity and the state of a
reference object can be modified by writing to its attributes.
Logically, when immutable objects are passed as arguments, their value is
first copied and then the operations are invoked on the copy. The
special properties of immutable objects make them especially amenable to
compiler optimizations; immutable objects may be copied freely without the
possibility of aliasing conflicts, allowing them to be kept in
registers or efficiently stored on the stack without requiring heap
allocation.
A variable of abstract type can be used to store either immutable or
reference objects. Because it is desirable to make it possible to
replace any concrete type by an abstraction, it is necessary for
immutable types and reference types to have the same semantics with
respect to assignment, passing arguments to functions, and applying
the dot "." operator. It is possible to make reference types behave
with value semantics by coding them with this in mind; for example,
the Sather INTI class (infinite-precision integer) returns a new INTI
on modification. This makes it possible to substitute either an INTI
or an INT into code using only standard integer operations; more
generally, it would be possible to make an abstract class $INTEGER
defining an interface that such code had to conform to regardless of
the implementation.
The immutable nature of immutable types means that the implicit
routines that set attributes return a new object rather than modifying
the old one in place. For this reason, the syntax "a.b:=c" is not
legal, because it is really syntactic sugar for "a.b(c)". For an immutable
class, this routine "b" has a return value which must be used in the
calling context. Therefore this example should have been written
"a:=a.b(c)". Notice that if you are setting multiple fields, you can
conveniently string them together: "a:=a.b(c).d(e).f(g)".
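The "setter returns a new object" pattern can be mimicked in Python with a frozen dataclass. This is only an analogy, not Sather: the class `Cpx` and the method names `with_re` and `with_im` are invented for the sketch, and they stand in for the implicit attribute-setting routines "re" and "im" of an immutable Sather class.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Cpx:
    """Hypothetical immutable complex number; fields cannot be assigned."""
    re: float = 0.0
    im: float = 0.0

    def with_re(self, r):
        # Plays the role of "a.re(r)": build a new value, leave self alone.
        return replace(self, re=r)

    def with_im(self, i):
        return replace(self, im=i)


a = Cpx()
b = a.with_re(5.0).with_im(1.0)  # analogous to "a:=a.b(c).d(e)"

try:
    a.re = 5.0                   # analogous to the illegal "a.b:=c"
    mutated = True
except FrozenInstanceError:
    mutated = False

print(a, b, mutated)
```

Because every "setter" hands back a fresh value, the result has to be assigned somewhere to have any effect, which is why the chained form reads naturally.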
If this seems unnatural, consider how operations work on integers:
when subtracting five from seven to get two, one isn't modifying "the"
seven, turning all sevens into twos; logically one makes a new integer
instead of modifying an existing one in place. Similarly, bitfield
operations like AND and OR conceptually create a new integer rather
than modifying one in place. It just isn't reasonable to have
reference semantics on basic types. Sather allows arbitrary classes to
have compiler enforced value semantics.
Here's what a complex number class might look like (this is a
simplified fragment of the complex number class found in the library)
When immutable objects are assigned to a variable of abstract type,
"boxing" occurs. This means that some heap is allocated to hold the
value along with a tag that can be used for dispatching. Reference
objects always have a tag so they are not boxed.
An `out' argument is passed from the called method to the caller when
the called method returns. It is a fatal error for the called
method to examine the value of an `out' argument before assigning to it.
An `inout' argument is passed to the called method and then back to the
caller when the method returns. Modifications to `inout' arguments are
not observed by the caller until the method returns (value-result
semantics).
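Value-result (copy-in, copy-out) semantics can be simulated in Python to show the one observable difference from pass-by-reference. This is a sketch of the semantics only, not Sather syntax: `bump_twice`, `Caller`, and the `peek_at_caller` callback are hypothetical names, and the explicit reassignment in `run` stands in for what the compiler does automatically for an `inout' argument.

```python
def bump_twice(x, peek_at_caller):
    """Work on a private copy of the argument; report what the caller sees."""
    x += 1
    seen = peek_at_caller()  # the caller's variable still holds the old value
    x += 1
    return x, seen           # the final value is handed back on return


class Caller:
    """Hypothetical caller holding one variable passed in `inout' mode."""

    def __init__(self):
        self.counter = 10

    def run(self):
        # Copy-in: the routine receives the current value, not the variable.
        new_value, seen_during_call = bump_twice(self.counter,
                                                 lambda: self.counter)
        # Copy-out: the caller's variable changes only once the call returns.
        self.counter = new_value
        return seen_during_call


c = Caller()
seen = c.run()
print(seen, c.counter)
```

With true reference semantics the value observed in the middle of the call would already have changed; here it has not, and the update becomes visible only after the call completes.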
Remember that argument modes are specified both at the method
definition and at the method call. Some examples of usage are given below.
Test/test-out.sa contains a variety of other examples. For more
information on argument modes refer to pp. 43-44 of the Sather 1.1 manual.
What can I do to improve compilation speed?
The Sather compiler can be slow, especially if you don't have
enough memory or use a lot of class parametrization. What follows is
mainly from a posting by Matt Kennel, a long-time Sather user.
Stage 1: Compilation errors
In the initial phase of getting a program running, when you
are just looking for compiler errors, the following option is
extremely useful:
-only_check -- Just report errors, do not generate any code
Using this option means that the compiler does not have to
generate code, so it will use far less memory and get to your errors
much faster. I strongly recommend doing this until all your compile
time errors have disappeared.
Stage 2: Debugging
In the next phase, when debugging a program,
generate a program with checking on by using the options
-chk -- Turn on all checking in all classes
-- - slows down resulting executable,
-- but it should never crash
-debug_source -- Debugging with source line numbers
-debug_no_backtrace -- Stack information is expensive.
-- You can usually make do by using gdb and printing the stack
-- using the command 'bt'
Note that all the -debug options reduce incrementality
considerably, since whenever source line numbers change slightly (even
though the generated code remains the same), a lot of recompilation must
take place.
-debug -- More debugging information
This may explode the executable size to an extent that you
might find unacceptable.
Production Code Generation
Here are the flags that are useful for making production quality
i.e. non-checking, optimized programs quickly.
-output_C -- This is VERY important! If you don't
-- do this you will never get incremental
-- C compilation.
-only_reachable -- Alternatively, since unreachable code
-- is only checked after the executable is
-- generated, you can kill the compile when
-- it says ``Checking unreachable...''
-O_inline_routines 30 -- These strongly help executable performance
-O_inline_iters 30 -- but don't add much to compile time.
-replace_iters
-chk_no_void all -- Turn off expensive checking options
-chk_no_bounds all
-chk_no_pre all
-chk_no_assert all
If you are strapped for memory or VM space consider using
"-only_C"
and then do a manual "make" of the generated C code. This might save
some extra swapping above "-output_C".
Starting with version 1.0.7, the compiler provides a general inlining
facility. Most Sather programs contain a large number of very short
routines and iterators. Inlining inserts the code for such routines and
iters in place of calls, thus eliminating the cost of the call. This
tends to allow better use of registers. It also enables optimizations
that would otherwise require interprocedural analysis, for example
replacing formal arguments with concrete values and then applying
constant propagation.
How do function and iterator calls get inlined?
Is there a way to specify different levels of inlining?
What performance benefits are there?
What is the result of inlining on the size of executable?
-O_inline_routines <threshold>
-O_inline_iters <threshold>
This also allows one to experiment with selectively inlining either
routines or iterators. Using too high a threshold leads to bloated
code.
iter_def!( ) is
iter_init;
loop
... code before the yield
yield (only one, and not in an if statement)
... code after the yield
end;
other_code;
end;
http://www.icsi.berkeley.edu/~fleiner/benchmarks/
has performance measurements of inlining and some
The option -O_fast turns on all optimizations.
You can turn off individual optimizations by prefixing them with
-O_no_, as in -O_no_hoist_const.
What optimizations does the compiler currently do?
What options do I use?
I'm interested in working on a library class.
Who is currently working on what classes?
There are many people helping to extend and improve the Sather
libraries. ICSI encourages free exchange of useful code; if you have
code you think others will find useful, please submit it. Here are
some contacts; the newsgroup is a useful resource for finding people
with similar interests.
gomes@icsi.berkeley.edu:
   General Library, Browser, graph classes, neural nets
mbk@inls1.ucsd.edu:
   Matrix/vector, numerical, fortran
lewikk@aud.alcatel.com:
   Emacs support
cbitmead@versant.com:
   New file classes
haps@inf.tu-dresden.de:
   INTI, FLTI based on GNU MP
sather-bugs@icsi.berkeley.edu:
   General questions
There is a more comprehensive list of people at
http://www.icsi.berkeley.edu/~sather/whoswho.html
Many people have contributed code, which can be found in the Contrib/
directory of the distribution
and on
http://www.icsi.berkeley.edu/~sather/contrib.html
What is all this about covariance vs. contravariance?
A religious war occasionally crops up on the net with people arguing
about whether covariance or contravariance is best. If you haven't
heard of either, don't worry about it.
What's the reasoning behind the overloading rules?
The main idea of our overloading rules is that new code can't break
existing code. A properly written library class will never break
because one of the parameters was of an unexpected subtype. For more
information, please check out the Sather language manual at
http://www.icsi.berkeley.edu/~sather/Documentation/LanguageDescription/Descript.ps.gz
Why is the order of evaluation of arguments defined?
In many languages, such as C, the order of evaluation of routine
arguments is undefined. In Sather, however, arguments are always
evaluated left to right. This brings greater determinism across
platforms to Sather code; unlike C, there is no place in the Sather
specification where we resort to declaring the results of an operation
to be undefined.
How do I use closures without binding `self' or for overloaded routines?
You have to declare the type of self when leaving it unbound; otherwise
there's no way to know what class the method is supposed to be in. For
example,
class MAIN is
   main is
      br::=bind(_:INT.plus(_)); -- Notice the :INT
      #OUT + "1 + 2 = " + br.call(1,2) + '\n';
   end;
end;
In the case of overloaded routines, the type must be inferred from the
declared type of the variable.
class MAIN is
   foo is ... end;
   foo: INT is ... end;
   main is
      a: ROUT := bind(foo); -- Selects the first "foo"
      b: ROUT:INT := bind(foo); -- Selects the second "foo"
   end;
end;
Renaming a feature in an include statement only renames the
feature itself, not the calls on that feature. What do I do?
Renaming only the definition is needed more frequently than renaming
uses as well as definitions, so that's the default. It is often
possible to avoid the renaming by introducing a new feature name.
Another way is to write this:
class A is
   foo is ... end;
   bar is ...
this way instead:
class A is
   foo is ... end;
   foo2 is foo; end;
   bar is ...
With inlining, the indirection is not a performance problem.
I don't understand immutable classes.
How come I can't use the "a.b:=c" syntax for assigning attributes?
Sather distinguishes between reference and immutable objects. Most
user-defined objects are reference objects. These are passed by
reference (pointer to space on the heap) and may be aliased. Objects of
the fundamental types BOOL, CHAR, INT, FLT, FLTD, CPX, etc.
are immutable objects. These are always passed by value and it is
not possible to alias them (i.e. to reference the same object under two
names). Many pure object-oriented languages try to unify these
notions.
immutable class CPX is
   attr re: FLT;
   attr im: FLT;
   create(r,i: FLT): SAME is
      res: SAME; -- res is a CPX immutable object
      res := res.re(r); -- Create a new value with re = r
                        -- and reassign it to the result
      res := res.im(i);
      return res;
      -- Or, more concisely: return re(r).im(i);
   end;
   main is
      a: CPX;
      a := a.re(5.0); -- Set the real part to 5.0
   end;
end;
How (in)expensive are immutable classes like TUP, CPX, etc.?
Tuples are an immutable class, which is translated into C structs. Just
how efficient the final code is depends on your C compiler. Just about
every C compiler will put structs on the stack, so at least there's no
heap manipulation overhead. Loading and storing to the stack is still
a lot more expensive than keeping things in registers; some compilers
will do this, some won't. I looked at gcc, which didn't, and the HP
cc, which did; your mileage may vary. The Sather compiler does
optimize away the simple case "a:=a.b(c)".
What are value/result arguments?
How does one use them?
Beginning with version 1.0.8, the Sather compiler supports a
value/result parameter passing convention. Method arguments each have a
mode. Modes are specified by a keyword preceding argument names; if no
keyword is given, the argument mode defaults to `in' mode.
Last change: 7/16/96 The Sather Team (sather@icsi.berkeley.edu)