Project Ideas

About this page

This page is for developers and people learning about guile, who want ideas to persue. If you've done any of these things, or want advice about them then please feel free to do so on the guile mailing list!

What's this page is about

People occasionally ask if there's anything they can work on in Guile. In the past, I've just been dumb and written up a separate list of ideas in response to each message; so I forget things, I waste a lot of time retyping, and ideas that occur to me when nobody's asking me disappear into the ether. So I've decided to write them down.

Also, these are just the things that have come to my head over the past few months. Each idea kind of argues for its own worldview, but I don't actually claim that they're well-developed, or that the functional boundaries are right, or anything. As a friend of mine used to say, in a similar spirit: ``Why not take two --- they're small!''

This is only meant as a source of ideas, not as a statement of what I think is important, where I think Guile should go, or what I'm willing to incorporate into some ``official'' distribution of something. People should hack on what pleases them. These are basically the things I wish I had time to do.

I hope this is helpful!

--- Jim Blandy

I've added a little here, mostly pointing to some additional information on various bits, and removing some of the bits that've been finished and added to guile, like guardians

--- Greg Harvey

Other Guile idea lists

If you've got your own list of Guile modules or features you'd like to see, we'll put a link to it here.

Guile modules

Here are some things I'd like to see people write and distribute as separate Guile modules.

Items are in most-recently-added-or-modified order.

Clean SLIB integration

This may need to wait for the new module system, but it would be nice to have a solid integration of SLIB into Guile, so that SLIB files appear as first-class Guile modules.

Server-side scripting

I'm thinking of something like a translator that reads a mix of HTML-like tags and Scheme code --- basically, something like quasiquote, except that it builds HTML structure, instead of list structure. The translator would compile HTML-like things into expressions that construct a tree which we can finally spit out as HTML.

PHP has become really popular, but there's really no such thing as a little language. I bet we'll see the same pattern with that that we've seen with Perl and Tcl --- the interpreter will start out small and fast, but then as people use it for more serious work, it'll start acquiring real control structures, real functions, more datatypes, ... until it'll just be another poorly-implemented lisp with studly libraries. That's basically what Tcl 8.0 is.

I don't know for sure, but I suspect that MHTML invented their own language too. If so, the same rules apply.

We should do it right, by taking the features of PHP and MHTML and giving them to Guile. Someone who does web application development should do this, so it'll get done right. I don't do that much web hacking myself, so I'd get the details all wrong.

Note that it's really easy to hack something up that works here and there. This project will only have a significant impact if it's an industrial-strength job.

See Olin Shivers' ``net.tar.gz''.

IEEE 754 recommended functions and predicates

Guile should provide the functions described in the appendix to IEEE 754. They're handy.

Resurrect the Mesa interface

Long ago, Guile had an interface to the Mesa 3-D graphics library, which is compatible with OpenGL. (See freshmeat for details.) This code could probably be resurrected pretty easily.

Ideally, of course, this would be combined with a Mesa widget for GTK+. :)

This is kind of stupid, but one thing you could use this for is an origami program. I'd like to be able to see the relationship between the folded and unfolded sheets. This is one of those things that's not worth coding in C, but would be fun to whip up in an interpreted language.

Simple Network Management Protocol interface

There's a library called UCD-SNMP (search freshmeat for it) which might be fun to hook up to Guile. I'd love to be able to combine UCD-SNMP with the GTK+ canvas widget and get network utilization graphs from my router.

Bug-tracking system

I'd like something akin to Gnats, but written in Guile. I'm sure we could do better than Gnats, and this would give us a chance to focus on the needs of open source development --- for example, the bug list should be open, and viewable from the web.

Value history

When you're using Guile interactively, it's really common to want to operate on a value returned by one of the previous instructions. Mathematica and GDB automatically stash the result of each expression in a variable (or something like a variable), so you can refer to it later.

I have in mind something like this:

      guile> (list 1 2 3)
      $1 => (1 2 3)
      guile> (cons 0 $1)
      $2 => (0 1 2 3)
      guile> (set-car! (cdr $2) 'yowza)
      $3 => yowza
      guile> $1
      $4 => (yowza 2 3)

(Bengt Kleberg recommended the ``$1 => mumble'' syntax, on the grounds that it resembles the notation RnRS uses to show what an expression evaluates to. And indeed, $1 will evaluate to mumble.)

The current repl doesn't print the vale of expressions returning #<unspecified>, and it shouldn't assign such values to value history variables either.

Directory walker with list of predicates

People have posted a bunch of functions for walking directory trees, modeled more or less after the Unix `find' command. They're essentially of the form (find DIRECTORY PROC), and apply PROC to each file they find in the directory tree DIRECTORY.

It occurs to me that it might be convenient to actually have `find' accept a list of procedures: (find DIRECTORY PROC ...), and for each file found, apply each PROC successively, until one returns a false value. That way, you could do things like this:

      (find "." (glob ".deps") directory?
            (lambda (d)
               ... now do something with the .deps directory d ...))

I'm imagining `glob' to be a function that accepts a filename wildcard pattern and returns a predicate that likes filenames that match the wildcard.

Maciej Stachowiak and others have suggested, as a convention, that any function which accepts a predicate as an argument, and applies that predicate to filenames, should also accept a string as that argument, and treat that string as a wildcard pattern. Thus, the code above would become:

      (find "." ".deps" directory?
            (lambda (d) ...))

Now we're starting to get friendly.

smart dumping

This is kind of a dumb idea; I have no idea whether it's even worth thinking about.

Would it be possible to write a function which walked data structures in the Scheme heap, and wrote out, directly from Scheme, a valid ELF shared library? Then you could load a module and spit it back out as a shared library. While walking the heap, we have all the information we need to emit the right relocs. The code is certainly machine- and ABI-dependent.

Does this have any advantage over the freezer? The freezer writes out initialized C data structures, which you can then compile into shared libraries; what's wrong with that? I guess the C compiler isn't involved, which is one more variable eliminated. But freezing is much more portable.

Sound/sample processing library

I'd love to use Guile to munge sound samples, try out various effects, etc. The perfect complement to a sound-processing library would be a nice interface to the audio hardware. Look into the Enlightened Sound Daemon (esd), and audiofile, a library that understands how to load a variety of different audio file formats. (Tom Tromey pointed me at esd and audiofile.)

Bill Schottstaedt writes:

I've written a C library of basic audio functions (support for hardware, headers, data types, etc) called sndlib which is already tied into Guile in my sound editor. And a ton of sound-processing functions can be found in clm (a Common Lisp and C based Music V implementation).

If there's interest, I could package all this up in some pretty way; or perhaps help anyone else who wants to do something along these lines. As I understand the copyright issue (I'm no lawyer), Stanford owns the copyright since I'm doing this work as their employee; they, however, have said the software author can decide distribution policy (or whatever the word is), so I placed the sound editor under the GPL, and have always made all the code freeware available via anonymous ftp.

Database engine interfaces

This is pretty important for web scripting, but also just a generally handy thing.

Perl has DBI, which I think we should use a model; here's the description from CPAN:

The Database Interface. The Perl DBI initiative has standardized the interface to a number of commercial database engines, so that you can move from, say, Oracle to Sybase with a minimum of effort. You'll find DBD::DB2, DBD::Informix, DBD::Oracle, DBD::QBase, DBD::Sybase, DBD::MySQL, and DBD:mSQL inside the DBD module set.

The job here is to discern the important ideas in DBI's design, figure out the nicest way to transpose those into Scheme, write that up, and provide at least one implementation, so that people can write database drivers that meet the spec, and use the implementation as a reference.

Database manager interfaces

These would be interfaces to Berkeley DB, GDBM, NDBM, DBM, etc.

I think the difference between a ``database engine'' and a ``database manager'' is that the former usually supports SQL, and handles multiple readers and writers, whereas the latter just implements the data structure on disk, and leaves questions of synchronization and how to actually arrange data usefully in the hands of the caller.

Just as discussed in the ``Database engine interface'' entry elsewhere in this list, we want a common interface to these libraries, so people can write code that will operate with any file format.

Query languages

Logic programming languages make great database query languages --- much nicer and more consistent than SQL. It would be really cool if someone could look into the work Richard Salter and Chris Haynes did on embedding backtracking and unification in Scheme, or Per Bothner's similar work, and then use that to produce the best database front end ever.

Of course, this should ride on the

GIMP interface

Peter Mattis did do a Guile/GIMP interface at one point, but we're not the standard there; they're still using SIOD. Someone who uses both Guile and the GIMP needs to pick this up, turn it into a package people can use, and do whatever's needed to get the GIMP people to accept it as their standard. If there are critical Guile changes necessary (Make it smaller? Can do!), I want to help with that.

Basically, I think all we need here is a module that exports the GIMP's functions. Should be real easy.

C parser

It would be cool to hook up a full parser for ANSI C to Guile. Then we could use it to parse header files and generate Guile interfaces, scan Guile code for missing argument typechecks, cases where we need to call scm_return_first or scm_remember, unbalanced calls to SCM_DEFER_INTS. We could even switch Guile to an explicit-marking GC, and then use the checker to catch errors in the explicit marking.

Tom Lord did this for Systas; perhaps his work could be ported back to Guile.

For way extra points, implement a C++ parser. Ouch.

A FastCGI interface

It would be nice to have Guile implement the FastCGI interface, turning each HTTP request into a function call, with arguments and environment broken out.

ABI-conformant packed data structure interface

It would be nice if Guile could manipulate arbitrary C data structures. Basically, you'd take the C declaration for a structure, figure out its layout at the bit/byte offset level (given a particular ABI), and generate a bunch of accessors, and a new opaque type for pointers to that structure.

We're not going for type safety here --- there should be an operation to turn these opaque values into integers and back, or to treat the data in a string as one of them. But it would be nice to provide some error checking.

If the ABI were something you could choose at run-time, that could make Guile a powerful system for doing cross-platform munging. "Sure, I know exactly how a 6-bit field would be laid out on the i960!". Well, okay --- maybe that's not thrilling. But I still think it would be cool.

Henry Spencer's regexp matcher

I'd like to see someone package up Henry Spencer's latest regexp engine (which supports Unicode's UTF-8 encoding!) as a dynamically linked module for Guile. Actually, I'd like to incorporate this into the Guile core. Our present code just uses whatever regexp engine is in the system's C library, which is sometimes pretty pathetic.

HTTP routines

It would be cool if people could use Guile to implement web robots and the like. Tim Pierce started to work on this, but it's not finished.

General URL functions

This would be a set of functions for just retriving any URL. I think the WWW Consortium has a library which implements everything.

SGML and DSSSL

As long as we're supporting multiple languages, why not DSSSL? Craig Brozefsky <craig@red-bean.com> has already attached an SGML language parser to Guile.

Mailbox support

Guile should have routines for doing full RFC 822 parsing, MIME encoding, etc., to help people writing mailbots.

Emacs-like buffers, for file handling.

Everyone is used to the sed/awk/perl model of file processing --- you munge a line at a time, and maintain state to handle multi-line things. That's just the way it's done.

But actually, there's a totally different system which works a lot better for some applications, exemplified by Emacs. Emacs lisp gives you buffers with very fast search, insert, and delete operations. You don't have to process the data in any order; there are no line boundaries to obscure the semantic structure of the content, if the file isn't really line-structured; and so on.

So basically, this idea is, "implement Emacs buffers for Guile, with all the searching, editing, and I/O facilities, but none of the redisplay support."

(Greg) Initial code implementing these is available here It's not complete, but could be used to build something bigger and better (I intend to go back and reimplement these with goops classes, to allow for more flexible usage).

Cool I/O ports

Guile should be able to talk to compression libraries. You should be able to hand an ordinary output port to a function, and have it give you a new port, where data written to the new port gets written compressed to the original port. And the reverse for uncompression and input ports.

The same principle applies to any kind of stream transformation:

encryption
uu/base64 encoding, or error correcting codes
Unicode/JIS/ISO-8859 conversion
CR/LF vs. LF conversion
telnet (for implement FTP)
line and column number counting (the port just passes data through unchanged, but counts the number of characters and newlines, and has extra functions that let you read and set the counters).

Guile's port implementation already has the infrastructure needed to implement ports that do arbitrary things with their streams (see the scm_ptobfuns structure). It's just waiting to be used.

SSLeay

The quintessential example of the above. SSLeay is a library that implements the Secure Socket Layer protocol, the foundation of secure http. It's basically a generic authentication and privacy layer for network connections. I think PRMS uses it too.

Adobe Document Structuring Conventions parser

Make it trivial to write psnup, and such. Make it trivial to produce output that conforms to the DSC.

Functional PostScript

Imagine taking the primitives of PostScript and providing them in a more functional-language kind of way. That's what Olin Shivers' group has done.

FPS is a portable system for doing device-independent, resolution- independent graphics from Scheme programs. It is PostScript, with the Forth computational engine replaced with Scheme. At present, it runs on SCSH.

Occam-like thread control structures

The Occam language, designed for the INMOS transputer, made parallalism as concise to use as `let'. It was a much nicer way of thinking about threads, I think. It would be cool if someone implemented the interesting Occam features in Guile:

channels: These are one-deep message queues, with the right blocking behavior; Guile's channels would carry objects.
SEQ: Well, this is just begin.
PAR: Like begin, but execute all the subexpressions in parallel.

And so on.

RenderMan interface

It would be nice to be able to generate RenderMan scene description files using Guile code.

Translators for other languages

Here are some languages I'd love to see translated into Guile.

Items are in most-recently-added-or-modified order.

Tcl 8.0: Tcl's syntax is remarkably simple for a language its age; I quite like it. Especially with the semantic cleanups made for Tcl 8.0, we should be able to do a good integration here.
Ian Bicking <bickiia@earlham.edu> has done some work on a translator. The first cut was an interpreter, partially evaluated using Similix to produce a compiler; the latest version has been rewritten by hand.
PHP: The simple server-side scripting language.
MetaHTML: A more complex server-side scripting system, but a new language, so still probably clean enough to tackle.
Python: Quite a pretty language, clean in both syntax and semantics. ("Pure in body and mind.") Datatypes seem very friendly with Scheme's, so it should be possible to do a very satisfying integration here.
Emacs Lisp: An interesting challenge. I've got a decent solution for reconciling the nil/()/#f issue, which this translator should use. Mikael Djurfeldt has some ideas about this too.
Perl: This is a herculean task, because Perl's syntax and semantics are so complicated. Hats off to whoever even tries this.

Changes to the Guile Core

Here are some improvements I'd like to see made to the core Guile interpreter.

Internationalization/Multilingualization

Torbj�rn Gannholm has kindly offered to work on this.

I just got back from a conference in Japan about multilingual information processing in free software, organized by the MULE folks. While there I put together a pretty clear idea of how I want Guile to work:

Guile should have a separate "byte array" and "string" types. Probably the byvect stuff in unif.c is a decent "byte array" type already, but we may need to beef up support for it.
All characters should be Unicode characters, and all strings are strings of Unicode characters. (We'll need a read syntax for Unicode characters; I think Marc Feely has a proposal for this.)
I/O ports should know what encoding they're reading or writing (ISO Latin-1 by default), and do the appropriate translations. Some structure that allows us to layer translators on top of raw ports might be nice, to decouple the character set support from the I/O source support.
Internally, strings should be represented as UTF-8 encoded strings; this is the representation that C code linked with Guile will see, and operate on. Guile should provide convenient functions to ease the complexity of handling these.
However, Scheme code will still index strings by character, not byte. The expression
```
(string-ref s 1)
```
will always return the second character of s, even if the first character is several bytes long. The fact that the elements of the string are of varying width will be concealed from Scheme code.
This means that string-ref and string-set! will no longer be constant time operations. Oh well; people usually manipulate strings using searching, substring extraction, and concatenation anyway; the complexity of those operations is unaffected.
Each string object should record the byte position of the first non-single-byte character, so we can still index strings containing only fixed-width characters (ASCII) in constant time.

This isn't perfect, but here's my rationale:

My driving concern is what I'll call the "pass-through" problem. In the process of carrying out a user's request, each piece of data will pass through many different modules. For example, data might be read from an I/O port, stored in a database, and then retrieved from that database and displayed by a GUI toolkit. If any module fails to handle multilingual data correctly, the user will experience the overall system as non-multilingual.

This means that it's not enough to merely have most modules handle multilingual text correctly. They must all do it, if we are to earn the user's trust. We will need to police Guile modules carefully, put pressure on the authors of non-multilingual modules, support them with plenty of helpful routines and documentation, mark entries in the public module archive as "multilingual-safe", and so on.

But if we're to impose this burden on developers, it must be a reasonable burden. We don't actually have any authority; we rely on their good will. If we require them to become experts on every character set and encoding on the planet, that's too much; they simply won't bother. If we require developers to do too much, they will do very little.

The proposal above presents the developer with a single character set, whose semantics are clearly documented. Developers working in C must also cope with a variable-width encoding, which does complicate code, but it has some nice properties that ameliorate the complexity somewhat.

It has been suggested that Guile simply use Unicode encoded in sixteen-bit characters throughout, as Java does. However, this isn't viable; the 16-bit space for Unicode is almost full, and the Taiwanese have a ton of characters they want code points for. If you represent them using the Unicode ``surrogate characters'', then you've got a variable-width encoding again; you might as well use UTF-8 and save memory, since you're not saving any complexity.

It has also been suggested that Guile use two or three different string representations, with eight, sixteen, or thirty-two bits per character. Guile could automatically select the most dense representation capable of holding the data at hand. However, this would require everyone working in C to write out three copies of their string-processing code. Each copy would be simpler than the code for handling UTF-8, because it would be working with a fixed-width encoding, but it's my sense that a single UTF-8 loop is less hair than three fixed-width loops.

So, I'd love to have routines to convert text between Unicode and all the various local encodings --- the JIS standards, BIG5, ISO 8859, and so on. Guile should try to use the gconv interface in the GNU C library, then iconv, and then whatever else is available.

Henry Spencer's latest regexp engine handles UTF-8, but as of this writing, it hadn't been optimized yet.

Guile also needs some kind of gettext interface. We could add a new syntax for translatable strings like

#"This is a translatable string."

Use GMP for bignum arithmetic

At the moment, Guile uses a homebrew bignum implementation. GMP, Torbjorn Granlund's multi-precision arithmetic package, is faster. Guile should be changed to use GMP if it is installed, and omit bignum support if GMP is absent.

It would be nice to implement rational numbers as part of this.

R5RS support

Right now Guile only supports R4RS. We should be able to use the module system to let the user choose which dialect they want to use on a module-by-module basis. I think this should be tied in with Chris Hanson's work on a byte compiler.

Custom buffered I/O

Guile has several different kinds of I/O ports. Those that talk to the outside world are implemented on top of the ubiquitous C standard I/O FILE buffered streams. This leads to a few problems:

We have to use fgets for speed, but it's difficult to handle lines containing null characters, given fgets's interface. So we use ftell to find out how much we've read with fgets. But that doesn't work on sockets. So on sockets we fall back to our old, slow routine based on getc.
We have to use unbuffered input sockets, because standard I/O streams only promise you one buffer, so you can't mix read and write operations, to implement a network protocol, say. You have to do an fflush between a write and a read, and an fseek or something equivalent between a read and a write. This is stupid.
There's no way to tell whether there's input immediately available on a buffered stream. You can get the underlying file descriptor and do fcntl magic on that, but that won't tell you whether there are characters waiting in the buffer.

You get the idea. The thing is, every Unix system has read, write, and seek, and I don't know of any system that doesn't have select. So we could actually implement our own buffering port implementation, and address all these problems.

We'd actually do better, because people have had some good ideas since standard I/O was implemented. For example, we could follow the lead of the libio library, and expose the buffer directly to the consumer, thus avoiding some copies. We could run regexp matches directly in the buffer. We could implement our own definition of line boundaries with little penalty. Each port could have a magic writable shared substring object that gave Scheme code direct access to the "current line", with no copies. (Well, maybe that's not such a hot idea. But that's what Perl does.)

Having our own buffered stream implementation would also allow us to start acquiring cool optimizations strictly below the interface. The presence of the interface would protect Guile from whatever weird system-specific stuff we wanted to do for speed.

(Greg) Gary Houston has been working on this; patches against cvs guile (for the brave) are available at Gary's web page.

Fast string implementations

Guile should implement the substring operation by sharing memory with the original string, and using copy-on-write (COW) to preserve the right semantics in case something gets side-effected. The garbage collector would need to have some policy for throwing away large substrings referenced only by small substrings, perhaps by copying strings if they are small enough.

Once we had the copy-on-write system in place, people wouldn't need to use explicitly shared substrings for efficiency any more, and we'd be able to make explicitly shared substrings writable. Which is a useful thing if you're doing a lot of string munging.

This could be easily combined with a hack for a fast string-append. Have strings carry a bit that says, "I was constructed by string-append." If string-append notices that its first argument has this bit set, then the user is probably in a loop building up a long string by appending a bunch of strings, in a pattern like (string-append (string-append (string-append ...) ...) ...). In this case, it could allocate extra space beyond the end of the string in anticipation of the next string-append. The next append would notice this extra space, copy the tail into it. The original string becomes a COW-shared substring of the result. Combined with buffer-doubling, this makes building up strings linear in time and consumed storage, instead of quadratic.

The same trick should be applied in reverse: if string-append notices that its last argument has this bit set, then the user is doing (string-append ... (string-append ... (string-append ...))), and it should allocate extra space before the beginning of the string.

Generational garbage collection

(Greg Harvey has started taking a shot at this; ((Greg) check out my personal Guile page, for news, notes, and code).

At the moment, anyone who profiles the Guile interpreter notices that it's spending a lot of its time in gc_mark. This is not too surprising. I'd like Guile to have a conservative generational collector. The hard parts here are the write barrier, and managing conservatism. I've put some of my ideas for dealing with conservatism here.

The usual way to keep track of an object's generation is to keep each generation in a separate region of memory, and then check the object's address to see which generation it's in. To age an object, you copy it from from a younger generation to an older generation, and update all pointers to that object. There's nothing magic about this; you could just as well have a field in each object saying what generation it's in. However, using address ranges saves space; you don't need that extra field per object.

Unfortunately, when you're using a conservative collector like Guile's, you can't move an object that's pointed to by the stack. You have no idea whether any given word on the stack is actually a pointer, or just some integer, or a piece of a string, so you can't fix up the pointer after you've moved the object. Which complicates the collector a bit, if you want to copy objects.

One approach would be to assume that the stack doesn't contain pointers to too many objects, so you could just leave those there. After all, aging just affects performance; it's not necessary for correctness. I think this is a variant of Joel Bartlett's ``Mostly-copying Collector'' idea. Guile uses a free list to manage its storage, so having a few old objects sticking around (is there some nice concise derogatory term for people who have failed a grade in school and are sticking around for another year?) doesn't affect the allocation strategy at all.

(Greg) This assumption is a pretty safe one; the number of cells traced conservatively is generally a very small fraction of the actual number of cells traced. However, it's worth mentioning that Bartlett's method is patented (don't get me started), so we have to be a little careful about the gc we end up with.

Anyway, anyone considering this project should check out Paul Wilson's survey papers on garbage collection.

Miscellanea

Here are projects that don't fall into the above categories.

GDB support for debugging Guile-using C code

GDB actually has some Scheme support in there; we should teach it how to print Scheme values, how to print interpreter frames, and so on.

This has been done in the past with a mixed GDB/Guile solution, but I think it would be more robust to actually put everything in GDB.

Negotiate design with the GDB group, so it can be merged into Cygnus's sources.

New Guile logo

I'd like to have a Guile logo which is the word ``Guile'' outfitted with some eyebrows and pupils looking schemeing (ahem) and crafty.

2 Aug 2000 spacey