People occasionally ask if there's anything they can work on in Guile. In the past, I've just been dumb and written up a separate list of ideas in response to each message; so I forget things, I waste a lot of time retyping, and ideas that occur to me when nobody's asking me disappear into the ether. So I've decided to write them down.
Also, these are just the things that have come to my head over the past few months. Each idea kind of argues for its own worldview, but I don't actually claim that they're well-developed, or that the functional boundaries are right, or anything. As a friend of mine used to say, in a similar spirit: ``Why not take two --- they're small!''
This is only meant as a source of ideas, not as a statement of what I think is important, where I think Guile should go, or what I'm willing to incorporate into some ``official'' distribution of something. People should hack on what pleases them. These are basically the things I wish I had time to do.
I hope this is helpful!
--- Jim Blandy
I've added a little here, mostly pointing to some additional information on various bits, and removing some of the bits that've been finished and added to guile, like guardians
--- Greg Harvey
If you've got your own list of Guile modules or features you'd like to see, we'll put a link to it here.
Here are some things I'd like to see people write and distribute as separate Guile modules.
Items are in most-recently-added-or-modified order.
PHP has become really popular, but there's really no such thing as a little language. I bet we'll see the same pattern with that that we've seen with Perl and Tcl --- the interpreter will start out small and fast, but then as people use it for more serious work, it'll start acquiring real control structures, real functions, more datatypes, ... until it'll just be another poorly-implemented lisp with studly libraries. That's basically what Tcl 8.0 is.
I don't know for sure, but I suspect that MHTML invented their own language too. If so, the same rules apply.
We should do it right, by taking the features of PHP and MHTML and giving them to Guile. Someone who does web application development should do this, so it'll get done right. I don't do that much web hacking myself, so I'd get the details all wrong.
Note that it's really easy to hack something up that works here and there. This project will only have a significant impact if it's an industrial-strength job.
See Olin Shivers' ``net.tar.gz''.
Ideally, of course, this would be combined with a Mesa widget for GTK+. :)
This is kind of stupid, but one thing you could use this for is an origami program. I'd like to be able to see the relationship between the folded and unfolded sheets. This is one of those things that's not worth coding in C, but would be fun to whip up in an interpreted language.
I have in mind something like this:
guile> (list 1 2 3) $1 => (1 2 3) guile> (cons 0 $1) $2 => (0 1 2 3) guile> (set-car! (cdr $2) 'yowza) $3 => yowza guile> $1 $4 => (yowza 2 3)
(Bengt Kleberg recommended the ``$1 => mumble'' syntax, on the grounds that it resembles the notation RnRS uses to show what an expression evaluates to. And indeed, $1 will evaluate to mumble.)
The current repl doesn't print the vale of expressions returning #<unspecified>, and it shouldn't assign such values to value history variables either.
It occurs to me that it might be convenient to actually have `find' accept a list of procedures: (find DIRECTORY PROC ...), and for each file found, apply each PROC successively, until one returns a false value. That way, you could do things like this:
(find "." (glob ".deps") directory? (lambda (d) ... now do something with the .deps directory d ...))I'm imagining `glob' to be a function that accepts a filename wildcard pattern and returns a predicate that likes filenames that match the wildcard.
Maciej Stachowiak and others have suggested, as a convention, that any function which accepts a predicate as an argument, and applies that predicate to filenames, should also accept a string as that argument, and treat that string as a wildcard pattern. Thus, the code above would become:
(find "." ".deps" directory? (lambda (d) ...))Now we're starting to get friendly.
Would it be possible to write a function which walked data structures in the Scheme heap, and wrote out, directly from Scheme, a valid ELF shared library? Then you could load a module and spit it back out as a shared library. While walking the heap, we have all the information we need to emit the right relocs. The code is certainly machine- and ABI-dependent.
Does this have any advantage over the freezer? The freezer writes out initialized C data structures, which you can then compile into shared libraries; what's wrong with that? I guess the C compiler isn't involved, which is one more variable eliminated. But freezing is much more portable.
Bill Schottstaedt writes:
I've written a C library of basic audio functions (support for hardware, headers, data types, etc) called sndlib which is already tied into Guile in my sound editor. And a ton of sound-processing functions can be found in clm (a Common Lisp and C based Music V implementation).
If there's interest, I could package all this up in some pretty way; or perhaps help anyone else who wants to do something along these lines. As I understand the copyright issue (I'm no lawyer), Stanford owns the copyright since I'm doing this work as their employee; they, however, have said the software author can decide distribution policy (or whatever the word is), so I placed the sound editor under the GPL, and have always made all the code freeware available via anonymous ftp.
Perl has DBI, which I think we should use a model; here's the description from CPAN:
The Database Interface. The Perl DBI initiative has standardized the interface to a number of commercial database engines, so that you can move from, say, Oracle to Sybase with a minimum of effort. You'll find DBD::DB2, DBD::Informix, DBD::Oracle, DBD::QBase, DBD::Sybase, DBD::MySQL, and DBD:mSQL inside the DBD module set.
The job here is to discern the important ideas in DBI's design, figure out the nicest way to transpose those into Scheme, write that up, and provide at least one implementation, so that people can write database drivers that meet the spec, and use the implementation as a reference.
I think the difference between a ``database engine'' and a ``database manager'' is that the former usually supports SQL, and handles multiple readers and writers, whereas the latter just implements the data structure on disk, and leaves questions of synchronization and how to actually arrange data usefully in the hands of the caller.
Just as discussed in the ``Database engine interface'' entry elsewhere in this list, we want a common interface to these libraries, so people can write code that will operate with any file format.
Of course, this should ride on the
Basically, I think all we need here is a module that exports the GIMP's functions. Should be real easy.
Tom Lord did this for Systas; perhaps his work could be ported back to Guile.
For way extra points, implement a C++ parser. Ouch.
We're not going for type safety here --- there should be an operation to turn these opaque values into integers and back, or to treat the data in a string as one of them. But it would be nice to provide some error checking.
If the ABI were something you could choose at run-time, that could make Guile a powerful system for doing cross-platform munging. "Sure, I know exactly how a 6-bit field would be laid out on the i960!". Well, okay --- maybe that's not thrilling. But I still think it would be cool.
But actually, there's a totally different system which works a lot better for some applications, exemplified by Emacs. Emacs lisp gives you buffers with very fast search, insert, and delete operations. You don't have to process the data in any order; there are no line boundaries to obscure the semantic structure of the content, if the file isn't really line-structured; and so on.
So basically, this idea is, "implement Emacs buffers for Guile, with all the searching, editing, and I/O facilities, but none of the redisplay support."
(Greg) Initial code implementing these is available here It's not complete, but could be used to build something bigger and better (I intend to go back and reimplement these with goops classes, to allow for more flexible usage).
The same principle applies to any kind of stream transformation:
Guile's port implementation already has the infrastructure needed to implement ports that do arbitrary things with their streams (see the scm_ptobfuns structure). It's just waiting to be used.
FPS is a portable system for doing device-independent, resolution- independent graphics from Scheme programs. It is PostScript, with the Forth computational engine replaced with Scheme. At present, it runs on SCSH.
Here are some languages I'd love to see translated into Guile.
Items are in most-recently-added-or-modified order.
Ian Bicking <bickiia@earlham.edu> has done some work on a translator. The first cut was an interpreter, partially evaluated using Similix to produce a compiler; the latest version has been rewritten by hand.
Torbjörn Gannholm has kindly offered to work on this.
I just got back from a conference in Japan about multilingual information processing in free software, organized by the MULE folks. While there I put together a pretty clear idea of how I want Guile to work:
(string-ref s 1)will always return the second character of s, even if the first character is several bytes long. The fact that the elements of the string are of varying width will be concealed from Scheme code.
This means that string-ref and string-set! will no longer be constant time operations. Oh well; people usually manipulate strings using searching, substring extraction, and concatenation anyway; the complexity of those operations is unaffected.
This isn't perfect, but here's my rationale:
My driving concern is what I'll call the "pass-through" problem. In the process of carrying out a user's request, each piece of data will pass through many different modules. For example, data might be read from an I/O port, stored in a database, and then retrieved from that database and displayed by a GUI toolkit. If any module fails to handle multilingual data correctly, the user will experience the overall system as non-multilingual.
This means that it's not enough to merely have most modules handle multilingual text correctly. They must all do it, if we are to earn the user's trust. We will need to police Guile modules carefully, put pressure on the authors of non-multilingual modules, support them with plenty of helpful routines and documentation, mark entries in the public module archive as "multilingual-safe", and so on.
But if we're to impose this burden on developers, it must be a reasonable burden. We don't actually have any authority; we rely on their good will. If we require them to become experts on every character set and encoding on the planet, that's too much; they simply won't bother. If we require developers to do too much, they will do very little.
The proposal above presents the developer with a single character set, whose semantics are clearly documented. Developers working in C must also cope with a variable-width encoding, which does complicate code, but it has some nice properties that ameliorate the complexity somewhat.
It has been suggested that Guile simply use Unicode encoded in sixteen-bit characters throughout, as Java does. However, this isn't viable; the 16-bit space for Unicode is almost full, and the Taiwanese have a ton of characters they want code points for. If you represent them using the Unicode ``surrogate characters'', then you've got a variable-width encoding again; you might as well use UTF-8 and save memory, since you're not saving any complexity.
It has also been suggested that Guile use two or three different string representations, with eight, sixteen, or thirty-two bits per character. Guile could automatically select the most dense representation capable of holding the data at hand. However, this would require everyone working in C to write out three copies of their string-processing code. Each copy would be simpler than the code for handling UTF-8, because it would be working with a fixed-width encoding, but it's my sense that a single UTF-8 loop is less hair than three fixed-width loops.
So, I'd love to have routines to convert text between Unicode and all the various local encodings --- the JIS standards, BIG5, ISO 8859, and so on. Guile should try to use the gconv interface in the GNU C library, then iconv, and then whatever else is available.
Henry Spencer's latest regexp engine handles UTF-8, but as of this writing, it hadn't been optimized yet.
Guile also needs some kind of gettext interface. We could add a new syntax for translatable strings like
#"This is a translatable string."
It would be nice to implement rational numbers as part of this.
You get the idea. The thing is, every Unix system has read, write, and seek, and I don't know of any system that doesn't have select. So we could actually implement our own buffering port implementation, and address all these problems.
We'd actually do better, because people have had some good ideas since standard I/O was implemented. For example, we could follow the lead of the libio library, and expose the buffer directly to the consumer, thus avoiding some copies. We could run regexp matches directly in the buffer. We could implement our own definition of line boundaries with little penalty. Each port could have a magic writable shared substring object that gave Scheme code direct access to the "current line", with no copies. (Well, maybe that's not such a hot idea. But that's what Perl does.)
Having our own buffered stream implementation would also allow us to start acquiring cool optimizations strictly below the interface. The presence of the interface would protect Guile from whatever weird system-specific stuff we wanted to do for speed.
(Greg) Gary Houston has been working on this; patches against cvs guile (for the brave) are available at Gary's web page.
Once we had the copy-on-write system in place, people wouldn't need to use explicitly shared substrings for efficiency any more, and we'd be able to make explicitly shared substrings writable. Which is a useful thing if you're doing a lot of string munging.
This could be easily combined with a hack for a fast string-append. Have strings carry a bit that says, "I was constructed by string-append." If string-append notices that its first argument has this bit set, then the user is probably in a loop building up a long string by appending a bunch of strings, in a pattern like (string-append (string-append (string-append ...) ...) ...). In this case, it could allocate extra space beyond the end of the string in anticipation of the next string-append. The next append would notice this extra space, copy the tail into it. The original string becomes a COW-shared substring of the result. Combined with buffer-doubling, this makes building up strings linear in time and consumed storage, instead of quadratic.
The same trick should be applied in reverse: if string-append notices that its last argument has this bit set, then the user is doing (string-append ... (string-append ... (string-append ...))), and it should allocate extra space before the beginning of the string.
At the moment, anyone who profiles the Guile interpreter notices that it's spending a lot of its time in gc_mark. This is not too surprising. I'd like Guile to have a conservative generational collector. The hard parts here are the write barrier, and managing conservatism. I've put some of my ideas for dealing with conservatism here.
The usual way to keep track of an object's generation is to keep each generation in a separate region of memory, and then check the object's address to see which generation it's in. To age an object, you copy it from from a younger generation to an older generation, and update all pointers to that object. There's nothing magic about this; you could just as well have a field in each object saying what generation it's in. However, using address ranges saves space; you don't need that extra field per object.
Unfortunately, when you're using a conservative collector like Guile's, you can't move an object that's pointed to by the stack. You have no idea whether any given word on the stack is actually a pointer, or just some integer, or a piece of a string, so you can't fix up the pointer after you've moved the object. Which complicates the collector a bit, if you want to copy objects.
One approach would be to assume that the stack doesn't contain pointers to too many objects, so you could just leave those there. After all, aging just affects performance; it's not necessary for correctness. I think this is a variant of Joel Bartlett's ``Mostly-copying Collector'' idea. Guile uses a free list to manage its storage, so having a few old objects sticking around (is there some nice concise derogatory term for people who have failed a grade in school and are sticking around for another year?) doesn't affect the allocation strategy at all.
(Greg) This assumption is a pretty safe one; the number of cells traced conservatively is generally a very small fraction of the actual number of cells traced. However, it's worth mentioning that Bartlett's method is patented (don't get me started), so we have to be a little careful about the gc we end up with.
Anyway, anyone considering this project should check out Paul Wilson's survey papers on garbage collection.
Here are projects that don't fall into the above categories.
This has been done in the past with a mixed GDB/Guile solution, but I think it would be more robust to actually put everything in GDB.
Negotiate design with the GDB group, so it can be merged into Cygnus's sources.
2 Aug 2000 spacey