Google Summer of Code suggestions

Google's Summer of Code, an absolutely wonderful program, pays students to work on open source projects over their summer vacations. Here are some ideas on how it can improve over last year.

Last year's Summer of Code (http://code.google.com/soc/) produced some great (and not so great) contributions to several projects I care about, namely mono (http://www.mono-project.com) and gcc (http://gcc.gnu.org). These contributions were mostly feature-oriented, though, which is not really what either of these projects needs at this point, in my opinion.

I love the mono project. I really do. My product BugScan was based on Mono and shipped on an appliance with a self-booting Linux CD, increasing our margins per unit sold noticeably. My current employer, imeem (http://www.imeem.com), uses mono to share a great deal of code between their Windows and MacOS clients as well as their backend Linux servers. Mono has done a pretty good job of building a good testing framework for their JIT and compiler and, until recently, of adding tests and running them regularly. Lately, though, feature additions and optimizations have become more aggressive while testing has become more lax. As a result, it is extremely difficult to gauge whether a given mono release will be a step forward or backward in functionality. I recently started running the tests myself after each update from their SVN because it became obvious that this was not being done. Right now I have a bug open where one of the basic JIT tests is failing (http://bugzilla.ximian.com/show_bug.cgi?id=78035). Another bug was causing crashes while running mono's own System.Windows.Forms unit tests (http://bugzilla.ximian.com/show_bug.cgi?id=77944).

To their credit, these memory corruption crashes will vary from machine to machine, since memory layout depends on the glibc version, gcc version, optimizations used, whether mono was invoked with --debug, etc. Sometimes it won't crash. Even so, you can always see the problem by using valgrind (http://valgrind.org) on mono. valgrind usually doesn't get along with libgc, which mono uses for garbage collection, but there are instructions for how to do this with mono on mono's own web page (http://www.mono-project.com/Debugging#Using_Valgrind_on_Mono).

I think the primary SoC projects for mono should be the following:

-fix mono's C# compiler so that it doesn't have false warnings on the compiler's own code

-set up continuous build servers for mono running on all of mono's supported platforms that do a full, clean build, and run 'make check-all'. If the build or tests fail, a bot sends a message to the mono IRC channels and to a mailing list with the commit messages since the last building/passing checkout.

-make check measures code coverage of managed tests and fails if it falls below a certain watermark

-make check-all runs tests under valgrind and fails if unexpected output is discovered. (also, an SoC project to eliminate all valgrind false-positives on several projects would be good -- see below).

-eliminate all compiler warnings in mono, mcs, and gmcs

-make all tests run and pass for mono, mini, mcs, gmcs, and all the class libraries during 'make check-all'

-extend Mono.Cecil to do basic reading/writing of the PDB symbol format to allow stack traces with file/line numbers to be output from MS.NET-compiled applications that are deployed onto mono

-extend Mono.Cecil and Gendarme to allow exclusions of Rules and do inter-function dataflow tracking so as to find SQL Injection (and XSS) bugs
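The two gating items in the list above (a coverage watermark for 'make check' and a valgrind check for 'make check-all') could look roughly like this. This is only a sketch under my own assumptions, not how mono's makefiles work: the 70% watermark and all of the function names are made up for illustration. The one real detail it leans on is the "ERROR SUMMARY: N errors from M contexts" line that valgrind prints at the end of a run.

```python
import re

# Hypothetical watermark; the list above does not name a number.
COVERAGE_WATERMARK = 70.0

# Matches valgrind's closing summary line, tolerating digit separators.
_SUMMARY = re.compile(r"ERROR SUMMARY: ([\d,]+) errors? from")


def coverage_ok(covered_lines, total_lines, watermark=COVERAGE_WATERMARK):
    """Return True if the covered percentage meets the watermark."""
    if total_lines == 0:
        return False  # nothing measured counts as a failure
    return 100.0 * covered_lines / total_lines >= watermark


def valgrind_error_count(log_text):
    """Sum the error counts from every ERROR SUMMARY line in a log.

    Returns None when no summary line is present, so a truncated or
    missing log can be treated as a failure by the caller.
    """
    counts = [int(m.group(1).replace(",", ""))
              for m in _SUMMARY.finditer(log_text)]
    return sum(counts) if counts else None


def check_passes(covered_lines, total_lines, valgrind_log):
    """Combine both gates; a build step would exit non-zero on False."""
    errors = valgrind_error_count(valgrind_log)
    return coverage_ok(covered_lines, total_lines) and errors == 0
```

A missing summary line deliberately fails the gate, since a test run that crashed before valgrind could report is exactly the case that should not pass silently.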

Of course, the mono folks have their own suggestions about SoC projects (http://www.mono-project.com/StudentProjects). In my opinion, most of them are bogus as they continue to build upon a foundation that is not stable. In addition, they focus on "porting" SharpDevelop code (http://www.icsharpcode.net) to mono rather than improving SharpDevelop itself. (This is a historical pissing match, for the most part, in which all end-users like myself lose.) For mono to be able to be used as the foundation of any product or deployment strategy, this must be fixed.

That being said, there are a few of these projects that would be very useful to that end. 5.3 (http://www.mono-project.com/StudentProjects#Binary_file_writer_for_AOT) could be a great extension to Mono.Cecil to begin diversifying its functionality a bit. In that same vein, 7.3 and 7.4 (http://www.mono-project.com/StudentProjects#API_problem_finder) are also a necessity for keeping mono's quality steady. (Assuming they run it in the automated build and fail the build if anything unexpected is reported, that is.)

8.1 (http://www.mono-project.com/StudentProjects#Windows_Forms_2.0_controls) is going to be important after these quality stabilizing things, specifically the WebControl, since many applications host IE via a COM wrapper and that makes them non-portable to mono on non-Windows platforms. 8.3 (http://www.mono-project.com/StudentProjects#Windows.Forms_Designer) is necessary for SharpDevelop to be able to run under mono, assuming it actually needs to be reimplemented rather than just re-using the existing component. 5.1 (http://www.mono-project.com/StudentProjects#Elimination_of_redundant_checks_in_JITted_code) is a good set of optimizations, but speed doesn't matter if the generated code isn't correct in the first place. The existing bugs that I have found (mentioned above) should've been found by 'make check'. They weren't, and that problem needs to be solved before any other enhancements are made. If the JIT produces bad code, and there is no way to discover this except for users to complain, then mono is quite simply fucked.

Onto GCC. I think Mark Mitchell does a fucking *amazing* job herding the GCC releases. Releases come out in a timely fashion, with new features and decent quality/stability, and are generally competitive with any commercial compiler you can mention. (Dr. Dobb's Journal (http://www.ddj.com) does a comparison once a year.) The problem is, once again, lots of new features without a focus on fixing regressions. The good news is that they find out about a great many regressions, but just don't have the time/resources to fix them. Here is a typical status message from Mark for a given release: http://gcc.gnu.org/ml/gcc/2006-02/msg00585.html .

He makes the very hard decisions about which regressions are critical and must be fixed before a release, while still getting out the release in a timely fashion. For Summer of Code, I *strongly* suggest that any new help be in the area of fixing these known regressions and adding automated tests for them. It would also be really great if someone could set up a continuous build for gcc, bootstrapping on several supported platforms every 10 minutes or so if there are changes to compile and test. Then, if the bootstrap or tests fail, it would send an email with the authors and commit messages of every commit since the last passing build.
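To make the continuous build idea concrete, here is a rough sketch of a single polling cycle. Everything in it is an assumption on my part rather than anything the gcc project runs: the function names are invented, the two make command lines are placeholders for whatever a real bootstrap and test run need on a given platform, and fetching commits from version control and sending the mail are left to the caller.

```python
import subprocess

# Assumed command lines; a real bootstrap would likely need more steps.
BUILD_STEPS = [["make", "bootstrap"], ["make", "-k", "check"]]


def run_steps(steps, run=subprocess.call):
    """Run each step in order; return the first failing step, or None."""
    for step in steps:
        if run(step) != 0:
            return step
    return None


def failure_report(failed_step, commits):
    """Format the mail body: the failed step plus every commit since the
    last passing build, given as (author, message) pairs."""
    lines = ["Build failed at: " + " ".join(failed_step),
             "",
             "Commits since last passing build:"]
    lines += ["  %s: %s" % (author, message) for author, message in commits]
    return "\n".join(lines)


def build_once(pending_commits, steps=BUILD_STEPS,
               run=subprocess.call, notify=print):
    """One polling cycle over the commits not yet covered by a passing build.

    Returns the commits still unverified afterwards: empty on success,
    unchanged on failure, so the next report keeps accumulating names.
    """
    if not pending_commits:
        return []  # no changes, nothing to build
    failed = run_steps(steps, run)
    if failed is None:
        return []
    notify(failure_report(failed, pending_commits))
    return pending_commits
```

A driver would call build_once every ten minutes or so, appending whatever new commits it finds to the list returned by the previous cycle. Keeping the failed commits around is what lets the report name everyone who committed since the last passing build, not just the most recent committer.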

Another project not on the list is NUnitAsp (http://nunitasp.sf.net). This project desperately needs JavaScript (read: AJAX) support like its Java-based cousin, HttpUnit (http://httpunit.sf.net). I've mentioned this before in a prior post (http://wiki.yak.net/668). Whoever does this will be helping to commoditize an entire industry of Web UI testing apps that cost thousands of dollars *per seat*.

Lastly, I want to mention a project which is not on the SoC list: valgrind (http://valgrind.org). Valgrind is an amazing tool, and the team has an amazing set of automated tests that are run each night and the results are reported to the mailing list. When there is a bug, they usually write a failing test that goes into the automated suite. As such, I think valgrind could use some help with adding features more quickly. Specifically, it would be great if:

- valgrind had the ability to generate code coverage metrics on a binary in the same format that GCC does and gcov expects, assuming sufficient symbol information is present, etc.

- valgrind was able to use its existing infrastructure to do static analysis using abstract values/pointers in its virtual CPU and registers. For instance: test cl, cl; jnz 0xdeadbeef. If the jnz branch is taken, cl != 0, and many other things can be inferred from that (like not incorrectly taking branches that do not match previously established constraints). The second step would be to have signatures for taint-producing calls like recv() and mark the appropriate arguments' abstract values as being tainted. Then, if a tainted value goes into a "sink" in which tainted data is forbidden, a bug can be reported. Many of these concepts are well-documented in Benjamin Livshits' papers on static analysis of Java bytecode and C programs (http://suif.stanford.edu/~livshits/work.html).
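The two steps in that last item (branch constraints, then taint sources and sinks) can be shown on a toy scale. To be clear about what is assumed: this is not valgrind's intermediate representation or API. The tuple-based instruction format, the function names, and the choice of system() as the forbidden sink are all invented for the example; only recv() as a taint source comes from the text above.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class AbstractValue:
    tainted: bool = False
    nonzero: Optional[bool] = None  # None means "unknown"


# Assumed signatures: which calls produce taint and which must not get it.
TAINT_SOURCES = {"recv"}
TAINT_SINKS = {"system"}


def refine_on_jnz(state, reg, taken):
    """Model 'test reg, reg; jnz target'.

    If the branch is taken then reg != 0, otherwise reg == 0. Returns the
    refined state, or None when that contradicts an earlier constraint,
    meaning the path is infeasible and should not be explored.
    """
    value = state.get(reg, AbstractValue())
    if value.nonzero is not None and value.nonzero != taken:
        return None
    refined = dict(state)
    refined[reg] = replace(value, nonzero=taken)
    return refined


def analyze(program):
    """Walk a straight-line toy program and return a list of findings."""
    state, findings = {}, []
    for op, a, b in program:
        if op == "const":                          # a = literal b
            state[a] = AbstractValue(nonzero=(b != 0))
        elif op == "mov":                          # a = b, taint travels along
            state[a] = state.get(b, AbstractValue())
        elif op == "call" and a in TAINT_SOURCES:  # b receives data from a()
            state[b] = AbstractValue(tainted=True)
        elif op == "call" and a in TAINT_SINKS:    # b is passed to a()
            if state.get(b, AbstractValue()).tainted:
                findings.append("tainted value in %s reaches %s()" % (b, a))
    return findings
```

Feeding it a program such as [("call", "recv", "buf"), ("mov", "cmd", "buf"), ("call", "system", "cmd")] reports that the value in cmd reaches system(). A real tool would also have to deal with memory, loops, and merging states where paths join, which is where the hard work is.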

One last project: SharpDevelop. SharpDevelop is pretty awesome, and their 2.0 version is really neat so far. One problem: it doesn't run on Linux. This is for a couple of reasons, the main one being that their TextEditor component (ICSharpCode.TextEditor) has PInvokes inside it to Windows-specific libraries. Since other open source .NET programs embed this component (http://www.kiwidude.com/blog), they are also not immediately portable to mono. If someone could start with TextEditor working on mono (with passing unit tests) and then work their way to other areas, that would be amazing for the open source C# development community, and for companies that develop and deploy C# programs on multiple platforms.

Phew. This is a long post, so I'll end it now. I want to be clear that all of these projects are really amazing and everyone is probably doing the best job they know how. As what I hope to be an objective observer and end-user, I hope these suggestions and the reasoning behind them are embraced by the projects and Google so they can move themselves (and open source software in general) forward.

Discussion:

2006-05-01

by matt:

http://www.winehq.com/pipermail/wine-devel/2006-April/046842.html

The wine folks appear to have a similar suggestion to the one I gave above for mono.

`YaK:: WebLog #535 Topic : 2006-04-30 04.15.29 matt : Google Summer of Code suggestions`