Mozilla Firefox Projects
Featured Projects for 2011:
- A fun project might be for someone to write a Valgrind tool. There is a cool tool that can tell you when your program allocates memory that is hardly (ever) accessed, which we've used successfully in Mozilla. It would be interesting to improve it to detect more fine-grained information, for example by tracking accesses to individual offsets within a memory block and discovering if there are unused or very-rarely-used fields (i.e., for a given allocation site and some fixed set of offsets, we never or very rarely use the memory at those offsets in blocks allocated at the site).
- An interesting project would be to experiment with our Content Security Policy proposal. It would be interesting to take some common Web site frameworks/CMSes, e.g. phpBB, and see how easy it is to apply CSP to secure the site. A report could discuss the best way to use CSP, how CSP could be improved, what security benefits were obtained, and any other issues that arose during the process. For bonus points, choose a framework in which vulnerabilities have been discovered and determine whether CSP would have prevented those vulnerabilities from being exploited.
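The per-offset idea in the Valgrind item above comes down to a small amount of bookkeeping. Here is a minimal sketch of it in Python, as a toy model only: a real Valgrind tool would be written in C against Valgrind's instrumentation hooks, and the class name, method names and threshold below are all invented for illustration.

```python
from collections import defaultdict
from itertools import count


class OffsetAccessTracker:
    """Toy model: per allocation site, which offsets do blocks actually touch?"""

    def __init__(self):
        self._ids = count()                # unique id per block (addresses get reused)
        self.live = {}                     # base address -> (block id, site, size)
        self.blocks = defaultdict(int)     # site -> number of blocks allocated
        self.size = defaultdict(int)       # site -> largest block size seen
        self.touchers = defaultdict(set)   # (site, offset) -> ids of blocks accessed there

    def on_alloc(self, site, base, size):
        self.live[base] = (next(self._ids), site, size)
        self.blocks[site] += 1
        self.size[site] = max(self.size[site], size)

    def on_access(self, addr, length=1):
        # Linear scan keeps the sketch short; a real tool would use an interval map.
        for base, (block_id, site, size) in self.live.items():
            if base <= addr < base + size:
                first = addr - base
                for off in range(first, min(first + length, size)):
                    self.touchers[(site, off)].add(block_id)
                return

    def on_free(self, base):
        self.live.pop(base, None)

    def rarely_used(self, site, threshold=0.05):
        """Offsets touched in at most `threshold` of the blocks allocated at `site`."""
        total = self.blocks[site]
        if total == 0:
            return []
        return [
            off
            for off in range(self.size[site])
            if len(self.touchers.get((site, off), ())) / total <= threshold
        ]
```

Feeding this the allocation, access and free events of a run and then calling rarely_used() for each site gives a list of candidate dead or cold fields. Counting distinct blocks per offset, rather than raw accesses, keeps one hot block from hiding a field that is unused everywhere else.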
Medium-sized hacking projects:
- Writing high-performance virtual machines and emulators in JavaScript; creative use of "eval" combined with a modern JS compiler (TraceMonkey, SFX, V8). Take one or more existing JS-based emulator interpreters on the Web (e.g., JSNES) and speed them up a lot using "eval" for dynamic code generation.
- Implement advanced PDF output when printing Web pages to PDF: clickable links, table of contents, form fields that can be filled in directly in the PDF
- Integrate Linux freedesktop "secrets" API into Firefox so your private Firefox data (passwords etc) will be stored securely without having to type a separate master password
- Integrate Windows 7's handwriting-to-MathML feature into Firefox's HTML contenteditable support (using the Firefox HTML5 parser so that MathML can be included directly into HTML without using XML), and get it to work in a WYSIWYG wiki editor, for an awesome mathematics wiki interface
- Add DOM APIs to discover what specific font faces are being used for a fragment of text or in the entire page, and to extract metadata for those faces (e.g., font vendor and license). Then extend Firebug (or write a specific extension) to display that data and assist Web authors in understanding what's going on with the fonts in the page. This will be essential as downloadable fonts get more widely used.
- Implement CSS auto-hyphenation feature for Web content
- Implement CSS "ruby" layout feature (used in Japanese)
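The emulator item at the top of this list hinges on replacing per-instruction dispatch with generated code. A minimal sketch of that idea follows, written in Python with exec standing in for JavaScript's "eval". The three-opcode instruction set and the names interpret and compile_program are invented for illustration and are not taken from JSNES or any real emulator.

```python
def interpret(program, acc):
    """Baseline: decode and dispatch every instruction on every run."""
    for op, arg in program:
        if op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
        elif op == "xor":
            acc ^= arg
        else:
            raise ValueError(f"unknown opcode: {op}")
    return acc


# One source-code template per opcode (hypothetical toy instruction set).
TEMPLATES = {"add": "acc += {}", "mul": "acc *= {}", "xor": "acc ^= {}"}


def compile_program(program):
    """Translate the program once into source text and build a function from it."""
    lines = ["def run(acc):"]
    for op, arg in program:
        if op not in TEMPLATES:
            raise ValueError(f"unknown opcode: {op}")
        # int() ensures only a number is spliced into the generated source.
        lines.append("    " + TEMPLATES[op].format(int(arg)))
    lines.append("    return acc")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["run"]
```

The generated function has no decode loop and no opcode comparisons left in it, so the host language's compiler sees straight-line code. In a real emulator the translation would presumably happen per basic block and be cached by guest address, but that is a design assumption, not something the list above specifies.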
Medium-sized "researchy" projects:
- Study the sources of noise in performance tests (including comparing use of VMs), and develop techniques to reduce the noise. (Random noise in our performance measurements is a huge problem for us, when we're trying to measure small changes in performance ... lots of small changes can add up to big changes.)
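As a starting point for the noise study, one common way to keep a few outlier runs from dominating a comparison is to summarise each set of timings by its median and median absolute deviation (MAD). The sketch below assumes that approach; the function names and the factor k=3.0 are arbitrary choices for illustration, not something our test infrastructure is known to use.

```python
import statistics


def summarize(samples):
    """Return (median, median absolute deviation) of a list of timings."""
    med = statistics.median(samples)
    mad = statistics.median([abs(s - med) for s in samples])
    return med, mad


def looks_like_real_change(before, after, k=3.0):
    """True if the medians differ by more than k times the larger MAD."""
    med_before, mad_before = summarize(before)
    med_after, mad_after = summarize(after)
    noise = max(mad_before, mad_after)
    return abs(med_after - med_before) > k * noise
```

With timings such as [100, 101, 99, 100, 150], the single 150 run barely moves the median or the MAD, whereas it would shift a mean and standard deviation considerably. A real study would also need to explain where the 150 comes from, which is the actual research question.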
Major research projects:
- Evaluate memory utility: how much performance "bang" do we get per byte of memory consumed for non-essentials? (David Pearce brought up the observation that most of what we do is caching; ideally you want to allocate memory to caches to maximise overall performance --- some caches get more bang for the byte than others --- but it's completely unclear how to do this or even measure what you're getting)
- Characterize how strings get used in the browser, produce "Ultimate String Benchmark", compare string libraries, design optimal string library (strings are hugely important to space/time performance for many apps, including browsers, and there are many degrees of freedom for string library design, but string research is sadly neglected!)
- "Trace diff" for performance or correctness regression finding
- Use probabilistic, privacy-preserving techniques to capture browsing habits and pref settings statistics
- Recording user-space process execution for deterministic replay with low overhead would be incredibly useful. We can assume limited program cooperation, e.g., the program will not read CPU timestamps without going through a hookable well-known function. Is it possible to gather a detailed-enough trace for exact replay of regular Linux process execution *without* instrumenting code at the instruction level? For example, we could use LD_PRELOAD hacks to record system call results, and to reimplement threads as a user-threads library.
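The "hookable well-known function" assumption in the record-and-replay item above can be illustrated at a much smaller scale. The following Python sketch records the results of hooked nondeterministic calls and feeds them back on replay. It is only an analogy for the LD_PRELOAD approach: the Recorder class and the stand-in program function are hypothetical, and it says nothing about threads or real system calls.

```python
import random
import time


class Recorder:
    """Record results of hooked nondeterministic calls, or replay a saved log."""

    def __init__(self, log=None):
        self.replaying = log is not None
        self.log = list(log) if log is not None else []
        self._pos = 0

    def hook(self, name, func):
        """Wrap func so its results are logged (record) or taken from the log (replay)."""

        def wrapper(*args, **kwargs):
            if self.replaying:
                if self._pos >= len(self.log):
                    raise RuntimeError("replay ran past the end of the log")
                logged_name, result = self.log[self._pos]
                if logged_name != name:
                    raise RuntimeError(
                        f"replay diverged: log has {logged_name!r}, program called {name!r}"
                    )
                self._pos += 1
                return result
            result = func(*args, **kwargs)
            self.log.append((name, result))
            return result

        return wrapper


def program(now, rand):
    """Stand-in for the traced program: all nondeterminism comes in via hooks."""
    start = now()
    rolls = [rand(1, 6) for _ in range(3)]
    return start, rolls
```

Running program once with hooks from a fresh Recorder, and then again with hooks from Recorder(log=first.log), yields identical results even though time.time and random.randint would normally differ between runs. The divergence check shows why this is fragile: any change in the order of hooked calls invalidates the log.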
Huge research projects:
- With record-and-replay we can catch a hard-to-reproduce bug while recording and figure out what the problem is. Then we develop a fix. But how do you verify that the fix actually fixes a bug that you can't reproduce? You kind of want to patch the fix into your build and replay the recorded execution, except the fix will perturb the execution so you might fail to see the bug in the recorded execution even though you haven't really fixed it.
- Many, perhaps most, changes checked into mozilla-central are refactorings that should not change observable behaviour. (Many bug fixes can be split into a large refactoring plus a small fix; the refactoring enables the fix to be small and obvious.) Verifying that software meets a complete specification is prohibitively expensive in general, partly because writing a complete specification is very difficult, but for these refactoring patches it's easier to write a complete spec ("nothing changes"). Some of them can be easily verified (e.g. patches generated by automated refactorings). Can we verify the non-obvious refactorings?


