
rust-lang/google-summer-of-code

Rust project ideas

This page contains a list of ideas for various projects that could help improve the Rust Project and potentially also the wider Rust community.

These project ideas can be used as inspiration for various OSS contribution programs, such as Google Summer of Code or OSPP.

This document contains ideas that are still relevant and have not yet been completed. Here you can also find an archive of older projects from past GSoC events:

  • Past Google Summer of Code projects

We invite contributors who would like to participate in programs such as GSoC, or who simply want to find a Rust project to work on, to examine the project list and use it as inspiration. Another source of inspiration is the Rust Project Goals, particularly the orphaned goals.

If you would like to participate in GSoC, please read this. If you would like to discuss project ideas or anything related to them, you can do so on our Zulip.

We use the GSoC project size parameters to estimate the expected time commitment of each project idea. The individual project sizes correspond to the following expected numbers of hours:

  • Small: 90 hours
  • Medium: 175 hours
  • Large: 350 hours

Index

Project ideas

The list of ideas is divided into several categories.

Rust Compiler

C codegen backend for rustc

Description

rustc currently has three in-tree codegen backends: LLVM (the default), Cranelift, and GCC. These live at https://github.com/rust-lang/rust/tree/master/compiler, as rustc_codegen_* crates.

The goal of this project is to add a new experimental rustc_codegen_c backend that turns Rust's internal representations into C code (i.e. transpiles them) and optionally invokes a C compiler to build it. This would let Rust benefit from existing C compilers (better platform support, optimizations) in situations where the existing backends cannot be used.

Expected result

The minimum viable product is to turn rustc data structures that represent a Rust program into C code, and write the output to the location specified by --out-dir. This involves figuring out how to produce buildable C code from the inputs provided by rustc_codegen_ssa::traits::CodegenBackend.

A second step is to have rustc invoke a C compiler on these produced files. This should be designed in a pluggable way, such that any C compiler can be dropped in.
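To give a feeling for what "turning a program representation into C" means, here is a minimal sketch. It assumes a tiny, hypothetical expression IR (`Expr`, `Function`) invented for this example; the real inputs provided by rustc are far richer, and none of these names come from rustc itself.

```rust
use std::fmt::Write;

/// Hypothetical, heavily simplified IR used only for illustration.
enum Expr {
    Const(i32),
    Param(usize),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Hypothetical function description: a name, a parameter count and a body.
struct Function {
    name: String,
    param_count: usize,
    body: Expr,
}

/// Renders an expression as a fully parenthesized C expression.
fn emit_expr(expr: &Expr) -> String {
    match expr {
        Expr::Const(value) => value.to_string(),
        Expr::Param(index) => format!("p{index}"),
        Expr::Add(lhs, rhs) => format!("({} + {})", emit_expr(lhs), emit_expr(rhs)),
        Expr::Mul(lhs, rhs) => format!("({} * {})", emit_expr(lhs), emit_expr(rhs)),
    }
}

/// Renders a whole function as a C definition operating on `int32_t`.
fn emit_function(func: &Function) -> String {
    let params: Vec<String> = (0..func.param_count)
        .map(|i| format!("int32_t p{i}"))
        .collect();
    // C spells an empty parameter list as `void`.
    let params = if params.is_empty() {
        "void".to_string()
    } else {
        params.join(", ")
    };
    let mut out = String::new();
    writeln!(out, "#include <stdint.h>").unwrap();
    writeln!(out, "int32_t {}({}) {{", func.name, params).unwrap();
    writeln!(out, "    return {};", emit_expr(&func.body)).unwrap();
    writeln!(out, "}}").unwrap();
    out
}
```

Even this toy version hints at the real difficulty: signed overflow is undefined behavior in C but not in Rust, so a real backend cannot simply map Rust arithmetic onto C operators.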

Desirable skills

Knowledge of Rust and C, basic familiarity with compiler functionality.

Project size

Large.

Difficulty

Hard.

Mentor

Zulip streams

Extend annotate-snippets with features required by rustc

Description

rustc currently has incomplete support for using annotate-snippets to emit errors: it doesn't support all the features that rustc's built-in diagnostic rendering does. The goal of this project is to execute the rustc test suite using annotate-snippets, identify missing features or bugs, fix those, and repeat until feature parity is reached.

Expected result

More of the rustc test suite passes with annotate-snippets.

Desirable skills

Knowledge of Rust.

Project size

Medium.

Difficulty

Medium or hard.

Mentor

Zulip streams

Reproducible builds

Description

Recent OSS attacks such as the XZ backdoor have shown the importance of having reproducible builds.

Currently, the Rust toolchain distributed to Rust developers is not very reproducible. Our source code archives should be reproducible as of this pull request; however, making the actual binary artifacts reproducible is a much more difficult effort.

The goal of this project is to investigate what exactly makes Rust builds not reproducible, and try to resolve as many such issues as possible.
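One simple starting point for such an investigation is to build the same artifact twice and locate where the outputs diverge. The sketch below is only an illustration using the standard library; the function names are made up for this example, and real investigations typically rely on more capable diffing tools.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the offset of the first byte at which two artifacts differ,
/// or `None` if they are byte-for-byte identical.
fn first_difference(a: &[u8], b: &[u8]) -> Option<usize> {
    a.iter()
        .zip(b)
        .position(|(x, y)| x != y)
        // If one artifact is a prefix of the other, they differ at the shorter length.
        .or_else(|| (a.len() != b.len()).then(|| a.len().min(b.len())))
}

/// Reads two build outputs from disk and compares them.
fn compare_files(first: &Path, second: &Path) -> io::Result<Option<usize>> {
    let a = fs::read(first)?;
    let b = fs::read(second)?;
    Ok(first_difference(&a, &b))
}
```

A `None` result means the two builds are identical; a `Some(offset)` points at the first place worth inspecting, for example an embedded path or timestamp.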

While the main motivation is to make the Rust toolchain (compiler, standard library, etc.) releases reproducible, any improvements on this front should benefit the reproducibility of all Rust programs.

See Tracking Issue for Reproducible Build bugs and challenges for a non-exhaustive list of reproducibility challenges.

Expected result

Rust builds are more reproducible. Ideally, the Rust toolchain can be compiled in a reproducible manner.

Desirable skills

Knowledge of Rust and ideally also build systems.

Project size

Medium.

Difficulty

Hard.

Mentor

Related links

Bootstrap of rustc with rustc_codegen_gcc

Description

rustc_codegen_gcc used to be able to compile rustc and use the resulting compiler to successfully compile a Hello, World! program. While it can still compile a stage 2 rustc, the resulting compiler cannot compile the standard library anymore.

The goal of this project would be to fix any issue in rustc_codegen_gcc that prevents the resulting compiler from compiling a Hello, World! program and the standard library. Those issues are not known, so the participant would need to attempt a bootstrap and investigate the issues that arise.

If time allows, an optional additional goal is to achieve a full bootstrap of rustc with rustc_codegen_gcc, which would require fixing even more issues.

Expected result

A rustc_codegen_gcc that can compile a stage 2 rustc where the resulting compiler can compile a Hello, World! program using the standard library (also compiled by that resulting compiler).

An optional additional goal would be: a rustc_codegen_gcc that can do a full bootstrap of the Rust compiler. This means getting a stage 3 rustc that is identical to stage 2.

Desirable skills

Good debugging ability. Basic knowledge of:

Project size

Medium-Large depending on the chosen scope.

Difficulty

Hard.

Mentor

Zulip streams

Refactoring of rustc_codegen_ssa to make it more convenient for the GCC codegen

Description

rustc_codegen_gcc uses rustc_codegen_ssa and implements the traits in this crate in order to have a codegen that plugs in rustc seamlessly. Since rustc_codegen_ssa was created based on rustc_codegen_llvm, they are somewhat similar, which sometimes makes it awkward for the GCC codegen. Indeed, some hacks were needed to be able to implement the GCC codegen with this API:

  • Usage of unsafe transmute: for instance, this or this. Fixing this might require separating Value into RValue and LValue or using Function in place of Value in some places to better fit the GCC API.
  • Usage of mappings to work around the API: for instance, this or this.

Some other improvement ideas include:

  • Separate the aggregate operations (structs, arrays): methods like extract_value are generic over structures and arrays because it is the same operation in LLVM, but these are different operations in GCC, so it might make sense to have multiple methods like extract_field and extract_array_element.
  • Remove duplications between rustc_codegen_gcc and rustc_codegen_llvm by moving more stuff into rustc_codegen_ssa. For instance:

The goal of this project is to improve rustc_codegen_gcc by removing hacks, unnecessary unsafe code and/or code duplication with rustc_codegen_llvm by refactoring rustc_codegen_ssa. It would be important that this refactoring does not result in a performance degradation for rustc_codegen_llvm.
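As an illustration of the suggestion to separate the aggregate operations, the following sketch shows what a split API could look like. The `Value` type and the `AggregateBuilder` trait are hypothetical stand-ins invented for this example; the actual traits in rustc_codegen_ssa have different shapes and signatures.

```rust
/// Hypothetical aggregate value used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Struct(Vec<Value>),
    Array(Vec<Value>),
}

/// Hypothetical builder trait with one method per kind of aggregate,
/// instead of a single method that is generic over both.
trait AggregateBuilder {
    fn extract_field(&self, aggregate: &Value, index: usize) -> Option<Value>;
    fn extract_array_element(&self, aggregate: &Value, index: usize) -> Option<Value>;
}

struct ToyBuilder;

impl AggregateBuilder for ToyBuilder {
    /// Only accepts struct values, so a backend can use its struct-specific API.
    fn extract_field(&self, aggregate: &Value, index: usize) -> Option<Value> {
        match aggregate {
            Value::Struct(fields) => fields.get(index).cloned(),
            _ => None,
        }
    }

    /// Only accepts array values, so a backend can use its array-specific API.
    fn extract_array_element(&self, aggregate: &Value, index: usize) -> Option<Value> {
        match aggregate {
            Value::Array(elements) => elements.get(index).cloned(),
            _ => None,
        }
    }
}
```

With such a split, a backend where the two operations coincide can forward both methods to one implementation, while a backend where they differ no longer has to inspect the value to decide which operation was meant.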

Expected result

A rustc_codegen_gcc that contains fewer hacks, less unsafe code and/or less code duplication with rustc_codegen_llvm.

Desirable skills

Knowledge of Rust and basic knowledge of rustc internals, especially the codegen part.

Project size

Small-Medium depending on the chosen scope.

Difficulty

Medium.

Mentor

Zulip streams

ABI/Layout handling for the automatic differentiation feature

Description

Over the last year, support for automatic differentiation ('autodiff') was added to the Rust compiler. The autodiff tool which we are using (Enzyme) operates on LLVM-IR, which is the intermediate representation of code, used by LLVM. LLVM is the default backend of the Rust compiler. Unfortunately, two layout related problems limit its usability.

A) The Rust compiler has a set of ABI optimizations which can improve performance, but make it harder for autodiff to work. An example is the function fn foo(a: f32, b: f32) -> f32, which the compiler might optimize to fn foo(x: i64) -> f32. While this is fine from an LLVM perspective, it makes it hard for Enzyme, the LLVM based autodiff tool. More information about such optimizations can be found here. If a function has a #[rustc_autodiff] attribute, the Rust compiler should simply not perform such optimizations. We don't want to disable these optimizations for all functions, as they are generally beneficial. Multiple examples of function headers which will get handled incorrectly at the moment are listed here.
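To make the example in A) concrete, the sketch below shows by hand how two f32 arguments can travel inside a single 64-bit integer. The helper names and the choice of which half holds which argument are assumptions made for this illustration; the actual lowering is performed by the compiler and depends on the target.

```rust
/// Packs two `f32` values into one `i64`: `a` in the low 32 bits, `b` in the high 32 bits.
/// The bit assignment is an assumption for this sketch.
fn pack(a: f32, b: f32) -> i64 {
    ((a.to_bits() as u64) | ((b.to_bits() as u64) << 32)) as i64
}

/// Recovers the two `f32` values from the packed representation.
fn unpack(x: i64) -> (f32, f32) {
    let bits = x as u64;
    (f32::from_bits(bits as u32), f32::from_bits((bits >> 32) as u32))
}

/// The function as the programmer wrote it.
fn foo(a: f32, b: f32) -> f32 {
    a * b
}

/// The same function with the packed signature from the example.
fn foo_packed(x: i64) -> f32 {
    let (a, b) = unpack(x);
    foo(a, b)
}
```

Both versions compute the same result, but the packed signature no longer reveals that the input consists of two floats, which is the information a differentiation tool needs.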

B) Enzyme requires good information about the memory layout of types, both to be able to differentiate the code, and to do so efficiently. In order to help Enzyme, we want to lower more type information from MIR or even THIR into LLVM-IR metadata, or make better use of existing debug info. If you are interested in this part and also have some LLVM experience, please have a look at the LLVM website for the related proposal.

For both A) and B), the online compiler explorer here can be used to trigger both types of bugs, to get a feeling for existing problems.

Expected result

The Rust compiler should not perform ABI optimizations on functions with the #[rustc_autodiff] attribute. As a result, #[autodiff(..)] should be able to handle functions with almost arbitrary headers. If a general solution turns out tricky, it is ok to focus on the most common types like those listed in the issue above (e.g. combinations of floats, small arrays/structs/tuples, etc.). We care less about advanced types like those listed here. These changes can't have a performance impact on functions without the #[rustc_autodiff] attribute.

Newly working testcases should be added to the rust test suite. The rustc_autodiff parsing in the autodiff frontend might need small bugfixes if the new testcases discover additional bugs, but those can also be solved by other contributors.

Examples for code that currently is not handled correctly can be discussed in the project proposal phase.

Desirable skills

Intermediate knowledge of Rust. Familiarity with ABIs is a bonus, but not required.

Project size

Medium

Difficulty

Medium to hard.

Mentor

Zulip streams

Improving parallel frontend

Description

Improving compiler performance has always been a focus of the Rust community and one of the main tasks of the compiler team. Parallelizing the Rust compiler is an important and effective approach. Currently, the back end (the codegen part) of the compiler has been parallelized, which has brought a huge improvement in compiler performance. However, there is still much room for improvement in the parallelization of the Rust frontend.

The most important and valuable work in this area falls into two categories:

A) Diagnosing and fixing deadlock issues caused by the execution order of compiler queries in a multithreaded environment. Queries are a design unique to the Rust compiler and are used to implement incremental compilation. They divide the compilation process into parts and cache the result of each part. However, dependencies between cached queries across multiple threads may cause deadlocks. Work-stealing, a method used to improve parallel performance, is the core reason.

To solve these problems, we need to find the part of the compilation process that causes a deadlock by diagnosing the coredumps attached to issues, and then adjust the execution order of that code so that there are no circular dependencies on the query caches between threads. This PR is a good example of solving a deadlock problem.
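The notion of a circular dependency between threads can be sketched with a wait-for graph. The example below is a generic illustration, not a description of how the compiler detects or handles deadlocks: it assumes a hypothetical map from each blocked thread to the thread that is computing the query result it is waiting for.

```rust
use std::collections::HashMap;

/// Given a map from each blocked thread to the thread it waits for,
/// returns the threads that form a wait cycle, if any exists.
fn find_deadlock(waits_for: &HashMap<u32, u32>) -> Option<Vec<u32>> {
    // Sort the starting points so the result does not depend on hash order.
    let mut starts: Vec<u32> = waits_for.keys().copied().collect();
    starts.sort_unstable();

    for start in starts {
        let mut path = vec![start];
        let mut current = start;
        // Follow the chain of waits until it ends or revisits a thread.
        while let Some(&next) = waits_for.get(&current) {
            if let Some(position) = path.iter().position(|&thread| thread == next) {
                // Everything from the first visit of `next` onwards is the cycle.
                return Some(path[position..].to_vec());
            }
            path.push(next);
            current = next;
        }
    }
    None
}
```

A chain such as thread 1 waiting for 2 and 2 waiting for 3 is harmless, since thread 3 can still make progress. Once thread 3 also waits for thread 1, no thread in the cycle can continue.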

B) Improving the performance of the parallel frontend. The parallel frontend has implemented parallelization in type checking, MIR borrow checking and other parts of the compiler. However, there is still a lot of room for improvement:

  • HIR lowering. Modifying the array structure of tcx.untracked.definitions so that it can be accessed efficiently from multiple threads is likely to be the key.
  • Macro expansion. Handling the ordering constraints of name resolution during macro expansion is a difficult problem.
  • Lexing and/or parsing.

Achieving the above goals would significantly improve the performance of the Rust compiler.

The project could choose either one of these two areas, or try to tackle both of them together.

Expected result

The parallel frontend will no longer cause deadlock issues. We can ensure usability through UI testing.

The performance of the compiler will be improved, ideally at least by a couple of percentage points.

Desirable skills

Intermediate knowledge of Rust. A basic understanding of the implementation of the compiler process (such as typeck, hir_lowering, macro expansion) would be ideal.

Project size

Medium to large (depending on the chosen scope).

Difficulty

Medium to hard.

Mentor

Zulip streams

Infrastructure

Implement merge functionality in bors

Description

Various Rust repositories under the rust-lang organization use a merge queue bot (bors) for testing and merging pull requests. Currently, we use a legacy implementation called homu, which is quite buggy and very difficult to maintain, so we would like to get rid of it. We have started the implementation of a new bot called simply bors, which should eventually become the primary method for merging pull requests in the rust-lang/rust repository.

The bors bot is a GitHub app that responds to user commands and performs various operations on a GitHub repository. Primarily, it creates merge commits and reports test workflow results for them. It can currently perform so-called "try builds", which can be started manually by users on a given PR to check if a subset of CI passed on the PR. However, the most important functionality, actually merging pull requests into the main branch, has not been implemented yet.

Expected result

bors can be used to perform pull request merges, including "rollups". In an ideal case, bors will already be usable on the rust-lang/rust repository.

Desirable skills

Intermediate knowledge of Rust. Familiarity with GitHub APIs is a bonus.

Project size

Medium.

Difficulty

Medium.

Mentors

Zulip streams

Improve bootstrap

Description

The Rust compiler is bootstrapped using a complex set of scripts and programs generally called just bootstrap. This tooling is constantly changing, and it has accrued a lot of technical debt. It could be improved in many areas, for example:

  • Design a new testing infrastructure and write more tests.
  • Write documentation.
  • Remove unnecessary hacks.

Expected result

The bootstrap tooling will have less technical debt, more tests, and better documentation.

Desirable skills

Intermediate knowledge of Rust. Knowledge of the Rust compiler bootstrap process is welcome, but not required.

Project size

Medium or large.

Difficulty

Medium.

Mentor

Zulip streams

Cargo

Prototype an alternative architecture for cargo fix

Description

Some compiler errors know how to fix the problem, and cargo fix is the command for applying those fixes. Currently, cargo fix calls into the Cargo APIs that implement cargo check in a way that allows it to get the JSON messages from rustc and apply them to workspace members. To avoid problems with conflicting or redundant fixes, cargo fix runs rustc for workspace members serially. As one fix might lead to another, cargo fix runs rustc for each workspace member in a loop until a fixed point is reached. This can be very slow for large workspaces.

We want to explore an alternative architecture where cargo fix runs the cargo check command in a loop, processing the json messages, until a fixed point is reached.
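The "loop until a fixed point is reached" idea can be sketched independently of Cargo. In the example below, `check` and `apply` are hypothetical callbacks standing in for "run the check and collect suggested fixes" and "apply one fix"; the `max_rounds` cap is an assumption added so that the sketch always terminates.

```rust
/// Repeatedly collects fixes and applies them until a round produces none.
/// Returns the number of rounds that applied fixes, or `None` if the process
/// did not converge within `max_rounds` rounds.
fn fix_until_stable<S, F, C, A>(
    state: &mut S,
    mut check: C,
    mut apply: A,
    max_rounds: usize,
) -> Option<usize>
where
    C: FnMut(&S) -> Vec<F>,
    A: FnMut(&mut S, F),
{
    for round in 0..max_rounds {
        let fixes = check(state);
        if fixes.is_empty() {
            // Fixed point: nothing left to apply.
            return Some(round);
        }
        for fix in fixes {
            apply(state, fix);
        }
    }
    None
}
```

For example, if each round removes one unnecessary `mut` from a source string, a string with two of them converges after two rounds, and the third round merely confirms that nothing is left to fix.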

Benefits

  • Always runs in parallel
  • May make it easier to extend the behavior, like with an interactive mode

Downsides

  • Might have issues with files owned by multiple packages or even multiple build targets

This can leverage existing CLI and crate APIs of Cargo and can be developed as a third-party command.

See cargo#13214 for more details.

Expected result

  • A third-party command as described above
  • A comparison of performance across representative crates
  • An analysis of the behavior with the described corner cases

Desirable skills

Intermediate knowledge of Rust.

Project size

Medium

Difficulty

Medium.

Mentor

Zulip streams

Prototype Cargo plumbing commands

Description

Cargo is a high-level, opinionated command. Instead of trying to directly support every use case, we want to explore exposing the building blocks of the high-level commands as "plumbing" commands that people can use programmatically to compose together to create custom Cargo behavior.

This can be prototyped outside of the Cargo code base, using the Cargo API.

See the Project Goal for more details.

Expected result

Ideal: a performant cargo porcelain check command that calls out to individual cargo plumbing <name> commands to implement its functionality.

Depending on the project size the participant takes on and their experience, this may be out of reach. The priorities are:

  1. A shell of cargo porcelain check
  2. Individual commands until cargo porcelain check is functional
  3. Performance

Desirable skills

Intermediate knowledge of Rust.

Project size

Scalable

Difficulty

Medium.

Mentor

Zulip streams

Move cargo shell completions to Rust

Description

Cargo maintains Bash and Zsh completions, but they are duplicated and limited in features.

A previous GSoC participant added unstable support for completions in Cargo itself, so we can have a single implementation with per-shell skins (rust-lang/cargo#6645).

There are many more arguments that need custom completers as well as polish in the completion system itself before this can be stabilized.

See

Expected result

Ideal:

  • A report to clap maintainers on the state of the unstable completions and why it's ready for stabilization
  • A report to cargo maintainers on the state of the unstable completions and why it's ready for stabilization

Desirable skills

Intermediate knowledge of Rust. Shell familiarity is a bonus.

Project size

Medium.

Difficulty

Medium.

Mentor

Build script delegation

Description

When developers need to extend how Cargo builds their package, they can write a build script. This gives users quite a bit of flexibility, but build scripts:

  • Allow running arbitrary code on the user's system, requiring extra auditing
  • Need to be compiled and run before the relevant package can be built
  • Are all-or-nothing, requiring users to do extra checks to avoid running expensive logic
  • Run counter to the principles of third-party build tools that try to mimic Cargo
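For context, a conventional build script communicates with Cargo by printing directives to standard output. The sketch below builds `cargo:rerun-if-changed` directives, one common way of avoiding reruns of expensive logic; the helper name and the input file name used in the usage note are made up for this example.

```rust
/// Builds the directives a build script would print so that Cargo only
/// reruns it when one of the given input files changes.
fn build_directives(inputs: &[&str]) -> Vec<String> {
    inputs
        .iter()
        .map(|path| format!("cargo:rerun-if-changed={path}"))
        .collect()
}
```

In a real `build.rs`, `main` would print each returned line with `println!`, for example for a hypothetical `wrapper.h` input. Every package that needs this still ships, compiles and runs its own script, which is the overhead described in the list above.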

A developer could make their build script a thin wrapper around a library (e.g. shadow-rs), but a build script still exists to be audited (even if it's small) and each individual wrapper build script must be compiled and linked. This is still opaque to third-party build tools.

Leveraging an unstable feature, artifact dependencies, we could allow a developer to say that one or more dependencies should be run as build scripts, passing parameters to them.

This project would add unstable support for build script delegation that can then be evaluated for proposing as an RFC for approval.

See the proposal for more details.

Expected result

Milestones

  1. An unstable feature for multiple build scripts
  2. An unstable feature for passing parameters to build scripts from Cargo.toml, built on the above
  3. An unstable feature for build script delegation, built on the above two

Bonus: preparation work to stabilize a subset of artifact dependencies.

Desirable skills

Intermediate knowledge of Rust, especially experience with writing build scripts.

Project size

Large.

Difficulty

Medium.

Mentor

Crate ecosystem

Modernize the libc crate

Description

The libc crate is one of the oldest crates of the Rust ecosystem, long predating Rust 1.0. Additionally, it is one of the most widely used crates in the ecosystem (#4 most downloaded on crates.io). This combination means that the current version of the libc crate (v0.2) is very conservative with breaking changes and remains backwards-compatible with all Rust compilers since Rust 1.13 (released in 2016).

The language has evolved a lot since Rust 1.13, and we would like to make use of these features in libc. The main one is support for union types to properly expose C unions.

At the same time, there is a backlog of desired breaking changes tracked in this issue. Some of these come from the evolution of the underlying platforms, some come from a desire to use newer language features, while others are simple mistakes that we cannot correct without breaking existing code.

The goal of this project is to prepare and release the next major version of the libc crate.

Expected result

The libc crate is cleaned up and modernized, and released as version 0.3.

Desirable skills

Intermediate knowledge of Rust.

Project size

Medium.

Difficulty

Medium.

Mentor

Zulip streams

Add more lints to cargo-semver-checks

Description

cargo-semver-checks is a linter for semantic versioning. It ensures that Rust crates adhere to semantic versioning by looking for breaking changes in APIs.

It can currently catch ~120 different kinds of breaking changes, meaning there are hundreds of kinds of breaking changes it still cannot catch! The goal of this project is to extend its abilities, so that it can catch and prevent more breaking changes, by:

  • adding more lints, which are expressed as queries over a database-like schema (playground)
  • extending the schema, so more Rust functionality is made available for linting
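To illustrate what a lint conceptually checks, the sketch below detects removed public items by comparing two sets of item names. This is plain Rust written only for illustration: the real lints are expressed as queries over the tool's schema, and the function name and set-of-names input here are assumptions of this example.

```rust
use std::collections::BTreeSet;

/// Returns the names of public items that exist in the old API
/// but are missing from the new one, in sorted order.
fn removed_items<'a>(
    old_api: &BTreeSet<&'a str>,
    new_api: &BTreeSet<&'a str>,
) -> Vec<&'a str> {
    old_api.difference(new_api).copied().collect()
}
```

Removing a public item breaks every downstream user of that item, so each name returned here would correspond to one reported breaking change. Adding new items, by contrast, does not show up in the result.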

Expected result

cargo-semver-checks will contain new lints, together with test cases that ensure each lint triggers when expected and does not trigger in situations where it shouldn't (i.e. no false positives).

Desirable skills

Intermediate knowledge of Rust. Familiarity with databases, query engines, or query language design is welcome but not required.

Project size

Medium or large, depending on how many lints will be implemented. The more lints, the better!

Difficulty

Medium to high, depending on the choice of implemented lints or schema extensions.

Mentor

Zulip streams

Related Links

Make cargo-semver-checks run faster

Description

As more lints get added to cargo-semver-checks, its runtime grows longer. As a result, users' iteration loops and CI pipelines take longer as well, degrading the overall experience of using the tool.

Figure out ways to speed up cargo-semver-checks, and find good ways to deploy them without degrading the maintainability of the codebase!

Expected result

The wall-clock runtime of running cargo-semver-checks on a large Rust crate gets cut by 50-80%, while still running the same lints as before.

Desirable skills

Interest in and at least a bit of experience with performance engineering. Understanding of how to apply techniques like:

  • profiling and benchmarking
  • parallel programming (e.g. with rayon)
  • building and applying indexes (in the database sense)
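As a small illustration of the indexing technique, the sketch below contrasts a linear scan with a lookup through a prebuilt hash index. The `Item` type and its fields are hypothetical and are not taken from the cargo-semver-checks codebase.

```rust
use std::collections::HashMap;

/// Hypothetical API item used only for this sketch.
struct Item {
    name: String,
    kind: String,
}

/// Linear scan: every lookup walks the whole slice in the worst case.
fn find_linear<'a>(items: &'a [Item], name: &str) -> Option<&'a Item> {
    items.iter().find(|item| item.name == name)
}

/// Builds an index once so that later lookups by name take constant time on average.
fn build_index(items: &[Item]) -> HashMap<&str, &Item> {
    items.iter().map(|item| (item.name.as_str(), item)).collect()
}
```

The index costs one pass over the data and some memory, so it pays off when many lookups hit the same data. Whether that holds for a given workload is the kind of question profiling and benchmarking should answer.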

Strong attention to detail. Willingness to learn quickly and perform lots of experiments, even though many of them may prove to be dead ends. Discipline and thoughtfulness when writing and testing code, to ensure that code changes are not merely fast but also maintainable.

Project size

Ideally large, to have the biggest possible positive performance impact.

Difficulty

Medium to high. See the "desirable skills" section above.

Mentor

Zulip streams

Related Links

Enable witness generation in cargo-semver-checks

Description

When cargo-semver-checks reports a breaking change, it in principle has seen enough information for the breakage to be reproduced with an example program: a witness program. Witness programs are valuable as they confirm that the suspected breakage did indeed happen, and is not a false-positive.
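A witness can be pictured with a toy example. In the sketch below, the modules `v1` and `v2` stand in for two releases of a hypothetical crate in which a function gained a parameter; the names and the specific change are invented for this illustration.

```rust
/// Stand-in for the previous release of a hypothetical crate.
mod v1 {
    pub fn area(width: u32, height: u32) -> u32 {
        width * height
    }
}

/// Stand-in for the new release: `area` gained a parameter, which is a breaking change.
#[allow(dead_code)]
mod v2 {
    pub fn area(width: u32, height: u32, scale: u32) -> u32 {
        width * height * scale
    }
}

/// Witness program: compiles against `v1`, but changing `v1` to `v2` below
/// makes it fail to compile because the call passes too few arguments.
fn witness() -> u32 {
    v1::area(2, 3)
}
```

The value of a witness lies in that compile failure: if the program built successfully against both versions, the reported breakage would be a false positive.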

Expected result

Automatic witness generation is something we've explored, but we've only scratched the surface at implementing it so far. The goal of this project would be to take it the rest of the way: enable cargo-semver-checks to (with the user's opt-in) generate witness programs for each lint, verify that they indeed demonstrate the detected breakage, and inform the user appropriately of the breakage and the manner in which it was confirmed. If a witness program fails to reproduce breakage flagged by one of our lints, we've found a bug — the tool should then prepare a diagnostic info packet and offer to help the user open an auto-populated GitHub issue.

Stretch goal: having implemented witness generation, run another study of SemVer compliance in the Rust ecosystem, similar to the study we completed in 2023. The new study would cover many more kinds of breaking changes, since cargo-semver-checks today has 2.5 times more lints than it did back then. It would also reveal any new false-positive issues, crashes, or other regressions that may have snuck into the tool in the intervening years.

Desirable skills

Intermediate knowledge of Rust. Interest in building dev tools, and empathy for user needs so we can design the best possible user experience. Familiarity with databases, query engines, or programming language design is welcome but not required.

Project size

Large

Difficulty

Medium

Mentor

Zulip streams

Related Links

Wild linker with test suites from other linkers

Description

The Wild linker is a project to build a very fast linker in Rust that has incremental linking and hot reload capabilities.

It currently works well enough to link itself, the Rust compiler, clang (provided you use the right compiler flags) and a few other things. However, there are various features and combinations of flags that don’t yet work correctly. Furthermore, we have a pretty incomplete picture of what we don’t support.

The proposed project is to run the test suite of other linkers with Wild as the linker being tested, then for each failure, determine what the problem is. It’s expected that many failures will have the same root cause.

Expected result

Write a program, ideally in Rust, that runs the test suite of some other linker. Mold’s test suite is pretty easy to run with Wild, so that’s probably a good default choice. The Rust program should emit a CSV file with one row per test, whether the test passes or fails and if it fails, an attempt to identify the cause based on errors / warnings emitted by Wild.
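The CSV report described above could be produced along the following lines. This is a minimal sketch using only the standard library; the `TestResult` and `Outcome` types, the column names and the "uncategorised" label are assumptions of this example, and running the tests or parsing the linker output is left out.

```rust
/// Hypothetical outcome of one test from another linker's test suite.
enum Outcome {
    Pass,
    Fail { cause: Option<String> },
}

struct TestResult {
    name: String,
    outcome: Outcome,
}

/// Quotes a field if it contains characters that are special in CSV.
fn csv_field(text: &str) -> String {
    if text.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", text.replace('"', "\"\""))
    } else {
        text.to_string()
    }
}

/// Renders one header line followed by one row per test.
fn render_csv(results: &[TestResult]) -> String {
    let mut out = String::from("test,status,cause\n");
    for result in results {
        let (status, cause) = match &result.outcome {
            Outcome::Pass => ("pass", String::new()),
            Outcome::Fail { cause: Some(cause) } => ("fail", cause.clone()),
            // Failures without an identified cause are recorded as uncategorised.
            Outcome::Fail { cause: None } => ("fail", "uncategorised".to_string()),
        };
        out.push_str(&format!(
            "{},{},{}\n",
            csv_field(&result.name),
            status,
            csv_field(&cause)
        ));
    }
    out
}
```

Quoting matters here because error messages emitted by a linker frequently contain commas and quotes, which would otherwise corrupt the row structure.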

For tests where Wild doesn’t currently emit any error or warning that is related to the cause of the test failure, attempt to make it do so. Some of the tests might fail for reasons that are hard to identify. It’s OK to just leave these as uncategorised. Where tests fail due to bugs or differences in behaviour of Wild, automatic classification likely isn’t practical. A one-off classification of these would be beneficial.

If time permits, pick something achievable that seems like an important feature / bug to support / fix and implement / fix it.

Desirable skills

Knowledge of Rust. Any existing knowledge of low-level details like assembly or the ELF binary format is useful, but can potentially be learned as we go.

Project size

Small to large depending on chosen scope.

Difficulty

Some of the work is medium. Diagnosing and / or fixing failures is often pretty hard.

Mentor

Zulip streams

Further resources
