๐Ÿ‘‹๐Ÿฝ Welcome

Welcome to the wg-async-foundations website!

Leads

The leads of this working group are @tmandry and @nikomatsakis. Both of them can be found on Zulip.

Getting involved

There is a weekly triage meeting that takes place in our Zulip stream. Feel free to stop by then (or any time!) to introduce yourself.

If you're interested in fixing bugs, though, there is no need to wait for the meeting! Take a look at the instructions here.

What is the goal of this working group?

This working group is focused on the implementation and design of the "foundations" for Async I/O. This means that we are focused on designing and implementing extensions to the language, standard library, and other "core" bits of support offered by the Rust organization. We do not directly work on external projects like tokio, async-std, smol, embassy, and so forth, although we definitely discuss ideas and coordinate with them where appropriate.

Zulip

We hold discussions on the #wg-async-foundations stream in Zulip.

License

Licensed under either of

  • Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
  • MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

🔮 The vision

What is this?

We believe Rust can become one of the most popular choices for building distributed systems, ranging from embedded devices to foundational cloud services. Whatever they're using it for, we want all developers to love using Async Rust. For that to happen, we need to move Async Rust beyond the "MVP" state it's in today and make it accessible to everyone.

This document is a collaborative effort to build a shared vision for Async Rust. Our goal is to engage the entire community in a collective act of the imagination: how can we make the end-to-end experience of using Async I/O not only a pragmatic choice, but a joyful one?

Where we are and where we are going

The "vision document" starts with a cast of characters. Each character is tied to a particular Rust value (e.g., performance, productivity, etc) determined by their background; this background also informs the expectations they bring when using Rust. Grace, for example, wants to keep the same level of performance she currently get with C, but with the productivity benefits of memory safety. Alan, meanwhile, is hoping Rust will give him higher performance without losing the safety and ergonomics that he enjoys with garbage collected languages.

For each character, we write "status quo" stories that describe the challenges they face as they try to achieve their goals (and typically fail in dramatic fashion!). These stories are not fiction. They are an amalgamation of the real experiences of people using Async Rust, as reported to us in interviews, blog posts, and tweets. Writing these stories helps us gauge the cumulative impact of the various papercuts and challenges that one encounters when using Async Rust.

The ultimate goal of the vision doc, of course, is not just to tell us where we are now, but where we are going and how we will get there. For this, we include "shiny future" stories that tell us how those same characters will fare in a few years time, when we've had a chance to improve the Async Rust experience.

The vision drives the work

The vision is not just idle speculation. It is the central document that we use to organize ourselves. When we think about our roadmap for any given year, it is always with the aim of moving us closer to the vision we lay out here.

Involving the whole community

The async vision document provides a forum where the Async Rust community can plan a great overall experience for Async Rust users. Async Rust was intentionally designed not to have a "one size fits all" mindset, and we don't want to change that. Our goal is to build a shared vision for the end-to-end experience while retaining the loosely coupled, exploration-oriented ecosystem we have built.

🚧 Under construction! Help needed! 🚧

This document is not yet complete! We are actively working on it as part of the working group, and we would like your help! Check out the How to vision doc page for more details.

โ“ How to vision

How you can help

| When | What |
|------|------|
| ✅ Now till 2021-05-14 | Improve the sample projects |
| ✅ Now till 2021-05-14 | Propose new "status quo" stories or comment on existing PRs |
| ✅ Now till 2021-05-14 | Propose new "shiny future" stories or comment on existing PRs |
| 🛑 Starting 2021-05-14 | Vote for the awards on the status quo and shiny future stories! |

The big picture

The process we are using to write the vision doc encourages active collaboration and "positive sum" thinking. It starts with a brainstorming period, during which we aim to collect as many "status quo" and "shiny future" stories as we can.

This brainstorming period runs until 2021-05-14. For the first two weeks (until 2021-04-02), we are collecting "status quo" stories only. After that, we will accept both "status quo" and "shiny future" stories until the end of the brainstorming period. Finally, to cap off the brainstorming period, we will select winners for awards like "Most Humorous Story" or "Most Supportive Contributor".

Once the brainstorming period is complete, the working group leads will begin work on assembling the various stories and shiny futures into a coherent draft. This draft will be reviewed by the community and the Rust teams and adjusted based on feedback.

Brainstorming

The brainstorming period runs until 2021-05-14:

The more the merrier!

During this brainstorming period, we want to focus on getting as many ideas as we can. Having multiple "shiny futures" that address the same problem is a feature, not a bug, as it will let us mix-and-match later to try and find the best overall plan. Comments and questions will be used as a tool for improving understanding or sharpening proposals. Presenting alternative ideas is done by writing an alternative story.

Reviewing contributions

To merge a story or project PR, any member of the working group can open a topic on Zulip and propose it be merged. Ideally there will be no outstanding concerns. If a second member of the working group approves, the PR can then be merged.

Reviewers should ensure that new stories and projects are added to the SUMMARY.md file either before merging or directly afterwards.

Harmonizing

At this point, the wg leads will write the draft vision document, drawing on the status quo and shiny future stories that were submitted. Like an RFC, this draft vision doc will be opened for comment and improved based on the resulting feedback. When the wg leads feel it is ready, it will be taken to the lang and libs teams for approval (and other Rust teams as appropriate).

Living document

This is meant to be a living document. We plan to revisit it regularly to track our progress and update it based on what we've learned in the meantime. Note that the shiny future stories in particular are going to involve a fair bit of uncertainty, so we expect them to change as we go.

Wait, did somebody say awards?

Yes! We are planning to give awards in various categories for folks who write status quo and shiny future PRs. The precise categories are TBD. Check out the awards page for more details.

โ“ How to vision: Projects

How to open a PR

If you'd like to add a new project, please open a PR using this template, adding a new file to the projects directory. Do not add your file to SUMMARY.md; that will create conflicts. We'll add it after merging.

We are pretty happy to add new projects, although we prefer to add a new project only if it has some characteristic that is distinct from the other projects we've got so far and that is important to a 'status quo' or 'shiny future' story.

FAQs to answer in your PR

In your PR, make sure to include the following FAQs:

  • What makes this project different from most others?
  • Are there existing crates that are similar to this project?

โ“ How to vision: "Status quo" stories

We want to make sure all Async Rust users and their experiences are reflected in the async vision doc, so please help us by writing 'status quo' stories about your experiences or the experiences of others! Remember, status quo stories are not "real", but neither are they fiction. They are constructed from the real experiences of people using Async Rust (often multiple people).

TL;DR

Just want to get started? Here are quick instructions to get you going:

Optional: open an issue to discuss your story or find others with similar experiences

If you have a story idea but you don't have the time to write about it, or if you would like to know whether other folks have encountered the same sorts of problems, you can open up a "status quo" story issue on the wg-async-foundations repository. Alternatively, if you're looking for a story to write, you can browse the open issues tagged as status-quo-story-idea and see if anything catches your eye. If you see people describing problems you have hit, or have questions about the experiences people are sharing, then please leave a comment -- but remember to comment supportively. (You can also come to Zulip to discuss.)

How to open a PR

If you have an idea you'd like to write about, please open a PR using this template, adding a new file to the status_quo directory. Do not add your file to SUMMARY.md -- that will create conflicts; we'll add it manually after merging.

Goals of a status quo PR

When writing a status quo story, your goal is to present what you see as a major challenge for Async Rust. You want to draw upon people's experiences (sometimes multiple people) to show all the aspects of the problem in an engaging and entertaining way.

Each story is always presented from the POV of a particular character. Stories should be detailed, not abstract -- it's better to give specifics than generalities. Don't say "Grace visited a website to find the answer to her question"; tell us whether she went to Stack Overflow, asked on Reddit, or found the answer on some random blog post. Ideally you should get this detail from whatever your "source" of the story is -- but if you are using multiple sources and they disagree, you can pick one and use the FAQ to convey some of the other alternatives.

The role of the FAQ

Every status quo PR includes a FAQ. This FAQ should always include answers to some standard questions:

  • What are the morals of the story?
    • Talk about the major takeaways -- what do you see as the biggest problems?
  • What are the sources for this story?
    • Talk about what the story is based on, ideally with links to blog posts, tweets, or other evidence.
  • Why did you choose NAME to tell this story?
    • Talk about the character you used for the story and why.
  • How would this story have played out differently for the other characters?
    • In some cases, there are problems that only occur for people from specific backgrounds, or which play out differently. This question can be used to highlight that.

You can feel free to add whatever other FAQs seem appropriate. You should also expect to grow the FAQ in response to questions that come up on the PR.

The review process

When you open a status quo PR, people will start to comment on it. These comments should always be constructive, with the goal not of negating the story but of making it more precise or more persuasive. Ideally, you should respond to every comment in one of two ways:

  • Adjust the story with more details or to correct factual errors.
  • Add something to the story's FAQ to explain the confusion.
    • If the question is already covered by a FAQ, you can just refer the commenter to that.

The goal is that, at the end of the review process, the status quo story has a lot more details that address the major questions people had.

🤔 Frequently Asked Questions

What is the process to propose a status quo story?

What if my story applies to multiple characters?

  • Look at the "morals" of your story and decide which character will let you get those across the best.
  • Use the FAQ to talk about how other characters might have been impacted.
  • If the story would play out really differently for other characters, maybe write it more than once!

How much detail should I give? How specific should I be?

  • Detailed is generally better, but only if those details are helpful for understanding the morals of your story.
  • Specific is generally better, since an abstract story doesn't feel as real.

What should I do when I'm trying to be specific but I have to make an arbitrary choice?

Add a FAQ with some of the other alternatives, or one acknowledging that you made an arbitrary choice there.

None of the characters are a fit for my story.

It doesn't have to be perfect. Pick the one that seems like the closest fit. If you really feel stuck, though, come talk to us on Zulip about it!

How should I describe the "evidence" for my status quo story?

The more specific you can get, the better. If you can link to tweets or blog posts, that's ideal. You can also add notes into the conversations folder and link to those. Of course, you should be sure people are ok with that.

โ“ How to vision: "Shiny future" stories

We want all Async Rust users and their hopes and dreams for what Async Rust should be in the future to be reflected in the async vision doc, so please help us by writing 'shiny future' stories about what you would like async Rust to look like! Remember: we are in a brainstorming period. Please feel free to leave comments in an effort to help someone improve their PRs, but if you would prefer a different approach, you are better off writing your own story. (In fact, you should write your own story even if you like their approach but just have a few alternatives that are worth thinking over.)

TL;DR

Just want to get started? Here are quick instructions to get you going:

  • To write your own story:

How to open a PR

If you have an idea you'd like to write about, please open a PR using this template, adding a new file to the shiny_future directory. Do not add your file to SUMMARY.md; that will create conflicts. We'll add it after merging.

Goals of a shiny future PR

Shiny future PRs "retell" the story from one or more status quo PRs. The story is now taking place 2-3 years in the future, when Async Rust has had the chance to make all sorts of improvements. Shiny future stories are aspirational: we don't have to know exactly how they will be achieved yet! (Of course, it never hurts to have a plan too.)

Like status quo stories, each shiny future story is always presented from the POV of a particular character. They should be detailed. Sometimes this will mean you have to make stuff up, like method names or other details -- you can use the FAQ to spell out areas of particular uncertainty.

The role of the FAQ

Every shiny future PR includes a FAQ. This FAQ should always include answers to some standard questions:

  • What status quo story or stories are you retelling?
    • Link to the status quo stories here. If there isn't a story that you're retelling, write it!
  • What is Alan most excited about in this future? Is he disappointed by anything?
    • Think about Alan's top priority (performance) and the expectations he brings (ease of use, tooling, etc). How do they fare in this future?
  • What is Grace most excited about in this future? Is she disappointed by anything?
    • Think about Grace's top priority (memory safety) and the expectations she brings (still able to use all the tricks she knows and loves). How do they fare in this future?
  • What is Niklaus most excited about in this future? Is he disappointed by anything?
    • Think about Niklaus's top priority (accessibility) and the expectations he brings (strong community that will support him). How do they fare in this future?
  • What is Barbara most excited about in this future? Is she disappointed by anything?
    • Think about Barbara's top priority (productivity, maintenance over time) and the expectations she brings (fits well with Rust). How do they fare in this future?
  • If this is an alternative to another shiny future, which one, and what motivated you to write an alternative?
    • Cite the story. Be specific, but focus on what you like about your version, not what you dislike about the other.
    • If this is not an alternative, you can skip this one. =)
  • What projects benefit the most from this future?
  • Are there any projects that are hindered by this future?

There are also some optional questions:

  • What are the incremental steps towards realizing this shiny future?
    • Talk about the actual work we will do. You can link to design docs or even add new ones, as appropriate.
    • You don't have to have the whole path figured out yet!
  • Does realizing this future require cooperation between many projects?
    • For example, if you are describing an interface in libstd that runtimes will have to implement, talk about that.

You can feel free to add whatever other FAQs seem appropriate. You should also expect to grow the FAQ in response to questions that come up on the PR.

The review process

When you open a shiny future PR, people will start to comment on it. These comments should always be constructive. They usually have the form of asking "in this future, what does NAME do when X happens?" or asking you to elaborate on other potential problems that might arise. Ideally, you should respond to every comment in one of two ways:

  • Adjust the story with more details or to correct factual errors.
  • Add something to the story's FAQ to explain the confusion.
    • If the question is already covered by a FAQ, you can just refer the commenter to that.

The goal is that, at the end of the review process, the shiny future story has a lot more details that address the major questions people had.

🤔 Frequently Asked Questions

What is the process to propose a shiny future story?

What character should I use for my shiny future story?

  • Usually you would use the same character from the status quo story you are retelling.
  • If for some reason you chose a different character, add a FAQ to explain why.

What do I do if there is no status quo story for my shiny future?

Write the status quo story first!

What happens when there are multiple "shiny future" stories about the same thing?

During this brainstorming period, we want to focus on getting as many ideas as we can. Having multiple "shiny futures" that address the same problem is a feature, not a bug, as it will let us mix-and-match later to try and find the best overall plan.

How much detail should I give? How specific should I be?

  • Detailed is generally better, but only if those details are helpful for understanding the morals of your story.
  • Specific is generally better, since an abstract story doesn't feel as real.

What is the "scope" of a shiny future story? Can I tell shiny future stories that involve ecosystem projects?

All the stories in the vision doc are meant to cover the full "end to end" experience of using async Rust. That means that sometimes they will talk about things that are really part of projects outside of the Rust org. For example, we might write a shiny future that involves the standard library publishing standard traits for core concepts, which are then adopted by libraries throughout the ecosystem. There is a FAQ that asks you to talk about what kind of coordination between projects will be required to realize this vision.

What do I do when I get to details that I don't know yet?

Take your best guess and add a FAQ explaining which details are still up in the air.

Do we have to know exactly how we will achieve the "shiny future"?

You don't have to know how your idea will work yet. We will eventually have to figure out the precise designs, but at this point we're more interested in talking about the experience we aim to create. That said, if you do have plans for how to achieve your shiny future, you can also include design docs in the PR, or add FAQs that specify what you have in mind (and perhaps what you still have to figure out).

What do I do if somebody leaves a comment about how my idea will work and I don't know the answer?

Add it to the FAQ!

What if we write a "shiny future" story but it turns out to be impossible to implement?

Glad you asked! The vision document is a living document, and we intend to revisit it regularly. This is important because it turns out that predicting the future is hard. We fully expect that some aspects of the "shiny future" stories we write are going to be wrong, sometimes very wrong. We will be regularly returning to the vision document to check how things are going and adjust our trajectory appropriately.

โ“ How to vision: Constructive comments

Figuring out the future is tricky business. We all know the internet is not always a friendly place. We want this discussion to be different.

Be respectful and supportive

Writing a "status quo" or "shiny future" story is an act of bravery and vulnerability. In the status quo, we are asking people to talk about the things that they or others found hard, to admit that they had trouble figuring something out. In the case of the shiny future, we're asking people to put out half-baked ideas so that we can find the seeds that will grow into something amazing. It doesn't take much to make that go wrong.

Comment to understand or improve, not to negate or dissuade

"Most people do not listen with the intent to understand; they listen with the intent to reply."

-- Stephen Covey

The golden rule is that when you leave a comment, you are looking to understand or improve the story.

For status quo stories, remember that these are true stories about people's experiences -- they can't be wrong (though they could be inaccurate). It may be that somebody tries for days to solve a problem that would've been easy if they had just known to call a particular method. That story is not wrong, it's an opportunity to write a shiny future story in which you explain how they would've learned about that method, or perhaps about how that method would become unnecessary.

For shiny future stories, even if you don't like the idea, you should ask questions with the goal of better understanding what the author likes about it. Understanding that may give you an idea for how to get those same benefits in a way that you are happier with. It's also valid to encourage the author to elaborate on the impact their story will have on different characters.

You might just want to write your own story

Remember, opening your own PR is free (in fact, we're giving an award for being "most prolific"). If you find that you had a really different experience than a status quo story, or you have a different idea for a shiny future, consider just writing your own PR instead of commenting negatively on someone else's. The goal of the brainstorming phase is to put out a lot of alternatives, so that we can look for opportunities to combine them and make something with the best of both.

Good questions for status quo stories

Here are some examples of good questions for "status quo" stories:

  • Tell me more about this step. What led NAME to do X?
  • What do you think OTHER_NAME would have done here?
  • Can you be more specific about this point? What library did they use?

Good questions for shiny future stories

Here are some examples of good questions for "shiny future" stories:

  • How does NAME do X in this future?
  • It seems like this would interfere with X, which is important for application A. How would NAME handle that case in this future?

You should not be afraid to raise technical concerns -- we need to have a robust technical discussion! But do so in a way that leaves room to find an answer that satisfies both of you.

โ“ How to vision: Awards

At the end of the brainstorming period, we'll also vote on various awards to give to the status quo and shiny future PRs that were submitted.

Award categories

These are the award categories:

  • Most humorous story
  • Most creative story
  • Most supportive -- who left the most helpful comments?
  • Most prolific -- who wrote the most stories?
  • Most unexpected -- which status quo story (or shiny future) took you by surprise?
  • Most painful "status quo" story
  • Most ambitious "shiny future" story
  • Most extensive FAQ

However, if you have an idea for another award category, we are happy to take suggestions. One rule: the awards can't be negative (e.g., no "most unrealistic"), and they can't be about which thing is "best". That would work against the brainstorming spirit.

Voting

At the end of the brainstorming period, we're going to have a voting session to select which PRs and people win the awards. The winners will be featured in a blog post. 🏆

โœ๏ธ Design tenets for async

| Status | Owner |
|--------|-------|
| ⚠️ Draft ⚠️ | nikomatsakis |

Draft status. These tenets are a first draft. nikomatsakis plans to incorporate feedback and revise them before they are finalized.

The design tenets describe the key principles that drive our work on async. Hopefully, we are able to achieve and honor all of them all of the time. Sometimes, though, they come into conflict, and we have to pick -- in that case, we prefer the tenet earlier in the list.

  1. Minimal overhead. Rust Async I/O performance should compare favourably with any other language. In the extreme case, it should be possible to use async/await without any required allocation, although this is unlikely to be a common case in production systems.
  2. Easy to get started, but able to do anything you want. We should make it simple to write Async I/O code and get things that work reasonably well, but it should be possible for people to obtain fine-grained control as needed.
  3. Async is like sync, but with blocking points clearly identified. At the highest level, writing a simple program using asynchronous I/O in Rust should be analogous to writing one that uses synchronous I/O, except that one adds async in front of function declarations and adds .await after each call. We should aim for analogous design between synchronous and asynchronous equivalents. Similarly, streams should be like asynchronous iterators. One should be able to use the same sort of combinators with streams and to iterate over them in analogous ways. (A small sketch after this list illustrates the analogy.)
  4. No one true runtime. We need to be able to hook into existing runtimes in different environments, from embedded environments to runtimes like node.js. Specialized systems need specialized runtimes.
  5. Library ecosystem is key. We want to have a strong ecosystem of async crates, utilities, and frameworks. This will require mechanisms to write libraries/utilities/frameworks that are generic and interoperable across runtimes.
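
As a rough illustration of tenet 3, compare a synchronous function with its asynchronous analogue. This is a minimal sketch, not code from this document; the function names are made up, and the async version happens to use async-std's file type, but any runtime with an analogous API would look the same:

use std::io::Read;

// Synchronous version.
fn read_config(mut file: std::fs::File) -> std::io::Result<String> {
    let mut buf = String::new();
    file.read_to_string(&mut buf)?;
    Ok(buf)
}

// Asynchronous version: the same shape, with async in front of the
// declaration and .await marking each potential blocking point.
async fn read_config_async(mut file: async_std::fs::File) -> std::io::Result<String> {
    use async_std::prelude::*;
    let mut buf = String::new();
    file.read_to_string(&mut buf).await?;
    Ok(buf)
}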

Stress tests

"Stress tests" are important use cases that tend to "stretch" the design. When we are contemplating changes, it's important to look over the stress tests and make sure that they all still work:

  • Single-threaded executors: Some systems tie each task to a single thread; such tasks should be able to access data that is not Send or Sync, and the executor for those tasks should be able to be fully optimized to avoid atomic accesses, etc. (A sketch after this list shows such a task.)
  • Multi-threaded executors: Many systems migrate tasks between threads transparently, and that should be supported as well, though tasks will be required to be Send.
  • "Bring your own runtime": The Rust language itself should not require that you start threads, use epoll, or do any other particular thing.
  • Zero allocation, single task: Embedded systems might want to be able to have a single task that is polled to completion and which does no allocation whatsoever.
  • Multiple runtimes in one process: Sometimes people have to combine systems, each of which comes with its own event loop. We should avoid assuming there is one global event loop in the system.
  • Non-Rust based runtimes: Sometimes people want to integrate into event loops from other, non-Rust-based systems.
  • WebAssembly in the browser: We want to integrate with WebAssembly.
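
As a concrete illustration of the first stress test, here is a minimal sketch (the values and names are made up for illustration) of a task holding non-Send data on a single-threaded executor, using the futures crate's LocalPool:

use std::rc::Rc;

use futures::executor::LocalPool;
use futures::task::LocalSpawnExt;

fn main() {
    let mut pool = LocalPool::new();
    let spawner = pool.spawner();

    // Rc is neither Send nor Sync, so this task could not be spawned
    // onto a multi-threaded, work-stealing executor.
    let data = Rc::new(42);
    spawner
        .spawn_local(async move {
            println!("data = {}", data);
        })
        .unwrap();

    // Run all spawned tasks to completion on the current thread.
    pool.run();
}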

🙋‍♀️ Cast of characters

What is this?

We've created four characters that we use to guide our thinking. These characters are the protagonists of our status quo and shiny future stories, and they help us to think about the different kinds of priorities and expectations that people bring to Async Rust. Having names and personalities also makes the stories more fun and approachable.

The characters

  • Alan: the experienced "GC'd language" developer, new to Rust
    • Top priority: performance -- that's what he is not getting from his current GC'd language
    • Expectations: absence of memory safety bugs (he gets that now from his GC), strong ecosystem, great tooling
  • Grace: the systems programming expert, new to Rust
    • Top priority: memory safety -- that's what she is not getting from C/C++
    • Expectations: able to do all the things she's used to from C/C++
  • Niklaus: new programmer from an unconventional background
    • Top priority: accessibility -- he's learning a lot of new things at once
    • Expectations: community -- the community enabled him to have early success, and he is excited to have it support him and help him grow more
  • Barbara: the experienced Rust developer
    • Top priority: overall productivity and long-term maintenance -- she loves Rust, and wants to see it extended to new areas; she has an existing code base to maintain
    • Expectations: elegance and craftsmanship, fits well with Rust

🤔 Frequently Asked Questions

Where do the names come from?

Famous programming language designers and theorists. Alan Turing, Grace Hopper, Niklaus Wirth, and Barbara Liskov.

I don't see myself in these characters. What should I do?

Come to Zulip and talk to us about it! Maybe they need to be adjusted!

I see myself in more than one of these characters!

Yeah, me too.

🙋‍♀️ Cast of characters

Alan: the experienced "GC'd language" developer, new to Rust

Variant A: Dynamic languages

Alan has been programming for years. He has built systems with Ruby on Rails and node.js, and has used Django too. Lately he's been learning Rust, and he is tinkering with integrating Rust into some of his projects to get better performance and reliability. He's also building some projects entirely in Rust.

Variant B: Java

Alan works at a Java shop. They run a number of network services built in Java, along with some that use Kotlin or Scala. He's very familiar with the Java ecosystem and the tooling that the JVM offers. He's also sometimes had to tweak his code to work around garbage collector latencies or to reduce overall memory usage. He's curious to try porting some systems to Rust to see how it works.

Variant C: Kotlin

Alan is developing networking programs in Kotlin. He loves Kotlin for its expressive syntax and clean integration with Java. Still, he sometimes encounters problems running his services due to garbage collection latencies or overall memory usage. He's heard that Rust can be fun to use too, and is curious to try it out.

🤔 Frequently Asked Questions

What does Alan want most from Async Rust?

  • The promise of better performance and memory usage than the languages he's been using. Rust's safety guarantees are important too; he's considered using C++ in the past but always judged the maintenance burden would be too high.

What expectations does Alan bring from his current environment?

  • A focus on ease of use, a strong ecosystem, and great tooling.

🙋‍♀️ Cast of characters

Grace: the systems programming expert, new to Rust

Grace has been writing C and C++ for a number of years. She's accustomed to hacking lots of low-level details to coax the most performance she can from her code. She's also experienced her share of epic debugging sessions resulting from memory errors in C. She's intrigued by Rust: she likes the idea of getting the same control and performance she gets from C but with the productivity benefits she gets from memory safety. She's currently experimenting with introducing Rust into some of the systems she works on, and she's considering Rust for a few greenfield projects as well.

🤔 Frequently Asked Questions

What does Grace want most from Async Rust?

Grace is most interested in memory safety. She is comfortable with C and C++ but she's also aware of the maintenance burden that arises from the lack of memory safety.

What expectations does Grace bring from her current environment?

  • Grace expects to be able to get the same performance she used to get from C or C++.
  • Grace is accustomed to various bits of low-level tooling, such as gdb or perf. It's nice if Rust works reasonably well with those tools, but she'd be happy to have access to better alternatives if they were available. She's happy using cargo instead of make, for example.

🙋‍♀️ Cast of characters

Niklaus: new programmer from an unconventional background

He's always been interested in programming but doesn't have experience with it. He's been working as a tech writer and decided to dip his toe in by opening PRs to improve the documentation for one of the libraries he was playing with. The feedback was positive so he fixed a small bug. He's now considering getting involved in a deeper way.

🤔 Frequently Asked Questions

What does Niklaus want most from Async Rust?

  • Niklaus values accessibility. He's learning a lot of new things at once and it can be overwhelming.

What expectations does Niklaus bring from his current environment?

  • Niklaus expects a strong and supportive community. The Rust community enabled him to have early success, and he is excited to have it support him and for it to help him grow more.

🙋‍♀️ Cast of characters

Barbara: the experienced Rust developer

Barbara has been using Rust since the 0.1 release. She remembers some of the crazy syntax in Ye Olde Rust of Yore and secretly still misses the alt keyword (don't tell anyone). Lately she's maintaining various projects in the async space.

🤔 Frequently Asked Questions

What does Barbara want most from Async Rust?

  • She is using Rust for its feeling of productivity, and she expects Async Rust to continue in that tradition.
  • She maintains several existing projects, so stability is important to her.

What expectations does Barbara bring from her current environment?

  • She wants a design that feels like the rest of Rust.
  • She loves Rust and she expects Async Rust to share its overall values.

⚡ Projects

What is this?

This section describes various sample projects that are referenced in our stories. Each project is meant to represent some domain that we are targeting.

List of projects

See the sidebar for the full list.

Don't find a project like yours here?

Don't despair! This is just a list of fun projects that we've needed for stories. If you'd like to add a project for your story, feel free to do so! Note though that you may find that some existing project has the same basic characteristics as your project, in which case it's probably better to reuse the existing project.

⚡ Projects: NAME (DOMAIN)

This is a template for adding new projects. See the instructions for more details on how to add a new project!

What is this?

This is a sample project for use within the various "status quo" or "shiny future" stories.

Description

Give a fun description of the project here! Include whatever details are needed.

🤔 Frequently Asked Questions

What makes this project different from others?

Does this project require a custom tailored runtime?

How much of this project is likely to be built with open source components from crates.io?

What is of most concern to this project?

What is of least concern to this project?

⚡ Projects: MonsterMesh (embedded sensors)

What is this?

This is a sample project for use within the various "status quo" or "shiny future" stories.

Description

"MonsterMesh" is a sensor mesh on microcontrollers using Rust. The nodes communicate wirelessly to relay their results. These sensors are built using very constrained and low power hardware without operating system, so the code is written in a #[no_std] environment and is very careful about available resources.

🤔 Frequently Asked Questions

What makes embedded projects like MonsterMesh different from others?

  • Embedded developers need to write error-free applications outside of the comfort zone of an operating system. Rust helps to prevent many classes of programming errors at compile time, which inspires confidence in the software quality and cuts time-intensive build-flash-test iterations.
  • Embedded developers need good hardware abstraction. Frameworks in other languages do not provide the sophisticated tooling for abstracting memory-mapped IO into safe types that the Rust teams have created.
  • Embedded developers care about hard real-time capabilities; the concept of "you only pay for what you use" is very important in embedded applications. The combination of the inherently asynchronous interrupt handling of microcontrollers with the Rust async building blocks is a perfect match for effortlessly creating applications with hard real-time capabilities.
  • Embedded developers are particularly appreciative of strong tooling support. The availability of the full environment via rustup and the integration of the full toolchain with cargo and build.rs let them focus on what they do best instead of having regular fights with the environment.

Does MonsterMesh require a custom tailored runtime?

Yes! The tradeoffs for an embedded application like MonsterMesh and a typical server are very different. Further, most server-grade frameworks are not #![no_std] compatible and far exceed the available footprint on the sensor nodes.

How much of this project is likely to be built with open source components from crates.io?

With no operating system to provide abstractions, MonsterMesh will contain all the logic it needs to run. Much of this, especially around the hardware-software interface, is unlikely to be unique to MonsterMesh and will be sourced from crates.io. However, the further up the stack one goes, the more specialized the requirements become.

How did you pick the name?

So glad you asked! Please watch this entertaining video.

⚡ Projects: DistriData (Generic Infrastructure)

What is this?

This is a sample project for use within the various "status quo" or "shiny future" stories.

Description

DistriData is the latest in containerized, micro-service distributed database technology. Developed completely in the open as part of the Cloud Native Computing Foundation, this utility is now deployed in a large portion of networked server applications across the entire industry. Since it's so widely used, DistriData has to balance flexibility with having sensible defaults.

🤔 Frequently Asked Questions

What makes DistriData different from others?

  • This project is meant to be used in many different ways in many different projects, and is not unique to any one application.
  • Many of those using this project will not even need or want to know that it's written in Rust.

Does DistriData require a custom tailored runtime?

DistriData's concerns are at a higher level than the runtime. A fast, reliable, and resource conscious general purpose runtime will serve DistriData's needs.

How much of this project is likely to be built with open source components from crates.io?

Quite a lot of it. While DistriData receives many contributions, it's important to the team that, whenever possible, they utilize existing technologies that developers are already familiar with, to ensure that contributing to the project is easy.

What is of most concern to this project?

It needs to be resource conscious, fast, and reliable, but above all else it needs to be easy to run, monitor, and maintain.

What is of least concern to this project?

While DistriData is resource conscious, it's not resource starved. There's no need to make life difficult to save on a memory allocation here or there.

⚡ Projects: TrafficMonitor (Custom Infrastructure)

What is this?

This is a sample project for use within the various "status quo" or "shiny future" stories.

Description

TrafficMonitor is a utility written by AmoogleSoft, a public cloud provider, for monitoring network traffic as it comes into its data centers to prevent things like distributed denial-of-service attacks. It monitors all network traffic, looking for patterns, and deciding when to take action against certain threat vectors. TrafficMonitor runs across almost all server racks in a data center, and while it does run on top of an operating system, it is resource conscious. It's also extremely important that TrafficMonitor stay running and handle network traffic with as few "hiccups" as possible. TrafficMonitor is highly tuned to the needs of AmoogleSoft's cloud offering and won't run anywhere else.

🤔 Frequently Asked Questions

What makes networking infrastructure projects like TrafficMonitor different from others?

  • Networking infrastructure powers entire datacenters or even public internet infrastructure, and as such it is imperative that it run without failure.
  • It is also extremely important that such projects take as few resources as possible. Running on an operating system and large server racks may mean that using the standard library is possible, but memory and CPU usage should be kept to a minimum.
  • This project is worked on by software developers with different backgrounds. Some are networking infrastructure experts (usually using C) while others have experience in networked applications (usually using GCed languages like Java, Go, or Node).

Does TrafficMonitor require a custom tailored runtime?

Maybe? TrafficMonitor runs on top of a full operating system and takes full advantage of that operating system's networking stack. It's possible that a runtime meant for server workloads will work with TrafficMonitor.

How much of this project is likely to be built with open source components from crates.io?

  • TrafficMonitor is highly specialized to the internal workings of AmoogleSoft's public cloud offering. Thus, "off-the-shelf" solutions will only work if they're highly flexible and highly tuneable.
  • TrafficMonitor is central to AmoogleSoft's success, meaning that getting things "just right" is much more important than saving effort with something from crates.io that mostly works.

What is of most concern to this project?

  • Reliability is the number one concern. This infrastructure is at the core of the business: it needs to work extremely reliably. A close second is being easily monitorable. If something goes wrong, AmoogleSoft needs to know very quickly what the issue is.
  • AmoogleSoft is a large company with a lot of existing custom tooling for building, monitoring, and deploying its software. TrafficMonitor has to play nicely in a world that existed long before it came around.

What is of least concern to this project?

AmoogleSoft is a large company with time and resources. High-level frameworks that remove control in favor of peak developer productivity are not what they're after. Sure, the easier things are to get working, the better, but that should not come at the sacrifice of control.

⚡ Projects: YouBuy (Traditional Server Application)

What is this?

This is a sample project for use within the various "status quo" or "shiny future" stories.

Description

YouBuy is a growing e-commerce website that now has millions of users. The team behind YouBuy is struggling to keep up with traffic and keep server costs low. Having originally written YouBuy in a mix of Ruby on Rails and Node, the YouBuy team decides to rewrite many parts of their service in Rust, which they've investigated and found to be performant while still allowing for the high levels of abstraction they're used to.

🤔 Frequently Asked Questions

What makes YouBuy and other server applications different from others?

  • Many server applications are written in languages with garbage collectors. Many of the things that Rust forces users to care about are not first order concerns for those working on server applications (e.g., memory management, stack vs heap allocations, etc.).
  • Many server applications are written in languages without static type checking. The developers of YouBuy don't have much experience with statically typed languages, and some of them, early in their Rust learning journeys, expressed frustration that they found it hard to get their programs to compile, especially when using async constructs.

Does YouBuy require a custom tailored runtime?

YouBuy should be perfectly fine with a runtime from crates.io. In fact, their concern isn't at the runtime level but at the high-level server framework level.

How much of this project is likely to be built with open source components from crates.io?

YouBuy is in fierce competition with many other e-commerce sites. Therefore, the less that YouBuy engineers have to write themselves, the better. Ideally, YouBuy can focus 100% of its energy on features that differentiate it from its competition and none of its time on tweaking its networking stack.

What is of most concern to this project?

It seems like YouBuy is always on the verge of either becoming the next billion-dollar company with hundreds of millions of users or completely going out of business. YouBuy needs to be able to move fast and focus on the application business logic.

What is of least concern to this project?

Since moving fast is of primary concern, the ins and outs of the underlying networking stack are only of concern when something goes wrong. The hope is that this rarely, if ever, happens, and that when it does, it's easy to find the source of the issue.

⚡ Projects: SLOW (Protocol implementation)

What is this?

This is a sample project for use within the various "status quo" or "shiny future" stories.

Description

SLOW is an open source implementation of a fancy new protocol. This protocol uses a mix of TCP and UDP packets and is designed to operate particularly well over high latency, low throughput links.

🤔 Frequently Asked Questions

What makes this project different from others?

SLOW is a library, not an application.

Does this project require a custom tailored runtime?

Ideally, SLOW would be developed in an independent way that permits it to be used across many runtimes in a variety of different environments.

How much of this project is likely to be built with open source components from crates.io?

SLOW builds on other generic libraries available from crates.io. For example, it would like to make use of compression algorithms that others have written, or to use future adapters.

What is of most concern to this project?

Uh, I don't really know! If you develop software like this, maybe open a PR and tell me! --nikomatsakis

What is of least concern to this project?

Uh, I don't really know! If you develop software like this, maybe open a PR and tell me! --nikomatsakis

Why is this called SLOW?

It's like QUIC, but slow! Get it? Get it? :D

😱 Status quo stories

🚧 Under construction! Help needed! 🚧

We are still in the process of drafting the vision document. The stories you see on this page are examples meant to give a feeling for how a status quo story looks; you can expect them to change. We encourage you to propose your own by opening a PR -- see the "How to vision" page for instructions and details.

What is this?

The "status quo" stories document the experience of using Async Rust today. Each story narrates the challenges encountered by one of our characters as they try (and typically fail in dramatic fashion) to achieve their goals.

Writing the "status quo" stories helps us to compensate for the curse of knowledge: the folks working on Async Rust tend to be experts in Async Rust. We've gotten used to the workarounds required to be productive, and we know the little tips and tricks that can get you out of a jam. The stories help us gauge the cumulative impact all the paper cuts can have on someone still learning their way around. This gives us the data we need to prioritize.

Based on a true story

These stories may not be true, but they are not fiction. They are based on real-life experiences of actual people. Each story contains a "Frequently Asked Questions" section referencing sources used to create the story. In some cases, it may link to notes or summaries in the conversations section, though that is not required. The "Frequently Asked Questions" section also contains a summary of what the "morals" of the story are (i.e., what are the key takeaways), along with answers to questions that people have raised along the way.

The stories provide data we use to prioritize, not a prioritization itself

Just because a user story is represented here doesn't mean we're going to be able to fix it right now. Some of these user stories will indicate more severe problems than others. As we consider the stories, we'll select some subset to try and address; that choice is reflected in the roadmap.

Metanarrative

What follows is a kind of "metanarrative" of using async Rust that summarizes the challenges present today. At each point, we link to the various stories; you can read the full set in the table of contents on the left. We would like to extend this to also cover some of async Rust's glories, since reading the current stories is a litany of difficulties, but obviously we see great promise in async Rust. Note that many stories here appear more than once.

Rust strives to be a language that brings together performance, productivity, and correctness. Rust programs are designed to surface bugs early and to make common patterns both ergonomic and efficient, leading to a sense that "if it compiles, it generally works, and works efficiently". Async Rust aims to extend that same feeling to an async setting, in which a single process interweaves numerous tasks that execute concurrently. Sometimes this works beautifully. However, other times, the reality falls short of that goal.

Making hard choices from a complex ecosystem from the start

The problems begin from the very first moment a user starts to try out async Rust. The async Rust support in Rust itself is very basic, consisting only of the core Future mechanism. Everything else -- including the basic async runtimes themselves -- lives in user space. This means that users must make a number of choices from the very beginning.
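
For reference, the language and standard library essentially provide just this one trait (std::future::Future, shown here as it is defined in the standard library), plus the async/await syntax for creating values that implement it; executors, timers, and I/O types all come from the ecosystem:

use std::pin::Pin;
use std::task::{Context, Poll};

pub trait Future {
    type Output;

    // Attempt to resolve the future to a final value, registering the
    // current task for wakeup if the value is not yet available.
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}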

Once your basic setup is done, the best design patterns are subtle and not always known.

Writing async programs turns out to have all kinds of subtle tradeoffs. Rust aims to be a language that gives its users control, but that also means that users wind up having to make a lot of choices, and we don't give them much guidance.

Even once you've chosen a pattern, getting things to compile can be a challenge.

Once you get it to compile, things don't "just work" at runtime, or they may be unexpectedly slow.

When you have those problems, you can't readily debug them or get visibility into what is going on.

Rust has always aimed to interoperate well with other languages and to fit itself into every niche, but that's harder with async.

😱 Status quo stories: Template

This is a template for adding new "status quo" stories. To propose a new status quo PR, do the following:

  • Create a new file in the status_quo directory named something like Alan_tries_to_foo.md or Grace_does_bar.md, and start from the raw source from this template. You can replace all the italicized stuff. :)
  • Do not add a link to your story to the SUMMARY.md file; we'll do it after merging, otherwise there will be too many conflicts.

For more detailed instructions, see the How To Vision: Status Quo page!

If you're looking for ideas of what to write about, take a look at the open issues. You can also open an issue of your own to throw out an idea for others.

🚧 Warning: Draft status 🚧

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Write your story here! Feel free to add subsections, citations, links, code examples, whatever you think is best.

🤔 Frequently Asked Questions

Here are some standard FAQ to get you started. Feel free to add more!

What are the morals of the story?

Talk about the major takeaways -- what do you see as the biggest problems?

What are the sources for this story?

Talk about what the story is based on, ideally with links to blog posts, tweets, or other evidence.

Why did you choose NAME to tell this story?

Talk about the character you used for the story and why.

How would this story have played out differently for the other characters?

In some cases, there are problems that only occur for people from specific backgrounds, or which play out differently. This question can be used to highlight that.

😱 Status quo stories: Alan tries to cache requests, which doesn't always happen.

🚧 Warning: Draft status 🚧

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Alan is working on an HTTP server. The server makes calls to some other service. The performance of the downstream service is somewhat poor, so Alan would like to implement some basic caching.

Alan writes up some code which does the caching:


async fn get_response(&mut self, key: String) {
    // Try to get the response from cache
    if let Some(cached_response) = self.cache.get(&key) {
        self.channel.send(cached_response).await;
        return;
    }

    // Get the response from the downstream service
    let response = self.http_client.make_request(&key).await;
    self.channel.send(response.clone()).await;

    // Store the response in the cache
    self.cache.set(key, response);
}

Alan is happy with how things are working, but notices that every once in a while the downstream service hangs. To prevent that, Alan implements a timeout.

He remembers from the documentation for his favorite runtime that there is a race function, which kicks off two futures and polls both until one completes (similar to tokio's select and async-std's race, for example).


runtime::race(timeout(), get_response(key)).await

The bug

Alan ships to production but after several weeks he notices some users complaining that they receive old data.

Alan looks for help. The compiler unfortunately doesn't provide any hints. He turns to his second-best friend, clippy, who cannot help either. Alan tries debugging. He uses his old friend println!. After hours of working through the code, he notices that sometimes the line that sets the response in the cache never gets called.

The solution

Alan goes to Barbara and asks why in the world that might be ⁉️

💡 Barbara looks through the code and notices that there is an await point between sending the response over the channel and setting the cache.

Since the get_response future can be dropped at each available await point, it may be dropped after the http request has been made, but before the response has successfully been sent over the channel, thus not executing the remaining instructions in the function.

This means the cache might not be set.

Alan fixes it by setting the cache before sending the result over the channel. 🎉


async fn get_response(&mut self, key: String) {
    // ... cache miss happened here

    // We perform the HTTP request and our code might continue
    // after this .await once the HTTP request is complete
    let response = self.http_client.make_request(&key).await;

    // Immediately store the response in the cache
    self.cache.set(key, response.clone());

    self.channel.send(response).await;
}

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • Futures can be "canceled" at any await point. Authors of futures must be aware that, after an await, the code might not run (see the sketch after this list).
    • This is similar to panic safety, but way more likely to happen.
  • A future might be polled to completion, causing the code to work. But then, years later, the code is changed and the future might conditionally not be polled to completion, which breaks things.
  • The burden falls on the user of the future to poll it to completion, and there is no way for the library author to enforce this - they can only document the invariant.
  • Diagnosing and ultimately fixing this issue requires a fairly deep understanding of the semantics of futures.
  • Without a Barbara, it might be hard to even know where to start: no lints are available, and Alan is left with a normal debugger and println!.
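
The failure mode is easy to reproduce in isolation. Below is a minimal, self-contained sketch, using tokio's select! and sleep as stand-ins for the story's race and timeout functions: when the timeout wins, the other future is dropped at its await point and the code after that point never runs.

use std::time::Duration;
use tokio::time::sleep;

#[tokio::main]
async fn main() {
    let work = async {
        sleep(Duration::from_millis(50)).await;
        // If this future is dropped while suspended at the await above,
        // this line never runs, just like Alan's cache update.
        println!("after the await");
    };

    tokio::select! {
        _ = work => println!("work finished"),
        _ = sleep(Duration::from_millis(10)) => println!("timed out; work was dropped"),
    }
}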

What are the sources for this story?

The relevant sources of discussion for this story have been gathered in this GitHub issue.

Why did you choose Alan to tell this story?

Alan has enough experience and understanding of push-based async languages to make the assumptions that trigger the bug.

How would this story have played out differently for the other characters?

This story would likely have played out the same for almost everyone but Barbara, who has probably been bitten by this already. The time spent debugging and fixing would, however, have varied depending on experience and luck.

๐Ÿ˜ฑ Status quo stories: Alan finds dropping database handles is hard.

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The problem

Alan has been adding an extension to YouBuy that launches a singleton actor which interacts with an SQLite database using the sqlx crate. The SQLite database only permits a single active connection at a time, but this is not a problem, because the actor is a singleton, so there should only ever be one connection. He consults the documentation for sqlx and comes up with the following code to create a connection and run the query he needs:

use sqlx::{Connection, SqliteConnection};

#[async_std::main]
async fn main() -> Result<(), sqlx::Error> {
    // Create a connection
    let mut conn = SqliteConnection::connect("sqlite::memory:").await?;

    // Make a simple query to return the given parameter
    let row: (i64,) = sqlx::query_as("SELECT $1")
        .bind(150_i64)
        .fetch_one(&mut conn)
        .await?;

    assert_eq!(row.0, 150);

    Ok(())
}

Things seem to be working fairly well but sometimes when he refreshes the page he encounters a panic with the message "Cannot open a new connection: connection is already open". He is flummoxed.

Searching for the Solution

Alan tries to figure out what happened from the logs, but the only information he sees is that a new connection has been received. Alan turns to the documentation for the sqlx crate to see if there are flags that might enable extra instrumentation but he can't find any.

He's a bit confused, because he's accustomed to having things generally be cleaned up automatically when they get dropped (for example, dropping a File will close it). Searching the docs, he sees the close method, but the comments confirm that he shouldn't have to call it explicitly: "This method is not required for safe and consistent operation. However, it is recommended to call it instead of letting a connection drop as the database backend will be faster at cleaning up resources." Still, just in case, he decides to add a call to close into his code. It does seem to help some, but he is still able to reproduce the problem if he refreshes often enough. Feeling confused, he adds a log statement right before calling close to see if it is working:


#![allow(unused)]
fn main() {
use sqlx::{Connection, SqliteConnection};

async fn do_the_thing() -> Result<(), sqlx::Error> {
    // Create a connection
    let mut conn = SqliteConnection::connect("sqlite::memory:").await?;

    // Make a simple query to return the given parameter
    let row: (i64,) = sqlx::query_as("SELECT $1")
        .bind(150_i64)
        .fetch_one(&mut conn)
        .await?; // <----- if this await is cancelled, the lines below never run

    assert_eq!(row.0, 150);

    // he adds this:
    log::info!("closing the connection");
    conn.close().await?;

    Ok(())
}
}

He observes that in the cases where he has the problem, the log statement never executes. He asks Barbara for help, and she points him to this gist that explains how an await can be canceled; cancellation invokes the destructors of anything in scope. He reads the source for the SqliteConnection destructor and finds that the destructor spawns a task to actually close the connection.

He realizes there is a race condition: the spawned task may not have actually closed the connection before do_the_thing is called a second time. At this point, he is feeling pretty frustrated!

Next, Alan seeks verification and validation of his understanding of the source code on the sqlx forum. Someone on the forum explains why the destructor launches a fresh task: Rust doesn't have a way to execute async operations in a destructor.
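
In outline, the workaround looks something like the sketch below (the names are illustrative stand-ins, not sqlx's actual internals): Drop::drop is synchronous, so all a destructor can do is spawn the asynchronous close and return immediately, with no way to wait for it to finish.

#![allow(unused)]
fn main() {
// Illustrative stand-in for the raw database handle.
struct RawConnection;

impl RawConnection {
    async fn close(self) {
        // pretend this performs the async close handshake
    }
}

struct Connection {
    raw: Option<RawConnection>,
}

impl Drop for Connection {
    fn drop(&mut self) {
        if let Some(raw) = self.raw.take() {
            // The destructor cannot await, so the close only happens
            // *eventually*, on the executor. This is the window in which
            // Alan's next connection attempt can find the old one still open.
            async_std::task::spawn(async move {
                raw.close().await;
            });
        }
    }
}
}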

Finding the Solution

Alan briefly considers rearchitecting his application in more extreme ways to retain the use of async, but he gives up and seeks a more straightforward solution. He discovers rusqlite, a synchronous database library, and adopts it. This requires some rearchitecting but solves the problem.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • Rust's async story is lacking a way of executing async operations in destructors. Spawning is a workaround, but it can have unexpected side-effects.
  • The story demonstrates solid research steps that Alan uses to understand and resolve his problem.
  • Completion of the Cancellation and timeouts docs may have been helpful. It's difficult to know how something absent might have improved the solution search process.

What are the sources for this story?

This specific story describes an actual bug encountered by Sergey Galich at 1Password.

Why did you choose Alan to tell this story?

His experience and understanding of other languages coupled with his desire to apply Rust would likely lead him to try solutions before deeply researching them.

How would this story have played out differently for the other characters?

This story would likely have played out the same for everyone.

๐Ÿ˜ฑ Status quo stories: Alan has an external event loop and wants to use futures/streams

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

As a first Rust project, Alan decides to program his own IRC client.

Since it is Alan's first project in Rust, it is going to be a private one. He is going to use it on his Mac, so he decides to go with the cocoa crate, so as not to have to learn any framework-specific quirks. This way Alan can get a feel for Rust itself.

Alan's hopes and dreams

Despite a learning curve, he manages to create a first window and get some buttons and menus working. After the initialisation is done, the app hands over control to CFRunLoop::Run.

Once Alan is happy with his mock UI, he wants to make it actually do something. Reading about async Rust, he sees that several of the concepts there map pretty well to core Cocoa concepts:

  • Promises => Futures
  • Observables => Streams.

Alan smiles, thinking he knows what to do and, more importantly, how to do it.

First time dealing with runtimes

Unfortunately, coming from frameworks like Angular or Node.js, Alan is not used to being responsible for driving the processing of Futures/Streams.

After reading up on runtimes, his mental image of a runtime is something like:


#![allow(unused)]
fn main() {
impl Runtime {
    fn run(&mut self) {
        while !self.tasks.is_empty() {
            while let Some(task) = self.awoken_tasks.pop() {
                task.poll();
                // ... remove finished tasks from `self.tasks`
            }
        }
    }
}
}

Coming from single-threaded Angular development, Alan decides to keep his new app single-threaded. He does not feel like learning about Send/Sync/Mutex while also struggling with the borrow checker.

On top of that, his app is not doing any heavy computation, so he feels async should be enough to avoid blocking the main thread and hanging the UI.

Fun time is over

Soon Alan realises that he cannot use any of those runtimes, because they all take control of the thread and block, just like the OS event loop does.

Alan spends quite some time looking through several runtime implementations. Ignoring most internal details, all he wants is a runtime that looks a bit like this:


#![allow(unused)]
fn main() {
impl Runtime {
    fn make_progress(&mut self) {
        while let Some(task) = self.awoken_tasks.pop() {
            task.poll();
            // ... remove finished tasks from `self.tasks`
        }
    }

    fn run(&mut self) {
        while !self.tasks.is_empty() {
            self.make_progress();
        }
    }
}
}

It could be so easy. Unfortunately, he does not find any such solution. Having already looked through quite a bit of low-level documentation and runtime code, Alan thinks about implementing his own runtime...

...but only for a very short time. Soon after looking into it, he finds out that he has to deal with RawWakerVTable, RawWaker, and raw pointers. Worst of all, he has to do that without the safety net of the Rust compiler, because this stuff is unsafe.

Reimplementing the OS event loop is also not an option he wants to take. See here: "Override run() if you want the app to manage the main event loop differently than it does by default. (This is a critical and complex task, however, that you should only attempt with good reason.)"

The cheap way out

Alan gives up and uses a runtime in a separate thread from the UI. This means he has to deal with the additional burden of synchronization, and he has to give up the frictionless use of some of the patterns he is accustomed to, such as treating UI events as a Stream<Item = UIEvent>.
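
His workaround looks roughly like the sketch below (the event type and handler are hypothetical): the runtime gets a dedicated thread, and UI events are forwarded to it over a channel.

use futures::channel::mpsc;
use futures::StreamExt;
use std::thread;

// Hypothetical UI event type.
enum UiEvent {
    ButtonClicked,
}

fn main() {
    let (tx, mut rx) = mpsc::unbounded::<UiEvent>();

    // The runtime gets its own thread, since its `run()` blocks.
    let worker = thread::spawn(move || {
        futures::executor::block_on(async move {
            // UI events do arrive as a Stream, the pattern Alan wanted,
            // but only at the cost of crossing a thread boundary.
            while let Some(event) = rx.next().await {
                match event {
                    UiEvent::ButtonClicked => { /* kick off async work */ }
                }
            }
        });
    });

    // The OS event loop owns the main thread (CFRunLoop::Run in the real
    // app); here we just forward a single event and shut down.
    tx.unbounded_send(UiEvent::ButtonClicked).unwrap();
    drop(tx);
    worker.join().unwrap();
}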

๐Ÿค” Frequently Asked Questions

  • What are the morals of the story?
    • Coming from a language that has async support does not mean you are used to selecting and driving a runtime.
    • It should be possible to integrate runtimes into existing event loops.
  • What are the sources for this story?
  • Why did you choose Alan to tell this story?
    • The story deals with UI event loops, but the other characters could run into similar issues when trying to combine event loops from different systems/frameworks.
  • Is this Apple specific?
    • No! You have the same issue with other OSes/frameworks that don't already support async Rust.
  • How would this story have played out differently for the other characters?
    • Since this is a technical issue rather than one of skill or experience, it would play out similarly for the other characters; although someone with deep knowledge of those event loops, like Grace, might be more willing to re-implement them.

๐Ÿ˜ฑ Status quo stories: Alan hates writing a Stream

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Alan is used to writing web server applications using async sockets, but wants to try Rust to get that signature vroom vroom.

After a couple of weeks learning Rust basics, Alan quickly understands async and await, and soon has several routes built for his application that await a few things, construct an HTTP response, and send a buffered body. To build the buffered response bodies, Alan reads a file, appends a signature, and puts it all into a single buffer of bytes.

Eventually, Alan realizes that some responses have enormous bodies, and would like to stream them instead of buffering them fully in memory. He's used the Stream trait before. Using it was very natural, and followed a similar pattern to regular async/await:


#![allow(unused)]
fn main() {
// `try_next` comes from futures::TryStreamExt
while let Some(chunk) = body.try_next().await? {
    file.write_all(&chunk).await?;
}
}

However, implementing Stream turns out to be rather different. With a quick search, he learns of a simple way to turn a File into a Stream with ReaderStream, but the signing part is much harder.

Imperatively Wrong

Alan first hopes he can simply write the signing stream imperatively, reusing his new knowledge of async and await, and assuming it will be similar to JavaScript:


#![allow(unused)]
fn main() {
async* fn sign(file: ReaderStream) -> Result<Vec<u8>, Error> {
    let mut sig = Signature::new();

    while let Some(chunk) = file.next().await? {
        sig.push(&chunk);
        yield Ok(chunk)
    }

    yield Ok(sig.digest().await)
}
}

Unfortunately, that doesn't work. The compiler first complains about the async* fn syntax:

error: expected item, found keyword `async`
  --> src/lib.rs:21:1
   |
21 | async* fn sign(file: ReaderStream) -> Result<Vec<u8>, Error> {
   | ^^^^^ expected item

Less hopeful, Alan tries just deleting the asterisk:

error[E0658]: yield syntax is experimental
  --> src/lib.rs:27:9
   |
27 |         yield Ok(chunk)
   |         ^^^^^^^^^^^^^^^
   |
   = note: see issue #43122 <https://github.com/rust-lang/rust/issues/43122> for more information

After reading about how yield is experimental, and giving up on the 100+ comments in the linked issue, Alan figures he's just got to implement Stream manually.

Implementing Stream

Implementing a Stream means writing async code in a way that doesn't feel like the async fn code Alan has written so far. He needs to write a poll function, and it comes with a lot of unfamiliar concepts:

  • Pin
  • State machines
  • Wakers

Unsure of what the final code will look like, he starts with:


#![allow(unused)]
fn main() {
struct SigningFile;

impl Stream for SigningFile {
    type Item = Result<Vec<u8>, Error>;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>)
        -> Poll<Option<Self::Item>>
    {
        todo!()
    }
}
}

Pin ๐Ÿ˜ฑ

First, he notices Pin. Alan wonders, "Why does self have bounds? I've only ever seen self, &self, and &mut self before." Curious, he reads the std::pin page and a bunch of jargon about pinning data in memory. He also reads that this is useful for guaranteeing that an object cannot move, and he wonders why he would care about that. The only example on the page explains how to write a "self-referential struct", but he notices it needs unsafe code, and that triggers an internal alarm in Alan: "I thought Rust was safe..."

After asking Barbara, Alan realizes that the types he's depending on are Unpin, and so he doesn't need to worry about the unsafe stuff. It's just a more-annoying pointer type.
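
What "a more-annoying pointer type" means in practice can be shown with a minimal sketch (the struct here is a hypothetical stand-in): when the pinned type is Unpin, the Pin can simply be unwrapped back into a plain mutable reference.

use std::pin::Pin;

struct SigningFile {
    bytes_seen: usize, // ordinary, Unpin state
}

impl SigningFile {
    fn poll_example(self: Pin<&mut Self>) {
        // Because SigningFile is Unpin, the pin is only an annoyance:
        // we can recover a plain &mut Self and mutate as usual.
        let this: &mut SigningFile = self.get_mut();
        this.bytes_seen += 1;
    }
}

fn main() {
    let mut s = SigningFile { bytes_seen: 0 };
    Pin::new(&mut s).poll_example();
    assert_eq!(s.bytes_seen, 1);
}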

State Machine

With Pin hopefully ignored, Alan next notices that in the imperative style he wanted originally, he didn't need to explicitly keep track of state. The state was simply the imperative order of the function. But in a poll function, the state isn't saved by the compiler. Alan finds blog posts about the dark ages of Futures 0.1, when it was more common for manual futures to be written as "state machines".

He thinks about his stream's states, and settles on the following structure:


#![allow(unused)]
fn main() {
struct SigningFile {
    state: State,
    file: ReaderStream,
    sig: Signature,
}

enum State {
    File,
    Sign,
}
}

It turns out to be more complicated than Alan thought (the author made this same mistake). The digest method of Signature is async, and it consumes the signature, so the state machine needs to be adjusted: the signature needs to be able to be moved out, and the state needs to be able to store a future from an async fn. Figuring out how to represent that in the type system is difficult. Alan considers adding a generic T: Future to the State enum, but then isn't sure what to set that generic to. Then he tries just writing Signing(impl Future) as a state variant, but that triggers a compiler error that impl Trait isn't allowed outside of function return types. Patient Barbara helps again, and Alan learns to just store a Pin<Box<dyn Future>>, wondering if the Pin there is important.


#![allow(unused)]
fn main() {
struct SigningFile {
    state: State,
}

enum State {
    File(ReaderStream, Signature),
    Signing(Pin<Box<dyn Future<Output = Vec<u8>>>>),
    Done,
}
}

Now he tries to write the poll_next method, checking the readiness of the individual steps (thankfully, Alan remembers ready! from the Futures 0.1 blog posts he read) and proceeding to the next state, while grumbling away the weird Pin noise:


#![allow(unused)]
fn main() {
match self.state {
    State::File(ref mut file, ref mut sig) => {
        match ready!(Pin::new(file).poll_next(cx)) {
            Some(result) => {
                let chunk = result?;
                sig.push(&chunk);
                Poll::Ready(Some(Ok(chunk)))
            },
            None => {
                let sig = match std::mem::replace(&mut self.state, State::Done) {
                    State::File(_, sig) => sig,
                    _ => unreachable!(),
                };
                self.state = State::Signing(Box::pin(sig.digest()));
                Poll::Pending
            }
        }
    },
    State::Signing(ref mut sig) => {
        let last_chunk = ready!(sig.as_mut().poll(cx));
        self.state = State::Done;
        Poll::Ready(Some(Ok(last_chunk)))
    }
    State::Done => Poll::Ready(None),
}
}

Oh well, at least it works, right?

Wakers

So far, Alan hasn't paid too much attention to Context and Poll. It's been fine to simply pass them along untouched. But there's a confusing bug in his state machine. Let's look more closely:


#![allow(unused)]
fn main() {
// zooming in!
match ready!(Pin::new(file).poll_next(cx)) {
    Some(result) => {
        let chunk = result?;
        sig.push(&chunk);
        return Poll::Ready(Some(Ok(chunk)));
    },
    None => {
        self.set_state_to_signing();
        // oops!
        return Poll::Pending;
    }
}
}

In one of the branches, the state is changed and Poll::Pending is returned. Alan assumes that the task will be polled again with the new state. But since the file was done (and had returned Poll::Ready), no waker was actually registered to wake the task again. So his stream just hangs forever.

The compiler doesn't help at all, and he re-reads his code multiple times, but because of this easy-to-misunderstand logic error, Alan eventually has to ask for help in a chat room. After half an hour of explaining all sorts of details, a kind person points out that he either needs to register a waker (shown in the sketch below) or perhaps use a loop.
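
The waker route can be shown in isolation. Here is a minimal, self-contained illustration (the TwoPhase future is hypothetical) of the step Alan missed: a poll function that changes state and returns Poll::Pending must also arrange for the task to be woken again.

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// A future that changes state and returns Pending, but correctly
// schedules itself to be polled again.
struct TwoPhase {
    started: bool,
}

impl Future for TwoPhase {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if !self.started {
            self.started = true;
            // Without this line, nothing would ever wake the task, and it
            // would hang forever, exactly like Alan's stream.
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        Poll::Ready(())
    }
}

fn main() {
    futures::executor::block_on(TwoPhase { started: false });
}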

Since he doesn't want to duplicate code in multiple branches, the solution for Alan is to add an odd loop around the whole thing, so that the next match branch gets a chance to use the Context:


#![allow(unused)]
fn main() {
loop {
    match self.state {
        State::File(ref mut file, ref mut sig) => {
            match ready!(Pin::new(file).poll_next(cx)) {
                Some(result) => {
                    let chunk = result?;
                    sig.push(&chunk);
                    return Poll::Ready(Some(Ok(chunk)))
                },
                None => {
                    let sig = match std::mem::replace(&mut self.state, State::Done) {
                        State::File(_, sig) => sig,
                        _ => unreachable!(),
                    };
                    self.state = State::Signing(Box::pin(sig.digest()));
                    // loop again, to catch the `State::Signing` branch
                }
            }
        },
        State::Signing(ref mut sig) => {
            let last_chunk = ready!(sig.as_mut().poll(cx));
            self.state = State::Done;
            return Poll::Ready(Some(Ok(last_chunk)))
        }
        State::Done => return Poll::Ready(None),
    }
}
}

Gives Up

A little later, Alan needs to add some response-body transformations to some routes, to add some app-specific framing. Upon realizing he needs to implement yet another Stream, this time in a generic fashion, he instead closes the editor and complains on Twitter.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • Writing an async Stream is drastically different from writing an async fn.
  • The documentation for Pin doesn't provide much practical guidance on how to use it, focusing instead on more abstract considerations.
  • Missing a waker registration is a runtime error, and very hard to debug. A compiler warning or hint, if at all possible, would go a long way.

What are the sources for this story?

Part of this story is based on the original motivation for async/await in Rust, since similar problems exist writing impl Future.

Why did you choose Alan to tell this story?

Choosing Alan was somewhat arbitrary, but it does let the story reuse the experience Alan already has with await from JavaScript.

How would this story have played out differently for the other characters?

  • This likely would have been a similar story for any character.
  • It's possible Grace would be more used to writing state machines, coming from C.

๐Ÿ˜ฑ Status quo stories: Alan iteratively regresses performance

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

A core part of DistriData, called DDSplit, is in charge of splitting input data records into fragments that are stored on distinct servers, and then reassembling those fragments back into records in response to user queries.

DDSplit was originally implemented using Java code (plus some C, interfaced via JNI). Alan thinks that Rust could provide the same quality of service while requiring less memory. He decides to try reimplementing DDSplit in Rust, atop tokio.

Alan wants to copy some of the abstractions he sees in the Java code that are defined via Java interfaces. Alan sees Rust traits as the closest thing to Java interfaces. However, when he experimentally defines a trait with an async fn, he gets the following message from the compiler:

error[E0706]: functions in traits cannot be declared `async`
 --> src/main.rs:2:5
  |
2 |     async fn method() { }
  |     -----^^^^^^^^^^^^^^^^
  |     |
  |     `async` because of this
  |
  = note: `async` trait functions are not currently supported
  = note: consider using the `async-trait` crate: https://crates.io/crates/async-trait

This diagnostic leads Alan to add the async-trait crate as a dependency to his project. Alan then uses the #[async_trait] attribute provided by that crate to be able to define async fn methods within traits.

When Alan finishes the prototype, he finds that it has 20% lower throughput than the Java version.

Alan is disappointed; his experience has been that Rust code performs great (at least once you manage to get the code accepted by the compiler). He was not expecting to suffer a 20% performance hit compared to the Java code.

The DDSplit service is being developed on a Linux machine, so Alan is able to use the perf tool to gather sampling-based profiling data on the async/await port of DDSplit.

Looking at a flamegraph of the call stacks, Alan identifies two sources of execution-time overhead that he did not expect: calls into the memory allocator (malloc), at about 1% of the execution time, and calls to move values in memory (memcpy), at about 8%.

Alan reaches out to Barbara, the local Rust expert, for help identifying where the performance pitfalls are coming from.

Alan asks Barbara whether the problem could be caused by the tokio executor. Barbara says it is hard to know that without more instrumentation. She explains it could be that the program is overloading tokio's task scheduler (for example), but it also could be that the application code itself has expensive operations, such as lots of small I/O operations rather than using a buffer.

Alan and Barbara look at the perf data. They find the output of perf report difficult to navigate and interpret. The data has stack trace fragments available, which gives them a few hints to follow up on. But when they try to make perf report annotate the original source, perf only shows disassembled machine code, not the original Rust source code. Alan and Barbara both agree that trying to dissect the problem from the machine code is not an attractive strategy.

Alan asks Barbara what she thinks about the malloc calls in the profile. Barbara recommends that Alan try to eliminate the allocation calls, and if they cannot be eliminated, then that Alan try tuning the parameters for the global memory allocator, or even switching which global memory allocator he is using. Alan looks at Barbara in despair: his time tweaking GC settings on the Java Virtual Machine taught him that allocator tuning is often a black art.

Barbara suggests that they investigate where the calls to memcpy are arising, since they look like a larger source of overhead based on the profile data. From the call stacks in perf report, Alan and Barbara decide to skim over the source code files for the corresponding functions.

Upon seeing #[async_trait] in Alan's source code, Barbara recommends that, if performance is a concern, Alan avoid #[async_trait]. She explains that #[async_trait] transforms a trait's async methods into methods that return Pin<Box<dyn Future>>, and that the overhead this injects will be hard to diagnose and impossible to remove. When Alan asks what other options he could adopt, Barbara thinks for a moment and says he could make an enum that carries all the different implementations of the code. Alan says he'll consider it, but in the meantime he wants to see how far they can improve the code while keeping #[async_trait].
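
Barbara's enum suggestion, as a minimal sketch (all the types below are illustrative stand-ins, not DDSplit's real ones): each call dispatches to a statically known implementation, so the returned futures need neither boxing nor dynamic dispatch.

struct Request;
struct Response;

struct InMemoryClient;
impl InMemoryClient {
    async fn perform(&self, _request: Request) -> Response { Response }
}

struct NetworkClient;
impl NetworkClient {
    async fn perform(&self, _request: Request) -> Response { Response }
}

// One enum covers every known implementation, in place of a
// `Box<dyn ...>` trait object.
enum Client {
    InMemory(InMemoryClient),
    Network(NetworkClient),
}

impl Client {
    async fn perform(&self, request: Request) -> Response {
        match self {
            Client::InMemory(c) => c.perform(request).await,
            Client::Network(c) => c.perform(request).await,
        }
    }
}

fn main() {
    let client = Client::InMemory(InMemoryClient);
    let _response = futures::executor::block_on(client.perform(Request));
}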

They continue looking at the code itself, essentially guessing at potential sources of where problematic memcpy's may be arising. They identify two potential sources of moves of large datatypes in the code: pushes and pops on vectors of type Vec<DistriQuery>, and functions with return types of the form Result<SuccessCode, DistriErr>.

Barbara asks how large the DistriQuery, SuccessCode, and DistriErr types are. Alan immediately notes that DistriQuery may be large, and they discuss options for avoiding the memory traffic incurred by pushing and popping DistriQuery.

For the other two types, Alan responds that SuccessCode is small and that the error variants are never constructed in his benchmark code. Barbara explains that the size of Result<T, E> has to be large enough to hold either variant, and that memcpy'ing a result moves all of those bytes, regardless of which variant is live. Alan investigates and sees that DistriErr has variants that embed byte arrays of up to 50kb in size. Barbara recommends that Alan look into boxing the variants, or the whole DistriErr type itself, in order to reduce the cost of moving it around.
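
The effect is easy to observe with std::mem::size_of. Below is a minimal sketch with made-up stand-ins for Alan's types; the exact numbers depend on layout, but the orders of magnitude are the point.

struct SuccessCode(u8);

// Stand-in for DistriErr's large variants.
enum BigErr {
    Payload([u8; 50_000]),
}

// The same error with the payload boxed.
enum BoxedErr {
    Payload(Box<[u8; 50_000]>),
}

fn main() {
    // A Result is as large as its largest variant, and every move
    // memcpy's that many bytes, even when the live variant is `Ok`.
    println!("{}", std::mem::size_of::<Result<SuccessCode, BigErr>>()); // tens of kilobytes
    println!("{}", std::mem::size_of::<Result<SuccessCode, BoxedErr>>()); // a couple of machine words
}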

Alan uses Barbara's feedback to box some of the data, and this cuts the memcpy traffic in the perf report to one quarter of what it had been reporting previously.

However, there remains a significant performance delta between the Java version and the Rust version. Alan is not sure his Rust-rewrite attempt is going to get anywhere beyond the prototype stage.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  1. Rust promises great performance, but when performance is not meeting one's targets, it is hard to know what to do next. Rust mostly leans on leveraging existing tools for native code development, but those tools are (a.) foreign to many of our developers, (b.) do not always measure up to what our developers have access to elsewhere, (c.) do not integrate as well with Rust as they might with C or C++.

  2. Lack of certain language features leads developers to use constructs like #[async_trait] which add performance overhead that is (a.) hard to understand and (b.) may be significant.

  3. Rust makes some things very explicit, e.g. the distinction between Box<T> versus T is quite prominent. But Rust's expressive type system also makes it easy to compose types without realizing how large they have gotten.

  4. Programmers do not always have a good mental model for where expensive moves are coming from.

  5. An important specific instance of (1c.) for the async vision: Native code tools do not have any insight into Rust's async model, as that is even more distant from the execution model of C and C++.

  6. We can actually generalize (5.) further: When async performance does not match expectations, developers do not have much insight into whether the performance pitfalls arise from issues deep in the async executor that they have selected, or if the problems come directly from overheads built into the code they themselves have written.

What are the sources for this story?

Discussions with engineers at Amazon Web Services.

Why did you choose Alan to tell this story?

I chose Alan because he is used to Java, where these issues play out differently.

Java has very mature tooling, including for performance investigations. Alan has used JProfiler at his work, and VisualVM for personal hobby projects. Alan is frustrated by his attempts to use (or even identify) equivalent tools for Rust.

With respect to memory traffic: In Java, every object is handled via a reference, and those references are cheap to copy. (One pays for that convenience in other ways, of course.)

How would this story have played out differently for the other characters?

From her C and C++ background, Grace probably would avoid letting her types get so large. But then again, C and C++ do not have enums with a payload, so Grace would likely have fallen into the same trap that Alan did (assuming that the cost of moving an enum value is proportional to its current variant, rather than to its type's overall size). Also, Grace might report that her experience with gcc-based projects yielded programs that worked better with perf, due in part to gcc producing higher-quality DWARF debuginfo.

Barbara probably would have added direct instrumentation via the tracing crate, potentially even to tokio itself, rather than spend much time wrestling with perf.

Niklaus is unlikely to be as concerned about the 20% throughput hit; he probably would have been happy to get code that seems functionally equivalent to the original Java version.

๐Ÿ˜ฑ Status quo stories: Alan lost the world!

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Alan heard about a project to reimplement a deprecated browser plugin using Rust and WASM. This old technology had the ability to load resources over HTTP, so it makes sense to try to implement that functionality using the Fetch API. Alan looks up the documentation of web_sys and realizes they need to...

  1. Call one of the fetch methods, which returns a Promise
  2. Convert the Promise into a Rust thing called a Future
  3. await the Future in an async function
  4. Do whatever they want with the resulting data

#![allow(unused)]
fn main() {
use wasm_bindgen_futures::JsFuture;
use web_sys::{window, Request};

fn make_request(src: &str) -> Request {
    // Pretend this contains all of the complicated code necessary to
    // initialize a Fetch API request from Rust
    todo!()
}

async fn load_image(src: String) {
    let request = make_request(&src);
    let _response = JsFuture::from(window().unwrap().fetch_with_request(&request)).await;
    log::error!("It worked");
}
}

Alan adds calls to load_image where appropriate. They realize that nothing is happening, so they look through more documentation and find a thing called spawn_local. Once they pass the result of load_image into that function, they see their log message pop up in the console and figure it's time to actually do something with that loaded image data.

At this point, Alan wants to put the downloaded image onto the screen, which in this project means putting it into a Node of the current World. A World is a bundle of global state that's passed around as things are loaded, rendered, and scripts are executed. It looks like this:


#![allow(unused)]

fn main() {
/// All of the player's global state.
pub struct World<'a> {
    /// A list of all display Nodes.
    nodes: &'a mut Vec<Node>,

    /// The last known mouse position.
    mouse_pos: &'a mut (u16, u16),

    // ...
}
}

In synchronous code, this was perfectly fine. Alan figures it'll be fine in async code, too. So Alan adds the world as a function parameter and everything else needed to parse an image and add it to our list of nodes:


#![allow(unused)]
fn main() {
async fn load_image(src: String, inside_of: usize, world: &mut World<'_>) {
    let request = make_request(&src);
    let data = window().unwrap().fetch_with_request(&request).await.unwrap().etc.etc.etc;
    let image = parse_png(data, context);

    let new_node_index = world.nodes.len();
    if let Some(parent) = world.nodes.get(inside_of) {
        parent.set_child(new_node_index);
    }
    world.nodes.push(image.into());
}
}

Bang! Suddenly, the project stops compiling, giving errors like...

error[E0597]: `world` does not live long enough
  --> src/motionscript/globals/loader.rs:21:43

Hmm, okay, that's kind of odd. We can pass a World to a regular function just fine - why do we have a problem here? Alan glances over at loader.rs...


#![allow(unused)]
fn main() {
fn attach_image_from_net(world: &mut World<'_>, args: &[Value]) -> Result<Value, Error> {
    let this = args.get(0).coerce_to_object()?;
    let url = args.get(1).coerce_to_string()?;

    spawn_local(load_image(url, this.as_node().ok_or("Not a node!")?, world));

    Ok(Value::Undefined)
}
}

Hmm, the error is on that last line. spawn_local is a thing Alan had to put into everything that called load_image; otherwise, his async code never actually did anything. But why is this a problem? Alan can borrow a World, or anything else for that matter, inside of async code; and it should get its own lifetime like everything else, right?

Alan has a hunch that this spawn_local thing might be causing the problem, so he reads its documentation. The function signature seems particularly suspicious:


#![allow(unused)]
fn main() {
pub fn spawn_local<F>(future: F) 
where
    F: Future<Output = ()> + 'static
}

So, spawn_local only works with futures that return nothing - so far, so good - and are 'static. Uh-oh. What does that last bit mean? Alan asks Barbara, who responds that it's the lifetime of the whole program. Yeah, but... the async function is part of the program, no? Why wouldn't it have the 'static lifetime? Does that mean all functions that borrow values aren't 'static, or just the async ones?

Barbara explains that when you borrow a value in a closure, the closure doesn't gain the lifetime of that borrow. Instead, the borrow comes with its own lifetime, separate from the closure's. The only time a closure can have a non-'static lifetime is if one or more of its borrows is not provided by its caller, like so:


#![allow(unused)]
fn main() {
fn benchmark_sort() -> usize {
    let mut num_times_called = 0;
    let mut test_values = vec![1, 3, 5, 31, 2, -13, 10, 16];

    test_values.sort_by(|a, b| {
        num_times_called += 1;
        a.cmp(b)
    });

    num_times_called
}
}

The closure passed to sort_by has to copy or borrow anything not passed into it by its caller. In this case, that would be the num_times_called variable. Since we want to modify the variable, it has to be borrowed. Hence, the closure has the lifetime of that borrow, not of the whole program, because it can't be called just anytime, only while num_times_called is a valid thing to read and write.

Async functions, it turns out, act like closures that don't take parameters! They have to, because all futures have to implement the same trait method, poll:


#![allow(unused)]
fn main() {
pub trait Future {
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}
}

When you call an async function, all of its parameters are copied or borrowed into the Future that it returns. Since we need to borrow the World, the Future has the lifetime of &'a mut World, not 'static.
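
A minimal sketch of that desugaring (the types here are simplified stand-ins): an async fn that borrows a parameter returns a future whose lifetime is tied to that borrow, which is exactly what spawn_local's 'static bound rejects.

use std::future::Future;

struct World {
    nodes: Vec<u32>,
}

// An async fn that borrows its argument...
async fn load(world: &mut World) {
    world.nodes.push(42);
}

// ...behaves like a function returning a future that holds the borrow:
fn load_desugared<'a>(world: &'a mut World) -> impl Future<Output = ()> + 'a {
    async move {
        world.nodes.push(42);
    }
}

fn needs_static<F: Future<Output = ()> + 'static>(_future: F) {}

fn main() {
    let mut world = World { nodes: Vec::new() };

    // Constructing the future is fine; it simply borrows `world`...
    let future = load(&mut world);
    drop(future);

    // ...and the desugared version makes that borrow visible in the type.
    let future = load_desugared(&mut world);
    drop(future);

    // But the future is not 'static, so handing it to a spawn_local-like
    // API is rejected. Uncommenting the next line will not compile:
    // needs_static(load(&mut world));
}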

Barbara suggests changing all of the async function's parameters to owned types. Alan asks Grace, who architected this project. Grace recommends holding a reference to the Player that owns the World, and then borrowing the World whenever it's needed. That ultimately looks like the following:


#![allow(unused)]
fn main() {
async fn load_image(src: String, inside_of: usize, player: Arc<Mutex<Player>>) {
    let request = make_request(&src);
    let data = window().unwrap().fetch_with_request(&request).await.unwrap().etc.etc.etc;
    let image = parse_png(data, context);

    player.lock().unwrap().update(|world| {
        let new_node_index = world.nodes.len();
        if let Some(parent) = world.nodes.get(inside_of) {
            parent.set_child(new_node_index);
        }
        world.nodes.push(image.into());
    });
}
}

It works well enough that Alan is able to finish his changes and PR them into the project. However, Alan wonders if this could be syntactically cleaner somehow. Right now, async and update code have to be separated: if we need to do something with a World and then await something else, that requires jumping in and out of this update thing. It's a good thing that we only really have to be async in these loaders, but it's also a shame that we practically can't mix async code and Worlds.

๐Ÿค” Frequently Asked Questions

  • What are the morals of the story?
    • Async functions capture all of their parameters for the entire duration of the function. This allows them to hold borrows of those parameters across await points.
      • When the parameter represents any kind of "global environment", such as the World in this story, it may be useful for that parameter not to be captured by the future but rather supplied anew after each await point.
    • Non-'static Futures are of limited use to developers, as lifetimes are tied to the sync stack. The execution time of most asynchronous operations does not come with an associated lifetime that an executor could use.
      • It is possible to use borrowed futures with block_on style executors, as they necessarily extend all lifetimes to the end of the Future. This is because they turn asynchronous operations back into synchronous ones.
      • Most practical executors want to release the current stack, and thus all of its associated lifetimes. They need 'static futures.
    • Async programming introduces more complexity to Rust than it does to, say, JavaScript. The complexity of async is sometimes explained in terms of 'color', where functions of one 'color' can only call those of another under certain conditions, and developers have to keep track of what is sync and what is async. Due to Rust's borrowing rules, we actually have three 'colors', not the two of other languages with async I/O:
      • Sync, or 'blue' in the original metaphor. This color of function can both own and borrow its parameters. If made into the form of a closure, it may have a lifetime if it borrows something from the current stack.
      • Owned Async, or 'red' in the original metaphor. This color of function can only own parameters, by copying them into itself at call time.
      • Borrowed Async. If an async function borrows at least one parameter, it gains a lifetime, and must fully resolve itself before the lifetime of its parameters expires.
  • What are the sources for this story?
    • This is personal experience. Specifically, I had to do almost exactly this dance in order to get fetch to work in Ruffle.
    • I have omitted a detail from this story: in Ruffle, we use a GC library (gc_arena) that imposes a special lifetime on all GC references. This is how the GC library upholds its memory safety invariants, but it's also what forces us to pass around contexts, and once you have that, it's natural to start putting even non-GC data into it. It also means we can't hold anything from the GC in the Future, as we cannot derive its Collect trait on an anonymous type.
  • Why did you choose Alan to tell this story?
    • Lifetimes on closures are already non-obvious to new Rust programmers, and using them in the context of Futures is particularly unintuitive.
  • How would this story have played out differently for the other characters?
    • Niklaus would probably have a similar struggle to Alan's.
    • Grace would have felt constrained by the async syntax preventing some kind of workaround for this problem.
    • Barbara already knows about Futures and 'static and carefully organizes her programs accordingly.

๐Ÿ˜ฑ Status quo stories: Alan needs async in traits

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Alan is working on a project with Barbara which has already gotten off to a somewhat rocky start. He is working on abstracting away the HTTP implementation the library uses so that users can provide their own. He wants the user to implement an async trait called HttpClient, which has one method, perform(request: Request) -> Response. Alan tries to create the async trait:


#![allow(unused)]
fn main() {
trait HttpClient {
    async fn perform(request: Request) -> Response;
}
}

When Alan tries to compile this, he gets an error:

error[E0706]: functions in traits cannot be declared `async`
 --> src/lib.rs:2:5
  |
2 |     async fn perform(request: Request) -> Response;
  |     -----^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |     |
  |     `async` because of this
  |
  = note: `async` trait functions are not currently supported
  = note: consider using the `async-trait` crate: https://crates.io/crates/async-trait

Alan, who has been using Rust for a little while now, has learned to follow compiler error messages, so he adds async-trait to his Cargo.toml. Alan follows the README of async-trait and comes up with the following code:


#![allow(unused)]
fn main() {
use async_trait::async_trait;

#[async_trait]
trait HttpClient {
    async fn perform(request: Request) -> Response;
}
}

Alan's code now compiles, but he also finds that his compile times have gone from under a second to around 6s, at least for a clean build.

After Alan finishes adding the new trait, he shows his work off to Barbara and mentions that he's happy with the result but a little sad that compile times have worsened. Barbara, an experienced Rust developer, knows that using async-trait comes with some additional issues. In this particular case she is especially worried about tying their public API to a third-party dependency. Even though it is technically possible to implement traits annotated with async_trait without using async_trait, doing so in practice is very painful. For example, async_trait:

  • handles lifetimes for you if the returned future is tied to the lifetime of some inputs.
  • boxes and pins the futures for you.

which the implementer will have to handle manually if they don't use async_trait. She decides not to worry Alan with this right now. Alan and Barbara are pretty happy with the results and go on to publish their crate, which gets lots of users.
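
Concretely, a trait annotated with #[async_trait] ends up looking roughly like the sketch below after expansion (simplified; the real macro also threads lifetimes through, and Request/Response are stand-ins):

use std::future::Future;
use std::pin::Pin;

struct Request;
struct Response;

// Roughly what `#[async_trait]` turns the trait into: the async fn
// becomes an ordinary fn returning a boxed, pinned future.
trait HttpClient {
    fn perform(request: Request) -> Pin<Box<dyn Future<Output = Response> + Send>>;
}

struct ClientImpl;

// ...and every implementer's `async fn` body gets wrapped accordingly.
impl HttpClient for ClientImpl {
    fn perform(request: Request) -> Pin<Box<dyn Future<Output = Response> + Send>> {
        Box::pin(async move {
            let _ = request;
            Response
        })
    }
}

fn main() {}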

Later on, a potential user of the library wants to use it in a no_std context where they will be providing a custom HTTP stack. Alan and Barbara have done a pretty good job of limiting their use of standard library features and think it might be possible to support this use case. However, they quickly run into a showstopper: async-trait boxes all of the futures returned from an async trait function. The user reports this to Alan through an issue.

Alan, feeling (over-)confident in his Rust skills, decides to see if he can implement async traits without using async-trait.


#![allow(unused)]
fn main() {
use std::future::Future;

trait HttpClient {
    type Response: Future<Output = Response>;

    fn perform(request: Request) -> Self::Response;
}
}

Alan seems to have something working, but when he goes to update his crate's documentation with examples of how to implement this trait, he realizes that he either needs to:

  • use a trait object:

    
    #![allow(unused)]
    fn main() {
    struct ClientImpl;
    
    impl HttpClient for ClientImpl {
        type Response = Pin<Box<dyn Future<Output = Response>>>;
    
        fn perform(request: Request) -> Self::Response {
            Box::pin(async move {
                // Some async work here creating Response
            })
        }
    }
    }
    

    which wouldn't work for no_std.

  • implement the Future trait manually, which isn't particularly easy or straightforward for non-trivial cases, especially if it involves making other async calls (likely).

After a lot of thinking and discussion, Alan and Barbara accept that they won't be able to support no_std users of their library and mention this in the crate documentation.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • async-trait is awesome, but has some drawbacks:
    • compile times increase
    • there is a performance cost to boxing and dynamic dispatch
    • it is not a standard solution, so when async trait support comes to the language itself, it might break things
  • Trying to hand-roll a more efficient implementation than async-trait is likely not practical.

What are the sources for this story?

Why did you choose Alan to tell this story?

We could have used Barbara here, but she'd probably know some of the workarounds (likely even the details of why they're needed) and wouldn't need help, so it wouldn't make for a good story. Having said that, Barbara is still involved in the story, so it's not a pure Alan story.

How would this story have played out differently for the other characters?

  • Barbara: See above.
  • Grace: Probably wouldn't know the solution to these issues, much like Alan, but might have an easier time understanding the why of the whole situation.
  • Niklaus: Would be lost - traits themselves are somewhat new to him. This is just more complexity, and Niklaus might not even know where to go for help (outside of compiler errors).

๐Ÿ˜ฑ Status quo stories: Alan wants to migrate a web server to Rust

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

The story

Is Rust ready for the web?

Alan has been following the arewewebyet site for quite some time. He is a TypeScript full-stack developer and follows the project in order to know when it would be sensible to migrate the backend of a web application he's responsible for. Alan loves Rust and has used it for some tasks that didn't quite need async routines. Since arewewebyet is an official Rust language project, he trusts its reviews of web frameworks, tools, libraries, etc.

Alan was thrilled during the 2020 Xmas holiday. It turns out that at that time Rust was declared to be web ready! Alan takes this as a sign not only that Rust is great for web servers, but also as confirmation that its async features have matured and stabilised. For how can a language be web ready and not fully support asynchronous tasks?

Alan's points of reference are Golang and JavaScript. Both were created with web servers and clients in mind, and both support asynchronous tasks natively. At the same time, Alan is not aware of the complexities that these languages are "hiding" from him.

Picking a web server is ok

Golang's native HTTP server is nice, but as a TypeScript developer, Alan is also used to dealing with "JavaScript fatigue". JavaScript developers often use this term to refer to a fast-paced framework ecosystem, where every so often there is the "new" thing everybody else is migrating to. Similarly, JavaScript engineers are used to picking from a myriad of options within the vast npm ecosystem. And so the lack of a web server in Rust's standard library didn't surprise him. The number of options didn't overwhelm him either.

The arewewebyet site mentions four good web servers. Alan picks Tide because its interfaces and its emphasis on middleware remind him of Node.js' Express framework.

The first endpoint

Alan sets up all the boilerplate and is ready to write the first endpoint. He picks PUT /support-ticket because it barely has any logic in it. When a request arrives, the handler only makes a request to Zendesk to create a support ticket. The handler is stateless and has no middleware.

The arewewebyet site doesn't recommend a specific http client, so Alan searches for one in crates.io. He picks reqwest simply because it's the most popular.

Alan combines the knowledge he has from programming in synchronous Rust and asynchronous JavaScript to come up with a few lines that should work. If the compiler is happy, then so is he!

First problem: incompatible runtimes

The first problem he runs into is very similar to the one described in the compiler trust story: thread 'main' panicked at 'there is no reactor running, must be called from the context of a Tokio 1.x runtime'.

In short, Alan has problems because Tide is based on async-std and reqwest on the latest version of tokio. This is a real pain for Alan, as he now has to change either the HTTP client or the server so that both use the same runtime.
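
A minimal sketch of the mismatch (the URL and handler are illustrative): the code compiles, but the reqwest call panics at runtime because the tokio reactor it expects was never started by Tide's async-std based executor.

use tide::Request;

async fn handler(_req: Request<()>) -> tide::Result<String> {
    // reqwest expects to run inside a tokio runtime, which Tide
    // (built on async-std) never starts.
    let response = reqwest::get("https://example.com").await.unwrap();
    Ok(response.text().await.unwrap())
}

#[async_std::main]
async fn main() -> std::io::Result<()> {
    let mut app = tide::new();
    app.at("/support-ticket").put(handler);
    app.listen("127.0.0.1:8080").await?;
    Ok(())
}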

He decides to switch to Actix web.

Second problem: incompatible versions of the same runtime

Alan migrates to Actix web, and again the compiler seems to be happy. To his surprise, the same problem happens again: the program panics with the same message as before: there is no reactor running, must be called from the context of a Tokio 1.x runtime. He is utterly puzzled, as Actix web is based on tokio, just like reqwest. Didn't he just fix problem number 1?

It turns out that the issue is that Alan's using v0.11.2 of reqwest, which uses tokio v1, and v3.3.2 of actix-web, which uses tokio v0.3.

The solution to this problem is to dig through the versions of reqwest until he finds one that uses the same version of tokio as actix-web.

Can Alan sell the Rust migration to his boss?

This experience has made Alan think twice about whether Rust is indeed web ready. On the one hand, there are very good libraries for web servers, ORMs, parsers, session management, etc. On the other, Alan is fearful that in 2, 3, or 6 months' time he will have to develop new features with libraries that already exist but turn out to be incompatible with the runtime chosen at the beginning of the project.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • Rust's ecosystem has a lot of great components that may individually be ready for the web, but combining them is still a fraught proposition. In a typical web server project, dependencies that use async features form an intricate web which is hard to decipher for both new and seasoned Rust developers. Alan picked Tide and reqwest, only to realise later that they are not compatible. How many more situations like this will he face? Can Alan be confident that it won't happen again? New users especially are not accustomed to having to think about what "runtime" they are using, since there is usually not a choice in the matter.
  • The situation is so complex that it's not enough to know that all dependencies use the same runtime: they all have to be compatible with the same version of that runtime. Newer versions of reqwest are incompatible with the latest stable version of actix web (verified at the time of writing).
  • Developers who need a stable environment may be fearful of the complexity that comes with managing async dependencies in Rust. For example, if reqwest had a security or bug fix in one of its latest versions that wasn't backported to older ones, Alan would not be able to upgrade, because actix web would be holding him back. In fact, he has to wait until ALL dependencies are using the same runtime before applying fixes and upgrades.

What are the sources for this story?

Personal experience of the author.

Why did you choose Alan to tell this story?

As a web developer in GC languages, Alan writes async code every day. A language without stable async features is not an option.

How would this story have played out differently for the other characters?

Learning what async means and what it entails in a codebase is usually hard enough. Niklaus would struggle to learn all that while at the same time dealing with the many gotchas that can happen when building a project with a lot of dependencies.

Barbara may be more tolerant of the setup, since she probably knows the rationale behind keeping Rust's standard library lean and the need for external async runtimes.

How would this story have played out differently if Alan came from another GC'd language?

Like the trust story, it would play out very similarly, since all other languages (that I know of) provide async runtimes out of the box, and it's not something the programmer needs to concern themselves with.

๐Ÿ˜ฑ Status quo stories: Alan runs into stack allocation trouble

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

The problem

One day, as Alan is working on his async Rust project, he runs his application and hits an error:

$ .\target\debug\application.exe
thread 'main' has overflowed its stack

Perplexed, Alan checks whether anything in his application still works by seeing if he can get output when the --help flag is passed, but he has no luck:

$ .\target\debug\application.exe --help
thread 'main' has overflowed its stack

Searching for the solution

Having really only ever seen stack overflow issues caused by recursive functions, Alan desperately searches the codebase for recursive functions, only to find none. Having learned that Rust favors stack allocation over heap allocation (a concept Alan didn't really need to worry about before), he starts manually looking through his code for structs that look "too large", but he isn't able to find any candidates.

Confused, Alan reaches out to Grace for advice. She suggests making the stack size larger. Although she isn't a Windows expert, she remembers hearing that stack sizes on Windows might be smaller than on Linux. After much searching, Alan discovers an option to do just that: RUSTFLAGS = "-C link-args=-Wl,-zstack-size=<size in bytes>".

While eventually Alan gets the program to run, the stack size must be set to 4GB before it does! This seems untenable, and Alan goes back to the drawing board.

Alan reaches out to Barbara for her Rust expertise to see if she has something to suggest. Barbara recommends using RUSTFLAGS="-Zprint-type-sizes" to print type sizes and see if anything jumps out. She notes that if Alan does find a type that stands out, it's usually as easy as boxing some of its fields to provide indirection, so that not everything is stack allocated. Alan has never needed the nightly toolchain before, but this option requires it, so he installs it using rustup. After searching through the type sizes, one type does stand out as being quite large. Ultimately, though, this is a red herring, and putting parts of it in Boxes does not help.
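(For reference, the invocation Barbara has in mind looks something like this:

$ RUSTFLAGS="-Zprint-type-sizes" cargo +nightly build

Each type in the program is printed along with its size and alignment.)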

Finding the solution

After getting nowhere, Alan goes home for the weekend defeated. On Monday, he decides to take another look. One piece of code sticks out to him: the use of the select! macro from the futures crate. This macro allows multiple futures to race against each other, returning the value of the first one to finish. It requires the futures to be pinned, which the docs had shown could be done using pin_mut!. Alan didn't fully grasp what pin_mut! was actually doing when he wrote that code: the compiler had complained that the futures he was passing to select! needed to be pinned, and pin_mut! was what he found to make the compiler happy.

Looking back at the docs makes it clear to Alan that this could potentially be the issue: pin_mut! pins futures to the stack. A possible solution, then, is to pin to the heap instead of the stack. Some more digging in the docs leads Alan to Box::pin, which does just that. An extra heap allocation is of no consequence to him, so he gives it a try. Lo and behold, this fixes the issue!
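A minimal sketch of the difference, with fetch_a and fetch_b standing in as hypothetical versions of Alan's real futures:

use futures::{select, FutureExt};

async fn fetch_a() -> u32 { 1 }
async fn fetch_b() -> u32 { 2 }

async fn run() -> u32 {
    // Before: pin_mut! would pin the (potentially huge) futures
    // directly on the current stack frame:
    //     let a = fetch_a().fuse();
    //     let b = fetch_b().fuse();
    //     pin_mut!(a, b);
    // After: Box::pin moves each future to the heap, leaving only a
    // pointer-sized handle on the stack.
    let mut a = Box::pin(fetch_a().fuse());
    let mut b = Box::pin(fetch_b().fuse());

    select! {
        x = a => x,
        x = b => x,
    }
}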

While Alan knew enough about pinning to know how to satisfy the compiler, he didn't originally take the time to fully understand what the consequences were of using pin_mut! to pin his futures. Now he knows!

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • Coming from a background of GCed languages, Alan was not used to taking the time to understand the allocation profile of a particular piece of code.
  • It was hard to tell where in his code the stack was being exhausted. Alan had to rely on manually combing his code to find the culprit.
  • Pinning is relatively confusing, and although the code compiled, Alan didn't fully understand what he wrote and what consequences his decision to use pin_mut! would have.

What are the sources for this story?

This story is adapted from the experiences of the team working on the Krustlet project. You can read about this story in their own words here.

Why did you choose Alan to tell this story?

  • The programmers this story is based on have experience mostly in Go, a GCed language.
  • The story is rooted in the explicit choice of using stack vs heap allocation, a choice that in GCed languages is not in the hands of the programmer.

How would this story have played out differently for the other characters?

  • Grace would likely have had a similarly hard time with this bug. While she's used to the tradeoffs of stack vs heap allocation, there is no analogue of the Pin API in the languages she's used to.
  • Barbara, as an expert in Rust, may have had the tools to understand that pin_mut is used for pinning to the stack while Box::pin is for pinning heap allocations.
  • This problem is somewhat subtle, so someone like Niklaus would probably have had a much harder time figuring this out (or even getting the code to compile in the first place).

Could Alan have used another API to achieve the same objectives?

Perhaps! Tokio's select! macro doesn't require explicit pinning of the futures it's provided, but it's unclear to this author whether it would have been smart enough to avoid pinning large futures to the stack. However, pinning is a part of the way one uses futures in Rust, so it's possible that such an issue would have arisen elsewhere.
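For comparison, a sketch of the same race using tokio::select! (with the same hypothetical fetch_a and fetch_b as above). Note that the macro still pins the futures to the stack; it just does so inside its own expansion rather than asking the caller to:

async fn run() -> u32 {
    // No explicit fusing or pinning required from the caller.
    tokio::select! {
        x = fetch_a() => x,
        x = fetch_b() => x,
    }
}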

๐Ÿ˜ฑ Status quo stories: Alan started trusting the Rust compiler, but then... async

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

The story

Trust the compiler

Alan has a lot of experience in C#, but has meanwhile created some successful projects in Rust. He has dealt with his fair share of race conditions and thread-safety issues at runtime in C#, but is now starting to trust that if his Rust code compiles, he won't have those annoying runtime problems to deal with.

This allows him to squeeze his programs for as much performance as he wants, because the compiler will stop him when he tries things that could result in runtime problems. After seeing the performance and the lack of runtime problems, he trusts the compiler more and more with each project he finishes.

He knows what he can and can't do with external libraries: he doesn't need to fear concurrency issues if a library cannot be used from multiple threads, because the compiler would tell him.

His trust in the compiler solidifies further the more he codes in Rust.

The first async project

Alan now starts his first async project. He sees that there is no async I/O in the standard library, but after googling for "rust async file open", he finds async_std, a crate that provides async versions of standard library functionality. He writes some code that asynchronously interacts with some files:

use async_std::fs::File;
use async_std::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = File::create("a.txt").await?;
    file.write_all(b"Hello, world!").await?;
    Ok(())
}

But now the compiler complains that await is only allowed inside async functions. He notices that all the examples use #[async_std::main] as an attribute on the main function in order to turn it into an async main, so he does the same to get his code compiling:

use async_std::fs::File;
use async_std::prelude::*;

#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = File::create("a.txt").await?;
    file.write_all(b"Hello, world!").await?;

    Ok(())
}

This aligns with what he knows from C#, where you also change the entry point of the program to be async in order to use await. Everything is great now: the compiler is happy, so there will be no runtime problems, so Alan is happy.

The project is working like a charm.

Fractured futures, fractured trust

The project Alan is building is starting to grow, and he decides to add a new feature that needs to make some API calls. He starts using reqwest to help him achieve this task. After a lot of refactoring to make the compiler accept the program again, Alan is satisfied with the result. His program now boils down to:

use async_std::fs::File;
use async_std::prelude::*;

#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = File::create("a.txt").await?;
    file.write_all(b"Hello, world!").await?;

    let body = reqwest::get("https://www.rust-lang.org")
        .await?
        .text()
        .await?;
    println!("{}", body);

    Ok(())
}

He runs his project but is suddenly greeted with a runtime error. He is quite surprised. "How is this even possible?", he thinks. "I don't have any out-of-bounds accesses, and I never use .unwrap or .expect." At the top of the error message he sees: thread 'main' panicked at 'there is no reactor running, must be called from the context of a Tokio 1.x runtime'

He searches for what "Tokio" is in Rust, and finds that it also provides an attribute to put on main, namely #[tokio::main]. But what is the difference from #[async_std::main]? His curiosity leads him to watch videos, read blog posts, and scour Reddit to learn why there are multiple runtimes in Rust. This leads him down a rabbit hole, and now he learns about executors, wakers, Pin, and more. He gains a basic grasp of what they are, but not a good understanding of how they all fit together exactly. These are all things he had no need to know, nor heed, in C#. (Note: there is another story about the troubles and confusion that might arise when learning all these things about async: Alan hates writing a Stream)
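For reference, one way out of the mismatch is to drive main with Tokio's runtime so that reqwest finds the reactor it expects. This is a sketch, assuming tokio's macros feature is enabled; async-std's file operations run on async-std's own global executor, so they can still be awaited from a Tokio runtime:

use async_std::fs::File;
use async_std::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // async-std futures drive themselves on async-std's global executor,
    // so they can be awaited from inside a Tokio runtime.
    let mut file = File::create("a.txt").await?;
    file.write_all(b"Hello, world!").await?;

    // reqwest now finds the Tokio 1.x reactor it panicked about before.
    let body = reqwest::get("https://www.rust-lang.org")
        .await?
        .text()
        .await?;
    println!("{}", body);

    Ok(())
}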

He does come to understand the current problems and why there is no one-size-fits-all executor (yet). In trying to get his async Rust code to work, he has broadened his knowledge of what async code actually is, gaining another way to reason about asynchronous code, not only in Rust but also more generally.

But now he realizes that there is a whole new class of runtime problems that he never had to deal with in C# but does have to deal with in Rust. Can he even trust the Rust compiler anymore? What other kinds of runtime problems can occur in Rust that can't in C#? If his projects keep increasing in complexity, will other new kinds of runtime problems keep popping up? Maybe it's better to stick with C#, since Alan already knows all the runtime problems you can have over there.

The Spider-Man effect

Do you recall in Spider-Man, that after getting bitten by the radioactive spider, Peter first gets ill before he gains his powers? Well, imagine instead of being bitten by a radioactive spider, he was bitten by an async-rust spider...

In his work, Alan sees an async call to a C# wrapper around SQLite, and his equivalent of a spider-sense (async-sense?) starts tingling. Knowing from Rust the complexities that arise when implementing asynchronicity, he wonders: what kind of complex mechanisms are at play here to enable these async calls from C# that end up in the C/C++ of SQLite?

He quickly discovers that there are no complex mechanisms at all! It's actually just a synchronous call all the way down, with some extra overhead from wrapping it in an asynchronous function. There are no points where the async function will yield. He transforms all these asynchronous calls into their synchronous counterparts and sees a slight improvement in performance. Alan is happy, product management is happy, customers are happy!

Over the next few months, he often takes a few seconds to reflect on why certain parts of the code are async, whether they should be, and how other parts of the code might benefit from being async. He also uses what he learned from async Rust in his C# code reviews to find similar problems or general issues (With great power...). He even spots some lifetime bugs related to asynchronous code in C#, imagine that.

His team recognizes that Alan has a pretty good grasp about what async is really about, and he is unofficially crowned the "async guru" of the team.

Even though this spider-man might have gotten "ill" (his negative experience with async Rust), he has now become the superhero he was meant to be!

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • Async I/O includes a new set of runtime errors and misbehaviors that the compiler can't help you find. These include cases like executing blocking operations in an async context but also mixing runtime libraries (something users may not even realize is a factor).
  • Rust users get used to the compiler catching would-be runtime problems at compile time and even helping them fix those problems. Discovering a whole class of errors only at runtime feels surprising and erodes some of their confidence in Rust.
  • The "cliff" in learning about async is very steep -- at first everything seems simple and similar to other languages, then suddenly you are thrown into a lot of information. It's hard to know what's important and what is not. But, at the same time, dipping your toes into async Rust can broaden a programmer's understanding of asynchronous coding, which can help them even in languages other than Rust.

What are the sources for this story?

Personal experience of the author.

Why did you choose Alan to tell this story?

With his experience in C#, Alan probably has experience with async code. Even though C# protects him from certain classes of errors, he can still encounter other classes of errors, which the Rust compiler prevents.

How would this story have played out differently for the other characters?

For everyone except Barbara, I think this would play out pretty similarly, as this is a kind of problem unique to Rust. Since Barbara has a lot of Rust experience, she would probably already be familiar with this aspect.

How would this story have played out differently if Alan came from another GC'd language?

It would be very close, since all other languages (that I know of) provide async runtimes out of the box and it's not something the programmer needs to concern themselves with.

๐Ÿ˜ฑ Status quo stories: Alan thinks he needs async locks

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

One of Alan's first Rust-related tasks at his job at YouBuy is writing an HTTP-based service. This service is a simple internal proxy router that inspects an incoming HTTP request and picks the downstream service to call based on certain aspects of the request.

Alan decides that he'll simply use some shared state that request handlers can read from in order to decide how to proxy the request.

Alan, having read the Rust book and successfully completed the challenge in the last chapters, knows that shared state can be achieved in Rust with reference counting (using std::sync::Arc) and locks (using std::sync::Mutex). Alan starts by throwing his shared state (a std::collections::HashMap<String, url::Url>) into an Arc<Mutex<T>>.

Alan, smitten with how quickly he can write Rust code, ends up with some code that compiles and looks roughly like this:


#![allow(unused)]
fn main() {
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use url::Url;

#[derive(Clone)]
struct Proxy {
    state: Arc<Mutex<HashMap<String, Url>>>,
    client: Client,
}

impl Proxy {
    async fn handle(&self, key: String, request: Request) -> crate::Result<Response> {
        let routes = self.state.lock().unwrap();
        let route = routes.get(&key).ok_or(crate::error::MissingRoute)?;
        Ok(self.client.perform_request(route, request).await?)
    }
}
}

Alan is happy that his code seems to be compiling! The short but hard learning curve has been worth it. He's having fun now!

Unfortunately, Alan's happiness soon comes to an end as he starts integrating his request handler into calls to tokio::spawn, which he knows will allow him to handle multiple requests at a time. The error message is somewhat cryptic, but Alan is confident he'll be able to figure it out:

189 |     tokio::spawn(async {
    |     ^^^^^^^^^^^^ future created by async block is not `Send`
::: /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.5.0/src/task/spawn.rs:129:21
    |
129 |         T: Future + Send + 'static,
    |                     ---- required by this bound in `tokio::spawn`

note: future is not `Send` as this value is used across an await
   --> src/handler.rs:787:9
      |
786   |         let routes = self.state.lock().unwrap();
      |             - has type `std::sync::MutexGuard<'_, HashMap<String, Url>>` which is not `Send`
787   |         Ok(self.client.perform_request(route, request).await?)
      |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ await occurs here, with `routes` maybe used later
788   |     })
      |     - `routes` is later dropped here

Alan stops and takes a deep breath. He tries his best to make sense of the error message, and he sort of understands the issue the compiler is pointing at. Apparently routes is not Send, and because it is still alive across a call to await, it makes the future his handler returns not Send either. And tokio's spawn function requires that the future it receives be Send.

Alan reaches the boundaries of his knowledge of Rust, so he reaches out over chat to ask his co-worker Barbara for help. Not wanting to bother her, Alan provides the context he's already figured out for himself.

Barbara knows that mutex guards are not Send because unlocking a mutex from a thread other than the one that locked it is not allowed on many platforms, so a guard must never be sent to another thread. She suggests looking into async locks, whose guards can be held across await points because they are Send. Alan looks into the tokio documentation and is easily able to switch from the standard library's mutex to tokio's mutex. It compiles!
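A sketch of what that change might look like, reusing the hypothetical types from the snippets above:

#![allow(unused)]
fn main() {
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::Mutex;
use url::Url;

#[derive(Clone)]
struct Proxy {
    state: Arc<Mutex<HashMap<String, Url>>>,
    client: Client,
}

impl Proxy {
    async fn handle(&self, key: String, request: Request) -> crate::Result<Response> {
        // tokio's lock() is async, and its guard is Send, so holding it
        // across the .await below no longer makes the future !Send.
        let routes = self.state.lock().await;
        let route = routes.get(&key).ok_or(crate::error::MissingRoute)?;
        Ok(self.client.perform_request(route, request).await?)
    }
}
}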

Alan ships his code and it gets a lot of usage. After a while, Alan notices some potential performance issues. It seems his proxy handler does not have the throughput he would expect. Barbara, having newly joined his team, sits down with him to take a look at potential issues. Barbara is immediately worried by the fact that the lock is being held much longer than it needs to be. The lock only needs to be held while accessing the route and not during the entire duration of the downstream request.

She suggests that Alan stop holding the lock across the I/O operation. Alan first tries to do this by explicitly cloning the URL and dropping the lock before the proxy request is made:


#![allow(unused)]
fn main() {
impl Proxy {
    async fn handle(&self, key: String, request: Request) -> crate::Result<Response> {
        // Still the tokio mutex: clone the route, then drop the guard
        // before the downstream request is made.
        let routes = self.state.lock().await;
        let route = routes.get(&key).ok_or(crate::error::MissingRoute)?.clone();
        drop(routes);
        Ok(self.client.perform_request(route, request).await?)
    }
}
}

This compiles fine and works in testing! After shipping to production, they notice a large increase in throughput. It seems their change made a big difference. Alan is really excited about Rust, and wants to write more!

Alan continues his journey of learning even more about async Rust. After some enlightening talks at the latest RustConf, he decides to revisit the code that he and Barbara wrote together. He asks himself, is using an async lock the right thing to do? This lock should only be held for a very short amount of time. Yielding to the runtime is likely more expensive than just synchronously locking. But he remembers vaguely hearing that you should never use blocking code in async code as this will block the entire async executor from being able to make progress, so he doubts his intuition.

After chatting with Barbara, who encourages him to benchmark and measure, he decides to switch back to synchronous locks.

Unfortunately, switching back to synchronous locks brings back the old compiler error message about his future not being Send. Alan is confused as he's dropping the mutex guard before it ever crosses an await point.

Confused, Alan goes to Barbara for advice. She is also confused, and it takes several minutes of exploration before she comes to a solution that works: wrapping the mutex access in a block, so that the guard is implicitly dropped before the await.


#![allow(unused)]
fn main() {
impl Proxy {
    async fn handle(&self, key: String, request: Request) -> crate::Result<Response> {
        let route = {
            // The guard never leaves this block, so the compiler can see
            // that it is not held across the .await below.
            let routes = self.state.lock().unwrap();
            routes.get(&key).ok_or(crate::error::MissingRoute)?.clone()
        };
        Ok(self.client.perform_request(route, request).await?)
    }
}
}

Barbara mentions she's unsure why explicitly dropping the mutex guard did not work, but they're both happy that the code compiles. In fact, it seems to have improved the performance of the service when it's under extreme load. Alan's intuition was right!

In the end, Barbara decides to write a blog post about how blocking in async code isn't always such a bad idea.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • Locks can be quite common in async code as many tasks might need to mutate some shared state.
  • Error messages can be fairly good, but they still require a decent understanding of Rust (e.g., Send, MutexGuard, drop semantics) to fully understand what's going on.
  • This can lead to needing to use certain patterns (like dropping mutex guards early) in order to get code working.
  • The advice to never block in async code is not always true: if blocking is short enough, is it even blocking at all?

What are the sources for this story?

  • Chats with Alice (https://github.com/Darksonn) and Lucio (https://github.com/LucioFranco).
  • Alice's blog post on the subject has some good insights: https://ryhl.io/blog/async-what-is-blocking/
  • The issue of a conservative analysis of whether values are used across await points causing futures to be !Send is known (https://rust-lang.github.io/async-book/07_workarounds/03_send_approximation.html), but it takes some digging to find out about it. A tracking issue can be found at https://github.com/rust-lang/rust/issues/57478.

Why did you choose Alan to tell this story?

  • While Barbara might be tripped up by some of the subtleties, an experienced Rust developer can usually tell how to avoid the issues of using locks in async code. Alan, on the other hand, might be surprised when his code does not compile, as the issue the Send error protects against (a mutex guard being moved to another thread) is not protected against in other languages.

How would this story have played out differently for the other characters?

  • Grace would likely have had a similar time to Alan. These problems are not necessarily issues you would run into in other languages in the same way.
  • Niklaus may have been completely lost. This stuff requires a decent understanding of Rust and of async computational systems.

๐Ÿ˜ฑ Status quo stories: Alan tries using a socket Sink

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Alan is working on a project that uses async-std. He has worked a bit with tokio in the past and is more familiar with it, but he is interested to learn how things work in async-std.

One of the goals is to switch from a WebSocket implementation using raw TCP sockets to one managed behind an HTTP server library, so both HTTP and WebSocket RPC calls can be forwarded to a transport-agnostic RPC server.

In this server implementation:

  • RPC call strings can be received over a WebSocket
  • The strings are decoded and sent to an RPC router that calls the methods specified in the RPC call
  • Some of the methods that are called can take some time to return a result, so they are spawned separately
    • RPC has built-in properties to organize call IDs and methods, so results can be sent in any order
  • Since WebSockets are bidirectional streams (duplex sockets), the response is sent back through the same client socket

He finds the HTTP server tide and it seems fairly similar to warp, which he was using with tokio. He also finds the WebSocket middleware library tide-websockets that goes with it.

However, as he's working, Alan encounters a situation where the socket needs to be written to from within a spawned task, and the traits just don't line up. He wants to split the stream into a sender and a receiver:


#![allow(unused)]
fn main() {
use futures::{SinkExt, StreamExt};
use async_std::sync::{Arc, Mutex};
use log::{debug, info, warn};

async fn rpc_ws_handler(ws_stream: WebSocketConnection) {
    let (ws_sender, mut ws_receiver) = ws_stream.split();
    let ws_sender = Arc::new(Mutex::new(ws_sender));

    while let Some(msg) = ws_receiver.next().await {
        debug!("Received new WS RPC message: {:?}", msg);

        let ws_sender = ws_sender.clone();

        async_std::task::spawn(async move {
            // `?` can't be used here, since the spawned block returns (),
            // so the RPC error has to be handled explicitly.
            let res = match call_rpc(msg).await {
                Ok(res) => res,
                Err(_) => return,
            };

            match ws_sender.lock().await.send_string(res).await {
                Ok(_) => info!("New WS data sent."),
                Err(_) => warn!("WS connection closed."),
            };
        });
    }
}
}

The split method splits the ws_stream into two separate halves:

  • a receiving half (ws_receiver) that implements Stream, yielding the messages arriving on the websocket;
  • a sending half (ws_sender) that implements Sink, which can be used to send responses.

This way, one task can pull items from ws_receiver and spawn out subtasks. Those subtasks share access to ws_sender and send their responses through it when they're done. Unfortunately, Alan finds that he can't use this pattern here, as the Sink trait isn't implemented by the WebSockets middleware library he's using.

Alan also tries creating a sort of poller worker task using an intermediary messaging channel, but he has trouble reasoning about the code and isn't able to get it to compile:


#![allow(unused)]
fn main() {
use async_std::channel;
use async_std::sync::{Arc, Mutex};
use log::{debug, info, warn};

async fn rpc_ws_handler(ws_stream: WebSocketConnection) {
    let (ws_sender, mut ws_receiver) = channel::unbounded::<String>();
    let ws_receiver = Arc::new(ws_receiver);

    let ws_stream = Arc::new(Mutex::new(ws_stream));
    let poller_ws_stream = ws_stream.clone();

    async_std::task::spawn(async move {
        while let Some(msg) = ws_receiver.next().await {
            match poller_ws_stream.lock().await.send_string(msg).await {
                Ok(msg) => info!("New WS data sent. {:?}", msg),
                Err(msg) => warn!("WS connection closed. {:?}", msg),
            };
        }
    });

    while let Some(msg) = ws_stream.lock().await.next().await {
        // Several problems lurk here: the lock guard above is held while
        // awaiting the next message, so the poller task can never lock the
        // stream to send a response; `ws_sender` is moved into the first
        // spawned task, so later iterations can't use it; and `?` can't be
        // used in a block that returns ().
        async_std::task::spawn(async move {
            let res = call_rpc(msg).await?;
            ws_sender.send(res);
        });
    }
}
}

Alan wonders if he's thinking about it wrong, but the solution isn't as obvious as his earlier Sink approach. Looking around, he realizes that others have been in his shoes before: a solution to his problem already exists in two nearly identical pull requests, both closed by the project maintainers. He applies the code from the closed pull requests to his local copy and, to his joy, his original approach works! Alan's branch compiles for the first time. He opens a third pull request with the same code, pointing to an example where it was actually found to be useful.

However, almost immediately, his request is closed with a comment suggesting that he try to create an intermediate polling task instead, much as he was trying before. Alan is feeling frustrated. "I already tried that approach," he thinks, "and it doesn't work!"

As a result of his frustration, Alan calls out one developer of the project on social media. He knows this developer is opposed to the Sink traits. Alan's message is not well-received: the maintainer sends a short response and Alan feels dismissed. Alan later finds out he was blocked. A co-maintainer responds to the thread, defending and supporting the other maintainer's actions, and suggests that Alan "get over it". Alan is given a link to a blog post. The post provides a number of criticisms of Sink but, after reading it, Alan isn't sure what he should do instead.

Because of this heated exchange, Alan grows concerned for his own career and what these well-known community members might think or say about him to others; his confidence in the community surrounding a language he really enjoys using is somewhat shaken.

Despite this, Alan takes a walk, gathers his determination, and commits to maintaining his fork with the changes from the other pull requests that were shut down. He publishes his version to crates.io, vowing to be more welcoming to "misfit" pull requests like the one he needed.

A few weeks later, Alan's project at work ships using his new forked crate. It's a big deal: his first professional open source contribution to a Rust project! Still, he doesn't feel a sense of closure with the community. Meanwhile, his friends say they want to try Rust, but they're worried about its async execution issues, and he doesn't know what else to say, other than to offer a sense of understanding. Maybe the situation will get better someday, he hopes.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • There are many sources of opinion in the community regarding futures and async, but these opinions aren't always backed up with examples of how things could be accomplished better. Sometimes we just find a thing that works and would prefer to stick with it, while others argue that some traits make implementations unnecessarily complex and choose to leave them out. Disagreements like these in the ecosystem can be harmful to the reputation of the project and the participants.
  • If there's a source of substantial disagreement, the community becomes even further fragmented, and this may cause additional confusion in newcomers.
  • Alan is used to fragmentation from the communities he comes from, so this isn't too discouraging, but what's difficult is that there's enough functionality overlap in async libraries that it's tempting to get them to interop with each other as-needed, and this can lead to architectural challenges resulting from a difference in design philosophies.
  • It's also unclear if Futures are core to the Rust asynchronous experience, much as Promises are in JavaScript, or if the situation is actually more complex.
  • The Sink trait is complex but it solves a real problem, and the workarounds required to solve problems without it can be unsatisfactory.
  • Disagreement about core abstractions like Sink can make interoperability between runtimes more difficult; it also makes it harder for people to reproduce patterns they are used to from one runtime to another.
  • It is all too easy for technical discussions like this to become heated; it's important for all participants to try and provide each other with the "benefit of the doubt".

What are the sources for this story?

Why did you choose Alan to tell this story?

  • Alan is more representative of the original author's background in JS, TypeScript, and NodeJS.

How would this story have played out differently for the other characters?

  • (I'm not sure.)

๐Ÿ˜ฑ Status quo stories: Alan tries to debug a hang

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Alan's startup has officially launched and YouBuy is live for the world to use. The whole team is very excited, especially as this will be their first use of Rust in production! Normally, as a .NET shop, they would have written the entire application in C#, but because of the scalability and latency requirements on their inventory service, they decided to write a microservice in Rust utilizing the async features they've heard so much about.

The day's excitement soon turns into concern as reports begin coming into support from customers who can't check out. After a few cases, a pattern begins to emerge: when a customer tries to buy the last available item, the checkout process hangs forever.

Alan suspects there is an issue with the lock used in the inventory service to prevent multiple people from buying the last available item at the same time. With this hunch, he builds the latest code and opens his local dev environment to conduct some tests. Soon enough, Alan has a repro of the bug.

With the broken environment still running, he decides to use a debugger to see if he can confirm his theory. In the past, Alan has used Visual Studio's debugger to diagnose a very similar issue in a C# application he wrote. The debugger was able to show him all the async Tasks currently waiting, their call stacks and what resource they were waiting on.

Alan hasn't used a debugger with Rust before, usually a combination of the strict compiler and a bit of manual testing has been enough to fix all the bugs he's previously encountered. He does a quick Google search to see what debugger he should use and decides to go with gdb because it is already installed on his system and sounds like it should work. Alan also pulls up a blog post that has a helpful cheatsheet of gdb commands since he's not familiar with the debugger.

Alan restarts the inventory service under gdb and gets to work reproducing the issue. He reproduces the issue a few times in the hope of making it easier to identify the cause of the problem. Ready to pinpoint the issue, Alan presses Ctrl+C and then types bt to get a backtrace:

(gdb) bt
#0  0x00007ffff7d5e58a in epoll_wait (epfd=3, events=0x555555711340, maxevents=1024, timeout=49152)
    at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x000055555564cf7d in mio::sys::unix::selector::epoll::Selector::select (self=0x7fffffffd008, events=0x7fffffffba40, 
    timeout=...) at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.7.11/src/sys/unix/selector/epoll.rs:68
#2  0x000055555564a82f in mio::poll::Poll::poll (self=0x7fffffffd008, events=0x7fffffffba40, timeout=...)
    at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.7.11/src/poll.rs:314
#3  0x000055555559ad96 in tokio::io::driver::Driver::turn (self=0x7fffffffce28, max_wait=...)
    at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.4.0/src/io/driver/mod.rs:162
#4  0x000055555559b8da in <tokio::io::driver::Driver as tokio::park::Park>::park_timeout (self=0x7fffffffce28, duration=...)
    at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.4.0/src/io/driver/mod.rs:238
#5  0x00005555555e9909 in <tokio::signal::unix::driver::Driver as tokio::park::Park>::park_timeout (self=0x7fffffffce28, 
    duration=...) at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.4.0/src/signal/unix/driver.rs:156
#6  0x00005555555a9229 in <tokio::process::imp::driver::Driver as tokio::park::Park>::park_timeout (self=0x7fffffffce28, 
    duration=...) at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.4.0/src/process/unix/driver.rs:84
#7  0x00005555555a898d in <tokio::park::either::Either<A,B> as tokio::park::Park>::park_timeout (self=0x7fffffffce20, 
    duration=...) at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.4.0/src/park/either.rs:37
#8  0x00005555555ce0b8 in tokio::time::driver::Driver<P>::park_internal (self=0x7fffffffcdf8, limit=...)
    at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.4.0/src/time/driver/mod.rs:226
#9  0x00005555555cee60 in <tokio::time::driver::Driver<P> as tokio::park::Park>::park (self=0x7fffffffcdf8)
    at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.4.0/src/time/driver/mod.rs:398
#10 0x00005555555a87bb in <tokio::park::either::Either<A,B> as tokio::park::Park>::park (self=0x7fffffffcdf0)
    at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.4.0/src/park/either.rs:30
#11 0x000055555559ce47 in <tokio::runtime::driver::Driver as tokio::park::Park>::park (self=0x7fffffffcdf0)
    at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.4.0/src/runtime/driver.rs:198
#12 0x000055555557a2f7 in tokio::runtime::basic_scheduler::Inner<P>::block_on::{{closure}} (scheduler=0x7fffffffcdb8, 
    context=0x7fffffffcaf0)
    at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.4.0/src/runtime/basic_scheduler.rs:224
#13 0x000055555557b1b4 in tokio::runtime::basic_scheduler::enter::{{closure}} ()
    at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.4.0/src/runtime/basic_scheduler.rs:279
#14 0x000055555558174a in tokio::macros::scoped_tls::ScopedKey<T>::set (
    self=0x555555701af8 <tokio::runtime::basic_scheduler::CURRENT>, t=0x7fffffffcaf0, f=...)
    at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.4.0/src/macros/scoped_tls.rs:61
#15 0x000055555557b0b6 in tokio::runtime::basic_scheduler::enter (scheduler=0x7fffffffcdb8, f=...)
    at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.4.0/src/runtime/basic_scheduler.rs:279
#16 0x0000555555579d3b in tokio::runtime::basic_scheduler::Inner<P>::block_on (self=0x7fffffffcdb8, future=...)
    at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.4.0/src/runtime/basic_scheduler.rs:185
#17 0x000055555557a755 in tokio::runtime::basic_scheduler::InnerGuard<P>::block_on (self=0x7fffffffcdb8, future=...)
    at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.4.0/src/runtime/basic_scheduler.rs:425
#18 0x000055555557aa9c in tokio::runtime::basic_scheduler::BasicScheduler<P>::block_on (self=0x7fffffffd300, future=...)
    at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.4.0/src/runtime/basic_scheduler.rs:145
#19 0x0000555555582094 in tokio::runtime::Runtime::block_on (self=0x7fffffffd2f8, future=...)
    at /home/alan/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.4.0/src/runtime/mod.rs:450
#20 0x000055555557c22f in inventory_service::main () at /home/alan/code/inventory_service/src/main.rs:4

Puzzled, the only line Alan even recognizes is the main entry point function for the service. He knows that async tasks in Rust don't each get their own thread, which allows them to scale better and use fewer resources, but surely there has to be a thread somewhere that's running his code? Alan doesn't completely understand how async works in Rust, but he has seen the Future::poll method, so he assumes there is a thread that constantly polls tasks to see if they are ready to wake up. "Maybe I can find that thread and inspect its state?" he thinks, and consults the cheatsheet for the command to list the program's threads. info threads seems promising, so he tries that:

(gdb) info threads
  Id   Target Id                                          Frame 
* 1    Thread 0x7ffff7c3b5c0 (LWP 1048) "inventory_servi" 0x00007ffff7d5e58a in epoll_wait (epfd=3, events=0x555555711340, 
    maxevents=1024, timeout=49152) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30

Alan is now even more confused: "Where are my tasks?" he thinks. After looking through the cheatsheet and StackOverflow, he discovers there isn't a way to see which async tasks are waiting to be woken up in the debugger. Taking a shot in the dark, Alan concludes that this must be the thread that polls his tasks, since it is the only one in the program. He googles "epoll_wait rust async tasks", but the results aren't very helpful, and inspecting the stack frame doesn't yield any clues as to where his tasks are, so this seems to be a dead end.

After thinking a bit, Alan realizes that since the runtime must know what tasks are waiting to be woken up, perhaps he can have the service ask the async runtime for that list of tasks every 10 seconds and print them to stdout? While crude, this would probably also help him diagnose the hang. Alan gets to work and opens the runtime docs to figure out how to get that list of tasks. After spending 30 minutes reading the docs, looking at StackOverflow questions and even posting on users.rust-lang.org, he discovers this simply isn't possible and he will have to add tracing to his application to figure out what's going on.

Disgruntled, Alan begins the arduous, boring task of instrumenting the application in the hope that the logs will be able to help him.
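A sketch of the kind of instrumentation Alan ends up adding, using the tracing crate; the Database, WorkItem, and Error types here stand in for the service's own:

use tracing::{info, instrument};

// The span ties each log line to the request that produced it.
#[instrument(skip(database))]
async fn claim_last_item(database: &Database, item_id: u64) -> Result<WorkItem, Error> {
    info!("waiting for inventory lock");
    let item = database.claim(item_id).await?;
    info!("inventory lock acquired, item claimed");
    Ok(item)
}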

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • Developers, especially those coming from a language with a tightly integrated development environment, expect their debugger to help them, particularly in situations where "println" debugging can't.
  • If the debugger can't help them, developers will often try to reach for a programmatic solution such as debug functions in their runtime that can be invoked at critical code paths.
  • Trying to debug an issue by adding logging and then triggering the issue is painful because of the long turn-around times when modifying code, compiling and then repro'ing the issue.

What are the sources for this story?

  • @erickt's comments in #76, and similar comments I've heard from other developers.

Why did you choose Alan to tell this story?

  • Coming from a background in managed languages where the IDE, debugger and runtime are tightly integrated, Alan would be used to using those tools to diagnose his issue.
  • Alan has also been a bit insulated from the underlying OS and expects the debugger to understand the language and runtime even if the OS doesn't have similar concepts such as async tasks.

How would this story have played out differently for the other characters?

  • Some of the characters with either a background in Rust or a background in systems programming might know that Rust's async doesn't always map to an underlying system feature and so they might expect that gdb or lldb is unable to help them.
  • Barbara, the experienced Rust dev, might also have used a tracing/instrumentation library from the beginning and have that to fall back on rather than having to do the work to add it now.

๐Ÿ˜ฑ Status quo stories: Alan writes a web framework

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

YouBuy is written using an async web framework that predates the stabilization of async function syntax. When Alan joins the company, it is using async functions for its business logic, but it can't use them for request handlers because the framework doesn't support that yet; it requires a handler to return Pin<Box<dyn Future<...>>>. Because the web framework predates async function syntax, it also requires you to take ownership of the request context (State) and return it alongside your response in the success/error cases. This means that even with async syntax, an http route handler in this web framework looks something like this (from the Gotham Diesel example):


#![allow(unused)]
fn main() {
// For reference, the framework defines these type aliases.
pub type HandlerResult = Result<(State, Response<Body>), (State, HandlerError)>;
pub type HandlerFuture = dyn Future<Output = HandlerResult> + Send;

fn get_products_handler(state: State) -> Pin<Box<HandlerFuture>> {
    use crate::schema::products::dsl::*;

    async move {
        let repo = Repo::borrow_from(&state);
        let result = repo.run(move |conn| products.load::<Product>(&conn)).await;
        match result {
            Ok(prods) => {
                let body = serde_json::to_string(&prods).expect("Failed to serialize prods.");
                let res = create_response(&state, StatusCode::OK, mime::APPLICATION_JSON, body);
                Ok((state, res))
            }
            Err(e) => Err((state, e.into())),
        }
    }
    .boxed()
}
}

and then it is registered like this:


#![allow(unused)]
fn main() {
    router_builder.get("/").to(get_products_handler);
}

The handler code is forced to drift to the right a lot because of the async block, and the inability to use ? forces the use of a match block, which drifts even further to the right. This goes against what he learned in his days writing Go.

Rather than switching YouBuy to a different web framework, Alan decides to contribute to the web framework himself. After a bit of a slog and a bit of where-clause-soup, he manages to make the web framework capable of using an async fn as an http request handler. He does this by extending the router builder with a closure that boxes up the impl Future from the async fn and then passes that closure on to .to().


#![allow(unused)]
fn main() {
    fn to_async<H, Fut>(self, handler: H)
    where
        Self: Sized,
        H: (FnOnce(State) -> Fut) + RefUnwindSafe + Copy + Send + Sync + 'static,
        Fut: Future<Output = HandlerResult> + Send + 'static,
    {
        self.to(move |s: State| handler(s).boxed())
    }
}

The handler registration then becomes:


#![allow(unused)]
fn main() {
    router_builder.get("/").to_async(get_products_handler);
}

This allows him to strip out the async blocks in his handlers and use async fn instead.


#![allow(unused)]
fn main() {
// The library's type alias again, in case you've forgotten:
pub type HandlerResult = Result<(State, Response<Body>), (State, HandlerError)>;

async fn get_products_handler(state: State) -> HandlerResult {
    use crate::schema::products::dsl::*;

    let repo = Repo::borrow_from(&state);
    let result = repo.run(move |conn| products.load::<Product>(&conn)).await;
    match result {
        Ok(prods) => {
            let body = serde_json::to_string(&prods).expect("Failed to serialize prods.");
            let res = create_response(&state, StatusCode::OK, mime::APPLICATION_JSON, body);
            Ok((state, res))
        }
        Err(e) => Err((state, e.into())),
    }
}
}

It's still not fantastically ergonomic though. Because the handler takes ownership of State and returns it in tuples in the result, Alan can't use the ? operator inside his http request handlers. If he tries to use ? in a handler, like this:


#![allow(unused)]
fn main() {
async fn get_products_handler(state: State) -> HandlerResult {
    use crate::schema::products::dsl::*;

    let repo = Repo::borrow_from(&state);
    let prods = repo
        .run(move |conn| products.load::<Product>(&conn))
        .await?;
    let body = serde_json::to_string(&prods).expect("Failed to serialize prods.");
    let res = create_response(&state, StatusCode::OK, mime::APPLICATION_JSON, body);
    Ok((state, res))
}
}

then he receives:

error[E0277]: `?` couldn't convert the error to `(gotham::state::State, HandlerError)`
  --> examples/diesel/src/main.rs:84:15
   |
84 |         .await?;
   |               ^ the trait `From<diesel::result::Error>` is not implemented for `(gotham::state::State, HandlerError)`
   |
   = note: the question mark operation (`?`) implicitly performs a conversion on the error value using the `From` trait
   = note: required by `std::convert::From::from`

Alan knows that the answer is to make another wrapper function, so that the handler can take an &mut reference to State for the lifetime of the future, like this:


#![allow(unused)]
fn main() {
async fn get_products_handler(state: &mut State) -> Result<Response<Body>, HandlerError> {
    use crate::schema::products::dsl::*;

    let repo = Repo::borrow_from(&state);
    let prods = repo
        .run(move |conn| products.load::<Product>(&conn))
        .await?;
    let body = serde_json::to_string(&prods).expect("Failed to serialize prods.");
    let res = create_response(&state, StatusCode::OK, mime::APPLICATION_JSON, body);
    Ok(res)
}
}

and then register it with:


#![allow(unused)]
fn main() {
    route.get("/").to_async_borrowing(get_products_handler);
}

but Alan can't work out how to express the type signature of the .to_async_borrowing() helper function. He submits his .to_async() pull request upstream as-is, but it nags at him that he has been defeated.

Shortly afterwards, someone raises a bug about ?, and a few other web framework contributors try to get it to work, but they also get stuck. When Alan tries it, the compiler diagnostics keep sending him around in circles. He can work out how to express the lifetimes for a function that returns a Box<dyn Future + 'a>, but not one that returns an impl Future, because of how where clauses are expressed. Alan longs to be able to say "this function takes an async function as a callback" (fn register_handler(handler: impl async Fn(state: &mut State) -> Result<Response, Error>)) and have Rust elide the lifetimes for him, the way they are elided for async functions.
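For the curious, the boxed version Alan can express looks roughly like this; a sketch in the style of the .to_async() helper above, not code from an actual pull request:

#![allow(unused)]
fn main() {
    fn to_async_borrowing_boxed<H>(self, handler: H)
    where
        Self: Sized,
        H: for<'a> Fn(&'a mut State) -> Pin<Box<dyn Future<Output = Result<Response<Body>, HandlerError>> + Send + 'a>>
            + RefUnwindSafe + Copy + Send + Sync + 'static,
    {
        // Wrap the borrowing handler in an owning closure matching what
        // .to() expects: take State by value and hand it back in the result.
        self.to(move |mut state: State| {
            async move {
                match handler(&mut state).await {
                    Ok(res) => Ok((state, res)),
                    Err(err) => Err((state, err)),
                }
            }
            .boxed()
        })
    }
}

Boxing the returned future makes the higher-ranked lifetime expressible, at the cost of forcing every handler author to box by hand.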

A month later, one of the contributors finds a forum comment by Barbara explaining how to express what Alan is after (using higher-order lifetimes and a helper trait). They implement this and merge it. The final .to_async_borrowing() implementation ends up looking like this (also from Gotham):


#![allow(unused)]
fn main() {
pub trait AsyncHandlerFn<'a> {
    type Res: IntoResponse + 'static;
    type Fut: std::future::Future<Output = Result<Self::Res, HandlerError>> + Send + 'a;
    fn call(self, arg: &'a mut State) -> Self::Fut;
}

impl<'a, Fut, R, F> AsyncHandlerFn<'a> for F
where
    F: FnOnce(&'a mut State) -> Fut,
    R: IntoResponse + 'static,
    Fut: std::future::Future<Output = Result<R, HandlerError>> + Send + 'a,
{
    type Res = R;
    type Fut = Fut;
    fn call(self, state: &'a mut State) -> Fut {
        self(state)
    }
}

pub trait HandlerMarker {
    fn call_and_wrap(self, state: State) -> Pin<Box<HandlerFuture>>;
}

impl<F, R> HandlerMarker for F
where
    R: IntoResponse + 'static,
    for<'a> F: AsyncHandlerFn<'a, Res = R> + Send + 'static,
{
    fn call_and_wrap(self, mut state: State) -> Pin<Box<HandlerFuture>> {
        async move {
            let fut = self.call(&mut state);
            let result = fut.await;
            match result {
                Ok(data) => {
                    let response = data.into_response(&state);
                    Ok((state, response))
                }
                Err(err) => Err((state, err)),
            }
        }
        .boxed()
    }
}

...
    fn to_async_borrowing<F>(self, handler: F)
    where
        Self: Sized,
        F: HandlerMarker + Copy + Send + Sync + RefUnwindSafe + 'static,
    {
        self.to(move |state: State| handler.call_and_wrap(state))
    }
}

Alan is still not sure whether it can be simplified.

Later on, other developers on the project attempt to extend this approach to work with closures, but they encounter limitations in rustc that seem to make it not work (rust-lang/rust#70263).

When Alan sees another open source project struggling with the same issue, he notices that Barbara has helped them out as well. Alan wonders how many people in the community would be able to write .to_async_borrowing() without help.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • Callback-based APIs with async callbacks are a bit fiddly, because the impl Future return type forces you to write where-clause soup, but they are not insurmountable.
  • Callback-based APIs with async callbacks that borrow their arguments are almost impossible to write without help.

What are the sources for this story?

Why did you choose Alan/YouBuy to tell this story?

  • Callback-based APIs are a super-common way to interact with web frameworks. I'm not sure how common they are in other fields.

How would this story have played out differently for the other characters?

  • I suspect that even many Barbara-shaped developers would struggle with this problem.

๐Ÿ˜ฑ Status quo stories: Barbara Anguishes Over HTTP

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect people's experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Barbara is starting a new project, working together with Alan. They want to write a Rust library, and as part of it they will need to make a few HTTP calls to various web services. While HTTP is among the library's responsibilities, it is by no means the only thing the library will need to do.

As they are pair programming, they get to the part of the library where HTTP will be involved, and Alan asks Barbara, "OK, how do I make an HTTP request?".

As an experienced async Rust developer Barbara has been dreading this question from the start of the project. She's tempted to ask "How long do you have?", but she quickly gathers herself and starts to outline the various considerations. She starts with a relatively simple question: "Should we use an HTTP library with a sync interface or an async interface?".

Alan, who comes from a JavaScript background, remembers the transition from callbacks to async/await in that language. He assumes Rust is simply making the same transition, and that async/await will eventually be the always-preferred choice. He hesitates and asks Barbara: "Isn't async/await always better?". Barbara, who can think of many scenarios where a blocking, sync interface would likely be better, weighs whether going down the rabbit-hole of async vs sync is the right way to spend their time. She decides instead to get directly at the question of whether they should use async for this particular project. She knows that bridging sync and async can be difficult, so there's another question they need to answer first: "Are we going to expose a sync or an async interface to the users of our library?".

Alan, still confused about when a sync interface is the right choice, replies as confidently as he can: "Everybody wants to use async these days. Let's do that!". He braces for Barbara's answer, as he's not even sure what he said is actually true.

Barbara replies, "If we expose an async API then we need to decide which async HTTP implementation we will use". As she finishes saying this, Barbara feels slightly uneasy. She knows that it is possible to use a sync HTTP library and expose it through an async API, but she fears totally confusing Alan and so decides to not mention this fact.

Barbara looks over at Alan and sees a blank stare on his face. She repeats the question: "So, which async HTTP implementation should we use?". Alan responds with the only thing that comes to his mind: "which one is the best?" to which Barbara responds "Well, it depends on which async runtime you're using".

Alan, feeling utterly dejected and hoping that the considerations will soon end tries a new route out of this conversation: "Can we allow the user of the library to decide?".

Barbara thinks to herself, "Oh boy, we could provide a trait that abstracts over the HTTP request and response and allow the user to provide the implementation for whatever HTTP library they want... BUT, if we ever need any additional functionality that an async runtime needs to expose - like async locks or async timers - we might be forced to pick an actual runtime implementation on behalf of the user... Perhaps we can put the most popular runtime implementations behind feature flags and let the user choose that way... BUT what if we want to allow plugging in of different runtimes?"
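The first option Barbara contemplates might look something like this rough sketch (all names here are hypothetical, not a real library's API): the library defines a trait for the one HTTP operation it needs, and the user plugs in an implementation backed by whatever client and runtime they already use.

use std::future::Future;
use std::pin::Pin;

pub struct HttpResponse {
    pub status: u16,
    pub body: Vec<u8>,
}

type BoxError = Box<dyn std::error::Error + Send + Sync>;

pub trait HttpClient {
    // Boxing the future keeps the trait object-safe and hides which
    // client and runtime the user chose.
    fn get<'a>(&'a self, url: &'a str)
        -> Pin<Box<dyn Future<Output = Result<HttpResponse, BoxError>> + Send + 'a>>;
}

// The library stays runtime-agnostic by only ever awaiting through the trait.
pub async fn fetch_manifest(client: &impl HttpClient) -> Result<Vec<u8>, BoxError> {
    let resp = client.get("https://example.com/manifest.json").await?;
    Ok(resp.body)
}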

Alan, having watched Barbara stare off into the distance for what felt like half an hour, feels bad for his colleague. All he can think to himself is how Rust is so much more complicated than C#.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • Picking an HTTP library, a mundane and simple decision in many other languages, requires users to contemplate many different considerations.
  • There is no practical way to choose an HTTP library that will serve most of the ecosystem. Sync/Async, competing runtimes, etc. - someone will always be left out.
  • HTTP is a small implementation detail of this library, but it is a HUGE decision that will ultimately be the biggest factor in who can adopt their library.

What are the sources for this story?

Based on the author's personal experience of taking newcomers to Rust through the decision making process of picking an HTTP implementation for a library.

Why did you choose Barbara to tell this story?

Barbara knows all the considerations and their consequences. A less experienced Rust developer might just make a choice even if that choice isn't the right one for them.

๐Ÿ˜ฑ Status quo stories: Barbara battles buffered streams

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect people's experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Mysterious timeouts

Barbara is working on her YouBuy server and is puzzling over a strange bug report. She is encountering users reporting that their browser connection is timing out when they connect to YouBuy. Based on the logs, she can see that they are timing out in the do_select function:


async fn do_select<T>(database: &Database, query: Query) -> Result<Vec<T>> {
    let conn = database.get_conn().await?;
    conn.select_query(query).await
}

This is surprising, because do_select doesn't do much - it does a database query to claim a work item from a queue, but isn't expected to handle a lot of data or hit extreme slowdown on the database side. Some of the time, there is some kind of massive delay in between the get_conn method opening a connection and the call to select_query. But why? She has metrics that show that the CPU is largely idle, so it's not like the cores are all occupied.

She looks at the caller of do_select, which is a function do_work:


async fn do_work(database: &Database) {
    let work = do_select(database, FIND_WORK_QUERY).await?;
    stream::iter(work)
        .map(|item| do_select(database, work_from_item(item)))
        .buffered(5)
        .for_each(|work_item| process_work_item(database, work_item))
        .await;
}

async fn process_work_item(...) { }

The do_work function is invoking do_select as part of a stream; it is buffering up a certain number of do_select instances and, for each one, invoking process_work_item. Everything seems to be in order, and she can see that calls to process_work_item are completing in the logs.

Following a hunch, she adds more logging in and around the process_work_item function and waits a few days to accumulate new logs. She notices that shortly after each timeout, there is always a log of a process_work_item call that takes at least 20 seconds. These calls are not related to the connections that time out; they are for other connections, but they always appear afterwards in time.

process_work_item is expected to be slow sometimes because it can end up handling large items, so this is not immediately surprising to Barbara. She is, however, surprised by the correlation - surely the executor ensures that process_work_item can't stop do_select from doing its job?

Barbara thought she understood how async worked

Barbara thought she understood futures fairly well. She thought of async fn as basically "like a synchronous function with more advanced control flow". She knew that Rust's futures were lazy -- that they didn't start executing until they were awaited -- and she knew that she could compose them using utilities like join, FuturesUnordered, or the buffered method (as in this example).

Barbara also knows that every future winds up associated with a task, and that if you have multiple futures on the same task (in this case, the futures in the stream, for example) then they would run concurrently, but not in parallel. Based on this, she thinks perhaps that process_work_item is a CPU hog that takes too long to complete, and so she needs to add a call to spawn_blocking. But when she looks more closely, she realizes that process_work_item is an async function, and those 20 seconds that it spends executing are mostly spent waiting on I/O. Huh, that's confusing, because the task ought to be able to execute other futures in that case -- so why are her connections stalling out without making progress?

Barbara goes deep into how poll works

She goes to read the Rust async book and tries to think about the model, but she can't quite see the problem. Then she asks on the rust-lang Discord and someone explains to her what is going on, with the catchphrase "remember, async is about waiting in parallel, not working in parallel". Finally, after reading over what they wrote a few times, and reading some chapters in the async book, she sees the problem.

It turns out that, to Rust, a task is kind of a black box with a "poll" function. When the executor thinks a task can make progress, it calls poll. The task itself then delegates this call to poll down to all the other futures that are composed together. In the case of her buffered stream of connections, the stream gets woken up and it would then delegate down to the various buffered items in its list.

When it executes Stream::for_each, the task is doing something like this:


while let Some(work_item) = stream.next().await {
    process_work_item(database, work_item).await;
}

The task can only "wait" on one "await" at a time. It will execute that await until it completes and only then move on to the rest of the function. When the task is blocked on the first await, it will process all the futures that are part of the stream, and hence the various buffered connections all make progress.

But once a work item is produced, the task will block on the second await -- the one that resulted from process_work_item. This means that, until process_work_item completes, control will never return to the first await. As a result, none of the futures in the stream will make progress, even if they could do so!

The fix

Once Barbara understands the problem, she considers the fix. The most obvious fix is to spawn out tasks for the do_select calls, like so:


async fn do_work(database: &Database) {
    let work = do_select(database, FIND_WORK_QUERY).await?;
    stream::iter(work)
        .map(|item| task::spawn(do_select(database, work_from_item(item))))
        .buffered(5)
        .for_each(|work_item| process_work_item(database, work_item))
        .await;
}

Spawning a task will allow the runtime to keep moving those tasks along independently of the do_work task. Unfortunately, this change results in a compilation error:

error[E0759]: `database` has an anonymous lifetime `'_` but it needs to satisfy a `'static` lifetime requirement
  --> src/main.rs:8:18
   |
8  | async fn do_work(database: &Database) {
   |                  ^^^^^^^^  --------- this data with an anonymous lifetime `'_`...
   |                  |
   |                  ...is captured here...
   |        .map(|item| task::spawn(do_select(database, work_from_item(item))))
   |                    ----------- ...and is required to live as long as `'static` here

"Ah, right," she says, "spawned tasks can't use borrowed data. I wish I had [rayon] or the scoped threads from [crossbeam]."

"Let me see," Barbara thinks. "What else could I do?" She has the idea that she doesn't have to process the work items immediately. She could buffer up the work into a FuturesUnordered and process it after everything is ready:


async fn do_work(database: &Database) {
    let work = do_select(database, FIND_WORK_QUERY).await?;
    let mut results = FuturesUnordered::new();
    stream::iter(work)
        .map(|item| do_select(database, work_from_item(item)))
        .buffered(5)
        .for_each(|work_item| {
            results.push(process_work_item(database, work_item));
            futures::future::ready(())
        })
        .await;

    while let Some(_) = results.next().await { }
}

This changes the behavior of her program quite a bit, though. The original goal was to have at most 5 do_select calls occurring concurrently with exactly one process_work_item, but now she has all of the process_work_item calls executing at once. Nonetheless, the hack solves her immediate problem. Buffering up work into a FuturesUnordered becomes a kind of "fallback" for those cases where one can't readily insert a task::spawn.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • Rust's future model is a 'leaky abstraction' that works quite differently from futures in other languages. It is prone to some subtle bugs that require a relatively deep understanding of its inner workings to understand and fix.
  • "Nested awaits" -- where the task blocks on an inner await while there remains other futures that are still awaiting results -- are easy to do but can cause a lot of trouble.
  • The lack of scoped futures sometimes makes it hard to spawn items into separate tasks for independent processing.

What are the sources for this story?

This is based on the bug report Footgun with Future Unordered, but the solution that Barbara came up with was relayed by farnz during a vision doc writing session. farnz mentioned at the time that this pattern was frequently used in their codebase to work around this sort of hazard.

Why did you choose Barbara to tell this story?

To illustrate that knowing Rust -- and even having a decent handle on async Rust's basic model -- is not enough to make it clear what is going on in this particular case.

How would this story have played out differently for the other characters?

Woe be unto them! Identifying and fixing this bug required a lot of fluency with Rust and the async model. Alan in particular was probably relying on his understanding of async-await from other languages, which works very differently. In those languages, every async function is enqueued automatically for independent execution, so hazards like this do not arise (though this comes at a performance cost).

Besides timeouts for clients, what else could go wrong?

The original bug report mentioned the possibility of deadlock:

When using an async friendly semaphore (like Tokio provides), you can deadlock yourself by having the tasks that are waiting in the FuturesUnordered owning all the semaphores, while having an item in a .for_each() block after buffer_unordered() requiring a semaphore.
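Concretely, the hazard might look something like this (a hypothetical sketch using Tokio's Semaphore, not code from the bug report):

use std::sync::Arc;
use std::time::Duration;
use futures::stream::{self, StreamExt};
use tokio::sync::Semaphore;

async fn deadlock_demo() {
    let semaphore = Arc::new(Semaphore::new(5));
    let sem = Arc::clone(&semaphore);
    stream::iter(0..10)
        .map(move |i| {
            let sem = Arc::clone(&sem);
            async move {
                // Each buffered future acquires a permit, holds it across
                // an await, and passes it along with its output.
                let permit = sem.acquire_owned().await.unwrap();
                tokio::time::sleep(Duration::from_millis(10)).await;
                (i, permit)
            }
        })
        .buffer_unordered(5)
        .for_each(|(_i, _permit)| {
            let sem = Arc::clone(&semaphore);
            async move {
                // All 5 permits are now held: one by this item, four by
                // futures suspended inside the buffer. Those futures are
                // never polled again while this await blocks the task, so
                // their permits are never released: deadlock.
                let _extra = sem.acquire_owned().await.unwrap();
            }
        })
        .await;
}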

Is there any way for Barbara to both produce and process work items simultaneously?

Yes, in this case, she could've. For example, she might have written


async fn do_work(database: &Database) {
    let work = do_select(database, FIND_WORK_QUERY).await?;

    stream::iter(work)
        .map(|item| async move {
            let work_item = do_select(database, work_from_item(item)).await;
            process_work_item(database, work_item).await;
        })
        .buffered(5)
        .for_each(|()| std::future::ready(()))
        .await;
}

This would however mean that she would have 5 calls to process_work_item executing at once. In the actual case that inspired this story, process_work_item can take as much as 10 GB of RAM, so having multiple concurrent calls is a problem.

Is there any way for Barbara to both produce and process work items simultaneously, without the buffering and so forth?

Yes, she might use a loop with a select!. This would ensure that she is processing both the stream that produces work items and the FuturesUnordered that consumes them:


async fn do_work(database: &Database) {
    let work = do_select(database, FIND_WORK_QUERY).await?;

    let selects = stream::iter(work)
        .map(|item| do_select(database, work_from_item(item)))
        .buffered(5)
        .fuse();
    tokio::pin!(selects);

    let mut results = FuturesUnordered::new();

    loop {
        tokio::select! {
            Some(work_item) = selects.next() => {
                results.push(process_work_item(database, work_item));
            },
            Some(()) = results.next() => { /* do nothing */ },
            else => break,
        }
    }
}

Note that doing so produces code that looks quite a bit different from where she started, though. :( It also behaves very differently. There can be a queue of tens of thousands of items that do_select grabs from, and this code will potentially pull far too many items out of the queue, which would then have to be requeued on shutdown. The intent of the buffered(5) call was to grab at most 5 work items from the queue, so that other hosts could pull out work items and share the load when there's a spike.

๐Ÿ˜ฑ Status quo stories: Barbara bridges sync and async in perf.rust-lang.org

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect people's experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Barbara is working on the code for perf.rust-lang.org and she wants to do a web request to load various intermediate results. She has heard that the reqwest crate is quite nice, so she decides to give it a try. She writes up an async function that does her web request:


async fn do_web_request(url: &Url) -> Data {
    ...
}

She needs to apply this async function to a number of urls. She wants to use the iterator map function, like so:

async fn do_web_request(url: &Url) -> Data {...}

fn aggregate(urls: &[Url]) -> Vec<Data> {
    urls
        .iter()
        .map(|url| do_web_request(url))
        .collect()
}

fn main() {
    /* do stuff */
    let data = aggregate();
    /* do more stuff */
}

Of course, since do_web_request is an async fn, she gets a type error from the compiler:

error[E0277]: a value of type `Vec<Data>` cannot be built from an iterator over elements of type `impl Future`
  --> src/main.rs:11:14
   |
11 |             .collect();
   |              ^^^^^^^ value of type `Vec<Data>` cannot be built from `std::iter::Iterator<Item=impl Future>`
   |
   = help: the trait `FromIterator<impl Future>` is not implemented for `Vec<Data>`

"Of course," she thinks, "I can't call an async function from a closure."

Introducing block_on

Since she is not overly concerned about performance, she decides she'll just use a call to block_on from the futures crate and execute the function synchronously:

async fn do_web_request(url: &Url) -> Data {...}

fn aggregate(urls: &[Url]) -> Vec<Data> {
    urls
        .iter()
        .map(|url| futures::executor::block_on(do_web_request(url)))
        .collect()
}

fn main() {
    /* do stuff */
    let data = aggregate();
    /* do more stuff */
}

The code compiles, and it seems to work.

Switching to async main

As Barbara works on perf.rust-lang.org, she realizes that she needs to do more and more async operations. She decides to convert her synchronous main function into an async main. She's using tokio, so she is able to do this very conveniently with the #[tokio::main] decorator:

#[tokio::main]
async fn main() {
    /* do stuff */
    let data = aggregate();
    /* do more stuff */
}

Everything seems to work ok on her laptop, but when she pushes the code to production, it deadlocks immediately. "What's this?" she says. Confused, she runs the code on her laptop a few more times, but it seems to work fine. (There's a faq explaining what's going on. -ed.)

She decides to try debugging. She fires up a debugger but finds it isn't really giving her useful information about what is stuck (she has basically the same problems that Alan has). She wishes she could get insight into tokio's state.

Frustrated, she starts reading the tokio docs more closely and she realizes that tokio runtimes offer their own block_on method. "Maybe using tokio's block_on will help?" she thinks, "Worth a try, anyway." She changes the aggregate function to use tokio's block_on:


fn block_on<O>(f: impl Future<Output = O>) -> O {
    let rt = tokio::runtime::Runtime::new().unwrap();
    rt.block_on(f)
}

fn aggregate(urls: &[Url]) -> Vec<Data> {
    urls
        .iter()
        .map(|url| block_on(do_web_request(url)))
        .collect()
}

The good news is that the deadlock is gone. The bad news is that now she is getting a panic:

thread 'main' panicked at 'Cannot start a runtime from within a runtime. This happens because a function (like block_on) attempted to block the current thread while the thread is being used to drive asynchronous tasks.'

"Well," she thinks, "I could use the Handle API to get the current runtime instead of creating a new one? Maybe that's the problem."


fn aggregate(urls: &[Url]) -> Vec<Data> {
    let handle = tokio::runtime::Handle::current();
    urls.iter()
        .map(|url| handle.block_on(do_web_request(url)))
        .collect()
}

But this also seems to panic in the same way.

Trying out spawn_blocking

Reading more into this problem, she realizes she is supposed to be using spawn_blocking. She tries replacing block_on with tokio::task::spawn_blocking:


fn aggregate(urls: &[Url]) -> Vec<Data> {
    urls
        .iter()
        .map(|url| tokio::task::spawn_blocking(move || do_web_request(url)))
        .collect()
}

but now she gets a type error again:

error[E0277]: a value of type `Vec<Data>` cannot be built from an iterator over elements of type `tokio::task::JoinHandle<impl futures::Future>`
  --> src/main.rs:22:14
   |
22 |             .collect();
   |              ^^^^^^^ value of type `Vec<Data>` cannot be built from `std::iter::Iterator<Item=tokio::task::JoinHandle<impl futures::Future>>`
   |
   = help: the trait `FromIterator<tokio::task::JoinHandle<impl futures::Future>>` is not implemented for `Vec<Data>`

Of course! spawn_blocking, like map, just takes a regular closure, not an async closure; its purpose is to embed some sync code within an async task, so a sync closure makes sense -- and moreover async closures aren't stable -- but it's all rather frustrating nonetheless. "Well," she thinks, "I can use spawn to get back into an async context!" So she adds a call to spawn inside the spawn_blocking closure:


fn aggregate(urls: &[Url]) -> Vec<Data> {
    urls
        .iter()
        .map(|url| tokio::task::spawn_blocking(move || {
            tokio::task::spawn(async move {
                do_web_request(url).await
            })
        }))
        .collect()
}

But this isn't really helping, as spawn still yields a future. She's getting the same errors.

Trying out join_all

She remembers now that this whole drama started because she was converting her main function to be async. Maybe she doesn't have to bridge between sync and async? She starts digging around in the docs and finds join_all in the futures crate. Using that, she can change aggregate to be an async function too:


async fn aggregate(urls: &[Url]) -> Vec<Data> {
    futures::future::join_all(
        urls
            .iter()
            .map(|url| do_web_request(url))
    ).await
}

Things are working again now, so she is happy, although she notes that join_all has quadratic time complexity. That's not great.
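One alternative she could have reached for (a sketch, assuming she doesn't care about the order of the results) is FuturesUnordered, which polls each future only when it is woken:

use futures::stream::{FuturesUnordered, StreamExt};

async fn aggregate(urls: &[Url]) -> Vec<Data> {
    urls.iter()
        .map(|url| do_web_request(url))
        // Collect the futures into a set that polls them as they are woken...
        .collect::<FuturesUnordered<_>>()
        // ...then collect the outputs; this second collect is a Stream combinator.
        .collect::<Vec<Data>>()
        .await
}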

Filtering

Later on, she would like to apply a filter to the aggregation operation. She realizes that if she wants to use the fetched data when doing the filtering, she has to filter the vector after the join has completed. She wants to write something like


async fn aggregate(urls: &[Url]) -> Vec<Data> {
    futures::future::join_all(
        urls
            .iter()
            .map(|url| do_web_request(url))
            .filter(|data| test(data))
    ).await
}

but she can't, because data is a future and not the Data itself. Instead she has to build the vector first and then post-process it:


async fn aggregate(urls: &[Url]) -> Vec<Data> {
    let mut data: Vec<Data> = futures::future::join_all(
        urls
            .iter()
            .map(|url| do_web_request(url))
    ).await;
    data.retain(test);
    data
}

This is annoying, but performance isn't critical, so it's ok.

And the cycle begins again

Later on, she wants to call aggregate from another binary. This one doesn't have an async main. This context is deep inside of an iterator chain and was previously entirely synchronous. She realizes it would be a lot of work to change all the intervening stack frames to be async fn, rewrite the iterators into streams, etc. She decides to just call block_on again, even though it makes her nervous.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • Some projects don't care about max performance and just want things to work once the program compiles.
    • They would probably be happy with sync, but since the most popular libraries for web requests, databases, etc., offer async interfaces, they may still be using async code.
  • There are contexts where you can't easily add an await.
    • For example, inside of an iterator chain.
    • Big block of existing code.
  • Mixing sync and async code can cause deadlocks that are really painful to diagnose, particularly when you have an async-sync-async sandwich.

Why did you choose Barbara to tell this story?

  • Because Mark (who experienced most of it) is a very experienced Rust developer.
  • Because you could experience this story regardless of language background or being new to Rust.

How would this story have played out differently for the other characters?

I would expect it would work out fairly similarly, except that the type errors and things might well have been more challenging for people to figure out, assuming they aren't already familiar with Rust.

Why did Barbara only get deadlocks in production, and not on her laptop?

This is because the production instance she was using had only a single core, but her laptop is a multicore machine. The actual cause of the deadlocks is that block_on basically "takes over" the tokio worker thread, and hence the tokio scheduler cannot run. If that block_on is blocked on another future that will have to execute, then some other thread must take over completing that future. On Barbara's multicore machine, there were more threads available, so the system did not deadlock. But on the production instance, there was only a single thread. Barbara could have encountered deadlocks on her local machine as well if she had enough instances of block_on running at once.
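A minimal sketch of that hazard (hypothetical code, forcing a single-threaded runtime to make the deadlock deterministic):

use std::time::Duration;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    // block_on takes over the only worker thread. The sleep below needs
    // the tokio runtime to drive its timer, but the runtime cannot run
    // again until block_on returns: a guaranteed deadlock.
    futures::executor::block_on(async {
        tokio::time::sleep(Duration::from_millis(1)).await;
    });
    println!("never reached");
}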

Could the runtime have prevented the deadlock?

One way to resolve this problem would be to have a runtime that creates more threads as needed. This is what was proposed in this blog post, for example.

Adapting the number of worker threads has downsides. It requires knowing the right threshold for creating new threads (which is fundamentally unknowable). The result is that the runtime will sometimes observe that some thread seems to be taking a long time and create new threads just before that thread was about to finish. These new threads generate overhead and lower the overall performance. It also requires work stealing and other techniques that can lead to work running on multiple cores and having less locality. Systems tuned for maximal performance tend to prefer a single thread per core for this reason.

If some runtimes are adaptive, that may also lead to people writing libraries which block without caring. These libraries would then be a performance or deadlock hazard when used on a runtime that is not adaptive.

Is there any way to have kept aggregate as a synchronous function?

Yes, Barbara could have written something like this:

fn aggregate(urls: &[Url]) -> Vec<Data> {
    let handle = tokio::runtime::Handle::current();

    urls.iter()
        .map(|url| handle.block_on(do_web_request(url)))
        .collect()
}

#[tokio::main]
async fn main() {
    let urls: Vec<Url> = /* ... */;
    let data = tokio::task::spawn_blocking(move || aggregate(&urls))
        .await
        .unwrap();
    println!("done");
}

This aggregate function can only safely be invoked from inside a tokio spawn_blocking call, however, since Handle::current will only work in that context. She could also have used the original futures variant of block_on, in that case, and things would also work.

Why didn't Barbara just use the sync API for reqwest?

reqwest does offer a synchronous API, but it's not enabled by default; you have to enable an optional feature. Further, not all crates offer synchronous APIs. Finally, Barbara has had some vague poor experiences when using synchronous APIs, such as panics, and so she's learned the heuristic of "use the async API unless you're doing something really, really simple".

Regardless, the synchronous reqwest API is actually itself implemented using block_on, so Barbara would ultimately have hit the same issues. In fact, these same issues are probably the sources of those panics that Barbara encountered in the past.

In general, though, embedding sync within async or vice versa works "ok", once you know the right tricks. Where things become challenging is when you have a "sandwich", with async-sync-async.

Do people mix spawn_blocking and spawn successfully in real code?

Yes! Here is some code from perf.rust-lang.org doing exactly that. The catch is that it winds up giving you a future in the end, which didn't work for Barbara because her code is embedded within an iterator (and hence she can't make things async "all the way down").

What are other ways people could experience similar problems mixing sync and async?

  • Using std::sync::Mutex in async code.
  • Calling the blocking version of an asynchronous API.
    • For example, reqwest::blocking, the synchronous zbus and rumqtt APIs.
    • These are commonly implemented by using some variant of block_on internally.
    • Therefore they can lead to panics or deadlocks depending on what async runtime they are built from and used with.

Why wouldn't Barbara just make everything async from the start?

There are times when converting synchronous code to async is difficult or even impossible. Here are some of the reasons:

  • Asynchronous functions cannot appear in trait impls.
  • Asynchronous functions cannot be called from APIs that take closures for callbacks, like Iterator::map in this example.
  • Sometimes the synchronous functions come from other crates and are not fully under their control.
  • It's just a lot of work!

How many variants of block_on are there?

  • the futures crate offers a runtime-independent block_on (which can lead to deadlocks, as in this story)
  • the tokio crate offers a block_on method (which will panic if used inside of another tokio runtime, as in this story)
  • the pollster crate exists just to offer block_on
  • the futures-lite crate offers a block_on
  • the async-std crate offers block_on
  • the async-io crate offers block_on
  • ...there are probably more, but I think you get the point.

๐Ÿ˜ฑ Status quo stories: Barbara builds an async executor

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

The story

Barbara wanted to set priorities for the tasks spawned onto the executor. However, she found that no existing async executor provided such a feature, so she decided to build her own.

First, Barbara found that crossbeam-deque provides good-quality work-stealing deques. She decided to use it to build the task scheduler. She planned for each worker thread to run a loop which repeatedly gets a task from the deque and polls it, roughly like the sketch below.
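A minimal sketch of the worker loop she had in mind (hypothetical code using crossbeam-deque's Injector; a real scheduler would also use per-thread Workers and Stealers, and the Task type is the subject of the next question):

use crossbeam_deque::{Injector, Steal};

fn worker_loop(injector: &Injector<Task>) {
    loop {
        match injector.steal() {
            // Got a task: poll it (poll_task is sketched below).
            Steal::Success(task) => poll_task(task),
            // Nothing queued right now; yield instead of spinning hard.
            Steal::Empty => std::thread::yield_now(),
            // Lost a race with another thread; try again.
            Steal::Retry => continue,
        }
    }
}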

But wait, what should we put into those queues to represent each "task"?

At first, Barbara thought each task must contain the future itself plus the priority used by the scheduler. So she first wrote:


pub struct Task {
    future: Pin<Box<dyn Future<Output = ()> + Send + 'static>>,
    priority: u8,
}

And the working thread loop should run something like:


pub fn poll_task(mut task: Task) {
    let waker = todo!();
    let mut cx = Context::from_waker(&waker);
    task.future.as_mut().poll(&mut cx);
}

"How do I create a waker?" Barbara asked herself. Quickly, she found the Wake trait. Seeing the wake method takes an Arc<Self>, she realized the task in the scheduler should be stored in an Arc. After some thought, she realizes it makes sense because both the deque in the scheduler and the waker may hold a reference to the task.

To implement Wake, the Task should contain the sender of the scheduler. She changed the code to something like this:


pub struct Task {
    future: Pin<Box<dyn Future<Output = ()> + Send + 'static>>,
    scheduler: SchedulerSender,
    priority: u8,
}

unsafe impl Sync for Task {}

impl Wake for Task {
    fn wake(self: Arc<Self>) {
        self.scheduler.send(self.clone());
    }
}

pub fn poll_task(task: Arc<Task>) {
    let waker = Waker::from(task.clone());
    let mut cx = Context::from_waker(&waker);
    task.future.as_mut().poll(&mut cx);
//  ^^^^^^^^^^^ cannot borrow as mutable
}

The code still needed some change because the future in the Arc<Task> became immutable.

"Okay. I can guarantee Task is created from a Pin<Box<Future>>, and I think the same future won't be polled concurrently in two threads. So let me bypass the safety checks." Barbara changed the future to a raw pointer and confidently used some unsafe blocks to make it compile.


pub struct Task {
    future: *mut (dyn Future<Output = ()> + Send + 'static),
    ...
}

unsafe impl Send for Task {}
unsafe impl Sync for Task {}

pub fn poll_task(task: Arc<Task>) {
    ...
    unsafe {
        Pin::new_unchecked(&mut *task.future).poll(&mut cx);
    }
}

Luckily, a colleague of Barbara's noticed something wrong. The wake method could be called multiple times, so multiple copies of the task could exist in the scheduler. The scheduler might not work correctly because of this. Worse, multiple threads might get copies of the same task from the scheduler and race to poll the same future.

Barbara soon got an idea to solve it. She added a state field to the Task. By carefully maintaining the state of the task, she could guarantee there were no duplicate tasks in the scheduler and no race could happen when polling the future.


const NOTIFIED: u64 = 1;
const IDLE: u64 = 2;
const POLLING: u64 = 3;
const COMPLETED: u64 = 4;

pub struct Task {
    ...
    state: AtomicU64,
}

impl Wake for Task {
    fn wake(self: Arc<Self>) {
        let mut state = self.state.load(Relaxed);
        loop {
            match state {
                // To prevent a task from appearing in the scheduler twice, only send the task
                // to the scheduler if the task is not notified nor being polled.
                IDLE => match self
                    .state
                    .compare_exchange_weak(IDLE, NOTIFIED, AcqRel, Acquire)
                {
                    Ok(_) => self.scheduler.send(self.clone()),
                    Err(s) => state = s,
                },
                POLLING => match self
                    .state
                    .compare_exchange_weak(POLLING, NOTIFIED, AcqRel, Acquire)
                {
                    Ok(_) => break,
                    Err(s) => state = s,
                },
                _ => break,
            }
        }
    }
}

pub fn poll_task(task: Arc<Task>) {
    let waker = Waker::from(task.clone());
    let mut cx = Context::from_waker(&waker);
    loop {
        // We needn't read the task state here because the waker prevents the task from
        // appearing in the scheduler twice. The state must be NOTIFIED now.
        task.state.store(POLLING, Release);
        if let Poll::Ready(()) = unsafe { Pin::new_unchecked(&mut *task.future).poll(&mut cx) } {
            task.state.store(COMPLETED, Release);
            break;
        }
        match task.state.compare_exchange(POLLING, IDLE, AcqRel, Acquire) {
            Ok(_) => break,
            Err(NOTIFIED) => continue,
            _ => unreachable!(),
        }
    }
}

Barbara finished her initial implementation of the async executor. Although there were a lot more possible optimizations, Barbara already felt it was a bit complex. She was also confused about why she needed to care so much about polling and waking, when her initial requirement was just attaching additional information to the task to customize scheduling.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • It is difficult to customize any of the current async executors (to my knowledge). Having even one special requirement forces building an async executor from scratch.
  • It is also not easy to build an async executor. It requires quite a bit of exploration and is error-prone. async-task is a good attempt to simplify the process, but it cannot satisfy every need for customizing the executor (it does not give you the chance to extend the task itself).

What are the sources for this story?

  • The story was from my own experience about writing a new thread pool supporting futures: https://github.com/tikv/yatp.
  • People may wonder why we want to set priorities for tasks. Currently, the futures in the thread pool are like user-space threads; they are mostly CPU-intensive. But I think people doing async I/O may have the same problem.

Why did you choose Barbara to tell this story?

  • At the time of the story, I had written Rust for years but I was new to the concepts for async/await like Pin and Waker.

How would this story have played out differently for the other characters?

  • People with less experience in Rust may be less likely to build their own executor. If they try, I think the story is probably similar.

๐Ÿ˜ฑ Status quo stories: Barbara carefully dismisses embedded Future

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect people's experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Barbara is contributing to an OS that supports running multiple applications on a single microcontroller. These microcontrollers have as little as tens of kilobytes of RAM and hundreds of kilobytes of flash memory for code. Barbara is writing a library that is used by multiple applications -- and is linked into each application -- so the library is very resource constrained. The library should support asynchronous operation, so that multiple APIs can be used in parallel within each (single-threaded) application.

Barbara begins writing the library by trying to write a console interface, which allows byte sequences to be printed to the system console. Here is an example sequence of events for a console print:

  1. The interface gives the kernel a callback to call when the print finishes, and gives the kernel the buffer to print.
  2. The kernel prints the buffer in the background while the app is free to do other things.
  3. The print finishes.
  4. The app tells the kernel it is ready for the callback to be invoked, and the kernel invokes the callback.

Barbara tries to implement the API using core::future::Future so that the library can be compatible with the async Rust ecosystem. The OS kernel does not expose a Future-based interface, so Barbara has to implement Future by hand rather than using async/await syntax. She starts with a skeleton:


/// Passes `buffer` to the kernel, and prints it to the console. Returns a
/// future that returns `buffer` when the print is complete. The caller must
/// call kernel_ready_for_callbacks() when it is ready for the future to return.
fn print_buffer(buffer: &'static mut [u8]) -> PrintFuture {
    // TODO: Set the callback
    // TODO: Tell the kernel to print `buffer`
}

struct PrintFuture;

impl core::future::Future for PrintFuture {
    type Output = &'static mut [u8];

    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        // TODO: Detect when the print is done, retrieve `buffer`, and return
        // it.
    }
}

Note: All error handling is omitted to keep things understandable.

Barbara begins to implement print_buffer:


fn print_buffer(buffer: &'static mut [u8]) -> PrintFuture {
    kernel_set_print_callback(callback);
    kernel_start_print(buffer);
    PrintFuture {}
}

// New! The callback the kernel calls.
extern fn callback() {
    // TODO: Wake up the currently-waiting PrintFuture.
}

So far so good. Barbara then works on poll:


    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        if kernel_is_print_done() {
            return Poll::Ready(kernel_get_buffer_back());
        }
        Poll::Pending
    }

Of course, there's something missing here. How does the callback wake the PrintFuture? She needs to store the Waker somewhere! Barbara puts the Waker in a global variable so the callback can find it (this is fine because the app is single threaded and callbacks do NOT interrupt execution the way Unix signals do):


static mut PRINT_WAKER: Option<Waker> = None;

extern fn callback() {
    if let Some(waker) = unsafe { PRINT_WAKER.as_ref() } {
        waker.wake_by_ref();
    }
}

She then modifies poll to set PRINT_WAKER:


    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        if kernel_is_print_done() {
            return Poll::Ready(kernel_get_buffer_back());
        }
        unsafe { PRINT_WAKER = Some(cx.waker().clone()); }
        Poll::Pending
    }

PRINT_WAKER is stored in .bss, which occupies space in RAM but not flash. It is two words in size. It points to a RawWakerVTable that is provided by the executor. RawWakerVTable's design is a compromise that supports environments both with and without alloc. In no-alloc environments, drop and clone are generally no-ops, and wake/wake_by_ref seem like duplicates. Looking at RawWakerVTable makes Barbara realize that even though Future was designed to work in embedded contexts, it may have too much overhead for her use case.
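The kind of minimal vtable she is weighing would look something like this (a hypothetical no-alloc waker, not code from her library):

use core::ptr;
use core::task::{RawWaker, RawWakerVTable, Waker};

static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, wake, wake, drop);

unsafe fn clone(data: *const ()) -> RawWaker {
    // Nothing to allocate or reference-count; just copy the pointer.
    RawWaker::new(data, &VTABLE)
}

unsafe fn wake(_data: *const ()) {
    // A real executor would set a "poll me again" flag or re-enqueue the
    // task here; wake and wake_by_ref can share this implementation.
}

unsafe fn drop(_data: *const ()) {
    // No-op: `clone` acquired no resources.
}

fn minimal_waker() -> Waker {
    // Safety: the vtable functions uphold the RawWaker contract and never
    // dereference the data pointer.
    unsafe { Waker::from_raw(RawWaker::new(ptr::null(), &VTABLE)) }
}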

Barbara decides to do some benchmarking. She comes up with a sample application -- an app that blinks an LED and responds to button presses -- and implements it twice. One implementation does not use Future at all, the other does. Both implementations have two asynchronous interfaces: a timer interface and a GPIO interface, as well as an application component that uses the interfaces concurrently. In the Future-based app, the application component functions like a future combinator, as it is a Future that is almost always waiting for a timer or GPIO future to finish.

To drive the application future, Barbara implements an executor. The executor functions like a background thread. Because alloc is not available, this executor contains a single future. The executor has a spawn function that accepts a future and starts running that future (overwriting the existing future in the executor if one is already present). Once started, the executor runs entirely in kernel callbacks.

Barbara identifies several factors that add branching and error handling code to the executor:

  1. spawn should be a safe function, because it is called by high-level application code. However, that means it can be called by the future it contains. If handled naively, this would result in dropping the future while it executes. Barbara adds runtime checks to identify this situation.
  2. Waker is Sync, so on a multithreaded system, a future could give another thread access to its Waker and the other thread could wake it up. This could happen while the poll is executing, before poll returns Poll::Pending. Therefore, Barbara concludes that if wake is called while a future is being polled then the future should be re-polled, even if the current poll returns Poll::Pending. This requires putting a retry loop into the executor.
  3. A kernel callback may call Waker::wake after its future returns Poll::Ready. After poll returns Poll::Ready, the executor should not poll the future again, so Barbara adds code to ignore those wakeups. This duplicates the "ignore spurious wakeups" functionality that exists in the future itself.

Ultimately, this makes the executor logic nontrivial; it compiles to 96 bytes of code. The executor logic is monomorphized for each future, which allows the compiler to make inlining optimizations, but results in a significant amount of duplicate code. Alternatively, it could be adapted to use function pointers or vtables to avoid the code duplication, but then the compiler definitely cannot inline Future::poll into the kernel callbacks.

Barbara publishes an analysis of the relative sizes of the two app implementations, finding a large percentage increase in both code size and RAM usage (note: stack usage was not investigated). Most of the code size increase is from the future combinator code.

In the no-Future version of the app, a kernel callback causes the following:

  1. The kernel callback calls the application logic's event-handling function for the specific event type.
  2. The application handles the event.

The call in step 1 is inlined, so the compiled kernel callback consists only of the application's event-handling logic.

In the Future-based version of the app, a kernel callback causes the following:

  1. The kernel callback updates some global state to indicate the event happened.
  2. The kernel callback invokes Waker::wake.
  3. Waker::wake calls poll on the application future.
  4. The application future has to look at the state saved in step 1 to determine what event happened.
  5. The application future handles the event.

LLVM is unable to devirtualize the call in step 2, so the optimizer is unable to simplify the above steps. Steps 1-4 only exist in the future-based version of the code, and add over 200 bytes of code (note: Barbara believes this could be reduced to between 100 and 200 bytes at the expense of execution speed).

Barbara concludes that Future is not suitable for highly-resource-constrained environments due to the amount of code and RAM required to implement executors and combinators.

Barbara redesigns the library she is building to use a different concept for implementing async APIs in Rust, one that is much lighter weight. She has moved on from Future and is refining her async traits instead. Here are some ways in which these APIs are lighter weight than a Future implementation:

  1. After monomorphization, kernel callbacks directly call application code. This allows the application code to be inlined into the kernel callback.
  2. The callback invocation is more precise: these APIs don't make spurious wakeups, so application code does not need to handle spurious wakeups.
  3. The async traits lack an equivalent of Waker. Instead, all callbacks are expected to be 'static (i.e. they modify global state) and passing pointers around is replaced by static dispatch (see the sketch below).
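In that style, a print interface might look something like this (a hypothetical sketch, not Barbara's actual design; the kernel_* functions are the ones from the story above):

/// The callback is a type rather than a value, so after monomorphization
/// the kernel callback dispatches to the application code directly.
pub trait PrintCallback {
    /// Invoked exactly once, when the print completes; no spurious calls.
    fn print_done(buffer: &'static mut [u8]);
}

pub fn print_buffer<C: PrintCallback>(buffer: &'static mut [u8]) {
    kernel_set_print_callback(invoke_callback::<C>);
    kernel_start_print(buffer);
}

extern fn invoke_callback<C: PrintCallback>() {
    // After monomorphization this is a direct, inlinable call.
    C::print_done(kernel_get_buffer_back());
}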

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • core::future::Future isn't suitable for every asynchronous API in Rust. Future has a lot of capabilities, such as the ability to spawn dynamically-allocated futures, that are unnecessary in embedded systems. These capabilities have a cost, which is unavoidable without backwards-incompatible changes to the trait.
  • We should look at embedded Rust's relationship with Future so we don't fragment the embedded Rust ecosystem. Other embedded crates use Future -- Future certainly has a lot of advantages over lighter-weight alternatives, if you have the space to use it.

Why did you choose Barbara to tell this story?

  • This story is about someone who is an experienced systems programmer and an experienced Rust developer. All the other characters have "new to Rust" or "new to programming" as a key characteristic.

How would this story have played out differently for the other characters?

  • Alan would have found the #![no_std] crate ecosystem lacking async support. He would have moved forward with a Future-based implementation, unaware of its impact on code size and RAM usage.
  • Grace would have handled the issue similarly to Barbara, but may not have tried as hard to use Future. Barbara has been paying attention to Rust long enough to know how significant the Future trait is in the Rust community and ecosystem.
  • Niklaus would really have struggled. If he asked for help, he probably would've gotten conflicting advice from the community.

Future has a lot of features that Barbara's traits don't have -- aren't those worth the cost?

  • Future has many additional features that are nice-to-have:
    1. Future works smoothly in a multithreaded environment. Futures can be Send and/or Sync, and do not need to have interior mutability, which avoids the need for internal locking.
      • Manipulating arbitrary Rust types without locking allows async fn to be efficient.
    2. Futures can be spawned and dropped in a dynamic manner: an executor that supports dynamic allocation can manage an arbitrary number of futures at runtime, and futures may easily be dropped to stop their execution.
      • Dropping a future will also drop futures it owns, conveniently providing good cancellation semantics.
      • A future that creates other futures (e.g. an async fn that calls other async fns) can be spawned with only a single memory allocation, whereas callback-based approaches need to allocate for each asynchronous component.
    3. Community and ecosystem support. This isn't a feature of Future per se, but the Rust language has special support for Future (async/await) and practically the entire async Rust ecosystem is based on Future. The ability to use existing async crates is a very strong reason to use Future over any alternative async abstraction.
  • However, the code size impact of Future is a deal-breaker, and no number of nice-to-have features can outweigh a deal-breaker. Barbara's traits have every feature she needs.
  • Using Future saves developer time relative to building your own async abstractions. Developers can use the time they saved to minimize code size elsewhere in the project. In some cases, this may result in a net decrease in code size for the same total effort. However, code size reduction efforts have diminishing returns, so projects that expect to optimize code size regardless likely won't find the tradeoff beneficial.

Is the code size impact of Future fundamental, or can the design be tweaked in a way that eliminates the tradeoff?

  • Future isolates the code that determines a future should wake up (the code that calls Waker::wake) from the code that executes the future (the executor). The only information transferred via Waker::wake is "try waking up now" -- any other information has to be stored somewhere. When polled, a future has to run logic to identify how it can make progress -- in many cases this requires answering "who woke me up?" -- and retrieve the stored information. Most completion-driven async APIs allow information about the event to be transferred directly to the code that handles the event. According to Barbara's analysis, the code required to determine what event happened was the majority of the size impact of Future.

I thought Future was a zero-cost abstraction?

  • Aaron Turon described futures as zero-cost abstractions. In the linked post, he elaborated on what he meant by zero-cost abstraction, and eliminating their impact on code size was not part of that definition. Since then, the statement that future is a zero-cost abstraction has been repeated many times, mostly without the context that Aaron provided. Rust has many zero-cost abstractions, most of which do not impact code size (assuming optimization is enabled), so it is easy for developers to see "futures are zero-cost" and assume that makes them lighter-weight than they are.

How does Barbara's code handle thread-safety? Is her executor unsound?

  • The library Barbara is writing only works in Tock OS' userspace environment. This environment is single-threaded: the runtime does not provide a way to spawn another thread, hardware interrupts do not execute in userspace, and there are no interrupt-style callbacks like Unix signals. All kernel callbacks are invoked synchronously, using a method that is functionally equivalent to a function call.

๐Ÿ˜ฑ Status quo stories: Barbara compares some code (and has a performance problem)

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect people's experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Barbara is recreating some code that has been written in other languages she has some familiarity with. These include C++, but also GC'd languages like Python.

This code collates a large number of requests to network services, with each response containing a large amount of data. To speed this up, Barbara uses buffer_unordered, and writes code like this:


use futures::stream::StreamExt;

let queries = futures::stream::iter(...)
    .map(|query| async move {
        // error handling elided for brevity
        let d: Data = self.client.request(&query).await.unwrap();
        d
    })
    .buffer_unordered(32);

let results = queries.collect::<Vec<Data>>().await;

Barbara thinks this is similar in function to things she has seen using Python's asyncio.wait, as well as some code her coworkers have written using C++20's coroutines, using this:

std::vector<folly::coro::Task<Data>> tasks;
for (const auto& query : queries) {
    tasks.push_back(
        folly::coro::co_invoke([this, &query]() -> folly::coro::Task<Data> {
            co_return co_await client_->co_request(query);
        })
    );
}
auto results = co_await folly::coro::collectAllWindowed(
    std::move(tasks), 32);

However, the Rust code performs quite poorly compared to the other impls, appearing to effectively complete the requests serially, despite on the surface looking like effectively identical code.

While investigating, Barbara looks at top, and realizes that her coworker's C++20 code sometimes results in her 16-core laptop using 1600% CPU; her Rust async code never exceeds 100% CPU usage. She spends time investigating her runtime setup, but Tokio is configured to use enough worker threads to keep all her CPU cores busy. This feels to her like a bug in buffer_unordered or tokio that needs more time to investigate.

Barbara goes deep into investigating this, spends time reading how buffer_unordered is implemented, how its underlying FuturesUnordered is implemented, and even thinks about how polling and the tokio runtime she is using work. She even tries to figure out if the upstream service is doing some sort of queueing.

Eventually Barbara starts reading more about C++20 coroutines, looking closer at the folly implementation used above, and noticing that it works primarily with tasks, which are not exactly equivalent to Rust Futures.

Then it strikes her! request is implemented something like this:


impl Client {
    async fn request(&self) -> Result<Data> {
        let bytes = self.inner.network_request().await?;
        Ok(serialization_libary::from_bytes(&bytes)?)
    }
}

The results from the network service are sometimes (but not always) VERY large, and the buffer_unordered stream is contained within 1 tokio task. The request future does non-trivial CPU work to deserialize the data. This causes significant slowdowns in wall time, as the process can be bounded by the time it takes the single thread running the tokio task to deserialize all the data. This problem hadn't shown up in test cases, where the results from the mocked network service are always small; many common uses of the network service only ever have small results, so it takes a specific production load, or a large scale test, to trigger this issue.

The solution is to spawn tasks (note this requires 'static futures):


use futures::stream::StreamExt;

let queries = futures::stream::iter(...)
    .map(|query| async move {
        // unwrap the JoinError from spawn, then the request's own Result
        let d: Data = tokio::spawn(self.client.request(&query))
            .await
            .unwrap()
            .unwrap();
        d
    })
    .buffer_unordered(32);

let results = queries.collect::<Vec<Data>>().await;

Barbara was able to figure this out by reading enough and trying things out, but had that not worked, it would have probably required figuring out how to use perf or some similar tool.

Later on, Barbara gets surprised by this code again. It's now being used as part of a system that handles a very high number of requests per second, but sometimes the system stalls under load. She enlists Grace to help debug, and the two of them identify via perf that all the CPU cores are busy running serialization_libary::from_bytes. Barbara revisits this solution, and discovers tokio::task::block_in_place which she uses to wrap the calls to serialization_libary::from_bytes:


impl Client {
    async fn request(&self) -> Result<Data> {
        let bytes = self.inner.network_request().await?;
        Ok(tokio::task::block_in_place(move || serialization_libary::from_bytes(&bytes))?)
    }
}

This resolves the problem as seen in production, but leads to Niklaus's code review suggesting the use of tokio::task::spawn_blocking inside request, instead of spawn inside buffer_unordered. This discussion is challenging, because the tradeoff between spawning a future that uses block_in_place, versus using spawn_blocking and not spawning the containing future, is subtle and tricky to explain. Also, both block_in_place and spawn_blocking are heavyweight, and Barbara would prefer to avoid them when the cost of serialization is low, which is usually a runtime property of the system.
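For reference, the spawn_blocking variant under discussion might look like this (a sketch; error handling simplified):

impl Client {
    async fn request(&self) -> Result<Data> {
        let bytes = self.inner.network_request().await?;
        // Deserialize on the blocking thread pool; a JoinError from a
        // panicked blocking task is surfaced via expect() for brevity.
        let data = tokio::task::spawn_blocking(move || serialization_libary::from_bytes(&bytes))
            .await
            .expect("deserialization task panicked")?;
        Ok(data)
    }
}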

๐Ÿค” Frequently Asked Questions

Are any of these actually the correct solution?

  • Only in part. It may cause other kinds of contention or blocking on the runtime. As mentioned above, the deserialization work probably needs to be wrapped in something like block_in_place, so that other tasks are not starved on the runtime, or might want to use spawn_blocking. There are some important caveats/details that matter:
    • This is dependent on how the runtime works.
    • block_in_place + tokio::spawn might be better if the caller wants to control concurrency, as spawning is heavyweight when the deserialization work happens to be small. However, as mentioned above, this can be complex to reason about, and in some cases, may be as heavyweight as spawn_blocking
    • spawn_blocking, at least in some executors, cannot be cancelled, a departure from the prototypical cancellation story in async Rust.
    • "Dependently blocking work" in the context of async programming is a hard problem to solve generally. https://github.com/async-rs/async-std/pull/631 was an attempt but the details are making runtime's agnostic blocking are extremely complex.
    • The way this problem manifests may be subtle, and it may be a specific production load that triggers it.
    • The outlined solutions have tradeoffs that each only make sense for certain kinds of workloads. It may be better to expose the I/O aspect of the request and the deserialization aspect as separate APIs, but that complicates the library's usage and lays the burden of choosing the tradeoff on the caller (which may not be generally possible).

What are the morals of the story?

  • Producing concurrent, performant code in async Rust is not always trivial. Debugging performance issues can be difficult.
  • Rust's async model, particularly the blocking nature of polling, can be complex to reason about, and in some cases differs from other languages' choices in meaningful ways.
  • CPU-bound code can be easily hidden.

What are the sources for this story?

  • This is an issue I personally hit while writing code required for production.

Why did you choose Barbara to tell this story?

That's probably the person in the cast that I am most similar to, but Alan and to some extent Grace make sense for the story as well.

How would this story have played out differently for the other characters?

  • Alan: May have taken longer to figure out.
  • Grace: Likely would have been just as interested in the details of how polling works.
  • Niklaus: Depends on their experience.

๐Ÿ˜ฑ Status quo stories: Barbara gets burned by select

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Barbara is working on the [YouBuy] server. In one particular part of the server, she has a process that has to load records from a database on disk. As she receives data from the database, the data is sent into a channel for later processing. She writes an async fn that looks something like this:


async fn read_send(db: &mut Database, channel: &mut Sender<...>) {
    loop {
        let data = read_next(db).await;
        let items = parse(&data);
        for item in items {
            channel.send(item).await;
        }
    }
}

This database load has to take place while also fielding requests from the user. The routine that invokes read_send uses select! for this purpose. It looks something like this:


let mut db = ...;
let mut channel = ...;
loop {
    futures::select! {
        _ = read_send(&mut db, &mut channel) => {},
        some_data = socket.read_packet() => {
            // ...
        }
    }
}

This setup seems to work well a lot of the time, but Barbara notices that the data getting processed is sometimes incomplete. It seems to be randomly missing some of the rows from the middle of the database, or individual items from a row.

Debugging

She's not sure what could be going wrong! She starts debugging with print-outs and logging. Eventually she realizes the problem: whenever a packet arrives on the socket, the select! macro drops the other futures. This can sometimes cause the read_send function to be canceled in between reading the data from the disk and sending the items over the channel. Ugh!

Barbara has a hard time figuring out the best way to fix this problem.
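
One direction she considers (a sketch, not taken from the original story; it assumes db and channel can be moved into a 'static task) is to spawn read_send as its own task, so that select! borrows a JoinHandle rather than owning the future, and a completed socket arm no longer cancels the read/send work mid-operation:

let mut read_task = tokio::spawn(async move {
    read_send(&mut db, &mut channel).await;
});

loop {
    tokio::select! {
        // Borrowing the JoinHandle means the spawned task itself is not
        // dropped when the other arm finishes first.
        _ = &mut read_task => break,
        some_data = socket.read_packet() => {
            // ...
        }
    }
}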

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • Cancellation doesn't always cancel the entire task; particularly with select!, it sometimes cancels just a small piece of a given task.
    • This is in tension with Rust's original design, which was meant to tear down an entire thread or task at once, precisely because of the challenge of writing exception-safe code.
  • Cancellation in Async Rust therefore can require fine-grained recovery.

What are the sources for this story?

This was based on tomaka's blog post, which also includes a number of possible solutions, all of them quite grungy.

Why did you choose Barbara to tell this story?

The problem described here could strike anyone, including veteran Rust users. It's a subtle interaction that is independent of source language. Also, the original person who reported it, tomaka, is a veteran Rust user.

How would this story have played out differently for the other characters?

They would likely have a hard time diagnosing the problem. It really depends on how well they have come to understand the semantics of cancellation. This is fairly independent from programming language background; knowing non-async Rust doesn't help in particular, as this concept is specific to async code.

What is different between this story and other cancellation stories?

There is already a story, "Alan builds a cache", that covers some of the challenges around cancellation. It is quite plausible that those stories could be combined, but the focus of this story is different. The key moral of this story is that certain combinators, notably select!, can cause small pieces of a single task to be torn down and canceled. This cancellation can occur for any reason -- it is not always associated with (for example) clients timing out or closing sockets. It might be (as in this story) the result of clients sending data!

This is one key point that makes cancellation in async Rust rather different than panics in sync Rust. Panics in sync Rust generally occur for bugs, to start, and they are typically not meant to be recovered from except at a coarse-grained level. In contrast, as this story shows, cancellation can require fine-grained recovery and for non-bug events.

[YouBuy]: ../projects/YouBuy.md

๐Ÿ˜ฑ Status quo stories: Barbara makes their first foray into async

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

Barbara's first big project in Rust: a journey marred by doubt

It's Barbara's last year at their university, and for their master's thesis they have chosen to create a distributed database. They have chosen to use their favorite language, Rust, because Rust is a suitable language for low-latency applications and one they have found very pleasant to work in. Their project presents quite a challenge, since they have only written some small algorithms in Rust, and it's also their first foray into creating a big distributed system.

Deciding to use Async

Up until now, Barbara has followed the development of Async from afar by reading the occasional Boats blog post, and celebrating the release announcements with the rest of the happy community. Due to never having worked with async in other languages, and not having had a project suitable for async experimentation, their understanding of async and its ecosystem remained superficial. However, since they have heard that async is suitable for fast networked applications, they decide to try using async for their distributed database. After all, a fast networked application is exactly what they are trying to make.

To further solidify the decision of using async, Barbara goes looking for some information and opinions on async in Rust. Doubts created by reading some tweets about how most people should be using threads instead of async for simplicity reasons are quickly washed away by helpful conversations on the Rust discord.

Learning about Async

Still enamored with the first edition of the Rust book, they decide to go looking for an updated version, hoping that it will teach them async in the same manner that it taught them so much about the language and design patterns for Rust. Disappointed, they find no mention of async in the book, aside from a note that it exists as a keyword.

Not to be deterred, they go looking further for similarly great documentation about async. After stumbling upon the async book, their disappointment is briefly replaced with relief, as the async book does a good job of solidifying what they have already learned in various blog posts about async: why one would use it, and even a bit about how it all works under the hood. They skim over the parts that seem a bit too in-depth for now, like pinning, as they're looking to quickly get their hands dirty. Chapter 8: The Async Ecosystem teaches them what they already picked up on through blog posts and contentious tweets: the choice of runtime has large implications on what libraries they can use.

The wrong time for big decisions

Barbara's dreams of quickly getting their hands dirty with async Rust are shattered as they discover that they first need to make a big choice: which executor to use. Having had quite a bit of exposure to the conversations surrounding the incompatible ecosystems, Barbara is perhaps a bit more paranoid about making the wrong choice than the average newcomer. This feels like a big decision to them, as it would influence the libraries they could use, and switching to a different ecosystem would be all but impossible after a while. Since they would like to choose their libraries before having to choose an executor, Barbara feels like the decision-making has been turned on its head.

Their paranoia about choosing the right ecosystem is eased after a few days of research, and some more conversations on the Rust subreddit, after which they discover that most of the RPC libraries they might want to use are situated within the most popular ecosystem, Tokio, anyway. Tokio also has a brief tutorial, which teaches them some basic concepts within Tokio and talks a bit more about async in general.

Woes of a newcomer to async

Being reasonably confident in their choice of ecosystem, Barbara starts building their distributed system. After a while, they want to introduce another networking library whose API isn't async. Luckily, Barbara had picked up from reading blog posts that blocking is not allowed in async code (or at least not in any of the currently existing executors). More reddit discussions point them towards spawn_blocking in Tokio, and even rayon. But they're none the wiser about how to apply these paradigms in a neat manner.

Previously the design patterns learned in other languages, combined with the patterns taught in the book, were usually sufficient to come to reasonably neat designs. But neither their previous experience, nor the async book nor the Tokio tutorial were of much use when trying to neatly incorporate blocking code into their previously fully async project.
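
For illustration, the basic shape those pointers suggest looks roughly like this (a sketch; blocking_lib::list_peers is a made-up stand-in for a call into the synchronous networking library):

// Hypothetical wrapper: run the synchronous library call on tokio's
// blocking thread pool so the async executor is never blocked.
// `blocking_lib` is invented for this example.
async fn fetch_peers(addr: String) -> std::io::Result<Vec<String>> {
    tokio::task::spawn_blocking(move || blocking_lib::list_peers(&addr))
        .await
        .expect("blocking task panicked")
}

But knowing this shape exists is not the same as knowing when and where to apply it across a whole codebase.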

Confused ever after

To this day, the lack of a blessed approach leaves Barbara unsure about the choices they've made so far and the misconceptions they might still have, evermore wondering whether the tweets they read about how most people should just stick to threads were right all along.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • When entering Rust's async world without previous async experience, and with no reference points for what good async design patterns look like, getting started with async can be a bit overwhelming.
  • Other languages which only have a single ecosystem seem to have a much better story for beginners, since there's no fear of lock-in, or ecosystem FOMO about making the wrong choices early on.
  • This lack of documentation on design patterns, and of solid guidance about the async ecosystem for unopinionated newcomers, is partially made up for by Rust's community, which often provides educated opinions on the design and technical choices one should make. Because of this, getting started in async favors those who know where to find answers about Rust: blogs, Discord, Reddit, etc.

What are the sources for their story?

This is based on the author's personal experience.

What documentation did the character read during this story?

  • Various blog posts of withoutboats
  • A blog post which spurred a lot of discussion about blocking in async: https://async.rs/blog/stop-worrying-about-blocking-the-new-async-std-runtime/
  • A nice blog post about blocking in Tokio, which still doesn't have any nice design patterns: https://ryhl.io/blog/async-what-is-blocking/
  • An example of design patterns being discussed for sync Rust in the book: https://doc.rust-lang.org/book/ch17-03-oo-design-patterns.html#trade-offs-of-the-state-pattern
  • Perhaps I should've read a bit more of Niko's blogs and his async interviews.

Why did you choose Barbara to tell their story?

Like the author of this story, Barbara had previous experience with Rust. Knowing where to find the community also played a significant part in this story. This story could be construed as how Barbara got started with async while starting to maintain some async projects.

How would their story have played out differently for the other characters?

  • Characters with previous async experience would probably have had a better experience getting started with async in Rust since they might know what design patterns to apply to async code. On the other hand, since Rust's async story is noticeably different from other languages, having async experience in other languages might even be harmful by requiring the user to unlearn certain habits. I don't know if this is actually the case since I don't have any experience with async in other languages.
  • Characters who are less in touch with Rust's community than Barbara might have had a much worse time, since just skimming over the documentation might leave them lost and unaware of common pitfalls. On the other hand, not having learned a lot about async through blog posts and other materials might compel someone to read the documentation more thoroughly.

๐Ÿ˜ฑ Status quo stories: Barbara needs Async Helpers

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Barbara, an experienced Rust user, is prototyping an async Rust service for work. To get things working quickly, she decides to prototype in tokio, since it is unclear which runtime her work will use.

She starts by adding warp and tokio to her dependencies list. She notices that warp suggests using tokio with the full feature. She's a bit concerned about how this might affect compile times, and about whether all of tokio is really needed for her little project, but she pushes forward.

As she builds out functionality, she's pleased to see tokio provides a bunch of helpers like join! and async versions of the standard library types like channels and mutexes.

After completing one endpoint, she moves to a new one which requires streaming HTTP responses to the client. Barbara quickly finds out from the tokio docs that it does not provide a stream type, so she adds tokio-stream to her dependencies.

Moving on, she tries to make some functions generic over the underlying web framework by abstracting the functionality into a trait. She writes an async function inside the trait, just like a normal function.


trait Client {
    async fn get();
}

Then she gets a helpful error message.

error[E0706]: functions in traits cannot be declared `async`
 --> src/lib.rs:2:5
  |
2 |     async fn get();
  |     -----^^^^^^^^^^
  |     |
  |     `async` because of this
  |
  = note: `async` trait functions are not currently supported
  = note: consider using the `async-trait` crate: https://crates.io/crates/async-trait

She then realizes that Rust doesn't support async functions in traits yet, so she adds async-trait to her dependencies.
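
With async-trait, her trait would look something like this (a minimal sketch; HttpClient is an invented implementor for illustration -- the attribute goes on the trait and on each impl):

use async_trait::async_trait;

#[async_trait]
trait Client {
    async fn get();
}

// Invented example implementor.
struct HttpClient;

#[async_trait]
impl Client for HttpClient {
    async fn get() {
        // ...
    }
}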

Some of her functions are recursive, and she wanted them to be async functions, so she sprinkles some async/.await keywords in those functions.


async fn sum(n: usize) -> usize {
    if n == 0 {
        0
    } else {
        n + sum(n - 1).await
    }
}

Then she gets an error message.

error[E0733]: recursion in an `async fn` requires boxing
 --> src/lib.rs:1:27
  |
1 | async fn sum(n: usize) -> usize {
  |                           ^^^^^ recursive `async fn`
  |
  = note: a recursive `async fn` must be rewritten to return a boxed `dyn Future`

So, to make these functions async, she starts boxing her futures the hard way, fighting with the compiler. She knows that the async keyword is more or less sugar for impl Future, so she tries the following at first.


use std::future::Future;

fn sum(n: usize) -> Box<dyn Future<Output = usize>> {
    Box::new(async move {
        if n == 0 {
            0
        } else {
            n + sum(n - 1).await
        }
    })
}

The compiler gives the following error.

error[E0277]: `dyn Future<Output = usize>` cannot be unpinned
  --> src/main.rs:11:17
   |
11 |             n + sum(n - 1).await
   |                 ^^^^^^^^^^^^^^^^ the trait `Unpin` is not implemented for `dyn Future<Output = usize>`
   |
   = note: required because of the requirements on the impl of `Future` for `Box<dyn Future<Output = usize>>`
   = note: required by `poll`

She then reads about Unpin and Pin, and finally comes up with a solution.


use std::future::Future;
use std::pin::Pin;

fn sum(n: usize) -> Pin<Box<dyn Future<Output = usize>>> {
    Box::pin(async move {
        if n == 0 {
            0
        } else {
            n + sum(n - 1).await
        }
    })
}

The code works!

She searches online for better methods and finds the async book. She reads about recursion and finds a cleaner approach using the futures crate.


use futures::future::{BoxFuture, FutureExt};

fn sum(n: usize) -> BoxFuture<'static, usize> {
    async move {
        if n == 0 {
            0
        } else {
            n + sum(n - 1).await
        }
    }
    .boxed()
}

She also asks one of her peers for a code review asynchronously, and after awaiting their response, she learns about the async-recursion crate. She adds async-recursion to her dependencies, and now she can write the following, which seems reasonably clean:


use async_recursion::async_recursion;

#[async_recursion]
async fn sum(n: usize) -> usize {
    if n == 0 {
        0
    } else {
        n + sum(n - 1).await
    }
}

As she is working, she realizes that what she really needs is to write a Stream of data. She starts trying to write her Stream implementation and spends several hours banging her head against her desk in frustration (her challenges are pretty similar to what Alan experienced). Ultimately she's stuck trying to figure out why her &mut self.foo call is giving her errors:

error[E0277]: `R` cannot be unpinned
  --> src/main.rs:52:26
   |
52 |                 Pin::new(&mut self.reader).poll_read(cx, buf)
   |                          ^^^^^^^^^^^^^^^^ the trait `Unpin` is not implemented for `R`
   |
   = note: required by `Pin::<P>::new`
help: consider further restricting this bound
   |
40 |     R: AsyncRead + Unpin,
   |                  ^^^^^^^

Fortunately, that weekend, @fasterthanlime publishes a blog post covering the gory details of Pin. Reading that post, she learns about pin-project, which she adds as a dependency. She's able to get her code working, but it's kind of a mess. Feeling quite proud of herself, she shows it to a friend, and they suggest that maybe she ought to try the async-stream crate. Reading that, she realizes she can use this crate to simplify some of her streams, though not all of them fit.
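
For a flavor of what pin-project enables, here is a heavily simplified sketch (Chunks is an invented type, not Barbara's actual code): the #[pin] attribute lets .project() hand back a Pin<&mut R>, so no R: Unpin bound is needed.

use std::pin::Pin;
use std::task::{Context, Poll};

use futures::io::AsyncRead;
use futures::stream::Stream;
use pin_project::pin_project;

// Invented example: stream chunks read from any AsyncRead.
#[pin_project]
struct Chunks<R> {
    #[pin]
    reader: R,
}

impl<R: AsyncRead> Stream for Chunks<R> {
    type Item = std::io::Result<Vec<u8>>;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        // project() yields Pin<&mut R> for the #[pin] field, so we can
        // call poll_read without requiring `R: Unpin`.
        let this = self.project();
        let mut buf = [0u8; 1024];
        match this.reader.poll_read(cx, &mut buf) {
            Poll::Pending => Poll::Pending,
            Poll::Ready(Ok(0)) => Poll::Ready(None), // EOF ends the stream
            Poll::Ready(Ok(n)) => Poll::Ready(Some(Ok(buf[..n].to_vec()))),
            Poll::Ready(Err(e)) => Poll::Ready(Some(Err(e))),
        }
    }
}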

"Finally!", Barbara says, breathing a sigh of relief. She is done with her prototype, and shows it off at work, but to her dismay, the team decides that they need to use a custom runtime for their use case. They're building an embedded system and it has relatively limited resources. Barbara thinks, "No problem, it should be easy enough to change runtimes, right?"

So now Barbara starts the journey of replacing tokio with a myriad of off-the-shelf and custom helpers. She can't use warp, so now she has to find an alternative. She also has to find a new channel implementation, and there are a few options:

  • In futures
  • async-std has one, but it seems to be tied to another runtime so she can't use that.
  • smol has one that is independent.

This process of "figure out which alternative is an option" is repeated many times. She also tries to use the select! macro from futures but it requires more pinning and workarounds (not to mention a stack overflow or two).

But Barbara fights through all of it. In the end, she gets it to work, but she realizes that she has a ton of random dependencies and the associated compilation time. She wonders whether all these dependencies will have a negative effect on the binary size. She also had to rewrite some bits of functionality on her own.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • Functionality is found either in "framework"-like crates (e.g., tokio) or spread across many different ecosystem crates.
  • It's sometimes difficult to discover where this functionality lives.
  • Additionally, the trouble with non-runtime-agnostic libraries becomes very apparent.
  • Helpers and utilities might have analogues across the ecosystem, but they differ in subtle ways.
  • Some patterns are clean if you know the right utility crate, and very painful otherwise.

What are the sources for this story?

Issue 105

What are helper functions/macros?

They are functions/macros that help with certain basic pieces of functionality and features, such as awaiting multiple futures concurrently (join! in tokio), or racing futures and taking the result of the one that finishes first.
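
For instance, here is a small sketch of both helper shapes, using tokio's join! and select! macros:

use tokio::time::{sleep, Duration};

#[tokio::main]
async fn main() {
    // join! waits for *all* of the given futures to complete.
    let (a, b) = tokio::join!(
        async { sleep(Duration::from_millis(10)).await; 1 },
        async { sleep(Duration::from_millis(20)).await; 2 },
    );
    println!("join!: {} {}", a, b);

    // select! races the futures and takes whichever finishes first.
    let first = tokio::select! {
        x = async { sleep(Duration::from_millis(10)).await; "fast" } => x,
        x = async { sleep(Duration::from_millis(20)).await; "slow" } => x,
    };
    println!("select!: {}", first);
}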

Will there be a difference if lifetimes are involved in async recursion functions?

Lifetimes would make it a bit more difficult, although for simple functions it shouldn't be much of a problem:


use std::future::Future;
use std::pin::Pin;

fn concat<'a>(string: &'a mut String, slice: &'a str) -> Pin<Box<dyn Future<Output = ()> + 'a>> {
    Box::pin(async move {
        if !slice.is_empty() {
            string.push_str(&slice[0..1]);
            concat(string, &slice[1..]).await;
        }
    })
}

Why did you choose Barbara to tell this story?

This particular issue impacts all users of Rust even (and sometimes especially) experienced ones.

How would this story have played out differently for the other characters?

Other characters may not know all their options and hence might have fewer problems as a result.

๐Ÿ˜ฑ Status quo stories: Barbara plays with async

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Barbara has been following async rust for a long time, in eager anticipation of writing some project using async. The last time she tried to do anything with futures in rust was more than a year ago (before async functions), when you had to chain futures together with many calls to then (often leading to inscrutable error messages hundreds of characters long). That was not a pleasant experience for Barbara.

After watching the development of rust async/await (by following discussions on /r/rust and the internals forums), she wants to start to play around with writing async code. Before starting on any real project, she starts with a "playground" where she can try to write some simple async rust code to see how it feels and how it compares to how async code feels in other languages she knows (like C# and JavaScript).

She starts by opening a blank project in VSCode with rust-analyzer. Because she's been following the overall state of rust async, she knows that she needs a runtime, and quickly decides to use tokio, because she knows it's quite popular and well documented.

After taking in the considerable length of the tokio tutorial, she decides not to read most of it right now and tries to dive right into writing code. But she does look at the "Hello Tokio" section, which shows what feature flags are required by tokio:

[dependencies]
tokio = { version = "1", features = ["full"] }

Poking around the tokio API docs in search of something to play with, she sees a simple future that looks interesting: the sleep future that will wait for a certain duration to elapse before resolving.

Borrowing again from the "Hello Tokio" tutorial to make sure she has the correct spelling for the tokio macros, she writes up the following code:

use std::time::Duration;

use rand::distributions::Uniform;
use rand::{thread_rng, Rng};

#[tokio::main]
pub async fn main() {
    let mut rng = thread_rng();
    let t = Uniform::new(100, 5000);

    let mut futures = Vec::new();
    for _ in 0..10 {
        let delay = rng.sample(t);
        futures.push(tokio::time::sleep(Duration::from_millis(delay)));
    }
    println!("Created 10 futures");

    for f in futures {
        f.await;
    }

    println!("Done waiting for all futures");
}

This very first version she wrote compiled on the first try and had no errors when running it. Barbara was pleased about this.

However, this example is pretty boring. The program just sits there for a few seconds doing nothing, and giving no hints about what it's actually doing. So for the next iteration, Barbara wants to have a message printed out when each future is resolved. She tries this code at first:


use futures::FutureExt;

let mut futures = Vec::new();
for _ in 0..10 {
    let delay = rng.sample(t);
    futures.push(tokio::time::sleep(Duration::from_millis(delay)).then(|_| {
        println!("Done!");
    }));
}
println!("Created 10 futures");

But the compiler gives this error:

error[E0277]: `()` is not a future
  --> src\main.rs:13:71
   |
13 |         futures.push(tokio::time::sleep(Duration::from_millis(delay)).then(|_| {
   |                                                                       ^^^^ `()` is not a future
   |
   = help: the trait `futures::Future` is not implemented for `()`

Even though the error is pointing at the then function, Barbara pretty quickly recognizes the problem -- her closure needs to return a future, but () is not a future (though she wonders, "why not?"). Looking at the tokio docs is not very helpful: the Future trait isn't even defined there. So she looks at the docs for the Future trait in the rust standard library and sees it only has 5 implementors; one of them is called Ready, which looks interesting. Indeed, this struct is a future that will resolve instantly, which is what she wants:


for _ in 0..10 {
    let delay = rng.sample(t);
    futures.push(tokio::time::sleep(Duration::from_millis(delay)).then(|_| {
        println!("Done!");
        std::future::ready(())
    }));
}

This compiles without error, but when Barbara runs the code, the output surprises her a little bit: after starting the program, nothing happened for about 4 seconds. Then the first "Done!" message was printed, followed very quickly by the other 9 messages. Based on the code she wrote, she expected 10 "Done!" messages to be printed to the console over the span of about 5 seconds, with roughly a uniform distribution.

After running the program a few more times, she always observes that while the first few messages are printed after some delay, the last few messages are always printed all at once.

Barbara has experience writing async code in JavaScript, so she thinks for a moment about how this toy code might have looked if she were using JS:

async function main() {
    const futures = [];
    for (let idx = 0; idx < 10; idx++) {
        const delay = 100 + (Math.random() * 4900);
        const f = new Promise((resolve) => {
            setTimeout(() => { console.log("Done!"); resolve(); }, delay);
        });
        futures.push(f);
    }

    await Promise.all(futures);
}

After imagining this code, Barbara has an "ah-ha!" moment, and realizes the problem is likely how she is waiting for the futures in her rust code. In her rust code, she is waiting for the futures one-by-one, but in the JavaScript code she is waiting for all of them simultaneously.

So Barbara looks for a way to wait for a Vec of futures. After a bunch of searching in the tokio docs, she finds nothing. The closest thing she finds is a join! macro, but this appears to only work on individually specified futures, not a Vec of futures.

Disappointed, she then looks at the future module from the rust standard library, but that module is tiny and very clearly doesn't have what she wants. Then Barbara has another "ah-ha!" moment and remembers that there's a 3rd-party crate called "futures" on crates.io that she's seen mentioned in some /r/rust posts. She checks its docs and finds the join_all function, which looks like what she wants:


let mut futures = Vec::new();
for _ in 0..10 {
    let delay = rng.sample(t);
    futures.push(tokio::time::sleep(Duration::from_millis(delay)).then(|_| {
        println!("Done!");
        std::future::ready(())
    }));
}
println!("Created 10 futures");

futures::future::join_all(futures).await;
println!("Done");

It works exactly as expected now! After having written the code, Barbara begins to remember an important detail about rust futures that she once read somewhere: rust futures are lazy, and won't make progress unless you await them.
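
A tiny sketch of that laziness (nothing is printed by the async block until the future is awaited):

#[tokio::main]
async fn main() {
    // Constructing the future does nothing yet...
    let fut = async { println!("side effect!") };
    println!("created the future");
    // ...the body only runs once it is awaited.
    fut.await;
}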

Happy with this success, Barbara continues to expand her toy program by making a few small adjustments:


for counter in 0..10 {
    let delay = rng.sample(t);
    let delay_future = tokio::time::sleep(Duration::from_millis(delay));

    if counter < 9 {
        futures.push(delay_future.then(|_| {
            println!("Done!");
            std::future::ready(())
        }));
    } else {
        futures.push(delay_future.then(|_| {
            println!("Done with the last future!");
            std::future::ready(())
        }));
    }
}

This fails to compile:

error[E0308]: mismatched types

   = note: expected closure `[closure@src\main.rs:16:44: 19:14]`
              found closure `[closure@src\main.rs:21:44: 24:14]`
   = note: no two closures, even if identical, have the same type
   = help: consider boxing your closure and/or using it as a trait object

This error doesn't actually surprise Barbara that much, as she is familiar with the idea of having to box objects sometimes. She does notice the "consider boxing your closure" suggestion, but thinks that this is not likely the correct solution. Instead, she thinks that she should box the entire future.

She first adds explicit type annotations to the Vec:


let mut futures: Vec<Box<dyn Future<Output = ()>>> = Vec::new();

She then notices that her IDE (VSCode + rust-analyzer) has a new error on each call to push. The code assist on each error says "Store this in the heap by calling 'Box::new'". This is exactly what she wants, and she is happy that rust-analyzer handled this case perfectly.

Now each future is boxed up, but there is one final error still, this time on the call to join_all(futures).await:

error[E0277]: `dyn futures::Future<Output = ()>` cannot be unpinned
  --> src\main.rs:34:31
   |
34 |     futures::future::join_all(futures).await;

Barbara has been around rust for long enough to know that there is a Box::pin API, but she doesn't really understand what it does, nor does she have a good intuition about what this API is for. But she is accustomed to just trying things in rust to see if they work. And indeed, after changing Box::new to Box::pin:


futures.push(Box::pin(delay_future.then(|_| {
    println!("Done!");
    std::future::ready(())
})));

and adjusting the type of the Vec:


let mut futures: Vec<Pin<Box<dyn Future<Output = ()>>>> = Vec::new();

the code compiles and runs successfully.

But even though the code is now working correctly, she wishes she had a better idea of why pinning is necessary here, and feels a little uneasy using something she doesn't yet understand well.

As one final task, Barbara wants to try to replace the chained call to then with an async block. She remembers that these were a big deal in a recent release of rust, and that they looked a lot nicer than a long chain of then calls. She doesn't remember the exact syntax, but she read a blog post about async rust a few weeks ago and has a vague idea of how it looks.

She tries writing this:


futures.push(Box::pin(async || {
    tokio::time::sleep(Duration::from_millis(delay)).await;
    println!("Done after {}ms", delay);
}));

The compiler gives an error:

error[E0658]: async closures are unstable
  --> src\main.rs:14:31
   |
14 |         futures.push(Box::pin(async || {
   |                               ^^^^^
   |
   = note: see issue #62290 <https://github.com/rust-lang/rust/issues/62290> for more information
   = help: add `#![feature(async_closure)]` to the crate attributes to enable
   = help: to use an async block, remove the `||`: `async {`

Barbara knows that async is stable, and using a nightly feature isn't what she wants. So she tries the suggestion made by the compiler and removes the || bars:


futures.push(Box::pin(async {
    tokio::time::sleep(Duration::from_millis(delay)).await;
    println!("Done after {}ms", delay);
}));

A new error this time:

error[E0597]: `delay` does not live long enough
15 | |             tokio::time::sleep(Duration::from_millis(delay)).await;
   | |                                                      ^^^^^ borrowed value does not live long enough

This is an error that Barbara is very familiar with. If she were working with a closure, she knows she could use a move closure (since her delay type is Copy). But she's not using a closure (she just tried that, and the compiler told her to switch to an async block). Still, Barbara's experience tells her that rust is a very consistent language: maybe the same keyword used in move closures will work here? She tries it:


futures.push(Box::pin(async move {
    tokio::time::sleep(Duration::from_millis(delay)).await;
    println!("Done after {}ms", delay);
}));

It works! Satisfied but still thinking about async rust, Barbara takes a break to eat a cookie.

๐Ÿค” Frequently Asked Questions

Here are some standard FAQ to get you started. Feel free to add more!

Why did you choose Barbara to tell this story?

Barbara has years of rust experience that she brings to bear in her async learning experiences.

What are the morals of the story?

  • Due to Barbara's long experience with rust, she knows most of the language pretty well (except for things like async, and advanced concepts like pinned objects). She generally trusts the rust compiler, and she's learned over the years that she can learn how to use an unfamiliar library by reading the API docs. As long as she can get the types to line up and the code to compile, things generally work as she expects.

    But this is not the case with rust async:

    • There can be new syntax to learn (e.g. async blocks)
    • It can be hard to find basic functionality (like futures::future::join_all)
    • It's not always clear how the ecosystem all fits together (what functionality is part of tokio? What is part of the standard library? What is part of other crates like the futures crate?)
    • Sometimes it looks like there are multiple ways to do something:
      • What's the difference between futures::future::Future and std::future::Future?
      • What's the difference between tokio::time::Instant and std::time::Instant?
      • What's the difference between std::future::ready and futures::future::ok?
  • Barbara has a lot to learn. Her usual methods of learning how to use new crates don't really work when learning tokio and async. She wonders if she actually should have read the long tokio tutorial before starting. She realizes it will take her a while to build up the necessary foundation of knowledge before she can be proficient in async rust.

  • There were several times where the compiler or the IDE gave helpful error messages and Barbara appreciated these a lot.

What are the sources for this story?

Personal experiences of the author

How would this story have played out differently for the other characters?

Other characters would likely have written all the same code as Barbara, and probably would have run into the same problems. But other characters might have needed quite a bit longer to get to the solution.

For example, it was Barbara's experience with move-closures that led her to try adding the move keyword to the async block. And it was her general "ambient knowledge" of things that allowed her to remember that things like the futures crate exist. Other characters would have likely needed to resort to an internet search or asking on a rust community.

๐Ÿ˜ฑ Status quo stories: Barbara tries async streams

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

The story

Barbara has years of experience in Rust and was looking forward to using some of that experience with the brand-new async functionality. Async/await had been a dream of Rust for so long, and it was finally here!

As she began her next side project, she would quickly partner up with other experienced Rust developers. One of these Rust developers, who had more async experience than Barbara, suggested they use 'async streams' as the core abstraction for this project. Barbara trusted the experience of this other developer. Though she didn't yet understand how async streams worked, she was happy to go along with the decision and build her experience over time.

Month after month, the side project grew in scope and number of users. Potential contributors would try to contribute, but some would leave because they found the combination of concepts and the additional set of borrowchecker-friendly code patterns difficult to understand and master. Barbara was frustrated to lose potential contributors but kept going.

Users also began to discover performance bottlenecks as they pushed the system harder. Barbara, determined to help the users as best she could, pulled her thinking cap tight and started to probe the codebase.

In her investigations, she experimented with adding parallelism to the async stream. She knew that if she called .next() twice, in theory she should have two separate futures. There were a few ways to run multiple futures in parallel, so this seemed like it might pan out to be a useful way of leveraging the existing architecture.

Unfortunately, to Barbara's chagrin, async streams do not support this kind of activity. Each .next() must be awaited before the ownership system will let her get the next value in the stream. Effectively, this collapsed the model into a synchronous iterator with a more modern scent. Barbara was frustrated, and started to clarify her understanding of what asynchrony actually meant, looking through the implementations of these abstractions.

When she was satisfied, she took a step back and thought for a moment. If optional parallelism was a potential win and the core data processing system actually was going to run synchronously anyway -- despite using async/await extensively in the project -- perhaps it would make more sense to redesign the core abstraction.

With that, Barbara set off to experiment with a new engine for her project. The new engine focused on standard iterators and rayon instead of async streams. As a result, the code was much easier for new users, as iterators are well-understood and have good error messages. Just as importantly, the code was noticeably faster than its async counterpart. Barbara benchmarked a variety of cases to be sure, and always found that the new, simpler approach performed better than the async stream original.
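
As a rough illustration of the redesign (the types and per-record work here are invented, not Barbara's actual engine), rayon lets an ordinary iterator pipeline spread across all cores:

use rayon::prelude::*;

// Stand-in for the real per-record work.
fn process(n: u64) -> u64 {
    n * n
}

fn main() {
    let rows: Vec<u64> = (0..1_000).collect();
    // into_par_iter() distributes the map across rayon's thread pool.
    let outputs: Vec<u64> = rows.into_par_iter().map(process).collect();
    println!("processed {} rows", outputs.len());
}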

To help those who followed after her, Barbara sat down to write out her experiences to share with the rest of the world. Perhaps future engineers might learn from the twists and turns her project took.

๐Ÿค” Frequently Asked Questions

Here are some standard FAQ to get you started. Feel free to add more!

What are the morals of the story?

  • Easy to get the wrong idea. The current state of documentation does not make the use cases clear, so it's easy to grab this abstraction because it's the closest one that fits.
  • Async streams are just iterators. Async streams do not offer useful asynchrony in and of themselves. A possible help here might be renaming "async streams" to "async iterators" to underscore their use case and help developers more quickly understand their limitations.
  • A single async stream cannot be operated on in parallel. Streams open up asynchrony only during the .next() step and are unable to offer asynchrony between steps (e.g., by calling .next() twice and operating on the resulting Futures); concurrency has to be introduced explicitly, as sketched below.
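
For contrast, here is a small sketch (an invented example, not from the story) of the concurrency that is available: a stream whose items are futures can be driven concurrently with buffer_unordered, even though .next().await on a plain async stream yields only one value at a time:

use std::time::Duration;

use futures::stream::{self, StreamExt};

#[tokio::main]
async fn main() {
    let results: Vec<u64> = stream::iter(0..10u64)
        .map(|n| async move {
            // Each item becomes a future; buffer_unordered below polls up
            // to 4 of them at once.
            tokio::time::sleep(Duration::from_millis(10 * n)).await;
            n * 2
        })
        .buffer_unordered(4)
        .collect()
        .await;
    println!("{:?}", results);
}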

What are the sources for this story?

Why did you choose Barbara to tell this story?

Barbara is an experienced engineer who may come to async streams and async/await in general with a partially-incorrect baseline understanding. It may take her time to understand and see more clearly where her model was wrong, because parts of it resemble other experiences she's had. For example, Rust futures differ from C++ futures and do not offer the same style of asynchrony. Terms like "streams" sound like they may have more internal functionality, and it would be easy for an experienced developer to trip up with the wrong starting assumption.

How would this story have played out differently for the other characters?

  • Alan may have come to a similar idea for an architecture, as async/await is popular in languages like JavaScript and C#. Once Alan attempted to use asynchrony between units of work, namely using async streams, this is where Alan may have failed. The amount of Rust one has to know to succeed here is quite high and includes understanding Arc, Pin, Streams, traits/adapters, the borrowchecker and dealing with different types of errors, and more.
  • Grace may have chosen a different core abstraction from the start. She has a good chance of thinking through how she'd like the data processing system to work. It's possible she would have found threads and channels a better fit. This would have had different trade-offs.
  • Niklaus may have also tried to go down the async stream path. The information available is mixed and hype around async/await is too strong. This makes it shine brighter than it should. Without experience with different systems languages to temper the direction, the most likely path would be to experiment with asynchrony and hope that "underneath the surface it does the right thing."

๐Ÿ˜ฑ Status quo stories: Barbara trims a stacktrace

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Barbara is triaging the reported bugs for her SLOW library. For each bug, she tries to quickly see if she can diagnose the basic area of code that is affected so she knows which people to ping to help fix it. She opens a bug report from a user complaining about a panic when too many connections arrive at the same time. The bug report includes a backtrace from the user's code, and it looks like this:

thread 'main' panicked at 'something bad happened here', src/main.rs:16:5
stack backtrace:
   0: std::panicking::begin_panic
             at /home/serg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:519:12
   1: slow_rs::process_one::{{closure}}
             at ./src/main.rs:16:5
   2: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /home/serg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/mod.rs:80:19
   3: slow_rs::process_many::{{closure}}
             at ./src/main.rs:10:5
   4: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /home/serg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/mod.rs:80:19
   5: slow_rs::main::{{closure}}::{{closure}}
             at ./src/main.rs:4:9
   6: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /home/serg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/mod.rs:80:19
   7: slow_rs::main::{{closure}}
             at ./src/main.rs:3:5
   8: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /home/serg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/mod.rs:80:19
   9: tokio::park::thread::CachedParkThread::block_on::{{closure}}
             at /home/serg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.3.0/src/park/thread.rs:263:54
  10: tokio::coop::with_budget::{{closure}}
             at /home/serg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.3.0/src/coop.rs:106:9
  11: std::thread::local::LocalKey<T>::try_with
             at /home/serg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:272:16
  12: std::thread::local::LocalKey<T>::with
             at /home/serg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:248:9
  13: tokio::coop::with_budget
             at /home/serg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.3.0/src/coop.rs:99:5
  14: tokio::coop::budget
             at /home/serg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.3.0/src/coop.rs:76:5
  15: tokio::park::thread::CachedParkThread::block_on
             at /home/serg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.3.0/src/park/thread.rs:263:31
  16: tokio::runtime::enter::Enter::block_on
             at /home/serg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.3.0/src/runtime/enter.rs:151:13
  17: tokio::runtime::thread_pool::ThreadPool::block_on
             at /home/serg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.3.0/src/runtime/thread_pool/mod.rs:71:9
  18: tokio::runtime::Runtime::block_on
             at /home/serg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.3.0/src/runtime/mod.rs:452:43
  19: slow_rs::main
             at ./src/main.rs:1:1
  20: core::ops::function::FnOnce::call_once
             at /home/serg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Barbara finds the text overwhelming. She can't just browse it to figure out what code is affected. Instead, she pops up a new tab with gist.github.com, copies the text into that handy text box, and starts deleting stuff. To start, she deletes the first few lines until her code appears; then she deletes:

  • the extra lines from calls to poll that are introduced by the async fn machinery;
  • the bits of code that come from tokio that don't affect her;
  • the intermediate wrappers from the standard library pertaining to thread-local variables.

She's a bit confused by the ::{closure} lines on her symbols, but she has learned by now that this is normal for async fn. After some work, she has reduced her stack trace to this:

thread 'main' panicked at 'something bad happened here', src/main.rs:16:5
stack backtrace:
   1: slow_rs::process_one::{{closure}} at ./src/main.rs:16:5
   3: slow_rs::process_many::{{closure}} at ./src/main.rs:10:5
   5: slow_rs::main::{{closure}}::{{closure}} at ./src/main.rs:4:9
   7: slow_rs::main::{{closure}} at ./src/main.rs:3:5
  13: <tokio stuff> 
  19: slow_rs::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Based on this, she is able to figure out who to ping about the problem. She pastes her reduced stack trace into the issue and pings Alan, who is responsible for that module. Alan thanks her for reducing the stack trace and mentions, "Oh, when I used to work in C#, this is what the stack traces always looked like. I miss those days."

Fin.

๐Ÿค” Frequently Asked Questions

Here are some standard FAQ to get you started. Feel free to add more!

What are the morals of the story?

  • Rust stack traces -- but async stack traces in particular -- reveal lots of implementation details to the user:
    • Bits of the runtime and intermediate libraries whose source code is likely not of interest to the user (but it might be);
    • Intermediate frames from the stdlib;
    • ::{closure} symbols on async functions and blocks (even though they don't appear to be closures to the user);
    • calls to poll.

What are the sources for this story?

Sergey Galich reported this problem, among many others.

Why did you choose Barbara to tell this story?

She knows about the desugarings that give rise to symbols like ::{closure}, but she still finds them annoying to deal with in practice.

How would this story have played out differently for the other characters?

  • Other characters might have wasted a lot of time trying to read through the stack trace in place before editing it.
  • They might not have known how to trim down the stack trace to something that focused on their code, or it might have taken them much longer to do so.

How does this compare to other languages?

  • Rust's async model does have some advantages, because the complete stack trace is available unless there is an intermediate spawn.
  • Other languages have developed special tools to connect async functions to their callers, however, which gives them a nice experience. For example, Chrome has a UI for enabling stacktraces that cross await points.

Why doesn't Barbara view this in a debugger?

  • Because it came in an issue report (or, frequently, as a crash report or email).
  • But also, that isn't necessarily an improvement! Expand below if you would like to see what we mean.
(click to see how a backtrace looks in lldb)
* thread #1, name = 'foo', stop reason = breakpoint 1.1
  * frame #0: 0x0000555555583d24 foo`foo::main::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h617d49d0841ffc0d((null)=closure-0 @ 0x00007fffffffae38, (null)=<unavailable>) at main.rs:11:13
    frame #1: 0x0000555555583d09 foo`_$LT$T$u20$as$u20$futures_util..fns..FnOnce1$LT$A$GT$$GT$::call_once::hc559b1f3f708a7b0(self=closure-0 @ 0x00007fffffffae68, arg=<unavailable>) at fns.rs:15:9
    frame #2: 0x000055555557f300 foo`_$LT$futures_util..future..future..map..Map$LT$Fut$C$F$GT$$u20$as$u20$core..future..future..Future$GT$::poll::hebf5b295fcc0837f(self=(pointer = 0x0000555555700e00), cx=0x00007fffffffcf50) at map.rs:57:73
    frame #3: 0x00005555555836ac foo`_$LT$futures_util..future..future..Map$LT$Fut$C$F$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h482f253651b968e6(self=Pin<&mut futures_util::future::future::Map<tokio::time::driver::sleep::Sleep, closure-0>> @ 0x00007fffffffb268, cx=0x00007fffffffcf50)
at lib.rs:102:13
    frame #4: 0x000055555557995a foo`_$LT$futures_util..future..future..flatten..Flatten$LT$Fut$C$$LT$Fut$u20$as$u20$core..future..future..Future$GT$..Output$GT$$u20$as$u20$core..future..future..Future$GT$::poll::hd62d2a2417c0f2ea(self=(pointer = 0x0000555555700d80), cx=0x00007fffffffcf50) at flatten.rs:48:36
    frame #5: 0x00005555555834fc foo`_$LT$futures_util..future..future..Then$LT$Fut1$C$Fut2$C$F$GT$$u20$as$u20$core..future..future..Future$GT$::poll::hf60f05f9e9d6f307(self=Pin<&mut futures_util::future::future::Then<tokio::time::driver::sleep::Sleep, core::future::ready::Ready<()>, closure-0>> @ 0x00007fffffffc148, cx=0x00007fffffffcf50) at lib.rs:102:13
    frame #6: 0x000055555558474a foo`_$LT$core..pin..Pin$LT$P$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h4dad267b4f10535d(self=Pin<&mut core::pin::Pin<alloc::boxed::Box<Future, alloc::alloc::Global>>> @ 0x00007fffffffc188, cx=0x00007fffffffcf50) at future.rs:119:9
    frame #7: 0x000055555557a693 foo`_$LT$futures_util..future..maybe_done..MaybeDone$LT$Fut$GT$$u20$as$u20$core..future..future..Future$GT$::poll::hdb6db40c2b3f2f1b(self=(pointer = 0x00005555557011b0), cx=0x00007fffffffcf50) at maybe_done.rs:95:38
    frame #8: 0x0000555555581254 foo`_$LT$futures_util..future..join_all..JoinAll$LT$F$GT$$u20$as$u20$core..future..future..Future$GT$::poll::ha2472a9a54f0e504(self=Pin<&mut futures_util::future::join_all::JoinAll<core::pin::Pin<alloc::boxed::Box<Future, alloc::alloc::Global>>>> @ 0x00007fffffffc388, cx=0x00007fffffffcf50) at join_all.rs:101:16
    frame #9: 0x0000555555584095 foo`foo::main::_$u7b$$u7b$closure$u7d$$u7d$::h6459086fc041943f((null)=ResumeTy @ 0x00007fffffffcc40) at main.rs:17:5
    frame #10: 0x0000555555580eab foo`_$LT$core..future..from_generator..GenFuture$LT$T$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h272e2b5e808264a2(self=Pin<&mut core::future::from_generator::GenFuture<generator-0>> @ 0x00007fffffffccf8, cx=0x00007fffffffcf50) at mod.rs:80:19
    frame #11: 0x00005555555805a0 foo`tokio::park::thread::CachedParkThread::block_on::_$u7b$$u7b$closure$u7d$$u7d$::hbfc61d9f747eef7b at thread.rs:263:54
    frame #12: 0x00005555555795cc foo`tokio::coop::with_budget::_$u7b$$u7b$closure$u7d$$u7d$::ha229cfa0c1a2e13f(cell=0x00007ffff7c06712) at coop.rs:106:9
    frame #13: 0x00005555555773cc foo`std::thread::local::LocalKey$LT$T$GT$::try_with::h9a2f70c5c8e63288(self=0x00005555556e2a48, f=<unavailable>) at local.rs:272:16
    frame #14: 0x0000555555576ead foo`std::thread::local::LocalKey$LT$T$GT$::with::h12eeed0906b94d09(self=0x00005555556e2a48, f=<unavailable>) at local.rs:248:9
    frame #15: 0x000055555557fea6 foo`tokio::park::thread::CachedParkThread::block_on::h33b270af584419f1 [inlined] tokio::coop::with_budget::hcd477734d4970ed5(budget=(__0 = core::option::Option<u8> @ 0x00007fffffffd040), f=closure-0 @ 0x00007fffffffd048) at coop.rs:99:5
    frame #16: 0x000055555557fe73 foo`tokio::park::thread::CachedParkThread::block_on::h33b270af584419f1 [inlined] tokio::coop::budget::h410dced2a7df3ec8(f=closure-0 @ 0x00007fffffffd008) at coop.rs:76
    frame #17: 0x000055555557fe0c foo`tokio::park::thread::CachedParkThread::block_on::h33b270af584419f1(self=0x00007fffffffd078, f=<unavailable>) at thread.rs:263
    frame #18: 0x0000555555578f76 foo`tokio::runtime::enter::Enter::block_on::h4a9c2602e7b82840(self=0x00007fffffffd0f8, f=<unavailable>) at enter.rs:151:13
    frame #19: 0x000055555558482b foo`tokio::runtime::thread_pool::ThreadPool::block_on::h6b211ce19db8989d(self=0x00007fffffffd280, future=(__0 = foo::main::generator-0 @ 0x00007fffffffd200)) at mod.rs:71:9
    frame #20: 0x0000555555583324 foo`tokio::runtime::Runtime::block_on::h5f6badd2dffadf55(self=0x00007fffffffd278, future=(__0 = foo::main::generator-0 @ 0x00007fffffffd968)) at mod.rs:452:43
    frame #21: 0x0000555555579052 foo`foo::main::h3106d444f509ad81 at main.rs:5:1
    frame #22: 0x000055555557b69b foo`core::ops::function::FnOnce::call_once::hba86afc3f8197561((null)=(foo`foo::main::h3106d444f509ad81 at main.rs:6), (null)=<unavailable>) at function.rs:227:5
    frame #23: 0x0000555555580efe foo`std::sys_common::backtrace::__rust_begin_short_backtrace::h856d648367895391(f=(foo`foo::main::h3106d444f509ad81 at main.rs:6)) at backtrace.rs:125:18
    frame #24: 0x00005555555842f1 foo`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h24c58cd1e112136f at rt.rs:66:18
    frame #25: 0x0000555555670aca foo`std::rt::lang_start_internal::h965c28c9ce06ee73 [inlined] core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$F$GT$::call_once::hbcc915e668c7ca11 at function.rs:259:13
    frame #26: 0x0000555555670ac3 foo`std::rt::lang_start_internal::h965c28c9ce06ee73 [inlined] std::panicking::try::do_call::h6b0f430d48122ddf at panicking.rs:379
    frame #27: 0x0000555555670ac3 foo`std::rt::lang_start_internal::h965c28c9ce06ee73 [inlined] std::panicking::try::h6ba420e2e21b5afa at panicking.rs:343
    frame #28: 0x0000555555670ac3 foo`std::rt::lang_start_internal::h965c28c9ce06ee73 [inlined] std::panic::catch_unwind::h8366719d1f615eee at panic.rs:431
    frame #29: 0x0000555555670ac3 foo`std::rt::lang_start_internal::h965c28c9ce06ee73 at rt.rs:51
    frame #30: 0x00005555555842d0 foo`std::rt::lang_start::ha8694bc6fe5182cd(main=(foo`foo::main::h3106d444f509ad81 at main.rs:6), argc=1, argv=0x00007fffffffdc88) at rt.rs:65:5
    frame #31: 0x00005555555790ec foo`main + 28
    frame #32: 0x00007ffff7c2f09b libc.so.6`__libc_start_main(main=(foo`main), argc=1, argv=0x00007fffffffdc88, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffdc78) at libc-start.c:308:16

Doesn't Rust have backtrace trimming support?

Yes, this is the reduced backtrace. You don't even want to know what the full one looks like. Don't click it. Don't!

๐Ÿ˜ฑ Status quo stories: Barbara wants Async Insights

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Barbara has an initial prototype of a new service she wrote in sync Rust. Since the service is extremely I/O bound, and her benchmarks have led her to believe that performance is being left on the table, she decides to port it to async Rust.

She does this by sprinkling async/.await everywhere, picking an executor, and moving dependencies from sync to async.

Once she has the program compiling, she thinks, "oh, that was easy". She runs it for the first time and is surprised to find that when she hits an endpoint, nothing happens.

Barbara, always prepared, has already added logging to her service, so she checks the logs. As she expected, she sees that the endpoint handler has been invoked, but then... nothing. Barbara exclaims, "Oh no! This was not what I was expecting, but let's dig deeper."

She checks the code and sees that the endpoint spawns several tasks, but unfortunately those tasks don't have much logging in them.

Barbara knows that debugging async Rust with a traditional debugger is not very fruitful. She does a deep dive into the source code and doesn't find anything. Then she adds much more logging, which reveals that a particular task seems stuck, but she has no idea why.

She really wishes that there was a way to get more insight into why the task is stuck. These were the thoughts inside her head at that moment:

  • Is it waiting on I/O?
  • Is there a deadlock?
  • Did she miss some sync code that might still be there, blocking the executor?

For the I/O question, she knows she can use operating system tools (such as lsof). This reveals some open sockets, but she's not sure how to act on this.

She scans the code for any std lib imports that might be blocking, but doesn't find anything.

After hours of crawling through the code, she notices that her task is receiving a message from a bounded async channel. She changes this to be an unbounded channel and then things start working.
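
One plausible shape of such a hang (a hypothetical minimal reproduction, not Barbara's actual code, using tokio's bounded mpsc channel) is a task that sends into the same bounded channel it later drains; once the buffer is full, the send waits for capacity that never appears:

// Requires dependency: tokio = { version = "1", features = ["full"] }
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<u32>(1); // bounded: capacity 1

    tx.send(1).await.unwrap(); // fills the buffer
    tx.send(2).await.unwrap(); // hangs: the recv() below is never reached

    println!("{:?}", rx.recv().await);
}

Switching to an unbounded channel makes the second send return immediately, which is why Barbara's change appears to fix the hang without explaining it.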

She wants to know why the code was not working, but unfortunately she has no way to gain insight into this issue. She fears that her task might use too much memory knowing that the channel is unbounded, but she can't really tell.

She thinks, "Anyhow, it is working now; let's see if we got some performance gains." After thorough benchmarking she finds out that she didn't quite get the performance gain she was expecting. "Something is not working as intended," she thinks.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • There are very few ways to get insights into running systems. Tracing is state of the art. console.log #ftw
  • Tracing is a static activity and there's no way to dynamically gain insights.
  • While it's possible to find solutions to these issues, often you have no insight into whether those solutions bring new problems.
  • The debugging process for non-trivial issues is almost guaranteed to be painful and expensive.

What are the sources for this story?

Issue 75

What are examples of the kinds of things a user might want to have insight into?

  • Custom Events - logging/tracing (Per task?)
  • Memory consumption per task.
  • I/O handles in waiting state per task.
  • Number of tasks and their states over time.
  • The ability to wake and drop specific tasks.
  • Denoised stack traces and/or stack traces that are task-aware.
  • Who spawned the task?
  • Worker threads that are blocked from progressing tasks forward.
  • Tasks that are not progressing.

Why did you choose Barbara to tell this story?

Barbara knows what she's doing, but even so there is little way to get insights.

How would this story have played out differently for the other characters?

Depending on what languages he was using before, Alan would likely have had experience with a stronger tooling story.

๐Ÿ˜ฑ Status quo stories: Barbara wants to use GhostCell-like cell borrowing with futures

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Barbara quite likes using statically-checked cell borrowing. "Cell" in Rust terminology refers to types like Cell or RefCell that enable interior mutability, i.e. modifying or mutably borrowing stuff even if you've only got an immutable reference to it. Statically-checked cell borrowing is a technique whereby one object (an "owner") acts as a gatekeeper for borrow-access to a set of other objects ("cells"). So if you have mutable borrow access to the owner, you can temporarily transfer that mutable borrow access to a cell in order to modify it. This is all checked at compile-time, hence "statically-checked".

In comparison, RefCell does borrow-checking, but it is checked at runtime, and it will panic if you make a coding mistake. The advantage of statically-checked borrowing is that it cannot panic at runtime, i.e. all your borrowing bugs show up at compile time. The history goes way back, and the technique has been reinvented at least 2-3 times as far as Barbara is aware. It is implemented in various forms in GhostCell and qcell.
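
As a minimal sketch of the technique (our illustration, not Barbara's code), here is qcell's QCell: reading or writing a cell requires presenting a borrow of its owner, so overlapping borrows are rejected at compile time:

// Requires dependency: qcell = "0.4"
use qcell::{QCell, QCellOwner};
use std::rc::Rc;

fn main() {
    let mut owner = QCellOwner::new();
    let cell = Rc::new(QCell::new(&owner, 0_u32));

    // Mutable access goes through a mutable borrow of `owner`, so the
    // compiler rejects overlapping borrows of cells that share an owner.
    *owner.rw(&cell) += 1;
    assert_eq!(*owner.ro(&cell), 1);
}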

Barbara would like to use statically-checked cell borrowing within futures, but there is no way to get the owner borrow through the Future::poll call, i.e. there is no argument or object that the runtime could save the borrow in. Mostly this does not cause a problem, because there are other ways for a runtime to share data, e.g. data can be incorporated into the future when it is created. However, for the specific technique of statically-checked cell borrows, we need an active borrow of the owner to be passed down the call stack through all the poll calls.

So Barbara is forced to use RefCell instead and be very careful not to cause panics. This seems like a step back. It feels dangerous to use RefCell and to have to manually verify that her cell borrows are panic-free.

There are good habits that you can adopt to offset the dangers, of course. If you are very careful to call no other method or function which might in turn call code that attempts another borrow on the same cell, then the RefCell::borrow_mut panics can be avoided. However, this is easy to overlook: it is easy to fail to anticipate what indirect calls a given call will make, and of course this may change later on due to maintenance and new features. A borrow may stay active longer than expected, so calls which appear safe might actually panic; sometimes it's necessary to manually drop the borrow to be sure. In addition, you'll never know what indirect calls might be made until all the possible code-paths have been explored, either through testing or through running in production.

So Barbara prefers to avoid all these problems, and use statically-checked cell borrowing where possible.

Example 1: Accessing an object shared outside the runtime

In this minimized example of code to interface a stream to code outside of the async/await system, the buffer has to be accessible from both the stream and the outside code, so it is handled as an Rc<RefCell<StreamBuffer<T>>>.


#![allow(unused)]
fn main() {
use std::cell::RefCell;
use std::pin::Pin;
use std::rc::Rc;
use std::task::{Context, Poll};

use futures::stream::Stream; // futures = "0.3"

// Minimal stand-in for the real buffer type shared with the outside code.
pub struct StreamBuffer<T> {
    value: Option<T>,
    end: bool,
}

pub struct StreamPipe<T> {
    buf: Rc<RefCell<StreamBuffer<T>>>,
    req_more: Rc<dyn Fn()>,
}

impl<T> Stream for StreamPipe<T> {
    type Item = T;

    fn poll_next(self: Pin<&mut Self>, _: &mut Context<'_>) -> Poll<Option<T>> {
        let mut buf = self.buf.borrow_mut();
        if let Some(item) = buf.value.take() {
            return Poll::Ready(Some(item));
        }
        if buf.end {
            return Poll::Ready(None);
        }
        (self.req_more)();  // Callback to request more data
        Poll::Pending
    }
}
}

Probably req_more() has to schedule some background operation, but if it doesn't, and instead attempts to modify the shared buf immediately, then we get a panic, because buf is still borrowed. The real-life code could be a lot more complicated, and the required combination of conditions might be harder to hit in testing.

With statically-checked borrowing, the borrow would be something like let mut buf = self.buf.rw(cx);, and the req_more call would either have to take the cx as an argument (forcing the previous borrow to end) or would not take cx, meaning that it would always have to defer the access to the buffer to other code, because without the cx there is no possible way to access the buffer.

Example 2: Shared monitoring data

In this example, the app keeps tallies of various things in a Monitor structure. This might be data in/out, number of errors detected, maybe a hashmap of current links, etc. Since it is accessed from various components, it is kept behind an Rc<RefCell<_>>.

// Dependency: futures-lite = "1.11.3"
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    let monitor0 = Rc::new(RefCell::new(Monitor { count: 0 }));
    let monitor1 = monitor0.clone();

    let fut0 = async move {
        let mut borrow = monitor0.borrow_mut();
        borrow.count += 1;
    };

    let fut1 = async move {
        let mut borrow = monitor1.borrow_mut();
        borrow.count += 1;
        fut0.await; // `borrow` is still alive here, so fut0's borrow_mut() panics
    };

    futures_lite::future::block_on(fut1);
}

struct Monitor {
    count: usize,
}

The problem is that this panics with a borrowing error because the borrow is still active when the fut0.await executes and attempts another borrow. The solution is to remember to drop the borrow before awaiting.
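
For instance, scoping the borrow so that it is dropped before the await point avoids the panic (replacing fut1 in the example above):

    let fut1 = async move {
        {
            let mut borrow = monitor1.borrow_mut();
            borrow.count += 1;
        } // borrow dropped here, before the await
        fut0.await;
    };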

In this example code the bug is obvious, but in real life maybe fut0 only borrows in rare situations, e.g. when an error is detected. Or maybe the future that borrows is several calls away down the callstack.

With statically-checked borrowing, there is a slight problem in that currently there is no way to access the poll context from async {} code. But if there was then the borrow would be something like let mut borrow = monitor1.rw(cx);, and since the fut0.await implicitly requires the cx in order to poll, the borrow would be forced to end at that point.

Further investigation by Barbara

The mechanism

Barbara understands that statically-checked cell borrows work by having an owner held by the runtime, and various instances of a cell held by things running on top of the runtime (these cells would typically be behind Rc references). A mutable borrow on the owner is passed down the stack, which enables safe borrows on all the cells, since a mutable borrow on a cell is enabled by temporarily holding onto the mutable borrow of the owner, which is all checked at compile-time.

So the mutable owner borrow needs to be passed through the poll call, and Barbara realizes that this would require support from the standard library.

Right now a &mut Context<'_> is passed to poll, and so within Context would be the ideal place to hold a borrow on the cell owner. However as far as Barbara can see there are difficulties with all the current implementations:

  • GhostCell (or qcell::LCell) may be the best available solution, because it doesn't have any restrictions on how many runtimes might be running or how they might be nested. But Rust insists that the lifetimes <'id> on methods and types are explicit, so it seems like that would force a change to the signature of poll, which would break the ecosystem.

    Here Barbara experiments with a working example of a modified Future trait and a future implementation that makes use of LCell:

// Requires dependency: qcell = "0.4"
use qcell::{LCell, LCellOwner};
use std::pin::Pin;
use std::rc::Rc;
use std::task::Poll;

struct Context<'id, 'a> {
    cell_owner: &'a mut LCellOwner<'id>,
}

struct AsyncCell<'id, T>(LCell<'id, T>);
impl<'id, T> AsyncCell<'id, T> {
    pub fn new(value: T) -> Self {
        Self(LCell::new(value))
    }
    pub fn rw<'a, 'b: 'a>(&'a self, cx: &'a mut Context<'id, 'b>) -> &'a mut T {
        cx.cell_owner.rw(&self.0)
    }
}

trait Future<'id> {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'id, '_>) -> Poll<Self::Output>;
}

struct MyFuture<'id> {
    count: Rc<AsyncCell<'id, usize>>,
}
impl<'id> Future<'id> for MyFuture<'id> {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'id, '_>) -> Poll<Self::Output> {
        *self.count.rw(cx) += 1;
        Poll::Ready(())
    }
}

fn main() {
    LCellOwner::scope(|mut owner| {
        let mut cx = Context { cell_owner: &mut owner };
        let count = Rc::new(AsyncCell::new(0_usize));
        let mut fut = Box::pin(MyFuture { count: count.clone() });
        let _ = fut.as_mut().poll(&mut cx);
        assert_eq!(1, *count.rw(&mut cx));
    });
}
  • The other qcell types (QCell, TCell and TLCell) have various restrictions or overheads which might make them unsuitable as a general-purpose solution in the standard library. However they do have the positive feature of not requiring any change in the signature of poll. It looks like they could be added to Context without breaking anything.

    Here Barbara tries using TLCell, and finds that the signature of poll doesn't need to change:

// Requires dependency: qcell = "0.4"
use qcell::{TLCell, TLCellOwner};
use std::pin::Pin;
use std::rc::Rc;
use std::task::Poll;

struct AsyncMarker;
struct Context<'a> {
    cell_owner: &'a mut TLCellOwner<AsyncMarker>,
}

struct AsyncCell<T>(TLCell<AsyncMarker, T>);
impl<T> AsyncCell<T> {
    pub fn new(value: T) -> Self {
        Self(TLCell::new(value))
    }
    pub fn rw<'a, 'b: 'a>(&'a self, cx: &'a mut Context<'b>) -> &'a mut T {
        cx.cell_owner.rw(&self.0)
    }
}

trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

struct MyFuture {
    count: Rc<AsyncCell<usize>>,
}
impl Future for MyFuture {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        *self.count.rw(cx) += 1;
        Poll::Ready(())
    }
}

fn main() {
    let mut owner = TLCellOwner::new();
    let mut cx = Context { cell_owner: &mut owner };
    let count = Rc::new(AsyncCell::new(0_usize));
    let mut fut = Box::pin(MyFuture { count: count.clone() });
    let _ = fut.as_mut().poll(&mut cx);
    assert_eq!(1, *count.rw(&mut cx));
}

(For comparison, TCell only allows one owner per marker type in the whole process. QCell allows many owners, but requires a runtime check to make sure you're using the right owner to access a cell. TLCell allows only one owner per thread per marker type, but also lets cells migrate between threads and be borrowed locally, which the others don't -- see qcell docs.)

So the choice is GhostCell/LCell and lifetimes everywhere, or various other cell types that may be too restrictive.

Right now Barbara thinks that none of these solutions is likely to be acceptable for the standard library. Still, it is a desirable feature, so maybe someone can think of a way around the problems. Or maybe someone has a different perspective on what would be acceptable.

Proof of concept

The Stakker runtime makes use of qcell-based statically-checked cell borrowing. It uses this to get zero-cost access to actors, guaranteeing at compile time that no actor can access any other actor's state. It also uses it to allow inter-actor shared state to be accessed safely and zero-cost, without RefCell.

(For example, within a Stakker actor, you can access the contents of a Share<T> via the actor context cx as follows: share.rw(cx), which blocks borrowing or accessing cx until that borrow on share has been released. Share<T> is effectively an Rc<ShareCell<T>>, and cx has access to an active borrow on the ShareCellOwner, just as in the long examples above.)

Stakker doesn't use GhostCell (LCell) because of the need for <'id> annotations on methods and types. Instead it uses the other three cell types according to how many Stakker instances will be run, either one Stakker instance only, one per thread, or multiple per thread. This is selected by cargo features.

Switching implementations like this doesn't seem like an option for the standard library.

Way forward

Barbara wonders whether there is any way this can be made to work. For example, could the compiler derive all those <'id> annotations automatically for GhostCell/LCell?

Or for multi-threaded runtimes, would qcell::TLCell be acceptable? This allows a single cell-owner in every thread. So it would not allow nested runtimes of the same type. However it does allow borrows to happen at the same time independently in different threads, and it also allows the migration of cells between threads, which is safe because that kind of cell isn't Sync.

Or is there some other form of cell-borrowing that could be devised that would work better for this?

The interface between cells and Context should be straightforward once a particular cell type is demonstrated to be workable with the poll interface and futures ecosystem. For example copying the API style of Stakker:

let rc = Rc::new(AsyncCell::new(1_u32));
*rc.rw(cx) = 2;

So logically you obtain read-write access to a cell by naming the authority by which you claim access, in this case the poll context. It really is naming rather than accessing, since the checks are done at compile time and the address that cx represents doesn't actually get passed anywhere or evaluated once inlining and optimisation are complete.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

The main problem is that Barbara has got used to a safer environment and it feels dangerous to go back to RefCell and have to manually verify that her cell borrows are panic-free.

What are the sources for this story?

The author of Stakker is trying to interface it to async/await and futures.

Why did you choose Barbara to tell this story?

Barbara has enough Rust knowledge to understand the benefits that GhostCell/qcell-like borrowing might bring.

How would this story have played out differently for the other characters?

The other characters perhaps wouldn't have heard of statically-checked cell borrows so would be unaware of the possibility of making things safer.

๐Ÿ˜ฑ Status quo stories: Barbara writes a runtime-agnostic library

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Barbara and Alan work at AmoogleSoft, where many teams are switching from Java to Rust. These teams have many different use cases and various adoption stories. Some teams are happy users of tokio, others happy users of async-std, and others still are using custom runtimes for highly specialized use cases.

Barbara is tasked with writing a library for a custom protocol, SLOW (only in use at AmoogleSoft), and enlists the help of Alan in doing so. Alan is already aware that not all libraries in Rust work with all runtimes. Alan and Barbara start by writing a parser which works on std::io::Read and get their tests working with Strings. After this, they contemplate the question of how to accept a TCP connection.

Incompatible AsyncRead traits

Alan asks Barbara what the async equivalent of std::io::Read is, and Barbara sighs and says that there isn't one. Barbara brings up tokio's and the futures crate's versions of AsyncRead. Barbara decides not to talk about AsyncBufRead for now.

Barbara and Alan decide to use the futures crate's AsyncRead for no reason other than that it is runtime-agnostic. Barbara tells Alan not to worry, as they can translate between the two. With some effort they convert their parser to using AsyncRead.

Alan, excited about the progress they've made, starts working on hooking this up to actual TCP streams. He looks at async-std and tokio and notices that their interfaces for TCP are quite different. He waits for Barbara to save the day.

Barbara helps abstract over the TCP listener and TCP stream types; a sketch of one possible approach appears below. One big hurdle is that tokio uses the AsyncRead from its own crate and not the one from the futures crate.
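
Here is a sketch of one possible approach (the trait and its names are invented for illustration; it assumes the async-trait crate and tokio-util's "compat" feature):

// Requires dependencies: futures = "0.3", async-trait = "0.1",
// tokio = { version = "1", features = ["net"] },
// tokio-util = { version = "0.6", features = ["compat"] }
use std::io;
use std::net::SocketAddr;

use futures::io::AsyncRead; // the runtime-agnostic trait from the futures crate

// Hypothetical trait capturing the one operation the SLOW library needs.
#[async_trait::async_trait]
pub trait SlowListener {
    type Stream: AsyncRead + Unpin + Send;
    async fn accept(&self) -> io::Result<(Self::Stream, SocketAddr)>;
}

// One impl per supported runtime. tokio's TcpStream implements tokio's own
// AsyncRead, not the futures one, so it is wrapped in a compat layer.
#[async_trait::async_trait]
impl SlowListener for tokio::net::TcpListener {
    type Stream = tokio_util::compat::Compat<tokio::net::TcpStream>;

    async fn accept(&self) -> io::Result<(Self::Stream, SocketAddr)> {
        use tokio_util::compat::TokioAsyncReadCompatExt;
        let (stream, addr) = tokio::net::TcpListener::accept(self).await?;
        Ok((stream.compat(), addr))
    }
}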

Task spawning

After getting the TCP handling part working, they now want to spawn tasks for handling each incoming TCP connection. Again, to their disappointment, they find that there's no runtime-agnostic way to do that.

Unsure of how to do this, they do some searching and find the agnostik crate. They reject it because it only supports a fixed set of runtimes, and their custom runtime is not one of them. However, it gives them the idea of providing a trait for specifying how to spawn tasks on the runtime. Barbara points out that this has the disadvantage of running up against the orphan rules, meaning that either they have to implement the trait for all known runtimes (defeating the purpose of the exercise) or force the user to use newtypes.

They punt on this question by implementing the trait for each of the known runtimes, as sketched below. They're disappointed that this means their library actually isn't runtime-agnostic.
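
A sketch of what such a trait might look like (names invented for illustration; a real design would also need join handles, cancellation, and so on):

use std::future::Future;
use std::pin::Pin;

// Hypothetical object-safe trait for spawning onto whatever runtime the
// user provides.
pub trait Spawn {
    fn spawn(&self, fut: Pin<Box<dyn Future<Output = ()> + Send + 'static>>);
}

// The "punt": one implementation per known runtime, e.g. for tokio.
pub struct TokioSpawner;

impl Spawn for TokioSpawner {
    fn spawn(&self, fut: Pin<Box<dyn Future<Output = ()> + Send + 'static>>) {
        tokio::spawn(fut);
    }
}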

The need for timers

To complicate things further, they are also in need of a timer API. They could abstract the runtime-specific timer APIs in the existing trait they use for spawning, but instead they find a runtime-agnostic library. It works, but it is pretty heavy in that it spawns an OS thread (from a pool) every time they want to sleep. They become sadder.

Channels

They need channels as well, but after long searches and discussions on help channels, they learn of a few runtime-agnostic implementations: async-channel, futures-channel, and trimmed-down (through feature flags) async-std/tokio. They pick one and it seems to work well. They become a little less sad.
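
For example, async-channel runs under any executor; a minimal sketch, here driven by futures-lite's block_on:

// Requires dependencies: async-channel = "1", futures-lite = "1"
fn main() {
    futures_lite::future::block_on(async {
        let (tx, rx) = async_channel::bounded::<u32>(16);
        tx.send(42).await.unwrap();
        assert_eq!(rx.recv().await.unwrap(), 42);
    });
}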

First release

They get things working, but it was a difficult journey to the first release. Some of their users find the APIs harder to use than those of runtime-specific libraries.

๐Ÿค” Frequently Asked Questions

Why did you choose Barbara to tell this story?

Barbara has years of Rust experience that she brings to bear in her async learning experiences.

What are the morals of the story?

  • People have to roll their own implementations, which can lead to subtle differences between runtimes (for example, the TCP listeners in async-std and tokio).
  • The orphan rules and the lack of standard traits mean that a truly runtime-agnostic library is not possible.
  • It takes far more time than writing a synchronous protocol library.
  • It's a hard goal to achieve.
  • It sometimes leads to poorer APIs (both in ease of use and performance).
  • More API design considerations need to go into making a generic async library than a generic sync library.

What are the sources for this story?

The personal experiences of the author from adding an async API to the zbus crate, except for AsyncRead, which is based on common knowledge in the async Rust community.

How would this story have played out differently for the other characters?

Alan, Grace, and Niklaus would be overwhelmed and would likely want to give up.

TODO:

What are the downsides of using runtime-agnostic crates?

Some things can be implemented very efficiently in a runtime-agnostic way, but even then you can't integrate deeply with the runtime. For example, see tokio's pre-emption strategy, which relies on deep integration with the runtime.

What other runtime utilities are generally needed?

๐Ÿ˜ฑ Status quo stories: Grace deploys her service and hits obstacles

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

The story

When examining her service metrics, Grace notices P99 tail latencies that exceed their target. She identifies GC in the routing layer as the culprit. Grace follows industry trends and is already aware of Rust and its ecosystem at a high level. She decides to investigate rewriting the routing service in Rust.

To meet throughput requirements, Grace has already decided to use a thread-per-core model and minimize cross-thread communication. She explores available ecosystem options and finds no option that gets her exactly what she is looking for out of the box. However, she can use Tokio with minimal configuration to achieve her architecture.

A few months of frantic hacking follow.

montage of cats typing

Soon enough, she and her team have a proof of concept working. They run some local stress tests and notice that 5% of requests hang and fail to respond; the client eventually times out. She cannot reproduce this problem when running one-off requests locally; it only shows up when sending above 200 requests per second.

She realizes that she doesn't have any tooling to give her insight into what's going on. She starts to add lots of logging, attempting to tie log entries to specific connections. Using an operating system tool, she can identify the socket addresses for the hung connections, so she also includes the socket addresses in each log message. She then filters the logs to find entries associated with hung connections. Of course, the logs only tell her what the connection managed to do successfully; they don't tell her why it stopped -- so she keeps going back to add more logging until she can narrow down the exact call that hangs.

Eventually, she identifies that the last log message appears right before authenticating the request. Authentication is performed by an existing C library, integrated with the routing service using a custom future implementation. She finally finds a bug in that implementation that results in occasional lost wake-ups.
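
A sketch of this class of bug (the types and the FFI call are invented for illustration; this is not the actual service code): a hand-written future that returns Poll::Pending without arranging for its waker to be invoked will never be polled again, and its task hangs.

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

struct AuthFuture {
    request_id: u64, // hypothetical handle into the C library
}

impl Future for AuthFuture {
    type Output = bool;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<bool> {
        match check_auth_result(self.request_id) {
            Some(ok) => Poll::Ready(ok),
            // BUG: returns Pending without registering _cx.waker() with the
            // C library's completion callback, so if the result was not ready
            // on the first poll, nothing ever wakes this task again.
            None => Poll::Pending,
        }
    }
}

// Stand-in for the FFI call that checks for a completed authentication.
fn check_auth_result(_id: u64) -> Option<bool> {
    None
}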

She fixes the bug. The service is now working as expected and meeting Grace's performance goals.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • When coming from a background of network engineering, users will bring their own design choices around architecture.
  • There is a lack of debugging tools for async.
  • Writing futures by hand is error prone.

What are the sources for this story?

This is based on the experiences of helping a tokio user to diagnose a bug in their code.

Why did you choose Grace to tell this story?

  • The actual user who experienced this problem fit the profile of Grace.
  • The story is focused on the experience of people aiming to use workflows they are familiar with from C in a Rust setting.

How would this story have played out differently for the other characters?

Alan or Niklaus may well have had a much harder time diagnosing the problem due to not having as much of a background in systems programming. For example, they may not have known about the operating system tool that allows one to list the hung connections.

Could Grace have used another runtime to achieve the same objectives?

  • Maybe! But in this instance the people this story is based on were using tokio, so that's the one we wrote into the story.
  • (If folks want to expand this answer with details of how to achieve similar goals on other runtimes that would be welcome!)

๐Ÿ˜ฑ Status quo stories: Grace tries new libraries

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

The story

While searching crates.io, Grace found an interesting library that she wants to use. The code examples use a map/reduce style. As Grace is more familiar with C and C++, as a first step she wants to convert them from this style to using loops.

Controller::new(root_kind_api, ListParams::default())
    .owns(child_kind_api, ListParams::default())
    .run(reconcile, error_policy, context)
    .for_each(|res| async move {
        match res {
            Ok(o) => info!("reconciled {:?}", o),
            Err(e) => warn!("reconcile failed: {}", Report::from(e)),
        }
    })
    .await;

(Example code taken from https://github.com/clux/kube-rs)

So she takes the naive approach and just converts it as follows:

let mut controller = Controller::new(root_kind_api, ListParams::default())
    .owns(child_kind_api, ListParams::default())
    .run(reconcile, error_policy, context);

while let Ok(o) = controller.try_next().await {
    info!("reconciled {:?}", o);
}

When she compiles her source code, she ends up with a wall of error messages like the following:

$ cargo run
   Compiling kube-rs-test v0.1.0 (/home/project-gec/src/kube-rs-test)
error[E0277]: `from_generator::GenFuture<[static generator@watcher<Secret>::{closure#0}::{closure#0} for<'r, 's, 't0, 't1> {ResumeTy, kube::Api<Secret>, &'r kube::Api<Secret>, ListParams, &'s ListParams, watcher::State<Secret>, impl futures::Future, ()}]>` cannot be unpinned
  --> src/main.rs:23:41
   |
23 |     while let Ok(o) = controller.try_next().await {
   |                                  ^^^^^^^^ within `futures_util::unfold_state::_::__Origin<'_, (kube::Api<Secret>, ListParams, watcher::State<Secret>), impl futures::Future>`, the trait `Unpin` is not implemented for `from_generator::GenFuture<[static generator@watcher<Secret>::{closure#0}::{closure#0} for<'r, 's, 't0, 't1> {ResumeTy, kube::Api<Secret>, &'r kube::Api<Secret>, ListParams, &'s ListParams, watcher::State<Secret>, impl futures::Future, ()}]>`
   |
   = note: required because it appears within the type `impl futures::Future`
   = note: required because it appears within the type `futures_util::unfold_state::_::__Origin<'_, (kube::Api<Secret>, ListParams, watcher::State<Secret>), impl futures::Future>`
   = note: required because of the requirements on the impl of `Unpin` for `futures_util::unfold_state::UnfoldState<(kube::Api<Secret>, ListParams, watcher::State<Secret>), impl futures::Future>`
   = note: required because it appears within the type `futures::stream::unfold::_::__Origin<'_, (kube::Api<Secret>, ListParams, watcher::State<Secret>), [closure@watcher<Secret>::{closure#0}], impl futures::Future>`
   = note: required because of the requirements on the impl of `Unpin` for `futures::stream::Unfold<(kube::Api<Secret>, ListParams, watcher::State<Secret>), [closure@watcher<Secret>::{closure#0}], impl futures::Future>`
   = note: required because it appears within the type `impl std::marker::Send+futures::Stream`
   = note: required because it appears within the type `futures::stream::try_stream::into_stream::_::__Origin<'_, impl std::marker::Send+futures::Stream>`
   = note: required because of the requirements on the impl of `Unpin` for `futures::stream::IntoStream<impl std::marker::Send+futures::Stream>`
   = note: required because it appears within the type `futures::stream::stream::map::_::__Origin<'_, futures::stream::IntoStream<impl std::marker::Send+futures::Stream>, futures_util::fns::InspectFn<futures_util::fns::InspectOkFn<[closure@reflector<Secret, impl std::marker::Send+futures::Stream>::{closure#0}]>>>`
   = note: required because of the requirements on the impl of `Unpin` for `futures::stream::Map<futures::stream::IntoStream<impl std::marker::Send+futures::Stream>, futures_util::fns::InspectFn<futures_util::fns::InspectOkFn<[closure@reflector<Secret, impl std::marker::Send+futures::Stream>::{closure#0}]>>>`
   = note: required because it appears within the type `futures::stream::stream::_::__Origin<'_, futures::stream::IntoStream<impl std::marker::Send+futures::Stream>, futures_util::fns::InspectOkFn<[closure@reflector<Secret, impl std::marker::Send+futures::Stream>::{closure#0}]>>`
   = note: required because of the requirements on the impl of `Unpin` for `futures::stream::Inspect<futures::stream::IntoStream<impl std::marker::Send+futures::Stream>, futures_util::fns::InspectOkFn<[closure@reflector<Secret, impl std::marker::Send+futures::Stream>::{closure#0}]>>`
   = note: required because it appears within the type `futures::stream::try_stream::_::__Origin<'_, impl std::marker::Send+futures::Stream, [closure@reflector<Secret, impl std::marker::Send+futures::Stream>::{closure#0}]>`
   = note: required because of the requirements on the impl of `Unpin` for `futures::stream::InspectOk<impl std::marker::Send+futures::Stream, [closure@reflector<Secret, impl std::marker::Send+futures::Stream>::{closure#0}]>`
   = note: required because it appears within the type `impl futures::Stream`

error[E0277]: `from_generator::GenFuture<[static generator@watcher<Secret>::{closure#0}::{closure#0} for<'r, 's, 't0, 't1> {ResumeTy, kube::Api<Secret>, &'r kube::Api<Secret>, ListParams, &'s ListParams, watcher::State<Secret>, impl futures::Future, ()}]>` cannot be unpinned
  --> src/main.rs:23:27
   |
23 |     while let Ok(o) = controller.try_next().await {
   |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^ within `futures_util::unfold_state::_::__Origin<'_, (kube::Api<Secret>, ListParams, watcher::State<Secret>), impl futures::Future>`, the trait `Unpin` is not implemented for `from_generator::GenFuture<[static generator@watcher<Secret>::{closure#0}::{closure#0} for<'r, 's, 't0, 't1> {ResumeTy, kube::Api<Secret>, &'r kube::Api<Secret>, ListParams, &'s ListParams, watcher::State<Secret>, impl futures::Future, ()}]>`
   |
   = note: required because it appears within the type `impl futures::Future`
   = note: required because it appears within the type `futures_util::unfold_state::_::__Origin<'_, (kube::Api<Secret>, ListParams, watcher::State<Secret>), impl futures::Future>`
   = note: required because of the requirements on the impl of `Unpin` for `futures_util::unfold_state::UnfoldState<(kube::Api<Secret>, ListParams, watcher::State<Secret>), impl futures::Future>`
   = note: required because it appears within the type `futures::stream::unfold::_::__Origin<'_, (kube::Api<Secret>, ListParams, watcher::State<Secret>), [closure@watcher<Secret>::{closure#0}], impl futures::Future>`
   = note: required because of the requirements on the impl of `Unpin` for `futures::stream::Unfold<(kube::Api<Secret>, ListParams, watcher::State<Secret>), [closure@watcher<Secret>::{closure#0}], impl futures::Future>`
   = note: required because it appears within the type `impl std::marker::Send+futures::Stream`
   = note: required because it appears within the type `futures::stream::try_stream::into_stream::_::__Origin<'_, impl std::marker::Send+futures::Stream>`
   = note: required because of the requirements on the impl of `Unpin` for `futures::stream::IntoStream<impl std::marker::Send+futures::Stream>`
   = note: required because it appears within the type `futures::stream::stream::map::_::__Origin<'_, futures::stream::IntoStream<impl std::marker::Send+futures::Stream>, futures_util::fns::InspectFn<futures_util::fns::InspectOkFn<[closure@reflector<Secret, impl std::marker::Send+futures::Stream>::{closure#0}]>>>`
   = note: required because of the requirements on the impl of `Unpin` for `futures::stream::Map<futures::stream::IntoStream<impl std::marker::Send+futures::Stream>, futures_util::fns::InspectFn<futures_util::fns::InspectOkFn<[closure@reflector<Secret, impl std::marker::Send+futures::Stream>::{closure#0}]>>>`
   = note: required because it appears within the type `futures::stream::stream::_::__Origin<'_, futures::stream::IntoStream<impl std::marker::Send+futures::Stream>, futures_util::fns::InspectOkFn<[closure@reflector<Secret, impl std::marker::Send+futures::Stream>::{closure#0}]>>`
   = note: required because of the requirements on the impl of `Unpin` for `futures::stream::Inspect<futures::stream::IntoStream<impl std::marker::Send+futures::Stream>, futures_util::fns::InspectOkFn<[closure@reflector<Secret, impl std::marker::Send+futures::Stream>::{closure#0}]>>`
   = note: required because it appears within the type `futures::stream::try_stream::_::__Origin<'_, impl std::marker::Send+futures::Stream, [closure@reflector<Secret, impl std::marker::Send+futures::Stream>::{closure#0}]>`
   = note: required because of the requirements on the impl of `Unpin` for `futures::stream::InspectOk<impl std::marker::Send+futures::Stream, [closure@reflector<Secret, impl std::marker::Send+futures::Stream>::{closure#0}]>`
   = note: required because it appears within the type `impl futures::Stream`
   = note: required because of the requirements on the impl of `futures::Future` for `TryNext<'_, impl futures::Stream>`
   = note: required by `futures::Future::poll`

error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0277`.
error: could not compile `kube-rs-test`

To learn more, run the command again with --verbose.

From her background, she has an understanding of what could go wrong. She remembers that she can solve the issue by boxing the values, i.e. by calling .boxed() on the controller. But on the other hand, she can see no reason why this while loop should fail when the original .for_each() example just works as expected.
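
The workaround, applied to the snippet above (reusing its names): .boxed() moves the stream to the heap and pins it there, producing a BoxStream that is Unpin, which is what try_next requires.

use futures::{StreamExt, TryStreamExt};

let mut controller = Controller::new(root_kind_api, ListParams::default())
    .owns(child_kind_api, ListParams::default())
    .run(reconcile, error_policy, context)
    .boxed();

while let Ok(o) = controller.try_next().await {
    info!("reconciled {:?}", o);
}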

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • Working with async can produce huge errors from fairly commonplace transforms, and requires knowing some "not entirely obvious" workarounds.

What are the sources for this story?

  • Personal experience.

Why did you choose Grace to tell this story?

  • Reflects the background of the author.

How would this story have played out differently for the other characters?

  • Ultimately the only way to know how to solve this problem is to have seen it before and learned how to solve it. The compiler doesn't help and the result is not obvious.
  • So it probably doesn't matter that much which character is used, except that Barbara may be more likely to have seen how to solve it.

๐Ÿ˜ฑ Status quo stories: Grace waits for gdb next

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Grace wants to walk through the behavior of a toy program.

She first fires up cargo run --verbose to remind herself what the path to the target binary is. Part of the resulting Cargo output is:

     Running `target/debug/toy`

From that, Grace tries running gdb on the printed path.

    gdb target/debug/toy

and then

(gdb) start

to start the program and set a breakpoint on the main function.

Grace hits Ctrl-x a and gets a TUI mode view that includes this:

โ”‚   52          }                                                                                                                                                                                                                    โ”‚
โ”‚   53                                                                                                                                                                                                                               โ”‚
โ”‚   54          #[tokio::main]                                                                                                                                                                                                       โ”‚
โ”‚B+>55          pub(crate) async fn main() -> Result<(), Box<dyn Error + Send + Sync + 'static>> {                                                                                                                                   โ”‚
โ”‚   56              println!("Hello, world!");                                                                                                                                                                                       โ”‚
โ”‚   57              let record = Box::new(Mutex::new(Record::new()));                                                                                                                                                                โ”‚
โ”‚   58              let record = &*Box::leak(record);                                                                                                                                                                                โ”‚
โ”‚   59                                                                                                                                                                                                                              

Excitedly Grace types next to continue to the next line of the function.

And waits. And the program does not stop anywhere.

...

Eventually Grace remembers that #[tokio::main] injects a different main function that isn't the one that she wrote as an async fn, and so the next operation in gdb isn't going to set a breakpoint within Grace's async fn main.
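
Roughly speaking, #[tokio::main] expands into something like the following (simplified from what the macro actually generates), which is why the main that gdb stops in is not the async body Grace wrote:

use std::error::Error;

fn main() -> Result<(), Box<dyn Error + Send + Sync + 'static>> {
    tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .unwrap()
        .block_on(async {
            // ... the body Grace wrote as `async fn main` ...
            println!("Hello, world!");
            Ok(())
        })
}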

So Grace restarts the debugger, and then asks for a breakpoint on the first line of her function:

(gdb) start
(gdb) break 56
(gdb) continue

And now it stops on the line that she expected:

โ”‚   53                                                                                                                                                                                                                               โ”‚
โ”‚   54          #[tokio::main]                                                                                                                                                                                                       โ”‚
โ”‚   55          pub(crate) async fn main() -> Result<(), Box<dyn Error + Send + Sync + 'static>> {                                                                                                                                   โ”‚
โ”‚B+>56              println!("Hello, world!");                                                                                                                                                                                       โ”‚
โ”‚   57              let record = Box::new(Mutex::new(Record::new()));                                                                                                                                                                โ”‚
โ”‚   58              let record = &*Box::leak(record);                                                                                                                                                                                โ”‚
โ”‚   59                                                                                                                                                                                                                               โ”‚
โ”‚   60              let (tx, mut rx) = channel(100);                                                                                                                                                                                 โ”‚

Grace is now able to use next to walk through the main function. She does notice that the calls to tokio::spawn are skipped over by next, but that's not as much of a surprise to her, since those are indeed function calls that take async blocks as arguments. She sets breakpoints on the first line of each async block so that the debugger will stop when control reaches them as she steps through the code.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • A common usage pattern (hitting next to go to what seems like the next statement) breaks down due to implementation details of #[tokio::main] and async fn.
  • This is one example of where next breaks down, in terms of what a user is likely to want. The other common scenario where the behavior of next is non-ideal is higher-order functions, like option.and_then(|t| { ... }), where someone stepping through the code probably wants next to set a temporary breakpoint in the ... of the closure.

What are the sources for this story?

Personal experience. I haven't acquired the muscle memory to stop using next, even though it breaks down in such cases.

Why did you choose Grace to tell this story?

I needed someone who, like me, would actually be tempted to use gdb even when println debugging is so popular.

How would this story have played out differently for the other characters?

  • Alan might have used whatever debugger is offered by his IDE, which might have the same problem (via a toolbar button with the same semantics as next); but many people using IDEs to debug naturally set breakpoints by hand on the lines in their editor, and thus will not run into this.
  • Most characters would probably have abandoned gdb much sooner. E.g., Grace may have started out by adding println or tracing instrumentation to the code, rather than trying to open it up in a debugger.

๐Ÿ˜ฑ Status quo stories: Grace wants to integrate a C-API

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

The story

Grace is integrating a camera into an embedded project. Grace has done similar projects in the past, and has even used this particular hardware before. Fortunately, the camera manufacturer provides a library in C to interface with the driver.

Grace knows that Rust provides strong memory safety guarantees, and the library provided by the manufacturer sports an API that is easy to misuse. In particular, ownership concerns are tricky and Grace and her team have often complained in the past that making memory mistakes is very easy and one has to be extremely careful to manage lifetimes. Therefore, for this project, Grace opts to start with Rust as many of the pitfalls of the manufacturer's library can be automatically caught by embedding the lifetimes into a lightweight wrapper over code bridged into Rust with bindgen.

Grace's team manages to write a thin Rust wrapper over the manufacturer's library with little complication. This library fortunately offers two interfaces for grabbing frames from the camera: a blocking interface that waits for the next frame, and a non-blocking interface that polls to check if there are any frames currently available and waiting. Grace is tempted to write a callback-based architecture by relying on the blocking interface that waits; however, early the next morning the customer comes back and informs her that they are scaling up the system, and that there will now be 5 cameras instead of 1.

She knows from experience that she cannot rely on having 5 threads blocking just to get camera frames, because the embedded system she is deploying to only has 2 cores total! Her team would be introducing a lot of overhead into the system with the continuous context switching of every thread. Some folks were unsure of Rust's asynchronous capabilities, and with the requirements changing, some argued that maybe they should stick to the tried and true: pure C. However, Grace eventually convinced them that the benefits of memory safety were still applicable, and that a lot of bugs that had taken weeks to diagnose in the past had already been completely wiped out. The team decided to stick with Rust, and dig deeper into implementing this project in async Rust.

Fortunately, Grace notices the similarities between the polling interface in the underlying C library and the Poll type returned by Rust's Future trait. "Surely," she thinks, "I can asynchronously interleave polls to each camera over a single thread, and process frames as they become available!" Such a thing would be quite difficult in C while guaranteeing memory safety was maintained. However, Grace's team has already dodged that bullet thanks to writing a thin wrapper in Rust that manages these tricky lifetimes!

The first problem: polls and wake-ups

Grace sets out to start writing the pipeline to get frames from the cameras. She realizes that while the polling call that the manufacturer provided in their library is similar in nature to a future, it doesn't quite encompass everything. In C, one might have to set some kind of heartbeat timer for polling. Grace explains to her team that this heartbeat is similar to how the Waker object works in a Future's Context type, in that it is how often the execution environment should re-try the future if the call to poll returns Poll::Pending.

A member of Grace's team asks her how she was able to understand all this. After all, Grace had been writing Rust about as long as the rest of her team. The main difference was that she had many more years of systems programming in C and C++ under her belt than they had. Grace responded that for the most part she had just read the documentation for the Future trait, and that she had intuited how async-await de-sugars itself into a regular function that returns a future of some kind. The de-sugaring process was, after all, very similar to how lambda objects in C++ are de-sugared as well. She leaves her teammate with an article she once found online that explained the process in a lot more detail for a problem much harder than they were trying to solve.

Something Grace and her team learn to love immediately about Rust is that writing the Future here does not require her team to write their own execution environment. In fact, the future can be entirely written independently of the execution environment. She quickly writes an async method to represent the polling process:


#![allow(unused)]
fn main() {
/// Gets the next frame from the camera, waiting `retry_after` time until polling again if it fails.
///
/// Returns Some(frame) if a frame is found, or None if the camera is disconnected or goes down before a frame is
/// available.
async fn next_frame(camera: &Camera, retry_after: Duration) -> Option<Frame> {
    while camera.is_available() {
        if let Some(frame) = camera.poll() {
            return Some(frame);
        } else {
            task::sleep_for(retry_after).await;
        }
    }

    None
}
}

The underlying C API doesn't provide any hooks that can be used to wake the Waker object on this future up, so Grace and her team decide that it is probably best if they just choose a sufficiently balanced retry_after period in which to try again. It does feel somewhat unsatisfying, as calling sleep_for feels about as hacky as calling std::this_thread::sleep_for in C++. However, there is no way to directly interoperate with the waker without having a separate thread of execution wake it up, and the underlying C library doesn't have any interface offering a notification for when that should be. In the end, this is the same kind of code that they would write in C, just without having to implement a custom execution loop themselves, so the team decides it is not a total loss.

The second problem: doing this many times

Doing this a single time is fine, but an end goal of the project is to be able to stream frames from the camera for unspecified lengths of time. Grace spends some time searching, and realizes that what she actually wants is a Stream of some kind. Stream objects are the asynchronous equivalent of iterators, and her team wants to eventually write something akin to:


#![allow(unused)]
fn main() {
let mut frame_stream = stream_from_camera(camera, Duration::from_millis(5));

while let Some(frame) = frame_stream.next().await {
    // process frames
}

println!("Frame stream closed.");
}

She scours existing crates, in particular looking for one way to transform the above future into a stream that can be executed many times. The only available option to transform a future into a series of futures is stream::unfold, which seems to do exactly what Grace is looking for. Grace begins by adding a small intermediate type, and then plugging in the remaining holes:


#![allow(unused)]
fn main() {
struct StreamState {
    camera: Camera,
    retry_after: Duration,
}

fn stream_from_camera(camera: Camera, retry_after: Duration) -> Unfold<StreamState, ??, ??> {
    let initial_state = StreamState { camera, retry_after };

    stream::unfold(initial_state, |state| async move {
        let frame = next_frame(&state.camera, state.retry_after).await;
        // stream::unfold expects Option<(item, next_state)>; None ends the stream.
        frame.map(|frame| (frame, state))
    })
}
}

This looks like it mostly hits the mark, but Grace is left with a couple of questions for how to get the remainder of this building:

  1. What is the type that fills in the third template parameter in the return? It should be the type of the future that is returned by the async closure passed into stream::unfold, but we don't know the type of a closure!
  2. What is the type that fills in the second template parameter of the closure in the return?

Grace spends a lot of time trying to figure out how she might find those types! She asks Barbara what the idiomatic way to get around this in Rust would be. Barbara explains that closures don't have nameable types, and that the only way to write this return type is to use the impl keyword.


#![allow(unused)]
fn main() {
fn stream_from_camera(camera: Camera, retry_after: Duration) -> impl Stream<Item = Frame> {
    // same as before
}
}

While Grace was on the correct path, and now her team is able to write the code they want to, she realizes that sometimes writing the types out explicitly can be very hard. She reflects on what it would have taken to write the type of an equivalent function pointer in C, and slightly laments that Rust cannot express such things as clearly.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • Rust was the correct choice for the team across the board thanks to its memory safety and ownership. The underlying C library was just too complex for any single programmer to be able to maintain in their head all at once while also trying to accomplish other tasks.
  • Evolving requirements meant that the team would have had to either start over in plain C, giving up a lot of the safety they would gain from switching to Rust, or explore async code in a more rigorous way.
  • The async code is actually much simpler than writing the entire execution loop in C themselves. However, the assumption that you would write the entire execution loop is baked into the underlying library which Grace's team cannot rewrite entirely from scratch. Integrating Rust async code with other languages which might have different mental models can sometimes lead to unidiomatic or unsatisfying code, even if the intent of the code in Rust is clear.
  • Grace eventually discovered that the problem was best modeled as a stream, rather than as a single future. However, converting a future into a stream was not necessarily something that was obvious for someone with a C/C++ background.
  • Closures and related types can be very hard to write in Rust, and if you are used to being very explicit with your types, tricks such as the impl trick above for Streams aren't immediately obvious at first glance.

What are the sources for this story?

My own personal experience trying to incorporate the Intel RealSense library into Rust.

Why did you choose Grace to tell this story?

  • I am a C++ programmer who has written many event / callback based systems for streaming from custom camera hardware. I mirror Grace in that I am used to using other systems languages, and even rely on libraries in those languages as I've moved to Rust. I did not want to give up the memory and lifetime benefits of Rust because of evolving runtime requirements.
  • In particular, C and C++ do not encourage async-style code, and often involve threads heavily. However, some contexts cannot make effective use of threads. In such cases, C and C++ programmers are often oriented towards writing custom execution loops and writing a lot of logic to do so. Grace discovered the benefit of not having to choose an executor upfront, because the async primitives let her express most of the logic without relying on a particular executor's behaviour.

How would this story have played out differently for the other characters?

  • Alan would have struggled with understanding the embedded context of the problem, where GC'd languages don't see much use.
  • Niklaus and Barbara may not have approached the problem with the same assimilation biases from C and C++ as Grace. Some of the revelations in the story, such as discovering that Grace's team didn't have to write their own execution loop, were unexpected benefits when starting down the path of using Rust!

Could Grace have used another runtime to achieve the same objectives?

Grace can use any runtime, which was an unexpected benefit of her work!

How did Grace know to use Unfold as the return type in the first place?

She saw it in the rustdoc for stream::unfold.

๐Ÿ˜ฑ Status quo stories: Niklaus Builds a Hydrodynamics Simulator

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Problem

Niklaus is a professor of physics at the University of Rustville. He needed to build a tool to solve hydrodynamics simulations; there is a common method for this that subdivides a region into a grid and computes the solution for each grid patch. All the patches in a grid for a point in time are independent and can be computed in parallel, but they are dependent on neighboring patches in the previously computed frame in time. This is a well known computational model and the patterns for basic parallelization are well established.
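To make that dependency structure concrete, here is a heavily simplified one-dimensional sketch; the types are made up for illustration, and the real solver's grids and physics are of course far richer.


// One grid patch's solution values (stand-in type).
#[derive(Clone)]
struct Patch;

fn solve_patch(_left: &Patch, _center: &Patch, _right: &Patch) -> Patch {
    Patch // stand-in for the actual physics
}

// Computes time step N+1 from time step N. Every iteration of this loop is
// independent of the others, which is what makes the problem parallelizable;
// each patch reads only its neighbors from the *previous* step.
fn step(previous: &[Patch]) -> Vec<Patch> {
    (0..previous.len())
        .map(|i| {
            let left = &previous[i.saturating_sub(1)];
            let right = &previous[(i + 1).min(previous.len() - 1)];
            solve_patch(left, &previous[i], right)
        })
        .collect()
}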

Niklaus wanted to write a performant tool to compute the solutions to the simulations of his research. He chose Rust because he needed high performance but he also wanted something that could be maintained by his students, who are not professional programmers. Rust's safety guarantees give him confidence that his results are not going to be corrupted by data races or other programming errors. After implementing the core mathematical formulas, Niklaus began implementing the parallelization architecture.

His first attempt was to emulate a common CFD design pattern: using message passing to communicate between processes that are each assigned a specific patch in the grid. So he assigned one thread to each patch and used messages to communicate solution state to dependent patches. With one thread per patch, this usually meant that there were 5-10x more threads than CPU cores.

This solution worked, but Niklaus had two problems with it. First, it gave him no control over CPU usage, so the solution would greedily use all available CPU resources. Second, using messages to communicate solution values between patches did not scale: when his team added a new feature (tracer particles), the additional messages caused by this change created so much overhead that parallel processing was no faster than serial. So, Niklaus decided to find a better solution.

Solution Path

To address the first problem, Niklaus' new design decoupled the work that needed to be done (solving physics equations for each patch in the grid) from the workers (threads); this would allow him to set the number of threads and not use all the CPU resources. So, he began looking for a tool in Rust that would meet this design pattern. When he read about async and how it allowed the user to define units of work and send those to an executor which would manage the execution of those tasks across a set of workers, he thought he'd found exactly what he needed. He also thought that the .await semantics would give him a much better way of coordinating dependencies between patches. Further reading indicated that tokio was the runtime of choice for async in the community, so he began building a new CFD solver with async and tokio.

After making some progress, Niklaus ran into his first problem. Niklaus had been under a false impression about what async executors do. He had assumed that a multi-threaded executor could automatically move the execution of an async block to a worker thread. When this turned out to be wrong, he went to Stack Overflow and learned that async tasks must be explicitly spawned into a thread pool if they are to be executed on a worker thread. This meant that the algorithm to be parallelized became strongly coupled to both the spawner and the executor. Code that used to cleanly express a physics algorithm now had interspersed references to the task spawner, not only making it harder to understand, but also making it impossible to try different execution strategies, since with Tokio the spawner and executor are the same object (the Tokio runtime). Niklaus felt that a better design for data parallelism would enable better separation of concerns: a group of interdependent compute tasks, and a strategy to execute them in parallel.
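For illustration, here is a minimal sketch of the coupling he describes, using tokio and made-up Patch and Solution types: the parallelism only appears where the algorithm code explicitly calls the spawner.


struct Patch;
struct Solution;

impl Patch {
    fn solve(self) -> Solution {
        Solution // stand-in for the physics kernel
    }
}

// The physics code now references the spawner directly; simply awaiting the
// futures one after another would run the patches sequentially instead.
async fn solve_time_step(patches: Vec<Patch>) -> Vec<Solution> {
    let handles: Vec<_> = patches
        .into_iter()
        .map(|patch| tokio::spawn(async move { patch.solve() }))
        .collect();

    let mut solutions = Vec::new();
    for handle in handles {
        // A tokio JoinHandle resolves to Result<Solution, JoinError>.
        solutions.push(handle.await.expect("solver task panicked"));
    }
    solutions
}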

Niklaus' second problem came as he tried to fully replace the message passing from the first design: sharing data between tasks. He used the async API to coordinate computation of patches so that a patch would only go to a worker when all its dependencies had completed. But he also needed to account for the solution data which was previously passed in the messages. He set up a shared data structure to track the solutions for each patch now that messages would not be passing that data. Learning how to properly use shared data with async was a new challenge. The initial design:


#![allow(unused)]
fn main() {
    // Excerpt from the solver: `stage_map`, `runtime`, and the hydro types
    // are defined in the surrounding application code.
    let mut stage_primitive_and_scalar = |index: BlockIndex, state: BlockState<C>, hydro: H, geometry: GridGeometry| {
        let stage = async move {
            let p = state.try_to_primitive(&hydro, &geometry)?;
            let s = state.scalar_mass / &geometry.cell_volumes / p.map(P::lorentz_factor);
            Ok::<_, HydroError>( ( p.to_shared(), s.to_shared() ) )
        };
        stage_map.insert(index, runtime.spawn(stage).map(|f| f.unwrap()).shared());
    };
}

lacked performance because he needed to clone the value for every task. So, Niklaus switched over to using Arc to keep a thread-safe reference-counted pointer to the shared data. But this change introduced a lot of .map and .unwrap function calls, making the code much harder to read. He realized that managing the dependency graph was not intuitive when using async for concurrency.
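Roughly, the change looked like the sketch below (types invented for illustration): each task receives a cheap clone of an Arc handle instead of a copy of the data, but every access now goes through that handle.


use std::sync::Arc;

struct Solution; // stand-in for the per-patch solution data

fn use_solution(_s: &Solution) {}

fn share_across_tasks(solution: Solution) {
    let shared = Arc::new(solution);

    for _ in 0..4 {
        // Cloning the Arc only bumps an atomic reference count; the
        // underlying Solution is never copied.
        let handle = Arc::clone(&shared);
        std::thread::spawn(move || {
            use_solution(&handle);
        });
    }
}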

As the program matured, a new problem arose: a steep learning curve with error handling. The initial version of his design used panic!s to fail the program if an error was encountered, but the stack traces were almost unreadable. He asked his teammate Grace to migrate over to using the Result idiom for error handling, and Grace found a major inconvenience: Rust's type inference inconsistently breaks when propagating Result in async blocks. Grace frequently found that she had to specify the type of the error when creating a result value:


#![allow(unused)]
fn main() {
Ok::<_, HydroError>( ( p.to_shared(), s.to_shared() ) )  
}

And she could not figure out why she had to add the ::<_, HydroError> to some of the Result values.
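For a self-contained illustration of the inference issue (with HydroError standing in for the real error type): inside a bare async block there is no enclosing signature to pin down the Result's error type, so the turbofish is the only thing constraining it.


#[derive(Debug)]
struct HydroError;

fn build_stage() {
    let stage = async {
        // Nothing here constrains `E` in `Result<_, E>`, so without the
        // `::<_, HydroError>` annotation this expression is ambiguous.
        Ok::<_, HydroError>(42)
    };
    // In an `async fn ... -> Result<i32, HydroError>`, the return type
    // would fix the error type and the annotation would be unnecessary.
    let _ = stage;
}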

Finally, once Niklaus' team began using the new async design for their simulations, they noticed an important issue that impacted productivity: compilation time had now increased to between 30 and 60 seconds. The nature of their work requires frequent changes to code and recompilation and 30-60 seconds is long enough to have a noticeable impact on their quality of life. What he and his team want is for compilation to be 2 to 3 seconds. Niklaus believes that the use of async is a major contributor to the long compilation times.

This new solution works, but Niklaus is not satisfied with how complex his code became after the move to async, nor with the 30-60 second compilation times. The state sharing added a large amount of cruft with Arc, and async is not well suited to scheduling tasks from a dependency graph, so implementing this solution created a key component of his program that was difficult to understand and pervasive. Ultimately, his conclusion was that async is not appropriate for parallelizing computational tasks. He will be trying a new design based upon Rayon in the next version of his application.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

  • async looks to be the wrong choice for parallelizing compute bound/computational work
  • There is a lack of guidance to help people solving such problems get started on the right foot
  • Quality of Life issues (compilation time, type inference on Result) can create a drag on users' ability to focus on their domain problem

What are the sources for this story?

This story is based on the experience of building the kilonova hydrodynamics simulation solver.

Why did you choose Niklaus and Grace to tell this story?

I chose Niklaus as the primary character in this story because this work was driven by someone who only uses programming for a small part of their work. Grace was chosen as a supporting character because of that person's experience with C/C++ programming and to avoid repeating characters.

How would this story have played out differently for the other characters?

  • Alan: there's a good chance he would have already had experience working with either async workflows in another language or doing parallelization of compute bound tasks; and so would already know from experience that async was not the right place to start.
  • Grace: likewise, might already have experience with problems like this and would know what to look for when searching for tools.
  • Barbara: the experience would likely be fairly similar, since the actual subject of this story is quite familiar with Rust by now

๐Ÿ˜ฑ Status quo stories: Niklaus Wants to Share Knowledge

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!

The story

Niklaus, who sometimes goes by the pen name "Starol Klichols", has authored some long-form documentation about Rust that people have found helpful. One could even go so far as to call this documentation a "book".

Niklaus has typically minimized the use of crates in documentation like this as much as possible. Niklaus has limited time to dedicate to keeping the documentation up to date, and given the speed at which the ecosystem sometimes evolves, it's hard to keep up when crates are involved. Also, Niklaus would like to avoid limiting the readership of the documentation to the users of a particular crate only, and would like to avoid any accusations of favoritism.

But Niklaus would really really like to document async to avoid disappointing people like Barbara!

Niklaus was excited about the RFC proposing that block_on be added to the stdlib, because it seemed like that would solve Niklaus' problems. Niklaus would really like to include async in a big update to the documentation. No pressure.

๐Ÿค” Frequently Asked Questions

What are the morals of the story?

Writing documentation to go with the language/stdlib for something that is half in the language/stdlib and half in the ecosystem is hard. This is related to Barbara's story about wanting to get started without needing to pick an executor. There are topics of async that apply no matter what executor you pick, but it's hard to explain those topics without picking an executor to demonstrate with. We all have too much work to do and not enough time.

What are the sources for this story?

Why did you choose Niklaus to tell this story?

Niko said I couldn't add new characters.

How would this story have played out differently for the other characters?

I happen to know that the next version of Programming Rust, whose authors might be described as different characters, includes async and uses async-std. So it's possible to just pick an executor and add async to the book, but I don't wanna.

โœจ Shiny future: Where we want to get to

๐Ÿšง Under construction! Help needed! ๐Ÿšง

We are still in the process of drafting the vision document. The stories you see on this page are examples meant to give a feeling for how a shiny future story looks; you can expect them to change. We encourage you to propose your own by opening a PR -- see the "How to vision" page for instructions and details.

What is this

The "shiny future" is here to tell you what we are trying to build over the next 2 to 3 years. That is, it presents our "best guess" as to what will look like a few years from now. When describing specific features, it also embeds links to design notes that describe the constraints and general plans around that feature.

๐Ÿง You may also enjoy reading the blog post announcing the brainstorming effort.

Think big -- too big, if you have to

You'll notice that the ideas in this document are maximalist and ambitious. They stake out an opinionated position on how the ergonomics of Async I/O should feel. This position may not, in truth, be attainable, and for sure there will be changes along the way. Sometimes the realities of how computers actually work may prevent us from doing all that we'd like to. That's ok. This is a dream and a goal.

We fully expect that the designs and stories described in this document will change as we work towards realizing them. When there are areas of particular uncertainty, we use the Frequently Asked Questions and the design docs to call them out.

Where are the stories?

We haven't written these yet!

โœจ Shiny future stories: template

This is a template for adding new "shiny future" stories. To propose a new shiny future PR, do the following:

  • Create a new file in the shiny_future directory named something like Alan_loves_foo.md or Grace_does_bar_and_its_great.md, and start from the raw source from this template. You can replace all the italicized stuff. :)
  • Do not add a link to your story to the SUMMARY.md file; we'll do it after merging, otherwise there will be too many conflicts.

For more detailed instructions, see the How To Vision: Shiny Future page!

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "shiny future" story submitted as part of the brainstorming period. It is derived from what actual Rust users wish async Rust should be, and is meant to deal with some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as people's needs and desires for async Rust may differ greatly, shiny future stories cannot be wrong. At worst they are only useful for a small set of people or their problems might be better solved with alternative solutions). Alternatively, you may wish to add your own shiny vision story!

The story

Write your story here! Feel free to add subsections, citations, links, code examples, whatever you think is best.

๐Ÿค” Frequently Asked Questions

NB: These are generic FAQs. Feel free to customize them to your story or to add more.

What status quo stories are you retelling?

Link to status quo stories if they exist. If not, that's ok, we'll help find them.

What are the key attributes of this shiny future?

Summarize the main attributes of the design you were trying to convey.

What is the "most shiny" about this future?

Think about Rust's core "value propositions": performance, safety and correctness, productivity. Which benefit the most relative to today?

What are some of the potential pitfalls about this future?

Think about Rust's core "value propositions": performance, safety and correctness, productivity. Are any of them negatively impacted? Are there specific application areas that are impacted negatively? You might find the sample projects helpful in this regard, or perhaps looking at the goals of each character.

Did anything surprise you when writing this story? Did the story go any place unexpected?

The act of writing shiny future stories can uncover things we didn't expect to find. Did you have any new and exciting ideas as you were writing? Realize some complications that you didn't foresee?

What are some variations of this story that you considered, or that you think might be fun to write? Have any variations of this story already been written?

Often when writing stories, we think about various possibilities. Sketch out some of the turning points here -- maybe someone will want to turn them into a full story! Alternatively, if this is a variation on an existing story, link back to it here.

What are some of the things we'll have to figure out to realize this future? What projects besides Rust itself are involved, if any? (Optional)

Often the 'shiny future' stories involve technical problems that we don't really know how to solve yet. If you see such problems, list them here!

โœจ Shiny future stories: Alan switches runtimes

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "shiny future" story submitted as part of the brainstorming period. It is derived from what actual Rust users wish async Rust should be, and is meant to deal with some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as people's needs and desires for async Rust may differ greatly, shiny future stories cannot be wrong. At worst they are only useful for a small set of people or their problems might be better solved with alternative solutions). Alternatively, you may wish to add your own shiny vision story!

The story

Since his early adventures with Async I/O went so well, Alan has been looking for a way to learn more. He finds a job working in Rust. One of the projects he works on is DistriData. Looking at their code, he sees an annotation he has never seen before:

#[humboldt::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let result = std::async_thread::spawn(async move {
        do_something()
    });
    result.await;
    Ok(())
}

He asks Barbara, one of his coworkers, "What is this humboldt::main annotation? What's humboldt?" She answers by explaining to him that Rust's support for async I/O is actually based around an underlying runtime. "Rust gives you a pretty decent runtime by default," she says, "but it's not tuned for our workloads. We wrote our own runtime, which we call humboldt."

Alan asks, "What happens with the various std APIs? For example, I see we are calling std::async_thread::spawn -- when I used that before, it spawned tasks into the default runtime. What happens now?"

Barbara explains that the "async" APIs in std generally execute relative to the current runtime that is in use. "When you call std::async_thread::spawn, it will spawn a task onto the current runtime. It's the same with the routines in std::async_io and so forth. The humboldt::main annotation actually just creates a synchronous main function that initializes the humboldt runtime and launches the first future. When you just write an async fn main without any annotation, the compiler synthesizes the same main function with the default runtime."
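Hypothetically (humboldt being fictional, this is only a sketch of what Barbara describes), the annotation might expand to something like:


// What #[humboldt::main] could generate: a plain synchronous main that
// starts the humboldt runtime and hands it the user's code as the first
// future. `humboldt::runtime()` is a made-up API.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    humboldt::runtime().block_on(async {
        // ...the body of the user's `async fn main` goes here...
        Ok(())
    })
}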

Learning more about Humboldt

Alan sees that some of the networking code that is being used in their application is creating network connections using humboldt APIs:


#![allow(unused)]
fn main() {
use humboldt::network;
}

He asks Barbara, "Why don't we use the std::async_io APIs for that?" She explains that Humboldt makes use of some custom kernel extensions that, naturally enough, aren't part of the std library. "TCP is for rubes," she says, "we are using TTCP -- Turbo TCP." Her mind wanders briefly to Turbo Pascal and she has a brief moment of yearning for the days when computers had a "Turbo" button that changed them from 8 MHz to 12 MHz. She snaps back into the present day. "Anyway, the std::async_io APIs just call into humboldt's APIs via various traits. But we can code directly against humboldt when we want to access the extra capabilities it offers. That does make it harder to change to another runtime later, though."

Integrating into other event loops

Later on, Alan is working on a visualizer front-end that integrates with DistriData to give more details about their workloads. To do it, he needs to integrate with Cocoa APIs and he wants to run certain tasks on Grand Central Dispatch. He approaches Barbara and asks, "If everything is running on humboldt, is there a way for me to run some things on another event loop? How does that work?"

Barbara explains, "That's easy. You just have to use the gcd wrapper crate -- you can find it on crates.io. It implements the runtime traits for gcd and it has a spawn method. Once you spawn your task onto gcd, everything you run within gcd will be running in that context."

Alan says, "And so, if I want to get things running on humboldt again, I spawn a task back on humboldt?"

"Exactly," says Barbara. "Humboldt has a global event loop, so you can do that by just doing humboldt::spawn. You can also just use the humboldt::io APIs directly. They will always use the Humboldt I/O threads, rather than using the current runtime."

Alan winds up with some code that looks like this:


#![allow(unused)]
fn main() {
async fn do_something_on_humboldt() {
    gcd::spawn(async move {
        let foo = do_something_on_gcd();

        let bar = humboldt::spawn(async move {
            do_a_little_bit_of_stuff_on_humboldt();
        });

        combine(foo.await, bar.await);
    });
}
}

๐Ÿค” Frequently Asked Questions

What status quo story or stories are you retelling?

Good question! I'm not entirely sure! I have to go looking and think about it. Maybe we'll have to write some more.

What are the key points you were trying to convey with this shiny future story?

  • There is some way to seamlessly change to a different default runtime to use for async fn main.
  • There is no global runtime, just the current runtime.
  • When you are using this different runtime, you can write code that is hard-coded to it and which exposes additional capabilities.
  • You can integrate multiple runtimes relatively easily, and the std APIs work with whichever is the current runtime.

How do you imagine the std APIs and so forth know the current runtime?

I was imagining that we would add fields to the Context<'_> struct that is supplied to each async fn when it runs. Users don't have direct access to this struct, but the compiler does. If the std APIs return futures, they would gain access to it that way as well. If not, we'd have to create some other mechanism.
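As a purely hypothetical sketch of that idea -- none of these items exist today, and the real design could differ completely:


use std::future::Future;
use std::pin::Pin;

// A hypothetical trait capturing the parts of a runtime that std needs.
trait RuntimeHandle {
    fn spawn(&self, fut: Pin<Box<dyn Future<Output = ()> + Send>>);
}

// The idea from the answer above: Context<'_> would gain a field holding
// such a handle, which compiler-generated futures propagate downward.
// A std API like async_thread::spawn might then be implemented roughly as:
//
//     let runtime = cx.runtime();        // read the handle out of Context
//     runtime.spawn(Box::pin(task));     // defer to the current runtime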

What happens for runtimes that don't support all the features that std supports?

That feels like a portability question. See the (yet to be written) sequel story, "Alan runs some things on WebAssembly". =)

What is Alan most excited about in this future? Is he disappointed by anything?

Alan is excited about how easy it is to get async programs up and running, and he finds that they perform pretty well once he does so, so he's happy.

What is Grace most excited about in this future? Is she disappointed by anything?

Grace is concerned with memory safety and being able to deploy the tricks she knows from other languages. Memory safety works fine here. In terms of tricks she knows and loves, she's happy that she can easily switch to another runtime. The default runtime is good and works well for most things, but for the DistriData project, they really need something tailored just for them. She is also happy she can use the extended APIs offered by humboldt.

What is Niklaus most excited about in this future? Is he disappointed by anything?

Niklaus finds async Rust quite accessible, for the same reasons cited in "Alan's Trust in the Rust Compiler is Rewarded".

What is Barbara most excited about in this future? Is she disappointed by anything?

Depending on the technical details, Barbara may be a bit disappointed by the details of how std interfaces with the runtimes, as that may introduce some amount of overhead. This may not matter in practice, but it could also lead to library authors avoiding the std APIs in favor of writing generics or other mechanisms that are "zero overhead".

What projects benefit the most from this future?

Projects like DistriData really benefit from being able to customize their runtime.

Are there any projects that are hindered by this future?

We have to pay careful attention to embedded projects like MonsterMesh. Some of the most obvious ways to implement this future would lean on dyn types and perhaps boxing, and that would rule out some embedded projects. Embedded runtimes like embassy are also the most different in their overall design and they would have the hardest time fitting into the std APIs (of course, many embedded projects are already no-std, but many of them make use of some subset of the std capabilities through the facade). In general, traits and generic functions in std could lead to larger code size, as well.

What are the incremental steps towards realizing this shiny future?

There are a few steps required to realize this future:

  • We have to determine the core mechanism that is used for std types to interface with the current scheduler.
    • Is it based on dynamic dispatch? Delayed linking? Some other tricks we have yet to invent?
    • Depending on the details, language changes may be required.
  • We have to hammer out the set of traits or other interfaces used to define the parts of a runtime (see below for some of the considerations).
    • We can start with easier cases and proceed to more difficult ones, however.

Does realizing this future require cooperation between many projects?

Yes. We will need to collaborate to define traits that std can use to interface with each runtime, and the runtimes will need to implement those traits. This is going to be non-trivial, because we want to preserve the ability for independent runtimes to experiment, while also preserving the ability to "mix and match" and re-use components. For example, it'd probably be useful to have a bunch of shared I/O infrastructure, or to have utility crates for locks, for running threadpools, and the like. On the other hand, tokio takes advantage of the fact that it owns the I/O types and the locks and the scheduler to do some nifty tricks, and we would want to ensure that remains an option.

โœจ Shiny future stories: Alan's trust in the compiler is rewarded

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "shiny future" story submitted as part of the brainstorming period. It is derived from what actual Rust users wish async Rust should be, and is meant to deal with some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as people's needs and desires for async Rust may differ greatly, shiny future stories cannot be wrong. At worst they are only useful for a small set of people or their problems might be better solved with alternative solutions). Alternatively, you may wish to add your own shiny vision story!

The story

Trust the compiler

Alan has a lot of experience in C#, but in the meantime has created some successful projects in Rust. He has dealt with his fair share of race conditions/thread safety issues during runtime in C#, but is now starting to trust that if his Rust code compiles, he won't have those annoying runtime problems to deal with.

This allows him to squeeze as much performance out of his programs as he wants, because the compiler will stop him when he tries things that could result in runtime problems. After seeing the performance and the lack of runtime problems, he starts to trust the compiler more and more with each project finished.

He knows where he stands with external libraries, too: he does not need to fear concurrency issues, because if a library cannot safely be used from multiple threads, the compiler will tell him.

His trust in the compiler solidifies further the more he codes in Rust.

The first async project

Alan now starts with his first async project. He opens up the Rust book to the "Async I/O" chapter and it guides him to writing his first program. He starts by writing some synchronous code to write to the file system:

use std::fs::File;
use std::io::Write;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = File::create("a.txt")?;
    file.write_all(b"Hello, world!")?;
    Ok(())
}

Next, he adapts that to run in an async fashion. He starts by converting main into async fn main:

use std::fs::File;
use std::io::Write;

async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = File::create("a.txt")?;
    file.write_all(b"Hello, world!")?;
    Ok(())
}

The code compiles, but he gets a warning:

warning: using a blocking API within an async function
 --> src/main.rs:5:20
1 | use std::fs::File;
  |     ------------- try changing to `std::async_fs::File`
  | ...
5 |     let mut file = File::create("a.txt")?;
  |                    ^^^^^^^^^^^^ blocking functions should not be used in async fn
help: try importing the async version of this type
 --> src/main.rs:1
1 | use std::async_fs::File;

"Oh, right," he says, "I am supposed to use the async variants of the APIs." He applies the suggested fix in his IDE, and now his code looks like:

use std::async_fs::File;

async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = File::create("a.txt")?;
    file.write_all(b"Hello, world!")?;
    Ok(())
}

His IDE recompiles instantaneously and he now sees two little squiggles, one under each ?. Clicking on the errors, he sees:

error: missing await
 --> src/main.rs:4:41
4 |     let mut file = File::create("a.txt")?;
  |                                         ^ returns a future, which requires an await
help: try adding an await
 --> src/main.rs:4
4 |     let mut file = File::create("a.txt").await?;

He again applies the suggested fix, and his code now shows:

use std::async_fs::File;

async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = File::create("a.txt").await?;
    file.write_all(b"Hello, world!").await?;
    Ok(())
}

Happily, it compiles, and when he runs it, everything works as expected. "Cool," he thinks, "this async stuff is pretty easy!"

Making some web requests

Next, Alan decides to experiment with some simple web requests. This isn't part of the standard library, but the fetch_rs package is listed in the Rust book. He runs cargo add fetch_rs to add it to his Cargo.toml and then writes:

use std::async_fs::File;
use fetch_rs;

async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = File::create("a.txt").await?;
    file.write_all(b"Hello, world!").await?;

    let body = fetch_rs::get("https://www.rust-lang.org")
        .await?
        .text()
        .await?;
    println!("{}", body);

    Ok(())
}

This feels pretty easy!

๐Ÿค” Frequently Asked Questions

What status quo story or stories are you retelling?

What are the key points you were trying to convey with this shiny future story?

  • Getting started with async should be as automated as possible:
    • change main to an async fn;
    • use the APIs found in modules like std::async_foo, which should map as closely as possible to their non-async equivalents.
  • You should get some sort of default runtime that is decent
  • Lints should guide you in using async:
    • identifying blocking functions
    • identifying missing await
  • You should be able to grab libraries from the ecosystem and they should integrate with the default runtime without fuss

Is there a "one size fits all" runtime in this future?

This particular story doesn't talk about what happens when the default runtime isn't suitable. But you may want to read its sequel, "Alan Switches Runtimes".

What is Alan most excited about in this future? Is he disappointed by anything?

Alan is excited about how easy it is to get async programs up and running. He also finds the performance is good. He's good.

What is Grace most excited about in this future? Is she disappointed by anything?

Grace is happy because she is getting strong safety guarantees and isn't getting surprising runtime panics when composing libraries. The question of whether she's able to use the tricks she knows and loves is a good one, though. The default scheduler may not optimize for maximum performance -- this is something to explore in future stories. The "Alan Switches Runtimes", for example, talks more about the ability to change runtimes.

What is Niklaus most excited about in this future? Is he disappointed by anything?

Niklaus is quite happy. Async Rust is fairly familiar and usable for him. Further, the standard library includes "just enough" infrastructure to enable a vibrant crates-io ecosystem without centralizing everything.

What is Barbara most excited about in this future? Is she disappointed by anything?

Barbara quite likes that the std APIs for sync and async fit together, and that there is a consistent naming scheme across them. She likes that there is a flourishing ecosystem of async crates that she can choose from.

What projects benefit the most from this future?

A number of projects benefit:

  • Projects like YouBuy are able to get up and going faster.
  • Libraries like SLOW become easier because they can target the std APIs and there is a defined plan for porting across runtimes.

Are there any projects that are hindered by this future?

It depends on the details of how we integrate other runtimes. If we wound up with a future where most libraries are "hard-coded" to a single default runtime, this could very well hinder any number of projects, but nobody wants that.

What are the incremental steps towards realizing this shiny future?

This question can't really be answered in isolation, because so much depends on the story for how we integrate with other runtimes. I don't think we can accept a future where there is literally a single runtime that everyone has to use, but I wanted to pull out the question of "non-default runtimes" (as well as more details about the default) to other stories.

Does realizing this future require cooperation between many projects?

Yes. For external libraries like fetch_rs to interoperate they will want to use the std APIs (and probably traits).

โœจ Shiny future stories: Barbara enjoys her async-sync-async sandwich :sandwich:

Alternative titles:

  • Barbara enjoys her async-sync-async sandwich :sandwich:
  • Barbara recursively blocks
  • Barbara blocks and blocks and blocks

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "shiny future" story submitted as part of the brainstorming period. It is derived from what actual Rust users wish async Rust should be, and is meant to deal with some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as people's needs and desires for async Rust may differ greatly, shiny future stories cannot be wrong. At worst they are only useful for a small set of people or their problems might be better solved with alternative solutions). Alternatively, you may wish to add your own shiny vision story!

The story

Barbara wants to customize a permissions lookup when accepting requests. The library defines a trait PermitRequest, to allow the user to define their own rules. Nice!


#![allow(unused)]
fn main() {
trait PermitRequest {
    fn permit(&self, req: &Request) -> bool;
}
}

She starts small, to get her feet wet.


#![allow(unused)]
fn main() {
struct Always;

impl PermitRequest for Always {
    fn permit(&self, _: &Request) -> bool {
        true
    }
}
}

All requests are permitted! Simple, but now to actually implement the permissions logic.

One of the basic rules Barbara has is to check the request for the existence of a header, but the function is written as async, since Barbara figured it might need to be eventually.


#![allow(unused)]
fn main() {
async fn req_has_header(req: &Request) -> bool {
    req.headers().contains_key("open-sesame")
}
}

When Barbara goes to implement the PermitRequest trait, she realizes a problem: the trait did not think permissions would require an async lookup, so its method is not async. Barbara tries the easiest thing first, hoping that she can just block on the future.


#![allow(unused)]
fn main() {
struct HasHeader;

impl PermitRequest for HasHeader {
    fn permit(&self, req: &Request) -> bool {
        task::block_on(req_has_header(req))
    }
}
}

When Barbara goes to run the code, it works! Even though she was already running an async runtime at the top level, trying to block on this task didn't panic or deadlock. This is because the runtime optimistically hoped the future would complete without needing to put the thread to sleep, and so when it found the currently running runtime, it re-used it to run the future.

The compiler does emit a warning, thanks to a blocking lint (link to shiny future when written). It let Barbara know this could have performance problems, but she accepts the trade offs and just slaps a #[allow(async_blocking)] attribute in there.

Barbara, now energized that things are looking good, writes up the other permission strategy for her application. It needs to fetch some configuration from another server based on a request header, and to keep it snappy, she limits it with a timeout.


#![allow(unused)]
fn main() {
struct FetchConfig;

impl PermitRequest for FetchConfig {
    fn permit(&self, req: &Request) -> bool {
        let token = req.headers().get("authorization");
        
        #[allow(async_blocking)]
        task::block_on(async {
            select! {
                resp = fetch::get(CONFIG_SERVER).param("token", token) => {
                    resp.status() == 200
                },
                _ = time::sleep(2.seconds()) => {
                    false
                }
            }
        })
    }
}
}

This time, there's no compiler warning, since Barbara was ready for that. And running the code, it works as expected. The runtime was able to reuse the IO and timer drivers, and not need to disrupt other tasks.

However, the runtime chose to emit a runtime log at the warning level, informing her that while it was able to make the code work, it could have degraded behavior if the same parent async code were waiting on this and another async block, such as via join!. In the first case, since the async code was ready immediately, no actual harm could have happened. But this time, since it had to block the task waiting on a timer and IO, the log was emitted.
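The hazard the log describes looks roughly like the sketch below, using futures::executor::block_on as a stand-in for the story's generic task::block_on:


use futures::executor::block_on;
use futures::join;

async fn slow_io() {}
async fn other_work() {}

async fn parent() {
    join!(
        async {
            // block_on runs slow_io to completion on this thread, so nothing
            // else on the thread makes progress in the meantime...
            block_on(slow_io());
        },
        async {
            // ...meaning this branch cannot be polled until the block_on
            // above returns, even though join! normally interleaves both.
            other_work().await;
        }
    );
}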

Thanks to the runtime warning, Barbara does some checking that the surrounding code won't be affected, and once sure, is satisfied that it was easier than she thought to make an async-sync-async sandwich.

๐Ÿค” Frequently Asked Questions

What status quo stories are you retelling?

While this story isn't an exact re-telling of an existing status quo story, it covers the morals of a couple of them.

What are the key attributes of this shiny future?

  • block_on tries to be forgiving and optimistic of nested usage.
    • It does a best effort to "just work".
  • But at the same time, it provides information to the user that it might not always work out.
    • A compiletime lint warns about the problem in general.
      • This prods a user to try to use .await instead of block_on if they can.
    • A runtime log warns when the usage could have reacted badly with other code.
      • This gives the user some more information if a specific combination degrades their application.

What is the "most shiny" about this future?

It significantly increases the areas where block_on "just works", which should improve productivity.

What are some of the potential pitfalls about this future?

  • While this shiny future tries to be more forgiving when nesting block_on, the author couldn't think of a way to completely remove the potential dangers therein.
  • By making it easier to nest block_on, it might increase the times a user writes code that degrades in performance.
    • Some runtimes would purposefully panic early to try to encourage users to pick a different design that wouldn't degrade.
    • However, by keeping the warnings, hopefully users can evaluate the risks themselves.


Did anything surprise you when writing this story? Did the story go any place unexpected?

No.

What are some variations of this story that you considered, or that you think might be fun to write? Have any variations of this story already been written?

A variation would be an even more optimistic future, where we are able to come up with a technique to completely remove all possible bad behaviors with nested block_on. The author wasn't able to think of how, and it seems like the result would be similar to just being able to .await in every context, possibly implicitly.

What are some of the things we'll have to figure out to realize this future? What projects besides Rust itself are involved, if any? (Optional)

  • A runtime would need to be modified to be able to look up, through a thread-local or similar, whether a runtime instance is already running.
  • A runtime would need some sort of block_in_place mechanism (see the sketch after this list).
  • We could make a heuristic to guess when block_in_place would be dangerous.
    • If the runtime knows the task's waker has been cloned since the last time it was woken, then probably the task is doing something like join! or select!.
    • Then we could emit a warning like "nested block_on may cause problems when used in combination with join! or select!"
    • The heuristic wouldn't work if the nested block_on were part of the first call of a join!/select!.
    • Maybe a warning regardless is a good idea.
    • Or a lint, that a user can #[allow(nested_block_on)], at their own peril.
  • This story uses a generic task::block_on, to not name any specific runtime. It doesn't specifically assume that this could work cross-runtimes, but maybe a shinier future would assume it could?
  • This story refers to a lint in a proposed different shiny future, which is not yet written.
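Here is a minimal sketch of the lookup-and-reenter idea from the list above; every name in it is hypothetical.


use std::cell::Cell;
use std::future::Future;

thread_local! {
    // A runtime would set this while driving tasks on the current thread.
    static INSIDE_RUNTIME: Cell<bool> = Cell::new(false);
}

// A forgiving block_on: if a runtime is already running on this thread,
// re-enter it (the block_in_place mechanism) instead of panicking or
// deadlocking by starting a second executor.
fn block_on<F: Future>(fut: F) -> F::Output {
    if INSIDE_RUNTIME.with(|flag| flag.get()) {
        reenter_current_runtime(fut)
    } else {
        run_on_fresh_executor(fut)
    }
}

fn reenter_current_runtime<F: Future>(_fut: F) -> F::Output {
    todo!("hypothetical: hand the future to the already-running executor")
}

fn run_on_fresh_executor<F: Future>(_fut: F) -> F::Output {
    todo!("hypothetical: start a one-off executor for this call")
}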

โœจ Shiny future stories: Barbara makes a wish

๐Ÿšง Warning: Draft status ๐Ÿšง

This is a draft "shiny future" story submitted as part of the brainstorming period. It is derived from what actual Rust users wish async Rust should be, and is meant to deal with some of the challenges that Async Rust programmers face today.

If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as people's needs and desires for async Rust may differ greatly, shiny future stories cannot be wrong. At worst they are only useful for a small set of people or their problems might be better solved with alternative solutions). Alternatively, you may wish to add your own shiny vision story!

The story

Barbara has an initial prototype of a new service she wrote in sync Rust. Since the service is extremely I/O bound, and her benchmarks have led her to believe that performance is being left on the table, she decides to port it to async Rust.

She does this by sprinkling async/.await everywhere, picking an executor, and moving dependencies from sync to async.

Once she has the program compiling, she thinks "oh that was easy". She runs it for the first time and surprisingly she finds out that when hitting an endpoint, nothing happens.

Barbara, always prepared, has already added logging to her service and she checks the logs. As she expected, she sees that the endpoint handler has been invoked but then... nothing. Barbara exclaims, "Oh no! This was not what I was expecting, but let's dig deeper."

She checks the code and sees that the endpoint spawns several tasks, but unfortunately those tasks don't have much logging in them.

Barbara now remembers hearing something about a wish4-async-insight crate, which has gotten some buzz on her Rust-related social media channels. She decides to give that a shot.

She adds the crate as a dependency to her Cargo.toml, renaming it to just insight to make it easier to reference in her code, and then initializes it in her main async function.


#![allow(unused)]
fn main() {
async fn accept_loop(addr: impl ToSocketAddrs) -> Result<()> {
    insight::init(); // new code
    ...
}
}

Barbara rebuilds and runs her program again. She doesn't see anything different in the terminal output for the program itself though, and the behavior is the same as before: hitting an endpoint, nothing happens. She double-checks the readme for the wish4-async-insight crate, and realizes that she needs to connect other programs to her service to observe the insights being gathered. Barbara decides that she wants to customize the port that insight is listening on before she starts her experiments with those programs.


#![allow(unused)]
fn main() {
async fn accept_loop(addr: impl ToSocketAddrs) -> Result<()> {
    insight::init(listen_port => 8080); // new code, leveraging keyword arguments feature added in 2024
    ...
}
}

While her code rebuilds, Barbara investigates what programs she might use to connect to the insight crate.

One such program, consolation, can run in the terminal. Barbara is currently just deploying her service locally on her development box, so she opts to try that out and see what it tells her.

% rustup install wish4-consolation
...
% consolation --port 8080

This brings up a terminal window that looks similar to the Unix top program, except that instead of a list of OS processes, this offers a list of tasks, with each task having a type, ID, and status history (i.e. percentage of time spent in running, ready to poll, or blocked). Barbara skims the output in the list, and sees that one task is listed as currently blocked.

Barbara taps the arrow-keys and sees that this causes a cursor to highlight different tasks in the list. She highlights the blocked task and hits the Enter key. This causes the terminal to switch to a Task view, describing more details about that task and its status.

The Task view here says that the task is blocked, references a file and line number, and also includes the line from the source code, which says chan.send(value).await. The blocked task also lists the resources that the task is waiting on: prototype_channel, and next to that there is text on a dark red background: "waiting on channel capacity." Again, Barbara taps the arrow-keys and sees that she can select the line for the resource.

Barbara notices that this whole time, at the bottom of the terminal, there was a line that says "For help, hit ? key"; she taps question mark. This brings up a help message in a scrollable subwindow explaining the task view in general as well as link to online documentation. The help message notes that the user can follow the chain: One can go from the blocked task to the resource it's waiting on, and from that resource to a list of tasks responsible for freeing up the resource.

Barbara hits the Escape key to close the help window. The highlight is still on the line that says "prototype_channel: waiting on channel capacity"; Barbara hits Enter, and this brings up a list with just one task on it: The channel reader task. Barbara realizes what this is saying: The channel resource is blocking the sender because it is full, and the only way that can be resolved is if the channel reader manages to receive some inputs from the channel.

Barbara opens the help window again, and brings up the link to the online documentation. There, she sees discussion of resource starvation and the specific case of a bounded channel being filled up before its receiver makes progress. The main responses outlined there are 1. decrease the send rate, 2. increase the receive rate, or 3. increase the channel's internal capacity, noting the extreme approach of changing to an unbounded channel (with the caveat that this risks resource exhaustion).
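Barbara's quick fix, sketched with a tokio-style mpsc channel (the Work type is made up):


use tokio::sync::mpsc;

struct Work;

fn quick_fix() {
    // Before: a bounded channel; `send` waits whenever all 16 slots are full.
    let (_tx, _rx) = mpsc::channel::<Work>(16);

    // After: an unbounded channel; sends always succeed immediately, at the
    // risk of unbounded memory growth (the caveat from option 3 above).
    let (_tx, _rx) = mpsc::unbounded_channel::<Work>();
}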

Barbara skims the task view for the channel reader, since she wants to determine why it is not making progress. However, she is eager to see if her service as a whole is workable apart from this issue, so she also adopts the quick fix of swapping in an unbounded channel. Barbara is betting that if this works, she can use the data from wish4-async-insight about the channel sizes to put a bounded channel with an appropriate size in later.

Barbara happily moves along to some initial performance analysis of her "working" code, eager to see what other things wish4-async-insight will reveal during her explorations.

Alternate History

The original status quo story just said that Barbara's problem was resolved (sort of) by switching to an unbounded channel. I, much like Barbara, could not tell why this resolved her problem. In particular, I could not tell whether there was an outright deadlock due to a cycle in the task-resource dependency chain, or if there was something more subtle happening. In the story above, I assumed it was the second case: something subtle.

Here's an important alternate history though, for the first case of a cycle. It's the same story, right up to when Barbara first runs consolation:

% rustup install wish4-consolation
...
% consolation --port 8080

This brings up a terminal window that looks similar to the Unix top program, except that instead of a list of OS processes, this offers a list of tasks, and shows their status (i.e. running, ready to poll, or blocked), as well as some metrics about how long the tasks spend in each state.

At the top of the screen, Barbara sees highlighted warning: "deadlock cycle was detected. hit P for more info."

Barbara types capital P. The terminal switches to "problem view," which shows

  • The task types, ID, and attributes for each type.
  • The resources being awaited on
  • The location / backtrace of the await.
  • A link to a documentation page expanding on the issue.

The screen also says "hit D to generate a graphviz .dot file to disk describing the cycle."

Barbara hits D and stares at the resulting graph, which shows a single circle (labelled "task"), and an arc to a box (labelled "prototype_channel"), and an arc from that box back to the circle. The arc from the circle to the box is labelled "send: waiting on channel capacity", and the arc from the box to the circle is labelled "sole receiver (mpsc channel)".

graph TD
  task -- "send: waiting on channel capacity" --> prototype_channel
  prototype_channel -- "sole receiver (mpsc channel)" --> task
  task((task))

Barbara suddenly realizes her mistake: She had constructed a single task that was sometimes enqueuing work (by sending messages on the channel), and sometimes dequeuing work, but she had not put any controls into place to ensure that the dequeuing (via recv) would get prioritized as the channel filled up.

Barbara reflects on the matter: she knows that she could swap in an unbounded channel to resolve this, but she thinks that she would be better off thinking a bit more about her system design, to see if she can figure out a way to supply back-pressure so that the send rate will go down as the channel fills up.

๐Ÿค” Frequently Asked Questions

What status quo story or stories are you retelling?

Barbara wants Async Insights

What is Alan most excited about in this future? Is he disappointed by anything?

Alan is happy to see a tool that gives one a view into the internals of the async executor.

Alan is not so thrilled about using the consolation terminal interface; but luckily there are other options, namely IDE/editor plugins as well as a web-browser based client, that offer even richer functionality, such as renderings of the task/resource dependency graph.

What is Grace most excited about in this future? Is she disappointed by anything?

Grace is happy to see a tool, but wonders whether it could have been integrated into gdb.

Grace is not so thrilled to learn that this tool is not going to try to provide specific insight into performance issues that arise solely from computational overheads in her own code. (The readme for wish4-async-insight says on this matter "for that, use perf," which Grace finds unsatisfying.)

What is Niklaus most excited about in this future? Is he disappointed by anything?

Niklaus is happy to learn that the wish4-async-insight is supported by both async-std and tokio, since he relies on friends in both communities to help him learn more about Async Rust.

Niklaus is happy about the tool's core presentation oriented around abstractions he understands (tasks and resources). Niklaus is also happy about the integrated help.

However, Niklaus is a little nervous about some of the details in the output that he doesn't understand.

What is Barbara most excited about in this future? Is she disappointed by anything?

Barbara is thrilled with how this tool has given her insight into the innards of the async executor she is using.

She is disappointed to learn that not every async executor supports the wish4-async-insight crate. The crate works by monitoring state changes within the executor, instrumented via the tracing crate. Not every async-executor is instrumented in a fashion compatible with wish4-async-insight.

What projects benefit the most from this future?

Any async codebase that can hook into the wish4-async-insight crate and supply its data via a network port during development would benefit from this. So, I suspect any codebase that uses a sufficiently popular (i.e. appropriately instrumented) async executor will benefit.

The main exception I can imagine right now is MonsterMesh: its resource constraints and #![no_std] environment are almost certainly incompatible with the needs of the wish4-async-insight crate.

Are there any projects that are hindered by this future?

The only "hindrance" is that the there is an expectation that the async-executor be instrumented appropriately to feed its data to the wish4-async-insight crate once it is initialized.

What are the incremental steps towards realizing this shiny future? (Optional)

  • Get the tracing crate to 1.0 so that async executors can rely on it.

  • Prototype an insight console atop a concrete async executor (e.g. tokio)

  • Develop a shared protocol atop tracing that compatible async executors will use to provide the insightful data.

Does realizing this future require cooperation between many projects? (Optional)

Yes. Yes it does.

At the very least, as mentioned among the "incremental steps", we will need a common protocol that the async executors use to communicate their internal state.

๐Ÿ“… The roadmap: what we're doing in 2021

This page describes the current plans for 2021. It is updated on a monthly basis.

๐Ÿ›‘ Not time for this yet ๐Ÿ›‘

We're not really ready to work on this section yet. We're still focused on writing out the status quo. What you see here are really just placeholders to give you the idea of what this section might look like.

Key

| Emoji | Meaning |
|-------|---------|
| 🥬 | "Healthy" -- on track with the plan as described in the doc |
| ✏️ | "Planning" -- still figuring out the plan |
| 🤒 | "Worried" -- things are looking a bit tricky, plans aren't working out |
| 🏖️ | "On vacation" -- taking a break right now |
| ⚰️ | We gave up on this idea =) |

Roadmap items

| Plan | Owner | Status | Last updated |
|------|-------|--------|--------------|
| Async functions in traits | nikomatsakis | 🥬 | 2021-02 |

๐Ÿ” Triage meetings

When, where

The weekly triage meeting is held on Zulip at 13:00 US Eastern time on Fridays (see the Google Calendar event for the meeting).

So you want to fix a bug?

If you're interested in fixing bugs, there is no need to wait for the triage meeting. Take a look at the mentored async-await bugs that have no assignee. Every mentored bug should have a few comments. If you see one you like, you can add the @rustbot claim comment to the bug and start working on it! Feel free to reach out to the mentor on Zulip to ask questions.

Project board

The project board tracks various bugs and other work items for the async foundation group. It is used to drive the triage process.

Triage process

In our weekly triage meetings, we take new issues labeled A-async-await and categorize them. The process is:

  • Review the project board, from right to left:
    • Look at what got Done, and celebrate! 🎉
    • Review In progress issues to check we are making progress and there is a clear path to finishing (otherwise, move to the appropriate column)
    • Review Blocked issues to see if there is anything we can do to unblock
    • Review Claimed issues to see if they are in progress, and if the assigned person still intends to work on it
    • Review To do issues and assign to anyone who wants to work on something
  • Review uncategorized issues
    • Mark P-low, P-medium, or P-high
    • Add P-high and assigned E-needs-mentor issues to the project board
    • Mark AsyncAwait-triaged
  • If there's still a shortage of To do issues, review the list of P-medium or P-low issues for candidates

Mentoring

If an issue is a good candidate for mentoring, mark E-needs-mentor and try to find a mentor.

Mentors assigned to issues should write up mentoring instructions. Often, this is just a couple lines pointing to the relevant code. Mentorship doesn't require intimate knowledge of the compiler, just some familiarity and a willingness to look around for the right code.

After writing instructions, mentors should un-assign themselves, add E-mentor, and remove E-needs-mentor. On the project board, if a mentor is assigned to an issue, it should go to the Claimed column until mentoring instructions are provided. After that, it should go to To do until someone has volunteered to work on it.

๐Ÿ”ฌ Design documents

The design documents (or "design docs", more commonly) describe potential designs. These docs vary greatly in terms of their readiness to be implemented:

  • Early on, they describe a vague idea for a future. Often this takes the shape of capturing constraints on the solution, rather than the solution itself.
  • When a feature is getting ready to ship, they can evolve into a full blown RFC, with links to tracking issues or other notes.

Early stage design docs

In the early stages, design docs are meant to capture interesting bits of "async design space". They are often updated to capture the results of a fruitful conversation or thread which uncovered constraints or challenges in solving a particular problem. They will capture a combination of the following:

  • use cases;
  • interesting aspects to the design;
  • alternatives;
  • interactions with other features.

Late stage design docs

As a design progresses, the doc should get more and more complete, until it becomes something akin to an RFC. (Often, at that point, we will expand the design document into a directory, adding an actual RFC draft and other contents; those things can live in this repo or elsewhere, depending.) Once we decide to put a design doc onto the roadmap, it will also contain links to tracking issues or other places to track the status.

โš ๏ธ Yield-safe lint

Use-case

Some types should not be held across a yield point. A typical example is a MutexGuard:

use std::sync::Mutex;

async fn example(x: &Mutex<u32>) {
    let mut data = x.lock().unwrap();
    something().await; // the MutexGuard is held across this yield point
    *data += 1;
}

async fn something() { }

In practice, many of these issues are avoided because MutexGuard is not Send, but single-threaded runtimes still hit them.
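The fix such a lint would nudge users toward is to end the guard's scope before the yield point. A minimal sketch, reusing something from the example above:

async fn example_fixed(x: &std::sync::Mutex<u32>) {
    // End the guard's scope before the yield point.
    {
        let mut data = x.lock().unwrap();
        *data += 1;
    }
    something().await; // no guard is held across this await
}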

Types where this would apply

  • MutexGuard for mutexes, read-write locks
  • Guards for ref-cells
  • Things that might use these types internally and wish to bubble that property up
  • The #[must_use] attribute on types is closely related; we would want the two designs to work closely together.
  • Non-async-friendly functions like sleep or task::block_on.

โ˜” Stream trait

Trait definition

pub trait Stream {
    type Item;

    fn poll_next(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Self::Item>>;

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        (0, None)
    }
}
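To make the Pin-based shape concrete, here is a minimal hand-written implementor of the trait above (Counter is an illustrative type, and this assumes the Stream trait definition above is in scope):

use std::pin::Pin;
use std::task::{Context, Poll};

struct Counter {
    count: usize,
    limit: usize,
}

impl Stream for Counter {
    type Item = usize;

    fn poll_next(
        self: Pin<&mut Self>,
        _cx: &mut Context<'_>,
    ) -> Poll<Option<usize>> {
        // Counter contains no self-references, so it is Unpin and we
        // can safely get a plain &mut out of the Pin.
        let this = self.get_mut();
        if this.count < this.limit {
            this.count += 1;
            Poll::Ready(Some(this.count))
        } else {
            Poll::Ready(None)
        }
    }
}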

Concerns

Poll-based design

  • You have to think about Pin if you implement this trait.
  • Combinators can be more difficult.
  • One solution: generator syntax.

Attached streams are commonly desired

Sometimes streams need to reuse internal storage (Discussion).
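One way to express an attached (or "lending") stream, sketched here assuming generic associated types; the trait and its names are illustrative, not a settled design:

use std::pin::Pin;
use std::task::{Context, Poll};

// The item can borrow from the stream itself, so internal storage
// (e.g., a reused buffer) can be handed out by reference.
trait AttachedStream {
    type Item<'a>
    where
        Self: 'a;

    fn poll_next<'a>(
        self: Pin<&'a mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Self::Item<'a>>>;
}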

Combinators

  • Currently the combinators are provided by the StreamExt extension trait (e.g., in the futures crate).
  • In some cases, this is because of the lack of async closures support.
    • Also serves as a "semver barrier".
    • Also no-std compatibility.
  • One question: what combinators (if any) to include when stabilizing?
    • e.g., poll_next_unpin can make working with pin easier, albeit at a loss of generality
      • folks who are new to pinning could use this method, and it can help us guide the diagnostics by suggesting Box::pin

โšก Generator syntax

  • It would be useful to be able to write a function that returns an iterator or (in the async context) a stream
  • The basic shape might be (modulo bikeshedding) gen fn that contains yield
  • Some question marks:
    • How general of a mechanism do we want?
      • Just target iterators and streams, or shoot for something more general?
  • Some of the question marks that arise if you go beyond iterators and streams:
    • Return values that are not unit
    • Have yield return a value that is passed by the caller of next ("resume args")
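For contrast, here is roughly what one writes today to return a simple iterator without generator syntax, using std::iter::from_fn; a gen fn would let the yield appear directly in the body (the names here are illustrative):

// A hand-rolled "generator" via std::iter::from_fn.
fn evens(limit: usize) -> impl Iterator<Item = usize> {
    let mut n = 0;
    std::iter::from_fn(move || {
        if n < limit {
            let current = n;
            n += 2;
            Some(current)
        } else {
            None
        }
    })
}

// evens(10).collect::<Vec<_>>() == vec![0, 2, 4, 6, 8]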

๐Ÿ“ AsyncRead, AsyncWrite traits

๐Ÿงฌ Async fn in traits

General goal

trait Foo {
    // Currently disallowed:
    async fn bar();
}

Concerns

How to name the resulting future

If you want to name the future that results from calling bar (or any other async fn in a trait), you currently can't.

The same is true for ordinary functions that return impl Trait, like fn bar() -> impl Trait.

Requiring Send on futures

Relevant thread

async fn foo() {}

// desugars to:
fn foo() -> impl Future<Output = ()> { async {} } // resulting type is Send if it can be

// an alternative desugaring we chose not to adopt would require Send:
fn foo() -> impl Future<Output = ()> + Send { async {} }

If I want to constrain the future I get back from a method, it is difficult to do without a name:

trait Service {
    async fn request(&self);
}

fn parallel_service<S: Service>()
where
    S::Future: Send,
{
    ...
}
  • Should this be solved at the impl Trait layer?
  • Or should we specialize something for async functions?
  • If there are many associated types, it would be nice to have some shorthand.

Example use case: the Service trait

trait Service {
    type Future: Future<Output = Response>;

    fn request(&self, ...) -> Self::Future;
}

impl Service for MyService {
    type Future = impl Future<Output = Response>;

    fn request(&self) -> Self::Future {
        async move { .. }
    }
}
  • Dependent on impl Trait in type aliases; see the lang-team repo

Example use case: capturing lifetimes of arguments

trait MyMethod {
    async fn foo(&self);
}
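A sketch of the manual desugaring this implies, assuming generic associated types so the returned future can capture the &self lifetime (FooFuture is an invented name):

use std::future::Future;

trait MyMethodDesugared {
    // A generic associated type lets the future borrow from &self.
    type FooFuture<'a>: Future<Output = ()> + 'a
    where
        Self: 'a;

    fn foo(&self) -> Self::FooFuture<'_>;
}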

๐Ÿค” Frequently Asked Questions

What do people say about this to their friends on twitter?

  • (Explain your key points here)

๐Ÿ”’ Mutex (future-aware)

Description of various challenges with async mutexes

๐Ÿ“บ Async aware channels

๐Ÿ“ฆ Async closures

๐Ÿค Join combinator

๐Ÿคทโ€โ™€๏ธ Select combinator

๐Ÿšฐ Sink trait

๐ŸŽ‡ Async main

What is it?

Motivation

Frequently Asked Questions

๐Ÿ—‘๏ธ Async drop

โ™ป๏ธ Async lifecycle

โณ Completion-based futures

Notes on io_uring

๐Ÿ’ฌ Conversations

This section contains notes and summaries from conversations that we have had with people who are using Rust and async, describing their experiences. These conversations and links are used as "evidence" when building the "status quo" section.

Not exhaustive nor mandatory

This section is not meant to be an "exhaustive list" of all sources. That would be impossible. Many conversations are short, not recorded, and hard to summarize. Others are subject to NDA. We certainly don't require that all claims in the status quo section are backed by evidence found here. Still, it's useful to have a place to dump notes and things for future reference.

๐Ÿฆ 2021-02-12 Twitter thread

Notes taken from the thread in response to Niko's tweet.

  • Enzo
    • A default event loop. "choosing your own event loop" takes time, then you have to understand the differences between each event loop etc.
    • Standard way of doing for await (variable of iterable) would be nice.
    • Standard promise combinators.
  • creepy_owlet
    • https://github.com/dtantsur/rust-osauth/blob/master/src/sync.rs
  • async trait --
    • https://twitter.com/jcsp_tweets/status/1359820431151267843
    • "I thought async was built-in"?
    • nasty compiler errors
    • ownership puzzle blog post
  • rubdos
    • blog post describes integrating two event loops
    • mentions desire for runtime independent libraries
    • qt provides a mechanism to integrate one's own event loop
    • llvm bug generates invalid arm7 assembly
  • alexmiberry
    • kotlin/scala code, blocked by absence of async trait
  • helpful blog post
    • jamesmcm
      • note that join and Result play poorly together
    • the post mentions rayon but this isn't really a case where one ought to use rayon -- still, Rayon's APIs here are SO much nicer :)
    • rust aws and lambda
  • issue requiring async drop
  • fasterthanlime --
    • this post is amazing
    • the discussion on Send bounds and the ways to debug it is great
  • bridging different runtimes using GATs
  • first server app, great thread with problems
    • "I wasn't expecting that it will be easy but after Go and Node.js development it felt extremely hard to start off anything with Rust."
    • "felt like I have to re-learn everything from scratch: structuring project and modules, dependency injection, managing the DB and of course dealing with concurrency"
    • common thread: poor docs, though only somewhat in async libraries
      • I had enums in the DB and it was a bit more complex to map them to my custom Rust enums but I succeeded with the help of a couple of blog posts โ€“ and not with Diesel documentation
      • I used Rusoto for dealing with AWS services. It's also pretty straightforward and high quality package โ€“ but again the documentation was sooo poor.
  • implaustin wrote a very nice post but it felt more like a "look how well this worked" post than one with actionable feedback
    • "Async has worked well so far. My top wishlist items are Sink and Stream traits in std. It's quite difficult to abstract over types that asynchronously produce or consume values."
    • "AsyncRead/AsyncWrite work fine for files, tcp streams, etc. But once you are past I/O and want to pass around structs, Sink and Stream are needed. One example of fragmentation is that Tokio channels used to implement the futures Sink/Stream traits, but no longer do in 1.0."
    • "I usually use Sink/Stream to abstract over different async channel types. Sometimes to hide the details of external dependencies from a task (e.g. where is this data going?). And sometimes to write common utility methods."
    • "One thing I can think of: there are still a lot of popular libraries that don't have async support (or are just getting there). Rocket, Criterion, Crossterm's execute, etc."
  • EchoRior:
    • "I've written a bit of rust before, but rust is my first introduction to Async. My main gripes are that it's hard to figure our what the "blessed" way of doing async is. I'd love to see async included in the book, but I understand that async is still evolving too much for that."
    • "Adding to the confusion: theres multiple executors, and they have a bit of lock in. Async libraries being dependent on which executor version I use is also confusing for newcomers. In other langs, it seems like one just uses everything from the stdlib and everything is compatible"
    • "That kind of gave me a lot of hesitation/fomo in the beginning, because it felt like I had to make some big choices around my tech stack that I felt I would be stuck with later. I ended up chatting about this in the discord & researching for multiple days before getting started."
    • "Also, due to there not being a "blessed" approach, I don't know if I'm working with some misconceptions around async in rust, and will end up discovering I will need to redo large parts of what I wrote."

โค๏ธ Acknowledgments

Thanks to everyone who helped shape the future of Rust async.

โœ๏ธ Participating in an writing session

Thanks to everyone who helped write stories by participating in one of the Async Rust writing sessions.

๐Ÿ’ฌ Discussing about stories

Thanks to everyone who discussed stories, shiny futures, and new features.

โœจ Directly contributing

Thanks to everyone who opened a pull request to add a story or shiny future, or to improve the organization of the repository.