Parallelism through simple threads
Every year, parallelism and concurrency become more important as processors tend to have more and more physical cores. In most languages, writing parallel code is tricky. Very tricky. Not so in Rust, as it has been designed around the principle of fearless concurrency since the beginning.
How to do it...
1. In the src/bin folder, create a file called parallelism.rs.
2. Add the following code and run it with cargo run --bin parallelism:
 1 use std::thread;
 2
 3 fn main() {
 4     // Spawning a thread lets it execute a lambda
 5     let child = thread::spawn(|| println!("Hello from a new thread!"));
 6     println!("Hello from the main thread!");
 7     // Joining a child thread with the main thread means
 8     // that the main thread waits until the child has
 9     // finished its work
10     child.join().expect("Failed to join the child thread");
11
12     let sum = parallel_sum(&[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
13     println!("The sum of the numbers 1 to 10 is {}", sum);
14 }
15
16 // We are going to write a function that
17 // sums the numbers in a slice in parallel
18 fn parallel_sum(range: &[i32]) -> i32 {
19     // We are going to use exactly 4 threads to sum the numbers
20     const NUM_THREADS: usize = 4;
21
22     // If we have fewer numbers than threads,
23     // there's no point in multithreading them
24     if range.len() < NUM_THREADS {
25         sum_bucket(range)
26     } else {
27         // We define "bucket" as the amount of numbers
28         // we sum in a single thread
29         let bucket_size = range.len() / NUM_THREADS;
30         let mut count = 0;
31         // This vector will keep track of our threads
32         let mut threads = Vec::new();
33         // We try to sum as much as possible in other threads
34         while count + bucket_size < range.len() {
35             let bucket = range[count..count + bucket_size].to_vec();
36             let thread = thread::Builder::new()
37                 .name("calculation".to_string())
38                 .spawn(move || sum_bucket(&bucket))
39                 .expect("Failed to create the thread");
40             threads.push(thread);
41
42             count += bucket_size;
43         }
44         // We are going to sum the rest in the main thread
45         let mut sum = sum_bucket(&range[count..]);
46
47         // Time to add the results up
48         for thread in threads {
49             sum += thread.join().expect("Failed to join thread");
50         }
51         sum
52     }
53 }
54
55 // This is the function that will be executed in the threads
56 fn sum_bucket(range: &[i32]) -> i32 {
57     let mut sum = 0;
58     for num in range {
59         sum += *num;
60     }
61     sum
62 }
How it works...
You can create a new thread by calling thread::spawn, which will then begin executing the provided lambda. This returns a JoinHandle, which you can use to, well, join the thread. Joining a thread means waiting for the thread to finish its work. If you don't join a thread, you have no guarantee that it ever actually finishes. Leaving a thread unjoined can be valid, though, when it performs a task that never completes, such as listening for incoming connections.
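For illustration only, here is a minimal sketch (not part of the recipe) of what can happen when you skip the join: once main returns, the process exits and the spawned thread may never get to run its code.

use std::thread;
use std::time::Duration;

fn main() {
    // We spawn a thread but deliberately never join it
    thread::spawn(|| {
        thread::sleep(Duration::from_millis(100));
        // This line may never execute: when main returns,
        // the whole process exits and the thread is torn down with it
        println!("You will probably never see this");
    });
    println!("Main is done");
}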
Keep in mind that you cannot predetermine the order in which your threads will complete any work. In our example, it is impossible to foretell whether Hello from a new thread! or Hello from the main thread! is going to be printed first, although most of the time it will probably be the main thread, as the operating system needs to put some effort into spawning a new thread. This is the reason why small algorithms can be faster when not executed in parallel. Sometimes, the overhead of letting the OS spawn and manage new threads is just not worth it.
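As a small illustrative sketch (again, not part of the recipe), spawning a handful of threads that each print their index will usually produce a different output order on every run:

use std::thread;

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|i| thread::spawn(move || println!("Hello from thread {}", i)))
        .collect();
    // The order of the printed lines depends entirely on how
    // the operating system schedules the four threads
    for handle in handles {
        handle.join().expect("Failed to join thread");
    }
}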
As demonstrated by line [49], joining a thread returns a Result that contains the value your lambda returned.
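Here is a tiny sketch of that mechanism in isolation:

use std::thread;

fn main() {
    // The value returned by the lambda travels back through the JoinHandle
    let handle = thread::spawn(|| 21 * 2);
    // join() returns a Result: Ok contains the lambda's return value,
    // Err means the thread panicked
    let result = handle.join().expect("Failed to join thread");
    assert_eq!(result, 42);
}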
Threads can also be given names. Depending on your OS, in case of a crash, the name of the responsible thread will be displayed. In line [37], we call our new summation threads calculation. If one of them were to crash, we would be able to quickly identify the issue. Try it out for yourself: insert a call to panic!(); at the beginning of sum_bucket in order to intentionally crash the program, and run it. If your OS supports named threads, it will now tell you that your thread calculation panicked with an explicit panic.
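If you prefer a self-contained experiment, the following sketch (separate from the recipe's code) shows the same effect:

use std::thread;

fn main() {
    let handle = thread::Builder::new()
        .name("calculation".to_string())
        // The default panic handler reports the name of the thread that
        // panicked, which is what makes named threads useful for debugging
        .spawn(|| panic!("intentional crash for demonstration"))
        .expect("Failed to create the thread");
    // join() returns Err because the thread panicked
    assert!(handle.join().is_err());
}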
parallel_sum is a function that takes a slice of integers and adds them together in parallel on four threads. If you have limited experience in working with parallel algorithms, this function will be hard to grasp at first. I invite you to copy it by hand into your text editor and play around with it in order to get a feel for it. If you still feel a bit lost, don't worry; we will revisit parallelism again later.
Adapting algorithms to run in parallel normally comes with the risk of data races. A data race occurs when the behavior of a system depends on the uncontrollable timing of external events; in our case, it would mean that multiple threads try to access and modify a resource at the same time. Normally, programmers have to analyze their usage of resources and use external tools in order to catch all of the data races. In contrast, Rust's compiler is smart enough to catch data races at compile time and stops if it finds one. This is the reason why we had to call .to_vec() in line [35]:
let bucket = range[count..count + bucket_size].to_vec();
We will cover vectors in a later recipe (the Using a vector section in Chapter 2, Working with Collections), so if you're curious about what is happening here, feel free to jump to Chapter 2, Working with Collections, and come back again. The essence of it is that we're copying the data into bucket. If we instead passed a reference into sum_bucket in our new thread, we would have a problem: the memory referenced by range is only guaranteed to live inside of parallel_sum, but the threads we spawn are allowed to outlive their parent thread. This means that, in theory, if we didn't join the threads at the right time, sum_bucket might get unlucky and be called late enough for range to be invalid.
This would then be a data race, as the outcome of our function would depend on the uncontrollable sequence in which our operating system decides to launch the threads.
But don't just take my word for it; try it yourself. Simply replace the aforementioned line with let bucket = &range[count..count + bucket_size]; and try to compile it.
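For reference, this is roughly what the rejected loop body would look like. It will not compile, because spawn requires the closure to capture only data that lives for the whole program ('static), while the borrow is tied to range:

while count + bucket_size < range.len() {
    // Borrowing instead of copying ties bucket to the lifetime of range
    let bucket = &range[count..count + bucket_size];
    let thread = thread::Builder::new()
        .name("calculation".to_string())
        // The borrow checker rejects this: the new thread could outlive
        // parallel_sum, and with it the memory that bucket points into
        .spawn(move || sum_bucket(bucket))
        .expect("Failed to create the thread");
    threads.push(thread);

    count += bucket_size;
}

The exact wording of the error depends on your compiler version, but it will complain about the lifetime of the borrowed data.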
There's more...
If you're experienced with parallelism, you might have noticed how suboptimal our algorithm here is. This is intentional, as the elegant and efficient way of writing parallel_sum would require using techniques we have not discussed yet. We will revisit this algorithm in Chapter 7, Parallelism and Rayon, and rewrite it in a professional manner. In that chapter, we will also learn how to concurrently modify resources using locks.
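As a teaser, and purely as a sketch rather than the Chapter 7 solution (it assumes you have added the rayon crate to your Cargo.toml), the whole function can collapse into a single parallel iterator chain:

// Requires the rayon crate as a dependency
use rayon::prelude::*;

fn parallel_sum(range: &[i32]) -> i32 {
    // Rayon splits the slice across a thread pool and sums the pieces for us
    range.par_iter().sum()
}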
See also
- The Access resources in parallel with RwLocks recipe in Chapter 7, Parallelism and Rayon