Rust Standard Library Cookbook

You're reading from   Rust Standard Library Cookbook Over 75 recipes to leverage the power of Rust

Product type Paperback
Published in Mar 2018
Publisher Packt
ISBN-13 9781788623926
Length 360 pages
Edition 1st Edition
Authors (2): Jan Hohenheim, Daniel Durante

Querying with regexes


When parsing simple data formats, it is often easier to write regular expressions (or regexes for short) than to write a parser. Rust has pretty decent support for this through its regex crate.

Getting ready

In order to really understand this chapter, you should be familiar with regexes. There are countless free online resources for this, like regexone (https://www.regexone.com/).

Note

This recipe will not conform to clippy, because we intentionally kept the regexes very simple in order to keep the focus of the recipe on the code, not the regex. Some of the examples shown could have been rewritten to use .contains() instead.
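
To illustrate the remark about .contains(), here is a minimal sketch of the case-insensitive check from this recipe written without a regex, using only the standard library. The helper name contains_ignore_case is our own invention for illustration and is not part of the recipe.

```rust
/// Hypothetical helper: case-insensitive substring check without a regex.
/// Lowercasing both sides is a simplification that is fine for ASCII text
/// but does not cover every Unicode case-folding rule.
fn contains_ignore_case(haystack: &str, needle: &str) -> bool {
    haystack.to_lowercase().contains(&needle.to_lowercase())
}

fn main() {
    println!("Do we match RuSt? {}", contains_ignore_case("RuSt", "rust"));
    println!("Do we match Go? {}", contains_ignore_case("RuSt", "go"));
}
```

For checks this simple, the standard library version avoids compiling a regex at all.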

How to do it...

  1. Open the Cargo.toml file that was generated earlier for you.

  2. Under [dependencies], add the following line:
regex = "0.2"
  3. If you want, you can go to regex's crates.io page (https://crates.io/crates/regex) to check for the newest version and use that one instead.
  4. In the bin folder, create a file called regex.rs.

  5. Add the following code and run it with cargo run --bin regex:

1   extern crate regex;
2
3   fn main() {
4     use regex::Regex;
5     // Beginning a string with 'r' makes it a raw string,
6     // in which you don't need to escape any symbols
7     let date_regex =
        Regex::new(r"^\d{2}.\d{2}.\d{4}$").expect("Failed to create regex");
8     let date = "15.10.2017";
9     // Check for a match
10    let is_date = date_regex.is_match(date);
11    println!("Is '{}' a date? {}", date, is_date);
12
13    // Let's use capture groups now
14    let date_regex = Regex::new(r"(\d{2}).(\d{2}).(\d{4})").expect("Failed to create regex");
15    let text_with_dates = "Alan Turing was born on 23.06.1912 and died on 07.06.1954. \
16      A movie about his life called 'The Imitation Game' came out on 14.11.2017";
17    // Iterate over the matches
18    for cap in date_regex.captures_iter(text_with_dates) {
19      println!("Found date {}", &cap[0]);
20      println!("Year: {} Month: {} Day: {}", &cap[3], &cap[2], &cap[1]);
21    }
22    // Replace the date format
23    println!("Original text:\t\t{}", text_with_dates);
24    let text_with_indian_dates =
        date_regex.replace_all(text_with_dates, "$1-$2-$3");
25    println!("In indian format:\t{}", text_with_indian_dates);
26
27    // Replacing groups is easier when we name them
28    // ?P<somename> gives a capture group a name
29    let date_regex = Regex::new(r"(?P<day>\d{2}).(?P<month>\d{2}).(?P<year>\d{4})")
30      .expect("Failed to create regex");
31    let text_with_american_dates =
        date_regex.replace_all(text_with_dates, "$month/$day/$year");
32    println!("In american format:\t{}", text_with_american_dates);
33    let rust_regex = Regex::new(r"(?i)rust").expect("Failed to create regex");
34    println!("Do we match RuSt? {}", rust_regex.is_match("RuSt"));
35    use regex::RegexBuilder;
36    let rust_regex = RegexBuilder::new(r"rust")
37      .case_insensitive(true)
38      .build()
39      .expect("Failed to create regex");
40    println!("Do we still match RuSt? {}", rust_regex.is_match("RuSt"));
41  }

How it works...

You can construct a regex object by calling Regex::new() with a valid regex string [7]. Most of the time, you will want to pass a raw string in the form of r"...". Raw means that every character in the string is taken literally, so a backslash does not start an escape sequence. This is important because regexes use the backslash (\) to represent a couple of important concepts, such as digits (\d) or whitespace (\s). However, Rust already uses the backslash to escape special non-printable symbols, such as the newline (\n) or the tab (\t) [23]. If we wanted to use a backslash in a normal string, we would have to escape it by repeating it (\\). The regex on line [14] would then have to be rewritten as:

"(\\d{2}).(\\d{2}).(\\d{4})"

Worse yet, if we wanted to match the backslash itself, we would have to escape it a second time for the regex engine. With normal strings, we would end up writing four backslashes (\\\\) to match a single one! We can save ourselves the confusion and the loss of readability by using raw strings and writing our regex normally. In fact, it is considered good style to use raw strings for every regex, even when it doesn't contain any backslashes [33]. This helps your future self if you notice down the line that you actually would like to use a feature that requires a backslash.
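
The escaping rules above can be checked with the standard library alone. The following is a small sketch, not part of the recipe, that compares raw strings with their escaped counterparts. The constant names are made up for illustration, and the r#"..."# form is an additional Rust feature for patterns that contain double quotes.

```rust
// The regex from line [14], written as a raw and as a normal string
const RAW: &str = r"(\d{2}).(\d{2}).(\d{4})";
const ESCAPED: &str = "(\\d{2}).(\\d{2}).(\\d{4})";

// A regex matching one literal backslash consists of two backslash characters
const RAW_BACKSLASH: &str = r"\\";
const ESCAPED_BACKSLASH: &str = "\\\\";

// If the pattern itself contains double quotes, r#"..."# avoids escaping them
const RAW_QUOTED: &str = r#""\d+""#;
const ESCAPED_QUOTED: &str = "\"\\d+\"";

fn main() {
    assert_eq!(RAW, ESCAPED);
    assert_eq!(RAW_BACKSLASH, ESCAPED_BACKSLASH);
    assert_eq!(RAW_QUOTED, ESCAPED_QUOTED);
    println!("{}\n{}\n{}", RAW, RAW_BACKSLASH, RAW_QUOTED);
}
```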

We can iterate over the matches of our regex [18]. The object we get for every match is a collection of its capture groups. Keep in mind that the zeroth index is always the entire match [19]. The first index is then the string from our first capture group, the second index is the string of the second capture group, and so on [20]. Unfortunately, we do not get a compile-time check on our index, so if we accessed &cap[4], our program would compile but then panic at runtime.

When replacing, we follow the same concept: $0 is the entire match, $1 is the result of the first capture group, and so on. To make our life easier, we can give the capture groups names by starting them with ?P<somename> [29] and then use these names when replacing [31].

There are many flags that you can specify, in the form of (?flag), for fine-tuning, such as i, which makes the match case insensitive [33], or x, which ignores whitespace in the regex string. If you want to read up on them, visit the crate's documentation (https://doc.rust-lang.org/regex/regex/index.html). Most of the time though, you can get the same result by using the RegexBuilder that is also in the regex crate [36]. Both of the rust_regex objects we generate in lines [33] and [36] are equivalent. While the second version is definitely more verbose, it is also way easier to understand at first glance.

There's more...

A regex is compiled from its pattern string into an internal matching program when it is created, which is a comparatively expensive step. For performance reasons, you are advised to reuse your regexes instead of creating them anew every time you use them. A good way of doing this is by using the lazy_static crate, which we will look at later in the book, in the Creating lazy static objects section in Chapter 5, Advanced Data Structures.

Note

Be careful not to overdo it with regexes. As they say, "When all you have is a hammer, everything looks like a nail." If you parse complicated data, regexes can quickly become an unbelievably complex mess. When you notice that your regex has become too big to understand at first glance, try to rewrite it as a parser.
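
To give an impression of what rewriting as a parser can look like, here is a minimal hand-written sketch for the dd.mm.yyyy format used in this recipe, relying only on the standard library. The functions parse_date and parse_part are hypothetical names, and the range checks are deliberately rough (they do not know how many days each month has).

```rust
/// Hypothetical helper: parses a fixed-width, all-digit component.
fn parse_part(part: &str, width: usize) -> Option<u32> {
    if part.len() == width && part.bytes().all(|b| b.is_ascii_digit()) {
        part.parse().ok()
    } else {
        None
    }
}

/// Hypothetical helper: parses "dd.mm.yyyy" into (day, month, year).
fn parse_date(input: &str) -> Option<(u32, u32, u32)> {
    let mut parts = input.split('.');
    let day = parse_part(parts.next()?, 2)?;
    let month = parse_part(parts.next()?, 2)?;
    let year = parse_part(parts.next()?, 4)?;
    // Reject trailing components such as "15.10.2017.1"
    if parts.next().is_some() {
        return None;
    }
    // Unlike the simple regexes above, a parser can easily check ranges
    if day >= 1 && day <= 31 && month >= 1 && month <= 12 {
        Some((day, month, year))
    } else {
        None
    }
}

fn main() {
    println!("{:?}", parse_date("15.10.2017"));
    println!("{:?}", parse_date("15.13.2017"));
}
```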

See also

  • Creating lazy static objects recipe in Chapter 5, Advanced Data Structures