Swift Regex: Beyond the basics

Description: Go beyond the basics of string processing with Swift Regex. We'll share an overview of Regex and how it works, explore Foundation’s rich data parsers and discover how to integrate your own, and delve into captures. We’ll also provide best practices for matching strings and wielding Regex-powered algorithms with ease.

Swift 5.7 String Processing updates

Regex

  • A new type in the Swift standard library
  • A built-in Regex literal syntax

RegexBuilder

  • @resultBuilder API
  • pushes the readability of Regex to a whole new level

Examples

  • Regex from String
let input = "name:  John Appleseed,  user_id:  100"

/// 👇🏻 Matches `user_id:`, followed by zero or more whitespace characters, followed by one or more digits.
let regex = try Regex(#"user_id:\s*(\d+)"#)

if let match = input.firstMatch(of: regex) {
  print("Matched: \(match[0])")  // Matched: user_id:  100 
  print("User ID: \(match[1])")  // User ID: 100
}
  • Regex from a literal

If the regex is known at compile time, we can use a shorthand literal:

let input = "name:  John Appleseed,  user_id:  100"

/// 👇🏻 Same Regex as before, but using a literal.
let regex = /user_id:\s*(\d+)/ 

if let match = input.firstMatch(of: regex) {
  print("Matched: \(match.0)")  // Matched: user_id:  100 
  print("User ID: \(match.1)")  // User ID: 100
}

RegexBuilder helps a lot with readability

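import Foundation
import RegexBuilder

let input = "name:  John Appleseed,  user_id:  100"

/// 👇🏻 Like the Regex above, but built with `RegexBuilder`: it requires at least one
/// whitespace character and parses the captured digits into an `Int`.
let regex = Regex {
  "user_id:"
  OneOrMore(.whitespace)
  // Foundation's integer parser needs an explicit locale.
  Capture(.localizedInteger(locale: Locale(identifier: "en_US")))
}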

if let match = input.firstMatch(of: regex) {
  print("Matched: \(match.0)")  // Matched: user_id:  100 
  print("User ID: \(match.1)")  // User ID: 100
}

Regex use

Swift Regex engine provides multiple algorithms:

  • firstMatch(of:) - finds the first occurrence of the pattern defined by this Regex
  • wholeMatch(of:) - matches the entire string against a Regex (if the Regex doesn't match the whole string, it will fail)
  • prefixMatch(of:) - matches the prefix of a string against a Regex (if the Regex doesn't match the prefix of the string, it will fail)
let input = "name:  John Appleseed,  user_id:  100"

let regex = /user_id:\s*(\d+)/

input.firstMatch(of: regex)       // Regex.Match<(Substring, Substring)>
input.wholeMatch(of: regex)       // nil
input.prefixMatch(of: regex)      // nil
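
In the example above only firstMatch(of:) succeeds. As a complementary sketch, the inputs below (made up for illustration) show cases where wholeMatch(of:) and prefixMatch(of:) do match. The literal uses the extended #/ ... /# delimiters so that it also compiles where bare-slash literals are not enabled.

```swift
let regex = #/user_id:\s*(\d+)/#

// The Regex covers the entire string, so wholeMatch(of:) succeeds.
let exact = "user_id:  100"
let whole = exact.wholeMatch(of: regex)

// The Regex only matches at the start of this string:
// prefixMatch(of:) succeeds, while wholeMatch(of:) returns nil.
let leading = "user_id:  100, name:  John Appleseed"
let leadingMatch = leading.prefixMatch(of: regex)
let notWhole = leading.wholeMatch(of: regex)

print(whole?.output.1 ?? "no match")          // 100
print(leadingMatch?.output.0 ?? "no match")   // user_id:  100
print(notWhole == nil)                        // true
```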

The Swift standard library also adds APIs for Regex-based predication, replacement, trimming, and splitting:

  • starts(with:) - returns true if prefixMatch(of:) succeeds
  • replacing(_:with:) - replaces every match of the Regex with the given value
  • trimmingPrefix(_:) - removes the prefixMatch(of:) match from the string
  • split(separator:) - splits the string using regex matches as separators
input.starts(with: regex)       // false
input.replacing(regex, with: "456")   // "name:  John Appleseed,  456"
input.trimmingPrefix(regex)       // "name:  John Appleseed,  user_id:  100"
input.split(separator: /\s*,\s*/)   // ["name:  John Appleseed", "user_id:  100"]
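
With the input above, starts(with:) and trimmingPrefix(_:) have nothing to do because the Regex does not match at the start of the string. The following sketch uses a reordered input (an assumption made for illustration) where the prefix-based APIs take effect:

```swift
let line = "user_id:  100, name:  John Appleseed"
let regex = #/user_id:\s*(\d+)/#

// The Regex now matches at the very start of the string.
let startsWithID = line.starts(with: regex)

// trimmingPrefix(_:) drops the matched prefix and returns the rest as a Substring.
let remainder = line.trimmingPrefix(regex)

// replacing(_:with:) swaps the whole match for the replacement text.
let redacted = line.replacing(regex, with: "user_id: <hidden>")

print(startsWithID)   // true
print(remainder)      // , name:  John Appleseed
print(redacted)       // user_id: <hidden>, name:  John Appleseed
```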

Swift Regex can also be used in Swift's pattern matching syntax in control flow statements:

switch "abc" {
case /\w+/:
  print("It's a word!")
default:
  // A switch over a String must be exhaustive.
  break
}

Foundation's formatters and parsers can also be used directly as Regex components.

Supported types include:

  • Date (including ISO 8601)
  • Number (including currency)
  • URL

Example:

import Foundation
import RegexBuilder

let statement = """
  DSLIP  04/06/20 Paypal  $3,020.85
  CREDIT   04/03/20 Payroll $69.73
  DEBIT  04/02/20 Rent  ($38.25)
  DEBIT  03/31/20 Grocery ($27.44)
  DEBIT  03/24/20 IRS   ($52,249.98)
  """

// Foundation's parsers need an explicit locale (and a time zone for dates).
let locale = Locale(identifier: "en_US")

let regex = Regex {
  //        👇🏻 Foundation-provided date parser with a custom format
  Capture(.date(format: "\(month: .twoDigits)/\(day: .twoDigits)/\(year: .twoDigits)", locale: locale, timeZone: .gmt))
  OneOrMore(.whitespace)
  OneOrMore(.word)
  OneOrMore(.whitespace)
  //        👇🏻 Foundation-provided currency parser with a domain-specific parse strategy
  Capture(.localizedCurrency(code: "USD", locale: locale).sign(strategy: .accounting))
}

Regex literal

  • A Regex literal starts and ends with a slash /
  • Swift infers the correct strong type for it
  • strongly typed capturing groups
/Hello, WWDC Notes!/
// Regex<Substring>

//                    👇🏻 we can give the name year to this capturing group
/Hello, WWDC Notes (?<year>\d{2})!/ // Matches "Hello, WWDC Notes 22!"
// Regex<(Substring, year: Substring)>
  • support for extended Regex literals #/ .... /#
    • a pattern can be split across multiple lines
    • in a multi-line literal, whitespace is non-semantic
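
As a minimal sketch of a multi-line extended literal, the pattern below reuses the user_id example from earlier in these notes. The capture name `id` and the inline comments are illustrative choices, not taken from the session.

```swift
let input = "name:  John Appleseed,  user_id:  100"

// In a multi-line literal, whitespace is ignored and `#` starts a comment.
let regex = #/
  user_id: \s*    # the field label, then optional whitespace
  (?<id> \d+)     # one or more digits, captured under the name id
/#

let match = input.firstMatch(of: regex)
let userID = match?.output.id   // named captures become tuple labels

print(userID ?? "no match")     // 100
```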

Transforming capture

  • Capture with a transform closure
  • upon matching, the Regex engine calls the transform closure on the matched substring, which produces a result of the desired type
  • the corresponding Regex output type becomes the closure's return type
Regex {
  Capture {
    OneOrMore(.digit)
  } transform: {
    Int($0)   // Int.init?(_: some StringProtocol)
  }
} 
// Regex<(Substring, Int?)>
  • TryCapture removes optionality from transform
  • if the transform returns nil, it's considered a non-match (the Regex engine will backtrack and try an alternative path)
Regex {
  TryCapture {
    OneOrMore(.digit)
  } transform: {
    Int($0)   // Int.init?(_: some StringProtocol)
  }
}
// Regex<(Substring, Int)> 
//                    👆🏻 no longer optional!
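
To make the difference concrete, here is a small sketch that applies both capture styles to the same input. The variable names are hypothetical; the point is the output type each capture produces.

```swift
import RegexBuilder

let input = "name:  John Appleseed,  user_id:  100"

// Capture + transform: the output keeps the closure's optional return type.
let optionalRegex = Regex {
  "user_id:"
  ZeroOrMore(.whitespace)
  Capture {
    OneOrMore(.digit)
  } transform: {
    Int($0)
  }
}
// Regex<(Substring, Int?)>

// TryCapture: a nil from the closure is treated as a failed match,
// so the output type is non-optional.
let strictRegex = Regex {
  "user_id:"
  ZeroOrMore(.whitespace)
  TryCapture {
    OneOrMore(.digit)
  } transform: {
    Int($0)
  }
}
// Regex<(Substring, Int)>

let optionalID = input.firstMatch(of: optionalRegex)?.output.1
let strictID = input.firstMatch(of: strictRegex)?.output.1

print(optionalID ?? -1)   // 100
print(strictID ?? -1)     // 100
```

Both values are optional here only because firstMatch(of:) itself may return nil.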

Reuse an existing parser

  • We can plug an existing parser into Regex, even one that isn't written in Swift
  • To do so, we need to create our own parser conforming to CustomConsumingRegexComponent
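
The sketch below shows what such a conformance might look like. `HexByteParser` is a hypothetical parser written in plain Swift for illustration (it is not from the session); it consumes exactly two hexadecimal digits and outputs a UInt8.

```swift
import RegexBuilder

// Hypothetical parser used only for illustration.
struct HexByteParser: CustomConsumingRegexComponent {
  typealias RegexOutput = UInt8

  func consuming(
    _ input: String,
    startingAt index: String.Index,
    in bounds: Range<String.Index>
  ) throws -> (upperBound: String.Index, output: UInt8)? {
    // Two characters must be available inside the search bounds.
    guard let end = input.index(index, offsetBy: 2, limitedBy: bounds.upperBound) else {
      return nil
    }
    let digits = input[index..<end]
    // Returning nil tells the Regex engine that this component did not match.
    guard digits.allSatisfy(\.isHexDigit), let value = UInt8(digits, radix: 16) else {
      return nil
    }
    return (end, value)
  }
}

// The parser can be used like any other Regex component.
let colorRegex = Regex {
  "#"
  Capture(HexByteParser())
  Capture(HexByteParser())
  Capture(HexByteParser())
}

let colorMatch = "color: #FF8000".firstMatch(of: colorRegex)
let red = colorMatch?.output.1
let green = colorMatch?.output.2
let blue = colorMatch?.output.3

print(red ?? 0, green ?? 0, blue ?? 0)   // 255 128 0
```

The returned upperBound tells the engine where to resume matching, which is what lets an external parser take part in a larger pattern.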

Written by

Federico Zanetello

Software engineer with a strong passion for well-written code, thought-out composable architectures, automation, tests, and more.