Performance-Profiling Swift on Linux: Getting Started

Learn how to profile Server-Side Swift with perf on Linux. You’ll discover the basic principles of profiling and learn how to view events, read call graph traces and perform basic analysis. By kelvin ma.


Configuring Perf for Swift

If you scroll through enough of the crypto-bot samples in perf script, you might notice the call graphs are detailed for the C-language portions of the call chain, but relatively sparse for the Swift-language portions. You might see a lot of [unknown] symbols, short call chains and call chains where the only resolved Swift symbols are general-purpose entrypoints such as swift_retain and swift_release.

Enabling DWARF Metadata

You might not see a lot of call chain information because the Swift compiler performs a lot of transformations on the Swift code you write. The machine code coming out of the compiler ends up looking significantly different from the source code that went in. C compilers are comparatively transparent in how they transform C code into machine code, which makes it easier to map points in a call graph to meaningful landmarks in source code.

Note: There is a bug in perf when viewing DWARF metadata on ARM machines. On Apple Silicon or other ARM machines, you’ll need to keep using frame pointers.

To address that problem, Swift provides the missing binary-to-source mappings as DWARF metadata. As you have seen, perf record supports loading DWARF metadata through its --call-graph option. Set it to dwarf mode:

perf record --call-graph dwarf -o profile.perf .build/release/crypto-bot

On ARM environments, run:

perf record --call-graph fp -F 999 -o profile.perf .build/release/crypto-bot

In DWARF mode, perf record writes much more data to profile.perf than it did in frame pointer mode. For long-running applications, perf in DWARF mode might record more data than you can manageably store, in which case you might need to reduce the sampling frequency.
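
As a rough sketch of lowering the sampling frequency, the wrapper below passes perf record’s -F option (samples per second) alongside DWARF mode. The function name record_dwarf, the PERF override variable and the 99 Hz default are illustrative assumptions, not part of the crypto-bot materials.

```shell
# Sketch: record in DWARF mode at a reduced sampling frequency.
# record_dwarf and the PERF variable are hypothetical, for illustration only.
# Usage: record_dwarf <output-file> <binary> [frequency-hz]
record_dwarf() {
  output=$1
  binary=$2
  frequency=${3:-99}   # assumed default; tune it to your storage budget
  # PERF lets you substitute another perf binary (or echo, for a dry run).
  "${PERF:-perf}" record --call-graph dwarf -F "$frequency" -o "$output" "$binary"
}
```

For example, record_dwarf profile.perf .build/release/crypto-bot 99 records at roughly 99 samples per second. Setting PERF=echo prints the arguments instead of recording, which is handy for checking the command first.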

Open profile.perf in perf script again, and scroll down to the crypto-bot samples:

perf script -i profile.perf -F comm,pid,tid,time,event,period,ip,sym,symoff,dso

You’ll see far more detailed information about crypto-bot’s Swift call chain:

crypto-bot 86/86 4678.431018: 2273989 cycles:ppp: 
    39eb03 $sSe4fromxs7Decoder_p_tKcfCTj+0xffff006bdedf2003 (/usr/lib/swift...
     27288 $s4JSONAAO10DictionaryV6decode_6forKeyqd__qd__m_xtKSeRd+0xffff54...
     28118 $s4JSONAAO10DictionaryVy_xGs30KeyedDecodingContainerProtocolAAsA...
     27f10 $s4JSONAAO10DictionaryVy_xGs30KeyedDecodingContainerProtocolAAsA...
    ...
     3aa9e $s10crypto_bot6decode7messageAA3FTXO7MessageVSg4JSONAIO_tF+0xfff...
     3f6b6 $s10crypto_bot4MainO4mainyyKFZTf4d_n+0xffff5446a31ba0a6 (/crypto...

Demangling Swift Function Names

You might recognize some function names in the call graph symbols. They look strange because the Swift compiler mangled them. Demangle them with the swift demangle tool, which is part of the Swift tool chain. For example, to demangle “s10crypto_bot6decode7messageAA3FTXO7MessageVSg4JSONAIO_tF”, run the following in a terminal:

swift demangle s10crypto_bot6decode7messageAA3FTXO7MessageVSg4JSONAIO_tF
crypto_bot.decode(message: JSON.JSON) -> crypto_bot.FTX.Message?

Note: You can also demangle Swift symbols in Swift by loading the "swift_demangle" symbol from the Swift runtime. However, that’s out of the scope of this tutorial.

You can also pipe the output of another tool through swift demangle, in which case it automatically recognizes all the mangled Swift symbols in the input and replaces them with human-readable descriptions. To mass-demangle all the symbols from perf script, run the following command in a terminal:

perf script -i profile.perf -F comm,pid,tid,time,event,period,ip,sym,symoff,dso | swift demangle > profile.txt

The final shell redirection (>) in that command saves the demangled call graphs to a file named profile.txt.

Putting It All Together

Sifting through profiling data is a science of its own, and this tutorial can only scratch the surface of it. However, one fast way to get actionable statistics about your application is to use the wc tool to count occurrences of function names you’re interested in.

Counting Function Names

First, regenerate a performance profile for crypto-bot, and save it to a file named unoptimized.perf for comparison:

perf record --call-graph dwarf -o unoptimized.perf .build/release/crypto-bot

On ARM and Apple Silicon run:

perf record --call-graph fp -F 999 -o unoptimized.perf .build/release/crypto-bot

Recall that the sample code contains an @inline(never) function called decode(message:) that wraps its JSON decoding implementation. The @inline(never) attribute forbids the compiler from inlining the function into its callers, so you can be reasonably confident the number of samples containing crypto_bot.decode(message:) in their call graph traces reflects the amount of time spent executing that function.

This isn’t an entirely sound method of measuring the performance of an application. Notably, it doesn’t account for the variability of the sampling period, which can skew the results. But in a pinch, it can be a useful proxy metric.
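
One way to reduce the skew from variable sampling periods is to weight each sample by its period instead of counting every sample as one. The sketch below assumes perf script output in the layout shown earlier: an unindented header line whose fourth whitespace-separated field is the period, indented call chain lines, and a blank line between samples. The function name sum_period is hypothetical, and the field position would need adjusting for a different -F list or a command name containing spaces.

```shell
# Sketch: sum the sampling periods of all samples whose call chain mentions
# a given literal string. sum_period is a hypothetical helper, not part of perf.
# Reads demangled perf script text on stdin, assumed to look like:
#   comm pid/tid time: period event:
#       ip symbol (dso)
#   <blank line>
sum_period() {
  NEEDLE="$1" awk '
    # Close the current sample and add its period if the needle was seen.
    function finish_sample() { if (hit) total += period; hit = 0; period = 0 }
    /^[^ \t]/  { finish_sample(); period = $4 + 0; next }  # header line
    /^[ \t]*$/ { finish_sample(); next }                   # blank separator
    index($0, ENVIRON["NEEDLE"]) { hit = 1 }               # call chain line
    END { finish_sample(); printf "%.0f\n", total }
  '
}

# Possible usage, with the same field list as the earlier perf script command:
#   perf script -i unoptimized.perf -F comm,pid,tid,time,event,period,ip,sym,symoff,dso \
#     | swift demangle | sum_period 'crypto_bot.decode(message:'
```

A sample counts once even if the function appears several times in its call chain, so recursion doesn’t inflate the total.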

To count occurrences of the string crypto_bot.decode(message: in the demangled output, run the following in the terminal:

perf script -i unoptimized.perf -F ip,sym | swift demangle | grep crypto_bot\.decode\(message\: | wc -l

This command contains four piped subcommands:

  1. perf script: This deserializes the binary unoptimized.perf file. It only loads the ip and sym fields because the others are unnecessary here.
  2. swift demangle: This allows us to search for demangled function names instead of mangled names. If we knew the mangled name of crypto_bot.decode(message:) ahead of time, this step wouldn’t be necessary.
  3. grep: This searches for the string crypto_bot.decode(message:, which is distinct enough to not return any false positives.
  4. wc: This counts the number of lines piped to its input. Because grep prints each match on a separate line, this gives the number of matches grep returned.
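
If you’d rather not escape the search string by hand, the last two steps can be folded into a small helper. This is only a sketch: count_symbol is a hypothetical name, and it relies on grep’s standard -F (fixed string) and -c (count matching lines) options.

```shell
# Sketch: count input lines that contain a literal string.
# count_symbol is a hypothetical helper name.
count_symbol() {
  # -F treats the pattern as a fixed string, so ".", "(" and ":" need no
  # escaping; -c prints the number of matching lines, like grep piped to wc -l.
  # grep exits non-zero when nothing matches, so that status is masked here.
  grep -F -c -- "$1" || true
}

# Possible usage:
#   perf script -i unoptimized.perf -F ip,sym | swift demangle \
#     | count_symbol 'crypto_bot.decode(message:'
```

Quoting the search string in single quotes keeps the shell from interpreting the parenthesis.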

The unoptimized profile should contain 2,000 to 3,000 instances of crypto_bot.decode(message:. If you’re using frame pointers, you’ll see around 1,000 instances.

Observing Changes in Performance

To demonstrate what successful optimizations might look like when using this method, return to the while loop in Main.main in Sources/crypto-bot/example.swift. Replace the call to decode(message:) with a call to the more-efficient decodeFast(message:) function:

guard let message = decodeFast(message: json) else {
  continue
}

Recompile the application:

swift build -c release

Then, generate a new performance profile named optimized.perf:

perf record --call-graph dwarf -o optimized.perf .build/release/crypto-bot

If running on Apple Silicon or ARM, run:

perf record --call-graph fp -F 999 -o optimized.perf .build/release/crypto-bot

Rerun the command from the last section, this time using the optimized.perf data file and crypto_bot.decodeFast(message: as the search string:

perf script -i optimized.perf -F ip,sym | swift demangle | grep crypto_bot\.decodeFast\(message\: | wc -l

This time, you should see only a few hundred occurrences of decodeFast(message: on x86 machines or tens on ARM, which is compelling evidence that the decodeFast(message:) implementation is faster.
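
To put a number on the comparison, a small sketch like the following expresses one count as a percentage of another. The helper name percent_of is hypothetical, and the counts in the usage comment are placeholders picked from the ranges mentioned above, not measured results.

```shell
# Sketch: print PART as a percentage of WHOLE with one decimal place.
# percent_of is a hypothetical helper name.
percent_of() {
  awk -v part="$1" -v whole="$2" 'BEGIN {
    if (whole == 0) { print "n/a"; exit }   # avoid dividing by zero
    printf "%.1f%%\n", 100 * part / whole
  }'
}

# Possible usage with placeholder counts (optimized first, then unoptimized):
#   percent_of 250 2500
```

Because both profiles come from separate runs, treat the percentage as a rough indicator rather than a precise speedup.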

Where to Go From Here?

You can download the Dockerfile and sources for crypto-bot by clicking the Download Materials button below this section.

In this tutorial, you’ve learned how to record an application with perf, how to view and interpret the recordings, how to generate call graph traces, and how to configure perf to produce detailed profiles specifically for Swift binaries. You’ve also learned some basic techniques for post-processing and sifting through this data programmatically.

This tutorial should give you enough of an understanding of the fundamentals of performance sampling for you to start applying these techniques to your own projects. The data science of analyzing a performance profile is a huge topic, and much more remains to discover on your own!

To learn more about profiling Swift on Linux, check the Server-Side Swift performance guide. If you’re more interested in profiling Swift on macOS, check our Instruments Tutorial with Swift: Getting Started. I’ve also written about Low-level Swift Optimization Tips on my own blog.

If you have any suggestions, questions or performance-profiling tips you’d like to share, join the discussion below.