Better Performance in Swift Using Task Groups | by Simone Giordano

How to drastically improve the performance of your iOS apps using the power of concurrency


The potential of building apps and software that rely on parallel computation has yet to be fully explored. The technologies offered by Grand Central Dispatch have been available since 2009, predating the introduction of Swift itself.

In recent years, Apple has increased its support for multi-threaded processing, covering both synchronous and asynchronous execution.

Unlike the old, standard POSIX API for creating and managing pthreads, Swift lets developers work with queues that encapsulate the low-level details behind a much simpler API.

While most app developers currently use GCD for networking, executing tasks and requests asynchronously, it can also be used to run parallel tasks that optimize the performance of computation-heavy code.

A task in Swift is a unit of asynchronous work that runs on the processor separately from other code. A Task is created with the following syntax:

let task = Task {
    print("This is a task")
}

Once instantiated, a task is automatically scheduled by the operating system and the developer can’t directly choose when to start its execution.

Due to their asynchronous nature, tasks are very useful for running specific code that is independent of the rest of the program.
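As a quick illustration (a minimal sketch; the one-second sleep is just a stand-in for real work), any code that follows the creation of a Task keeps running without waiting for it:

let task = Task {
    // Simulate one second of independent work
    try? await Task.sleep(nanoseconds: 1_000_000_000)
    print("The task has finished")
}
print("This line runs immediately, without waiting for the task")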

Waiting for a task to complete is done with the await keyword in Swift, in the following way:

let taskResult = await task.value

Besides suspending until the task completes, this syntax also yields the task's return value, which can hold a meaningful result of the code executed by the task.

For example, by giving the closure an explicit signature when defining the task, it is possible to specify the actual return type of the task:

let task = Task { () -> Int in
    // Some code being executed
    let result = doSomeStuff()
    return result
}

// Somewhere else
let taskResult = await task.value // Waiting for the completion
if taskResult == 0 {
    print("The task has successfully finished executing")
} else {
    print("The task failed during its execution")
}

In Swift, a Task Group is a collection of dynamically created tasks that run concurrently.

Unlike single tasks, task groups are typically used to split the workload of the code being executed, so that parallelism can be exploited to significantly improve the overall performance of the application.

In particular, they are employed when the number of needed tasks is not known at compile-time and is therefore dynamically decided when the program is running.

All the tasks in a task group share the same return type and are instantiated by using the withTaskGroup function of the Swift Standard Library.

For instance, a task group could be used to build an array of values, obtained by asynchronously running the individual tasks of the group.

To start, let’s define the task group:

let result: [Float] = await withTaskGroup(of: Float.self) { group in
    // Tasks will be added here in the next step
    return []
}

group, the captured parameter of the closure, is used to refer to the newly created task group from the code inside the closure.

The return type of each task in the group is specified by the of: parameter; in this case, the tasks will return a Float.

At this point, using the addTask method, we can add tasks to the group by defining the code that has to be executed. For example, we could dynamically spawn 10 tasks that each produce a randomly generated Float:

let result: [Float] = await withTaskGroup(of: Float.self) { group in
    for _ in 0..<10 {
        group.addTask {
            return Float.random(in: -100...100) // Return a random number in [-100, 100]
        }
    }
    return [] // The results are collected in the next step
}

After that, in order to build the final array, it's fundamental to wait for the completion of all the tasks and construct the array by appending every returned value.

Swift makes this easy with a for await loop, which iterates over each task's result as soon as that task finishes executing:

let result: [Float] = await withTaskGroup(of: Float.self) { group in
    var array = [Float]()
    for _ in 0..<10 {
        group.addTask {
            return Float.random(in: -100...100) // Return a random number in [-100, 100]
        }
    }
    for await value in group {
        array.append(value) // Append the value obtained from each task
    }
    return array // Return the completed array
}

By appending every value from each task, the array is built and ready to be returned.

Without diving into details, the Discrete Fourier Transform (DFT) is an algorithm that transforms a signal (such as an audio signal) in order to extract its frequency spectrum.
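For reference, the DFT of a signal x with N samples produces N frequency-domain coefficients X[k]:

X[k] = Σ (n = 0 … N−1) x[n] · e^(−2πi·k·n/N),  for k = 0, …, N−1

Each coefficient X[k] measures how strongly the frequency corresponding to index k is present in the signal. Computing all N coefficients naively takes on the order of N² operations, while the FFT brings this down to N·log(N).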

In the field of DSP (Digital Signal Processing), in order to spare computing resources, an algorithm named Fast Fourier Transform (FFT) is widely used to process signals and quickly obtain DFTs.

Even though the FFT is an exact, faster way of computing the DFT rather than an approximation, it still falls under the kind of code that relies on having enough processing power to execute on time.

The vDSP module of Apple's Accelerate framework can be used to compute DFTs without needlessly convoluted code. Through the API provided by vDSP, a DFT can be obtained via the DiscreteFourierTransform class, which encapsulates the needed functionality.

In order to compute the Fast Fourier Transform, the Accelerate DFT API was used as follows:

import Accelerate

typealias SignalType = Float

func computeFFT(of signal: [SignalType]) -> [SignalType] {
    var returnedResult: (real: [SignalType], imaginary: [SignalType])?
    do {
        let fwdDFT = try vDSP.DiscreteFourierTransform(
            previous: nil,
            count: signal.count,
            direction: .forward,
            transformType: .complexComplex,
            ofType: SignalType.self)
        // Create the imaginary part of the signal, filled with zeroes
        let imaginary = [SignalType](repeating: 0.0, count: signal.count)
        // Run the transform algorithm
        returnedResult = fwdDFT.transform(real: signal, imaginary: imaginary)
    } catch {
        // Print an error in case DiscreteFourierTransform could not be instantiated
        print(error)
    }
    guard let real = returnedResult?.real else {
        return []
    }
    // We are interested only in the real part of the transformed signal
    return real
}

Also, for the purposes of this article, a function for generating random signals, with a chosen number of signals and samples per signal, has been implemented.
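Its exact implementation isn't important here; a minimal sketch, assuming each sample is just a random Float in [-1, 1], could look like this:

func generateSignals(numSignals: Int, samples: Int) -> [[SignalType]] {
    // Build a matrix where each row is a signal made of `samples` random values
    var signals = [[SignalType]]()
    for _ in 0..<numSignals {
        signals.append((0..<samples).map { _ in SignalType.random(in: -1...1) })
    }
    return signals
}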

The first approach to computing the DFTs is sequential: iterate through every signal (stored in an array) and apply the Fourier transform to it. Since each signal is already stored in memory as an array of Floats, a matrix is used to keep track of the collection of generated signals:

let sigCount = 200
let sampleCount = 2048
let signals = generateSignals(numSignals: sigCount, samples: sampleCount)
//Returns a matrix, each row representing a different signal

Once we have the signals, we can iterate through them and apply the computeFFT function defined above, using a standard for loop:

var sequentialResults = [[SignalType]]()
for signal in signals {
    // Append every computed result
    sequentialResults.append(computeFFT(of: signal))
}

On top of that, we need a way to keep track of how much time our processor takes to accomplish the given task.

For that reason, the CFAbsoluteTimeGetCurrent() function from the Core Foundation framework is used.

By storing the time at which the algorithm starts executing, the completion time can easily be calculated as follows:

var sequentialResults = [[SignalType]]()
let startTime = CFAbsoluteTimeGetCurrent()
for signal in signals {
    // Append every computed result
    sequentialResults.append(computeFFT(of: signal))
}
let endTime = CFAbsoluteTimeGetCurrent() // Time of termination
let elapsedTime = endTime - startTime
print("Sequential algorithm: \(elapsedTime)")

The code introduced above executes the DFTs and prints on screen how much time is needed for the processors to complete the assigned task.

In order to perform a smart parallel computation, we need to distribute the total workload among all the CPU cores available.

This is done by getting the activeProcessorCount through the Foundation framework as follows:

let processors = ProcessInfo.processInfo.activeProcessorCount

This variable contains the number of available CPU cores.

At this point, it will be possible to create the TaskGroup that will be responsible for performing the computation:

let task = Task { () -> [[SignalType]] in
    let concurrentResults: [[SignalType]] = await withTaskGroup(of: [[SignalType]].self) { group in

        // Create an empty matrix that will contain all the Fourier transforms
        var transformedSignals = [[SignalType]]()
        for i in 0..<processors {
            group.addTask {
                // For every task, create a sub-matrix to be filled
                // with the Fourier transforms of the signals
                var matrix = [[SignalType]]()

                // The bounds for each task range from (i/processors) to ((i+1)/processors)
                // of the signal count, dividing the signals equally among the cores
                let lowerBound = Float(i) / Float(processors) * Float(signals.count)
                let upperBound = Float(i + 1) / Float(processors) * Float(signals.count)
                for index in Int(lowerBound)..<Int(upperBound) {
                    matrix.append(computeFFT(of: signals[index]))
                }
                return matrix
            }
        }
        // After every task is created and automatically scheduled by the OS,
        // wait for the end of each one and append the sub-matrices to the result
        for await value in group {
            transformedSignals.append(contentsOf: value)
        }
        return transformedSignals
    }

    return concurrentResults
}
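To compare the two approaches fairly, the concurrent version can be timed the same way as the sequential one, by awaiting the task's value; a minimal sketch, reusing CFAbsoluteTimeGetCurrent:

let concurrentStart = CFAbsoluteTimeGetCurrent()
let concurrentResults = await task.value // Suspend until the whole task group has finished
let concurrentElapsed = CFAbsoluteTimeGetCurrent() - concurrentStart
print("Concurrent algorithm: \(concurrentElapsed)")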

The logic behind the code is actually pretty straightforward.

Behind the syntax, what is actually going on is simply a division of the total workload among the available CPU cores.

This allows each core to have a similar workload when compared to the others, thus decreasing the necessary time to complete the execution.

In theory, the number of spawned tasks could also be higher than the number of available cores.

However, the most relevant issue with using a large number of tasks is the overhead that comes with instantiating them.

In fact, under the hood, creating and scheduling a task requires the CPU to run scheduling algorithms and perform context switches (i.e., switching back and forth between running tasks), which degrades performance when too many tasks or threads have to be scheduled by the operating system.

Every CPU has a fixed number of logical processors that can run operations simultaneously.

For instance, an M1 processor is equipped with an 8-core CPU, meaning that if we could take a snapshot of the processor at any point in time, only 8 threads would be found actually running at that moment. For that reason, instantiating more tasks than the available logical processors would be superfluous and could even do more harm than good.

In our specific case, the M1 can handle only eight batches of signals simultaneously at any given time, meaning that splitting the workload even further would force some of the tasks in the group to wait until a core is freed, bottlenecking the performance of the program.

In order to quantify how much the concurrent algorithm improves the execution, multiple tests were run with different numbers of signals to be transformed.

The tests clearly show that the concurrent algorithm is much better when handling very heavy workloads.

The graph uses a logarithmic scale, which is much more suitable for representing numbers that are very far apart from each other.

In fact, logarithmic scales separate numbers in terms of powers of 10, which is to say that the distance of the interval [0.1, 1] on a logarithmic scale is the same as that of the intervals [1, 10], [10, 100], and so on.
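For example, log₁₀(1) − log₁₀(0.1) = 1 and log₁₀(10) − log₁₀(1) = 1: both intervals span exactly one order of magnitude, so they occupy the same vertical distance on the chart.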

In this example, since the distance between the two lines remains more or less constant on the logarithmic scale, the absolute gap between the lower curve and the one above it actually grows as the workload increases.

This is a demonstration of a case where concurrency drastically improves the execution of a specific task, and there are still many unexplored paths in the iOS environment waiting to be discovered.

Written by Simone Giordano & Michele Aversana
