Splitting text and collections lazily in Swift

Swift version 2.1

There are already methods for splitting collections in the Swift Standard Library, but they do all the work immediately and return the results in an array. When dealing with large strings, or streams of text, I find it better to do the work lazily, when needed. The overall performance is not necessarily better, but it is smoother, as you get the first results immediately instead of having to wait a little while and then get everything at once. And memory usage is lower, no need to store everything in an array first.

The original

This is how the CollectionType.split method in the Standard Library splits strings over “,”:

I want our split method to behave in exactly the same way.

The results when allowing empty slices may seem a bit strange, but think of it as text split on the newline character into separate lines, and consider there may be several empty lines in a row plus they may be at the beginning and/or the end.

The first step

So what is the simplest operation needed here, and will it be useful on its own? When splitting collections into smaller pieces the smallest possible operation is to split the collection once into 2 pieces. And yes I believe that operation will be useful on its own:

Pretty straightforward, just find the first separator and return everything before it and everything after it.

The output

This functionality naturally belongs in LazyCollectionType. We will be following Apple’s guidelines for extending it (they are the same as for LazySequenceType), except we will use the same struct for the sequence and generator because I don’t want to write this monster of a generic where clause more often than strictly necessary:

This defines restrictions on the source collection’s subsequences which you would think must be valid for all subsequences. All sequences, and therefore collections, should satisfy them, so I don’t think we are excluding anything here. These restrictions are in any case necessary for the following code, which deals entirely in subsequences and their subsequences:

In the initialiser we set remaining to a subsequence of the entire collection. Then for every iteration we return everything before the next occurrence of separator, set everything after it to remaining and optionally skip empty results.

You may have noticed this sequence does not have a ‘generate’ method even though it is a required part of the protocol. That is because of this clever little extension in the standard library, which automatically generates one for us.

The actual method

Then all that remains is the method itself, where we meet the monstrous generic where clause again:

It can be called like this:

The complete code can be found in the SwiftShell project, with unit tests.