Designing Code to "Feel" Like Swift, Not Just Compile

A Case Study

Swift can access some system C libraries, but it's not an easy process. Some Swift developers have asked for better support for things like libc, which is what you get on some platforms, while on others you get Darwin. But one goal of Swift is to let us write almost identical code on many platforms. Swift also has strong preferences for how that code is written, and most existing C libraries were not written with Swift's features in mind. To serve both goals, Swift now ships with three built-in modules, one of which is Foundation. If the Swift code we write depends on these basic libraries on every platform, we'll be future-proofed, or as close as we can get to it, and we won't be writing completely different code for each platform.

Today, we'll demonstrate some of the philosophies of the Foundation module, and how to write code that feels at home amongst other Swift developers. There is a lot to choose from. Foundation provides the basic ways to work with URLs, files, the internet, data, characters, numbers, dates & calendars, encoding & decoding JSON and XML, regular expressions, undo and units. I won't talk about all of these today, but I want to take you through some of the design decisions I made while developing SwiftFoundationCompression, a module freely available on GitHub. (insert link). While we'll discuss zipping and unzipping briefly, my main goal today is to talk about why I made the design decisions I did, with the ultimate goal of making a module which feels like Foundation.

Why .zip as an example?

First off, there are many file formats which are really .zip archives underneath. Microsoft Office documents, MusicXML and .epub are all kinds of files which have been .zip'd together. I've worked on several applications where it was advantageous to open one of these files, and thus I needed to unzip in code. But the underlying compression procedures are written in C, so we need a Swift wrapper to make them feel at home.

Why not use an existing module?

I actually did find some functional Swift 3 zip/unzip code on the internet when I poked around. Unfortunately, that code was not designed to work like Foundation, so it felt alien, and I didn't know how the pieces fit together. It was clearly written by someone with an expert knowledge of the Apple-included zlib and the C language, and how Swift interoperates with C. But that was the problem: it felt like C, not Swift. It used UnsafeMutableRawBufferPointer and String file paths, and (turns nose up in air) that ain't Swift. It worked exclusively with files, and sometimes I don't want to slow everything down by writing files! To do that, it used integer file handles, and frankly, I couldn't even call the function to obtain one from Swift without a workaround.

So, like I so often do, I decided to write my own, from scratch.

Data

First off, if you've worked with lots of arrays of bytes, or pointers to byte buffers, you may be eyeing UnsafeMutableRawPointer and UnsafeMutableBufferPointer as types you're going to love. Swift 3 did introduce them for very specific reasons. However, in Foundation, we have a type which is designed to handle vast arrays of data efficiently: Data.

The first way I want to compress or decompress a file is with a Data object. Data is specifically designed to be the object which represents the bytes in a file. Reading a file involves creating a Data, and writing a file involves writing a Data. There is one more way, but we'll get to that later. Data (as NSData) was also the way to handle arrays of bytes in Obj-C before Swift came along, so many of those convenient insertion, replacing, and partitioning features are still there, and still useful merely for working with bytes, completely independently of files.

So to compress & decompress byte arrays, I decided to write an extension on Data. The data object could be either the compressed or decompressed bytes, so I wrote both the compress and decompress methods in the extension.

    extension Data {
        public func compressed(using technique: CompressionTechnique = .deflate,
                               progress: CompressionProgressHandler? = nil) throws -> Data {
        ...

        public func decompressed(using technique: CompressionTechnique,
                                 progress: CompressionProgressHandler? = nil) throws -> Data {

I also created an enum CompressionTechnique, where I add compression techniques as I write the code for them. So far, I'm supporting .deflate, which is the standard .zip compression, and .gzip, which we won't discuss today. When I write code, I always envision it as a module someone else will use, and then imagine that person is me. I don't want a ton of loose ends; I want using the code to be a win, not a workaround. So, when the defaults are used, note that these methods get incredibly simple.

    let original:Data = ...
    guard let zipped:Data = try? original.compressed()
        else { ... }

That's about as simple as it gets for something as complex as compression. But by using default arguments, I leave room for progress checking, early cancellation, and a different compression technique to be added in later, as my user (me) progresses in his design.

Designing for the development process

I'm not suggesting that you write shipping code that doesn't track the progress of a potentially long-running operation, or that doesn't let the user cancel a runaway process. I'm suggesting that during a proof-of-concept phase of development, those features are a few steps into the future. So I'm not just designing to make the final app well-written, I'm designing to make things easier on the developer.

Now decompressing is a different story. Sure, I could write a speculative analyzer which attempts to determine which compression format was used based on some magic byte values, but that takes even more time than just knowing it upfront. So I did not include a default technique value there.

I designed these methods to throw errors. Remember, Swift lets me optionally treat a thrown error as having returned nil from the method, so the methods are written to return a non-optional value. The thrown errors may be useful. Being able to distinguish between running out of memory and a format which is invalid is nice, as is knowing that the format may very well have been correct, but not supported by the code I've written thus far.
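To make the two calling styles concrete, here is a minimal sketch. The error cases and the stand-in function are hypothetical (the module's real error type isn't shown in this article); the stand-in only peeks at the first two bytes so that there is something to throw.

```swift
import Foundation

// Hypothetical error type, for illustration only.
enum SketchCompressionError: Error {
    case outOfMemory
    case invalidFormat
    case unsupportedFormat
}

// Stand-in for a throwing decompression call. It does no real work:
// it rejects empty input and pretends gzip input is not supported yet.
func sketchDecompress(_ data: Data) throws -> Data {
    guard data.count >= 2 else { throw SketchCompressionError.invalidFormat }
    // 0x1f 0x8b is the gzip magic number.
    if data[data.startIndex] == 0x1f && data[data.startIndex + 1] == 0x8b {
        throw SketchCompressionError.unsupportedFormat
    }
    return data
}

// Style 1: the caller doesn't care why it failed, so try? turns the error into nil.
let quick: Data? = try? sketchDecompress(Data())

// Style 2: the caller wants to tell the failures apart.
func describeFailure(of input: Data) -> String {
    do {
        _ = try sketchDecompress(input)
        return "ok"
    } catch SketchCompressionError.unsupportedFormat {
        return "valid format, but not supported yet"
    } catch SketchCompressionError.invalidFormat {
        return "not a recognised format"
    } catch {
        return "something else: \(error)"
    }
}
```

Because the function returns a non-optional Data, the caller picks the level of detail; nothing is lost by starting with try? and moving to do/catch later.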

As a side note, notice that I decided not to build support for encrypted files into the module. On Apple platforms, encryption is built into the hardware and file system, and thus encryption inside a given file makes less sense.

Files

Those Data methods did actually compress and decompress Data, but we don't often get a .zip file that just contains a single compressed file. It actually contains an entire directory structure of sub-files. So while the Data methods work with a compression library, they don't work much with the .zip format container. Nor are they designed to separate those internal files and report them individually.

We'll cover how I did that later. For now, know that I did, and that another primary use of .zip files is actually as package files. Maybe I want to zip a bunch of files together, or unzip the contents of a file into a directory. That's the next step.

URLs

First of all, let's talk about representing file paths. In Swift and Foundation, we don't use Strings to represent file paths (anymore). Some Stringy methods are still around, inherited from the days when Foundation was written for Obj-C, before the NSURL type was invented, but all new file-based methods and types work with URLs instead.

File URLs aren't just any URL; they must have a scheme of file. If you have to start from a file path String, you can convert it to a URL with URL(fileURLWithPath:...). Otherwise, reduce the impedance mismatch of your code by keeping references to files as URLs.
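A small sketch of converting once, at the boundary, and staying in URL land from then on. The path and directory name below are made up for the example.

```swift
import Foundation

// A path that arrives as a String, for example from a command-line argument.
let pathString = "/Users/me/Documents/score.mxl"   // hypothetical path

// Convert once, then work with the URL.
let archive = URL(fileURLWithPath: pathString)

// Build a sibling directory to unzip into using URL methods,
// not string concatenation.
let destination = archive
    .deletingLastPathComponent()
    .appendingPathComponent("score-unzipped", isDirectory: true)

print(archive.scheme ?? "no scheme", archive.pathExtension, destination.lastPathComponent)
```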

FileManager

Second, many of the tasks involved with unzipping files, such as creating a directory for sub-files in the .zip hierarchy, involve the use of an instance of FileManager. When it comes to file-system operations, like querying, enumerating, creating and deleting files, or reading file attributes, FileManager is the one-stop shop. It even provides methods for moving files into and out of iCloud, if you're on an Apple platform.

At first you might think, OK, fine, but we can get a FileManager whenever we need one, right? Well, sort of. If we use the FileManager.default singleton, we're actually limited to 4 per thread, at least on some Apple platforms. This means that if we have 5 nested subdirectories, our app would crash. So it works out better to instantiate one file manager for all our operations, and then let it go.
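Here is a minimal sketch of the "one manager for the whole job" idea, using only standard FileManager calls. The function name, directory names and file contents are invented for the example; this is not code from the module.

```swift
import Foundation

// Creates a scratch directory tree, writes one file, lists it, and cleans up.
func listScratchDirectory() throws -> [String] {
    // One manager for the whole batch of work, released when the function returns.
    let manager = FileManager()

    let scratch = manager.temporaryDirectory
        .appendingPathComponent("unzip-sketch-\(UUID().uuidString)", isDirectory: true)
    let nested = scratch.appendingPathComponent("META-INF", isDirectory: true)

    // Remove the scratch tree on the way out, whether or not we succeeded.
    defer { try? manager.removeItem(at: scratch) }

    // Create the whole chain of directories in one call.
    try manager.createDirectory(at: nested,
                                withIntermediateDirectories: true,
                                attributes: nil)

    // Write a small file into the nested directory.
    try Data("<container/>".utf8).write(to: nested.appendingPathComponent("container.xml"))

    // Enumerate what is there now.
    let children = try manager.contentsOfDirectory(at: nested,
                                                   includingPropertiesForKeys: nil,
                                                   options: [])
    return children.map { $0.lastPathComponent }
}
```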

Considering all this, I decided to add instance methods to the FileManager for working with zipping and unzipping to/from files. My declaration for the unzipping process is:

    extension FileManager {
        public func decompress(item compressedFile: URL,
                               using technique: CompressionTechnique,
                               into directory: URL,
                               progress: CompressionProgressHandler? = nil) throws -> [URL]

So, why decompress instead of unzip? Primarily, because there are several compression formats, like .zip or .gzip, and the workflow for each is essentially the same. I did create separate methods for each technique, but they are not exposed for the end user to see, learn, remember, or make decisions about. The technique is chosen with the enum, and if I got smart about it, I'd make the technique optional, and nil would mean the function figures it out from the file extension. Unfortunately, there is no single standard file extension for .zip files. (You thought it was .zip, didn't you! But .mxl, .epub, .docx are all .zip files!)

For compressing, I provided two basic ways. One is to compress all the files in a directory together. This maps to a .zip file really well. However, real-world directories frequently contain files we don't want zipped. macOS adds all kinds of hidden files into directories, and they sometimes show up to confuse users on other platforms. Thus, the second compression method takes an array of URLs and computes their deepest common parent directory. This is essentially a convenience that saves the developer from having to do exactly the same thing themselves.
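Finding the deepest common parent can be sketched by comparing path components. This is one plausible approach, not necessarily what the module does; the function name is hypothetical, and it assumes absolute file URLs.

```swift
import Foundation

// Hypothetical helper: returns the deepest directory that contains every URL,
// or nil when the list is empty. Assumes absolute file URLs.
func deepestCommonParent(of urls: [URL]) -> URL? {
    guard let first = urls.first else { return nil }

    // Start with the components of the first file's parent directory.
    var common = Array(first.standardizedFileURL.pathComponents.dropLast())

    for url in urls.dropFirst() {
        let components = Array(url.standardizedFileURL.pathComponents.dropLast())
        // Count how many leading components the two paths share.
        var shared = 0
        while shared < common.count,
              shared < components.count,
              common[shared] == components[shared] {
            shared += 1
        }
        common = Array(common.prefix(shared))
    }

    // Rebuild a directory URL from the shared components ("/" comes first).
    var result = URL(fileURLWithPath: "/", isDirectory: true)
    for component in common.dropFirst() {
        result.appendPathComponent(component, isDirectory: true)
    }
    return result
}
```

Each file can then be stored in the archive under its path relative to that parent, which keeps the sub-folder structure intact.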

The file handling methods are lacking one thing: streaming. Frequently, one reason to work with zip files is to handle data that's too big for RAM, or at least too big to bother with RAM. Too many files is one thing, but if there happened to be a single very large file inside, I could run into a problem, and my app could be jettisoned from memory.

We'll mention streaming briefly at the end, but I do want to mention one thing about Data: when reading Data from a file, I can get a memory-mapped instance. That means I'm not really loading the entire file into memory, nor am I waiting for it. Since disk reads are orders of magnitude slower than RAM, it's as advantageous to not wait as it is to not fill up memory. Mapped Data pages data from the disk into memory as needed, and is able to load and jettison individual memory pages (typically 4 KB or 16 KB, depending on the platform) when memory pressure gets high. Neither UnsafeBufferPointer nor UnsafeRawPointer has such features, because they know nothing of files. So, using Data isn't just a win for philosophical consistency; I'm getting real performance benefits.
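Asking for a mapped read is a one-option change. A minimal sketch, with a hypothetical function name; .mappedIfSafe lets Foundation fall back to an ordinary read when mapping isn't safe for that file.

```swift
import Foundation

// Hypothetical helper: reads a small range out of a (possibly huge) file.
// The range must lie within the file's bytes.
func readMappedSlice(of url: URL, range: Range<Int>) throws -> Data {
    // Map the file into virtual memory when Foundation considers it safe,
    // otherwise read it normally.
    let mapped = try Data(contentsOf: url, options: .mappedIfSafe)

    // Copy out only the bytes we care about; only those pages need to be touched.
    return mapped.subdata(in: range)
}
```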

(Yes, I know we don't use spinning disks much anymore, but what does solid-state read mean? RAM is also solid-state. This phrase is going to stick around like dialing a phone number when we haven't used dials in 30 years. Ok, well we did dial our iPods there for a while...)

Here's where it gets fun

Now we've covered the memory-to-memory case and the file-to-file case. What else is there? Well, specifically for .zip files, it is often the case that we want to source our data from a file, but use it in RAM, without re-writing it to disk. We'll want to access each file individually as we need it, using its directory structure, but we don't want to force the user to wait while we slowly write everything back to a disk which is potentially almost full. Maybe we don't even want to decompress a sub-file until we need it!

Foundation has a type for representing file/folder structures in memory: FileWrapper. FileWrappers act like efficient containers which read on-disk directories and file names, and load their contents into RAM as needed by the app. They can also be used to construct file/folder structures entirely in RAM, and then write to disk in one swoop. They can also be used in between, reading in data, making tweaks, writing it out, more tweaks, and so on... If you're going to write a package file format for macOS or iOS, you will be working with FileWrappers.

The basic idea here is to subclass a file-wrapper, and initialize it with a URL to the .zip file. It would read the .zip file's central directory and produce a folder/file structure that could be read in-RAM as child FileWrappers. Later, when the app wanted the contents of one of those files, it could be paged into memory, and decompressed. Deferring expensive operations is considered a best practice, unless your expense could interfere with a real-time operation, like playing media, in which case pre-loading expensive resources is preferred.

I won't go into detail here, but Apple platforms try to optimize media playback with dedicated hardware, like chips that decode .mp4's. So if you look into AVFoundation, included on Apple's platforms, you'll notice that optimal playback means providing a URL to a media file. The system is capable of optimizing the memory footprint and latency better if your app doesn't touch it!

But FileWrappers were designed a long time ago, like the APIs that use Strings for file paths. I would have designed them differently. Specifically, instead of providing a single class, FileWrapper, to represent both files and directories, I would have made a protocol which represents a child, and two concrete types which represent regular files and directories independently.

You may look at FileWrapper properties like regularFileContents: Data?, and say, "ah, obviously it just gives me a nil if it's a directory." No; it crashes. Similarly, the var fileWrappers: [String : FileWrapper]? property does not return nil if the wrapper is a regular file and not a directory; it crashes. Instead, you have to check the .isRegularFile and .isDirectory properties first, and only then attempt to access the contents. That ain't Swift.
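The check-then-access dance can at least be done once, in one place. A small sketch; the enum and function names are mine, not Foundation's.

```swift
import Foundation

// Hypothetical wrapper type that makes the file/directory split explicit.
enum WrapperContents {
    case file(Data)
    case directory([String: FileWrapper])
    case other   // for example, a symbolic link
}

func contents(of wrapper: FileWrapper) -> WrapperContents {
    // Check the kind first; only then touch the matching property.
    if wrapper.isRegularFile, let data = wrapper.regularFileContents {
        return .file(data)
    }
    if wrapper.isDirectory, let children = wrapper.fileWrappers {
        return .directory(children)
    }
    return .other
}

// Usage: build a tiny in-memory directory and inspect it.
let note = FileWrapper(regularFileWithContents: Data("hello".utf8))
let folder = FileWrapper(directoryWithFileWrappers: ["note.txt": note])

if case .directory(let children) = contents(of: folder) {
    print("directory with \(children.count) item(s)")
}
```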

So, down in my SwiftPatterns module, I defined three protocols:

SerializedResourceWrapping, which defines something that wraps the serialization of a resource.

SubResourceWrapping, which contains other SerializedResourceWrapping, keyed by their names.

DataWrapping, which has Data as its contents.

I then define two concrete classes, FileWrapping : DataWrapping, and DirectoryWrapping : SubResourceWrapping.
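As a rough sketch of that shape: the protocol names below come from this article, and the contents property is the one described here, but the other members and the two in-memory classes are assumptions made up for illustration. The real declarations in SwiftPatterns may look different.

```swift
import Foundation

// Protocol names are from the article; `name` and `subResources` are assumed.
protocol SerializedResourceWrapping: AnyObject {
    var name: String { get }                                          // assumed requirement
}
protocol DataWrapping: SerializedResourceWrapping {
    var contents: Data { get }
}
protocol SubResourceWrapping: SerializedResourceWrapping {
    var subResources: [String: SerializedResourceWrapping] { get }    // assumed name
}

// Hypothetical in-memory conformers, standing in for the concrete classes.
final class MemoryFile: DataWrapping {
    let name: String
    let contents: Data
    init(name: String, contents: Data) {
        self.name = name
        self.contents = contents
    }
}

final class MemoryDirectory: SubResourceWrapping {
    let name: String
    let subResources: [String: SerializedResourceWrapping]
    init(name: String, children: [SerializedResourceWrapping]) {
        self.name = name
        var byName: [String: SerializedResourceWrapping] = [:]
        for child in children { byName[child.name] = child }
        self.subResources = byName
    }
}

// A file has no subResources and a directory has no contents,
// so there is no property left to crash on.
func totalBytes(in resource: SerializedResourceWrapping) -> Int {
    switch resource {
    case let file as DataWrapping:
        return file.contents.count
    case let directory as SubResourceWrapping:
        return directory.subResources.values.reduce(0) { $0 + totalBytes(in: $1) }
    default:
        return 0
    }
}
```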

It's critical that I have protocols for the children, because of my next trick: declaring separate types which conform to DataWrapping and SubResourceWrapping to implement my optimized .zip file reading.

Back in SwiftFoundationCompression, I create class ZipDirectoryWrapping : SubResourceWrapping, which will be my top-level, initialize-it-with-a-URL-to-a-.zip file type, and class ZipDataWrapping : DataWrapping, which is my dynamically-unzip-data-for-one-subfile-only-when-needed type.

I design the ZipDataWrapping type as a class, so I can secretly cache the unzipped version of the data so that subsequent accesses don't require re-unzipping.

Injecting a Dependency

Since these are classes, and have identity semantics, I want to create them all up front, which means the ZipDirectoryWrapping owns the instances of ZipDataWrapping. But each ZipDataWrapping needs to hold onto the .zip file's compressed Data so it can fulfill its contract. If each one had to hold onto its own copy of a segment of the original file, I would have lost the battle: I'd be reading and copying data that I don't need. But if they all need to own the ZipDirectoryWrapping to point to common Data, I've created a retain cycle. So I create another type entirely, ZippedDataOwner. This type owns the underlying Data, and is in turn owned by the ZipDirectoryWrapping and by all the ZipDataWrappings. This allows common ownership of a shared resource (this is what reference counting was designed for!) and facilitates dependency injection for the ZipDataWrapping.
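The ownership graph can be sketched with three tiny classes. The class names here are placeholders that only play the roles of ZippedDataOwner, ZipDataWrapping and ZipDirectoryWrapping; none of this is the module's actual code. All the strong references point "down" toward the shared bytes, so there is no cycle.

```swift
import Foundation

// Plays the role of ZippedDataOwner: the only object that holds the bytes.
final class SharedArchiveData {
    let data: Data
    init(data: Data) { self.data = data }
}

// Plays the role of ZipDataWrapping: gets its dependency injected.
final class EntrySketch {
    let owner: SharedArchiveData
    init(owner: SharedArchiveData) { self.owner = owner }
}

// Plays the role of ZipDirectoryWrapping: owns the entries and the shared data.
final class ArchiveSketch {
    let owner: SharedArchiveData
    let entries: [EntrySketch]
    init(data: Data, entryCount: Int) {
        let owner = SharedArchiveData(data: data)
        self.owner = owner
        self.entries = (0..<entryCount).map { _ in EntrySketch(owner: owner) }
    }
}

// If there were a retain cycle, the shared data would survive the archive.
func ownerIsReleasedWithArchive() -> Bool {
    weak var observedOwner: SharedArchiveData?
    do {
        let archive = ArchiveSketch(data: Data(count: 16), entryCount: 3)
        observedOwner = archive.owner
        _ = archive.entries.count
    }
    return observedOwner == nil
}
```

An entry never points back at the directory, so either side can go away first and the bytes live exactly as long as someone still needs them.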

Model Indexes

So how do multiple ZipDataWrappings own the same ZippedDataOwner and yet decompress their correct sub-data? I create a model index to represent each file. Conceptually, a model index is a value type which represents a path into some model. The model could be a reference-type node graph, but if I have an identified starting point, and a way to represent each edge in the graph with an Int or String key, then I can represent the position of each node with a non-reference model index or model index path, which is the series of indexes and keys into arrays and dictionaries that I follow until I arrive at the correct node.

Model indexes are a critical concept in managing large-scale document data structures, and a .zip file has the idea of one built in. The central directory which told me about all the files contains, for each sub-file, an offset into the .zip file where the data for that sub-file resides. By storing that offset into the file in the ZipDataWrapping, I've achieved a minimal-memory, retain-cycle-free way to re-reference the right sub-data. Best yet, no one who merely uses the SwiftFoundationCompression module needs to know anything about it. They just get an object which conforms to SubResourceWrapping and use it like any other.
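A generic sketch of a model index path, independent of zip files. The types and names are invented for illustration: the node graph is made of reference types, while the path that locates a node is a plain array of values.

```swift
// One step through the model: an array position or a dictionary key.
enum ModelIndex: Hashable {
    case position(Int)
    case key(String)
}

// A hypothetical reference-type node graph.
final class Node {
    let title: String
    var orderedChildren: [Node] = []
    var namedChildren: [String: Node] = [:]
    init(title: String) { self.title = title }
}

// Walk from a known root, one step at a time; nil means the path is stale.
func node(at path: [ModelIndex], from root: Node) -> Node? {
    var current = root
    for step in path {
        switch step {
        case .position(let index):
            guard current.orderedChildren.indices.contains(index) else { return nil }
            current = current.orderedChildren[index]
        case .key(let name):
            guard let next = current.namedChildren[name] else { return nil }
            current = next
        }
    }
    return current
}
```

The path holds no references, so it can be stored, copied or compared without keeping any node alive.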
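Put together with the cached, decompress-on-demand idea, an entry might look roughly like this. It is a sketch under assumptions: the class names are placeholders, the entry stores an offset plus a length, and the decode closure stands in for the real decompression step.

```swift
import Foundation

// Shared owner of the archive's bytes (placeholder name).
final class ArchiveBytes {
    let data: Data
    init(data: Data) { self.data = data }
}

// Placeholder for a lazily decoded sub-file.
final class LazyEntry {
    private let archive: ArchiveBytes
    private let offset: Int                 // the "model index": where the bytes start
    private let length: Int                 // assumed to be known from the directory
    private let decode: (Data) -> Data      // stand-in for real decompression
    private var cached: Data?

    init(archive: ArchiveBytes, offset: Int, length: Int, decode: @escaping (Data) -> Data) {
        self.archive = archive
        self.offset = offset
        self.length = length
        self.decode = decode
    }

    // Decodes on first access, then serves the cached result.
    var contents: Data {
        if let cached = cached { return cached }
        let raw = archive.data.subdata(in: offset..<(offset + length))
        let decoded = decode(raw)
        cached = decoded
        return decoded
    }
}
```

Callers only ever see contents; the offset, the shared bytes and the cache stay private to the entry.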

Putting it all together

An end user (me, once I take off the module developer hat) will initialize a ZipDirectoryWrapping with a URL to a .zip-format file. It will serve SerializedResourceWrapping instances, which behave like directories by conforming to SubResourceWrapping, or like files by conforming to DataWrapping, which has a standard Data contents property. That's all my reader code needs to know! It doesn't need to know what secret awesome design principles I'm using under the hood, or how many years it took me to learn to read the zlib header file. (It took me a year to find the header file, and even then only with some help from a friend!)

And from the other end, I can create a set of actual FileWrappers, initialize a ZipDirectoryWrapping with them, and have it write out a .zip file without worrying about writing every individual file into the proper place on the file system.

Memory-mapped, dependency-injected, model-indexed, abstraction-depending, publicly-hosted, type-safe, deferred, use-case optimized, encapsulated, Foundation-impedance-matching unzipping goodness; now that's Swift!

Handling Files

P.S. I mentioned before that there was one more optimization I could add when writing large files to disk: streaming the bytes to disk in small batches as they are generated. Foundation contains a type designed to handle that: FileHandle. FileHandle is very specifically designed to provide a tape-reel type of interface. Start at the beginning and read some number of bytes. Read some other number of bytes, seek ahead or behind some number of bytes, or seek to the end. It can also sequentially write bytes to a file. When its internal buffers fill up, or when you close it, it efficiently flushes the bytes to disk. Unfortunately, creating one from a raw file descriptor is a chore, and well, ain't nobody got time for that; the URL-based initializers, such as FileHandle(forWritingTo:), look like the easier route. So if you'd like to jump in on the project, feel free to fork my repo, write your solution, and make a pull request.
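For anyone picking this up, a minimal sketch of chunked writing with FileHandle, opened from a URL. The function name is hypothetical and the chunks are passed in ready-made; in the real thing they would come out of the compressor one batch at a time.

```swift
import Foundation

// Hypothetical helper: writes the chunks to `url` one after another.
func streamChunks(_ chunks: [Data], to url: URL) throws {
    // FileHandle(forWritingTo:) expects the file to exist, so create an empty one first.
    guard FileManager.default.createFile(atPath: url.path, contents: nil, attributes: nil) else {
        throw CocoaError(.fileWriteUnknown)
    }

    let handle = try FileHandle(forWritingTo: url)
    defer { handle.closeFile() }   // closes the file when we are done

    for chunk in chunks {
        // Each write continues at the handle's current offset,
        // so only one chunk needs to be in memory at a time.
        handle.write(chunk)
    }
}
```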

For more on Foundation in Swift 3, check out a free week of the Daily Drip Swift 3 Course.

The first of 2 weeks on Foundation is free.