GCD, NSOperation and Core Data / Files on Disk

Core Data is fast, even on iOS, where handheld devices are slower than desktop computers. However, there will be times when you need to do some heavy reading, whether from plain files on disk or from information stored in Core Data. In my case, I initially tried to load too much data from a Core Data database using GCD, which resulted in deadlocks, and later I tried the same thing using NSOperation. To save others the frustration, I thought I should post my experiences and some general guidelines here.

In this article I am only scratching the surface of some common, hard-to-diagnose problems that arise when using GCD and multiple operation queues. I am deliberately giving little code, because the implementation of the general guidelines I give here varies between programming styles and paradigms.

Things to have in mind:

  • Your HD (as a hardware component) does NOT parallelize the way a CPU does. Disk I/O is a shared resource, and threads that read at the same time simply compete for the same device. So, while reading data from disk, limit the number of threads doing so to 1, maximum 2.
  • For that reason, if you have a large database and need to store data, it is wise not to split different aspects of this database across multiple files. Use Core Data with SQLite as the store type. If you need to store images, you must choose between keeping them in Core Data and using separate files. I believe that saving images to separate files, rather than storing NSData inside a Core Data database, is the best choice for many reasons.
  • Choose wisely between the available threading options. GCD is meant to be used as a “fire and forget” way of doing multithreading. You create a queue, add some blocks to it, and the system takes care of the rest. However: GCD was not designed around cancellation! Once you add a block to a queue, you should assume it will be executed, even if the object that created it has been deallocated in the meantime. That leads to serious multithreading issues. So, consider using NSOperation, where you have more control over each unit of work.
  • Some aspects of Core Data are NOT thread safe. In particular, NSManagedObjectContext is not thread safe. If you need to access your database from multiple threads, consider using wrapper classes that each operate on their own NSManagedObjectContext. Fortunately, you can have as many contexts as you wish for a persistent store. Which brings us to the next piece of advice:
  • Although you are expected to use different managed object contexts for multithreading, try to keep their number as low as possible. Otherwise, situations will arise where one managed object context tries to access a resource that is being updated by another, which leads to much greater delays in accessing the data than a user (or the programmer) is prepared to accept.
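
To illustrate the cancellation point from the list above, here is a minimal sketch rather than production code. It assumes ARC and plain Foundation, and SFCountExecutedAfterCancel is a made-up name used only for this example. It queues three operations, cancels two of them before they start, and reports how many actually ran.

{codecitation}
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original project, assumes ARC):
// queues three operations, cancels two of them before they start,
// and returns how many of them actually ran their block.
static NSUInteger SFCountExecutedAfterCancel(void)
{
    NSOperationQueue *queue = [[NSOperationQueue alloc] init];
    queue.maxConcurrentOperationCount = 1; // run one operation at a time
    queue.suspended = YES;                 // hold everything back for now

    __block NSUInteger executed = 0;
    NSMutableArray *operations = [NSMutableArray array];
    for (NSUInteger i = 0; i < 3; i++) {
        NSBlockOperation *operation = [NSBlockOperation blockOperationWithBlock:^{
            executed++; // stands in for the heavy lifting
        }];
        [operations addObject:operation];
        [queue addOperation:operation];
    }

    // A queued operation can still be called off before it starts.
    [[operations objectAtIndex:1] cancel];
    [[operations objectAtIndex:2] cancel];

    queue.suspended = NO;
    [queue waitUntilAllOperationsAreFinished];
    return executed;
}
{/codecitation}

Only the operation that was not cancelled runs its block. The cancelled ones are still started by the queue, but they finish immediately without doing any work.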

Solutions to general problems:

Suppose that you want to load data from an SQLite database using Core Data. This data consists of a few thousand entries, and for each one you need to run a query against the database and perform some operations. At the same time, you want to edit some data and store it in the database again.

Loading the data needs a dedicated mechanism. I suggest using NSOperation or a GCD serial queue. For a similar situation I opted for NSOperation, mainly because it can be cancelled and is a concrete way of performing an operation inside a specific context.

{codecitation}

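- (void)start
{
    SFDebugLog(@"starting local download operation...");
    // An operation that was cancelled before it started must still report
    // that it is finished, otherwise the queue never lets go of it.
    if ([self isCancelled]) {
        [self willChangeValueForKey:@"isFinished"];
        self.finished = YES;
        [self didChangeValueForKey:@"isFinished"];
        return;
    }
    [self.operationThread start];
}
- (void)threadEntryPoint
{
    // The autorelease pool belongs on the thread that does the work.
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    [self willChangeValueForKey:@"isExecuting"];
    self.executing = YES;
    [self didChangeValueForKey:@"isExecuting"];

    for (DownloadObjectProxy *proxy in self.downloadProxies) {
        if ([self isCancelled]) {
            break;
        }
        ItemContainer *container = [self.webCache getContainerFullOfItems];
        for (MyItemClass *cachedItem in container.items) {
            if ([self isCancelled]) {
                break;
            }
            //do some heavy lifting
        }
    }

    // Report completion in every case, including after a cancel.
    [self willChangeValueForKey:@"isExecuting"];
    [self willChangeValueForKey:@"isFinished"];
    self.executing = NO;
    self.finished = YES;
    [self didChangeValueForKey:@"isFinished"];
    [self didChangeValueForKey:@"isExecuting"];
    [pool drain];
}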
{/codecitation}
In this operation we start the thread that is supposed to do all the heavy lifting for us in the background. Note that, to ensure that all the expected NSOperation features work correctly, we post KVO notifications for the “isFinished” key. This way objects added to an NSOperationQueue start and finish correctly, and the completion block of the operation is called when the operation completes.
The webCache object is a wrapper object around NSManagedObjectContext. Each such object creates a separate managed object context, and every one of them points to the same NSPersistentStoreCoordinator. Remember, NSPersistentStoreCoordinator is thread safe. Managed object contexts are NOT. The key point in this operation is the constant check for cancellation. You see, when the operation object is deallocated, the work inside -threadEntryPoint may still be running, due to the heavy lifting. If the object gets deallocated before -threadEntryPoint has completed, the application will crash. Since we can’t kill a thread that has started running without causing a memory leak, we check on every iteration whether the operation has been cancelled before we continue executing the actions in our loop.
This is how we are going to call our operation from the main program:
{codecitation}
- (void)loadFromLocalWithProxyObjects:(NSArray *)urlArray
{
    self.cacheLoadOperation = [[[SFCacheLoadOperation alloc] initWithDownloadproxies:urlArray andSearchString:self.searchString] autorelease];
    self.cacheLoadOperation.completionBlock = ^{

        if (self.cacheLoadOperation.isFinished) {
            dispatch_async(dispatch_get_main_queue(), ^{
                // Process the results. Returning to the main thread is not strictly
                // necessary here, but eventually you will have to.
            });
        }
    };

    // Do not call -start yourself; the queue starts the operation when it is ready.
    [self.downloadOperationQueue addOperation:self.cacheLoadOperation];
}
{/codecitation}
Note that SFCacheLoadOperation works in its own managed object context, without interfering with the rest of the application.

 

Returning to the main thread

While in a GCD block, there are three ways of returning to the main thread. One is using dispatch_sync and another one is using dispatch_async. The first one is wrong. It can cause a deadlock. dispatch_sync returns only after the block it submitted has finished executing on the main thread, so if the main thread is itself waiting for your background work, neither side can continue. The reason why this works sometimes is sheer luck. Besides, nothing must block the main thread from executing, as Apple continuously points out throughout their SDK documentation. dispatch_async simply adds the block to the main queue and returns immediately, so it is the one to use.

One last method is to use NSObject’s -performSelectorOnMainThread:withObject:waitUntilDone: method. It schedules the selector on the main thread’s run loop rather than on a GCD queue. If you pass YES for waitUntilDone, the calling thread blocks until the selector has run, which carries the same deadlock risk as dispatch_sync. If you pass YES while you are already on the main thread, the selector is executed immediately, without first entering a queue, as happens with GCD. In my experience that can lead to crashes, if you return to the main thread and access a resource that is already in use.

Before GCD came out, programmers coped with these kinds of problems using NSOperationQueue. Regardless of what happened in the background, they used some kind of “main thread operation queue”, with a maximum concurrent operation count of 1. Whatever they wanted to do on the main thread, they put into this operation queue, typically as NSInvocationOperation objects. Such a queue takes care of concurrency on the main thread (that is, no concurrency whatsoever, which is exactly the point), but it does not stop its operations from accessing a resource that is currently being accessed by another thread somewhere else within the application. With GCD, returning to the main thread is simpler. GCD puts your block into the main dispatch queue (which has nothing to do with the NSOperationQueue class). That queue is serial, so the blocks you send to the main thread run one at a time and in order.

Bottom line: Use GCD whenever possible, especially when returning to the main thread. Even if you don’t use GCD in any other context, returning to the main thread using GCD is always the best choice you have, and it’s simple.
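
As an illustration of that bottom line, here is a minimal sketch of doing the work in the background and returning to the main thread with dispatch_async. It assumes ARC; SFLoadInBackground and its completion parameter are names invented for this example, and the loop merely stands in for real loading code.

{codecitation}
#import <Foundation/Foundation.h>

// Hypothetical helper (assumes ARC): does the slow part on a background
// queue, then hands the result to `completion` on the main queue.
static void SFLoadInBackground(void (^completion)(NSArray *results))
{
    dispatch_queue_t background =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    dispatch_async(background, ^{
        // Stand-in for the heavy lifting.
        NSMutableArray *results = [NSMutableArray array];
        for (NSUInteger i = 0; i < 1000; i++) {
            [results addObject:[NSNumber numberWithUnsignedInteger:i * i]];
        }

        // dispatch_async only enqueues the block and returns, so this
        // background block never waits for the main thread.
        dispatch_async(dispatch_get_main_queue(), ^{
            completion(results);
        });
    });
}
{/codecitation}

A caller would pass a block that touches the user interface, for example SFLoadInBackground(^(NSArray *results) { /* reload a table view */ });. In an application the main run loop drains the main queue, so the completion block runs there without any extra work.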

Too much multithreading will kill your application…

…especially when saving and loading data into a Core Data database. Apple suggests not using just one managed object context when working with Core Data. I have found that you should use 2, maximum 3 managed object contexts. In my experience, using more than that and constantly performing operations on them results in your application getting slower until you reach a deadlock. The methods I am discussing here concern demanding applications that perform hundreds of calls per second into a Core Data database.

Your best bet is to write a wrapper class for your application. Once instantiated, it will create a managed object context that points to your application’s persistent store coordinator. Remember that NSPersistentStoreCoordinator is thread-safe, whereas NSManagedObjectContext is not.

{codecitation}

- (id)init {
    self = [super init];
    if (self) {
        self.internalManagedObjectContext = [[[NSManagedObjectContext alloc] init] autorelease];
        // myapplicationDelegate is a placeholder for however you reach your application delegate.
        self.internalManagedObjectContext.persistentStoreCoordinator = myapplicationDelegate.persistentStoreCoordinator;
    }
    return self;
}
// Pass in the NSManagedObjectContextDidSaveNotification that another context
// posted when it saved; the merge needs it to know what changed.
- (void)mergeContext:(NSNotification *)notification
{
    SFDebugLog(@"merging changes...");
    [self.internalManagedObjectContext mergeChangesFromContextDidSaveNotification:notification];
    NSError *error = nil;
    if (![self.internalManagedObjectContext save:&error]) {
        NSLog(@"save failed: %@", error);
    }
}
//... you can add stuff like adding and removing objects from the persistent store, querying stuff, etc
- (void)dealloc {
    [internalManagedObjectContext release];
    [super dealloc];
}

{/codecitation}

Each time you create a controller object, you can create one such Core Data handling object and perform all Core Data operations through it. You may be tempted to create a Core Data handling object inside a block on a dispatch queue. You can do that, but you should be aware that creating and releasing objects such as this inside GCD blocks may result in deadlocks and greatly decreased performance. One thing to remember about GCD dispatch queues is that they have internal mechanisms for determining how many and which blocks will be executed concurrently. That means that if you create and release objects such as the above inside blocks and perform operations with them, you are bound to hit a performance wall. Apple says to create managed object contexts (MOCs) on the thread that is going to use them, but since creating and releasing them many times causes problems, how are we going to use them?

So how do you perform operations using GCD in the background when working with multiple managed object contexts?

You can do this using a serial GCD queue.

From the Apple Documentation:

Serial queues (also known as private dispatch queues) execute one task at a time in the order in which they are added to the queue. The currently executing task runs on a distinct thread (which can vary from task to task) that is managed by the dispatch queue. Serial queues are often used to synchronize access to a specific resource. You can create as many serial queues as you need, and each queue operates concurrently with respect to all other queues. In other words, if you create four serial queues, each queue executes only one task at a time but up to four tasks could still execute concurrently, one from each queue. For information on how to create serial queues, see “Creating Serial Dispatch Queues.”

Creating a serial queue ensures that the contents of your Core Data database are accessed by only one task at a time. Therefore, you don’t need to create and release a MOC on each iteration. You can create one MOC, keep it, and perform all your operations on it from blocks on that queue.
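
A minimal sketch of that idea follows. It assumes ARC, and the SFSerialStore class, its method names and the queue label are all invented for this example. It also creates the context with plain -init, as the wrapper above does. Newer SDKs deprecate that initializer in favour of NSPrivateQueueConcurrencyType with -performBlock:, which builds the same kind of serialization into the context itself.

{codecitation}
#import <CoreData/CoreData.h>

// Hypothetical wrapper (assumes ARC): one context, one serial queue,
// and every access to the context goes through that queue.
@interface SFSerialStore : NSObject
- (instancetype)initWithCoordinator:(NSPersistentStoreCoordinator *)coordinator;
- (void)perform:(void (^)(NSManagedObjectContext *context))work;
@end

@implementation SFSerialStore {
    dispatch_queue_t _queue;
    NSManagedObjectContext *_context;
}

- (instancetype)initWithCoordinator:(NSPersistentStoreCoordinator *)coordinator
{
    self = [super init];
    if (self) {
        _queue = dispatch_queue_create("com.example.serialstore", DISPATCH_QUEUE_SERIAL);
        // Create the context once, on the queue that is going to use it.
        dispatch_sync(_queue, ^{
            NSManagedObjectContext *context = [[NSManagedObjectContext alloc] init];
            context.persistentStoreCoordinator = coordinator;
            self->_context = context;
        });
    }
    return self;
}

// Blocks run one at a time, in the order they were added, so the context
// is never touched by two tasks at once.
- (void)perform:(void (^)(NSManagedObjectContext *context))work
{
    dispatch_async(_queue, ^{
        work(self->_context);
    });
}
@end
{/codecitation}

Any thread can then call [store perform:^(NSManagedObjectContext *context) { /* fetch, insert or save */ }]; without creating a context of its own. The managed objects should stay inside the block; hand results to other threads as plain values or object IDs.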

Conclusion

This article became longer than I anticipated. I tried to scratch the surface of some common, hard-to-track-down problems with GCD and multiple threads. I will try to update this article to clarify some points and add more information.