Vuforia SDK + remote video streaming on iOS
I have recently undertaken an iOS project that requires integration with the Vuforia SDK. Vuforia is a proprietary augmented reality framework, built for iOS and Android, which has become very popular thanks to its recognition library. One of the coolest demos, and one that appeals to advertisers and to people looking to incorporate commercial campaigns into their applications, is playing a video on top of a recognised target. Vuforia even provides a sample application for that. However, out of the box, remote video streaming onto a texture does not work.
This is a long-standing issue: people on the forums keep asking for a solution, and the answers are either free solutions that are outdated and/or non-performant, or paid solutions that are very expensive.
The video-on-texture-rendering process in general
In order to stream a video onto an OpenGL texture, the following actions must happen:
1. Initialise a renderable surface. This is a once-per-session operation.
2. Create a texture with an ID.
3. Assign the texture to this surface (applying shaders, etc.).
4. On each render pass, apply transformations to the renderable surface according to the coordinates of the recognised object.
5. Get the video bytes and decode them to obtain actual video frame data.
6. Convert the video data to OpenGL data, ready to be drawn.
7. Apply that video data to the texture you created in step 2.
Steps 1–3 already happen inside Vuforia’s sample. Step 4 is the very reason the Vuforia SDK exists: to give you the transformation and coordinates of a recognised object inside the world coordinate space. Therefore, step 4 is also included in the sample (as it is in all of Vuforia’s samples).
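For context, the texture side of steps 1–3 boils down to standard OpenGL ES calls. The sample wraps this in a -createVideoTexture helper (used again further down); the sketch below shows roughly what such a helper does, assuming linear filtering and edge clamping — the sample’s actual implementation may differ in the details.
- (GLuint)createVideoTexture
{
    // Create an empty 2D texture and return its handle (sketch only)
    GLuint handle = 0;
    glGenTextures(1, &handle);
    glBindTexture(GL_TEXTURE_2D, handle);

    // Non-mipmapped filtering and edge clamping, suitable for video frames
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

    glBindTexture(GL_TEXTURE_2D, 0);
    return handle;
}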
The difficult part, and the part the Vuforia SDK is not responsible for, is steps 5–7. This is where we, the third-party developers, come into play.
The actual problem with Vuforia’s sample:
As I have already mentioned, Vuforia’s SDK is only responsible for recognising an object in world space and providing you with its coordinates and transform. What you do with this information is up to you. Therefore, Vuforia’s VideoPlayback sample should be taken as a demonstration of what you can do with the SDK, not as a statement of its limitations.
Inside the sample, Vuforia makes heavy use of AVAssetReader and AVAssetReaderOutput to extract and decode the video frames. As many people have already pointed out in the forums, AVAssetReader can only read from local file URLs; it does not support remote files. So step 5 of the video-on-texture-rendering process is the problematic one: you need to decode the video data you receive from a remote location into actual OpenGL data, and then render that data on screen. Many people have claimed in the forums that remote on-texture rendering is simply not possible on iOS.
This couldn’t be further from the truth.
The solution
What we need to do is obtain the OpenGL-ready data and apply it to the texture created by Vuforia. The SDK and the sample have already set up an OpenGL coordinate system, so all that’s left is to get the video data and divert the data flow from the original sample code.
Instead of using AVAssetReader, we are going to use AVPlayerItemVideoOutput, which was introduced in iOS 6. This class has the method -copyPixelBufferForItemTime:itemTimeForDisplay:, which is exactly what we need in order to get the raw pixel data to render onto the texture.
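Outside the context of Vuforia’s sample, the intended flow of this class looks roughly like the sketch below: create the output with BGRA pixel buffer attributes, attach it to the AVPlayerItem being played, and then, on every render pass, ask it for the newest pixel buffer. (This is a minimal sketch of standard AVFoundation usage; remoteURL and the variable names are placeholders of mine.)
// Create a BGRA video output and attach it to the item we are going to play
NSDictionary *attributes = @{(id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA)};
AVPlayerItemVideoOutput *videoOutput = [[[AVPlayerItemVideoOutput alloc] initWithPixelBufferAttributes:attributes] autorelease];
AVPlayerItem *item = [AVPlayerItem playerItemWithURL:remoteURL];
[item addOutput:videoOutput];

AVPlayer *player = [AVPlayer playerWithPlayerItem:item];
[player play];

// Later, once per rendered frame:
CMTime itemTime = [videoOutput itemTimeForHostTime:CACurrentMediaTime()];
if ([videoOutput hasNewPixelBufferForItemTime:itemTime]) {
    CVPixelBufferRef pixelBuffer = [videoOutput copyPixelBufferForItemTime:itemTime itemTimeForDisplay:NULL];
    // ... upload pixelBuffer to the OpenGL texture (see -updateVideoData below) ...
    if (pixelBuffer) {
        CVPixelBufferRelease(pixelBuffer);
    }
}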
The following code samples are intended to replace or update the corresponding functionality in Vuforia’s VideoPlayback sample. The code can certainly be improved.
First, let’s set up the video player and the video output item, so that we can later extract the video buffer contents.
- (BOOL)loadMediaURL:(NSURL*)url
{
    BOOL ret = NO;
    asset = [[[AVURLAsset alloc] initWithURL:url options:nil] retain];

    if (nil != asset) {
        // We can now attempt to load the media, so report success. We will
        // discover if the load actually completes successfully when we are
        // called back by the system
        ret = YES;

        [asset loadValuesAsynchronouslyForKeys:@[kTracksKey] completionHandler: ^{
            // Completion handler block (dispatched on main queue when loading
            // completes)
            dispatch_async(dispatch_get_main_queue(),^{
                NSError *error = nil;
                AVKeyValueStatus status = [asset statusOfValueForKey:kTracksKey error:&error];

                // Create the video output that will later hand us decoded BGRA frames
                NSDictionary *settings = @{(id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA)};
                AVPlayerItemVideoOutput *output = [[[AVPlayerItemVideoOutput alloc] initWithPixelBufferAttributes:settings] autorelease];
                self.videoOutput = output;

                if (status == AVKeyValueStatusLoaded) {
                    // Asset loaded, retrieve info and prepare for playback
                    if (![self prepareAssetForPlayback]) {
                        mediaState = ERROR;
                    }
                }
                else {
                    // Error
                    mediaState = ERROR;
                }
            });
        }];
    }

    return ret;
}
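Note that -copyPixelBufferForItemTime:itemTimeForDisplay: only returns frames once the output has been attached to the AVPlayerItem that is actually playing. The listing above creates the output but does not show that step, so make sure it happens wherever the sample creates its player item — the end of -prepareAVPlayer is a natural place. A sketch, assuming the item is reachable via self.player:
// Attach the video output so that frames can be copied later on.
// Sketch only: adapt it to wherever your player item lives.
AVPlayerItem *playerItem = [self.player currentItem];
if (nil != playerItem && nil != self.videoOutput) {
    [playerItem addOutput:self.videoOutput];
}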
On each frame, -updateVideoData is responsible for preparing the video data for display. The following code is a modified version of Vuforia’s sample code; it uses -copyPixelBufferForItemTime:itemTimeForDisplay: to extract the streamed video content and bind it to the OpenGL texture being rendered at that point.
// Update the OpenGL video texture with the latest available video data
- (GLuint)updateVideoData
{
    GLuint textureID = 0;

    // If currently playing on texture
    if (PLAYING == mediaState && PLAYER_TYPE_ON_TEXTURE == playerType) {
        [latestSampleBufferLock lock];

        playerCursorPosition = CACurrentMediaTime() - mediaStartTime;

        unsigned char* pixelBufferBaseAddress = NULL;
        CVPixelBufferRef pixelBuffer = NULL;

        // The original sample obtained its buffer from the AVAssetReader output via
        // CMSampleBufferGetImageBuffer(latestSampleBuffer). Instead, we pull the
        // current frame from the AVPlayerItemVideoOutput
        pixelBuffer = [self.videoOutput copyPixelBufferForItemTime:player.currentItem.currentTime itemTimeForDisplay:nil];

        if (NULL != pixelBuffer) {
            // If we have a valid buffer, lock its base address so we can read the pixels
            CVPixelBufferLockBaseAddress(pixelBuffer, 0);
            pixelBufferBaseAddress = (unsigned char*)CVPixelBufferGetBaseAddress(pixelBuffer);
        }
        // else: no video frame available; we may have been asked to provide one
        // before any are ready

        if (NULL != pixelBufferBaseAddress) {
            // If we haven't created the video texture, do so now
            if (0 == videoTextureHandle) {
                videoTextureHandle = [self createVideoTexture];
            }

            glBindTexture(GL_TEXTURE_2D, videoTextureHandle);
            const size_t bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer);

            if (bytesPerRow / BYTES_PER_TEXEL == videoSize.width) {
                // No padding between lines of decoded video
                glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, (GLsizei) videoSize.width, (GLsizei) videoSize.height, 0, GL_BGRA, GL_UNSIGNED_BYTE, pixelBufferBaseAddress);
            }
            else {
                // Decoded video contains padding between lines. We must not
                // upload it to graphics memory as we do not want to display it

                // Allocate storage for the texture (correctly sized)
                glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, (GLsizei) videoSize.width, (GLsizei) videoSize.height, 0, GL_BGRA, GL_UNSIGNED_BYTE, NULL);

                // Now upload each line of texture data as a sub-image
                for (int i = 0; i < videoSize.height; ++i) {
                    GLubyte* line = pixelBufferBaseAddress + i * bytesPerRow;
                    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, i, (GLsizei) videoSize.width, 1, GL_BGRA, GL_UNSIGNED_BYTE, line);
                }
            }

            glBindTexture(GL_TEXTURE_2D, 0);

            // Unlock the buffer
            CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);

            textureID = videoTextureHandle;
        }

        if (pixelBuffer) {
            CFRelease(pixelBuffer);
        }

        [latestSampleBufferLock unlock];
    }

    return textureID;
}
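One optional refinement, not part of the sample or of the listing above: -copyPixelBufferForItemTime:itemTimeForDisplay: is called on every render pass, even when no new frame has been decoded since the last one. You can skip the redundant texture upload by asking the output first, along these lines:
// Only fetch and upload a buffer when the output actually has a new frame
CMTime itemTime = player.currentItem.currentTime;
if ([self.videoOutput hasNewPixelBufferForItemTime:itemTime]) {
    pixelBuffer = [self.videoOutput copyPixelBufferForItemTime:itemTime itemTimeForDisplay:nil];
}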
There are also a few other changes we must make in order to adjust the playback setup.
// Prepare the AVURLAsset for playback
- (BOOL)prepareAssetForPlayback
{
    // Get video properties
    NSArray *videoTracks = [self.asset tracksWithMediaType:AVMediaTypeVideo];
    AVAssetTrack *videoTrack = videoTracks[0];
    self.videoSize = videoTrack.naturalSize;
    self.videoLengthSeconds = CMTimeGetSeconds([self.asset duration]);

    // Start playback at time 0.0
    self.playerCursorStartPosition = kCMTimeZero;

    // Start playback at full volume (audio mix level, not system volume level)
    self.currentVolume = PLAYER_VOLUME_DEFAULT;

    // Create asset tracks for reading
    BOOL ret = [self prepareAssetForReading:self.playerCursorStartPosition];

    if (ret) {
        // Prepare the AVPlayer to play the audio
        [self prepareAVPlayer];

        // Inform our client that the asset is ready to play
        self.mediaState = READY;
    }

    return ret;
}
// Prepare the AVURLAsset for reading so we can obtain video frame data from it
- (BOOL)prepareAssetForReading:(CMTime)startTime
{
    BOOL ret = YES;

    // ===== Audio =====
    // Get the first audio track
    NSArray *arrayTracks = [self.asset tracksWithMediaType:AVMediaTypeAudio];

    if (0 < [arrayTracks count]) {
        self.playAudio = YES;
        AVAssetTrack* assetTrackAudio = arrayTracks[0];

        AVMutableAudioMixInputParameters* audioInputParams = [AVMutableAudioMixInputParameters audioMixInputParameters];
        [audioInputParams setVolume:self.currentVolume atTime:self.playerCursorStartPosition];
        [audioInputParams setTrackID:[assetTrackAudio trackID]];

        NSArray* audioParams = @[audioInputParams];
        AVMutableAudioMix* audioMix = [AVMutableAudioMix audioMix];
        [audioMix setInputParameters:audioParams];

        AVPlayerItem* item = [self.player currentItem];
        [item setAudioMix:audioMix];
    }

    return ret;
}
Those are all the changes needed to set up video playback and render the streamed video to a texture. However, Vuforia’s sample must also be updated in several places so that it understands that remote videos CAN now be played.
// Indicates whether the movie is playable on texture
- (BOOL)isPlayableOnTexture
{
    // Both local and remote files can now be rendered on texture
    return YES;
}
That’s it! You may need to make a few more minor changes, but this is the general concept needed to get the sample running. This methodology has been tested with Vuforia 4.0 and works perfectly (it is also used in an application released to the App Store).
Want the full source?
Before you download the source, please understand that there are many optimisations to be made to the example. Vuforia’s example is built to support iOS 4, so if you target iOS 6 and later you can get rid of at least half of the code, you can convert the project to ARC (which is certainly advised), and you can also optimise the video playback to use hardware acceleration. I have implemented all of these in my released applications; however, it would be confusing to write a tutorial here that tries to deal with many problems at once.
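As an illustration of the hardware acceleration point: one common approach (not shown in the sample, and only a sketch of mine here) is to replace the glTexImage2D upload with a CVOpenGLESTextureCache, which maps the pixel buffer into a GL texture directly instead of copying its bytes. Roughly, with eaglContext being the rendering context and pixelBuffer the buffer obtained from the video output:
#import <CoreVideo/CVOpenGLESTextureCache.h>

// One-off: create a texture cache bound to the EAGLContext used for rendering
CVOpenGLESTextureCacheRef textureCache = NULL;
CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, NULL, eaglContext, NULL, &textureCache);

// Per frame: wrap the pixel buffer in a GL texture instead of uploading its bytes
CVOpenGLESTextureRef texture = NULL;
CVReturn err = CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault,
                                                            textureCache,
                                                            pixelBuffer,
                                                            NULL,
                                                            GL_TEXTURE_2D,
                                                            GL_RGBA,
                                                            (GLsizei)CVPixelBufferGetWidth(pixelBuffer),
                                                            (GLsizei)CVPixelBufferGetHeight(pixelBuffer),
                                                            GL_BGRA,
                                                            GL_UNSIGNED_BYTE,
                                                            0,
                                                            &texture);
if (kCVReturnSuccess == err) {
    glBindTexture(CVOpenGLESTextureGetTarget(texture), CVOpenGLESTextureGetName(texture));
    // ... draw the frame ...
    CFRelease(texture);
}

// After drawing, flush textures that are no longer in use
CVOpenGLESTextureCacheFlush(textureCache, 0);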
Edit 2017-06-14
I haven’t touched Vuforia’s SDK for a while, and I will not be able to offer support for the newer SDK versions.
I have received some comments mentioning that this solution does not work with the newest versions of the Vuforia SDK.
The point, however, is that it is impossible by definition for this solution not to work. The code may change, but the approach remains valid: it consists of streaming a video from a remote source, obtaining the decoded frames as pixel buffers (here via -copyPixelBufferForItemTime:itemTimeForDisplay:, in place of the original CMSampleBufferGetImageBuffer() path), and uploading that data to an OpenGL texture.
I’m sorry that I am not able to offer code-level support for this solution, but the methodology is still valid. It requires a little bit of effort and OpenGL knowledge, but it’s definitely doable.