Vuforia SDK + remote video streaming on iOS

I have recently undertaken an iOS project that requires integration with the Vuforia SDK. Vuforia is a proprietary augmented reality framework, built for iOS and Android, which has become very popular thanks to its innovative recognition library. One of its coolest demos, and the one most appealing to advertisers or to people looking to incorporate commercial campaigns into their applications, is playing a video on top of a recognised target. Vuforia even provides a sample application for that. However, streaming a remote video onto a texture does not work out of the box.

This is a long-standing issue. People on the forums have been asking for a solution for a long time, and what exists is either free code that is outdated and/or non-performant, or paid solutions that are very expensive.

The video-on-texture-rendering process in general

In order to stream a video onto an OpenGL texture, the following actions must happen:

  1. Initialise a renderable surface. This is a once-per-session operation.
  2. Create a texture with an ID (a minimal sketch of this step follows the list).
  3. Assign the texture to this surface (applying shaders, etc.).
  4. On each render, apply transformations to the renderable surface, according to the coordinates of the recognised object.
  5. Get the video bytes and decode them into actual video data.
  6. Convert the video data to OpenGL data (ready to be drawn).
  7. Apply that video data to the texture created in step 2.
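To make step 2 a little more concrete, here is a rough sketch of the texture creation. It mirrors what the createVideoTexture helper used later in this post does, assuming an OpenGL ES 2.0 context is already current (the Vuforia sample sets that up for you); treat it as illustrative rather than as the exact sample code.

// Step 2: create a GL texture handle that decoded video frames will later be uploaded to
- (GLuint)createVideoTexture
{
    GLuint handle = 0;
    glGenTextures(1, &handle);
    glBindTexture(GL_TEXTURE_2D, handle);
    
    // Video frames are generally non-power-of-two, so avoid mipmaps and repeat wrapping
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    
    glBindTexture(GL_TEXTURE_2D, 0);
    return handle;
}

Step 3, attaching this texture to the renderable surface and its shaders, is already handled by the sample's renderer.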

Steps 1–3 already happen inside Vuforia's sample. Step 4 is the very reason the Vuforia SDK exists: to give you the transformation and coordinates of a recognised object inside the world coordinate space. Therefore, step 4 is also covered by the sample (and by every Vuforia sample).

The difficult part, and the part the Vuforia SDK is not responsible for, is steps 5–7. This is where we, the third-party developers, come into play.

The actual problem with Vuforia's sample

As I have already mentioned, Vuforia's SDK is only responsible for recognising an object in world space and providing you with its coordinates and transform. What you do with that information is up to you. Therefore, Vuforia's VideoPlayback sample should be taken as a demonstration of what you can do with the SDK, not as a statement of its limitations.

Inside the sample, Vuforia makes heavy use of AVAssetReader and AVAssetReaderOutput to implement steps 5–7. As many people have pointed out in the forums, AVAssetReader only reads from local file URLs and does not support remote ones. So step 5 of the video-on-texture pipeline is the problematic one: you need to decode the video data you get from a remote location into actual OpenGL data and then render that data on screen. Many people have claimed in the forums that remote on-texture rendering is simply not possible on iOS.

This couldn’t be further from the truth.

The solution

What we need to do is get the actual pixel data ready to be rendered and upload it to the texture created by Vuforia. The SDK and the sample have already set up an OpenGL coordinate system, so all that's left is to obtain the pixel data and divert the data flow from the original sample code.

Instead of AVAssetReader, we are going to use AVPlayerItemVideoOutput, which was introduced in iOS 6. This class has the method copyPixelBufferForItemTime:itemTimeForDisplay:, which is exactly what we need in order to get raw pixel data to render on the texture.
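Stripped of all Vuforia-specific code, the pattern looks like this (written ARC-style for brevity; playerItem stands for an AVPlayerItem you have already created for the remote URL):

// Create an output that hands back 32BGRA pixel buffers, and attach it to the player item
NSDictionary *attributes = @{(id) kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA)};
AVPlayerItemVideoOutput *output = [[AVPlayerItemVideoOutput alloc] initWithPixelBufferAttributes:attributes];
[playerItem addOutput:output];

// Later, on every render pass, pull the frame for the item's current time
CVPixelBufferRef buffer = [output copyPixelBufferForItemTime:playerItem.currentTime itemTimeForDisplay:nil];
if (NULL != buffer) {
    // ... lock the buffer, upload its pixels to the OpenGL texture, unlock, then CFRelease(buffer) ...
}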

The following code samples are intended to replace or update the corresponding functionality in Vuforia's VideoPlayback sample. The code can certainly be improved.

First, let's set up the video player and the video output, so that we can later extract the video buffer contents.

- (BOOL)loadMediaURL:(NSURL*)url
{
	BOOL ret = NO;
	asset = [[AVURLAsset alloc] initWithURL:url options:nil]; // alloc already returns a retained object under MRC
	
	if (nil != asset) {
		// We can now attempt to load the media, so report success.  We will
		// discover if the load actually completes successfully when we are
		// called back by the system
		ret = YES;
		
		[asset loadValuesAsynchronouslyForKeys:@[kTracksKey] completionHandler: ^{
			// Completion handler block (dispatched on main queue when loading
			// completes)
			dispatch_async(dispatch_get_main_queue(),^{
				NSError *error = nil;
				AVKeyValueStatus status = [asset statusOfValueForKey:kTracksKey error:&error];
				
				
				NSDictionary *settings = @{(id) kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA)};
				AVPlayerItemVideoOutput *output = [[[AVPlayerItemVideoOutput alloc] initWithPixelBufferAttributes:settings] autorelease];
				self.videoOutput = output;
				
				
				if (status == AVKeyValueStatusLoaded) {
					// Asset loaded, retrieve info and prepare
					// for playback
					if (![self prepareAssetForPlayback]) {
						mediaState = ERROR;
					}
				}
				else {
					// Error
					mediaState = ERROR;
				}
			});
		}];
	}
	
	return ret;
}
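One thing loadMediaURL: does not show is attaching the output to the AVPlayerItem; without that, copyPixelBufferForItemTime: will only ever return NULL. In my project this happens where the AVPlayer is created (the prepareAVPlayer method that the playback-preparation code further down still calls). A minimal sketch, assuming the player and videoOutput properties used above:

// Sketch only -- wherever you create the player item (e.g. inside prepareAVPlayer),
// attach the video output so that frames can later be pulled from it
AVPlayerItem *item = [AVPlayerItem playerItemWithAsset:self.asset];
[item addOutput:self.videoOutput];
self.player = [AVPlayer playerWithPlayerItem:item];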

On each frame, -updateVideoData is responsible for preparing the video data for display. The following code is a modified version of Vuforia's sample code; it uses -copyPixelBufferForItemTime:itemTimeForDisplay: to extract the streamed video content and bind it to the OpenGL texture that is being rendered at that point.

// Update the OpenGL video texture with the latest available video data
- (GLuint)updateVideoData
{
	GLuint textureID = 0;
	
	// If currently playing on texture
	if (PLAYING == mediaState && PLAYER_TYPE_ON_TEXTURE == playerType) {
		[latestSampleBufferLock lock];
		
		playerCursorPosition = CACurrentMediaTime() - mediaStartTime;
		
		unsigned char* pixelBufferBaseAddress = NULL;
		CVPixelBufferRef pixelBuffer = NULL;
		
		// Ask the video output for the frame corresponding to the item's
		// current time.  This may return NULL (e.g. before the first frame
		// is available), so only lock the buffer if we actually got one
		pixelBuffer = [self.videoOutput copyPixelBufferForItemTime:player.currentItem.currentTime itemTimeForDisplay:nil];
		
		if (NULL != pixelBuffer) {
			CVPixelBufferLockBaseAddress(pixelBuffer, 0);
			pixelBufferBaseAddress = (unsigned char*)CVPixelBufferGetBaseAddress(pixelBuffer);
		}
		
		if (NULL != pixelBufferBaseAddress) {
			// If we haven't created the video texture, do so now
			if (0 == videoTextureHandle) {
				videoTextureHandle = [self createVideoTexture];
			}
			
			glBindTexture(GL_TEXTURE_2D, videoTextureHandle);
			const size_t bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer);
			
			// BYTES_PER_TEXEL comes from the Vuforia sample (4 for BGRA)
			if (bytesPerRow / BYTES_PER_TEXEL == videoSize.width) {
				// No padding between lines of decoded video
				glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, (GLsizei) videoSize.width, (GLsizei) videoSize.height, 0, GL_BGRA, GL_UNSIGNED_BYTE, pixelBufferBaseAddress);
			}
			else {
				// Decoded video contains padding between lines.  We must not
				// upload it to graphics memory as we do not want to display it
				
				// Allocate storage for the texture (correctly sized)
				glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, (GLsizei) videoSize.width, (GLsizei) videoSize.height, 0, GL_BGRA, GL_UNSIGNED_BYTE, NULL);
				
				// Now upload each line of texture data as a sub-image
				for (int i = 0; i < videoSize.height; ++i) {
					GLubyte* line = pixelBufferBaseAddress + i * bytesPerRow;
					glTexSubImage2D(GL_TEXTURE_2D, 0, 0, i, (GLsizei) videoSize.width, 1, GL_BGRA, GL_UNSIGNED_BYTE, line);
				}
			}
			
			glBindTexture(GL_TEXTURE_2D, 0);
			
			textureID = videoTextureHandle;
		}
		
		if (NULL != pixelBuffer) {
			// Unlock and release the buffer returned by copyPixelBufferForItemTime:
			// (it follows the Core Foundation "copy" ownership rule)
			CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
			CFRelease(pixelBuffer);
		}
		
		[latestSampleBufferLock unlock];
	}
	
	return textureID;
}
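As a possible refinement (not shown above), AVPlayerItemVideoOutput can also tell you whether a new frame is available for a given time, so -updateVideoData could skip the whole upload when nothing has changed. A hedged sketch of that check:

// Only pull and upload a pixel buffer when the output actually has a new frame
CMTime itemTime = player.currentItem.currentTime;
if ([self.videoOutput hasNewPixelBufferForItemTime:itemTime]) {
    CVPixelBufferRef pixelBuffer = [self.videoOutput copyPixelBufferForItemTime:itemTime itemTimeForDisplay:nil];
    // ... lock, upload to videoTextureHandle and release, exactly as in -updateVideoData ...
}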

There are also a few other changes we need to make to the playback setup.

// Prepare the AVURLAsset for playback
- (BOOL)prepareAssetForPlayback
{
    // Get video properties
	NSArray *videoTracks = [self.asset tracksWithMediaType:AVMediaTypeVideo];
	if (0 == [videoTracks count]) {
		// No video track to render (e.g. audio-only stream)
		return NO;
	}
	AVAssetTrack *videoTrack = videoTracks[0];
	self.videoSize = videoTrack.naturalSize;
	
    self.videoLengthSeconds = CMTimeGetSeconds([self.asset duration]);
    
    // Start playback at time 0.0
    self.playerCursorStartPosition = kCMTimeZero;
    
    // Start playback at full volume (audio mix level, not system volume level)
    self.currentVolume = PLAYER_VOLUME_DEFAULT;
    
    // Create asset tracks for reading
    BOOL ret = [self prepareAssetForReading:self.playerCursorStartPosition];
    
    if (ret) {
        // Prepare the AVPlayer to play the audio
        [self prepareAVPlayer];
        // Inform our client that the asset is ready to play
        self.mediaState = READY;
    }
    
    return ret;
}


// Prepare the asset for playback.  Video frames are now pulled through the
// AVPlayerItemVideoOutput, so the only thing left to set up here is the audio mix
- (BOOL)prepareAssetForReading:(CMTime)startTime
{
    BOOL ret = YES;

    // ===== Audio =====
    // Get the first audio track
    NSArray *arrayTracks = [self.asset tracksWithMediaType:AVMediaTypeAudio];
    if (0 < [arrayTracks count]) {
        self.playAudio = YES;
        AVAssetTrack* assetTrackAudio = arrayTracks[0];
	
        AVMutableAudioMixInputParameters* audioInputParams = [AVMutableAudioMixInputParameters audioMixInputParameters];
        [audioInputParams setVolume:self.currentVolume atTime:self.playerCursorStartPosition];
        [audioInputParams setTrackID:[assetTrackAudio trackID]];

        NSArray* audioParams = @[audioInputParams];
        AVMutableAudioMix* audioMix = [AVMutableAudioMix audioMix];
        [audioMix setInputParameters:audioParams];

        AVPlayerItem* item = [self.player currentItem];
        [item setAudioMix:audioMix];
    }
	
    return ret;
}

Those are all the changes needed to set up the playback and render the streamed video to a texture. However, Vuforia's sample must also be updated in a few places so that it knows remote videos CAN now be played on texture. For example:

// Indicates whether the movie is playable on texture
- (BOOL)isPlayableOnTexture
{
    // Both local and remote files can now be rendered on texture
    return YES;
}

That's it! You may need to make a few more minor changes, but this is the general approach. This methodology has been tested with Vuforia 4.0 and works reliably (it is also used in an application released to the App Store).

Want the full source?

Before you download the source, please understand that there are many optimisations still to be made to the example. Vuforia's example is built to support iOS 4, so if you target iOS 6 and later you can get rid of at least half of the code, convert the project to ARC (which I certainly advise), and optimise the video playback to use hardware acceleration. I have implemented all of these in my released applications; however, a tutorial that tried to tackle every problem at once would only be confusing.
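For completeness, the hardware-accelerated route I am alluding to is Core Video's OpenGL ES texture cache, which maps the pixel buffer straight into a GL texture instead of copying it with glTexImage2D. This is not part of the tutorial or the downloadable source; the following is only a rough sketch, where eaglContext and pixelBuffer are assumed to be your EAGLContext and the buffer returned by copyPixelBufferForItemTime:.

#import <CoreVideo/CVOpenGLESTextureCache.h>

// One-off: create a texture cache bound to your EAGLContext
CVOpenGLESTextureCacheRef textureCache = NULL;
CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, NULL, eaglContext, NULL, &textureCache);

// Per frame: wrap the pixel buffer in a GL texture without an explicit copy
CVOpenGLESTextureRef texture = NULL;
CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache, pixelBuffer,
                                             NULL, GL_TEXTURE_2D, GL_RGBA,
                                             (GLsizei) CVPixelBufferGetWidth(pixelBuffer),
                                             (GLsizei) CVPixelBufferGetHeight(pixelBuffer),
                                             GL_BGRA, GL_UNSIGNED_BYTE, 0, &texture);
glBindTexture(CVOpenGLESTextureGetTarget(texture), CVOpenGLESTextureGetName(texture));

// ... draw, then CFRelease(texture) and periodically call
// CVOpenGLESTextureCacheFlush(textureCache, 0) ...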

Grab the source here!

Edit 2017-06-14

I haven't touched Vuforia's SDK for a while, so I will not be able to offer support for the newer SDK versions.

I have received some comments mentioning that this solution does not work with the newest versions of the Vuforia SDK.

The point, however, is that the approach itself cannot, by its very nature, stop working. The code may need updating, but the solution remains valid: it consists of streaming a video from a source, pulling decoded frames via copyPixelBufferForItemTime:itemTimeForDisplay:, and rendering that pixel data onto an OpenGL texture.

I'm sorry that I am not able to offer code-level support for this solution, but the methodology is still valid. It requires a little effort and some OpenGL knowledge, but it's definitely doable.