Vuforia SDK + remote video streaming on iOS
I have recently undertaken an iOS project that requires integration with the Vuforia SDK. Vuforia is a proprietary augmented reality framework, built for iOS and Android, which has become very popular thanks to its recognition library. One of the coolest demos, and one that appeals to advertisers and to people looking to incorporate commercial campaigns into their applications, is playing a video on top of a recognised target. Vuforia even provides a sample application for that. However, out of the box, remote video streaming onto a texture does not work.
This is a long-standing issue: people on the forums keep asking for a solution, and the answers are either free solutions that are outdated and/or non-performant, or paid solutions that are very expensive.
The video-on-texture-rendering process in general
In order to stream a video onto an OpenGL texture, the following actions must happen:
1. Initialise a renderable surface. This is a once-per-session operation.
2. Create a texture with an ID.
3. Assign the texture to this surface (applying shaders, etc.).
4. On each render pass, apply transformations to the renderable surface according to the coordinates of the recognised object.
5. Get the video bytes and decode them to obtain actual video frame data.
6. Convert the video data to OpenGL data, ready to be drawn.
7. Apply that video data to the texture you created in step 2.
Steps 1–3 already happen inside Vuforia’s sample. Step 4 is the very reason the Vuforia SDK exists: to give you the transformation and coordinates of a recognised object inside the world coordinate space. Therefore, step 4 is also included in the sample (as it is in all of Vuforia’s samples).
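For context, the texture side of steps 1–3 boils down to standard OpenGL ES calls. The sample wraps this in a -createVideoTexture helper (used again further down); the sketch below shows roughly what such a helper does, assuming linear filtering and edge clamping — the sample’s actual implementation may differ in the details.
- (GLuint)createVideoTexture
{
    // Create an empty 2D texture and return its handle (sketch only)
    GLuint handle = 0;
    glGenTextures(1, &handle);
    glBindTexture(GL_TEXTURE_2D, handle);

    // Non-mipmapped filtering and edge clamping, suitable for video frames
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

    glBindTexture(GL_TEXTURE_2D, 0);
    return handle;
}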
The difficult part, and the part the Vuforia SDK is not responsible for, is steps 5–7. This is where we, the third-party developers, come into play.
The actual problem with Vuforia’s sample:
As I have already mentioned, Vuforia’s SDK is only responsible for recognising an object in world space and providing you with its coordinates and transform. What you do with this information is up to you. Therefore, Vuforia’s VideoPlayback sample should be taken as a demonstration of what you can do with the SDK, not as a statement of its limitations.
Inside the sample, Vuforia makes heavy use of AVAssetReader and AVAssetReaderOutput to extract and decode the video frames. As many people have already pointed out in the forums, AVAssetReader can only read from local file URLs; it does not support remote files. So step 5 of the video-on-texture-rendering process is the problematic one: you need to decode the video data you receive from a remote location into actual OpenGL data, and then render that data on screen. Many people have claimed in the forums that remote on-texture rendering is simply not possible on iOS.
This couldn’t be further from the truth.
The solution
What we need to do is obtain the OpenGL-ready data and apply it to the texture created by Vuforia. The SDK and the sample have already set up an OpenGL coordinate system, so all that’s left is to get the video data and divert the data flow from the original sample code.
Instead of using AVAssetReader, we are going to use AVPlayerItemVideoOutput, which was introduced in iOS 6. This class has the method -copyPixelBufferForItemTime:itemTimeForDisplay:, which is exactly what we need in order to get the raw pixel data to render onto the texture.
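Outside the context of Vuforia’s sample, the intended flow of this class looks roughly like the sketch below: create the output with BGRA pixel buffer attributes, attach it to the AVPlayerItem being played, and then, on every render pass, ask it for the newest pixel buffer. (This is a minimal sketch of standard AVFoundation usage; remoteURL and the variable names are placeholders of mine.)
// Create a BGRA video output and attach it to the item we are going to play
NSDictionary *attributes = @{(id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA)};
AVPlayerItemVideoOutput *videoOutput = [[[AVPlayerItemVideoOutput alloc] initWithPixelBufferAttributes:attributes] autorelease];
AVPlayerItem *item = [AVPlayerItem playerItemWithURL:remoteURL];
[item addOutput:videoOutput];

AVPlayer *player = [AVPlayer playerWithPlayerItem:item];
[player play];

// Later, once per rendered frame:
CMTime itemTime = [videoOutput itemTimeForHostTime:CACurrentMediaTime()];
if ([videoOutput hasNewPixelBufferForItemTime:itemTime]) {
    CVPixelBufferRef pixelBuffer = [videoOutput copyPixelBufferForItemTime:itemTime itemTimeForDisplay:NULL];
    // ... upload pixelBuffer to the OpenGL texture (see -updateVideoData below) ...
    if (pixelBuffer) {
        CVPixelBufferRelease(pixelBuffer);
    }
}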
The following code samples are intended to replace or update the corresponding functionality in Vuforia’s VideoPlayback sample. The code can certainly be improved.
First, let’s set up the video player and the video output item, so that we can later extract the video buffer contents.
- (BOOL)loadMediaURL:(NSURL*)url
{
    BOOL ret = NO;
    asset = [[[AVURLAsset alloc] initWithURL:url options:nil] retain];

    if (nil != asset) {
        // We can now attempt to load the media, so report success. We will
        // discover if the load actually completes successfully when we are
        // called back by the system
        ret = YES;

        [asset loadValuesAsynchronouslyForKeys:@[kTracksKey] completionHandler: ^{
            // Completion handler block (dispatched on main queue when loading
            // completes)
            dispatch_async(dispatch_get_main_queue(),^{
                NSError *error = nil;
                AVKeyValueStatus status = [asset statusOfValueForKey:kTracksKey error:&error];

                // Create the video output that will later hand us decoded BGRA frames
                NSDictionary *settings = @{(id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA)};
                AVPlayerItemVideoOutput *output = [[[AVPlayerItemVideoOutput alloc] initWithPixelBufferAttributes:settings] autorelease];
                self.videoOutput = output;

                if (status == AVKeyValueStatusLoaded) {
                    // Asset loaded, retrieve info and prepare for playback
                    if (![self prepareAssetForPlayback]) {
                        mediaState = ERROR;
                    }
                }
                else {
                    // Error
                    mediaState = ERROR;
                }
            });
        }];
    }

    return ret;
}
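Note that -copyPixelBufferForItemTime:itemTimeForDisplay: only returns frames once the output has been attached to the AVPlayerItem that is actually playing. The listing above creates the output but does not show that step, so make sure it happens wherever the sample creates its player item — the end of -prepareAVPlayer is a natural place. A sketch, assuming the item is reachable via self.player:
// Attach the video output so that frames can be copied later on.
// Sketch only: adapt it to wherever your player item lives.
AVPlayerItem *playerItem = [self.player currentItem];
if (nil != playerItem && nil != self.videoOutput) {
    [playerItem addOutput:self.videoOutput];
}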
On each frame, -updateVideoData is responsible for preparing the video data for display. The following code is a modified version of Vuforia’s sample code; it uses -copyPixelBufferForItemTime:itemTimeForDisplay: to extract the streamed video content and bind it to the OpenGL texture being rendered at that point.
// Update the OpenGL video texture with the latest available video data
- (GLuint)updateVideoData
{
    GLuint textureID = 0;

    // If currently playing on texture
    if (PLAYING == mediaState && PLAYER_TYPE_ON_TEXTURE == playerType) {
        [latestSampleBufferLock lock];

        playerCursorPosition = CACurrentMediaTime() - mediaStartTime;

        unsigned char* pixelBufferBaseAddress = NULL;
        CVPixelBufferRef pixelBuffer = NULL;

        // The original sample obtained its buffer from the AVAssetReader output via
        // CMSampleBufferGetImageBuffer(latestSampleBuffer). Instead, we pull the
        // current frame from the AVPlayerItemVideoOutput
        pixelBuffer = [self.videoOutput copyPixelBufferForItemTime:player.currentItem.currentTime itemTimeForDisplay:nil];

        if (NULL != pixelBuffer) {
            // If we have a valid buffer, lock its base address so we can read the pixels
            CVPixelBufferLockBaseAddress(pixelBuffer, 0);
            pixelBufferBaseAddress = (unsigned char*)CVPixelBufferGetBaseAddress(pixelBuffer);
        }
        // else: no video frame available; we may have been asked to provide one
        // before any are ready

        if (NULL != pixelBufferBaseAddress) {
            // If we haven't created the video texture, do so now
            if (0 == videoTextureHandle) {
                videoTextureHandle = [self createVideoTexture];
            }

            glBindTexture(GL_TEXTURE_2D, videoTextureHandle);
            const size_t bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer);

            if (bytesPerRow / BYTES_PER_TEXEL == videoSize.width) {
                // No padding between lines of decoded video
                glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, (GLsizei) videoSize.width, (GLsizei) videoSize.height, 0, GL_BGRA, GL_UNSIGNED_BYTE, pixelBufferBaseAddress);
            }
            else {
                // Decoded video contains padding between lines. We must not
                // upload it to graphics memory as we do not want to display it

                // Allocate storage for the texture (correctly sized)
                glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, (GLsizei) videoSize.width, (GLsizei) videoSize.height, 0, GL_BGRA, GL_UNSIGNED_BYTE, NULL);

                // Now upload each line of texture data as a sub-image
                for (int i = 0; i < videoSize.height; ++i) {
                    GLubyte* line = pixelBufferBaseAddress + i * bytesPerRow;
                    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, i, (GLsizei) videoSize.width, 1, GL_BGRA, GL_UNSIGNED_BYTE, line);
                }
            }

            glBindTexture(GL_TEXTURE_2D, 0);

            // Unlock the buffer
            CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);

            textureID = videoTextureHandle;
        }

        if (pixelBuffer) {
            CFRelease(pixelBuffer);
        }

        [latestSampleBufferLock unlock];
    }

    return textureID;
}
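One optional refinement, not part of the sample or of the listing above: -copyPixelBufferForItemTime:itemTimeForDisplay: is called on every render pass, even when no new frame has been decoded since the last one. You can skip the redundant texture upload by asking the output first, along these lines:
// Only fetch and upload a buffer when the output actually has a new frame
CMTime itemTime = player.currentItem.currentTime;
if ([self.videoOutput hasNewPixelBufferForItemTime:itemTime]) {
    pixelBuffer = [self.videoOutput copyPixelBufferForItemTime:itemTime itemTimeForDisplay:nil];
}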
There are also a few other changes we must make in order to adjust the playback setup.
// Prepare the AVURLAsset for playback
- (BOOL)prepareAssetForPlayback
{
    // Get video properties
    NSArray *videoTracks = [self.asset tracksWithMediaType:AVMediaTypeVideo];
    AVAssetTrack *videoTrack = videoTracks[0];
    self.videoSize = videoTrack.naturalSize;
    self.videoLengthSeconds = CMTimeGetSeconds([self.asset duration]);

    // Start playback at time 0.0
    self.playerCursorStartPosition = kCMTimeZero;

    // Start playback at full volume (audio mix level, not system volume level)
    self.currentVolume = PLAYER_VOLUME_DEFAULT;

    // Create asset tracks for reading
    BOOL ret = [self prepareAssetForReading:self.playerCursorStartPosition];

    if (ret) {
        // Prepare the AVPlayer to play the audio
        [self prepareAVPlayer];

        // Inform our client that the asset is ready to play
        self.mediaState = READY;
    }

    return ret;
}
// Prepare the AVURLAsset for reading so we can obtain video frame data from it
- (BOOL)prepareAssetForReading:(CMTime)startTime
{
    BOOL ret = YES;

    // ===== Audio =====
    // Get the first audio track
    NSArray *arrayTracks = [self.asset tracksWithMediaType:AVMediaTypeAudio];

    if (0 < [arrayTracks count]) {
        self.playAudio = YES;
        AVAssetTrack* assetTrackAudio = arrayTracks[0];

        AVMutableAudioMixInputParameters* audioInputParams = [AVMutableAudioMixInputParameters audioMixInputParameters];
        [audioInputParams setVolume:self.currentVolume atTime:self.playerCursorStartPosition];
        [audioInputParams setTrackID:[assetTrackAudio trackID]];

        NSArray* audioParams = @[audioInputParams];
        AVMutableAudioMix* audioMix = [AVMutableAudioMix audioMix];
        [audioMix setInputParameters:audioParams];

        AVPlayerItem* item = [self.player currentItem];
        [item setAudioMix:audioMix];
    }

    return ret;
}
Those are all the changes needed to set up video playback and render the streamed video to a texture. However, Vuforia’s sample must also be updated in several places so that it understands that remote videos CAN now be played.
// Indicates whether the movie is playable on texture
- (BOOL)isPlayableOnTexture
{
    // Both local and remote files can now be rendered on texture
    return YES;
}
That’s it! You may need to make a few more minor changes, but this is the general concept needed to get the sample running. This methodology has been tested with Vuforia 4.0 and works perfectly (it is also used in an application released to the App Store).
Want the full source?
Before you download the source, please understand that there are many optimisations to be made to the example. Vuforia’s example is built to support iOS 4, so if you target iOS 6 and later you can get rid of at least half of the code, you can convert the project to ARC (which is certainly advised), and you can also optimise the video playback to use hardware acceleration. I have implemented all of these in my released applications; however, it would be confusing to write a tutorial here that tries to deal with many problems at once.
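As an illustration of the hardware acceleration point: one common approach (not shown in the sample, and only a sketch of mine here) is to replace the glTexImage2D upload with a CVOpenGLESTextureCache, which maps the pixel buffer into a GL texture directly instead of copying its bytes. Roughly, with eaglContext being the rendering context and pixelBuffer the buffer obtained from the video output:
#import <CoreVideo/CVOpenGLESTextureCache.h>

// One-off: create a texture cache bound to the EAGLContext used for rendering
CVOpenGLESTextureCacheRef textureCache = NULL;
CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, NULL, eaglContext, NULL, &textureCache);

// Per frame: wrap the pixel buffer in a GL texture instead of uploading its bytes
CVOpenGLESTextureRef texture = NULL;
CVReturn err = CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault,
                                                            textureCache,
                                                            pixelBuffer,
                                                            NULL,
                                                            GL_TEXTURE_2D,
                                                            GL_RGBA,
                                                            (GLsizei)CVPixelBufferGetWidth(pixelBuffer),
                                                            (GLsizei)CVPixelBufferGetHeight(pixelBuffer),
                                                            GL_BGRA,
                                                            GL_UNSIGNED_BYTE,
                                                            0,
                                                            &texture);
if (kCVReturnSuccess == err) {
    glBindTexture(CVOpenGLESTextureGetTarget(texture), CVOpenGLESTextureGetName(texture));
    // ... draw the frame ...
    CFRelease(texture);
}

// After drawing, flush textures that are no longer in use
CVOpenGLESTextureCacheFlush(textureCache, 0);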
Edit 2017-06-14
I haven’t touched Vuforia’s SDK for a while, and I will not be able to offer support for the newer SDK versions.
I have received some comments mentioning that this solution does not work with the newest versions of the Vuforia SDK.
The point, however, is that it is impossible by definition for this solution not to work. The code may change, but the approach remains valid: it consists of streaming a video from a remote source, obtaining the decoded frames as pixel buffers (here via -copyPixelBufferForItemTime:itemTimeForDisplay:, in place of the original CMSampleBufferGetImageBuffer() path), and uploading that data to an OpenGL texture.
I’m sorry that I am not able to offer code-level support for this solution, but the methodology is still valid. It requires a little bit of effort and OpenGL knowledge, but it’s definitely doable.