DevLog: How I created my first AR NodeJS-backed web service

Back when NodeJS was in its mid-0.10.x versions, a client of mine had an idea for a free application that would leverage technology to promote his monthly magazine. He wanted the application to use augmented reality features to play videos on top of specific images featured in the pages of his magazine, which covers the automotive industry.

Conceptualising

We sat together one afternoon and wrote down our ideas on what would later become a full-featured web and mobile application.

  • The application would play videos on top of pages of my client’s magazine, using the camera of the mobile device.
  • The videos to be played should be uploaded to a web server.
  • We should be able to provide a solution that would allow my client to associate the images to be recognised with the uploaded videos.
  • My client also had some partners (around 700) who sold tires and repaired vehicles. He wanted to promote his partners as well.

After carefully examining the problem, we ended up deciding our approach:

  • Our solution should be native on iOS and Android: because it would incorporate augmented reality features, flexibility in terms of performance and features would be key.
  • A backend server would be created to provide the following capabilities:
    • Uploading videos with tags
    • Uploading images for the mobile app to recognise in the magazine’s pages
    • Handling the data entry for my customer to advertise his partners. The user should be able to enter the location of a partner (using Google Maps), upload photos, and fill in other arbitrary information in the form of text.
    • Serving the mobile application with data.

So there we were; the project began. I will focus mostly on the AR features and leave the rest outside the scope of this post.

Choosing the technology stack

Augmented Reality

The core concept of the application was streaming video on top of recognised images that would not be embedded in the app, so I needed a good image recognition library that was cloud-backed.

I examined many alternatives, but only one framework had reached stable status and provided all of these features: Vuforia. My client had already proposed it in advance, but I did my research before accepting it, because Vuforia's solution was considered much more “barebones” than others offered at the time (like Metaio, which was later acquired by Apple): it required going deep into OpenGL programming, or using an external rendering engine.

Vuforia already provided many examples, one of them showcasing how to play video on top of a recognised image, but it used some of Apple's components in a way that didn't allow downloading the video data from a remote server.

Many people were saying that the only way to do what I wanted was to use Unity (which was not an option for me). I was almost ready to give up, but I decided to steer away from Vuforia's examples and start reading more about Vuforia's internals and iOS OpenGL. Then I started rewriting Vuforia's example from scratch.

The key was to separate the logic for rendering the video data onto a texture from the logic for acquiring that video data on the device and handing it over to OpenGL.

After some days, I found the solution, and I wrote a post here:

Vuforia SDK + remote video streaming on iOS

It was done. I solved an issue many people had, and the post is now stickied in Vuforia’s forums. That was a huge step.

Backend

I chose NodeJS for my backend, mainly because of its orientation towards rapid prototyping. For hosting, I had the choice of using a sandbox like Heroku or renting my own VPS. I chose the latter, because I wanted a file management solution and the ability to scale without blowing our budget out of proportion. We rented a dual-core system with 2 GB RAM and an SSD for a start. As a hosting provider, we chose OVH because of its price-to-features ratio. Much has been said about OVH's support, mostly bad, but I haven't had any issues to date, and I have always been able to set up a basic Linux server, so I figured it was a good choice.

I set up an Ubuntu 14.04 LTS system and decided to start my development on the following backend stack:

  • NodeJS 0.10.x
  • MongoDB 3.0.6
  • AngularJS 1.3

I chose MongoDB with Mongoose because I figured it would be easier to change my schema later on. JavaScript really shines when using MongoDB; however, this was the last project in which I chose to use it. I have since dropped it in favour of PostgreSQL, because for larger projects I believe data integrity comes before ease of use.
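
To give an idea of the flexibility I mean, here is a minimal sketch of what a video model could look like with Mongoose; the file name and fields are hypothetical, not the project's actual schema.

```javascript
// video.model.js - a hypothetical Mongoose model, not the project's actual schema
var mongoose = require('mongoose');

var VideoSchema = new mongoose.Schema({
  title:      { type: String, required: true },
  tags:       [String],                          // free-form tags attached on upload
  filePath:   { type: String, required: true },  // location of the uploaded file on disk
  fullScreen: { type: Boolean, default: false }, // play full screen or on top of the marker
  createdAt:  { type: Date, default: Date.now }
});

module.exports = mongoose.model('Video', VideoSchema);
```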

Web Frontend

The web frontend would be visible to my client only, so I needed to create an administrative dashboard. I eventually used SmartAdmin, but I ended up regretting it. It's a great theme, with much attention paid to detail, and it provides examples for using the theme along with some famous libraries, like Dropzone (which I ended up making extensive use of). However, the Angular scaffolding was nearly incomprehensible, especially for me as I was just starting with Angular. If I were to use the same frontend theme again, I would start with the HTML5 template for the styling of the UI and write everything else (like the animations) from scratch. Or I would use another admin template.

Mobile App

Back in 2014, there weren't many cross-platform choices available. Even if there had been, I would still have chosen native over everything else, since I wanted to integrate Vuforia, which required C++ linkage and access to C++ interfaces.

AFNetworking and MBProgressHUD were the only 3rd-party frameworks I used. For advertisements, we chose MoPub, because it was the only one providing a way of creating custom ads, and it also allowed native ads (remember – 2014!).

Server configuration

At first, I chose JXCore as my runtime engine because of its simple multicore support. NodeJS uses a single thread and a single core by default, but I was on a dual-core machine, so I needed more granularity. JXCore didn't actually help me, however, since I had to start writing JXCore-specific code to handle things such as buffers. I didn't have any problem doing so, but I was wary of steering away from the standard practices and examples of plain NodeJS: the ecosystem was still relatively new, and when NodeJS was forked it became clear that I needed to be careful about my choice of runtime.

I migrated back to plain NodeJS, used PM2 as a process manager, and ran 4 instances of the backend API to support many concurrent users.

My server was running on port 4000, and I also had to serve static files like videos and photos. NodeJS is infamous for its poor performance when serving static files, so I decided to use Nginx for that. I wrote a post about configuring it here:

NodeJS, Varnish + NginX

I set up two upstream configurations in Nginx: one acting as a proxy between port 80 and port 4000 (for the backend API), and another one serving static resources (uploaded photos and videos) straight from the file system.

Both are now listening on port 443; the one serving the files simply points to a different path.

When the service was first created, I used a paid provider to obtain an SSL certificate. Now I am using Let's Encrypt, which is free and covers my use case (encrypting the requests with SSL).

Getting down to business

I started with MeanJS as my seed. It used Express and had a Grunt task runner for building an uglified, packaged version of the frontend. It took me one month (part-time) to set up the prototype web service.
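
For reference, the kind of build step that Grunt setup performs can be sketched roughly like this; the paths and task names are illustrative, not MeanJS's actual configuration.

```javascript
// Gruntfile.js - an illustrative sketch, not MeanJS's actual build configuration
module.exports = function (grunt) {
  grunt.initConfig({
    uglify: {
      production: {
        files: {
          // concatenate and minify the frontend sources into one packaged file
          'public/dist/application.min.js': ['public/js/**/*.js']
        }
      }
    }
  });

  grunt.loadNpmTasks('grunt-contrib-uglify');
  grunt.registerTask('build', ['uglify']);
};
```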

I believe the most difficult part was the file management, and how we were going to associate the uploaded videos with Vuforia. Vuforia doesn't let you upload videos to its servers. Instead, it only allows uploading the image to be recognised, along with a limited amount of metadata for that image. I decided to store JSON in the metadata, linking the image with the ID of the video file to be played. In addition, when you upload a marker to Vuforia, you have to wait approximately 20 minutes until the image is ready to be used (until then it sits in a “processing” state and cannot be altered, used or deleted).
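
To make this concrete, the JSON stored in a target's application metadata could be shaped like the sketch below; the field names are hypothetical, not the ones used in production.

```javascript
// Hypothetical shape of the JSON stored in a Vuforia target's application metadata;
// Vuforia hands this blob back to the app when the image is recognised.
var targetMetadata = {
  videoId: '55a1b2c3d4e5f6a7b8c9d0e1', // MongoDB ObjectId of the associated video
  fullScreen: false                    // play on top of the marker instead of full screen
};

// Vuforia's web API expects the application metadata as a base64-encoded string.
var encodedMetadata = new Buffer(JSON.stringify(targetMetadata)).toString('base64');
```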

So here’s the process for the backend:

  • I developed a specific frontend section where videos could be uploaded at will and associated with a name.
  • In the “markers” section, I offer the option of uploading an image file to Vuforia and associating it with:
    • The ID of the video to be played
    • A description of the video
    • An option for the video to be played on-camera or full screen
  • When the user uploaded a marker, I would store the image locally, create a polling function, and add it to a pool of polling functions. Those functions would query Vuforia directly about images that had not yet finished processing (see the sketch after this list). In the meantime, the frontend would not be allowed to make any changes to the target.
  • When a polling function reported that an image had completed processing successfully, it would be removed from the pool. If processing failed, then prior to removing the function I would also remove all images and files associated with the target.
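
Here is a minimal sketch of that polling pool, assuming a hypothetical vwsGetTarget() helper that wraps a signed request to Vuforia's Web Services API and calls back with the target's processing status; none of the identifiers below come from the actual codebase.

```javascript
// A sketch of the polling pool; vwsGetTarget() is a hypothetical helper assumed
// to wrap a signed Vuforia Web Services request and call back with the status.
var pollingPool = {}; // targetId -> interval handle

function startPolling(targetId, onSuccess, onFailure) {
  pollingPool[targetId] = setInterval(function () {
    vwsGetTarget(targetId, function (err, status) {
      if (err || status === 'processing') {
        return; // keep polling until Vuforia finishes processing the image
      }
      clearInterval(pollingPool[targetId]); // done: remove the function from the pool
      delete pollingPool[targetId];
      if (status === 'success') {
        onSuccess(targetId);
      } else {
        onFailure(targetId); // caller deletes the stored image and associated files
      }
    });
  }, 60 * 1000); // Vuforia can take ~20 minutes, so poll once a minute
}
```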

For the mobile application:

  • Present the camera to the user
  • When a target gets recognised, its metadata (previously uploaded to Vuforia) would indicate the ID of the video ready to be played. With this ID, I would make a request to the server to get the full path of the video.
  • I would use Apple’s AVPlayerItem to pull the video data into the device’s OpenGL renderbuffer, and then render the video content where Vuforia’s recognition library indicated through its SDK.

As you may have noticed, even after downloading the metadata from the recognised marker, I still perform a REST request to the server to get the file’s location. I opted for this round trip instead of hardcoding the file path into the metadata because I wanted control over my local data. Since Vuforia doesn’t handle anything beyond the uploaded marker, syncing data between Vuforia and my own server was a pain; I needed to be able to change the location of the file without contacting Vuforia again.
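
On the server, that lookup can be as simple as the following Express sketch; the route path, model and base URL are illustrative, not the actual API.

```javascript
// Hypothetical Express route that resolves a video ID (taken from the marker's
// metadata) to the file's current public URL; names and paths are illustrative.
var express = require('express');
var Video = require('./video.model');

var router = express.Router();

router.get('/api/videos/:id', function (req, res) {
  Video.findById(req.params.id, function (err, video) {
    if (err || !video) {
      return res.status(404).json({ error: 'Video not found' });
    }
    // The file itself is served by Nginx; only its current URL is returned here.
    res.json({ url: 'https://example.com/static/' + video.filePath });
  });
});

module.exports = router;
```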

I developed the iOS application in Objective-C++. I also had the option to use Unity, but I am happy I didn’t: the application is very lightweight, and the remote video streaming worked flawlessly.

It took me 2 months to have my prototype ready. During these two months I used:

  • Crashlytics for distributing betas to my client
  • AFNetworking for performing network requests
  • PromiseKit for promise-based asynchronous code

Continuous Integration / Deployment

Deploying to the server was a time-consuming process for me, so I wanted to streamline it a bit more in order to save time for development. For a long time now, I have been using TeamCity for my CI and CD needs.

There are much cheaper solutions out there; however, I really cannot get away from JetBrains’ attention to quality and stability. TeamCity has proven to be an invaluable tool and has held no surprises for me.

pm2run is a PM2 process file that configures PM2 to run this process in cluster mode with 4 instances.
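
A process file along those lines could look like the sketch below, using PM2’s JavaScript module format; the real pm2run file is not reproduced here and the names are placeholders.

```javascript
// An illustrative PM2 process file; the actual pm2run contents are not shown here.
module.exports = {
  apps: [
    {
      name: 'backend-api',    // placeholder process name
      script: 'server.js',    // entry point of the Express application
      instances: 4,           // run four workers of the backend API
      exec_mode: 'cluster',   // let PM2's cluster mode load-balance between them
      env: {
        NODE_ENV: 'production',
        PORT: 4000            // the port Nginx proxies to
      }
    }
  ]
};
```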

This way, each time I committed something to the ‘deployment’ branch in git, TeamCity would immediately deploy the new web app with all my changes. This is a very simplified method of restarting the server, and it can result in 1-10 seconds of downtime during each deployment. I now have an Nginx load balancer in front, and I deploy the service progressively across two instances, pointing Nginx to the active one each time until the other has finished deploying.

Writing down the API

The augmented reality service is not the only feature of the application; it has many more (panic buttons, location awareness, seeing nearby associated partners on the map, loyalty features, and so on).

The mobile REST API grew to be quite large.

As mentioned, the web application would serve both an Android and an iOS application. The Android application would not be created by me, so I also needed to communicate the API specifications to another developer.

I used Apiary.io for this purpose.

I don’t know if you have the habit of documenting your APIs when you write them. If you don’t, you should. And when you do, you should use Apiary. Back in 2014, it supported only its Markdown-based format for documenting APIs, but they have since added support for Swagger.

This literally saved me hundreds of hours in communicating the specifications to the other guy developing the Android version of the app.

Release mode

After much testing, the initial prototype was released to the App Store.

There are still many things to be done, but I really enjoyed this project. I think it is one of the most interesting things I have developed in the last few years, due to its multi-disciplinary development requirements.