How do you create augmented reality?
In the process of building an Augmented Reality proof of concept in under 4 weeks (see details here), the team at Valtech evaluated a series of AR frameworks and software development kits (SDKs) that would enable them to rapidly pull in data from a headless CMS (Contentstack) and display it in an Augmented Reality interface on a phone or tablet web browser. Here is their quick research report.
For total beginners to AR (like me), an AR framework is the SDK that merges the digital world on screen with the physical world in real life. AR frameworks generally bundle a few different technologies under the hood: a vision library that tracks markers, images, or objects in the camera feed; a lot of math to register the points the camera sees in 3D space; and hooks into a graphics library to render things on top of the camera view.
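To give a feel for the "lot of math" part, here is a minimal sketch of the standard pinhole camera projection, which maps a 3D point on a tracked marker to a pixel on screen. It is not taken from any of the frameworks reviewed here. The type names, the function names, and the convention that the camera looks along +z are all assumptions made for illustration.

```typescript
type Vec3 = [number, number, number];

// Hypothetical pose type: rotation (row-major 3x3) and translation that take
// a point from marker space into camera space.
interface Pose {
  rotation: [Vec3, Vec3, Vec3];
  translation: Vec3;
}

// Hypothetical camera intrinsics, all in pixels.
interface Intrinsics {
  fx: number; // focal length along x
  fy: number; // focal length along y
  cx: number; // principal point x
  cy: number; // principal point y
}

// Apply the rigid transform: camera = R * marker + t.
function toCameraSpace(p: Vec3, pose: Pose): Vec3 {
  const dot = (row: Vec3): number =>
    row[0] * p[0] + row[1] * p[1] + row[2] * p[2];
  const [r0, r1, r2] = pose.rotation;
  const t = pose.translation;
  return [dot(r0) + t[0], dot(r1) + t[1], dot(r2) + t[2]];
}

// Project a marker-space point to pixel coordinates.
// Returns null when the point is not in front of the camera
// (assumption: the camera looks along +z).
function projectToPixel(
  p: Vec3,
  pose: Pose,
  k: Intrinsics
): [number, number] | null {
  const [x, y, z] = toCameraSpace(p, pose);
  if (z <= 0) return null;
  return [(k.fx * x) / z + k.cx, (k.fy * y) / z + k.cy];
}
```

In practice the vision library estimates the pose for every frame and the graphics library applies the projection, so application code rarely writes this by hand.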
Which software is best for our web-based Augmented Reality use case?
The key considerations for the research were:
- Speed. The goal was to create a working prototype as fast as possible. Once we were successfully displaying content and had completed an MVP, we could continue testing more advanced methods of object detection and tracking:
  - Training custom models
  - Identifying and distinguishing objects without explicit markers
  - Potentially using OCR as a way to identify product names
  - More of a wow-factor
- Marker vs. image tracking. The team was agnostic on whether to work with marker or image tracking, and was willing to use whichever was most feasible for our use case.
- Object tracking. Since the team was not trying to place objects on a real-world plane (like a floor), they realized they might not need all the features of a native iOS or Android AR library (aside from marker tracking)
- Content display. That said, the framework needed to allow for content to be displayed in a cool and engaging way, even if we didn’t achieve fancy detection methods in 3 weeks:
  - Something more dynamic than just billboarded text on video
  - Maybe some subtle animation touches to emphasize the 3D experience (e.g. very light Perlin movement in z plane)
- Platform. The preference was for a web-based build (not requiring an app installation)
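As an illustration of the "very light Perlin movement in z plane" idea, here is a small sketch of a smooth, noise-driven z offset. It uses 1D value noise with smoothstep interpolation as a simpler stand-in for true Perlin noise. The function names, the default amplitude, and the speed are assumptions for illustration, not values from the project.

```typescript
// Deterministic pseudo-random value in [0, 1) for an integer lattice point.
// This sine-based hash is a common shader trick, good enough for visual noise.
function lattice(i: number): number {
  const s = Math.sin(i * 127.1 + 311.7) * 43758.5453;
  return s - Math.floor(s);
}

// 1D value noise: blend neighbouring lattice values with a smoothstep curve
// so the result changes continuously over time.
function smoothNoise(t: number): number {
  const i = Math.floor(t);
  const f = t - i;
  const u = f * f * (3 - 2 * f); // smoothstep weight in [0, 1]
  return lattice(i) * (1 - u) + lattice(i + 1) * u;
}

// Map elapsed time in milliseconds to a small z offset in scene units.
// amplitude and speed are hypothetical tuning knobs.
function zOffset(timeMs: number, amplitude = 0.02, speed = 0.0005): number {
  return (smoothNoise(timeMs * speed) * 2 - 1) * amplitude;
}
```

Calling `zOffset(elapsedMs)` once per frame and adding the result to an object's base z position should produce a slow, organic drift rather than a mechanical bounce.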
Comparing the available AR Frameworks: Marker tracking, object tracking, and platform-readiness
Here's an overview of our AR / ML library research notes:
- Uses Vuforia*
- Cross-browser & lightweight
- Probably the least-effort way to get started
- Offers both marker & image tracking. Image tracking uses NFT markers.
- Platforms: Web (works with Three.js or A-Frame.js)
- Has SDK for Three.js.
- SDK seems free; content creation tools are paid
- Image tracking only
- Platforms: Web (Three.js / A-Frame / vanilla JS); Unity; C++
- Not web-based
- Image tracking is straightforward, but can’t distinguish between two similar labels with different text
- Offers both marker & image tracking
- Platforms: iOS
- Uses Vuforia
- Has a complex absolute coordinate system that must be translated into graphics coordinates. No GitHub updates since 2017.
- Offers both marker & image tracking
- Platforms: Works in Argon4 browser
- Primarily for interacting with specialized AR/VR hardware (headsets, etc.)
- Primarily an AR content publishing tool to create 3D scenes
- Uses template images to match objects in different orientations (allows for perspective distortion). You can learn more here.
- Marker and image tracking: Yes, sort of, and arguably even better. KNIFT is an advanced machine learning model that does NFT (Natural Feature Tracking), or image tracking, the same as AR.js does, but much better and faster. It doesn't have explicit fiducial marker tracking, but markers are high-contrast simplified images, so it would handle them well, too.
- Platforms: Just Android so far, doesn't seem to have been ported to iOS or Web yet
Google Vision API - product search
- Create a set of product images, match a reference image to find the closest match in the set.
- Cloud-based, so it was unclear whether it would respond fast enough for real-time use.
- Image classification
- Platforms: Mobile / web
Google AutoML (Also option for video-based object tracking)
- Train your own models to classify images according to custom labels
- Image classification
- Platforms: Any
- Friendly ML library for the web. Experimented with some samples that used pre-trained models for object detection. Was able to identify “bottles” and track their position.
- Object detection
- Platforms: Web
- AR add-on for p5. Uses WebXR.
- Platforms: Seems geared towards VR / Cardboard
* Vuforia is an AR SDK that many apps and games use for image / object tracking. Its tracking technology is widely used, but it is now rivaled by modern computer vision APIs, such as those from Google.
Graphics Library Research
Under the hood, browsers usually use WebGL to render 3D to a <canvas> element, but there are several popular graphics libraries that make writing WebGL code easier. Here's what we found in our graphics library research:
- WebGL framework in JavaScript. Full control over creating graphics objects, etc., but requires more manual work.
- Examples: GitHub Repo
- HTML wrapper for Three.js that integrates an entity-component system for composability, as well as a visual 3D inspector. Built on HTML / the DOM
- Easy to create custom components with actions that happen in a lifecycle (on component attach, on every frame, etc.)
- Examples: GitHub Repo
- WebGL framework with Unity-like editor
- Could be convenient for quickly throwing together complex scenes. You can link out a scene to be displayed on top of a marker, or manually program a scene. If you build and publish a scene from the editor, though, it may be harder to see, edit, and collaborate on what is going on in code.
- Slightly unclear how easy it is to dynamically generate scenes based on incoming data, or how to instantiate a scene with parameters
- Examples: GitHub Repo
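To make the A-Frame lifecycle note above concrete, here is a minimal sketch of a custom component with an `init` handler (runs when the component attaches) and a `tick` handler (runs every frame). The component name `gentle-bob` and its `amplitude` and `periodMs` properties are hypothetical. The sketch assumes A-Frame is loaded as the global `AFRAME` and only registers the component when that global exists.

```typescript
// Component definition in the shape A-Frame's registerComponent expects.
// "gentle-bob", amplitude, and periodMs are made-up names for this sketch.
const gentleBob = {
  schema: {
    amplitude: { type: "number", default: 0.02 },
    periodMs: { type: "number", default: 4000 },
  },

  // Lifecycle: called once when the component is attached to an entity.
  init(this: any): void {
    this.baseZ = this.el.object3D.position.z;
  },

  // Lifecycle: called on every rendered frame; time is elapsed milliseconds.
  tick(this: any, time: number): void {
    const phase = (2 * Math.PI * time) / this.data.periodMs;
    this.el.object3D.position.z =
      this.baseZ + this.data.amplitude * Math.sin(phase);
  },
};

// Register only when A-Frame is present (for example, in a browser page
// that has loaded the A-Frame script).
const aframe = (globalThis as any).AFRAME;
if (aframe) {
  aframe.registerComponent("gentle-bob", gentleBob);
}
```

Once registered, the component could be attached in markup with something like `<a-entity gentle-bob="amplitude: 0.05"></a-entity>`.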
Recommendations for this project
Here is what we decided to go with for our AR demo.
- Start with AR.js (another option was Zappar) + A-Frame.js for a basic working prototype
- In the longer term, explore options for advanced object recognition and tracking
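As a rough sketch of how the AR.js + A-Frame starting point might fit together with CMS content, the function below builds marker-based scene markup from a content entry. The `ProductEntry` shape is hypothetical and does not reflect the actual Contentstack content model. The `hiro` preset marker is just the stock AR.js example marker, and fetching the entry from the CMS is left out.

```typescript
// Hypothetical shape of an entry already fetched from the headless CMS.
interface ProductEntry {
  title: string;
  tagline: string;
}

// Escape text so it is safe inside a double-quoted HTML attribute.
function escapeAttr(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build A-Frame markup that shows the entry's text on top of a marker.
function markerScene(entry: ProductEntry): string {
  return [
    "<a-scene embedded arjs>",
    '  <a-marker preset="hiro">',
    `    <a-text value="${escapeAttr(entry.title)}" align="center" position="0 0.6 0"></a-text>`,
    `    <a-text value="${escapeAttr(entry.tagline)}" align="center" position="0 0.3 0"></a-text>`,
    "  </a-marker>",
    "  <a-entity camera></a-entity>",
    "</a-scene>",
  ].join("\n");
}
```

The resulting string would be placed in a page that loads the A-Frame and AR.js scripts. Generating markup this way keeps the scene driven by content rather than hard-coded.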
Read more about determining the best way to do marker tracking; narrowing down the use case and developing the interaction design; and content modeling for AR in our full coverage of week one of development.