Selecting an Augmented Reality Framework for an Enterprise Web-Based AR Application

How do you create augmented reality?

In the process of building an Augmented Reality proof of concept in under 4 weeks (see details here), the team at Valtech evaluated a series of AR frameworks and software development kits (SDKs) that would enable them to rapidly pull in data from a headless CMS (Contentstack) and display it in an Augmented Reality interface on a phone or tablet web browser. Here is their quick research report.

For total beginners to AR (like me), an AR framework is the SDK that merges the digital world on screen with the physical world in real life. AR frameworks generally bundle a few different technologies under the hood: a vision library that tracks markers, images, or objects in the camera feed; a lot of math to register points in the camera image to 3D space; and hooks into a graphics library that renders content on top of the camera view.
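To make the "lot of math" a little more concrete, here is a minimal sketch of the core step: taking a 3D point on a tracked marker and working out which pixel it lands on in the camera image. It uses a plain pinhole camera model. The pose, focal lengths, and function names are made up for illustration and are not taken from any particular AR framework, which would estimate these values for you.

```javascript
// Sketch: map a 3D point in marker coordinates to a 2D pixel in the camera image.
// All numbers below (pose, intrinsics) are illustrative assumptions.

// Apply a rigid transform (3x3 rotation R as row arrays, translation t) to point p.
function toCameraSpace(p, R, t) {
  return [
    R[0][0] * p[0] + R[0][1] * p[1] + R[0][2] * p[2] + t[0],
    R[1][0] * p[0] + R[1][1] * p[1] + R[1][2] * p[2] + t[1],
    R[2][0] * p[0] + R[2][1] * p[1] + R[2][2] * p[2] + t[2],
  ];
}

// Pinhole projection: fx, fy are focal lengths and cx, cy the image center, all in pixels.
function projectToPixel(pCam, { fx, fy, cx, cy }) {
  const [x, y, z] = pCam;
  if (z <= 0) return null; // point is behind the camera, nothing to draw
  return { u: fx * (x / z) + cx, v: fy * (y / z) + cy };
}

// Hypothetical pose: marker facing the camera, 2 units straight ahead.
const pose = { R: [[1, 0, 0], [0, 1, 0], [0, 0, 1]], t: [0, 0, 2] };
// Hypothetical intrinsics for a 640x480 camera image.
const intrinsics = { fx: 800, fy: 800, cx: 320, cy: 240 };

const markerCorner = [0.5, 0.5, 0];
const pixel = projectToPixel(toCameraSpace(markerCorner, pose.R, pose.t), intrinsics);
console.log(pixel); // { u: 520, v: 440 }
```

The tracking library's job is essentially to recover the pose (R and t) from each camera frame; the graphics library then uses it to draw content at the matching spot.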

Which software is best for our web-based Augmented Reality use case?

The key considerations for the research were:

Speed. The goal was to create a working prototype as fast as possible. Once we were successfully displaying content and had completed an MVP, we could continue testing more advanced methods of object detection and tracking:
- Training custom models
- Identifying and distinguishing objects without explicit markers
- Potentially using OCR as a way to identify product names
- More of a wow-factor

The team was agnostic on whether to work with marker or image-tracking -- willing to use whichever was most feasible for our use case.

Object tracking. Since the team was not trying to place objects on a real-world plane (like a floor), they realized they might not need all the features of a native iOS or Android AR library (aside from marker tracking).

Content display. That said, the framework needed to allow for content to be displayed in a cool and engaging way, even if we didn’t achieve fancy detection methods in 3 weeks:
- Something more dynamic than just billboarded text on video
- Maybe some subtle animation touches to emphasize the 3D experience (e.g. very light Perlin movement in z plane)
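As a rough illustration of the "very light Perlin movement" idea, the sketch below implements a small 1D gradient noise function in the Perlin style and uses it to produce a slow, smooth z offset over time. The hash used for the gradients, the function names, and the default amplitude and speed are all assumptions chosen for the example; a real project might just pull in an existing noise library instead.

```javascript
// Sketch: 1D gradient ("Perlin-style") noise driving a gentle z-axis drift.
// The hash and all default values are illustrative assumptions.

// Deterministic pseudo-random slope in [-1, 1] for each integer lattice point.
function slopeAt(i) {
  const s = Math.sin(i * 127.1 + 311.7) * 43758.5453;
  return (s - Math.floor(s)) * 2 - 1;
}

// Perlin's quintic fade curve: eases interpolation so the motion has no visible kinks.
function fade(t) {
  return t * t * t * (t * (t * 6 - 15) + 10);
}

// Noise value at x: blend the two neighboring lattice gradients. Output stays within [-0.5, 0.5].
function noise1D(x) {
  const i0 = Math.floor(x);
  const t = x - i0;
  const left = slopeAt(i0) * t;            // contribution of the lattice point on the left
  const right = slopeAt(i0 + 1) * (t - 1); // contribution of the lattice point on the right
  return left + (right - left) * fade(t);
}

// z offset for a given timestamp in milliseconds, kept within [-amplitude, +amplitude].
function zOffset(timeMs, amplitude = 0.02, speed = 0.0005) {
  return noise1D(timeMs * speed) * 2 * amplitude;
}

// Sample the drift once per second for a few seconds.
for (let ms = 0; ms <= 4000; ms += 1000) {
  console.log(ms, zOffset(ms).toFixed(4));
}
```

Calling zOffset once per frame and adding the result to an object's z position would give the kind of subtle, non-repeating float described above, as opposed to the obviously periodic look of a plain sine wave.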

Platform. The preference was for a web-based build (not requiring an app installation).

Comparing the available AR Frameworks: Marker tracking, object tracking, and platform-readiness

Here's an overview of our AR / ML library research notes:

AR.js

Built on ARToolKit (via the jsartoolkit5 port) rather than Vuforia*
Cross-browser & lightweight
Probably the least-effort way to get started
Offers both marker & image tracking. Image tracking uses NFT (Natural Feature Tracking) markers.
Platforms: Web (works with Three.js or A-Frame.js)

Zappar WebAR

Has SDK for Three.js.
SDK seems free; content creation tools are paid
Image tracking only
Platforms: Web (Three.js / A-Frame / vanilla JS); Unity; C++

ARKit

Not web-based
Image tracking is straightforward, but can’t distinguish between two similar labels with different text
Offers both marker & image tracking
Platforms: iOS

Argon.js

Uses Vuforia
Has a complex absolute coordinate system that must be translated into graphics coordinates. No GitHub updates since 2017.
Offers both marker & image tracking
Platforms: Works in Argon4 browser

WebXR

Primarily for interacting with specialized AR/VR hardware (headsets, etc.)

XR.plus

Primarily an AR content publishing tool to create 3D scenes

Google MediaPipe (KNIFT)

Uses template images to match objects in different orientations (allows for perspective distortion). You can learn more here.
Marker and image tracking: Yes, sort of... even better. KNIFT is an advanced machine learning model that does NFT (Natural Feature Tracking), or image tracking. That is the same job AR.js does, but in the team's assessment much better and faster. It doesn't have explicit fiducial marker tracking, but markers are high-contrast simplified images, so it should handle them well, too.
Platforms: Just Android so far; it doesn't seem to have been ported to iOS or Web yet

Google Vision API - product search

Create a set of product images, match a reference image to find the closest match in the set.
Cloud-based, so it may or may not respond fast enough for real-time use.
Image classification
Platforms: Mobile / web

Google AutoML (also an option for video-based object tracking)

Train your own models to classify images according to custom labels
Image classification
Platforms: Any

ml5.js

Friendly ML library for the web. The team experimented with some samples that used pre-trained models for object detection and was able to identify “bottles” and track their position.
Object detection
Platforms: Web
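For a sense of what "identify bottles and track their position" can look like in code, here is a small sketch of the post-processing step only. It assumes each detection is an object with label, confidence, x, y, width, and height fields, which is the general shape an object detector such as ml5.js with a COCO-SSD model is expected to hand back; the sample data and the findObjects helper are hypothetical, and no ml5.js calls are made here.

```javascript
// Sketch: pick out detections of one label and compute their center points.
// The detection shape and the sample values are assumptions for illustration.

function findObjects(detections, label, minConfidence = 0.5) {
  return detections
    .filter((d) => d.label === label && d.confidence >= minConfidence)
    .map((d) => ({
      label: d.label,
      confidence: d.confidence,
      // Center of the bounding box, in the same pixel units as the input.
      centerX: d.x + d.width / 2,
      centerY: d.y + d.height / 2,
    }));
}

// Hypothetical results for a single video frame.
const frameDetections = [
  { label: 'bottle', confidence: 0.91, x: 100, y: 40, width: 60, height: 180 },
  { label: 'person', confidence: 0.88, x: 300, y: 10, width: 200, height: 400 },
  { label: 'bottle', confidence: 0.32, x: 500, y: 60, width: 50, height: 150 },
];

const bottles = findObjects(frameDetections, 'bottle');
console.log(bottles);
// [ { label: 'bottle', confidence: 0.91, centerX: 130, centerY: 130 } ]
```

Running this on every frame and anchoring content to centerX and centerY is the simplest form of position tracking; note that a generic "bottle" label says nothing about which product it is.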

p5xr

AR add-on for p5. Uses WebXR.
Platforms: Seems geared towards VR / Cardboard

* Vuforia is an SDK that many AR apps use for image and object tracking. Its tracking technology is widely used in apps and games, but it is rivaled by modern computer vision APIs, such as those from Google.

Graphics Library Research

Under the hood, browsers usually use WebGL to render 3D to a <canvas> element, but there are several popular graphics libraries that make writing WebGL code easier. Here's what we found in our graphics library research:

Three.js

WebGL framework in JavaScript. Full control over creating graphics objects, etc., but requires more manual work.
Examples: GitHub Repo

A-Frame.js

HTML wrapper for Three.js that integrates an entity-component system for composability, as well as a visual 3D inspector. Built on HTML / the DOM
Easy to create custom components with actions that happen in a lifecycle (on component attach, on every frame, etc.)
Examples: GitHub Repo
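To show what the component lifecycle looks like in practice, here is a minimal sketch of a custom A-Frame component that stores an entity's starting z position when it is attached and then nudges it on every frame. The component name gentle-bob and its amplitude and period properties are invented for this example; init and tick are the A-Frame lifecycle handlers for "on attach" and "on every frame".

```javascript
// Sketch: a custom A-Frame component with an attach-time hook and a per-frame hook.
// The name "gentle-bob" and its properties are hypothetical.

const gentleBob = {
  schema: {
    amplitude: { type: 'number', default: 0.05 }, // how far to move, in scene units
    period: { type: 'number', default: 4000 },    // milliseconds per full cycle
  },

  // Called once when the component is attached to an entity.
  init() {
    this.baseZ = this.el.object3D.position.z;
  },

  // Called on every rendered frame; time is the scene uptime in milliseconds.
  tick(time) {
    const phase = (2 * Math.PI * time) / this.data.period;
    this.el.object3D.position.z = this.baseZ + this.data.amplitude * Math.sin(phase);
  },
};

// Register only when A-Frame is loaded (i.e. in the browser), so the file can also load elsewhere.
if (typeof AFRAME !== 'undefined') {
  AFRAME.registerComponent('gentle-bob', gentleBob);
}
```

Once registered, the behavior is attached in markup like any other attribute, for example an a-entity tag with gentle-bob="amplitude: 0.02", which is what makes A-Frame components easy to compose.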

PlayCanvas

WebGL framework with Unity-like editor
Could be convenient for quickly throwing together complex scenes. You can link out a scene to be displayed on top of a marker, or manually program a scene. Potentially less obvious to visualize / edit / collaborate / see what’s going on in code if you use an editor and publish a scene.
Slightly unclear how easy it is to dynamically generate scenes based on incoming data / how to instantiate a scene with parameters
Examples: GitHub Repo

Recommendations for this project

Here is what we decided to go with for our AR demo.

Start with AR.js (another option was Zappar) + A-Frame.js for a basic working prototype
In the longer term, explore options for advanced object recognition and tracking
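As a rough idea of how the AR.js + A-Frame starting point can be fed from a CMS, the sketch below turns a list of content entries into A-Frame markup with one AR.js marker per entry. The entry fields (title, tagline, markerPreset) are hypothetical and are not Contentstack's actual response format; the a-scene, a-marker, and a-text elements and the "hiro" and "kanji" presets are the ones AR.js and A-Frame provide for marker-based scenes.

```javascript
// Sketch: generate marker-based A-Frame / AR.js markup from CMS-style entries.
// Entry field names are assumptions made for this example.

// Escape text before placing it inside an HTML attribute.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// One marker with a text label floating slightly above it.
function markerMarkup(entry) {
  const label = escapeHtml(`${entry.title}: ${entry.tagline}`);
  return [
    `  <a-marker preset="${escapeHtml(entry.markerPreset)}">`,
    `    <a-text value="${label}" align="center" position="0 0.5 0"></a-text>`,
    '  </a-marker>',
  ].join('\n');
}

// Full scene: AR.js-enabled scene, one marker per entry, and a camera entity.
function sceneMarkup(entries) {
  return [
    '<a-scene embedded arjs>',
    ...entries.map(markerMarkup),
    '  <a-entity camera></a-entity>',
    '</a-scene>',
  ].join('\n');
}

// Hypothetical entries, standing in for data fetched from the headless CMS.
const entries = [
  { title: 'Sample Product', tagline: 'Light & crisp', markerPreset: 'hiro' },
  { title: 'Another Product', tagline: 'Bold', markerPreset: 'kanji' },
];

console.log(sceneMarkup(entries));
```

Building markup as a string keeps the sketch short; in a real page the same entries could instead be used to create and append entities through the DOM once the scene has loaded.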

Read more about determining the best way to do marker tracking; narrowing down the use case and developing the interaction design; and content modeling for AR in our full coverage of week one of development.
