Augmented Reality Frameworks for an Enterprise Web-Based AR Application

How do you create augmented reality?

In the process of building an Augmented Reality proof of concept in under 4 weeks (see details here), the team at Valtech evaluated a series of AR frameworks and software development kits (SDKs) that would enable them to rapidly pull in data from a headless CMS (Contentstack) and display it in an Augmented Reality interface on a phone or tablet web browser. Here is their quick research report.

For total beginners to AR (like me), an AR framework is the SDK that merges the digital world on screen with the physical world in real life. AR frameworks generally bundle a few different technologies under the hood: a vision library that tracks markers, images, or objects in the camera feed; a lot of math that registers points in the camera image to 3D space; and hooks into a graphics library that renders content on top of the camera view.
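
To make the "math" step a little more concrete, here is a minimal TypeScript sketch of the kind of calculation a tracking library performs internally: moving a point from marker space into camera space, then projecting it onto the image with a pinhole camera model. The type and function names (`Pose`, `Intrinsics`, `markerToCamera`, `projectToPixel`) are made up for illustration and do not come from any of the frameworks discussed here. The sketch also assumes the computer-vision convention of a camera looking along +z, which differs from the convention some graphics libraries use.

```typescript
// Illustrative only: names and conventions here are assumptions, not a real AR SDK API.
type Vec3 = [number, number, number];

// Pose of a tracked marker relative to the camera: row-major 3x3 rotation plus translation.
interface Pose {
  rotation: [Vec3, Vec3, Vec3];
  translation: Vec3;
}

// Pinhole camera intrinsics in pixels: focal lengths and principal point.
interface Intrinsics {
  fx: number;
  fy: number;
  cx: number;
  cy: number;
}

// Transform a point from marker space into camera space: p' = R * p + t.
function markerToCamera(pose: Pose, p: Vec3): Vec3 {
  const [r0, r1, r2] = pose.rotation;
  const dot = (row: Vec3): number => row[0] * p[0] + row[1] * p[1] + row[2] * p[2];
  const [tx, ty, tz] = pose.translation;
  return [dot(r0) + tx, dot(r1) + ty, dot(r2) + tz];
}

// Project a camera-space point to pixel coordinates.
// Assumes the camera looks along +z; returns null for points at or behind the camera.
function projectToPixel(p: Vec3, k: Intrinsics): [number, number] | null {
  const [x, y, z] = p;
  if (z <= 0) return null;
  return [(k.fx * x) / z + k.cx, (k.fy * y) / z + k.cy];
}
```

A framework runs this in the opposite direction as well, estimating the pose from where the marker's corners appear in the image, which is the harder part that the vision library takes care of.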

Which software is best for our web-based Augmented Reality use case?

The key considerations for the research were:

  • Speed. The goal was to create a working prototype as fast as possible. Once we were successfully displaying content and had completed an MVP, we could continue testing more advanced methods of object detection and tracking:
    • Training custom models
    • Identifying and distinguishing objects without explicit markers
    • Potentially using OCR as a way to identify product names
    • More of a wow-factor
  • The team was agnostic on whether to work with marker tracking or image tracking, and was willing to use whichever was most feasible for our use case.
  • Object tracking. Since the team was not trying to place objects on a real-world plane (like a floor), they realized they might not need all the features of a native iOS or Android AR library (aside from marker tracking).
  • Content display. That said, the framework needed to allow for content to be displayed in a cool and engaging way, even if we didn’t achieve fancy detection methods in 3 weeks
    • Something more dynamic than just billboarded text on video
    • Maybe some subtle animation touches to emphasize the 3D experience (e.g. very light Perlin-noise movement along the z-axis)
  • Platform. The preference was for a web-based build (not requiring an app installation)
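
The "very light Perlin movement" idea could be sketched as below. This is a minimal 1D gradient-noise function in the style of Perlin noise, written for illustration rather than taken from the project. The ad hoc hash in `gradient`, the function names, and the default amplitude and speed are all assumptions.

```typescript
// Quintic smoothstep used by improved Perlin noise: 6t^5 - 15t^4 + 10t^3.
function fade(t: number): number {
  return t * t * t * (t * (t * 6 - 15) + 10);
}

// Pseudo-random slope in [-1, 1) for each integer lattice point (ad hoc hash, an assumption).
function gradient(i: number): number {
  const s = Math.sin(i * 127.1) * 43758.5453;
  return (s - Math.floor(s)) * 2 - 1;
}

// 1D gradient noise: zero at integer inputs, within about [-0.5, 0.5] elsewhere.
function noise1D(x: number): number {
  const i0 = Math.floor(x);
  const t = x - i0;
  const n0 = gradient(i0) * t; // contribution of the left lattice point
  const n1 = gradient(i0 + 1) * (t - 1); // contribution of the right lattice point
  return n0 + (n1 - n0) * fade(t);
}

// Smooth z-axis drift for a given timestamp, bounded by +/- amplitude (scene units).
function zOffset(timeMs: number, amplitude = 0.02, speed = 0.0005): number {
  return 2 * amplitude * noise1D(timeMs * speed);
}
```

In an A-Frame scene, the returned value could be applied on every frame, for example by writing it to `el.object3D.position.z` inside a component's `tick` handler.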

Comparing the available AR Frameworks: Marker tracking, object tracking, and platform-readiness

Here's an overview of our AR / ML library research notes:

AR.js

  • Built on ARToolKit (via its JavaScript port, jsartoolkit5)
  • Cross-browser & lightweight; probably the least-effort way to get started
  • Offers both marker & image tracking. Image tracking uses NFT markers.
  • Platforms: Web (works with Three.js or A-Frame.js)


Zappar WebAR

  • Has SDK for Three.js.
  • SDK seems free; content creation tools are paid
  • Image tracking only
  • Platforms: Web (Three.js / A-Frame / vanilla JS); Unity; C++


ARKit

  • Not web-based
  • Image tracking is straightforward, but can’t distinguish between two similar labels with different text
  • Offers both marker & image tracking
  • Platforms: iOS


Argon.js

  • Uses Vuforia*
  • Has a complex absolute coordinate system that must be translated into graphics coordinates. No GitHub updates since 2017.
  • Offers both marker & image tracking
  • Platforms: Works in the Argon4 browser


WebXR

  • Primarily for interacting with specialized AR/VR hardware (headsets, etc.)


XR.plus

  • Primarily an AR content publishing tool to create 3D scenes


Google MediaPipe (KNIFT)

  • Uses template images to match objects in different orientations (allows for perspective distortion). You can learn more here.
  • Marker and image tracking: Image tracking only, but that should cover markers as well. KNIFT is an advanced machine learning model that does NFT (Natural Feature Tracking), or image tracking, the same as AR.js does, but much better and faster. It doesn't have explicit fiducial marker tracking, but markers are just high-contrast, simplified images, so it should handle them well, too.
  • Platforms: Android only so far; it doesn't seem to have been ported to iOS or the web yet


Google Vision API - product search

  • Create a set of product images, match a reference image to find the closest match in the set.
  • Cloud-based, so it was unclear whether it would be fast enough for real-time use
  • Image classification
  • Platforms: Mobile / web


Google AutoML (also an option for video-based object tracking)

  • Train your own models to classify images according to custom labels
  • Image classification
  • Platforms: Any


ml5.js

  • Friendly ML library for the web. We experimented with some samples that used pre-trained models for object detection and were able to identify “bottles” and track their position.
  • Object detection
  • Platforms: Web


p5xr

  • AR add-on for p5. Uses WebXR.
  • Platforms: Seems geared towards VR / Cardboard

* Vuforia is an SDK that is popular among AR apps for image and object tracking. Its tracking technology is widely used in apps and games, but it is now rivaled by modern computer vision APIs, such as those from Google.

Graphics Library Research

Under the hood, browsers usually use WebGL to render 3D to a <canvas> element, but there are several popular graphics libraries that make writing WebGL code easier. Here's what we found in our graphics library research:

Three.js

  • WebGL framework in JavaScript. Full control over creating graphics objects, etc., but requires more manual work.
  • Examples: GitHub repo


A-Frame.js

  • HTML wrapper for Three.js that integrates an entity-component system for composability, as well as a visual 3D inspector. Built on HTML / the DOM
  • Easy to create custom components with actions that happen in a lifecycle (on component attach, on every frame, etc.)
  • Examples: GitHub repo
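
As a rough illustration of those lifecycle hooks, the sketch below defines a hypothetical `grow-in` component that scales an entity up from zero after it is attached. `schema`, `init`, `tick`, `this.el`, `this.data`, and `AFRAME.registerComponent` are part of A-Frame's component API. The component name, the `duration` property, and the `GrowInState` type are invented here for illustration.

```typescript
// Minimal shape of the component's `this` that the handlers below rely on (assumed typing).
interface GrowInState {
  el: { object3D: { scale: { set(x: number, y: number, z: number): void } } };
  data: { duration: number };
  elapsed: number;
}

const growIn = {
  // Duration of the scale-up animation in milliseconds.
  schema: { duration: { type: "number", default: 600 } },

  // Lifecycle: called once when the component is attached to an entity.
  init(this: GrowInState): void {
    this.elapsed = 0;
    // Start invisible unless there is no animation to play.
    const start = this.data.duration > 0 ? 0 : 1;
    this.el.object3D.scale.set(start, start, start);
  },

  // Lifecycle: called on every rendered frame; timeDelta is ms since the last frame.
  tick(this: GrowInState, _time: number, timeDelta: number): void {
    if (this.elapsed >= this.data.duration) return; // animation finished
    this.elapsed = Math.min(this.elapsed + timeDelta, this.data.duration);
    const s = this.elapsed / this.data.duration;
    this.el.object3D.scale.set(s, s, s);
  },
};

// Register only when A-Frame has been loaded on the page.
const aframe = (globalThis as any).AFRAME;
if (aframe) aframe.registerComponent("grow-in", growIn);
```

Once registered, the component could be attached in markup with something like `<a-box grow-in="duration: 800"></a-box>`.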


PlayCanvas

  • WebGL framework with Unity-like editor
  • Could be convenient for quickly throwing together complex scenes. You can link out a scene to be displayed on top of a marker, or manually program a scene. However, if you build and publish scenes through the editor, it is potentially harder to visualize, edit, collaborate on, or see in code what is going on.
  • Slightly unclear how easy it is to dynamically generate scenes based on incoming data, or how to instantiate a scene with parameters
  • Examples: GitHub repo


Recommendations for this project

Here is what we decided to go with for our AR demo.

  • Start with AR.js (another option was Zappar) + A-Frame.js for a basic working prototype
  • In the longer term, explore options for advanced object recognition and tracking
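
As a minimal sketch of that starting point, the TypeScript below turns a piece of CMS content into an AR.js + A-Frame scene that shows the text above a marker. The `ProductEntry` shape and the `buildMarkerScene` and `escapeAttr` helpers are hypothetical and do not reflect Contentstack's actual response format. The markup follows the basic marker example from the AR.js documentation (`<a-scene embedded arjs>` with `<a-marker preset="hiro">`).

```typescript
// Hypothetical content shape; a real CMS entry would have its own fields.
interface ProductEntry {
  title: string;
  description: string;
}

// Escape text so it is safe inside a double-quoted HTML attribute.
function escapeAttr(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build A-Frame markup that anchors the entry's text to the built-in "hiro" marker.
function buildMarkerScene(entry: ProductEntry): string {
  const label = escapeAttr(`${entry.title}: ${entry.description}`);
  return [
    "<a-scene embedded arjs>",
    '  <a-marker preset="hiro">',
    `    <a-text value="${label}" align="center" position="0 0.5 0"></a-text>`,
    "  </a-marker>",
    "  <a-entity camera></a-entity>",
    "</a-scene>",
  ].join("\n");
}
```

The resulting string could be injected into a page that already loads the A-Frame and AR.js scripts.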

Read more about determining the best way to do marker tracking; narrowing down the use case and developing the interaction design; and content modeling for AR in our full coverage of week one of development.