One of the showcase features of iOS 11 is ARKit, an augmented-reality mode available on devices powered by A9 and newer chips (basically, 6S and better iPhones, 2017 iPads, and iPad Pros). With ARKit, users hold up the device and view a composite of the video feed and computer-generated imagery (either 2D SpriteKit or 3D SceneKit) that appears “attached” to horizontal surfaces in the real world.

Ultra-brief overview of ARKit

The lion’s share of work in ARKit is done behind-the-scenes on the device. Basically, an ARSession object combines device motion data, the camera’s physical characteristics (focal length, pixel size, etc.), and computational geometry to detect trackable “feature points” in the input video stream, locates them relative to a fixed world coordinate system, and creates ARAnchor objects that can be used to bridge between real-world and computer-generated imagery.

There are limitations. As mentioned previously, ARKit leaves behind older iOS devices. Also, ARKit can only detect horizontal surfaces (ARPlaneDetection is an enumeration that only defines Horizontal, but I have to believe that the limitation is due to some quirky behavior that Apple will fix sooner rather than later). Finally, the computer imagery in an AR scene is rendered above the video and appears above real-world objects that would occlude it.

The five critical concepts in ARKit are:

  • Everything is defined in terms of the world coordinate system, which is initialized soon after the ARSession begins running.
  • Image-processing finds high-contrast points in the real world that are stable from frame-to-frame. These “feature points” are intermediate results that are available to the developer, but mostly inform the system’s detection of ARPlaneAnchor objects. A good number of feature points are necessary for quite a few frames before ARKit detects a plane. Not surprisingly, bright lighting and textured surfaces seem to generate many more trackable feature points than evenly-illuminated matte surfaces.
  • Planes are created, removed, and coalesced as the image-processing continues. Their extent shifts, expands and shrinks, depending on the world-tracking.
  • Computer imagery is anchored to these planes and rendered on the screen as if at that location and orientation in the real world.
  • Because AR processing occurs 60 times per second, for optimal performance you need to be careful about memory and disposing no-longer-needed resources. Xamarin’s Profiler is your friend!

The following two images show these concepts. The first image shows the axes of the world origin “floating in space”, the current feature points as a set of yellow dots, and a small red cube where ARKit placed an ARPlaneAnchor.

The second image shows a conceptual view of the world coordinate system, camera / iOS device, and anchor, as well as showing the plane defined by the ARPlaneAnchor and a piece of 3D SceneKit geometry placed relative to the world coordinate system and the plane.

The Simplest ARKit That Could Work

Developers actually have complete control over rendering in ARKit but many scenarios will use the predefined ARSCNView and ARSKView that provide augmented-reality content based on SceneKit (3D) and SpriteKit (2D) respectively.

The following ARKit program allows you to place a piece of SceneKit geometry (a cube) so that it appears “attached to” a recognized horizontal plane. Before that happens, ARKit has to be initialized and run long enough to recognize a horizontal plane.

I initialize ARKit in two methods: ViewDidLoad and ViewWillAppear. The first to run is ViewDidLoad:

This is a typical iOS UIViewController subclass. It creates a new ARSCNView (the easy-to-use route towards SceneKit 3D geometry), sets its delegate-object to an instance of a class I wrote called ARDelegate (discussed later), turns on some debug visualizations, enables the view to respond to touches, and adds the ARSCNView to the view hierarchy.

The second part of initialization occurs during ViewWillAppear (it may be that this could be done during ViewDidLoad but I am always a little leery of the highly stateful view-initialization process in iOS):

This simply configures the ARSession of the previously-created ARSCNView and begins its processing.

At this point, ARKit begins background processing. On the screen, the user will see the feed from their camera and, in a second or two, the debug visualizations will begin to show the feature cloud and world-coordinate’s origin (similar to the screenshot image shown previously).

At some point after that, ARKit will hopefully discover enough co-planar feature points to track a horizontal plane. When that happens, ARKit will automatically add an ARPlaneAnchor to its world-tracking. That addition triggers the DidAddNode method on the ARSCNView object’s delegate-object, which in our case is ARDelegate:

DidAddNode is called any time a node is added to the view, but we are only interested in special processing if it’s an ARPlaneAnchor, indicating that ARKit is adding to its internal model of horizontal planes. We test for that condition and, if true, call this.PlaceAnchorCube. That method, in turn, creates some SceneKit geometry: a node that holds the plane’s geometry and is positioned in the same world coordinate as the anchor, and a small red box as a visual indicator of the ARPlaneAnchor. Note that because SceneKit uses a scene-graph architecture, the anchorNode position of [0,0,0] is relative to its parent’s position — the planeNode, whose position is based on the planeAnchor, whose position, in turn, is in world coordinates.

Once this method is called, the user will see something essentially identical to the screenshot above.


Once at least one plane is detected and tracked, the user can place additional geometry by touching on the screen. Back in the ViewController class:

Skip to the TouchesBegan method and you can see that what we do is pretty straightforward at a high level: we grab the position of the touch and then perform a hit test for horizontal planes. If that hit-test is successful, we return the position on the first such plane as a tuple of type (SCNVector3, ARPlaneAnchor). The hit-testing is done using the built-in ARSceneView.HitTest(SCNVector3, ARHitTestResultType) method, which projects a ray outward into the augmented-reality “world” and returns an array containing anchors on any of the planes that it intersects, ordered nearest-to-furthest. If that array is not empty, we grab the first and return its position as an SCNVector3 (which we extract from the appropriate components of the anchor’s NMatrix4 matrix). (Historical note: During the iOS 11 beta period, the type used for these matrices switched between row-major and column-major in the Xamarin bindings. If you review code written during the beta period, rotations and translations may appear transposed.)

The PlaceCube method just creates a box 10cm on a side and places it in the augmented-reality “world” at pos, whose value is the SCNVector3 returned by WorldPositionFromHitTest as mentioned above.

The result is something like:

Learn More

The #MadeWithARKit hashtag on Twitter has been a great source of inspiration this Summer, with people demo’ing such great concepts as virtual portals, sound nodes located in space, and the combination of ARKit with video filters.

After the iPhone X launch announcement, Apple revealed some new APIs relating to face detection and mapping, including an ARFaceAnchor and ARFaceGeometry.

All the code for this sample is available at Pull-requests and questions welcome! Make sure to also check out the Introduction to iOS 11 guide in the Xamarin Developer Center.

Discuss this post in the Xamarin Forums