Intention Recognition

Trainable software for recognizing and utilizing interaction between humans, objects, and digital interfaces.

Intention Recognition (IR) is an intuitive interaction-to-interface software, exposed as a streaming API. Using computer vision and machine learning it detects objects and gestures to control digital interfaces.

IR removes the need for peripheral devices such as keyboards, mouses, and touchscreens. With it, you can turn any space into a physical-digital hybrid environment. Physical interactions controlling digital interfaces.

Combining physical artifacts with digital interfaces opens up a new space for many situations. Pick up a book in the library and receive extra relevant information. Work with physical archive material that is directly connected to its digital inventory system.

IR is trainable software that detects gestures and actions. Both the gestures and actions a human makes and the gesture and actions a human does with a particular object. IR can be trained by feeding it visual information of the objects it needs to detect. This way IR is fit for any situation and remains fast for each purpose.

In IR we make a distinction between gestures and actions. Gestures are simple single queues such as a fist or holding your hand with only an extended index finger. Actions are more complex as they are a sequence of gestures and only a specific sequence makes an action. For humans though actions are really simple such as waving your hand or making a sweeping movement. IR can be trained to recognize specific gestures and actions as well.

Besides recognizing objects and humans moving we’ve built IR to recognize the interaction between the human and the object. Creating a powerful addition to the possible gestures and actions it can detect. Your physical interaction with objects becomes directly linked to digital interaction.

IR is easily implemented in any modern framework using its streaming API. Fetching and querying is reliable and fast making the API fit in all sorts of applications, be it websites, digital catalogs, or mobile applications.


  • Gesture and action recognition

    IR recognizes simple gestures and complex actions which can be used to control digital interfaces. The nature of gestures and actions can be of any type: hand, feet, or full body.

  • Human-with-object interaction

    The way someone interacts with an object is very specific to both the type of object and the intent of the person. Be it skimming through a book or holding a wine bottle.

    These interactions are meaningful and can be augmented with our digital systems through IR. IR allows for defining specific human-with-object interactions so that those can be utilized to control an interface.

  • Trainable and customizable

    IR can be trained on specific objects, human gestures/actions, and human-with-object gestures/actions. This feature makes IR ideal for very specific situations and stays optimized for each job.

  • Almost zero hardware needed

    IR comes in a very compact package and works with any regular, budget, or embedded camera. It also makes a range of standard hardware unnecessary such as keyboards, mouses, touchscreens, depth-cameras, sensors, and other peripheral devices.

  • Local off-the-grid

    IR can be installed locally without using the API when stable connections are not guaranteed. This will however slightly increase the hardware needed and require a power source. Ideal for unique situations such as public installations.

How it works

Intention Recognition’s architecture is split into two endpoint groups. Each is concerned with a particular geometry.

The two endpoint groups don’t have a specific order but the first group is for human-action detection and classification. In this group, we have a machine learning model to detect and learn new actions performed by humans. We have two endpoints designed for this group, a hand-based endpoint, and a facial-based endpoint.

The other group is the human-asset-action detection and classification group. Here our machine learning model is located to detect a defined object and actions done with that object. For this group, we have one endpoint.

Each endpoint can be used with a special state, called observer mode. This mode sets IR in observation mode in which IR keeps observing the incoming input. Incoming non-predefined actions are classified and can be used after the observation. This mode is great for an intuitive approach without the need to follow predefined actions and gestures. 

Both endpoint groups receive video input using WebRTC (for improved performance) and analyze the sequence of frames to detect an action.


  • April 2020

    Designing software architecture

    Key for having robust and optimized software.

  • June 2020

    Building collection of training datasets

    12+ datasets and around 100,000 labeled entries both in videos and images.

  • July 2020

    Deployment of models

    Seeing the result of all the training.

  • August 2020

    Further training

    Both our action model and object-interaction model needed more training.

  • April 2021

    Opening API (beta) demo registration form

  • June 2021

    Designing and development API (beta) interface

  • August 2021

    Exclusive release beta version for registered customers

  • September 2021

    Retrieving feedback from the beta release

  • October 2021

    Development API (v1.0.0)

  • February 2022

    Launch of API v1.0.0


  • Amir Houieh

    Co-founder and developer

  • Martijn de Heer

    Co-founder and designer

  • Homayuon Moradi

    Computer vision engineer

  • Ada Popowicz

    Designer, intern