Computer Vision Drives the Understanding of Business Processes
To future-proof and optimize business processes, organizational leaders have embraced digital transformation. Specifically, they are turning to process mining, a family of techniques that pulls relevant data from event logs and builds process maps, both to discover opportunities for optimization and to perform audit and compliance checks.
In more detail, an event log records an employee's tasks at a specific point in time. Since the events are mostly closed transactions, this model opens a window into the methods an employee uses and the steps he or she takes to complete a task. While most process mining companies employ this structured event log model, Skan is harnessing the power of computer vision to create visual process logs and analyze data in real time.
What makes visual logs so powerful?
In short, event logs fail to capture the interaction between human beings and devices. That largely untouched data is what Skan taps into. By observing how employees carry out processes, leaders can identify bottlenecks and come up with targeted solutions and interventions. For instance, if the tool records suboptimal behavioral patterns (such as an employee scrolling for an extended period before clicking on an item, or going back and forth before making a decision), that employee could be offered training. Similarly, if the visual log shows that a compliance check is taking longer than expected, an intervention can be suggested to streamline the process.
Though this sounds straightforward, computer vision, in its attempt to match human intuition and reasoning, must work through several levels of complexity.
Four levels of computer vision
Let's consider a self-driving car to see how it works its way through four successive levels of complexity.
- Identification: Imagine you are on a highway, seated in your autonomous vehicle. As you come across an elephant, you identify and label it without consciously realizing it. Even though it's the first time you've had such a sighting, your confidence stems from your visual perception, abstract reasoning skills, knowledge, and indirect experiences with elephants. By contrast, your car's computer vision software "sees" an array of pixels that it must analyze to make sense of the image. If the software is not trained to identify an object in this context, it may fail to detect it or may identify it inaccurately.
- Semantics: At the next level, the computer vision software interprets the scene it records. In the first step, the computer merely identified which objects are in the scene (say, cars), but are they in a parking lot or on the street? This interpretation of how the objects relate to one another and to their surroundings happens next.
- Dynamics: The third level absorbs information from multiple frames captured by a single camera over time. If the back camera identified a moving car at level one, successive frames let you compute the vehicle's speed and predict where it will be a few seconds from now. In another case, if you are driving in stop-and-go traffic, you can tell from the video whether the car in front of you has stopped or is moving.
- Collation: The fourth level is the highest in complexity, as it involves stitching multiple scenes together using several cameras and then collating the data. If the rear-view camera shows you a car that's traveling at a higher speed than yours, you can predict that it will soon come into your front camera view.
Illustration 1: The four levels of computer vision
Computer Vision Drives the Understanding of Business Processes
Despite its challenges, applying computer vision to process mining has significant potential to transform business processes. Here's what it looks like in the context of applying for a mortgage.
- Identification: As the home buyer begins the mortgage process, the representative asks him or her to fill out an application with personal information, employment information, and details of the new home. Here, optical character recognition (OCR) reads the text entered into the system and converts it into a format the computer can work with, i.e., a structured textual representation.
- Semantics: This step involves making sense of the data the software reads. If John, a Skan employee, is applying for the loan, the software will identify his name, his company's name, his residential address, and other pertinent details.
- Dynamics: Next, the software tracks the information as it flows from one screen to another. At this level, it limits itself to capturing the successive actions of a single employee. In the case of the mortgage processor, it will record the steps he or she takes to prepare an applicant's file before sending it to a lender for approval.
- Collation: In this final level, the algorithm collates information across multiple screens and employees to form an end-to-end view of the process. In this case, it will develop a holistic view, from the mortgage processor creating an application and an underwriter reviewing it to an escrow officer closing the real estate transaction. It will also create a detailed activity-level view of the process (e.g., checking the credit score during the underwriting step) with exact timings (e.g., the time taken by the mortgage processor to submit personal information, and the time taken by an underwriter to review the credit history).
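As a rough illustration of the Collation level, the Python sketch below merges activities recorded for different employees into one ordered timeline for a single loan and computes how long each activity took. The record layout, field names, case identifiers, and times are all hypothetical; they are not Skan's data model.

```python
from datetime import datetime

# Hypothetical visual-log records; every field name and value is illustrative.
EVENTS = [
    {"case": "loan-001", "role": "underwriter", "activity": "review credit history",
     "start": "2024-03-04T11:00:00", "end": "2024-03-04T11:45:00"},
    {"case": "loan-001", "role": "mortgage processor", "activity": "submit personal information",
     "start": "2024-03-04T09:00:00", "end": "2024-03-04T09:20:00"},
    {"case": "loan-002", "role": "mortgage processor", "activity": "submit personal information",
     "start": "2024-03-04T10:00:00", "end": "2024-03-04T10:35:00"},
    {"case": "loan-001", "role": "escrow officer", "activity": "close transaction",
     "start": "2024-03-06T14:00:00", "end": "2024-03-06T14:30:00"},
]


def collate(events, case):
    """Return (role, activity, minutes) tuples for one case, ordered by start time."""
    rows = sorted(
        (e for e in events if e["case"] == case),
        key=lambda e: datetime.fromisoformat(e["start"]),
    )
    timeline = []
    for e in rows:
        start = datetime.fromisoformat(e["start"])
        end = datetime.fromisoformat(e["end"])
        minutes = (end - start).total_seconds() / 60
        timeline.append((e["role"], e["activity"], minutes))
    return timeline


for role, activity, minutes in collate(EVENTS, "loan-001"):
    print(f"{role}: {activity} ({minutes:.0f} min)")
```

Sorting by start time across roles is what turns three separate single-employee logs into the end-to-end view; the per-activity durations fall out of the same records.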
In addition to using these computer vision techniques for process mining and process discovery, Skan is applying artificial intelligence to drive business insights. Across locations, Skan provides a detailed view of the process variants that employees are following. In the home loan example, this could mean revealing that underwriters perform credit checks differently, or that loan processors take markedly different amounts of time to fill in applications manually. These insights enable organizations to address gaps in skills, segment tasks according to skill level, create real-time transparency, and support compliance.
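A process variant is simply a distinct ordering of activities that some cases follow. The short Python sketch below shows one common way to count variants from a log; the tuple layout and the activity names are assumptions for illustration, not a description of how Skan computes them.

```python
from collections import Counter, defaultdict


def process_variants(events):
    """Count distinct activity sequences.

    `events` is an iterable of (case_id, step, activity) tuples, where `step`
    gives the order of the activity within its case.
    """
    traces = defaultdict(list)
    # Sorting the tuples orders them by case, then by step within each case.
    for case_id, _step, activity in sorted(events):
        traces[case_id].append(activity)
    return Counter(tuple(trace) for trace in traces.values())


# Three hypothetical underwriting cases; two follow the same path.
events = [
    ("case-1", 1, "pull credit report"), ("case-1", 2, "verify income"), ("case-1", 3, "decide"),
    ("case-2", 1, "verify income"), ("case-2", 2, "pull credit report"), ("case-2", 3, "decide"),
    ("case-3", 1, "pull credit report"), ("case-3", 2, "verify income"), ("case-3", 3, "decide"),
]

for variant, count in process_variants(events).most_common():
    print(count, " -> ".join(variant))
```

Here two cases pull the credit report before verifying income and one does the reverse, which is the kind of difference between underwriters described above.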