Transforming visual accessibility
Be My Eyes uses GPT-4 to transform visual accessibility.
The company already has a case in which a user navigated the railway system (arguably an impossible task for the sighted as well), getting not only details about where they were located on a map but also point-by-point instructions on how to safely reach where they wanted to go.
Yet traversing the complicated physical world is only half the story. Understanding what’s on a screen can be twice as arduous for a person who isn’t sighted. Screen readers, embedded in most modern operating systems, read through the pieces of a web page or desktop application line by line, section by section, speaking each word. Images, the heart of communication on the web, can be even harder for a screen reader to convey.
Now, Henriksen says, they’re able to show GPT-4 the webpage, and the system knows which part to read or summarize, after countless training hours in which deep learning algorithms build relationships to understand the “important” part of a webpage. This not only simplifies tasks like reading the news online but also gives people who need visual assistance access to some of the most cluttered pages on the web: shopping and e-commerce sites. GPT-4 can summarize search results the way the sighted naturally scan them, bouncing between important data points rather than reading every minuscule detail, and help those needing sight support make the right purchase in real time.
“This is a fantastic development for humanity,” Buckley says, “but it also represents an enormous commercial opportunity.”