How Autonomous Cars See The World

If we’re going to be talking and thinking about autonomous cars, self-driving cars, robo-cars, drive-o-droids (copyright pending) or whatever the hell we want to call these things, we should get a sense of exactly what they do and how they do it. How do they know what’s around them, and how can they potentially be better than human drivers?

Let’s just take a moment to break down what the act of driving actually is: controlling a moving object significantly larger than yourself through a complex environment full of other moving objects, controlling the rate of that motion (including stopping it entirely) and its direction, and adjusting how changes in speed and direction are made based on environmental conditions like road-surface friction, weather, and visibility.

Really, the more you think about it, it’s pretty incredible we can drive at all. For one thing, we’re asking our bodies to act and react at far quicker intervals than we were ever designed for. Sure, many of the traits that make it possible for humans to drive at all come from millions of years of evolution as predators and hunters, which gave us such handy-for-driving abilities as forward-facing binocular vision, the ability to track moving objects while we’re in motion ourselves (handy for chucking spears at mammoths and bison), and the ability to anticipate the motion of other moving things.

We can look at a pedestrian on a street corner and tell, thanks to an innate understanding of human behavior, if that person is paying attention to their surroundings or not, and we can adjust our focus on them accordingly. We can often tell if the person sees us, in the car, and can use that information to decide how to best approach them, or even to alert them to our presence by blowing the horn, which may or may not play a ridiculous version of Dixie.

We, as drivers, are able to compensate for gaps in what we see or expect thanks to our experiences. If we’re on a road that goes from having clearly marked lane divider lines to a section where none of the lines are visible, we can deal with that because we know roughly where the lanes should be and can continue based on what the last visible section of lane markings told us.

Machines don’t have many of the innate abilities that humans draw on when driving, and as such, making a car able to drive itself means pretty much starting from scratch. Machines do have some significant advantages to help them out, though: they never feel fatigue, they react remarkably quickly, and they can be interfaced directly with the machinery of the car itself and receive data from other machines, among other things.

Most modern autonomous vehicles rely on the same basic set of technologies and general procedures to work, so let’s walk through the basic toolkit that lets a wildly advanced machine drive itself around just like your deadbeat cousin does every day.

The equipment that sets an autonomous vehicle apart from a human-piloted one can primarily be classified into two categories: sensory equipment and mechanical actuators. The actuators just convey the driving system’s decisions to the physical parts of the car, and can be very tightly integrated with the car itself. Motors that are used to assist with steering for human drivers can also be used to turn the steering rack itself. With most modern cars being drive-by-wire (not requiring, say, a physical cable from the gas pedal to a throttle butterfly valve, and instead just sending electronic signals from the pedal to increase or decrease throttle), having something other than a human controlling the functions of a car is pretty straightforward. This part of the equation is by far the easy part.

The hard part is getting the car to perceive and understand the ever-changing, ever-moving world around it. That’s where the sensory equipment comes in. Here’s what that equipment usually consists of:

Ultrasonic Sensors

You know those little round button-like things you see on some cars’ bumpers? Those are ultrasonic sensors, and they’re most often used as parking-assist sensors, since they’re good at telling what’s close to you, at low speeds. They bounce ultrasonic sound waves off objects to determine how close you are to them.

These actually don’t have much of a role in full-speed autonomous driving, but they still help a car understand its environment, so I thought they were worth a mention. Automatic parallel-parking systems do use them, so there are some autonomous-driving/parking contexts where they come into play.

We can’t hear the pulses they make, since those pulses, while loud, tend to be between 40 kHz and 48 kHz (or higher, with newer sensors) or so. Human hearing stops at about 20 kHz. Dogs, cats, and bats, though, they should be able to hear them, which must be pretty annoying.
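
The math behind them is dead simple, which is part of why they’re so cheap. Here’s a minimal sketch of the time-of-flight calculation a parking sensor does (the interface and the exact speed-of-sound figure are my own assumptions for illustration; real modules also compensate for temperature and filter out noise):

```python
# Hypothetical ultrasonic distance calculation: the sensor emits a ~40 kHz
# ping and times how long the echo takes to bounce back.

SPEED_OF_SOUND_M_S = 343.0  # speed of sound in air at about 20°C, meters/second


def echo_to_distance_m(round_trip_time_s: float) -> float:
    """Convert an echo's round-trip time into the distance to the obstacle.

    The pulse travels out to the object and back again, so the one-way
    distance is half of (speed * time).
    """
    return SPEED_OF_SOUND_M_S * round_trip_time_s / 2.0


# Example: an echo that returns after 2.9 milliseconds
print(echo_to_distance_m(0.0029))  # ~0.50 meters -- time to stop backing up
```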

Cameras

Vision is, of course, the most important sense we use when driving, so most self-driving machines will need a way to replicate it. Modern technology is capable of making some very small and high-resolution camera systems, and modern cars are already getting pretty laden with cameras, even if they don’t have any interest in driving themselves.

Cameras, usually mounted just above the inside rear-view mirror in the top-center of the windshield, are used for lane-departure systems, where computers run software that analyzes each frame of video to identify the lines painted on a highway, and makes sure the car stays inside them. These cameras may also be used for emergency braking systems and traffic sign identification. All of these examples would be for camera systems with some degree of artificial intelligence, since they’re actually attempting to make some sort of sense out of the images they capture.

“Sense” is a bit of an anthropomorphizing term, of course: they’re really just analyzing frames of video against a very specific set of criteria, and acting on those criteria in very defined ways.

Most autonomous vehicle camera systems will use two cameras to get binocular vision for real depth perception. While the cameras are good, they’re not usually as good as the one in, say, your phone. Most tend to be between 1 and 2 megapixels, which means they’re imaging the world at a resolution of about 1600 x 1200 pixels. Not bad, but much less than human vision. Still, this seems to be good enough to resolve what’s needed for driving, and is small enough to allow for image processing at the sorts of speeds required for driving.
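
Those two cameras matter because depth falls out of the difference between their two views: an object that appears shifted by many pixels between the left and right images is close, and one that barely shifts is far away. Here’s a hedged sketch of that basic stereo math (the focal length and camera-spacing numbers below are made up purely for illustration):

```python
# Hypothetical stereo depth estimate: the horizontal shift of an object
# between the left and right camera images is called the "disparity,"
# and depth is inversely proportional to it.

def depth_from_disparity(focal_length_px: float,
                         baseline_m: float,
                         disparity_px: float) -> float:
    """Depth (meters) = focal length (pixels) * camera separation (meters)
    / disparity (pixels)."""
    if disparity_px <= 0:
        return float("inf")  # no measurable shift: effectively very far away
    return focal_length_px * baseline_m / disparity_px


# Example: a made-up 1,400-pixel focal length, cameras 30 cm apart, and a car
# that appears shifted 21 pixels between the two images:
print(depth_from_disparity(1400, 0.30, 21))  # ~20 meters ahead
```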

Really, it’s not about image quality or color saturation or any of the sorts of criteria we normally use when we evaluate cameras for our use. For driving a car, you want fast image acquisition—the more frames per second you can capture and evaluate, the quicker the car’s reaction time will be.

When processing images from the camera, the car’s artificial vision system has to look out for and identify a number of things:

• Road markings

• Road boundaries

• Other vehicles

• Cyclists, pedestrians, pets, discarded mattresses, and anything else in the road that is not a vehicle

• Street signs, traffic signs, traffic signals

• Other cars’ signal lamps

To identify these objects and people, the camera systems must figure out which pixels in the image are background, and which are the actual things that need to be paid attention to. Humans can do this instinctively, but a machine doesn’t inherently understand that a 1600x1200 matrix of colored pixels that we see as a Porsche 356 parked in front of the burned remains of a Carl’s Jr. is actually a vehicle parked in front of a sub-par fast-food restaurant that fell victim to a grease fire.

To get a computer to understand what it’s seeing through its cameras, a number of different methods have to be employed. Objects are identified as separate from their surroundings via algorithms and processes like edge detection, which is a complex and math-intensive way for a computer to look at a given image and find where there are boundaries between areas, usually based on differences in image brightness between regions of pixels.

As you can imagine, this process is non-trivial, since any given scene viewed through a camera is full of gradients of color and shade, shadows, bright spots, confusing boundaries, and so on. But complicated math is precisely the sort of thing computers are good at, so, generally, this process works quite well.
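
To give a feel for what that math is doing, here’s a stripped-down sketch of gradient-based edge detection using the classic Sobel operators, written with plain NumPy (a production vision system would use heavily optimized, hardware-accelerated code, not a Python loop like this):

```python
import numpy as np

# Sobel kernels: each one estimates how quickly brightness changes in one
# direction across a 3x3 neighborhood of pixels.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def edge_strength(gray: np.ndarray) -> np.ndarray:
    """Return an edge-strength map for a 2D grayscale image.

    Big values mean a sharp brightness boundary: a lane line on asphalt,
    the silhouette of a car against the sky, and so on.
    """
    h, w = gray.shape
    out = np.zeros((h, w), dtype=float)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = gray[y - 1:y + 2, x - 1:x + 2]
            gx = np.sum(patch * SOBEL_X)  # horizontal brightness gradient
            gy = np.sum(patch * SOBEL_Y)  # vertical brightness gradient
            out[y, x] = np.hypot(gx, gy)  # overall edge strength at this pixel
    return out
```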

Once individual objects are separated from their background, they then need to be identified. Size and proportion are big factors in this, as most cars are—very roughly—similarly sized and proportioned, as are most people or cyclists and so on. Things that are large 12-foot-by-five-foot-by-six-foot rectangles are likely cars, narrow things that are shaped like a book on its spine are probably bicycles or motorcycles, and tall oblongs that move around are probably people or magic walking cacti.
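
As a cartoonishly simplified illustration of that size-and-proportion logic, here’s what the hand-rolled version might look like (real systems learn these distinctions from training data rather than using hard-coded rules, and every threshold below is invented for illustration):

```python
def rough_guess(length_m: float, width_m: float, height_m: float) -> str:
    """Guess what a detected object is from its bounding-box dimensions alone."""
    if length_m > 3.5 and width_m > 1.5 and height_m < 2.5:
        return "probably a car"
    if length_m > 1.5 and width_m < 0.8 and height_m < 2.0:
        return "probably a bicycle or motorcycle"
    if height_m > 1.2 and width_m < 1.0 and length_m < 1.0:
        return "probably a pedestrian (or a magic walking cactus)"
    return "unknown -- slow down and pay attention"


print(rough_guess(length_m=4.5, width_m=1.8, height_m=1.4))  # probably a car
```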

While most autonomous systems are pretty good at identifying cars and people and bikes, they’re still pretty stupid compared to humans. An image of a car that no human would ever mistake for the real thing can be absolutely good enough to register as a car to an autonomous system. This object identification is accomplished via lots of training and machine learning with thousands and thousands of example, pre-categorized images, and while it’s extremely impressive that it works at all, it can be fooled in troubling ways. For example, it’s hard for these systems to tell the difference between a picture of a bicycle and a real bicycle, especially if the picture is on the back of a moving car, which lets the image of the bike move exactly the way the computer expects a real one to.

That example comes from an article in MIT Technology Review, and it highlights the biggest issue with cameras and image-identification systems: they’re easy to fool, and even without any deliberate foolery, they can get confused. The solution is not to rely on cameras alone, but to use cameras as part of a larger suite of other sensors.

There are lots of good reasons to have as many different world-sensing options as possible: you want to be able to ‘see’ what’s going on even in conditions where visibility is limited. Darkness is a factor, of course, but so is bad weather. We’ve all been driving and gotten caught in torrential rains that render the view out the windshield into something like what you’d see if you attempted to view the world through a nice, cold gin and tonic. You can’t really see, and a computer wouldn’t be able to either. But other systems, like radar or lidar, both of which we’ll get to soon, may not be as affected.

As far as how many cameras are used, at minimum an autonomous vehicle would need a pair of stereo cameras facing forward, though having rear and side cameras to get as close as possible to a 360° view would be ideal.

Radar

Cameras give a good overall view of the environment around the car, but turning that view into real three-dimensional space requires some complicated math and work. Radar systems are used to help the car understand how far it is from the other cars and objects around it.

Radar systems are already fairly common in cars today, as they form the basis of adaptive (sometimes called dynamic) cruise control systems. Adaptive cruise is a form of semi-autonomy, where the car drives at a set speed like normal cruise control, but uses the radar emitter to determine the distance to the car in front, and adjusts its speed to maintain the desired following distance.
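
The control loop behind adaptive cruise is conceptually simple. Here’s a toy version of the idea (a real system layers in smoothing, acceleration limits, and a much better vehicle model; the gain and gap numbers below are made up):

```python
def adaptive_cruise_speed(current_speed_mps: float,
                          set_speed_mps: float,
                          radar_gap_m: float,
                          desired_gap_m: float,
                          gain: float = 0.5) -> float:
    """Pick a new target speed from the radar-measured gap to the car ahead.

    If the gap is bigger than desired, drift back up toward the driver's set
    speed; if it's smaller, slow down in proportion to how short we are.
    """
    gap_error_m = radar_gap_m - desired_gap_m
    target = current_speed_mps + gain * gap_error_m  # simple proportional control
    return max(0.0, min(target, set_speed_mps))      # never reverse, never exceed set speed


# Example: cruising at 30 m/s with a 32 m/s set speed, but the car ahead is
# only 35 meters away when we'd like 50 meters of gap -- so ease off.
print(adaptive_cruise_speed(30.0, 32.0, radar_gap_m=35.0, desired_gap_m=50.0))  # 22.5
```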

You can usually tell if a car has a radar emitter/receiver by looking at the front of the car; if you see a strange, shiny flat panel masquerading as a piece of the grille, or if the front badge appears to be “printed” on a solid, shiny black panel, then you can safely assume a radar transceiver is mounted there.

Radar data doesn’t attempt to give the full view of a camera-based vision system, but it is more reliable for distance information and more tolerant of darkness and of the inclement weather conditions that could confuse or impair a camera-based system.

Lidar

In some ways, lidar is one of the most controversial of the sensor systems used on autonomous cars, not because of what it does or how it works, but because it’s one that Tesla doesn’t currently use. Tesla is always good at getting people’s attention, so their habit of ignoring lidar gets some notice.

It is a little odd that they’d shun lidar, because lidar is an incredibly powerful tool to help a moving machine sense the world around it. Lidar stands for Light Detection and Ranging, and can be thought of as a sort of light-based radar.

Lidar uses low-intensity, non-harmful, and invisible (to our meaty eyes) laser beams, which are pulsed at a target (or, in the case of most autonomous cars, all around, in a full 360° dome) and the reflected pulses are measured for return time and wavelength to compute the distance of the object from the sender.
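
The core calculation is the same time-of-flight idea the ultrasonic sensors use, just with light instead of sound, and since the unit knows exactly which direction each pulse was fired in, every returned pulse becomes one precisely placed point in 3D space. Here’s a rough sketch of that bookkeeping (the angles and timing are purely illustrative; real units fire enormous numbers of pulses per second):

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def pulse_to_point(azimuth_deg: float, elevation_deg: float,
                   round_trip_time_s: float) -> tuple:
    """Turn one laser pulse's firing direction and return time into an
    (x, y, z) point relative to the lidar unit."""
    distance_m = SPEED_OF_LIGHT_M_S * round_trip_time_s / 2.0  # out and back
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance_m * math.cos(el) * math.cos(az)  # forward
    y = distance_m * math.cos(el) * math.sin(az)  # left/right
    z = distance_m * math.sin(el)                 # up/down
    return (x, y, z)


# Example: a pulse fired 10 degrees to the right, level with the road, that
# comes back after about 133 nanoseconds -- an object roughly 20 meters away.
print(pulse_to_point(azimuth_deg=-10.0, elevation_deg=0.0,
                     round_trip_time_s=133.4e-9))
```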

In practice, lidar can produce some very detailed, high-resolution visualizations of the environment around a self-driving car. The resulting imagery is often detailed enough to make out different surface textures and fairly small details on passing cars, and even things like potholes and manhole covers in the road.

Lidar units are also the most likely things to really challenge designers of future autonomous vehicles, since they require a high vantage point and an unobstructed 360° view. That’s why lidar units are most commonly seen as domed objects mounted on roof racks on autonomous test vehicles. They’re not exactly sleek, but they provide very good information about the world around them, and they’re a good bit less affected by ambient light or weather conditions than camera-based vision systems are. In fact, lidar could detect, say, a black-clad motorcyclist on a black motorcycle in the dark (you know, like Batman on the Batmobile) far better than a camera setup could.

There’s another big advantage to lidar: where camera systems, even binocular camera systems, require computer time and algorithms to translate the array of pixels from the camera into an actual three-dimensional spatial map, the information detected by lidar is inherently three-dimensional data. That means that a lidar image of a car’s surroundings can require significantly less processing time than a camera image, which translates to a faster response time for the autonomous vehicle.

Lidar is still a relatively new technology, and as such it’s still not cheap, and getting lidar systems to the point where they can withstand the comparatively brutal life of an automotive component is still a work in progress.

GPS

This is pretty familiar technology to most of us by now, since it’s the reason we all get lost so much less often than in decades past. The Global Positioning System relies on a constellation of satellites circling the earth to let us know exactly where we are at all times. Autonomous cars will use GPS heavily not just to navigate, but also just to drive, since GPS means the car can know what the road is about to do before it even gets there.

GPS allows autonomous cars to plan ahead, to be ready to slow down for a hairpin turn or speed up to jump an opening drawbridge. Just kidding. I mean, it’ll be able to know where those drawbridges are, but I suspect you’d have to do some serious hacking to convince your autonomous car just how boss it would be if it would jump the bridge like a cop car in a crappy ‘70s buddy-cop movie.

Standard GPS alone is only accurate to within one to three meters, or between three and nine feet. If you, as a driver, were only able to accurately steer within three to nine feet, there’s no way in hell you’d still have a license. Many, many things can fit within three feet of a car, including human beings, dogs, bicycles, oil drums full of acid, or chili, or both, a bear, and so on. That’s just not good enough.

So, to compensate for GPS’s relatively coarse resolution (I mean, considering that it’s pinpointing a location on the entire planet, three to nine feet is pretty damn good; it’s just not good enough for driving), autonomous car designers have come up with something called localization. There are actually a number of ways to accomplish localization, but most methods rely on the car’s other sensor systems, with the goal of getting the car’s location pinpointed to within ten centimeters or so.

Some methods use a technique called a particle filter, which seeds a given map with some number of virtual ‘particles’ at known locations. The particles can be thought of as possible locations of the vehicle. As the vehicle moves, sensors report the speed and angle the car is moving at, and the cloud of particles moves along with it, with each particle’s location compared against known landmarks on the map, as measured by the car’s lidar systems. Particles that don’t agree with what the sensors actually see get discarded, and the cluster that survives converges on the car’s true position.
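
Here’s a deliberately tiny, one-dimensional sketch of that particle filter idea: guesses that keep agreeing with what the sensors see survive, guesses that don’t get thrown away, and the surviving cluster is your position estimate. (Everything here is simplified for illustration; real implementations work in two or three dimensions, use proper motion and sensor models, and track far more particles.)

```python
import math
import random

# Toy one-dimensional map: the car drives along a line past landmarks at
# known positions, and a range sensor reports the distance to the nearest
# landmark. In a real car, lidar and a detailed map play these roles.
LANDMARKS = [10.0, 40.0, 75.0]


def nearest_landmark_distance(position: float) -> float:
    return min(abs(position - lm) for lm in LANDMARKS)


def particle_filter_step(particles, moved_by, measured_distance, noise=1.0):
    """One cycle: move every particle, weight it by how well it explains the
    sensor reading, then resample so good guesses multiply and bad ones die."""
    # 1. Motion update: shift each particle by how far the car thinks it moved,
    #    plus a little jitter, because odometry is never perfect.
    moved = [p + moved_by + random.gauss(0, 0.2) for p in particles]

    # 2. Weighting: particles whose predicted landmark distance matches the
    #    actual sensor reading get a high weight.
    weights = []
    for p in moved:
        error = nearest_landmark_distance(p) - measured_distance
        weights.append(1e-12 + math.exp(-(error ** 2) / (2 * noise ** 2)))

    # 3. Resampling: draw a new particle set in proportion to those weights.
    return random.choices(moved, weights=weights, k=len(moved))


# Start with particles spread along the whole road; after a few drive-and-
# measure cycles they cluster around the car's true position, and the
# position estimate is simply their average.
particles = [random.uniform(0, 100) for _ in range(500)]
particles = particle_filter_step(particles, moved_by=2.0, measured_distance=5.0)
print(sum(particles) / len(particles))
```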

Other methods can use the car’s camera systems and exciting things like an “algorithm based on the probabilistic noise model of RSM features” and other gleefully geeky stuff like that. It’s not a trivial problem, but there are many viable solutions.

GPS is also the technology that autonomous vehicles will use to report their location back to any number of possible organizations: local law enforcement, your insurance provider, the carmaker who built the car, the company that developed the driving software for the car, and any number of other companies and advertisers and market research groups that any of those organizations in the chain could have sold access to your car’s location information to.

Autonomous cars will very likely mean that no trip you take is ever going to be completely private, and we shouldn’t even kid ourselves into thinking otherwise. Everyone will be watching everything, everywhere.

Communication and Combining Everything

All of these different forms of world-sensing are combined to create an overall, composite image of the surrounding reality for the car. Some call this “sensor fusion,” and it’s especially important because each method, individually, has some pretty significant flaws and limitations that could cause real problems in practice.
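
As a very rough illustration of what “fusing” means in practice, here’s the simplest useful flavor of it: combine each sensor’s estimate of the same quantity, trusting the more confident sensors more. (Real systems use far more sophisticated statistical filters than this, and the numbers below are made up.)

```python
def fuse_estimates(estimates):
    """Combine independent estimates of the same distance, weighting each
    sensor by its confidence (expressed as a variance: smaller = more trusted).
    """
    # Inverse-variance weighting: a classic way to merge noisy measurements.
    total_weight = sum(1.0 / var for _, var in estimates)
    return sum(value / var for value, var in estimates) / total_weight


# Made-up readings of the distance (in meters) to the car ahead:
readings = [
    (21.0, 4.0),   # camera: roughly right, but depth from pixels is noisy
    (19.6, 0.25),  # radar: very good at distance
    (19.8, 0.1),   # lidar: also very good at distance
]
print(fuse_estimates(readings))  # ~19.8, hugging the radar/lidar consensus
```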

Supplementing the sensor data is communication, both between cars on the road and from more centralized sources. Individual cars will communicate with others in their vicinity, a process known as vehicle-to-vehicle (or V2V) communication. There’s already bandwidth set aside on the radio spectrum (at least in the United States) to accommodate this data traffic: the 5.9 GHz band. Europe has settled on the same basic band, though Japan is using 5.770 to 5.850 GHz and 715 to 725 MHz.

The thinking is that vehicles close to one another should share some essential information about what they’re doing: speed, destination, predicted path, any information regarding road or safety issues, and so on. Doing so will allow the cars to work together to find the optimal, most efficient traffic pattern and be hyper-aware of what everything around them is doing and/or planning to do at any moment.
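
Just to make that concrete, here’s a hypothetical, heavily simplified sketch of what one of those V2V status broadcasts might carry. (The field names and the JSON encoding are my own invention for illustration; the actual standards define their own message sets and a much more compact binary format.)

```python
import json
import time
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class V2VStatusMessage:
    """A toy 'here's what I'm doing' broadcast from one car to its neighbors."""
    vehicle_id: str        # anonymized identifier, rotated for privacy
    timestamp: float       # when this snapshot was taken
    latitude: float
    longitude: float
    speed_mps: float       # meters per second
    heading_deg: float     # 0 = north, 90 = east
    braking: bool          # is the brake currently applied?
    hazard: Optional[str]  # e.g. "debris in lane", or None if nothing to report


def encode(msg: V2VStatusMessage) -> bytes:
    """Serialize the message for broadcast."""
    return json.dumps(asdict(msg)).encode("utf-8")


msg = V2VStatusMessage(
    vehicle_id="anon-7F3A", timestamp=time.time(),
    latitude=35.994, longitude=-78.899,
    speed_mps=13.4, heading_deg=92.0,
    braking=True, hazard="slowing for stopped traffic ahead",
)
print(encode(msg))
```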

Infrastructure-based information could be communicated as well, informing cars of the status of traffic signals, traffic density, lane closures, and so on. Emergency vehicles would be able to alert traffic of their presence, allowing cars to clear a lane in an orderly manner for vehicles like ambulances, fire trucks, or pizza delivery vehicles in cases where more than three toppings are specified, for example.

Really, the more information available to the cars on the road, the better, and communication will allow autonomous vehicles to act as a self-modifying system to maintain optimal traffic flow and, ideally, eliminate many of the traffic issues that so annoy us today.

There are Connected Car Consortiums and other groups of major car manufacturers working together on this, which is good, since eventually a global standard should be established. It’s also possible that a standard communication protocol could be used for human-driven cars as well, even possibly including systems that could be retrofitted to older cars, too. Even the primitive and crude deathtraps that I personally prefer to drive could, theoretically, be outfitted with a unit that sends such information as speed, throttle position, steering angle, and GPS location, all of which would help surrounding autonomous vehicles prepare themselves for the presence of the loon whipping a 60-year-old Volkswagen around the streets.

The weird part

Really, autonomous vehicles will need to be, in some way, self-aware. This isn’t the sort of self-awareness that leads to your car one day refusing to drive until you answer the question displayed on its dash that reads TELL ME WHAT THIS THING CALLED “LOVE” IS that science fiction loves to wonder about, but it is still a sort of self-awareness nevertheless, and as such is pretty amazing.

Autonomous cars drive in essentially the same way we do: they look around, as carefully as they can, and they move, hopefully in the right direction and hopefully without hitting anyone or anything. Really, this should also be a realization about how incredible it is that we humans can drive as well as we do, even without doing a lot of complicated math in our heads.

The human ability to react at high speeds, gauge stopping distances, sense vehicle weight shifts, and, you know, just drive, all while singing along to a Styx album at the top of your lungs, is pretty damn impressive. It’s no wonder that it’s such a complex problem to solve for autonomous cars, and it’s no wonder we’re not quite there yet.

We’ve come incredibly far, but there’s still an awful long way to go.

(Most of this is an excerpt from my upcoming book about these crazy autonomous cars! Stay tuned!)
