Solving Airport Security Through Machine Learning and Artificial Intelligence

In the busy weeks leading up to RSA this year, I took a rare break to drive my daughter to the airport. She was flying back to school to continue her second year at the University of Toronto (shout out to all of my Canadian peeps!). Btw, if you’ve not seen “Stronger Beer”, it comes highly recommended.

Anyway, my daughter asked me an intriguing question on the ride to LAX. She said, “Last time I got caught in a random search… do you think the TSA finds anything doing that?”

Great question, and my answer was “No”: it’s a horrible way to search people. With each search, the probability of finding something important stays the same, 1-in-heaven-knows-how-many-million. And that of course got me thinking, “Is there a better way, mathematically, to choose who gets searched?”

Having been heads down on the ZENEDGE Machine Learning Web Application Firewall (ZENEDGE AI – insert shameless plug here) for the better part of a year, my natural inclination was: of course there is a better way, and it must be based in Machine Learning theory.

I thought I would take a moment to break down the very complex problem of airport security and how Machine Learning could help, specifically with “random” search selection to make it…. well… not so random.

Let’s start off with a basic assumption: airport security is an “unsupervised machine-learning problem”. Why is that? The TSA screens millions of people every day and rarely, if ever, finds something noteworthy. Sure, they find an overly large container of body wash and the odd pocketknife, but do they routinely find something really “big”? Nope. It would be all over the headlines if they did, because the story would help boost confidence that the TSA is actually doing something and catching the bad guys.

So from a machine learning perspective, if each data point is an individual taking a flight, most if not all data points are non-malicious. That is, there is nothing going on but a normal trip from point A to B. That’s important because it means we don’t have labeled data: we don’t have good examples of malicious “trips” or activity. This is an unsupervised learning problem because we are looking for outliers without knowing in advance what those outliers look like. In security operations terms, this is usually called anomaly detection. We could use a Gaussian distribution (also called the normal distribution) to model what ordinary trips look like across our data dimensions, or features, and then flag the trips that the model considers very unlikely. A multivariate Gaussian captures correlations between features; more complex non-linear relationships would call for a richer model. The mathematical details aren’t important here, and there are plenty of good toolkits to assist. Let’s just assume the math exists and is well tested.
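To make the Gaussian idea a little more concrete, here is a minimal sketch using only the Python standard library. It is a simplification, not a production design: it treats every feature as independent (one mean and one variance per feature), the helper names `fit_gaussian` and `log_density` are made up for illustration, and the trip data is synthetic.

```python
import math
import random
from statistics import fmean, pvariance

def fit_gaussian(rows):
    """Estimate a mean and variance per feature column (features assumed independent)."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Floor the variance so a constant column cannot cause a division by zero.
    variances = [max(pvariance(col, mu), 1e-9) for col, mu in zip(columns, means)]
    return means, variances

def log_density(row, means, variances):
    """Sum of per-feature normal log-densities; lower means more unusual."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        for x, mu, var in zip(row, means, variances)
    )

# Synthetic "ordinary" trips: (days booked in advance, trips per year).
# These numbers are invented purely for the example.
rng = random.Random(0)
trips = [(rng.gauss(30, 10), rng.gauss(6, 2)) for _ in range(1000)]
means, variances = fit_gaussian(trips)

typical = log_density((28, 5), means, variances)   # close to the bulk of the data
unusual = log_density((0, 40), means, variances)   # booked same day, 40 trips a year
print(f"typical: {typical:.1f}, unusual: {unusual:.1f}")
```

The trip that sits far from the bulk of the data gets a much lower log-density, which is all "outlier" means here. A real system would likely use a full covariance matrix or a richer model from an established toolkit.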

So now that we’ve identified the problem and a potential algorithm, let’s describe what some potential features of a “trip” might be. The basic features are attributes of the trip itself. The much more interesting features come from the individual taking the trip. Here we need to “de-normalize” the data so that everything about the trip and its associated individual is represented as one “row” of data, which becomes, as we will see later, one “vector” of data.

  • From – To airports / countries
  • Flight time
  • Check-in time
  • Days booked in advance
  • Method of payment
  • One-way or round-trip ticket
  • Destination is “Home”?
  • Individual details (which the airline has, so by default the TSA has them)
    • Passport country
    • Place of Birth
    • Citizenship
    • Average number of trips per month and per year
    • Trip rate deviation
    • Etc., and many more
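As a rough illustration of what “de-normalizing” a trip into one row might look like, here is a small Python sketch. The class names, field names, and the list of payment methods are all hypothetical; the point is only that trip attributes and traveler attributes end up side by side in a single numeric vector.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Traveler:
    home_airport: str
    avg_trips_per_year: float

@dataclass
class Trip:
    origin: str
    destination: str
    days_booked_in_advance: int
    one_way: bool
    payment_method: str
    traveler: Traveler

# Hypothetical category list, used to one-hot encode the payment method.
PAYMENT_METHODS = ["card", "cash", "points"]

def to_vector(trip: Trip) -> List[float]:
    """Flatten a trip and its traveler into a single numeric row."""
    payment_one_hot = [1.0 if trip.payment_method == m else 0.0 for m in PAYMENT_METHODS]
    return [
        float(trip.days_booked_in_advance),
        1.0 if trip.one_way else 0.0,
        # Destination is "home"?
        1.0 if trip.destination == trip.traveler.home_airport else 0.0,
        trip.traveler.avg_trips_per_year,
        *payment_one_hot,
    ]

traveler = Traveler(home_airport="YYZ", avg_trips_per_year=4.0)
trip = Trip("LAX", "YYZ", 21, True, "card", traveler)
row = to_vector(trip)
print(row)
```

Categorical fields such as countries or airports would need the same kind of encoding as the payment method, which is one reason the feature count grows so quickly.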

You can see that the individual details are pretty ordinary, and honestly fairly boring. You can still get a tremendous amount of insight by modeling these “ordinary” features.

The data now needs to be represented as an n by m (n x m) matrix or tensor, where n is the number of features we’ve identified and m is the total number of trips to analyze. Needless to say, n is going to be in the hundreds or thousands and m will be in the hundreds of millions. This is a big data problem if I’ve ever seen one. We can dive deeper into possible solutions and how machine learning can be achieved in practical terms if there is interest. Just ping me as a follow-up.
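A quick back-of-the-envelope sketch shows why this is a big data problem. The figures below (500 features, 200 million trips, 4 bytes per value) are assumptions picked from within the ranges above, not real numbers, and `dense_size_gb` is just an illustrative helper.

```python
def dense_size_gb(n_features, n_trips, bytes_per_value=4):
    """Size of a dense n x m matrix in gigabytes (1 GB = 10**9 bytes)."""
    return n_features * n_trips * bytes_per_value / 1e9

# Two toy trips with three features each, stored one trip per row...
rows = [[30.0, 1.0, 6.0], [2.0, 0.0, 1.0]]
# ...then transposed into the n x m layout: one row per feature, one column per trip.
matrix = [list(col) for col in zip(*rows)]

size = dense_size_gb(n_features=500, n_trips=200_000_000)
print(matrix)              # 3 features by 2 trips
print(f"{size:.0f} GB")    # dense storage at 4 bytes per value
```

Under those assumptions a dense copy of the matrix is around 400 GB, before any model is trained on it, so it would have to be partitioned, streamed, or stored sparsely.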

To finalize the solution, all we need to do is generate a “probability” classifier that flags a particular trip as out of the ordinary. If the answer is yes, a real-time indicator can be sent to the TSA that extra care is to be taken with the individual. It doesn’t mean the trip is abnormal or the person is malicious. It just means there is a higher risk, and a search is warranted. Sure beats a random search!!
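One simple way to turn a model score into a yes/no flag is a threshold chosen so that only a fixed fraction of historical trips falls below it. The sketch below assumes each trip already has a score where lower means more unusual; the function names, the 1% flag rate, and the synthetic scores are all illustrative assumptions.

```python
import random

def pick_threshold(scores, flag_fraction):
    """Return the score at or below which roughly `flag_fraction` of trips fall."""
    ordered = sorted(scores)
    index = max(0, int(len(ordered) * flag_fraction) - 1)
    return ordered[index]

def should_search(score, threshold):
    """Flag a trip for extra screening when its score is at or below the threshold."""
    return score <= threshold

# Synthetic scores standing in for model output on historical trips.
rng = random.Random(1)
historical_scores = [rng.gauss(-5.0, 1.0) for _ in range(10_000)]

threshold = pick_threshold(historical_scores, flag_fraction=0.01)
flagged = sum(should_search(s, threshold) for s in historical_scores)
print(f"threshold={threshold:.2f}, flagged {flagged} of {len(historical_scores)} trips")
```

The flag fraction is a policy knob: it sets how many searches screeners have to perform, and it trades false positives against false negatives rather than eliminating either.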

Can we do anything more interesting than the above approach? As you probably suspect, the answer is YES. I’ll leave it for a follow-up post to dive into the details, but imagine the following:

What if we could collect fuzzy information about an individual, data that is all available in the public domain? People openly post on Facebook, Twitter, blogs, etc. Using another Machine Learning technique, Sentiment Analysis, which is an application of Natural Language Processing, we could gather insight into an individual’s mood or level of disgruntlement. How powerful could that derived dataset be?

At this point, I’ve probably raised a lot of privacy eyebrows. You are probably right. But let’s leave that for the next post…. 

If you would like to discuss SecOps, Machine Learning, Brazilian Jiu-Jitsu (or all of the above), drop me a note, or come visit us at RSA this week. ZENEDGE will be at booth N3023!

Laurent Hasson

Co-Founder, CTO at CapsicoHealth Inc

8y

Hey Leon :) Nice to hear from you!

Laurent Hasson

Co-Founder, CTO at CapsicoHealth Inc

8y

Having done similar work in risk analysis for hospitals, the rate of false-positives and false-negatives is never zero. So there will always be holes, and they are not small. Additionally, the notion of a random search does serve a psychological purpose that anybody could be searched without any rule (that's the whole point of random). That has an impact in and of itself.

Chris Olive

#CyberSecurity Strategist | Advisor | Evangelist | Consultant | Hands-On Technologist | Human Router

8y

SMH - in a rush at RSA and can't give this the time I'd like to comment but once again, James Harris, CISSP is dead on. The TSA is basically a façade of security and that's about it. The Israelis actually employ the methods James has mentioned here to incredible success. They root out incredibly dangerous situations on a daily basis with incredible accuracy while your normal everyday traveler remains marvelously and almost entirely uninconvenienced. If I can find a recent article on it (it was in LinkedIn recently) I'll repost it where the Israelis describe their method. Makes our TSA look like kindergarten bullies.

Arthur Tschopp

Senior Digital Strategist, e-Commerce | eMail & Owned Channel Optimizer | Digital Performance Marketing Leader

8y

Great read Leon. Quite an elegant and intelligent solution. I'm afraid the TSA would reject the idea, based solely on its efficiency. All the best.
