How to think like a data scientist to become one

The author went from securities analyst to Head of Data Science at Amazon. He describes what he learned in his journey and gives 4 useful rules based on his experience.

By Karolis Urbonas, Amazon on March 23, 2017 in Amazon, Data Science Skills, Data Scientist, SQL, Statistics

We have all read the headlines – data scientist is the sexiest job, there are not enough of them and the salaries are very high. The role has been sold so well that the number of data science courses and college programs is growing like crazy. After my previous blog post I received messages from people asking how to become a data scientist – which courses are the best, what steps to take, what is the fastest way to land a data science job?

I tried to really think it through and reflected on my personal experience – how did I get here? How did I become a data scientist? Am I a data scientist? My experience has been very mixed – I started out as a securities analyst in an investment house, using mainly Excel. I then slowly shifted towards business intelligence in the banking industry and multiple consultancy projects, eventually doing actual “data science” – building predictive models, working with Big Data, crunching tons of numbers and writing code to do data analysis and machine learning – what in earlier days was called “data mining”.

When the data science hype started, I tried to understand how it was different from what I had been doing so far. Should I learn new skills and become a data scientist instead of someone working “in analytics”?

Like everybody obsessed with it, I started taking multiple courses, reading data books, doing data science specializations (and not finishing them …) and coding a lot – I wanted to become THE one in the middle cross-section of the (in)famous data science Venn diagram. The reality I learned is that these “Data Science” unicorns (the legendary people in the center “Data Science” section) rarely exist, and even if they do, they are typically generalists who have knowledge in all of these areas but are “master of none”.

Although I now consider myself a data scientist – I lead a fantastically talented data science team at Amazon, build machine learning models, work with “Big Data” – I still think there’s too much chaos around the craft and too little clarity, especially for people thinking of switching careers. Don’t get me wrong – there are a lot of very complex branches of data science – like AI, robotics, computer vision, voice recognition etc. – which require very deep technical and mathematical expertise, and potentially a PhD… or two. But if you are interested in getting into a data science role that was called a business / data analyst just a few years ago – here are the four rules that helped me get into data science and are still helping me survive in it.

Rule 1 – Get your priorities and motivations straight.

Be very realistic about what skills you have right now and where you want to arrive – there are so many different types of roles in data science that it’s important to understand and assess your current knowledge base. Let’s say you’re working in HR and want to change careers – learn about HR analytics! If you’re a lawyer – understand the data applications in the legal industry. The fact is that the hunger for insights is so big that all industries and business functions have started using data. If you already have a job, try to understand what can be optimized or solved by using data and learn how to do it yourself. It’s going to be a gradual and long shift, but you will still have a job and you will learn by doing it in the real world. If you are a recent graduate or a student – you have a perfect chance to figure out what you are passionate about – maybe movies, maybe music, or maybe cars? You wouldn’t imagine the number of data scientists these industries employ – and they are all crazy about the fields they’re working in.

Rule 2 – Learn the basics very well.

Although the specifics of each data science field are very different, the basics are the same. There are three areas where you should develop a strong foundation – basic data analysis, introductory statistics and coding.

Data analysis. You should understand and practice (a lot!) the basic data analysis techniques – what a table is, how to join two tables, what the main techniques are for analyzing data organized in such a way, how to build summary views on your dataset and draw initial conclusions from them, what exploratory data analysis is, which visualizations help you understand and learn from data. This is very basic but believe me – master this and you’ll have the fundamental skill that is absolutely mandatory for the job.
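
To make “join two tables” and “build a summary view” concrete, here is a minimal sketch in plain Python. The `customers` and `orders` tables, their columns and all the values are made up for illustration, and the helper names `inner_join` and `summarize` are hypothetical. In practice you would more likely do the same thing with SQL or a data analysis library.

```python
from collections import defaultdict

# Hypothetical "customers" table: one row per customer.
customers = [
    {"customer_id": 1, "city": "Vilnius"},
    {"customer_id": 2, "city": "London"},
    {"customer_id": 3, "city": "Vilnius"},
]

# Hypothetical "orders" table: possibly many rows per customer.
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 35.0},
    {"order_id": 12, "customer_id": 2, "amount": 50.0},
    {"order_id": 13, "customer_id": 3, "amount": 5.0},
]


def inner_join(left, right, key):
    """Join two lists of row dicts on a shared key column (inner join)."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    joined = []
    for left_row in left:
        # Rows without a match on the other side are dropped.
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


def summarize(rows, group_col, value_col):
    """Summary view: count, total and mean of value_col per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return {
        group: {
            "count": len(values),
            "total": sum(values),
            "mean": sum(values) / len(values),
        }
        for group, values in groups.items()
    }


joined = inner_join(orders, customers, "customer_id")
summary = summarize(joined, "city", "amount")
for city, stats in sorted(summary.items()):
    print(city, stats)
```

The join attaches a city to every order, and the summary then answers a first exploratory question: how many orders, and how much revenue, come from each city.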

Statistics. Also, get a very good grasp of introductory statistics – what the mean and the median are and when to use one over the other, what a standard deviation is and when it doesn’t make any sense to use it, why averages “lie” but are still the most used aggregated value everywhere, etc. When I say “introductory” I really mean “introductory”. If you are a mathematician and plan to become an econometrician who applies advanced statistical and econometric models to explain complex phenomena, then yes, learn advanced statistics. If you don’t have a PhD in mathematics, just take your time, be patient and get a really good grasp of basic statistics and probability.
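
A small sketch of why averages “lie”, using Python’s standard `statistics` module. The salary figures are invented for illustration. The point is only that a single outlier drags the mean and the standard deviation a long way while the median barely moves.

```python
import statistics

# Hypothetical monthly salaries in a small team; the last value is an outlier.
salaries = [2000, 2100, 2200, 2300, 2400, 20000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
stdev_salary = statistics.stdev(salaries)  # sample standard deviation

# The same team without the outlier, for comparison.
typical = salaries[:-1]
mean_typical = statistics.mean(typical)
median_typical = statistics.median(typical)

print(f"with outlier:    mean={mean_salary:.0f}, median={median_salary:.0f}, stdev={stdev_salary:.0f}")
print(f"without outlier: mean={mean_typical:.0f}, median={median_typical:.0f}")
```

With the outlier included, the mean is more than twice the median, so “the average salary” describes nobody on the team. On skewed data like this the median is usually the more honest summary, and the standard deviation mostly measures the outlier.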

Coding. And of course – learn how to code. This is the most over-used cliché advice but it’s actually sound. Start by learning how to query a database with SQL – believe it or not, most of the time data science teams spend goes into pulling and preparing data, and a lot of it is done with SQL. So get your basics in place – build your own small database, write some “select * from my_table” lines and get a good grasp of the SQL fundamentals. You should also learn one (start with just one) data analysis language – be it R or Python. Both are great and knowing them does make a difference since many (although not all) positions require them. First learn the basics of the language you chose (quick tip – start by learning the dplyr and ggplot2 packages for R, or the pandas and Seaborn libraries for Python) and learn how to do data analysis with it. You don’t have to become a programmer to succeed in the field; it’s all about knowing how to use the language to do data analysis – you won’t have to become a world-class hacker to land a data science job.
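
One low-effort way to “build your own small database” is SQLite, which ships with Python as the `sqlite3` module. The sketch below is only an illustration: the `my_table` columns and the rows in it are made up, and the database lives in memory, so nothing is written to disk.

```python
import sqlite3

# In-memory database: it disappears when the connection is closed.
conn = sqlite3.connect(":memory:")

# Hypothetical table of products and prices.
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, product TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO my_table (product, price) VALUES (?, ?)",
    [("book", 12.5), ("pen", 1.2), ("laptop", 899.0)],
)

# The classic first query: everything in the table.
all_rows = conn.execute("SELECT * FROM my_table").fetchall()

# Filtering and sorting: products cheaper than 20, cheapest first.
cheap = conn.execute(
    "SELECT product, price FROM my_table WHERE price < ? ORDER BY price",
    (20,),
).fetchall()

# Aggregation: number of rows and average price.
n_rows, avg_price = conn.execute(
    "SELECT COUNT(*), AVG(price) FROM my_table"
).fetchone()

conn.close()

print(all_rows)
print(cheap)
print(n_rows, round(avg_price, 2))
```

Once `SELECT`, `WHERE`, `ORDER BY` and the aggregate functions feel natural, joins and `GROUP BY` are a reasonable next step.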

Rule 3 – Data science is about solving problems – find and solve one.

One thing I have learned over the years is that one of the fundamental requirements for a data scientist is to always be asking questions and looking for problems. Now I don’t advise doing it 24/7, as you will definitely go insane, but be prepared to be the problem solver and to look for problems non-stop. You will be amazed how much data is available out there – maybe you want to analyze your spending patterns, identify sentiment patterns in your emails, or just build nice charts to track your city’s finances. The data scientist is responsible for questioning everything – is this campaign effective, are there any concerning trends, maybe some products under-perform and should be taken off the market, does the discount make sense or is it too big – these questions become hypotheses that are then validated or rejected by the data scientist. They are the raw material and key to success in the job: the more of them you solve, the better you’ll be at your job.
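
As one possible illustration of turning a question (“is this campaign effective?”) into a hypothesis that the data can reject, here is a sketch of a simple one-sided permutation test. The daily sales numbers are invented, the function name `permutation_p_value` is hypothetical, and the permutation test is just one reasonable choice among many, not a method prescribed in this post.

```python
import random

# Hypothetical daily sales before and during a campaign (made-up numbers).
before = [100, 98, 105, 110, 95, 102, 99, 101]
during = [108, 112, 107, 115, 109, 111, 104, 113]


def mean(values):
    return sum(values) / len(values)


observed_diff = mean(during) - mean(before)


def permutation_p_value(a, b, n_permutations=5000, seed=0):
    """One-sided permutation test for mean(b) > mean(a).

    Returns the share of random relabelings of the pooled data whose
    difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = mean(b) - mean(a)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[len(a):]) - mean(pooled[:len(a)])
        if diff >= observed:
            hits += 1
    return hits / n_permutations


p_value = permutation_p_value(before, during)
print(f"observed difference: {observed_diff}, p-value: {p_value}")
```

If the campaign made no difference, the “before” and “during” labels would be interchangeable, so a large share of shuffles would show a difference as big as the observed one. A small p-value says that rarely happens, which is evidence against the “no effect” hypothesis. It does not prove the campaign caused the lift.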

Rule 4 – Start doing instead of planning what you will do “when”.

This is applicable to any learning behavior but it’s especially true in data science. Be sure you start “doing” from the very first day you start learning. It’s very easy to put off the actual learning by just reading “about” data science and how it “should” be done, or by copy-pasting data analysis code from a book and running it on very simple datasets of a kind you will never get in the real world.

With everything you learn – be sure you start applying it to the field you’re passionate about. That’s where the magic happens – writing your first line of code and seeing it fail, being stuck and not knowing what to do next, looking for an answer, finding a lot of different solutions none of which work, struggling to build your own one and finally passing a milestone – the “aha!” moment. This is where the actual learning happens. Learning by doing is the only way to learn data science – you don’t learn how to ride a bike by reading about it, right? The same thing applies here – whatever you learn, be sure you apply it immediately and solve actual problems with real data.

“If you spend too much time thinking about a thing, you’ll never get it done.” – this quote from Bruce Lee, one of the most famous martial artists, captures the essence of this post. You have to apply what you learn and make sure you make your own mistakes.

Thanks for reading! Subscribe to my blog www.cyborgus.com and get the latest updates.

Follow my blog updates on Facebook – https://www.facebook.com/cyborguscom/

Look me up on LinkedIn – https://www.linkedin.com/in/karolisurbonas/

Bio: Karolis Urbonas is Head of Data Science at Amazon and an energetic data executive with a demonstrated history of building high-performing data science teams and delivering strategic analytic projects. He blogs at cyborgus.com.
