What does a data scientist do? We talked to one to learn about this popular and lucrative field
Data scientists process and interpret what are usually massive amounts of information to help deliver insights across a wide variety of fields and disciplines, including marketing, social media, finance, sales, and health care.
Data science is an expanding, lucrative field offering plenty of potential. Glassdoor ranked data scientists as having the best job in America for 2019 based on earning potential, job satisfaction, and number of openings. The average data scientist salary clocks in at about $91K in the United States.
A career in data science doesn't just happen; the field attracts certain candidates with specific skills and backgrounds oriented toward analysis.
I spoke with one such data scientist, Sri Megha Vujjini, who works at Saggezza, a global managed services provider and technology consulting firm. She spent the first year of her career at Deloitte, then went back to school for her master's in data sciences. Originally interested in telecommunications engineering, she moved toward data science after building algorithms for robotics.
Scott Matteson: You've said robotics sparked your interest in a data science career. Can you talk more about your work with algorithms for robotics and how that inspired you to get into data science?
Sri Megha Vujjini: One of the first things I did when I started working with robots was automating the direction of a robot. You could say I was building a self-driving car, but a tinier and less risky version. The concept behind it was still the same: it must move if it's safe to and it should stop if it's not, pretty much a black-or-white situation. It gets complicated when you add more functionality to it. For example, which direction should it go? Can it go right instead of stopping? Under what circumstances? All these scenarios push you to think outside the box because of all the possibilities and all the odds that might affect the output. As we expand the scale on that and apply it to a business case, we have a data science problem. For me, it was pretty much like solving a puzzle: asking a lot of "Why is this happening, and how is this working?" then replicating that in lines of code and optimizing that code. That's what led me to this field.
Scott Matteson: Can you provide some examples of how you've focused on data mining, statistical modeling, pattern recognition and visualization methods throughout your career (or in your work today)?
Sri Megha Vujjini: One simple example would be creating budgets for a company, irrespective of the industry. A budget is usually planned around the activities for the coming year, but there is an opportunity to use history statistically. There was an opportunity for me to solve one piece of a puzzle in this regard. I work with the retail industry, and I was able to create a time series model around the sales, promotions and external economic factors which would essentially predict the sales for the next few years. With this as a baseline, a multitude of decisions and operations followed. It took recognizing the trends (more sales in March and not just in November because of the holidays), visualizing them to explain them to the business better, and then automating the entire solution to be used as needed. In short, this career is all about understanding the business, understanding its problems and pain points, and providing a solution using data as your backbone.
Scott Matteson: What's unique about data science? What sort of personality or character works best with it? What are the challenges?
Sri Megha Vujjini: Ironically, one unique thing about this field is that it doesn't have one particular definition. It's a broad field with varied definitions all across the industry and academia. This is because it's a blend of mathematics, statistics, computer science, analytics, artificial intelligence and business. Data science is the elevated version of the combination of all these fields.
Not to discourage anyone, but there are some traits and characteristics that would make working in this field easier: solving problems, be it math or probability or even puzzles, always thinking about the bigger picture, and thinking outside the box. Being organized sometimes helps, too. Data science sometimes presents chaotic problems, and the first step to solve them is usually breaking them down and organizing them in a waterfall-like structure. The only challenge, and I hope everyone in this field would agree with me on this one, is data. The data is never perfect; it is either incomplete or not what you need. It might be small, which wouldn't give you insights, or it might be too wide for you to narrow down the solution. It's always the data, but once we understand how to use it and how it works, we can use it in the best way to derive all the insights we want.
Scott Matteson: What are some of the problems solved by data science?
Sri Megha Vujjini: Not world peace, not yet at least. But within the industry, we now have improved customer experiences and recommendation systems, faster deliveries, and smoother, improved business operations at companies because of some of the solutions provided by data science. If we look at Amazon's growth as an online retailer, we can pinpoint some of the improvements and tie them to the points I mentioned above. But outside the business, on a day-to-day basis, we have constantly improving Google/Apple Maps and cutting-edge research in medicine, physics, space, or even self-driving cars. All of these problems and subsets of these problems were solved by data science.
Scott Matteson: What are some technological products or tools used for this field?
Sri Megha Vujjini: A tiny proportion of jobs don't require programming skills, and those are reserved for veterans in the industry. Otherwise, it's always good to know Python, R, and SQL because they make life easier.
From a mathematical/statistical perspective, we can use SAS, MATLAB, Python, R, and all the rich libraries they offer. And since so much data is moving to the cloud, it would be helpful to know and understand cloud technologies. We have Azure, AWS, Google Cloud and Snowflake, all being used in varied capacities across the industry. In some cases, visualizations are important too, and they can be done using Python and R. We can always go above and beyond and use tools like Power BI or Tableau.
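For readers curious what the Python side of this work can look like, here is a minimal sketch of the seasonal idea Vujjini describes in her retail example: using sales history to build a baseline that captures recurring peaks. This is not her model. It ignores promotions and economic factors, the function name `seasonal_baseline` is hypothetical, and the sales figures are synthetic numbers invented purely for illustration.

```python
from statistics import mean


def seasonal_baseline(monthly_sales, season_length=12):
    """Forecast one season ahead by averaging each month's past values.

    Assumes the history starts in January and contains no gaps.
    """
    if len(monthly_sales) < season_length:
        raise ValueError("need at least one full season of history")
    forecast = []
    for month in range(season_length):
        # Every season_length-th value belongs to the same calendar month.
        same_month = monthly_sales[month::season_length]
        forecast.append(mean(same_month))
    return forecast


# Synthetic two-year history (made-up units) with bumps in March and November.
history = [
    100, 100, 150, 100, 100, 100, 100, 100, 100, 100, 180, 120,
    110, 110, 170, 110, 110, 110, 110, 110, 110, 110, 200, 130,
]

forecast = seasonal_baseline(history)

# Zero-based indices of the two strongest months in the baseline.
peak_months = sorted(range(12), key=lambda m: forecast[m], reverse=True)[:2]
print(forecast)
print(peak_months)
```

A per-month average is about the simplest possible baseline. A production time series model would also account for trend and for outside drivers such as the promotions and economic factors mentioned in the interview.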