The field of data science can include any number of different data-oriented tech jobs, and the exact definition of a data scientist can be blurry. In short, a data scientist is someone who manages, manipulates, and analyzes large amounts of data. They often use technical skills like Python and SQL to help them do this.
We wanted to find out more about what data scientists do day-to-day, what skills they use, and what their career paths might look like. What better way to learn all this than to ask a data scientist herself!
We chatted with Robin Linzmayer, a data scientist (technically, Data Insights Engineer), who works at Flatiron Health in New York City. Read what she had to say below!
Q: What is your job title?
I’m a Data Insights Engineer at Flatiron Health, which is an oncology healthtech company based out of New York City.
Q: Is a Data Insights Engineer like a Data Scientist?
Yea, so we’re essentially a Product Data Science team – there’s a lot of diversity within our function but the team goes by “Data Insights”. We’re basically data scientists, but that in itself is a very vague term.
Q: Did you have a different job before this and, if so, what was it?
Not really. I mean, I graduated and started working here about two years ago now. In college, I actually started off as a Biology major, so I was working part-time in a research lab and a neuroscience lab for a couple of years. And then when I transitioned into Computer Science I got an internship on the Data Science team at Merck KGaA, a German pharmaceutical company, so that was kind of my pivot into the data science area of computing. And then Flatiron was my first job out of college.
Q: How did you get to be in the position you are currently in?
In college, deciding to study computer science was, in and of itself, a funky decision-making process. I think when you go into computer science, it’s pretty much branded as “now you can go into software engineering,” the building of software products. That was something I was interested in, but not only was I having a hard time getting internships in it, I also wasn’t feeling really actively excited about it. And I kind of figured, if I’m not actively excited about it while I’m looking for internships, that probably won’t change if I continue down that career path.
But one thing I was really interested in was the data work and the data-oriented classes I was taking in school. So I applied to, and ended up getting, a data science internship. I think one of the themes of my education was that I’m very indecisive and really enjoy doing a lot of different things. I was never like, “I want to be this type of biologist forever” and I loved the idea that learning data-oriented tooling is just like figuring out ways to ask questions and then having the technical skills to get the answers you’re looking for and be able to understand what they mean. So I was like, “oh, great, Data Science!”
I did the internship that I got at Merck and I had a great time. I loved all of the skills I was learning and it positioned me really well to look at data science jobs in industry. I think there are pretty limited numbers of data scientist jobs in industry coming out of undergraduate, but I got lucky that the company I’m working for hires new graduates.
Q: What are your main responsibilities as a data scientist at Flatiron Health?
Our data insights engineering team is kind of spread across the entire company. I think the company is about a thousand people. We have an engineering org of about four or five hundred and then, within that, the insights engineering team is only about thirty-five of us, and we’re spread across all of the different teams in groups of one to five.
So I sit specifically on one of our software product teams called OncoEMR, an electronic health records system, and I do a lot of product-oriented analytics. We categorize it as data-driven decision making and understanding how users, who, in this case, are physicians and administrators and people who work at oncology practices, are using our clinical data. I work a lot with our clinical professionals team as well. So I’m responsible for supporting feature rollouts, new features, monitoring clinical data quality and just ensuring the safety of our patients and users.
Q: What do you like about being a data scientist?
What I like best is that I work with people that have extremely different backgrounds, both within the data science team and outside of the data science function. Since I’m in a healthcare space, there are a lot of MDs and clinical professionals. I also talk to a pharmacist almost every day. At the same time, I interact with a lot of software engineers and non-technical people in the product organization. I really like that my job sits at the intersection of all these things. Working with data and not having context doesn’t mean a lot, so this puts me in a position to ask other people for the clinical context that I need to understand and ask the right questions, and also to work with the product people, who are using this information to make decisions, and the engineers, who are actually building the software.
Q: Do you think that your studies in Biology have helped you fit into that intersection?
Yes, definitely. For example, I’m working on a project right now where we’re looking at a lot of genetics data and, because I studied biology, I have a lot of the vocabulary that I wouldn’t otherwise have been exposed to, just from my coursework in college. That being said, while it has given me a lot of context, I still know nowhere near as much as the MDs on my team know. I feel like I’ve got a strong foundation to work with but still have a lot to learn. I feel like the hardest thing, in almost any field, is trying to figure out what you don’t know. If you really know nothing, that’s the hardest first step.
Q: What is most difficult about being a data scientist?
Probably the fact that data is really messy. In college and in an academic setting, you’re always presented with the cleanest data sets. And using that starting point, you learn modeling skills and learn how to do different types of analyses on these data sets. But in the real world, these data sets are created from user interactions that are haphazard. Some people are using the software for its intended purpose, but people also figure out workarounds or their own specific workflows and ways to do things. This sometimes makes interpreting the data to answer what you would expect to be a simple question actually pretty complicated.
Working with messy data is definitely the hardest part of my job and I feel like that’s gotta be true for all data scientists.
Q: When you encounter data that’s messy, how do you go about finding a solution?
Something my coworkers taught me pretty early on was – every assumption you make, you should challenge. If you’re assuming that every user has one set of user permissions you should check that and see if that relationship really is one-to-one. Any assumption you make, you should challenge and look at the data.
Another example – maybe you’d expect every drug order that’s placed to have a dose associated with it, because that should be true for the workflow. But maybe there are certain cases where the order was cancelled, so you’re seeing drug orders that don’t have dosing associated with them. That doesn’t make actual sense in the context of a drug being ordered, but you need to account for that before you can work with the data.
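Editor’s illustration: the two checks Robin describes could be sketched in Python roughly as follows. This is a minimal sketch, not Flatiron’s actual code or schema; the sample records and the field names (`user_id`, `permission_set`, `order_id`, `status`, `dose_mg`) are hypothetical and exist only to show the idea of testing an assumption against the data before relying on it.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are illustrative assumptions,
# not a real schema.
user_permissions = [
    {"user_id": "u1", "permission_set": "clinician"},
    {"user_id": "u2", "permission_set": "admin"},
    {"user_id": "u2", "permission_set": "clinician"},
]

drug_orders = [
    {"order_id": 1, "status": "placed", "dose_mg": 50.0},
    {"order_id": 2, "status": "cancelled", "dose_mg": None},
    {"order_id": 3, "status": "placed", "dose_mg": None},
]


def users_with_multiple_permission_sets(rows):
    """Return user_ids that map to more than one distinct permission set."""
    sets_by_user = defaultdict(set)
    for row in rows:
        sets_by_user[row["user_id"]].add(row["permission_set"])
    return sorted(user for user, sets in sets_by_user.items() if len(sets) > 1)


def orders_missing_dose(rows):
    """Split dose-less orders into cancelled ones and unexplained ones."""
    missing = [row for row in rows if row["dose_mg"] is None]
    cancelled = [r["order_id"] for r in missing if r["status"] == "cancelled"]
    unexplained = [r["order_id"] for r in missing if r["status"] != "cancelled"]
    return cancelled, unexplained


violations = users_with_multiple_permission_sets(user_permissions)
cancelled, unexplained = orders_missing_dose(drug_orders)

print("Users breaking the one-to-one assumption:", violations)
print("Dose-less orders explained by cancellation:", cancelled)
print("Dose-less orders still unexplained:", unexplained)
```

In this made-up sample, one user has two permission sets, one dose-less order is explained by cancellation, and one is not, so the last group is what would need a follow-up question to someone with clinical or product context.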
Q: What skills are important for data scientists to have?
In terms of technical skills, I write a lot of Python and SQL. I feel like you can always learn those skills, but the hardest things are asking questions and communicating with people who have a very different background from you. Since I communicate with technical people, who are far more technical than I am in some capacity, and with product people, who are usually not as technical, I have to toe that line and think about “how do I communicate a technical finding to a product person?” and “how do I communicate a product question to people in technical terms?” That’s probably the hardest thing to do.
Q: What advice would you give to someone who wanted to start a career in data science?
If they’re in school or taking courses, then I’d recommend taking courses that are data-oriented and just getting those fundamental skills. SQL, Python, and statistics are some of the data science fundamentals. If they’re already in industry, then doing more analytics in whatever their role is would be really helpful. Also, interacting with a data organization, if it exists at your company, to see what those roles look like. In general, if you’re interested in data science and data-oriented roles, then just working with data in whatever capacity you can is probably your best entry point.
Data science can look like so many different things and so many different roles use data now in day-to-day jobs. Whether you’re a product manager or an analyst or a data engineer, there are usually opportunities to incorporate data into whatever you’re doing.
Q: Anything else you want people to know about data science?
Not specifically that I can think of. Well, actually, as a woman going into a technical role, and in general whenever you’re going into a field where not everyone in the room looks like you, it’s pretty intimidating. I think that, in college, I really learned that it’s OK to not know things. I learned how to google things and how to ask good questions when I have no idea what’s happening. I feel like working in data science shouldn’t be prohibitive for anyone who wants to do it. Also, if you reach out to people in industry, people want to help. People want more people with data skills in industry.
Bonus Question: What do you like to do outside of work?
Great question haha. I spend a lot of time biking around, not so much as a hobby but as a form of transportation in New York. I would say New York City is very bike-able. You should just wear a helmet. When I’m not biking around and seeing friends, I spend a lot of time rock climbing and, if the opportunity presents itself, backpacking.