DataEng
Data engineering covers what some companies call Data Infrastructure or Data Architecture. The data engineer collects the data, stores it, processes it in batch or in real time, and serves it through an API so that data scientists can query it easily.
Presentations about DataEng
Patterns of Streaming Applications
Human-Centric Machine Learning Infrastructure @Netflix
Transaction Processing in FoundationDB
Training Deep Learning Models at Scale on Kubernetes
Massively scaling MySQL using Vitess
The Whys and Hows of Database Streaming
Interviews
Human-Centric Machine Learning Infrastructure @Netflix
What is the focus of the work you do at Netflix?
I'm with the machine learning infrastructure team at Netflix. We work with about a hundred data scientists who solve all kinds of business problems. It's not only video recommendations; we also help answer many other questions to make Netflix an even better experience. We help all these data scientists be more productive and make it easier for them to start prototyping their models to produce business value.
Netflix has a 'paved path' approach when it comes to software and microservices. Is it the same thing when it comes to machine learning?
It is very much the same thing. We want to provide a 'paved path' so there's always a very clear, recommended way to do things. This is especially important for data science, since there are many people from academia who are very adept at creating really strong theoretical models, but when it comes to actually taking something to production and making it operationally solid, they typically require a lot of help. Integrating with a platform like Netflix can be non-trivial. At the same time, we want to balance that with the idea of freedom and responsibility, so people still have the freedom to choose the exact modeling approach they want to take. The platform has to be flexible.