Track: Big Data Meets the Cloud
Day of week: Tuesday
Big Data technology and best practices have seen widespread adoption over the past few years. Understandably, Big Data technology vendors primarily focus on the needs of enterprises, which means that most of their products are developed for deployment and use within private data centers. In these environments, network topology, compute and storage placement, and hardware specifications are all under the control of data center operations. During a similar period, public cloud providers such as AWS, Azure, and Google Cloud Platform have seen a migration of mostly smaller companies (and some notable larger ones) to their services. How do companies that want to leverage the cloud adapt their Big Data technologies to work efficiently? Come to this track to learn from companies that have implemented their Big Data use-cases in the Cloud.
- Gain practical experience from a track focused on moving and running Big Data infrastructures in a cloud environment.
- Hear practical lessons from companies leveraging cloud providers such as AWS and GCP.
- Gain a better understanding of the state of the art for cloud-based Big Data infrastructure.
Jeff: Yeah, we’re a heavily data-focused company. A primary output of the data science team is a recommendation engine that assists our personal stylists in finding the best clothing for our clients, but data science at Stitch Fix runs far deeper than that. It’s trying to predict not just what to sell to whom, but what to buy, where to place it, and how to retain customers. All of that is heavily influenced by the data science team over here.
Jeff: The track is called Big Data Meets the Cloud. Some of that is based on my experience running Big Data infrastructures in the cloud and scaling them. What I found is that there are best practices and technologies that are pretty ubiquitous in datacenter-based environments that just don’t work the same way in a public-cloud-based environment. There is tons of exciting tech out there right now: a lot of migration from traditional Hadoop to Spark, lots of new file formats, in-memory caching techniques, containerization through Docker.
All of that is super cool, and it’s all open source. Most of it is being developed by companies that run big datacenter-based deployments, and there are assumptions baked into those technologies that oftentimes just don’t hold in the cloud. We are trying to explore best practices for actually scaling out that infrastructure in cloud environments: what works and what doesn’t, what tricks and techniques can be used in a cloud environment, and how we can exploit properties of the cloud such as elasticity to get a lot of mileage out of it.
Jeff: Primarily, a better understanding of the state of the art for cloud-based data infrastructure deployment and of upcoming developments that are really exciting. Better knowledge of the best practices that exist, especially when they differ from best practices in a datacenter-based deployment. Then, we want to feature people from different major cloud providers: AWS, Google, Azure. One thing that would be really exciting is people walking away with a comparison of the tradeoffs and efficiencies between running Big Data infrastructure on those different platforms. The final thing for me is just to give people an awareness of areas where more extensions or capabilities need to be developed by the community.
The great thing about Big Data infrastructure is so much of it is open source but that means it is up to the engineering community to make it into what we need to make it, and the first step to that is awareness of where focus needs to be put.
Hundreds of millions of people use Quora to find accurate, informative, and trustworthy answers to their questions. All our infrastructure is built on top of AWS.
In this talk, we will be talking about Quanta, Quora's counting system powering our high-volume near-realtime analytics that serves many applications like ads, content views, and many dashboards.
Quanta counters support/are:
- High write throughput...
by Matti Pehrs
Software Engineer @Spotify
by Mārtiņš Kalvāns
Big Data Engineer @Spotify
Spotify is currently one of the most popular music streaming services in the world, with over 100 million monthly active users. Over the last few years we have had phenomenal growth that has now pushed our backend infrastructure out of our data centers and into the cloud. Earlier this year we announced that we are transitioning all of our backend to Google Cloud Platform (GCP).
In this talk we are going to give a brief overview of what our Data Infrastructure tribe...
by Doug Daniels
Director of Engineering @Datadog
At Datadog, we collect almost a trillion metric data points per day from hosts, containers, services, and customers all over the world. We have built a highly elastic, cloud-based platform to power analytics, machine learning, and statistical analysis on this data at high scale.
In this talk, we will discuss the cloud-based platform we have built and how it differs from a traditional datacenter-based analytics stack. We will walk through the decisions we have made at each layer,...
by Stefan Krawczyk
Algo Dev Platform Lead @StitchFix
Stitch Fix is an online clothing retailer that not only focuses on delivering personalized clothing recommendations for our customers, but also applies the output of data science to automate numerous other business functions through the delivery of forecasts, predictions, and analyses via a robust API layer. We rely heavily on the ability of applied mathematics & statistics and our human decision makers to work synergistically; doing this well requires us to merge art & science...
by Dan Weeks
Leads Big Data Compute @Netflix
by Tom Gianos
Senior Software Engineer, Big Data Platform @Netflix
Netflix runs one of the largest big data analytics infrastructures in the public cloud. Our platform leverages the scalability, reliability, and flexibility of the cloud to move quickly and innovate.
In this talk, we will discuss the overall big data platform architecture and dive into the two key design choices that underpin our platform: Storage and Orchestration. We will discuss how we leverage S3 as our data warehouse storage layer. We rely on Parquet as our primary storage format...
Monday Nov 7
Architectures You've Always Wondered About
You know the names. Now learn lessons from their architectures
Distributed Systems War Stories
“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” - Lamport.
State of the art in Container deployment, management, scheduling
Art of Relevancy and Recommendations
Lessons on the adoption of practical, real-world machine learning practices. AI & Deep learning explored.
Next Generation Web Standards, Frameworks, and Techniques
Keeping life in balance is a challenge. Learn lifehacks, tips, & techniques for success.
Tuesday Nov 8
Next Generation Microservices
What will microservices look like in 3 years? What if we could start over?
Java: Are You Ready for This?
Real world lessons & prepping for JDK9. Reactive code in Java today, Performance/Optimization, Where Unsafe is heading, & JVM compile interface.
Big Data Meets the Cloud
Overviews and lessons learned from companies that have implemented their Big Data use-cases in the Cloud
Lessons/stories on optimizing the deployment pipeline
Software Engineering Softskills
Great engineers do more than code. Learn their secrets and level up.
Modern CS in the Real World
Applied, practical, & real-world dive into industry adoption of modern CS ideas
Wednesday Nov 9
Architecting for Failure
Your system will fail. Take control before it takes you with it.
Stream Processing, Near-Real Time Processing
Bare Metal Performance
Native languages, kernel bypass, tooling - make the most of your hardware
Culture as a Differentiator
The why and how for building successful engineering cultures
//TODO: Security <-- fix this
Building security from the start. Stories, lessons, and innovations advancing the field of software security.
Bots, virtual reality, voice, and new thought processes around design. The track explores the current art of the possible in UX and lessons from early adoption.