In late October, Strata+Hadoop World returns to the Big Apple. This year, we have a wide range of topics, from real-world case studies to hard-core data science to the ethics and challenges of a connected society.
Each year, in the weeks leading up to Strata, we assemble a set of our hottest topics and most anticipated speakers, and ask them to join us for a free preview event. It'll give you a taste of what to expect in New York, and introduce you to new ideas in big data, ubiquitous computing, and ground-breaking interfaces that change how we live, love, work, and play.
For attendees, it's a chance to see what we're covering and decide on the can't-miss tracks you should attend; for everyone else, it's a chance to see what Strata is about and why you should attend.
About Alistair Croll
Alistair has been an entrepreneur, author, and public speaker for nearly 20 years. In that time, he's worked on a variety of topics, from web performance to big data to cloud computing to startups. In 2001, he co-founded web performance startup Coradiant (acquired by BMC in 2011), and since that time has also launched Rednod, CloudOps, Bitcurrent, Year One Labs, the Bitnorth conference, the International Startup Festival and several other early-stage companies.
Alistair is the chair of O'Reilly's Strata conference, Techweb's Cloud Connect, and the International Startup Festival. "Lean Analytics" is his fourth book on analytics, technology, and entrepreneurship. He lives in Montreal, Canada and tries to mitigate chronic ADD by writing about far too many things at "Solve For Interesting".
Machine learning constructs such as recommendation engines take a simplistic approach to data modeling: a single kind of user interaction with a single kind of item is used to suggest the same kind of interaction with the same kind of item. In this talk, Ted will cover why this approach is flawed and present an easily implemented recommendation architecture and implementation style that addresses these flaws.
About Ted Dunning
Serial startup and artist and open-source innovator, particularly interested in large data systems and statistical modeling.
What Makes Us Human? A Tale of Advertising Fraud
We've all taken road trips, right? So imagine driving for 24 hours straight and passing a billboard every three seconds. Now imagine someone hijacks your car, blindfolds you and ties you down in the passenger seat, and you proceed on your road trip, oblivious to that onslaught of billboard messages zooming past. That, my friends, is a peek into a disturbing phenomenon of forced website visitation.
We decided to look under the hood, and a couple of things started to look really strange. In this talk, we will highlight:
The most predictive URLs had all appeared in our data only recently
They were equally predictive whether we were advertising for a hotel chain, pizza, or running shoes
The co-visitation patterns between the sites seemed excessive and very unnatural
About Claudia Perlich
Claudia Perlich serves as Chief Scientist at m6d and in this role designs, develops, analyzes and optimizes the machine learning that drives digital advertising to prospective customers of brands. An active industry speaker and frequent contributor to industry publications, Claudia enjoys acting as a guide in the world of data. She was recently named winner of the Advertising Research Foundation's (ARF) Grand Innovation Award and was selected as a member of the Crain's NY annual 40 Under 40 list. She has published numerous scientific articles, holds multiple patents in machine learning, and has won many data mining competitions. Prior to joining m6d in February 2010, Claudia worked in Data Analytics Research at IBM's Watson Research Center, concentrating on data analytics and machine learning for complex real-world domains and applications. Claudia has a PhD in Information Systems from NYU and an MA in Computer Science from Colorado University. Claudia takes an active interest in the making of the next generation of data scientists and is teaching "Data Mining for Business Intelligence" in the NYU Stern MBA program.
Real-time Recommendations for Retail: Architecture, Algorithms, and Design
Juliet Hougland and Jonathan Natkins
Users are constantly searching for new content, and to stay competitive, organizations must act immediately based on up-to-date data. Outdated recommendations decrease the likelihood of presenting the right offer and make it harder to maintain customer loyalty. In order to provide the most relevant recommendations and increase engagement, organizations must track customer interactions and re-score recommendations on the fly.
In this webcast talk, we'll highlight how developers can use open source components like HBase and Kiji to develop low-latency recommendation models that can be easily deployed by e-commerce companies. We will give practical advice on how to choose models and design data stores that make use of the architecture and quickly serve new recommendations.
About Juliet Hougland
Juliet Hougland is a Member of Technical Staff on the Product Engineering team at WibiData. She develops tools that enable data scientists to seamlessly develop and deploy real-time predictive models. She holds a B.A. in Mathematics-Physics from Reed College and an M.S. in Applied Mathematics from the University of Colorado, Boulder.
About Jonathan Natkins
Jonathan Natkins is a Member of Technical Staff on the Field Engineering team at WibiData. He helps customers use their data to create better application experiences. Prior to WibiData, Jonathan was an engineer at Cloudera, working primarily on Cloudera Manager and contributing to various Hadoop-related projects. Before joining Cloudera, Jonathan worked both as an engineer and a field engineer at Vertica, first building core database features and then working closely with customers to help them move their systems into production. Jonathan holds an Sc.B. in Math-Computer Science from Brown University.
Leveling Up With Hadoop: How Blizzard's Business Intelligence Supports The Worlds of Warcraft, Starcraft, and Diablo
Brian Griffith and Amanda Gerdes
At Blizzard Entertainment, our business intelligence team helps users make data-driven decisions. One of the most interesting kinds of analysis we do is to look at in-game mechanics and features. We can tease out subtle insights from the petabytes of data in our many disparate data sources, and we can feed those insights back to the game development teams to make every player's experience epic.
The webcast talk will highlight the critical role users' needs play in the design, implementation, and adoption of Hadoop within a fast-moving, dynamic environment that supports three triple-A game titles and all supporting business units.
About Brian Griffith
Brian Griffith's career spans more than twelve years in the software, financial, and entertainment industries. Prior to Blizzard, he was the lead DBA and data warehouse engineer for Eastern Bank, integrating disparate systems into a single, secure enterprise data system. Currently at Blizzard, he works passionately with vast amounts of data to help game designers make their games even more epic. He holds a B.S. and an M.S. from Northeastern University, specializing in neuroanatomy and statistics. His Blood Elf paladin wears pink armor.
About Amanda Gerdes
Amanda Gerdes is a data engineer with Blizzard Entertainment, focusing on the Blizzard Data Warehouse and its many supporting data pipelines. With eight years of experience in ETL development and management, she currently rounds out a team responsible for providing fast, accurate data at scale to Blizzard's Business Intelligence team. Amanda holds a B.A. from UC Berkeley as well as an MBA and M.S. in Systems Engineering from Loyola Marymount University. Whenever she catches herself thinking that maybe she's not doing too badly at this "life" thing, she looks up her World of Warcraft playtime in the data warehouse and is humbled once again.
Interactive Visualization of "Big" Data
When the number of data elements gets large, standard visual representations and interaction techniques break down. In this webcast talk, we will survey methods for scaling interactive visualizations to data sets too large to process or explore using traditional means. Attendees will hear about effective visualization techniques and interaction methods that are applicable to billion+ element databases.
About Jeffrey Heer
Jeffrey Heer is a co-founder and CXO (Chief Experience Officer) at Trifacta, a start-up company creating new tools for enhancing the productivity of data analysts. He is also a professor of Computer Science at Stanford University, where he leads the Stanford Visualization Group. His group has created a number of popular tools, including D3.js (Data-Driven Documents) and Data Wrangler. In Fall 2013, Jeff will join the faculty of Computer Science & Engineering at the University of Washington. In 2009 Jeff was named to MIT Technology Review's TR35; in 2012 he was named a Sloan Foundation Research Fellow. He holds BS, MS and PhD degrees in Computer Science from the University of California, Berkeley.