Alistair Croll

Data Everywhere: Data Anthropology, Quantified Self, Machine Data, Human Centered Design, and more

A Preview of Strata Santa Clara 2014

Date: This event took place live on February 04 2014

Presented by: Alistair Croll

Duration: Approximately 120 minutes.

Cost: Free

Questions? Please send email to


Join a lineup of thinkers and technologists for this free online event and learn about new tools and techniques that are providing us with better, more efficient insights into our data.

Learn how the Industrial Internet is optimizing the assets and operations of large industries; how to apply design thinking to your data to identify problems and opportunities; and how sleep data gathered from hundreds of thousands of UP wristbands is pushing the boundaries of understanding our bodies. You'll also get a sneak peek into the future of spreadsheets, still the number one tool for financial analysis, and learn how to express yourself in R.

About Alistair Croll

Alistair has been an entrepreneur, author, and public speaker for nearly 20 years. He's worked on a variety of topics, from web performance, to big data, to cloud computing, to startups, in that time. In 2001, he co-founded web performance startup Coradiant (acquired by BMC in 2011), and since that time has also launched Rednod, CloudOps, Bitcurrent, Year One Labs, the Bitnorth conference, the International Startup Festival and several other early-stage companies.

Alistair is the chair of O'Reilly's Strata conference, Techweb's Cloud Connect, and the International Startup Festival. "Lean Analytics" is his fourth book on analytics, technology, and entrepreneurship. He lives in Montreal, Canada and tries to mitigate chronic ADD by writing about far too many things at "Solve For Interesting".

Spreadsheets: The Dark Matter of Big Data
Felienne Hermans

Spreadsheets are used extensively in industry: they are the number one tool for financial analysis and are also prevalent in other domains, such as logistics and planning. Their flexibility and immediate feedback make them easy to use for non-programmers. But they are as easy to build as they are difficult to analyze, maintain and check. Felienne's research aims to develop methods that help spreadsheet users understand, update and improve spreadsheets. Inspiration was taken from classic software engineering, as this field is specialized in the analysis of data and calculations. In this talk, Felienne will summarize her recently completed PhD research on spreadsheet structure visualization, spreadsheet smells and clone detection, and give a sneak peek into the future of spreadsheet research at Delft University.

About Felienne Hermans

Felienne is a professor and an entrepreneur in the field of spreadsheets. Her PhD thesis centers on techniques to extract information from spreadsheets and present it visually, to support users in understanding and improving them. In 2010 Felienne founded Infotron, a startup that uses the algorithms developed during the PhD project to analyze spreadsheet quality for large companies. In her spare time, Felienne volunteers as a judge for the First Lego League, a worldwide technology competition for kids.

Expressing Yourself in R
Hadley Wickham

There are three main time sinks in any data science task:

1. Figuring out what you want to do.
2. Turning a vague goal into a precise set of tasks (i.e. programming).
3. Actually crunching the numbers.

A well-designed domain-specific language (or DSL) tightly coupled to the problem domain can make all three pieces faster. In this talk, I'll discuss two DSLs built in R: ggvis for visualisation and dplyr for data manipulation. These build on my previous packages ggplot2 and plyr, improving both expressivity and speed.

Data visualisation and manipulation are key parts of the data science process. ggvis makes it easy to declaratively describe interactive web graphics. It combines a declarative syntax based on ggplot2 with shiny's reactive programming model and vega's declarative JS rendering system. dplyr implements the most important verbs of data manipulation in a datastore-agnostic fashion, so you can think about and compute with your data in the same way regardless of whether you're working with a local in-memory data frame or a remote on-disk database.
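
To make the datastore-agnostic idea concrete, here is a minimal sketch in Python. It is not dplyr's API and not code from the talk: the names MeanBy, run_in_memory and run_sqlite, the "sleep" table and the sample rows are all hypothetical, and an in-memory SQLite database stands in for a remote one. The point is only that one declarative query description can be executed by two different backends and give the same answer.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class MeanBy:
    """Hypothetical declarative query: keep rows whose value is at least
    `minimum`, then average that value within each group."""
    group: str
    value: str
    minimum: float


def run_in_memory(query, rows):
    """Execute the query against a list of dicts held in memory."""
    groups = {}
    for row in rows:
        if row[query.value] >= query.minimum:
            groups.setdefault(row[query.group], []).append(row[query.value])
    return {g: sum(vals) / len(vals) for g, vals in groups.items()}


def run_sqlite(query, conn, table):
    """Execute the same query by translating it to SQL.

    Column and table names are interpolated directly, which is acceptable
    for a sketch but not for untrusted input.
    """
    sql = (
        f"SELECT {query.group}, AVG({query.value}) FROM {table} "
        f"WHERE {query.value} >= ? GROUP BY {query.group}"
    )
    return dict(conn.execute(sql, (query.minimum,)).fetchall())


# Made-up sample data, used for both backends.
rows = [
    {"city": "Boston", "hours": 6.5},
    {"city": "Boston", "hours": 7.5},
    {"city": "Boston", "hours": 5.0},
    {"city": "Austin", "hours": 8.0},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sleep (city TEXT, hours REAL)")
conn.executemany("INSERT INTO sleep VALUES (:city, :hours)", rows)

# One query description, two execution strategies.
query = MeanBy(group="city", value="hours", minimum=6.0)
local_result = run_in_memory(query, rows)
remote_result = run_sqlite(query, conn, "sleep")
```

In this sketch the caller writes the query once and chooses the backend separately, which is the separation of "what" from "where" that the abstract attributes to dplyr.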

About Hadley Wickham

Hadley Wickham is Chief Scientist at RStudio. He is an active member of the R community, has written and contributed to over 30 R packages, and won the John Chambers Award for Statistical Computing for his work developing tools for data reshaping and visualisation. His research focusses on how to make data analysis better, faster and easier, with a particular emphasis on the use of visualisation to better understand data and models.

Big Industrial Internet Data: Connecting and Optimizing at New Scales
Steven Gustafson

The Industrial Internet is all about optimizing an industry's assets and operations. Advanced analytics, data, and software enable our devices to connect and be better understood in new ways. At GE, we are developing the next generation of software and analytic capabilities to enable our industries to benefit from the Industrial Internet. In this presentation, I will introduce the concepts and technologies behind the Industrial Internet, describe how Big Data is a critical component, survey how industry is approaching Big Data, and describe several of our existing efforts to research and develop technologies for the Industrial Internet utilizing Big Data. To fully realize the value out of massive industrial data, we need to think about the ecosystem of big distributed storage, compute, and knowledge to achieve re-usable, repeatable, and intelligent Big Data products.

About Steven Gustafson

Dr. Gustafson leads the Knowledge Discovery Lab at the General Electric Global Research Center in Niskayuna, New York. The Knowledge Discovery Lab is focused on large-scale data, semantics, ontologies and text mining, and pattern search and discovery.

As a former member of the Machine Learning Lab and Computational Intelligence Lab, he develops and applies advanced AI and machine learning algorithms for complex problem solving.

He received his PhD in computer science from the University of Nottingham, UK, where he was a research fellow in the Automated Scheduling, Optimisation and Planning Research Group. He received his BS and MS in computer science from Kansas State University, where he was a research assistant in the Knowledge Discovery in Databases Laboratory.

Dr. Gustafson is a member of several program committees and journal editorial boards, and is a Technical Editor-in-Chief of the journal Memetic Computing. In 2006, he received the IEEE Intelligent Systems "AI's 10 to Watch" award.

Design Thinking for Dummies (Data Scientists)
Dean Malmgren

Being "data-driven" is about more than just storing lots of data and generating reports. As with many other types of projects, the most crucial part of any data-oriented project is choosing an appropriate problem or opportunity on which to focus in the first place. In this tutorial, you will learn how to apply design thinking to identify problems and opportunities where data can be used as part of a solution. We will go through a series of small-group exercises where we focus on defining problems, considering current solutions, creating new approaches, and building prototypes. Participants will leave armed with a new perspective on how to use data as a resource within their own organizations.

About Dean Malmgren

Dean Malmgren is co-founder and managing partner of Datascope Analytics. As an author of several peer-reviewed publications on big data analytics and visualization, Dean is excited about bringing cutting-edge techniques out of research and into practice. When not teasing himself or others, Dean can be found swimming, cycling, or running for silly long distances. Dean received a BS in math and chemical engineering from the University of Michigan and a PhD in chemical engineering from Northwestern University.

Soylent Mean: Data Science is Made of People
Cameran Hetrick & Kimberly Stedman

Combine your best algorithms and smartest data architecture, and what do you get? Without humans, you have an expensive, high tech brick. Humans generate data, which is used by and for humans to achieve human goals. If you want your data department to earn its keep by showing real value, you must build your social systems as meticulously as you build your pipeline.

As "big data" prepares to enter the "trough of disillusionment" in the Gartner hype curve, we all shoulder the burden of driving the industry forward and delivering on the promise of data, now. In this session, you will learn how to:

  • Hire for a balanced team of different types of data professionals who you can actually find.
  • Train data team members and data consumers in analytic rigor and cognitive biases.
  • Build a new organizational decision-making lifecycle, from goals, to business questions, to data analysis, to interpretation, to business strategy.
  • Lead organizations to identify and focus on metrics that truly make a difference.
  • Create data products that drive value for end-users and whole organizations.

About Cameran Hetrick

Cameran is the Director of Analytics for VMware. Combining advanced analytics with business acumen, she is responsible for leading all data efforts on the Socialcast product, using big data to inform and create new products as well as mining customers' community data to help them optimize their investment. A veteran of the business intelligence community, Cameran was previously with Disney Parks and Resorts, where she drove significant revenue growth through pricing and product optimization. She holds a degree in Economics/Mathematics from UC Santa Barbara and graduated at the top of her MBA class at UC Irvine.

About Kimberly Stedman

I do big data, social systems, and gameplay. I like wicked problems. Everything's people with me: cognition, culture, and the intelligence processes of the global brain.

I used to be a field anthropologist. I've lived in five developing countries. Nowadays I'm working in game analytics.

Right now, I'm thinking a lot about the adoption of big data, and how to make that process effective.

Bedtime Stories: Learning from Sleep Data
Monica Rogati

We optimize ads, but not our mood. We know more about our tweets than our own bodies. That's all about to change. As wearables transform the "quantified self" from a niche to a mainstream market, they are generating vast amounts of data about our health, habits, and lifestyles.

With sleep data gathered from hundreds of thousands of UP wristbands, we can push the boundaries of understanding our bodies beyond what was possible with traditional studies alone. We can now understand not only whether men sleep longer than women, but also how that changes with age. We know that Americans lost sleep over the Boston Marathon bombing, but not over the birth of the royal baby. We can study how jetlag affects your body, at scale. We can find out how air quality affects your movement and your health. These insights are key to encouraging people to get more sleep, rewarding them when they do, and improving the quality of their lives along the way.

Join us if you'd like to hear more about sleep, health, and a world where an ecosystem of sensors is changing what we know about ourselves.

About Monica Rogati

Monica is a data scientist with a passion for turning data into products, actionable insights, and meaningful stories. As the VP of Data for Jawbone, she focuses on developing data-driven products that promote a healthier lifestyle and on finding stories in the UP wristband data.

Prior to Jawbone, Monica was one of the early members of the LinkedIn data science team, where she developed and improved some of LinkedIn's key data products for matching jobs to passive candidates, discovering people you may know, and recommending groups you may like. Monica's compelling data stories are often picked up by the mainstream press, including the Wall Street Journal, The Economist, NPR and CNN. Monica holds a Ph.D. in Computer Science from CMU, where she focused on text mining and applied machine learning. She authored eight US patents and numerous papers that appeared in top-tier peer-reviewed journals and conference proceedings.
