How to tame your online services

Qingwei Lin; Jian-Guang Lou; Hongyu Zhang; Dongmei Zhang
Microsoft Research, Beijing, China

Abstract

Online service systems have become increasingly popular and important, and incidents in these services can lead to huge economic loss. We designed a set of incident management techniques based on analyzing the large volume of data collected at service runtime. Our tool, Service Analysis Studio (SAS), has been successfully applied to large-scale online services provided by Microsoft.

Keywords

Online service systems; Service incident; Incident management; Service Analysis Studio (SAS); Service-incident beacons; Transactional logs

Background

Online service systems, such as online banking systems and e-commerce ...
