Data Management at Scale

by Piethein Strengholt
July 2020
Intermediate to advanced
345 pages
10h 47m
English
O'Reilly Media, Inc.

Glossary

Apache Avro

Apache Avro is an open source project that provides data serialization and data exchange services for the Apache Hadoop and Apache Kafka ecosystems. Programs can use its serialization service to write data efficiently into files or messages. Avro relies on a schema-based system (repository): the schema, which is defined in JSON, is always provided with the data.
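To make the idea of a JSON-defined schema concrete, the sketch below writes a small record schema in Avro's JSON schema format and checks records against it using only the Python standard library. The `Customer` record, its namespace and fields, and the `matches_schema` helper are assumptions made for illustration. The check is deliberately simplified and is not how an Avro library serializes data; a real application would use an Avro implementation instead.

```python
import json

# Hypothetical schema for illustration, written in Avro's JSON schema format.
CUSTOMER_SCHEMA_JSON = """
{
  "type": "record",
  "name": "Customer",
  "namespace": "example.glossary",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "active", "type": "boolean"}
  ]
}
"""

# Simplified mapping from a few Avro primitive types to Python types.
AVRO_TO_PYTHON = {"long": int, "string": str, "boolean": bool}


def matches_schema(record, schema):
    """Return True if a flat record has exactly the schema's fields and types.

    Illustrative helper only: it handles the primitive types mapped above.
    """
    fields = schema["fields"]
    if set(record) != {field["name"] for field in fields}:
        return False
    for field in fields:
        value = record[field["name"]]
        expected = AVRO_TO_PYTHON[field["type"]]
        # bool is a subclass of int in Python, so reject it for "long".
        if isinstance(value, bool) and expected is not bool:
            return False
        if not isinstance(value, expected):
            return False
    return True


schema = json.loads(CUSTOMER_SCHEMA_JSON)
good = {"id": 42, "name": "Ada", "active": True}
bad = {"id": "42", "name": "Ada", "active": True}  # id is a string, not a long

print(matches_schema(good, schema))
print(matches_schema(bad, schema))
```

Because the schema is plain JSON, it can be stored, versioned, and exchanged with any tool that reads JSON, which is part of what makes a shared schema repository practical.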

Apache Thrift

Apache Thrift is an open source project that was originally developed at Facebook and released in 2007. It supports a wide range of languages and offers a full client/server stack that many projects can work with directly. It also uses an IDL (interface definition language) for describing data types and service interfaces in a compact form that is easily readable by humans.

Access tokens

Rather than a username and password, an access token is used to represent the identity of the user or the user’s groups. It can contain additional attributes that describe the context in which the token can be used or the time window in which the token is valid.
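As a rough illustration of a token that carries identity, context, and a validity window, the sketch below issues and verifies a signed token using only the Python standard library. The claim names (`sub`, `groups`, `scope`, `exp`), the `SECRET_KEY` value, and the `issue_token` and `verify_token` helpers are all assumptions for this example. It is not a JWT implementation and is not meant for production use.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET_KEY = b"demo-secret"  # hypothetical signing key, for illustration only


def _sign(payload: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over the encoded payload."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def issue_token(user, groups, scope, ttl_seconds, now=None):
    """Create a signed token holding identity, scope, and an expiry time."""
    now = int(time.time()) if now is None else now
    claims = {"sub": user, "groups": groups, "scope": scope, "exp": now + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    return payload.decode() + "." + _sign(payload)


def verify_token(token, required_scope, now=None):
    """Return the claims if the token is authentic, in scope, and not expired."""
    now = int(time.time()) if now is None else now
    payload, _, signature = token.rpartition(".")
    expected = _sign(payload.encode())
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload.encode()))
    if now >= claims["exp"] or claims["scope"] != required_scope:
        return None
    return claims


token = issue_token("alice", ["analysts"], "read:customers", ttl_seconds=60, now=1000)
print(verify_token(token, "read:customers", now=1030))   # inside the time window
print(verify_token(token, "read:customers", now=1060))   # expired
print(verify_token(token, "write:customers", now=1030))  # wrong context
```

The service that receives the token never sees a password. It only checks the signature, the scope, and the expiry time.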

Accuracy

The degree to which the data reflect the truth or reality. A spelling mistake is a good example of inaccurate data.

ACID

ACID stands for atomicity, consistency, isolation, durability.

Atomicity ensures that a transaction is either fully completed or has no effect at all. Consistency enforces that the system is in a valid state at the ...
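Atomicity can be observed with a small experiment. The sketch below uses Python's built-in `sqlite3` module and an in-memory database. The `accounts` table, the `CHECK (balance >= 0)` rule, and the `transfer` and `balances` helpers are assumptions chosen for illustration. A transfer that would break the rule fails halfway through, and the part that had already been applied is rolled back.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False


def balances(conn):
    """Return the current balances as a dict of name to balance."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

failed = transfer(conn, "alice", "bob", 500)  # alice cannot cover this amount
after_failed = balances(conn)
succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)

print(failed, after_failed)
print(succeeded, after_success)
```

In the failed transfer the credit to `bob` is executed first, so the unchanged balances afterwards show that the partial work was undone rather than left in place.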

Publisher Resources

ISBN: 9781492054771