Chapter 3. Data Types and File Formats

Hive supports many of the primitive data types you find in relational databases, as well as three collection data types that are rarely found in relational databases, for reasons we’ll discuss shortly.

A related concern is how these types are represented in text files, as well as alternatives to text storage that address performance and other concerns. Unlike most databases, Hive provides great flexibility in how data is encoded in files. Most databases take total control of the data, governing both how it is persisted to disk and its life cycle. By letting you control these aspects yourself, Hive makes it easier to manage and process data with a variety of tools.

Primitive Data Types

Hive supports several sizes of integer and floating-point types, a Boolean type, and character strings of arbitrary length. Hive v0.8.0 added types for timestamps and binary fields.

Table 3-1 lists the primitive types supported by Hive.

Table 3-1. Primitive data types

| Type | Size | Literal syntax examples |
|---|---|---|
| TINYINT | 1 byte signed integer. | 20 |
| SMALLINT | 2 byte signed integer. | 20 |
| INT | 4 byte signed integer. | 20 |
| BIGINT | 8 byte signed integer. | 20 |
| BOOLEAN | Boolean true or false. | TRUE |
| FLOAT | Single precision floating point. | 3.14159 |
| DOUBLE | Double precision floating point. | 3.14159 |
| STRING | Sequence of characters. The character set can be specified. Single or double quotes can be used. | 'Now is the time', "for all good men" |
| TIMESTAMP (v0.8.0+) | Integer, float, ... | |
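
The byte sizes in Table 3-1 determine the range of values each integer type can hold, and the difference between single and double precision determines how closely a literal such as 3.14159 is stored. The following Python sketch works through that arithmetic. It is an illustration, not part of Hive: the names `HIVE_INT_BYTES`, `signed_range`, `smallest_int_type`, and `as_single_precision` are hypothetical helpers, and the code assumes two's-complement signed integers and IEEE 754 floating point.

```python
import struct

# Byte widths copied from Table 3-1. The dictionary name is hypothetical.
HIVE_INT_BYTES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes):
    """Return (min, max) for a signed integer of num_bytes bytes.

    Assumes two's-complement representation.
    """
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_int_type(value):
    """Return the narrowest integer type from Table 3-1 that can hold value.

    Returns None when the value does not fit even in BIGINT.
    """
    # Dictionaries preserve insertion order, so types are tried narrowest first.
    for name, num_bytes in HIVE_INT_BYTES.items():
        low, high = signed_range(num_bytes)
        if low <= value <= high:
            return name
    return None


def as_single_precision(value):
    """Round-trip a Python float (a double) through a 4-byte IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


if __name__ == "__main__":
    for name, num_bytes in HIVE_INT_BYTES.items():
        print(name, signed_range(num_bytes))
    print(smallest_int_type(20), smallest_int_type(40000))
    print(as_single_precision(3.14159))
```

The literal 20 used in the table fits in every integer type, which is why the same example appears in all four rows. A value such as 40000 exceeds the SMALLINT range and needs at least an INT column. The last line shows that 3.14159 stored in single precision comes back slightly different from the double precision value.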
