SAS applications can also access data that is stored in Hadoop file systems. In addition,
SAS In-Database enables high-performance computing by running aggregations and
analytics inside the database.
The following sections describe each storage option in more detail. Central management
of data sources through the SAS metadata repository is also discussed.
Default SAS Storage
You can use SAS data sets (tables), the default SAS storage format, to store data of any
A SAS table is a file that SAS software creates and processes. Each SAS table is a
member of a SAS library. A SAS library is a collection of one or more SAS files that are
recognized by SAS software and that are referenced and stored as a unit.
Each SAS table contains the following:
• data values that are organized as a table of observations (rows) and variables
(columns) that can be processed by SAS software
• descriptor information such as data types, column lengths, and the SAS engine that
was used to create the data
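
As a minimal sketch of the library and table concepts above, the following SAS program assigns a library, creates a small table in it, and prints the table's descriptor information. The libref mylib, the directory path, and the table and column names are hypothetical placeholders chosen for illustration.

```sas
/* Assign a libref to a directory; the path is a hypothetical placeholder */
libname mylib '/data/sales';

/* Create a small SAS table (data set) as a member of the library */
data mylib.orders;
   input order_id customer $ amount;
   datalines;
1 Smith 120.50
2 Jones 75.00
;
run;

/* Print descriptor information: engine, column names, types, and lengths */
proc contents data=mylib.orders;
run;
```

The PROC CONTENTS output lists the engine and the column attributes, which correspond to the descriptor information described in the list above.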
For shared access to SAS tables, you can use SAS/SHARE software, which provides
concurrent Update access to SAS files for multiple users.
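
A minimal sketch of how a client session might reach a shared library, assuming that a SAS/SHARE server is already running and has access to the library. The server ID shr1, the libref, the path, and the table are hypothetical.

```sas
/* SERVER= routes access through a SAS/SHARE server instead of   */
/* opening the files directly; shr1 is a hypothetical server ID  */
libname shared '/data/sales' server=shr1;

/* Update one row through the server */
proc sql;
   update shared.orders
      set amount = 130.00
      where order_id = 1;
quit;
```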
Third-Party Data Storage
Data can be stored in a wide range of third-party databases, including the following:
• relational databases such as Oracle, Sybase, DB2, SQL Server, and Teradata
• Hadoop data, which is divided into blocks and stored across multiple connected
nodes that work together
• hierarchical databases such as IBM Information Management System (IMS)
• Computer Associates Integrated Database Management System (CA-IDMS), which
is a network model database system
SAS/ACCESS interfaces provide fast, efficient loading of data to and from these
facilities. With these interfaces, SAS software can work directly from the data sources
without making a copy. Several of the SAS/ACCESS engines use an input/output (I/O)
subsystem that enables applications to read entire blocks of data instead of reading just
one record at a time. This feature reduces I/O bottlenecks and enables procedures to read
data as fast as they can process it. The SAS/ACCESS engines for Oracle, Sybase, DB2
(on UNIX and PC), ODBC, SQL Server, and Teradata support this functionality. These
engines, as well as the DB2 engine on z/OS, can also access database management
system (DBMS) data in parallel by opening multiple threads to the parallel DBMS server.
Coupling the threaded SAS procedures with these SAS/ACCESS engines provides even
greater gains in performance.
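
The following sketch shows one way that a SAS/ACCESS library might be assigned for an Oracle database. The connection values, schema contents, and table and column names are hypothetical. READBUFF= and DBSLICEPARM= are SAS/ACCESS LIBNAME options for buffered and threaded reads. Treating them as the controls for the block-read and parallel-access behavior described above is an assumption, and the best values depend on the site.

```sas
/* Connect to Oracle through SAS/ACCESS; all connection values are placeholders */
libname ora oracle user=myuser password=mypass path=orapath
        readbuff=1000      /* fetch 1000 rows per read instead of one */
        dbsliceparm=all;   /* allow threaded (parallel) reads where possible */

/* The DBMS table is referenced in place; no SAS copy is created first */
proc means data=ora.orders mean sum;
   class region;
   var amount;
run;
```

Because the libref points at the DBMS table itself, procedures refer to ora.orders in the same way that they refer to a native SAS table.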
SAS In-Database enables high-performance computing for complex, high-volume
analytics. This technology enables certain Base SAS and SAS/STAT procedures to run
aggregations and analytics inside the database. In-database technology minimizes the
movement of data across the network, while enabling more sophisticated queries and