Book description
Harness the power of PolyBase data virtualization software to make data from a variety of sources easily accessible through SQL queries, using the T-SQL skills you have already mastered.
PolyBase Revealed shows you how to use the PolyBase feature of SQL Server 2019 to integrate SQL Server with Azure Blob Storage, Apache Hadoop, other SQL Server instances, Oracle, Cosmos DB, Apache Spark, and more. You will learn how PolyBase can help you reduce storage and other costs by avoiding the need for ETL processes that duplicate data in order to make it accessible from one source. PolyBase turns SQL Server into that one source, and T-SQL is your golden ticket. The book also covers PolyBase scale-out clusters, which let you distribute PolyBase queries among several SQL Server instances to improve performance.
With great flexibility comes great complexity, and this book shows you where to look when queries fail, complete with coverage of internals, troubleshooting techniques, and where to find more information on obscure cross-platform errors. Data virtualization is a key target for Microsoft with SQL Server 2019. This book will help you keep your skills current, remain relevant, and build new business and career opportunities around Microsoft’s product direction.
- Install and configure PolyBase as a stand-alone service, or unlock its capabilities with a scale-out cluster
- Understand how PolyBase interacts with outside data sources while presenting their data as regular SQL Server tables
- Write queries combining data from SQL Server, Apache Hadoop, Oracle, Cosmos DB, Apache Spark, and more
- Troubleshoot PolyBase queries using SQL Server Dynamic Management Views
- Tune PolyBase queries using statistics and execution plans
- Solve common business problems, including "cold storage" of infrequently accessed data and simplifying ETL jobs
Table of contents
- Cover
- Frontmatter
- 1. Installing and Configuring PolyBase
- 2. Connecting to Azure Blob Storage
- 3. Connecting to Hadoop
- 4. Using Predicate Pushdown to Enhance Query Performance
- 5. Common Hadoop and Blob Storage Integration Errors
- 6. Integrating with SQL Server
- 7. Built-In Integrations: Cosmos DB, Oracle, and More
- 8. Integrating via ODBC
- 9. PolyBase in Azure Synapse Analytics
- 10. Examining PolyBase via Dynamic Management Views
- 11. Query Tuning with Statistics and Execution Plans
- 12. PolyBase in Practice
- Backmatter
Product information
- Title: PolyBase Revealed: Data Virtualization with SQL Server, Hadoop, Apache Spark, and Beyond
- Author(s):
- Release date: December 2019
- Publisher(s): Apress
- ISBN: 9781484254615
You might also like
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition
Through a series of recent breakthroughs, deep learning has boosted the entire field of machine learning. …
The Self-Service Data Roadmap
Data-driven insights are a key competitive advantage for any industry today, but deriving insights from raw …
Data Science from Scratch, 2nd Edition
To really learn data science, you should not only master the tools—data science libraries, frameworks, modules, …
Building Machine Learning Pipelines
Companies are spending billions on machine learning projects, but it’s money wasted if the models can’t …