
Intel Xeon Phi Processor High Performance Programming, 2nd Edition

ISBN: 9780128091951

by James Jeffers, James Reinders, Avinash Sodani

May 2016

Intermediate to advanced

662 pages

20h 17m

English

Morgan Kaufmann


Cover image
Title page
Table of Contents
Copyright
Acknowledgments
Foreword
Extending the Sports Car Analogy to Higher Performance; What Exactly Is The Unfair Advantage?; Peak Performance Versus Drivable/Usable Performance; How Does The Unfair Advantage Relate to This Book?; Closing Comments
Preface
Sports Car Tutorial: Introduction for Many-Core Is Online; Parallelism Pearls: Inspired by Many Cores; Organization; Structured Parallel Programming; What’s New?; lotsofcores.com
Section I: Knights Landing
Introduction
Chapter 1: Introduction
Abstract; Introduction to Many-Core Programming; Trend: More Parallelism; Why Intel® Xeon Phi™ Processors Are Needed; Processors Versus Coprocessor; Measuring Readiness for Highly Parallel Execution; What About GPUs?; Enjoy the Lack of Porting Needed but Still Tune!; Transformation for Performance; Hyper-Threading Versus Multithreading; Programming Models; Why We Could Skip To Section II Now; For More Information

Chapter 2: Knights Landing overview
Abstract; Overview; Instruction Set; Architecture Overview; Motivation: Our Vision and Purpose; Summary; For More Information
Chapter 3: Programming MCDRAM and Cluster modes
Abstract; Programming for Cluster Modes; Programming for Memory Modes; Query Memory Mode and MCDRAM Available; SNC Performance Implications of Allocation and Threading; How to Not Hard Code the NUMA Node Numbers; Approaches to Determining What to Put in MCDRAM; Why Rebooting Is Required to Change Modes; BIOS; Summary; For More Information
Chapter 4: Knights Landing architecture
Abstract; Tile Architecture; Cluster Modes; Memory Interleaving; Memory Modes; Interactions of Cluster and Memory Modes; Summary; For More Information
Chapter 5: Intel Omni-Path Fabric
Abstract; Overview; Performance and Scalability; Transport Layer APIs; Quality of Service; Virtual Fabrics; Unicast Address Resolution; Multicast Address Resolution; Summary; For More Information
Chapter 6: μarch optimization advice
Abstract; Best Performance From 1, 2, or 4 Threads Per Core, Rarely 3; Memory Subsystem; μarch Nuances (tile); Direct Mapped MCDRAM Cache; Advice: Use AVX-512; Summary; For More Information
Section II: Parallel Programming
Introduction
Chapter 7: Programming overview for Knights Landing
Abstract; To Refactor, or Not to Refactor, That Is the Question; Evolutionary Optimization of Applications; Revolutionary Optimization of Applications; Know When to Hold’em and When to Fold’em; For More Information
Chapter 8: Tasks and threads
Abstract; OpenMP; Fortran 2008; Intel TBB; hStreams; Summary; For More Information
Chapter 9: Vectorization
Abstract; Why Vectorize?; How to Vectorize; Three Approaches to Achieving Vectorization; Six-Step Vectorization Methodology; Streaming Through Caches: Data Layout, Alignment, Prefetching, and so on; Compiler Tips; Compiler Options; Compiler Directives; Use Array Sections to Encourage Vectorization; Look at What the Compiler Created: Assembly Code Inspection; Numerical Result Variations With Vectorization; Summary; For More Information
Chapter 10: Vectorization advisor
Abstract; Getting Started With Intel Advisor for Knights Landing; Enabling and Improving AVX-512 Code With the Survey Report; Memory Access Pattern Report; AVX-512 Gather/Scatter Profiler; Mask Utilization and FLOPs Profiler; Advisor Roofline Report; Explore AVX-512 Code Characteristics Without AVX-512 Hardware; Example: Analysis of a Computational Chemistry Code; Summary; For More Information
Chapter 11: Vectorization with SDLT
Abstract; What Is SDLT?; Getting Started; SDLT Basics; Example Normalizing 3d Points With SIMD; What Is Wrong With AOS Memory Layout and SIMD?; SIMD Prefers Unit-Stride Memory Accesses; Alpha-Blended Overlay Reference; Alpha-Blended Overlay With SDLT; Additional Features; Summary; For More Information
Chapter 12: Vectorization with AVX-512 intrinsics
Abstract; What Are Intrinsics?; AVX-512 Overview; Migrating From Knights Corner; AVX-512 Detection; Learning AVX-512 Instructions; Learning AVX-512 Intrinsics; Step-by-Step Example Using AVX-512 Intrinsics; Results Using Our Intrinsics Code; For More Information
Chapter 13: Performance libraries
Abstract; Intel Performance Library Overview; Intel Math Kernel Library Overview; Intel Data Analytics Library Overview; Together: MKL and DAAL; Intel Integrated Performance Primitives Library Overview; Intel Performance Libraries and Intel Compilers; Native (Direct) Library Usage; Offloading to Knights Landing While Using a Library; Precision Choices and Variations; Performance Tip for Faster Dynamic Libraries; For More Information
Chapter 14: Profiling and timing
Abstract; Introduction to Knights Landing Tuning; Event-Monitoring Registers; Efficiency Metrics; Potential Performance Issues; Intel VTune Amplifier XE Product; Performance Application Programming Interface; MPI Analysis: ITAC; HPCToolkit; Tuning and Analysis Utilities; Timing; Summary; For More Information
Chapter 15: MPI
Abstract; Internode Parallelism; MPI on Knights Landing; MPI Overview; How to Run MPI Applications; Analyzing MPI Application Runs; Tuning of MPI Applications; Heterogeneous Clusters; Recent Trends in MPI Coding; Putting it All Together; Summary; For More Information
Chapter 16: PGAS programming models
Abstract; To Share or Not to Share; Why use PGAS on Knights Landing?; Programming with PGAS; Performance Evaluation; Beyond PGAS; Summary; For More Information
Chapter 17: Software-defined visualization
Abstract; Motivation for Software-Defined Visualization; Software-Defined Visualization Architecture; OpenSWR: OpenGL Raster-Graphics Software Rendering; Embree: High-performance Ray Tracing Kernel Library; OSPRay: Scalable Ray Tracing Framework; Summary; Image Attributions; For More Information
Chapter 18: Offload to Knights Landing
Abstract; Offload Programming Model: Using With Knights Landing; Processors Versus Coprocessor; Offload Model Considerations; OpenMP Target Directives; Concurrent Host and Target Execution; Offload Over Fabric; Summary; For More Information
Chapter 19: Power analysis
Abstract; Power Demand Gates Exascale; Power 101; Hardware-Based Power Analysis Techniques; Software-Based Knights Landing Power Analyzer; ManyCore Platform Software Package Power Tools; Running Average Power Limit; Performance Profiling on Knights Landing; Intel Remote Management Module; Summary; For More Information
Section III: Pearls
Introduction
Chapter 20: Optimizing classical molecular dynamics in LAMMPS
Abstract; Acknowledgment; Molecular Dynamics; LAMMPS; Knights Landing Processors; LAMMPS Optimizations; Data Alignment; Data Types and Layout; Vectorization; Neighbor List; Long-Range Electrostatics; MPI and OpenMP Parallelization; Performance Results; System, Build, and Run Configurations; Workloads; Organic Photovoltaic Molecules; Hydrocarbon Mixtures; Rhodopsin Protein in Solvated Lipid Bilayer; Coarse Grain Liquid Crystal Simulation; Coarse-Grain Water Simulation; Summary; For More Information
Chapter 21: High performance seismic simulations
Abstract; High-Order Seismic Simulations; Numerical Background; Application Characteristics; Intel Architecture as Compute Engine; Highly-efficient Small Matrix Kernels; Sparse Matrix Kernel Generation and Sparse/Dense Kernel Selection; Dense Matrix Kernel Generation: AVX2; Dense Matrix Kernel Generation: AVX-512; Kernel Performance Benchmarking; Incorporating Knights Landing’s Different Memory Subsystems; Performance Evaluation; Mount Merapi; 1992 Landers; Summary and Take-Aways; For More Information
Chapter 22: Weather research and forecasting (WRF)
Abstract; WRF Overview; WRF Execution Profile: Relatively Flat; History of WRF on Intel Many-Core (Intel Xeon Phi Product Line); Our Early Experiences With WRF on Knights Landing; Compiling WRF for Intel Xeon and Intel Xeon Phi Systems; WRF CONUS12km Benchmark Performance; MCDRAM Bandwidth; Vectorization: Boost of AVX-512 Over AVX2; Core Scaling; Summary; For More Information
Chapter 23: N-Body simulation
Abstract; Parallel Programming for Noncomputer Scientists; Step-by-Step Improvements; N-Body simulation; optimization; Initial Implementation (Optimization Step 0); Thread parallelism (optimization step 1); Scalar Performance Tuning (Optimization Step 2); Vectorization with SOA (optimization step 3); Memory traffic (optimization step 4); Impact of MCDRAM on Performance; Summary; For More Information
Chapter 24: Machine learning
Abstract; Convolutional Neural Networks; OverFeat-FAST Results; For More Information
Chapter 25: Trinity workloads
Abstract; Out of the Box Performance; Optimizing MiniGhost OpenMP Performance; Summary; For More Information
Chapter 26: Quantum chromodynamics
Abstract; LQCD; The QPhiX Library and Code Generator; Wilson-Dslash Operator; Configuring the QPhiX Code Generator; The Experimental Setup; Results; Conclusion; For More Information
Contributors
Glossary
Index

Overview

Intel Xeon Phi Processor High Performance Programming is an all-in-one source of information for programming the second-generation Intel Xeon Phi product family, also called Knights Landing. The authors provide detailed and timely Knights Landing-specific information, programming advice, and real-world examples. They distill their years of Xeon Phi programming experience, coupled with insights from many expert customers, Intel Field Engineers, Application Engineers, and Technical Consulting Engineers, to create this authoritative book on the essentials of programming for Intel Xeon Phi products.

Intel® Xeon Phi™ Processor High Performance Programming is useful even before you program a system with an Intel Xeon Phi processor. To help ensure that your applications run at maximum efficiency, the authors emphasize key techniques for programming any modern parallel computing system, whether based on Intel Xeon processors, Intel Xeon Phi processors, or other high-performance microprocessors. Applying these techniques will generally increase program performance on any system and better prepare you for Intel Xeon Phi processors.

A practical guide to the essentials for programming Intel Xeon Phi processors
Definitive coverage of the Knights Landing architecture
Presents best practices for portable, high-performance computing and a familiar and proven threads and vectors programming model
Includes real-world code examples that highlight uses of the unique aspects of this new highly parallel, high-performance computational product
Covers use of MCDRAM, AVX-512, Intel® Omni-Path fabric, many cores (up to 72), and many threads (4 per core)
Covers software developer tools, libraries, and programming models
Covers using Knights Landing as a processor and a coprocessor


