Transformer neural networks are a relatively young but revolutionary architecture in the fields of artificial intelligence and machine learning. First presented in the paper “Attention Is All You Need” by Ashish Vaswani et al. in 2017, transformers have quickly gained popularity and are now a central component of many advanced systems, including large language models (LLMs), especially those for natural language processing (NLP).
In contrast to traditional recurrent neural networks (RNNs), which process sequential data step by step and usually only from left to right, transformers rely on a mechanism called self-attention. This mechanism enables the model to consider and weigh all parts of an input sequence (i.e., a sentence) in relation to one another simultaneously.
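To make the idea concrete, here is a minimal sketch of scaled dot-product self-attention in plain NumPy. The function name self_attention and the toy dimensions (4 tokens, model width 8) are illustrative assumptions, not code from the book; the computation itself, softmax(Q K^T / sqrt(d_k)) V, follows “Attention Is All You Need”.

import numpy as np

def self_attention(X, W_q, W_k, W_v):
    # Project every token vector into query, key, and value spaces.
    Q = X @ W_q
    K = X @ W_k
    V = X @ W_v
    d_k = K.shape[-1]
    # Pairwise relevance score of every token to every other token.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted mix of all value vectors.
    return weights @ V

# Toy usage (hypothetical sizes): 4 tokens, model width 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8): one contextualized vector per token

Because the score matrix relates every token to every other token at once, the whole sequence is processed in parallel rather than step by step; the division by sqrt(d_k) keeps the dot products from growing with the key dimension, which would otherwise push the softmax toward saturated, hard-to-train regions.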