50 Algorithms Every Programmer Should Know - Second Edition

Book description

Delve into the realm of generative AI and large language models (LLMs) while exploring modern deep learning techniques, including LSTMs, GRUs, RNNs with new chapters included in this 50% new edition overhaul Purchase of the print or Kindle book includes a free eBook in PDF format.

Key Features

  • Familiarize yourself with advanced deep learning architectures
  • Explore newer topics, such as handling hidden bias in data and algorithm explainability
  • Get to grips with different programming algorithms and choose the right data structures for their optimal implementation

Book Description

The ability to use algorithms to solve real-world problems is a must-have skill for any developer or programmer. This book will help you not only to develop the skills to select and use an algorithm to tackle problems in the real world but also to understand how it works.

You'll start with an introduction to algorithms and discover various algorithm design techniques, before exploring how to implement different types of algorithms, with the help of practical examples. As you advance, you'll learn about linear programming, page ranking, and graphs, and will then work with machine learning algorithms to understand the math and logic behind them.

Case studies will show you how to apply these algorithms optimally before you focus on deep learning algorithms and learn about different types of deep learning models along with their practical use.

You will also learn about modern sequential models and their variants, algorithms, methodologies, and architectures that are used to implement Large Language Models (LLMs) such as ChatGPT.

Finally, you'll become well versed in techniques that enable parallel processing, giving you the ability to use these algorithms for compute-intensive tasks.

By the end of this programming book, you'll have become adept at solving real-world computational problems by using a wide range of algorithms.

What you will learn

  • Design algorithms for solving complex problems
  • Become familiar with neural networks and deep learning techniques
  • Explore existing data structures and algorithms found in Python libraries
  • Implement graph algorithms for fraud detection using network analysis
  • Delve into state-of-the-art algorithms for proficient Natural Language Processing illustrated with real-world examples
  • Create a recommendation engine that suggests relevant movies to subscribers
  • Grasp the concepts of sequential machine learning models and their foundational role in the development of cutting-edge LLMs

Who this book is for

This computer science book is for programmers or developers who want to understand the use of algorithms for problem-solving and writing efficient code. Whether you are a beginner looking to learn the most used algorithms concisely or an experienced programmer looking to explore cutting-edge algorithms in data science, machine learning, and cryptography, you'll find this book useful. Python programming experience is a must, knowledge of data science will be helpful but not necessary.

Table of contents

  1. Preface
    1. Who this book is for
    2. What this book covers
    3. Get in touch
  2. Section 1: Fundamentals and Core Algorithms
  3. Overview of Algorithms
    1. What is an algorithm?
      1. The phases of an algorithm
      2. Development environment
    2. Python packages
      1. The SciPy ecosystem
        1. Using Jupyter Notebook
    3. Algorithm design techniques
      1. The data dimension
      2. The compute dimension
    4. Performance analysis
      1. Space complexity analysis
      2. Time complexity analysis
      3. Estimating the performance
        1. The best case
        2. The worst case
        3. The average case
      4. Big O notation
      5. Constant time (O(1)) complexity
      6. Linear time (O(n)) complexity
      7. Quadratic time (O(n2)) complexity
      8. Logarithmic time (O(logn)) complexity
    5. Selecting an algorithm
    6. Validating an algorithm
      1. Exact, approximate, and randomized algorithms
      2. Explainability
    7. Summary
  4. Data Structures Used in Algorithms
    1. Exploring Python built-in data types
      1. Lists
        1. Using lists
        2. Modifying lists: append and pop operations
        3. The range() function
        4. The time complexity of lists
      2. Tuples
        1. The time complexity of tuples
      3. Dictionaries and sets
        1. Dictionaries
        2. Sets
        3. Time complexity analysis for sets
        4. When to use a dictionary and when to use a set
      4. Using Series and DataFrames
        1. Series
        2. DataFrame
        3. Creating a subset of a DataFrame
      5. Matrices
        1. Matrix operations
        2. Big O notation and matrices
    2. Exploring abstract data types
      1. Vector
        1. Time complexity of vectors
      2. Stacks
        1. Time complexity of stack operations
        2. Practical example
      3. Queues
        1. Time complexity analysis for queues
        2. The basic idea behind the use of stacks and queues
      4. Tree
        1. Terminology
        2. Types of trees
        3. Practical examples
    3. Summary
  5. Sorting and Searching Algorithms
    1. Introducing sorting algorithms
      1. Swapping variables in Python
      2. Bubble sort
        1. Understanding the logic behind bubble sort
        2. Optimizing bubble sort
        3. Performance analysis of the bubble sort algorithm
      3. Insertion sort
        1. Performance analysis of the insertion sort algorithm
      4. Merge sort
      5. Shell sort
        1. Performance analysis of the Shell sort algorithm
      6. Selection sort
        1. Performance analysis of the selection sort algorithm
      7. Choosing a sorting algorithm
    2. Introduction to searching algorithms
      1. Linear search
        1. Performance analysis of the linear search algorithm
      2. Binary search
        1. Performance analysis of the binary search algorithm
      3. Interpolation search
        1. Performance analysis of the interpolation search algorithm
    3. Practical applications
    4. Summary
  6. Designing Algorithms
    1. Introducing the basic concepts of designing an algorithm
      1. Concern 1: correctness: will the designed algorithm produce the result we expect?
      2. Concern 2: performance: is this the optimal way to get these results?
        1. Characterizing the complexity of the problem
        2. Exploring the relationship between P and NP
        3. Introducing NP-complete and NP-hard
      3. Concern 3 – scalability: how is the algorithm going to perform on larger datasets?
        1. The elasticity of the cloud and algorithmic scalability
    2. Understanding algorithmic strategies
      1. Understanding the divide-and-conquer strategy
        1. A practical example – divide-and-conquer applied to Apache Spark
      2. Understanding the dynamic programming strategy
        1. Components of dynamic programming
        2. Conditions for using dynamic programming
      3. Understanding greedy algorithms
        1. Conditions for using greedy programming
    3. A practical application – solving the TSP
      1. Using a brute-force strategy
      2. Using a greedy algorithm
      3. Comparison of Three Strategies
    4. Presenting the PageRank algorithm
      1. Problem definition
      2. Implementing the PageRank algorithm
    5. Understanding linear programming
      1. Formulating a linear programming problem
        1. Defining the objective function
        2. Specifying constraints
      2. A practical application – capacity planning with linear programming
    6. Summary
  7. Graph Algorithms
    1. Understanding graphs: a brief introduction
      1. Graphs: the backbone of modern data networks
        1. Real-world applications
      2. The basics of a graph: vertices (or nodes)
    2. Graph theory and network analysis
    3. Representations of graphs
    4. Graph mechanics and types
      1. Ego-centered networks
        1. Basics of egonets
        2. One-hop, two-hop, and beyond
        3. Applications of egonets
    5. Introducing network analysis theory
      1. Understanding the shortest path
        1. Creating a neighborhood
        2. Triangles
        3. Density
      2. Understanding centrality measures
        1. Degree
        2. Betweenness
        3. Fairness and closeness
        4. Eigenvector centrality
      3. Calculating centrality metrics using Python
        1. 1. Setting the foundation: libraries and data
        2. 2. Crafting the graph
        3. 3. Painting a picture: visualizing the graph
      4. Social network analysis
    6. Understanding graph traversals
      1. BFS
        1. Constructing the adjacency list
        2. BFS algorithm implementation
        3. Using BFS for specific searches
      2. DFS
    7. Case study: fraud detection using SNA
      1. Introduction
      2. What is fraud in this context?
      3. Conducting simple fraud analytics
      4. Presenting the watchtower fraud analytics methodology
        1. Scoring negative outcomes
        2. Degree of suspicion
    8. Summary
  8. Section 2: Machine Learning Algorithms
  9. Unsupervised Machine Learning Algorithms
    1. Introducing unsupervised learning
      1. Unsupervised learning in the data-mining lifecycle
        1. Phase 1: Business understanding
        2. Phase 2: Data understanding
        3. Phase 3: Data preparation
        4. Phase 4: Modeling
        5. Phase 5: Evaluation
        6. Phase 6: Deployment
      2. Current research trends in unsupervised learning
      3. Practical examples
        1. Marketing segmentation using unsupervised learning
    2. Understanding clustering algorithms
      1. Quantifying similarities
        1. Euclidean distance
        2. Manhattan distance
        3. Cosine distance
      2. k-means clustering algorithm
        1. The logic of k-means clustering
        2. Initialization
        3. The steps of the k-means algorithm
        4. Stop condition
        5. Coding the k-means algorithm
        6. Limitation of k-means clustering
        7. Hierarchical clustering
    3. Steps of hierarchical clustering
    4. Coding a hierarchical clustering algorithm
    5. Understanding DBSCAN
    6. Creating clusters using DBSCAN in Python
    7. Evaluating the clusters
      1. Application of clustering
    8. Dimensionality reduction
      1. Principal component analysis
        1. Limitations of PCA
        2. Association rules mining
        3. Examples of use
        4. Market basket analysis
    9. Association rules mining
      1. Types of rules
        1. Trivial rules
        2. Inexplicable rules
        3. Actionable rules
        4. Ranking rules
        5. Support
        6. Confidence
        7. Lift
      2. Algorithms for association analysis
        1. Apriori algorithm
        2. Limitations of the apriori algorithm
        3. FP-growth algorithm
        4. Populating the FP-tree
        5. Mining frequent patterns
        6. Code for using FP-growth
    10. Summary
  10. Traditional Supervised Learning Algorithms
    1. Understanding supervised machine learning
    2. Formulating supervised machine learning problems
      1. Understanding enabling conditions
      2. Differentiating between classifiers and regressors
    3. Understanding classification algorithms
      1. Presenting the classifiers challenge
        1. The problem statement
        2. Feature engineering using a data processing pipeline
        3. Scaling the features
        4. Evaluating the classifiers
      2. Confusion matrices
        1. Understanding recall and precision
      3. Understanding the recall and precision trade-off
        1. Understanding overfitting
        2. Specifying the phases of classifiers
    4. Decision tree classification algorithm
      1. Understanding the decision tree classification algorithm
      2. The strengths and weaknesses of decision tree classifiers
      3. Use cases
    5. Understanding the ensemble methods
      1. Implementing gradient boosting with the XGBoost algorithm
      2. Differentiating the Random Forest algorithm from ensemble boosting
      3. Using the Random Forest algorithm for the classifiers challenge
    6. Logistic regression
      1. Assumptions
      2. Establishing the relationship
      3. The loss and cost functions
      4. When to use logistic regression
      5. Using the logistic regression algorithm for the classifiers challenge
    7. The SVM algorithm
      1. Using the SVM algorithm for the classifiers challenge
      2. Understanding the Naive Bayes algorithm
    8. Bayes’ theorem
      1. Calculating probabilities
      2. Multiplication rules for AND events
      3. The general multiplication rule
      4. Addition rules for OR events
      5. Using the Naive Bayes algorithm for the classifiers challenge
    9. For classification algorithms, the winner is...
      1. Understanding regression algorithms
      2. Presenting the regressors challenge
      3. The problem statement of the regressors challenge
      4. Exploring the historical dataset
      5. Feature engineering using a data processing pipeline
    10. Linear regression
      1. Simple linear regression
      2. Evaluating the regressors
      3. Multiple regression
      4. Using the linear regression algorithm for the regressors challenge
      5. When is linear regression used?
      6. The weaknesses of linear regression
      7. The regression tree algorithm
      8. Using the regression tree algorithm for the regressors challenge
      9. The gradient boost regression algorithm
      10. Using the gradient boost regression algorithm for the regressors challenge
    11. For regression algorithms, the winner is...
    12. Practical example – how to predict the weather
    13. Summary
  11. Neural Network Algorithms
    1. The evolution of neural networks
      1. Historical background
      2. AI winter and the dawn of AI spring
    2. Understanding neural networks
      1. Understanding perceptrons
      2. Understanding the intuition behind neural networks
      3. Understanding layered deep learning architectures
        1. Developing an intuition for hidden layers
        2. How many hidden layers should be used?
        3. Mathematical basis of neural network
    3. Training a neural network
    4. Understanding the anatomy of a neural network
    5. Defining gradient descent
    6. Activation functions
      1. Step function
      2. Sigmoid function
      3. ReLU
        1. Leaky ReLU
      4. Hyperbolic tangent (tanh)
      5. Softmax
    7. Tools and frameworks
      1. Keras
        1. Backend engines of Keras
        2. Low-level layers of the deep learning stack
        3. Defining hyperparameters
        4. Defining a Keras model
    8. Choosing a sequential or functional model
      1. Understanding TensorFlow
      2. Presenting TensorFlow’s basic concepts
      3. Understanding Tensor mathematics
    9. Understanding the types of neural networks
      1. Convolutional neural networks
        1. Convolution
        2. Pooling
      2. Generative Adversarial Networks
    10. Using transfer learning
    11. Case study – using deep learning for fraud detection
      1. Methodology
    12. Summary
  12. Algorithms for Natural Language Processing
    1. Introducing NLP
    2. Understanding NLP terminology
      1. Text preprocessing in NLP
        1. Tokenization
        2. Cleaning data
    3. Cleaning data using Python
    4. Understanding the Term Document Matrix
      1. Using TF-IDF
      2. Summary and discussion of results
    5. Introduction to word embedding
    6. Implementing word embedding with Word2Vec
      1. Interpreting similarity scores
      2. Advantages and disadvantages of Word2Vec
    7. Case study: Restaurant review sentiment analysis
      1. Importing required libraries and loading the dataset
      2. Building a clean corpus: Preprocessing text data
      3. Converting text data into numerical features
      4. Analyzing the results
    8. Applications of NLP
    9. Summary
  13. Understanding Sequential Models
    1. Understanding sequential data
      1. Types of sequence models
        1. One-to-many
        2. Many-to-one
        3. Many-to-many
    2. Data representation for sequential models
    3. Introducing RNNs
      1. Understanding the architecture of RNNs
        1. Understanding the memory cell and hidden state
        2. Understanding the characteristics of the input variable
      2. Training the RNN at the first timestep
        1. The activation function in action
        2. Training the RNN for a whole sequence
        3. Calculating the output for each timestep
      3. Backpropagation through time
        1. Predicting with RNNs
      4. Limitations of basic RNNs
        1. Vanishing gradient problem
        2. Inability to look ahead in the sequence
    4. GRU
      1. Introducing the update gate
      2. Implementing the update gate
      3. Updating the hidden cell
        1. Running GRUs for multiple timesteps
    5. Introducing LSTM
      1. Introducing the forget gate
      2. The candidate cell state
      3. The update gate
      4. Calculating memory state
      5. The output gate
      6. Putting everything together
      7. Coding sequential models
        1. Loading the dataset
        2. Preparing the data
        3. Creating the model
        4. Training the model
        5. Viewing some incorrect predictions
    6. Summary
  14. Advanced Sequential Modeling Algorithms
    1. The evolution of advanced sequential modeling techniques
    2. Exploring autoencoders
      1. Coding an autoencoder
      2. Setting up the environment
        1. Data preparation
        2. Model architecture
        3. Compilation
        4. Training
        5. Prediction
        6. Visualization
    3. Understanding the Seq2Seq model
      1. Encoder
      2. Thought vector
      3. Decoder or writer
      4. Special tokens in Seq2Seq
      5. The information bottleneck dilemma
    4. Understanding the attention mechanism
      1. What is attention in neural networks?
        1. Basic idea
        2. Example
      2. Three key aspects of attention mechanisms
      3. A deeper dive into attention mechanisms
      4. The challenges of attention mechanisms
    5. Delving into self-attention
      1. Attention weights
      2. Encoder: bidirectional RNNs
      3. Thought vector
      4. Decoder: regular RNNs
      5. Training versus inference
    6. Transformers: the evolution in neural networks after self-attention
      1. Why transformers shine
      2. A Python code breakdown
      3. Understanding the output
    7. LLMs
      1. Understanding attention in LLMs
      2. Exploring the powerhouses of NLP: GPT and BERT
        1. 2018’s LLM pioneers: GPT and BERT
      3. Using deep and wide models to create powerful LLMs
    8. Bottom of Form
    9. Summary
  15. Section 3: Advanced Topics
  16. Recommendation Engines
    1. Introducing recommendation systems
    2. Types of recommendation engines
      1. Content-based recommendation engines
        1. Determining similarities in unstructured documents
      2. Collaborative filtering recommendation engines
        1. Issues related to collaborative filtering
      3. Hybrid recommendation engines
        1. Generating a similarity matrix of the items
        2. Generating reference vectors of the users
        3. Generating recommendations
        4. Evolving the recommendation system
    3. Understanding the limitations of recommendation systems
      1. The cold start problem
      2. Metadata requirements
      3. The data sparsity problem
      4. The double-edged sword of social influence in recommendation systems
    4. Areas of practical applications
      1. Netflix’s mastery of data-driven recommendations
      2. The evolution of Amazon’s recommendation system
    5. Practical example – creating a recommendation engine
      1. 1. Setting up the framework
      2. 2. Data loading: ingesting reviews and titles
      3. 3. Merging data: crafting a comprehensive view
      4. 4. Descriptive analysis: gleaning insights from ratings
      5. 5. Structuring for recommendations: crafting the matrix
      6. 6. Putting the engine to test: recommending movies
        1. Finding movies correlating with Avatar (2009)
        2. 10,000 BC (2008) -0.075431 Understanding correlation
        3. Evaluating the model
        4. Retraining over time: incorporating user feedback
    6. Summary
  17. Algorithmic Strategies for Data Handling
    1. Introduction to data algorithms
      1. Significance of CAP theorem in context of data algorithms
      2. Storage in distributed environments
      3. Connecting CAP theorem and data compression
    2. Presenting the CAP theorem
      1. CA systems
      2. AP systems
      3. CP systems
    3. Decoding data compression algorithms
      1. Lossless compression techniques
        1. Huffman coding: Implementing variable-length coding
        2. Understanding dictionary-based compression LZ77
        3. Advanced lossless compression formats
    4. Practical example: Data management in AWS: A focus on CAP theorem and compression algorithms
      1. 1. Applying the CAP theorem
      2. 2. Using compression algorithms
      3. 3. Quantifying the benefits
    5. Summary
  18. Cryptography
    1. Introduction to cryptography
      1. Understanding the importance of the weakest link
      2. The basic terminology
      3. Understanding the security requirements
        1. Step 1: Identifying the entities
        2. Step 2: Establishing the security goals
        3. Step 3: Understanding the sensitivity of the data
      4. Understanding the basic design of ciphers
        1. Presenting substitution ciphers
        2. Cryptanalysis of substitution ciphers
        3. Understanding transposition ciphers
    2. Understanding the types of cryptographic techniques
      1. Using the cryptographic hash function
        1. Implementing cryptographic hash functions
        2. An application of the cryptographic hash function
        3. Choosing between MD5 and SHA
      2. Using symmetric encryption
        1. Coding symmetric encryption
        2. The advantages of symmetric encryption
        3. The problems with symmetric encryption
      3. Asymmetric encryption
        1. The SSL/TLS handshaking algorithm
        2. Public key infrastructure
        3. Blockchain and cryptography
    3. Example: security concerns when deploying a machine learning model
      1. MITM attacks
        1. How to prevent MITM attacks
      2. Avoiding masquerading
      3. Data and model encryption
    4. Summary
  19. Large-Scale Algorithms
    1. Introduction to large-scale algorithms
    2. Characterizing performant infrastructure for large-scale algorithms
      1. Elasticity
      2. Characterizing a well-designed, large-scale algorithm
        1. Load balancing
        2. ELB: Combining elasticity and load balancing
    3. Strategizing multi-resource processing
    4. Understanding theoretical limitations of parallel computing
      1. Amdahl’s law
      2. Deriving Amdahl’s law
      3. CUDA: Unleashing the potential of GPU architectures in parallel computing
        1. Bottom of form
        2. Parallel processing in LLMs: A case study in Amdahl’s law and diminishing returns
        3. Rethinking data locality
      4. Benefiting from cluster computing using Apache Spark
    5. How Apache Spark empowers large-scale algorithm processing
      1. Distributed computing
      2. In-memory processing
    6. Using large-scale algorithms in cloud computing
      1. Example
    7. Summary
  20. Practical Considerations
    1. Challenges facing algorithmic solutions
      1. Expecting the unexpected
    2. Failure of Tay, the Twitter AI bot
    3. The explainability of an algorithm
      1. Machine learning algorithms and explainability
        1. Presenting strategies for explainability
    4. Understanding ethics and algorithms
      1. Problems with learning algorithms
      2. Understanding ethical considerations
      3. Factors affecting algorithmic solutions
        1. Considering inconclusive evidence
        2. Traceability
        3. Misguided evidence
        4. Unfair outcomes
    5. Reducing bias in models
    6. When to use algorithms
      1. Understanding black swan events and their implications on algorithms
    7. Summary
  21. Other Books You May Enjoy
  22. Index

Product information

  • Title: 50 Algorithms Every Programmer Should Know - Second Edition
  • Author(s): Imran Ahmad
  • Release date: September 2023
  • Publisher(s): Packt Publishing
  • ISBN: 9781803247762