Bioinformatics with Python Cookbook - Second Edition

Discover modern, next-generation sequencing libraries from the Python ecosystem to analyze large amounts of biological data

Key Features

  • Perform complex bioinformatics analysis using the most important Python libraries and applications
  • Implement next-generation sequencing, metagenomics, automated analysis, population genetics, and more
  • Explore various statistical and machine learning techniques for bioinformatics data analysis

Book Description

Bioinformatics is an active research field that uses a range of simple-to-advanced computations to extract valuable information from biological data.

This book covers next-generation sequencing, genomics, metagenomics, population genetics, phylogenetics, and proteomics. You'll learn modern programming techniques to analyze large amounts of biological data. With the help of real-world examples, you'll convert, analyze, and visualize datasets using various Python tools and libraries.

This book will help you better understand how to work with a Galaxy server, the most widely used web-based bioinformatics pipeline system. This updated edition also includes advanced next-generation sequencing filtering techniques. You'll also explore topics such as SNP discovery using statistical approaches on high-performance computing frameworks such as Dask and Spark.

By the end of this book, you'll be able to use and implement modern programming techniques and frameworks to deal with the ever-increasing deluge of bioinformatics data.

What you will learn

  • Learn how to process large next-generation sequencing (NGS) datasets
  • Work with genomic datasets using the FASTQ, BAM, and VCF formats
  • Learn to perform sequence comparison and phylogenetic reconstruction
  • Perform complex analysis with proteomics data
  • Use Python to interact with Galaxy servers
  • Use high-performance computing techniques with Dask and Spark
  • Visualize protein dataset interactions using Cytoscape
  • Use PCA and Decision Trees, two machine learning techniques, with biological datasets

Who this book is for

This book is for data scientists, bioinformatics analysts, researchers, and Python developers who want to address intermediate-to-advanced biological and bioinformatics problems using a recipe-based approach. Working knowledge of the Python programming language is expected.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Bioinformatics with Python Cookbook Second Edition
  3. About Packt
    1. Why subscribe?
    2. Packt.com
  4. Contributors
    1. About the author
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Sections
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
      5. See also
    5. Get in touch
      1. Reviews
  6. Python and the Surrounding Software Ecology
    1. Introduction
    2. Installing the required software with Anaconda
      1. Getting ready
      2. How to do it...
      3. There's more...
    3. Installing the required software with Docker
      1. Getting ready
      2. How to do it...
      3. See also
    4. Interfacing with R via rpy2
      1. Getting ready
      2. How to do it...
      3. There's more...
      4. See also
    5. Performing R magic with Jupyter Notebook
      1. Getting ready
      2. How to do it...
      3. There's more...
      4. See also
  7. Next-Generation Sequencing
    1. Introduction
    2. Accessing GenBank and moving around NCBI databases
      1. Getting ready
      2. How to do it...
      3. There's more...
      4. See also
    3. Performing basic sequence analysis
      1. Getting ready
      2. How to do it...
      3. There's more...
      4. See also
    4. Working with modern sequence formats
      1. Getting ready
      2. How to do it...
      3. There's more...
      4. See also
    5. Working with alignment data
      1. Getting ready
      2. How to do it...
      3. There's more...
      4. See also
    6. Analyzing data in VCF
      1. Getting ready
      2. How to do it...
      3. There's more...
      4. See also
    7. Studying genome accessibility and filtering SNP data
      1. Getting ready
      2. How to do it...
      3. There's more...
      4. See also
    8. Processing NGS data with HTSeq
      1. Getting ready
      2. How to do it...
      3. There's more...
  8. Working with Genomes
    1. Introduction
    2. Working with high-quality reference genomes
      1. Getting ready
      2. How to do it...
      3. There's more...
      4. See also
    3. Dealing with low-quality genome references
      1. Getting ready
      2. How to do it...
      3. There's more...
      4. See also
    4. Traversing genome annotations
      1. Getting ready
      2. How to do it...
      3. There's more...
      4. See also
    5. Extracting genes from a reference using annotations
      1. Getting ready
      2. How to do it...
      3. There's more...
      4. See also
    6. Finding orthologues with the Ensembl REST API
      1. Getting ready
      2. How to do it...
      3. There's more...
    7. Retrieving gene ontology information from Ensembl
      1. Getting ready
      2. How to do it...
      3. There's more...
      4. See also
  9. Population Genetics
    1. Introduction
    2. Managing datasets with PLINK
      1. Getting ready
      2. How to do it...
      3. There's more...
      4. See also
    3. Introducing the Genepop format
      1. Getting ready
      2. How to do it...
      3. See also
    4. Exploring a dataset with Bio.PopGen
      1. Getting ready
      2. How to do it...
      3. There's more...
      4. See also
    5. Computing F-statistics
      1. Getting ready
      2. How to do it...
      3. See also
    6. Performing Principal Components Analysis
      1. Getting ready
      2. How to do it...
      3. There's more...
      4. See also
    7. Investigating population structure with admixture
      1. Getting ready
      2. How to do it...
      3. There's more...
  10. Population Genetics Simulation
    1. Introduction
    2. Introducing forward-time simulations
      1. Getting ready
      2. How to do it...
      3. There's more...
    3. Simulating selection
      1. Getting ready
      2. How to do it...
      3. There's more...
    4. Simulating population structure using island and stepping-stone models
      1. Getting ready
      2. How to do it...
    5. Modeling complex demographic scenarios
      1. Getting ready
      2. How to do it...
  11. Phylogenetics
    1. Introduction
    2. Preparing a dataset for phylogenetic analysis
      1. Getting ready
      2. How to do it...
      3. There's more...
      4. See also
    3. Aligning genetic and genomic data
      1. Getting ready
      2. How to do it...
    4. Comparing sequences
      1. Getting ready
      2. How to do it...
      3. There's more...
    5. Reconstructing phylogenetic trees
      1. Getting ready
      2. How to do it...
      3. There's more...
    6. Playing recursively with trees
      1. Getting ready
      2. How to do it...
      3. There's more...
    7. Visualizing phylogenetic data
      1. Getting ready
      2. How to do it...
      3. There's more...
  12. Using the Protein Data Bank
    1. Introduction
    2. Finding a protein in multiple databases
      1. Getting ready
      2. How to do it...
      3. There's more...
    3. Introducing Bio.PDB
      1. Getting ready
      2. How to do it...
      3. There's more...
    4. Extracting more information from a PDB file
      1. Getting ready
      2. How to do it...
    5. Computing molecular distances on a PDB file
      1. Getting ready
      2. How to do it...
    6. Performing geometric operations
      1. Getting ready
      2. How to do it...
      3. There's more...
    7. Animating with PyMOL
      1. Getting ready
      2. How to do it...
      3. There's more...
    8. Parsing mmCIF files using Biopython
      1. Getting ready
      2. How to do it...
      3. There's more...
  13. Bioinformatics Pipelines
    1. Introduction
    2. Introducing Galaxy servers
      1. Getting ready
      2. How to do it…
      3. There's more…
    3. Accessing Galaxy using the API
      1. Getting ready
      2. How to do it…
    4. Developing a Galaxy tool
      1. Getting ready
      2. How to do it…
      3. There's more…
    5. Using generic pipelines with bioinformatics data
      1. Getting ready
      2. How to do it…
    6. Deploying a variant analysis pipeline with Airflow
      1. Getting ready
      2. How to do it…
      3. There's more…
  14. Python for Big Genomics Datasets
    1. Introduction
    2. Using high-performance data formats – HDF5
      1. Getting ready
      2. How to do it...
      3. There's more...
    3. Doing parallel computing with Dask
      1. Getting ready
      2. How to do it...
      3. There's more...
    4. Using high-performance data formats – Parquet
      1. Getting ready
      2. How to do it...
      3. There's more...
    5. Computing sequencing statistics using Spark
      1. Getting ready
      2. How to do it...
      3. There's more...
    6. Optimizing code with Cython and Numba
      1. Getting ready
      2. How to do it...
      3. There's more...
  15. Other Topics in Bioinformatics
    1. Introduction
    2. Doing metagenomics with the QIIME 2 Python API
      1. Getting ready
      2. How to do it...
      3. There's more...
    3. Inferring shared chromosomal segments with Germline
      1. Getting ready
      2. How to do it...
      3. There's more...
    4. Accessing the Global Biodiversity Information Facility via REST
      1. How to do it...
      2. There's more...
    5. Georeferencing GBIF datasets
      1. Getting ready
      2. How to do it...
      3. There's more...
    6. Plotting protein interactions with Cytoscape the hard way
      1. Getting ready
      2. How to do it...
      3. There's more...
  16. Advanced NGS Processing
    1. Introduction
    2. Preparing the dataset for analysis
      1. Getting ready
      2. How to do it…
    3. Using Mendelian error information for quality control
      1. How to do it…
      2. There's more…
    4. Using decision trees to explore the data
      1. How to do it…
    5. Exploring the data with standard statistics
      1. How to do it…
      2. There's more…
    6. Finding genomic features from sequencing annotations
      1. How to do it…
      2. There's more…