Machine Learning for Algorithmic Trading - Second Edition

by Stefan Jansen

July 2020

Beginner to intermediate

820 pages

25h 30m

English

Packt Publishing

What to expect
What's new in the second edition
Who should read this book
What this book covers
To get the most out of this book
Get in touch

The rise of ML in the investment industry
From electronic to high-frequency trading
Factor investing and smart beta funds
Algorithmic pioneers outperform humans
ML-driven funds attract $1 trillion in AUM
The emergence of quantamental funds
Investments in strategic capabilities
ML and alternative data
Crowdsourcing trading algorithms
Designing and executing an ML-driven strategy
Sourcing and managing data
From alpha factor research to portfolio management
The research phase
The execution phase
Strategy backtesting
ML for trading – strategies and use cases
The evolution of algorithmic strategies
Use cases of ML for trading
Data mining for feature extraction and insights
Supervised learning for alpha factor creation
Asset allocation
Testing trade ideas
Reinforcement learning
Summary

Market data reflects its environment
Market microstructure – the nuts and bolts
How to trade – different types of orders
Where to trade – from exchanges to dark pools
Working with high-frequency data
How to work with Nasdaq order book data
Communicating trades with the FIX protocol
The Nasdaq TotalView-ITCH data feed
How to parse binary order messages
Summarizing the trading activity for all 8,500 stocks
How to reconstruct all trades and the order book
From ticks to bars – how to regularize market data
The raw material – tick bars
Plain-vanilla denoising – time bars
Accounting for order fragmentation – volume bars
Accounting for price changes – dollar bars
AlgoSeek minute bars – equity quote and trade data
From the consolidated feed to minute bars
Quote and trade data fields
How to process AlgoSeek intraday data
API access to market data
Remote data access using pandas
Reading HTML tables
pandas-datareader for market data
yfinance – scraping data from Yahoo! Finance
How to download end-of-day and intraday prices
How to download the option chain and prices
Quantopian
Zipline
Quandl
Other market data providers
How to work with fundamental data
Financial statement data
Automated processing – XBRL
Building a fundamental data time series
Other fundamental data sources
pandas-datareader – macro and industry data
Efficient data storage with pandas
Summary

The alternative data revolution
Sources of alternative data
Individuals
Business processes
Sensors
Satellites
Geolocation data
Criteria for evaluating alternative data
Quality of the signal content
Asset classes
Investment style
Risk premiums
Alpha content and quality
Quality of the data
Legal and reputational risks
Exclusivity
Time horizon
Frequency
Reliability
Technical aspects
Latency
Format
The market for alternative data
Data providers and use cases
Social sentiment data
Satellite data
Geolocation data
Email receipt data
Working with alternative data
Scraping OpenTable data
Parsing data from HTML with Requests and BeautifulSoup
Introducing Selenium – using browser automation
Building a dataset of restaurant bookings and ratings
Taking automation one step further with Scrapy and Splash
Scraping and parsing earnings call transcripts
Summary

Alpha factors in practice – from data to signals
Building on decades of factor research
Momentum and sentiment – the trend is your friend
Why might momentum and sentiment drive excess returns?
How to measure momentum and sentiment
Value factors – hunting fundamental bargains
Relative value strategies
Why do value factors help predict returns?
How to capture value effects
Volatility and size anomalies
Why do volatility and size predict returns?
How to measure volatility and size
Quality factors for quantitative investing
Why quality matters
How to measure asset quality
Engineering alpha factors that predict returns
How to engineer factors using pandas and NumPy
Loading, slicing, and reshaping the data
Resampling – from daily to monthly frequency
How to compute returns for multiple historical periods
Using lagged returns and different holding periods
Computing factor betas
How to add momentum factors
Adding time indicators to capture seasonal effects
How to create lagged return features
How to create forward returns
How to use TA-Lib to create technical alpha factors
Denoising alpha factors with the Kalman filter
How does the Kalman filter work?
How to apply a Kalman filter using pykalman
How to preprocess your noisy signals using wavelets
From signals to trades – Zipline for backtests
How to backtest a single-factor strategy
A single alpha factor from market data
Built-in Quantopian factors
Combining factors from diverse data sources
Separating signal from noise with Alphalens
Creating forward returns and factor quantiles
Predictive performance by factor quantiles
The information coefficient
Factor turnover
Alpha factor resources
Alternative algorithmic trading libraries
Summary

How to measure portfolio performance
Capturing risk-return trade-offs in a single number
The Sharpe ratio
The information ratio
The fundamental law of active management
How to manage portfolio risk and return
The evolution of modern portfolio management
Mean-variance optimization
How it works
Finding the efficient frontier in Python
Challenges and shortcomings
Alternatives to mean-variance optimization
The 1/N portfolio
The minimum-variance portfolio
Global Portfolio Optimization – the Black-Litterman approach
How to size your bets – the Kelly criterion
Optimal investment – multiple assets
Risk parity
Risk factor investment
Hierarchical risk parity
Trading and managing portfolios with Zipline
Scheduling signal generation and trade execution
Implementing mean-variance portfolio optimization
Measuring backtest performance with pyfolio
Creating the returns and benchmark inputs
Getting pyfolio input from Alphalens
Getting pyfolio input from a Zipline backtest
Walk-forward testing – out-of-sample returns
Summary performance statistics
Drawdown periods and factor exposure
Modeling event risk
Summary

How machine learning from data works
The challenge – matching the algorithm to the task
Supervised learning – teaching by example
Unsupervised learning – uncovering useful patterns
Use cases – from risk management to text processing
Cluster algorithms – seeking similar observations
Dimensionality reduction – compressing information
Reinforcement learning – learning by trial and error
The machine learning workflow
Basic walkthrough – k-nearest neighbors
Framing the problem – from goals to metrics
Prediction versus inference
Regression – popular loss functions and error metrics
Classification – making sense of the confusion matrix
Collecting and preparing the data
Exploring, extracting, and engineering features
Using information theory to evaluate features
Selecting an ML algorithm
Design and tune the model
The bias-variance trade-off
Underfitting versus overfitting – a visual example
How to manage the bias-variance trade-off
Learning curves
How to select a model using cross-validation
How to implement cross-validation in Python
KFold iterator
Leave-one-out CV
Leave-P-Out CV
ShuffleSplit
Challenges with cross-validation in finance
Time series cross-validation with scikit-learn
Purging, embargoing, and combinatorial CV
Parameter tuning with scikit-learn and Yellowbrick
Validation curves – plotting the impact of hyperparameters
Learning curves – diagnosing the bias-variance trade-off
Parameter tuning using GridSearchCV and pipeline
Summary

From inference to prediction
The baseline model – multiple linear regression
How to formulate the model
How to train the model
Ordinary least squares – how to fit a hyperplane to the data
Maximum likelihood estimation
Gradient descent
The Gauss–Markov theorem
How to conduct statistical inference
How to diagnose and remedy problems
Goodness of fit
Heteroskedasticity
Serial correlation
Multicollinearity
How to run linear regression in practice
OLS with statsmodels
Stochastic gradient descent with sklearn
How to build a linear factor model
From the CAPM to the Fama–French factor models
Obtaining the risk factors
Fama–Macbeth regression
Regularizing linear regression using shrinkage
How to hedge against overfitting
How ridge regression works
How lasso regression works
How to predict returns with linear regression
Preparing model features and forward returns
Creating the investment universe
Selecting and computing alpha factors using TA-Lib
Adding lagged returns
Generating target forward returns
Dummy encoding of categorical variables
Linear OLS regression using statsmodels
Selecting the relevant universe
Estimating the vanilla OLS regression
Diagnostic statistics
Linear regression using scikit-learn
Selecting features and targets
Cross-validating the model
Evaluating the results – information coefficient and RMSE
Ridge regression using scikit-learn
Tuning the regularization parameters using cross-validation
Cross-validation results and ridge coefficient paths
Top 10 coefficients
Lasso regression using sklearn
Cross-validating the lasso model
Evaluating the results – IC and lasso path
Comparing the quality of the predictive signals
Linear classification
The logistic regression model
The objective function
The logistic function
Maximum likelihood estimation
How to conduct inference with statsmodels
Predicting price movements with logistic regression
How to convert a regression into a classification problem
Cross-validating the logistic regression hyperparameters
Evaluating the results using AUC and IC
Summary

How to backtest an ML-driven strategy
Backtesting pitfalls and how to avoid them
Getting the data right
Look-ahead bias – use only point-in-time data
Survivorship bias – track your historical universe
Outlier control – do not exclude realistic extremes
Sample period – try to represent relevant future scenarios
Getting the simulation right
Mark-to-market performance – track risks over time
Transaction costs – assume a realistic trading environment
Timing of decisions – properly sequence signals and trades
Getting the statistics right
The minimum backtest length and the deflated SR
Optimal stopping for backtests
How a backtesting engine works
Vectorized versus event-driven backtesting
Key implementation aspects
Data ingestion – format, frequency, and timing
Factor engineering – built-in factors versus libraries
ML models, predictions, and signals
Trading rules and execution
Performance evaluation
backtrader – a flexible tool for local backtests
Key concepts of backtrader's Cerebro architecture
Data feeds, lines, and indicators
From data and signals to trades – strategy
Commissions instead of commission schemes
Making it all happen – Cerebro
How to use backtrader in practice
How to load price and other data
How to formulate the trading logic
How to configure the Cerebro instance
backtrader summary and next steps
Zipline – scalable backtesting by Quantopian
Calendars and the Pipeline for robust simulations
Bundles – point-in-time data with on-the-fly adjustments
The Algorithm API – backtests on a schedule
Known issues
Ingesting your own bundles with minute data
Getting your data ready to be bundled
Writing your custom bundle ingest function
Registering your bundle
Creating and registering a custom TradingCalendar
The Pipeline API – backtesting an ML signal
Enabling the DataFrameLoader for our Pipeline
Creating a pipeline with a custom ML factor
How to train a model during the backtest
Preparing the features – how to define pipeline factors
How to design a custom ML factor
Tracking model performance during a backtest
Instead of how to use
Summary

Tools for diagnostics and feature extraction
How to decompose time-series patterns
Rolling window statistics and moving averages
How to measure autocorrelation
How to diagnose and achieve stationarity
Transforming a time series to achieve stationarity
Handling instead of how to handle
On unit roots and random walks
How to diagnose a unit root
How to remove unit roots and work with the resulting series
Time-series transformations in practice
Univariate time-series models
How to build autoregressive models
How to identify the number of lags
How to diagnose model fit
How to build moving-average models
How to identify the number of lags
The relationship between the AR and MA models
How to build ARIMA models and extensions
How to model differenced series
How to identify the number of AR and MA terms
Adding features – ARMAX
Adding seasonal differencing – SARIMAX
How to forecast macro fundamentals
How to use time-series models to forecast volatility
The ARCH model
Generalizing ARCH – the GARCH model
How to build a model that forecasts volatility
Multivariate time-series models
Systems of equations
The vector autoregressive (VAR) model
Using the VAR model for macro forecasts
Cointegration – time series with a shared trend
The Engle-Granger two-step method
The Johansen likelihood-ratio test
Statistical arbitrage with cointegration
How to select and trade comoving asset pairs
Pairs trading in practice
Distance-based heuristics to find cointegrated pairs
How well do the heuristics predict significant cointegration?
Preparing the strategy backtest
Precomputing the cointegration tests
Getting entry and exit trades
Backtesting the strategy using backtrader
Tracking pairs with a custom DataClass
Running and evaluating the strategy
Extensions – how to do better
Summary

How Bayesian machine learning works
How to update assumptions from empirical evidence
Exact inference – maximum a posteriori estimation
How to select priors
How to keep inference simple – conjugate priors
Dynamic probability estimates of asset price moves
Deterministic and stochastic approximate inference
Markov chain Monte Carlo sampling
Variational inference and automatic differentiation
Probabilistic programming with PyMC3
Bayesian machine learning with Theano
The PyMC3 workflow – predicting a recession
The data – leading recession indicators
Model definition – Bayesian logistic regression
Exact MAP inference
Approximate inference – MCMC
Approximate inference – variational Bayes
Model diagnostics
How to generate predictions
Summary and key takeaways
Bayesian ML for trading
Bayesian Sharpe ratio for performance comparison
Defining a custom probability model
Comparing the performance of two return series
Bayesian rolling regression for pairs trading
Stochastic volatility models
Summary

Decision trees – learning rules from data
How trees learn and apply decision rules
Decision trees in practice
The data – monthly stock returns and features
Building a regression tree with time-series data
Building a classification tree
Visualizing a decision tree
Evaluating decision tree predictions
Overfitting and regularization
How to regularize a decision tree
Decision tree pruning
Hyperparameter tuning
Using GridSearchCV with a custom metric
How to inspect the tree structure
Comparing regression and classification performance
Diagnosing training set size with learning curves
Gaining insight from feature importance
Strengths and weaknesses of decision trees
Random forests – making trees more reliable
Why ensemble models perform better
Bootstrap aggregation
How bagging lowers model variance
Bagged decision trees
How to build a random forest
How to train and tune a random forest
Feature importance for random forests
Out-of-bag testing
Pros and cons of random forests
Long-short signals for Japanese stocks
The data – Japanese equities
The features – lagged returns and technical indicators
The outcomes – forward returns for different horizons
The ML4T workflow with LightGBM
From universe selection to hyperparameter tuning
Sampling tickers to speed up cross-validation
Defining lookback, lookahead, and roll-forward periods
Hyperparameter tuning with LightGBM
Cross-validating signals over various horizons
Analyzing cross-validation performance
Ensembling forecasts – signal analysis using Alphalens
The strategy – backtest with Zipline
Ingesting Japanese Equities into Zipline
Running an in- and out-of-sample strategy backtest
The results – evaluation with pyfolio
Summary

Getting started – adaptive boosting
The AdaBoost algorithm
Using AdaBoost to predict monthly price moves
Gradient boosting – ensembles for most tasks
How to train and tune GBM models
Ensemble size and early stopping
Shrinkage and learning rate
Subsampling and stochastic gradient boosting
How to use gradient boosting with sklearn
How to tune parameters with GridSearchCV
Parameter impact on test scores
How to test on the holdout set
Using XGBoost, LightGBM, and CatBoost
How algorithmic innovations boost performance
Second-order loss function approximation
Simplified split-finding algorithms
Depth-wise versus leaf-wise growth
GPU-based training
DART – dropout for additive regression trees
Treatment of categorical features
Additional features and optimizations
A long-short trading strategy with boosting
Generating signals with LightGBM and CatBoost
From Python to C++ – creating binary data formats
How to tune hyperparameters
How to evaluate the results
Inside the black box – interpreting GBM results
Feature importance
Partial dependence plots
SHapley Additive exPlanations
Backtesting a strategy based on a boosting ensemble
Lessons learned and next steps
Boosting for an intraday strategy
Engineering features for high-frequency data
Minute-frequency signals with LightGBM
Evaluating the trading signal quality
Summary

Dimensionality reduction
The curse of dimensionality
Linear dimensionality reduction
Principal component analysis
Independent component analysis
Manifold learning – nonlinear dimensionality reduction
t-distributed Stochastic Neighbor Embedding
Uniform Manifold Approximation and Projection
PCA for trading
Data-driven risk factors
Preparing the data – top 350 US stocks
Running PCA to identify the key return drivers
Eigenportfolios
Clustering
k-means clustering
Assigning observations to clusters
Evaluating cluster quality
Hierarchical clustering
Different strategies and dissimilarity measures
Visualization – dendrograms
Density-based clustering
DBSCAN
Hierarchical DBSCAN
Gaussian mixture models
Hierarchical clustering for optimal portfolios
How hierarchical risk parity works
Backtesting HRP using an ML trading strategy
Ensembling the gradient boosting model predictions
Using PyPortfolioOpt to compute HRP weights
Performance comparison with pyfolio
Summary

ML with text data – from language to features
Key challenges of working with text data
The NLP workflow
Parsing and tokenizing text data – selecting the vocabulary
Linguistic annotation – relationships among tokens
Semantic annotation – from entities to knowledge graphs
Labeling – assigning outcomes for predictive modeling
Applications
From text to tokens – the NLP pipeline
NLP pipeline with spaCy and textacy
Parsing, tokenizing, and annotating a sentence
Batch-processing documents
Sentence boundary detection
Named entity recognition
N-grams
spaCy's streaming API
Multi-language NLP
NLP with TextBlob
Stemming
Sentiment polarity and subjectivity
Counting tokens – the document-term matrix
The bag-of-words model
Creating the document-term matrix
Measuring the similarity of documents
Document-term matrix with scikit-learn
Using CountVectorizer
TfidfTransformer and TfidfVectorizer
Key lessons instead of lessons learned
NLP for trading
The naive Bayes classifier
Bayes' theorem refresher
The conditional independence assumption
Classifying news articles
Sentiment analysis with Twitter and Yelp data
Binary sentiment classification with Twitter data
Multiclass sentiment analysis with Yelp business reviews
Summary

Learning latent topics – Goals and approaches
How to implement LSI using sklearn
Strengths and limitations
Probabilistic latent semantic analysis
How to implement pLSA using sklearn
Strengths and limitations
Latent Dirichlet allocation
How LDA works
The Dirichlet distribution
The generative model
Reverse engineering the process
How to evaluate LDA topics
Perplexity
Topic coherence
How to implement LDA using sklearn
How to visualize LDA results using pyLDAvis
How to implement LDA using Gensim
Modeling topics discussed in earnings calls
Data preprocessing
Model training and evaluation
Running experiments
Topic modeling for financial news
Summary

How word embeddings encode semantics
How neural language models learn usage in context
word2vec – scalable word and phrase embeddings
Model objective – simplifying the softmax
Automating phrase detection
Evaluating embeddings using semantic arithmetic
How to use pretrained word vectors
GloVe – Global vectors for word representation
Custom embeddings for financial news
Preprocessing – sentence detection and n-grams
The skip-gram architecture in TensorFlow 2
Noise-contrastive estimation – creating validation samples
Generating target-context word pairs
Creating the word2vec model layers
Visualizing embeddings using TensorBoard
How to train embeddings faster with Gensim
word2vec for trading with SEC filings
Preprocessing – sentence detection and n-grams
Automatic phrase detection
Labeling filings with returns to predict earnings surprises
Model training
Model evaluation
Performance impact of parameter settings
Sentiment analysis using doc2vec embeddings
Creating doc2vec input from Yelp sentiment data
Training a doc2vec model
Training a classifier with document vectors
Lessons learned and next steps
New frontiers – pretrained transformer models
Attention is all you need
BERT – towards a more universal language model
Key innovations – deeper attention and pretraining
Using pretrained state-of-the-art models
Trading on text data – lessons learned and next steps
Summary

Deep learning – what's new and why it matters
Hierarchical features tame high-dimensional data
DL as representation learning
How DL extracts hierarchical features from data
Good and bad news – the universal approximation theorem
How DL relates to ML and AI
Designing an NN
A simple feedforward neural network architecture
Key design choices
Hidden units and activation functions
Output units and cost functions
How to regularize deep NNs
Parameter norm penalties
Early stopping
Dropout
Training faster – optimizations for deep learning
Stochastic gradient descent
Momentum
Adaptive learning rates
Summary – how to tune key hyperparameters
A neural network from scratch in Python
The input layer
The hidden layer
The output layer
Forward propagation
The cross-entropy cost function
How to implement backprop using Python
How to compute the gradient
The loss function gradient
The output layer gradients
The hidden layer gradients
Putting it all together
Training the network
Popular deep learning libraries
Leveraging GPU acceleration
How to use TensorFlow 2
How to use TensorBoard
How to use PyTorch 1.4
How to create a PyTorch DataLoader
How to define the neural network architecture
How to train the model
How to evaluate the model predictions
Alternative options
Apache MXNet
Microsoft Cognitive Toolkit (CNTK)
Fastai
Optimizing an NN for a long-short strategy
Engineering features to predict daily stock returns
Defining an NN architecture framework
Cross-validating design options to tune the NN
Evaluating the predictive performance
Backtesting a strategy based on ensembled signals
Ensembling predictions to produce tradeable signals
Evaluating signal quality using Alphalens
Backtesting the strategy using Zipline
How to further improve the results
Summary

How CNNs learn to model grid-like data
From hand-coding to learning filters from data
How the elements of a convolutional layer operate
The convolution stage – extracting local features
The detector stage – adding nonlinearity
The pooling stage – downsampling the feature maps
The evolution of CNN architectures – key innovations
Performance breakthroughs and network size
Lessons learned
CNNs for satellite images and object detection
LeNet5 – The first CNN with industrial applications
"Hello World" for CNNs – handwritten digit classification
Defining the LeNet5 architecture
Training and evaluating the model
AlexNet – reigniting deep learning research
Preprocessing CIFAR-10 data using image augmentation
Defining the model architecture
Comparing AlexNet performance
Transfer learning – faster training with less data
Alternative approaches to transfer learning
Building on state-of-the-art architectures
Transfer learning with VGG16 in practice
Classifying satellite images with transfer learning
Object detection and segmentation
Object detection in practice
Preprocessing the source images
Transfer learning with a custom final layer
Creating a custom loss function and evaluation metrics
Fine-tuning the VGG16 weights and final layer
Lessons learned
CNNs for time-series data – predicting returns
An autoregressive CNN with 1D convolutions
Preprocessing the data
Defining the model architecture
Model training and performance evaluation
CNN-TA – clustering time series in 2D format
Creating technical indicators at different intervals
Computing rolling factor betas for different horizons
Feature selection based on mutual information
Hierarchical feature clustering
Creating and training a convolutional neural network
Assembling the best models to generate tradeable signals
Backtesting a long-short trading strategy
Summary and lessons learned
Summary

How recurrent neural nets work
Unfolding a computational graph with cycles
Backpropagation through time
Alternative RNN architectures
Output recurrence and teacher forcing
Bidirectional RNNs
Encoder-decoder architectures, attention, and transformers
How to design deep RNNs
The challenge of learning long-range dependencies
Long short-term memory – learning how much to forget
Gated recurrent units
RNNs for time series with TensorFlow 2
Univariate regression – predicting the S&P 500
How to get time series data into shape for an RNN
How to define a two-layer RNN with a single LSTM layer
Training and evaluating the model
Re-scaling the predictions
Stacked LSTM – predicting price moves and returns
Preparing the data – how to create weekly stock returns
How to create multiple inputs in RNN format
How to define the architecture using Keras' Functional API
Predicting returns instead of directional price moves
Multivariate time-series regression for macro data
Loading sentiment and industrial production data
Making the data stationary and adjusting the scale
Creating multivariate RNN inputs
Defining and training the model
RNNs for text data
LSTM with embeddings for sentiment classification
Loading the IMDB movie review data
Defining embedding and the RNN architecture
Sentiment analysis with pretrained word vectors
Preprocessing the text data
Loading the pretrained GloVe embeddings
Defining the architecture with frozen weights
Predicting returns from SEC filing embeddings
Source stock price data using yfinance
Preprocessing SEC filing data
Preparing data for the RNN model
Building, training, and evaluating the RNN model
Lessons learned and next steps
Summary

Autoencoders for nonlinear feature extraction
Generalizing linear dimensionality reduction
Convolutional autoencoders for image compression
Managing overfitting with regularized autoencoders
Fixing corrupted data with denoising autoencoders
Seq2seq autoencoders for time series features
Generative modeling with variational autoencoders
Implementing autoencoders with TensorFlow 2
How to prepare the data
One-layer feedforward autoencoder
Defining the encoder
Defining the decoder
Training the model
Evaluating the results
Feedforward autoencoder with sparsity constraints
Deep feedforward autoencoder
Visualizing the encoding
Convolutional autoencoders
Denoising autoencoders
A conditional autoencoder for trading
Sourcing stock prices and metadata information
Computing predictive asset characteristics
Creating the conditional autoencoder architecture
Lessons learned and next steps
Summary

Creating synthetic data with GANs
Comparing generative and discriminative models
Adversarial training – a zero-sum game of trickery
The rapid evolution of the GAN architecture zoo
Deep convolutional GANs for representation learning
Conditional GANs for image-to-image translation
GAN applications to images and time-series data
CycleGAN – unpaired image-to-image translation
StackGAN – text-to-photo image synthesis
SRGAN – photorealistic single image super-resolution
Synthetic time series with recurrent conditional GANs
How to build a GAN using TensorFlow 2
Building the generator network
Creating the discriminator network
Setting up the adversarial training process
Defining the generator and discriminator loss functions
The core – designing the training step
Putting it together – the training loop
Evaluating the results
TimeGAN for synthetic financial data
Learning to generate data across features and time
Combining adversarial and supervised training
The four components of the TimeGAN architecture
Joint training of an autoencoder and adversarial network
Implementing TimeGAN using TensorFlow 2
Preparing the real and random input series
Creating the TimeGAN model components
Training phase 1 – autoencoder with real data
Training phase 2 – supervised learning with real data
Training phase 3 – joint training with real and random data
Generating synthetic time series
Evaluating the quality of synthetic time-series data
Assessing diversity – visualization using PCA and t-SNE
Assessing fidelity – time-series classification performance
Assessing usefulness – train on synthetic, test on real
Lessons learned and next steps
Summary

Elements of a reinforcement learning system
The policy – translating states into actions
Rewards – learning from actions
The value function – optimal choice for the long run
With or without a model – look before you leap?
How to solve reinforcement learning problems
Key challenges in solving RL problems
Credit assignment
Exploration versus exploitation
Fundamental approaches to solving RL problems
Solving dynamic programming problems
Finite Markov decision problems
Sequences of states, actions, and rewards
Value functions – how to estimate the long-run reward
The Bellman equations
From a value function to an optimal policy
Policy iteration
Value iteration
Generalized policy iteration
Dynamic programming in Python
Setting up the gridworld
Computing the transition matrix
Implementing the value iteration algorithm
Defining and running policy iteration
Solving MDPs using pymdptoolbox
Lessons learned
Q-learning – finding an optimal policy on the go
Exploration versus exploitation – ε-greedy policy
The Q-learning algorithm
How to train a Q-learning agent using Python
Deep RL for trading with the OpenAI Gym
Value function approximation with neural networks
The Deep Q-learning algorithm and extensions
(Prioritized) Experience replay – focusing on past mistakes
The target network – decorrelating the learning process
Double deep Q-learning – decoupling action and prediction
Introducing the OpenAI Gym
How to implement DDQN using TensorFlow 2
Creating the DDQN agent
Adapting the DDQN architecture to the Lunar Lander
Memorizing transitions and replaying the experience
Setting up the OpenAI environment
Key hyperparameter choices
Lunar Lander learning performance
Creating a simple trading agent
How to design a custom OpenAI trading environment
Designing a DataSource class
The TradingSimulator class
The TradingEnvironment class
Registering and parameterizing the custom environment
Deep Q-learning on the stock market
Adapting and training the DDQN agent
Benchmarking DDQN agent performance
Lessons learned
Summary

Key takeaways and lessons learned
Data is the single most important ingredient
The new oil? Quality control for raw and intermediate data
Data integration – the whole exceeds the sum of its parts
Domain expertise – telling the signal from the noise
ML is a toolkit for solving problems with data
Model diagnostics help speed up optimization
Making do without a free lunch
Managing the bias-variance trade-off
Defining targeted model objectives
The optimization verification test
Beware of backtest overfitting
How to gain insights from black-box models
ML for trading in practice
Data management technologies
Database systems
Big data technologies – from Hadoop to Spark
ML tools
Online trading platforms
Quantopian
QuantConnect
QuantRocket
Conclusion

Common alpha factors implemented in TA-Lib
A key building block – moving averages
Simple moving average
Exponential moving average
Weighted moving average
Double exponential moving average
Triple exponential moving average
Triangular moving average
Kaufman adaptive moving average
MESA adaptive moving average
Visual comparison of moving averages
Overlap studies – price and volatility trends
Bollinger Bands
Parabolic SAR
Momentum indicators
Average directional movement indicators
Aroon Oscillator
Balance of power
Commodity channel index
Moving average convergence divergence
Stochastic relative strength index
Stochastic oscillator
Ultimate oscillator
Volume and liquidity indicators
Chaikin accumulation/distribution line and oscillator
On-balance volume
Volatility indicators
Average true range
Normalized average true range
Fundamental risk factors
WorldQuant's quest for formulaic alphas
Cross-sectional and time-series functions
Formulaic alpha expressions
Alpha 001
Alpha 054
Bivariate and multivariate factor evaluation
Information coefficient and mutual information
Feature importance and SHAP values
Comparison – the top 25 features for each metric
Financial performance – Alphalens

Content preview from Machine Learning for Algorithmic Trading - Second Edition

18 CNNs for Financial Time Series and Satellite Images

In this chapter, we introduce the first of several specialized deep learning architectures that we will cover in Part 4. Deep convolutional neural networks (CNNs) have enabled superhuman performance in various computer vision tasks, such as classifying images and video and detecting and recognizing objects in images. CNNs can also extract signals from time-series data that share certain characteristics with image data, and they have been successfully applied to speech recognition (Abdel-Hamid et al. 2014). Moreover, they have been shown to deliver state-of-the-art performance on time-series classification across various domains (Ismail Fawaz et al. 2019).
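
As a rough illustration of how a convolutional filter picks up local patterns in a time series, the following minimal sketch slides two hand-coded filters over a toy price series using NumPy. The helper `conv1d_valid`, the filter values, and the prices are hypothetical and not taken from the book; a CNN would learn its filter weights from data rather than have them specified by hand.

```python
import numpy as np


def conv1d_valid(series, kernel):
    """Slide `kernel` along `series` and take the dot product at each position.

    No padding is applied, so the output has len(series) - len(kernel) + 1
    entries. Like most deep learning libraries, this computes a
    cross-correlation, i.e. the kernel is not flipped.
    """
    series = np.asarray(series, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    n_out = len(series) - len(kernel) + 1
    return np.array([series[i:i + len(kernel)] @ kernel for i in range(n_out)])


# Toy daily closing prices (illustrative values only)
prices = np.array([100.0, 101.0, 103.0, 102.0, 105.0, 107.0])

# Two hand-coded filters
diff_filter = np.array([-1.0, 1.0])  # responds to one-step price changes
smooth_filter = np.ones(3) / 3       # three-period moving average

changes = conv1d_valid(prices, diff_filter)
smoothed = conv1d_valid(prices, smooth_filter)

print(changes)   # one-step price changes
print(smoothed)  # three-period moving averages
```

Each filter only looks at a small window of neighboring observations and applies the same weights at every position, which is the property that time series share with the local pixel neighborhoods of an image.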

CNNs are named after a linear algebra ...