Chapter 8. Scaling Up on Google Cloud
In this chapter, we will work through how to scale up our entity resolution process so that we can match large datasets in a reasonable timeframe. We will use a cluster of virtual machines running in parallel on Google Cloud Platform (GCP) to divide up the workload and reduce the time taken to resolve our entities.
We will walk through how to register a new account on GCP and how to configure the storage and compute services we will need. Once our infrastructure is ready, we will rerun our company matching example from Chapter 6, splitting both the model training and entity resolution steps across a managed cluster of compute resources.
Lastly, we will check that our performance is consistent and make sure we tidy up fully, deleting the cluster and releasing the virtual machines we have borrowed so that we don’t continue to incur fees.