17 Case study 4 solution
This section covers
- Parsing text from HTML
- Computing text similarities
- Clustering and exploring large text datasets
We have downloaded thousands of job postings by using this book’s table of contents for case studies 1 through 4 as the search query (see the problem statement for details). Besides the downloaded postings, we also have two text files at our disposal: resume.txt and table_of_contents.txt. The first file contains a resume draft, and the second contains the truncated table of contents used to query for job listings. Our goal is to extract common data science skills from the downloaded job postings. Then we’ll compare these skills to our resume to determine which skills are missing. We will do so as follows:
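To make the goal concrete, here is a minimal sketch of the idea on toy data: strip the HTML from one posting, compare its word counts to a resume with cosine similarity, and list the words the resume lacks. The sample strings and the helper names (`TextExtractor`, `html_to_text`, `word_counts`, `cosine_similarity`) are assumptions made for illustration, and the sketch uses only the Python standard library; the case study itself may rely on different tools and a more refined similarity measure.

```python
from collections import Counter
from html.parser import HTMLParser
import math
import re


class TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping script and style content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the text of an HTML string with whitespace normalized."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def word_counts(text):
    """Count lowercase alphabetic words in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two word-count vectors."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for resume.txt and one downloaded job posting.
resume_text = "Experienced in Python, statistics, and data visualization."
posting_html = (
    "<html><body><h1>Data Scientist</h1>"
    "<ul><li>Python and statistics</li>"
    "<li>Machine learning and clustering</li></ul></body></html>"
)

posting_text = html_to_text(posting_html)
resume_counts = word_counts(resume_text)
posting_counts = word_counts(posting_text)

similarity = cosine_similarity(resume_counts, posting_counts)
missing_words = sorted(set(posting_counts) - set(resume_counts))

print(f"Similarity: {similarity:.2f}")
print(f"Words in the posting but not in the resume: {missing_words}")
```

Raw word counts treat every word equally, so common words such as "and" inflate the score. With thousands of postings, a weighting scheme that discounts ubiquitous words would likely give a more useful comparison.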