How data wrangling enables you to take real-world data and “clean it, organize it, validate it, and put it in some format you can actually work with,” says Jarmul.
Why Python has become a preferred language for use in data science: Jarmul cites the accessibility of the language and the emergence of packages such as NumPy, pandas, SciPy, and scikit-learn.
Jarmul calls pandas “Excel on steroids” and says, “it allows you to manipulate tabular data, and transform it quite easily. For anyone using structured, tabular data, you can’t go wrong with doing some part of your analysis in pandas.”
She cites gensim and spaCy as her favorite NLP Python libraries, praising them for “the ability to just install a library and have it do quite a lot of deep learning or machine learning tasks for you.”