May 2020
Intermediate to advanced
404 pages
10h 52m
English
Even though the website or app behind the project might look like an ideal source of data, the data it produces must not be assumed to be error-free. Failed network requests, malicious connections, or simply garbage input from users can all yield data that is unfit for training. A non-malicious user with network issues may refresh the same page 10 to 20 times in a short time frame, and those repeated views should not add to the viewing-based importance of that page. All data collected from the website must be cleaned and filtered according to the requirements of the model. It must be kept in mind that the challenges faced by websites will almost certainly affect the ...
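The refresh scenario above can be illustrated with a minimal Python sketch. It is only one possible filtering rule, not a method taken from the book: the `PageView` record, its field names, the `collapse_rapid_refreshes` helper, and the 60-second window are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class PageView(NamedTuple):
    # Hypothetical log record; the field names are assumptions.
    user_id: str
    page: str
    timestamp: float  # seconds since some reference point


def collapse_rapid_refreshes(
    views: Iterable[PageView], window_seconds: float = 60.0
) -> List[PageView]:
    """Keep one view per burst of refreshes by the same user on the same page."""
    last_seen = {}  # (user_id, page) -> timestamp of the most recent hit
    kept = []
    for view in sorted(views, key=lambda v: v.timestamp):
        key = (view.user_id, view.page)
        previous = last_seen.get(key)
        # Count the view only if this user has not hit this page recently.
        if previous is None or view.timestamp - previous > window_seconds:
            kept.append(view)
        # Track every hit, kept or not, so a long run of refreshes
        # stays collapsed into a single counted view.
        last_seen[key] = view.timestamp
    return kept


def view_counts(views: Iterable[PageView]) -> Counter:
    """Viewing-based importance, here simply the number of counted views per page."""
    return Counter(view.page for view in views)


if __name__ == "__main__":
    # One user refreshing /pricing 15 times, 2 seconds apart.
    raw = [PageView("u1", "/pricing", 2.0 * i) for i in range(15)]
    print(view_counts(raw))
    print(view_counts(collapse_rapid_refreshes(raw)))
```

Because the window is measured from the most recent hit rather than from the last counted view, a user who keeps refreshing for several minutes still contributes a single view. Whether that is the right behaviour depends on the requirements of the model, so the window length should be treated as a tunable parameter.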