Seedling
An introduction to scraping data efficiently
Planted: 2022-12-17
Last Tended: 2022-12-17
Recently, I have been interested in coats. Suppose I want to know when promotions or new collections become available on the Uniqlo website.
A basic scraping method I can use to solve the problem is using BeautifulSoup and extracting data by inspecting HTML elements.
But this isn't an efficient method because:
- If the website contains many data fields: I need to spend a lot of time inspecting HTML elements
- If the website doesn't load all of its content on the initial page load: I need to use selenium and simulate a user's behaviour, such as scrolling and clicking buttons
More efficient method — inspecting their APIs
(Thanks to my friend for guiding me)
Modern websites use APIs to fetch the data they display on their pages. You can inspect API responses using the web browser's developer tools.
In Chrome, the shortcut is Option + Cmd + I on Mac and F12 on Windows.
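API responses shown in the browser's tools are very often JSON, which is why this route saves so much HTML inspection. Here is a minimal sketch of reading one, assuming a response body I copied by hand; the field names (`items`, `name`, `price`, `on_sale`) are made up for illustration and are not Uniqlo's real schema.

```python
import json

# Hypothetical response body, shaped like something copied from the browser tools.
# The field names are invented for this example.
raw_response = """
{"items": [
  {"name": "Wool coat", "price": 149.9, "on_sale": true},
  {"name": "Down jacket", "price": 99.9, "on_sale": false}
]}
"""

# The data is already structured, so there are no HTML elements to dig through.
data = json.loads(raw_response)
on_sale = [item["name"] for item in data["items"] if item["on_sale"]]
print(on_sale)
```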
What you have to do is find an API call that contains the data you need, then rebuild that request (I would call this ‘forging’ it) and send it yourself.
You need to pay attention to these things:
- request headers
- request method: GET? POST?
- payload: GET's query parameters
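The checklist above can be sketched with Python's standard library. This is only a rough sketch: the URL, the query parameters and the header values are placeholders I invented, so replace them with what you actually see in the request you are copying. `build_request` and `fetch_json` are hypothetical helper names, not part of any library.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint; copy the real URL from the request you inspected.
API_URL = "https://example.com/api/products"


def build_request(url, params=None, headers=None, method="GET"):
    """Rebuild a request seen in the browser: URL, query parameters, headers, method."""
    if params:
        # For a GET request, the payload travels as query parameters in the URL.
        url = f"{url}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers=headers or {}, method=method)


def fetch_json(request, timeout=10):
    """Send the rebuilt request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder parameters and headers, mirroring the three items in the checklist.
request = build_request(
    API_URL,
    params={"category": "coats", "limit": 20},
    headers={"User-Agent": "Mozilla/5.0", "Accept": "application/json"},
    method="GET",
)
print(request.get_method(), request.full_url)
```

Calling `fetch_json(request)` would actually send it. I keep building and sending separate so I can compare the rebuilt request with the one in the browser before hitting the server.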