Seedling

An introduction to scraping data efficiently

Planted: 2022-12-17
Last Tended: 2022-12-17

Recently, I have been interested in coats. Suppose I want to know when promotions or new collections become available on the Uniqlo website.

A basic way to solve this is to scrape the page with Beautiful Soup, inspecting the HTML elements and extracting the data from them.
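As a rough illustration of that basic approach, here is a minimal sketch. The HTML snippet, the class names (`product`, `name`, `price`) and the `extract_products` helper are all made up for the example; the real Uniqlo markup is not shown in this post and will certainly differ.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a product listing page.
# In practice you would download the page first and find the
# right elements with the browser's inspector.
SAMPLE_HTML = """
<ul class="products">
  <li class="product"><span class="name">Wool coat</span><span class="price">129.90</span></li>
  <li class="product"><span class="name">Down parka</span><span class="price">99.90</span></li>
</ul>
"""


def extract_products(html):
    """Return a list of {"name", "price"} dicts, one per product element."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for item in soup.select("li.product"):
        # Each field needs its own selector, found by inspecting the HTML.
        name = item.select_one(".name").get_text(strip=True)
        price = float(item.select_one(".price").get_text(strip=True))
        products.append({"name": name, "price": price})
    return products


products = extract_products(SAMPLE_HTML)
```

Every extra field means one more selector to find by hand, which is where the time goes.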

But this isn't an efficient method because:

  • If the website contains many data fields: I need to spend a lot of time inspecting HTML elements
  • If the website doesn't load all of its content on the first page load: I need to use Selenium and simulate user behaviour such as scrolling and clicking buttons

A more efficient method: inspecting their APIs

(Thanks to my friend for guiding me)

Modern websites often call APIs to fetch the data they display on their pages. You can inspect these API responses with the web browser's developer tools, typically under the Network tab.

To open them in Chrome: Option + Cmd + I on macOS, F12 on Windows.

What you have to do is find an API call that returns the data you need, then reproduce that request (I would call it ‘forging’ the request) and send it yourself.

  • You need to pay attention to these things:
    • request headers
    • request method: GET? POST?
    • payload: query parameters for GET, request body for POST
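Putting those three pieces together, forging a request could look roughly like the sketch below. It uses Python's standard-library `urllib` so it runs without extra packages. The URL, the query parameters, the header values and the helper names (`build_request`, `fetch_json`) are placeholders I made up; copy the real values from the request you found in the Network tab. It also assumes that a POST body is sent as JSON, which may not hold for every API.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(base_url, method="GET", params=None, headers=None, body=None):
    """Rebuild a request from the pieces seen in the browser's Network tab."""
    # Payload for GET: query parameters appended to the URL.
    query = urlencode(params or {})
    url = f"{base_url}?{query}" if query else base_url

    # Payload for POST: assumed here to be a JSON-encoded request body.
    data = json.dumps(body).encode("utf-8") if body is not None else None

    all_headers = dict(headers or {})
    if data is not None:
        all_headers.setdefault("Content-Type", "application/json")

    return Request(url, data=data, headers=all_headers, method=method)


def fetch_json(request, timeout=10):
    """Send the request and decode the response, assuming it is JSON."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder values, not a real Uniqlo endpoint.
request = build_request(
    "https://example.com/api/products",
    params={"category": "coats", "limit": 20},
    headers={"User-Agent": "Mozilla/5.0", "Accept": "application/json"},
)
```

Calling `fetch_json(request)` would then send it. Building the request and sending it are kept separate so you can compare what you built against the one in the browser before anything goes over the network.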