Playwright Amazon Scraper: Products & Reviews (JavaScript)
Olivia Novak
Dev Intern · Leapcell

Web Automation and Data Collection with Playwright (Node.js Version)
Playwright is a library for testing and automating web pages that supports the Chromium, Firefox, and WebKit browsers. Developed by Microsoft, it is efficient, reliable, and fast, which makes it well suited to cross-browser web automation.
Collecting Amazon Product Information with Playwright
With Playwright, you can simulate user actions such as visiting Amazon (www.amazon.com) and crawling product information and reviews. CSS selectors or XPath let you locate page elements precisely and extract their text or attributes.
Example: Crawling the Amazon Best Sellers List
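The following steps use Playwright to collect Amazon's international best sellers list:
- Visit the target page, e.g. https://www.amazon.com/b/?ie=UTF8&node=16857165011&ref_=sv_b_3
- Select all book elements (using the class names a-section and a-spacing-base)
- Iterate over the book elements and extract information such as title, price, rating, and review count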
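The three steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the card selector `.a-section.a-spacing-base` comes from the step list, but the inner selectors for title, price, rating, and review count are guesses (Amazon's markup changes often), and `normalizeBook` and `scrapeBestSellers` are hypothetical helper names.

```javascript
// Hypothetical helper: trim each scraped field and replace missing ones with 'N/A'.
function normalizeBook(raw) {
  const clean = (value) =>
    typeof value === 'string' && value.trim() ? value.trim() : 'N/A';
  return {
    title: clean(raw.title),
    price: clean(raw.price),
    rating: clean(raw.rating),
    reviewCount: clean(raw.reviewCount),
  };
}

// Sketch of the steps: visit the page, select the book cards, extract the fields.
async function scrapeBestSellers(url) {
  // Loaded lazily so normalizeBook can be used without Playwright installed.
  const { chromium } = require('playwright');
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'domcontentloaded' });

    // Card selector taken from the steps above; every inner selector is an assumption.
    const rawBooks = await page.$$eval('.a-section.a-spacing-base', (cards) =>
      cards.map((card) => {
        const text = (selector) => {
          const el = card.querySelector(selector);
          return el ? el.textContent : null;
        };
        return {
          title: text('h2'),                               // assumed
          price: text('.a-price .a-offscreen'),            // assumed
          rating: text('.a-icon-alt'),                     // assumed
          reviewCount: text('.a-size-small .a-link-normal'), // assumed
        };
      })
    );
    return rawBooks.map(normalizeBook);
  } finally {
    await browser.close();
  }
}

// Usage (requires Playwright and network access):
// scrapeBestSellers('https://www.amazon.com/b/?ie=UTF8&node=16857165011&ref_=sv_b_3')
//   .then((books) => console.log(books));
```

Keeping the field cleanup in a separate function means it can be checked without opening a browser, and the selectors can be adjusted in one place when the page layout changes.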
Deploying a Playwright Example on Leapcell
Playwright Deployment Example on Leapcell
This guide provides a streamlined way to deploy Playwright tests on Leapcell. For a step-by-step tutorial, see the link above.
Node.js Implementation Code
The following is a data collection implementation using Node.js and Playwright.
const { chromium } = require('playwright');

(async () => {
  // Launch the browser
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext();
  const page = await context.newPage();

  // Visit the Amazon home page
  await page.goto('https://www.amazon.com/');

  // Search for the keyword "laptop"
  await page.fill('#twotabsearchtextbox', 'laptop');
  await page.click('#nav-search-submit-button');

  // Wait for the results page to finish loading
  await page.waitForLoadState('networkidle');

  // Get the list of product links
  const links = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('.s-result-item h2 a'))
      .map(a => a.href);
  });

  // Collect product details data
  const results = [];
  for (const link of links) {
    const productPage = await context.newPage();
    await productPage.goto(link, { waitUntil: 'networkidle' });

    // Fall back to 'N/A' when an element is missing so one product cannot crash the run
    const title = await productPage.textContent('#productTitle').catch(() => 'N/A');
    const rating = await productPage.textContent('#averageCustomerReviews .a-icon-alt').catch(() => 'N/A');
    const reviewCount = await productPage.textContent('#acrCustomerReviewText').catch(() => 'N/A');

    results.push({ title: (title || 'N/A').trim(), rating, reviewCount });
    await productPage.close();
  }

  // Output the collected data
  console.log(results);

  // Close the browser
  await browser.close();
})();
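One caveat about the listing above: `textContent` waits for its selector to appear, up to the default timeout, before it rejects, so a product without a rating can stall the loop for a long time before `.catch()` fires. A possible workaround is a small wrapper with a short timeout. The name `safeText` and the 2000 ms default are assumptions for illustration, not part of Playwright.

```javascript
// Hypothetical helper: read an element's text, giving up quickly if it is absent.
// `page` is any object with Playwright's page.textContent(selector, options) method.
async function safeText(page, selector, { timeout = 2000, fallback = 'N/A' } = {}) {
  try {
    const text = await page.textContent(selector, { timeout });
    return text === null ? fallback : text.trim();
  } catch (err) {
    // A timeout (the element never appeared) or a similar lookup error lands here.
    return fallback;
  }
}

// Usage inside the product loop:
// const rating = await safeText(productPage, '#averageCustomerReviews .a-icon-alt');
```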
Code Analysis
- Initializing Playwright: Use chromium.launch({ headless: true }) to launch the browser.
- Navigating to the Amazon Search Page: Use page.goto() to visit the website, fill in the search box, and submit the search.
- Extracting Product Links: Use document.querySelectorAll() to get the URLs of all products.
- Collecting Product Details:
  - Open each product's page.
  - Get the product title (#productTitle).
  - Get the rating (#averageCustomerReviews .a-icon-alt).
  - Get the number of reviews (#acrCustomerReviewText).
- Outputting Data and Closing the Browser
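The rating and review count come back as display strings, so a post-processing step is usually needed before analysis. The sketch below assumes the US English formats "4.5 out of 5 stars" and "1,234 ratings"; other Amazon locales may use different wording and separators. `parseRating` and `parseReviewCount` are hypothetical helper names.

```javascript
// Parse text such as "4.5 out of 5 stars" into 4.5; returns null if no number is found.
function parseRating(text) {
  const match = /(\d+(?:\.\d+)?)/.exec(text || '');
  return match ? Number(match[1]) : null;
}

// Parse text such as "1,234 ratings" into 1234; returns null if no digits are found.
function parseReviewCount(text) {
  const match = /(\d[\d,]*)/.exec(text || '');
  return match ? parseInt(match[1].replace(/,/g, ''), 10) : null;
}

// Usage on one collected record:
// const record = { rating: '4.5 out of 5 stars', reviewCount: '1,234 ratings' };
// parseRating(record.rating); parseReviewCount(record.reviewCount);
```

Returning null rather than 'N/A' for missing values keeps the numeric fields easy to filter or average later.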
Code Optimization
- Error Handling: Some products may not have ratings or review counts. Use .catch(() => 'N/A') to prevent the code from crashing.
- Automation Efficiency: Use await context.newPage() to open each product in the existing browser context rather than launching a new browser, which keeps page loads fast.
- Avoiding Being Blocked:
  - You can route traffic through a proxy (such as Playwright's proxy option).
  - You can adjust the userAgent to make requests look more like a real user.
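The proxy and user agent suggestions might be wired up as in the sketch below. It relies on the `proxy` option of `chromium.launch()` and the `userAgent` option of `browser.newContext()`. The helper names `buildBrowserOptions` and `openPage`, the proxy address, and the user agent string are placeholders for illustration, not recommended values.

```javascript
// Hypothetical helper: build Playwright launch and context options from simple settings.
function buildBrowserOptions({ proxyServer, userAgent } = {}) {
  const launchOptions = { headless: true };
  if (proxyServer) launchOptions.proxy = { server: proxyServer };

  const contextOptions = {};
  if (userAgent) contextOptions.userAgent = userAgent;

  return { launchOptions, contextOptions };
}

// Open a page with those options applied. The caller must close the returned browser.
async function openPage(settings) {
  // Loaded lazily so buildBrowserOptions can be used without Playwright installed.
  const { chromium } = require('playwright');
  const { launchOptions, contextOptions } = buildBrowserOptions(settings);
  const browser = await chromium.launch(launchOptions);
  const context = await browser.newContext(contextOptions);
  const page = await context.newPage();
  return { browser, page };
}

// Usage (placeholder values):
// const { browser, page } = await openPage({
//   proxyServer: 'http://proxy.example.com:8080',
//   userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
// });
// await page.goto('https://www.amazon.com/');
// await browser.close();
```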
Using Playwright and Node.js, we can efficiently automate Amazon web page data collection, which is suitable for scenarios such as e-commerce data analysis and competitor research.
Leapcell: The Next-Gen Serverless Platform for Web Hosting, Async Tasks, and Redis
Finally, I would like to recommend the best platform for deploying Playwright: Leapcell
1. Multi-Language Support
- Develop with JavaScript, Python, Go, or Rust.
2. Deploy unlimited projects for free
- Pay only for usage: no requests, no charges.
3. Unbeatable Cost Efficiency
- Pay-as-you-go with no idle charges.
- Example: $25 supports 6.94M requests at a 60ms average response time.
4. Streamlined Developer Experience
- Intuitive UI for effortless setup.
- Fully automated CI/CD pipelines and GitOps integration.
- Real-time metrics and logging for actionable insights.
5. Effortless Scalability and High Performance
- Auto-scaling to handle high concurrency with ease.
- Zero operational overhead — just focus on building.
Explore more in the documentation!
Leapcell Twitter: https://x.com/LeapcellHQ