x-crawl
x-crawl is a flexible Node.js multifunctional crawler library. Its flexible usage and numerous features help you crawl pages, interfaces, and files quickly, safely, and stably.
If you like x-crawl, you can give the x-crawl repository a star on GitHub. Thank you for your support!
Features
- 🔥 Asynchronous/Synchronous – Switch between asynchronous and synchronous crawling just by changing the mode property (see the configuration sketch after this list).
- ⚙️ Multiple uses – Supports crawling dynamic pages, static pages, interface data, and files, as well as polling operations.
- ⚒️ Page control – When crawling dynamic pages, supports automated operations, keyboard input, event handling, and more.
- 🖋️ Flexible writing style – The same crawling API accepts multiple configuration styles, each suited to a different use case.
- ⏱️ Interval crawling – Use no interval, a fixed interval, or a random interval to produce or avoid bursts of concurrent requests.
- 🔄 Failed retry – Avoid crawl failures caused by transient problems, with a customizable number of retries.
- ➡️ Proxy rotation – Automatically rotate proxies on failure, with customizable error counts and HTTP status codes.
- 👀 Device fingerprinting – Zero or custom configuration to help prevent fingerprint-based identification and tracking across locations.
- 🚀 Priority queue – Assign a priority to an individual crawl target so it is crawled ahead of other targets.
- 🧾 Crawl logs – Logs crawl progress with colored output in the terminal.
- 🦾 TypeScript – Ships with its own type definitions, with complete typing implemented through generics.
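Several of these features come together in the crawler configuration. The sketch below shows how the mode, intervalTime, maxRetry, proxy, and per-target priority options might fit together; it is a minimal illustration based on the x-crawl documentation, so verify the property names against the version you install (the URLs are placeholders):

```js
import xCrawl from 'x-crawl'

// Minimal configuration sketch; property names follow the x-crawl docs,
// verify them against the version you install. URLs are placeholders.
const crawler = xCrawl({
  mode: 'async', // or 'sync' to crawl targets one after another
  intervalTime: { max: 3000, min: 1000 }, // random interval between requests
  maxRetry: 2, // retry a failed target up to 2 times
  proxy: {
    urls: ['http://localhost:14892'], // proxies to rotate through
    switchByErrorCount: 3, // switch proxy after 3 failed attempts
    switchByHttpStatus: [401, 403] // or when these status codes are returned
  }
})

// Detailed targets can carry a priority: higher values are crawled first
crawler.crawlData({
  targets: [
    { url: 'https://www.example.com/api/b', priority: 1 },
    { url: 'https://www.example.com/api/a', priority: 9 }
  ]
})
```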
Example
As an example, let's automatically fetch some photos of experiences and homes from around the world every day:
```js
// 1. Import module ES/CJS
import xCrawl from 'x-crawl'

// 2. Create a crawler instance
const myXCrawl = xCrawl({ maxRetry: 3, intervalTime: { max: 2000, min: 1000 } })

// 3. Set the crawling task
/*
  Call the startPolling API to start the polling function,
  and the callback function will be called every other day
*/
myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
  // Call the crawlPage API to crawl the page
  const pageResults = await myXCrawl.crawlPage({
    targets: [
      'https://www.airbnb.cn/s/*/experiences',
      'https://www.airbnb.cn/s/plus_homes'
    ],
    viewport: { width: 1920, height: 1080 }
  })

  // Obtain the image URLs by traversing the crawled page results
  const imgUrls = []
  for (const item of pageResults) {
    const { id } = item
    const { page } = item.data
    const elSelector = id === 1 ? '.i9cqrtb' : '.c4mnd7m'

    // Wait for the page element to appear
    await page.waitForSelector(elSelector)

    // Get the URLs of the page images
    const urls = await page.$$eval(`${elSelector} picture img`, (imgEls) =>
      imgEls.map((imgEl) => imgEl.src)
    )
    imgUrls.push(...urls.slice(0, 6))

    // Close the page
    page.close()
  }

  // Call the crawlFile API to crawl the pictures
  await myXCrawl.crawlFile({ targets: imgUrls, storeDirs: './upload' })
})
```
Running result: the crawled images are saved to the ./upload directory.
Note: Please do not crawl sites indiscriminately; check the site's robots.txt protocol before crawling. The class names of the website may change; this example only demonstrates how to use x-crawl.
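A quick way to review a site's robots.txt before crawling is a plain HTTP request. This is a minimal sketch using Node's built-in fetch (Node 18+); the URL is a placeholder:

```js
// Minimal sketch: fetch and review a site's robots.txt before crawling.
// Requires Node 18+ for the built-in fetch; the URL is a placeholder.
const res = await fetch('https://www.example.com/robots.txt')
if (res.ok) {
  // Inspect the Allow/Disallow rules for the paths you plan to crawl
  console.log(await res.text())
} else {
  console.log(`No robots.txt found (HTTP ${res.status})`)
}
```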
More
More content can be viewed at: https://shortlinker.in/JsIGOY
