
Web Scraper

Generates browser console scripts to extract data from paginated websites. Accumulates results across pages via localStorage, then processes and cleans the output.

/web-scrape
X-Ray: what this skill can and cannot do

  • Shell access: node (runs processing scripts)
  • Network calls: yes (image downloads with a User-Agent header)
  • File writes: yes (scripts, JSON, images)
  • File reads: user-specified files only
  • Destructive ops: no
  • Credential access: no
  • Scope: project only

Why this exists

Many websites have paginated data (portfolios, listings, directories) with no export button. You need a way to extract it without setting up Puppeteer, Selenium, or a full scraping framework. This skill generates a browser console script you paste on each page. It accumulates data in localStorage as you navigate, then downloads everything as clean JSON.

No dependencies. No login tokens. No headless browser. Just paste, navigate, paste, download.

How it works

Open DevTools → Paste Script → Navigate Pages → Download JSON → Process & Use
  1. You tell Claude what to scrape. Give it the URL, tell it what fields you want (title, image, price, link, etc.), and how the site paginates (numbered pages, a load-more button, or infinite scroll). Claude asks follow-up questions if anything is unclear.
  2. Claude generates a browser console script. A custom JavaScript snippet tailored to that specific site. It uses CSS selectors to find the data on the page, deduplicates automatically, and stores everything in your browser's localStorage so nothing is lost between pages (see the sketch after this list).
  3. You paste the script on each page. Open DevTools, paste, press Enter. Navigate to the next page. Paste again. Repeat. Each paste adds new items to the running total. You can paste on the same page twice without creating duplicates.
  4. You download the collected data. When you have scraped all the pages, type downloadData() in the console. A JSON file downloads to your computer with every item collected across all pages.
  5. Claude processes and cleans the output. Hand the JSON file back to Claude. It generates a Node.js script to deduplicate, normalise text, filter out incomplete entries, and optionally download all images locally. You end up with clean, ready-to-use data.
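
For a concrete picture, here is a minimal sketch of the kind of console script step 2 produces. The selectors (.item, .title), the storage key, and the field used for deduplication are all placeholders; the real script uses selectors matched to the target site's markup.

(() => {
  const KEY = 'scrapeResults'; // placeholder localStorage key
  const stored = JSON.parse(localStorage.getItem(KEY) || '[]');
  const seen = new Map(stored.map(item => [item.link, item]));

  // Placeholder selectors; the generated script uses site-specific ones.
  document.querySelectorAll('.item').forEach(el => {
    const link = el.querySelector('a')?.href;
    if (!link || seen.has(link)) return; // re-pasting never duplicates
    seen.set(link, {
      title: el.querySelector('.title')?.textContent.trim(),
      imageUrl: el.querySelector('img')?.src,
      link,
    });
  });

  const all = [...seen.values()];
  localStorage.setItem(KEY, JSON.stringify(all));
  console.log(`%cPage scraped! Total collected: ${all.length}`, 'color: green');

  // The three utility commands described under "Console output" below.
  window.checkCount = () => console.log(`${all.length} items collected so far`);
  window.clearData = () => localStorage.removeItem(KEY);
  window.downloadData = () => {
    const blob = new Blob([JSON.stringify(all, null, 2)], { type: 'application/json' });
    const a = Object.assign(document.createElement('a'), {
      href: URL.createObjectURL(blob),
      download: 'scraped-data.json',
    });
    a.click();
  };
})();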

Step by step (for first-time users)

Never used browser DevTools before? Follow this exactly.

  1. Open Claude Code and type /web-scrape. Claude will ask you a few questions: what website, what data you want, and how the pages work. Answer in plain English. You do not need to know CSS selectors or JavaScript.
  2. Claude gives you a script. Copy it. Select all the code Claude outputs and copy it to your clipboard (Ctrl+C on Windows, Cmd+C on Mac).
  3. Open the website you want to scrape. Go to the first page of whatever you want to extract. Make sure you are logged in if the content requires it.
  4. Open DevTools. Press F12 (or Cmd+Option+I on Mac). A panel opens on the right or bottom of your browser. Click the tab that says Console.
  5. Paste the script and press Enter. Click inside the Console area, press Ctrl+V (or Cmd+V), then press Enter. You will see a green message confirming how many items were found on this page.
  6. Go to the next page. Click the "Next" button, or page 2, or scroll down if it is infinite scroll. The page changes. Your data is safely stored.
  7. Paste the script again and press Enter. Same thing. Paste, Enter. The counter goes up. New items are added to your total.
  8. Repeat until you have covered all pages. Keep going: navigate, paste, Enter. The console message tells you the running total each time.
  9. Download your data. When you are done, type downloadData() in the console and press Enter. A .json file downloads to your computer.
  10. Give the file back to Claude. Go back to Claude Code and tell it you have the file. Claude will process it: remove duplicates, clean up the text, and give you the final output ready to use.

Console output

Each time you paste the script on a new page, you see this in your console:

Page scraped!
  New items found: 48
  Total collected: 192
  Next: go to the next page and paste this script again.

Three utility commands are always available:

  • checkCount() — see how many items you have so far
  • downloadData() — save the JSON when you are done
  • clearData() — start over if something went wrong

Data structure

The downloaded JSON contains an array of objects. Fields are configured per project. A typical output looks like this:

[
  {
    "title": "Project Name",
    "imageUrl": "https://cdn.example.com/img/project.jpg",
    "category": "Design",
    "date": "2025-11",
    "slug": "project-name",
    "link": "https://example.com/projects/project-name"
  },
  {
    "title": "Another Project",
    "imageUrl": "https://cdn.example.com/img/another.jpg",
    "category": "Illustration",
    "date": "2025-09",
    "slug": "another-project",
    "link": "https://example.com/projects/another-project"
  }
]
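
To make the processing step concrete, here is a sketch of the kind of Node.js cleanup script Claude generates for data shaped like the example above. The required fields and the dedupe key are assumptions; the real rules are decided per project.

// cleanup.js: a sketch only; field names match the example output above.
const fs = require('fs');

const raw = JSON.parse(fs.readFileSync('scraped-data.json', 'utf8'));

const seen = new Set();
const clean = raw
  .filter(item => item.title && item.link)  // drop incomplete entries
  .map(item => ({ ...item, title: item.title.replace(/\s+/g, ' ').trim() }))  // normalise whitespace
  .filter(item => !seen.has(item.link) && seen.add(item.link));  // dedupe on the canonical link

fs.writeFileSync('clean-data.json', JSON.stringify(clean, null, 2));
console.log(`Kept ${clean.length} of ${raw.length} items`);

The optional image download pass follows the same pattern. A sketch, assuming Node 18+ (for the global fetch) and the slug and imageUrl fields shown above:

// download-images.mjs: a sketch assuming Node 18+ and ES modules.
import { mkdir, readFile, writeFile } from 'node:fs/promises';
import { extname } from 'node:path';

const items = JSON.parse(await readFile('clean-data.json', 'utf8'));
await mkdir('images', { recursive: true });

for (const item of items) {
  // Some CDNs reject requests that lack a browser-like User-Agent header.
  const res = await fetch(item.imageUrl, { headers: { 'User-Agent': 'Mozilla/5.0' } });
  if (!res.ok) continue; // skip broken links rather than abort the run
  const ext = extname(new URL(item.imageUrl).pathname) || '.jpg';
  await writeFile(`images/${item.slug}${ext}`, Buffer.from(await res.arrayBuffer()));
}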

Use cases

Portfolio migration: extract work from an old platform when there is no export.
Directory scraping: collect business listings from paginated directories.
Content archiving: save paginated blog posts or articles before they disappear.
Price monitoring: extract product listings with prices for comparison.
Competition research: collect competitor portfolios or product catalogues.
Data collection: build datasets from paginated search results or feeds.

Honest take

What it does well: Zero setup. No npm installs, no API keys, no headless browser configuration. You paste a script, navigate pages, and download JSON. It handles deduplication automatically, so you can re-paste on the same page without creating duplicates. The localStorage pattern means your progress survives page navigation and even accidental tab closes. I built this approach for my own portfolio migration and it worked on the first try.

What it does not do: It cannot log in for you. The script runs in whatever session your browser already has, so sign in yourself before pasting. It does not bypass anti-bot protections like Cloudflare challenges. It will not work on sites that render content inside iframes or shadow DOM without adjustments (see the sketch below). And it requires you to manually navigate each page. If you need fully automated scraping of hundreds of pages, you still want Puppeteer or Playwright.
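
For reference, the shadow DOM adjustment is usually small when the roots are open. A sketch; the item-card tag name and .title selector are hypothetical:

// Reaching into an open shadow root; closed roots stay inaccessible.
document.querySelectorAll('item-card').forEach(host => {
  const root = host.shadowRoot; // null when the root is closed
  if (!root) return;
  console.log(root.querySelector('.title')?.textContent.trim());
});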

When to use it: Any time you need data from a paginated website where the total pages are manageable (under 50 pages or so). Portfolio platforms, design contest sites, business directories, product listings, job boards. If you can see the data in your browser, this can extract it.

Use this skill in your project

Download the .md file, drop it into .claude/skills/, and run /web-scrape.
