Web Scraping with Selenium IDE Commands

How to extract data from a website

This page explains how to do web scraping with Selenium IDE commands. Web scraping works if the data is inside the HTML of a website. If you want to extract data from a PDF, image or video you need to use visual screen scraping instead.

How to generate a good XPath for web scraping

The easiest method is to first record a CLICK on the element that you want to extract. This works even if the element is not a link, but e. g. a table entry. This generates a CLICK command a few suggested locators (XPath). Press the "Down" arrow in the IDE to see the full list of suggested, possible locators. Once you have decided on a good locator, then simply change the CLICK command to e. g. STORETEXT or STOREATTRIBUTE:
How to generate a good XPath for web scraping

When to use what command?

The table belows shows the best command for each type of data extraction. Click the recommended command for more information and example code.

Data to extract is in...	Command to use	Comment
Visible website text, for example text in a table just like this one, or a price on website	storeText
Text in input fields (input box, text area, select drop down,...)	storeValue	Do not confuse this command with storeEval, which is not for web scraping.
Get the status of a checkbox or radiobutton	storeChecked
URL "behind" an image	storeAttribute@href	storeAttribute \| xpath=...@href extracts the link of any element - if it has one! If that fails, consider browser automation to copy the link to the ${!clipboard} variable.
ALT text "behind" an image	storeAttribute@alt	The storeAttribute command can be used to get any attribute the HTML element has. For example, use @alt to get the "Alt" text of an image.
Page title	storeTitle
Table content: Row/Column/Cell	storeText with XPath locator	See TABLE Web Scraping or automate browser addon
Data from a list e. g. search results	Loop over storeText	See How to web scrape search results
Save complete web page source code	XType \| ${KEY_CTRL+KEY_S}*	On Mac it is ${KEY_CMD+KEY_S}.
Save complete web page with images	XType \| ${KEY_CTLR+KEY_S}*	See Forum post: How to save the entire HTML code
Take screenshot of website	captureEntirePageScreenshot*	This saves the complete website as image.
Take screenshot of a web page element	storeImage*	The element can be an image or any other web page HTML tag
Download an image from a website	saveItem*	Retrieve the image directly from the browser cache.
Text found only website source code	sourceExtract*	e. g. Google Analytics ID. For text inside page comments or Javascript, this is the only option
Extract complete website HTML with	executeScript to get entire HTML code	Useful if you want to extract the complete HTML source code of the website, e. g. as input for aiPrompt
JSON data displayed in web browser	storeText \| css=Pre \| json	A web api (e. g. OCR Api) displays JSON in the browser, not HTML. storeText with locator "css=Pre" can be used to extract it (Example: JSON scraping RPA)
PDF, Image, Video, Canvas	OCRExtractRelative and OCRExtractByTextRelative	This screen scraping command works everywhere because it works visually. The disadvantage is that it is slower than the pure HTML-based commands like storeText.
Text from outside the web page	OCRExtractRelative and OCRExtractByTextRelative	If you run these commands in desktop mode you can read data from any desktop app.

(*) These commands are only available in the Ui.Vision RPA Selenium IDE. They are not part of the classic Selenium IDE.

Anything wrong or missing on this page? Suggestions?

...then please contact us.

Web Scraping with Selenium IDE

How to extract data from a website

How to generate a good XPath for web scraping

When to use what command?

See also

Anything wrong or missing on this page? Suggestions?

Fresh from the Ui.Vision Forum: The Latest 3 Topics. Join the discussions: