How to extract data from a website

This page explains how to do web scraping with Selenium IDE commands. Web scraping works when the data is inside the HTML of a website. If you want to extract data from a PDF, image, or video, you need to use visual screen scraping instead.
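As a rough illustration of what "inside the HTML" means, the following Python sketch (standard library only, not Selenium IDE code) parses a made-up page fragment. The price is part of the markup and can be read directly. For the image, only the file name and alt text are in the HTML; whatever is painted inside the picture itself is not, which is where visual screen scraping comes in. The fragment, the class name `price`, and the file name `banner.png` are assumptions chosen for this example.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, made up for this illustration.
SAMPLE_HTML = """
<html><head><title>Demo shop</title></head>
<body>
  <span class="price">19.99 USD</span>
  <img src="banner.png" alt="Summer sale">
</body></html>
"""


class TextCollector(HTMLParser):
    """Collects non-empty text nodes and the attributes of <img> tags."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "img":
            self.images.append(dict(attrs))

    def handle_data(self, data):
        # Skip whitespace-only nodes between tags.
        stripped = data.strip()
        if stripped:
            self.texts.append(stripped)


collector = TextCollector()
collector.feed(SAMPLE_HTML)

print(collector.texts)   # text that exists in the HTML
print(collector.images)  # image attributes only, not the image content
```

Anything that shows up in `texts` or in an attribute is reachable by the HTML-based commands on this page; anything that exists only as pixels is not.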

When to use which command?

The table below shows the best command for each type of data extraction. Click the recommended command for more information and example code.

| Data to extract is in... | Command to use | Comment |
| --- | --- | --- |
| Visible website text, for example text in a table just like this one, or a price on a website | storeText | |
| Text in input fields (input box, text area, select drop-down, ...) | storeValue | Do not confuse this command with storeEval, which is not for web scraping. |
| Get the status of a checkbox or radio button | storeChecked | |
| URL "behind" a link | storeAttribute@href | The same command can be used to get any attribute the HTML element has. For example, use @alt to get the "Alt" text of an image. |
| Page title | storeTitle | |
| Save complete web page source code | XType \| ${KEY_CTRL+KEY_S}* | On Mac it is ${KEY_CMD+KEY_S}. |
| Save complete web page with images | XType \| ...* | See Forum post: How to save the entire HTML code |
| Take screenshot of website | captureEntirePageScreenshot* | This saves the complete website as an image. |
| Take screenshot of a web page element | storeImage* | This is an easy way to extract images. The other option is to download them. |
| Text found only in the website source code | sourceExtract* | For example, a Google Analytics ID. For text inside page comments or JavaScript, this is the only option. |
| PDF, Image, Video, Canvas | OCRExtractRelative* | This screen scraping command works everywhere because it works visually. The disadvantage is that it is slower than the pure HTML-based commands like storeText. |
| Text from outside the web page | OCRExtractRelative* | For example, if you want to extract data from a browser extension or a desktop app. |

(*) These commands are only available in the Kantu Selenium IDE. They are not part of the classic Selenium IDE.
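
The store commands that take a locator (storeText, storeValue, storeChecked, storeAttribute) follow the same pattern: the locator goes in the target field and the name of the variable that receives the result goes in the value field. The Python sketch below assembles such a command list and prints it as JSON. It is only an illustration: the JSON field names (`Name`, `Commands`, `Command`, `Target`, `Value`), the URL, the locators, and the variable names are all assumptions, so compare the output with a macro exported from your own IDE before relying on it.

```python
import json


def step(command, target="", value=""):
    """Build one macro step. The field names are an assumed layout."""
    return {"Command": command, "Target": target, "Value": value}


# Hypothetical macro: the URL, locators and variable names are placeholders.
macro = {
    "Name": "scrape-demo",
    "Commands": [
        step("open", "https://example.com/product"),
        # Visible text of an element -> variable "price"
        step("storeText", "css=span.price", "price"),
        # Current content of an input field -> variable "quantity"
        step("storeValue", "id=quantity", "quantity"),
        # Attribute of an element, written as locator@attribute
        step("storeAttribute", "css=a.details@href", "detailsUrl"),
        # Show the collected values in the log
        step("echo", "${price} / ${quantity} / ${detailsUrl}"),
    ],
}

print(json.dumps(macro, indent=2))
```

Generating the steps from a helper like the hypothetical `step` keeps every command in the same target/value shape, which makes it easier to spot a locator that ended up in the wrong field.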

See also

Form filling with Selenium IDE (the opposite of web scraping), screen scraping (scraping with computer vision, OCR), Web Automation Extension User Manual.
