Figranium extracts structured data from web pages using a JavaScript extraction script attached to each task. Extraction runs after all automation steps have completed.
Every task includes an Extraction Script field in the editor. This JavaScript code runs in the browser context to parse the final page state.
The script has access to:
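- document: The DOM of the page.
- $$data.html(): A helper that returns the page HTML as a string. This is the cleaned HTML described under DOM cleaning below, including Shadow DOM content if that option is enabled.
- variables: Any runtime variables defined in the task.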
Output
The script must return a value (a string, object, or array). This value is saved as the result of the execution.
- JSON: The returned value is automatically formatted as JSON.
- CSV: If extractionFormat is set to csv, Figranium attempts to convert an array of objects to CSV.
For example, this script collects the title, price, and link from every product card on a page:
```javascript
// Get all product cards
const products = Array.from(document.querySelectorAll(".product-card"));
// Map each card to an object; optional chaining yields undefined for missing elements
const data = products.map((card) => {
  const title = card.querySelector(".title")?.innerText.trim();
  const price = card.querySelector(".price")?.innerText.trim();
  const link = card.querySelector("a")?.href;
  return { title, price, link };
});
return data; // Returns an array of objects
```
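To return a single value, wrap it in an object:
```javascript
// Optional chaining avoids a TypeError if .main-price is not on the page
const price = document.querySelector(".main-price")?.innerText.trim();
return { price };
```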
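Values read through innerText come back as strings such as "$1,299.99". If you want numeric prices in the result, you can normalize them inside the extraction script. The parsePrice helper below is hypothetical (it is not part of Figranium) and assumes a period as the decimal separator and commas as thousands separators:

```javascript
// Hypothetical helper: turn price text like "$1,299.99" into 1299.99.
// Assumes "." is the decimal separator and "," separates thousands.
function parsePrice(text) {
  if (typeof text !== "string") return null; // missing element (undefined) -> null
  // Drop everything except digits, periods, and minus signs.
  const cleaned = text.replace(/[^0-9.-]/g, "");
  const value = Number.parseFloat(cleaned);
  return Number.isNaN(value) ? null : value;
}
```

Inside the product-card mapping you would call it as parsePrice(card.querySelector(".price")?.innerText). Prices written with a comma as the decimal separator (for example "1.299,99") would be misread by this sketch.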
DOM cleaning
Before your extraction script runs, Figranium cleans the page HTML to remove noise and reduce payload size. This cleaned HTML is what you receive via $$data.html() and what is sent to AI providers when generating selectors or scripts.
What is removed
The following elements are stripped entirely:
script, style, link, meta, noscript, svg, canvas, iframe, object, embed, applet, param, source, track
What is kept
Only attributes that are useful for extraction are preserved:
- Identification: id, class, name
- Content: href, src, alt, title, value, placeholder, content, datetime
- Semantics: aria-label, type, for, action, method, selected, checked, disabled
- Table structure: colspan, rowspan, scope
- All data-* attributes (e.g., data-id, data-price, data-sku)
All other attributes are removed to keep the HTML compact and focused on extractable content.
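The removal and allow-list rules above can be approximated with a few DOM calls. This is a minimal sketch, not Figranium's actual implementation: REMOVED_TAGS, KEPT_ATTRIBUTES, isKeptAttribute, and cleanTree are hypothetical names, and details such as comment or whitespace handling are not covered.

```javascript
// Hypothetical re-creation of the cleaning rules; not Figranium's source code.
const REMOVED_TAGS = [
  "script", "style", "link", "meta", "noscript", "svg", "canvas",
  "iframe", "object", "embed", "applet", "param", "source", "track",
];

const KEPT_ATTRIBUTES = new Set([
  "id", "class", "name", // identification
  "href", "src", "alt", "title", "value", "placeholder", "content", "datetime", // content
  "aria-label", "type", "for", "action", "method", "selected", "checked", "disabled", // semantics
  "colspan", "rowspan", "scope", // table structure
]);

// True if an attribute survives cleaning: allow-listed names plus every data-* attribute.
function isKeptAttribute(name) {
  const lower = name.toLowerCase();
  return KEPT_ATTRIBUTES.has(lower) || lower.startsWith("data-");
}

// Browser only. root must be an Element (e.g. document.documentElement);
// the page itself is left untouched because a deep copy is cleaned.
function cleanTree(root) {
  const copy = root.cloneNode(true);
  // Drop noisy elements together with their children.
  copy.querySelectorAll(REMOVED_TAGS.join(",")).forEach((el) => el.remove());
  // Strip every attribute not on the allow-list, including on the copied root.
  for (const el of [copy, ...copy.querySelectorAll("*")]) {
    for (const attr of Array.from(el.attributes)) {
      if (!isKeptAttribute(attr.name)) el.removeAttribute(attr.name);
    }
  }
  return copy;
}
```

In a browser console, cleanTree(document.documentElement).outerHTML gives a rough preview of how much markup the rules strip from a given page.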
Shadow DOM support
If your target page uses Shadow DOM (common in web components), Figranium includes shadow root content by default. Shadow roots are serialized as <template data-shadowroot="open"> elements inside the cleaned HTML, so shadow content is present in the string returned by $$data.html() alongside the regular markup.
You can disable Shadow DOM inclusion in the task settings if it is not needed.
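When that HTML string is parsed, the children of each <template> element end up in the template's content fragment, and a plain querySelectorAll on the parsed document does not search inside template contents. A minimal sketch for collecting matches from both the regular markup and the serialized shadow roots, assuming the template format described above; queryAllDeep and queryCleanedHtml are hypothetical names:

```javascript
// Hypothetical helper: find matches in a parsed tree and inside serialized
// shadow roots (<template data-shadowroot="...">), recursing into nested ones.
function queryAllDeep(root, selector) {
  const matches = Array.from(root.querySelectorAll(selector));
  for (const tpl of root.querySelectorAll("template[data-shadowroot]")) {
    // Template children are parsed into tpl.content, a separate DocumentFragment.
    matches.push(...queryAllDeep(tpl.content, selector));
  }
  return matches;
}

// Browser only: parse the cleaned HTML string, then query across shadow content.
function queryCleanedHtml(html, selector) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  return queryAllDeep(doc, selector);
}
```

In an extraction script this could be used as queryCleanedHtml($$data.html(), ".price").map((el) => el.textContent.trim()).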
AI script generation
You can have Figranium write extraction scripts for you using AI. Instead of writing JavaScript by hand, describe what you want to extract in plain language and the configured AI provider generates a working script.
How to use it
- Open a task in the editor.
- Click the Extraction Script block on the canvas.
- In the script modal, click the Generate button (sparkle icon) at the top of the editor.
- Type a description of the data you want — for example, “extract all article titles and links” or “get the price and availability from the product page”.
- Press Enter or click Generate. The AI returns a script and inserts it into the editor.
- Review the generated script, make any adjustments, and save.
The generation uses the AI provider fallback chain: Gemini → OpenAI → Claude → Ollama. At least one provider must be configured in Settings > System > API Keys. You can change the preferred model for each provider in Settings > System > AI Models.
The generated script is standard JavaScript that runs via page.evaluate(). You can edit it freely after generation — it is not locked or managed by AI after insertion.
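The fallback chain described above can be pictured as trying each configured provider in order and using the first one that returns a script. This is a minimal sketch of that idea only; the provider objects, their isConfigured and generate members, and generateWithFallback are hypothetical and do not reflect Figranium's internal API.

```javascript
// Hypothetical sketch of a provider fallback chain; not Figranium's internals.
// Each provider: { name, isConfigured: boolean, generate: (prompt) => Promise<string> }
async function generateWithFallback(providers, prompt) {
  const errors = [];
  for (const provider of providers) {
    if (!provider.isConfigured) continue; // skip providers without an API key
    try {
      const script = await provider.generate(prompt);
      return { provider: provider.name, script };
    } catch (err) {
      errors.push(`${provider.name}: ${err.message}`); // record the failure, try the next one
    }
  }
  const reason = errors.join("; ") || "none configured";
  throw new Error(`No AI provider could generate a script (${reason})`);
}
```

Passing the providers in the order Gemini, OpenAI, Claude, Ollama reproduces the documented preference: an unconfigured or failing provider simply hands the request to the next one.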
Handling dynamic content
If the page loads content dynamically (AJAX), ensure your task includes wait or wait_selector actions before the extraction script runs. The script executes only after the last action completes.
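Adding wait or wait_selector actions is the approach described above. If you also want the script itself to tolerate an element that appears slightly late, one option is to poll for it and return a Promise. This relies on the assumption that Figranium awaits a returned Promise, which page.evaluate() normally does, so verify it in your setup. waitFor is a hypothetical helper:

```javascript
// Hypothetical polling helper: resolves with the first truthy value returned by
// check(), or rejects once timeoutMs milliseconds have passed.
function waitFor(check, timeoutMs = 5000, intervalMs = 100) {
  return new Promise((resolve, reject) => {
    const started = Date.now();
    const tick = () => {
      const result = check();
      if (result) return resolve(result);
      if (Date.now() - started >= timeoutMs) {
        return reject(new Error(`Timed out after ${timeoutMs} ms`));
      }
      setTimeout(tick, intervalMs); // try again after a short pause
    };
    tick();
  });
}

// Example use inside an extraction script (assumes the returned Promise is awaited):
// return waitFor(() => document.querySelector(".main-price"))
//   .then((el) => ({ price: el.innerText.trim() }));
```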
CSV output
If you select CSV as the output format:
- Ensure your script returns an array of objects.
- Keys in the first object become the CSV headers.
- Figranium handles quoting and escaping automatically.
```javascript
return [
  { name: "Item 1", price: 10 },
  { name: "Item 2", price: 20 },
];
// Result:
// name,price
// "Item 1",10
// "Item 2",20
```
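The conversion rules for CSV can be sketched as follows. The toCsv function is hypothetical and written to reproduce the example output (headers from the first object, strings quoted with embedded quotes doubled, numbers left bare); Figranium's own quoting rules may differ in edge cases.

```javascript
// Hypothetical sketch of array-of-objects -> CSV; not Figranium's implementation.
function quote(text) {
  return `"${text.replace(/"/g, '""')}"`; // wrap in quotes, doubling embedded quotes
}

function formatCell(value) {
  if (value === null || value === undefined) return "";
  if (typeof value === "number" || typeof value === "boolean") return String(value);
  // This sketch's choice: serialize nested objects and arrays as JSON text.
  if (typeof value === "object") return quote(JSON.stringify(value));
  return quote(String(value));
}

function toCsv(rows) {
  if (!Array.isArray(rows) || rows.length === 0) return "";
  const headers = Object.keys(rows[0]); // the first object defines the columns
  const headerLine = headers
    .map((h) => (/[",\n\r]/.test(h) ? quote(h) : h))
    .join(",");
  const lines = rows.map((row) => headers.map((h) => formatCell(row[h])).join(","));
  return [headerLine, ...lines].join("\n");
}
```

Because only the first object's keys become columns, a key that appears only in later objects is dropped, and a missing value becomes an empty cell. Give every object the same keys if you need all fields in the CSV.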