Extraction Scripts

Figranium extracts structured data from web pages using an extraction script. The script runs after all automation steps are complete.

Extraction Script

Every task includes an Extraction Script field in the editor. This JavaScript code runs in the browser context to parse the final page state.

Input

The script has access to:

document: The DOM of the page.
$$data.html(): A helper that returns the page HTML as a string, after DOM cleaning (including Shadow DOM if enabled).
variables: Any runtime variables defined in the task.
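
As a rough illustration of how these three inputs might be combined, the sketch below wraps the logic in a hypothetical helper, summarizePage, so it can also be exercised outside the browser. It assumes that variables behaves like a plain object and that a runtime variable named keyword exists; both are assumptions made for this example, not documented behavior.

```javascript
// Hypothetical helper: combines the three inputs available to an extraction script.
// Assumes `vars` is a plain object; the `keyword` variable is made up for this example.
function summarizePage(doc, html, vars) {
  return {
    title: doc.title,              // read from the DOM
    htmlLength: html.length,       // length of the HTML string
    keyword: vars.keyword ?? null, // runtime variable, or null if not defined
  };
}

// Inside an extraction script, the call would look like:
// return summarizePage(document, $$data.html(), variables);
```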

Output

The script must return a value (String, Object, Array). This value is saved as the result of the execution.

JSON: Returned values are formatted as JSON automatically.
CSV: If extractionFormat is set to csv, Figranium attempts to convert an array of objects to CSV.

Example: Extracting a Product List

// Get all product cards
const products = Array.from(document.querySelectorAll(".product-card"));

// Map each card to an object
const data = products.map((card) => {
  const title = card.querySelector(".title")?.innerText.trim();
  const price = card.querySelector(".price")?.innerText.trim();
  const link = card.querySelector("a")?.href;

  return { title, price, link };
});

return data; // Returns an array of objects

Example: Extracting a Single Value

// Read the main price; null if the element is missing
const price = document.querySelector(".main-price")?.innerText.trim() ?? null;
return { price };
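
When a selector might not match, a lookup on the missing element throws and the script returns nothing. One possible way to guard against that is a small helper that returns null instead. The name textOrNull is hypothetical and not part of Figranium.

```javascript
// Hypothetical helper: trimmed text of the first match, or null if nothing matches.
function textOrNull(root, selector) {
  const el = root.querySelector(selector);
  return el ? el.innerText.trim() : null;
}

// Inside an extraction script:
// return { price: textOrNull(document, ".main-price") };
```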

DOM cleaning

Before your extraction script runs, Figranium cleans the page HTML to remove noise and reduce payload size. This cleaned HTML is what you receive via $$data.html() and what is sent to AI providers when generating selectors or scripts.

What is removed

The following elements are stripped entirely: script, style, link, meta, noscript, svg, canvas, iframe, object, embed, applet, param, source, track

What is kept

Only attributes that are useful for extraction are preserved:

Identification: id, class, name
Content: href, src, alt, title, value, placeholder, content, datetime
Semantics: aria-label, type, for, action, method, selected, checked, disabled
Table structure: colspan, rowspan, scope
All data-* attributes (e.g., data-id, data-price, data-sku)

All other attributes are removed to keep the HTML compact and focused on extractable content.
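
The attribute rules above can be read as an allowlist plus a data-* prefix check. The sketch below is only an illustration of that reading, not Figranium's actual cleaning code; the names KEPT_ATTRIBUTES and isKeptAttribute are hypothetical.

```javascript
// Sketch of the attribute allowlist described above; not Figranium's actual code.
const KEPT_ATTRIBUTES = new Set([
  "id", "class", "name",                                                  // identification
  "href", "src", "alt", "title", "value", "placeholder", "content", "datetime", // content
  "aria-label", "type", "for", "action", "method", "selected", "checked", "disabled", // semantics
  "colspan", "rowspan", "scope",                                          // table structure
]);

// Hypothetical helper: true if an attribute would survive cleaning under these rules.
function isKeptAttribute(name) {
  const lower = name.toLowerCase();
  return KEPT_ATTRIBUTES.has(lower) || lower.startsWith("data-");
}
```

Under these assumptions, isKeptAttribute("data-sku") and isKeptAttribute("href") return true, while isKeptAttribute("style") and isKeptAttribute("onclick") return false, so such attributes should not be expected in the output of $$data.html().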

Shadow DOM support

If your target page uses Shadow DOM (common in web components), Figranium includes shadow root content by default. Shadow roots are serialized as <template data-shadowroot="open"> elements inside the cleaned HTML, so your extraction scripts can traverse them like regular DOM nodes. You can disable Shadow DOM inclusion in the task settings if it is not needed.

AI script generation

You can have Figranium write extraction scripts for you using AI. Instead of writing JavaScript by hand, describe what you want to extract in plain language and the configured AI provider generates a working script.

How to use it

Open a task in the editor.
Click the Extraction Script block on the canvas.
In the script modal, click the Generate button (sparkle icon) at the top of the editor.
Type a description of the data you want — for example, “extract all article titles and links” or “get the price and availability from the product page”.
Press Enter or click Generate. The AI returns a script and inserts it into the editor.
Review the generated script, make any adjustments, and save.

The generation uses the AI provider fallback chain: Gemini → OpenAI → Claude → Ollama. At least one provider must be configured in Settings > System > API Keys. You can change the preferred model for each provider in Settings > System > AI Models.
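
A fallback chain of this kind can be pictured as trying each configured provider in order until one produces a script. The sketch below is an assumption about how that might work, not Figranium's implementation: generateWithFallback and the shape of the providers object are hypothetical, and the documentation does not say whether a fallback happens on errors, on missing configuration, or both (the sketch handles both).

```javascript
// Provider order taken from the fallback chain described above.
const PROVIDER_ORDER = ["Gemini", "OpenAI", "Claude", "Ollama"];

// Hypothetical sketch: try each configured provider in order, return the first script produced.
// `providers` maps a provider name to a function(prompt) that returns a script or throws.
function generateWithFallback(prompt, providers) {
  const errors = [];
  for (const name of PROVIDER_ORDER) {
    const generate = providers[name];
    if (typeof generate !== "function") continue; // provider not configured, skip it
    try {
      return { provider: name, script: generate(prompt) };
    } catch (err) {
      errors.push(`${name}: ${err.message}`); // remember the failure, try the next one
    }
  }
  throw new Error(`No provider produced a script. ${errors.join("; ")}`);
}
```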

The generated script is standard JavaScript that runs via page.evaluate(). You can edit it freely after generation — it is not locked or managed by AI after insertion.

Handling dynamic content

If the page loads content dynamically (AJAX), ensure your task includes wait or wait_selector actions before the extraction script runs. The script executes only after the last action completes.

CSV Formatting

If you select CSV as the output format:

Ensure your script returns an Array of Objects.
Keys in the first object become the CSV headers.
Figranium handles quoting and escaping automatically.

return [
  { name: "Item 1", price: 10 },
  { name: "Item 2", price: 20 },
];
// Result:
// name,price
// "Item 1",10
// "Item 2",20
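
To make the rules above concrete, here is a minimal sketch of an array-of-objects to CSV conversion that reproduces the result shown. It is not Figranium's converter: the function name toCsv is hypothetical, and the quoting rule (quote strings, double embedded quotes, leave other values bare) is an assumption inferred from the example output.

```javascript
// Minimal sketch of array-of-objects to CSV conversion; not Figranium's actual implementation.
function toCsv(rows) {
  if (!Array.isArray(rows) || rows.length === 0) return "";
  const headers = Object.keys(rows[0]); // keys of the first object become the header row

  // Quote strings and double any embedded quotes; leave other values bare.
  const formatCell = (value) => {
    if (value === null || value === undefined) return "";
    if (typeof value === "string") return `"${value.replace(/"/g, '""')}"`;
    return String(value);
  };

  const lines = [headers.join(",")];
  for (const row of rows) {
    lines.push(headers.map((key) => formatCell(row[key])).join(","));
  }
  return lines.join("\n");
}
```

In this sketch, keys that appear only in later objects are dropped because the headers come from the first object, so it is safest to return objects that all share the same keys.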
