The JMeter Regex Extractor can pull out any piece of text from a sampler’s response, not just the bits that look like data.
Let’s say you’ve got a request that returns HTML, and you want to grab the href attribute from a specific <a> tag. Here’s a sample response:
<!DOCTYPE html>
<html>
<head>
<title>My Page</title>
</head>
<body>
<h1>Welcome!</h1>
<p>Here's a link: <a href="/items/12345">View Item</a></p>
<p>Another link: <a href="/products/abcde">Product Details</a></p>
</body>
</html>
You want to extract the href value from the first link.
The Setup
- Add a JMeter Request: This is your HTTP Request sampler that hits the URL returning the HTML above.
- Add a Regex Extractor: Right-click on the HTTP Request sampler -> Add -> Post Processors -> Regular Expression Extractor.
The Configuration
- Name: `Extract First Link Href` (or whatever makes sense).
- Apply to: `Main sample only`. This tells JMeter to look at the response body of the main request, not any embedded resources.
- Field to check: `Body`. We’re looking in the HTML content.
- Regular Expression: `href="(/items/\d+)"`
  Let’s break this down:
  - `href="`: Matches the literal string `href="`.
  - `(`: Starts a capturing group. This is what we want to extract.
  - `/items/`: Matches the literal string `/items/`.
  - `\d+`: Matches one or more digits (0-9). This would match `12345`.
  - `)`: Ends the capturing group.
  - `"`: Matches the closing double quote.
- Template: `$1$`
  This tells JMeter to use the content of the first capturing group (`$1$`) as the extracted value. If you had more capturing groups in your regex, you could use `$2$`, `$3$`, etc.
- Match No.: `1`
  This is crucial for picking which match to extract.
  - `1`: Extracts the first occurrence.
  - `2`: Extracts the second occurrence.
  - `0`: Extracts a random match.
  - A negative number such as `-1`: Extracts all occurrences into numbered variables, `${first_link_href_1}`, `${first_link_href_2}`, etc., with the number of matches in `${first_link_href_matchNr}`.
- Reference Name: `first_link_href`
  This is the JMeter variable name you’ll use to access the extracted value later. You’ll use `${first_link_href}`.
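The Match No. options are easier to see with more than one match in play. Below is a minimal Python sketch, not JMeter code: `regex_extract` is a hypothetical helper that mimics the `$1$` template and the Match No. rules as the JMeter component reference describes them (a positive n picks the n-th match, 0 picks one at random, a negative number returns all of them). It uses a looser pattern, `href="([^"]+)"`, purely so the sample response yields two matches, and it assumes Python's `re` module handles these simple patterns the same way JMeter's regex engine does. The empty-string default is also an assumption made for the sketch.

```python
import random
import re

# Two-link excerpt of the sample response.
HTML = """<p>Here's a link: <a href="/items/12345">View Item</a></p>
<p>Another link: <a href="/products/abcde">Product Details</a></p>"""


def regex_extract(body, pattern, match_no, default=""):
    """Hypothetical stand-in for the extractor with Template $1$."""
    matches = [m.group(1) for m in re.finditer(pattern, body)]
    if match_no < 0:
        return matches                    # negative: every match
    if not matches:
        return default                    # nothing matched at all
    if match_no == 0:
        return random.choice(matches)     # zero: a random match
    if match_no <= len(matches):
        return matches[match_no - 1]      # positive n: the n-th match
    return default                        # n is past the last match


ANY_HREF = r'href="([^"]+)"'

first = regex_extract(HTML, ANY_HREF, 1)
second = regex_extract(HTML, ANY_HREF, 2)
all_hrefs = regex_extract(HTML, ANY_HREF, -1)
missing = regex_extract(HTML, ANY_HREF, 3)

print(first)      # /items/12345
print(second)     # /products/abcde
print(all_hrefs)  # ['/items/12345', '/products/abcde']
```

In JMeter itself the "all matches" case does not return a list; the values land in separate numbered variables, which is what the list here stands in for.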
Running it
When JMeter runs this, it will:
- Execute the HTTP Request.
- Take the HTML response.
- Apply the regex `href="(/items/\d+)"`.
- Find one match: `href="/items/12345"`.
- Capture `/items/12345` with the capturing group `(/items/\d+)`.
- Fill the template `$1$` with this captured value.
- Use Match No. `1` to make sure it’s the first match.
- Store the value `/items/12345` in the variable `first_link_href`.
You can then use this variable in subsequent requests, assertions, or log it with a Debug Sampler.
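The steps above can be replayed outside JMeter as a quick sanity check of the regex. This is a rough Python sketch under the assumption that Python's `re` module treats this pattern the same way JMeter does; the `variables` dict is only a stand-in for JMeter's variable store, not a JMeter API.

```python
import re

response_body = """<!DOCTYPE html>
<html>
<head>
<title>My Page</title>
</head>
<body>
<h1>Welcome!</h1>
<p>Here's a link: <a href="/items/12345">View Item</a></p>
<p>Another link: <a href="/products/abcde">Product Details</a></p>
</body>
</html>"""

# Apply the regex to the response body and collect every match.
pattern = re.compile(r'href="(/items/\d+)"')
matches = list(pattern.finditer(response_body))

variables = {}  # stand-in for JMeter's variable store
match_no = 1    # Match No. from the configuration

if len(matches) >= match_no:
    chosen = matches[match_no - 1]
    # Template $1$ means "use capturing group 1".
    variables["first_link_href"] = chosen.group(1)
    print("whole match:", chosen.group(0))

print("matches found:", len(matches))
print(variables)
```

Note the difference between the whole match (`href="/items/12345"`) and group 1 (`/items/12345`): only the group is stored.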
What if the link format changes slightly?
Let’s say the link could also look like /products/abcde, and you want to capture any href value that starts with /, whatever comes after it.
- Regular Expression: `href="(/[^"]+)"`
  - `[^"]+`: Matches one or more characters that are not a double quote. This is more robust for capturing the entire value within the quotes, regardless of its content, as long as it’s enclosed in quotes.
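To see what the broader pattern buys you, here is a small Python comparison of the two expressions against the two links from the sample response. It is a sketch for checking the regex, not JMeter code, and the `https://example.com/x` link is a made-up input added only to show what the leading `/` excludes.

```python
import re

html = """<p>Here's a link: <a href="/items/12345">View Item</a></p>
<p>Another link: <a href="/products/abcde">Product Details</a></p>"""

# The original pattern only accepts /items/ followed by digits.
narrow = re.findall(r'href="(/items/\d+)"', html)

# The broader pattern accepts any quoted value that starts with a slash.
broad = re.findall(r'href="(/[^"]+)"', html)

# Hypothetical absolute link: it does not start with "/", so it is skipped.
absolute = re.findall(r'href="(/[^"]+)"',
                      '<a href="https://example.com/x">Elsewhere</a>')

print(narrow)    # ['/items/12345']
print(broad)     # ['/items/12345', '/products/abcde']
print(absolute)  # []
```

With two matches available, the Match No. field decides which one ends up in the variable.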
What if you want to grab the text of the link, not the href?
Consider this HTML:
<a href="/items/12345">View Item</a>
You want to extract View Item.
- Regular Expression: `>([^<]+)<`
  - `>`: Matches the closing angle bracket of the opening tag.
  - `(`: Starts capturing group 1.
  - `[^<]+`: Matches one or more characters that are not a less-than sign. This effectively captures all text until the next HTML tag starts.
  - `)`: Ends capturing group 1.
  - `<`: Matches the opening angle bracket of the closing tag.
- Template: `$1$`
- Reference Name: `link_text`
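One thing worth checking before relying on this pattern: it works on the one-line snippet, but on a whole page it also matches whatever sits between any other pair of tags, including the newline between them. The Python sketch below shows that, along with one possible tighter pattern, `<a [^>]*>([^<]+)</a>`. That pattern is an illustrative assumption, not part of the configuration above.

```python
import re

snippet = '<a href="/items/12345">View Item</a>'

# Shortened version of the sample page.
page = """<html>
<body>
<h1>Welcome!</h1>
<p>Here's a link: <a href="/items/12345">View Item</a></p>
</body>
</html>"""

loose = r'>([^<]+)<'
anchored = r'<a [^>]*>([^<]+)</a>'  # illustrative alternative

# On the snippet, the loose pattern finds the link text.
link_text = re.search(loose, snippet).group(1)

# On the page, its first match is just the newline after <html>.
loose_on_page = re.findall(loose, page)

# Anchoring to the <a> tag keeps only link text.
anchored_on_page = re.findall(anchored, page)

print(link_text)               # View Item
print(repr(loose_on_page[0]))  # '\n'
print(anchored_on_page)        # ['View Item']
```

In JMeter terms, you would either tighten the expression like this or pick a Match No. that lands on the link you actually want.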
This extractor is powerful because it doesn’t care what the response is: JSON, XML, HTML, plain text, even binary data (though regex is less useful there). It will find the pattern and pull out the specified group. The key is crafting the right regular expression to precisely target the data you need, and then using the Match No. and Template fields to isolate and format that data.
The next hurdle is often handling cases where the expected pattern might not appear, leading to missing variables and subsequent test failures.