How to Convert Alphanumeric Strings to JSON in DataWeave

DataWeave is the powerhouse of data transformation within the MuleSoft ecosystem. However, developers often face the challenge of receiving "messy" data—specifically, long alphanumeric strings where numbers and letters are mashed together. Your goal? To separate them and organize the chaos into a clean, readable JSON object.

In this guide, we will walk through a practical scenario: extracting numbers to serve as Keys and letters to serve as Values. We will start with a basic approach and then refine it to handle real-world irregularities.

The Goal: Order from Chaos

Imagine you receive a long string containing code identifiers mixed with descriptive text. We need to separate these elements so that the numeric codes map directly to the text segments.

The Initial Approach (Strict Parsing)

Here is the initial script. It assumes your data follows a strict pattern: numbers always appear in pairs (2 digits), and text always appears in triplets (3 letters).

%dw 2.0
output application/json
import * from dw::core::Strings
var num = flatten(
    payload filter ((character, index) -> character matches /[0-9]/)
        scan /[0-9]{2}/
)
// scan /[0-9]{2}/ divides the string into an array where each item is exactly size 2

var char = flatten(
    payload filter ((character, index) -> character matches /[a-zA-Z]/)
        scan /[a-zA-Z]{3}/
)
// scan /[a-zA-Z]{3}/ divides the string into an array where each item is exactly size 3
---
// $ is the current number (the key); $$ is its index, used to pick the matching text
{(num map (
    ($): char[$$]
))}

How It Works

Let's break down the logic inside this script. It uses a combination of filtering and regex scanning to tidy up the data:

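Importing Tools: It imports the dw::core::Strings module, which offers additional string helpers. The functions this script actually calls (filter, scan, flatten, map, and matches) belong to the core module, which is available without an import, so the import line is optional here.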
Extracting Numbers (num):
- First, the filter function iterates through the payload, keeping only characters that match the regex /[0-9]/ (digits 0-9).
- Next, the scan function uses the regex /[0-9]{2}/. This acts like a cookie cutter, chopping the string of numbers into pairs of exactly two.
- Finally, flatten ensures the result is a single, flat list, because scan returns each match as its own small array. If your input was "a12b34c56d", the num variable becomes ["12", "34", "56"] (strings, not numbers).
Extracting Text (char):
- It applies the same logic but filters for letters using /[a-zA-Z]/.
- The scan function here uses /[a-zA-Z]{3}/, meaning it groups letters into chunks of exactly three.
- Example: For the same input "a12b34c56d", the filtered letters are "abcd", so char becomes ["abc"]; the leftover "d" cannot fill a group of three and is dropped.
Mapping to JSON:
- The script uses map to loop through the num array.
- It assigns the current number ($) as the Key and grabs the corresponding entry from the char array (using the index $$) to serve as the Value.
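
To make the steps above concrete, here is a minimal sketch that outputs each intermediate value instead of the final object. The sample string and the variable names sample, digits, and letters are illustrative assumptions, not part of the original script. It also assumes DataWeave 2.4 or later, where filter accepts a String.

```dataweave
%dw 2.0
output application/json
// Hypothetical sample input, inlined so the script runs without a payload
var sample = "01020304INDAUSENGUSA"
// Step 1: keep only the digits / only the letters (filter on a String returns a String)
var digits = sample filter ((character, index) -> character matches /[0-9]/)
var letters = sample filter ((character, index) -> character matches /[a-zA-Z]/)
---
{
    digits: digits,
    letters: letters,
    // Step 2: scan returns one array per match; flatten merges them into one list
    num: flatten(digits scan /[0-9]{2}/),
    char: flatten(letters scan /[a-zA-Z]{3}/)
}
```

With this sample, digits should come out as "01020304", letters as "INDAUSENGUSA", num as ["01", "02", "03", "04"], and char as ["IND", "AUS", "ENG", "USA"].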

The Problem: Handling Leftovers

The strict script above works perfectly for perfect data. But in the real world, data is rarely perfect.

If your string ends with a single number or a text group that isn't exactly three letters, the previous script will ignore them. The strict regex quantifiers ({2} and {3}) force DataWeave to skip anything that doesn't fit that specific size constraint.
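
The effect is easy to see in isolation. The sketch below applies the strict patterns to two hypothetical, already-separated strings (the names digits, letters, strictNum, and strictChar are assumptions made for illustration):

```dataweave
%dw 2.0
output application/json
// Hypothetical inputs: 11 digits (five pairs plus one extra) and 17 letters (five triplets plus two extra)
var digits = "01020304056"
var letters = "INDAUSENGUSACHNUK"
---
{
    // {2} only matches full pairs, so the trailing "6" never appears
    strictNum: flatten(digits scan /[0-9]{2}/),
    // {3} only matches full triplets, so the trailing "UK" never appears
    strictChar: flatten(letters scan /[a-zA-Z]{3}/)
}
```

Here strictNum should contain only "01" through "05" and strictChar only "IND" through "CHN"; the final "6" and "UK" are silently lost.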

The Solution: Flexible Regex Ranges

To ensure we capture every piece of data—even if there is an odd number or a shorter word—we need to adjust the scan regex. We will change the strict counts to ranges.

The Fix

Change scan /[0-9]{2}/ to scan /[0-9]{1,2}/. This tells DataWeave to accept 1 or 2 digits.
Change scan /[a-zA-Z]{3}/ to scan /[a-zA-Z]{1,3}/. This accepts 1, 2, or 3 letters.

The Refined Code

Here is the refined script, which now keeps the leftover groups:

%dw 2.0
output application/json
import * from dw::core::Strings
var num = flatten(
    payload filter ((character, index) -> character matches /[0-9]/)
        scan /[0-9]{1,2}/
)
// scan /[0-9]{1,2}/ divides the string into an array where items can be size 2 or 1

var char = flatten(
    payload filter ((character, index) -> character matches /[a-zA-Z]/)
        scan /[a-zA-Z]{1,3}/
)
// scan /[a-zA-Z]{1,3}/ divides the string into an array where items can be size 3, 2, or 1
---
// $ is the current number (the key); $$ is its index, used to pick the matching text
{(num map (
    ($): char[$$]
))}

Example Walkthrough

Let's see how this improved script handles a complex input string that would have failed the previous test.

Input String:
"01020304056INDAUSENGUSACHNUK"

Output JSON:

{
  "01": "IND",
  "02": "AUS",
  "03": "ENG",
  "04": "USA",
  "05": "CHN",
  "6": "UK"
}

Analysis:
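Notice the end of the data? The number 6 was a single digit, and UK was only two letters. Because we updated the regex to {1,2} and {1,3}, the script successfully captured them (creating the "6": "UK" pair) instead of discarding them. Keep in mind that scan matches greedily from left to right, so this works when the short groups sit at the end of the string; a short group in the middle would shift every group that follows it.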

A Note on Optimization

The examples above first filter the payload character by character with matches and then scan the filtered result. DataWeave is flexible enough to reach similar results through other logic flows, such as scanning the payload directly, so choose based on your performance needs and coding style. The logic provided here is a solid foundation for understanding string separation.
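
As one possible variation, the sketch below skips the filter step and scans the raw string directly. It is an illustration under assumptions, not a drop-in replacement: the inlined sample variable and the lambda parameter names code and index are hypothetical, and the behavior differs when digits and letters are interleaved (for "1a2b", filtering first yields the single group "12", while a direct scan yields "1" and "2").

```dataweave
%dw 2.0
output application/json
// Hypothetical sample input, inlined so the script runs without a payload
var sample = "01020304056INDAUSENGUSACHNUK"
// scan looks for matches anywhere in the string, so characters that do not
// fit the pattern are skipped without a separate filter pass
var num = flatten(sample scan /[0-9]{1,2}/)
var char = flatten(sample scan /[a-zA-Z]{1,3}/)
---
// Build one single-key object per number, then merge them into one object
{(num map ((code, index) -> {(code): char[index]}))}
```

For this sample, where the digits and the letters each form one contiguous run, the result should match the output of the refined script. Whether it is actually faster would need to be measured on your own data.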