Recently, I was helping a friend design a system for matching text against a flexible set of rules. For example, I might want to know if a piece of text contains the word “foo” or the word “bar”. A rule can either be a single regex pattern or a series of patterns combined with a logical operation (AND, OR, or NOT). We’ll call the first kind of rule Basic and the second Composite. Since a composite rule can contain both basic and other composite rules, we need a third type to represent “can be either basic or composite”. We’ll call this type Rule.

In TypeScript, these types are very easy to represent:

// A Basic rule is a single regex pattern.
type Basic = {
  pattern: string;
};

// A Composite rule combines other rules with a logical operation.
type Operation = "and" | "or" | "not";
type Composite = {
  operation: Operation;
  rules: Rule[];
};

// A Rule can be either Basic or Composite.
type Rule = Basic | Composite;

In software design terminology, Rule is a sum type: a polymorphic type whose values are drawn from a fixed set of more specific types. However, Go doesn’t (directly) support algebraic data types. Instead, we have to make clever use of Go’s interface system in order to emulate them.

Algebraic Data Types

Credit for this approach goes to Eli Bendersky’s amazing blog post: Go and Algebraic Data Types. Since we can’t define a Rule as the sum of two other types, we need to flip things around a bit. Instead, we define an interface that represents a Rule and contains a single, private method named isRule:

// A Rule can be either Basic or Composite.
type Rule interface {
	isRule()
}

Now, any specific subtype of Rule can implement this private method and Go will understand that it “is a rule”. Because isRule is unexported, only types in the same package can satisfy the interface, which keeps the set of possible rules closed. Let’s fill in the two concrete rule subtypes: Basic and Composite:

// A Basic rule is a single regex pattern.
type Basic struct {
	Pattern string `json:"pattern"`
}

func (Basic) isRule() {}

type Operation string

const (
	OperationAnd Operation = "and"
	OperationOr  Operation = "or"
	OperationNot Operation = "not"
)

// A Composite rule combines other rules with a logical operation.
type Composite struct {
	Operation Operation `json:"operation"`
	Rules     []Rule    `json:"rules"`
}

func (Composite) isRule() {}

Now that we have defined a Basic rule, a Composite rule, and a simple string type for Operation, we can represent arbitrary combinations of rules. Here are a few examples:

// This rule matches the text "foo" or "bar".
var fooOrBar Rule = Composite{
	Operation: OperationOr,
	Rules: []Rule{
		Basic{Pattern: "foo"},
		Basic{Pattern: "bar"},
	},
}

// This rule matches "foo" and not "bar".
var fooAndNotBar Rule = Composite{
	Operation: OperationAnd,
	Rules: []Rule{
		Basic{Pattern: "foo"},
		Composite{
			Operation: OperationNot,
			Rules: []Rule{
				Basic{Pattern: "bar"},
			},
		},
	},
}

I won’t go into detail about how the evaluation of these rules is written, but feel free to check out the full code on GitHub. This post isn’t about that, though. This post is about converting these polymorphic rules to and from JSON!
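That said, for context, here is a minimal sketch of what an evaluator might look like. The Evaluate function, its name, and its error handling are illustrative assumptions rather than the repo’s actual code; the key idea is dispatching on the concrete rule type with a type switch:

```go
package main

import (
	"fmt"
	"regexp"
)

type Rule interface{ isRule() }

type Basic struct {
	Pattern string `json:"pattern"`
}

func (Basic) isRule() {}

type Operation string

const (
	OperationAnd Operation = "and"
	OperationOr  Operation = "or"
	OperationNot Operation = "not"
)

type Composite struct {
	Operation Operation `json:"operation"`
	Rules     []Rule    `json:"rules"`
}

func (Composite) isRule() {}

// Evaluate reports whether the text matches the given rule.
func Evaluate(r Rule, text string) (bool, error) {
	switch rule := r.(type) {
	case Basic:
		return regexp.MatchString(rule.Pattern, text)
	case Composite:
		switch rule.Operation {
		case OperationAnd:
			// All sub-rules must match.
			for _, sub := range rule.Rules {
				ok, err := Evaluate(sub, text)
				if err != nil || !ok {
					return false, err
				}
			}
			return true, nil
		case OperationOr:
			// At least one sub-rule must match.
			for _, sub := range rule.Rules {
				ok, err := Evaluate(sub, text)
				if err != nil || ok {
					return ok, err
				}
			}
			return false, nil
		case OperationNot:
			// Negate a single sub-rule.
			if len(rule.Rules) != 1 {
				return false, fmt.Errorf("not requires exactly one sub-rule")
			}
			ok, err := Evaluate(rule.Rules[0], text)
			return !ok, err
		}
	}
	return false, fmt.Errorf("unknown rule: %+v", r)
}

func main() {
	fooOrBar := Composite{
		Operation: OperationOr,
		Rules: []Rule{
			Basic{Pattern: "foo"},
			Basic{Pattern: "bar"},
		},
	}

	ok, err := Evaluate(fooOrBar, "this text contains bar")
	fmt.Println(ok, err) // true <nil>
}
```

Note that the type switch has a default escape hatch: because the set of Rule implementations is closed to this package, the final error branch should be unreachable in practice.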

Simple JSON

Usually, encoding (marshalling) and decoding (unmarshalling) JSON in Go is quite simple. To encode a struct, just pass it to json.Marshal. To decode, pass the raw JSON and a pointer to the destination value to json.Unmarshal. Here is what that looks like when dealing with a basic rule:

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	foo := Basic{
		Pattern: "foo",
	}

	encoded, err := json.Marshal(foo)
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Printf("json: %s\n", encoded)

	var decoded Basic
	err = json.Unmarshal(encoded, &decoded)
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Printf("data: %+v\n", decoded)
}

When executed, this program prints:

json: {"pattern":"foo"}
data: {Pattern:foo}

This is exactly what we expected: the Basic struct can be easily transformed to and from JSON. However, things get a bit tricky once we introduce the polymorphic Rule type. Let’s try this experiment again with a Composite struct and see how the encoding/json package reacts.

Polymorphic JSON

This snippet is identical to the one above except that we are trying to encode and decode a Composite rule. We’ll use the fooOrBar example from earlier.

func main() {
	fooOrBar := Composite{
		Operation: OperationOr,
		Rules: []Rule{
			Basic{Pattern: "foo"},
			Basic{Pattern: "bar"},
		},
	}

	encoded, err := json.Marshal(fooOrBar)
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Printf("json: %s\n", encoded)

	var decoded Composite
	err = json.Unmarshal(encoded, &decoded)
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Printf("data: %+v\n", decoded)
}

Now, let’s see what happens:

json: {"operation":"or","rules":[{"pattern":"foo"},{"pattern":"bar"}]}
json: cannot unmarshal object into Go struct field Composite.rules of type main.Rule

Well, that didn’t work. While encoding our polymorphic Rule interface works just fine, decoding it fails because json.Unmarshal doesn’t know what to do. When encoding, the concrete type and structure of each individual rule is known, so the package can simply walk the data. When decoding, however, encoding/json sees only the Rule interface and has no way to decide which concrete struct each JSON object should become. We are going to need to write some code to fill in that gap.

Recursive Descent

Figuring out how to make this work took quite a bit of research. I owe credit to Kiril Karaatanassov’s go_polymorphic_json project and Alex Kalyvitis’s JSON polymorphism in Go article for guiding me toward a clean, working solution.

At a high level, I knew that the parsing process would require a few specific steps:

  1. Try to decode a basic rule
  2. If that succeeds, return it
  3. Otherwise, partially decode a composite rule
  4. Recursively parse and collect each sub-rule
  5. Construct the composite rule and return it

Step 3 was the most mysterious to me: how can you “partially decode” JSON in Go? If I know that a rule is composite, it must have two keys: operation and rules. The operation field must contain a string and rules should be an array of… other rules. But since I don’t know what those rules are yet, I need to avoid decoding them directly and instead pass their raw JSON text back into our parsing function. How can this be done?

Thankfully, the encoding/json package is back to help us out again! The critical piece of missing tech is the json.RawMessage type. From the docs:

RawMessage is a raw encoded JSON value. It implements Marshaler and Unmarshaler and can be used to delay JSON decoding or precompute a JSON encoding.

This sounds like exactly what we need. Rather than decoding directly into a Composite struct, we’ll instead define a custom type to represent a partial composite rule. The two types look pretty similar, with the latter using json.RawMessage instead of Rule:

type Composite struct {
	Operation Operation `json:"operation"`
	Rules     []Rule    `json:"rules"`
}

type PartialComposite struct {
	Operation Operation         `json:"operation"`
	Rules     []json.RawMessage `json:"rules"`
}

Okay, at this point we have everything we need to parse our polymorphic JSON structure. Let’s get recursive!

// Recursively parse a dynamic, polymorphic rule structure.
func ParseRule(data []byte) (Rule, error) {
	// 1. Try to decode a basic rule
	var basic Basic
	err := json.Unmarshal(data, &basic)
	if err != nil {
		return nil, err
	}

	// 2. If that succeeds, return it
	if basic.Pattern != "" {
		return basic, nil
	}

	// 3. Otherwise, partially decode a composite rule
	var partial PartialComposite
	err = json.Unmarshal(data, &partial)
	if err != nil {
		return nil, err
	}

	// 4. Recursively parse and collect each sub-rule
	var rules []Rule
	for _, ruleData := range partial.Rules {
		rule, err := ParseRule(ruleData)
		if err != nil {
			return nil, err
		}

		rules = append(rules, rule)
	}

	// 5. Construct the composite rule and return it
	composite := Composite{
		Operation: partial.Operation,
		Rules:     rules,
	}
	return composite, nil
}

This function contains all of the logic needed to parse any rule the system can represent. From simple regex patterns to complex combinations of rules, this single function handles them all. To wrap things up, let’s use ParseRule to fix the broken code from earlier (this is the last code snippet, I swear):

func main() {
	fooOrBar := Composite{
		Operation: OperationOr,
		Rules: []Rule{
			Basic{Pattern: "foo"},
			Basic{Pattern: "bar"},
		},
	}

	encoded, err := json.Marshal(fooOrBar)
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Printf("json: %s\n", encoded)

	decoded, err := ParseRule(encoded)
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Printf("data: %+v\n", decoded)
}

Instead of getting an error, we are now able to encode and decode the fooOrBar rule successfully:

json: {"operation":"or","rules":[{"pattern":"foo"},{"pattern":"bar"}]}
data: {Operation:or Rules:[{Pattern:foo} {Pattern:bar}]}
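As a sanity check, the same function handles arbitrarily nested rules too. Here is a self-contained round trip of the fooAndNotBar rule from earlier; the types and ParseRule are repeated from above so the snippet compiles on its own:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Rule interface{ isRule() }

type Basic struct {
	Pattern string `json:"pattern"`
}

func (Basic) isRule() {}

type Operation string

const (
	OperationAnd Operation = "and"
	OperationOr  Operation = "or"
	OperationNot Operation = "not"
)

type Composite struct {
	Operation Operation `json:"operation"`
	Rules     []Rule    `json:"rules"`
}

func (Composite) isRule() {}

// ParseRule recursively parses a dynamic, polymorphic rule structure.
func ParseRule(data []byte) (Rule, error) {
	// Try the basic rule first.
	var basic Basic
	if err := json.Unmarshal(data, &basic); err != nil {
		return nil, err
	}
	if basic.Pattern != "" {
		return basic, nil
	}

	// Otherwise, partially decode a composite rule.
	var partial struct {
		Operation Operation         `json:"operation"`
		Rules     []json.RawMessage `json:"rules"`
	}
	if err := json.Unmarshal(data, &partial); err != nil {
		return nil, err
	}

	// Recursively parse each sub-rule.
	var rules []Rule
	for _, ruleData := range partial.Rules {
		rule, err := ParseRule(ruleData)
		if err != nil {
			return nil, err
		}
		rules = append(rules, rule)
	}
	return Composite{Operation: partial.Operation, Rules: rules}, nil
}

func main() {
	fooAndNotBar := Composite{
		Operation: OperationAnd,
		Rules: []Rule{
			Basic{Pattern: "foo"},
			Composite{
				Operation: OperationNot,
				Rules:     []Rule{Basic{Pattern: "bar"}},
			},
		},
	}

	encoded, err := json.Marshal(fooAndNotBar)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("json: %s\n", encoded)

	decoded, err := ParseRule(encoded)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("data: %+v\n", decoded)
}
```

The inner not rule comes back as a Composite nested inside the outer one, confirming that the recursion bottoms out at Basic rules no matter how deep the tree goes.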

Conclusion

This post explained and demonstrated how recursive, polymorphic data structures can be converted to and from JSON. While dealing with this sort of dynamic JSON isn’t typically regarded as one of Go’s strong suits, it is doable with a bit of recursion and partial decoding via json.RawMessage. In some ways, I kind of like how specific and verbose the code needs to be in order to wrangle these complex structures. In TypeScript, you could simply call JSON.parse and assert the resulting type. While that is certainly easier, I don’t necessarily think it is better.

With the approach shown by the ParseRule function, the incoming JSON is verified to be completely valid all the way down. Furthermore, the data we get back is a structured Rule, meaning that the rest of the code can depend on its correctness. This idea has been discussed before: see Parse, don’t validate. Due to having recently re-read that post, I had the foresight to write the function as ParseRule and not ValidateRule. The nuance is subtle but quite significant.

Thanks for reading!