How we improved Packagemap query performance by 4x
Packagemap is a powerful tool that allows you to explore maps of your Java codebase. Its query language lets you filter your code and drill down to specific classes, methods, and their interactions. However, as the complexity of your queries increases, the tool’s performance can suffer.
In this blog post, we’ll discuss how we used Go profiling tools to diagnose performance issues with Packagemap’s query execution, and how we improved its performance by building a custom wildcard matcher to replace regular expression matching.
Diagnosing the performance issues
There were several possible causes of the query and rendering slowness in Packagemap. The first step in addressing this was to pinpoint which parts of the tool were taking the most time. To do this, we used the Go CPU profiler and the built-in benchmark tests:
go test -bench=. -test.cpuprofile=CPU.prof
go tool pprof -http=localhost:8080 CPU.prof
The flamegraph shows that a significant portion of the processing time was spent in regexp.MatchString, which we knew was not ideal for performance.
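For readers who have not written one, a benchmark of the kind that feeds such a profile might look like the sketch below. It is not Packagemap's actual benchmark: the sample names, the countMatches function, and the pattern are hypothetical stand-ins. In a real project the Benchmark function would live in a _test.go file and be run by go test; the main function here only exists so the sketch runs on its own.

```go
package main

import (
	"fmt"
	"regexp"
	"testing"
)

// names is hypothetical sample data standing in for class and method names.
var names = []string{
	"myapp.calculator.add",
	"myapp.calculator.subtract",
	"other.service.add",
}

// countMatches is a hypothetical function under test: it counts how many
// names match a regular expression pattern.
func countMatches(pattern string, names []string) int {
	n := 0
	for _, name := range names {
		// The package-level regexp.MatchString compiles the pattern on
		// every call, so this loop pays the compile cost per name.
		if ok, _ := regexp.MatchString(pattern, name); ok {
			n++
		}
	}
	return n
}

// BenchmarkCountMatches is the shape `go test -bench=.` looks for.
func BenchmarkCountMatches(b *testing.B) {
	for i := 0; i < b.N; i++ {
		countMatches(`^my.*calculator.*add$`, names)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	res := testing.Benchmark(BenchmarkCountMatches)
	fmt.Println(res)
}
```

Running the same benchmark with the CPU profile flag shown above produces the profile that pprof renders as a flamegraph.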
Understanding how queries work in Packagemap
Packagemap’s query language supports two types of queries: prefix queries and wildcard queries.
Prefix queries match class and method names based on a specified prefix, while wildcard queries use the * and $ operators to match any part of the name.
For example, to match any class or method in your codebase starting with my, containing calculator, and ending in add, you would use the query my*calculator*add$.
Initially, we were using regex to parse these queries and match them against class and method names in the codebase.
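A regex-based approach of this kind could be sketched as follows. This is an illustration, not Packagemap's original code: the queryToRegexp name is hypothetical, and the semantics are assumed from the description above (the query is anchored at the start of the name, * matches any run of characters, and a trailing $ anchors the end).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// queryToRegexp (hypothetical helper) translates a wildcard query into a
// compiled regular expression under the assumed semantics described above.
func queryToRegexp(query string) (*regexp.Regexp, error) {
	anchored := strings.HasSuffix(query, "$")
	query = strings.TrimSuffix(query, "$")

	// Split on the wildcard and escape each literal part, so characters
	// such as "." in a package name are matched literally.
	parts := strings.Split(query, "*")
	for i, p := range parts {
		parts[i] = regexp.QuoteMeta(p)
	}

	pattern := "^" + strings.Join(parts, ".*")
	if anchored {
		pattern += "$"
	}
	return regexp.Compile(pattern)
}

func main() {
	re, err := queryToRegexp("my*calculator*add$")
	if err != nil {
		panic(err)
	}
	fmt.Println(re.String())                                 // ^my.*calculator.*add$
	fmt.Println(re.MatchString("myapp.calculator.add"))      // true
	fmt.Println(re.MatchString("myapp.calculator.subtract")) // false
}
```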
Improving query performance with a custom matcher
Regex matching can be slow, so we built a custom matcher to replace it. The key insight was that the most important part of a query is the literal text it contains, rather than the wildcard operators.
To illustrate, consider the example query my*calculator*add$.
We know that this query starts with a prefix of my, ends with a suffix of add, and has a single part in the middle, calculator.
By encoding these rules into our custom matcher, we can efficiently match the query without using regex.
For example, we can match the prefix and suffix of the class or method name and trim them off, then check that the remaining text contains calculator.
If the query is more complex, such as my*calculator*device*add$, we can still match the prefix and suffix and trim them off.
Then, we can check that the remaining text contains both calculator and device, in the correct order.
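The steps above can be sketched in Go roughly as follows. This is a minimal illustration under assumptions, not Packagemap's actual implementation: the wildcardQuery type and its fields are hypothetical, and the query semantics (anchored at the start, * matches anything, trailing $ anchors the end) are inferred from the examples in this post.

```go
package main

import (
	"fmt"
	"strings"
)

// wildcardQuery is a hypothetical pre-parsed form of a query such as
// "my*calculator*add$".
type wildcardQuery struct {
	prefix      string   // literal text before the first `*`
	middle      []string // literal parts that must appear in order
	suffix      string   // literal text after the last `*`, if anchored
	anchored    bool     // the query ended with `$`
	hasWildcard bool     // the query contained at least one `*`
}

// parseQuery splits the query into its literal parts once, up front.
func parseQuery(query string) wildcardQuery {
	q := wildcardQuery{anchored: strings.HasSuffix(query, "$")}
	parts := strings.Split(strings.TrimSuffix(query, "$"), "*")
	q.prefix = parts[0]
	q.hasWildcard = len(parts) > 1
	parts = parts[1:]
	if q.anchored && q.hasWildcard {
		q.suffix = parts[len(parts)-1]
		parts = parts[:len(parts)-1]
	}
	q.middle = parts
	return q
}

// match checks the prefix and suffix, trims them off, then looks for each
// middle part in order in whatever text remains.
func (q wildcardQuery) match(name string) bool {
	if !strings.HasPrefix(name, q.prefix) {
		return false
	}
	rest := name[len(q.prefix):]
	if q.anchored {
		if !q.hasWildcard {
			return rest == "" // no `*`: the name must equal the prefix
		}
		if !strings.HasSuffix(rest, q.suffix) {
			return false
		}
		rest = rest[:len(rest)-len(q.suffix)]
	}
	for _, part := range q.middle {
		i := strings.Index(rest, part)
		if i < 0 {
			return false
		}
		rest = rest[i+len(part):] // later parts must come after this one
	}
	return true
}

func main() {
	q := parseQuery("my*calculator*device*add$")
	fmt.Println(q.match("my.calculator.device.add")) // true
	fmt.Println(q.match("my.device.calculator.add")) // false: wrong order
}
```

Parsing the query once and reusing the parsed form for every name keeps the per-name work down to a handful of plain string comparisons.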
The result of these optimizations is a custom matcher that is much faster than regex. With this new implementation, we can ensure that Packagemap stays fast and responsive, no matter the size of your codebase or the complexity of your queries.
Measuring the improved performance
To validate our optimizations, we re-ran our performance tests and generated a new flamegraph. The results were impressive: the test run time dropped from 2170ms to 450ms, a speedup of more than 4x (roughly 4.8x).
We can see the improvement in the updated flamegraph.
Keeping Packagemap fast and efficient
We believe that tools that work with large codebases should be fast and responsive. By replacing regex with a custom wildcard matcher, we were able to significantly improve Packagemap’s performance, making it faster and more efficient for developers to explore their codebases.
If you’re interested in exploring Packagemap, you can get started for free at https://packagemap.co