Visually Inspect And Force Decode YARA And Regex Matches Found In Both Binary And Text Data, With Colors




Visually inspect all of the regex matches (and their sexier, more cloak and dagger cousins, the YARA matches) found in binary data and/or text. See what happens when you force various character encodings upon those matched bytes. With colors.

Quick Start

pipx install yaralyzer

# Scan against YARA definitions in a file:
yaralyze --yara-rules /secret/vault/sigmunds_malware_rules.yara lacan_buys_the_dip.pdf

# Scan against an arbitrary regular expression:
yaralyze --regex-pattern 'good and evil.*of\s+\w+byte' the_crypto_archipelago.exe

# Scan against an arbitrary YARA hex pattern
yaralyze --hex-pattern 'd0 93 d0 a3 d0 [-] 9b d0 90 d0 93' one_day_in_the_life_of_ivan_cryptosovich.bin

What It Do

  1. See the actual bytes your YARA rules are matching. No more digging around copy/pasting the start positions reported by YARA into your favorite hex editor. The Yaralyzer displays the bytes matched by YARA, plus a configurable number of bytes before and after each match, in both hexadecimal and “raw” Python string representation.
  2. Do the same for byte patterns and regular expressions without writing a YARA file. If you’re too lazy to write a YARA file but are trying to determine, say, whether there’s a regular expression hidden somewhere in the file you could scan for the pattern '/.+/' and immediately get a window into all the bytes in the file that live between front slashes. Same story for quotes, BOMs, etc. Any regex YARA can handle is supported so the sky is the limit.
  3. Detect the possible encodings of each set of matched bytes. The chardet library is a sophisticated library for guessing character encodings and it is leveraged here.
  4. Display the result of forcing various character encodings upon the matched areas. Several default character encodings will be forcibly attempted in the region around the match. chardet will also be leveraged to see if the bytes fit the pattern of any known encoding. If chardet is confident enough (configurable), an attempt at decoding the bytes using that encoding will be displayed.
  5. Export the matched regions/decodings to SVG, HTML, and colored text files. Show off your ASCII art.
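To make items 1 and 2 concrete, here is a minimal sketch of the "match plus surrounding bytes" idea using only Python's standard `re` module. It is not The Yaralyzer's implementation and it does not use YARA; `match_windows` and its `surrounding` parameter are hypothetical names chosen for illustration, and the pattern is a non-greedy variant of the front slash example above.

```python
import re

def match_windows(data: bytes, pattern: bytes, surrounding: int = 8):
    """Yield (offset, matched_bytes, window_bytes) for each regex match.

    Hypothetical helper: the window is the match plus up to `surrounding`
    bytes on each side, clipped to the bounds of the data.
    """
    for m in re.finditer(pattern, data, flags=re.DOTALL):
        lo = max(m.start() - surrounding, 0)
        hi = min(m.end() + surrounding, len(data))
        yield m.start(), m.group(0), data[lo:hi]

data = b"\x00\x01junk /ab+c/ more\xff\xfe"
for offset, matched, window in match_windows(data, rb"/.+?/", surrounding=4):
    print(f"offset {offset}: match={matched!r}")
    print("  hex:", window.hex(" "))   # hexadecimal view of the window
    print("  raw:", repr(window))      # "raw" Python bytes representation
```

Printing both views side by side is what saves the round trip to a hex editor: the offset, the match, and its neighbors are all visible at once.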

Why It Do

The Yaralyzer’s functionality was extracted from The Pdfalyzer when it became apparent that visualizing and decoding pattern matches in binaries had more utility than just in a PDF analysis tool.

YARA, for those who are unaware, is branded as a malware analysis/alerting tool but it’s actually both a lot more and a lot less than that. One way to think about it is that YARA is a regular expression matching engine on steroids. It can locate regex matches in binaries like any regex engine but it can also do far wilder things like combine regexes in logical groups, compare regexes against all 256 XORed versions of a binary, check for base64 and other encodings of the pattern, and more. Maybe most importantly of all YARA provides a standard text based format for people to share their ‘roided regexes with the world. All these features are particularly useful when analyzing or reverse engineering malware, whose authors tend to invest a great deal of time into making stuff hard to find.
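For a feel of what "all 256 XORed versions" means, the sketch below brute-forces every single-byte XOR key in plain Python. This is only an illustration of the idea, not how YARA implements it; `xor_bytes` and `find_xored` are hypothetical helpers, and the sketch XORs the pattern rather than the binary, which finds the same hits.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key (0-255)."""
    return bytes(b ^ key for b in data)

def find_xored(haystack: bytes, needle: bytes):
    """Return (key, offset) pairs where needle, XORed with key, first appears."""
    hits = []
    for key in range(256):
        offset = haystack.find(xor_bytes(needle, key))
        if offset != -1:
            hits.append((key, offset))
    return hits

# Hide a string behind the (arbitrary) key 0x5A, then go looking for it.
hidden = b"header" + xor_bytes(b"evil.example", 0x5A) + b"trailer"
print(find_xored(hidden, b"evil.example"))
```

Key 0 is the un-XORed pattern itself, so a plain-text occurrence would show up in the same scan.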

But… that’s also all YARA does. Everything else is up to the user. YARA’s just a match engine and if you don’t know what to match (or even what character encoding you might be able to match in) it only gets you so far. I found myself a bit frustrated trying to use YARA to look at all the matches of a few critical patterns:

  1. Bytes between escaped quotes (\".+\" and \'.+\')
  2. Bytes between front slashes (/.+/). Front slashes demarcate a regular expression in many implementations and I was trying to see if any of the bytes matching this pattern were actually regexes.

YARA just tells you the byte position and the matched string but it can’t tell you whether those bytes are UTF-8, UTF-16, Latin-1, etc. (or none of the above). I also found myself wanting to understand what was going on in the region of the matched bytes and not just in the matched bytes. In other words I wanted to scope the bytes immediately before and after whatever got matched.

Enter The Yaralyzer, which lets you quickly scan the regions around matches while also showing you what those regions would look like if they were forced into various character encodings.
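As a rough illustration of what "forcing" an encoding means, the sketch below decodes the same bytes under several encodings and records a marker when a decode fails. It uses only the standard library and is an assumption about the general technique, not The Yaralyzer's code; `force_decodings` and the default encoding list are hypothetical. The sample bytes are one sequence the hex pattern in the Quick Start would match.

```python
def force_decodings(raw: bytes, encodings=("utf-8", "utf-16-le", "latin-1")):
    """Try each encoding on raw; map encoding name -> decoded text or an error marker."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError as e:
            results[enc] = f"<undecodable: {e.reason}>"
    return results

sample = bytes.fromhex("d0 93 d0 a3 d0 9b d0 90 d0 93")
for enc, text in force_decodings(sample).items():
    print(f"{enc:>10}: {text!r}")
```

The same ten bytes read as five Cyrillic letters in UTF-8, five unrelated characters in UTF-16, and ten characters of mojibake in Latin-1, which is exactly why seeing the candidates side by side is useful.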

It’s important to note that The Yaralyzer isn’t a full-on malware reversing tool. It can’t do all the things a tool like CyberChef does and it doesn’t try to. It’s more intended to give you a quick visual overview of suspect regions in the binary so you can home in on the areas you might want to inspect with a more serious tool like CyberChef.

Install it with pipx or pip3. pipx is a marginally better solution as it guarantees any packages installed with it will be isolated from the rest of your local python environment. Of course if you don’t really have a local python environment this is a moot point and you can feel free to install with pip/pip3.

Run yaralyze -h to see the command line options (screenshot below).

[Screenshot: output of yaralyze -h]

For info on exporting SVG images, HTML, etc., see Example Output.

Configuration

If you place a file called .yaralyzer in your home directory or the current working directory then environment variables specified in that .yaralyzer file will be added to the environment each time yaralyzer is invoked. This provides a mechanism for permanently configuring various command line options so you can avoid typing them over and over. See the example file .yaralyzer.example to see which options can be configured this way.

Only one .yaralyzer file will be loaded and the working directory’s .yaralyzer takes precedence over the home directory’s .yaralyzer.
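The precedence rule can be sketched in a few lines of Python. This is a hypothetical illustration of "working directory wins, only one file is loaded", not the tool's actual loading code; `pick_config_file` is an invented name.

```python
from pathlib import Path
from typing import Optional

def pick_config_file(cwd: Path, home: Path, name: str = ".yaralyzer") -> Optional[Path]:
    """Return the single config file to load, or None if neither directory has one.

    The working directory is checked first, so its file shadows the one in home.
    """
    for directory in (cwd, home):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None

# Prints the path that would be chosen on this machine, or None.
print(pick_config_file(Path.cwd(), Path.home()))
```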

As A Library

Yaralyzer is the main class. It has a variety of constructors supporting:

  1. Precompiled YARA rules
  2. Creating a YARA rule from a string
  3. Loading YARA rules from files
  4. Loading YARA rules from all .yara files in a directory
  5. Scanning bytes
  6. Scanning a file

Should you want to iterate over the BytesMatch (like a re.Match object for a YARA match) and BytesDecoder (tracks decoding attempt stats) objects returned by The Yaralyzer, you can do so like this:

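from yaralyzer.yaralyzer import Yaralyzer

# Load YARA rules from one or more files and point them at the file to scan.
yaralyzer = Yaralyzer.for_rules_files(['/secret/rule.yara'], 'lacan_buys_the_dip.pdf')

# Each iteration yields a BytesMatch and the BytesDecoder for that match.
for bytes_match, bytes_decoder in yaralyzer.match_iterator():
    print(bytes_match, bytes_decoder)  # replace with whatever you want to do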

The Yaralyzer can export visualizations to HTML, ANSI colored text, and SVG vector images using the file export functionality that comes with Rich. SVGs can be turned into png format images with a tool like Inkscape or cairosvg. In our experience they both work though we’ve seen some glitchiness with cairosvg.

PyPI Users: If you are reading this document on PyPI be aware that it renders a lot better over on GitHub. Pretty pictures, footnotes that work, etc.

Raw YARA match result:

[Screenshot: raw YARA match result]

Display hex, raw python string, and various attempted decodings of both the match and the bytes before and after the match (configurable):

[Screenshot: hex, raw Python string, and attempted decodings of a match and its surrounding bytes]

Bonus: see what chardet.detect() thinks about the likelihood your bytes are in a given encoding/language:

[Screenshot: chardet.detect() results]

  • highlight decodes done at chardet’s behest
  • deal with repetitive matches
