Hunting With YARA

The following is a series of mini-tutorials to help you get the most out of your YARA rules!

Introduction

YARA is an open-source tool used for identifying and classifying malware samples. It's essentially a pattern matching tool that allows researchers to create descriptions of malware families (rules) based on strings or binary patterns. These rules can then be used with the YARA engine to scan files for matches.

In addition to the open-source YARA tool, the YARA engine has been integrated into many antivirus and EDR products, providing a way to scan enterprise endpoints using YARA rules.

If you are new to YARA we recommend starting with the official YARA documentation.

Your First YARA Rule
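
As a starting point, a minimal rule might look like the sketch below. The rule name, the metadata values, and both patterns are placeholders invented for illustration; they are not indicators for any real malware family.

```yara
rule Example_First_Rule
{
    meta:
        description = "Illustrative rule only; the strings below are placeholders"
        author = "your name here"

    strings:
        // A text string, searched for as both ASCII and UTF-16LE (wide)
        $text = "example_mutex_name" ascii wide

        // A hexadecimal (binary) string
        $hex = { DE AD BE EF 13 37 C0 DE }

    condition:
        // Match when either pattern is present anywhere in the file
        any of them
}
```

Saved as first_rule.yar, a rule like this can be run with the command-line tool as `yara first_rule.yar sample.bin`, or as `yara -r first_rule.yar samples/` to recurse through a directory.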

Sharing Rules

One of the key advantages of YARA is the portability of the rules. YARA rules that are developed and shared with the community can be used by any tool that supports YARA to scan for malware. In essence, YARA rules are a form of portable, actionable threat intelligence. The following are some open-source YARA rule repositories used for sharing rules with the community.

YARA Use Cases

At UnpacMe we roughly divide rules into three use cases, each one with its own unique set of advantages and constraints.

Identifying specific malware families (unpacked)

These rules are used to identify specific malware families and are often used in malware processing platforms such as sandboxes to identify malware in-memory. They are often distributed via threat intelligence feeds or individual researchers who are tracking specific malware families.

Advantages

Identify specific malware families with high fidelity
Used with sandboxes and other malware processing platforms to categorize malware

Constraints

Malware must be unpacked, or the rule must be run on process memory that contains the unpacked malware
Often not useful for enterprise scanning with EDR/AV unless the tools support in-memory scanning
False positives in rules have significant impact – a false positive can lead to a misclassification of malware

Identifying malware on disk or in network traffic (packed)

In general these rules will be used with AV or EDR products to identify badness in an enterprise. They can be thought of as typical malware signatures and are generally developed to identify packed samples.

Advantages

Identify packed malware on disk
Often used to identify scripts and other early stage delivery artifacts
Used with EDR/AV to scan enterprise for malware

Constraints

Short life span – malware developers continuously change their packers to prevent detection, so rules matching on the packers quickly become outdated
Often incapable of identifying specific malware families with high fidelity
False positives in rules have significant impact – often these rules are used to trigger alerts in SIEM/SOC platforms and require analyst attention

Hunting (malware characteristics)

These rules are used for hunting potentially malicious files and can tolerate a high false positive rate. Often the rules will focus on a specific characteristic that may indicate maliciousness, such as embedded bitcoin addresses. These characteristics are not exclusive to malware, but they are useful for hunting.

Advantages

Used to hunt for potential malware
Can be loose and developed quickly
False positives are ok

Constraints

Results must be triaged by an analyst
Low fidelity means they cannot be used for blocking or quarantine processes
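
A hunting rule for the embedded bitcoin address characteristic mentioned above might be sketched as follows. The regular expression is an assumption that only approximates the shape of a legacy address (Base58 text beginning with 1 or 3); it will also match unrelated strings, which is acceptable here because an analyst triages the results.

```yara
rule Hunt_Embedded_Bitcoin_Address
{
    meta:
        description = "Hunting sketch: PE files containing text shaped like a legacy Bitcoin address"

    strings:
        // Base58 text starting with 1 or 3, 26 to 35 characters long.
        // There is no fixed substring here, so expect this to be slow and noisy.
        $btc = /\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b/

    condition:
        // Restrict results to files that start with the PE magic bytes
        uint16(0) == 0x5A4D and $btc
}
```

This is deliberately loose: it trades fidelity for speed of development, so its matches should feed a triage queue rather than a blocking or quarantine process.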

Writing Efficient Rules

YARA rule efficiency generally refers to the amount of compute required to process a file with a given YARA rule. This often translates into scan time: how long a scan with that rule takes. Scan time also depends on the file being scanned, but rules can be crafted to give them the best possible chance of being efficient when scanning unknown files.

Though always important to consider, efficiency must sometimes take a back seat to fidelity. For example, when developing a rule to identify a specific malware family whose binaries constantly change in size, it may be necessary, though less efficient, to omit a size restriction from the rule.

When discussing efficiency it is important to consider how the YARA engine works under the hood. In the words of noted YARA contributor Wesley Shields:

I encourage people to think of YARA as a two-step process.

Step 1 is just searching for all the patterns listed in the rules, regardless of where they are in the file (with a couple of small exceptions).

Step 2 is to evaluate the conditions in each rule. In most cases the string you are searching for is independent of the condition in which it is used.

When this two-step process is considered, it becomes clear that attention must be paid to both the types of strings included in a rule and the conditions in the rule, as both can independently affect the efficiency of the rule. It is also clear that conditions cannot be used to short-circuit poorly chosen strings, except in some cases unique to the UnpacMe service which we will detail later in this post.
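
The two steps can be seen in even a very small rule. In this sketch the rule name and the URL are hypothetical; the comments mark which part of the rule belongs to which step.

```yara
rule Two_Step_Illustration
{
    strings:
        // Step 1: this pattern is searched for in every scanned file,
        // PE or not, before any condition is looked at.
        $url = "http://example.invalid/gate.php"

    condition:
        // Step 2: evaluated only after the string search has finished.
        // The magic byte check decides the result for non-PE files,
        // but it does not spare them the search for $url.
        uint16(0) == 0x5A4D and $url
}
```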

Though there are many small adjustments that can make a rule more efficient, the following key factors have the greatest influence on efficiency: string choice, module use, and condition order.

We also recommend the following reading for a more in-depth understanding of YARA efficiency.

String Choice

Because YARA scans an entire file for all string patterns before applying conditions, string choice can significantly influence the efficiency of a rule. To understand what makes a good string choice, it helps to understand how YARA scans for strings.

Before scanning a file YARA breaks all strings in the rule into atoms of up to four bytes in length. These atoms are chosen using an algorithm that attempts to select the longest and most unique pattern of bytes. Once the atoms are extracted from the rule, YARA scans the file using the Aho-Corasick algorithm, locating all occurrences of the atoms. For each atom that has been located, YARA then determines whether the full string matches at the position in the file where the atom was detected. It is important to note that this process occurs for all string types in a rule, including hexadecimal strings and regular expressions.

With this process in mind the following may be considered when selecting strings for a YARA rule.

Avoid short strings. The shorter the string the more likely it is to occur in multiple locations and slow down scanning. Strings shorter than an atom (4 bytes) should be avoided at all costs.
Avoid breaking hexadecimal strings into sections smaller than an atom (4 bytes). Binary strings allow a lot of flexibility with wildcards and jumps, but only the bytes between these operators can be used to extract atoms.
Do not use leading or trailing wildcards in hexadecimal strings.
Avoid regular expressions. If a regular expression is required the efficiency can be improved by adding anchor text to the expression which can be used to generate an atom. Regular expressions with no fixed substrings will not generate any atoms and are the least efficient type of string.
Avoid strings with a single repeating byte. A string that only contains a single repeated byte will not generate efficient atoms regardless of its length.
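Use the nocase string modifier sparingly. Matching without regard to case multiplies the patterns YARA has to search for: a run of n letters has up to 2^n upper/lower-case variants (not n!), so a single four-letter atom can expand to as many as 16. This can generate a lot of atoms, depending on the strings in the rule.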
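
The guidelines above can be sketched in a single rule. All patterns are hypothetical placeholders; the counterparts to avoid are kept in comments so that the rule itself stays clean.

```yara
rule String_Choice_Sketch
{
    strings:
        // Text long enough to yield a distinctive 4-byte atom.
        // Avoid: "MZ" (two bytes, found almost everywhere)
        $text = "example_config_marker"

        // Every fixed run between wildcards and jumps is at least 4 bytes,
        // and the pattern starts and ends on concrete bytes.
        // Avoid: { ?? 8B 45 [2-4] 6A 00 ?? }
        $hex = { 8B 45 08 89 45 FC [2-4] 6A 00 68 00 10 00 00 }

        // Regular expression anchored by fixed text ("user_agent=").
        // Avoid: /[A-Za-z0-9]{8}/ (no fixed substring to build an atom from)
        $re = /user_agent=[A-Za-z0-9]{8}/

        // Avoid: { 00 00 00 00 00 00 00 00 } (a single repeated byte)

    condition:
        all of them
}
```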

Module Use

YARA has many powerful modules that can be used to augment rules; however, modules can come at a cost. The general rule is to import a module only if your rule uses multiple features from it. A module included in a rule parses the entire file before conditions are applied, so including a module when a simple condition would suffice can be very inefficient.

A good example is the pe.is_pe feature of the PE module. Importing the module only to check whether a file is a PE is very inefficient when a simple magic byte check, uint16(0) == 0x5A4D, would be sufficient in most cases.
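
The trade-off can be sketched with three small rules. The rule names, the section-count threshold, and the chosen import are hypothetical, picked only to show the contrast. Note that in practice the first and third rules would not share a file with the second, since the import applies to the whole rule file.

```yara
import "pe"

// Imports the PE module only to ask one question.
rule Is_PE_Via_Module
{
    condition:
        pe.is_pe
}

// The same filter as a magic byte check, with no module involved.
rule Is_PE_Via_Magic
{
    condition:
        uint16(0) == 0x5A4D
}

// A case where the import earns its cost: several PE features are used together.
rule Uses_Several_PE_Features
{
    condition:
        pe.is_pe and
        pe.number_of_sections > 6 and
        pe.imports("kernel32.dll", "VirtualAlloc")
}
```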

Condition Order Short-Circuit

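The order of conditions matters. Conditions are evaluated from left to right, and in a chain joined by and the first failed condition short-circuits the evaluation, so the remaining conditions in that rule are skipped. This does not undo the string search, which has already run, but it does avoid the cost of evaluating the later conditions. Rules can be made more efficient by placing filter conditions and conditions that are simple to evaluate first.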

For example, when developing a rule to match on PE files it is efficient to start the condition logic with a simple PE magic byte check uint16(0) == 0x5A4D to filter all non-PE files prior to adding the rest of the condition logic for the rule.
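
A minimal sketch of this ordering, using hypothetical strings, places the cheapest check first and a loop over string occurrences last:

```yara
rule Condition_Order_Sketch
{
    strings:
        $a = "example_marker_one"
        $b = "example_marker_two"
        $c = { 68 00 10 00 00 6A 00 FF 15 }

    condition:
        // 1. Cheap filter: a two-byte read at offset 0
        uint16(0) == 0x5A4D and
        // 2. Simple string presence checks
        2 of ($a, $b, $c) and
        // 3. Most expensive part last: a loop over every occurrence of $c
        for any i in (1..#c) : ( @c[i] < 0x1000 )
}
```

For a non-PE file the first check fails and neither the string count nor the loop is evaluated.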

UnpacMe Specific Considerations

Though compatible with the standard YARA engine the UnpacMe YARA scanner has some enhancements that allow for more control of the scanning process. These can be used to drive efficiency in rules where it would normally not be possible.

Unlike standard YARA, UnpacMe evaluates file size before scanning files. This means that file size limits included in a YARA rule can be used to pre-filter files prior to scanning. Including file size limits in a rule can dramatically improve efficiency.
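
A rule with explicit size limits might look like the sketch below. The bounds and the marker string are hypothetical, and it is an assumption here that plain filesize comparisons joined with and are the form the pre-filter picks up.

```yara
rule Filesize_Bounded_Sketch
{
    strings:
        $marker = "example_family_marker"

    condition:
        // Hypothetical bounds; choose limits from the samples you have observed
        filesize > 20KB and filesize < 2MB and
        uint16(0) == 0x5A4D and
        $marker
}
```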

UnpacMe also offers a Scan Assist setting that will dynamically tailor the scan parameters based on real-time feedback during the scan. For example, if the scan engine observes that only EXE files are matching on a rule, the engine will dynamically filter out all non-EXE files for the remainder of the scan, even if the rule does not include a condition to explicitly filter non-EXE files. Scan Assist can be used to automatically improve rule efficiency.
