During analysis, we often come across samples containing strings in a foreign language. Strings can be time-saving low-hanging fruit, and when they’re meaningful, they should be used to assist analysis whenever possible. But when they’re unintelligible, or worse, appear as jumbled bytes because of poor encoding support, analysis often slows down.
The IDA Translator plugin by the folks over at Kyrus tries to solve this problem. This plugin “assists in decoding arbitrary character sets in an IDA Pro database into Unicode”. It uses Google Translate for the final translation step. While this plugin is great, it has two requirements that can get in the way: a Google API key to perform the translations, and an Internet-connected analysis system. This may not be much of an obstacle to some, but others have more stringent requirements regarding Internet access, such as air-gapped systems.
We produced a simple, easily modifiable plugin for IDA to help cope with samples containing the foreign language we see the most: Simplified Chinese encoded in GB2312. Our Chinese-to-English translation plugin, ce_xlate, can display GB2312 strings as UTF-8 characters in several ways and, when combined with the CC-CEDICT dictionary from MDBG, presents possible translations to the analyst.
Figure: Disassembler view of translation using the ce_xlate plugin.
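To make the decode-then-look-up idea concrete, here is a minimal standalone sketch in Python 3. It is not the ce_xlate source and does not touch IDA at all. The function names, the greedy longest-match segmentation, and the three sample lines are assumptions made for illustration. The only thing taken as given is the CC-CEDICT entry layout (Traditional, Simplified, bracketed pinyin, slash-separated glosses).

```python
import re

# CC-CEDICT entry layout: Traditional Simplified [pinyin] /gloss 1/gloss 2/
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")

# Illustrative sample lines only; a real run would read the full CC-CEDICT file.
SAMPLE_CEDICT = """\
# comment lines start with a hash
你好 你好 [ni3 hao3] /hello/hi/
世界 世界 [shi4 jie4] /world/
"""


def load_cedict(lines):
    """Build a {simplified word: [glosses]} table from CC-CEDICT lines."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        match = ENTRY_RE.match(line)
        if match is None:
            continue  # skip anything that does not look like an entry
        _trad, simp, _pinyin, glosses = match.groups()
        table.setdefault(simp, []).extend(glosses.split("/"))
    return table


def translate_gb2312(raw, table):
    """Decode GB2312 bytes and return [(word, glosses)] pairs.

    Segmentation is a simple greedy longest match (an assumption of this
    sketch); characters with no dictionary entry get an empty gloss list.
    """
    text = raw.decode("gb2312", errors="replace")
    results = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            word = text[i:j]
            if word in table:
                results.append((word, table[word]))
                i = j
                break
        else:
            results.append((text[i], []))
            i += 1
    return results
```

Calling `translate_gb2312(raw_bytes, load_cedict(open_file))` on a string's raw bytes yields every candidate gloss for each matched word, which is the kind of list an analyst would then choose from.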
By default, the plugin prints each symbol’s possible translations to the console and rewrites the first byte of the string using IDA’s ability to set manual instructions. It can also automatically add comments with each translation option next to the symbol’s bytes. Additionally, the plugin can be easily modified to handle other string encodings by using the decode() method of Python’s str object. Most importantly, thanks to the CC-CEDICT translation dictionary, it can be used offline on systems disconnected from the Internet.
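As a rough illustration of how little has to change to support another encoding, the sketch below tries a list of candidate codecs against the same raw bytes. The helper name and the candidate list are hypothetical and not part of ce_xlate. It is written for Python 3, where decode() is a method of bytes rather than str.

```python
def decode_candidates(raw, encodings=("gb2312", "big5", "shift_jis", "euc_kr")):
    """Return {encoding: text} for every candidate that decodes without error.

    Strict decoding is used on purpose: a codec that rejects the bytes is
    dropped, which narrows down the encodings worth reviewing by eye.
    """
    decoded = {}
    for name in encodings:
        try:
            decoded[name] = raw.decode(name)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
    return decoded
```

More than one codec may accept the same bytes, so the result is a set of candidates for the analyst to compare, not a definitive answer.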
The plugin can be downloaded from https://github.com/threatgrid/ce-xlate