Protecting C++ Source Code

January 2, 2019

Protecting C++ Source Code & Trade Secrets

C++ source code theft can cause catastrophic loss of revenue and investment, and the tools used on both sides of the contest between software developers and rebuilders keep growing in sophistication. The crime at hand is the theft of intellectual property (IP), and it can cripple or destroy a company. Stolen IP in the form of C++ source code can be especially damaging to an enterprise whose lifeblood is a unique product or service offered as a web service or mobile app. In the case of the once wildly popular video game Half-Life, the losses were estimated in the hundreds of millions of dollars.

Whether such a loss can be prevented absolutely is a debate deserving a CIO’s undivided attention. The business problem of C++ source code security is complex and defies common assumptions. For example, a compiled app running on an IoT device can be disassembled and decompiled into human-readable code that approximates the original C++ source! In other words, a compiled app is not safe from theft. The secret formulas and patented innovations built into algorithms can be deciphered by determined rebuilders today, so additional preventive measures such as obfuscation are required to protect C++ source code from theft and tampering.

Once the machine code running on a device is decompiled, a company’s fiercely guarded IP secrets are exposed. Seen one way, these algorithms are the most concise description of how an operational process works. This could take the form of a unique procedure for scheduling rides in a taxi app like Grab. Such a decompiled app may also contain API keys to payment platforms and other crucial libraries bundled together for production. The cleverness encoded in that app is what gives the company its competitive edge. If that algorithm is copied, that competitive edge is lost. Half-Life lost its edge, but as we will show, this loss is often preventable.

Fortunately, many developer tools are available to hinder the direct reverse engineering of machine code. Here, we will focus in depth on obfuscation strategies for protecting C++ application code. We will explore best practices for safeguarding C++ source code, and raise awareness of emerging trends among rebuilders along the way.

Motivating Awareness of C++ Source Code Theft

The Theft of Trade Secrets Clarification Act of 2012 expanded the reach of prosecutable IP theft to include C++ source code and other application software services. Awareness of weaknesses in IP protection under law was raised when a Russian coder was accused of stealing C++ source code from Goldman Sachs, which amounted to the lion’s share of the company’s trading strategies, called Strats in the financial industry. And the company fought like a lion to recapture its lost secrets. The coder languished in prison through a series of contradictory court rulings.

The Strats were regarded as IP by the firm, but the court’s interpretation did not concur. Ultimately, it took a literal Act of Congress to raise awareness of the diverse types of IP coded into software which were not then protected under US law. IP law is now more rigorous in treating software as IP, but firms with significant resources dedicated to C++ application development must nevertheless remain vigilant against C++ source code theft by both internal and external interlopers. This is true because creative inventions defy our ability to define and protect them in advance!

Innovation itself contains the key to understanding C++ source code protection. Every great innovation must inherently challenge our standards. Because we are looking at something totally new, it follows logically that laws are not yet developed to protect the invention, except in vague terms. In the aforementioned case, vague terms did not satisfy the victim! We commission developers to invent new apps, not knowing whether they will succeed. The resulting C++ source code contains those inventions, but in a subtle way which makes it difficult to understand the diverse ways it can be lost or stolen. NDAs are not satisfying after vital trade secrets are blowing in the wind.

One important way code is lost, which we observe regularly in the news, is through design flaws which leave back doors to repos wide open. There are intentional and accidental back doors. Aeroflot famously left its entire source code base unsecured and open to public view on a staging server! Under pressure during Agile sprints, important security details may be overlooked. The ordinary developer habit of making backup copies of work can suddenly look nefarious, and even take on the pall of a criminal act, when that employee changes jobs! Inevitably, developers must be trained in secure coding and compliance practices. How can developers systematically protect IP in the form of C++ source code?

The Goldman Sachs employee copied thousands of lines of C++ source code shortly before leaving the company for another trading firm. This was a case of C++ source code theft which absolutely altered our interpretation of software, and the subsequent reaction redefined IP forever. How can this disaster be prevented?

Beyond the Obvious Encrypted Cleartext

To begin, we will assume that every obvious effort has been taken to prevent internal and external threats. We are looking beyond the discussion about encrypting cleartext files stored in virtual private clouds for builds and testing, and we assume that your C++ source code is protected in all stages of development prior to compilation and delivery to production.

In other words, your developers already keep all C++ source code encrypted wherever it is stored, and secrets such as encryption keys and OAuth keys are also secured and never scripted or hard coded into builds or test suites.

Recently a new standard was set by a mainframe server unit which includes encryption coprocessors and guarantees all code and data are encrypted 100% of the time. But this costly solution is not available to all, and so additional diligence is required on the part of developer teams.

Secure Coding Practices

When you open a banking app on your phone, do you ever consider that the app is actually querying data directly from your bank? In a sense, your phone is like a little ATM! What stops a rebuilder from decompiling the online banking mobile app and tampering with accounts? Technically it is possible, of course. Naturally, banks implement the highest level of security features to protect accounts and code. Best practices in protecting C++ source code begin with the developers. Here are some of the methods used by enterprises to secure code:

  • Code obfuscation
  • Code optimization
  • Pre-verification of classes
  • Don’t hardcode credentials
  • Security-centric versioning
  • Secure private repositories
  • Secure package managers

Three-factor dynamic authentication is one such method to secure online data. But are you aware that the same apps often contain secret API keys and other authentication credentials? Looking at recent security breaches like Uber and Equifax, you may be astonished to learn that coders routinely hardcode login credentials into scripts so that their continuous integration (CI) pipelines can run automatically.
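As a minimal sketch of the alternative, a C++ program can read a credential from its environment at run time instead of embedding it in the source. The helper name read_secret and the variable name PAYMENT_API_KEY mentioned below are hypothetical, chosen only for illustration, and std::optional assumes a C++17 compiler.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper: returns the value of an environment variable,
// or std::nullopt when the variable is unset or empty.
std::optional<std::string> read_secret(const char* name) {
    const char* value = std::getenv(name);
    if (value == nullptr || *value == '\0') {
        return std::nullopt;
    }
    return std::string(value);
}
```

A caller might ask for read_secret("PAYMENT_API_KEY") and refuse to start when nothing comes back. The CI system then supplies the value from its own secret store, so the key never appears in the repository or in the compiled binary.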

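At the code level, the following techniques can make a compiled C++ app harder to reverse engineer:

  • Functional programming paradigm
  • Function trampolining
  • Thunks (delay a calculation until it is needed)
  • Dummy functions (which are actually called)
  • Debugger runtime checks
  • Extra casting and classes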

Functional Programming Paradigm for Security

Although C++ is not strictly a functional programming language, FP paradigm concepts can be implemented in C++ to make code much more difficult to reverse engineer. A feature of FP, which is a declarative programming paradigm, is that recursion is used rather than loops that update mutable state. Code written this way is often easier to make reliable; Erlang, a functional language, is well known for highly reliable telephony systems. But recursive algorithms can be more difficult to read, and for our current purposes this has the advantage of making them more difficult to reverse engineer! A couple of hallmarks of the FP paradigm are:

  • Functions are treated as data: they can be stored, passed around and returned like any other value.
  • Recursion is used instead of loops.

Let’s have a look at an example of recursion in Erlang. This simple recursive algorithm computes the factorial of an input:

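-module(factorial).
-export([fac/1, start/0]).

% Base case: the factorial of 0 is 1.
fac(0) -> 1;
% Recursive case: N! = N * (N-1)!
fac(N) when N > 0 -> N * fac(N - 1).

% Entry point: prints 24.
start() ->
    X = fac(4),
    io:fwrite("~w~n", [X]).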

A mechanical point about recursion which makes comprehension arduous is the recursion stack. A rebuilder will need to simulate the stack in order to fully trace all possible outcomes.
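Since our subject is C++, here is a rough C++ counterpart of the Erlang fac/1 function above. It is only a sketch: the name fac is carried over from the Erlang example, and the 64-bit return type is our own choice.

```cpp
#include <cstdint>

// Recursive factorial, mirroring the Erlang fac/1 above.
// Valid for 0 <= n <= 20; 21! no longer fits in 64 unsigned bits.
std::uint64_t fac(unsigned n) {
    return n == 0 ? 1 : n * fac(n - 1);
}
```

Keep in mind that an optimizing compiler may turn simple recursion like this back into a loop, so recursion alone should not be counted on to survive into the binary.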

Here we have a literal paradigm shift: we are talking about using secure programming practices to motivate a move toward more reliable programming outcomes! The benefits are obvious, but significant effort will be required to encourage coders to change their ways.

Thinking about Thunks!

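Function trampolines and “thunks” are important measures for reducing the options of the decompiler and rebuilder. These methods simply make it difficult to determine the absolute outcome of branches. Tail call optimization lets a function call made as the very last step of another function reuse the current stack frame instead of pushing a new one. Some compilers support this. When calls are chained this way, a decompiler that does not know the state in advance finds the code more difficult to interpret absolutely. As a baseline, have a look at this ordinary loop-based function, which builds a range of integers from s to e; it is the kind of routine that can be rewritten as a chain of tail calls: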

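// Builds the list of integers from s to e inclusive, counting up or down.
function my_range(s, e) {
    var result = [];
    while (s != e) {
        result.push(s);
        s < e ? s++ : s--;  // step toward e
    }
    result.push(s);  // include the end point
    return result;
}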

Normally, when a function calls another, the state of the caller must be preserved on the stack so that execution can resume there afterwards. However, there is a special situation in which that state need not be preserved. This is known as a tail call. When the call is the last thing a function does, its result can be returned directly and the caller’s state can be discarded. This may be used, among other methods, to disrupt the reverse engineering of C++ source code.
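To make the idea concrete, here is a sketch of the same range function written in C++ in trampolined style. Instead of looping or calling itself directly, each step returns a description of the next step, and a small driver loop (the trampoline) keeps bouncing until the work is done. The names Step and range_step are our own, and the code assumes a C++14 compiler; treat it as an illustration rather than a hardened technique.

```cpp
#include <functional>
#include <utility>
#include <vector>

// One step of a trampolined computation: either a finished result,
// or a parameterless function that produces the next step.
struct Step {
    bool done;
    std::vector<int> result;
    std::function<Step()> next;
};

// Recursive-style range builder. It never calls itself directly;
// it returns a Step that describes the call to make next.
Step range_step(int s, int e, std::vector<int> acc) {
    acc.push_back(s);
    if (s == e) {
        return Step{true, std::move(acc), nullptr};
    }
    const int next_s = (s < e) ? s + 1 : s - 1;
    return Step{false, {}, [next_s, e, acc = std::move(acc)]() mutable {
        return range_step(next_s, e, std::move(acc));
    }};
}

// The trampoline: runs steps one after another at constant stack depth.
std::vector<int> my_range(int s, int e) {
    Step step = range_step(s, e, {});
    while (!step.done) {
        step = step.next();
    }
    return std::move(step.result);
}
```

Calling my_range(1, 4) yields 1, 2, 3, 4, just like the loop version, but the control flow now passes through indirect calls whose targets are only known at run time.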

A related method makes use of the thunk. A thunk is a small function or subprogram, usually taking no parameters, that wraps a calculation so that it can be handed to another function or subroutine and run there. Thunks are reminiscent of async operations and are mostly intended to delay a calculation until a later time when the result is required. Because thunks can insert procedures strategically into other functions, they can serve as obfuscators to reduce the intelligibility of a running app.
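A thunk is easy to sketch in C++ with a lambda. The hypothetical helper make_thunk below wraps a calculation so that nothing runs until the thunk is first called, and the result is cached for later calls. It assumes a C++14 compiler and is not thread-safe.

```cpp
#include <functional>
#include <memory>
#include <utility>

// Hypothetical helper: wraps a calculation in a parameterless thunk.
// The calculation runs at most once, on first use; later calls return
// the cached value. Copies of the thunk share the same cache.
std::function<int()> make_thunk(std::function<int()> compute) {
    struct Cache {
        bool ready = false;
        int value = 0;
    };
    auto cache = std::make_shared<Cache>();
    return [cache, compute = std::move(compute)]() {
        if (!cache->ready) {
            cache->value = compute();
            cache->ready = true;
        }
        return cache->value;
    };
}
```

The code that creates the thunk and the code that finally calls it can sit far apart in the program, which is what makes the real point of calculation harder to spot.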

Data Obfuscation and C++ Source Code Security

Intentional obfuscation is manifestly counterintuitive to any engineering plan. The logic of computer programming is already challenging, and now we are talking about purposely rendering code unintelligible! Yet, there are already standard implementations of code which do so, and they have constructive security side effects to boot!

You may have looked at a page of HTML code that was totally generated by PHP and thought, “it’s going to be very difficult to figure out which script generated this HTML output!” Nowadays it’s very common to find most if not all of a page’s output originating from a data structure, either a JSON structure or a MySQL query, or a mixture of several. When a rebuilder looks at the code for such an app, the functionality cannot be fully deduced if the data structure and its contents are not available. This can be an effective deterrent to C++ source code theft.
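The same principle carries over to C++. In the sketch below, a taxi fare is computed by a generic evaluator, while the numbers that make the pricing valuable live in a rule table supplied at run time. FareRule, fare and the zone names are all hypothetical, invented for this illustration.

```cpp
#include <map>
#include <string>

// Hypothetical pricing rule. In this sketch the values would arrive from
// a server or protected resource at run time, not from the compiled code.
struct FareRule {
    double base;
    double per_km;
};

// Generic evaluator: the binary contains only this arithmetic, not the
// business-specific numbers behind the pricing scheme.
double fare(const std::map<std::string, FareRule>& rules,
            const std::string& zone, double km) {
    const auto it = rules.find(zone);
    if (it == rules.end()) {
        return 0.0;  // unknown zone: no rule available
    }
    return it->second.base + it->second.per_km * km;
}
```

A rebuilder who decompiles this function learns the shape of the calculation, but not the rules themselves, as long as the data is kept out of the shipped app.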

Technologies for Securing C++ Source Code

Every programming language implements unique logical forms. The object is to enable coders to express concepts fluently. Rust is a programming language whose compiler enforces memory safety through a unique feature called ownership. Many of today’s compiler concepts evolved in response to the memory safety issues that arise in C++ coding. The dangling pointer is a common example of such an issue.

Star++ and Macmade are among the many platforms specializing in the obfuscation of C++ code. CEOs who take an active role in the development of their company’s software application and services will benefit from surveying these packages for potential use in-house.

Special Focus on Code Obfuscation

Most computer scientists agree that reverse engineering cannot be prevented absolutely. But many useful steps in that direction are possible. The most promising method for defeating the rebuilder’s effort to reverse engineer your C++ module, and thus recreate readable C++ source code, is obfuscation. A variety of non-trivial transformations may render code extremely difficult to read, or even incomprehensible. Apps classified as code obfuscators commonly use the following methods:

  • Layout obfuscation
  • Debug info removal
  • Identifier scrambling
  • Data structure alteration

Layout obfuscations typically alter code format and remove comments in order to make it less comprehensible. Some obfuscators can also detect an attached debugger and alter the program’s behavior or output to confuse the debugging results. Naturally, this has the side effect of making constructive debugging difficult. However, for fully tested apps this may not be an issue.
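As one illustration of a debugger runtime check, the sketch below works on Linux only. It reads the TracerPid field of /proc/self/status, which is non-zero while a ptrace-based debugger such as gdb is attached. The function names are our own, and a determined rebuilder can patch such a check out, so it is best seen as a speed bump.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Parses the "TracerPid:" field from text in the format of Linux's
// /proc/self/status. Returns 0 when the field is missing or unreadable.
long tracer_pid_from_status(const std::string& status_text) {
    std::istringstream lines(status_text);
    std::string line;
    const std::string key = "TracerPid:";
    while (std::getline(lines, line)) {
        if (line.compare(0, key.size(), key) == 0) {
            long pid = 0;
            std::istringstream(line.substr(key.size())) >> pid;
            return pid;
        }
    }
    return 0;
}

// Linux-only check: true when another process is tracing this one.
// On systems without /proc the file is empty and the result is false.
bool debugger_attached() {
    std::ifstream status("/proc/self/status");
    std::stringstream buffer;
    buffer << status.rdbuf();
    return tracer_pid_from_status(buffer.str()) != 0;
}
```

An app might call debugger_attached() at a few scattered points and quietly change its behavior, rather than exiting at once and revealing where the check lives.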

Data obfuscations may also be an option for securing an app. These include array transformations and variable splitting. Inheritance modification has also been applied successfully, and includes class splitting and class insertion. Yet another type, control obfuscation, seeks to confuse the rebuilder with redundant code, opaque predicates and extra loop structures. These are nonsense structures whose only purpose is to delay and disrupt the rebuilder’s attack.
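A toy opaque predicate shows the flavor of control obfuscation. The condition below is always true, because the product of two consecutive integers is always even, yet it guards a decoy branch that never runs. The names opaque_true and guarded_add are hypothetical. Commercial obfuscators use far harder predicates, and an optimizing compiler may see through one this simple.

```cpp
// Opaque predicate: x * (x + 1) is a product of two consecutive integers,
// so its lowest bit is always 0, even after unsigned wrap-around.
// The function therefore always returns true.
bool opaque_true(unsigned int x) {
    return ((x * (x + 1u)) & 1u) == 0u;
}

// The noise argument can be any run-time value; it never changes the result.
int guarded_add(int a, int b, unsigned int noise) {
    if (opaque_true(noise)) {
        return a + b;   // the real computation
    }
    return a * 31 - b;  // decoy branch, never executed
}
```

A rebuilder reading the decompiled output has to work out that the second branch is dead before the real logic becomes clear.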

The Delicate Balancing Act

A finely tuned scale is needed to determine how much time and money should be invested in C++ source code security. If your app is popular, it’s going to be a target. If it is appealing to a subculture like Half-Life, then you can assume the secrecy of key algorithms likewise has a short half-life! A sort of triage is recommended. In extreme cases, it is actually advisable to assume that your source code has been reverse engineered and think through security measures beyond the source code theft. Banking and finance apps are among the obvious targets, and must therefore use three-factor authentication, geo-location, device recognition, and other methods to protect customer accounts beyond tampering with source code.

The diligence a rebuilder is willing to put into disassembling your app is bewildering, and the work often proceeds with surprising speed. This diligence can be matched on your team by implementing a comprehensive security management plan. This may include, for example, security-focused versioning platforms which scan C++ source code for API secret keys and authentication credentials, and halt deployment on discovery of a violation. Such ambitious measures will likely become standard as the sophistication of decompilers increases.
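The scanning step can be pictured with a very small sketch. The pattern below is an illustrative assumption, not the rule set of any real scanning product: it flags lines that assign a string literal of 16 or more characters to a name containing key, secret, token or password.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns true if a source line looks like a hardcoded credential.
// The pattern is a simplified, hypothetical heuristic.
bool looks_like_hardcoded_secret(const std::string& line) {
    static const std::regex pattern(
        R"re((key|secret|token|password)\w*\s*=\s*"[^"]{16,}")re",
        std::regex::icase);
    return std::regex_search(line, pattern);
}

// Counts offending lines; a deployment script could refuse to continue
// when the count is non-zero.
std::size_t count_violations(const std::vector<std::string>& lines) {
    std::size_t count = 0;
    for (const std::string& line : lines) {
        if (looks_like_hardcoded_secret(line)) {
            ++count;
        }
    }
    return count;
}
```

A heuristic this simple will miss some secrets and flag some harmless strings, so it is best treated as a starting point for thinking about what a dedicated scanner does.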


