Protecting Java Source Code

December 27, 2018


The Vital Source: Protecting Java Source Code

Most breach announcements focus on the threat to users’ private data, such as the exposure of employee personal information at NASA. A tech-savvy CIO, however, recognizes an even deeper simultaneous threat: if the network is penetrated, the enterprise’s Java source code will very likely be exposed as well. The CI/CD craze is fueling the fire, with new innovations and new risks emerging daily.

As the popularity of continuous integration (CI) and continuous delivery (CD) of software explodes, the many and diverse platforms used to script and operate CI and CD pipelines likewise increase the number of exploits available to hackers. XSS attacks are now more widespread partly because of the complexity of these developer tools, such as the Jenkins automation server, Docker containers, and virtual machines operated through platforms like VMware. The recently discovered flaw in SQLite also shows that core components of widely used web apps still have serious security weaknesses. Why do developer platforms inherently increase Java source code security risks?

We will explore this in depth and delve into the realm of reverse engineering of Java source code as well. We will also catch a glimpse of the dark side of coding in order to understand the mind of the hacker as we survey the tools of the trade, or shall we say weapons of the trade! The paradox which is perhaps most disturbing is that hacker tools and developer tools are virtually the same! In principle and concept, the most incisive reverse-engineering tools, such as IDA and OllyDbg, are used by coders for debugging and by hackers for taking compiled code apart! We will explore this territory and more to uncover the habits of the hackers and the methods to disrupt them.

Indirect Routes to Vital Java Source Code

Most recent attacks focus on weaknesses in JavaScript code on the client side of web applications. Although this may not seem directly related to the theft of Java source code, the real threat is more subtle. If unauthorized access can be achieved via an XSS attack, by passing a code snippet to an unsanitized request parameter (a PHP $_GET or $_POST value, for example), then access to the entire server may follow and expose Java source code indirectly. For this reason, we must analyze all methods of breaching security in order to design a comprehensive solution to protect Java source code. Such indirect methods include:

● XSS attacks on client-side web app components
● SQL injection attacks on server-side components
● Scripted or hardcoded API keys and other user secret tokens
● Flaws in social engineering JavaScript, client-side
● Unsecured code repositories and…
● Open source repos which contain access to private repos

Methods of Attackers – XSS and SQL Injection

XSS and SQL injection attacks share a fairly simple conceptual method: the hacker sends a script to a function which is expecting data. If a developer fails to sanitize an input, and instead sends that input straight to SQL, it is a simple business to send an SQL command and, for example, gain access to a server through an admin account. XSS attacks commonly send a JavaScript code snippet to a web app which is expecting data like a search parameter. For example, suppose a search page receives the term ‘insects’ as a parameter in its URL.

And in the search result the user receives this:

<p>Your search for ‘insects’ returned the following results:</p>

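An interloper can run any destructive function, here labeled malicious_Script(), by placing a script fragment that calls it in the search parameter instead of a search term. If the page echoes the parameter back unescaped, the browser executes the fragment.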
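To make the risk and the remedy concrete, here is a minimal sketch in Java of escaping the search term before it is written into the result line. It is an illustration under assumptions, not a complete defense: HtmlEscaper and resultLine are hypothetical names, and a production system would normally rely on a vetted encoding library rather than a hand-written loop.

```java
// Minimal sketch: encode the characters that let text break out into markup.
final class HtmlEscaper {
    private HtmlEscaper() {}

    /** Returns the input with HTML-significant characters replaced by entities. */
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds the search result line with the user-supplied term escaped. */
    static String resultLine(String term) {
        return "<p>Your search for '" + escape(term)
                + "' returned the following results:</p>";
    }
}
```

With this in place, a term such as a script tag calling malicious_Script() reaches the page as inert text, while an ordinary term like insects passes through unchanged.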

The consequences of XSS injection, however, can be substantially worse than those of SQL injection. Here are some of the outcomes a hacker can achieve through an XSS attack:

● Read browser history and clipboard
● Hijack an account
● Spread malicious scripts
● Control victim’s browser remotely
● Exploit IoT devices and data

Word to the Wise CIO: Prevention

Preventing XSS and SQL injection requires a step called “sanitizing” or “escaping” input. This type of coding may require compliance training for developers. It’s not technically complicated, but it’s easy to skip on a lazy day! The point of escaping is to filter all inputs so that commands cannot be inserted in place of data. In client-side JavaScript this can be done easily with scripts like the following:

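// Escape HTML by letting the browser encode the text of a detached element.
var sanitizeYourHTML = function (str) {
    var tempV = document.createElement('div');
    tempV.textContent = str; // stored as plain text, never parsed as markup
    return tempV.innerHTML;  // markup characters come back as entities
};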
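
The SQL side of the problem is usually handled differently: rather than escaping, the input is bound as a parameter so that it can never be read as a command. A minimal sketch using the standard java.sql API might look like this; the accounts table, its columns, and the AccountLookup class are assumptions made purely for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class AccountLookup {
    // The placeholder keeps user input out of the SQL text entirely.
    static final String QUERY = "SELECT id FROM accounts WHERE username = ?";

    private AccountLookup() {}

    /** Returns true if a row exists for the given username. */
    static boolean exists(Connection connection, String username) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(QUERY)) {
            statement.setString(1, username); // bound as data, never parsed as SQL
            try (ResultSet rows = statement.executeQuery()) {
                return rows.next();
            }
        }
    }
}
```

Because the query text is a constant, nothing the user types can change its structure; the driver sends the value separately from the statement.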

More Sophisticated Weapons – More Diligence in Prevention

If the target Java source code is securely encrypted, so that a hacker finds no direct inroads, and if the algorithm implemented in the source code is valuable, then the next weapon in principle is reverse engineering. In practical terms, a Java app running on an Android device can be decompiled into something close to the original source code with shocking alacrity!

The sophistication of decompiler methods and technology is perhaps slightly higher than that of ordinary dev tools. However, ordinary developers will recognize these popular tools as debugging tools rather than hacker tools, because tools designed specifically to reverse engineer Java code are often used to debug running apps:

● AndroChef Java Decompiler
● Cavaj Java Decompiler
● DJ Java Decompiler
● JD Project

The reverse engineer, also called a “rebuilder,” has the technical capability to download an app from the Google Play Store and use one of the tools in the list above to decompile it back into Java source code, that is, into human-readable form. Reengineering, as it is sometimes called, is now so widespread that the weapons of the trade are marketed openly, alongside ordinary dev tools and frameworks like Laravel and Symfony. How can you prevent theft of your vital Java source code from both internal and external threats?

ProGuard and its Progeny

One method of prevention is to make compiled code more difficult to decompile. A popular approach is to obfuscate method and class names in the compiled classes as part of the build. ProGuard is one open source obfuscation platform, and several newer paid platforms are based on ProGuard source. In addition to obfuscation, ProGuard and DexGuard also shrink and optimize Java code, and preverify the processed classes.

Dev tools such as these detect and remove unused classes during the build. Variables, methods, and other named members usually have meaningful names which could help a rebuilder recognize the functionality of the code, so these names are changed to generic labels.

In other words, while developers label functions in some rational schema, these names are not meaningful to the JVM. So a function like table_Resort() can be obfuscated as a1(), for example, making the decompiled source outcome more difficult to interpret. ProGuard and its progeny dev tools also generally offer a combination of these benefits:

● Unused class analysis
● Partial evaluation
● Global value numbering
● Control flow analysis
● Data-flow analysis
● Static single assignment (SSA) form
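
The renaming idea described above, where a name like table_Resort() becomes a1(), can be pictured with a toy sketch. This is not how ProGuard is implemented; NameObfuscator is a hypothetical class that only shows the principle of replacing meaningful names with generic labels while remembering the mapping.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class NameObfuscator {
    // Original name -> generic label, kept in the order names were first seen.
    private final Map<String, String> mapping = new LinkedHashMap<>();

    /** Returns the generic label for a name, assigning a1, a2, ... on first use. */
    String rename(String original) {
        String label = mapping.get(original);
        if (label == null) {
            label = "a" + (mapping.size() + 1);
            mapping.put(original, label);
        }
        return label;
    }

    /** Read-only view of the mapping, useful for translating names back later. */
    Map<String, String> mapping() {
        return Collections.unmodifiableMap(mapping);
    }
}
```

The same name always receives the same label, so calls still line up after renaming, while the label itself tells a rebuilder nothing about what the method does.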

We may win a battle or two with these tools, but the war will never end, unfortunately. There is now a publicly available deobfuscator offered as open source on GitHub. Sophisticated means of reverse engineering are therefore easily accessible to all coders. Running it against an obfuscated Java application is now as easy as entering a single command:

java -jar deobfuscator.jar --config detect.yml

A Microcosm of Static Code Analysis

PMD is a well-established app for running static code analysis (SCA) to detect potential problems, including security risks, in Java source code. But when you first look at the rule base applied in SCA, the complexity can be daunting. SCA apps can make life easy or difficult for coders depending on their knowledge of the rules. With SCA, we’re basically scanning source code for obvious exploits. Puzzled questions about flagged code frequently appear in developer forums like Stack Overflow. In one such example, a Java coder wonders why PMD flagged an access to a keys[i] array element as a “Law of Demeter” violation.

Fortunately, static code analyzers work as assistants. Instead of halting the process as compilers often do, they leave it to our discretion to determine the realistic probability that the flagged code could become a security exploit.
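
As a rough illustration of what a Law of Demeter rule objects to, consider the hypothetical classes below. They are not the forum snippet, only a minimal sketch: the chained call reaches through one object into another, which is the style such a rule typically reports, while the direct call talks only to its immediate collaborator.

```java
final class Engine {
    private boolean running;

    void start() { running = true; }
    boolean isRunning() { return running; }
}

final class Car {
    private final Engine engine = new Engine();

    Engine getEngine() { return engine; }              // exposes an internal collaborator
    void start() { engine.start(); }                   // delegating method
    boolean isRunning() { return engine.isRunning(); }
}

final class Driver {
    private Driver() {}

    /** Chained call through a returned object: the style the rule tends to flag. */
    static void startViaChain(Car car) { car.getEngine().start(); }

    /** Talks only to its direct collaborator. */
    static void startDirectly(Car car) { car.start(); }
}
```

Both methods have the same effect at run time, which is exactly why the analyzer leaves the judgment to the developer.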

Android Exposure – The Widespread Weakness

Today, effectively all technologies require an equivalent mobile app deployment. In other words, if there is a successful implementation of intellectual property written in Java, then it must eventually be delivered as a mobile app on Android devices. Why is this especially problematic to the cause of securing Java source code?

Because too much is known in advance about every app designed to run on the Google Android platform. Java source code compiled for Android devices is arguably the easiest target of attack because so much standard code must be included just to meet the platform’s compatibility requirements. What does a rebuilder already know about your app before even feeding it to a decompiler?

● Android operating system
● Multi-user Linux OS
● Unique Linux user ID
● Each app has virtual machine for isolation
● Manifest file
● Core framework components

…And the list goes on and on. Once your Android app has been decompiled into Java source code, the hacker only needs to find the logic of your app to steal your intellectual property.

Risk of Java Source Code in Repositories

Hacker news sites feature stories of massive exposure of source code every week, often resulting from Java source code stored in both private and public repos. And yet entire teams of enterprise level developers commonly fail to comply with best practices in securing Java source code from both hackers and employee theft! A technical weakness in the way continuous integration pipelines are scripted today leads to security breaches in surprising places like GitHub. In response, many enterprises have developed their own container platforms and repos in an effort to secure their sacred proprietary Java source code.

Unencrypted Java source code maintained in repositories like GitHub is abundant, and the fact is widely known to coders. Worse still is the habit of hard coding API keys and secret user tokens into build scripts and web apps. Such scripts are also frequently stored in repos. Reluctance or laziness to change these coding habits has two important consequences for Java source code security.

CI pipelines are now commonly scripted by developers to run all the way through from a code change to delivery to end users and production. This scripted pipeline includes package managers, bundlers, compilers and scanners, virtual machines, integration platforms, as well as automation testing scripts. All of the above require authentication credentials which are coded right into the build. These credentials are often the target of the rebuilder!

The weakness in scripting CI pipelines is partly the result of the diversity of dev tools and platforms required in the build, combined with the growing assortment of authentication methods required by each. It is not always clear how to script encryption, decryption, and then re-encryption of credentials when orchestrating the pipeline of continuous deployment.
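
One common way to keep credentials out of build scripts is to have the pipeline inject them at run time and have the code fail loudly when they are missing. The sketch below assumes the secrets arrive as environment variables; PipelineSecrets and the DEPLOY_TOKEN name in the usage note are hypothetical.

```java
import java.util.Map;

final class PipelineSecrets {
    private PipelineSecrets() {}

    /** Looks up a credential by name and fails loudly when it is absent. */
    static String require(Map<String, String> environment, String name) {
        String value = environment.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing required secret: " + name);
        }
        return value;
    }

    /** Convenience overload that reads the real process environment. */
    static String require(String name) {
        return require(System.getenv(), name);
    }
}
```

A build step would then call PipelineSecrets.require("DEPLOY_TOKEN") instead of carrying the token as a string literal, so nothing sensitive is committed to the repo.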

CIO Versus Hacker

To sum it all up, let’s look at the attacks and the defenses to determine who is winning the war today, and to prepare for the next battle. Here are the primary threats, from both internal and external sources:

Methods of Attack and Vulnerabilities:

● Theft of unencrypted Java source code – by employee or hacker
● Scripted or coded authentication credentials: encryption keys, API keys
● Memory safety bugs which expose dangling pointers for example
● Pressure to reach Agile development goals
● Failure to comply with security-based coding practices

As you know, Java source code in particular expresses your company’s intellectual property in a form easily comprehensible to coders. We ignore these issues at the peril of our enterprise lifeblood. CIOs must fight to obtain the best security methods with the right balance of cost and efficacy. One of the most effective measures available today is the hardware-level encryption design featured by the IBM z14 mainframe server. Let’s look at the other primary focuses of Java source code security:

Methods of Prevention

● Security-centric versioning platforms
● Hardware level encryption (z14 mainframe server)
● Class name obfuscation – protect Java code
● Source code encryption – encrypt Java code
● OS built-in methods
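
For the source code encryption item above, a minimal sketch of encrypting a source file’s text at rest with the JDK’s own AES-GCM support is shown here. SourceVault is a hypothetical name, and key storage and rotation, which are the hard parts in practice, are deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class SourceVault {
    private static final int IV_BYTES = 12;   // usual IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    private SourceVault() {}

    /** Generates a fresh 256-bit AES key. */
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts text; the random IV is stored in front of the ciphertext. */
    static byte[] encrypt(SecretKey key, String plainText) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(plainText.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + sealed.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(sealed, 0, out, iv.length, sealed.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the IV from the front of the message and decrypts the rest. */
    static String decrypt(SecretKey key, byte[] message) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

GCM is used here because it also authenticates the data, so a tampered file fails to decrypt instead of silently yielding altered source.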

Security-centered versioning platforms scan source code for anything that looks like a hard-coded password, API key, or other authentication credential. Upon discovery, they send an alert to a project manager with a level of urgency equal to the discovery of a bug or a virus! These versioning platforms add a bit of overhead to development cycles, which may be well worth it when a hack is thwarted!
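
The kind of scan described above can be approximated with a simple pattern match. The sketch below is a deliberately naive illustration, with a hypothetical SecretScanner class and a single made-up pattern; real scanners use far larger rule sets and additional heuristics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class SecretScanner {
    // Illustrative pattern: a credential-like name assigned a quoted literal
    // of at least 8 characters.
    private static final Pattern HARD_CODED = Pattern.compile(
            "(?i)(password|passwd|secret|api[_-]?key|token)\\w*\\s*=\\s*\"[^\"]{8,}\"");

    private SecretScanner() {}

    /** Returns the 1-based numbers of lines that look like hard-coded secrets. */
    static List<Integer> flaggedLines(List<String> sourceLines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            if (HARD_CODED.matcher(sourceLines.get(i)).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }
}
```

A line that assigns a long string literal to something named apiKey is reported, while a line that reads the value from the environment is not, which is the distinction a reviewer cares about.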

Methods of prevention traditionally have limitations and disadvantages. Generally, flow obfuscation to secure Java source code will impact the performance metrics of the application, and may also make testing and debugging more laborious. In other words, relabeling functions to make them less comprehensible may be counterproductive for field debugging. On the other hand, optimization and shrinking may improve performance. These counterbalancing factors call for a unique security suite for each application. Indeed, the security plan may be as unique as the algorithms in the Java source code. Likewise, storage and deployment will need special attention.

Private repositories are thought to be more secure than those which host a bevy of open source code, such as GitHub. Yet hacker news stories regularly feature breaches in private repos which began with leaks in open source repos. In many cases developers used familiar open source APIs and libraries without truly vetting their security. In-house repo development may be key, but it won’t be a panacea where securing Java source code is concerned. The best strategy is a balanced plan which takes into account as many vulnerabilities and preventions as possible.

Financial pressure is often extraordinarily counterproductive to security. If the z14 mainframe’s built-in hardware-level encryption is the ultimate in securing Java source code, it may simply be too expensive, with base units starting at $75,000. Here is a machine that promises to guard against both employee theft and hacker penetration. However, compared with an ordinary server’s entry-level cost of $8,000, that’s an ideal which may be out of reach. Yet keeping an analytical eye on such ideals serves an important purpose: we are constantly striving for improvement. Today’s lofty ideal will become tomorrow’s security standard!
