Protecting Your Android Source Code

November 23, 2018

Imagine that you run a popular mobile app startup in the Android market. Your top secret is a breakthrough ML algorithm for face recognition. You’re well past the pitch and fully funded. Your algorithm is written in Java, and the app is distributed on the Google Play Store. How do you keep this extraordinary asset safe from attack and theft? How do you protect the magic that is coded into your Android source code?

We will answer these questions and more with this in-depth exploration of protecting Android source code. Along the way, we will take a sobering look at several potentially unsettling facts. Did you know, for example, that hacker tools and developer tools are essentially the same? Popular reverse-engineering tools like OllyDbg and IDA are also marketed as “debuggers.” In order to prevent attack, we need to understand the ways and weapons of the attackers. We need to know our adversaries.

A “rebuilder,” for example, is a hacker who downloads your app from Google Play and converts it back into human-readable form. Reverse engineering, as it is called, is the widespread practice of converting compiled code back into something close to the original source code. And as you know, source code reveals a company’s intellectual property, and oftentimes a bevy of authentication credentials. We can no longer avoid these subjects because they are tedious or “too technical.” It’s high time to take an unflinching look at Android source code vulnerabilities and how to resolve them.

Achilles Heel of All Tech Intellectual Property: Android Source Code

A large share of technology-based intellectual property now ships on Android devices in the form of mobile apps. Most significant enterprises offer a product or service on the Android platform, including financial, medical, and other data-sensitive businesses. Implementation of these technologies always begins with source code.

Source code is the easiest target of attack because it is human-readable. However, compiled machine code which is running on a device is also vulnerable, because it can be decompiled into source code. Therefore, we will explore these two major areas of vulnerability in Android app deployment:

  • Cleartext source code in repositories and build scripts for CI
  • Machine code which is decompiled to source code

Ultimately, both of these are source code which can reveal a company’s proprietary technology and authentication credentials. Each is exploited by hackers using a variety of unique methods. Let’s examine these methods thoroughly.

The IDA “Debugger” at work decompiling machine code!

Vulnerability in Source Code Repositories

In spite of recent security breaches at important companies like Uber, many developers continue to operate continuous integration pipelines built from unencrypted source code files stored in popular repository systems like Git. Although security breaches occasionally make news headlines, Android developer culture remains slow to respond. This intransigence has two important causes where Android source code and data security are concerned.

The first cause is a weakness in the way developers script CI pipelines. Methods used every day to build those pipelines create Android source code security vulnerabilities and open the possibility of access by unauthorized users. These security vulnerabilities often arise from the following issues:

  • Unguarded source code (Java and Kotlin)
  • Hard coded or scripted authentication credentials
  • Shared developer credentials (encryption keys, API keys)
  • Memory safety errors (compare the ownership model enforced by the Rust compiler)
  • Agile and DevOps development methods
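
Hard-coded and shared credentials are the items on this list that can most easily be caught mechanically. The following is a minimal sketch of that idea, not a production tool: the class name SecretScanner and the single regular expression are assumptions made for illustration, and a real scanner would need a much broader rule set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Toy scanner: flags lines that look like hard-coded credentials.
// The pattern is an illustrative assumption, not a complete rule set.
class SecretScanner {
    private static final Pattern SUSPICIOUS = Pattern.compile(
        "(?i)(password|passwd|secret|api[_-]?key|token)\\s*[:=]\\s*[\"'][^\"']+[\"']");

    /** Returns the 1-based numbers of lines that appear to assign a literal secret. */
    static List<Integer> findSuspiciousLines(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (SUSPICIOUS.matcher(lines.get(i)).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "String apiKey = \"abc123\";",                  // literal secret
            "String apiKey = System.getenv(\"API_KEY\");",  // read at runtime
            "password: 'hunter2'");                         // literal in a config line
        System.out.println(findSuspiciousLines(sample));    // prints [1, 3]
    }
}
```

A check like this could run as an early pipeline step so that a commit containing a literal secret fails before it reaches the shared repository.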

Interestingly, while the first four factors are directly related to coding practices, the fifth may turn out to be the more ominous, indirect cause. Entrenched and competitive Agile methods may be the immovable object faced by the tech-savvy startup CEO determined to improve security. But let’s look at the mechanics first. Because within the Android developer community, there is a unique guard at the gate…

Special Significance of “Unguarded” Source Code

Since most Android source code is written in Java, let’s have a quick review of a typical development cycle in the life of an Android app. This will equip us to understand the security features needed to protect Android source code.

Most developers write apps for Android devices in the Java language with the Android Studio IDE. Android Studio is based on IntelliJ IDEA by JetBrains. It figures among the most popular development environments and is specifically intended for Android app development. Source code is written in the IDE and then compiled to an intermediate format called Java Bytecode, which the Android build then converts to DEX bytecode for the device.

JVM – The Primordial Magic

The original source code is readable to human beings – developers and hackers in particular – while Java Bytecode is not readable to humans. Bytecode looks encrypted at a glance. But Java Bytecode is NOT encrypted, and this is one of the most important factors in our discussion on Android source code security.
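
To make the point concrete, here is a small sketch showing that the names a developer chose are still present in the compiled class. It uses standard Java reflection, which reads the same metadata a decompiler would read from the class file. BillingService and its methods are hypothetical stand-ins for proprietary app code.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class standing in for proprietary app code.
class BillingService {
    void refreshCustomerBalance() { }
    void applyLoyaltyDiscount() { }
}

class BytecodeNames {
    /** Reads the declared method names back out of the compiled class. */
    static Set<String> declaredMethodNames(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            // Skip compiler- or tool-generated methods.
            if (!m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [applyLoyaltyDiscount, refreshCustomerBalance]
        System.out.println(declaredMethodNames(BillingService.class));
    }
}
```

Nothing here required a special tool: the names survive compilation in plain form, which is why unguarded bytecode tells a rebuilder so much.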

In fact, it is with unguarded Java Bytecode that the rebuilder finds a first point of entry – an obscure door to wedge a digital foot into. And this is also where the dreaded decompiler makes a foray. But let’s have a few more details about the next step in the normal development cycle before we study the deviants!

After our app source code is compiled to bytecode it is fully portable: a standard Java program will run on any device with a JVM, the Java Virtual Machine. On Android the same role is played by the Android Runtime (ART, formerly Dalvik), which executes the DEX form of that bytecode. This virtual machine layer is the primordial magic which gives rise to universal compatibility for Android devices.

However, this universality also means that Android apps are familiar territory to hackers. And the fact that Android OS is also open source means that hackers and rebuilders know a lot about the standard and default characteristics of our apps before we even write the first line of source code! The same concern applies to Android APK security: code extracted from an APK is also familiar ground to rebuilders. We must also remember that Java itself is an older language whose design predates many modern safety concepts…

Enter the New Kotlin Language

Kotlin is a newer programming language which also runs on the Java Virtual Machine. It can also be transpiled to JavaScript, and Kotlin/Native compiles to native binaries through the LLVM compiler infrastructure.

The transition to Kotlin from Java will be smooth for developers, as Kotlin works with the standard Java Class Library and the collections framework. Kotlin uses aggressive type inference, and it builds nullability into the type system: a variable must be explicitly declared nullable before it can hold null, which helps avoid NullPointerExceptions. Here we see parallels with the safety focus of Rust’s ownership model. Kotlin also has a variety of language features that help developers avoid typical coding mistakes. As we have seen, coding errors can lead to larger scale security vulnerabilities, and so modern languages like Kotlin enforce stricter safety standards at compile time. Beyond coding, how can you further guard the security of your Android source code?

The Legacy of ProGuard

Now that we have covered the core mechanics, it’s time to look at essential methods to prevent hackers, especially rebuilders, from reverse engineering our algorithms. An open source dev tool called ProGuard has been around and in continuous use by Java and Android developers for 16 years. In fact, several paid products including DexGuard are built on the ProGuard platform. ProGuard helps Android developers harden their apps against reverse engineering. Here are a few of the actions ProGuard takes to prevent source code theft:

  • Shrink and Optimize Code
  • Obfuscate Code
  • Preverify Java classes

ProGuard discovers and removes unused classes, variables, methods, and any attributes or named parameters which could be used by rebuilders to identify the function and purpose of code segments. Developers give meaningful names to functions, but the virtual machine does not care how functions are labeled. Obfuscation takes advantage of this: it changes the names in the compiled code without changing what the code does, so the rebuilder can no longer make sense of the function names. For example, a function in source code named RefreshCustomerBalance() might be renamed to H01a() by ProGuard, making it much harder for the rebuilder to intuit the original purpose of the call from the function name.
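
The renaming idea can be sketched in a few lines. This is a toy illustration only and not ProGuard’s actual algorithm: ToyRenamer and its naming scheme (a, b, …, z, aa, ab, …) are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of name obfuscation. This is NOT ProGuard's algorithm;
// it only shows the idea of replacing meaningful names with short ones.
class ToyRenamer {
    /** Maps each distinct name to a short name, in order of first appearance. */
    static Map<String, String> buildMapping(List<String> names) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String name : names) {
            if (!mapping.containsKey(name)) {
                mapping.put(name, shortName(mapping.size()));
            }
        }
        return mapping;
    }

    // 0 -> "a", 25 -> "z", 26 -> "aa", 27 -> "ab", ...
    static String shortName(int index) {
        StringBuilder sb = new StringBuilder();
        int n = index;
        do {
            sb.insert(0, (char) ('a' + n % 26));
            n = n / 26 - 1;
        } while (n >= 0);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "refreshCustomerBalance", "validateLicense", "refreshCustomerBalance");
        // prints {refreshCustomerBalance=a, validateLicense=b}
        System.out.println(buildMapping(names));
    }
}
```

The mapping is the only record of what each short name used to mean. Kept private, it lets the developer translate obfuscated stack traces back to the original names while the rebuilder sees only a() and b().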

ProGuard’s optimization step draws on further program-analysis techniques, which also make the resulting code harder to follow. These include:

  • control flow analysis
  • data-flow analysis
  • partial evaluation
  • static single assignment
  • global value numbering
  • liveness analysis
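
Several of these analyses, like the shrinking step described earlier, come down to asking which code can actually be reached or used. The sketch below shows that idea as a simple reachability walk over a call graph. It is a simplified illustration under stated assumptions: ToyShrinker, the hand-written call graph, and the method names are all hypothetical, and a real shrinker derives the graph from the bytecode itself.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy shrinker: keeps only the methods reachable from the entry points.
class ToyShrinker {
    /** Returns every method reachable from the entry points via the call graph. */
    static Set<String> reachable(Map<String, List<String>> callGraph,
                                 List<String> entryPoints) {
        Set<String> seen = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>(entryPoints);
        while (!work.isEmpty()) {
            String method = work.pop();
            if (seen.add(method)) {
                // Queue the callees of a method the first time it is visited.
                for (String callee :
                        callGraph.getOrDefault(method, Collections.emptyList())) {
                    work.push(callee);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> calls = new HashMap<>();
        calls.put("main", Arrays.asList("login", "render"));
        calls.put("login", Arrays.asList("hashPassword"));
        calls.put("debugDump", Arrays.asList("render")); // never called from main
        // prints [hashPassword, login, main, render]
        System.out.println(reachable(calls, Arrays.asList("main")));
    }
}
```

Anything outside the returned set, such as debugDump here, can be dropped from the shipped app, which leaves the rebuilder with less material to study.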

Automating the Login: Unavoidable Vulnerability?

We now live in the era of continuous integration. There is a mad rush to fully engage and implement the new ideas of scripting and automating the CI pipeline all the way from developer software revision, through automation testing, and out to production in one large brushstroke! CI is the holy grail of software development. But as we will see shortly, there is a significant weakness in the way these continuous deployment pipelines are scripted today.

Continuous Integration means scripting a build from developer to customer. Ratcheting Agile and DevOps up another notch, CI creates unprecedented pressure on developers to continuously deliver innovative web apps with miraculous features and performance. According to Agile mythology this development must happen at lightning speed, or at least at a sprint. Security shortcuts are an obvious consequence.

There is such a frenzy to deliver software that today we can see web applications released to production even when the code is actually unfinished. This author was engaged contractually to revise code on a WordPress site which was sold to customers with incomplete code modules! Entire sections of experimental code, although commented out, were still to be found in software released to customers. Errors of all kinds permeated the source code files. In fact, it would be a very easy business to crack that WP plugin and get access to all member accounts and source code!

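This phenomenon is one of many outcomes of enterprises rushing to market under intense economic and competitive pressure. Another is the commonplace practice of writing authentication credentials directly into code and build scripts for CI pipelines. The following Java excerpt shows the pattern, with a username and password passed as string literals: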

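// Excerpt from a class named HttpUrlConnectionExample. The helper methods
// GetPageContent, getFormParams, and sendPost are defined elsewhere in that
// class and are not shown here.
// Requires: import java.net.CookieHandler; import java.net.CookieManager;
public static void main(String[] args) throws Exception {
    String url = "";
    String gmail = "";
    HttpUrlConnectionExample http = new HttpUrlConnectionExample();

    // Make sure cookies are turned on
    CookieHandler.setDefault(new CookieManager());

    // Send a "GET" request to extract the form's data
    String page = http.GetPageContent(url);

    // The vulnerability: the username and password are literals in the source
    String postParams = http.getFormParams(page, "[email protected]", "password");

    // Construct the POST content and send a POST request for authentication
    http.sendPost(url, postParams);

    // If successful, go to Gmail
    String result = http.GetPageContent(gmail);
}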
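
One common alternative is to keep credentials out of the source entirely and read them at runtime from the environment that the CI system provides. The sketch below assumes this approach. CredentialLoader and the variable names APP_USERNAME and APP_PASSWORD are hypothetical and chosen only for this example.

```java
import java.util.Map;

// Sketch: load credentials from the environment instead of hard-coding them.
class CredentialLoader {
    /** Returns the value of a required variable, or fails loudly if it is missing. */
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException(
                "Missing required environment variable: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        // APP_USERNAME and APP_PASSWORD are hypothetical names for this sketch.
        Map<String, String> env = System.getenv();
        try {
            String username = require(env, "APP_USERNAME");
            require(env, "APP_PASSWORD"); // checked, but never printed or logged
            System.out.println("Loaded credentials for " + username);
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

With this arrangement the repository and the build script contain only the names of the variables, so a leaked copy of the source no longer hands over the login itself.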

Scripting code changes out to production involves an exotic mix of components. Gradle, for example, together with the customizable Android plugin for Gradle, provides a flexible method to compile Android source code. Developers can script a build that runs from a code change pushed to a repo all the way through automated test suites, and then packages an Android app or library for production. Gradle is one of many tools which are popular in the operation of Continuous Delivery pipelines today.

Agile and DevOps Response to Competitive Pressure

In reality, there is an overarching, indirect cause behind the intransigent developer culture which leads to Android source code security weaknesses: the often unrealistic goals and deadlines set by developer team leaders. In other words, security shortcuts and coding conveniences may be either inspired or required by pressure to reach production goals, and those shortcuts leave the source code vulnerable.

We can trace much of this pressure to marketing members who likewise make unrealistic promises to deliver functionality which does not actually exist! When marketing wants to say “yes” to a customer’s every wish, the result is furious midnight coding. An outcome of midnight coding is security shortcuts. Although marketing is not the subject, a good tech-oriented startup CEO will recognize that marketing has at least an indirect role in the evolving security issue.

Extraordinary Motivation to Protect Android Source Code

It’s particularly hard to protect Android apps, because attackers know so many things in advance about an app running on Android:

  • Coding language
  • Default app UX on Android
  • Device characteristics
  • Android Studio attributes

The above list also speaks to Android security issues which extend beyond the developer realm. Because Android is an open source platform, there are standard Android components built into your popular app which are well-known to hackers and rebuilders even before you deploy. So many modules known in advance means easier reverse engineering. For this reason we are compelled to treat source code and compiled executable apps as equal security risks!

The 2017 breach affecting Malaysian mobile network operators exposed the private information of more than 46 million customers. Stolen information included names, phone numbers, addresses, and even SIM card and device identifiers such as IMSI and IMEI numbers. The message in the mayhem also leads back to source code. While we are discussing Android source code weaknesses, the crucial issue is what happens when the source code gets out of hand. In breaches such as those at Equifax and Uber the outcome is catastrophic for company and customer alike: loss of proprietary code, loss of trust, and lawsuits!