This 66-year-old is still writing code and wants to fix bugs early in the SDLC
24 Nov 2020 Author: Wanjia Credit: InfoQ - translated from the original Chinese
In Shin-ming Liu’s 36-year career, he has spent half of his time writing code. By his own estimate, he has written between one million and 1.5 million lines. Today, as the co-founder and chief architect of Xcalibyte, the 66-year-old still writes code, though he now focuses mainly on core algorithms. In China, it is very rare for anyone to still be writing code at this age.
Shin’s career as a programmer started in 1984 after he graduated from Utah State University with a master’s degree in computer science. He found a job at Daisy Systems, and formally entered the software development industry. For the next ten years, he was engaged in compiler-related work. Since 2000, he has worked at HP and Intel, in the field of compilers and performance analysis.
In 2016, Shin joined Futurewei, a US subsidiary of Huawei, as the vice president of research and development. He said, “I worked at Futurewei for two years, mainly to help with digital transformation. Because Huawei’s software was previously written in Java, we had to modernize it so it was suitable for running on cloud architecture.” It was his work at Futurewei that introduced him to his future entrepreneurial partners. “It was a coincidence that several of our Xcalibyte founders were in senior management or architecture development at Huawei.”
In the first half of 2018, Shin returned to China to set up Xcalibyte.
1 A new opportunity in the China IT industry
Long-term experience in the software industry and rich knowledge have enabled Shin to keenly capture the opportunities in the transformation of the IT industry. He said, “If you look at the evolution of system architecture, Moore’s Law may fail in the next ten years. We have seen the rapid growth of the software industry in the first 25 years, and then the birth of cloud architecture. With this, over the past two or three years, the entire performance improvement period is now over.”
John Hennessy, the former president of Stanford University and a Turing Award winner, once pointed out that in the next ten years the IT industry will “transform” from general-purpose CPUs to architectures tailored to particular applications, that is, Domain-Specific Architecture (DSA). When many special-purpose architectures appear, the entire software system will become very complex. “This gives us a new opportunity for compilers and system software. Because in the current environment, it is difficult for you to build various complex environments, and then make the hardware performance stand out. This is a very complicated operation,” Shin said.
As complexity increases, the importance of software quality is highlighted, and security vulnerabilities will continue to appear. In the past ten years, the rapid increase in security vulnerabilities and the emergence of a large number of software problems have become a shadow over the software industry. How to manage and control the rapidly increasing security issues and software quality issues has become a common problem facing the world.
Statistics from Forrester show that 82% of vulnerabilities exist in application code. Moreover, the security of open-source software, an important part of the software world, is worrying: there are 14 security flaws in every 1,000 lines of open-source code and 1 high-risk security flaw in every 1,400 lines. The boom in open-source software has further magnified these security issues. A vulnerability in one open-source package affects every other package that depends on it, and then every package that depends on those, so the layers of dependencies create a very hidden and complex attack surface. In China, software-related security and quality problems are extremely prominent.
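How one flaw spreads through layers of dependencies can be sketched in a few lines of Python. This is a minimal illustration, not a description of any real tool: the package names and the `DEPENDS_ON` graph are hypothetical, and the function simply walks the dependency graph backwards from the vulnerable package.

```python
from collections import defaultdict, deque

def affected_packages(depends_on, vulnerable):
    """Return every package that directly or transitively depends on `vulnerable`."""
    # Invert the graph: for each package, record who depends on it.
    dependents = defaultdict(set)
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(pkg)

    # Breadth-first walk from the vulnerable package up through its dependents.
    affected = set()
    queue = deque([vulnerable])
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected

# Hypothetical dependency graph: package -> packages it depends on.
DEPENDS_ON = {
    "web-app": ["http-client", "json-lib"],
    "cli-tool": ["http-client"],
    "http-client": ["tls-lib"],
    "json-lib": [],
    "tls-lib": [],
}

print(sorted(affected_packages(DEPENDS_ON, "tls-lib")))
# prints ['cli-tool', 'http-client', 'web-app']
```

Neither `web-app` nor `cli-tool` lists `tls-lib` as a dependency, yet both are exposed through `http-client`, which is what makes this kind of attack surface hard to see.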
2 The Banana, Monkey and Jungle Problem
In the past 20 years, with the popularity of the Internet in China, “usage doctrine” has become very common: when people see a good open-source product or tool, they just use it. However, Shin believes that such “usage doctrine” is problematic. There is a popular metaphor in the West: the banana, the monkey and the jungle. You wanted a banana, but what you got was a monkey holding the banana and the entire jungle. In other words, you want to use only the “banana”, but you end up importing the monkey and then everything the monkey depends on, such as other objects’ state and their environment, that is, the jungle. China’s software industry has this “big jungle” problem to a greater or lesser degree. “The general situation facing China’s software industry is that very few software applications can take only the banana and nothing else when all they want is the banana.” In fact, software all over the world has such problems. “Separating out those useless things is very difficult, because everything is tied together, especially in languages such as Java,” Shin said.
However, this challenge is also an opportunity for entrepreneurship. In Shin’s view, “companies that truly solve software quality problems and operate well are rare in China, and there are only a few competitors”. A good sign is that China’s software market is huge. According to statistics, in 2019, China’s software business revenue reached 7.2 trillion renminbi, a year-on-year increase of 15.4%.
By analogy, when a person’s monthly income is only RMB 1,000, his concerns are different from when his monthly income is RMB 10,000 or RMB 100,000. With a monthly income of only RMB 1,000, he cares about food and clothing; once his monthly income reaches RMB 100,000, he pays attention to quality of life. Shin believes that “China’s software is going through the process from RMB 1,000 to RMB 100,000, and this speed will accelerate. For example, the security scanning tools in Europe and the United States offer insufficient support for small programs.”
3 Identify and resolve bugs early in the SDLC
To solve software quality and security issues, Shin wanted to build a good tool, Xcalscan, that finds defects in software and at the same time teaches users how to fix them. It does this by using deep compiler-level technology to check data flows, analyze software applications and identify code defects, which allows software development projects to be managed with scientific methods.
As the cornerstone of software, the security of the source code is of paramount importance. In 2014, the “Heartbleed” bug swept the world, not only causing the entire Chinese Internet to tremble, but also affecting more than two-thirds of the world’s websites. It exposed a large number of private keys and other encrypted information on the global Internet, leaving numerous systems open and allowing hackers to obtain data such as user passwords directly from servers. The vulnerability came from a bug in the open-source software OpenSSL.
The National Security Agency (NSA) reportedly used this loophole to steal information from all over the world. “There are many loopholes in open-source software. If a powerful person or organization discovers a loophole, assuming that it has sufficient power and financial resources, it will do something that should not be done. If a bad person knows about it, he will do something bad, such as stealing data and making you pay to avert disaster,” Shin said.
With the continuous development of software, the factors that affect source code security become more complex. For example, a programmer writes a piece of software dedicated to picking bananas in response to a customer’s needs. Later, another customer sees that the software is good but wants to use it to pick oranges, and the next customer wants to use it to pick apples. The software grows bigger and bigger. When the third customer asks for apples, the programmer has two choices. One is to refactor the software, simplify its functions, and rewrite it so that it is clean. The other, and the more common, is to rush the work because the software has to be delivered tomorrow. Cutting corners produces bad code, and the quality of the software becomes progressively worse as more and more changes are made.
4 Vulnerability hunting with static code analysis tools
Static code analysis tools can come in handy when trying to resolve security issues in source code.
Static code analysis means that there is no need to run the code under test conditions: the program is checked solely by analyzing its static (uncompiled) source code for syntax, structure, process, interfaces and so on. It can find errors and defects such as incorrect parameters, incorrect recursion, and null pointer dereferences.
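As a minimal sketch of the idea, the Python example below checks for one kind of “incorrect parameter” defect without ever running the code under analysis. It uses the standard `ast` module to parse the source and compares each call against the function definition. This is only an illustration of the principle and not how Xcalscan works; the `find_arity_errors` helper and the `SAMPLE` program are hypothetical.

```python
import ast

def find_arity_errors(source):
    """Report calls whose positional argument count differs from the local definition."""
    tree = ast.parse(source)  # parse only; the analysed code is never executed

    # Record the parameter count of each simple top-level function.
    arity = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            args = node.args
            # Keep the sketch simple: skip defaults, *args, **kwargs, keyword-only params.
            if args.defaults or args.vararg or args.kwarg or args.kwonlyargs:
                continue
            arity[node.name] = len(args.posonlyargs) + len(args.args)

    # Compare every plain positional call to a known function against its definition.
    errors = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in arity
                and not node.keywords
                and not any(isinstance(a, ast.Starred) for a in node.args)):
            expected = arity[node.func.id]
            if len(node.args) != expected:
                errors.append((node.lineno, node.func.id, expected, len(node.args)))
    return sorted(errors)

# Hypothetical program under analysis; line 4 passes too few arguments.
SAMPLE = """\
def pick(fruit, basket):
    return basket + [fruit]

pick("banana")
pick("banana", [])
"""

for lineno, name, expected, got in find_arity_errors(SAMPLE):
    print(f"line {lineno}: {name}() expects {expected} argument(s), got {got}")
# prints: line 4: pick() expects 2 argument(s), got 1
```

Running `SAMPLE` itself would only reveal the defect when line 4 is actually reached; the static check reports it from the source text alone.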
Statistics show that in the entire software development life cycle, 30% to 70% of code logic design and coding defects can be discovered and repaired through static code analysis. In this regard, Shin said, “We use deep compiler-level technology to grasp all the semantics in the program so that we can accurately view the data flow and control flow. For example, where is a variable declared? Where are they used? What control flow passes through? You can check them clearly with a compiler.”
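The questions Shin lists, where a variable is declared and where it is used, can be illustrated with a small definition-and-use listing. The sketch below again relies on Python’s standard `ast` module and is a deliberate simplification: it ignores scopes, function parameters and control flow, which a compiler-level analysis would track. The `def_use_sites` helper and the `SAMPLE` program are hypothetical.

```python
import ast
from collections import defaultdict

def def_use_sites(source):
    """Map each name to (lines where it is assigned, lines where it is read)."""
    tree = ast.parse(source)
    defs = defaultdict(list)
    uses = defaultdict(list)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):   # the name is written here
                defs[node.id].append(node.lineno)
            elif isinstance(node.ctx, ast.Load):  # the name is read here
                uses[node.id].append(node.lineno)
    names = set(defs) | set(uses)
    return {name: (sorted(defs[name]), sorted(uses[name])) for name in names}

# Hypothetical program under analysis.
SAMPLE = """\
total = 0
for item in [1, 2, 3]:
    total = total + item
print(total)
"""

sites = def_use_sites(SAMPLE)
print(sites["total"])  # prints ([1, 3], [3, 4])
print(sites["item"])   # prints ([2], [3])
```

A name that is read but never assigned, or assigned but never read, is a natural starting point for the kind of defect report a static analyzer produces.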
We used Xcalscan to analyze an open-source project of a well-known Chinese company and found more than 2000 code vulnerabilities. Finding the location of vulnerabilities and the vulnerability type are some of the advantages of our algorithm.
However, Shin also admitted, “Have we covered all the known vulnerabilities? No, our company has only had two years to develop our solution and it is impossible to find all the vulnerabilities that have ever been seen.” Currently, most software development teams rely on dynamic testing, which uses test cases to detect defects and runtime errors in the software. Dynamic testing requires engineers to write and execute a large number of test cases. “Dynamic testing is to push test cases into it, and then report possible errors before the software is released.” But since dynamic testing cannot exhaust all situations, it alone is not enough to ensure software safety and reliability.
Static code analysis tools can not only quickly detect errors and verify code compliance, but also reduce the cost of remediation by finding bugs early in the SDLC. Suppose that finding and fixing a bug during development costs RMB 1. If the bug is discovered and repaired only once the software is in testing, the cost comes to RMB 10. If the software has been delivered to the customer and something goes wrong in its business scenario, the cost soars to RMB 160. The value of static code analysis tools is therefore obvious. Shin believes that a good static code analysis tool should have five qualities: reliability, speed, efficiency, affordability and ease of use.
Currently, there are many static code analysis tools on the market, each with its own merits. For technical teams or enterprises, the key is how to choose a tool that suits them. There are generally two extremes here: some companies choose the cheapest tool, while others choose the most expensive. Shin believes it is very important that the code specification can be customized according to the business’s product quality needs. The most suitable tool is neither the cheapest nor the most expensive, but the one that gives the best return on investment.
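The 1 : 10 : 160 cost ratio can be turned into a quick back-of-the-envelope calculation. In the sketch below only the three per-bug costs come from the article; the bug counts in the two scenarios are hypothetical and chosen purely to show the arithmetic.

```python
# Per-bug remediation cost by the phase in which the bug is fixed (from the article).
COST_PER_BUG_RMB = {"development": 1, "testing": 10, "production": 160}

def remediation_cost(bugs_by_phase):
    """Total cost in RMB of fixing the given number of bugs in each phase."""
    return sum(COST_PER_BUG_RMB[phase] * count for phase, count in bugs_by_phase.items())

# Hypothetical project with 100 bugs, without and with early static analysis.
found_late = {"development": 10, "testing": 60, "production": 30}
found_early = {"development": 70, "testing": 25, "production": 5}

print(remediation_cost(found_late))   # prints 5410
print(remediation_cost(found_early))  # prints 1120
```

Under these assumed counts the total number of bugs is the same, but shifting most fixes into development cuts the bill to roughly a fifth, almost entirely because fewer bugs reach production.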
Getting people to adopt and use static analysis can also be a challenge depending on what type of user you are. Shin’s experience of working in large American Internet companies is worth learning from. In June this year, at the SOAP 2020 and PLDI 2020 conferences, the heads of internal static code analysis of Google and Facebook delivered keynote speeches. In 2012, they started using static code analysis tools internally, but to no avail, as few defect fixes were being reported. The companies resorted to a ‘killer trick’ where everyone’s code was submitted to a repository at the same time. The code was analysed and the results posted on a company-wide bulletin board. After launching this initiative, the repair rate of code errors jumped from 0 to 70%. “When the whole company is watching you, you will fix the errors. Later, there are fewer and fewer errors in the software. Because everyone will check for errors before submitting code.”
5 Sino-US software industry knowledge
After writing code for 36 years, Shin has his own understanding of the differences between the Chinese and foreign software industries. In his view, the biggest difference is this: “I am still writing code at my age, and there is probably only me and my partner in China.” Chinese programmers generally face the “35-year-old hurdle”, while there are many older programmers in the United States. It is reported that Dennis Ritchie, the father of the C language, was writing code up until his death.
The second difference is that when software developers in the United States write code, they consider the whole system, from the application down to the hardware. This is why the United States could produce deep learning technologies such as DNNs and CNNs. “Although Canadians developed the deep learning algorithms, it would have been difficult to develop the software without NVIDIA’s CUDA platform environment.”
Shin said, “The software developers in the United States apply and learn back and forth between hardware and software. But in China, everyone tries to only understand software problems for the sole purpose of completing the task.”
6 There is no real knocking-off time
Perhaps it is this mentality, that leaving work means giving no more thought to coding, that prevents some programmers from improving their abilities. At a certain age, a developer might feel unable to keep up with the development of technology: his or her physical strength can’t match that of young people, and “the brain is not as fast as the younger generation”. Developers can’t afford to think like this. “In order to keep improving you can set small goals for yourself. This week is better than last week, better than last year, you can keep improving.” In addition, developers must carry out an overall assessment of themselves every year. They should stay ahead of the latest set of freshly graduated developers.
After decades of software development, Shin also gave some suggestions to Chinese developers. First, developers should be self-disciplined. “I have a certain degree of accuracy in the goals I set for myself.” After working in the industry for many years, Shin has a principle in writing code: code written today is tested today, and you write up the test results yourself. “Tomorrow comes, and I know that the job is finished. I never worry about bugs in the job yesterday.” Second, developers can’t have a “bell hitting” mentality. “Many people think: I get a salary, I go to work, and when the time is up, I will leave work. It doesn’t matter whether I do well or not.” Shin believes that developers must hold themselves to a standard. Third, you must keep pushing yourself to learn more.