Machine learning (ML) techniques increase the effectiveness of software engineering (SE) lifecycle activities. We systematically collected, quality-assessed, summarized, and categorized 83 reviews in ML for SE published between 2009–2022, covering 6 117 ...
We propose a testing framework for validating static typing procedures in compilers. Our core component is a program generator suitably crafted for producing programs that are likely to trigger typing compiler bugs. One of our main contributions is that ...
Context: Software development projects increasingly adopt unit testing as a way to identify and correct program faults early in the construction process. Code that is unit tested should therefore have fewer failures associated with it. Objective: ...
Despite the substantial progress in compiler testing, research endeavors have mainly focused on detecting compiler crashes and subtle miscompilations caused by bugs in the implementation of compiler optimizations. Surprisingly, this growing body of work ...
Acquiring developer-prized practical skills, knowledge, and experiences.
Stream processing lies in the backbone of modern businesses, being employed for mission critical applications such as real-time fraud detection, car-trip fare calculations, traffic management, and stock trading. Large-scale applications are executed by ...
Incremental and parallel builds are crucial features of modern build systems. Parallelism enables fast builds by running independent tasks simultaneously, while incrementality saves time and computing resources by processing the build operations that ...
Software Heritage is the largest existing public archive of software source code and accompanying development history. It spans more than five billion unique source code files and one billion unique commits, coming from more than 80 million software ...
In order to understand the state and evolution of the entirety of open source software we need to get a handle on the set of distinct software projects. Most of open source projects presently utilize Git, which is a distributed version control system ...
GitHub projects can be easily replicated through the site's fork process or through a Git clone-push sequence. This is a problem for empirical software engineering, because it can lead to skewed results or mistrained machine learning models. We provide ...
We present a dataset of open source software developed mainly by enterprises rather than volunteers. This can be used to address known generalizability concerns, and, also, to perform research on open source business software development. Based on the ...
Puppet is a popular computer system configuration management tool. By providing abstractions that model system resources it allows administrators to set up computer systems in a reliable, predictable, and documented fashion. Its use suffers from two ...
Motivation: In modern it systems, the increasing demand for computational power is tightly coupled with ever higher energy consumption. Traditionally, energy efficiency research has focused on reducing energy consumption at the hardware level. ...
Systematic use of proven debugging approaches and tools lets programmers address even apparently intractable bugs.
The documented Unix facilities data set provides the details regarding the evolution of 15 596 unique facilities through 93 versions of Unix over a period of 48 years. It is based on the manual transcription of early scanned documents, on the curation ...
Examining the different characteristics of open-source software in relation to security vulnerabilities, can provide the research community with findings that can lead to the development of more secure systems. We present a dataset where the reported ...
The software development process produces vast amounts of textual data expressed in natural language. Outcomes from the natural language processing community have been adapted in software engineering research for leveraging this rich textual information;...
Motivation: Even though many studies examine the energy efficiency of hardware and embedded systems, those that investigate the energy consumption of software applications are still limited, and mostly focused on mobile applications. As modern ...
Background: As evolving desktop applications continuously accrue new features and grow more complex with denser user interfaces and deeply-nested commands, it becomes inefficient to use simple heuristic processes for grouping gui commands in multi-level ...
Context: Databases are an integral element of enterprise applications. Similarly to code, database schemas are also prone to smells - best practice violations.
Objective: We aim to explore database schema quality, associated characteristics and their ...
Git repositories are an important source of empirical software engineering product and process data. Running the Git command-line tool and processing its output with other Unix tools allows the incremental construction of sophisticated data processing ...
Recent research provides evidence that effective communication in collaborative software development has significant impact on the software development lifecycle. Although related qualitative and quantitative studies point out textual characteristics of ...
Debugging is an inevitable activity in most software projects, often difficult and more time-consuming than expected, giving it the nickname the "dirty little secret of computer science." Surprisingly, we have little knowledge on how software engineers ...
Motivation: The energy efficiency of it-related products, from the software perspective, has gained vast popularity the recent years and paved a new emerging research field. However, there is limited number of research works regarding the energy ...
Cross-Site Scripting (XSS) is one of the most common web application vulnerabilities. It is therefore sometimes referred to as the “buffer overflow of the web.” Drawing a parallel from the current state of practice in preventing unauthorized native code ...
Infrastructure as Code (IaC) is the practice of specifying computing system configurations through code, and managing them through traditional software engineering methods. The wide adoption of configuration management and increasing size and complexity ...
Tracking long-term progress in engineering and applied science allows us to take stock of things we have achieved, appreciate the factors that led to them, and set realistic goals for where we want to go. We formulate seven hypotheses associated with ...
Software vulnerabilities can severely affect an organization's infrastructure and cause significant financial damage to it. A number of tools and techniques are available for performing vulnerability detection in software written in various programming ...
Examining software ecosystems can provide the research community with data regarding artifacts, processes, and communities. We present a dataset obtained from the Maven central repository ecosystem (approximately 265GB of data) by statically analyzing ...
Modern programs rely on large application programming interfaces (APIs). The Android framework comprises 231 core APIs, and is used by countless developers. We examine a sample of 4,900 distinct crash stack traces from 1,800 different Android ...
State of the art kernel diagnostic tools like DTrace and Systemtap provide a procedural interface for expressing analysis tasks. We argue that a relational interface to kernel data structures can offer complementary benefits for kernel diagnostics.
This ...
We explore how programs written in ten popular programming languages are affected by small changes of their source code. This allows us to analyze the extend to which these languages allow the detection of simple errors at compile or at run time. Our ...
Tools emerge as the result of necessity - a job needs to be done, automated, and scaled. In the ""early days" - compilers, code management, bug tracking, and the like - resulted in mostly local home-grown tools - and when broadly successful - spawn (...
A single statistical framework, comprising power law distributions and scale-free networks, seems to fit a wide variety of phenomena. There is evidence that power laws appear in software at the class and function level. We show that distributions with ...
Why Wikipedia's remarkable growth is sustainable.
Apart from source code, software infrastructures supporting agile and distributed software projects contain traces of developer activity that does not directly affect the product itself but is important for the development process. We propose a model ...
The FreeBSD, GNU/Linux, Solaris, and Windows operating systems have kernels that provide comparable facilities. Interestingly, their code bases share almost no common parts, while their development processes vary dramatically. We analyze the source code ...
Review of "Sustainable Software Development: An Agile Perspective by Kevin Tate," Addison-Wesley Professional, 2005, $39.99, ISBN: 0321286081.
FreeBSD is a sophisticated operating system developed and maintained as open-source software by a team of more than 350 individuals located throughout the world. This study uses developer location data, the configuration management repository, and ...
Distributed computer architectures labeled "peer-to-peer" are designed for the sharing of computer resources (content, storage, CPU cycles) by direct exchange, rather than requiring the intermediation or support of a centralized server or authority. ...
Computer-mediated television brings new requirements for user interface design and evaluation, since interactive television applications are deployed in a relaxed domestic setting and aim to gratify the need for entertainment. Digital video recorders, ...
Ubiquitous computing places humans in the center of environments saturated with computing and wireless communications capabilities, yet gracefully integrated, so that technology recedes in the background of everyday activities. The ubiquitous computing ...
Forty years ago, when computer programming was an individual experience, the need for easily readable code wasn’t on any priority list. Today, however, programming usually is a team-based activity, and writing code that others can easily decipher has ...
Recent developments in mobile technologies and associated economics of scale via mature manufacturing processes have made possible the construction of pervasive systems in specific application areas. In this paper we discuss a novel type of retail ...
The purpose of this paper is to present our experience from the design of a personalized television application, and the implications for the design of interactive television applications in general. Personalized advertising is a gentle introduction to ...
Attempting to determine how quickly archival information becomes outdated.
Field accessor methods have become a ubiquitous feature of object-oriented programming. The definition and use of such methods promote code bloat and an unnatural expression style. We propose a simple addition to the C++ language that can move the ...
The integrity verification of a device's controlling software is an important aspect of many emerging information appliances. We propose the use of reflection, whereby the software is able to examine its own operation, in conjunction with cryptographic ...
Peephole optimisation as a last step of a compilation sequence capitalises on significant opportunities for removing low level code slackness left over by the code generation process. We have designed a peephole optimiser that operates using a ...
A log of operating system calls made by a process can be used for debugging, profiling, verification and reverse engineering. Such a log can be created by acting as an intermediary between the traced process and the operating system. We describe the ...
In a separate compilation environment type checks across modules are difficult to implement, because the natural place to perform them, the linker, is rarely under the control of the compiler developer. A solution to this problem, presented in the C++ ...