AI that can predict future source code changes from past edits might be an invaluable tool for programmers, but it’s a challenge that’s yet to be fully conquered by researchers.
A team at Google Brain, though, describes a promising new approach in a preprint paper on arXiv.org (“Neural Networks for Modeling Source Code Edits”) that they say provides the best overall performance and scalability of any approach yet tested.
“At any given time, a developer will approach a code base and make changes with one or more intents in mind,” the paper’s authors write.
“It is … an interesting research challenge, because edit patterns cannot be understood only in terms of the content of the edits (what was inserted or deleted) or the result of the edit (the state of the code after applying the edit).
An edit needs to be understood in terms of the relationship of the change to the state where it was made, and accurately modeling a sequence of edits requires learning a representation of the past edits that allows the model to generalize the pattern and predict future edits.”
Toward that end, they first developed two representations to capture intent information that would scale “gracefully” with the length of code sequences: explicit representations, which “instantiate” the state of the code after each edit in the sequence (represented as tokens in a 2D grid), and implicit representations, which instantiate the full initial state and then describe subsequent edits only in a more compact, diff-like form.
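To make the distinction concrete, here is a minimal Python sketch of the two representations. It is an illustration only, not the authors' implementation: the `(kind, position, token)` edit format, the `<pad>` token, and every function name are assumptions made for this example. The explicit form stores one row of tokens per code state, giving a 2D grid, while the implicit form keeps just the initial tokens plus the list of edits.

```python
from typing import List, Tuple

# Hypothetical edit format for this sketch: (kind, position, token),
# where kind is "insert" or "delete" (token is ignored for deletes).
Edit = Tuple[str, int, str]

PAD = "<pad>"  # assumed filler token so every grid row has equal width


def apply_edit(state: List[str], edit: Edit) -> List[str]:
    """Return a new token list with a single edit applied."""
    kind, pos, token = edit
    if kind == "insert":
        return state[:pos] + [token] + state[pos:]
    if kind == "delete":
        return state[:pos] + state[pos + 1:]
    raise ValueError(f"unknown edit kind: {kind!r}")


def explicit_representation(initial: List[str], edits: List[Edit]) -> List[List[str]]:
    """Materialize the code state after every edit as rows of a 2D grid."""
    states = [list(initial)]
    for edit in edits:
        states.append(apply_edit(states[-1], edit))
    width = max(len(state) for state in states)
    return [state + [PAD] * (width - len(state)) for state in states]


def implicit_representation(initial: List[str], edits: List[Edit]) -> Tuple[List[str], List[Edit]]:
    """Keep only the initial state plus the compact list of edits."""
    return list(initial), list(edits)


initial = ["x", "=", "1"]
edits = [("insert", 3, "+"), ("insert", 4, "2"), ("delete", 2, "")]

grid = explicit_representation(initial, edits)
start, diffs = implicit_representation(initial, edits)

for row in grid:
    print(row)
print(start, diffs)
```

In this toy version the explicit grid grows with the number of edits multiplied by the length of the code, whereas the implicit form grows only with the code length plus the number of edits, which is the kind of scaling concern the researchers describe.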