If AI helps you code, who owns the final product?

Opinion I’ve been writing software for almost half a century, and my recent experiences with AI suggest that developers may soon find themselves in a very difficult situation.

I started with 8085 assembly code, then moved on to C, then C++, then Java. Once the web arrived, I learned the three P’s: Perl, PHP, and Python.

Python stuck – more than twenty years later, it remains my favorite language. I am far from alone; these days, many introductory computer science courses teach Python. This means that most scientists and engineers are at least somewhat familiar with it, so when they need to code something, they reach for Python. There are enormous libraries of Python ‘solutions’ available online. If you have a coding problem, chances are someone else has already solved it.

This explains why Python became the de facto language for machine learning and artificial intelligence; researchers working on ML algorithms want to test their hypotheses and optimize their approaches – without worrying about the details of the code. With Python, researchers don’t have to put a lot of effort into their code; instead, they can focus on the problem they are solving. That started a virtuous cycle of development: virtually everything in artificial intelligence today (except the lowest, tightest loops of bit-banging and matrix multiplications) is written in Python.

Recently, a lawyer who specializes in intellectual property law asked for my help in prototyping a tool he came up with: using generative AI to automate some of the tedious, clumsy bits of research that IP lawyers do every day. I jumped at the chance to get my hands on some ‘product-oriented’ AI coding, and realized I could get some AI help along the way myself, using OpenAI’s GPT-4.

All five major models (GPT-4, Microsoft Copilot, Google Gemini, Anthropic Claude, and Meta AI) were fed trillions of “tokens” of text during their lengthy training, including virtually every last example of source code that could be scraped from the open web and open source code repositories.

A lot of that code is Python, which means all of these models can do a good job of writing Python.

Knowing this, I wanted to find out whether I could use AI to ’10x’ myself: could I produce software ten times faster with AI than with my wetware alone?

Putting that idea to the test, I quickly discovered that I needed to adapt my playful coding approach into something more rigorous. Did I understand the problem I wanted to solve? Could I express it clearly? Could I convey my insights to the AI in a way direct and unambiguous enough that it would generate the response I was looking for?

That was my first big “aha” moment: to realize the benefits of AI, I had to completely rework my workflow into something far more formal, thoughtful, and structured — a process much less fun than my usual freewheeling hop between editor and command line. Working with an AI as an accelerator transforms the work itself.

If I hadn’t been coding for almost half a century, it would have taken me much longer to sense how to change my practice to fit what the AI demands; as it is, I can see what I need to do, even if I resist it. It feels less fun this way. But then again, that may always be the nature of the trade-off: you can certainly work faster, but you probably won’t enjoy the process as much.

However, when faced with writing a function to extract a set of relevant data from a huge and deeply nested XML document, I welcomed GPT-4’s help. I could have spent a day writing snippets of code and exploring Python’s XML module. I did spend an hour on the problem before deciding that this work was better done by the AI. It took me a few minutes to structure an effective prompt and enter it along with a sample XML file – a ‘one-shot’ prompt. The AI quickly gave me a function that fit perfectly and even worked the first time. But after a few tweaks it became clear that I hadn’t understood the structure of the XML document, and the AI-generated code reflected my poor understanding. That led to my second “aha” moment: garbage in, garbage out.
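To give a sense of the shape of that function, here is a minimal sketch in the spirit of what GPT-4 returned, using Python’s standard xml.etree.ElementTree module. The tag names (case, docket, filing, party) are hypothetical stand-ins, not the actual schema from the project – and as I learned, getting that schema right was the whole game.

# Minimal sketch of the kind of extraction function the AI produced.
# The element names used here (case, docket, filing, party) are invented
# for illustration; the real document's structure was the hard part.
import xml.etree.ElementTree as ET

def extract_filings(xml_path: str) -> list[dict]:
    """Walk a deeply nested XML document and pull out the fields we care about."""
    tree = ET.parse(xml_path)
    root = tree.getroot()

    records = []
    # iter() searches the whole subtree, so deeply nested elements are
    # found no matter how many wrapper layers surround them.
    for case in root.iter("case"):
        record = {
            "docket": case.findtext("docket", default=""),
            "filed": case.findtext("filing/date", default=""),
            # A case may list several parties; collect them all.
            "parties": [p.text for p in case.iter("party") if p.text],
        }
        records.append(record)
    return records

if __name__ == "__main__":
    for rec in extract_filings("sample.xml"):
        print(rec)

The sketch runs, but – exactly as in my case – its correctness depends entirely on whether those tag names match the document’s real structure.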

I asked GPT-4 to adjust the function to reflect my deeper understanding, and it generated a new version. I pasted that into my code, then made a series of one-line additions to tailor it to my specific needs. I got to a point where about 80 percent of the output was generated by AI and 20 percent was my own work. Then I had my third and biggest “aha”: whose code is this?

In the age of AI, one thing the legal system has been unequivocally clear on so far is the ownership of AI-generated content: since it is not created by a human, it cannot be copyrighted. The AI doesn’t own it, the creators of the AI don’t own it, and whoever prompted the AI to generate the content doesn’t own it either. That code cannot have an owner.

Who owns the code I wrote for this lawyer? I’ve pasted a copyright notice at the top of the source – as I always have – but does that mean anything? A core function in this code is largely AI-generated; and while the rest of my code may be artisanal, custom, human-crafted Python, any programmer working in an IDE connected to GitHub Copilot, or getting help from GPT-4, is likely producing code with so much AI-written material that it becomes very difficult to know where the human ends and the machine begins.

Is any part of that code protected by copyright? Or has all the software we write today been so thoroughly compromised that it may no longer be defensible as a copyrighted work?

I asked the lawyer who had brought me in to solve his problem. “Well, you own the copyright to the compilation,” he said, pointing to a recent ruling in which an author was awarded the copyright to an AI-generated collection of texts – due to his role as curator of that collection.

What does that mean for source code, where one line can be human-written (and therefore protected), while the next line can be AI-generated (and unprotected)? “It’s a mess,” the lawyer admitted.

We got into this mess with ridiculous speed, completely unaware that using these AI-powered coding tools turns the copyright protections every software company takes for granted into a kind of Swiss cheese of loopholes, exceptions, and questions that will ultimately have to be tested in court.

It seems unlikely that commercial organizations will turn their backs on the productivity gains promised by AI coding tools. The allure of 10x-ing their technical teams will almost certainly drown out any risks raised by a legal department urging caution.

Charging ahead at full speed will work until it doesn’t – until some major software company discovers that its crown jewels have been quietly dissolved by the steady integration of generative AI. By then it will be far too late for everyone else. ®
