- cross-posted to:
- ai_infosec@infosec.pub
From https://twitter.com/llm_sec/status/1667573374426701824
- People ask LLMs to write code
- LLMs recommend imports that don’t actually exist
- Attackers work out what these imports’ names are, and create & upload them with malicious payloads
- People using LLM-written code then install the malware themselves
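A minimal sketch of a pre-install check for this scenario, assuming Python packages and PyPI's JSON API (`https://pypi.org/pypi/<name>/json`): a 404 means the name isn't registered (likely hallucinated), and a very recent first upload suggests someone may have only just registered it. The function names and the 90-day threshold are hypothetical, and passing this check does not make a package safe.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Optional

PYPI_URL = "https://pypi.org/pypi/{name}/json"  # PyPI's per-project JSON endpoint


def fetch_metadata(name: str) -> Optional[dict]:
    """Return PyPI metadata for `name`, or None if the project doesn't exist."""
    try:
        with urllib.request.urlopen(PYPI_URL.format(name=name), timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # unregistered name: possibly hallucinated
            return None
        raise


def first_upload(meta: dict) -> Optional[datetime]:
    """Earliest file upload time across all releases, if any files exist."""
    times = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in meta.get("releases", {}).values()
        for f in files
    ]
    return min(times) if times else None


def assess(meta: Optional[dict], now: datetime, min_age_days: int = 90) -> str:
    """Classify a package as 'missing', 'suspicious' or 'ok'.

    The 90-day default is a hypothetical threshold, not a PyPI rule.
    """
    if meta is None:
        return "missing"
    first = first_upload(meta)
    if first is None or now - first < timedelta(days=min_age_days):
        return "suspicious"  # no files, or registered very recently
    return "ok"
```

Usage would look like `assess(fetch_metadata("requests"), datetime.now(timezone.utc))`; anything other than `"ok"` is a reason to look closer before running `pip install`.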
I quite like this blog post on why LLMs are bad for software quality in general: https://softwarecrisis.dev/letters/ai-and-software-quality/
Considering the Legal Eagle video I just watched about the lawyer who got into trouble because ChatGPT cited non-existent legal cases for him that nobody verified, you shouldn't ever trust it at face value. Use it to aid your research, sure. But don't blindly present its findings to a judge, ha.
That was just crazy. Instead of owning up to it, they doubled down by having GPT invent fake rulings they could claim to have cited, and wrapped up the whole shebang by revealing that the attorney who filed the stupid thing wasn't the person who "wrote" it, hadn't even read it, and therefore shouldn't be held accountable for its contents.
The future is bleak
It’s terrifying that someone would build off suggestions from ChatGPT without verifying the packages they are installing.
Of everything I've asked ChatGPT, I've had the most success getting it to help me understand some things in Linux or to write small, simple scripts that do some converting. But more often than not, if I ask it about a system and how something should work, it literally makes something up. If you have already googled and read the documentation, it's probably not going to help you.
In the case of Linux, there is also so much information on the web (which is where it gets its information) that if posts are old or wrong, it presents that wrong information as fact.
What I have learned to do is to just try it, but I make sure I fully understand what it does, and I convert it to my own style.
I've used ChatGPT and Copilot to help with PowerShell in the past. For the most part I've found it okay, but I can definitely see how this could happen. I've had a few instances where it tried to give me cmdlets that don't exist. This means it is just taking pieces of code from someone else's project without understanding how they all fit together. So if I search for that cmdlet, there's no telling how many results I could get from good, bad, or irrelevant sources. It would be better if it told you where that code came from. Then I could look at the original source and see whether they created custom functions that these AIs are treating as built-in cmdlets.
Indirect prompt injections will make this worse. Plugins lead to scraping insecure websites (e.g., when searching for docs on a particular topic). This can result in malicious context being embedded and suggested during a PR or in code output.
That, along with the above (faking commonly recommended imports), makes it very difficult to just trust and use LLM output. One argument is that experienced devs can catch this, but security is often about the weakest link: one junior dev's mistake here could open a hole.
There are guard rails to put in place for some of these things (e.g., auditing new libraries, only scraping from 'reliable' websites), but I suspect most enterprises/startups implementing this stuff don't have such guard rails in place.
I wouldn’t trust the packages used by an LLM, or any other part of code they write, or any other text response they generate. Everything must be treated skeptically and verified!
I use ChatGPT daily to assist with my job as a fullstack web developer. If I’m asking it for boilerplate code, I already know exactly what it should look like, and having ChatGPT generate the code and then proofreading it myself is usually a small timesaver. And even if it’s not, the lower perceived effort on my part is beneficial for my mental load.
When I ask it for code where I don't already know the 'right' answer (e.g. refactoring an algorithm to use loops instead of recursion, an example for a library with poor documentation, creating a new function from scratch), I always write a series of test cases to ensure the code behaves as expected.
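To illustrate that kind of test with a made-up example: `flatten_recursive` stands in for the original recursive code and `flatten_iterative` for an LLM's loop-based rewrite, and the tests check the rewrite against the original. Both functions and the test cases are hypothetical; the pattern is simply to treat the original implementation as the reference.

```python
def flatten_recursive(items: list) -> list:
    """Original recursive version, used as the reference implementation."""
    out = []
    for x in items:
        if isinstance(x, list):
            out.extend(flatten_recursive(x))
        else:
            out.append(x)
    return out


def flatten_iterative(items: list) -> list:
    """Loop-based rewrite using an explicit stack of iterators instead of recursion."""
    out = []
    stack = [iter(items)]
    while stack:
        for x in stack[-1]:
            if isinstance(x, list):
                stack.append(iter(x))  # descend into the sublist
                break
            out.append(x)
        else:
            stack.pop()  # current (sub)list is exhausted
    return out


def test_flatten_rewrite() -> None:
    cases = [[], [1, 2, 3], [1, [2, [3, [4]]], 5], [[], [[]], [6]]]
    for case in cases:
        # The rewrite must agree with the original on ordinary inputs.
        assert flatten_iterative(case) == flatten_recursive(case), case
    # Deep nesting: the recursive version would exceed Python's default
    # recursion limit here, so compare against a known expected result.
    deep: list = [0]
    for i in range(1, 5000):
        deep = [deep, i]
    assert flatten_iterative(deep) == list(range(5000))


if __name__ == "__main__":
    test_flatten_rewrite()
```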
Similarly, if it suggests a library I am unfamiliar with, I'll check its GitHub or its stats on npmjs to verify it's maintained and commonly used. Though it almost always picks the same library that I had picked previously (when one of my previous projects was in a similar situation), probably because those libraries were the most commonly used. I never experienced a made-up import; however, there were a couple of instances where it made up a function that did not actually exist in the library.
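For the made-up function case, a quick sanity check in Python (an assumption; the commenter works in JavaScript) is to try resolving the dotted name the LLM used. `resolves` is a hypothetical helper built on `importlib.import_module`. Importing a module runs its code, so only do this after the package itself has been vetted.

```python
import importlib


def resolves(dotted: str) -> bool:
    """Return True if e.g. 'json.dumps' or 'os.path.join' names a real attribute."""
    if not dotted:
        return False
    parts = dotted.split(".")
    # Try the longest importable module prefix first, then walk attributes.
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except ImportError:
            continue
        for attr in parts[i:]:
            if not hasattr(obj, attr):
                return False  # module exists, function/attribute doesn't
            obj = getattr(obj, attr)
        return True
    return False  # no importable prefix at all
```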
Asking LLMs for code is fine, but it really needs proofreading to be worth anything. You could even ask it to proofread its own work.
Also, never use GPT-3.5.
I mostly find Copilot extremely useful for SQL queries. It's so much easier to write naturally what I want and have Copilot just spit it out. I have to doctor it sometimes, but it really does get the job done. Other than that, I could do without it.
Definitely not. LLMs just make up things that sound right; for anything other than the simplest code, you pretty much always have to fix the output.
LLMs are only useful as rubber ducks to figure out what might be wrong with your code, and it’s honestly easier for me to read the documentation/Stack Overflow instead when trying to write code from scratch.
I've found Copilot useful for writing code in languages that I can read pretty well but am not comfortable enough to write fluently, such as shell scripts and Rust.