The world of open-source development comes with various cyber threats. GitHub is still facing an attack that has been ongoing since last year, in which attackers mirrored a huge number of repositories. So as it turns out… the clone wars are not over!
If you haven’t heard about what’s going on:
GitHub is struggling to contain an ongoing attack that’s flooding the site with millions of code repositories. These repositories contain obfuscated malware that steals passwords and cryptocurrency from developer devices. … The result is millions of forks with names identical to the original one.
– Dan Goodin, Ars Technica
Because search engines and GitHub’s own search rankings favor recent activity, these cloned repositories often float to the top – then they lure unsuspecting developers into pulling code that may contain malware.
One of my repositories has been targeted by such an attack, prompting me to monitor it closely. This guide offers tips to spot malicious repository clones before they catch you off guard.
What is a Repository Confusion Attack?
A repository confusion attack involves:
Cloning legitimate repositories.
Injecting malicious code into the clone.
Uploading the clone.
Spreading through various unaware actors.
Supply Chain Attacks
If you search for repository confusion on the internet, you'll find out it's a type of supply chain attack.
A supply chain attack is an indirect threat where hackers try to infiltrate a system by targeting a trusted third party or software component, rather than attacking the primary target directly.
It's not the first time this has happened. Before GitHub was targeted, PyPI was attacked in 2023 with fake packages posing as legitimate ones. These packages lured careless pip users into downloading malicious payloads (in most cases containing infostealer malware).
🛡️ Basic Mitigation Strategies
Before using any repository, make sure you follow these steps and take these precautions.
Verify the contributors' profiles
That's a first check: if you see a rather empty GitHub profile – one without reputation that contains just one repository but with a lot of daily commits to it – well, that's a bit suspicious.
In the fake repository, the original author will be listed as a contributor, too. Check that profile. You should be able to find the legitimate repository and do some comparisons.
In the above screenshot you can see solotech143, my evil doppelgänger (he’s been taken down since).
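The profile check can be sketched as a small helper. This is only an illustration: the field names `created_at`, `public_repos` and `followers` follow the user object returned by GitHub's REST API, but the thresholds (90 days, one repository, zero followers) are arbitrary assumptions of mine and not a validated detection rule.

```python
from datetime import datetime, timezone
from typing import Optional


def profile_red_flags(user: dict, now: Optional[datetime] = None) -> list:
    """Return human-readable warnings for a GitHub user object.

    Thresholds are illustrative assumptions, not established limits.
    """
    now = now or datetime.now(timezone.utc)
    flags = []
    # GitHub timestamps look like "2025-06-01T00:00:00Z"
    created = datetime.fromisoformat(user["created_at"].replace("Z", "+00:00"))
    if (now - created).days < 90:
        flags.append("account younger than 90 days")
    if user.get("public_repos", 0) <= 1:
        flags.append("one public repository or fewer")
    if user.get("followers", 0) == 0:
        flags.append("no followers")
    return flags


# Made-up profile data, for illustration only
example = {"created_at": "2025-06-01T00:00:00Z", "public_repos": 1, "followers": 0}
print(profile_red_flags(example, now=datetime(2025, 7, 15, tzinfo=timezone.utc)))
```

A non-empty result is a reason to look closer, not proof of anything: plenty of genuine newcomers have young, quiet profiles.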
Search for clone repositories
You can do a GitHub search by repository name and sort the results by most recent first. Malicious repositories tend to appear at the top of the search results because they are updated more frequently. The original repository might be hidden deeper in the search results.
It’s like clone wars.
This is where it’s dangerous: users generally click on the first few search results, and in that type of attack, you’re almost guaranteed to see the attacker’s fake repository at the top of the results. The attacker achieves that by giving the fake repository regular fresh commits (and sometimes even a few stars!).
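The same search can be run against GitHub's REST search endpoint. Here is a minimal sketch that only builds the request URL: it uses the `in:name` qualifier and the `sort=updated` parameter, and `example-repo` is a placeholder name, not the repository discussed in this article.

```python
from urllib.parse import urlencode


def clone_search_url(repo_name: str, per_page: int = 20) -> str:
    """Build a GitHub search URL listing same-named repositories,
    most recently updated first."""
    params = {
        "q": f"{repo_name} in:name",  # match on the repository name only
        "sort": "updated",
        "order": "desc",
        "per_page": per_page,
    }
    return "https://api.github.com/search/repositories?" + urlencode(params)


print(clone_search_url("example-repo"))
```

Opening that URL in a browser (or fetching it with any HTTP client) returns JSON whose `items` list can be compared against the repository you believe to be the original.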
In my case, the original repository is a submission for the HackaViz 2025 competition. Hackathons offer a good attack surface because, beyond the fact they draw niche communities, they are also time sensitive.
Now, let’s move forward a year and imagine HackaViz 2026 is starting soon. The attacker has easily outranked the untouched original submission. Which repository is most likely to be visited when future competitors, unaware of the scam, look for previous submissions?
Examine the commit pattern
Here’s when things take a weird turn. Malicious clones are run by automated agents, so the commit history fits a pattern that is rather unusual for a human. Of course, you can automate commits for many legitimate reasons but… that automation will always follow a clear goal, and there will always be a human touch at some point. In this case, the commits do not add up.
Let's see how that looks in the screenshots below:
Regular like a clock...
... and hyperactive!
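Both symptoms can be put into numbers. The sketch below assumes you already have the commit timestamps as ISO 8601 strings (how you collect them is up to you). It reports the commits per day and the coefficient of variation of the gaps between commits: a value close to zero means clock-like regularity. The function name is hypothetical and no threshold is implied.

```python
from datetime import datetime
from statistics import mean, pstdev


def commit_rhythm(timestamps: list) -> dict:
    """Summarize how fast and how regularly commits arrive."""
    times = sorted(
        datetime.fromisoformat(t.replace("Z", "+00:00")) for t in timestamps
    )
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    if not gaps:  # fewer than two commits: nothing to measure
        return {"commits_per_day": None, "gap_cv": None}
    # Treat any history shorter than one day as one full day
    span_days = max((times[-1] - times[0]).total_seconds() / 86400, 1.0)
    avg_gap = mean(gaps)
    return {
        "commits_per_day": len(times) / span_days,
        # Coefficient of variation: 0.0 means perfectly regular gaps
        "gap_cv": pstdev(gaps) / avg_gap if avg_gap else 0.0,
    }


# Synthetic example: one commit every two hours for a day
stamps = [f"2025-07-10T{hour:02d}:00:00" for hour in range(0, 24, 2)]
print(commit_rhythm(stamps))
```

Human histories usually show bursts and long silences, so their gap variation tends to be high. Scheduled jobs can be regular for good reasons too, so treat this as one signal among several.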
Examine the commit history
You can’t! And that's the weird part. You can only see the latest commit and the initial one. So why are all the others hidden? Do you like it when someone hides things from you?
For July 10th, we should be able to see 11 commits. Where are the other ten?
Well, you can only check the first and last commit. That is not a lot for a repository that has more than 2000 commits registered.
Examine the commit contents
Well, since I can always check the last commit, I checked some of them. They share the same pattern: the bot is constantly looping over the README file doing the same modifications. As you can see in the screenshot below, it’s updating the file with links to an infected release.
Above you can see an AI agent stuck in a loop of README changes.
Human edits are more varied. In a human-driven project, you will see a large mix of commits: feature commits, exploratory experiments, bug fixes, styling tweaks, and sometimes reverts. A bot clone will often just overwrite files, bump versions, or re-inject the same malicious payload repeatedly with no real contribution to the codebase.
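One rough way to express this difference is the share of commits that touch nothing but the README. The sketch assumes you have, for each commit, the list of file paths it changed. That input format and the function name are illustrative assumptions.

```python
def readme_only_ratio(commits: list) -> float:
    """Fraction of commits whose changed files are all README files.

    `commits` is a list of lists of file paths, one inner list per commit.
    """
    if not commits:
        return 0.0

    def only_readme(paths):
        # Compare on the file name, ignoring directories and letter case
        return bool(paths) and all(
            p.rsplit("/", 1)[-1].lower().startswith("readme") for p in paths
        )

    return sum(only_readme(paths) for paths in commits) / len(commits)


# Synthetic history: three README-only commits out of four
history = [["README.md"], ["README.md"], ["src/app.py", "README.md"], ["README.md"]]
print(readme_only_ratio(history))  # 0.75
```

A ratio close to 1 over hundreds of commits matches the looping behavior described above. A genuine documentation-only repository would also score high, so read the number in context.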
Compare the concerned files
This is where common sense comes in handy. So, you have two READMEs:
The first consists of AI-generated content that is cluttered with emojis and low-value information. It is designed solely to entice you into clicking the download link of the release.
The other follows best practices for creating a good README file. It is accurate and well-structured and functions as a valuable helper and explainer to the code. It also goes deep into the most important aspects of the project. This is usually a good sign that a repository is organic and genuine.
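Since the fake README exists to push a download link, a quick look at where its links point can help. The sketch below pulls out links that end in an archive or installer extension. The extension list is my own assumption, and the sample text is invented.

```python
import re

# Links ending in an archive or installer extension (illustrative list)
ARCHIVE_LINK = re.compile(
    r"https?://[^\s)\"'>]+\.(?:zip|rar|7z|exe|msi)\b", re.IGNORECASE
)


def archive_links(readme_text: str) -> list:
    """Return the unique archive or installer links found in a README."""
    return sorted(set(ARCHIVE_LINK.findall(readme_text)))


sample = (
    "[Download](https://example.com/releases/tool-v1.0.zip) "
    "and read the docs at https://example.com/docs"
)
print(archive_links(sample))
```

A README that repeats the same archive link many times, for a project that is normally installed from source, deserves a second look.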
Some information about the malware
What do we have so far? Well, a suspicious link in a phishy, AI-generated README file that is consistent with a very suspicious pattern in the commit history.
Now, let’s have a closer look at that dubious release and let’s see what an online antivirus scanner might reveal about it.
The malware is packed only in the miniature-fortnight-v1.7.6.zip release.
Above you can see the result of a scan with an online scanner.
The .zip file contains only four files:
config.txt
launch.bat
lua51.dll
luajit.exe
These files are totally unrelated to the source project (a Python data science project with Jupyter notebooks combined with a React app using three.js).
I will not go into detail in this article. But for the curious, it's infostealer malware (malware that exfiltrates your credentials and other precious information about your configuration) similar to the one described in detail here.
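You can inspect an archive's file list without extracting or running anything. The sketch below uses Python's standard `zipfile` module and flags entries with executable or script extensions. The extension list is an assumption, and the demo builds an in-memory archive with empty files named like the four entries above, not the real sample.

```python
import io
import zipfile

# Extensions treated as risky here (illustrative, not exhaustive)
RISKY_EXTENSIONS = (".exe", ".dll", ".bat", ".cmd", ".ps1", ".scr", ".vbs")


def risky_members(archive) -> list:
    """List entries with executable or script extensions.

    Only the archive index is read; nothing is extracted or executed.
    """
    with zipfile.ZipFile(archive) as zf:
        return [n for n in zf.namelist() if n.lower().endswith(RISKY_EXTENSIONS)]


# Demo archive: empty placeholder files with the same names
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in ("config.txt", "launch.bat", "lua51.dll", "luajit.exe"):
        zf.writestr(name, b"")

found = risky_members(buffer)
print(found)  # ['launch.bat', 'lua51.dll', 'luajit.exe']
```

Binaries and a batch launcher in a release of a notebook-and-web-app project are a mismatch worth investigating before anything gets unpacked.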
Action Time
If you discover a potentially malicious repository, here are some steps you can take:
Document some evidence.
Notify the original repository maintainers.
Report the malicious clone to GitHub.
Reporting a repository or a profile on GitHub is easy and fast. Go to the user’s profile page, click “Block or report” in the left sidebar and choose “Report abuse” in the pop-up. You will have to complete a short contact form with some details about the behavior before submitting. If needed, you can find more information on GitHub.
Conclusion
This is a description of just one attack, from the perspective of someone who found out that one of his repositories had been targeted. More sophisticated attacks likely exist. But the flood of cloned repositories we can see on GitHub is clearly massive, low-quality automation. Quantity over quality.
To be honest, I'm quite surprised algorithms crafted at GitHub didn't manage to spot this one.
This also raises questions related to AI.
What happens when LLMs are trained on malicious content? That’s a more general question about AI poisoning.
A human might easily spot the patterns and the low-quality content for now. But...
Imagine you are using coding agents, many of them. Will the agents pick up the malicious clone instead of the original one? How can the repositories be told apart from an automaton's perspective?
The attackers will refine their tactics, making the clones more human-like and therefore luring us more easily into their traps.
This situation really reminds me of the early days of Google. Back then, the company had to fight huge amounts of spam due to keyword stuffing and manipulative SEO tactics. Will big tech companies have to go through a Florida update moment to face the rise of AI-generated spam?
More Resources
Stay Informed, Stay Secure!
A cheat sheet is also available on my GitHub. Feel free to contribute to it!