For the past few months, I’ve been running a handful of SSH honeypots on several cloud providers, including Google Cloud, DigitalOcean, and NameCheap. Rather than building a more complicated honeypot to study attacker behavior, I kept things simple: I was only interested in where the attackers were coming from, what tools they might be using, and what credentials they were trying to authenticate with. My dataset includes 929,554 attempted logins over a period of a little more than 3 months.

If you’re looking for a big surprise, I’ll go ahead and let you down easy: my analysis hasn’t located any new botnets or clusters of attackers. But it’s been a fascinating project nonetheless.

Honeypot Design

With a mere 200ish lines of Go, I implemented a honeypot server using the golang.org/x/crypto/ssh library as the underlying implementation. I advertised a portable OpenSSH version as the server version string (sent to clients on connection). I then logged each connection to a SQLite database, including the timestamp, IP address, client version, and credentials used to (attempt to) authenticate.
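For illustration, the logging step could look something like the following Python sketch. This is not the honeypot's actual Go code: the `attempts` table, its column names, and the `open_db` and `log_attempt` helpers are assumptions made for the example, and the addresses are documentation-range placeholders.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per login attempt.
SCHEMA = """
CREATE TABLE IF NOT EXISTS attempts (
    ts       TEXT NOT NULL,  -- ISO 8601 timestamp, UTC
    ip       TEXT NOT NULL,  -- source address
    client   TEXT NOT NULL,  -- client version string as sent
    username TEXT NOT NULL,
    password TEXT NOT NULL
)
"""

def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def log_attempt(conn, ip, client, username, password, ts=None):
    """Record a single authentication attempt."""
    ts = ts or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO attempts (ts, ip, client, username, password) "
        "VALUES (?, ?, ?, ?, ?)",
        (ts.isoformat(), ip, client, username, password),
    )
    conn.commit()

conn = open_db()
log_attempt(conn, "192.0.2.10", "SSH-2.0-Go", "root", "")
log_attempt(conn, "192.0.2.11", "SSH-2.0-PuTTY", "admin", "123456")
print(conn.execute("SELECT COUNT(*) FROM attempts").fetchone()[0])
```

Storing the raw client string and the raw credentials, rather than anything normalized, keeps all the grouping decisions for the analysis stage.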

Analysis of Credentials

In a surprise to absolutely nobody, root is by far the most commonly tried username for login sessions. I suspect there must be many attackers trying lists of passwords with just root as the username, as 78% of attempted logins were with username root. None of the remainder of the top 10 are particularly surprising, although usuario was not one I expected to see. (It is Spanish for user.)

Blank passwords are the most common attempted passwords, followed by other obvious choices, like 123456 and password. Just off the top 10 list was a surprising choice of password: J5cmmu=Kyf0-br8CsW. Interestingly, a Google search for this password only finds other people with experience running credential honeypots. It doesn’t appear in any of the password wordlists I have, including SecLists and others. If anyone knows what this is a password for, I’d love to know.

There were a number of other interesting passwords such as 7ujMko0admin, used for a bunch of networked DVRs, and also known to be used by malware attacking IoT devices. There are other passwords that don’t look obvious to a US-centric view of the world, like:

  • baikal – a lake in Siberia
  • prueba – Spanish for test
  • caonima – a Mandarin profanity written in Pinyin
  • meiyoumima – Mandarin for “no password”
  • woaini – Mandarin for “I love you”
  • poiuyt – The name for an optical illusion also known as the “devil’s tuning fork”

There are also dozens and dozens of keyboard walks, like 1q2w3e, 1qaz@WSX, and !QAZ2wsx. Many others took me much longer to recognize as keyboard walks, such as 4rfv$RFV and qpwoei.
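One rough way to flag candidates automatically is to score how many consecutive characters sit on neighboring keys. The sketch below is only a heuristic, not the method used for this analysis: it assumes a US QWERTY layout, ignores the physical stagger between rows, and the names `walk_score` and `key_of` are made up for the example.

```python
# US QWERTY rows, treated as a simple grid (an assumption; real rows are staggered).
ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]
# Map shifted number-row symbols back to their base keys.
SHIFTED = dict(zip("!@#$%^&*()", "1234567890"))

KEY_POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}

def key_of(ch):
    """Return the (row, column) of the physical key for a character, or None."""
    ch = SHIFTED.get(ch, ch).lower()
    return KEY_POS.get(ch)

def walk_score(password):
    """Fraction of consecutive character pairs typed on neighboring keys."""
    if len(password) < 2:
        return 0.0
    adjacent = 0
    for a, b in zip(password, password[1:]):
        pa, pb = key_of(a), key_of(b)
        if pa is None or pb is None:
            continue  # unknown key: counts as a non-adjacent step
        # Neighbors are one step away horizontally, vertically, or diagonally.
        if max(abs(pa[0] - pb[0]), abs(pa[1] - pb[1])) == 1:
            adjacent += 1
    return adjacent / (len(password) - 1)

for pw in ["1q2w3e", "1qaz@WSX", "4rfv$RFV", "qpwoei", "baikal"]:
    print(pw, round(walk_score(pw), 2))
```

Column walks like 1qaz@WSX score high because only the jump back to the top row breaks adjacency. A pattern like qpwoei scores zero, since it alternates between the two ends of the row, which fits with it being harder to spot by eye. Catching that kind of pattern would need a different heuristic.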

It has actually fascinated me to look at some of the less obvious passwords and discern their background. Many are inexplicable, but I assume they are hardcoded passwords from devices or something along those lines. Or perhaps someone let their cat walk across the keyboard to generate them. I’ve certainly had that experience.

Overall, the top 10 usernames and top 10 passwords (not necessarily together) are:

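| Username | Count  | Password | Count |
|----------|--------|----------|-------|
| root     | 729108 | <blank>  | 40556 |
| admin    | 23302  | 123456   | 14542 |
| user     | 8420   | admin    | 7757  |
| test     | 7547   | 123      | 7355  |
| oracle   | 6211   | 1234     | 7099  |
| ftpuser  | 4012   | root     | 6999  |
| ubuntu   | 3657   | password | 6118  |
| guest    | 3606   | test     | 5671  |
| postgres | 3455   | 12345    | 5223  |
| usuario  | 2876   | guest    | 4423  |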
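A top-N list with each entry's share of the total takes only a few lines to compute. The following is a minimal sketch with a made-up sample; `top_with_share` is a hypothetical helper, and only the two totals in the final check come from the figures above.

```python
from collections import Counter

def top_with_share(values, n=10):
    """Return (value, count, share of all attempts) for the n most common values."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, count, count / total) for value, count in counts.most_common(n)]

# Made-up sample, not the real dataset.
sample = ["root", "root", "root", "admin"]
for value, count, share in top_with_share(sample):
    print(f"{value:<8} {count:>3} {share:.0%}")

# Sanity check on the published totals: root's share of all 929,554 attempts.
root_share = 729108 / 929554
print(f"{root_share:.1%}")
```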

There were a total of 128,588 unique pairings of username and password attempted, though only 38,112 were attempted 5 or more times. You can download the full list of pairs with counts here, but I’ve omitted those attempted fewer than 5 times in case a legitimate user typo’d an IP or was otherwise mistaken. The top 25 pairings are:
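The "5 or more times" cutoff maps naturally onto a `GROUP BY` with a `HAVING` clause. Here is a sketch against an in-memory SQLite database; the `attempts` table, its columns, and the sample rows are assumptions for the example, not the real schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, stripped-down table holding only the credentials.
conn.execute("CREATE TABLE attempts (username TEXT, password TEXT)")

# Made-up sample rows: two pairs at or above the cutoff, one below it.
sample = [("root", "")] * 6 + [("pi", "raspberry")] * 5 + [("alice", "hunter2")] * 2
conn.executemany("INSERT INTO attempts VALUES (?, ?)", sample)

def pairs_with_min_count(conn, min_count=5):
    """Return (username, password, count) for pairs seen at least min_count times."""
    query = """
        SELECT username, password, COUNT(*) AS n
        FROM attempts
        GROUP BY username, password
        HAVING COUNT(*) >= ?
        ORDER BY n DESC, username, password
    """
    return conn.execute(query, (min_count,)).fetchall()

print(pairs_with_min_count(conn))
```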

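| Username | Password   | Count |
|----------|------------|-------|
| root     | <blank>    | 37580 |
| root     | root       | 4213  |
| user     | user       | 2794  |
| root     | 123456     | 2569  |
| test     | test       | 2532  |
| admin    | admin      | 2531  |
| root     | admin      | 2185  |
| guest    | guest      | 2143  |
| root     | password   | 2128  |
| oracle   | oracle     | 1869  |
| ubuntu   | ubuntu     | 1811  |
| root     | 1234       | 1681  |
| root     | 123        | 1658  |
| postgres | postgres   | 1594  |
| support  | support    | 1535  |
| jenkins  | jenkins    | 1360  |
| admin    | password   | 1241  |
| root     | 12345      | 1177  |
| pi       | raspberry  | 1160  |
| root     | 12345678   | 1126  |
| root     | 123456789  | 1069  |
| ubnt     | ubnt       | 1069  |
| admin    | 1234       | 1012  |
| root     | 1234567890 | 967   |
| ec2-user | ec2-user   | 963   |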

Again, no real surprises here. ubnt (the default for Ubiquiti networking gear) is a little bit higher than I would have thought, but I suppose there’s a fair bit of their gear on the internet. It’s interesting to see the mix of “lazy admin” and “default credentials” here. It’s mildly interesting to me that every prefix of 1234567890 that is 3 digits or longer shows up here, except for the 7-digit 1234567. I guess 7-digit passwords are less common?

Timing Information

Though I imagine these kinds of untargeted scans are long-term processes that run continually, I decided to check what the timing looked like anyway. Neither the day-of-week analysis nor the hour-of-day analysis shows any significant variance.
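Bucketing attempts by weekday and hour is straightforward once the timestamps are parsed. A minimal sketch, assuming ISO 8601 timestamps in a single time zone; the sample timestamps are invented and `bucket_times` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def bucket_times(timestamps):
    """Count attempts per day of week and per hour of day."""
    by_day, by_hour = Counter(), Counter()
    for raw in timestamps:
        ts = datetime.fromisoformat(raw)
        by_day[DAYS[ts.weekday()]] += 1  # weekday(): Monday is 0
        by_hour[ts.hour] += 1
    return by_day, by_hour

# Invented timestamps for illustration only.
sample = ["2020-03-02T01:15:00", "2020-03-02T01:45:00", "2020-03-07T23:05:00"]
by_day, by_hour = bucket_times(sample)
print(dict(by_day))
print(dict(by_hour))
```

Which time zone the buckets use matters for the hour-of-day view, since the sources span many time zones and an hour-of-day bucket only has one meaning per chosen zone.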

[Figure: Day of Week] [Figure: Hour of Day]

Looking at the number of login requests over the time period where I’ve been running the honeypots shows the traffic to be intermittent. While I didn’t expect the number to be constant, the variance is much higher than I expected. I imagine a larger sample size and more nodes would probably make the results more even.

[Figure: Day of Study]

Analysis of Sources

So where are all of these requests coming from? I want to start by noting that none of my analysis is an attempt to attribute the actors making the requests; that’s just not possible with this kind of data. There are two ways to look at the source of requests: in terms of the network, and in terms of the (assumed) geography. My analysis relied on the IP to ASN and IP to Country data provided by iptoasn.com.
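A range-based lookup like this can be sketched with a sorted list and a binary search. The column layout below (range start, range end, AS number, country code, AS description, tab-separated) is an assumption about the data file, and the sample rows are made up, using documentation address ranges and reserved AS numbers.

```python
import bisect
import ipaddress

# Assumed TSV layout: range_start, range_end, AS number, country code, AS description.
# These rows are invented examples, not real allocations.
SAMPLE_TSV = (
    "192.0.2.0\t192.0.2.255\t64500\tZZ\tEXAMPLE-NET-ONE\n"
    "198.51.100.0\t198.51.100.255\t64501\tZZ\tEXAMPLE-NET-TWO\n"
)

def load_ranges(tsv_text):
    """Parse the TSV into a list of (start, end, asn, country, name), sorted by start."""
    ranges = []
    for line in tsv_text.splitlines():
        start, end, asn, country, name = line.split("\t", 4)
        ranges.append((
            int(ipaddress.IPv4Address(start)),
            int(ipaddress.IPv4Address(end)),
            int(asn), country, name,
        ))
    ranges.sort()
    return ranges

RANGES = load_ranges(SAMPLE_TSV)
STARTS = [r[0] for r in RANGES]  # parallel list of range starts for bisect

def lookup(ip):
    """Return (asn, country, name) for an IPv4 address, or None if unmapped."""
    addr = int(ipaddress.IPv4Address(ip))
    i = bisect.bisect_right(STARTS, addr) - 1  # last range starting at or before addr
    if i >= 0 and RANGES[i][1] >= addr:
        return RANGES[i][2:]
    return None

print(lookup("192.0.2.77"))
print(lookup("203.0.113.5"))
```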

Looking at the country-level data, networks from China lead the pack by a long shot (62% of all login attempts), followed by the US.

[Figure: Countries]

| Country | Count  |
|---------|--------|
| CN      | 577789 |
| US      | 87589  |
| TW      | 48645  |
| FR      | 39072  |
| RU      | 30929  |
| NL      | 29920  |
| JP      | 28033  |
| DE      | 15408  |
| IN      | 13921  |
| LT      | 6623   |

Again, I’m not claiming that these countries mean anything other than location of the autonomous system (AS) that originates the requests. I also did not do individual IP geolocation, so the results should be taken with a small grain of salt.

So what networks are sourcing this traffic? I have the full AS counts and data, but the top networks are:

| AS Name | Country | ASN | Count |
|---------|---------|-----|-------|
| CHINANET-BACKBONE No.31,Jin-rong Street | CN | 4134 | 202024 |
| CHINANET-JS-AS-AP AS Number for CHINANET jiangsu province backbone | CN | 23650 | 186274 |
| CHINA169-BACKBONE CNCGROUP China169 Backbone | CN | 4837 | 122192 |
| HINET Data Communication Business Group | TW | 3462 | 48492 |
| OVH | FR | 16276 | 30865 |
| VECTANT ARTERIA Networks Corporation | JP | 2519 | 27481 |
| DIGITALOCEAN-ASN - DigitalOcean, LLC | US | 14061 | 26965 |
| MICROSOFT-CORP-MSN-AS-BLOCK - Microsoft Corporation | US | 8075 | 20370 |
| RMINJINERING | RU | 49877 | 16710 |
| AS38994 | NL | 38994 | 14482 |
| XMGBNET Golden-Bridge Netcom communication Co.,LTD. | CN | 45058 | 12418 |
| CNNIC-ALIBABA-CN-NET-AP Hangzhou Alibaba Advertising Co.,Ltd. | CN | 37963 | 12045 |
| CNNIC-TENCENT-NET-AP Shenzhen Tencent Computer Systems Company Limited | CN | 45090 | 10804 |
| CNIX-AP China Networks Inter-Exchange | CN | 4847 | 10000 |
| PONYNET - FranTech Solutions | US | 53667 | 9317 |
| ITTI | US | 44685 | 7960 |
| CHINA169-BJ China Unicom Beijing Province Network | CN | 4808 | 7835 |
| AS12876 | FR | 12876 | 7262 |
| AS209605 | LT | 209605 | 6586 |
| CONTABO | DE | 51167 | 6261 |

[Figure: AS Graph]

Chinanet is no surprise given the high share of traffic from China in general. OVH is a low-cost host known to have a liberal acceptable use policy (AUP), so it is popular for both malicious and research purposes. DigitalOcean and Microsoft, of course, are popular cloud providers. Surprisingly, AWS only sourced about 600 connections, unless they have a large number of IPs on a non-Amazon ASN.

Overall, traffic came from 27,448 unique IPv4 addresses. Of those, more than 11 thousand sent only a single request. At the other end of the spectrum, the top IP source sent 64,969 login requests.

Most hosts sent relatively few requests; the large numbers are outliers:

[Figure: IP Count Graph]
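The per-source numbers (unique addresses, single-request sources, and the busiest source) fall out of a single counting pass. A small sketch with placeholder addresses; `source_summary` is a hypothetical helper, not part of the original tooling.

```python
from collections import Counter

def source_summary(ips):
    """Summarize how requests are distributed across source addresses."""
    per_ip = Counter(ips)
    singletons = sum(1 for n in per_ip.values() if n == 1)
    top_ip, top_count = per_ip.most_common(1)[0]
    return {
        "unique": len(per_ip),
        "single_request": singletons,
        "top": (top_ip, top_count),
    }

# Placeholder addresses from documentation ranges, not real sources.
sample = ["192.0.2.1"] * 4 + ["192.0.2.2", "198.51.100.7"]
summary = source_summary(sample)
print(summary)
```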

Surely, by now a thought has crossed your mind: how many of these requests are coming from Tor? Surely the Tor network is a wretched hive of scum and villainy, and the source of much malicious traffic, right?

[Figure: Tor Graph]

Not at all. Only 219 of the unique source IPs were identified as Tor exit nodes, representing only 0.8% of the sources. On a per-request basis, an even smaller percentage of requests came from Tor exit nodes.

Client Software

Remember – this is self-reported by the client application, and just like I can spoof the server version string, so can clients. But I still thought it would be interesting to take a brief look at those.

| Client | Count |
|--------|-------|
| SSH-2.0-PuTTY | 309797 |
| SSH-2.0-PUTTY | 182465 |
| SSH-2.0-libssh2_1.4.3 | 135502 |
| SSH-2.0-Go | 125254 |
| SSH-2.0-libssh-0.6.3 | 62117 |
| SSH-2.0-libssh2_1.7.0 | 23799 |
| SSH-2.0-libssh2_1.9.0 | 21627 |
| SSH-2.0-OpenSSH_7.3 | 9954 |
| SSH-2.0-OpenSSH_7.4p1 | 8949 |
| SSH-2.0-libssh2_1.8.0 | 5284 |
| SSH-2.0-JSCH-0.1.45 | 3469 |
| SSH-2.0-PuTTY_Release_0.70 | 2080 |
| SSH-2.0-PuTTY_Release_0.63 | 1813 |
| SSH-2.0-OpenSSH_5.3 | 1212 |
| SSH-2.0-paramiko_1.8.1 | 1140 |
| SSH-2.0-PuTTY_Release_0.62 | 1130 |
| SSH-2.0-OpenSSH_4.3 | 795 |
| SSH-2.0-PuTTY_Release_0.66 | 694 |
| SSH-2.0-OpenSSH_7.9p1 Raspbian-10+deb10u2 | 690 |
| SSH-2.0-libssh_0.11 | 660 |

You know, I didn’t expect that: PuTTY as the top two client strings. (I’m also not sure what to make of the case difference.) I wonder if people are building the PuTTY SSH library into a scanning tool or wrapping the binary in some kind of script.

Go, paramiko, and libssh are less surprising, as they’re libraries designed for integration. It’s hard to know whether the OpenSSH requests come from a scanning tool that links in OpenSSH code or just from wrapped versions of the SSH client. At some point in the future, I might dive more into this and try to figure out which software uses which libraries (at least for the publicly known tools).
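To roll the individual version strings up into software families, one option is to parse the `SSH-protoversion-softwareversion comments` form of the identification string and normalize case. The sketch below is a rough heuristic, not how the table above was produced: the family rule (everything before the first `_` or `-`, lowercased) and the helper names are assumptions.

```python
import re
from collections import Counter

# Identification string: "SSH-<protoversion>-<softwareversion>[ <comments>]".
ID_RE = re.compile(r"^SSH-(?P<proto>[^-]+)-(?P<software>[^ ]+)(?: (?P<comments>.*))?$")

def parse_client(banner):
    """Split a client string into (proto, software, family, comments), or None."""
    m = ID_RE.match(banner.strip())
    if not m:
        return None
    software = m.group("software")
    # Heuristic family name: text before the first "_" or "-", lowercased.
    family = re.split(r"[_-]", software, maxsplit=1)[0].lower()
    return m.group("proto"), software, family, m.group("comments")

def family_totals(rows):
    """Sum counts per software family from (banner, count) rows."""
    totals = Counter()
    for banner, count in rows:
        parsed = parse_client(banner)
        totals[parsed[2] if parsed else "unparsed"] += count
    return totals

# A few rows taken from the table above.
rows = [
    ("SSH-2.0-PuTTY", 309797),
    ("SSH-2.0-PUTTY", 182465),
    ("SSH-2.0-libssh2_1.4.3", 135502),
    ("SSH-2.0-libssh-0.6.3", 62117),
    ("SSH-2.0-OpenSSH_7.9p1 Raspbian-10+deb10u2", 690),
]
print(family_totals(rows).most_common())
```

Lowercasing merges the two PuTTY spellings into one family, while libssh and libssh2 stay separate, which is what you want since they are different libraries.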

Summary

I was hoping to find something earth-shattering in this research. Instead, I found things that were much as expected – common usernames and passwords, widespread scanning, large numbers of requests. One thing’s for sure though: connect it to the internet and someone’s going to pwn it.