
Github Avatars

RatoGBM
Random Blogger

Scraping GitHub avatars, because it is easy.

Avatars are located at https://avatars.githubusercontent.com/u/80184495?v=4 where /u/ stands for user, and 80184495 is the user's ID.
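As a minimal sketch of that URL structure, here is a small helper that builds the avatar URL from a numeric ID. The function name `avatarUrl` and its `version` parameter are my own for illustration; only the URL shape comes from the example above.

```javascript
// Build the avatar URL for a numeric GitHub user ID.
// avatarUrl and its parameters are illustrative names, not part of any GitHub API.
function avatarUrl(id, version = 4) {
  if (!Number.isInteger(id) || id < 1) {
    throw new RangeError(`expected a positive integer ID, got ${id}`);
  }
  const url = new URL(`https://avatars.githubusercontent.com/u/${id}`);
  url.searchParams.set('v', String(version)); // the ?v=4 part of the URL
  return url.toString();
}

console.log(avatarUrl(80184495));
// prints "https://avatars.githubusercontent.com/u/80184495?v=4"
```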

warning

Scripts and examples in this post involve images controlled by other users. I have no control or responsibility over what is displayed. The images can (and will) include pornography and other offensive content. They will also include trademarked logos. Don't automatically use images you haven't looked at, and never impersonate.

User IDs are assigned sequentially and are quite easy to look up at https://api.github.com/users/[username].

?v=4 is the version of something, but not of the avatar: everyone I checked has version 4, and changing the version doesn't change the avatar. (I compared the JPEGs at the binary level with cmp -l and found no difference.)

Note: it is possible to do ID -> username lookup via https://api.github.com/user/:id.
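Both lookup directions can be sketched in a few lines. This assumes the documented `/users/{username}` endpoint for name lookups and the `/user/:id` route mentioned above for ID lookups. The helper names (`userUrlByName`, `userUrlById`, `lookupUser`) and the sample username are made up for this example, and the `fetchImpl` parameter is only there so the network call can be swapped out.

```javascript
const API = 'https://api.github.com';

// URL for a username -> user lookup (documented REST endpoint).
function userUrlByName(username) {
  return `${API}/users/${encodeURIComponent(username)}`;
}

// URL for an ID -> user lookup (the route mentioned in the note above).
function userUrlById(id) {
  return `${API}/user/${id}`;
}

// Fetch a user object and keep only the numeric ID and the login name.
// fetchImpl defaults to the global fetch (browsers, Node 18+).
async function lookupUser(url, fetchImpl = fetch) {
  const res = await fetchImpl(url, {
    headers: { Accept: 'application/vnd.github+json' },
  });
  if (!res.ok) throw new Error(`GitHub API returned ${res.status}`);
  const { id, login } = await res.json();
  return { id, login };
}

// Usage (makes a real request, so it counts against the rate limit):
// lookupUser(userUrlByName('octocat')).then(console.log);
```

Every call counts against the unauthenticated rate limit, so it is worth caching results instead of looking up the same user twice.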

Automating

I could manually type in numbers, and download avatars, but I decided to script it.

First I tried curl https://avatars.githubusercontent.com/u/80184495?v=4 > avatar.jpeg. And curl worked perfectly.

Based on curl, I can write a quick bash script.

#!/usr/bin/env bash
id=$RANDOM # Random number between 0 - 32767
curl https://avatars.githubusercontent.com/u/$id\?v\=4 > $id.jpeg
echo "Saved $id.jpeg"

Read more: running bash | shebangs | $RANDOM | escaping characters

The problem is that $RANDOM only gives a number between 0 and 32767, while there are around 100M GitHub users. Also, I might want to run the script more than once.

#!/usr/bin/env bash
id=$(shuf -i 1-100000000 -n 1 2>&1)
curl https://avatars.githubusercontent.com/u/$id\?v\=4 > $id.jpeg
echo "Saved $id.jpeg"

Read more: shuf command | getting outputs in bash

#!/usr/bin/env bash
N=5 # Don't put too much, or cleanup will be annoying
for ((i = 0 ; i < $N ; i++)); do
    # Pick a random user ID and save its avatar as <id>.jpeg
    id=$(shuf -i 1-100000000 -n 1 2>&1)
    curl https://avatars.githubusercontent.com/u/$id\?v\=4 > $id.jpeg
    echo "Saved $id.jpeg"
done
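Since the rest of the post moves to JavaScript, here is a rough equivalent of the ID-picking part of that loop. It is a sketch, not a port of the script: `randomIds` is a made-up helper, the 100,000,000 upper bound is the same rough guess used above, and unlike the bash loop it guarantees that no ID is picked twice.

```javascript
// Pick `count` distinct random user IDs in [1, maxId].
// rng is injectable (defaults to Math.random) so the function can be tested.
function randomIds(count, maxId = 100_000_000, rng = Math.random) {
  if (count > maxId) throw new RangeError('count exceeds the number of IDs');
  const ids = new Set();
  while (ids.size < count) {
    ids.add(Math.floor(rng() * maxId) + 1); // uniform over 1..maxId
  }
  return [...ids];
}

// Print the avatar URL for each of 5 random IDs.
for (const id of randomIds(5)) {
  console.log(`https://avatars.githubusercontent.com/u/${id}?v=4`);
}
```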

The last problem is that most avatars we download are boring Identicons.

Going Advanced

At this point, I am switching to JavaScript, because I will need stronger tools for analyzing images. Also, GitHub has a REST API with IP rate limits, so I will have to be careful.
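One way to be careful is to read the rate-limit headers that the REST API attaches to its responses (`x-ratelimit-remaining` and `x-ratelimit-reset`) and pause when the budget runs out. The sketch below only computes the wait time. `msUntilAllowed` is a name I made up, and it accepts anything with a `get(name)` method, such as the `headers` of a fetch response.

```javascript
// How long to wait (in ms) before the next API call, based on the
// rate-limit headers of the previous response.
function msUntilAllowed(headers, nowMs = Date.now()) {
  const remaining = headers.get('x-ratelimit-remaining');
  const reset = headers.get('x-ratelimit-reset'); // Unix time in seconds
  if (remaining == null || reset == null) return 0; // headers absent: don't wait
  if (Number(remaining) > 0) return 0;              // budget left: go ahead
  return Math.max(0, Number(reset) * 1000 - nowMs); // wait until the window resets
}

// Example with hand-written header values (not real API output):
const exhausted = new Map([
  ['x-ratelimit-remaining', '0'],
  ['x-ratelimit-reset', '1000'],
]);
console.log(msUntilAllowed(exhausted, 400_000)); // prints 600000
```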

I could detect identicons pretty easily, because they have exactly 2 colors from a predefined palette, and one color is always white.
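For the record, the color-counting approach could look roughly like this. It is a sketch under a few assumptions: the pixels arrive as a flat RGBA array (the layout canvas `getImageData` produces), the image has no compression noise, and the background is pure white as described above (it is a parameter in case the real value differs). The function names are mine, and getting the pixels out of the image is left out.

```javascript
// Collect distinct RGB colors from a flat RGBA pixel array,
// stopping early once `limit` colors have been seen.
function distinctColors(rgba, limit = 3) {
  const seen = new Set();
  for (let i = 0; i + 3 < rgba.length; i += 4) {
    // Pack R, G, B into one integer; alpha is ignored.
    seen.add((rgba[i] << 16) | (rgba[i + 1] << 8) | rgba[i + 2]);
    if (seen.size >= limit) break;
  }
  return seen;
}

// Identicon heuristic: exactly two colors, one of them the background.
function looksLikeIdenticon(rgba, background = 0xffffff) {
  const colors = distinctColors(rgba, 3);
  return colors.size === 2 && colors.has(background);
}

// Tiny synthetic "image": one white pixel, one colored pixel.
const twoTone = new Uint8ClampedArray([255, 255, 255, 255, 10, 20, 30, 255]);
console.log(looksLikeIdenticon(twoTone)); // prints true
```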

But there are easier ways: Identicons are 420x420, while custom profile pictures are usually resized to 460x460 or some other size.

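const avatars = document.getElementById('avatars');
const boring_avatars = document.getElementById('boring-avatars');

// Load a random avatar; move it to the "boring" list if it looks like an identicon.
function rand_avatar() {
  const img = new Image();
  const id = randrange(1, 100_000_000); // same 1-100000000 range as the bash script
  img.id = id.toString();
  img.onload = function () {
    // Identicons are 420x420, so the natural height gives them away.
    if (img.naturalHeight === 420) {
      boring_avatars.appendChild(img); // appendChild moves the existing node
    }
  };
  img.src = `https://avatars.githubusercontent.com/u/${id}?v=4`;
  avatars.appendChild(img);
}

// Uniform random integer in [a, b], both ends inclusive.
function randrange(a, b) {
  return Math.floor(Math.random() * (b - a + 1)) + a;
}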


That's it for this adventure. I might mess around with GitHub's API more in the future.

Take care