Learnings from a "small" project

Profilecloudbot is a small project that I have created. Its main purpose was to get me back into the swing of things in terms of creating. I have been in a bit of a slump lately when it comes to making projects and wanted something easy to get me back into creating things.

For that purpose, it has worked great! I am now working much more on my projects.

Although it was small in scope, like all things it turned out to be a lot harder to create than expected. All I wanted to do was the following:

Sounds easy, but there was a lot more to it than just that.

My initial thought process was that I'll just talk to the Twitter API, download tweets, run them through this wordcloud library (which actually has built-in support for masking) and then reply to the user with their wordcloud. How hard can that be‽

Turns out it's not particularly hard, but I very much underestimated the amount of work required.

What I actually had to do to get this to work

Download a user's tweets

Ok, this was actually very easy to do. Not much more to say other than that.

Run tweets through wordcloud library

Alright, that was relatively simple as well. It took a fair bit of data normalising and regexes to get the wordclouds looking ok and without lots of weird Twitter URLs strung across the place. We now have a wordcloud that looks like this:

These are tweets from The Stoic Emperor

Basic wordcloud of TheStoicEmperor's tweets
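The data normalising and regex step mentioned above might look roughly like the sketch below. The patterns and the function names `clean_tweet` and `word_frequencies` are my own assumptions for illustration; the post does not list the actual regexes used.

```python
import re
from collections import Counter

# Hypothetical patterns, not the ones the bot actually uses.
RETWEET_RE = re.compile(r"^RT\b:?\s*")    # leading "RT" marker
URL_RE = re.compile(r"https?://\S+")      # t.co links and other URLs
MENTION_RE = re.compile(r"@\w+")          # @username mentions
NON_WORD_RE = re.compile(r"[^a-z'\s]")    # applied after lowercasing


def clean_tweet(text: str) -> str:
    """Strip retweet markers, URLs and mentions, then normalise case and spacing."""
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = text.lower()
    text = NON_WORD_RE.sub(" ", text)
    return " ".join(text.split())


def word_frequencies(tweets):
    """Count how often each cleaned word appears across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(clean_tweet(tweet).split())
    return counts
```

Removing the URLs before stripping punctuation matters here, otherwise fragments like "https" and "t co" would end up as words in the cloud.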

Turn that wordcloud into a mask

The mask we want to use for this image is the user's profile picture.

I first checked if Twitter has any sensible method of naming the profile pictures of users (if it did, I could have just run wget to get the image). It does not, so I had to learn how to use the API to download the images. Turns out that was not as easy as expected either. The images that were being returned by the API were too small. Turns out you can slightly modify that URL to get the correct image. Found that out by poking around with the dev tools.
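The post does not say exactly which URL modification was used. One plausible version, sketched here as an assumption, relies on Twitter's convention of marking thumbnail variants with a `_normal`, `_bigger` or `_mini` suffix before the file extension; dropping the suffix gives the original-size image. The function name `full_size_profile_image_url` is made up for this example.

```python
import re

# Assumption: thumbnail URLs end in "_normal", "_bigger" or "_mini",
# optionally followed by a file extension such as ".jpg".
SIZE_SUFFIX_RE = re.compile(r"_(normal|bigger|mini)(\.\w+)?$")


def full_size_profile_image_url(url: str) -> str:
    """Drop the size suffix, keeping the file extension if there is one."""
    return SIZE_SUFFIX_RE.sub(lambda m: m.group(2) or "", url)
```

URLs without a recognised size suffix are returned unchanged, so the function is safe to call on anything the API hands back.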

This returned a square image. I wanted these images to be circular so they would match the expected Twitter profile picture. Enter imagemagick and this beast of a command:

convert image.jpg -alpha set \
    \( +clone -distort DePolar 0 \
       -virtual-pixel HorizontalTile -background None -distort Polar 0 \) \
    -compose Dst_In -composite -trim +repage circle.png

I don't know what most of those words mean

Oh yeah, the images had to be a png as well.

The wordcloud library also wanted the image to not be transparent, so here is a much simpler imagemagick command I used to generate that:

convert circle.png -background white -alpha remove -alpha off white.png

I know what those words mean

If a user's image is greyscale then the whole thing fails. I could not find a way to colour the image if it was greyscale, so I just decided to fail here and upload a placeholder image instead.

This now pretty much works and returns wordclouds that look like this:

TheStoicEmperor's word cloud with profile picture:

Elon Musk's!

Austen Allred's, this one looks really good.

Paul Graham; another good one!

It was at this point that twitter blocked me from using the api via some automated spam detection

(ノಠ益ಠ)ノ彡┻━┻

After a couple of days (where I did basically no work on this project) I got the api access back

┬─┬ノ( º _ ºノ)

Upload to twitter

Reply to a user who asked for a wordcloud with their wordcloud. That part was relatively easy to do and I did not run into many unforeseen issues. I did however run into the issue of replying to people multiple times.

I originally wanted to solve this issue by checking if the bot had already replied to a tweet. But I ran into so many issues with this that I actually just gave up.

Next idea: each tweet has an ID that is basically a fancy timestamp. I wanted to store the ID of the most recently replied-to tweet, so if a tweet's ID is lower than that (meaning the tweet is older) then we have almost certainly replied to it before and we can skip over it.

I did not want to use a traditional database for this, that would be insane. I tried storing it in a plain file, just a number in a file, and that was not too hard. I did not like that implementation for whatever reason, it felt "not right". So I decided it should be JSON formatted.
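To make the "fancy timestamp" idea concrete, here is a minimal sketch. Tweet IDs are snowflake IDs: the bits above the lowest 22 hold milliseconds since Twitter's custom epoch, so a larger ID means a newer tweet. The helper names `tweet_id_to_datetime` and `unanswered` are hypothetical and not taken from the bot's code.

```python
from datetime import datetime, timedelta, timezone

# Twitter's snowflake epoch in milliseconds since the Unix epoch.
TWITTER_EPOCH_MS = 1288834974657
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tweet_id_to_datetime(tweet_id: int) -> datetime:
    """Recover the creation time embedded in a snowflake tweet ID."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return UNIX_EPOCH + timedelta(milliseconds=ms)


def unanswered(mention_ids, last_replied_id: int):
    """Keep only IDs newer than the last one replied to, oldest first."""
    return sorted(i for i in mention_ids if i > last_replied_id)
```

Because the IDs sort in time order, the bot never needs to decode the timestamp at all; comparing the raw integers is enough, and the decoding function is only there to show why that works.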

I decided to use TinyDB, a kind of database that only consists of 1 json file. It can do all the standard DB operations. So I now have a "database" with the ID in it. Since I had this database I also decided to record each user who uses the bot for no real reason at all.

TinyDB was overkill, I could have just written the JSON myself. But I like TinyDB and will be using it in my future projects, so not a waste of time at all.
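For comparison, writing the JSON by hand could be as small as the sketch below. This is not what the bot does (it uses TinyDB); the key name `last_replied_id` and both function names are assumptions made for the example.

```python
import json
import os
import tempfile
from pathlib import Path


def load_last_replied_id(path) -> int:
    """Return the stored tweet ID, or 0 if no state file exists yet."""
    try:
        return int(json.loads(Path(path).read_text())["last_replied_id"])
    except FileNotFoundError:
        return 0


def save_last_replied_id(path, tweet_id: int) -> None:
    """Write the ID to a temp file first, then swap it into place."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump({"last_replied_id": tweet_id}, tmp)
    os.replace(tmp_name, path)  # replaces any existing file in one step
```

Writing to a temporary file and renaming it means a crash mid-write should not leave a half-written state file behind, which matters for a bot that would otherwise start replying to everyone again.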

Glue it all together

This project has lots of glue code to stick all these various parts together. Bash scripts for imagemagick commands and data cleaning, Python scripts for wordcloud generation and talking to the Twitter API. I don't mind this too much.

I like my glue code.

Deploy

Deploying the bot was the easiest part: it's running in a detached screen session on one of my servers. I probably should create a systemd unit file and have it run as a daemon (that way it would survive a reboot), but I just wanted to get it done at this point.

Launch

I never really did a launch for this project. I probably should, but that was not the main goal. It does not cost anything for me to run this project, so I am happy to keep it ticking away.

Finito

Well, that's basically what it took to get this working. I had fun doing it but it was way more work than expected. I intentionally picked it as I thought it would be simple. I don't mind it when things are harder than expected (they almost always are) but I wanted to point out how much even the simplest project can grow.

From a 3 item todo list to a 66 item todo list.

The project did serve its purpose: I am now back into the flow of making things.

Go check it out if you want https://twitter.com/profilecloudbot


Last modified on May 13, 2021, 12:14 a.m.

Published on May 13, 2021, 12:12 a.m.