Friday, April 25, 2014

Tutorial - Create a reddit bot with python and Heroku

Heroku is a popular Platform as a Service (PaaS) host for deploying applications in multiple languages. You can run a limited number of small applications for free, Heroku automatically restarts your application if it crashes, and deployment is fairly simple.

Prerequisites

There are a few tools that you'll need to have installed before you start:

  • Heroku toolbelt (Heroku has good installation instructions here. You only need to follow steps 1-3.)
  • git
  • python (any version is fine, I'll use 3.4 in my examples)
  • virtualenv (not exactly necessary, but useful)

Setup instructions

Start by making a project directory and a virtual environment.

mkdir redditbot
cd redditbot
virtualenv --python=python3.4 env
source env/bin/activate

This will create a virtual python environment in a directory named env. If you run python --version you should see the version that you requested and not your system default python.


Next, you'll need to install PRAW, which will be used for interacting with reddit.

pip install praw

This will install praw into your virtual environment.

redditbot

I'll give my reddit bot a really original name: redditbot. I'm not going to explain how to use PRAW, but I will give a really simple example bot and point you to the documentation here. This bot is a slightly modified version of the "quick peek" example bot from the PRAW documentation.

import os
import time

import praw

# reddit really wants you to use a unique user agent string.
# see https://github.com/reddit/reddit/wiki/API#rules
r = praw.Reddit(user_agent='redditbot 0.1 by /u/')
# login isn't strictly needed here since we're not
# posting, commenting, etc.
# you'll need to set the REDDIT_USER and REDDIT_PASS
# environment variables before you run this bot
r.login(os.environ['REDDIT_USER'], os.environ['REDDIT_PASS'])

while True:
    for submission in r.get_subreddit('learnpython').get_hot(limit=5):
        print(submission)
    time.sleep(30)

Save this into redditbot.py, and you're ready to deploy the application to Heroku.

Deployment

First, you'll need to create a Heroku application.

heroku create

In the output, you should see the application name (I'll use "rapid-brook-5928" as a made-up example) and a git address. After this, you can create a git repository and connect it to your Heroku app by adding the Heroku git repo as a remote.

git init
heroku git:remote -a rapid-brook-5928

Now, you'll need to create a few files to tell Heroku (and git) what to do to create and run your app.

requirements.txt

This file can be created for you by pip. You should recreate it any time you install or update any libraries in your virtual environment.

pip freeze > requirements.txt

runtime.txt

This file tells Heroku which version of python (or ruby, or java, etc.) to use to run your application. You should set it to the version you're using in your virtual environment.

python-3.4.0

Procfile

This file tells Heroku about the processes needed for your app. The free tier only supports one process per application, and we're going to make it a worker.

worker: python redditbot.py

.gitignore

This last one will keep git from storing unneeded files (or sending them to Heroku).

__pycache__/
env/
*.pyc

Remember the REDDIT_USER and REDDIT_PASS environment variables that redditbot is using? You need to set those in your Heroku environment.

heroku config:set REDDIT_USER=AzureDiamond REDDIT_PASS=hunter2

Now that you have all of the files created, it is time to push them to Heroku.

# add all files in current directory to git
git add .
git commit -m "creating redditbot"
# push your changes to Heroku
# this will deploy/redeploy your app on Heroku
# Heroku will install all of the libraries
# in requirements.txt and create a worker
git push heroku master

At this point, your bot is not yet running. You still need to log in to Heroku, go to your application dashboard, and increase the number of dynos allocated to your application to 1 (make sure to click on the "Apply Changes" button). For my example app, I would go to https://dashboard.heroku.com/apps/rapid-brook-5928/resources. You can also do this from the command line with heroku ps:scale worker=1.

Viewing the output

At this point, the bot doesn't do anything besides list the top posts at /r/learnpython. Everything the bot prints (including stacktraces when it crashes) goes to the Heroku log, which you can view with this command:

heroku logs

Where to go from here

Now that you're done creating a basic reddit bot, you'll want to expand it and do more. Explore the PRAW documentation and the reddit API to learn how to create posts or respond to comments. If you do respond to comments, you'll want to remember which comments you've responded to. You can do this with a list you store in memory, or you can use a database. Good luck!
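As a minimal sketch of the in-memory approach (the function and names here are mine, not part of PRAW): keep a set of comment ids you've already handled, and only reply the first time you see an id.

```python
# A simple in-memory record of which comments the bot has
# already replied to. Lost on restart, which is why a
# database is the sturdier option.
replied_ids = set()

def should_reply(comment_id, seen):
    """Return True only the first time a comment id is encountered."""
    if comment_id in seen:
        return False
    seen.add(comment_id)
    return True

print(should_reply("abc123", replied_ids))  # True  (first sighting)
print(should_reply("abc123", replied_ids))  # False (already handled)
```

Swapping the set for a database table keyed on the comment id gives you the same check across restarts.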

Wednesday, March 31, 2010

grep -e works in Go

I've been working on a rewrite of 9base in Go. 9base is a subset of Plan 9 from User Space, which is a *nix port of Plan 9. My project, goblin, can be found on bitbucket.

While working on writing grep, I ran into a limitation of Go's flag package--each flag can only have one value. Basically, a grep written in Go using the built-in flag package couldn't be called like this (search expressions can be passed to grep with -e):

grep -e foo -e bar -e baz file.txt


I'm not certain what would happen if it was called like that, although my guess is that the baz value would be used. The only thing I'm willing to guarantee here is that only one of the three values passed in via -e would be used. You would be expecting to have grep use an expression like (foo|bar|baz), but you'd be getting the results of baz. One option I investigated was using a different library for flag parsing, optparse. I am planning on writing a brief description/tutorial of this package later.

After a little bit of conversation on the golang-nuts mailing list, Rob Pike decided to change the flag package (code review). His change was very simple: he opened the API up a little bit to allow custom types. Support for multiple values isn't built in to the flag package, but I believe that this is a good thing. The flag package is kept very simple, but it can now be extended if needed.

In the code review, Rob gave an example of how to use the modified flag library to support multiple flags. It's simple, really: all you have to do is create a new type that satisfies the flag.Value interface, i.e. it needs to have Set and String methods.

I took the example, added to it, and made it work the same as the flag package--without any global variables. Here's an example of how to use my version (multiflag) (note the lack of a default value):



package main

import (
	"flag"
	"fmt"
	"./multiflag"
)

func main() {
	argList := multiflag.StringArray("e", "Can be used multiple times")
	flag.Parse()
	fmt.Print(*argList)
}



Assuming that this file was compiled into a binary named grep, if it was called like the grep example above (grep -e foo -e bar -e baz), this program would print out [foo bar baz].
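As an aside, Python's standard argparse module supports this repeated-flag pattern out of the box via action="append", which is the same accumulate-into-a-list behavior multiflag adds to Go:

```python
# argparse collects repeated flags into a list when the
# argument is declared with action="append".
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-e", action="append", dest="patterns")

# equivalent to: grep -e foo -e bar -e baz
args = parser.parse_args(["-e", "foo", "-e", "bar", "-e", "baz"])
print(args.patterns)  # ['foo', 'bar', 'baz']
```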

And here is the code that makes it all possible (requires Go to be updated to at least 6070517caba0):



package multiflag

import (
	"flag"
	"fmt"
)

type stringArrayValue struct {
	p *[]string
}

func StringArray(name string, usage string) *[]string {
	p := new([]string)
	StringArrayVar(p, name, usage)
	return p
}

func StringArrayVar(p *[]string, name string, usage string) {
	flag.Var(&stringArrayValue{p}, name, usage)
}

func (f *stringArrayValue) String() string {
	return fmt.Sprint(*f.p)
}

func (f *stringArrayValue) Set(s string) bool {
	if *f.p == nil {
		*f.p = make([]string, 1)
	} else {
		nv := make([]string, len(*f.p)+1)
		copy(nv, *f.p)
		*f.p = nv
	}
	(*f.p)[len(*f.p)-1] = s
	return true
}


Thursday, January 01, 2009

4 Ways to Make a Linux Reinstall Easier

I tend to grow dissatisfied with my current Linux distribution from time to time, and start looking around at other distributions. As I've done this, I've learned a few tricks that make it a lot easier and faster to get the new setup running and working for me.

1) Separate partition for /home


I cannot stress this point enough. All of your configuration settings are stored in your user account under /home. All of your documents, downloads, wallpapers, etc. are stored in your user account. Any program settings are stored in /home. Generally, all files that you have created or worked on live somewhere in /home. If you have a separate partition for /home then you can reinstall Linux and have it configured (on a user level) the same way the old install was.

If you don't currently have a separate partition for /home, then you should make sure to create one during your next Linux install. It will make all future installs a lot easier.

If you have any important files elsewhere in your filesystem (like a webpage you've put in /var/www) then you might want to consider making that a separate partition as well, or at least have backups.

The only issue I've ever had with this is with the default user id the distributions use. You need to make sure that your users have the same uid as they had on the previous distribution, or chown their home directories so that they belong to the proper users.

2) Multiple partitions for installing


This one is a little less obvious, but it has been very good to me. I like to have more than one partition for installing Linux. I alternate which partition I use for every install so that I always have my previous install to go back to. If something isn't working in my latest Arch install, I can just boot back into Debian. If the latest Ubuntu install has issues then I can boot into the previous version.

I like to have 10 GB partitions for Linux installs. I usually only use about half of that, but it leaves room for growth without eating up too much disk space. The rest of my disk is usually reserved for my home partition.

3) Keep a list of your commonly installed packages


If you're switching from one Debian based distro to another, then a combination of dpkg and dselect can be very useful. Look here for more information: http://mybrainrunslinux.com/node/2

Whether you're using yum, rpm, yast, pacman, or dpkg you can get a list of installed packages and use that as a guide for what you'll need to reinstall in your new distribution. If you do this very often you will probably become very familiar with the packages you need to get everything back to normal.
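To make the comparison concrete, here's a tiny sketch (the package names are made up) of diffing an old install's package list against a fresh one to see what still needs installing:

```python
# Pretend these were dumped from the old and new systems
# (e.g. via your package manager's list command).
old_packages = {"vim", "git", "firefox", "mutt"}
new_packages = {"vim", "firefox"}

# set difference: installed before, missing now
missing = sorted(old_packages - new_packages)
print(missing)  # ['git', 'mutt']
```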

4) Use Foxmarks (if you use Firefox)


Ok, this one isn't very useful if you're reinstalling on the same computer and using a separate partition for home, but it is very nice if you are installing on a different computer or if you don't have a separate home partition. Foxmarks is a Firefox extension that synchronizes bookmarks. If you use it then all you have to do in a fresh install is install the plugin and synchronize your bookmarks. Where this plugin really shines for me is multiple computer use. Whether I'm using my work computer, my home computer, or any other computer where I have an account, I have the same set of bookmarks. I love it. If you're concerned about hosting your bookmarks on somebody else's server, you can always configure Foxmarks to save its file on your own server.

Conclusion


I titled this post "4 Ways to Make a Linux Reinstall Easier", but most of the points in here could equally apply to making it easier to move to a new computer. Some of them make it easier to simultaneously use multiple computers that all have the same user settings and data. Hopefully some of these tips can make the process a lot easier for you.

If all you want to do is try out a new distribution, you do have an alternative--VirtualBox (or VMware, etc.). You can install Linux in a virtual machine (whether you're running Linux, Windows, or Mac) and run it without having to worry about partitions and without erasing your current operating system.

Tuesday, December 16, 2008

Heroes and relativity

I don't want to make this a theme, but I really feel like complaining about Heroes right now.

If you haven't seen last night's episode (Season 3 Episode 13 - Dual), then you might not want to read this. I'll try to keep spoilers to a minimum, but I need to spoil a little bit to get my point across.

Last night's episode wasn't bad, but their physics were horrible. I am always more than willing to suspend my disbelief for a good story. I love the super powers--flight, time travel, all that weird stuff Sylar and Peter can do, etc. If you're going to explain how the powers work though, at least tell us something acceptable. Maybe Daphne's ability is related to time. Maybe she can move really fast because she's really just slowing time down. Maybe when coupled with Ando's new power she can manipulate time even better and is able to travel back and forth through time.

Don't feed us a bunch of crap about how she can travel through time simply because she runs so fast. The theory of relativity doesn't work that way, so don't tell us that it does. Find some other way to tell the story. You can let Daphne time travel without lying to us about physics.

Here's to a new season of Heroes. May it be better than the last couple of seasons.

Wednesday, December 10, 2008

Maze generation in Python

When I was in school, I wrote (as an assignment) a program that created mazes and allowed the user to run through them in 3D. I used a recursive backtracking algorithm to create the maze. While this isn't a bad way to do it, I wanted to try implementing a different solution in python.

Enter Kruskal. Kruskal's algorithm is an algorithm used to find a minimum spanning tree for a graph. If this doesn't sound like a good way to build a maze, just hang on. I'm getting there. A lot of the time, the most difficult part of applying an algorithm to a problem is fitting it into some sort of data structure. A maze makes a perfect graph--every square is a node in the graph. Every square that is not separated by a wall is connected.

For more details on graph theory or Kruskal's algorithm, Wikipedia is your friend.

Back to the point: I love how simple the python implementation is.


import random

width, height = 20, 20

# create a list of all walls
# (all connections between squares in the maze)
# add all of the vertical walls into the list
walls = [(x, y, x+1, y)
         for x in range(width-1)
         for y in range(height)]
# add all of the horizontal walls into the list
walls.extend([(x, y, x, y+1)
              for x in range(width)
              for y in range(height-1)])

# create a set for each square in the maze
cell_sets = [set([(x, y)])
             for x in range(width)
             for y in range(height)]

# in Kruskal's algorithm, the walls need to be
# visited in order of weight
# since we want a random maze, we will shuffle
# them and pretend that they are sorted by weight
walls_copy = walls[:]
random.shuffle(walls_copy)

for wall in walls_copy:
    set_a = None
    set_b = None

    # find the sets that contain the squares
    # that are connected by the wall
    for s in cell_sets:
        if (wall[0], wall[1]) in s:
            set_a = s
        if (wall[2], wall[3]) in s:
            set_b = s

    # if the two squares are in separate sets,
    # then combine the sets and remove the
    # wall that connected them
    if set_a is not set_b:
        cell_sets.remove(set_a)
        cell_sets.remove(set_b)
        cell_sets.append(set_a.union(set_b))
        walls.remove(wall)


I'm sure there are things here that could be improved, but I really like the way Python works. If I were to implement this in C++, it might run faster (not that it needs to, the mazes are generated fast enough in Python) but it would take a lot longer to write and would end up with a lot more code.
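For instance, the loop above scans every set to locate the two squares a wall connects; a disjoint-set (union-find) structure avoids that scan entirely. Here's a rough sketch of the idea (my own addition, not part of the original program):

```python
# A minimal disjoint-set: find() follows parent pointers to a
# root (with path halving to keep chains short), and union()
# links two roots, returning False if they were already joined.
class DisjointSet:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already connected; keep this wall
        self.parent[rb] = ra
        return True

ds = DisjointSet()
print(ds.union((0, 0), (1, 0)))  # True  -> knock the wall down
print(ds.union((0, 0), (1, 0)))  # False -> wall stays
```

In the maze loop, each wall would call union() on its two squares and be removed only when it returns True.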