first | Automalia: done it is!

Howdy 🙂

It’s 10 PM once again, and I’ve decided to start posting again. It’s been really hard to focus energy on writing content, so let’s see how it goes. I was planning a deep analysis of the many project hosting services and their offerings, guided by this Wikipedia entry, but, as a colleague once said: “Who DOESN’T use GitHub nowadays?”

I love Free and Open Source Software and advocate its use, but I don’t really care if the host I’m using runs proprietary software. I work at a closed-source company (in fact, we use plenty of Open Source software to build our proprietary products), and I don’t think all software MUST be free, though it COULD be, with a different business model.

I’d like the host to come with a defect tracking system (such as Bugzilla), and I’d love for my projects to feature a nice front-page introduction, perhaps even some usage examples. A wiki system would also be nice, but I could live without it. I don’t need a build system, mailing lists or forums for my projects. There are also lots of alternatives for team management, so I don’t need that either (especially on one-person projects such as mine).

So GitHub is not the only host that matches my needs: SourceForge, Google Code (without web-page hosting), Launchpad (without web pages and wiki), JavaForge, GNU Savannah and some lesser-known choices do as well. But it’s my friends’ good experience with GitHub that guides my choice this time, and I can always switch if a better solution comes along. 🙂

To GitHub!

This was my first Code Jam problem ever, and I was quite thrilled with the competition at the time! 😀 So I figured it would be the perfect problem to start with. You can access it through the following link:

http://code.google.com/codejam/contest/90101/dashboard#s=p0

The problem is Alien Language, and it reads:

After years of study, scientists at Google Labs have discovered an alien language transmitted from a faraway planet. The alien language is very unique in that every word consists of exactly L lowercase letters. Also, there are exactly D words in this language.

Once the dictionary of all the words in the alien language was built, the next breakthrough was to discover that the aliens have been transmitting messages to Earth for the past decade. Unfortunately, these signals are weakened due to the distance between our two planets and some of the words may be misinterpreted. In order to help them decipher these messages, the scientists have asked you to devise an algorithm that will determine the number of possible interpretations for a given pattern.

A pattern consists of exactly L tokens. Each token is either a single lowercase letter (the scientists are very sure that this is the letter) or a group of unique lowercase letters surrounded by parenthesis ( and ). For example: (ab)d(dc) means the first letter is either a or b, the second letter is definitely d and the last letter is either d or c. Therefore, the pattern (ab)d(dc) can stand for either one of these 4 possibilities: add, adc, bdd, bdc.

In other words, each test case gives us a pattern, and we must count how many words of the alien language match it. Ring any bells? Yes, regular expressions! So all we need to do is write a reader for the input format, read the words, build a regular expression from each test case line, and count the matches!
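To make the idea concrete, here is a minimal sketch (in Python 3) that writes the example pattern (ab)d(dc) by hand as the regular expression (a|b)d(d|c) and counts the words it matches. The word list is made up for illustration and is not taken from the contest data.

```python
import re

# Hand-written regular expression for the pattern (ab)d(dc)
expr = re.compile(r"(a|b)d(d|c)")

# Made-up dictionary, for illustration only
words = ["add", "adc", "bdd", "bdc", "cdd", "abd"]

# fullmatch requires the whole word to match the expression
matches = [w for w in words if expr.fullmatch(w)]
count = len(matches)

print(matches)
print(count)
```

Only the four words listed in the problem statement (add, adc, bdd, bdc) should be counted here; the other two are rejected.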

I/O rules:

Input

The first line of input contains 3 integers, L, D and N separated by a space. D lines follow, each containing one word of length L. These are the words that are known to exist in the alien language. N test cases then follow, each on its own line and each consisting of a pattern as described above. You may assume that all known words provided are unique.

Output

For each test case, output:
Case #X: K
where X is the test case number, starting from 1, and K indicates how many words in the alien language match the pattern.
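The I/O rules above can be sketched as a small reader and formatter (in Python 3). The names parse_input and format_case are hypothetical helpers introduced here, and the sample text is made-up data in the described format, not the official contest sample.

```python
def parse_input(text):
    """Split the raw input into (L, words, patterns)."""
    lines = text.splitlines()
    # Header: word length L, number of words D, number of test cases N
    length, num_words, num_cases = (int(x) for x in lines[0].split())
    # The D dictionary words come right after the header
    words = lines[1:1 + num_words]
    # The N patterns follow the words, one per line
    patterns = lines[1 + num_words:1 + num_words + num_cases]
    return length, words, patterns


def format_case(case_number, count):
    """Build one output line; case numbers start from 1."""
    return "Case #%d: %d" % (case_number, count)


# Made-up input: L=3, D=2 words, N=2 patterns
sample = "3 2 2\nabc\nbca\n(ab)bc\nabc\n"
length, words, patterns = parse_input(sample)
print(length, words, patterns)
print(format_case(1, 1))
```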

So, here is what I came up with (in Python 2):

import re

# States of the input reader
ST_HEADER = 1    # expecting the "L D N" line
ST_WORDS = 2     # reading the D dictionary words
ST_PATTERNS = 3  # reading the N test case patterns
ST_EXIT = 4      # every test case has been processed

state = ST_HEADER
cases = 0
casen = 1
wordnum = 0
wordlen = 0

words = list()

while state != ST_EXIT:
	line = raw_input()

	if state == ST_HEADER:
		header = line.split(" ")
		wordlen = int(header[0])
		wordnum = int(header[1])
		cases = int(header[2])

		state = ST_WORDS
	elif state == ST_WORDS:
		words.append(line)
		if len(words) == wordnum:
			state = ST_PATTERNS

	elif state == ST_PATTERNS:
		# Turn a pattern such as "(ab)d(dc)" into the regex "(a|b)d(d|c)"
		expr = ""
		insor = False   # True while inside a parenthesised group
		justin = False  # True right after '(', so no '|' before the first letter

		for ch in line:
			if ch == '(':
				expr += '('
				insor = True
				justin = True
			elif ch == ')':
				expr += ')'
				insor = False
				justin = False
			else:
				# Letters inside a group are separated by '|'
				if insor:
					if not justin:
						expr += '|'
					justin = False

				expr += ch

		p = re.compile(expr)
		matches = 0

		# Words and patterns both have length L, so a match
		# anchored at the start covers the whole word
		for word in words:
			if p.match(word) is not None:
				matches += 1

		print "Case #%d: %d" % (casen, matches)
		casen += 1

		if casen - 1 == cases:
			state = ST_EXIT

Some comments on the code:

I like reading the input through a simple FSM (finite-state machine), which tells me when to stop reading the header, the words and the test cases, and when processing is done. Not all problems can be solved this way (some inputs are more complex).
There must be a better way to build the regular expression from the test case. You can review the submitted solutions on the Code Jam page and look for alternatives (including ones in other languages).
Assembling the expression is the “hard” work here 🙂
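One possible alternative for building the expression, as a sketch in Python 3 (not necessarily what other contestants submitted): since each group is just a set of single lowercase letters, swapping the parentheses for square brackets turns it into a regex character class, so (ab)d(dc) becomes [ab]d[dc]. The helper names below are hypothetical, and the word list is made up.

```python
import re

# Translation table mapping '(' to '[' and ')' to ']'
PAREN_TO_CLASS = str.maketrans("()", "[]")


def pattern_to_regex(pattern):
    """Turn an Alien Language pattern into a compiled regex."""
    return re.compile(pattern.translate(PAREN_TO_CLASS))


def count_matches(pattern, words):
    """Count the words that the pattern matches in full."""
    regex = pattern_to_regex(pattern)
    return sum(1 for w in words if regex.fullmatch(w))


# Made-up dictionary, for illustration only
words = ["add", "adc", "bdd", "xyz"]
print(pattern_to_regex("(ab)d(dc)").pattern)
print(count_matches("(ab)d(dc)", words))
```

This relies on the problem's guarantee that groups contain only lowercase letters; characters with a special meaning inside a character class (such as ^ or -) would need escaping.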

Happy coding! 😀

Project hosting, tough choice

CodeJam 2009 Qualification A