Create your own text format and parse it
For my CV I maintain a list of all the projects I have worked on, to give a bit more information than a typical German CV does. Each entry contains the project's name and description, my job and role, the main technologies used, and the time frame. I used OpenOffice for this for several years but was never quite happy with it, because I always had to fix the layout after adding a new entry. This fall it bothered me once too often, so I had to code something :-)
My requirements
- simple text format; should be readable and writable in a normal text editor, so no XML
- should output PDF
My way:
I looked at various ways to produce PDFs and decided that I didn't want to deal with PDF generation directly, especially the layout. But I can output HTML pretty quickly, and there are plenty of tools out there that will "convert" HTML to PDF. So PDF generation is not covered here (I used wkhtmltopdf).
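The conversion step can be scripted as well. The following is only a minimal sketch in Python 3, assuming wkhtmltopdf is installed and on the PATH. The function names and the output name cv.pdf are placeholders, not part of my script; index.html is the file the parser writes.

```python
import shutil
import subprocess


def build_pdf_command(html_path, pdf_path, tool="wkhtmltopdf"):
    """Return the argument list for converting one HTML file to a PDF."""
    # Basic wkhtmltopdf usage: wkhtmltopdf <input.html> <output.pdf>
    return [tool, html_path, pdf_path]


def html_to_pdf(html_path, pdf_path):
    """Run wkhtmltopdf if it is available; return True on success."""
    if shutil.which("wkhtmltopdf") is None:
        return False  # tool not found on PATH, nothing was converted
    result = subprocess.run(build_pdf_command(html_path, pdf_path))
    return result.returncode == 0


if __name__ == "__main__":
    # cv.pdf is a hypothetical output name
    print(html_to_pdf("index.html", "cv.pdf"))
```

Any styling is done in the HTML and CSS; the converter is only called at the very end.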
The text format
<starttime> - <endtime>
<name>
<description>
Job
<jobdescription>
Role
<roles in project>
Technology
<technologies used>
At least the start time needs to be in the format MM.YYYY, and it must be at the beginning of its line. The keywords Job, Role and Technology are case-sensitive, and each goes on a line of its own.
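The start-of-entry rule can be tried out in isolation. This is a small Python 3 sketch that uses the same pattern as the parser further down; the helper name is_entry_start is made up for this illustration.

```python
import re

# Same pattern as in the parser: two digits, a dot, four digits,
# anchored at the beginning of the line by re.match.
ENTRY_START = re.compile(r"[0-9]{2}\.[0-9]{4}")


def is_entry_start(line):
    """Return True if the line opens a new project entry (MM.YYYY ...)."""
    return ENTRY_START.match(line) is not None


print(is_entry_start("10.2009 - 10.2010"))   # True
print(is_entry_start("10.2009 - today"))     # True, only the start is checked
print(is_entry_start("Oct 2009 - 10.2010"))  # False
print(is_entry_start("Job"))                 # False
```

Note that the pattern does not check the month range, and that a description line which happens to begin with something like 10.2010 would be taken for a new entry.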
Example entry
10.2009 - 10.2010
I am the project name
It was about a, b and c
Job
OOAD, effort estimates
Role
Architect, developer, tester
Technology
Java, Tomcat 5/5.5/6
Parsing the format
I've implemented the parser as a simple state machine with no syntax tolerance, so formatting errors might break it. I also allowed Markdown inside the project and job descriptions. The code is commented and should be self-explanatory. Feedback is appreciated.
The Code:
# -*- coding: utf-8 -*-
import re, os, sys
from jinja2 import Environment, FileSystemLoader
import markdown

#define states
state_time, state_name, state_desc, state_job, state_role, state_tech = range(6)

#setup jinja2
template_store = '.'
env = Environment(loader=FileSystemLoader(template_store))

projects = {}
state = state_time
current_project = None
key = 0

if len(sys.argv) != 2:
    exit("No project list given")

print "Reading file %s" % sys.argv[1]

#start reading project file line by line
for line_raw in open(sys.argv[1], 'r').readlines():
    line = unicode(line_raw, "utf-8")
    if re.match("[0-9]{2}\.[0-9]{4}.*", line):
        time = line.strip()
        projects[key] = {}
        projects[key]['time'] = time
        current_project = projects[key]
        current_project['desc'] = ''
        current_project['job'] = ''
        current_project['role'] = ''
        current_project['tech'] = ''
        state = state_name
        key += 1
    elif state == state_name:
        current_project['name'] = line.strip()
        state = state_desc
    elif state == state_desc:
        if not re.match("Job", line.strip()):
            current_project['desc'] += line
        else:
            state = state_job
            current_project['desc'] = markdown.markdown(current_project['desc'])
    elif state == state_job:
        if not re.match("Role", line.strip()):
            current_project['job'] += line
        else:
            state = state_role
            current_project['job'] = markdown.markdown(current_project['job'])
    elif state == state_role:
        if not re.match("Technology", line.strip()):
            current_project['role'] += line
        else:
            state = state_tech
    elif state == state_tech:
        current_project['tech'] += line

print "Successfully build the projectlist"
print "Generating the html page"
template = env.get_template('template.html')
content = template.render(projects=projects).encode('utf-8')
path = os.path.join("index.html")
file = open(path, 'wb')
file.write(content)
file.close()
print "Done"
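The script above is written for Python 2 (print statements, unicode()). As a rough sketch of how the same state machine might look in Python 3, here is a standard-library-only version. The function name parse_projects and the sample text are made up for this illustration, and the Markdown conversion and the template rendering are deliberately left out.

```python
import re

# Parser states; NAME is entered right after a time line has been seen.
NAME, DESC, JOB, ROLE, TECH = range(5)

# A new entry starts with MM.YYYY at the beginning of a line.
ENTRY_START = re.compile(r"[0-9]{2}\.[0-9]{4}")


def parse_projects(lines):
    """Turn an iterable of text lines into a list of project dicts."""
    projects = []
    current = None
    state = None  # nothing is collected before the first time line
    for line in lines:
        stripped = line.strip()
        if ENTRY_START.match(line):
            current = {"time": stripped, "name": "", "desc": "",
                       "job": "", "role": "", "tech": ""}
            projects.append(current)
            state = NAME
        elif state == NAME:
            current["name"] = stripped
            state = DESC
        elif state == DESC:
            if stripped.startswith("Job"):
                state = JOB  # the keyword line itself is dropped
            else:
                current["desc"] += line
        elif state == JOB:
            if stripped.startswith("Role"):
                state = ROLE
            else:
                current["job"] += line
        elif state == ROLE:
            if stripped.startswith("Technology"):
                state = TECH
            else:
                current["role"] += line
        elif state == TECH:
            current["tech"] += line
    return projects


sample = """\
10.2009 - 10.2010
I am the project name
It was about a, b and c
Job
OOAD, effort estimates
Role
Architect, developer, tester
Technology
Java, Tomcat 5/5.5/6
"""

projects = parse_projects(sample.splitlines(keepends=True))
print(projects[0]["name"])          # I am the project name
print(projects[0]["role"].strip())  # Architect, developer, tester
```

Returning a list instead of a dict with a running key keeps the entries in file order without extra bookkeeping. As in the original, a description line that begins with the word Job would end the description early.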
Usage in the template:
{% for pkey in projects.keys() %}
<table class="event">
<tr class="dummy">
<td> </td>
</tr>
<tr>
<td class="date"><h1>{{projects[pkey]['time']}}</h1></td>
<td>
<h1>{{projects[pkey]['name']}}</h1>
{{projects[pkey]['desc']}}
<h3>Tätigkeiten</h3>
{{projects[pkey]['job']}}
<h3>Rollen</h3>
{{projects[pkey]['role'] |trim |replace("\n", "<br/>")}}
<h3>Technologien</h3>
{{projects[pkey]['tech'] |trim |replace("\n", "<br/>")}}
</td>
</tr>
</table>
{% endfor %}
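The role and technology fields are not run through Markdown, so the template turns their line breaks into <br/> tags itself. In plain Python 3 the two filters amount to roughly the following, assuming autoescaping is off as in the Environment created by the script; to_html_lines is a made-up name for this sketch.

```python
def to_html_lines(value):
    """Mimic the template's trim and replace filters on one field."""
    # trim: drop leading/trailing whitespace, including the final newline
    # replace: turn the remaining newlines into HTML line breaks
    return value.strip().replace("\n", "<br/>")


role = "Architect\nDeveloper\nTester\n"
print(to_html_lines(role))  # Architect<br/>Developer<br/>Tester
```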




By: Jens
on 27 December 2010 at 23:46 Michael Aye said …
Is there any advantage to doing it this way compared to using LaTeX with a CV style file? (Apart from the obvious one that Python is readable? ;) )
on 28 December 2010 at 06:33 aah said …
Very good job!!!
Now I'm thinking of using your script to generate Markdown and rendering the PDFs with Pandoc (http://johnmacfarlane.net/pandoc/).
Thank you!
on 28 December 2010 at 15:10 Jens said …
@Michael: Probably not. I looked at LaTeX and its styling. I didn't like the syntax and found it too complicated for my needs. And I wanted to write a simple parser anyway :-)
I even thought about using it as an excuse to dig into ANTLR...
@aah: Thanks. How do you style and lay out your PDF with Pandoc? That information is missing from almost all the tools I have found so far.