Author Archive

RPy – simple and efficient access to R from Python

July 28, 2009

RPy is an interface that allows you to call R functions and handle R objects in Python.

R language


Using mutable objects as default parameter in Python

July 22, 2009

We just came across a weird Python behaviour (of course, only weird if you don’t know why). If you use mutable objects such as lists as default parameters in a function declaration, you might end up with some unintended behaviour. Every time the function modifies the object (e.g. by appending to the list), the default value is in effect modified as well, and the change carries over to later calls. This is also explained in the Python documentation:

Default parameter values are evaluated when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that that same “pre-computed” value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. A way around this is to use None as the default, and explicitly test for it in the body of the function, …

Find the whole thing here.
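
To make the quoted behaviour concrete, here is a minimal sketch. The function names `append_shared` and `append_fresh` are made up for illustration and do not come from the Python documentation; the second function applies the None workaround that the quote suggests.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` appends to the same list object.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # None is used as the default; a new list is created on each call
    # that does not pass `bucket` explicitly.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared(1)
second = append_shared(2)
print(second)                            # [1, 2]
print(first is second)                   # True: both names refer to the default list
print(append_fresh(1), append_fresh(2))  # [1] [2]
```

The only difference between the two functions is where the list gets created: at definition time in the first, at call time in the second.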

A Tutorial on Independent Component Analysis

July 21, 2009

can be found here.

pthreads – Some useful links and solution for maximum thread number

July 21, 2009

Manual Reference Pages (including a list of functions that are not thread-safe)

Tutorial on pthreads

And if you wonder why you cannot create more than X threads on your system (for me this was always 382), this forum provides a solution.

Basically, the problem is that each thread created reserves space for its own stack. On my system, the default thread stack size is 8MB. Therefore, after 382 threads (roughly 3GB of stacks), I simply run out of space.

Solution: Change the stack size to a smaller value, unless you really need 8MB.

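pthread_attr_t tattr;
size_t size = 1024 * 1024;                /* e.g. 1MB; must be at least PTHREAD_STACK_MIN */
pthread_attr_init(&tattr);                /* initialise before setting any attribute */
pthread_attr_setstacksize(&tattr, size);  /* returns 0 on success */
/* then pass &tattr as the second argument of pthread_create() */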
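
The arithmetic behind the limit can be sketched in a few lines of Python. This is a rough estimate only: the 3 GiB of usable address space is an assumption (typical for a 32-bit process on Linux, but not stated above), and `max_thread_stacks` is a hypothetical helper name.

```python
MIB = 1024 * 1024


def max_thread_stacks(address_space_bytes, stack_bytes):
    """Upper bound on how many stacks of the given size fit in the space."""
    return address_space_bytes // stack_bytes


# Assumption: about 3 GiB of address space is available to the process.
user_space = 3 * 1024 * MIB

print(max_thread_stacks(user_space, 8 * MIB))  # 384
print(max_thread_stacks(user_space, 1 * MIB))  # 3072
print(382 * 8)                                 # 3056 (MiB reserved by 382 stacks)
```

Under that assumption the bound of 384 sits just above the observed 382, which would be consistent with the program's own code, heap and libraries taking up the remainder. Shrinking the stack to 1MB raises the bound by a factor of eight.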

Another interesting page on pthreads is this.

UNIX Network Programming by Stevens

July 6, 2009

UNIX Network Programming Volume 1, Third Edition: The Sockets Networking API
by W. Richard Stevens; Bill Fenner; Andrew M. Rudoff

can be found online at Safari Books.

Update: I just realised that it is not the complete version, only previews. Sorry!

Profiling Code Using clock_gettime

July 6, 2009

Find a good explanation here, written by Guy Rutenberg.

Set Thunderbird to use Gmail’s Trash folder

March 31, 2009

Sorting all my mail the other day, I realised that Gmail still keeps a copy of every email in the “All Mail” folder. Only messages that are moved to Gmail’s “Trash” folder are eventually deleted. As far as I know, this cannot be set in Thunderbird directly. Here is what you have to do.

sed one-liners

March 24, 2009

Handy one-liners for the UNIX stream editor sed.

Statistics Textbook Online

February 19, 2009

This might be useful.

How do I read a huge file line by line in Python?

February 4, 2009

This is taken from here and was written by rupe.

In Python, a common way to read lines from a file is to do the following:

for line in open('myfile','r').readlines():
    pass  # process the line here

When this is done, however, the readlines() function loads the entire file into memory as it runs. A better approach for large files is to use the fileinput module, as follows:

import fileinput
for line in fileinput.input(['myfile']):
    pass  # process the line here

The fileinput.input() call reads lines sequentially, but doesn’t keep them in memory after they’ve been read.
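
As an alternative that is not part of the original tip, iterating over the file object itself also reads one line at a time, without needing the fileinput module. A minimal sketch, where `count_lines` is a hypothetical helper used only for illustration:

```python
def count_lines(path):
    """Count the lines in a file without loading it all into memory."""
    count = 0
    # The with-statement closes the file when the block ends.
    with open(path, "r") as handle:
        # Iterating over the file object yields one line per step.
        for line in handle:
            count += 1
    return count
```

The loop body is where any per-line processing would go; counting is just a stand-in.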