SIGSTP: 2013

Monday, 2 December 2013

Blobs in Postgresql with Perl

Are you using, or do you want to use Postgresql Blobs with Perl?

Providing you are in the Moose ecosystem, here is Pg::Blobs. Pg::Blobs is a role that adds blobs handling methods to any Moose based package.

Here is how to use it:

package My::App;
use Moose;
with qw/Pg::Blobs/;
# Just provide:
sub pgblobs_dbh{ .. return the DBH .. }
1;

Now your package My::App is capable of managing Postgresql blobs:

my $o = .. and instance of My::App ..
my $blob_id = $o->pgblobs_store_blob($binary_content);
print $o->pgblobs_fetch_blob($blob_id);

A couple of guidelines:

Blobs in Postgresql are just numeric Object IDs. You WILL have to store them in a OID column for later retrieval. If you don't, there is no guarantee they will persist, as any unreferenced blob can get vacuumed away.

You MUST wrap any blob operation in a transaction.

Happy blobing!

Tuesday, 17 September 2013

Email::Postman - Yet another email sending package

I know I know, there is Email::Sender, and Mail::Sendmail, the 'send' method of MIME::Lite, Mail::Mailer and probably other ones I don't know about on the CPAN and also the good old pipe to /bin/sendmail trick.

Each of them have their advantages, but none of them actually does what I wanted.

Email::Postman does all the following:

- It can send anything that's compatible with Email::Abstract (MIME::Entity, Email::Simple, rfc822 String, etc..)

- It will manage multiple To, multiple Cc and multiple Bcc nicely.

- It will report on each of the recipients individually.

- It will speak to MX servers directly, according to the recipient's domain, so you don't need to manage a separate MTA to send emails.

- It has got a straight forward interface:

my $postman = Email::Postman->new({ hello => 'my-domain.com',
from => 'postmaster@domain.com' });

my $email = ## any Email::Abstract compatible email.
Email::Simple->create(
header => [
From => 'casey@geeknest.com',
To => 'drain@example.com',
Subject => 'Message in a bottle',
],
body => '...',
);

my @reports = $postman->deliver($email);

The only thing you need to be careful about is that depending on the responsiveness of the distant MX servers, calling deliver can be slow. So ideally you want to use it through some asynchronous mechanism. Gearman is a very popular one.

Give it a go!

Jerome.

Thursday, 15 August 2013

The fuss about Gmail and emails.

According to some recent news, "Google has finally admitted they don't respect privacy,". Understand Google has "admitted" they scan Gmail emails for profit.

I've been using Gmail for quite a long time now and well, this is no news at all. It's always been quite clear that by using Gmail, you agree that Google will scan your email content to target advertisement. This is how the (great) service remains free. No mystery there. Nothing to admit really, and as Google puts it, it's always been part of their standard Gmail ordinary business practises. Since 2007.
It's 2013 and Consumer Watchdog called the "revelation" a "stunning admission". Are those people so dumb it took them 6 years to realise what gmail is?

The answer is probably yes, given the fact that John Simpson, Consumer Watchdog’s privacy project director has declared: "sending an email is like giving a letter to the Post Office. I expect the Post Office to deliver the letter based on the address written on the envelope. I don't expect the mail carrier to open my letter and read it.".

What is striking in this statement is that the man clearly doesn't know what he is talking about. Sending an email the standard way (meaning without using any additional encryption techniques) has never been like sending a letter (understand a message hidden in an envelope). From the late 60's/early 70's, sending an standard email is more like sending a post-card without envelope wrapping. It will arrive at destination, but you can't expect the postmen not to have a quick look at it. By the way, this is how spam/malware/virus detection works.

A good email analogy

The fact that this John Simpson is not already aware of that clearly demonstrate his lack of technical culture, which is quite ridiculous and worrying for a guy who is a "privacy project director". If I was a "privacy project director", I would probably encourage people who actually care about their privacy to use encryption techniques. Many solutions are available, and again this is no news at all and used routinely by businesses.

All of that makes me think that 20 or so years after the incredible expansion of the web, and 40 something years after the invention of emails, the general public (by that I mean people who are not in IT) is still lacking essential basic IT knowledge. Like how do emails work (not in details of course), or how a website (a simple one) works. Probably the solution resides on better basic computer education.

Tuesday, 13 August 2013

Get a B::Deparse piggy back through Gearmany

Gearman is a great tool to run asynchronous and/or distributed jobs over a cluster of machines. It's not a full general message queue ala RabbitMQ, but a rather minimalist piece of software that is very simple to use, does only one thing and does it very well.
To use gearman from your favourite language, the cpan provides two modules. The original Gearman and the more recent Gearman::XS. For this post we're going to use the original Gearman, but you should be able to implement the example using Gearman::XS without much trouble.
In a nutshell, writing for gearman is a two sides process, writing some client code and some worker code to actually do the job, with gearman just sitting in the middle and distributing the jobs:

Client(s) <--> Gearman Deamon <--> Worker(s)

Vanilla gearman

Let's have a quick pseudo code look at an asynchronous example inspired by the Perl package Gearman:

Worker:

my $worker = Gearman::Worker->new(...);
$worker->register_function('sum' => sub {my $job = shift; calculate sum of decoded($job->arg) and store it somewhere });
$worker->work();

Client:

my $client = Gearman::Client->new(...);
my $handler = $client->dispatch_background('sum', encoded([1, 2, 3]), { uniq => some string unique enough });
## wait for the job to be done
while( my $status = $client->get_status() && $status->running() ){
sleep(1);
}
.. Retrieve sum from the storage and print it ..

By the way 'encoded' and 'decoded' are entirely up to you, as long as they encode to bytes and decode from bytes. I personally use JSON, but it's a matter of taste (and performance but that's another story).
Also, I deliberately skip the task sets and synchronous mechanisms as this is not supposed to be a gearman tutorial :)
So here we go. You make sure that your gearman daemon is running, you fire up your worker script and while it's running, each time you run your client, it will print 6.
You feel great. You've written your first minimalist gearman application and you're ready to gearmanize the rest of your long running and/or easily distributable code.

Why this approach is a pain

So following the example, the temptation is great to just extend the worker like that:

$worker->register_function('my_long_specific_thing', \&gm_do_long_specific_thing);

I guess that if you're reading this post, you probably already have something in your model code that is long and already packed into a function:

package MyApp::Object;

sub do_long_specific_thing{
my ($self, $arg1 , ... ) = @_;
...
}

So in reality your gearman specialised 'gm_long_specific_thing' will probably look like that:

sub gm_long_specific_thing{
my($job) = @_;

my $args = decoded($job->arg());

my $application = .. Build or get application ..;

$application->get_object( build object getting from args )->do_long_specific_thing( build arg1 , arg2 from the args);
}

Then you know the rest of the story. Every time you need something else to be gearmanized or every time you need to make a change to the arguments of one of your gearmanized method, you have to propagate your changes to the specific gearman registered functions. Your code has become a bit less maintainable, just because you want the benefits of gearman.

Fixing it

But what if.. you could write something like that:

$client->application_launch(sub{ my $app = shift;
my ($oid) = @_;
$app->get_object($oid)->do_long_specific_thing(1, 'whatever');
},
[ $oid ] );

No more worker specific code to write. Everything is done from the application's point of view, leaving the back-end details out of the way.
Want to change the API of do_long_specific_thing? No problem, just apply parameter changes where they appear in the code. No more headaches propagating the API change through the gearman specific methods.
Want to make your gearmanized process longer without changing do_long_specific_thing? No problem:

$client->application_launch(sub{ my $app = shift;
my ($oid) = @_;
$app->get_object($oid)->do_long_specific_thing(1, 'whatever');
.. and something else ..
},
[ $oid ] );

What if you have another long thing to gearmanize? Well, you get the picture:

$client->application_launch(sub{ my $app = shift;
my ($oid, $arg1) = @_;
$app->get_object($oid)->do_another_thing($arg1);
.. and something else ..
},
[ $oid, $arg1 ] );

Implementing it

The client 'application_launch' method

Thanks to B::Deparse, we can turn any sub into a plain string. The rest is trivial, so

here we go:

## In some object that wraps the $gearman_client
## you can also inherit if you prefer. But I like wrapping more, cause you can store utilities.
sub application_launch{
my ($self, $code, $args ) = @_;

$code //= sub{};
$args //= [];

my $gearman_client = $self->gearman_client() OR just $self;
my $deparser = B::Deparse->new('-sC'); ## C style
my $json = JSON::XS->new()->ascii()->pretty(); ## I like pretty. I know it's larger but well..

my $code_string = $deparser->coderef2text($code);

my $gearman_arg = $json->encode({ code => $code_string,
args => $args });

my $uniq = Digest::SHA::sha256_hex($gearman_arg);

my $task = Gearman::Task->new('gm_application_do', \$gearman_arg, { uniq => $uniq });

my $gearman_handler = $gearman_client->dispatch_background($task);
unless($gearman_handler){
confess("Your task cannot be launched. Is gearman exposing the gm_application_do function?");
}
return $gearman_handler;
}

And that's pretty much it.

The worker 'gm_application_do' code

The worker code is very similar.

$worker->register_function('gw_aplication_do', sub{ _gm_application_do($application, shift) });
sub _gm_application_do{
my ($app , $job) = @_;

my $gm_args = $json->decode($task->arg()); ## Note you need a $json object.
my $code_string = $gm_args->{code};
my $code_args = $gm_args->{args};

my $code = eval 'sub '.$code_string;
$code || confess("EVAL ERROR for $code_string: ".$@); ## That shouldnt happen but well..

## This is not supposed to return anything, as we call that asynchronously.
&{$code}($app, @$code_args);
return 1;
}

Adapting to synchronous calls

Adapting that to synchronous calls is quite straight forward.
Remember gearman exposed functions should always return a single scalar.

Conclusion

gotchas:

This doesn't work with closures, so really your sub's should be pure functions and all parameters should be given as such.
As far as the magic goes, the parameters can ONLY be pure Perl structures; something that's serializable in vanilla JSON.
If you try to pass bless objects, bad things will happen.

Disadvantages:

Insecure. What if anyone injects sub{ destroy_the_world(); } in your gearman server. That's kind of easily fixed. Just implement some secure signing of the code in transit.
No strict control of what can be done through gearman. Developers enlightenment is the key here.
No strict control about what 'flavour' of gearman worker is running your code. Some people like to have specialised gearman workers exposing only a subset of functions. This can easily be fixed by adding a 'target' option to the application_launch method and exposing the same general purpose gm_application_do under different names on different machines. But again, choosing the right target falls under the developers responsibility.

To sum up the advantages:

Stable gearman worker code that's decoupled from the application code itself.
Flexible gearmanization of any application code you like.
Clarity of what's going on. No more parameter encoding/decoding to write, and no API change propagation through what should be infrastructure only code.

Perl offers us the flexibility that empowers us to clearly separate code that deals with different concerns.
As developers, we should take advantage of it and build reactive, flexible and generic enough business components.
As an infrastructure developer, I don't really know what people are going to do with gearman, nor should I care too much. This approach let me concentrate on what is important: the stability, scalability and the security of my gearman workers.
As an application developer, I don't care that I have to encode my functions parameters in a certain way. And I don't want to bother changing code in some obscure gearman module when I make changes to my business code. What I want is to use gearman as a facility that helps me design the best possible application without getting on my way too much.

Hope you enjoyed this post.

Until next one, happy coding!

Jerome.

Monday, 25 March 2013

Run Postgresql pg_dump from your scripts

If you use Postgresql, you probably noticed that the command line tools who come with it don't accept passwords on the command line.

That's a good thing, as it prevents other users of the system from spying your command line via ps and see your password.

Ok, but what if you want to run let's say pg_dump from your script? Easy, just export an environment variable PGPASSFILE that points to a file that contains your credentials. Note that this file MUST be in mode 0600, to prevent other users to look into it.

If you use Perl, and File::Temp, that's exactly what it does, so here's a Perl snippet that uses pg_dump:

my ( $cf, $cf_name ) = File::Temp::tempfile();
print $cf '*:*:*:*:'.$password."\n";
close($cf);
system('export PGPASSFILE='.$cf_name.'; pg_dump etc...');
## Don't forget to unlink cf_name to avoid disk pollution.
unlink $cf_name;

Also, don't forget to run in taint mode and sanitise everything you use to build your command. Managing correctly the returned value of the system call is left as an exercise to the reader :)

Happy coding!

J.

Thursday, 21 March 2013

Good old days Javascript

Good old days Javascript:

if( confirm('Are you sure?') ){ ... do stuff ... }

Modern days fancy Ecma script (abridged):

Think a lot
Write 100KB of untestable code
Instanciate a factory.
Build an anonymous function.
Don't forget to hoist your variables.
Fire a dozen async events in two different frameworks
Don't forget to use .promise()
Catch your events in a completely remote and random other place.
Build a jQuery UI modal dialog with appropriate call backs
Oh no, it's too slow in IE :(
Go back to step 1.

Wednesday, 13 March 2013

Mason - A Template system for us programmers

Templating Modules are a bit like editors. Every web application developer has a favourite one. And every template system is someone's favorite.

Mine is Mason. But not because the Perl MVC tutorials are full of examples using Mason. Not because it's the fastest (use xSlate if you want to trade speed for flexibility). Not because it's the most popular. It's my favourite for a very plain reason: It makes me more productive and allows me to develop web GUIs using all the powerful features a programmer should biterly miss if they are taken away. That includes writing Perl code :P

I never heard of a business that went down because they didn't have fast enough CPUs, so I'm not fussy about trading a few CPU cycles for elegance, ease and speed of development.

Enough talking. Here's my contribution to the template holy war and Mason lobbying presentation:

Tuesday, 5 March 2013

One more (good?) reason to consider NoSQL solutions

Btw, if you're convinced by this video - I know this is very unlikely - , Seven Databases in Seven Weeks is the book you'll like to read. It's very well written and gives you just the right amount of information so you can build yourself a broad and clear vision of what major NoSQL technologies can and cannot do for you.

Saturday, 2 March 2013

The only bad thing about Mason

At the moment, I'm writing a presentation about Mason2. The goal is to somehow convince my colleagues to consider using Mason. Instead of Template Toolkit.
As part of it, I thought I'd do a bit of performance benchmarking against Template Toolkit. So I put together a reasonably complex mini set of templates using both systems. Well, as complex as TT can take in reality. Which is not very much.

But anyway, let's jump straight to the bad news.

        Rate Mason    TT
Mason 901/s    -- -51%
TT    1852/s 106%    --

As you can see, with similar settings (code caching enabled) and over 10000 runs of two pages, Mason2 is twice as slow as Template Toolkit. Mason2 is still below the millisecond per page in my benchmark, but still that's quite a big difference.

As the common wisdom goes, CPU time is cheaper than programmer time, so it shouldn't be a big deal, given the huge gains of productivity you should get by using Mason2's very cool and powerful features.

Stay tuned for the full presentation to come!

PS: To follow up on the comments, feel free to have a look at the benchmark source https://bitbucket.org/jeteve/mason-pres/

J.

Thursday, 21 February 2013

Give your application a shell (and never write any ad-hoc scripts again)

As developers, we sometimes have to help operations going smoothly by fiddling with the data "by hand" because there's no GUI to allow people doing some rare and obscure things. One common way of doing it is by connecting to the DB and writing some SQL. But often, accessing your database is not enough, because you need your application code to be run.

And for that, you need a shell.

I mean a Perl application shell

Perl offers a great module to implement this: Devel::REPL . By the way, if your using a debian based distribution and if you're wondering about the quickest way to install a CPAN module, CPAN ↔ Debian is the site you want !

Let's say your application is called 'My::Bob'. In this example, it's fully implemented in the shell script itself, because well, it doesn't do much.

So here it is, the magic bob.pl shell for your My::Bob application:

use strict;
use warnings;
package My::Bob;
use Moose;
## Application bob just holds a name and a couple of methods.
has 'name' => ( is => 'rw', isa => 'Str' , default => 'Bob' );
sub hello{
my ($self) = @_;
return 'Hello, my name is '.$self->name();
}
sub count_to{
my ($self, $limit) = @_;
if( ( $limit // 0 ) < 1 ){ confess "limit should be >= 1"; }
return map{ $_ } 1..$limit;
}

package main;
## The shell code starts here
use Devel::REPL;

## You probably want to skip output buffering
$| = 1;
# Build an instance of your application.
our $BOB = My::Bob->new();

print "Hello, this is bob\n";

my $repl = Devel::REPL->new;

## Inject the instance of bob in the shell lexical environment.
$repl->load_plugin('LexEnv');
$repl->lexical_environment->do(q|my $bob = $main::BOB ;|);

## Tell something useful in the prompt
$repl->load_plugin('FancyPrompt');
$repl->fancy_prompt(sub {
my $self = shift;
sprintf ('Bob (%s) > ',
$BOB->name()
);
});

## Various autocompletion.
$repl->load_plugin('CompletionDriver::LexEnv');
$repl->load_plugin('CompletionDriver::Methods');
$repl->load_plugin('CompletionDriver::INC');

## Allow multiline statements.
$repl->load_plugin('MultiLine::PPI');

# And run!
$repl->run();

print "\nThanks for using Bob. Bye\n";

Now, let's look at a session:

$ perl bob.pl
Hello, this is bob

Bob (Bob) > $bob->hello(); ## Call any method.
Hello, my name is Bob Bob (Bob) > $bob->name('Bobbage'); ## Change of name is reflected in the prompt
Bobbage Bob (Bobbage) > $bob->count_to(3) ## Arrays are printed
1 2 3

Bob (Bobbage) > my $count_hello = sub{ my ($n) = @_; ## Define multiline subs re.pl(main):005:1* map{ $bob->hello() } $bob->count_to($n); re.pl(main):006:1* }

CODE(0x2fcbea0)
Bob (Bobbage) > &$count_hello(3) ## And call them
Hello, my name is Bobbage Hello, my name is Bobbage Hello, my name is Bobbage

Bob (Bobbage) > ## CTRL-D ends the session
Thanks for using Bob.

Hope this short example was useful to you. If you can build your application as an object, writing your own shell should be straight forward. Also you might want to add some credential checking before you let anyone playing with the guts of your application.

Until next time, happy coding!

Jerome.

Thursday, 17 January 2013

What Javascript programmers should learn from Perl mistakes

Although I spend most time developing stuff in Perl, sometime I have to fix Javascript code that's been written by JavaScript specialists. Sorry, ECMA script as it should be called. And the more I look at 2013 JavaScript programming, the more I can see some similarity with the state of Perl programming 12 years ago.

Remember the Perl obfuscated code contest? Basically it was all the "serious" Perl programmer boasting about being able to write the most ~~f...d up~~ original version of Yet Another Perl Hacker. While this was quite fun and made quite a good showcase of Perl's flexibility and expressiveness, some Perl programmers of the time were not quite serious enough and started to (unwillingly ?) use obfuscation techniques in production code, gaining Perl a reputation of being unreadable and unmaintainable, thus not fit for serious industrial development. You know the rest of the story: PHP, the offspring of 1994 Perl took over.
As programmers, we know that you can write unreadable and unmaintainable code in any language, and you can certainly write elegant things in any language too. Although I'm not sure about brainf.ck. The problem lies in the fact that non-programmers easily confuse the Programming language itself and the nasty habits of programmers. Have enough people write bad *any language* code, even for fun, and *any language* will gain a bad reputation. It comes from the extraordinary appetite our lazy human brains have for generalisation. Show enough retarded programs on TV and very soon people will think that the TV media itself is retarded.
So please, Javascript programmers, stop writing stuff like that:

var layout_options = {}, optgroups = {
globals: 'Public',
userones: 'Private'
};
Or like that:
var block = conf[i], options, sel_conf, k, j;
Or like that:
opt.lytId == cfg.actLyt_id && opt.lytType === cfg.actLytType && (option_conf.selected = true);

Maybe some prehistoric browsers will actually run that faster than if you were using the 'var' keyword sensibly or if you were using good old 'if' statements. Otherwise, I think it's safe to assume these are completely useless optimizations, paid at the high price of unreadability. Plus, you know, text editors have auto-completion, so it's useless too to save keystrokes on variable names, specially if your code goes though some optimizer/packager in production.
Maybe I shouldn't rant about Javascript programming, maybe stuff like CoffeeScript or Dart will take over and it JavaScript will only live as an assembly language. Personally I have the feeling it would be a good thing. A higher level - cleaner language compiled to JavaScript taking all browsers specificities into account. Why not? I don't see anything wrong with that.
But JS programmers, if you love JavaScript and want to make sure it's still used in its current form over the next few years, stop writing code that's only barely readable by your fellow gurus.

Edit: Actually, someone has pointed out to me that Perl is still more popular than PHP in terms of Job offers.

Perl, Python, Php, Ruby trends

Perl jobs | Python jobs | Php jobs | Ruby jobs

Although PHP easily wins on Google fight.