NYCJUG/2010-01-12/PerlIdiomKeysMapEG

From J Wiki
Jump to navigation Jump to search

Example of good explanation - from http://www.perlmonks.org/?node_id=280658 - of a Perl idiom. This is good because it gives examples and is followed by comments from various people elaborating on usage and and efficiencies in relation to the construct.

Perl Idioms Explained - keys %{{map{$_=>1}@list}}

by broquaint (Abbot) on Aug 04, 2003 at 12:55 UTC ( #280658=perlmeditation: print w/ replies, xml )

Perl Idioms Explained - keys %{{map{$_=>1}@list}} =

my @list = qw/a b c d d a e b a b d e f/;
my @uniq = keys %{{ map {$_ => 1} @list }};

print "@list\n@uniq\n";
__output__
a b c d d a e b a b d e f
e f a b c d

Explanation

The above idiom is a simple way of creating a list of unique values from another list, as the output of the code aptly demonstrates. However, with all those curly braces it may not be immediately obvious what's going on, so let's break it down.

map { $_ => 1 } @list

This is pretty straight-forward - create a list of key-value pairs where the keys are the values from @list.

{ map { $_ => 1 } @list }

The surrounding pair of curly braces creates an anonymous hash which is populated with they key-value pairs from the map statement. So we now have a hash reference to an anonymous hash who's keys are the elements from @list, and because hash keys are unique, the keys of the anonymous hash represent a unique set of values.

keys %{ { map { $_ => 1 } @list } }

Finally, with the last pair of curly braces the hash reference to the anonymous hash is dereferenced and we get its list of keys.

Caveats

Because this idiom creates a list, and then a hash, both of which are immediately disposed of, it's not suited for large lists. Also, because the keys of a hash aren't ordered, the list of unique values returned will be in a random order (although this can often be remedied with a simple sort).

Summary

A short (memory hungry) way to get a list of unique values from a given list.


broquaint

Comment on Perl Idioms Explained - keys {{{ %{{map{$_=>1}@list}} }}}


Re: Perl Idioms Explained -  keys %{{map{$_=>1}@list}}

by Juerd (Abbot) on Aug 04, 2003 at 13:02 UTC

In the same category, to see if $bar is in @foo:

my %foo = map { $_ => 1 } @foo;

if ($foo{$bar}) {
...
}

or faster and using less memory:

my %foo;
undef @foo{@foo};

if (exists $foo{$bar}) {
...
}

Of course, for a single test, grep is easier and often faster:

if (grep $_ eq $bar, @foo) {
...
}

Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }


Re: Perl Idioms Explained -  keys %{{map{$_=>1}@list}}

by diotalevi (Canon) on Aug 04, 2003 at 13:23 UTC

Instead of using => 1, it is valid and cheaper to use => undef.


Re: Re: Perl Idioms Explained -  keys %{{map{$_=>1}@list}}

by RMGir (Parson) on Aug 04, 2003 at 15:34 UTC

I'm not disagreeing, but I am curious. Why is it cheaper? One less node allocated? (Oh, I think I see... PL_sv_undef gets reused instead of a new SV?) This sounds like a job for those perl internals diagrams someone did at some point... Wish my memory was better. -- Mike


Re: Re: Re: Perl Idioms Explained -  keys %{{map{$_=>1}@list}}

by diotalevi (Canon) on Aug 04, 2003 at 15:41 UTC

Yes. Instead of one scalar of the value '1' for every key, you get to share the same undef value for all the keys and thus don't have to allocate tons of memory you aren't going to use anyway. So

keys @{{map { $_, undef } @ary}}

has the potential to be a lot cheaper than

 keys @{{map { $_, 1 } @ary}}

The "a lot" part is just related to how many values are in your array. You burn one extra value for every entry in @ary and I'm just suggesting that you don't have to do that in this case.


Re: Perl Idioms Explained -  keys %{{map{$_=>1}@list}}

by dragonchild (Archbishop) on Aug 04, 2003 at 14:06 UTC

Along similar lines, I will often create a hash of keys which are needed for lookup. For example, I will do:

my %acceptable_server = map { $_ => 1 } qw(
DEV
QA
PROD
);

It's a nice way of working with a changing list of static acceptable things.


We are the carpenters and bricklayers of the Information Age. The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6 Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.


Re: Perl Idioms Explained -  keys %{{map{$_=>1}@list}}

by thinker (Parson) on Aug 04, 2003 at 15:46 UTC

Hi broquaint, or, to keep the order that the items are inserted (ie. the order in which the first instance of a value is encountered) my %seen;

@uniq = grep ! $seen{$_}++, @list;

cheers thinker


Re: Re: Perl Idioms Explained -  keys %{{map{$_=>1}@list}}

by LanceDeeply (Chaplain) on Aug 04, 2003 at 18:58 UTC

This is the way I've usually seen it done. I was curious, so I ran a benchmark against the two. use Benchmark;

my @list;
for ( 0..9999 )
{
push @list, sprintf "%d", 100 * rand ;
}

timethese(
1000,
{
'keys_map' => sub { my @uniq = keys %{{ map {$_ => 1} @list }}
+; },
'grep_seen'    => sub { my %seen; my @uniq = grep ! $seen{$_}+
++, @list; },
}
);

Yields the following output. Benchmark: timing 1000 iterations of grep_seen, keys_map... grep_seen: 13 wallclock secs (11.17 usr + 0.00 sys = 11.17 CPU) @ 89 +.52/s (n=1000) keys_map: 30 wallclock secs (29.28 usr + 0.00 sys = 29.28 CPU) @ 34 +.15/s (n=1000)


Re: Re: Re: Perl Idioms Explained -  keys %{{map{$_=>1}@list}}

by RMGir (Parson) on Aug 04, 2003 at 20:28 UTC

Adding

'keys_map_undef' => sub { my @uniq = keys %{{ map {$_ => undef} @list
+}}; },

to test the undef suggestion, it turns out to be 15-20% faster than using 1. grep still wins, though. -- Mike


Re^2: Perl Idioms Explained -  keys %{{map{$_=>1}@list}}

by Aristotle (Chancellor) on Aug 04, 2003 at 20:12 UTC

Note that this is slightly broken as is. To be entirely correct, you have to say

my (%seen, $seen_undef);
my @uniq = grep defined() ? !$seen{$_}++ : !$seen_undef++, @list;

Of course, if you're fiddling with objects which cannot be compared for equity by stringification, it is still broken. Makeshifts last the longest.


Re3: Perl Idioms Explained -  keys %{{map{$_=>1}@list}}

by dragonchild (Archbishop) on Aug 04, 2003 at 20:14 UTC

Of course, if you're fiddling with objects which cannot be compared for equity by stringification, it is still broken. Well, so is every uniquification based on the keys of a hash! The point still stands that grep is faster than the original idiom presented, by orders of magnitude, if still as memory-hungry.


We are the carpenters and bricklayers of the Information Age. The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6 Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.


Re^4: Perl Idioms Explained -  keys %{{map{$_=>1}@list}}

by Aristotle (Chancellor) on Aug 04, 2003 at 20:19 UTC

Re: Re: Perl Idioms Explained -  keys %{{map{$_=>1}@list}} by Jasper (Chaplain) on Aug 04, 2003 at 22:21 UTC

You can also grep lists for certain 'numbers' of entries (so if you wanted only the items that were in a list twice)

@doubles = grep ++$seen{$_} == 2, @list;

Jasper


Re: Re: Re: Perl Idioms Explained -  keys %{{map{$_=>1}@list}}

by cees (Curate) on Aug 05, 2003 at 03:32 UTC

That would be 2 or more times right? Once ++$seen{$_} == 2 is true, you are immediatelly copying $_ to @doubles. So if it appeared a third time, you couldn't magically remove it again. Your code would be clearer if you used >= to emphasize that. The following would probably do if you only wanted entries with exactly 2 occurences: @doubles = grep $seen{$_} == 2, grep !$seen{$_}++, @list; But it is not pretty, or obvious. The first grep counts all occurences and only passes the first found entry to the next grep which will check to see how many were actually found. There must be a better way! - Cees


Re^3: Perl Idioms Explained -  keys %{{map{$_=>1}@list}}

by Aristotle (Chancellor) on Aug 05, 2003 at 17:51 UTC

No, that won't fly. You need

$seen{$_}++ for @list;
my @doubles = grep $seen{$_} == 2, @list;

Makeshifts last the longest.


Re: Perl Idioms Explained -  keys %{{map{$_=>1}@list}}

by Aristotle (Chancellor) on Aug 04, 2003 at 20:18 UTC

Just as memory hungry, still not order preserving, and still broken with regards to undefined values, but faster approach:

my %seen;
@seen{@list}=();
my @uniq = keys %seen;

And if you want to wrap it up without "leaking" lexicals:

my @uniq = do {
my %seen;
@seen{@list}=();
keys %seen;
};

This approach does not (explicitly) iterate - more of the data structure setup work is done implicitly by perl. Makeshifts last the longest.