NYCJUG/2024-08-06

From J Wiki

Beginner's regatta

We look at how J's parsing rules affect precision of numbers.

Maintaining Precision

From recent discussions on the J forum, it seems that a lot of people are confused about what we need to do in a J expression to avoid losing precision.

For instance, why does this expression return what looks like an incorrect result?

   -/x: 1234567891000000000001 1234567891000000000000
0

We know that J evaluates from right to left, so it first evaluates the two numbers before applying x:. However, integer literals as long as the ones shown are read as floating-point values because of their magnitude, so x: is applied only after the numbers have already lost precision to the floating-point representation.

   x: 1234567891000000000001 1234567891000000000000
1234567890999999987712 1234567890999999987712

We see that, because of floating-point's limited precision, the two numbers are the same; x: does not work on the integers we see displayed in the input expression. On the other hand, if we suffix these long numbers with an "x", we get this:

   -/1234567891000000000001x 1234567891000000000000x
1
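
The same effect can be reproduced outside J. The following is a minimal Python sketch, used here only for illustration (the variable names are ours, not from the discussion): converting to a double first discards the low-order digits, whereas keeping the values as exact integers preserves the difference.

```python
# The two values from the J example, held as exact Python integers.
a = 1234567891000000000001
b = 1234567891000000000000

# Converting to IEEE-754 doubles first discards the low-order digits:
# at this magnitude adjacent doubles are 2**18 = 262144 apart.
float_diff = float(a) - float(b)
nearest_double = int(float(a))

# Keeping the values exact, as the "x" suffix does in J, preserves the difference.
exact_diff = a - b

print(float_diff)       # 0.0
print(nearest_double)   # 1234567890999999987712
print(exact_diff)       # 1
```

The value printed for nearest_double is the same one x: displayed above, which confirms that the digits were lost when the literal was read, not when x: was applied.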

High-precision Fractions

What if we want high-precision non-integral values? Say we want π (pi) to 101 decimal digits. Clearly something like PI101=: 3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798 won't work, for the same reason as in our initial problem above: J reads the number as floating point, so the extra digits are lost before x: or anything else can be applied.

In this case, we have to back into the number by evaluating something like this:

   PI101=: 314159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798x % 10x^101
   PI101
157079632679489661923132169163975144209858469968755291048747229615390820314310449931401741267105853399r50000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

It's hard to tell if this is correct so let's look at it under decimal formatting:

   103j101 ": PI101
3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798
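
For comparison, here is a sketch of the same "integer divided by a power of ten" approach using Python's exact rationals. The helper to_decimal is our own illustrative name, and it truncates rather than rounds, which makes no difference here because the fraction has exactly 101 decimal digits.

```python
from fractions import Fraction

# Digits of pi copied from the J expression above, as one exact integer.
digits = 314159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798

# Exact rational, the counterpart of  ...x % 10x^101  in J.
pi101 = Fraction(digits, 10**101)

def to_decimal(q, places):
    """Format a non-negative Fraction with `places` decimals (truncated, not rounded)."""
    scaled = q.numerator * 10**places // q.denominator
    whole, frac = divmod(scaled, 10**places)
    return f"{whole}.{frac:0{places}d}"

print(pi101.denominator == 5 * 10**100)   # True: the fraction is stored in lowest terms
print(to_decimal(pi101, 101))
```

As in J, the rational is kept in lowest terms, which is why the denominator shows up as 5 followed by 100 zeros instead of 1 followed by 101 zeros.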

Some useful high-precision functions can be found here.

Show-and-tell

We look at a small modification to the steganography code covered last month and an example of finishing off the work on modifying executables. These were both presented at the HOPE XV conference last month.

In addition, we extend the technique for locating information we used for executables to binary formats of images and uncover a defect in some commercial image-processing software.

Slight Steganography Simplification?

Looking at the ultimate solution we came up with last time for hiding text in an image, the latter part of it seems a little kludgy. To put it in context, we'll repeat the steps, edited for brevity, leading up to the section in question.

   load '~Code/imageProcessing.ijs'                          NB. Basic image processing routines
   $binpic=. (8$2)#:rgb3 read_image 'GreatlyReducedGWoK.png' NB. "rgb3" for 3-plane representation -> image as bits
226 330 3 8
   $bintxt=. ,(8$2)#:a. i. text                              NB. Encode characters as 8-bit Booleans
12088
   $,/,/binpic                                               NB. Flatten into (# of sub-pixels) by (8 bits) table
223740 8
   $newgwok=. bintxt (<"1]7,.~i.#bintxt) } ,/,/binpic        NB. Write bits of text into low-order bits of RGB sub-pixels
223740 8

We write this altered image to a new file, reshaping it back to the shape of the original image on the fly:

  (#.226 330 3 8$,newgwok) write_image 'newgwok.png'
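
The flatten-and-overwrite idea can be sketched in Python as follows. This is only an illustration under stated assumptions: it works on a flat list of 8-bit sub-pixel values instead of an explicit bit table, the function names are ours, the tiny "image" is made up, and characters are assumed to fit in one byte each (latin-1), mirroring a. i. text.

```python
def text_to_bits(text):
    """Encode each character as 8 bits, most significant bit first."""
    return [(byte >> shift) & 1
            for byte in text.encode("latin-1")
            for shift in range(7, -1, -1)]

def embed(subpixels, bits):
    """Overwrite the low-order bit of the first len(bits) sub-pixel values."""
    if len(bits) > len(subpixels):
        raise ValueError("message does not fit in the image")
    out = list(subpixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit
    return out

def extract(subpixels, nchars):
    """Read nchars characters back out of the low-order bits."""
    bits = [v & 1 for v in subpixels[: 8 * nchars]]
    codes = [int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)]
    return bytes(codes).decode("latin-1")

# Hypothetical stand-in for the flattened image: 2 x 4 pixels x 3 planes = 24 values.
flat = [200, 13, 77, 255, 0, 128, 64, 31, 90, 91, 92, 93,
        10, 20, 30, 40, 50, 60, 70, 80, 1, 2, 3, 4]
hidden = embed(flat, text_to_bits("Hi!"))
print(extract(hidden, 3))   # Hi!
```

Each sub-pixel value changes by at most one, which is why the alteration is not visible in the image.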

The two related parts that seem awkward are where we lose information by collapsing the leading dimensions of the image array, then have to retrieve this information to reconstruct the proper shape of the array before we write it out.

Something else I just noticed about the last expression above is that write_image is apparently smart enough to do the right thing when supplied with a 3-D RGB version of the image instead of the 2-D combined version like the one we read in using read_image.

It would be nice to update the low-order bits of the 4-D Boolean image without collapsing it first. Since we are supplying the specific indexes of the locations to which to write the bits of the text - with <"1]7,.~i.#bintxt - why not directly create the indexes for the 4-D image?

Creating Four Dimensional Indexes

So, using the completion backwards principle, we want a set of indexes which start like this:

+-------+-------+-------+-------+-------+
|0 0 0 7|0 0 1 7|0 0 2 7|0 1 0 7|0 1 1 7|
+-------+-------+-------+-------+-------+

We have "7" as the final index in all cases because we target the low-order bit in each RGB sub-pixel of the image.

The parts of the index before this last one represent counting through the consecutive sub-pixels in order. Looking at the first three above, we see that they address each RGB plane for the first pixel and the final two address two planes of the second pixel, and should continue like this for each plane of each pixel for each row and column of the image. This is a mixed-base number, something very easy to create in J.

The basis is all but the last part of the shape since we've already accounted for the last index by forcing it to be "7". We want one index for each bit in the text, so we can write something like this to get the 4-D indexes we want:

   5{.ixs=. <"1]7,.~(}:$binpic)#:i.#,bintxt
+-------+-------+-------+-------+-------+
|0 0 0 7|0 0 1 7|0 0 2 7|0 1 0 7|0 1 1 7|
+-------+-------+-------+-------+-------+
   _5{.ixs
+---------+---------+---------+---------+---------+
|12 83 2 7|12 84 0 7|12 84 1 7|12 84 2 7|12 85 0 7|
+---------+---------+---------+---------+---------+

This allows us to update the image without losing the shape information:

   $newgwok2=. bintxt ixs}binpic
226 330 3 8
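
To make the mixed-base step concrete, here is a small Python sketch of the index construction. The helper names mixed_base and lsb_indexes are ours; mixed_base is meant to behave like dyadic #: with a list of radices on the left, and the shape and bit count are the ones shown in the session above.

```python
def mixed_base(radices, n):
    """Digits of n in the mixed-radix system given by radices (like J's #:)."""
    digits = []
    for r in reversed(radices):
        n, d = divmod(n, r)
        digits.append(d)
    return tuple(reversed(digits))

def lsb_indexes(shape, nbits):
    """4-D index of the low-order bit of each of the first nbits sub-pixels."""
    lead = shape[:-1]          # rows, columns, colour planes
    last = shape[-1] - 1       # position of the low-order bit (7 for 8-bit values)
    return [mixed_base(lead, i) + (last,) for i in range(nbits)]

ixs = lsb_indexes((226, 330, 3, 8), 12088)
print(ixs[:5])
# [(0, 0, 0, 7), (0, 0, 1, 7), (0, 0, 2, 7), (0, 1, 0, 7), (0, 1, 1, 7)]
```

The plane index varies fastest, then the column, then the row, which is exactly the order in which the flattened version walked through the sub-pixels.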

Finding "For" Loops in an Executable

To recap some of what we covered last month, we are interested in modifying executables directly and have concentrated on "for" loops. By compiling two very similar C++ programs which differ only in the loop limit, we can compare the bytes of the two resulting executables to locate the bytes denoting this value. Once we've done this, we can modify those bytes directly in J and write back the result to get a new executable that behaves differently from the other two.

Finding Loop Limits

Here is the source for one C++ program:

// TestNew01.cpp : single "for" loop
#include <iostream>
int main()
{
    for (int ii = 0; ii < 4; ii++) { std::cout << ii << "\n"; }
}

Running the code:

>TestNew01
0
1
2
3

Another very similar C++ program (also compiled with Visual Studio 19):

// TestNew02.cpp : single "for" loop with limit of 5
#include <iostream>
int main()
{
    for (int ii = 0; ii < 5; ii++) { std::cout << ii << "\n"; }
}

We read the executables for these two programs and compare them to each other, expecting the differences to include the loop limit.

   'fl1 fl2'=. fread&.>'TestNew01.exe';'TestNew02.exe'
   #&>fl1;fl2
49152 49152
   fl1-:fl2             NB. They have the same length but are not identical.
0
   +/fl1~:fl2           NB. How many bytes differ?
22
   ]ixs=. I. fl1~:fl2   NB. Where is each difference?
232 6330 36064 36092 37196 37197 37198 37199 37200 37201 37202 37203 37204 37205 37206 37207 37208 37209 37210 37211 37252 37268

Concentrating on just the first few differences, we find what we are looking for:

   (<a.)i.&>(<4{.ixs){&.>fl1;fl2            NB. Values of the first 4 differences
 12 4  12  12
100 5 100 100
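
The comparison itself is simple enough to sketch in Python. The function name diff_positions is ours and the two short byte strings are made up for illustration; they are not taken from the real executables.

```python
def diff_positions(a, b):
    """Offsets at which two equal-length byte strings differ (like I. fl1~:fl2)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

# Made-up stand-ins for the two executables; only offsets 2 and 5 differ.
exe1 = bytes([77, 90, 12, 0, 131, 4, 124])
exe2 = bytes([77, 90, 100, 0, 131, 5, 124])

where = diff_positions(exe1, exe2)
print(where)                                   # [2, 5]
print([(exe1[i], exe2[i]) for i in where])     # [(12, 100), (4, 5)]
```

On real files the inputs would be read in binary mode, for example with open(path, 'rb').read().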

We see that the second differing position holds 4 in one executable and 5 in the other, which we know are the loop limits. To test this, we replace that byte with one denoting 9, write the result to a new executable, then run it. Here is what we see:

   1{ixs                          NB. The position we want to modify
6330
   fl3=. (9{a.) 6330 } fl1        NB. Replace with byte representing the number "9"
   fl3 fwrite 'TestNew03.exe'     NB. Write new file with this modification.
49152

Running this:

>TestNew03
0
1
2
3
4
5
6
7
8

We have successfully located the loop limit and altered the executable without re-compiling it.
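
A Python sketch of the same single-byte patch is shown below. The function patch_byte and its expected_old safety check are our own additions, not part of the J session, and the sample bytes are made up.

```python
def patch_byte(data, offset, new_value, expected_old=None):
    """Return a copy of data with one byte replaced, optionally checking the old value."""
    if expected_old is not None and data[offset] != expected_old:
        raise ValueError(f"offset {offset} holds {data[offset]}, not {expected_old}")
    patched = bytearray(data)
    patched[offset] = new_value
    return bytes(patched)

original = bytes([131, 125, 248, 4, 125, 20])   # made-up bytes; the 4 plays the loop limit
modified = patch_byte(original, 3, 9, expected_old=4)
print(list(modified))   # [131, 125, 248, 9, 125, 20]
```

Checking the old value before overwriting it is a cheap guard against patching the wrong offset, which matters once we start editing executables we did not compile ourselves.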

Guessing at "for" Loop Indicator

The following work was based on another set of executables with multiple "for" loops where we found numerous places for loop limits. We look at the bytes around a few of these locations to see what they have in common.

Looking around multiple FOR loop counter limits.jpg

The values of the loop limits are bolded and some recurring byte pairs are italicized.

Testing Our Guess "in the Wild"

Let's look at an executable we did not compile to see how many instances we find of each suspicious byte sequence from above.

   $bej=. fread 'WinBej.exe'
1028096
   +/(72 137 193 232{a.) E. bej
0
   +/(194 72 139 5{a.) E. bej
0
   +/(199 69{a.) E. bej
1910

The first two are busts and the third one gives a lot of hits. Enumerating the locations of these hits:

   ]ixs=. I. (199 69{a.) E. bej
6170 6922 6931 7936 7945 8082 8411 8421 9366 9771 9795 10339 10359 12979 12989...
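
The search performed by E. followed by I. can be sketched in Python like this. The function name find_all is ours and the sample buffer is made up; it is not taken from WinBej.exe.

```python
def find_all(haystack, needle):
    """Offsets of every occurrence of needle in haystack, overlaps included."""
    hits = []
    pos = haystack.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = haystack.find(needle, pos + 1)
    return hits

pattern = bytes([199, 69])
# Made-up buffer, not taken from any real executable.
sample = bytes([1, 199, 69, 240, 8, 0, 0, 0, 139, 216, 199, 69, 244, 8, 0, 0, 0, 199])
hits = find_all(sample, pattern)
print(hits)        # [1, 10]
print(len(hits))   # 2
```

Advancing by one position after each hit, not by the length of the pattern, keeps overlapping matches, which is also how E. reports them.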

Let's look at the surrounding neighborhoods of some of these hits.

   a. i. bej{~6170+i.20
199 69 252 2 0 0 0 232 154 211 0 0 80 141 77 136 232 129 61 1
   a. i. bej{~6922+i.20
199 69 240 8 0 0 0 139 216 199 69 244 8 0 0 0 139 59 133 255
   a. i. bej{~6931+i.20
199 69 244 8 0 0 0 139 59 133 255 116 105 221 71 8 220 5 8 52

There are a couple of patterns here. The first is that each pair of target bytes is followed by a different byte (252, 240, 244, respectively), then by four bytes that look suspiciously like a little-endian 4-byte integer: a low number followed by three zeros. Let's try systematically incrementing or decrementing what we are guessing is a loop limit, which sits three bytes past the start of the target bytes whose locations we have found.
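
As a way of checking this reading of the bytes, the sketch below splits each seven-byte group into the two marker bytes, one signed byte, and a little-endian 32-bit integer. The function name decode_candidate is ours. One possible interpretation, which nothing here confirms, is that 199 69 (hex C7 45) begins an x86 instruction that stores a 32-bit constant into a local variable, in which case the third byte would be a signed offset; treat that as an assumption.

```python
import struct

def decode_candidate(data, pos):
    """Split the 7 bytes at pos into (marker, one signed byte, little-endian 32-bit integer)."""
    marker = data[pos:pos + 2]
    offset_byte, value = struct.unpack_from("<bi", data, pos + 2)
    return marker, offset_byte, value

# The first 7 bytes of each neighborhood printed above.
groups = ([199, 69, 252, 2, 0, 0, 0],
          [199, 69, 240, 8, 0, 0, 0],
          [199, 69, 244, 8, 0, 0, 0])
decoded = [decode_candidate(bytes(raw), 0) for raw in groups]
for marker, offset_byte, value in decoded:
    print(list(marker), offset_byte, value)
# [199, 69] -4 2
# [199, 69] -16 8
# [199, 69] -12 8
```

Read as signed values, the "different" bytes 252, 240 and 244 become -4, -16 and -12, and the integers that follow are 2, 8 and 8.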

We define a utility bumpDown to decrement this value for a given location and apply it to create some new executables.

   bumpDown=: 4 : '(a.{~<:a. i. (3+x){y) (3+x)}y'   NB. Decrement 3+location
   (wix bumpDown bej) fwrite 'WinBej',(":wix),'.exe' [ wix=. 0{ixs
1028096
   (wix bumpDown bej) fwrite 'WinBej',(":wix),'.exe' [ wix=. 1{ixs
1028096
   (wix bumpDown bej) fwrite 'WinBej',(":wix),'.exe' [ wix=. 2{ixs
1028096
   (wix bumpDown bej) fwrite 'WinBej',(":wix),'.exe' [ wix=. 3{ixs
1028096
   (wix bumpDown bej) fwrite 'WinBej',(":wix),'.exe' [ wix=. 4{ixs
1028096
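
For readers who do not read J fluently, a rough Python counterpart of bumpDown follows. The name bump and its delta and skip parameters are ours. Wrapping modulo 256 matches what the J version does when decrementing a zero byte (index _1 selects the last element of a.), but it is an assumption of this sketch for increments past 255.

```python
def bump(data, location, delta=-1, skip=3):
    """Return a copy of data with the byte at location+skip changed by delta, wrapping modulo 256."""
    target = location + skip
    patched = bytearray(data)
    patched[target] = (patched[target] + delta) % 256
    return bytes(patched)

neighborhood = bytes([199, 69, 244, 8, 0, 0, 0, 139])   # bytes shown earlier at location 6931
print(list(bump(neighborhood, 0)))             # [199, 69, 244, 7, 0, 0, 0, 139]
print(list(bump(neighborhood, 0, delta=+1)))   # [199, 69, 244, 9, 0, 0, 0, 139]
```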

Some of the changes we made, especially when we experimented with incrementing by one instead of decrementing, often quickly gave an error like the following.

Fatal Error example fo altered WinBej .jpg

However, with the fifth one (location 7945), it appears to run correctly until a certain condition is encountered.

   shell 'WinBej7945.exe'
NB. Jackpot!

Testing this version, we discover that forming a set on the bottom row triggers the error we have introduced by corrupting what we think is a "for" loop. The playing screen freezes, disallowing further input, but with the points counter continuing to increase.

Crazy WB2.JPG

Advanced topics

We look at a group of "toolbox languages" - which includes J - then at issues surrounding employment in general and in technology.

Toolbox Languages

This article, by Hillel Wayne, lists a set of useful languages that are "...good at solving problems without requiring third party packages." The author notes his default languages in this category are Python and shell scripts but that there are less well-known ones he outlines as follows.

AutoHotKey

This is basically “shell scripting for GUIs”: "a tool to smooth over using unprogrammable applications. It’s Windows-only but similar things exist for Mac and Linux." Anything that helps us break out of the "GUI jail" is good by me.

Useful features:

  • You can configure shortcuts that are only active for certain programs, if a global flag is set, if certain text appears on screen, etc.
  • Simple access to lots of basic win32 functionality. Opening the file selection dialog is just f := FileSelect().
  • The GUI framework is really, really good. Honestly the best of any language I’ve used, at least for small things.

This is followed by an example of adding mouse shortcuts to Audacity, as well as others.

J

An array language, like APL. Really good at doing arithmetic on arrays, hair-pullingly frustrating at doing anything with strings or structured data. I used to use it a lot but I’ve mostly switched to other tools, like Excel and Raku. But it’s still amazing for its niches.

Useful features:

  • It is insanely terse. Things that would take several lines in most languages take a few characters in J, so I like it for quickly doing a bunch of math.
  • First-class multidimensional arrays. + can add two numbers together, two arrays elementwise, a single number to every element of an array, or an array to every row (or column) of a higher-dimension array.
  • There are lots of top-level primitives that do special case mathematical things, like decompose a number into its prime factors.

This is followed by examples of finding prime factors and calculating possible interleavings of a certain number of processors running an algorithm with a certain number of steps. (Interleaving refers to the number of different ways a set of concurrent processes can process a certain number of statements.)
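
The interleaving count has a closed form that is easy to check. The sketch below is ours, not the article's code, and assumes every process runs the same number of atomic steps and that the order of steps within each process is fixed; the count is then (p·s)! divided by (s!)^p.

```python
from math import factorial

def interleavings(processes, steps):
    """Number of ways to interleave `processes` sequences of `steps` atomic steps each."""
    return factorial(processes * steps) // factorial(steps) ** processes

print(interleavings(2, 2))   # 6
print(interleavings(3, 2))   # 90
```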

Frink

Possibly the most obscure language on this list. Frink is designed for dimensional analysis (math with units), but it’s also got a bunch of features for covering whatever the developer thinks is interesting. Which is quite a lot of things! It’s probably the closest to “a better calculator” of any programming language I’ve seen: easy to get started with, powerful, and doesn’t have the unfamiliar syntax of J or Raku.

Useful features:

  • Lots of builtin units and unit modifiers. calendaryear is exactly 365 days, tropicalyear is 365.24, and half nanocentury is about 1.6 seconds.
  • Date literal notation: # 2000-01-01 # - # 200 BC # is 2199.01 years.
  • There’s a builtin interval type for working with uncertainties. It’s a little clunky but it works.

The Frink examples include calculating what date someone with a given birthdate becomes one billion seconds old, calculating rates of speed in different units, and doing calculations with built-in variance.

Raku

Raku (née Perl 6) is a really weird language filled to the brim with dark magic. It’s very powerful and also very easy to screw up. I’m not yet comfortable running it for a production program. But for personal scripting and toolkits, it’s incredible.

Useful features:

  • You can define your own infix operators! And postfix operators. And circumfix operators.
  • Lots and lots of syntactic sugar, to a level that worries me. Like instead of [1, 2] you can write <1 2>. And instead of ["a", "bc"] you can write <a bc>. Raku Just Knows™ what to do.
  • If you define a MAIN function then its parameters are turned into CLI arguments.
  • Multimethods with multiple dispatch, based on runtime values. Combining this with MAIN makes small CLI tooling really easy.
  • Many of the mathematical operators have unicode equivalents (like ∈ for `(elem)`), which synergizes well with all of my AutoHotKey hotstrings.

The example uses of Raku include generating three random strings of lowercase letters and copying SVG ids into "inkscape labels". (Inkscape is open-source software for creating and editing vector graphics represented by mathematical equations.)

Picat

My newest toolbox language, and the language that got me thinking about toolboxes in general. A heady mix of logic programming, constraint solving, and imperative escape hatches. I first picked it up as a Better Constraint Solver and kept finding new uses for it.

Useful features:

  • Assignment to variables. Shockingly useful in a logic language. Lots of problems felt almost right for logic programming, but there’d always be one small part of the algorithm I couldn’t figure out how to represent 100% logically. Imperative provided the escape hatch I needed.
  • The planner module. I love the planner module. It is my best friend. Give it a goal and a list of possible actions, Picat will find a sequence of actions that reaches the goal. It is extremely cool.

Examples using Picat include solving an algebra problem and figuring out a vacation plan subject to a list of activities and the constraints around them.

What makes a good toolbox language?

Most of the good toolbox languages I’ve seen are for computation and calculation. I think toolbox languages for effects and automation are possible (like AutoHotKey) but that space is less explored.

A toolbox language should be really, REALLY fast to write. At the very least, faster than Python. Compare “ten pairs of random numbers”:

Language  Expression
Python    from random import randint
          [(randint(0, 9), randint(0, 9)) for _ in range(10)]
Raku      ^10 .roll(2) xx 10
J         10 2 ?@$ 10

A few things lead to this: a terse syntax means typing less. Lots of builtins means less writing basic stuff myself. Importing from a standard library is less than ideal, but acceptable. Having to install a third-party package bothers me. Raku does something cool here; the Rakudo Star Bundle comes with a bunch of useful community packages preinstalled.

If you can do something in a single line, you can throw it in a REPL. So you want a good REPL. Most of the languages I use have good repls, though I imagine my lisp and Smalltalk readers will have words about what “good REPL” means.

Ideally the language has a smooth on-ramp. Raku has a lot of complexity but you can learn just a little bit and still be useful, while J’s learning curve is too steep to recommend to most people. This tends to conflict with being “fast to write”, though.

Other tools I Want in my Toolbox

  • jq for json processing
  • Javascript, so I can modify other people’s websites via the dev console
  • Some kind of APL that offers the benefits of J but without the same frustrations I keep having
  • A concatenative PL if I ever find out what small problems a CPL is really good for
  • Something that makes webscraping and parsing as easy as calculation. Requests and bs4 ain’t it.

10 New Toxic Employer Behaviors

This essay outlines what the author claims are new behaviors by employers which hurt people looking for work.

Corporations control the labour market, dictate the quality and quantity of jobs according to their current mood or projections for the future, fire large percentages of their workforce at will and push out high-earners for low wage or even non-paid bright young “interns” on a routine basis.

As a society we seem to have pretty much accepted and come to terms with the fact that a small group of profit-driven psychopaths currently dominate, organize and arrange the soulless economy we must now navigate and depend upon in order to survive.

This sets the tone for a very impassioned list of what employers routinely do that hurts job-seekers and employees.

  1. Ghost Jobs
    ‘Ghost jobs, also known as fake jobs, permeate the job market. While these positions appear online, they are either already filled or non-existent. In some cases, employers keep a job posting up even though they don’t intend to fill the position anytime soon.’ — Forbes (May 2024). The Forbes article claims this is to stay open to new talent, to keep current employees motivated, and to give the impression the company is growing. This doesn't mention the common practice of listing a job with unrealistically low salary requirements in order to legally justify hiring cheaper off-shore workers.
  2. Five-Step Application Processes

    On Youtube, Tiktok, Reddit and elsewhere, it is currently very common to hear job-hunters complaining and decrying the insane hoops potential employees are being forced to jump through to land even the most basic positions.

    Set aside the futility of submitting both a cover letter and CV only to fill out the same information again in a pre-set form later, but these days we are also expected to do 2,3,4 and even sometimes 5 ROUNDS of interviews with the company before we know where we stand or even what the salary might be.

  3. Stringing Along
    This is where potential employers sporadically ask for more information while possibly hinting that you are only a few steps away from getting the job. It is often a preliminary to ghosting.
  4. Ghosting
    Employers typically don't respond to an application at all and never explain their reasons for not choosing you.
    A poll conducted in 2023 in the UK by People Management found that:
    ‘The majority (92 per cent) of people have been ghosted during the job application process by hirers.’
  5. Automated Rejection Emails
    On the rare occasion employers do confirm a rejection, their emails often contain the same generic list of vague reasons for it.
  6. Salary Low-Balling
    Employers routinely offer embarrassingly low wages for extremely high-skilled jobs that demand years of training and education.

    When they ask you what your “salary expectations” are, they’re silently assessing how desperate you may be for the work and how much they can exploit you.

    This tactic is a form of psychological warfare and forces workers into a miserable race to the bottom that none of us can afford.

  7. Strings Attached

    Ok, so you finally swallowed the little bit of dignity and self-esteem you had left in order to be low-balled on working fifty hours per week in a soulless job for starvation wages, now what?

    Here come the strings: Oh, but you actually have to come in weekends too. Oh, we do actually require set overtime hours. Oh, you will actually be on call several days per week as well.

  8. Cutting Pre-Arranged Benefits

    Ever feel like you thoroughly went through what benefits (if any) were included in your pay package with your new employer only to suddenly realize they aren’t mentioned at all in your contract?

    It’s the classic bait and switch tactic of snake-oil salesmen and the conmen of the modern corporate world.

  9. Independent Contractor Nonsense

    You may look like an employee, act like an employee, but apparently you’re actually a duck — I mean an “independent contractor.”

    As fancy and ego-boosting as this title may sound, it’s actually just a way for employers to skirt around their only responsibilities towards you and scam you out of your healthcare or retirement entitlements.

    In many states and countries, this practice has already been made illegal, but many companies still depend on it and actually use it as their very business model (Uber, Amazon, Deliveroo etc).

  10. Last Minute Lay-offs

    Whether weeks, months or years have been siphoned off and drained from your life at the hands of greedy corporate kleptocrats, the pain and humiliation of an unexpected lay off is unforgiveable, and these are also becoming more common in 2024.

    Chances are the employer knew for a long time that they were going to let you go, and they just didn’t bother to tell you because, hey, what do they care if you have kids, a mortgage, a rental lease, a life to rearrange?

    This is when corporations really show their true colours; their black heart of stone in the face of the total immiseration of individuals and communities alike.

Software Developer Salaries have Declined

A recent salary survey on Stack Overflow appears to show that salaries in software development have declined in the past year, more for developers in some languages than in others.

DeveloperSalaryChange2023-2024 0.jpg|DeveloperSalaryChange2023-2024 1.jpg|DeveloperSalaryChange2023-2024 2.jpg
DeveloperSalaryChange2023-2024 3.jpg|DeveloperSalaryChange2023-2024 4.jpg

There is quite a disparity in the (mostly) decreases between the different languages.

To put these decreases in context, here are some of the languages most commonly used by developers in the survey, as well as a breakdown by the countries represented in the survey.

DeveloperSurvey-ProLanguages.jpg|DeveloperSurvey-ProLanguages1.jpg
DeveloperSurvey-countries.jpg

Learning and Teaching J

We take a look at the summary of an essay on the future of Kdb and consider what analogies there might be for J. We also consider a list of possible improvements for APL from the 2007 ACM OOPSLA conference where Guy Steele spoke on "What APL can Teach the World (and vice-versa)".

Guy Steele on APL Strengths and Weaknesses

Here are some of the summary slides from Guy Steele's talk to the APLers - titled "What APL Can Teach the World (and vice-versa)" - at the 2007 OOPSLA conference held in Montreal.

Originally APL had trees with Guy Steele 40p.jpg|Technical recs from Guy Steele talk What APL Can Teach the World-APL strengths 40p.jpg|Technical recs from Guy Steele talk What APL Can Teach the World - APL Weaknesses 40p.jpg
Technical recs from Guy Steele talk What APL Can Teach the World - APL2World 40p.jpg|Technical recs from Guy Steele talk What APL Can Teach the World - what the world can teach APL 40p.jpg|Technical recs from Guy Steele talk What APL Can Teach the World-APL has improved 40p.jpg
Technical recs from Guy Steele talk What APL Can Teach the World-APL can learn from Fortress 40p.jpg|Technical recs from Guy Steele talk What APL Can Teach the World 40p.jpg

Summary of "The Future of Kdb"

Kdb+ is an absolutely amazing technology but it’s about the same amazing today as it was 15 years ago when I started. In that time the world has moved on. The best open source companies have stolen the best kdb+ ideas:

  • Parquet/Iceberg is basically kdb+ on disk format for optimized column storage.
  • Apache Arrow – in-memory format is kdb+ in memory column format.
  • Even Kafka log/replay/ksql concept could be viewed as similar to a tplog viewed from a certain angle.
  • QuestDB / DuckDB / Clickhouse all have asof joins

Not only have the competitors learnt and taken the best parts of kdb+ but they have standardised on them. e.g. Snowflake, Dremio, Confluent, Databricks are all going to support Apache Iceberg/parquet. QuestDB / DuckDB / Python are all going to natively support parquet. This means in comparisons it’s no longer KX against one competitor, it’s KX against many competitors at once. If your data is parquet, you can run any of them against your data.

As many at KX would agree I’ve talked to them for years on issues around this and to be fair they have changed but they are not changing quick enough. They need to do four things:

  1. Get a free version out there that can be used for many things and have an easy reasonable license for customers with less money to use.
  2. Focus on making the core product great. – For years we had Delta this and now it’s kdb.ai. In the meantime mongodb/influxdb won huge contracts with a good database alone.
  3. Reduce the steep learning curve. Make kdb+ easier to learn by even changing the language and technology if need be.
  4. You must become more popular else it’s a slow death

This is focussing on the core tech product.

Looking more widely at their financials and other huge costs/initiatives such as AI and massive marketing spending, wider changes at the firm should also be considered.

2024-08-03: This post got 10K+ views on the front page of Hacker News. To see the follow-up discussion, go here.

Author: Ryan Hamilton