Wiki/Report of Meeting 2023-08-17

Report of Meeting 2023-08-17

Present: Ed Gottsman, Raul Miller, and Bob Therriault

A full transcript of this meeting is now available below on this wiki page.

1) Ed gave an update on Live Search, which is now a local database search and is pretty quick. The biggest challenge is relevance ranking, which requires retrieving all of the results before their relevance can be ranked. The challenge discussed in depth was that the first character entered can produce a very large result set, and that search has to finish before the search triggered by the second character can start. Details can be reviewed in the transcript.

2) A discussion of how to go about updating the database now that there is a large SQL database and a smaller log database. The interface may need to be addressed if the databases remain separate.

3) Bob showed that the results in Live Search don't exactly match the result on the webpage. This was determined to be html and wiki markup that does not show on the web page. There was discussion on the solutions to this that can be reviewed in the transcript.

4) Bob initiated a discussion about how Live Search, the Playground and the curation of the wiki would work together in the best possible way. The aim is to develop the components in a way that makes the end result better. Bob talked about J Playground being used in NuVoc and Ed mentioned that he wondered whether J Playground should be in the J Wiki Browser. Bob thought that the Playground does not have to have a link in the wiki browser. An example Bob gave was whether the Reference section of categorization has been replaced by Live Search. This would mean that categorization in Reference may need to be done differently. Other areas such as Developer and Newcomer would require curation to make sure that the information is appropriate to the audience. Bob felt that this is the time to review that because it is not yet in use. This is a foundation for further discussion.

5) Raul asked if New Pages could be added to the Sidebar, since they often indicate new areas of development. Bob said that he would add that.

For access to previous meeting reports, see https://code.jsoftware.com/wiki/Wiki_Development

If you would like to participate in the development of the J wiki, please contact us on the general forum and we will get you an invitation to the next J wiki meeting, held on Thursdays at 23:00 (UTC). The next meeting is August 24th, 2023.

Transcript

I'll give you the floor, Ed.

- Sure.

So live search is back to being a local database phenomenon.

It does not do anything over the network until obviously you click on a result to load up a webpage out of the Wiki or to load a forum post from the forum archive.

It's pretty fast if you haven't had a chance to look at it.

The exception is certain single character searches are quite slow.

So slash, for example, retrieves thousands and thousands of documents.

And this wouldn't be a problem except I want to do relevance ranking.

So all of the documents have to be retrieved, relevance ranked, and then the top 200 are picked off and those are shown to you in the results display.

I was just telling Bob that I've been working on non-preemptive search.

So it's annoying that you want to type slash co and you can't.

Right now you can type slash and then it blocks and you can't type any characters.

But T dot is a pretty good solution to that.

So I've got something that's 90% of the way there.

Um, and it, there have been periods of as long as five minutes in the last three hours when you've been able to type without interruption, without blocking.

And it finally will, in fact, come back with a result set.

And for two character searches and more, it really is pretty much instant.

So I feel pretty good about it.

I don't know how much interaction you've been able to have with it.

I find it's not as fast as Jsaurus.

Jsaurus really is instant for any search you can imagine.

- Much smaller data set.

- Well, smaller data set for sure.

But I think it's got the same flavor as Jsaurus.

It has a speed and a responsiveness that I find intrinsically engaging, I think.

And so, Raul, I'd like to thank you for turning me on to Jsaurus in the first place.

I think it was a really good pointer.

Do you think it's worth disabling live search for single character searches, if it's not until two characters that they start being quick?

Pound sign.

I think perfectly reasonable to go looking for pound sign individually.

I think if I just am not preemptive about it, that's as good as not searching on the first character.

Because you can go ahead and enter a second character immediately.

There's no penalty, there's no blockage.

And it just takes a little longer for the two-character search to come back.

There doesn't seem to be any way, and I wouldn't expect there to be any way, of cancelling a SQLite query.

So you just have to wait for the first one to finish and then the second one starts up again.

It's actually kind of odd, because popular database lore is that most searches are immediately discarded after only seeing a few results.

Yeah.

It's surprising that SQLite didn't take that to heart.

Well, the problem is, if I'm understanding what you just said, the problem is, if you're going to do relevance ranking, you have to examine all of the hits in the whole database.

You can't just say, "I'm going to take the first 50 that I find."

But can't you be iterating through them and see that there's another event coming and cancel at that point?

I don't know of a way to cancel a query in flight.
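
For reference, SQLite's C API does include an interrupt call (sqlite3_interrupt) and a progress handler that can abort a running statement. The sketch below shows the progress-handler route using Python's standard sqlite3 module. It is only an illustration: the helper name run_cancellable is hypothetical, whether the J SQLite interface used by the browser exposes either call is not known from this discussion, and whether the handler fires often enough during a full-text query ordered by rank would need testing.

```python
import sqlite3

def run_cancellable(conn, sql, params=(), should_cancel=lambda: False):
    """Run a query; return its rows, or None if it was aborted.

    should_cancel is a hypothetical callback, for example
    "has the user typed another character?".
    """
    # SQLite calls the handler every 1000 virtual-machine instructions;
    # a non-zero return value aborts the running statement.
    conn.set_progress_handler(lambda: 1 if should_cancel() else 0, 1000)
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.OperationalError:
        # Assumed here to mean "interrupted"; a real caller would
        # inspect the error message before swallowing it.
        return None
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler again
```

With something like this, a slow single-character search could be abandoned as soon as a second character arrives instead of running to completion.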

There's no way to-

- You're not getting it back a page at a time, you're getting it all back.

I don't know enough about the mechanics here.

It's not segmented though, is what you're saying.

- No, the mechanics of it are, I do a full text query and order the results by relevance rank, and that's a single operation.

And all of that is handled by SQLite internally.

And then the end of that query is limit 200 or something like that.

So SQLite examines every single document that has a slash in it, orders them by relevance rank, and then picks off the top 200 that came back.

And that's what I get.

And that's a one liner.
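
A minimal sketch of the kind of one-line query described above, assuming the full-text index is an SQLite FTS5 table (the transcript only says "full text", so FTS5, the table name docs, and its columns are assumptions). ORDER BY rank forces SQLite to score every matching row before LIMIT picks off the top 200, which is why a very common token is slow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per wiki page or forum post.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [
        ("Reduce", "Jplus Jslash y"),
        ("Scan", "Jplus Jslash Jbslash y Jslash"),
        ("Other", "nothing relevant here"),
    ],
)

def search(conn, query, limit=200):
    """Return titles of the best-ranked matches for a full-text query."""
    return conn.execute(
        # Every hit is ranked before the LIMIT is applied.
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

The cost is driven by the number of hits rather than the length of the query, which matches the observation later in the discussion.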

The only other way I can see around it, and it's a little kludgy 'cause it involves the user doing something extra, is, if they wanted a single character, entering a space after that character.

- You'd have to know to do that, yeah.

- Yeah, but you'd always have two character searches at that point.

- Well, but a two character search, one of which is a space, is essentially a one token search.

So it's not really searching for characters in the conventional sense, it's searching for tokens.

If you do /space, the first thing I do when you give me a query is I tokenize it with semi-co (;:).

So that's the word formation primitive.

That produces, in the case of /space, that would produce a single token /.

So you're back to your one item search.

Right.

But in that case, that would be what the user wants.

They're specifically saying, "I want a single item."

If they did slash co, then it would take it as together, right?

Yeah, that's true.

That's an interesting point.

Although back to Raul's point, they'd have to know to type a space if they wanted a single character search.

You could do it after a delay, I suppose.

But then if you're going to delay, you might as well just go ahead and put one in the search immediately and there will be a delay for them.

- Well, if say the delay was a second, that would improve the search time, wouldn't it.

- That's true.

Although there's something a little bit annoying perhaps about wanting a single character search that's going to take a while and you introduce an additional one second delay.

It seems like a penalty somehow.

- Would you notice if it turns out to be, I'm guessing it's like 15 seconds to go through that whole process.

No, no, no, no.

Uh, at least on my machine, it's about three.

Okay.

Okay.

So it's not a huge penalty to do that much work.

It's just, no, it's just a little bit.

And I think that if I make it non-preemptive, you can type your whole search uninterrupted.

Yeah.

Uh, I think maybe that's a pretty good solution.

I don't know.

We'll have to try it and see how it feels.

Well, it would have narrowed it down to starting with a slash anyway, in the example you're giving.

No, every time it does a search, it's starting over from scratch, even if the search consists of additional characters.

Okay.

So it would just take that whole block, and that would be a quick search.

You wouldn't even know the first search took time.

Yeah, exactly.

So in that case, it seems to me that the penalty is on every search, you got three seconds.

Is that right.

No, because I think what's happening with SQLite is the penalty comes from the fact that there are so many hits.

So you search for slash, you get 50,000 hits, and it's got to process them all to do the relevance ranking.

You search for slash co and it gets maybe a few hundred hits, much easier problem to do the relevance ranking, much quicker.

So there are probably multi-character or multi...

So it's not really a question of how many characters are there or how many tokens are there.

It's a question of how many hits are there.

And so there are probably four and five token, by two and three token searches that return a lot of results and they're going to be slow.

They'll be slower.

Yeah.

Yeah, but even pound sign is pretty quick, just a single pound sign.

Slash is just peculiarly, peculiarly common.

And that's starting with slash, because to me, like reduce and scans and all those things are going to have a verb in front of them.

Plus slash, yeah, right, for example.

Yeah.

And that'll be faster than just a plain slash.

Slash, yeah.

So I think we're in a good place.

I don't think we have any problems to solve at this point.

We'll just have to see whether I can fix the last part.

Oh good, because I'm getting bored.

My machine's been acting up.

Okay, tell me.

I don't know how much of it's specific to your thing or how much of it's specific to my machine, but currently live search doesn't work for me.

And I've had some other odd behaviors, like not being able to enter your app after I exit it.

So I'm not quite sure what's going on with my setup.

I don't know.

I can't say that it's your-- Well, it's probably my fault.

At what point did this start happening.

Did you happen to notice.

I mean, I was just trying some things while you guys were talking.

And LiveSearch no longer was working for me.

Okay, I don't know what... well, Live Search is... did you by any chance download it in the last half hour?

Yep, I hit add-on.

I saw the add-on was not up to date.

So I hit the button.

Yeah, I know.

So I posted it.

And I posted it with a bug, so that's what's happening to you.

Okay.

And I now know about the bug because it's happening to me too, and I was trying to fix it just before I got on. So that answers that question.

There may be other questions there.

Well, one other thing while we're thinking about these, or while I mention these things.

Yeah.

When I first loaded your app, the update earlier, you know, to get live search going, you saw my note about the issue I was having with SQLite.

Yes.

Yes.

Well, after that I got it loaded, I ran the app, and it said local database is up to date, but live search wasn't working.

Right.

And I had to click the button that says local database is up to date and then wait for a few minutes before live search started working.

Stop right there and say more about a few minutes.

I went off and read email, came back.

Okay.

Because for me, it's about 50 seconds on a 100 megabit per second line, roughly 100 megabit per second line.

Yeah, I didn't time it.

But the thing that troubled me is I had to click a button that said local database is up to date.

And I didn't think that...

In order to get an up-to-date database.

Right.

Yeah.

So the problem there is, there are really two databases.

There's the old 10 megabyte database that we've always used, and there's this new full-text database.

And I'd like to maintain that distinction only because it's nice to be able to have people send in the 10 megabyte database when they have a problem.

That makes sense.

So now I'm thinking I need to treat the 10 megabyte database and the full text database with two distinct buttons.

Or you could just say, if either of them is out of date, you can have the button say it's out of date.

That's true.

Yeah, I don't have that logic built in yet, but clearly you're right.

Clearly I need to do that.

And having something to look at, say, this may take a minute or something like that.

Yeah, absolutely.

That's all I have for now.

Okay, thank you very much, because those are both good points.

All right, excellent.

I played with it a bit.

I really enjoyed playing with it.

And it's a perfect way to drag yourself into rabbit holes.

It's wonderful that way.

I'll share my screen because there's one thing I noticed I found intriguing and I don't I really don't have an answer for it.

I did a search on, um, slash co twice, right?

The, the wizardry of, of Roger Hui.

Um, if you look at the top here, I looked at that and I thought, that's so cool.

I can look at this and I went to that page and I go to that page and I don't actually find this line anywhere on that page.

Ooh, that's disturbing.

Yeah.

But I find it intriguing because it definitely mentions-- like it definitely mentions the double slash.

Whereas I think it's down here a bit.

Yeah, down in here.

It mentions it, it uses it.

But it never actually uses that actual line.

Yeah.

>>It might be lines merged together.

Like that line with the volume line, and you're not seeing the new line characters.

Yeah, maybe.

It's-- I guess I'm not even seeing the LTTT thing.

When he's defining things, it's RKTU and RKU.

And then I look down here, I actually went into the file script of it to see whether there was anything extra in there.

Actually, that's just a file.

Essentially, this code summary is in the file.

So that's what you see.

But yeah, I just and I just found that kind of like that.

That that looks like HTML markup.

The LT semicolon.

Oh, I bet you're right.

Like an HTML entity.

Yeah, I bet you're right.

Yeah.

Less than right.

So we want to go to semicolon.

The ampersand LT semicolon is probably that.

He was making a change to deal with that, I think, with the ampersand issue.

I mean, basically, you just need to resolve HTML entities before passing them up to the database.

>> Right.

Rats, rats, rats.

Okay.

Add that to the list.

>> And then I guess the only other question is, would TT be something else.

>> No.

>> Okay.

>> That's an HTML entity too, probably.

>> Oh, is it?

>> Or maybe it's a...

>> Wiki markup, maybe.

>> Could be wiki markup, yeah.

But, um, are this HTML tags.

This thing does not for the wiki, it looks like.

TT right bracket.

That is an escaped HTML tag.

TT is a, you know, left angle bracket, TT, right.

Gotcha.

Gotcha.

That's the actual markup that we're seeing.

All right.

All right.

I'll try to clean that up.

So it's resolve HTML entities, which are probably quoted because of wiki markup.

Anyways, so you got two layers, I guess, of resolution there that may have to be fixed.
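
The two layers of resolution mentioned here can be sketched with Python's standard html module. This is only an illustration of the idea, not the indexer's code: the function name resolve_entities and the two-pass limit are assumptions, and stripping or interpreting the resulting tags (such as tt) would be a separate step.

```python
import html

def resolve_entities(text, max_passes=2):
    """Unescape HTML entities, repeating for doubly-escaped text.

    "&amp;lt;tt&amp;gt;" needs two passes: first to "&lt;tt&gt;",
    then to "<tt>".
    """
    for _ in range(max_passes):
        unescaped = html.unescape(text)
        if unescaped == text:
            break  # nothing left to resolve
        text = unescaped
    return text
```

Running this before tokenizing would keep an entity such as ampersand L T semicolon from being split into three separate tokens.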

- Yeah.

Yeah, and Raul, I think this was your advice, for which I will thank you again, or thank you if I didn't thank you already: to pull the wiki markup, not the HTML markup, from the wiki pages. I think that works out very well.

Although it looks like--

- Makes my job a lot easier.

I need to do some additional resolution in there.

Yeah, I'm not treating that properly.

So I'll fix that.

So Bob, I noticed you're using a very small font.

- Well, I flipped over to the small font just because like I was saying at one point, it's funny, if I go to a larger font, drag it up here a bit, there we go.

When I come back down to small, oh, that's funny.

Previously on, maybe it's because I'm sharing the page, but when I go to my full screen, this gives me the whole thing.

I don't even have to worry, like everything shows up.

Yeah, yeah.

So that's why I tended towards the small font.

I don't have to, it works fine to drag it up bigger.

It's quite nice.

But I was finding when I was looking at the full picture, it was just a little easier to see it that way.

and I can read it, so it's good enough.

- If you go back to the small font, I think the reason that you're not getting the whole list is that you're not looking at live search.

- Okay.

- You're looking at some other item.

- There you go.

- Yeah, there you go.

'Cause there are some controls that only appear when you've selected live search.

- Okay.

- So, mystery solved.

- Yep.

- So I've got quite a number of relatively, of, I won't call them minor, but less difficult items to go through a lot of things from Stephen's session.

I've just got, I've got a list.

So I will clean up live search.

I will clean up notification of whether data is up to date and I will start knocking off those other items.

And my hope would be that next week we could take a checkpoint and then at that point start reaching out to the people, Bob, on your beta list and maybe see if we can't widen the scope of things a little bit.

I'm a little sad not to have heard from Dave for some time now.

I was sort of hoping he'd be on the call.

Yeah.

I thought he would be back this week.

I thought he was off for a week and that would be last week.

That was my understanding.

But you never know.

You don't know anything more.

Yeah, no, right.

This is nobody's primary job.

Well, I look at all the other things I do and I guess the podcast is probably my primary job right now.

But this is close to the running.

It's important.

Well, and what happens with search, but for me, I search and I find something, I start reading it, and I forget all about going back to search.

That's a point Bob made, though.

He said it really is a rabbit hole enabler, you know.

Well, and I think, you know, like you had some concerns about inviting some of the beta testers in, whether they'd find it useful.

I think they'll find it very useful just for that reason, because it'll just be, for them such a quick deep dive into these areas, they'll start, I think initially it would just be wow look at what I can find, look how quick looking at this information.

But I think the next step will be they'll start to control it and they'll start to say well I've always wondered about this and they'll direct their searches and that's when I think it'll really click for them.

Although the one, thinking about that process and thinking about where I'm at, I haven't tried editing recently from within this UI.

And editing is something that you do with the wiki.

I'm just not sure.

I haven't tested that.

That would be good to know about, because if we're hoping to encourage editing, it would be nice if the tool didn't stand in the way.

Well, see, whenever I've come to that point, what I do is I do the browser as a way to get to the page I want.

And then if I wanna do something with that page, I click the browser button and then I'm back in the browser on that page.

So it is a two-step process.

It's finding where I wanna be in that huge gamut of information.

But once I find it, I can hop out and do whatever I want just off the browser.

And I don't find that too restrictive because really at that point, if you're editing, you're not moving that fast anyway.

Like there's a bit more thought involved.

You've pulled that page, you know you wanna do something with that page.

I'm usually gonna hit the edit button, open up that page and I'll be sitting looking at it, doing changes, previewing, seeing what I wanna do.

I'm not feeling like, oh, I really wanna get back to the jWiki browser to be able to see all these other pages.

And if I did, that's easy to do because I just switch applications back to J9.4, which is what I'm running it under, and bang, it's right back there.

And I can go to all those pages.

What I haven't tried yet is actually logging in under the jwiki browser and seeing whether that gives me full control of the pages.

And where that might be useful is if I was searching through pages to pull script or something, I could go to Edit.

I could still do View Source, though, I guess, if I wasn't logged in.

But anyway, it would allow me a cut and paste.

And that's the only other reason I can see I'd want to, you know, that kind of access in the browser.

- Yeah, the thing I was worrying about, one of the things I was worrying about, two things I was worrying about, one is session.

You know, we have in my regular browser, I have a session with my cookies and my login information, so like that, and that actually transcends tabs.

In the jwiki browser, it's not a tabbed interface.

I don't have multiple pages open, and I have one page which rapidly changes away.

So I'm not sure I want to be editing in the jwiki browser either because it'd be so easy to lose my edits.

So it's, like I said, it's something I haven't put a lot of thought in yet.

It's not something I've tried.

I'm still ruminating about it.

It may be I'll come up with a worthwhile thought at some point.

- It may be that the browser button is your friend and that Bob's workflow is the way to go.

Just find the page you want, hit browser, and then concentrate on that page.

And you'll be safe.

It won't spontaneously disappear on you.

- Or the other way I might approach it is I'd use the bookmarks.

So if I knew I was working in an area and I've got four or five pages, it's so easy to add bookmarks and take them out again.

I could have bookmarks just in that area.

Bang, bang, bang, bang.

Pop back and forth.

- If you switch away and come back, do you lose your edits or do they come back?

- I haven't tried that.

- Yeah, I haven't tried that so I couldn't tell you.

- It's a good question.

- If I was to guess, I would say you would lose your edits.

And in fact, when you change the page, it will probably prompt you by saying, "Do you wanna leave this page?"

'Cause that's usually what it does at that point.

- Uh-huh.

I'll try that, after he's got LiveSearch working.

(laughing) - Yeah, I'll do that first.

I was going to, in fact, I already have temporarily at least turned off access to the original search mechanism because I think it's pointless at this point.

I mean, if you really want to, you can go to the web or the forum search interface, but I think that live search supersedes everything I was doing with it.

Yeah, I would agree.

I don't think we need to front-end that anymore.

And I think-- we'll leave the pregnant pause to ponder for a second.

Is there anything else in this area.

Because I have some stuff in this area that I think was worth discussing.

I do have one other possibly misguided point or issue or something. Currently, on the live search, the in-context bit in the left column has a lot of spaces in it.

And I'm wondering if that's an artifact of how you package it up for the database, and if those spaces can be removed.

You know, I think what's happening is you might have a semi-co token followed by a space to separate it from other things.

And if those can be always inserted and always deleted, if that would work to make it look more like the original text.

That makes sense.

So it does.

It makes perfect sense.

So here is the tricky bit.

And maybe you can tell me how to approach it.

When I index the content, I don't just feed the original content to SQLite.

Rather, I run all the content through SemiCo.

So I tokenize everything, whether it's J code or not, it all gets tokenized according to SemiCo.

And that is then turned back into plain text simply by inserting a space in between every token.

And, by the way, the tokens are translated into their English language equivalents.

So with a capital J in front of each one.

So semi-co, for example, becomes J semi-co rather than a semicolon and a colon, because that's how we get around the full-text indexer tokenizer's tendency to drop punctuation.

We turn all our punctuation into what look like not English words exactly, but words.

So that's what gets indexed is those tokens, those original tokens turned into English-like tokens and then turned back into text with a space in between each English-like token.

That's what gets fed to the indexer.

So when the indexer makes its little snippets, its little keyword in context, that's the text it's working from.

I no longer have any access to the original text, and even if I did, I'm not sure how I'd map back to it.
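
The indexing transform just described can be sketched as follows. Everything here is an illustrative assumption: the regular expression is a simplified stand-in for J's ;: word formation, and the names in the table (Jsemico, Jslash and so on) are hypothetical, since the actual name table is not given in the discussion. The decode function shows why snippets come back full of spaces: the names can be mapped back to symbols, but the original spacing is gone.

```python
import re

# Hypothetical symbol-to-name table; the real one is not shown in the transcript.
NAMES = {";:": "Jsemico", "/": "Jslash", "/:": "Jslashco",
         "[": "Jlbra", "]": "Jrbra", "+": "Jplus"}
SYMBOLS = {name: sym for sym, name in NAMES.items()}

# Simplified stand-in for ;: : a name, a run of digits, or one punctuation
# character followed by any trailing dots/colons (J inflections).
TOKEN = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|\d+|[^\sA-Za-z_0-9][.:]*")

def encode(text):
    """Tokenize, rename punctuation tokens, and join with single spaces."""
    return " ".join(NAMES.get(tok, tok) for tok in TOKEN.findall(text))

def decode(indexed):
    """Map names back to symbols; the original whitespace is not recoverable."""
    return " ".join(SYMBOLS.get(word, word) for word in indexed.split(" "))
```

For example, the text [/:] is indexed as three words and decodes to [ /: ], with spaces that were never in the source.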

I'm going to give two ways of... two completely different approaches for this problem.

Okay.

One, assuming that the indexer is always returning the exact literal text that you gave it in the first place, is to make your transform invertible.

In other words, semi-co needs a semi-co inverse that does exactly the opposite.

In other words, you have space slash colon, or yes, you have slash colon, just make an imaginary J, you have a square bracket slash colon right bracket.

That's three tokens next to each other.

And that we convert into three words, JRBRA or whatever, J semi-colon JLBRA.

And if the way you convert that into tokens is each token has a space on the left and right hand side to make sure that it's separated from whatever it's next to.

Then when it comes back, you would be deleting a space on the left and right hand side to make sure that it's back down to compact.

Now if it's deleting spaces, redundant spaces, you know, if you don't get exactly back what you sent out, then we have to go to a completely different approach.

- Oh, there aren't any redundant spaces.

Well, there aren't two spaces between tokens.

There's only one space.

- But you'd have to add, you'd have to add, you'd have to make it two spaces to make that approach work.

Is that what I'm saying.

- Nope, you lost me.

So the problem is, the problem is right now that semi-co does not preserve space information between tokens.

That's right.

That's, and so you, you have to change semi-co and do a, do it, do it myself.

Oh, oh, oh, oh, oh, interesting.

That's approach one.

Okay.

I'm interested in hearing approach two.

Approach two is, instead of the nonsense I just gave you, for each page that comes back, go to that page, bring it up, run it through semi-co, and find the match where this token stream occurs on that page.

And then use that, then use that to map back to the original page.

So map it back to the original page.

- You know, you need like also like line beginning count so you can say, you can know which line you're going for or something like that.

- See, I don't know if you have checked and when we do this for real, I'm gonna need to be very transparent about what's happening with this local index.

But sitting on your hard drive is about a gig worth of SQLite database at this point.

So we would be not doubling it perhaps, but making it a lot larger if we preserved the space.

Well, if we preserved, yeah, certainly the spaces, although a lot of that would get, yeah, no, the spaces would make it much bigger.

And the second approach that you outlined where we have to preserve not just the tokenized text for the indexer, but also the original text which we will map back, that would also dramatically increase the size of the repository.

Oh yeah, you'd have to, or you'd have to fetch them from web, which would slow things down quite a bit.

Yep.

All right, well, let me think, let me think about this.

There's a third approach too, I guess, which isn't so good, which is you end with a heuristic, which is, if the space can be removed and you get the same tokens, remove that space.

Well, I mean, ignoring non-J content, you could remove all the spaces and you'd be fine.

Well, not quite all the spaces.

If it begins with a colon or a dot, you need the space on the left.

Yeah, okay.

And if it's two numbers, you also need one.

So it's a fairly simple set of rules, but there's a lot of spaces you can just delete, because you know J's lexical rules and you know that the space is not necessary in those cases.
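
A minimal sketch of the heuristic just outlined, assuming the snippet has already been split back into J tokens. The function names are hypothetical and the rules are only the ones mentioned in the discussion, plus keeping a space between any two adjacent name-or-number characters; it errs on the side of keeping a space when unsure.

```python
def needs_space(left, right):
    """Decide whether a space must stay between two adjacent tokens."""
    if right[0] in ".:":
        # A leading dot or colon would otherwise attach to the token
        # on its left as an inflection and change its meaning.
        return True

    def wordish(ch):
        return ch.isalnum() or ch == "_"

    # Two numbers, two names, or a name next to a number would merge.
    return wordish(left[-1]) and wordish(right[0])

def join_tokens(tokens):
    """Join non-empty tokens, inserting a space only where required."""
    out = []
    for tok in tokens:
        if out and needs_space(out[-1], tok):
            out.append(" ")
        out.append(tok)
    return "".join(out)
```

So the tokens + / 1 2 3 come back as +/1 2 3 rather than with a space between every token.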

That may in fact be the most productive approach.

So.

There'll be a few edge cases where it's not great, but.

But all that means is you've got some extra spaces that you could perhaps have removed.

In other words, you err on the side of caution.

Right.

And another thing, too, once you get... the higher priority is this ampersand L T semicolon as three different tokens instead of being part of it.

Yeah, we've got to fix that.

Yeah.

Once we fix that, we'll become dissatisfied with all the extra spaces.

So it's worth thinking about.

- You will be, but not quite as dissatisfied, because it'll make more sense when we're looking at it.

- Yeah, it's true.

It's true.

- So I just found something else. Again in this slash co slash co example, if you look at the one I've highlighted, and I've gone to that document, and then I've highlighted in the document the line that I think it's coming from, you'll see that there's a difference.

It's taken off that first K comma P, and then that double arrow thing, I'm guessing, is some kind of Unicode, which has thrown a wrench into your--

- Yeah, I have spent time learning more about Unicode than I ever wanted to.

- Oh God.

- And I still do not know nearly enough.

And my approach so far has been to punt.

- Yeah.

I run into a document that's got Unicode in it, I think I just throw it out.

- Wait, I would recommend doing-- - Yeah.

- There's a, I think, 9u colon, which turns each Unicode character into a single character.

So you don't have to worry about-- - Creates code point.

- Here's a token that you're treating as individual tokens.

In parts of the characters, individual tokens, which means it's no longer, or maybe you're putting spaces between them or something like that.

Something that breaks it up.

It's univocarium.

- I see, so it all becomes legitimate ASCII at that point.

- Right now, you don't want to actually pass it off to the SQLite in that form, because that gives you a huge space expansion.

You know, each character is now, I think, 64 bits wide or something like that, maybe 32 bits wide, but they're a lot fatter.

So you'd want to convert back once you're done tokenizing and such.

But that's the intermediate form you should be using while you're doing the token substitution.

You said 9u colon.

Okay.

Let me take that note.

Yeah, 9u colon or 10u colon, but I think 9u colon.

And my sense from seeing the number of question marks with those diamonds in the original, well, not the original text, the text that we're referencing on.

That indicates to me that that Unicode is actually stored UTF-8.

So it's gonna have those blocks.

It's not a single digit.

It's gonna be UTF-8 times six to cover that.

- What is that stuff doing in there.

- The diamonds.

- Yeah.

- What happened is that- - Is that coming from.

Well, what it-- Is that something in double arrows.

Yeah, the double arrows will cause it.

And what's happening is it's probably got an offset.

And when it goes to read those, those triples aren't making any sense to it.

So it fires back.

This doesn't, I can't, this is unprintable.

But it should be taking them as a group.

And then it would be the double arrows.
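
A small Python sketch of the effect being described: when the bytes of a multi-byte UTF-8 character are read from the wrong offset, or one at a time, a decoder has nothing valid to show and substitutes U+FFFD, which most fonts draw as a diamond with a question mark. The sample string and the use of U+21D4 for the double arrow are assumptions, not the actual page content.

```python
REPLACEMENT = "\ufffd"  # the "diamond with a question mark" character

# Hypothetical sample: U+21D4 takes three bytes in UTF-8 (e2 87 94).
raw = "x\u21d4y".encode("utf-8")

# Read as a group, the three bytes decode to the double arrow.
whole = raw.decode("utf-8", errors="replace")

# Starting one byte into the character leaves two stray continuation bytes.
misaligned = raw[2:].decode("utf-8", errors="replace")

# Decoding byte by byte breaks the character into three unprintable pieces.
bytewise = "".join(bytes([b]).decode("utf-8", errors="replace") for b in raw)

print(repr(whole), repr(misaligned), repr(bytewise))
```

Counting the replacement characters in `misaligned` and `bytewise` is a quick way to check whether text was split in the middle of a character before it reached the display.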

Those double arrows do not occur in any font that I've got on the client, do they?

I mean, I couldn't reproduce them.

- Well, it's gonna have a Unicode value.

- Sure, but that glyph is probably not in Arial 14 on the Macintosh, or I don't know, maybe--

- I think it would be, to be quite honest.

- Really?

- Yeah.

- Okay.

- Yeah, especially something common like Arial.

It's more likely that it's got those.

- I see.

- I spent-

- All right, I'll add that.

I have officially added that to the list.

So I will dig into that once I've got the other problems fixed.

- I spent a couple of months wading through the swamp of Unicode and-

- It's horrible.

- Well, it's, yeah.

I mean, when you first get in there, like, I mean, the water's not warm.

There are things that bite you.

It's no fun.

But after a while, you make friends with the little creatures.

And then you can go, well, this is actually kind of neat, but what a, what a, well, yeah, I just don't understand the design decisions.

I'm sure once I did, I would-- It was a community over many years with many membership changes.

So there are a lot of different inconsistencies.

We interviewed Rob Pike two days ago, and one of the things he actually said he's most proud of is that he and Ken Thompson sat down one night in a diner and in 20 minutes designed UTF-8.

And he says, everybody talks about the Go language, everybody talks about other things I was involved with, but UTF-8 is actually one of the things I'm most proud of.

And then Marshall said to him, "Did you have anything to do with UTF-16?"

He just said, "No, that's a really bad idea."

And I imagine he feels the same way about the extended blocks and all that stuff and the surrogates and all the mess that comes with that.

But yeah, it's kind of crazy.

I would say, to be quite honest, what we see over here, this reference point, isn't so important because you can pull this page up and see what it actually is.

If you can figure out that it's the same.

If you can figure out the same.

But honestly, all it's telling you is this stuff is somewhere in here.

Yeah.

And if you were looking for it, you'd probably go, that's probably what it is.

If this was referencing you to the wrong page, that would be a big issue, but it's not.

It's sending you to the right area.

9u colon does nothing if your text is pure ASCII.

It just returns pure ASCII.

If it has any Unicode characters which are not ASCII, it gives you a 32-bit encoding, which is one character per Unicode character.

It's still a character-oriented system.

It's just that it's much fatter characters.

Now, I'm going by memory, but I bet if you fed a surrogate to 9u colon, you're going to get a surrogate back.

It's the whole text.

If you had to assert mixed with other Unicode, it's going to be based off the Unicode being in there.

I think it'll show the surrogate.

What do you mean by surrogate here?

You mean the Json code?

The surrogate is what it does in UTF-16 in order to extend and get to the full 32, it's taken a whole block and it's made them into pairs.
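
For readers unfamiliar with the mechanism, the pairing can be written out as arithmetic. The sketch below follows the standard UTF-16 rule (a reserved block at D800 to DFFF, split into high and low halves); the function names are made up for illustration, and U+1F600 is just an arbitrary code point above U+FFFF.

```python
def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = code_point - 0x10000       # 20 bits remain after the subtraction
    high = 0xD800 + (offset >> 10)      # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go into the low surrogate
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1F600)
print(hex(high), hex(low))

# Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
encoded = "\U0001F600".encode("utf-16-be")
```

Since the wiki pages arrive as UTF-8, this is background rather than something the search code would need to do itself.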

And so if you go into that zone...

Well, I don't think we're going to ever feed it UTF-16 because we don't get that from the website.

We always get UTF-8 from the website.

Okay.

So yeah, that's your way around it.

But I think actually when I was playing around with this, if you do feed UTF-16 to 9u colon, it stays UTF-16.

I don't think it switches back to a 32.

It's not supposed to, but hopefully that won't matter.

Yeah.

Yeah.

I'm just saying that was one of the little swamp creatures that I looked at and went, "Oh, really? Interesting."

But I'm just going by memory.

I might be wrong, or I might have it backwards.

From the description, that's not supposed to happen.

I don't remember the exact language, but the suggestion to me was it was 32-bit characters or 8-bit characters, depending on what text you gave to it in the first place.

Yeah.

So, as I think about it, if I could, maybe the syntactic rules for dropping spaces are actually pretty simple.

So dot, for example, has to have a space to the left, colon has to have a space to the left.

Maybe there are a few others.

Well, numbers, obviously you don't.

Words, basically.

Yeah, so I'll leave them alone.

But it might actually be pretty simple to collapse most of the spaces and the salutary effect of that, besides better snippets on the left column, would be a much smaller repository on the disk.

'Cause SQLite doesn't compress in place.

It's the fully blown out text sitting on your hard drive.

- SQLite isn't gonna see the, is gonna need the spaces because you have J semicolon.

- Right, right, right, right, right, right, right, right. Never mind, never mind, never mind.

Right, I have to do this space collapsing on the fly when I get the snippets back from SQLite.

Look at them and say, OK, I can drop this space and this space and this space, and then I'll render the snippet on the screen.
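
A rough Python sketch of that on-the-fly collapsing, applied to a snippet only at display time so the stored text keeps its spaces. The rules encoded here are just the ones mentioned in the discussion (keep the space before a token that starts with a dot or colon, and between adjacent words or numbers); the function names are hypothetical, and quoted strings and comments are deliberately not handled.

```python
def is_wordlike(ch: str) -> bool:
    """Letters, digits, and underscore: characters that would merge two tokens."""
    return ch.isalnum() or ch == "_"


def collapse_spaces(snippet: str) -> str:
    """Drop spaces between tokens unless removing them would change the tokens."""
    tokens = snippet.split()
    if not tokens:
        return ""
    out = [tokens[0]]
    for prev, tok in zip(tokens, tokens[1:]):
        needs_space = (
            tok[0] in ".:"                                   # would attach to the token on the left
            or (is_wordlike(prev[-1]) and is_wordlike(tok[0]))  # would fuse two words or numbers
        )
        out.append((" " if needs_space else "") + tok)
    return "".join(out)


print(collapse_spaces("x =: 1 + 2 * y"))
print(collapse_spaces("+ / . *"))
print(collapse_spaces("1 2 3"))
```

Because this runs on the snippet after it comes back from the database, the indexed text is untouched and the search tokenizer still sees the separated form.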

Yeah, that's right.

Never mind, I was kind of excited there for a moment.

Bob, did you have anything else that you noticed?

Well, yeah, not that I noticed about this, but forward thinking a bit.

But first, does the word formation verb actually help you with this?

Because if you ran your line of J through word formation, it's going to box it into the different constructs, right?

You can't use that because you get errors from word formation from some strings.

Yeah, so if you've got an open quote, for example, that's not going to… doesn't like that.

Yeah, don't go to that.

Don't, don't.

But I do, I mean, I catch that and I deal with it.

I do in fact use the word formation verb.

That's exactly what I do.

I didn't write that myself.

It was so easy to use it with just a little precaution that that's what I did.

But yeah, not everything is syntactically correct, but so far it seems to be pretty easy to catch that and get around it.
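
The "use the built-in tokenizer, but catch the failures" pattern can be illustrated with a Python analogue. This is not the J word formation verb: `shlex.split` is used here only because it also rejects an unclosed quote, and the fallback to plain whitespace splitting is an assumed recovery strategy, not a description of what Live Search does.

```python
import shlex


def safe_tokens(line: str) -> list[str]:
    """Tokenize with a strict tokenizer, falling back when the line is malformed."""
    try:
        return shlex.split(line)    # raises ValueError on an unclosed quote
    except ValueError:
        return line.split()         # crude fallback: split on whitespace only


print(safe_tokens("a 'b c' d"))    # well-formed: the quoted part stays together
print(safe_tokens("a 'b c"))       # open quote: handled by the fallback
```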

- So my forward looking thing, and I picked this up after listening to Rob Pike talk about it.

He was saying Ken Thompson's superpower is that when he codes anything (how he does it, Rob doesn't know), whenever he's putting something in, he manages to think two steps ahead of all the problems anybody's gonna find.

So in other words, as he codes something, he's already thinking about problems people don't know exist yet, and he's cleaning them up.

And I thought, well, I can't do that.

But what I can do is when we get to a certain point where we're developing something, we can look ahead and see if we can spot any problems at this point.

And the first step I was thinking in this: we've got a number of different tools.

We've got the overall app, which is the browser.

We've got live search, and we've got the playground now all running pretty close to up and going.

I think the next step is to think carefully how we put those together to make the Wiki more effective.

And to that, I'll also add categorization because that's the next thing I'm gonna be focusing on is doing a recategorization.

I should be doing that to work in concert with these other tools.

- Okay, could you peel back one layer of detail on that vision?

'Cause I'm having a little trouble concretizing it.

- Okay.

My vision in the end is somebody's gonna sit down on this, say six months from now and say, what a cool wiki, look what I can do with it.

And I want to put those pieces together in a way that is less likely that it's immediately gonna blow up on them or they're gonna fire off into an area that isn't useful.

And it's the same thing we're talking about, I was talking about with Ken Thompson.

It's very hard to say what that's going to be, but if we start thinking about that, it'll start affecting how these tools work together.

So maybe it starts out with, how would you see somebody using these together?

I think probably you've got a more complete picture of that than anybody.

And that frightens me.

Leaving that aside, you know what LiveSearch does.

You know what the viewer does without LiveSearch.

And you're a curator.

You're the Wiki's curator, main curator.

So I've felt and had mixed feelings about this, that you are in some ways the main user of this thing, and I've tried to be as responsive as I can to your concerns and interests and so on, but you are atypical, and I recognize that you would like not to be atypical.

You would like everybody to be categorizing and curating and editing and so on.

I'd like people who are engaged to do that.

I don't want that for everybody.

Yeah.

Okay, sure.

But if everybody were engaged, you'd be perfectly happy to have them all beavering away.

Yeah, that would be wonderful.

At the wiki.

Right, exactly.

So, the question I guess is, and it's a question I have asked myself repeatedly over the last whatever, since April: what is it that will help you do what it is that you do? And I don't have a full picture of what you're doing, so I don't know.

I find MediaWiki to be rather awkward, what little I know about it.

There may be ways to streamline to make it easier to use.

I don't know.

I suggest that as a possibility.

Just ironically, the streamlining is what makes it awkward.

It's streamlined for certain tasks; what we're trying to do is a little bit different, and that's been streamlined away.

And I guess, Bob, what I'm saying is I think it has to come from you, the vision.

Yeah, I would prefer that it doesn't only come from me, because I think that is a problem.

Yeah, I could design that.

I could say, "This is how it should work," and probably immediately eliminate 90% of the people that would want to use it.

That's really not a good idea.

It has to be wider than what my vision is.

But my vision would be that, and what I see developing is you've got the playground, which is a great way to experience or interact with the language.

And so I see that being brought in in a very positive way with NuVoc.

And I believe, if we have the right extensions, we can actually set it up so that you would be able to launch Playground from the MediaWiki.

You can make, I guess it's a white list where you can launch certain things and control those things.

In which case you could click on a button and it would actually launch Playground with that text in it, that script in it ready to go.

And to me as a pop-up, that would be very powerful.

Time taken.

- Time out, okay.

So I've put JPlayground in because I suddenly realized I could, it's actually rather pointless to have it in the add-on because if you're using the add-on, you have a perfectly usable J terminal at your disposal.

The JPlayground is only interesting when that's not true.

So I was actually actively considering taking it out again, frankly, in some embarrassment, because I couldn't think of what the point was of having it there.

Well, and you could take it out, because it doesn't affect what I'm talking about in this case.

Okay.

But you did mention in this context-

Because the buttons would sit in NuVoc, which is where you would pull it out.

So it doesn't need to be on the sidebar.

OK, good.

And I would leave it on the sidebar of the actual wiki because I think it's just useful to have it available wherever you come in.

Yeah, absolutely.

But no, you're right.

I don't think it needs to be in the web viewer.

And then the same thing in a way, now that LiveSearch is developing and getting its legs, in so many ways it seems to take two jumps beyond categorization because you've got access to the information based on your search.

And I think that's a really good thing.

What I want to try and do is make it interact more effectively with the curation that exists, because live search isn't tied to the browser, right, the J Wiki Browser.

It could be independent of it.

OK, give me a use case where it's independent.

I'm not disagreeing with you, but there is a nuance.

So give me a use case.

The use case is you don't have JQT installed.

Like the step I'm thinking ahead is somebody who comes to the language and says, I want to do this.

But you know what?

I don't want to download anything.

I just want to go in and do stuff with it.

Now, there'll be limitations on what they can do.

But with the J Playground and a browser, and if LiveSearch was on a browser, you have all these things available to you.

You don't have to go JQT, you don't have to do any of that stuff.

You're making that barrier of entry as low as it possibly could be.

>>Kurt: That definitely was a barrier for Devon McCormick.

He was disappointed that he couldn't use the add-on with the console interface.

Is that common, do you think?

Do we have any intuition about that, or even better, data?

There are some people who prefer the console to JQT, because they're using their own system.

Like, you might be running it inside of Emacs.

Emacs, yeah.

>> Yeah.

>> You can't. Emacs only knows how to deal with terminal applications; it doesn't know how to deal with--

>> Of course.

>> --containing a graphic.

It could at some point be extended to deal with something that does graphics, but it would be a major overhaul of the system.

It would also have very limited functionality, because a lot of what it's doing is being able to work with the text abstraction, and you don't get that when you have a graphical interface.

The jump I see happening that takes you beyond the question of, are they using JConsole or not, is if you do it all in a web browser, if they really want to get into it, they can go back and do JConsole all day, but they could also just do it with Playground and the live search and a window open on the wiki.

And if, you know, again, I'm not taking this as everybody would want to do it that way, and there will be limitations as to what the playground can do.

But as an entry point, as an easy way into using a lot of this language, and might be doing, you know, optimistically 80% of what anybody who comes from outside would want to do with it.

It'll do that.

And if they wanted to get deeper, they're going to have to download the language and work in an environment that allows them to do that.

It might be console, it might be JQT, but we're putting this little barrier up that keeps on top of the fact that it's an array language and the fact that some people have trouble understanding, you know, the ways that the language syntax and the semantics and such go.

We're just putting that up, yeah.

I don't think live search is, no, it's not live search.

I don't think that the full corpus is of interest or value to a beginner.

- And that's actually where categorization comes in.

Absolutely, I would say.

- I think that the collection of documentation that a new user is exposed to should be quite limited and carefully curated and edited and so on.

- Absolutely agree.

- Letting them stick their head into that 120,000 document corpus strikes me as counterproductive to their initial experience.

- Which again, and we should wrap up 'cause we're running over time and it's one in the morning your time, Ed.

But what that makes me think is all that reference area, is that actually just live search?

That reference category?

- Oh, you just pitch the reference category, you just pitch all the reference entirely and just say we have a live search interface.

- We have a live search, yeah.

- Yeah.

- I don't know, I'm just gonna put it out there.

'Cause that's the sort of thing I'm thinking: we're building this, are we looking ahead?

Are there things that we, if we build it this way, you know what, live search is a reference.

Now what you focus on is newcomers and developers because those are categories of curated information.

- One question I would have there is you don't want to yank the rug out from under people who already have access to existing pages.

So you might be thinking of, I don't have to build this because this other thing will replace it.

But there's a temptation with, especially with computer folk to say, you know, that we want to consolidate and have one reference that's the reliable reference.

And you do that with documentation or anything on the web really, and all of a sudden, you know, the bookmarks people are using go dead because you've replaced it.

- No, what I'm, no, okay.

So this is the time to think about that because if I took reference out now, nobody should be using that.

It isn't live essentially, it's sitting on the wiki, but it's not live.

- Right.

- And if I took it out now, it would have, might have some impact, it would be a minimal impact.

And we're not changing any of the pages that are attached to it.

All those bookmarks would be exactly the same.

But we would say live search is your reference manual.

You can go through this, you can look at it, and you can pull the information out, and you decide where to go looking for it.

I don't know.

Yeah, I'm just thinking, I'm just putting that out there 'cause this is the time to be thinking about that and not, as you say, six months from now, when we've got references and they're going, you know what?

We don't need reference.

It's a lot of work to keep up and really live search means we don't need it.

And now people have links to pages on it.

I guess you would just leave it up.

That's what you would do, but you know, yeah.

I don't know, that was my forward thinking bit.

But thanks so much for your time.

I think this was super productive and I'm really looking forward to where it goes 'cause I think there's a lot of really cool stuff that a lot of people are gonna be very surprised by in a positive way.

- Well, good, I'm really glad to hear that.

And I'll try to have us in a consistent state a week from now, so that maybe we can stand on more solid ground with respect to the implementation.

And thank you both very much for all of your thoughts on that.

I really appreciate it.

It'll give me plenty to do over the next seven days.

- Well, and thank you so much for all your work.

It's, that's-

- Oh no, this is about keeping me from going off the deep end.

I need this distraction desperately.

So I really appreciate the opportunity.

- Thank you for your neurosis then.

(laughs)

- Always a pleasure.

Take care, both of you.

I'll see you next week.

Bye-bye.

Goodbye.

Bob, before you go, one other thing.

In the sidebar on the left side, we have the Recent Changes link, and there's another link that I frequently use that I keep wishing was right there, which is the Special New Pages link.

Okay.