The things you can do with search

Posted on
of 7
  • Tick "comments", before unticking "title" perhaps?

  • Now I don't have the "comments" option at all. I have

    Types to display

    • Conversations (unticked)
    • Events (unticked)


    • forumId = 549 (ticked)
    • Following (unticked)
    • Title matched search terms (ticked)

    Order by

    • Relevancy (radio button unselected)
    • Recent updates (radio button selected)
  • I don't have the "comments" option at all.

    Sorry, I didn't realise you only had "bronze" membership.

  • I have a poo-brown badge on my avatar when I log in. That lets me look at 50 posts a week and I get sent an email on my birthday saying "YOUR A CUNT" in the subject line.



    The proof-reading is subbed out to Ukraine these days and we is having comms trouble.

  • Thank you, that's really helpful.

    The bug happens when the forumId is added, that slows it down. Part of the way I've implemented that must no longer be good, probably since I added the sub-forums stuff. I'll look into it.

    It is blindingly fast without the forumId:­0gr9%22+sort:date

  • Thanks - to cut through the complexity, it's basically a search within a Classifieds sub forum to find a certain item. Often the items are not called out specifically in titles, as people title their thread "January Clearout" and so on.

  • I've got to the bottom of this... it can be summed up as "counting is hard".

    The actual search takes less than 2 seconds, but counting how many search results there were takes about a minute. There are only 52 results, but the search has to be fully generated and stored as a temporary table for the count to happen... and that is apparently costly.

    I'll need to split up search so that it does the main search as one query, and does the counting as another query. I'll see if I can do it this evening.

  • I did some tuning of the search whilst I was there... I've probably saved 2ms off a search that takes 600ms. Oh well.

  • Magnificent! I do appreciate your dedication.

  • I've just confirmed that running it as one query (do the search and figure out how many results there are whilst only returning the first 25) takes 1 minute and 38 seconds.

    Running it as two queries (do the search, return 25 rows and then figure out how many results there are) takes 2.5 seconds in total.

    Definitely will change this.

    Though... there is a risk that the search will be slower on subsequent pages. i.e. it may take towards 15-20 seconds (and risk timing out on some queries that return very large result sets) to fetch page 2 or more.

    In essence I had chosen to implement a way of searching in which the cost for all pages was equal, but it's probably best to tune for making page 1 very cheap, and subsequent pages expensive.

  • Would that mean that while a user is reading page 1, the subsequent pages would be being fetched and...cached for loading?...

    Sorry I'm not massively familiar with this level of coding.

  • Ooh, that's a good idea.

    It didn't mean that.

    It meant that "page 1 is fast", "pages 2 and onwards are a little slower".

    But... hey, I could prefetch the next search page and then when you click the link the browser either has it, or the server has already cached the query... either way that would be fast.


    So my changes this evening (when I get home) will be:

    1. Split the search query into 2 queries, one that does the search and one that counts the results.
    2. Add a prefetch meta header to the search results page.
    3. Entirely unrelated, add a meta header to prevent sending referrer headers off-site, which improves privacy.
  • Made the major change... I'll see how it goes.

    Your original query:­9+forumId:549+sort:date

  • Cheers ^

    It worked on mobile last night but is throwing a server error for me now on both mobile and desktop...

  • OK, I have another approach... I'll try this out this weekend...

    WITH m AS (
        SELECT m.microcosm_id
          FROM microcosms m
          LEFT JOIN permissions_cache p ON p.site_id = m.site_id
                                       AND p.item_type_id = 2
                                       AND p.item_id = m.microcosm_id
                                       AND p.profile_id = 47686
          LEFT JOIN ignores_expanded i ON i.profile_id = 47686
                                      AND i.item_type_id = 2
                                      AND i.item_id = m.microcosm_id
         WHERE m.site_id = 234
           AND m.is_deleted IS NOT TRUE
           AND m.is_moderated IS NOT TRUE
           AND i.profile_id IS NULL
           AND m.microcosm_id = 549
           AND (
                   (p.can_read IS NOT NULL AND p.can_read IS TRUE)
                OR (get_effective_permissions(234, m.microcosm_id, 2, m.microcosm_id, 47686)).can_read IS TRUE
    ), sr AS (
        SELECT si.item_type_id
          FROM search_index si
              ,plainto_tsquery('mks gr9') AS query
         WHERE si.site_id = 234
           AND si.document_vector @@ query
           AND si.microcosm_id IN (SELECT microcosm_id FROM m)
    SELECT total
          ,ts_headline(document_text, query) AS highlight
          ,has_unread(item_type_id, item_id, 47686)
      FROM (
               SELECT COUNT(*) OVER() AS total
                     ,ts_rank_cd(si.document_vector, query, 8) AS rank
                 FROM sr
                      JOIN search_index si ON sr.item_type_id = si.item_type_id
                                          AND sr.item_id = si.item_id
                      JOIN flags f ON f.item_type_id = sr.item_type_id
                                  AND f.item_id = sr.item_id
                 LEFT JOIN ignores i ON i.profile_id = 47686
                                    AND i.item_type_id = f.item_type_id
                                    AND i.item_id = f.item_id
                     ,plainto_tsquery('mks gr9') AS query
                WHERE f.site_id = 234
                  AND i.profile_id IS NULL
                  AND f.microcosm_is_deleted IS NOT TRUE
                  AND f.microcosm_is_moderated IS NOT TRUE
                  AND f.parent_is_deleted IS NOT TRUE
                  AND f.parent_is_moderated IS NOT TRUE
                  AND f.item_is_deleted IS NOT TRUE
                  AND f.item_is_moderated IS NOT TRUE
                  AND (
                          -- Things that are public by default
                          COALESCE(f.parent_item_type_id, f.item_type_id) = 3
                       OR -- Things in microcosms
                          COALESCE(f.microcosm_id, f.item_id) IN (SELECT microcosm_id FROM m)
                ORDER BY f.last_modified DESC
                LIMIT 25
               OFFSET 0
           ) r

    That correctly returns 52 results, in the right forum, bearing in mind the scope of the search and any ignores that you may have in place.

    Oh, and it does it in 111 milliseconds.

  • Notes to myself beyond the above...

    UPDATE search_index si
       SET microcosm_id = f.microcosm_id
      FROM flags AS f
     WHERE si.item_type_id = 4
       AND si.microcosm_id IS NULL
       AND si.parent_item_type_id != 5
       AND si.parent_item_type_id != 3
       AND f.item_type_id = si.parent_item_type_id
       AND f.item_id = si.parent_item_id;
    CREATE INDEX ON search_index (microcosm_id ASC NULLS LAST);
    -- In the search query, does this even make sense?
    COALESCE(f.microcosm_id, f.item_id) IN (SELECT microcosm_id FROM m)

    Also update update_revisions_search_index() to set the microcosm_id when a comment revision is posted.

  • What language is that? I can only recognise python and javascript!

  • Alright... I think I have it nailed.

    I hope I do anyway.

    Let's see.

  • By jove I think it's worked!!!!

    rep rep rep rep

  • I know.

    I'm all kinds of pleased with it.

    But I'm also looking at a search query that makes my head spin... not least because what you see above is one permutation of it, and it can actually build itself in hundreds of different ways and I needed to make sure I hadn't slowed down some of those other ways.

    Searching things that have attachments is faster now too.

    Am pleased with myself. Will go to bed smug as no-one has yet reported something has broken. If I can make it through my birthday without reports of me having broken stuff, it will be a good day.

  • @Velocio, the search function I find most useful - searching for a thread within a sub-forum - seems to always give me this these days. Wassup with that?

    Server Error
    We're sorry, there was a temporary error when we tried to process your request.

    The reason was: Too many results to process, please make your search more specific.

    Click here to go back to the homepage.

  • I wanted to see if a youtube video had already been posted on here, so I tried searching by just the video ID (because the rest of the URL can take a number of forms), but it doesn't seem to work.

    Am I missing something?

  • That basically does not work, as the search is designed for language and not links.

    But in theory I could make it work, as I already store every link in a different table. Could probably add it by adding a link:8mQxum-DEH4 which would find all places this video were embedded:­

  • Yeah, that was the video. I assume it's been spammed all over LFGSS by now. I did try a google search on LFGSS though, and found nothing, but Google's caches are never fully up to date...

    Have you considered Elastic search? People tell me it's the mutt's nuts. I've been meaning to take a look at it.

  • Post a reply
    • Bold
    • Italics
    • Link
    • Image
    • List
    • Quote
    • code
    • Preview

The things you can do with search

Posted by Avatar for Velocio @Velocio