Here’s our update on that http://notes.envato.com/general/envato-response-to-the-heartbleed-ssl-vulnerability/
The server we use for account login ( https://account.envato.com ) was vulnerable, but we patched it, and reissued the SSL private keys and certificate. It’s all good now.
It’s still good practice for users to change their passwords though.
This is a known bug we’ll be fixing soon.
The search index doesn’t get updated when item attachments change at present.
In the meantime, you can fix these items yourself. You just need to make another change to your item (like adding a blank line at the end of your description). That’ll prompt the search engine to detect that the item has changed and reindex it properly.
We usually hold a post-incident-review session after events like this where we do proper root cause analysis and figure out recommendations on how to prevent it reoccurring. We haven’t done that yet, as the team has been busy cleaning up lingering issues, and getting everything back in order.
I’m sure that when we’ve regrouped and done some proper analysis, there will be more comms about it.
Short version of what happened:
- load spike on marketplace sites – possible DDOS attack
- primary database got overloaded
- automated failover to secondary at the networking layer. DB IP traffic from app servers sent to secondary DB – (THIS SHOULD NOT HAVE HAPPENED)
- secondary database was configured to allow writes – (THIS SHOULD NOT HAVE HAPPENED)
- writes went to secondary DB for ~ 45 mins causing a huge data headache to be unstitched by the dev team
- writes went back to primary DB
- all data safe and nothing lost, but some data in the wrong place
- dev team does analysis, and moves data around to get everything back as it should be
- DDOS protection layer enabled to prevent further problems and keep the site from being taken down by attackers. However, DDOS protection can cause issues with API requests, and can cause page load performance decreases
note – I’m just commenting as an observer, and not officially. I haven’t really been working closely on this myself (aside from the initial recovery). This is just info that has been discussed already.
Hey all, have been distracted for a few days with other things going on around the place.
No backend changes to search have been released since last update. Some visual UI tweaks have been worked on.
Main thing happening in search land is that we’ve built a proper system for testing relevancy changes, so we have better metrics about what changes, and how, prior to releasing. This means that we can replay the top 1000 queries for each site, get the order of items for both the new code and what is currently live, and compare the results to show how divergent they are. This will help prevent facepalm-level stupid bugs with search going live.
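To give a rough idea of what "compare the results to show how divergent they are" could look like, here is a minimal sketch. It is not our actual test harness: the function names, the two metrics (top-k overlap and average rank shift) and the `search_live` / `search_candidate` callables are all assumptions made up for illustration.

```python
def overlap_at_k(live, candidate, k=10):
    """Fraction of the live top-k item ids that also appear in the candidate top-k."""
    live_top, cand_top = live[:k], candidate[:k]
    if not live_top:
        return 1.0
    return len(set(live_top) & set(cand_top)) / len(live_top)


def mean_rank_shift(live, candidate, k=10):
    """Average absolute position change of the live top-k items.

    An item missing from the candidate list is treated as if it had
    dropped to position len(candidate).
    """
    live_top = live[:k]
    if not live_top:
        return 0.0
    cand_pos = {item: i for i, item in enumerate(candidate)}
    shifts = [abs(i - cand_pos.get(item, len(candidate)))
              for i, item in enumerate(live_top)]
    return sum(shifts) / len(shifts)


def compare_queries(queries, search_live, search_candidate, k=10):
    """Replay each query against both rankers (hypothetical callables that
    return an ordered list of item ids) and list the most divergent first."""
    report = []
    for q in queries:
        live, cand = search_live(q), search_candidate(q)
        report.append((q, overlap_at_k(live, cand, k), mean_rank_shift(live, cand, k)))
    return sorted(report, key=lambda row: row[1])  # lowest overlap first
```

Sorting by lowest overlap puts the queries whose first page changed the most at the top of the report, which is where a relevancy bug would most likely show up.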
We also had security evict Justin Bieber, as he was still hanging around the office, and wouldn’t leave. Also turns out his code is crap, and we wouldn’t want to have deployed it anyway.
But nevertheless, as long as the description is used at all in the search, a good strategy for the author is to filter out all terms that are not likely to be searched upon and maximize the density of good search terms for his/her item (more or less the way one goes about thinking up tags, actually)
The danger is though, that a lot of information that is very important to buyers is not search friendly. Not only is it not search friendly, it dilutes the density and the search engine score of the search friendly terms.
Look at my description on this item http://videohive.net/item/sketch/91631
Would you concede that it would rank higher on the terms “sketch”, “preset”, “drawn”, “cool”, “animation” and “look” if I were to remove everything except the first sentence? What I should do is encapsulate the latter part of the description in a gif image and embed that in the description, leaving my searchable keywords as a greater constituent element of the overall description.
You see how search is bending the purpose of the description? As long as this is the case, authors will be writing descriptions that are primarily about SEO and not about giving the customer the most useful and relevant information.
My suggestion would be to stop searching the description altogether and add some more keyword options. These are pure, targeted search tags. Otherwise, as I see it, you’re designing misuse into the system.
You would get a higher score on the description, and only on those terms, if you had only one sentence, but there are other fields that are used too. So while in theory that would work, it wouldn’t be a good idea because, as you say, there is useful info for buyers there. Description already has a pretty low boost, and we’re looking at turning it down further.
Just to add to the ‘items that have taken a hit’ list: we’ve seen a noticeable drop on our end. We have normally had a few sales a day ever since we published our first item here (a year ago), but now it’s been a week with no sales at all. This definitely has to do with something search-related.
Category: Motion Graphics
October: 52 sales
November: 60 sales
December: 58 sales
January: 36 sales
February: 39 sales
MARCH: ???
Is this some kind of help to fix the search engine??? Or is this a pure coincidence??? Sorry for bothering. Regards.
Hope that information is of use and you can help me diagnose / remedy my steady drop in sales.
- 11 days since I sold my track Digital Distortion. This is a collaboration with my friend nemanja_reMAKE who has also complained about a 60% drop in sales
- It’s been 8 days since I’ve had a sale according to my dashboard, the longest stretch for almost two years
- My sales have gone from 53 Sales (Nov 2013) to 39 Sales (Dec 2013) to 28 Sales (Jan 2014) to 25 Sales (Feb 2014)
Sorry, but it’s very difficult to say in all these cases. Can’t see anything obvious, except that if items are in a very competitive niche, then they can be prone to being affected by a LOT of factors (search and non-search).
One of the things we want to add is better tracking of what search terms led to click-throughs to an item, and what search terms led to sales. Partly for metrics so we can understand what is happening, but also to feed into calculating relevance better. And possibly exposing this to authors so they can see what generates traffic, and let them optimise for that.
Just a question: how long does it take for the search to take your description into account after you’ve edited it?
Usually within a few seconds. Could occasionally be up to a few minutes if the indexing server is under heavy load from doing other background tasks.
I tried to find “piano relaxing” on AudioJungle. Page 2, 3, 4 … are songs by the same author and have no relation to search. O_o http://audiojungle.net/search?page=2&term=piano+relaxing
Videohive, the results of “Wedding invitation”...
These files are on the first page:
And my template, that has tags “wedding” and “invitation”, and has 237 sales (the third of sales)... only in the middle of second page?
Where is the logic?
Hmm, that doesn’t look right. Will take a look at it.
Those ones show up because the item author had the search terms in the item description as links to the author’s other items (with that name). Link text is a known problem we need to clean up, probably with some smarter way of filtering out matches on it.
It’s rather confusing to see the same authors coming up over and over again in searches but somehow never seeing my own items. Should we believe that it’s possible to work an item up into the search results even if it’s not showing up in any of the 30 pages of results? I’d like to think this is not going to be a wasted effort otherwise these items become essentially worthless if they can’t be found.
It depends on the search term. If it’s for something very broad, then it’s tough to get to the top – not everyone can be in #1 spot sorry
I have a suggestion for where a search field is needed: the Downloads page. I have over 107 products purchased on Envato’s marketplaces, but it is hard to find a particular purchase without having to go page by page.
Definitely on our list of things to work on.
Hi, I just noticed some search-related behaviour that I think might be confusing for buyers:
When I search for any term, the system first shows the “Best Match” results.
When I now move from the first page of the “Best Match” results to any following page – and then sort by sales, I don’t get redirected to the FIRST page of the “Sort by Sales” results. Instead I stay on the page number that I had been on…
How does this make sense regarding usability? When I search through several pages of “Best Match” results and don’t find what I want and therefore want to resort the results for my search term, I’d want to start from page one…!
Am I missing something here? Has it always been like this?
Yup, that’s a bug. I think we’ve already got that noted, but will make sure we do.
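For what it’s worth, the expected behaviour is easy to describe: changing the sort order should drop the page number so you land on page one of the new ordering. A minimal sketch of that, assuming the sort order lives in a query parameter called `sort` (that name is a guess for illustration; `page` and `term` are the ones visible in the search URLs in this thread):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_sort(url, sort):
    """Return the search URL re-sorted, with any page number removed
    so the new ordering starts from page one."""
    parts = urlsplit(url)
    # Keep every parameter except the old page number and old sort order.
    params = [(k, v) for k, v in parse_qsl(parts.query)
              if k not in ("page", "sort")]
    params.append(("sort", sort))  # "sort" is a hypothetical parameter name
    return urlunsplit(parts._replace(query=urlencode(params)))
```

This is only a sketch of the idea, not how the site actually builds its links.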
Another search anomaly?
I’ve been testing some of my items in search and found another problem, presumably to do with the synonyms again. The reason I’m writing this is that it’s kind of frustrating to have a retro file in the retro category, with retro in the tags and description and 850 sales, that only lands on page 4, right next to an older, lower-selling file with no mention of “retro” at all. Is there something I’m missing?
The file in question is Delicious. It’s a retro file. It’s in the category After Effects Project Files / Video Displays / Retro. It has “retro” in its tags and it has “retro” in the description. It has 850 or so sales. It shows up halfway down page 4.
I wonder if you’re searching the category somehow as a string, so Motion Graphics / Backgrounds / Retro ranks much higher than After Effects Project Files / Video Displays / Retro and concluding that the first has a higher density of the word “retro” because the rest of the category string is shorter?
Also, is a combination of the over-favouring of the category with synonym search problems and over-boosted newness throwing new results far too high up the charts? That would explain loads of new files ranking high in the “retro” charts that have little or no mention of retro at all… neither in the tags, nor in the title, nor in the description.
Is there a kind of double counting going on? In the Indie Leak Transitions file, there’s the word “retro” and the word “vintage” amongst the tags. Here, the user has himself used a synonym in the tags. Presumably, then, that’s two “retros” out of 15 in the tags, or maybe each is being counted twice – hence the big boost. If I created a file that I definitively wanted to appear high up in the retro category, could I just get my thesaurus out and pop in a handful of “retro” synonyms?
Or is there something else going on? Are you already managing to log successful user searches from the past and search on those as well, which means some files are getting boosted for reasons that aren’t immediately apparent to the user?
By the way, is the sales boosting fixed for the other marketplaces yet, or are we still working with Themeforest figures? The “retro” search is throwing up pretty much entirely new files on page 1. I do understand the need to boost new files, and I’m cool with the idea that this will displace other files. It’s good for me too when I release a new file. However, it currently doesn’t seem right that a general Best Match should feature almost exclusively new files at the top of the charts, and thereby displace stuff that’s far more relevant in terms of tags, descriptions, sales and so on.
Great post. For each of the items you originally listed, they all come down to the same root causes:
- ALL the items around page 4 for that search have VERY close scores (like 3% difference from top of page 4 to bottom). Normally the following things I’m talking about wouldn’t have such a difference, but when scores are tight, little things matter
- Sales boosts on videohive aren’t fixed yet, and don’t have as much effect as they should
- Score for category depends on number of items in the category – smaller categories get a higher boost (this is how TF-IDF works). That might not be the desired behaviour though, and might need some rethinking for categories at least. At any rate, agreed that category boost is too high
- synonyms don’t have an effect on the term “retro” at all, so that’s not a factor here
- being featured doesn’t affect boosting (not yet, but it probably will in the future)
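To illustrate the category point above: in TF-IDF the "IDF" part gives a term more weight the fewer documents contain it, so a match on a small category outscores a match on a big one. A minimal sketch using a common smoothed form of IDF (the exact formula the search engine applies may differ in detail, and the category sizes below are invented for illustration):

```python
import math


def idf(total_items, items_matching):
    """Inverse document frequency: rarer matches get a higher weight.

    A common smoothed form, used here only as an illustration.
    """
    return 1.0 + math.log(total_items / (1 + items_matching))


# Hypothetical numbers: one site with a small and a large "Retro" category.
TOTAL_ITEMS = 100_000
small_category_boost = idf(TOTAL_ITEMS, 300)     # few items in the category
large_category_boost = idf(TOTAL_ITEMS, 5_000)   # many items in the category
```

With these made-up sizes the small category gets the larger weight, which is why two "retro" categories can score differently for the same word.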
The balance of search still needs a lot of calibrating.
Yes. Yes it does. That’s what we’re doing
Thought: How about making a relevance based only on tags / title / category / description or whatever… (okay… maybe with a tiny bit of newness built in) and then organize these into finite relevance strata. Assuming the relevance were a floating point number from 1-20, you might have 20 strata based on rounding down the floating point number to an integer. Then within each of these 20 strata, you could organize the files newest first.
Effectively it’s a search based on relevance, then on newness.
That way, a file only lands at the top if it’s new AND relevant.
Sorry… once something gets into my head, I can’t stop thinking about it. You probably thought of all of this.
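The strata idea above can be sketched in a few lines. This is only an illustration of the proposal as described (round the relevance down to an integer, then newest first within each stratum); the tuple layout and the sample items are made up.

```python
import math
from datetime import date


def stratified_order(items):
    """Order items by relevance stratum first, then newest first.

    items: list of (item_id, relevance_score, published_date) tuples.
    The stratum is the relevance rounded down to an integer.
    """
    return sorted(items,
                  key=lambda it: (math.floor(it[1]), it[2]),
                  reverse=True)


# Hypothetical items for illustration.
sample = [
    ("old-but-strong", 12.9, date(2013, 1, 1)),
    ("new-and-ok", 12.1, date(2014, 3, 1)),
    ("best-match", 13.0, date(2012, 1, 1)),
]
ordered_ids = [item_id for item_id, _, _ in stratified_order(sample)]
```

Here the newer item in stratum 12 beats the older one even though its raw score is slightly lower, but neither can jump above the item in stratum 13.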
Yup. That’s something similar to what we want to do. At the moment a lot of irrelevant items are floating up and being displayed, when it would be really better just to filter them out to begin with.
Relevance is tricky, as it’s not a fixed value range, but depends on the query and the items returned. It could be ~ 100 for very strong matches, or ~ 0.1 for queries with very few good matches. So we’re thinking of running the query in two phases – 1 to get an initial summary of score spread and find a sensible cut-off point, then 2 to run the query with a min score filter set to get rid of the cruft. Needs a bit of engineering to implement though.
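Roughly, the two-phase idea looks like the sketch below. It is an illustration only: the cut-off rule (a fixed fraction of the top score), the `search` callable and the tiny in-memory index are all assumptions, not what we have built. In elasticsearch the second phase would map onto the `min_score` option of a search request.

```python
def pick_cutoff(scores, fraction_of_top=0.2):
    """Choose a cut-off relative to the best score, since absolute scores
    vary wildly between queries. The 20% default is an arbitrary guess."""
    if not scores:
        return 0.0
    return max(scores) * fraction_of_top


def two_phase_search(search, query, fraction_of_top=0.2):
    """Phase 1: look at the score spread. Phase 2: rerun with a minimum score.

    `search(query, min_score)` is a hypothetical callable returning
    (item_id, score) pairs with score >= min_score, best first.
    """
    scores = [score for _, score in search(query, 0.0)]
    cutoff = pick_cutoff(scores, fraction_of_top)
    return search(query, cutoff)


# Tiny in-memory stand-in for a search backend, for illustration only.
FAKE_INDEX = {"retro": [("a", 100.0), ("b", 60.0), ("c", 5.0), ("d", 0.4)]}


def fake_search(query, min_score):
    return [(i, s) for i, s in FAKE_INDEX.get(query, []) if s >= min_score]
```

Because the cut-off is relative, the same rule works whether the best match scores ~100 or ~0.1.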
I wrote this a while ago in another thread, and would love to bring this to your (or Envatos) attention again.
As a more or less “huge” company in terms of the amount of electronic data, you should really consider having a mirrored system for tests and development.
I have worked for many years in an IT (SAP) development department, and it’s crucial to work on business-critical changes in a test environment before going live with them. Envato should understand that they are responsible for many individuals here who live on the income from this marketplace. So it’s not something that should be taken lightly with a “see how it goes” attitude.
Don’t get me wrong, I don’t want to offend anyone here, especially not the hard-working developers at Envato, but you should explain to your “Bosses” that it’s absolutely wrong to make changes and run tests on a live system before everything has been tested and verified to work 100%.
Thanks for reading and sorry for my English.
Yup. See note at the top of the post about this. We have very strong automated testing in general, but tests for search relevancy have been a big blind spot for us. Will be better going forward now we have new testing infrastructure in place to handle this.
Agreed, generally you will want to push these types of changes out to a percentage of users, then aggregate the results and make decisions based on the mined data. Then after that, if you need more changes, you go through the same method until you get things just perfect, and then you push out to the entire community. But hey, what do we know? We just do this for a living, right?
That is what we do already.
OK now I am back with some sad news. Search is a mess again… Check this out… the results of
“Travel Opener” TOP RESULTS BEST MATCH: http://videohive.net/search?utf8=%E2%9C%93&term=travel+opener
“Cinema Opener” TOP RESULTS BEST MATCH: http://videohive.net/search?utf8=%E2%9C%93&term=cinema+opener
Well, I can see that there is a certain system in the search: travel = holiday??? That’s a mistake, or at least the items with the word “travel” should be ranked higher than the items with JUST the word “holiday”.
Ok, seems that the category is doing an AND match, so “travel opener” and “cinema opener” don’t match items based on the “opener” category – if they did, the results would make much more sense. Will make sure that gets looked at, as I totally agree that it’s kind of crap with those results.
Synonyms do get a lower boost than a match on the original term. The problem with that query is that because it’s not finding the good results in the “openers” category, the other results are lower quality, and things like synonyms start to have more effect at the tail end.
I’ve clicked the microlancer banner a few times by accident thinking it was gonna take me to the next page. Would be nice to check out the event tracking for the microlancer banner ad above the pagination.
Yeah, that’s annoying. I do that too. Will see about getting it changed. IMHO, it should go below the pagination. But that’s just my personal opinion though.
How does it work?
I used two keywords, “animal” and “animals”, and got different results.
Why are the search results so different? Why does “animal” =/= “animals”? I think search results for “animal” should include both.
Tag search and tag faceting just filter on the tag with an exact term match, without stemming it like we do for name/description etc. So “animal” and “animals” show up as 2 different tags. That doesn’t make much sense, so it’s something for us to think about how to handle.
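To show the difference between exact tag matching and matching on a normalised form, here is a deliberately crude sketch. The plural stripping below is a stand-in for a real stemmer (it only removes a trailing "s", so it will get plenty of words wrong), and the function names and sample items are made up for illustration.

```python
def normalize_tag(tag):
    """Very naive normalisation: lowercase and strip a trailing plural 's'.

    A real stemmer handles far more cases; this only illustrates the idea.
    """
    t = tag.strip().lower()
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        return t[:-1]
    return t


def items_with_tag(items, wanted):
    """Return ids of items having a tag that matches `wanted` after
    normalisation. items: list of (item_id, [tags]) pairs."""
    key = normalize_tag(wanted)
    return [item_id for item_id, tags in items
            if key in {normalize_tag(t) for t in tags}]


# Hypothetical items for illustration.
sample_items = [
    (1, ["animal", "nature"]),
    (2, ["Animals", "zoo"]),
    (3, ["glass"]),
]
```

With this, filtering on either “animal” or “animals” returns items 1 and 2, whereas an exact term match treats them as two separate tags.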
Why is a search function a problem? Why not use Lucene Search or sphinx, and everything would just be perfectly stable. With a re-indexed search every minute (if not in progress).
We’re using elasticsearch, which is built on top of Lucene. We previously used sphinx, but that has a lot of limitations in terms of what you can do with it.
Indexing or stability of the system isn’t the problem; it’s tuning search queries. Tweaking them so they make sense, are useful to the most people, return quality results, and return fair results so authors don’t get unfair treatment.
If this was an e-commerce site, and not a marketplace, this would all be a lot simpler
But that’s what keeps it interesting!