Dear Envato Staff… Don’t you think you owe authors some answers?
Last week the entire website went down, but we still don’t know exactly what happened. There is a nice topic with some information to calm people down, but it’s still unclear what caused the outage. Multiple disks failing in succession on a database server? A bug in the database software that you use? A security breach by some bad guys? Can we be sure everything is under control?
I wouldn’t complain about this if everything was fine now. However, I noticed significant page load slowdowns during the day (reaching 5-10 seconds at peaks). People complain (so do I) about the API being completely down. And guess what? I can’t find anything on these issues anywhere.
The thing is that the relationship between you and us (authors) is based on trust. The other thing is that many of us are techies who love facts and hate vagueness. I am 100% sure you know everything about us and need no further education here. But you must act on this – be crystal clear about what is happening behind the scenes.
I love you guys, just wanted to share some advice that you would be wise to take into consideration.
Cheers, Paul S.
I think everything we needed to know was disclosed in that thread. Sometimes it’s better not to disclose every little detail of what’s happening behind the scenes as it could be a security risk.
The only possible security risk that I am aware of would be sharing the details of an exploit (one that could potentially have been used against Envato servers). Which I am not asking for. And I am pretty sure nothing like that happened. In all other situations “something happened and you don’t deserve to know exactly” is not a good answer.
And as I said – it’s not about that single case. It was handled pretty well (except for not disclosing what happened in the first place). It’s about the lack of information when ongoing problems occur (API malfunction, temporary page load slowdowns).
We usually hold a post-incident-review session after events like this where we do proper root cause analysis and figure out recommendations on how to prevent it reoccurring. We haven’t done that yet, as the team has been busy cleaning up lingering issues, and getting everything back in order.
I’m sure that when we’ve regrouped and done some proper analysis, there will be more comms about it.
Short version of what happened:
- load spike on marketplace sites – possible DDOS attack
- primary database got overloaded
- automated failover to secondary at the networking layer. DB IP traffic from app servers sent to secondary DB – (THIS SHOULD NOT HAVE HAPPENED)
- secondary database was configured to allow writes – (THIS SHOULD NOT HAVE HAPPENED)
- writes went to secondary DB for ~ 45 mins causing a huge data headache to be unstitched by the dev team
- writes went back to primary DB
- all data safe and nothing lost, but some data in the wrong place
- dev team does analysis, and moves data around to get everything back as it should be
- DDOS protection layer enabled to prevent further problems and keep the site from being taken down by attackers. However, DDOS protection can cause issues with API requests, and can slow down page loads
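To make the two "should not have happened" steps a bit more concrete, here is a toy sketch in Python. It is not how the actual servers are set up: `DatabaseNode`, `ReadOnlyError` and `stray_writes` are made-up names, and the "databases" are just in-memory dicts. It only shows that a secondary configured as read-only would have refused the misrouted writes, while a writable one quietly accepts them and ends up with data the primary never saw.

```python
class ReadOnlyError(Exception):
    """Raised when a write reaches a node that is configured read-only."""


class DatabaseNode:
    """Hypothetical stand-in for a database server; rows live in a dict."""

    def __init__(self, name, read_only=False):
        self.name = name
        self.read_only = read_only
        self.rows = {}

    def write(self, key, value):
        # A read-only node refuses writes instead of silently accepting them.
        if self.read_only:
            raise ReadOnlyError(f"{self.name} is read-only; refusing write to {key!r}")
        self.rows[key] = value


def stray_writes(primary, secondary):
    """Rows on the secondary that are missing or different on the primary."""
    return {k: v for k, v in secondary.rows.items() if primary.rows.get(k) != v}


primary = DatabaseNode("primary")
misconfigured = DatabaseNode("secondary", read_only=False)
guarded = DatabaseNode("secondary", read_only=True)

primary.write("order:1", "paid")

# Traffic is misrouted to a writable secondary: the write succeeds,
# so the data is safe but sits in the wrong place.
misconfigured.write("order:2", "paid")
diverged = stray_writes(primary, misconfigured)

# The same misrouted write against a read-only secondary fails loudly.
try:
    guarded.write("order:2", "paid")
    rejected = False
except ReadOnlyError:
    rejected = True
```

In this sketch `diverged` holds the rows that would later have to be moved back to the primary by hand, which is roughly the "data headache" described above; with the guarded node nothing diverges, at the cost of failed writes during the failover window.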
note – I’m just commenting as an observer, and not officially. I haven’t really been working closely on this myself (aside from the initial recovery). This is just info that has been discussed already.