We’ve completed all the investigation and communication with affected community members. If you think there are outstanding issues with your sales, purchases or data and you have not been contacted, please submit a ticket to support via support.envato.com. Our internal review of the outage and data issues revealed that:
- we were hit by a spike in traffic, which caused a full site outage
- when we recovered from this outage, our application incorrectly connected to one of our secondary databases for a period of 30 minutes. This was due to a network misconfiguration of that secondary database, which caused it to behave as if it were the primary database
- when we realised the database was in an incorrect state, we put the site into maintenance mode, disconnected this secondary database and reconnected the application to the correct primary database
- during this 30-minute window, transactions were being written to the secondary database, which should have been configured as read-only.
- we didn’t lose any data in this process; however, we ended up with production data on two databases, which is very complicated to merge.
- we immediately began work on identifying and reconstructing critical data transactions written to the secondary database. This included making manual adjustments to users’ accounts as well as replaying sales.
- we identified all users who had performed any actions that would have written data to the secondary database and contacted them directly. For the transactions that we did not automatically reconstruct, we recommended that the user perform the action again (for example, rating or commenting on an item).
- we have rolled out immediate changes to the database network configuration, along with monitoring that will alert us to misconfigurations like these in the future.
- we plan to improve our automated testing tools so that they check for these kinds of configuration issues in a pre-production environment.
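To give a feel for the kind of check described above, here is a minimal sketch in Python. It is not our actual monitoring: the `DatabaseNode` record, the `find_misconfigurations` function and the node names are all hypothetical, and it assumes each server's write status has already been collected (for example from MySQL's `read_only` variable or PostgreSQL's `pg_is_in_recovery()`). The idea is simply to compare what each database reports against the role we expect it to have, and to alert when a secondary accepts writes or the application is pointed at anything other than the primary.

```python
from dataclasses import dataclass


@dataclass
class DatabaseNode:
    """Hypothetical snapshot of one database server's reported state."""
    name: str
    expected_role: str    # "primary" or "secondary", from our own inventory
    accepts_writes: bool  # assumed to be collected from the server itself


def find_misconfigurations(nodes, app_connected_to):
    """Return alert messages; an empty list means the topology looks healthy."""
    alerts = []
    for node in nodes:
        # A secondary that accepts writes can silently act as a second primary.
        if node.expected_role == "secondary" and node.accepts_writes:
            alerts.append(f"{node.name}: secondary is accepting writes")
        # A primary that refuses writes means transactions cannot be committed.
        if node.expected_role == "primary" and not node.accepts_writes:
            alerts.append(f"{node.name}: primary is read-only")
    writable = [n for n in nodes if n.accepts_writes]
    if len(writable) != 1:
        alerts.append(f"expected exactly 1 writable node, found {len(writable)}")
    primaries = [n.name for n in nodes if n.expected_role == "primary"]
    # The application must be connected to the node we expect to be primary.
    if app_connected_to not in primaries:
        alerts.append(
            f"application is connected to {app_connected_to}, not the primary"
        )
    return alerts


if __name__ == "__main__":
    nodes = [
        DatabaseNode("db-primary", "primary", True),
        DatabaseNode("db-replica-1", "secondary", True),  # misconfigured replica
    ]
    for alert in find_misconfigurations(nodes, app_connected_to="db-replica-1"):
        print("ALERT:", alert)
```

Run against a situation like the one we had (a writable replica with the application connected to it), this raises three alerts; with the replica read-only and the application on the primary, it returns an empty list. A check like this can run both as a production monitor and as a pre-production test.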
Once again, thank you for bearing with us as we got to the bottom of this incident. We’ll be working hard to make sure this doesn’t happen again.
Since my update late last week, we’ve continued to keep in touch with individuals who have been affected. We’ve also been working hard to identify any additional data inconsistencies and rectify them. In the meantime, we’ve undertaken a detailed technical review of the incident and have started to put its recommendations in place.
The dev team has finished identifying all users affected by the data inconsistencies that occurred between 6:30 AM and 7:00 AM AEST. We’ve sent direct communication to those users and have restored critical data. Again, I apologise for the inconvenience. We’ve made a number of immediate changes to our infrastructure configuration to prevent this happening again and will be undertaking a major review of the incident.
We’ve started the data restoration process and have contacted individuals who were affected. We expect the first updates to come through in the next half hour.
We’ve still got a number of small data inconsistencies that we’ll continue to work on. I’ll keep you posted on this thread, and again, I appreciate your patience.
I’m John Viner, the Software Development Manager at Envato. Thanks so much for your patience during the two periods of downtime today, and again, we’re very sorry for any inconvenience this has caused.
Our initial investigation has indicated that between 6:30 AM and 7:00 AM AEST, after the site came back up from the initial downtime, some data was written to one of our non-primary databases instead of the primary database. This explains why some information, such as a small number of sales, appears to be missing. Nothing is actually missing, however: we have all the data. We are now working on merging this data back into our primary database so everything aligns correctly.
We’ve got the whole development team focused on this issue and I’ll continue to keep you informed on this thread as we make progress. Thank you again for your continued patience.
UPDATE: We are still working on rectifying issues with parts of our search infrastructure. In addition to the features Carmen mentioned, you may notice that some of the top-level menu items are affected as well. We are working hard to fix the problems and will keep you posted.
Thanks again for your patience.
We’ll be writing regularly about the technical decisions we make, successes and failures, and about all the interesting technical challenges we encounter. If you’re interested in what goes on behind the scenes, head on over and check it out at webuild.envato.com. You’ll also get to see our pretty faces.
Hey all, thanks for your friendly responses and welcomes. I’m looking forward to more discussions on the forums with you all as I start to post more often about what we are up to – watch this space.
@Stuck_in_the_Basement – we are working hard to keep the baldies to at least 50% of the team. We are slightly outnumbered at the moment and suffering plenty of jibes at morning standup. It’s making recruitment a little difficult, however, as a lot of developers have hair … poor things
@doru – you got us on that one! We really should run the office images we use past our internal security team. This is embarrassing, although not quite as embarrassing as the Prince William and MoD passwords fiasco revealed today
Hey @infuse01 and @billyf, it’s great to hear from you, and please keep the feedback coming. We need community members like yourselves who are vocal and passionate.
With respect to the specific search features you mentioned: although we’ve rolled out a new search engine, the user-facing changes we’ve made have been minor and are only the beginning of a series of improvements we want to make. You’ll hear from us as we roll out each new search improvement.