Why all the bugs?

brands online

Hi All,

 

I'm pretty sure BOB staff are gonna attack me for this thread, but I don't think I'm alone when I say there are a whole lot of bugs/episodes on the site these days. Different times on different pages, funny characters appearing on ads, all sorts of stuff, not to mention the downtime today.

 

Is there a problem of late? This is becoming more the norm than the exception. Every day someone posts a complaint about a bug on the site. If it's all the changes, please, rather keep it simple...

Miss Jewels

Or what about doing upgrades from 3 am, so that by the time auctions are due to open, all the hiccups have been sorted out?

MacMuffin

To clarify: The downtime today was unprecedented as all redundant infrastructure failed resulting in a complete site-failure. We are still unsure about the root-cause, but our focus was to get operational again. There is separate information regarding the trades today and communication will be sent out to all affected sellers and buyers in the next hour or two.

 

The last small feature release was done on 10.01. and we go through a number of stages before we release into production. Once we are production ready, we release first to Kenya and then to one of our secondary co.za servers.

 

I am not aware of any bugs, but if there are, please provide us with more detail.

 

Lastly, there have been massive outages across a number of ISPs and datacentres during this week which could have negatively affected your experience (see here: IS ADSL network problems).

brands online

Hi, thanks for the response, even if it is just a shift of blame. The fact is that a couple of months back Snap Fridays were affected too and were extended until 15:00. One also just needs to look at the many, many threads and posts relating to site problems to establish that there is a problem, even if you don't want to call them 'bugs'. That you pre-test and launch to secondary servers, etc., is well and good, but of no consequence if features still affect the site after they are launched live.

 

It's not a case of sour grapes. I've also complained on various occasions about things like ads opening up as secondhand and been given silly answers, like the original ad must have been opened as secondhand. While these answers might appease you, they do nothing to improve 'user experience' when they're a quick way to kill the subject but no real solution.

I guess what I'm saying is, please recognise that though you will continuously deny it, there are flaws and bugs on the site.

Edited by brands online

MacMuffin

Hi BrandsOnline, please post your list of bugs/flaws in the feedback section. As it stands I am not aware of bugs affecting trades or the functionality of the website. There are several enhancements and feature requests we have in the pipeline originating from feedback we received from users.

 

We are not trying to shift blame. I just outlined that the last week was generally bad for internet users using services from certain backbone providers. It's a hard reality that South Africa relies on a number of core players in the broadband world and, similar to your dependency on Eskom for electricity, we depend on service providers to ensure that our infrastructure and connectivity are functioning well.

 

No IT-system will ever be 100% perfect (recent outages across multi-billion dollar companies such as Twitter, Facebook or local companies such as SBSA demonstrate this on occasion) and we will look at further improvements to avoid outages like today.

 

Lastly, with regard to items opening up as 2nd-hand, we did a thorough investigation in September and did not find any issues in the relisting logic. If you have trade IDs on hand we can re-open the case and go through our audit records again.

RISadler
... as all redundant infrastructure failed ...

 

Now that's not supposed to happen, is it?

 

We are not trying to shift blame

 

Yes, you are... how is Telkom/M-Web/Seacom responsible for all your redundant infrastructure failing?

 

No IT-system will ever be 100% perfect ...

 

Correct. Any IT-infrastructure is only as good as the techies looking after it and it's a known fact that adding humans degrades efficiency by 99%, at least.

MacMuffin
Now that's not supposed to happen, is it?

 

Nope - tks for the constructive feedback otherwise ;-)

RISadler

Pleasure... guess all that redundancy is now redundant?

MacMuffin
pretty much - having redundant switches, blades, storage, power is comparable to running with multiple girlfriends - can't count on just one when you need one ;-)

 

on a more technical/serious note: today's outage was unheard of with IBM as our whole Bladecentre backed up without any alerts, error events on hardware or software and none of the technicians have seen 5 physical blades becoming unresponsive at the same time while none of the diagnostics produced any warnings or errors. Although the infrastructure is up we will continue analysing the root cause for answers.

 

we will post a follow-up during the course of next week once we have clarity on what exactly caused the complete site-failure.

RISadler

I left the IT business just as "blades" were coming in, so might be wrong about this... don't these blades effectively run off the same power source and insert-blades-here thingy? If so, then in my opinion that's pretty much not redundancy/fail-safe/whatever.

 

As to IBM... always hated working on those... their techs were (are?) the most arrogant/snotty... and AIX is probably the most useless OS ever devised, with smit never able to do things the way I wanted them done.

MacMuffin
Agree with you on AIX and IBM locally. The blade-fabric has changed drastically over the years. In our scenario we have 4 independent power-supplies, two separate power-feeds from the data-centre, redundant IO-fabric in the chassis (i.e. each blade has separate IO modules for SAN, network). Having said that, it was no hardware failure and we could not connect to any servers while they still served traffic and eventually (at about 10am this morning) stopped responding. The SAN itself provides the same setup with 12 redundant drives and also did not show any issues - all still quite mysterious.

 

A simple power-failure or losing two power-supplies would not have caused a problem. The same would have applied if we had a blade-failure, as we could have moved to different physical blades within minutes. Even firmware corruption on the fabric-managers can be excluded as all diagnostics are clean. Also no reported outages on the backbone at Internet Solutions and no reports of anyone walking past with strong magnets :-(.

RISadler

I have seen something similar in the past... obviously not with such a grand setup as yours, but in principle the same.

 

Are the machines running AIX and if so, have you checked the /tmp directory - which is a separate partition, if I remember my AIX correctly? Obviously it'll be empty after the reboot, but maybe in the logs somewhere... :thinking:

MacMuffin
Nope - we are staying away from AIX (running CentOS), no space-issue (all servers have more than 80% available across mount points) and no CPU or memory issue before the failure. Also no unusual load behaviour.
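
For anyone wanting to run the same kind of free-space check across mount points, here is a minimal sketch. It is not the tooling actually used on the site; the function names, the path list and the 80% threshold (borrowed from the figure above) are assumptions for illustration only.

```python
import os
import shutil


def free_fraction(path):
    """Return the free fraction (0.0 to 1.0) of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Pseudo-filesystems can report a zero size; treat them as not full.
        return 1.0
    return usage.free / usage.total


def low_space_mounts(paths, min_free=0.80):
    """Return (path, free_fraction) pairs whose free space is below `min_free`.

    Paths that do not exist on this machine are skipped.
    """
    flagged = []
    for path in paths:
        if not os.path.exists(path):
            continue
        fraction = free_fraction(path)
        if fraction < min_free:
            flagged.append((path, fraction))
    return flagged


if __name__ == "__main__":
    # Hypothetical mount points; replace with the ones on your own servers.
    for path, fraction in low_space_mounts(["/", "/tmp"]):
        print(f"{path}: only {fraction:.0%} free")
```

Run from cron on each server, something like this would flag a filling /tmp before a failure rather than after the reboot has emptied it.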

RISadler
... we are staying away from AIX ...

 

Wise decision.

 

... (running CentOS) ...

 

IBM allows this!? :surprised:

 

 

Maybe not very useful, but we had this database system which ran 99% on Linux (throwing fits every so many months) and yet it ran 100% rock-solid on AIX. On the flipside, I've heard that not all written-for x86 programmes run well on other platforms... but I assume these blades are x86-based?

MacMuffin

Yes, IBM supports CentOS (it is built on the Red Hat kernel). We have been running this infrastructure problem-free since September. More IBMers are looking at this now over the weekend.

RISadler

Good luck... I know these things are real buggers to trace. (But don't forget the /tmp directory.)

