Bugzilla Data Mining

I’ve been doing some data mining on b.m.o. recently, and have got some interesting results. I’ve been particularly looking at UNCONFIRMED bugs (of which we have a very large number) and what happens to them.

Any bug which ends up FIXED is a great thing. Almost any other bug is a net drain on resources – we’d have been better off if it had never existed. So, in an ideal world, all filed bugs would end up FIXED (and get filed with a crystal-clear description, a reduced testcase, and a pointer into the code…).

So I looked at all bugs filed as UNCO in the week of 2004-04-22 to 2004-04-28 (6 months ago) which eventually ended up FIXED. There were 38 of them. This is a chart of how long it took them to get confirmed. It was made by hand, checking the Bug Activity of each one.

< 1 day    18
< 1 week   13
< 1 month   3
< 2 months  0
> 2 months  0

Some bugs went straight from UNCO to FIXED; they were fixed after:

< 1 day     0
< 1 week    2
< 1 month   0
< 2 months  2 (but both of these should have been WORKSFORME)
> 2 months  0
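
The bucketing behind both tables can be sketched in a few lines of Python. This is only an illustration of the counting step, not the procedure actually used (the counts above were gathered by hand). The function names `bucket` and `tally` are made up for this sketch, and treating a month as 30 days and two months as 60 days is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta

# Upper limits for each bucket, in the same order as the tables.
# Assumption: "1 month" is taken as 30 days and "2 months" as 60 days.
BUCKETS = [
    (timedelta(days=1), "< 1 day"),
    (timedelta(weeks=1), "< 1 week"),
    (timedelta(days=30), "< 1 month"),
    (timedelta(days=60), "< 2 months"),
]


def bucket(filed, changed):
    """Return the table label for the delay between filing and the status change."""
    delay = changed - filed
    for limit, label in BUCKETS:
        if delay < limit:
            return label
    return "> 2 months"


def tally(bugs):
    """Count bugs per bucket; `bugs` is an iterable of (filed, changed) datetime pairs."""
    return Counter(bucket(filed, changed) for filed, changed in bugs)


# Illustrative timestamps only, not real bug data.
example = [
    (datetime(2004, 4, 22, 9, 0), datetime(2004, 4, 22, 14, 0)),
    (datetime(2004, 4, 23, 9, 0), datetime(2004, 4, 26, 9, 0)),
]
counts = tally(example)
```

Feeding it one pair per bug (filing time, and the time of the first change away from UNCO) would reproduce a table in the shape shown above.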

So, based on this sample, any bug which hasn’t been confirmed after two months is very unlikely to become FIXED – no examples were found of this. (Please point out any flaws in this logic in the comments.)
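
One way to see how much weight "no examples were found" can bear: with zero late-confirmed bugs among 38 FIXED ones, the true share of late-confirmed bugs among FIXED bugs could still be well above zero. The sketch below computes the standard 95% upper bound for a zero-count observation. It assumes the 38 bugs behave like an independent random sample, which a single week of filings may not, and the function name is made up for this illustration.

```python
def zero_event_upper_bound(n, confidence=0.95):
    """Upper bound on an event rate after observing 0 events in n independent trials.

    Solves (1 - p) ** n == 1 - confidence for p.
    """
    return 1 - (1 - confidence) ** (1 / n)


# 38 FIXED bugs in the sample, none confirmed after more than two months.
bound = zero_event_upper_bound(38)
print(f"{bound:.1%}")
```

For n = 38 this comes out at roughly 7.6%. Note that it bounds the share of FIXED bugs that were confirmed late, which is not quite the same thing as the chance that a long-unconfirmed bug eventually gets FIXED.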

So what are we going to do with this fact? More on that tomorrow.

11 thoughts on “Bugzilla Data Mining”

  1. Along almost the same lines, I was thinking that a good way to reduce the number of bugs in bmo would be to send an email to the reporters of unconfirmed bugs older than, let’s say, 2 months (which you seem to show is reasonable), or maybe 6 months to be less aggressive, asking them to verify whether the bug still exists and to close it if not.
    Of course, variations on this are possible: for example, send such an email only to reporters who have filed fewer than 5 bugs (they are more likely to forget), and so on… The only problem I see is that bmo could be considered a spammer after sending 10000 emails :-)

  2. hehe… nice idea, but expect to get some mails saying that “just asking if a bug is fixed won’t get it fixed” ;)
    FranCK, if you should decide to start mailing and you need some help… then shout!

  3. Whatever is coming, I’m sure some people won’t like it :)

    I haven’t done any counting, but last time I triaged Firefox bugs, I found a number of UNCOs that were duplicates of newer bugs that had been fixed. Quite a significant proportion of bugs seem to be fixed by developers based on their own experience/diagnostics, even if someone filed a bug (confirmed or otherwise) previously on the same issue.

    It would, I suppose, be better if the triaging of bugs could be effective enough for developers to work more on fixing reported issues rather than doing their own diagnostic work, but if that’s not happening (which it isn’t), I guess it makes sense to adjust the system to fit the realities of report/triage/fix proportions. With the volume of reports, anything that gets lost in the process should get re-reported later…

  4. Does anyone know how to get a list of the oldest unconfirmed bugs?
    I’m sure there’s probably one from 98 or something heh.

  5. Uh, Gerv, you’re taking one week’s worth of data and extending it to the entire population of bugs. There are sample size issues here.

    Additionally, you’re equating “not fixed in 6 months” to “not fixed ever,” which is probably a decent heuristic, but I’d like to see some data to back it up.

  6. Gerv, as has already been pointed out, there’s an issue with your sample size. I know that in the past 2 weeks I’ve fixed a handful of “unconfirmed” bonsai bugs that have been sitting around for years. Presumably, they haven’t been fixed because no one had an active bonsai setup that they were willing to test the bugs on, *not* because they weren’t valid bugs (they were) or because no one was interested in getting them fixed (some had patches that had bitrotted).

    How many of those unconfirmed bugs are from non-tier1 platforms? We don’t tend to do a good job of not breaking non-tier1 platforms, and when we know that we have broken them, we’re never in a hurry to get them fixed. When a triager (developer, QA or other) doesn’t have the particular OS/hw combo or otherwise cannot test the bug, much less reproduce it, then how do you expect the unconfirmed count to drop?

    As a sidenote, it would probably help the triage effort if UNCONFIRMED was in the default query list for users that aren’t logged in. It would help the duplicate bug count issue as well.

  7. I think the major flaw in your logic is that it does not show the desired path of an unconfirmed bug, e.g. becoming a confirmed bug. I would expect that a significant number of bugs get confirmed, so they first change to NEW and stay there for a long time.

    I disagree that the remainder is “a net drain on resources”. Let’s look at an example: https://bugzilla.mozilla.org/show_bug.cgi?id=26703
    It’s a typical first bug. The reporter did not use the nightly and stole developer time. But it was at least educational: the lesson was that you need to use a nightly. That’s how you get people involved. Looking back, it took me 6 bugs to land a really nice bug report (https://bugzilla.mozilla.org/show_bug.cgi?id=28844)

  8. Joe: sure, there are some sample size issues. Feel free to repeat the experiment with a larger sample. If it didn’t involve a manual step, I would do that in an instant.

    you’re equating “not fixed in 6 months” to “not fixed ever,” which is probably a decent heuristic, but I’d like to see some data to back it up.

    Well, I can’t predict when the other bugs filed six months ago will be fixed, if ever ;-) We could do the same thing with a 12-month-old data set, I guess.

    I know that in the past 2 weeks I’ve fixed a handful of “unconfirmed” bonsai bugs that have been sitting around for years.

    Sure. Bonsai is unusual. I’m thinking about Browser/Firefox/MailNews/Thunderbird here.

    As a sidenote, it would probably help the triage effort if UNCONFIRMED was in the default query list for users that aren’t logged in. It would help the duplicate bug count issue as well.

    There’s a bug arguing about what the default should be; but the correct fix is to steer people towards “find a specific bug”, which is supposed to be much better at this task than the general bug query page.

    I think the major flaw in your logic is that it does not show the desired path of an unconfirmed bug, e.g. becoming a confirmed bug.

    I’m not sure what you mean here. Those timings are for how long it took UNCO bugs to become confirmed (unless they went straight to FIXED).

    I’ll certainly take your point about some bugs not being a net drain on resources if they educate the reporter enough and then they become part of the community. Good catch.