Collecting Usage Data in Eclipse

Article originally published June 6, 2008.

The Eclipse Foundation is home to hundreds of open-source projects that people use every day. Those projects make a tremendous amount of code and information freely available to anyone who wants it.

What you may not know is that the Eclipse Foundation is also interested in collecting data about how people actually use that software.

The Challenge of Tracking Open-Source Software Users

Tracking the users of open-source software poses some significant challenges. Some of the problems that researchers have run into include:

  • The users prefer not to be tracked
  • Open-source means that anyone can come in at any time for virtually any purpose
  • There are a lot of people all attempting to access the same data at the same time

For these and many other reasons, the Eclipse Foundation has gone through some trial and error in working out how to track the people who use its open-source software. Still, it is doing the best it can to recruit people who will volunteer to have their usage tracked.

The goal is to see what those individuals use the software for and how best to serve a community that can’t seem to get enough of it.

One of our challenges at the Eclipse Foundation is understanding how people are using Eclipse and which parts of it they use. Millions of people come to our website to download the various projects, find different Eclipse-based plug-ins (open source and commercial) and use them to create amazing software.

If we can gain insight into how people use the different pieces of the Eclipse ecosystem, we should be able to improve the overall user experience.

This whole initiative got started after I went to a talk at last year’s OSCON by Joe “Zonker” Brockmeier. At the end of his talk, he made the point that open source projects have a particular challenge in getting to know their users: we don’t ask people to register, and we don’t even have the most basic information we need to help improve our software.

We lack the stats to make good decisions. His suggestion: ask your users to provide useful data. So that’s what we’re planning on doing. This is about helping our projects and our ecosystem to make Eclipse better.

For this reason, I am very interested in seeing the response we get to the Usage Data Collector (UDC) that we are planning to include in the Ganymede EPP packages. For those that might not have seen Wayne’s previous posts on this subject, UDC is a piece of technology that will track how and what people are using in Eclipse.

UDC has been included in the EPP Ganymede milestone packages, and over 1500 individuals have participated during the past four months. We have created some initial reports, and I hope that in the future we will be able to provide some interesting information for our committers and the wider Eclipse community.

As you can imagine with any data collection technology, privacy is a huge concern. Therefore, to be clear, UDC is 1) opt-in, so only people that agree to send the data will participate, and 2) completely anonymous. No personal data, including IP addresses, is being collected.
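The two guarantees above (opt-in and anonymous) can be illustrated with a minimal sketch. This is not the UDC implementation: the function name, the field names, and the event layout are all hypothetical, chosen only to show the idea of refusing to upload without consent and keeping only an explicit list of non-personal fields.

```python
# Hypothetical sketch only: none of these names come from the real UDC code.

# Fields assumed to be safe to report. Anything not listed here is dropped,
# so personal data such as an IP address cannot slip through by accident.
ALLOWED_FIELDS = {"what", "kind", "bundle_id"}


def prepare_upload(events, opted_in):
    """Return the events that may be uploaded, or an empty list without consent."""
    if not opted_in:
        # Opt-in: without explicit agreement, nothing leaves the machine.
        return []
    # Anonymous: keep only the allow-listed fields of each event.
    return [
        {key: value for key, value in event.items() if key in ALLOWED_FIELDS}
        for event in events
    ]


if __name__ == "__main__":
    sample = [{"what": "activated", "kind": "view", "ip_address": "192.0.2.1"}]
    print(prepare_upload(sample, opted_in=False))
    print(prepare_upload(sample, opted_in=True))
```

An allow-list is used here rather than a list of fields to strip, on the assumption that it is safer for new, unreviewed fields to be dropped by default.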

In addition, the Eclipse Classic package will not contain any UDC code at all, so there is a simple option for users who really want to avoid this. Those who are interested can review the code in CVS.

So far it seems that our approach to UDC has been well received by the community. No one has expressed any concerns to date, and 1500 opt-ins have more than met our expectations during the development phase.

Coincidentally, in the last month, the Mozilla community has begun talking about a somewhat similar data collection program. In Mozilla, some strong opinions have been expressed about collecting data at all. Therefore, I want to make sure everyone in our community has an opportunity to respond to this program before we make the final decision to deploy it.

We are very excited about the potential of UDC, but we also want to ensure we respond to any community concerns. If you have any feedback, please feel free to contact me (mike at eclipse dot org) or, better yet, leave a comment letting me know your thoughts on UDC.

Tracking Large Groups

Eclipse has managed to collect usage information from 1,500 individuals who downloaded the open-source software made available via its website. This data is helping Eclipse better understand what happens to that software after it has left the website.

The Foundation can keep tabs on this to some extent by seeing the various ways that individuals apply the software to whatever they are working on.

Other communities, such as Mozilla, are taking on similar projects as well, but Eclipse is the group that got it all kicked off.

Tracking the way that significant numbers of people have used Eclipse has already produced many insights. The amount of information that can be gleaned from something like this is huge, and the number of people who are willing to participate continues to grow.

People, by and large, want to contribute to the advancement of technology, and this is one way that they can do so easily.

The larger the sample size that is tracked as they use a system like this, the more accurate the data will be. Smaller sample sizes may produce some distortions that don’t tell the entire story of what is happening, but a large group will generally make it easier to figure out what is truly happening.

After all, the more people there are in a group, the more variety you will see in their actions.
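The effect of sample size can be made concrete with the standard margin-of-error approximation for a proportion. The sketch below is an illustration, not something taken from the UDC reports: the function name is hypothetical, and the formula assumes a simple random sample. An opt-in group is self-selected, so real figures may be biased in ways this calculation does not capture.

```python
import math


def margin_of_error(proportion, sample_size, z=1.96):
    """Approximate margin of error for an observed proportion.

    Uses the normal approximation z * sqrt(p * (1 - p) / n), where z = 1.96
    corresponds to roughly 95% confidence. Assumes a simple random sample.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)


if __name__ == "__main__":
    # Worst case for the margin is a 50/50 split, so use p = 0.5.
    for n in (100, 1500, 15000):
        print(n, round(margin_of_error(0.5, n), 4))
```

Under these assumptions, a group of 1,500 participants gives a margin of roughly two and a half percentage points, compared with nearly ten points for a group of 100.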

Tracking how people use open-source software could produce a wide range of insights. The hope is that this data will help future project leads understand which elements to include in their open-source software and which they can throw overboard.

The bottom line is that a project like this promises a great deal of useful information, and the results that eventually come out of it should be worth the trouble of such a massive undertaking.

