Yesterday, I touched on some of the problems of creating a pcap repository. I explained how you need to market the site to vendors in order to solicit their contributions, but obviously you need more than corporate donations to make the site successful: you need individuals. So how do you motivate the masses?
First, realize that everyone loves a bandwagon… meaning the best way to get contributions is to already have them. The simplest way of doing this is to scour the web for publicly available pcap files and upload them to the site yourself. By including, categorizing and organizing all the freely available pcap files in a single site, you encourage people to visit by becoming the best resource on the web.
Second, provide incentives. Prominently show (on the front page) the top 5-10 contributors in terms of number of pcaps, bytes, downloads and rating. I’m always amazed by the lengths people will go to in order to show up in the top 10 of SETI@home and the like, so use humanity’s natural competitive nature to encourage contributions.
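To make the leaderboard idea concrete, here is a minimal sketch of how the four rankings could be computed. Everything in it is an assumption for illustration: the `Upload` record, its field names and the `leaderboards` function are hypothetical, and a real site would more likely do this with a database query.

```python
from collections import defaultdict
from typing import NamedTuple


class Upload(NamedTuple):
    """One uploaded capture; all field names are hypothetical."""
    contributor: str
    size_bytes: int
    downloads: int
    rating: float  # average user rating for this capture


def leaderboards(uploads, top_n=10):
    """Return the top_n contributor names for each of the four metrics."""
    totals = defaultdict(
        lambda: {"pcaps": 0, "bytes": 0, "downloads": 0, "rating_sum": 0.0}
    )
    for u in uploads:
        t = totals[u.contributor]
        t["pcaps"] += 1
        t["bytes"] += u.size_bytes
        t["downloads"] += u.downloads
        t["rating_sum"] += u.rating

    def score(metric):
        if metric == "rating":
            # Rank by the mean rating across a contributor's uploads.
            return lambda name: totals[name]["rating_sum"] / totals[name]["pcaps"]
        return lambda name: totals[name][metric]

    boards = {}
    for metric in ("pcaps", "bytes", "downloads", "rating"):
        # Pre-sort by name so that ties come out in a predictable order.
        ranked = sorted(sorted(totals), key=score(metric), reverse=True)
        boards[metric] = ranked[:top_n]
    return boards
```

Ranking by mean rating lets a single well-rated upload top that board, so a real site might require a minimum number of uploads before a contributor qualifies.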
Third, make it easy by removing roadblocks. The less effort people have to put forth to contribute, the more likely they will. Make it quick and easy to register (OpenID maybe?). Don’t require them to provide detailed descriptions or extra metadata. These sorts of things should be done by the moderators or automatically by the system (remember, pcaps are structured data). Yes, there are potential security concerns, but be creative and deal with them.
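Because pcaps are structured data, the system can fill in a fair amount of metadata without asking the uploader for anything. The sketch below is one way that might look, using only the Python standard library. It assumes the classic libpcap file format (not pcapng), and the function name `summarize_pcap` and the keys of the returned dict are made up for this example.

```python
import struct

# Magic numbers for the classic libpcap file format (pcapng is not handled).
_MAGIC_USEC = 0xA1B2C3D4  # timestamps carry microseconds
_MAGIC_NSEC = 0xA1B23C4D  # timestamps carry nanoseconds


def summarize_pcap(stream):
    """Read a classic pcap from a binary stream and return basic metadata."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("file too short to be a pcap")

    # The magic number reveals which byte order the writer used.
    for endian in ("<", ">"):
        magic = struct.unpack(endian + "I", header[:4])[0]
        if magic in (_MAGIC_USEC, _MAGIC_NSEC):
            break
    else:
        raise ValueError("not a classic pcap file")

    # Global header: magic, version, timezone, sigfigs, snaplen, link type.
    _, major, minor, _, _, snaplen, link_type = struct.unpack(
        endian + "IHHiIII", header
    )
    frac_div = 1e6 if magic == _MAGIC_USEC else 1e9

    packets = 0
    captured_bytes = 0
    first_ts = last_ts = None
    while True:
        record = stream.read(16)
        if len(record) < 16:
            break  # end of file, or a truncated trailing record
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            break  # truncated packet; stop counting here
        ts = ts_sec + ts_frac / frac_div
        if first_ts is None:
            first_ts = ts
        last_ts = ts
        packets += 1
        captured_bytes += incl_len

    return {
        "version": f"{major}.{minor}",
        "snaplen": snaplen,
        "link_type": link_type,  # numeric link-layer type; 1 is Ethernet
        "packets": packets,
        "captured_bytes": captured_bytes,
        "first_ts": first_ts,
        "last_ts": last_ts,
    }
```

Calling it as `summarize_pcap(open(path, "rb"))` on each upload would give the moderators packet counts, capture size and a time range to start from. Protocol breakdowns would need real packet dissection, which this sketch does not attempt.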
Removing roadblocks applies in other ways as well. It means having an easy-to-use website which loads fast and is enjoyable to use. It should be powerful enough to let people quickly find pcaps relevant to their interests, but also let people easily browse the repository. Having an RSS feed of recent additions is a no-brainer. As your site grows, it becomes impractical for people to download each pcap individually, so come up with creative ways to let them get them all at once or in groups.
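One simple take on downloading in groups is to publish a pre-built archive per category. The following sketch assumes the captures sit on disk in category subdirectories and end in `.pcap`. That layout and the `bundle_pcaps` name are assumptions for the example, not a description of any existing site.

```python
import zipfile
from pathlib import Path


def bundle_pcaps(pcap_dir, bundle_path, pattern="*.pcap"):
    """Write every capture under pcap_dir matching pattern into one zip file.

    Returns the list of archive member names that were written.
    """
    pcap_dir = Path(pcap_dir)
    names = []
    with zipfile.ZipFile(bundle_path, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        for path in sorted(pcap_dir.rglob(pattern)):
            # Store paths relative to pcap_dir so category folders are preserved.
            arcname = path.relative_to(pcap_dir).as_posix()
            bundle.write(path, arcname)
            names.append(arcname)
    return names
```

Pointing `pcap_dir` at a single category folder gives a per-category bundle, and pointing it at the repository root gives everything at once. For very large collections a torrent or rsync mirror would likely scale better than one big zip.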
Another major roadblock is a site with bugs. It gives people a bad first impression and makes it less likely they’ll return or contribute. So make sure you have a staging environment where you can test new features, and get a few trusted users on your Beta Team to find and squash bugs quickly. Also, if you’re going to use a bug/feature tracker, it’s important to show progress over time. If people see a bunch of bugs sitting there without being acted on, they’ll be less likely to submit their own.
Fourth: marketing. People need to know about your site in order to use it, so you need to market the site to your prospective user base. I can think of two free and effective means of doing this: sending emails to relevant mailing lists, and asking websites these people visit regularly to add a link to your site. Exchanging links with sites that host tools which deal with pcaps should be a top priority. You may also find some of these other sites are willing to partner in other ways (maybe as a BitTorrent seeder for large pcaps?).
As you can see, making a site like this successful isn’t easy. It takes a lot of work and dedication, especially because a site like this is unlikely to see a huge ramp-up in users overnight, but rather slower, natural growth over time.
Hi, sorry for the late response, I only found out about this site today.
Anyhow, just wanted to mention Wireshark’s sample pcap files, a collection which is quite extensive (more extensive than OpenPacket’s library).
The main obstacle, imho, is privacy issues. Most people would think twice before uploading a dump of their network traffic. There are too many ways sensitive data may leak this way. For example, no organization would gladly submit samples of their exchange traffic. Most would avoid publishing their IP address ….
Just my two cents,
rouli
Excellent points, Rouli. I’m still waiting to see if the OpenPacket team can import the Wireshark sample pcaps. Having a single repository helps everyone, I think.
As for privacy… Obviously companies are afraid of secrets getting leaked. One of these days someone will be able to explain to me this concern people have with publishing their IP addresses… do people really think this matters? Between DNS, BGP, whois and good ol’ traceroute, people need to realize that IP addresses aren’t private.
But clearly there is information in the packet payloads which is confidential, like emails, passwords, documents on SMB/NFS shares, etc. Honestly, I don’t have a good answer for this, other than hoping you can find people with QA/test labs who are willing to upload captures of those networks because they don’t contain any confidential information.