[Orechem] Amazon Exposes 1 Terabyte of Public Data to Developers - ReadWriteWeb

Carl Lagoze clagoze at gmail.com
Wed Feb 25 21:28:51 EST 2009


Anybody know what the "huge amounts of chemistry data" is?




Amazon Exposes 1 Terabyte of Public Data to Developers
Written by Marshall Kirkpatrick / February 25, 2009 5:26 AM

Amazon.com changed the retail world. In the process the company built
up so much surplus computing power that it started a dirt-cheap
"computing in the cloud" business that changed the computing world.
This week the company's newest project, Public Data Sets on Amazon Web
Services, began offering more than 1 terabyte (1,000 GB) of fascinating
public data for developers to access on the fly through Amazon's cloud
computing service.

We're talking about an annotated collection of all publicly available
DNA sequences, including the Human Genome, huge amounts of chemistry
data, machine-readable encyclopedic entries about millions of
different topics and an entire dump of Wikipedia, along with US Census
data, data from the US Department of Transportation and more. It's all
accessible by web applications in no time at all. What do you think
this is going to change?

The company published a blog post last night announcing the
availability of four new public data sets.

This includes data from:
The Bureau of Transportation Statistics.

DBPedia Knowledge Base - which "currently describes more than 2.6  
million things including 213,000 people, 328,000 places, 57,000 music  
albums, 36,000 films, and 20,000 companies." All in handy semantic  
markup.

The Freebase Data Dump - the giant collaboratively built semantic
database on a wide variety of topics, data that high-profile startup
Metaweb has spent millions of dollars assembling.

The entire English section of Wikipedia, dumped into a
machine-readable format.

A number of large genetic and scientific databases.

We counted all the databases up and the total passed 1 TB of available
data. The company says that accessing this data is "trivial" for
developers.

What are developers going to do with this data? We can't wait to find
out. The prospect of mashing up, cross-referencing and building user
interfaces on this amount of data is nearly unfathomable. Really.
This data will be leveraged by all kinds of different web
applications, for a long time.

You've read about, or can imagine, the impact that the first public
libraries had on human culture. Now imagine the opening up of not just
this, but other libraries of data, so huge that economies of scale
blast the project off beyond any analogy that could be drawn with our
everyday experience or historical memories. It won't just be Amazon
that offers up this kind of data - it will be relatively commonplace
soon, we imagine.

It will be like a network of libraries - for robots. Robots that go to  
the library frequently, read very fast and make serious use of what  
they've learned.

Congratulations, Amazon, on passing 1 TB of public data made  
available. May all our robots of the future please live in peace.



Comments

Are you sure it's 1 TB? Just Wikipedia by itself would make up a
significant chunk of that space, and since it includes DBpedia (thus
duplicating most content)... Terabytes are cheap these days. I have
two TB of data just on my home desktop PC.

Posted by: Nate | February 25, 2009 7:47 AM



You apparently have to have an EC2 account and an active instance to
be able to get access to the data, so for the time being there is a
cost attached to getting hold of the data.

Posted by: Stuart Marsh | February 25, 2009 8:13 AM



to be honest marshall - you might change that title - when i saw it i  
thought "oh my amazon just got hacked and my info is public!"

not sure if others got the same idea

Posted by: Allen | February 25, 2009 8:22 AM



Amazon is a 'Cloud' pioneer and I appreciate them for offering these
datasets to the public. They have the potential to have a huge impact.

I recommend that there be a public information release on how often
Amazon plans to update these files.

Posted by: Tecue | February 25, 2009 8:59 AM



To be honest, I thought the title was speaking of a vulnerability as  
well. Had to read the article twice to make the connection.

And I'm a user of AWS as well.

Posted by: mtranda | February 25, 2009 9:22 AM



Isn't terrabyte supposed to be spelled terabyte?

Posted by: Bob Ohsiek | February 25, 2009 9:33 AM



Bob - thank you, you are the only real friend I have!

Allen - I would have thought the words "customer data" would have  
given you that impression. But I'll edit the headline.

Posted by: Marshall Kirkpatrick | February 25, 2009 9:44 AM



Same comment about the title. "Releases" or "Publishes" instead of
"Exposes" would maybe be less confusing.

Posted by: Ozh | February 25, 2009 10:29 AM



It's great that Amazon is making all of this data available via EC2,
but the data has always been available to developers who were so
inclined to use it.

These are public data sets already distributed by the respective
organizations - from what I can tell this is just clustered to add
value to Amazon's Web Service offerings.

Does anyone have a sense of whether Amazon did any work to mark these
up better, cross-reference them, or add any particular value besides
exposing them from within EC2?

Posted by: Christian | February 25, 2009 10:48 AM



Please go to http://blog.infochimps.org/2009/02/06/start-hacking-machetec2-released

It will show you which AMI you need in order to access these new  
datasets. Unfortunately, Amazon is a little light on the details in  
terms of accessing the datasets they just published.

Posted by: Allan | February 25, 2009 11:28 AM



Title is scary, but after reading it, I feel much better now.

Rex

Posted by: Rex Dixon | February 25, 2009 11:58 AM



Yes, boo to the misleading page title.

Posted by: exposer | February 25, 2009 1:30 PM



This system should provide a lot of good fodder to make some
interesting mashups. Heck, you could probably build an entire
self-contained system just using Amazon products exclusively at this
point:
-this service for the raw input
-EC2 and S3 for crunching and storing data
-Mechanical Turk for recognizing patterns in the output analysis

On another note, coincidentally, today we released the JumpBox for
SnapLogic, which is essentially an open source "Yahoo Pipes" system. I
recorded a demo video that helps people get started with it. You don't
have to be a developer anymore to make mashups:

http://blog.jumpbox.com/2009/02/25/introducing-snaplogic-for-data-integration/

Sean

Posted by: Sean Tierney | February 25, 2009 1:48 PM



I understood what he meant from the headline immediately... "expose"
has a bad rap, apparently. Published, released, etc. would be
inaccurate: Amazon is only providing easy access to info which is
freely available elsewhere. "Expose" is exactly the right word for that.

Posted by: Evan | February 25, 2009 2:17 PM



Totally read the same thing. I thought for sure, and had to do a  
triple take, that Amazon had been hacked.

But, no, this is VERY VERY interesting.

And not at all frightening like the usual digital security issues that  
we are bombarded with each and every day.

You ever read http://www.justaskgemalto.com? Anyway, it's not fun.

But yes, the cloud is very interesting.

Posted by: Janet Altman | February 25, 2009 3:31 PM




