Archives For Featured Post

Our Editor’s pick for blog post of the week

Indicate Investigators is an app in the karmadata gallery that was created to search a “disease of interest” and return a list of the top investigators for that disease. We have now added the option to search by drug class as well as by disease.

When you open the Indicate Investigators app, you can immediately pick either Drug Class or Disease. For example, if you wanted to search for Monoclonal Antibodies (which are popular in the treatment of cancer and autoimmune diseases), you would pick Drug Class and then start typing the name, which autocompletes as you type. You can still toggle back and forth between Drug Class and Disease once you are fully in the app.

 

The patient-facing service created by karmadata wins an award for providing meaningful information to patients at the 2014 Health Datapalooza, a gathering of over 2,000 of the nation’s healthcare experts, which was held on Tuesday in Washington, D.C.

WASHINGTON, June 5, 2014 /PRNewswire-iReach/ — karmadata (www.karmadata.com) today announced the launch of MyHealth.io, a website that provides patients in need of surgery the ability to find the best surgeon in their area based on each surgeon’s volume and the quality of his or her affiliated hospital. On Tuesday, myHealth.io received one of three financial awards from the Health Data Consortium at the 2014 Health Datapalooza, and the karmadata team had the honor of presenting MyHealth.io to over 2,000 healthcare data experts and patient advocates from around the country.

“myHealth.io is our opening salvo in creating free tools for patients, putting them in the driver’s seat for making informed decisions that impact their own healthcare,” said Sean Power, founder and CEO of karmadata. “Each year there are millions of surgeries performed in the U.S. and most patients have absolutely no way to comparison shop for their surgeon. Most surgical patients end up accepting a blind referral, typically from their primary care physician, without having access to important information.”

myhealth.io_map

“The release of physician identifiable data from Medicare has changed all of that,” said Brendan Kelleher, Chief Data Scientist of karmadata. “We link surgical volumes by surgeon for each procedure to data on the surgeon’s hospital. This allows the patient to not only see which surgeon has performed the most procedures, but also specific quality ratings on the surgeon’s affiliated hospital drawn from patient surveys and quantitative performance metrics released each year by CMS. Now comparison shopping for a surgeon using important factors such as volume and quality is easy.”

“My job is to think about each patient’s experience on myHealth.io,” said Yesi Orihuela, Head of Design and UX of karmadata. “We built the site for healthcare consumers, not data or industry experts. Your journey on the site starts by entering your zip code. From there you are led step by step through a body map to find your surgery, a list of surgeons that perform it, and a map and data visualization that make it easy to identify and locate the surgeon that is best for you.”

About karmadata and MyHealth.io
karmadata is the world’s healthcare (big) data, simplified. Using big data and cloud technologies, karmadata is able to standardize and link the world’s healthcare data ranging from leading open data sources to private pharmacy and medical claims. karmadata created myHealth.io as a free service to patients to enable comparison shopping for surgical services and will expand to enable a broad range of healthcare consumer activities. Learn more by visiting www.karmadata.com, www.myhealth.io, or by following them on Twitter @karmadata @myhealthio

About the Health Data Consortium
Health Data Consortium is a collaboration among government, non-profit, and private sector organizations working to foster the availability and innovative use of data to improve health and health care. The Consortium advocates for health data liberation; promotes best practices and information sharing; and works with businesses, entrepreneurs, and academia to help them understand how to use health data to develop new products, services, apps, and research insights. Learn more at www.healthdataconsortium.org or @hdconsortium on Twitter.

KARMADATA SCREENSHOT
Users can identify over 7,500 global clinical trial sponsors, and over 500,000 management contacts (each with a business email) working at those sponsors.

BOSTON, Jan. 9, 2014 /PRNewswire-iReach/ — karmadata® today announced the launch of Sponsor Finder™, an innovative new tool for data scientists, sales and marketing professionals at Clinical Research Organizations (CROs) and other firms that sell products and services into the global clinical trials industry. Sponsor Finder™ enables linkages to the best data sources available, including management contacts from Salesforce Data.com.

Sponsor Finder™ provides detailed profiles on over 7,500 clinical trial sponsors, including Pharmaceutical, Biotech, and Medical Device companies. Each sponsor is linked to their active and historical trials enabling detailed searches and analytics by geography, size of company, disease, and number of active or planned trials. Users can follow sponsors (or drugs or diseases) of interest, and stay informed through their feed of new activity from more than 30 healthcare data sources. Finally, each sponsor has management contacts from Salesforce Data.com, making it easy to identify new names, titles, emails, phone numbers and addresses using quick filters on titles and levels.

“We created Sponsor Finder™ in response to overwhelming demand by our clients that provide products and services to support the global clinical trials industry,” said Sean Power, karmadata’s Founder and CEO. “Sales and marketing professionals at CROs are tired of being locked into complex, outdated tools with stale data. Our karmadata® cloud provides a great responsive web user interface that allows for rapid integration of the best new data sources. We are proving that with technology, scale, and the integration of best-of-breed data from places like Salesforce Data.com, you can provide a much better information service at a significantly lower cost than any other provider.”

karmadata_puzzleHorizontal_large

Founded by Sean Power, previous founder of Infinata (BioPharm Insight®), karmadata curates and provides access to a linked version of the world’s open data sources in Healthcare, Legal, Energy, and other verticals.

BOSTON, May 7, 2012 – karmadata officially introduced its freemium website (www.karmadata.com) and API (www.karmadata.com/API) to the global data community at the Data 2.0 Summit, held April 30th in San Francisco. karmadata’s website enables users to find, visualize and share data of interest to them and their social networks. The karmadata API provides standardized linked data from the world’s data sources, allowing developers to design and build their own applications.

“The Data 2.0 Summit (www.data2x.com) was the perfect arena for us to announce our launch amongst so many creative and like-minded leaders in the data industry. Our website will reinvent how professionals access and analyze data, providing access to tens of millions of users that are blocked by proprietary corporate-only license fee models. Additionally, our API will provide simple, affordable access to standardized, linked data for app developers and system integrators to create things we couldn’t possibly dream of,” said Sean Power, founder and CEO of karmadata.

karmadata was chosen as one of the top 5 data startups of 2013 to present during the Data 2.0 Summit. “karmadata’s vision and technology platform fit our 2013 conference theme of ‘Democratizing Data’ perfectly. We welcome their entrance and look forward to their disruptive activities in the business information space,” said Geoff Domoracki, co-founder of Data 2.0.

karmadata poised to disrupt the Professional Business and Information Services Industry

The current Professional Business and Information Services industry is a $100 billion+ market, dominated by a handful of Big Information Vendors that use a corporate-only, up-front licensing model. This archaic standard effectively locks out tens of millions of would-be users and application developers who generally cannot afford the upfront costs of access. karmadata uses innovative technology and cloud-based scale to enable a first-of-its-kind freemium pricing model for industry data.

How karmadata works

On a daily basis, karmadata processes high-value open data sources such as PubMed, ClinicalTrials.gov, the FDA Adverse Event Reporting System, the USPTO databases, and many others. These sources are primarily semi-structured text or XML; they are not linked to each other, and they provide no standards for querying, analyzing or visualizing. We have processed over 100 million records to date, and have identified over 6.1 million Entities that are healthcare providers, populated places, clinical investigators, diseases, organizations, drugs and more.

karmadata visitors can create “Datacards”©, which are meant to be mini-blogs, telling a data-driven and visualized story that is personal for the author. These Datacards can be easily shared on social media, third-party websites or embedded in blogs or articles on the web. Additionally, each of our 6.1 million Entities has its own Poster, with visualizations and an index to all the world’s data where that person, place, organization, product or thing can be found.

About karmadata

karmadata is on a mission to standardize and link the world’s data and to provide easy affordable access to it, to anyone, anywhere in the world. karmadata is currently a small and enterprising team of dedicated and experienced professionals with a common goal of redefining the industry standards of how data will be resourced, collated and shared among industry professionals and setting the gold standard for future data providers.

Currently seeking partnerships and investors

karmadata is actively recruiting investment, data provider and technology partnerships. For more information please contact Sean Power (media@karmadata.com).

@Data2x Summit Live Blog

Y, April 30, 2013

data2Summit

9:15am: Opening Keynote by James Strittholt

James Strittholt @data2x

DataBasin.org

James Strittholt delivered the keynote at the Data 2.0 Summit on climate change through GEO data. James gave an amazing live demo of DataBasin.org, visualizing the effects of climate change in real time through interactive, configurable maps built on high-quality data.

9:50am: Heard a great quote

“90% of the world’s data has been added in the last two years”

10am: PANEL: From Climate Data To Technology Solutions


Great discussion by the panel. I was very impressed by Daniel Goldfarb, Partner and Director of Design Research at Greenstart, who provided great insights into the current data industry. Below are a few quotes by Daniel:

“The amount of ‘dashboard’ startups we see is staggering, but simply having a lot of data is not a business model. There’s a need for actionable end points for data driven decision making.”

“Gamification is the worst word in our industry. It doesn’t do anything in most cases.”

“Did you know that some of the small utility companies contract outside firms to retrieve the email addresses of their own customers?”

A great question posed by Daniel Goldfarb: “Which car type do you think will be more prevalent in the next five years, EV or Self Driving cars?”

10:50am: PANEL: Data Science and Algorithms-as-a-Service
With the Q&A format, I found it easier to jot down the best answers heard during the panel. Data is heavy but algorithms are light, so the best strategy to address this difference is to place the algorithm where the data lives. A newer strategy is to “burn” the model (IF-THEN statements) into the chips themselves. Does data as a service replace data scientists? Absolutely not. Knowing how to use data successfully and what questions to ask becomes very important. Many companies and organizations out there don’t know they have data problems.

Algorithmia: Interesting startup that provides a marketplace for connecting algorithm developers with companies needing solutions.

12pm: Crowdsourcing the Oct Dataweek conference
During lunch the lead organizers of Data2.0 asked all of us to suggest topics we’d like to be covered at the next Data2.0 conference. Of the ideas suggested, then voted on by everyone, Data Predictability was at the top of the list.

12:30pm: Disqus demo of Gravity

Disqus Gravity

12:40pm: Fitbit demoed their latest product, app and API
Fitbit API can be reviewed here

3:20pm: PANEL: Democratizing Data: A business, technology, and society problem
How do we democratize the power of data? Asked what inhibits the democratization of data, Bruno Aziza from SiSense provided fantastic insights. Bruno outlined a few top inhibitors:

  1. Price: It’s currently too high a price to gain access to the data
  2. There’s an imbalance with the cost of storage versus crunching the data. It currently costs $1M to crunch 1TB of data versus the very cheap costs to store it.
  3. Although complex, we shouldn’t take an elitist closed approach to analyzing data. We should enable the consumer to analyze on their own without the need for experts.

Diego Oppenheimer from Microsoft touched upon the need to educate the consumer. With more and more easy-to-use tools being created, how do we reduce the risk of users drawing incorrect conclusions?

On the question of how we can make users more data savvy, Diego pointed out that the issue starts with the fact that data is not clean, which makes it unappealing for users to even get started. Diego mentioned that Microsoft has taken a visual, exploratory approach with its Data Explorer product offering.

4:10pm Top 5 Startup Pitch Event
Out of 20 startup applicants across the country, karmadata was chosen along with 4 other startups to present during the Startup Pitch event. The other startups include Algorithms.io, MarkedUp, Vertascale and Virtue. You can read about them here.

Sean Power presenting at #data2summit #startup pitch event.

4:50pm PANEL: Big Friendly Data: Making Big Data Accessible to Non-wizards
How much of their own data is the average organization using? Only 15%. Organizations today can improve their use of data by simply taking a closer look at their own data.

What is the holy grail? It’s being able to take any business problem, use the data you already have, and work with your current resources and team to reduce the time to market (to within 30 days).

5:20pm: Top Startup Announced
Algorithms.io was selected as the top startup.

karmadata_wordCloudPuzzle

People are inherently social.  Facebook, Twitter, Pinterest, LinkedIn, Spotify, FourSquare.  The social media list goes on.  And while many people avoid social media for reasons ranging from privacy concerns (more on this later) to not wanting to know what everyone is doing 24 hours a day (everyone has a friend or two who are social media spammers), the overall popularity of these sites indicates an almost unquenchable thirst for socializing, sharing, collaborating, and interacting.

What Facebook is to friends and photos, LinkedIn is to colleagues and work connections, and Spotify is to music fans and music, karmadata is to data consumers and data visualizations.  So even while we are heads down, programming, and buried in code, there is always an overriding sentiment in the back of our minds: we are building something social, collaborative, and most importantly, fun.

Another form of online interaction is blogging (like this one), and making data visualizations into mini-blog posts is the inspiration behind where we are heading with our datacard design.  The idea is that each datacard tests or validates a theory, and the user can then publish their insights on karmadata.  We are trying to make datacard creation as personalized, interactive, and fun as possible.  That means creating custom titles, descriptions, x and y-axis labels, and anything else that our user community can come up with.  We do our best to provide users with the basics, but our vision is that our users will take the value of the datacards to another level.  That means that an auto-generated y-axis of “# of Clinical Trials” can be quickly altered to “# of Phase III Leukemia Trials”.  Editing filters, seeing how it affects the data, customizing the metadata.  All of this should be fun.

That’s the fun part.  The social part is sharing that mini-blog with your friends and colleagues, engaging in comments back and forth, and leveraging the expertise of each other to answer questions and solve problems.  Or it can be finding a datacard that someone else has already created that answers the same question that you have.  This sharing and collaboration is the first half of our namesake.  The idea is that you get out of the community what you put into it.  Sharing is good karma.

Now, much like the person who wants to “stay off the grid”, we recognize that many data consumers will not want to share because they do not want others seeing what they are interested in. Many pharma companies, in particular, are paranoid about competitors knowing what they are up to, and in many cases these concerns are valid. But since we want everyone to share, our philosophy is that if you don’t want to share, then you have to pay to avoid sharing. In such a case, a company can pay for karmadata Plus to unlock functionality for everyone at their company to remain anonymous and closed off from the rest of the community (Plus users also receive other benefits like data download and upload of internal data, but that’s a story for another day).

In any case, we ask you to share your ideas with us about how to make the site more fun and social (because that’s good karma).

From time to time we’ll highlight a data set on karmadata.  Today I’ll provide a quick look at the NIH RePORTER grants database.

The RePORTER database (which replaced the old CRISP database) “provides access to reports, data, and analyses of NIH research activities, including information on NIH expenditures and the results of NIH supported research.”  In other words, we get to see our tax dollars at work.

When looking at these data sets I’ll try to highlight what is great about the source data/website (I can’t just be complaining all the time), and then highlight the value that we’re able to add.

The data itself (provided in both csv and XML) contains the funding agency (NIH, NCI, etc), the organization receiving the grant, the location, the principal investigators running the study, a list of terms associated with the project, and the amount funded for the project.  The RePORTER website has some pretty nice functionality for aggregating and ranking by those different entities.  You can play around with that tool here.  You can even map the data and drill down to view grants awarded to different states.  Neat.  The greatest limitation is probably the fact that you can only analyze the data one fiscal year at a time, but overall it’s a pretty nice presentation of the data.

The first thing I look for when I get my hands on a new dataset is the potential entities that we can standardize to.  This was a fun dataset for me because of all the entities that can be teased out.  In addition to the aforementioned entities, we were able to match the terms list to drugs and diseases.  The RePORTER database also provides an ID for the principal investigators, but unfortunately, much like the reviewer ID from BMIS, it is not unique.  We consolidate those entries.  We consolidate different company names to resolve to a unique ID, and then we are ready to go: city, state, country, organization, principal investigator, drug, disease, and time.  A robust database for both building our entity profiles and creating cool visualizations.
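To make the consolidation step concrete, here is a minimal Python sketch. It is not karmadata's actual pipeline; it assumes a simple rule-based approach in which each raw organization name is reduced to a comparison key (lowercased, punctuation stripped, a few noise words dropped), and funding is then summed across fiscal years by that key. The function names, the noise-word list, the field names, and the toy rows with invented amounts are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical noise words to ignore when comparing organization names.
NOISE_WORDS = {"the", "inc", "llc", "corp", "corporation", "co"}


def canonical_key(name):
    """Reduce a raw organization name to a simple comparison key."""
    # Lowercase, then turn anything that is not a letter, digit or space into a space.
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    words = [word for word in cleaned.split() if word not in NOISE_WORDS]
    return " ".join(words)


def total_funding_by_org(rows):
    """Sum grant amounts per consolidated organization, across all fiscal years."""
    totals = defaultdict(int)
    for row in rows:
        totals[canonical_key(row["org_name"])] += row["amount"]
    return dict(totals)


# Toy rows with invented amounts; these are NOT real RePORTER figures.
rows = [
    {"fiscal_year": 2009, "org_name": "Johns Hopkins University", "amount": 100},
    {"fiscal_year": 2010, "org_name": "JOHNS HOPKINS UNIVERSITY", "amount": 150},
    {"fiscal_year": 2010, "org_name": "The Johns Hopkins University.", "amount": 50},
    {"fiscal_year": 2010, "org_name": "Example Biotech, Inc.", "amount": 70},
]

totals = total_funding_by_org(rows)
for org, amount in sorted(totals.items(), key=lambda item: -item[1]):
    print(org, amount)
```

Because the grouping key ignores the fiscal year, the same few lines also get around the one-year-at-a-time limitation mentioned above. Real name matching needs much more than this (abbreviations, mergers, campuses), so treat the key function as a placeholder.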

Leading Organizations Receiving NIH Grant Funding

Johns Hopkins leads organizations receiving NIH grant funding

 

Some facts we have gleaned from the database:

  • Johns Hopkins leads the way in NIH funding since FY2000 (with more than $7.5 billion)
  • NIH funding increased steadily from 2000 until peaking in 2010 at $38 billion
  • Boston leads the way in funding over that time (score one for Boston in the Boston-New York rivalry)
  • NIH funding was not limited to the United States.  $5.4 billion was awarded outside the US since 2000, with South Africa leading the way

NIH grant funding trend

NIH grant funding peaked in FY2010

That should give you a flavor for what you can do with the dataset.  Try copying one of my datacards and discovering your own insights.

Keeping Open Data…Open

Sean Power, March 20, 2013

Data has meaning, and should inspire ideas and action, the same as words can and often do. Those meanings should be clear and easy to find AND understand, by anyone, anywhere in the world. Society needs to encourage and adopt a data-driven approach to everything, from the impossibly complex (think global warming or healthcare costs), to the commercial (efficiently targeting your customer base), to the much enjoyed social (debating your friends and colleagues on Twitter, LinkedIn or Facebook). Everyone knows “a picture says a thousand words”; well, so do facts, and more so facts that have been attractively visualized and easily shared. In a world where we are inundated with marketing messages touting the use of “Big Data” and “Open Data”, sharing and visualizing data seems like a no-brainer. If only that were true.

You see, there is a dirty little secret that the Big Data vendors and Open Data zealots won’t tell you, but we will.  Big Data is not open, and Open Data is just as closed.  What’s that you say?  It seems plausible that certain piles of Big Data are not open to the public, think patient health records, or detailed banking transactions.  We call this the Big Data Anonymity Problem, and we think we have a solution for it (more on that later).  But how can Open Data not be open?  I mean the word open is in its title!

This is where we reclaim the true meaning of the word Open as it relates to Data.

Issue #1: Open is not Free

I am sick and tired of the wrongheaded association of the word Open with Free. Let me set the record straight: nothing, and I mean absolutely nothing, is free. There is the appearance of free. Google seems free, but you are trading information about yourself to advertisers in exchange for the best search engine in the world, and let’s face it, when you need to find a restaurant while traveling, you don’t care about the nuance.

Wikipedia seems free, but most people don’t know that the online wiki has a $28 million annual operating budget and depends on individual and corporate donations (Google contributed $2 million in 2010) to survive.

Downloading data from the U.S. Government seems free, but every time you open your paycheck, take a look at that hefty federal and state tax that is paying for the collection and dissemination of data. The National Library of Medicine has spent $3.2bn over 10 years publishing Public Data, and that is just one example; there are thousands of others from the U.S. and around the world.

None of these examples are free; they just employ different revenue models (advertising, donations, taxes).

You see, in order to do a technology thing right, it takes resources, and lots of them. Programmers are stubborn like that; they need to get paid (yes, they have mortgages, and college tuitions, and car payments just like you and me). And last time I checked, the folks at Microsoft, Oracle, Dell and HP aren’t starting to give away their software, servers, and storage arrays for the common good! And while the marketing guys keep laying it on thick touting the “Cloud” (doesn’t it sound nice and fluffy?), the actual data centers and hardware that make the “Cloud” go aren’t going to be free any time soon (like ever). Read Richard Stallman’s (see GNU project and Free Software) take on this when he says,

“Think of free as in free speech, not as in free beer”.

I love this quote. Richard and I diverge on the best way to fund technology innovation, but for sure his heart is in the right place.

Issue #2: Open really means Public

I also reject the Open Knowledge Foundation’s interpretation of the word Open, partly because of their improper association of the word Open with Free, but mostly because what they are really talking about is Public Data, and Public Data does not always meet my definition of Open (see below). At some point someone in a meeting decided to replace the word Public with Open, and that was a mistake. Public Data really says it all: it is data that is owned by the Public because it was (wait for it…) PAID for by the citizens. Any government or regulatory data falls into this category, anywhere in the world (there are some economic and ethical issues around one country’s citizens paying for another country’s access to its data, but that is a topic for another blog post). Public Data also includes any Private Data whose owner decides for one reason or another to release or publish it to the Public (think press releases or public domain websites).

Taking Back the Word Open as it Relates to Data

In my book the word Open, as it pertains to Data, means:

  1. Accessible and Useful.  And no you Open Data zealots, a zip file of XML formatted records is not easily accessible nor is it useful …. I mean easily accessible and useful to folks that are not computer programmers.  I want end users around the world with the familiarity of just a real web browser (and IE < 8 is not real) to access the highest quality data that exists, for
    • Free (Basic accounts: advertising model…thanks Google!);
    • Cheap (Premium Accounts: as low as $9.99 a month); and
    • Fair prices (Plus accounts: starting at $2,500 a year, scales based on organization size, for those that need anonymity and data to be integrated into their workflow and systems).
    • Our freemium (Basic, Premium) accounts require no corporate subscription, as we are going direct to the end user (heads are exploding in the board rooms of Thomson Reuters, McGraw-Hill, Informa, Bloomberg and the like as you are reading this).
  2. Standardized and Linked.  You can almost stop reading here.  Until data is standardized with all of the interesting entities (like companies, people, products, countries, cities, etc.), it is really quite useless.  Standardized data is intelligent data.  Standardized data can be linked to other interesting data sets, allowing you to see the entire picture about a person, place or thing.  You can build alerting systems off standardized data.  Standardized data can be analyzed and visualized.  Standardized data is the s***.  Without standardized data, you don’t have data, you have a big pile of goo.  And by the way, even you computer programmers out there who can deal with the XML, the parsing, the database normalization and the indexing will quite obviously appreciate and value standardized data so much more.
  3. Searchable.  It sounds obvious, and it sort of is.  Until you realize that a critical Public Data set like the FDA’s Adverse Event Reporting System doesn’t have a search interface.  Wow.  We believe that even structured data needs a simple, single text box search.  If I want stuff on China, or Pfizer, or Pancreatic Cancer, I just want to type it and go.  Yup, we have that.
  4. Query Ready (it can be analyzed and aggregated).  Data needs to be queried, like a dog needs to be walked.  The data is just begging for it.  Dynamic query engines are gnarly to build, and we have a great one for you to use.
  5. Visualized.   Facts are cool, we love facts.  And sometimes all you want is just the facts ma’am.  But nothing makes your point for you like an awesome visualization, and we are dedicated to helping you build beautiful visualizations.
  6. Easily Shared.  Last, and most important, data needs to be shared.  And in order for that to happen, it has to be easy to share.  If it is not easy to share, it isn’t Open.  Data needs to be social, and portable, and re-usable.  Your friends and colleagues should be able to build on what you started, copying, editing and enhancing to suit their needs.  This is the karma behind karmadata.
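For the programmers in the audience, points 2 and 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of the karmadata API: the alias table, the `ORG-1` identifier, the field names, and the sample records are all hypothetical, and a real entity registry would be far larger and would need fuzzier matching.

```python
from collections import Counter

# Hypothetical alias table mapping raw names to one standardized entity ID.
ENTITY_IDS = {
    "pfizer": "ORG-1",
    "pfizer inc": "ORG-1",
    "pfizer, inc.": "ORG-1",
}


def entity_id(raw_name):
    """Look up the standardized ID for a raw name, or None if it is unknown."""
    return ENTITY_IDS.get(raw_name.strip().lower())


# Two unrelated toy data sets that spell the same organization differently.
trials = [
    {"sponsor": "Pfizer Inc", "phase": "III"},
    {"sponsor": "PFIZER", "phase": "II"},
]
grants = [
    {"recipient": "Pfizer, Inc.", "year": 2010},
]


def count_by_entity(records, name_field):
    """Aggregate record counts per standardized entity ID."""
    counts = Counter()
    for record in records:
        eid = entity_id(record[name_field])
        if eid is not None:
            counts[eid] += 1
    return counts


trial_counts = count_by_entity(trials, "sponsor")
grant_counts = count_by_entity(grants, "recipient")


def linked_profile(eid):
    """Combine the per-source counts into one view of the entity."""
    return {
        "entity_id": eid,
        "trials": trial_counts[eid],
        "grants": grant_counts[eid],
    }


print(linked_profile("ORG-1"))
```

Once every source resolves to the same ID, linking is just a lookup and querying is just a group-by. Without the shared ID, the three spellings above would look like three different organizations.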

And Public (sigh “Open”) Data fails miserably on most, if not all, of these points.  Some Public Data is better than others, but few are great and none are linked to each other.  I started karmadata to help fulfill the promise of Open Data (and Big Data…and Private Data!).  Stay tuned, another blog post is coming on that pesky Big Data Anonymity Problem and our ideas on how to open private data up.