Saturday, 17 October 2015

The Medical Graph: Not Just An App: The Front End Of The Trillion-Dollar, Full-Stack Revolution

I've been thinking about this for some time, but conversations with Colin Simpson of Triscribe, Stephanie Zhan of Sequoia, Robert Leong of KallDoc and Vinod Khosla have led me to the concept of the Medical Graph as a dataset, by analogy with Zuckerberg's concept of the social graph.


So when you look at concepts like KallDoc, what you are looking at is Not Just An App: it is in fact The Front End Of The Trillion-Dollar, Full-Stack Revolution, the access point to patients at the point of primary healthcare delivery. The thin end of the wedge, if you like.

As you remember, the social graph in the Internet context is a graph that depicts personal relations of internet users. In short, it is a social network, where the word graph has been taken from graph theory to emphasize that rigorous mathematical analysis will be applied as opposed to the relational representation in a social network.

The medical graph, by analogy, would be a multidimensional graph (and, more importantly, the underlying data) depicting medical relationships not just of a single individual but of groups, pathologies, treatments, outcomes etc., and encompassing both deep personal data and epidemiological data. Again, the word graph has been taken from graph theory to emphasize that rigorous mathematical analysis will be applied, as opposed to traditional medical methodologies like case studies (which would be a single vector on such a graph).

Currently, much of the information that would form the Medical Graph exists but is not connected: current (2015) electronic medical records (EMR), the corpus of medical literature, and more global data such as the data derived from the Human Genome Project (HGP) and data sets such as the Icelandic Health Sector Database (HSD).

Ethically, it is likely that "ownership" of personal data on the medical graph will rest primarily with the individual patient, but to participate in and benefit from shared data sets, patients will need to take part in sharing arrangements. Of course, interpretation of big data is a key part of the medical graph, and the key value will be in the combination of algorithms, data and insights, as is always the case.

A number of knowledgeable people have told me that the Medical Graph market is not big enough to excite their interest, much as I was told by a number of angels and VC funds that downloadable music and video was too small a niche, but as before I beg to differ.

Friday, 16 October 2015

Natural Philosophy Society NPS Oxford 30 Years On

It's now 30 years since the Natural Philosophy Society was formed at Oxford University. It was an "experimental philosophical club" run weekly (latterly bi-weekly, with a weekly speaker meeting and a second discussion meeting) and styled on the Oxford Philosophical Club, which it succeeded.

Sunday, 11 October 2015

Characterizing the Google Books Corpus: Mitigating the effect of putative influencers who have a low dissemination level.

Pechenick, Danforth & Dodds recently suggest [1] that it is tempting to treat frequency trends from the Google Books data sets as indicators of the "true" popularity of various words and phrases, but that a single, prolific author is able to noticeably insert new phrases into the Google Books lexicon, whether the author is widely read or not. They say that this calls into question the vast majority of existing claims drawn from the Google Books corpus.

I'd like to humbly suggest a new methodology - that such claims are weighted by the citation index of the book, which should mitigate the "unread influencer" syndrome. That is, each occurrence in the frequency count is multiplied by the citation index of the publication or book in which the phrase occurs. If the book is never cited, it does not count. This should correct for the prolific, unread (or at least uncited) author.
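
To make the weighting concrete, here is a rough sketch in Python. It assumes each book is available as full text alongside some citation count; the field names ("text", "citations"), the function name and the two toy books are all made up for illustration and are not drawn from any Google Books or Google Scholar data.

```python
def phrase_counts(books, phrase):
    """Return (raw, weighted) counts of `phrase` across `books`.

    Each book is a dict with hypothetical keys:
      "text"      - the full text of the book
      "citations" - the book's citation index (0 if never cited)
    """
    raw = 0
    weighted = 0
    needle = phrase.lower()
    for book in books:
        # Non-overlapping, case-insensitive occurrences in this book.
        occurrences = book["text"].lower().count(needle)
        raw += occurrences
        # Each occurrence is multiplied by the book's citation index,
        # so an uncited book contributes nothing to the weighted count.
        weighted += occurrences * book["citations"]
    return raw, weighted


# Toy data (invented): one prolific but uncited author, one cited book.
books = [
    {"text": "zeta flux " * 50, "citations": 0},
    {"text": "The zeta flux appears once here.", "citations": 12},
]

raw, weighted = phrase_counts(books, "zeta flux")
print(raw, weighted)
```

In this toy case the uncited book dominates the raw count but drops out of the weighted one entirely, which is the correction being proposed. A real version would have to decide how to normalise citation counts across years and fields, which this sketch does not attempt.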


Post Script: "Neuroskeptic" observes quite rightly that the "books dataset" as expressed in the NGram viewer does not contain this information. The primary data are, of course, the books themselves rather than the "books dataset", and, as is often the case, we have to go back to primary data rather than a flawed subset of the data. At the very least, the books dataset could be used alongside Google Scholar - hardly a great challenge for serious work.