Posted on May 15, 2017.

I had this idea a while ago for leveraging the blockchain to build, store, surface, and monetize the largest fully sequenced genomic data set in the world. The premise was that the economics and incentives are not in place today for each stakeholder that would need to participate, but that a blockchain protocol and corresponding coin design could be written to provide the terms by which participants could and would contribute sequenced genome data, pay/earn economics, attain/grant data access, and perhaps even communicate/participate in higher value interactions (e.g. clinical trials, EMR metadata transmission). The participants would be healthy people, patients, researchers, pharma companies, hospitals, and anyone else with an interest in advancing genomic research at the system level, monetizing owned genomic data, or querying/accessing the largest repository of said data in the world.

I never fully fleshed it out, nor do I really plan to, but it just seems cool to me to be able to anonymously post my sequenced DNA to the blockchain, let researchers, computer scientists, or care providers query my and many other people’s information blindly, and then potentially earn coin (money) when my data contributes to a party that stands to gain economically from my raw information, but only when it’s appended to the metadata of my lifelong EMR and/or my willingness to participate in some clinical trial, etc. It also seems obvious to me that the actors in the system are not willing to finance this collective dataset today, because it’s not in their individual interests to do so without an attribution and downstream economics protocol that is clearly defined, objectively self-governed and authenticated, and organized in such a way that system level gains represent economic gains for the parties that pay it forward (and thereby speculate) on the future value of this collective effort.

There are a bunch of rough edges, but I think you can work a system like this out. I also think the speculative nature of financial capital flowing into the ICO world would serve to smooth out some of the economic speed bumps that are keeping enterprises from taking this task on. And I think that even with levels of initial genome contribution/participation that are “small” by information technology data set standards, the aggregate asset would have near term market value in a world where clinical trials are run only at the scale of tens or hundreds of patients, and where even the largest genomic datasets exist only in the tens of thousands or maybe 6 figure patient realm.

I know it sounds a little pie in the sky to think that there is genuine convergence available at the nexus of blockchain technology, large scale data processing / AI techniques, and genomic research / application…given that these are three super sound-bitey forward tech areas. But I am certain that our collective genomic data set is way too small to apply modern data analysis techniques for progress in the near and out markets within genomics, I am certain that there is an economic and societal alignment within reach on a long enough timescale with baked in trust between all participants, and I am certain that the shape of this project is a great fit for a coin based project running on the blockchain, for too many reasons to write about in this post.

I’m not gonna do anything with this idea, so if you like it, take it, make it happen, and break me off an allocation of coin hardcoded into the first block on the chain. Pasting some of my past notes below. Maybe they are incomprehensible, but if you have been thinking in these spaces, I think they could be helpful.

If anyone is working on anything in this area, email me, I’d love to invest pre-ICO and help get the project to market. If this post is inspiring and you decide to start working on it…same deal. Lots to figure out, totally worth it!


Blockchain based genome database

1) sample of biz opportunity, albeit not my preferred approach:

2) earn coin by posting genome, earn coin by showing up in search results, earn coin by allowing contact/communication, earn coin by participating in trials

3) non-economic incentives: as in all good coin projects, some participants don’t act in a zero-sum economic way because of ideals. here it would be advancing cancer therapy.

4) patients who had it done in course of treatment get excess income for free (insurance co or they paid already). similar to sharing excess compute resources.

5) pharma companies and researchers buy the coin and spend it in exchange for querying the database, contacting the patients, and ideally assembling trials.

6) patients or contributors may have access to query it for free

7) speculators bet on increased value of the largest genomic dataset in the world and increased volume of genomic research and targeted treatment plus other verticals

8) sequence is posted anonymously with report behind a paywall maybe? can we authenticate sequence on chain?

9) sequence acts as public id and is discoverable/analyzable without patient info or EMR metadata. demand side can identify genomes of interest without knowing anything about identity of patient. Raw string is good input for comparison analysis, etc.

10) Start with cancer genomes to seed database, but there is value in “normal genomes” and analysis of genome before cancer is expressed. it will become standard of care to sequence entire genome for every tumor, but currently insurance doesn’t cover if no targeted therapy available for that type (chicken and egg).

11) blockchain is great solution for chicken and egg problems because speculative capital can spread over early marketplace to align incentives pre-liquidity

12) network effect. the more people who post their genome, the more valuable the dataset is to query and the more valuable the coin becomes

13) There is no incentive for an individual insurance co to finance a genome w/out direct care application, but there is an incentive for a consortium/all to chip in together. similar structure to private chain consortiums in fin services between banks

14) the equivalent of sponsored content in genomic sequencing is interesting. defining a “sponsor” role in the protocol could be meaningful. certain rights, and downstream economics attached? maybe sponsoring sequencing is a way to speculate/earn coin (rev share). financial speculator as opposed to financing from end consumer/user of data (pharma for example).
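
To make notes 2, 5, and 9 a bit more concrete, here is a rough toy sketch of the coin flow, not a real protocol design. It is a plain in-memory Python class standing in for the chain. Everything named here (`ToyGenomeLedger`, `POST_REWARD`, `HIT_PRICE`, the substring "query") is made up for illustration, and the reward amounts are arbitrary. It also uses a SHA-256 hash of the sequence as the pseudonymous id, which is an assumption on my part; the notes only say the sequence itself acts as the public id.

```python
import hashlib
from collections import defaultdict

# Hypothetical placeholder amounts; the notes do not specify any numbers.
POST_REWARD = 10  # coin credited to a contributor for posting a genome (note 2)
HIT_PRICE = 2     # coin a querier pays to each contributor whose genome matches


class ToyGenomeLedger:
    """In-memory stand-in for the chain: coin balances plus posted sequences."""

    def __init__(self):
        self.balances = defaultdict(int)  # account id -> coin balance
        self.genomes = {}                 # genome id -> raw sequence string

    def buy_coin(self, account, amount):
        # Stand-in for a pharma company or researcher buying coin (note 5).
        self.balances[account] += amount

    def post_genome(self, sequence):
        # Assumption: the hash of the sequence is the pseudonymous public id,
        # so no patient identity is attached to the record (note 9).
        genome_id = hashlib.sha256(sequence.encode("ascii")).hexdigest()
        if genome_id not in self.genomes:
            self.genomes[genome_id] = sequence
            self.balances[genome_id] += POST_REWARD
        return genome_id

    def query(self, account, motif):
        # A naive substring search stands in for real genomic analysis.
        hits = [gid for gid, seq in self.genomes.items() if motif in seq]
        cost = HIT_PRICE * len(hits)
        if self.balances[account] < cost:
            raise ValueError("insufficient coin for this query")
        # Coin moves from the querier to every matching contributor.
        self.balances[account] -= cost
        for gid in hits:
            self.balances[gid] += HIT_PRICE
        return hits


ledger = ToyGenomeLedger()
first = ledger.post_genome("ACGTTGCA")
second = ledger.post_genome("TTTTACGT")
ledger.buy_coin("pharma", 20)
matches = ledger.query("pharma", "ACGT")  # both toy sequences contain the motif
```

The point of the sketch is just the direction of the money: contributors get paid for posting and for showing up in results, and the demand side funds that by buying coin. Contact, trial participation, and the sponsor role from note 14 would be additional paid actions on the same ledger.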

extra notes

  • figure out financing of widespread full genome sequencing in healthy patients/people
  • apparently data storage is an issue given size of dataset…solvable on chain via storj like solution? any benefit to integrating that job into protocol vs using storj or another 3rd party layer in the stack?
  • is there a capacity constraint on full genome sequencing if volume rises abruptly? who serves this market as 3rd party?
  • genecoin “store your DNA on the bitcoin blockchain”…early project, pre appcoin it looks like
  • genome rights management is an interesting question
  • michigan pool:
  • genome rights management:

