Machine learning in schizophrenia genomics, a case-control study using 5,090 exomes

“The supervised ML algorithm, gradient boosted trees with regularization (eXtreme Gradient Boosting implementation) was the best performing algorithm yielding promising results (accuracy: 85.7%, specificity: 86.6%, sensitivity: 84.9%, area under the receiver-operator characteristic curve: 0.95). The top 50 features (genes) of the algorithm were analyzed using bioinformatics resources for new insights about the pathophysiology of SCZ. This manuscript presents a novel predictor which could potentially enable studies exploring disease-modifying intervention in the early stages of the disease.”
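For anyone unfamiliar with the metrics quoted in the abstract, here is a minimal sketch of how accuracy, sensitivity, and specificity come out of a confusion matrix. The counts are hypothetical, chosen only to land near the reported percentages; they are not the paper's data, and the function name is my own.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true cases the model flags as cases
    specificity = tn / (tn + fp)  # share of controls the model correctly clears
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all calls that are right
    return accuracy, sensitivity, specificity


# Hypothetical counts (NOT from the paper): 1,000 cases and 1,000 controls.
acc, sens, spec = confusion_metrics(tp=849, fn=151, tn=866, fp=134)
print(f"accuracy={acc:.4f} sensitivity={sens:.4f} specificity={spec:.4f}")
```

Note that sensitivity is the number that tells you how many affected people are missed: at roughly 85%, about 15 in every 100 cases would go undetected.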


Any research on schizophrenia is very welcome.
Thank you for posting, @twinklestars



Re: genomic tests


That’s a massive increase from what I’ve seen previously! I believe I said something a while ago about being very skeptical of the possibility of this happening soon.

Did they use case history data as well as genetic data? I’ve turned off my computer and I have a cold and a fever so I don’t want to make the effort and check… :roll_eyes:
If they didn’t use case history data then I’d assume they are probably about as good as they’ll get with purely genetic tests now. At least until they get to the epigenetic part. And they could improve the tests a lot by adding case history data.


I’ve only read the abstract, but it appears they achieved this level of accuracy solely based on genetic information.

" …file containing all cases and controls, the names of genes with variants meeting our criteria, and the number of variants per gene for each individual, was used for ML analysis. The supervised machine-learning algorithm used the patterns of variants observed in the different genes to determine which subset of genes can best predict that an individual is affected."
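To make that input format concrete, here is a rough sketch of what a per-individual, per-gene variant-count table could look like. Everything here is an assumption for illustration: the gene names, the sample IDs, and the `build_feature_matrix` helper are made up, and the paper's actual file layout and filtering criteria may differ.

```python
from collections import Counter


def build_feature_matrix(variant_calls, genes):
    """Turn per-individual variant lists into rows of per-gene variant counts.

    variant_calls maps an individual ID to a list of gene names, with one
    entry per qualifying variant found in that individual.
    """
    rows = {}
    for individual, hit_genes in variant_calls.items():
        counts = Counter(hit_genes)
        # One column per gene, in a fixed order; genes with no variants get 0.
        rows[individual] = [counts.get(gene, 0) for gene in genes]
    return rows


# Hypothetical toy data; gene names and sample IDs are placeholders.
genes = ["GENE_A", "GENE_B", "GENE_C"]
variant_calls = {
    "case_01": ["GENE_A", "GENE_A", "GENE_C"],
    "control_01": ["GENE_B"],
}
labels = {"case_01": 1, "control_01": 0}  # 1 = affected, 0 = control

matrix = build_feature_matrix(variant_calls, genes)
for individual, row in matrix.items():
    print(individual, row, "label:", labels[individual])
```

A table like this, together with the case/control labels, is the kind of input a supervised classifier such as gradient boosted trees would be trained on, with each gene acting as one feature.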

It’s pretty spectacular, especially, one would think, for identifying candidates for early intervention.

Possibly the remaining 15% or so of cases it did not detect could be due to environmental factors and their epigenetic effects, like childhood adversity, maternal immune activation, drug use, etc.

Yeah, one would think combining this with brain scans, interviews, and in the future, perhaps proteomic testing, you could get some really accurate results.