A talk with Dr. Neguine Rezaii, a psychiatrist and psychology researcher, about her team’s 2019 research that used machine learning to find speech patterns in young adults predictive of a later psychosis or schizophrenia diagnosis. The two language patterns found in the subjects’ speech were 1) low semantic density (i.e., low meaning per word), and 2) speech related to sound or voices. Here’s a good article about this work: Machine learning approach predicts emergence of psychosis.
A transcript is below.
Topics discussed include:
- How exactly they determined “low semantic density” (low meaning in speech content)
- How the algorithm found, on its own, indicators related to sound-related speech content
- The future of using machine learning and automatic diagnosis tools in psychology and therapy
- Theories that might help explain these findings
Content mentioned in podcast, or related content:
- The whisper of schizophrenia: Machine learning finds ‘sound’ words predict psychosis
- NIH article: Language patterns may predict psychosis
- Great book on schizophrenia: Hidden Valley Road, about a family that had 6 boys end up diagnosed with schizophrenia
- Work by Elaine Walker (part of Rezaii’s team for this research) finding behavior in children in home videos linked to later psychosis
- Information about how sensory gating issues may be related to schizophrenia
- Vice article about how people born blind don’t get schizophrenia
TRANSCRIPT
[Note: all transcripts will contain some errors.]
Zach: Welcome to the People Who Read People podcast, where I talk to people from various walks of life about how they analyze and predict human behavior. I’m your host Zachary Elwood.
In this interview, recorded on July 10th, 2020, I interview Dr. Neguine Rezaii, a psychiatrist and psychology researcher.
In 2019, in the journal NPJ Schizophrenia, Dr. Rezaii and her colleagues Elaine Walker and Phillip Wolff published a paper entitled “A machine learning approach to predicting psychosis using semantic density and latent content analysis.”
In that paper, they describe their work using machine learning to analyze the spoken speech of young people considered prodromal for psychotic symptoms: “prodromal” in this context means showing some behavioral indicators that are associated with later schizophrenic or psychotic symptoms. Their machine learning algorithm was able to predict with high accuracy which of the subjects would go on to develop psychosis. There were two indicators in the subjects’ speech that they found:
One was that the subject’s speech had “low semantic density.” In other words, the subject’s speech did not contain much meaning, word for word, compared to more normal speech.
The second indicator was that there was a greater-than-normal percentage of words pertaining to sound. For example, a subject talking about hearing whispers, or voices, or really any auditory-related word.
We did have some technical difficulties during this interview, so apologies for the audio problems. If you enjoy this podcast, please leave a rating or review on the platform you listen on; it’s much appreciated.
Okay, here’s the interview.
Zach: Thanks for coming on, Dr. Rezaii.
Neguine: Sure. It’s a pleasure.
Zach: Very honored to have you on here.
Your work is very interesting. I’ve been reading multiple articles about it. Let’s start out with, um, if you had to sum up this work in a few sentences, how good was the algorithm at predicting, uh, future psychotic, schizophrenic episodes?
Neguine: This algorithm takes into account two important features of, um, schizophrenia.
One of them is poverty of thought, and the other one, vague or implicit reference to voices or abnormal auditory perception. Having these two, uh, variables in the [00:03:00] equation, we can get an accuracy of 93 percent in predicting, um, schizophrenia after a two-year follow-up.
Zach: And how young were the subjects?
Neguine: These patients were mostly teenagers, and, um, that was the goal of the NAPLS study, uh, the North American Prodrome Longitudinal Study, which is a nationwide, um, program across seven sites. They follow up young adolescents who have the genetic pool for developing, um, schizophrenia, mostly because of the fact that they have a relative with schizophrenia, uh, a first-degree relative.
The goal is to follow up these, uh, young individuals when they’re very young so that they can follow them up and, um, consider as many variables as they could.
Zach: So were these patterns that the algorithm found in these, uh, [00:04:00] subjects, were these things that a human could find or notice or were they too subtle for, for humans to notice them?
Neguine: That’s a very good question. Yes and no. For this specific population, no, because they’re at prodromal or, um, very, very early stages of, um, becoming psychotic. So the signal is too vague and subtle for, um, a human observer to appreciate. And the yes part goes to the fact that once it becomes full-blown schizophrenia, then the clinician will be able to observe those, because the patient will clearly talk about auditory hallucinations. But it becomes much more tricky when it is, like, years before the frank psychosis. Hmm.
Zach: Right. I saw a quote from you in an article about it where you had said, uh, trying to hear these subtleties in [00:05:00] conversations with people is like trying to see microscopic germs with your eyes. The automatic technique we’ve developed is a really sensitive tool to detect these hidden patterns.
It’s like a microscope for warning signs of psychosis. So, yeah, that’s interesting. And now I guess we can talk about the two specific indicators and what those actually look like. So maybe first you could talk about the semantic density, uh, pattern and how that actually shows up and how the algorithm would detect that.
Neguine: So, semantic density, broadly speaking, um, it basically measures poverty of thought. We as clinicians, or I, during my residency, had a lot of experiences where I talked to a patient maybe for about an hour and the patient really talks, so there is no paucity of speech. But by the end of the one-hour interview, uh, I realized [00:06:00] that I understood nothing.
I got nothing; it boiled down to nothing at the end of the, um, conversation, even though they talk. And that had shown to be a very, uh, difficult, um, quality, um, to measure. So this algorithm tries to, um, measure it. The way it works is that we first need to turn the words, which are qualitative, uh, measures, into something quantitative.
In order to do that, we use some of the techniques that are out there, different types of language models developed by various companies. The one that we used, uh, was, um, developed by Google, a language model called word2vec, which basically transforms words into vectors. The way it works, and I’m just talking about how [00:07:00] word2vec works, is that it takes, um, as input a very, very large corpus of language samples.
Usually they use Wikipedia; in our case, we used the New York Times, like 25 years of the New York Times. And then they just measure what word usually accompanies the other one. For instance, the word that usually accompanies bookcase is book, rather than, uh, toad or frog. So, um, by just, um, measuring how, uh, probable it is for two words to be together, they develop a multidimensional space.
In a way that words that appear together a lot more frequently would be much closer to each other, in comparison with words that rarely co-appear.
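To make this concrete, here is a small, self-contained sketch of the co-occurrence idea that word2vec builds on. Note this is not word2vec itself, which trains a neural network on a massive corpus; the four-sentence corpus below is invented purely for illustration:

```python
import math
from collections import Counter
from itertools import combinations

# Toy corpus standing in for the ~25 years of New York Times text the
# team actually used (these sentences are invented for illustration).
corpus = [
    "the book sat on the bookcase shelf",
    "she put the book back in the bookcase",
    "a frog and a toad sat by the pond",
    "the toad watched the frog jump in the pond",
]

# Build a vocabulary and count how often each pair of words appears in
# the same sentence -- the co-occurrence signal word2vec also exploits.
vocab = sorted({w for line in corpus for w in line.split()})
cooc = Counter()
for line in corpus:
    for a, b in combinations(sorted(set(line.split())), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

def vector(word):
    # Each word's "vector" is its row of co-occurrence counts.
    return [cooc[(word, other)] for other in vocab]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

print(cosine(vector("book"), vector("bookcase")))
print(cosine(vector("frog"), vector("bookcase")))
```

Running this gives a higher cosine for book/bookcase than for frog/bookcase, the kind of proximity Rezaii describes: words that co-appear often end up close together in the space.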
Zach: and the words that are closer to each other. If there are a lot of those words, that’s a lower [00:08:00] semantic, uh, value or density. Is that correct?
Neguine: So we are getting to the semantic density.
So this is just how the algorithm, the neural language model that Google developed, works. They turn large samples of text into a big, big space, where words that are closer together co-appear a lot more frequently. That’s what it does. And you can connect these words through vectors, from one word to the other, just like a vector from, you know, point (0,0) to point (0,1).
You can draw a vector from any word to another word.
Zach: And vector in this sense just means like a distance from something. Is that correct?
Neguine: It is. And it also, you know, has other properties of a vector. It has a
Zach: Direction?
Neguine: That’s correct.
Exactly. And you can add them. So technically each word would be associated with one vector. [00:09:00] It’s very nice, because when you have words as numbers, you can do whatever you want to do with them. Like you can add them. If in a sentence you have got five words, you can add all the vectors, the numbers associated with each word in that sentence, and that big number would, um, represent that sentence. And there are so many, uh, properties of that space. All the credit goes to Google, that developed it. You know, when you subtract them, or, um, what if the vectors of two words are parallel to each other, what does that mean?
It seems like it keeps some analogy to it. Um, so there are some properties that even the developers of the space did not expect to happen. But after analyzing the space, they found all these cool properties. So we use that space in order to measure semantic density. What [00:10:00] we did is we added all words in a sentence for all participants.
So we had a big vector called the sentence vector. It just adds up all the, um, word vectors associated with each sentence. So let’s say you wanna break the number five. There are different ways to just break it down. Five can be one plus one plus one plus one plus one. I hope I said five.
Zach: Close enough.
Neguine: Yeah. Or you can say two plus two, um, plus one, or you can say two plus three. Um, so there are different ways to do that. The same thing with the sentence vector. When we try to break it, it’s just the same way as we break a number into its, like, components. It’s just a rough analogy.
Not the perfect analogy, but I just wanted to say, like, um, we add them and then we can [00:11:00] see what parts created the original sentence vector. If a sentence is too impoverished in content, the number of components that make up that sentence is a lot lower; the number is just fewer. Fewer vectors associated with that sentence vector.
When a sentence is very rich and informative, when you break it down, you get a lot more components of meaning, what we called in our paper meaning vectors. So by just counting how many meaning components you get when you break up a sentence vector, it turned out to be a very good predictor of future development of psychosis.
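Rezaii’s number-breaking analogy can be sketched in code. Everything below is a hypothetical toy: the 3-dimensional embeddings are invented, and the greedy decomposition only loosely mirrors the paper’s actual method of breaking a sentence vector into meaning vectors:

```python
import math

def add(u, v): return [x + y for x, y in zip(u, v)]
def sub(u, v): return [x - y for x, y in zip(u, v)]
def norm(u): return math.sqrt(sum(x * x for x in u))
def cosine(u, v):
    d = norm(u) * norm(v)
    return sum(x * y for x, y in zip(u, v)) / d if d else 0.0

# Invented 3-d embeddings (real word2vec vectors have hundreds of
# dimensions; these values are made up for illustration only).
emb = {
    "view":   [0.9, 0.1, 0.0],
    "latest": [0.1, 0.9, 0.1],
    "news":   [0.0, 0.2, 0.9],
    "things": [0.5, 0.5, 0.5],
    "are":    [0.4, 0.4, 0.4],
}

def sentence_vector(words):
    total = [0.0, 0.0, 0.0]
    for w in words:
        total = add(total, emb[w])
    return total

def meaning_components(words, threshold=0.1):
    """Greedy sketch of the decomposition: peel off the word vector most
    aligned with the residual until little signal is left, and count how
    many distinct components were needed."""
    residual = sentence_vector(words)
    candidates = list(emb)
    components = 0
    while norm(residual) > threshold and candidates:
        best = max(candidates, key=lambda w: cosine(residual, emb[w]))
        if cosine(residual, emb[best]) <= 0:
            break
        # Project the residual onto the chosen component and remove it.
        scale = sum(x * y for x, y in zip(residual, emb[best])) / sum(
            x * x for x in emb[best])
        residual = sub(residual, [scale * x for x in emb[best]])
        candidates.remove(best)
        components += 1
    return components

print(meaning_components(["view", "latest", "news"]))
print(meaning_components(["things", "are", "things"]))
```

On these toy vectors, the redundant sentence decomposes into fewer meaning components than the informative one, which is the signal the density measure counts.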
And in order to get a sense that it is really measuring thought content, or richness of a sentence, we ran this on Amazon Mechanical Turk. We asked, uh, numerous, uh, judges just to read a sentence [00:12:00] and give a score to it in terms of how semantically rich that sentence is. From zero to ten, they just picked, you know, a number, ten being the richest sentence.
And then we correlated that with what we got from our algorithm. And the correlation was significant. It was just an indirect way of showing maybe it is actually measuring what it is supposed to measure. Because sometimes you think that it is doing that job; checking validity is asking, is it really doing that?
So this is one way, not perfect, but one way to make sure, or be more certain, that, um, it is measuring semantic density.
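The validation step she describes amounts to computing a correlation between the algorithm’s density scores and the human richness ratings. A minimal sketch, with both columns of numbers invented for illustration (the real ratings came from Amazon Mechanical Turk judges):

```python
import math

# Hypothetical data: semantic-density scores from the algorithm for ten
# sentences, and mean 0-10 richness ratings from human judges for the
# same sentences. Both columns are invented for illustration.
algorithm_scores = [0.31, 0.45, 0.52, 0.60, 0.38, 0.71, 0.66, 0.49, 0.80, 0.35]
human_ratings =    [2.1,  4.0,  5.2,  6.1,  3.0,  7.4,  6.8,  4.5,  8.9,  2.7]

def pearson(xs, ys):
    """Pearson correlation coefficient: the kind of sanity check used to
    see whether the density measure tracks human judgments of richness."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(algorithm_scores, human_ratings)
print(round(r, 3))  # close to 1.0 for these invented, strongly related columns
```

A high positive r is the indirect evidence she mentions: if the algorithm's scores track human judgments, the measure is plausibly capturing what it is meant to capture.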
Zach: So, uh, for a real-world example, something I saw in one of the papers about your work: one sentence that was an example of low semantic density was “sometimes things are things.”
So basically a very vague, you know, ambiguous, [00:13:00] low-meaning sentence. And then an example sentence for high density was “view the latest news.” You know, both of the sentences have four words, but one clearly contains more information. So that maps over to, yeah, what you were talking about with the vectors going in different directions and different distances from each other.
Neguine: Yeah, exactly. And these are very good examples. The two components that we think explained the effect were the specific choice of words, how specific the words are. For instance, thing: it is a very non-specific term. So that sentence had a low semantic density for two reasons. One, the selection of the word itself, thing versus news, for instance, which is a more specific term, and also redundancy.
It seems like the algorithm was very sensitive to redundancy. “Things are things”: you see a lot of redundancy, which happens in schizophrenia. [00:14:00] Again, this needs to be tested separately. This is just an intuition about what’s going on: that, um, their thought patterns seem to be a little bit circular, so they just do not move on from where they are. It’s like they’re stuck in a place and they say the same thing over and over.
Sometimes different words for the same concept, or sometimes the exact same words. So all of these parameters result in a lower-density sentence.
Zach: Now, uh, one question I had reading this was, you know, schizophrenia is kind of a general term, and it’s been criticized a lot. It’s kind of a classification that includes a lot of different behaviors.
Was there any finding about, like, specific categories of schizophrenia, or types of psychosis, that this was associated with? Um, if that makes sense.
Neguine: Yeah. Uh, we did not do that. And, um, the main reason is, in DSM-IV, the Diagnostic and Statistical Manual, this is the main, um, textbook for psychiatry.[00:15:00]
And there are different versions. Usually there are some, you know, big philosophical changes from one generation to the other. From four to five, there was not much philosophical thought or consideration going on, but, like, um, making it more clinically relevant. Um, in DSM-IV, they had these subtypes of schizophrenia, uh, like the disorganized one or the paranoid one, different types.
Um, but with time, the researchers or clinicians, um, realized that there is not much value in such, um, classifications. So in DSM-5, they dropped those classifications. They thought that this is just, you know, an extra step that does not help that much in terms of, like, selection of treatment or, um, you know, other clinical variables.
So they just dropped it and they felt that, um, it’s [00:16:00] best not to go with classification. At least the ones that they were using in DSM-IV were not helpful ones. Maybe in the future there’s a better, uh, system of classification. But since then, according to the most updated ones, there was nothing there. We didn’t correlate it with different types.
Zach: Gotcha. So for the purposes of this study, when you said it was correlated with later, um, schizophrenic symptoms, does that just mean that the subjects were basically diagnosed later with schizophrenia, however that happened? Is that accurate to say?
Neguine: Exactly. Okay. Usually that is, like, a categorical, uh, phenomenon. Like, um, they develop, like, frank auditory hallucinations, you know. The nice thing about it, versus depression, where you are not quite sure, is that it’s a lot more explicit. Um, so it’s like a state change. Um, so we used that status change as a [00:17:00] measure of conversion.
Zach: Right. So while there can be discussions about the different categories or symptoms, long story short, these subjects developed some sort of psychosis, some sort of psychotic episodes.
Neguine: Exactly. Right. Exactly.
Zach: Um, so, uh, one question I had about it was, for the semantic density, why was, uh, print like the New York Times or Reddit used as the comparison for their, uh, spoken speech, as opposed to using some samples of spoken conversation?
Neguine: Good question. For this, we needed a very, very large corpus. And for another project I am using, um, spoken language samples, but it’s like one tenth or maybe one twentieth as large as, um, the New York Times or Reddit. Larger sample sizes give [00:18:00] you more accuracy, and, um, I would say the answer is because we do not have, in comparison, such a large corpus of transcribed spoken language.
Zach: Makes sense. Yeah, I, I’d imagine it’d be hard to find, uh, a lar a very large sample of consistent, uh, language.
Neguine: Right. There is one called Switchboard that I’m using in that project, uh, which is good, but again, it’s not even comparable. I think when I said one twentieth, it’s much less than that. I should have, like, um, a real number for you. So I should avoid giving you numbers without making sure that it is the case, but I know that it’s, uh, just a different order of magnitude in terms of the size. Hmm.
Zach: So something I saw, uh, a sentence I saw was: the results suggest the best indicator of conversion during the prodromal period may [00:19:00] not be poverty of speech, but rather poverty of content, as measured by semantic density.
Could you talk a little bit about what the difference is between poverty of speech versus poverty of content?
Neguine: Sure. Poverty of speech, um, does happen in schizophrenia as well, but maybe not early on. It is just not talking enough. Ah, it can get to the point of becoming, like, mutism, where the patients may not talk at all.
Uh, it’s just how much people talk, um, and that can be measured as simply as just counting the words per minute, if you want to get a rate, or how many words per sentence. All of these have been reported. This is because it’s a very easy variable to measure, just counting. But it becomes challenging when, as in that example I used earlier, the individual does talk, [00:20:00] but there is not much meaning there. Mm-hmm. Then we call it, uh, poverty of content.
Zach: Uh, quick question I just had: when you said the study was involving, uh, a database of people who might have, you know, a likelihood of developing schizophrenia, I assume that was balanced with, like, a completely healthy control group too, right?
Like as an equal number of people basically.
Neguine: As far as the design of the study, since the outcome was conversion to psychosis, um, it was not. In normal individuals, we do not expect the conversion to psychosis.
Zach: Oh, I see. So you were studying only the prodromal, uh, population and then seeing which of them converted.
Neguine: Exactly.
Zach: Gotcha. Yeah. Let’s talk about the sound-related speech, the other indicator that you studied. Can you talk a little bit about how you found those patterns?
Neguine: That was, uh, [00:21:00] my favorite part of it. So in psychoanalysis, for instance, the patient is encouraged to just talk, free associate: just sit down, relax, close your eyes, and talk as much as possible.
And then the analyst tries to interpret, which is very subjective. And, um, the other problem associated with that is there are very few motifs there, um, you know, the different types of complexes, and the psychoanalyst tries to project those few assumptions onto what the patient says. So it’s very subjective; it’s very limited.
So that was the main problem. But it is always interesting: what is the hidden message of what a patient is talking about? Is there a way to just extract something that is implicit? So that was cool, and I liked that. You may talk, I may talk, for like half an hour, and then what is the [00:22:00] predominant concept in what I just talked about for half an hour?
This is, um, what we called, like, um, implicit content, and the way we measured it was actually pretty simple. If you go back to the idea of the sentence vector, where you add all the words in a sentence to have a sentence vector: we just measured what word would have the highest cosine with the sentence vector. Cosine: it’s not just in our study, but with vectors generally, if two vectors have a high cosine, they are very similar. For instance, um, two synonyms have a very high cosine, maybe like 0.8, but words that are very unrelated have a cosine of like 0.1, 0.2. We wanted to see, of all the words available in English, which one has the highest cosine with each of the sentences [00:23:00] of these individuals.
Um, I think that was a very creative part of the study, and we just mapped them in the figure. In the article you can see it’s about, like, different things, and to some extent it reflects, you know, the interview itself. Uh, because in that specific interview: when did that start? When did you notice this?
Uh, when did that stop? So, seeing that figure, there are a lot of, like, words for months, like November, December, or days of the week, or, you know, times of the day. Just because it was about the interview; the nature of the interview required it. So months, for instance, appeared both in converters and non-converters, because they’re both asked about time. But something that appeared in converters and not in the non-converters was just a big space of words, and that was so resistant. Like, no matter what we did, that population of words always appeared: chant, whisper, or voice [00:24:00] itself. And I like the word whisper a lot, because none of the individuals who converted to psychosis actually used the word whisper.
They just used sentences that, when you do the cosine calculation, you find have a high similarity with the word whisper. That’s the reason why I like whisper a lot: because it’s not something that a clinician would notice. So it’s just, you know, extracting a latent concept. I think the majority of the words that, uh, I listed there did not appear in the actual text; they were just the summary of...
Zach: Similar words?
Neguine: Exactly.
Zach: Was that sound-related language something you looked for, or was it something that showed up when you were analyzing how the vectors showed up?
Neguine: Yeah, so it was totally data-driven. The first variable, semantic density, we call, like, hand-engineered: we were targeting a specific variable, so it was more like a theory-driven [00:25:00] approach.
But for the second part, the implicit content, it was a data-driven approach, meaning that we just let the data decide what words have the highest, uh, cosine similarity with the sentences.
Zach: Wow. So,
Neguine: Um, yeah, we were surprised. Like, we just looked at it and I was like, oh my gosh, it’s all about voices, knowing that auditory hallucination is one of the predominant features. Yeah.
Zach: That’s really interesting, because you would think, you know, it could have just as easily been a theory that you had, but the fact that it was data-driven is pretty amazing. It just popped out of the data.
Neguine: Right? It was absolutely data driven. Yes.
Zach: Wow. And how did that show up? I’m curious, like, how does the algorithm, how does the program present that to you?
Does it say, like, there’s a group of words over here and then you have to dig into it manually? Or does it pretty much just show you the types of words?
Neguine: No. Yeah, in general, it gets all the possible words as input. It just tests, you know, with each sentence of an [00:26:00] individual, it just runs this cosine similarity, and then rank orders, like, a list based on what word has the highest similarity, going down.
It’s a large space of words, but since it is ordered on the, uh, basis of the cosine similarity, we just started with the ones with the highest similarity, which I think in this case was the word voice itself.
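The ranking step she describes can be sketched as follows. The tiny 3-dimensional embeddings are invented stand-ins for real word2vec vectors, and in the study the candidate list was effectively all of English rather than seven words:

```python
import math

def cosine(u, v):
    d = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return sum(x * y for x, y in zip(u, v)) / d if d else 0.0

# Invented 3-d embeddings for illustration; in the study these came
# from word2vec trained on a very large corpus.
emb = {
    "talk":    [0.8, 0.3, 0.1],
    "myself":  [0.6, 0.4, 0.2],
    "say":     [0.7, 0.4, 0.1],
    "voice":   [0.9, 0.4, 0.1],
    "whisper": [0.8, 0.5, 0.1],
    "frog":    [0.1, 0.1, 0.9],
    "pond":    [0.0, 0.2, 0.9],
}

def latent_content(words, top_n=3):
    """Rank every candidate word by cosine similarity with the sentence
    vector (the sum of the sentence's word vectors), mirroring the
    data-driven step that surfaced words like 'voice' and 'whisper'."""
    sentence = [sum(emb[w][i] for w in words) for i in range(3)]
    ranked = sorted(emb, key=lambda w: cosine(sentence, emb[w]), reverse=True)
    return ranked[:top_n]

# A sound-heavy utterance: auditory words dominate the ranking.
print(latent_content(["talk", "myself", "say"]))
```

Even though "whisper" never appears in the input sentence, it ranks near the top because its vector is similar to the sentence vector, which is how latent sound-related content could surface for converters without the word itself being spoken.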
Zach: Voice. Oh, I see. Interesting. And whisper, presumably, would’ve been up there somewhere.
Neguine: Yeah, exactly. Top five.
Zach: And to give an example of this, there was an example of speech here, uh, where a subject said, quote: you know, I talk to myself, but I don’t, I don’t know if it is me. I mean, if I talk to myself in the mirror, you know, I’m talking to me. But how can I have a conversation with myself?
I say stuff in my head as if I am talking to me, and it’s funny and I laugh, like I didn’t know that I was gonna say that. End quote. So that’s [00:27:00] an example of, uh, you know, something with a few different talking-related words: “talk to myself,” “I say stuff.” So that’s just an example of the kind of speech there.
Neguine: Exactly, and I love that example. That’s why I picked that one when I was listening to the interview. Um, because just by itself it’s an amazing phenomenon: that patient also talks about, like, his brain talking, uh, saying some jokes, and he laughs at them. A new joke that he has never heard, that his mind creates, as if there’s, like, a split brain.
That’s where the word schizophrenia comes from. Mm-hmm. So it seems like there was one part that creates all these new jokes that the subject himself has never heard. Mm-hmm. Where does that creativity come from? Where does the novelty come from?
Zach: Also interesting too, because it seemed pretty low semantic, uh, density too.
It was a lot of, you know, “I talk to myself,” “how can I,” “as if I’m talking to me.” It was a [00:28:00] simple idea stretched out into many words, basically.
Neguine: Exactly, exactly. You’re absolutely right. Yeah.
Zach: And not to say that means anything, me noticing that. ’Cause like you said, the algorithm is noticing things that are too subtle for me to notice.
So maybe that’s just me, uh, reading into it and that’s not it. It would take a larger sample size, probably, right, to say that?
Neguine: But you were right. I mean, you were right because, um, as you said, it is kind of redundant, saying the same thing over and over. So you’re right in that regard.
So maybe using studies like this, we as clinicians become more cognizant or more sensitive to detecting these things, because redundancy is not necessarily or explicitly listed as something to look for. Maybe from now on as clinicians we start doing that. I think you made a very good point about that sentence: it is redundant.
Zach: So yeah, that’s what I wondered too. You know, I wondered how much of [00:29:00] this you could use, you know, in trying to notice things. Like you said, there probably are some things you can notice over a good sample size. But then it also might be hard considering, you know, if you’re talking to teenagers, who have a tendency to ramble on a lot anyway, there might be some challenges there.
Neguine: Right. Exactly. Mm-hmm. Exactly. Yeah.
Zach: You found that pattern, and I wondered if there were other patterns that you found, or that you think might be there in this kind of speech. And do you think it points to, like, a future where you can use machine learning algorithms to do basically psychoanalysis, where they find patterns in speech that indicate certain, you know, emotional problems? I know I just asked, like, two questions there, but...
Neguine: Uh, no, no, no. Yeah, sure, sure. Um, I can answer both. Um, they’re very related. Um, yes, I think this is just the beginning of exploring what [00:30:00] machine learning can do. I like the data-driven approach a lot more, because it is the most bias-free and atheoretical way of approaching the data.
When we decide on certain features, um, in psychiatry, it is based on experience. We think, okay, parameter A and B are the best predictors, but usually these are like intuitive assumptions. They’re good, they may work, but what if there are things that we never imagined? So I always like to be surprised, like, oh, I never thought about this.
So data-driven approaches allow these types of discovery. Both approaches can be used at the same time. There are, you know, measures... like, it is known that patients with schizophrenia have a very tangential way of talking, meaning that they go on a tangent; they never come back to the point. This is very easy to measure using these algorithms.
It’s not even [00:31:00] machine learning at this particular part; you just measure how they drift by measuring the cosine similarity of the vectors again. Going back to your second question about psychoanalytic approaches: yes, there are better models. Um, word2vec was developed in 2013. In 2020, what I found, uh, to be very difficult is to catch up with, um, the pace of the developments in, uh, natural language processing, NLP.
Dr. Phillip Wolff, I would like to give him the credit, who, you know, is an expert and always looks for the most updated version available at the time that we were doing the analysis. Similar work used LSA, uh, Latent Semantic Analysis, which was an old one, and he thought that it was just too old to use. Each generation now takes less than a year; like, within a couple of months, you just [00:32:00] get an absolutely new generation of an NLP model. And the way it works: CS people, computer science people, just list the accuracies on various measures, like how they measure grammaticality, how they measure sentiment analysis.
They give numbers, and they’re all publicly available. If you just look at the accuracies, they’re just going higher and higher. The reason why I mentioned all these is that the more recent, uh, language models have used transformers, again, architecturally new types of, uh, neural networks, and they can do automatic summarization for you.
I haven’t tested those, but they might give you a good way of summarizing what a patient says in a few sentences or in words.
Zach: Mm-hmm. It just, all of this makes me imagine [00:33:00] some sci-fi future where you go in and talk to a, uh, machine and then it spits out a, uh, some things to do to help you.
Neguine: I think it’s very possible.
I think my goal is to do that: to have an app, ask the patient to talk, hit that button, patient talks for 15 minutes. And by the end of it, just like the way they put, you know, the electrodes on your chest to show you, like, different types of arrhythmia or different measures of the electrical activity of, uh, your heart, by the end of talking, I want to have several parameters: like how logical it was, how coherent it was, what the emotional load was. And just give a number for each of them.
Zach: Right. It, it seems like there’s so many things you could study, like the number of anger related words or the, you know, number of sadness related words, things like that.
Yeah,
Neguine: exactly. You just have a [00:34:00] number by the end of that. Yeah. Instead of just because the, the problem with psychiatry, and I’m a psychiatrist, so uh, I always had this criticism about the field. Is that y is not quantitative in when you compare it with other fields. And I don’t think it’s because psychiatry is behind.
I think it's because the brain is so sophisticated, compared to any other organ, that it is harder to have numbers associated with each concept. At the same time, we should be very cautious about all these parameters, because machine learning is such a strong tool that it almost always finds a solution for you.
So we should always ask: is this a false alarm? We have to replicate these findings in order to make sure that what we're measuring is actually what we intend to measure.
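The kind of per-transcript numbers being described can be sketched in a few lines of Python. The word lists below are hypothetical placeholders, not a validated clinical lexicon:

```python
# Minimal sketch: turn a speech transcript into a few numeric parameters
# by counting matches against small emotion word lists. The lexicons here
# are invented placeholders, not validated instruments.
import re

LEXICONS = {
    "anger": {"angry", "furious", "hate", "rage"},
    "sadness": {"sad", "cry", "lonely", "hopeless"},
    "sound": {"whisper", "voice", "voices", "chant", "hear"},
}

def speech_parameters(transcript):
    tokens = re.findall(r"[a-z']+", transcript.lower())
    total = max(len(tokens), 1)
    # Report each category as a rate per 100 words, so samples of
    # different lengths are comparable.
    return {name: 100 * sum(t in words for t in tokens) / total
            for name, words in LEXICONS.items()}
```

Normalizing to a rate per 100 words matters because a 15-minute sample and a 5-minute sample would otherwise not be comparable.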
Zach: And that's why you used the Amazon Mechanical Turk workers: to get a human perspective on what the algorithm found.
Neguine: [00:35:00] Totally, yeah. Because you just give problems to big neural networks, and they're trained, they have so many connections, that one way or another they just solve the problem. A long time ago they were able to outperform human beings at chess; now they're outperforming humans at Go, which is a Chinese game.
They say it's much more sophisticated than chess because of the number of possibilities involved. It can outperform human beings, so it can find solutions very easily. So we need to replicate, to make sure it is a good solution and what we are looking for.
Zach: So I was reading a really great book, Hidden Valley Road; actually, let me Google the author's name right now, just to do him that favor.
Hidden Valley Road by Robert Kolker, K-O-L-K-E-R. It just came out this year and is a really good book about a family who had 12 children, and six of the boys ended up [00:36:00] diagnosed with schizophrenia. One of the theories it talks about there is the sensory gating theory of schizophrenia: that people with schizophrenic symptoms basically over-respond to sensory inputs.
And one of the tests they use for that is actually an auditory one, where they give people a series of tones, or sounds of some sort, and there's a correlation with people with schizophrenia over-responding to the sounds after the first one, more than other people do in general.
And I wondered if you thought that kind of theory could relate to the sound-related speech that you found in your study.
Neguine: That's an excellent point. I have not tested it. This topic is relatively classic in attempts to explain auditory hallucination. We kind of get adapted as we hear the same [00:37:00] auditory stimulus over and over; in patients with schizophrenia, this adaptation, this reduced response, does not happen.
So one potential study I'm thinking of, because NAPLS actually does have EEG: they put on electrodes, present patients with different auditory stimuli, and then look at a very specific component called P50. One way to test what you suggested, again a very nice idea, is just to run a very simple correlation between our words, words like whisper, voice, chant, and P50, and see if there's a reduced [00:38:00] wave or not. A very good point. Maybe it was not possible to do that before, because there was no such measure before. Now that there is, maybe it's a good idea to do that.
Previous studies have found a correlation between auditory hallucination and P50, but they were just rating how severe the auditory hallucination was; they didn't have actual objective numbers like this.
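The simple correlation being proposed here could look like the following sketch. All per-subject values are invented for illustration, and treating a higher P50 S2/S1 ratio as impaired gating is only an assumption, not a result from the study:

```python
# Sketch of the proposed analysis: correlate sound-related word rates
# with a P50 sensory-gating measure. All numbers are made-up
# illustrative data, not results from the study.
import numpy as np

# Hypothetical per-subject values.
sound_word_rate = np.array([0.2, 1.5, 0.4, 2.1, 0.9, 1.8])    # per 100 words
p50_ratio = np.array([0.3, 0.8, 0.4, 0.9, 0.5, 0.85])         # S2/S1 amplitude ratio (assumed gating index)

# Pearson correlation coefficient between the two measures.
r = np.corrcoef(sound_word_rate, p50_ratio)[0, 1]
print(round(r, 2))
```

With real data, one would also want a significance test and a larger sample; this only shows the shape of the computation.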
Zach: Hmm.
Neguine: So that can be a potential future study. And thanks for the suggestion.
Zach: And I was curious; I'd also read somewhere, I can't remember where, that patients with schizophrenic symptoms can also be more likely to talk about eyes and vision. Have you seen anything about that? I guess it didn't get found in your analysis, so maybe it's not an actual correlation.
Neguine: So, visual hallucinations are not as common as auditory hallucinations. Usually when there is a visual hallucination, we look for other features. There are [00:39:00] specific etiologies that produce more visual than auditory hallucinations. This is a rare phenomenon, or sometimes it is drug-related, so it is not quite as classic as auditory hallucination.
And in the graph we had in our paper, we reported whatever we saw; I don't think we found visual.
Zach: Mm-hmm. Right. And I was thinking beyond even just the visual hallucinations. It's coming back to me now: I think it was that artwork by schizophrenic patients had a lot of eyes in it; they kind of were obsessed with the threatening nature of people's eyes and eye contact. I kind of wonder if that might be findable too.
Neguine: Yeah, that's a good point. Maybe that reflects paranoia rather than visual [00:40:00] stimulation. So that's a good point. Maybe it's just because they do feel like they're always being,
Zach: being watched.
Neguine: Yeah. And then they have this thing called ideas of reference: thinking that whatever people on the radio or TV talk about is about them. So it's always like an eye watching them.
So maybe it's a measure of paranoia. That's a good point; it needs to be tested.
Zach: Right. I could see it showing up as words like "they looked at me," "they were looking at me," those kinds of,
Neguine: Right.
Zach: speech. But that might be hard to find too, because "looking" is such a common word anyway.
Neguine: It may come later, because, again, we tested patients at very early stages.
It's not even psychosis yet; it's two years prior to that. So maybe at later stages it is detectable.
Zach: A more random thought too: when we were talking about having machine learning algorithms do psychoanalysis, it's kind of a challenge with schizophrenic patients specifically, because they often have those paranoias about controlling machine [00:41:00] mechanisms, about something controlling them.
So maybe they won't always enjoy the idea that someone's studying them with a machine somewhere.
Neguine: Yeah, even without that, at the clinic they always think there is a computer: that when they come to the clinic, somebody puts a computer in their hands to control them, or it's in some of the medications they get through injection, or some computers are monitoring them.
So telling them that there is actually a computer observing them might not be,
Zach: Maybe leave that part out, yeah. We were just watching Mad Men; the last season had that classic presentation of one of the characters being afraid of the new computer they got in the office, back when computers were a new technology, and he started having paranoid delusions about it.
Yeah.
Neguine: Very common. Very common. Yep.
Zach: Have there been practical [00:42:00] applications with this technology yet, or are there starting to be? Are some institutes or hospitals using this to keep an eye on people? Or are there some ethical problems with trying to use it in a practical way?
Neguine: I don't think there's an ethical problem, as long as the patient is aware of what is being done and what the purpose of the use of language is. In a controlled clinical setting it should not be a problem. The thing is, usually people like me just move on to the next project: okay, now let's test it on another population, or let's check it in Alzheimer's disease. That's what we are doing.
But if I wanted to do that, the first thing I would do would be to test everything on NAPLS 3, the next phase of NAPLS: get language samples, repeat everything, replicate with maybe a larger sample size, and make sure everything [00:43:00] works in a reliable fashion.
And then use that as a possible measure in a clinical setting. But I would like to do other tests, because this was a proof-of-concept study. We need to do a lot more in order to make it happen in the clinical setting. But I think that's completely doable; we're not far from there.
It just takes a group, maybe even our group, to redo things. But I am a novelty seeker; I just want to test newer ideas rather than repeating. That said, replication is a central part of science as well, so it'll happen one day.
Zach: So, do you have any other research you want to talk about now? I think you said you were doing something with Alzheimer's, was that right?
Neguine: Right. It's a similar idea of trying to predict Alzheimer's. There was a very famous study done in a [00:44:00] population of nuns, who were asked to write essays. Years afterward, researchers looked at who had developed Alzheimer's disease, and, with however they measured semantic density at that time, they found a correlation, which makes a lot of sense with Alzheimer's disease.
So I want to look at data from individuals like these. It's actually easier these days, because people write on different platforms, Reddit or Facebook or Twitter. We can trace them back as far as possible and see if there are any indicators of developing Alzheimer's disease. That's what we are doing now. I work in Dr. Dickerson's lab at Harvard University, and over the past two years of my fellowship there, which [00:45:00] I recently finished, we've been working on that.
So we're at the beginning stages of it.
Zach: This has been Dr. Neguine Rezaii; thanks so much for coming on. If anyone would like to get in touch with you, do you have a recommendation for that?
Neguine: Sure. My email, neguinerezaii@gmail.com, would be the easiest and most reliable way to go.
Zach: Yeah. Thanks a lot.
Absolutely. It was fascinating work, and thanks for your contributions to the medical field.
Neguine: It was a pleasure. Thank you so much.
Zach: This has been the People Who Read People podcast. I'm Zach Elwood. If you'd like to see the blog where I post summaries of these episodes, go to www.readingpokertells.video/blog.
Thanks for listening.