Warning: Critical Thinking Ahead
Students may find some of the material from Eitzen highly controversial. They may, in fact, vehemently disagree with some of the points raised. This is GOOD! You don't have to agree with the material. It is, after all, only a perspective -- a way of looking at the social world -- and we all have perspectives. I would hope that, in the process, students share their points of view. I would also hope that students will be open to understanding the perspectives encountered. There are seldom right or wrong answers in Sociology -- only perspectives. The trick in a class like this is to be open to multiple perspectives.
"Focusing on the poor and ignoring the system of power, privilege, and profit which makes them poor, is a little like blaming the corpse for the murder"
Michael Parenti (in Eitzen and Baca-Zinn, 2000)
I. Review of General Theory
A. Functionalist Theory
Understanding society from a functionalist perspective is to visualize society as a system where all the parts act together even though each part may be doing different things.
Associated with the system is structure. In society, institutions such as family, education, and religion are the parts of the social system. They are the structures around which social activity is organized.
The overall goal of the various structures (parts) is to maintain order in society.
The structures in society promote integration, stability, consensus, and balance in society.
B. Conflict Theory
Conflict theory is a theoretical framework which sees society as divided by inequality and conflict.
Conflict theorists see society less as a cohesive system and more as an arena of conflict and power struggles.
Instead of people working together to further the goals of the "social system," people are seen achieving their will at the expense of others.
C. Symbolic Interactionist Theory
Symbolic Interactionist theory is a theoretical framework that sees society as the product of individuals interacting with one another.
The scope of investigation for these sociologists is very small. Interaction is generally face-to-face and addresses "everyday" activities.
They are interested in the way individuals act toward, respond to, and influence one another in society.
These kinds of sociologists are not interested in nation-states. They don't consider social institutions like the economy or government.
Interactionists prefer to explore the interaction of individuals or groups of individuals. Each communication produces new perspectives, expectations, and boundaries that individuals use to assure continual interactions in the future. Society occurs as a result of interaction between individuals and small groups of individuals.
II. History of Social Problems Theory
A. The Medical Model
The medical model argued that there are "universal criteria for normality" and tended to assume that social problems were linked to "bad people." Such people were viewed as "abnormal because of mental deficiency, mental disorder, lack of education, or incomplete socialization" (Eitzen et al. 2009:6-7).
B. Absolutist Approach to Understanding Social Problems
In the 1920s and 1930s some sociologists focused on conditions in society that fostered social problems. They investigated the processes of migration, urbanization, and industrialization (Eitzen et al. 2009:7). They looked for "pockets of social disorganization" (e.g., areas of the city that have high rates of in and out migration also have high rates of crime).
C. Modern Studies of Deviance
In the recent past, sociologists have returned to "the study of problem individuals" (Eitzen et al. 2009:7). Eitzen et al. (2009) point out two variations in the modern study of deviance.
1. Merton - Social Strain Theory
Society provides goals and means to achieve those goals. Deviance occurs when people recognize the goals, but don't have sufficient means to achieve those goals.
2. Labeling Theory
Others explore the role of society in "creating and sustaining deviance through labeling those people viewed as abnormal. Social reactions are viewed as the key in determining what a social problem is and who is deviant" (Eitzen et al. 2009:7).
D. The Subjective Nature of Social Problems
Some argue that what is considered a social problem depends on time and audience (Eitzen et al. 2009:8). Unemployment is not a problem for everyone. Nor are racism and sexism. Pollution is not viewed the same by everyone.
Example: Global warming is not a problem if you live in Greenland.
III. Toward A Definition of Social Problems
A. Objective Reality to Social Problems
Eitzen et al. (2009:8) argue that some social conditions are detrimental in any situation. In this sense, they have an objective character (e.g., diseases such as the flu or Ebola).
There are also conditions in society such as poverty, racism and sexism that cause material or psychological suffering for parts of the population. They prevent members of society from developing and using their full potential. This sort of suffering exists regardless of personal or cultural opinion. Those conditions are, therefore, social problems in any social setting.
Bias is a preference or an inclination for something. Bias can inhibit impartial judgment. Realizing that we have biases is important. We have feelings and values, and those feelings and values determine what we study. However, once we have acknowledged our biases, we cannot report only those facts that support our point of view.
B. All Social Research is Political
Regarding the study of anything social, the research is either going to look at the characteristics of the individual or the social system within which a "problem" occurs.
One approach accepts the definition of deviance and the other "undermines" that accepted definition.
Both approaches are political, "yet there is a tendency to label as political only the research that challenges the system" (Eitzen et al. 2009:9). When research does point to systemic issues that harm the position of the poor, the charge of bias is often raised.
C. Official Definitions of Social Problems
We must not automatically accept only those definitions that define social problems from the point of view of those in power.
Eitzen et al. (2009:9) warn against accepting definitions of social problems provided by those in power. "The powerful can define social reality in a way that manipulates public opinion."
- In the Old South, slavery was not considered a problem, but slave revolts were.
- In Salem, the persecution of witches was not a social problem, but witches were.
- In the South prior to the Civil Rights era, Jim Crow laws were not a problem, but Rosa Parks was a problem when she wanted to sit down on a bus in Montgomery, Alabama.
D. Public Opinion and the Media
The mass media is a primary source that defines social problems for many of us.
1. The Powerful Control the Media
Powerful interests control the mass media and, therefore, control public opinion. Often "relevant issues" are defined by those who wield power through the mass media.
The powerful, through the mass media, can set the agenda.
2. Conditions that Affect the Powerless are Ignored
The media may overlook conditions that are detrimental to the relatively powerless segments of society.
Attention is diverted to specific social instances and away from the cause of many social problems. There is a tendency to focus on the characteristics of individuals. As Skolnick and Currie note: "conventional social problem writing invariably returns to the symptoms of social ills rather than to the source" of those ills (Eitzen, 2000:7).
By focusing on those who deviate, we often overlook the role of society's powerful.
- We study the criminal instead of the law or the prison system that tends to perpetuate crime.
- We scrutinize the mentally ill rather than the quality of life or social programs that initially bring on a mental breakdown. We don't study the role of social institutions that ultimately fail to accept responsibility by pushing the insane onto the street (deinstitutionalization) to "save the budget."
- We explore the culture of the poor rather than characteristics of the rich.
- We investigate the pathologies of students and their families rather than the inadequacies of higher education.
- We study the characteristics and consequences of poverty rather than the social structure that allows conditions like poverty to exist.
IV. Types of Social Problems
A. Norm Violations
Norm violations assume that a standard of behavior exists.
People who study norm violations are interested in society's failures like the criminal, the mentally ill, or the school dropout.
B. Social Conditions
Eitzen et al. (2009:10) contend that norm violations are symptoms of social problems rather than the problems themselves.
Eitzen et al. (2009: 11) suggest that a second type of social problem involves conditions that cause psychic and material suffering for some category of people. The focus is on how society operates and who benefits and who doesn't benefit under existing social arrangements.
- How are society's rewards distributed?
- Do some categories of people suffer due to the way schools are organized?
- Are some groups of people put at a disadvantage because of the manner in which juries are selected?
- Do some categories suffer because of the way health care is delivered?
As people withdraw from the system that fails to meet their needs, they will be defined by that society as "bad people, but this is so because they live in bad societies" (Eitzen et al. 2009:12).
V. Sociological Imagination
The sociological imagination refers to the ability to see the relationship between individual experiences and the larger society (Kendall, 1998:7).
As opposed to looking at isolated events (like slavery or drug abuse) by themselves, the student of social problems is encouraged to look at social problems in relation to other aspects of society like the economy, culture or religion.
According to Mills (in Eitzen et al., 2009:14) "the task of sociology is to realize that individual circumstances are inextricably linked to the structure of society."
To paraphrase C. Wright Mills (1959), people do not usually define their personal problems in terms of historical change and institutional contradictions. People do not usually think of the connection between the patterns of their own lives and the course of world history.
People live out biographies in the context of world events that are in turn determined by historically specific conditions. The lives of individuals and the course of world history must be understood together.
- The sociological imagination is stimulated by a willingness to view the social world from the perspective of others.
- It involves moving from thinking about the individual and his/her problems to focusing on social, economic, and historical circumstances that produce the problem.
The Cause of Homelessness: Individual pathology or public policy
VI. Social Structure As The Basic Unit of Analysis
A. Person-Blame Approach
People generally understand social problems as some sort of pathology experienced by individuals. This approach to understanding social problems is what Eitzen calls the person-blame approach.
This approach tends to assume that universal norms exist. Behavior is deviant depending on how much it strays from these norms.
Most people define a social problem as behavior that deviates from the norms and standards of society.
The system is not only taken for granted; it has, for most people, an aura of sacredness because of traditions and customs they associate with the system.
From the person-blame approach, those who deviate are seen as the source of trouble. The obvious question observers ask is, why do these people deviate from norms? Because most people view themselves as law abiding, they feel those who deviate do so because of some kind of unusual circumstances: accidents, illness, personal defect, character flaw, or maladjustment. For example, a person-blamer might argue that a poor person is poor because he or she is not bright enough to succeed. In other words, the deviant is the cause of his or her own problem.
The following are examples of perspectives that rely on person-blame approaches.
1. Cultural Deprivation
Eitzen et al. (2009:16) contend that people who blame the victim often cite cultural deprivation as the "cause" of social problems. In other words, people who blame the victims see the culture of the group with the problem as inferior and deficient when compared to the culture of the dominant group in society.
For example, a person-blamer might argue that kids who don't do well in school have parents who don't speak proper English or who are uneducated.
2. Recidivism
How successful are prisons in rehabilitating criminals? Not very! Three-fourths of released criminals are re-arrested within four years. Recidivism refers to the re-arrest of ex-offenders for another criminal offense once they have been released from jail.
Why are recidivism rates so high? The person-blame approach might argue that the fault lies in the characteristics of the individual. Maybe they are greedy. Perhaps they have higher than usual levels of aggression. Person-blamers may also point to the ex-criminals' lack of social controls (in Eitzen et al. 2009:16).
3. Social Darwinism
The discoveries of Charles Darwin had a profound impact on other branches of scientific inquiry. Charles Darwin, of course, is famous for his Theory of Evolution. In the world of biology the species most fit survived while those less fit eventually became extinct.
Social Darwinism is a distorted view of Darwin's theory. Many social scientists, most notably Herbert Spencer, attempted to apply the logic of Charles Darwin to the social world. The essence of the social Darwinist perspective is that races or cultures that occupied a "superior position" in the social world deserved that position because they were the most socially fit (Eitzen et al. 2009:18).
According to Spencer "the poor are poor because they are unfit." The poor are poor because they do not have the intellectual ability to be wealthy.
Spencer argued that "poverty is nature's way of 'excreting ... unhealthy, imbecile, slow, vacillating, faithless members' of society in order to make room for the fit" (Eitzen and Baca-Zinn, 1994:170).
Social Darwinists, therefore, oppose social programs because, they argue, social programs perpetuate the existence of the unfit group who would probably disappear in the absence of social welfare.
B. The Consequence of Blaming the Individual
1. Person-Blame Distracts Attention Away from Institutions
When one uses only the person-blame approach, it frees the government, the economy, and the educational system (among other institutions) from blame. The person-blame approach ignores the strains that are caused by inequalities within the system.
2. Person-Blame Makes it More Difficult to Institute Systemic Change
Excluding the existing order from blame makes it that much more difficult to initiate change in economic, social, or political institutions. By relying on a person-blame approach, societal conditions such as norms that are racist, sexist, or homophobic go unchallenged.
3. Person-Blame Allows the Powerful to Control Dissidents
Blaming the individual allows the government to "control" dissidents more easily. Deviants are sent to prisons or hospitals for rehabilitation.
Relying on a person-blame approach legitimizes social programs aimed at individuals. It encourages treatment of the individual in terms of counseling, behavior modification, or psychotherapy.
4. Person-Blame Reinforces Stereotypes
Person-blame also has the potential to reinforce stereotypes (e.g., the poor are poor because they are lazy).
The person-blame approach tends to support the Social Darwinist position that people are placed in the system according to their ability or inability.
C. The System-Blame Approach
This course often advocates a system-blame approach.
System-blamers argue that societal conditions are the primary source of social problems.
System-blamers suggest that the key to understanding social problems is understanding the distribution of power in society.
Example: Student Failures -- System-blamers argue that student failure in school is linked to the failure of the education system to meet the needs of the students.
Solutions to social problems are going to involve manipulating those structures which generate the problems.
Example: Crime - The solution to crime might involve the establishment of education programs in prisons in order to improve literacy rates of those who are incarcerated.
D. Problems with the System-Blame Approach
1. Sometimes Individuals are the Problem
Blaming the system also presents problems for social scientists. Ultimately, the system is made up of people. Society results from the interaction of individuals. Individuals are sometimes aggressive, mean, and nasty (Eitzen, 2000:14). Systemic explanations for social problems are only part of the truth. The system-blame approach may, therefore, absolve individuals from responsibility for their actions.
When a robber breaks into your house, damn the problems with the system. You have problems with that particular individual.
2. System-Blame: A Dogmatic Approach?
Blaming the system is only part of the truth. Blaming the system tends to assume a very rigid dogmatic approach to the understanding of society. It tends to present a picture that people have no free will (Eitzen, 2000:15).
E. Why I Use the System-Blame Approach
I tend to use the system-blame approach for several reasons.
- Since most people tend to blame individuals, we need a balance.
- Sociology is concerned with societal issues and society's institutional framework is responsible for creating many social problems.
- Since institutions are human creations, we should change them when they no longer serve the will of the people. Democratic conceptions of society have always held that institutions exist to serve people, not vice versa. Institutions, therefore, are to be accountable to the people whose lives they affect. When an institution, any institution, even the most "socially valued," is found to conflict with human needs, democratic thought holds that it ought to be changed or abolished (in Eitzen, 2000: 15-16). Accepting the system-blame approach is a necessary precondition to restructuring society around human needs.
VII. The Scientific Method
The scientific method is a systematic, organized series of steps that ensures maximum objectivity and consistency in researching a problem (Schaefer and Lamm, 1992:35). The following are some components of the scientific method.
A. Test Ideas
Don't take assumptions for granted. Don't rely on common sense. Don't rely on traditional authority figures.
B. Evidence Must Be Observable
Evidence should be observable because other sociologists might want to perform the same study in order to verify or refute findings.
C. Describe How Evidence is Gathered
Any study of society should specify the methods the researcher used to obtain his or her information, the setting (where the researcher conducted the study), and the population (whom they studied). This is done so that other social scientists may test the findings. Social scientists are cautious in accepting the findings of others. Studies are often replicated to verify the findings of initial studies.
A theory is a set of ideas [generalizations] supported by facts. Theories try to make sense out of those facts. Social scientists seldom accept theories as laws. Often they are not considered totally true. Furthermore, the subjects they attempt to explain (i.e., people and social institutions) are variable. Gergen (1982:12) in D'Andrade (p 27) states:
"It may be ventured that with all its attempts to emulate natural science inquiry, the past century of sociobehavioral research and theory has failed to yield a principle as reliable as Archimedes principle of hydrostatics or Galileo's Law of uniformly accelerated motion."
Because theories are general ideas, social scientists do not test them directly. A hypothesis is a speculative (or tentative) statement that predicts the relationship between two or more variables. It is, in essence, an educated guess. It specifies what the researcher expects to find. To be considered meaningful, a hypothesis must be testable; that is, capable of being evaluated (Schaefer & Lamm, 1992: 38).
VIII. Basic Statistical Concepts
A. Measures of Central Tendency: Mean and Median
The mean, or average, is a number calculated by adding a series of values and then dividing by the number of values. For example, to find the mean of the numbers 5, 19, and 27, we add them and divide by the number of values (3). The mean would then be 17 (Schaefer & Lamm, 1992: 36).
The median is the midpoint, or the number that divides a series of values (ranked in ascending or descending order) into two groups with an equal number of values (Schaefer & Lamm, 1992: 36).
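The two measures can be checked with a few lines of Python. This is only a small sketch using the standard library's statistics module. The first list holds the three numbers from the example above, while the four-value list is an invented illustration of what happens when there is no single middle value.

```python
from statistics import mean, median

# The three numbers used in the worked example above.
values = [5, 19, 27]

print(mean(values))    # 17  (5 + 19 + 27 = 51, and 51 / 3 = 17)
print(median(values))  # 19  (the middle value once the list is ranked)

# Invented illustration: with an even number of values there is no single
# middle value, so the median is the mean of the two middle values.
even_values = [5, 19, 27, 40]
print(median(even_values))  # 23.0  ((19 + 27) / 2)
```

Note that the mean and the median of the same series need not agree: here the mean is 17 and the median is 19.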
B. Rates & Percentages
A percentage is a portion based on 100. Use of percentages allows one to compare groups of different sizes.
Example: Comparing Populations of Different Sizes
If we are comparing contributors to a town's Baptist and Roman Catholic churches, the absolute numbers of contributors could be misleading if there were many more Baptists than Catholics living in the town. With percentages, we can obtain a more meaningful comparison, showing the proportion of persons in each group who contribute to their respective churches (Schaefer & Lamm, 1992: 36).
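The comparison can be sketched in Python. All of the counts below are hypothetical and were invented for this illustration; they do not come from Schaefer and Lamm. The helper function percentage is likewise just a name chosen for the sketch.

```python
def percentage(part, whole):
    """Return part expressed as a percentage of whole."""
    return 100 * part / whole

# Hypothetical counts, invented for illustration only.
baptist_members, baptist_contributors = 2000, 400
catholic_members, catholic_contributors = 500, 200

baptist_pct = percentage(baptist_contributors, baptist_members)
catholic_pct = percentage(catholic_contributors, catholic_members)

print(baptist_contributors, catholic_contributors)  # 400 200
print(baptist_pct, catholic_pct)                    # 20.0 40.0
```

In absolute numbers the hypothetical Baptist church has twice as many contributors, yet the percentage of members who contribute is twice as high in the Catholic church. That reversal is why percentages give the more meaningful comparison between groups of different sizes.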
C. Target Populations and Samples
The target population refers to everyone in the group that is being studied. For example, if one wants to know how people will vote in an election, the target population is everyone who is eligible to vote. How can a researcher study a population as large as that of the United States? The answer is that one usually cannot study entire populations. Large populations are simply too big. The researcher, therefore, needs to look at a small subset of the population. We call this subset a sample. The trick is to make sure that your sample closely parallels the characteristics of the larger population.
1. Random Sample
Henslin (1999:126) contends that a random sample is one in which everyone in a population has the same chance of being included in a study. A random sample is necessary if one is going to attempt to generalize the findings in a study to the larger population.
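A simple random sample can be illustrated with Python's standard random module. This is a sketch under stated assumptions: the population of 10,000 numbered voters and the sample size of 500 are hypothetical, and the fixed seed is there only so the sketch gives the same result each time it is run.

```python
import random

# Hypothetical target population: 10,000 eligible voters, identified by number.
population = list(range(10_000))

# A fixed seed is an assumption made only to keep the sketch repeatable.
rng = random.Random(42)

# Draw 500 voters without replacement; every voter has the same chance
# of being included in the sample.
sample = rng.sample(population, k=500)

print(len(sample), len(set(sample)))  # 500 500 (no voter is picked twice)
```

A real study would also need a complete list of the target population to draw from, which is often the hard part.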
A hypothesis poses a relationship between two or more aspects of social relationships. These aspects are called variables. A variable is a measurable trait or characteristic that is subject to change under different conditions. Income, gender, occupation, and religion are variables. Variables may be independent or dependent.
1. Independent Variables
Independent variables in a hypothesis are those that influence or cause changes in another variable. In other words, an independent variable is something that is chosen by the researcher to cause a change in another variable.
2. Dependent Variables
The dependent variables are those variables that are believed to be influenced by the independent variable (Schaefer & Lamm, 1992:38).
Example: Independent and Dependent Variables
Higher levels of education produce greater earnings. Education is the independent variable (it causes the change in income levels). Income level is the dependent variable. The income an individual earns "depends" on, or is determined by, the influence of education.
A correlation exists when a change in one variable coincides with a change in another variable. The fact that a correlation exists means that the two variables are associated statistically with one another. However, the mere fact that an association exists does not necessarily mean that a change in one variable causes a change in another variable. Correlations are an indication that causality may be present. They do not necessarily indicate causation (Schaefer & Lamm, 1992: 38).
One of the most common research mistakes is to assume that a high correlation between two variables means that one variable (independent) causes some change in another variable (dependent).
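To make the idea of a correlation concrete, the sketch below computes the Pearson correlation coefficient, one common measure of association (the notes above do not name a specific measure, so this choice is an assumption). The education and income figures are hypothetical and were invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unscaled covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unscaled variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data, invented for illustration only.
years_of_education = [10, 12, 14, 16, 18]
income_thousands = [22, 27, 35, 41, 55]

r = pearson_r(years_of_education, income_thousands)
print(round(r, 2))  # a value close to +1 indicates a strong positive association
```

A coefficient near +1 says only that the two variables rise together in these data. By itself it cannot show that education causes the higher income, since a third factor could influence both.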
IX. Methods of Gathering Data
Weber suggested that sociology needs several methods of investigation. The following material describes benefits and problems associated with five methods of gathering data.
A. Case Studies (field study)
- Case studies (or field studies) explore social life in its natural setting, observing and interviewing people where they live, work, and play (Kendall, 1998:25).
- Its advantages are that the researcher can study individuals in their natural setting (e.g., at home, at work, playing, etc.). Case studies provide volumes of information such that at the end of the study the researcher has a thorough understanding of the individuals involved in the study.
- Drawbacks to the case study include the fact that social scientists cannot usually investigate many cases because of time constraints. Another problem with the case study is that the results may not be generalizable to the population at large.
B. Participant Observation
- Social scientists use participant observation to gain a close and intimate familiarity with a group of individuals and their practices. Participant observation involves intensive involvement with people in their natural environment, usually over an extended period of time.
- Example: A researcher conducting research on the homeless would live among the homeless day and night, sleeping on the street or at a shelter, and engaging in the same activities as the population he is studying. By doing so, the researcher would gather a broad understanding of the homeless, their needs, and characteristics.
- Its advantages are that the researcher can study individuals in their natural setting (e.g., at home, at work, playing, etc.). Participant observation provides volumes of information such that at the end of the study the researcher has a thorough understanding of the individuals involved in the study.
- Drawbacks to participant observation may be related to the fact that one does not merely observe. The researcher must find a role within the group observed. Overt participant observation, therefore, is limited to contexts where the community under study understands and permits it. Participant observation is, therefore, often restricted to the public events the observed group engages in. The researcher is seldom able to explore what happens "behind the scenes" in the group under investigation.
C. The Survey (Interviews)
- The researcher asks questions of the cases face to face or in a questionnaire.
- The advantages are that data collection is more systematic (you ask the same questions of every case).
- Because it is systematic and generally more condensed, the researcher can investigate more cases. Survey research can, in fact, be applied to several thousand (or million) cases. The U.S. Census begins as a survey of the population.
- Findings may be generalizable to larger populations.
- When relying on a survey questionnaire, much information is lost. Facial expressions are not recorded. Environmental considerations are missed.
- Furthermore, information can be lost because the interviewer failed to ask the right question.
D. The Experiment
Kendall (1998:26) describes an experiment as a "carefully designed situation (often taking place in a laboratory) in which the researcher studies the impact of certain factors on subjects' attitudes or behaviors."
- The experiment offers a high degree of exactness because one can control everything in a laboratory setting.
- Variables can be precisely studied. Natural science uses this approach most often. So does psychology.
- It is easier to determine cause and effect relationships.
- One disadvantage with the experiment in studying social phenomena is that the environment is contrived. People do not normally carry out their lives in a laboratory setting.
- Ethical issues may also arise when performing experiments on people. The Nazi death-camp experiments represent extreme instances of ethical violation. Even in ordinary university experiments, deception and misinformation are often employed. Many consider these practices ethical violations.
E. Secondary Data Analysis (Existing data)
- Existing data includes government records (census), personal documents, or mass communication (published books, the news, movies).
- The Statistical Abstract of the United States is an excellent source of existing data.
- The advantages are that data are generally easy to obtain. They already exist and can be found in most university libraries.
- Much existing data are also standardized. Standardization makes it easier to compare one set of data with another.
- One problem associated with existing data is that the researcher must use the format provided. For example, a researcher studying poverty would be frustrated with the census before 1970 because there was no poverty rate in 1960 and before.
Bibliography
Eitzen, D. Stanley and Maxine Baca-Zinn
1986 Social Problems. (3rd Ed.) Boston: Allyn and Bacon.
1994 Social Problems. (6th Ed.) Boston: Allyn and Bacon.
2000 Social Problems. (8th Ed.) Boston: Allyn and Bacon.
2003 Social Problems. (9th Ed.) Boston: Allyn and Bacon.
Eitzen, D. Stanley, Maxine Baca-Zinn, and Kelly Eitzen Smith
2009 Social Problems. (11th Ed.) Boston: Allyn and Bacon.
Kendall, Diana
1998 Social Problems in a Diverse Society. Boston: Allyn and Bacon.
1. Conditional Probabilities and Bayes' Theorem
The probability of a hypothesis H conditional on a given body of data E is the ratio of the unconditional probability of the conjunction of the hypothesis with the data to the unconditional probability of the data alone.
(1.1) Definition. The probability of H conditional on E is defined as $P_E(H) = P(H \& E)/P(E)$, provided that both terms of this ratio exist and $P(E) > 0$.
To illustrate, suppose J. Doe is a randomly chosen American who was alive on January 1, 2000. According to the United States Center for Disease Control, roughly 2.4 million of the 275 million Americans alive on that date died during the 2000 calendar year. Among the approximately 16.6 million senior citizens (age 75 or greater) about 1.36 million died. The unconditional probability of the hypothesis that our J. Doe died during 2000, H, is just the population-wide mortality rate $P(H) = 2.4\mathrm{M}/275\mathrm{M} = 0.00873$. To find the probability of J. Doe's death conditional on the information, E, that he or she was a senior citizen, we divide the probability that he or she was a senior who died, $P(H \& E) = 1.36\mathrm{M}/275\mathrm{M} = 0.00495$, by the probability that he or she was a senior citizen, $P(E) = 16.6\mathrm{M}/275\mathrm{M} = 0.06036$. Thus, the probability of J. Doe's death given that he or she was a senior is $P_E(H) = P(H \& E)/P(E) = 0.00495/0.06036 = 0.082$. Notice how the size of the total population factors out of this equation, so that $P_E(H)$ is just the proportion of seniors who died. One should contrast this quantity, which gives the mortality rate among senior citizens, with the "inverse" probability of E conditional on H, $P_H(E) = P(H \& E)/P(H) = 0.00495/0.00873 = 0.57$, which is the proportion of deaths in the total population that occurred among seniors.
Here are some straightforward consequences of (1.1):
- Probability. $P_E$ is a probability function.
- Logical Consequence. If E entails H, then $P_E(H) = 1$.
- Preservation of Certainties. If $P(H) = 1$, then $P_E(H) = 1$.
- Mixing. $P(H) = P(E)\,P_E(H) + P(\sim E)\,P_{\sim E}(H)$.
The most important fact about conditional probabilities is undoubtedly Bayes' Theorem, whose significance was first appreciated by the British cleric Thomas Bayes in his posthumously published masterwork, "An Essay Toward Solving a Problem in the Doctrine of Chances" (Bayes 1764). Bayes' Theorem relates the "direct" probability of a hypothesis conditional on a given body of data, $P_E(H)$, to the "inverse" probability of the data conditional on the hypothesis, $P_H(E)$.
(1.2) Bayes' Theorem. $P_E(H) = [P(H)/P(E)]\,P_H(E)$
In an unfortunate, but now unavoidable, choice of terminology, statisticians refer to the inverse probability $P_H(E)$ as the "likelihood" of H on E. It expresses the degree to which the hypothesis predicts the data given the background information codified in the probability P.
In the example discussed above, the condition that J. Doe died during 2000 is a fairly strong predictor of senior citizenship. Indeed, the equation $P_H(E) = 0.57$ tells us that 57% of the total deaths occurred among seniors that year. Bayes' theorem lets us use this information to compute the "direct" probability of J. Doe dying given that he or she was a senior citizen. We do this by multiplying the "prediction term" $P_H(E)$ by the ratio of the total number of deaths in the population to the number of senior citizens in the population, $P(H)/P(E) = 2.4\mathrm{M}/16.6\mathrm{M} = 0.144$. The result is $P_E(H) = 0.57 \times 0.144 = 0.082$, just as expected.
Though a mathematical triviality, Bayes' Theorem is of great value in calculating conditional probabilities because inverse probabilities are typically both easier to ascertain and less subjective than direct probabilities. People with different views about the unconditional probabilities of E and H often disagree about E's value as an indicator of H. Even so, they can agree about the degree to which the hypothesis predicts the data if they know any of the following intersubjectively available facts: (a) E's objective probability given H, (b) the frequency with which events like E will occur if H is true, or (c) the fact that H logically entails E. Scientists often design experiments so that likelihoods can be known in one of these "objective" ways. Bayes' Theorem then ensures that any dispute about the significance of the experimental results can be traced to "subjective" disagreements about the unconditional probabilities of H and E.
When both PH(E) and P~H(E) are known, an experimenter need not even know E's probability to determine a value for PE(H) using Bayes' Theorem.
(1.3) Bayes' Theorem (2nd form). PE(H) = P(H)PH(E) / [P(H)PH(E) + P(~H)P~H(E)]
In this guise Bayes' theorem is particularly useful for inferring causes from their effects since it is often fairly easy to discern the probability of an effect given the presence or absence of a putative cause. For instance, physicians often screen for diseases of known prevalence using diagnostic tests of recognized sensitivity and specificity. The sensitivity of a test, its "true positive" rate, is the fraction of times that patients with the disease test positive for it. The test's specificity, its "true negative" rate, is the proportion of healthy patients who test negative. If we let H be the event of a given patient having the disease, and E be the event of her testing positive for it, then the test's sensitivity and specificity are given by the likelihoods PH(E) and P~H(~E), respectively, and the "baseline" prevalence of the disease in the population is P(H). Given these inputs about the effects of the disease on the outcome of the test, one can use (1.3) to determine the probability of disease given a positive test. For a more detailed illustration of this process, see Example 1 in the Supplementary Document "Examples, Tables, and Proof Sketches".
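The screening calculation can be sketched directly from (1.3). The function name and the numerical inputs below are hypothetical, chosen only for illustration; they are not taken from Example 1 of the supplement.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """Return PE(H) via (1.3), where H = has the disease, E = tests positive."""
    true_positive_mass = prevalence * sensitivity               # P(H) PH(E)
    false_positive_mass = (1 - prevalence) * (1 - specificity)  # P(~H) P~H(E)
    return true_positive_mass / (true_positive_mass + false_positive_mass)

# Hypothetical inputs: 1% prevalence, 95% sensitivity, 90% specificity.
posterior = posterior_given_positive(
    prevalence=0.01, sensitivity=0.95, specificity=0.90
)
print(round(posterior, 4))
```

With these made-up inputs a positive result leaves the probability of disease below 10%, because false positives from the large healthy population outnumber true positives from the small diseased one.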
2. Special Forms of Bayes' Theorem
Bayes' Theorem can be expressed in a variety of forms that are useful for different purposes. One version employs what Rudolf Carnap called the relevance quotient or probability ratio (Carnap 1962, 466). This is the factor PR(H, E) = PE(H)/P(H) by which H's unconditional probability must be multiplied to get its probability conditional on E. Bayes' Theorem is equivalent to a simple symmetry principle for probability ratios.
(1.4) Probability Ratio Rule. PR(H, E) = PR(E, H)
The term on the right provides one measure of the degree to which H predicts E. If we think of P(E) as expressing the "baseline" predictability of E given the background information codified in P, and of PH(E) as E's predictability when H is added to this background, then PR(E, H) captures the degree to which knowing H makes E more or less predictable relative to the baseline: PR(E, H) = 0 means that H categorically predicts ~E; PR(E, H) = 1 means that adding H does not alter the baseline prediction at all; PR(E, H) = 1/P(E) means that H categorically predicts E. Since P(E) = PT(E) where T is any truth of logic, we can think of (1.4) as telling us that
The probability of a hypothesis conditional on a body of data is equal to the unconditional probability of the hypothesis multiplied by the degree to which the hypothesis surpasses a tautology as a predictor of the data.
In our J. Doe example, PR(H, E) is obtained by comparing the predictability of senior status given that J. Doe died in 2000 to its predictability given no information whatever about his or her mortality. Dividing the former "prediction term" by the latter yields PR(H, E) = PH(E)/P(E) = 0.57/0.06036 = 9.44. Thus, as a predictor of senior status in 2000, knowing that J. Doe died is more than nine times better than not knowing whether he or she lived or died.
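The symmetry asserted by (1.4) can be confirmed on the J. Doe counts with exact rational arithmetic. This is a sketch; the counts are rescaled to units of 10,000 people so they can be written as integers.

```python
from fractions import Fraction

p_h = Fraction(240, 27500)        # P(H): died during 2000
p_e = Fraction(1660, 27500)       # P(E): senior citizen
p_h_and_e = Fraction(136, 27500)  # P(H & E)

pr_h_e = (p_h_and_e / p_e) / p_h  # PR(H, E) = PE(H)/P(H)
pr_e_h = (p_h_and_e / p_h) / p_e  # PR(E, H) = PH(E)/P(E)

print(pr_h_e == pr_e_h)
print(round(float(pr_h_e), 2))
```

Computed from the unrounded counts, the ratio is about 9.39. The figure 9.44 in the text results from rounding PH(E) up to 0.57 before dividing; either way the ratio exceeds nine.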
Another useful form of Bayes' Theorem is the Odds Rule. In the jargon of bookies, the "odds" of a hypothesis is its probability divided by the probability of its negation: O(H) = P(H)/P(~H). So, for example, a racehorse whose odds of winning a particular race are 7-to-5 has a 7/12 chance of winning and a 5/12 chance of losing. To understand the difference between odds and probabilities it helps to think of probabilities as fractions of the distance between the probability of a contradiction and that of a tautology, so that P(H) = p means that H is p times as likely to be true as a tautology. In contrast, writing O(H) = [P(H) − P(F)]/[P(T) − P(H)] (where F is some logical contradiction) makes it clear that O(H) expresses this same quantity as the ratio of the amount by which H's probability exceeds that of a contradiction to the amount by which it is exceeded by that of a tautology. Thus, the difference between "probability talk" and "odds talk" corresponds to the difference between saying "we are two thirds of the way there" and saying "we have gone twice as far as we have yet to go."
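The conversion between the two scales is a one-line function in each direction. A minimal sketch, with function names chosen for illustration:

```python
from fractions import Fraction

def odds(p):
    """Convert a probability into odds: O = P / (1 - P)."""
    return p / (1 - p)

def prob(o):
    """Convert odds back into a probability: P = O / (1 + O)."""
    return o / (1 + o)

# A horse at 7-to-5 has a 7/12 chance of winning.
horse_win = prob(Fraction(7, 5))

# "Two thirds of the way there" is "twice as far as we have yet to go".
two_thirds_as_odds = odds(Fraction(2, 3))

print(horse_win, two_thirds_as_odds)
```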
The analogue of the probability ratio is the odds ratio OR(H, E) = OE(H)/O(H), the factor by which H's unconditional odds must be multiplied to obtain its odds conditional on E. Bayes' Theorem is equivalent to the following fact about odds ratios:
(1.5) Odds Ratio Rule. OR(H, E) = PH(E)/P~H(E)
Notice the similarity between (1.4) and (1.5). While each employs a different way of expressing probabilities, each shows how its expression for H's probability conditional on E can be obtained by multiplying its expression for H's unconditional probability by a factor involving inverse probabilities.
The quantity LR(H, E) = PH(E)/P~H(E) that appears in (1.5) is the likelihood ratio of H given E. In testing situations like the one described in Example 1, the likelihood ratio is the test's true positive rate divided by its false positive rate: LR = sensitivity/(1 − specificity). As with the probability ratio, we can construe the likelihood ratio as a measure of the degree to which H predicts E. Instead of comparing E's probability given H with its unconditional probability, however, we now compare it with its probability conditional on ~H. LR(H, E) is thus the degree to which the hypothesis surpasses its negation as a predictor of the data. Once more, Bayes' Theorem tells us how to factor conditional probabilities into unconditional probabilities and measures of predictive power.
The odds of a hypothesis conditional on a body of data is equal to the unconditional odds of the hypothesis multiplied by the degree to which it surpasses its negation as a predictor of the data.
In our running J. Doe example, LR(H, E) is obtained by comparing the predictability of senior status given that J. Doe died in 2000 to its predictability given that he or she lived out the year. Dividing the former "prediction term" by the latter yields LR(H, E) = PH(E)/P~H(E) = 0.567/0.056 ≈ 10.1. Thus, as a predictor of senior status in 2000, knowing that J. Doe died is more than ten times better than knowing that he or she lived.
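The Odds Ratio Rule (1.5) can be checked on the same counts by computing the odds ratio and the likelihood ratio independently. A sketch using exact fractions (counts in units of 10,000 people):

```python
from fractions import Fraction

def odds(p):
    return p / (1 - p)

p_h = Fraction(240, 27500)                           # P(H)
p_h_given_e = Fraction(136, 1660)                    # PE(H)
p_e_given_h = Fraction(136, 240)                     # PH(E)
p_e_given_not_h = Fraction(1660 - 136, 27500 - 240)  # P~H(E)

odds_ratio = odds(p_h_given_e) / odds(p_h)        # OR(H, E) = OE(H)/O(H)
likelihood_ratio = p_e_given_h / p_e_given_not_h  # LR(H, E)

print(odds_ratio == likelihood_ratio)
print(round(float(likelihood_ratio), 2))
```

On unrounded inputs the common value is about 10.14, which agrees with the claim that a death is more than ten times better than survival as a predictor of senior status.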
The similarities between the "probability ratio" and "odds ratio" versions of Bayes' Theorem can be developed further if we express H's probability as a multiple of the probability of some other hypothesis H* using the relative probability function B(H, H*) = P(H)/P(H*). It should be clear that B generalizes both P and O since P(H) = B(H, T) and O(H) = B(H, ~H). By comparing the conditional and unconditional values of B we obtain the Bayes' Factor:
BR(H, H*; E) = BE(H, H*)/B(H, H*) = [PE(H)/PE(H*)]/ [P(H)/P(H*)].
We can also generalize the likelihood ratio by setting LR(H, H*; E) = PH(E)/PH*(E). This compares E's predictability on the basis of H with its predictability on the basis of H*. We can use these two quantities to formulate an even more general form of Bayes' Theorem.
(1.6) Bayes' Theorem (General Form) BR(H, H*; E) = LR(H, H*; E)
The message of (1.6) is this:
The ratio of probabilities for two hypotheses conditional on a body of data is equal to the ratio of their unconditional probabilities multiplied by the degree to which the first hypothesis surpasses the second as a predictor of the data.
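The general form (1.6) can be illustrated with three mutually exclusive and jointly exhaustive hypotheses. The priors and likelihoods below are hypothetical numbers chosen only so the arithmetic is easy to follow.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical priors P(Hi) and likelihoods PHi(E) for a three-way partition.
priors = {"H1": Fraction(5, 10), "H2": Fraction(3, 10), "H3": Fraction(2, 10)}
likelihoods = {"H1": Fraction(8, 10), "H2": Fraction(4, 10), "H3": Fraction(1, 10)}

# Total probability of the data, then posteriors by Bayes' Theorem.
p_e = sum(priors[h] * likelihoods[h] for h in priors)
posteriors = {h: priors[h] * likelihoods[h] / p_e for h in priors}

def bayes_factor(h, h_star):
    """BR(H, H*; E): ratio of posterior ratio to prior ratio."""
    return (posteriors[h] / posteriors[h_star]) / (priors[h] / priors[h_star])

def likelihood_ratio(h, h_star):
    """LR(H, H*; E) = PH(E)/PH*(E)."""
    return likelihoods[h] / likelihoods[h_star]

all_equal = all(
    bayes_factor(h, h_star) == likelihood_ratio(h, h_star)
    for h, h_star in permutations(priors, 2)
)
print(all_equal, bayes_factor("H1", "H2"))
```

Note that P(E) cancels in every pairwise comparison, which is why the Bayes' Factor depends only on the two likelihoods involved.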
The various versions of Bayes' Theorem differ only with respect to the functions used to express unconditional probabilities (P(H), O(H), B(H, H*)) and in the likelihood term used to represent predictive power (PR(E, H), LR(H, E), LR(H, H*; E)). In each case, though, the underlying message is the same:
conditional probability = unconditional probability × predictive power
(1.2) – (1.6) are multiplicative forms of Bayes' Theorem that use division to compare the disparities between unconditional and conditional probabilities. Sometimes these comparisons are best expressed additively by replacing ratios with differences. The following table gives the additive analogue of each ratio measure.
|Ratio||Difference|
|Probability Ratio: PR(H, E) = PE(H)/P(H)||Probability Difference: PD(H, E) = PE(H) − P(H)|
|Odds Ratio: OR(H, E) = OE(H)/O(H)||Odds Difference: OD(H, E) = OE(H) − O(H)|
|Bayes' Factor: BR(H, H*; E) = BE(H, H*)/B(H, H*)||Bayes' Difference: BD(H, H*; E) = BE(H, H*) − B(H, H*)|
We can use Bayes' theorem to obtain additive analogues of (1.4) – (1.6), which are here displayed along with their multiplicative counterparts:
|(1.4)||PR(H, E) = PR(E, H) = PH(E)/P(E)||PD(H, E) = P(H) [PR(E, H) − 1]|
|(1.5)||OR(H, E) = LR(H, E) = PH(E)/P~H(E)||OD(H, E) = O(H) [OR(H, E) − 1]|
|(1.6)||BR(H, H*; E) = LR(H, H*; E) = PH(E)/PH*(E)||BD(H, H*; E) = B(H, H*) [BR(H, H*; E) − 1]|
Notice how each additive measure is obtained by multiplying H's unconditional probability, expressed on the relevant scale, P, O or B, by the associated multiplicative measure diminished by 1.
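The additive identities for PD and OD can be verified by computing each difference twice: once from its definition and once from the prior and the relevant likelihood term. The inputs in this sketch are hypothetical.

```python
from fractions import Fraction

def odds(p):
    return p / (1 - p)

def additive_measures(p_h, p_e_given_h, p_e_given_not_h):
    """Return PD and OD, each computed by definition and via Bayes' Theorem."""
    p_e = p_h * p_e_given_h + (1 - p_h) * p_e_given_not_h
    p_h_given_e = p_h * p_e_given_h / p_e
    return {
        "PD": p_h_given_e - p_h,
        "PD_via_PR": p_h * (p_e_given_h / p_e - 1),
        "OD": odds(p_h_given_e) - odds(p_h),
        "OD_via_LR": odds(p_h) * (p_e_given_h / p_e_given_not_h - 1),
    }

# Hypothetical inputs: P(H) = 1/4, PH(E) = 4/5, P~H(E) = 1/5.
m = additive_measures(Fraction(1, 4), Fraction(4, 5), Fraction(1, 5))
print(m["PD"] == m["PD_via_PR"], m["OD"] == m["OD_via_LR"])
```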
While the results of this section are useful to anyone who employs the probability calculus, they have a special relevance for subjectivist or "Bayesian" approaches to statistics, epistemology, and inductive inference. Subjectivists lean heavily on conditional probabilities in their theory of evidential support and their account of empirical learning. Given that Bayes' Theorem is the single most important fact about conditional probabilities, it is not at all surprising that it should figure prominently in subjectivist methodology.
3. The Role of Bayes' Theorem in Subjectivist Accounts of Evidence
Subjectivists maintain that beliefs come in varying gradations of strength, and that an ideally rational person's graded beliefs can be represented by a subjective probability function P. For each hypothesis H about which the person has a firm opinion, P(H) measures her level of confidence (or "degree of belief") in H's truth. Conditional beliefs are represented by conditional probabilities, so that PE(H) measures the person's confidence in H on the supposition that E is a fact.
One of the most influential features of the subjectivist program is its account of evidential support. The guiding ideas of this Bayesian confirmation theory are these:
- Confirmational Relativity. Evidential relationships must be relativized to individuals and their degrees of belief.
- Evidence Proportionism. A rational believer will proportion her confidence in a hypothesis H to her total evidence for H, so that her subjective probability for H reflects the overall balance of her reasons for or against its truth.
- Incremental Confirmation. A body of data provides incremental evidence for H to the extent that conditioning on the data raises H's probability.
The first principle says that statements about evidentiary relationships always make implicit reference to people and their degrees of belief, so that, e.g., "E is evidence for H" should really be read as "E is evidence for H relative to the information encoded in the subjective probability P".
According to evidence proportionism, a subject's level of confidence in H should vary directly with the strength of her evidence in favor of H's truth. Likewise, her level of confidence in H conditional on E should vary directly with the strength of her evidence for H's truth when this evidence is augmented by the supposition of E. It is a matter of some delicacy to say precisely what constitutes a person's evidence, and to explain how her beliefs should be "proportioned" to it. Nevertheless, the idea that incremental evidence is reflected in disparities between conditional and unconditional probabilities only makes sense if differences in subjective probability mirror differences in total evidence.
An item of data provides a subject with incremental evidence for or against a hypothesis to the extent that receiving the data increases or decreases her total evidence for the truth of the hypothesis. When probabilities measure total evidence, the increment of evidence that E provides for H is a matter of the disparity between PE(H) and P(H). When odds are used it is a matter of the disparity between OE(H) and O(H). See Example 2 in the supplementary document "Examples, Tables, and Proof Sketches", which illustrates the difference between total and incremental evidence, and explains the "baserate fallacy" that can result from failing to properly distinguish the two.
It will be useful to distinguish two subsidiary concepts related to total evidence.
- The net evidence in favor of H is the degree to which a subject's total evidence in favor of H exceeds her total evidence in favor of ~H.
- The balance of total evidence for H over H* is the degree to which a subject's total evidence in favor of H exceeds her total evidence in favor of H*.
The precise content of these notions will depend on how total evidence is understood and measured, and on how disparities in total evidence are characterized. For example, if total evidence is given in terms of probabilities and disparities are treated as ratios, then the net evidence for H is P(H)/P(~H). If total evidence is expressed in terms of odds and differences are used to express disparities, then the net evidence for H will be O(H) − O(~H). Readers may consult Table 3 (in the supplementary document) for a complete list of the possibilities.
As these remarks make clear, one can interpret O(H) either as a measure of net evidence or as a measure of total evidence. To see the difference, imagine that 750 red balls and 250 black balls have been drawn at random and with replacement from an urn known to contain 10,000 red or black balls. Assuming that this is our only evidence about the urn's contents, it is reasonable to set P(Red) = 0.75 and P(~Red) = 0.25. On a probability-as-total-evidence reading, these assignments reflect both the fact that we have a great deal of evidence in favor of Red (namely, that 750 of 1,000 draws were red) and the fact that we also have some evidence against it (namely, that 250 of the draws were black). The net evidence for Red is then the disparity between our total evidence for Red and our total evidence against Red. This can be expressed multiplicatively by saying that we have seen three times as many red draws as black draws, which is just to say that O(Red) = 3. Alternatively, we can use O(Red) as a measure of the total evidence by taking our evidence for Red to be the ratio of red to black draws, rather than the total number of red draws, and our evidence for ~Red to be the ratio of black balls to red balls, rather than the total number of black draws. While the decision whether to use O as a measure of total or net evidence makes little difference to questions about the absolute amount of total evidence for a hypothesis (since O(H) is an increasing function of P(H)), it can make a major difference when one is considering the incremental changes in total evidence brought about by conditioning on new information.
Philosophers interested in characterizing correct patterns of inductive reasoning and in providing "rational reconstructions" of scientific methodology have tended to focus on incremental evidence as crucial to their enterprise. When scientists (or ordinary folk) say that E supports or confirms H what they generally mean is that learning of E's truth will increase the total amount of evidence for H's truth. Since subjectivists characterize total evidence in terms of subjective probabilities or odds, they analyze incremental evidence in terms of changes in these quantities. On such views, the simplest way to characterize the strength of incremental evidence is by making ordinal comparisons of conditional and unconditional probabilities or odds.
(2.1) A Comparative Account of Incremental Evidence. Relative to a subjective probability function P,
- E incrementally confirms (disconfirms, is irrelevant to) H if and only if PE(H) is greater than (less than, equal to) P(H).
- H receives a greater increment (or lesser decrement) of evidential support from E than from E* if and only if PE(H) exceeds PE*(H).
Both these equivalences continue to hold with probabilities replaced by odds. So, this part of the subjectivist theory of evidence does not depend on how total evidence is measured.
Bayes' Theorem helps to illuminate the content of (2.1) by making it clear that E's status as incremental evidence for H is enhanced to the extent that H predicts E. This observation serves as the basis for the following conclusions about incremental confirmation (which hold so long as 1 > P(H), P(E) > 0).
(2.1a) If E incrementally confirms H, then H incrementally confirms E.
(2.1b) If E incrementally confirms H, then E incrementally disconfirms ~H.
(2.1c) If H entails E, then E incrementally confirms H.
(2.1d) If PH(E) = PH(E*), then H receives more incremental support from E than from E* if and only if E is unconditionally less probable than E*.
(2.1e) Weak Likelihood Principle. E provides incremental evidence for H if and only if PH(E) > P~H(E). More generally, if PH(E) > PH*(E) and P~H(~E) ≥ P~H*(~E), then E provides more incremental evidence for H than for H*.
(2.1a) tells us that incremental confirmation is a matter of mutual reinforcement: a person who sees E as evidence for H invests more confidence in the possibility that both propositions are true than in either possibility in which only one obtains.
(2.1b) says that relevant evidence must be capable of discriminating between the truth and falsity of the hypothesis under test.
(2.1c) provides a subjectivist rationale for the hypothetico-deductive model of confirmation. According to this model, hypotheses are incrementally confirmed by any evidence they entail. While subjectivists reject the idea that evidentiary relations can be characterized in a belief-independent manner (Bayesian confirmation is always relativized to a person and her subjective probabilities), they seek to preserve the basic insight of the H-D model by pointing out that hypotheses are incrementally supported by evidence they entail for anyone who has not already made up her mind about the hypothesis or the evidence. More precisely, if H entails E, then PE(H) = P(H)/P(E), which exceeds P(H) whenever 1 > P(E), P(H) > 0. This explains why scientists so often seek to design experiments that fit the H-D paradigm. Even when evidentiary relations are relativized to subjective probabilities, experiments in which the hypothesis under test entails the data will be regarded as evidentially relevant by anyone who has not yet made up her mind about the hypothesis or the data. The degree of incremental confirmation will vary among people depending on their prior levels of confidence in H and E, but everyone will agree that the data incrementally supports the hypothesis to at least some degree.
Subjectivists invoke (2.1d) to explain why scientists so often regard improbable or surprising evidence as having more confirmatory potential than evidence that is antecedently known. While it is not true in general that improbable evidence has more confirming potential, it is true that E's incremental confirming power relative to H varies inversely with E's unconditional probability when the value of the inverse probability PH(E) is held fixed. If H entails both E and E*, say, then Bayes' Theorem entails that the less probable of the two supports H more strongly. For example, even if heart attacks are invariably accompanied by severe chest pain and shortness of breath, the former symptom is far better evidence for a heart attack than the latter simply because severe chest pain is so much less common than shortness of breath.
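The heart attack case can be made concrete. When H entails E, PH(E) = 1 and Bayes' Theorem reduces to PE(H) = P(H)/P(E), so the rarer symptom yields the higher posterior. All three probabilities in this sketch are hypothetical and serve only to show the direction of the effect.

```python
from fractions import Fraction

# Hypothetical unconditional probabilities.
p_heart_attack = Fraction(1, 100)        # P(H)
p_chest_pain = Fraction(2, 100)          # P(E): the rarer symptom
p_shortness_of_breath = Fraction(10, 100)  # P(E*): the commoner symptom

# H is assumed to entail both symptoms, so PH(E) = PH(E*) = 1
# and PE(H) = P(H)/P(E).
posterior_chest_pain = p_heart_attack / p_chest_pain
posterior_shortness = p_heart_attack / p_shortness_of_breath

print(posterior_chest_pain, posterior_shortness)
```

With these numbers chest pain raises the probability of a heart attack from 1/100 to 1/2, while shortness of breath raises it only to 1/10.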
(2.1e) captures one core message of Bayes' Theorem for theories of confirmation. Let's say that H is uniformly better than H* as a predictor of E's truth-value when (a) H predicts E more strongly than H* does, and (b) ~H predicts ~E more strongly than ~H* does. According to the weak likelihood principle, hypotheses that are uniformly better predictors of the data are better supported by the data. For example, the fact that little Johnny is a Christian is better evidence for thinking that his parents are Christian than for thinking that they are Hindu because (a) a far higher proportion of Christian parents than Hindu have Christian children, and (b) a far higher proportion of non-Christian parents than non-Hindu parents have non-Christian children.
Bayes' Theorem can also be used as the basis for developing and evaluating quantitative measures of evidential support. The results listed in Table 2 entail that all four of the functions PR, OR, PD and OD agree with one another on the simplest question of confirmation: Does E provide incremental evidence for H?
(2.2) Corollary. Each of the following is equivalent to the assertion that E provides incremental evidence in favor of H: PR(H, E) > 1, OR(H, E) > 1, PD(H, E) > 0, OD(H, E) > 0.
Thus, all four measures agree with the comparative account of incremental evidence given in (2.1).
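Corollary (2.2) can be spot-checked numerically. The sketch below draws random joint distributions over H and E as tables of positive integer counts, an assumption made so that exact fractions can be used, and confirms that the four measures always return the same verdict on whether E incrementally confirms H. It is a sanity check, not a proof.

```python
import random
from fractions import Fraction

def odds(p):
    return p / (1 - p)

def four_measures(a, b, c, d):
    """Counts: a = H & E, b = H & ~E, c = ~H & E, d = ~H & ~E."""
    p_h = Fraction(a + b, a + b + c + d)
    p_h_given_e = Fraction(a, a + c)
    pr = p_h_given_e / p_h
    odds_ratio = odds(p_h_given_e) / odds(p_h)
    pd = p_h_given_e - p_h
    od = odds(p_h_given_e) - odds(p_h)
    return pr, odds_ratio, pd, od

rng = random.Random(0)  # fixed seed so the run is repeatable
all_agree = True
for _ in range(1000):
    a, b, c, d = (rng.randint(1, 50) for _ in range(4))
    pr, odds_ratio, pd, od = four_measures(a, b, c, d)
    verdicts = {pr > 1, odds_ratio > 1, pd > 0, od > 0}
    all_agree = all_agree and len(verdicts) == 1

print(all_agree)
```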
Given all this agreement it should not be surprising that PR(H, E), OR(H, E) and PD(H, E) have all been proposed as measures of the degree of incremental support that E provides for H. While OD(H, E) has not been suggested for this purpose, we will consider it for reasons of symmetry. Some authors maintain that one or another of these functions is the unique correct measure of incremental evidence; others think it best to use a variety of measures that capture different evidential relationships. While this is not the place to adjudicate these issues, we can look to Bayes' Theorem for help in understanding what the various functions measure and in characterizing the formal relationships among them.
All four measures agree in their conclusions about the comparative amount of incremental evidence that different items of data provide for a fixed hypothesis. In particular, they agree ordinally about the following concepts derived from incremental evidence:
- The effective increment of evidence that E provides for H is the amount by which the incremental evidence that E provides for H exceeds the incremental evidence that ~E provides for H.
- The differential in the incremental evidence that E and E* provide for H is the amount by which the incremental evidence that E provides for H exceeds the incremental evidence that E* provides for H.
Effective evidence is a matter of the degree to which a person's total evidence for H depends on her opinion about E. When PE(H) and P~E(H) (or OE(H) and O~E(H)) are far apart the person's belief about E has a great effect on her belief about H: from her point of view, a great deal hangs on E's truth-value when it comes to questions about H's truth-value. A large differential in incremental evidence between E and E* tells us that learning E increases the subject's total evidence for H by a larger amount than learning E* does. Readers may consult Table 4 (in the supplement) for quantitative measures of effective and differential evidence.
The second clause of (2.1) tells us that E provides more incremental evidence than E* does for H just in case the probability of H conditional on E exceeds the probability of H conditional on E*. It is then a simple step to show that all four measures of incremental support agree ordinally on questions of effective evidence and of differentials in incremental evidence.
(2.3) Corollary. For any H, E* and E with positive probability, the following are equivalent:
- E provides more incremental evidence than E* does for H
- PR(H, E) > PR(H, E*)
- OR(H, E) > OR(H, E*)
- PD(H, E) > PD(H, E*)
- OD(H, E) > OD(H, E*)
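The ordinal agreement in (2.3) follows because, with P(H) held fixed, each of the four measures is an increasing function of the conditional probability of H. A randomized spot check, under the assumption that P(H), PE(H) and PE*(H) can be chosen freely strictly between 0 and 1:

```python
import random
from fractions import Fraction

def odds(p):
    return p / (1 - p)

def incremental_measures(p_h, p_h_given_e):
    """Return (PR, OR, PD, OD) for a prior P(H) and a posterior PE(H)."""
    return (
        p_h_given_e / p_h,
        odds(p_h_given_e) / odds(p_h),
        p_h_given_e - p_h,
        odds(p_h_given_e) - odds(p_h),
    )

rng = random.Random(1)  # fixed seed so the run is repeatable
orderings_agree = True
for _ in range(1000):
    p_h = Fraction(rng.randint(1, 99), 100)
    post_e = Fraction(rng.randint(1, 99), 100)       # PE(H)
    post_e_star = Fraction(rng.randint(1, 99), 100)  # PE*(H)
    m = incremental_measures(p_h, post_e)
    m_star = incremental_measures(p_h, post_e_star)
    verdicts = {x > y for x, y in zip(m, m_star)}
    verdicts.add(post_e > post_e_star)
    orderings_agree = orderings_agree and len(verdicts) == 1

print(orderings_agree)
```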
The four measures of incremental support can disagree over the comparative degree to which a single item of data incrementally confirms two distinct hypotheses. Example 3, Example 4, and Example 5 (in the supplement) show the various ways in which this can happen.
All the differences between the measures have ultimately to do with (a) whether the total evidence in favor of a hypothesis should be measured in terms of probabilities or in terms of odds, and (b) whether disparities in total evidence are best captured as ratios or as differences. Rows in the following table correspond to different measures of total evidence. Columns correspond to different ways of treating disparities.
|Total evidence measured by||Ratio||Difference|
|Probabilities||PR(H, E) = PE(H)/P(H)||PD(H, E) = PE(H) − P(H)|
|Odds||OR(H, E) = OE(H)/O(H)||OD(H, E) = OE(H) − O(H)|
Similar tables can be constructed for measures of net evidence and measures of balances in total evidence. See Table 5A in the supplement.
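We can use the various forms of Bayes' Theorem to clarify the similarities and differences among these measures by rewriting each of them in terms of likelihood ratios.
|Total evidence measured by||Ratio||Difference|
|Probabilities||PR(H, E) = PH(E)/P(E)||PD(H, E) = P(H) [PH(E)/P(E) − 1]|
|Odds||OR(H, E) = PH(E)/P~H(E)||OD(H, E) = O(H) [PH(E)/P~H(E) − 1]|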
This table shows that there are two differences between each multiplicative measure and its additive counterpart. First, the likelihood term that appears in a given multiplicative measure is diminished by 1 in its associated additive measure. Second, in each additive measure the diminished likelihood term is multiplied by an expression for H's probability: P(H) or O(H), as the case may be. The first difference marks no distinction; it is due solely to the fact that the multiplicative and additive measures employ a different zero point from which to measure evidence. If we settle on the point of probabilistic independence PE(H) = P(H) as a natural common zero, and so subtract 1 from each multiplicative measure, then equivalent likelihood terms appear in both columns.
The real difference between the measures in a given row concerns the effect of unconditional probabilities on relations of incremental confirmation. Down the right column, the degree to which E provides incremental evidence for H is directly proportional to H's probability expressed in units of P(T) or P(~H). In the left column, H's probability makes no difference to the amount of incremental evidence that E provides for H once PH(E) and either P(E) or P~H(E) are fixed. In light of Bayes' Theorem, then, the difference between the ratio measures and the difference measures boils down to one question:
Does a given piece of data provide a greater increment of evidential support for a more probable hypothesis than it does for a less probable hypothesis when both hypotheses predict the data equally well?
The difference measures answer yes, the ratio measures answer no.
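A small numerical case shows the two answers side by side. In this sketch two hypotheses predict the same data equally well but have different priors; all numbers are hypothetical.

```python
from fractions import Fraction

p_e = Fraction(1, 4)         # P(E), hypothetical
likelihood = Fraction(1, 2)  # PH(E), the same for both hypotheses
priors = {"more probable": Fraction(2, 5), "less probable": Fraction(1, 10)}

results = {}
for name, p_h in priors.items():
    p_h_given_e = p_h * likelihood / p_e  # Bayes' Theorem (1.2)
    results[name] = {
        "PR": p_h_given_e / p_h,  # ratio measure
        "PD": p_h_given_e - p_h,  # difference measure
    }

for name, r in results.items():
    print(name, r["PR"], r["PD"])
```

The probability ratio is 2 for both hypotheses, while the probability difference is 2/5 for the more probable hypothesis and 1/10 for the less probable one.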
Bayes' Theorem can also help us understand the difference between rows. The measures within a given row agree about the role of predictability in incremental confirmation. In the top row the incremental evidence that E provides for H increases linearly with PH(E)/P(E), whereas in the bottom row it increases linearly with PH(E)/P~H(E). Thus, when probabilities measure total evidence what matters is the degree to which H exceeds T as a predictor of E, but when odds measure total evidence it is the degree to which H exceeds ~H as a predictor of E that matters.
The central issue here concerns the status of the likelihood ratio. While everyone agrees that it should play a leading role in any quantitative theory of evidence, there are conflicting views about precisely what evidential relationship it captures. There are three possible interpretations.
- Probability as total evidence reading
- Odds as total evidence reading
- Likelihoodist reading
On the first reading there is no conflict whatsoever between using probability ratios and using likelihood ratios to measure evidence. Once we get clear on the distinctions between total evidence, net evidence and the balance of evidence, we see that each of PR(H, E), LR(H, E) and LR(H, H*; E) measures an important evidential relationship, but that the relationships they measure are importantly different.
When odds measure total evidence neither PR(H, E) nor LR(H, H*; E) plays a fundamental role in the theory of evidence. Changes in the probability ratio for H given E only indicate changes in incremental evidence in the presence of information about changes in the probability ratio for ~H given E. Likewise, changes in the likelihood ratio for H and H* given E only indicate changes in the balance of evidence in light of information about changes in the likelihood ratio for ~H and ~H* given E. Thus, while each of the two functions can figure as one component in a meaningful measure of confirmation, neither tells us anything about incremental evidence when taken by itself.
The third view, "likelihoodism," is popular among non-Bayesian statisticians. Its proponents deny evidence proportionism. They maintain that a person's subjective probability for a hypothesis merely reflects her degree of uncertainty about its truth; it need not be tied in any way to the amount of evidence she has in its favor. It is likelihood ratios, not subjective probabilities, which capture the "scientifically meaningful" evidential relations. Here are two classic statements of the position.
All the information which the data provide concerning the relative merits of two hypotheses is contained in the likelihood ratio of the hypotheses on the data. (Edwards 1972, 30)
The ‘evidential meaning’ of experimental results is characterized fully by the likelihood function… Reports of experimental results in scientific journals should in principle be descriptions of likelihood functions. (Birnbaum 1962, 272)
On this view, everything that can be said about the evidential import of E for H is embodied in the following generalization of the weak likelihood principle:
The "Law of Likelihood". If H implies that the probability of E is x, while H* implies that the probability of E is x*, then E is evidence supporting H over H* if and only if x exceeds x*, and the likelihood ratio, x/x*, measures the strength of this support. (Hacking 1965, 106-109), (Royall 1997, 3)
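The Law of Likelihood applies most cleanly when each hypothesis fixes a definite probability for the data. A minimal sketch with a hypothetical coin-tossing setup, which is not drawn from Hacking or Royall: H says the coin lands heads with probability 0.7, H* says 0.5, and the data E are 7 heads in 10 tosses.

```python
from math import comb

def binomial_likelihood(heads, tosses, bias):
    """Probability of exactly `heads` heads in `tosses` independent tosses."""
    return comb(tosses, heads) * bias**heads * (1 - bias) ** (tosses - heads)

x = binomial_likelihood(7, 10, 0.7)       # probability of E implied by H
x_star = binomial_likelihood(7, 10, 0.5)  # probability of E implied by H*

law_of_likelihood_ratio = x / x_star
supports_h_over_h_star = x > x_star

print(supports_h_over_h_star, round(law_of_likelihood_ratio, 2))
```

No prior probability for either hypothesis enters the calculation, which is the feature likelihoodists regard as the principle's main virtue.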
The biostatistician Richard Royall is a particularly lucid defender of likelihoodism (Royall 1997). He maintains that any scientifically respectable concept of evidence must analyze the evidential impact of E on H solely in terms of likelihoods; it should not advert to anyone's unconditional probabilities for E or H. This is supposed to be because likelihoods are both better known and more objective than unconditional probabilities. Royall argues strenuously against the idea that incremental evidence can be measured in terms of the disparity between unconditional and conditional probabilities. Here is the gist of his complaint:
Whereas [LR(H, H*; E)] measures the support for one hypothesis H relative to a specific alternative H*, without regard either to the prior probabilities of the two hypotheses or to what other hypotheses might also be considered, the law of changing probability [as measured by PR(H, E)] measures support for H relative to a specific prior distribution over H and its alternatives... The law of changing probability is of limited usefulness in scientific discourse because of its dependence on the prior probability distribution, which is generally unknown and/or personal. Although you and I agree (on the basis of the law of likelihood) that given evidence supports H over H*, and H** over both H and H*, we might disagree about whether it is evidence supporting H (on the basis of the law of changing probability) purely on the basis of our different judgments of the a priori probability of H, H*, and H**. (Royall 1997, 10-11, with slight changes in notation)
Royall's point is that neither the probability ratio nor probability difference will capture the sort of objective evidence required by science because their values depend on the "subjective" terms P(E) and P(H), and not just on the "objective" likelihoods PH(E) and P~H(E).
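Royall's contrast can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it takes H* to be ~H, reads PR(H, E) as the ratio of H's probability conditional on E to its unconditional probability, and uses made-up likelihoods of 3/4 and 1/4. The function names are hypothetical.

```python
from fractions import Fraction


def likelihood_ratio(p_e_given_h, p_e_given_alt):
    """LR(H, H*; E): probability of E under H divided by probability of E under H*."""
    return p_e_given_h / p_e_given_alt


def probability_ratio(prior_h, p_e_given_h, p_e_given_not_h):
    """PR(H, E), assumed here to be P(H given E) / P(H).

    By Bayes' Theorem this equals P(E given H) / P(E), where P(E) is
    obtained from the prior for H by the law of total probability.
    """
    p_e = prior_h * p_e_given_h + (1 - prior_h) * p_e_given_not_h
    return p_e_given_h / p_e


# Hypothetical likelihoods that both agents are assumed to share.
P_E_GIVEN_H = Fraction(3, 4)
P_E_GIVEN_NOT_H = Fraction(1, 4)

lr = likelihood_ratio(P_E_GIVEN_H, P_E_GIVEN_NOT_H)
for prior in (Fraction(1, 2), Fraction(1, 10)):
    pr = probability_ratio(prior, P_E_GIVEN_H, P_E_GIVEN_NOT_H)
    print(f"P(H) = {prior}: LR = {lr}, PR = {pr}")
```

With these numbers the likelihood ratio is 3 for both agents, while the probability ratio is 3/2 for the agent whose prior for H is 1/2 and 5/2 for the agent whose prior is 1/10. The two agents agree on the likelihood ratio and disagree only about the quantity that depends on P(H).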
Whether one agrees with this assessment will be a matter of philosophical temperament, in particular of one's willingness to tolerate subjective probabilities in one's account of evidential relations. It will also depend crucially on the extent to which one is convinced that likelihoods are better known and more objective than ordinary subjective probabilities. Cases like the one envisioned in the law of likelihood, where a hypothesis deductively entails a definite probability for the data, are relatively rare. So, unless one is willing to adopt a theory of evidence with a very restricted range of application, a great deal will turn on how easy it is to determine objective likelihoods in situations where the predictive connection from hypothesis to data is itself the result of inductive inferences. However one comes down on these issues, though, there is no denying that likelihood ratios will play a central role in any probabilistic account of evidence.
In fact, the weak likelihood principle (2.1e) encapsulates a minimal form of Bayesianism to which all parties can agree. This is clearest when it is restated in terms of likelihoods.
(2.1e) The Weak Likelihood Principle. (expressed in terms of likelihood ratios) If LR(H, H*; E) ≥ 1 and LR(~H, ~H*; ~E) ≥ 1, with one inequality strict, then E provides more incremental evidence for H than for H* and ~E provides more incremental evidence for ~H than for ~H*.
Likelihoodists will endorse (2.1e) because the relationships described in its antecedent depend only on inverse probabilities. Proponents of both the "probability" and "odds" interpretations of total evidence will accept (2.1e) because satisfaction of its antecedent ensures that conditioning on E increases H's probability and its odds strictly more than those of H*. Indeed, the weak likelihood principle must be an integral part of any account of evidential relevance that deserves the title "Bayesian". To deny it is to misunderstand the central message of Bayes' Theorem for questions of evidence: namely, that hypotheses are confirmed by data they predict. As we shall see in the next section, this "minimal" form of Bayesianism figures importantly into subjectivist models of learning from experience.
4. The Role of Bayes' Theorem in Subjectivist Models of Learning
Subjectivists think of learning as a process of belief revision in which a "prior" subjective probability P is replaced by a "posterior" probability Q that incorporates newly acquired information. This process proceeds in two stages. First, some of the subject's probabilities are directly altered by experience, intuition, memory, or some other non-inferential learning process. Second, the subject "updates" the rest of her opinions to bring them into line with her newly acquired knowledge.
Many subjectivists are content to regard the initial belief changes as sui generis and independent of the believer's prior state of opinion. However, as long as the first phase of the learning process is understood to be non-inferential, subjectivism can be made compatible with an "externalist" epistemology that allows for criticism of belief changes in terms of the reliability of the causal processes that generate them. It can even accommodate the thought that the direct effect of experience might depend causally on the believer's prior probability.
Subjectivists have studied the second, inferential phase of the learning process in great detail. Here immediate belief changes are seen as imposing constraints of the form "the posterior probability Q has such-and-such properties." The objective is to discover what sorts of constraints experience tends to impose, and to explain how the person's prior opinions can be used to justify the choice of a posterior probability from among the many that might satisfy a given constraint. Subjectivists approach the latter problem by assuming that the agent is justified in adopting whatever eligible posterior departs minimally from her prior opinions. This is a kind of "no jumping to conclusions" requirement. We explain it here as a natural result of the idea that rational learners should proportion their beliefs to the strength of the evidence they acquire.
The simplest learning experiences are those in which the learner becomes certain of the truth of some proposition E about which she was previously uncertain. Here the constraint is that all hypotheses inconsistent with E must be assigned probability zero. Subjectivists model this sort of learning as simple conditioning, the process in which the prior probability of each proposition H is replaced by a posterior that coincides with the prior probability of H conditional on E.
(3.1) Simple Conditioning If a person with a "prior" such that 0 < P(E) < 1 has a learning experience whose sole immediate effect is to raise her subjective probability for E to 1, then her post-learning "posterior" for any proposition H should be Q(H) = PE(H).
In short, a rational believer who learns for certain that E is true should factor this information into her doxastic system by conditioning on it.
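As a minimal sketch of (3.1), assume a finite set of mutually exclusive "worlds" and represent each proposition as the set of worlds in which it is true. The world labels, the prior, and the function names below are all hypothetical.

```python
from fractions import Fraction


def prob(dist, proposition):
    """Probability of a proposition, given as the set of worlds where it holds."""
    return sum(p for world, p in dist.items() if world in proposition)


def simple_condition(prior, evidence):
    """Return the posterior obtained by conditioning the prior on the evidence.

    Worlds outside the evidence get probability zero; worlds inside it are
    rescaled by 1 / P(E), so their relative proportions are unchanged.
    """
    p_e = prob(prior, evidence)
    if p_e <= 0:
        raise ValueError("cannot condition on a proposition with prior probability 0")
    return {
        world: (p / p_e if world in evidence else Fraction(0))
        for world, p in prior.items()
    }


# Hypothetical prior over four worlds.
prior = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}
E = {"a", "b", "c"}   # the proposition learned with certainty
H = {"a", "d"}        # some hypothesis of interest

posterior = simple_condition(prior, E)
q_h = prob(posterior, H)                      # Q(H)
p_h_given_e = prob(prior, H & E) / prob(prior, E)  # prior probability of H conditional on E
print(q_h, p_h_given_e)
```

Here Q(H) and the prior probability of H conditional on E are both 4/7, as (3.1) requires, and the posterior assigns probability 0 to the one world inconsistent with E.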
Though useful as an ideal, simple conditioning is not widely applicable because it requires the learner to become absolutely certain of E's truth. As Richard Jeffrey has argued (Jeffrey 1987), the evidence we receive is often too vague or ambiguous to justify such "dogmatism." On more realistic models, the direct effect of a learning experience will be to alter the subjective probability of some proposition without raising it to 1 or lowering it to 0. Experiences of this sort are appropriately modeled by what has come to be called Jeffrey conditioning (though Jeffrey's preferred term is "probability kinematics").
(3.2) Jeffrey Conditioning If a person with a prior such that 0 < P(E) < 1 has a learning experience whose sole immediate effect is to change her subjective probability for E to q, then her post-learning posterior for any H should be Q(H) = qPE(H) + (1 − q)P~E(H).
Obviously, Jeffrey conditioning reduces to simple conditioning when q = 1.
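The same finite-worlds representation gives a short sketch of (3.2). It is an illustration only: the prior, the choice q = 9/10, and the function names are assumptions made for the example.

```python
from fractions import Fraction


def prob(dist, proposition):
    """Probability of a proposition, given as the set of worlds where it holds."""
    return sum(p for world, p in dist.items() if world in proposition)


def jeffrey_condition(prior, evidence, q):
    """Return the posterior that sets the probability of the evidence to q.

    Worlds inside E are rescaled by q / P(E) and worlds outside E by
    (1 - q) / P(~E), so probabilities conditional on E and on ~E are unchanged.
    """
    p_e = prob(prior, evidence)
    if not 0 < p_e < 1:
        raise ValueError("Jeffrey conditioning requires 0 < P(E) < 1")
    return {
        world: (q * p / p_e if world in evidence else (1 - q) * p / (1 - p_e))
        for world, p in prior.items()
    }


# Hypothetical prior over four worlds.
prior = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}
E = {"a", "b"}
H = {"a", "c"}

posterior = jeffrey_condition(prior, E, Fraction(9, 10))
q_h = prob(posterior, H)

# Setting q = 1 should reproduce simple conditioning on E.
certain = jeffrey_condition(prior, E, Fraction(1))
print(q_h, certain)
```

With this prior, P(E) = 3/4, and shifting E's probability to 9/10 yields Q(H) = (9/10)(2/3) + (1/10)(1/2) = 13/20. With q = 1 the posterior assigns 2/3 and 1/3 to the two worlds in E and 0 elsewhere, which is exactly the result of simple conditioning on E.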
A variety of arguments for conditioning (simple or Jeffrey-style) can be found in the literature, but we cannot consider them here. There is, however, one sort of justification in which Bayes' Theorem figures prominently. It exploits connections between belief revision and the notion of incremental evidence to show that conditioning is the only belief revision rule that allows learners to correctly proportion their posterior beliefs to the new evidence they receive.
The key to the argument lies in marrying the "minimal" version of Bayesianism expressed in (2.1e) to a very modest "proportioning" requirement for belief revision rules.
(3.3) The Weak Evidence Principle If, relative to a prior P, E provides at least as much incremental evidence for H as for H*, and if H is antecedently more probable than H*, then H should remain more probable than H* after any learning experience whose sole immediate effect is to increase the probability of E.
This requires an agent to retain his views about the relative probability of two hypotheses when he acquires evidence that supports the more probable hypothesis more strongly. It rules out obviously irrational belief revisions such as this: George is more confident that the New York Yankees will win the American League Pennant than he is that the Boston Red Sox will win it, but he reverses himself when he learns (only) that the Yankees beat the Red Sox in last night's game.
Combining (3.3) with minimal Bayesianism yields the following:
(3.4) Consequence If a person's prior is such that LR(H, H*; E) ≥ 1, LR(~H, ~H*; ~E) ≥ 1, and P(H) > P(H*), then any learning experience whose sole immediate effect is to raise her subjective probability for E should result in a posterior such that Q(H) > Q(H*).
On the reasonable assumption that Q is defined on the same set of propositions over which P is defined, this condition suffices to pick out simple conditioning as the unique correct method of belief revision for learning experiences that make E certain. It picks out Jeffrey conditioning as the unique correct method when learning merely alters one's subjective probability for E. The argument for these conclusions makes use of the following two facts about probabilities.
(3.5) Lemma If H and H* both entail E and P(H) > P(H*), then LR(H, H*; E) = 1 and LR(~H, ~H*; ~E) > 1.
(3.6) Lemma Simple conditioning on E is the only rule for revising subjective probabilities that yields a posterior with the following properties for any prior such that P(E) > 0:
- Q(E) = 1.
- Ordinal Similarity. If H and H* both entail E, then P(H) ≥ P(H*) if and only if Q(H) ≥ Q(H*).
From here the argument for simple conditioning is a matter of using (3.4) and (3.5) to establish ordinal similarity. Suppose that H and H* entail E and that P(H) > P(H*). It follows from (3.5) that LR(H, H*; E) = 1 and LR(~H, ~H*; ~E) > 1. (3.4) then entails that any learning experience that raises E's probability must result in a posterior with Q(H) > Q(H*). Thus, Q and P are ordinally similar with respect to hypotheses that entail E. If we go on to suppose that the learning experience raises E's probability to 1, (3.6) then guarantees that Q arises from P by simple conditioning on E.
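The two likelihood-ratio facts stated in (3.5) can be checked directly. The derivation below is a sketch that assumes P(H*) > 0 and P(~E) > 0, so that every conditional probability involved is defined, and it writes P_X(Y) for the probability of Y conditional on X. Because H and H* entail E, ~E entails both ~H and ~H*, so the probability of ~E conjoined with ~H (or with ~H*) is simply P(~E).

```latex
% Assumptions: H and H^* entail E, P(H) > P(H^*) > 0, and P(\neg E) > 0.
\begin{align*}
LR(H, H^*; E)
  &= \frac{P_H(E)}{P_{H^*}(E)} = \frac{1}{1} = 1, \\[4pt]
LR(\neg H, \neg H^*; \neg E)
  &= \frac{P_{\neg H}(\neg E)}{P_{\neg H^*}(\neg E)}
   = \frac{P(\neg E)/P(\neg H)}{P(\neg E)/P(\neg H^*)}
   = \frac{1 - P(H^*)}{1 - P(H)} > 1 .
\end{align*}
```

The final inequality holds because P(H) > P(H*) makes the numerator strictly larger than the denominator, and the denominator is positive since P(H) ≤ P(E) < 1.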
The case for Jeffrey conditioning is similarly direct. Since the argument for ordinal similarity did not depend at all on the assumption that Q(E) = 1, we have really established
(3.7) Corollary
- If H and H* entail E, then P(H) > P(H*) if and only if Q(H) > Q(H*).
- If H and H* entail ~E, then P(H) > P(H*) if and only if Q(H) > Q(H*).
So, Q is ordinally similar to P both when restricted to hypotheses that entail E and when restricted to hypotheses that entail ~E. Moreover, since dividing by positive numbers does not disturb ordinal relationships, it also follows that QE is ordinally similar to P when restricted to hypotheses that entail E, and that Q~E is ordinally similar to P when restricted to hypotheses that entail ~E. Since QE(E) = 1 = Q~E(~E), (3.6) then entails:
(3.8) Consequence For every proposition H, QE(H) = PE(H) and Q~E(H) = P~E(H)
It is easy to show that (3.8) is necessary and sufficient for Q to arise from P by Jeffrey conditioning on E. Subject to the constraint Q(E) = q, it guarantees that Q(H) = qPE(H) + (1 − q)P~E(H).
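The sufficiency direction is one application of the law of total probability to Q. The sketch below assumes 0 < q < 1, so that both of Q's conditional probabilities are defined, and writes Q_E(H) and P_E(H) for probabilities conditional on E.

```latex
% Assumptions: Q(E) = q with 0 < q < 1, and (3.8): Q_E = P_E and Q_{\neg E} = P_{\neg E}.
\begin{align*}
Q(H) &= Q(E)\,Q_E(H) + Q(\neg E)\,Q_{\neg E}(H)
       && \text{(total probability applied to } Q\text{)} \\
     &= q\,P_E(H) + (1 - q)\,P_{\neg E}(H)
       && \text{(by (3.8) and } Q(E) = q\text{)} .
\end{align*}
```

For the necessity direction, conditioning the Jeffrey posterior on E divides q P_E(H & E) by Q(E) = q, which returns P_E(H); the same reasoning applies to ~E.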
The general moral is clear.
The basic Bayesian insight embodied in the weak likelihood principle (2.1e) entails that simple and Jeffrey conditioning on E are the only rational ways to revise beliefs in response to a learning experience whose sole immediate effect is to alter E's probability.
While much more can be said about simple conditioning, Jeffrey conditioning and other forms of belief revision, these remarks should give the reader a sense of the importance of Bayes' Theorem in subjectivist accounts of learning and evidential support. Though a mathematical triviality, the Theorem's central insight — that a hypothesis is supported by any body of data it renders probable — lies at the heart of all subjectivist approaches to epistemology, statistics, and inductive logic.
- Armendt, B. 1980. "Is There a Dutch Book Argument for Probability Kinematics?", Philosophy of Science 47, 583-588.
- Bayes, T. 1764. "An Essay Toward Solving a Problem in the Doctrine of Chances", Philosophical Transactions of the Royal Society of London 53, 370-418. [Facsimile available online: the original essay with an introduction by his friend Richard Price]
- Birnbaum, A. 1962. "On the Foundations of Statistical Inference", Journal of the American Statistical Association 53, 259-326.
- Carnap, R. 1962. Logical Foundations of Probability, 2nd edition. Chicago: University of Chicago Press.
- Chihara, C. 1987. "Some Problems for Bayesian Confirmation Theory", British Journal for the Philosophy of Science 38, 551-560.
- Christensen, D. 1999. "Measuring Evidence", Journal of Philosophy 96, 437-61.
- Dale, A. I. 1989. "Thomas Bayes: A Memorial", The Mathematical Intelligencer 11, 18-19.
- ----- 1999. A History of Inverse Probability, 2nd edition. New York: Springer-Verlag.
- Earman, J. 1992. Bayes or Bust? Cambridge, MA: MIT Press.
- Edwards, A. W. F. 1972. Likelihood. Cambridge: Cambridge University Press.
- Glymour, Clark. 1980. Theory and Evidence. Princeton: Princeton University Press.
- Hacking, Ian. 1965. Logic of Statistical Inference. Cambridge: Cambridge University Press.
- Hájek, A. 2003. "Interpretations of the Probability Calculus", in the Stanford Encyclopedia of Philosophy, (Summer 2003 Edition), Edward N. Zalta (ed.), URL = <https://plato.stanford.edu/archives/sum2003/entries/probability-interpret/>
- Hammond, P. 1994. "Elementary non-Archimedean Representations of Probability for Decision Theory and Games," in P. Humphreys, ed., Patrick Suppes: Scientific Philosopher, vol. 1., Dordrecht: Kluwer Publishers, 25-62.
- Harper, W. 1976. "Rational Belief Change, Popper Functions and Counterfactuals," in W. Harper and C. Hooker, eds., Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, vol. I. Dordrecht: Reidel, 73-115.
- Hartigan, J. A. 1983. Bayes Theory. New York: Springer-Verlag.
- Howson, Colin. 1985. "Some Recent Objections to the Bayesian Theory of Support", British Journal for the Philosophy of Science, 36, 305-309.
- Jeffrey, R. 1987. "Alias Smith and Jones: The Testimony of the Senses", Erkenntnis 26, 391-399.
- ----- 1992. Probability and the Art of Judgment. New York: Cambridge University Press.
- Joyce, J. M. 1999. The Foundations of Causal Decision Theory. New York: Cambridge University Press.
- Kahneman, D. and Tversky, A. 1973. "On the psychology of prediction", Psychological Review 80, 237-251.
- Kaplan, M. 1996. Decision Theory as Philosophy. Cambridge: Cambridge University Press.
- Levi, I. 1985. "Imprecision and Indeterminacy in Probability Judgment", Philosophy of Science 53, 390-409.
- Maher, P. 1996. "Subjective and Objective Confirmation", Philosophy of Science 63, 149-174.
- McGee, V. 1994. "Learning the Impossible," in E. Eells and B. Skyrms, eds., Probability and Conditionals. New York: Cambridge University Press, 179-200.
- Mortimer, Halina. 1988. The logic of induction, Ellis Horwood Series in Artificial Intelligence, New York; Halsted Press.
- Nozick, R. 1981. Philosophical Explanations. Cambridge: Harvard University Press.
- Renyi, A. 1955. "On a New Axiomatic Theory of Probability", Acta Mathematica Academiae Scientiarum Hungaricae 6, 285-335.
- Royall, R. 1997. Statistical Evidence: A Likelihood Paradigm. New York: Chapman & Hall/CRC.
- Skyrms, B. 1987. "Dynamic Coherence and Probability Kinematics", Philosophy of Science 54, 1-20.
- Sober, E. 2002. "Bayesianism — its Scope and Limits", in Swinburne (2002), 21-38.
- Spohn, W. 1986. "The Representation of Popper Measures", Topoi 5, 69-74.
- Stigler, S. M. 1982. "Thomas Bayes' Bayesian Inference", Journal of the Royal Statistical Society, series A 145, 250-258.
- Swinburne, R. 2002. Bayes' Theorem. Oxford: Oxford University Press (published for the British Academy).
- Talbott, W. 2001. "Bayesian Epistemology", Stanford Encyclopedia of Philosophy (Fall 2001 Edition), Edward N. Zalta (ed.), URL = <https://plato.stanford.edu/archives/fall2001/entries/epistemology-bayesian/>
- Teller, P. 1976. "Conditionalization, Observation, and Change of Preference", in W. Harper and C.A. Hooker, eds., Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science. Dordrecht: D. Reidel.
- Williamson, T. 2000. Knowledge and its Limits. Oxford: Oxford University Press.
- Van Fraassen, B. 1999. "A New Argument for Conditionalization", Topoi 18, 93-96.