MEERA Glossary
Content analysis
Content analysis is a research analysis technique where data content (e.g., speech, written text, interviews, images, etc.) is categorized and classified. Content analysis can take a qualitative approach (e.g., focusing on meanings or implications of categories) or a quantitative approach (e.g., focusing on the relative frequency with which different categories are mentioned).
Related terms: Qualitative data, categorization of responses
Control group
In evaluation research, the control group is the group of people who do not receive the service or intervention, but who are similar to those who are receiving the service or intervention (i.e., the treatment group). For example, to study the impact of an environmental education program, an evaluator may designate one fifth grade classroom that participates in the program as the treatment group, and another fifth grade classroom in the same school that does not participate in the program as the control group. The control and treatment groups may be compared to determine what impacts, if any, the program had. Control groups are needed to rule out alternate explanations for results.
Related terms: Experimental design, quasi-experimental design, treatment group
Variants: Comparison group
Convenience sample
A sample selected not because it is representative of the population, but because it is convenient for the researcher to use – such as when college professors conduct a study with their own students.
Vogt, W. P. (1993). Dictionary of Statistics and Methodology. London: Sage Publications.
Criterion sample
A criterion sample is a purposive sample in which the researcher selects all cases that meet some predetermined criterion of importance. For example, an evaluator might interview every participant who dropped out of a program before completing it, on the assumption that these cases are especially informative about program weaknesses.
Critical case sample
A critical case sample is a purposive sample that selects particularly important cases that logic or prior experience suggests will allow for generalization to a larger population. A common example is the selection of key voting precincts for predicting the outcome of an election. The assumption is that what is true for these critical cases is likely to be true for most others, thus allowing such cases to be generalized.
Dependent variable
A dependent variable is a factor that is predicted to change in response to another factor, called an independent variable. For example, if you are examining the effect of a professional development workshop on teachers’ intentions to teach about the environment, the workshop will be the independent variable and the teachers' intentions will be the dependent variable.
Related terms: Independent variable
Descriptive statistics
Descriptive statistics are statistical procedures that are used to summarize, organize, graph, or otherwise represent information directly from data. Examples include measures such as frequency, mean, mode, and median. One example of a descriptive statistic would be the mean grade point average for students in a study. Descriptive statistics differ from inferential statistics, which try to make conclusions that extend beyond what the data immediately show.
Related terms: Mean, standard deviation
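For illustration, the short Python sketch below computes several common descriptive statistics for a hypothetical set of posttest scores (the scores and variable names are invented for this example):

```python
# A minimal sketch of computing common descriptive statistics for an
# invented set of posttest scores.
import statistics
from collections import Counter

scores = [72, 85, 85, 90, 78, 85, 92, 88, 70, 85]

print("mean:", statistics.mean(scores))        # average score
print("median:", statistics.median(scores))    # middle score
print("mode:", statistics.mode(scores))        # most frequent score
print("frequencies:", Counter(scores))         # how often each score appears
```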
Educational impacts
Educational impacts are outcomes of education that can contribute to meeting long-term goals.
Educational impacts include long-term effects on learners, teachers, or the learning environment.
Examples include:
- Improved academic performance overall or in specific subjects
- Higher rates of graduation, college entrance, college completion, or teacher retention
Educational impacts may also consist of enhanced conditions for learning and teaching, such as improvements in the physical learning environment (e.g., the addition of schoolyard habitats) and improved relationships between students, teachers, and the community.
Variants: Educational impact
Effect size
An effect size is a measure of the strength of a relationship between two variables. Thus, in program evaluation, effect size can be used to estimate the magnitude of a treatment's effect on an outcome. When undertaking research, it is often helpful to know not only whether a result is statistically significant, but also how large the result is. In other words, effect size can help determine whether the outcome of a treatment was practically meaningful. For example, a weight loss program could either report that it leads to statistically significant weight loss or that it leads to an average weight loss of 30 pounds. In this case, 30 pounds indicates the claimed effect size – a meaningful way to communicate the impact of the program to potential customers. Two commonly reported measures of effect size are Pearson's r correlation and Cohen's d.
Related terms: Control group, treatment group, standard deviation, power analysis
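As an illustration of the calculation, the sketch below computes Cohen's d for two invented groups of scores; the difference in group means is expressed in units of their pooled standard deviation (all numbers are made up for this example):

```python
# A minimal sketch of computing Cohen's d for two invented groups of scores.
import math
import statistics

treatment = [82, 90, 78, 88, 95, 85, 91, 87]
control   = [75, 80, 72, 78, 83, 77, 79, 74]

n1, n2 = len(treatment), len(control)
s1, s2 = statistics.stdev(treatment), statistics.stdev(control)

# Pooled standard deviation across the two groups
pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Cohen's d: the difference in group means expressed in standard deviation units
d = (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd
print("Cohen's d:", round(d, 2))
```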
Environmental impacts
Environmental impacts are positive or negative effects that human actions have on the environment. Assessing environmental impacts is an important environmental policy process. For example, the Environmental Protection Agency requires an environmental impact assessment to be completed before approving certain types of projects (e.g., building a new highway).
Environmental education programs often seek to achieve positive environmental impacts including improvements in environmental quality or prevention of environmental harms. Environmental impacts of programs can occur as a result of changes in participants’ behavior (e.g., lifestyle changes, participation in environmental action projects, etc.). Environmental impacts can be measured on a small-scale (e.g., decreased erosion in a local stream) or a large-scale (e.g., improved water quality in the Chesapeake Bay).
Variants: Environmental impact
Evaluation design
An evaluation design specifies the methods that will be used in the evaluation. This typically involves deciding whether the evaluation will be experimental, quasi-experimental, or non-experimental, and whether it will involve:
- Pretest, posttest, both, or multiples of both
- A control group
- Random assignment of individuals to groups
An evaluation design will also clarify if data will be collected through surveys, interviews, document collection, observation, focus groups, case studies, or other means.
Bennett, D. (1984). Evaluating Environmental Education in Schools: UNESCO-UNEP International Environmental Education Programme.
Related terms: Experimental design, quasi-experimental design, control group, treatment group
Evaluation questions
Evaluation questions are the questions that will be answered by evaluators through data collection, analysis, and interpretation. Evaluation questions should be developed in conjunction with program stakeholders and should grow from a program’s goals and objectives. The questions should focus on aspects and outcomes of the program that are important to the stakeholders. See Step 3 for more information.
Evaluation report
There are two common types of evaluation reports: progress reports and final reports. Progress reports provide information and recommendations based on the successes and challenges of a project while it is being implemented. Final reports present findings and conclusions about the project's accomplishment of its intended goals when the project is complete. Components commonly included in an evaluation report include:
- Executive summary
- Project/program description
- Evaluation overview
- Evaluation design
- Analysis process
- Results & recommendations
For more information about evaluation reports, refer to Step 7: Report Results.
Online Evaluation Resource Library. (Retrieved August 26, 2007). Reports. http://oerl.sri.com/reports/reports.html
Experimental design research
Experimental design research is a research approach where the researcher controls the selection of participants in the study and the participants are randomly assigned to treatment and control groups. By designating treatment and control groups, the researcher manipulates a variable. For example, say a program evaluator wants to know the effect of a zoo field trip on students’ attitudes toward endangered species. To examine the effect of the zoo trip, the experimental researcher can compare attitudes of students who have and have not taken the field trip. Experimental design research can be contrasted with observational or field research, which involves observing phenomena in a natural setting without manipulating variables.
Related terms: Evaluation design, random assignment, quasi-experimental design
Extreme case sample
An extreme case sample is one in which the researcher chooses participants who are highly unusual cases – these participants may be particularly competent or exceptionally troublesome. A few examples could include studying an outstanding student who organized a successful state-wide environmental campaign, a student who became an eco-terrorist, and a student who dropped out of a year-long program after a week.
Bamberger, M., Rugh, J., & Mabry, L. (2006). Real World Evaluation: Working Under Budget, Time, Data, and Political Constraints. Thousand Oaks, CA: Sage Publications.
Variants: Deviant case sample, outlier sample
Formal education
Although there are no absolute definitions for formal education, this term generally refers to the structured educational system provided by the state for children and young adults. In most countries, the formal education system is state-supported and state-operated. In some countries, the state allows and certifies private systems which provide a comparable education. Common formal education settings include classrooms, school science laboratories, and college lecture halls. Institutions that provide formal education include preK-12 schools, colleges, and universities.
Formative evaluation
“Formative evaluation is typically conducted during the development or improvement of a program … and it is conducted, often more than once, for in-house staff of the program with the intent to improve. The reports normally remain in-house; but serious formative evaluation may be done by an internal or an external evaluator or preferably, a combination; of course, many program staff are, in an informal sense, constantly doing formative evaluation. The distinction between formative and summative evaluation has been well summed up by Bob Stake; ‘When the cook tastes the soup, that’s formative; when the guests taste the soup, that’s summative.’”
Scriven, M. (1991). Evaluation Thesaurus. Newbury Park, CA: Sage Publications.
Generalizability
In program evaluation, generalizability may be defined as the extent to which you can come to conclusions about a larger population based on information from a sample of that population. For example, when a national curriculum developer such as Project Learning Tree develops a new curriculum, it first tests the curriculum in a small number of schools that are assumed to be representative of schools around the country. If this assumption is correct, the outcomes of using the curriculum in the small number of schools can be generalized to estimate the outcomes of using the curriculum in many schools around the country.
Vogt, W. P. (2005). Dictionary of Statistics and Methodology. London: Sage Publications.
Goal
Programs should have clearly specified goals and objectives before proceeding with an evaluation. A program goal is a broad statement of what the program hopes to accomplish or what changes it expects to produce. A sample goal could be to increase students’ reducing, reusing, and recycling behaviors. Objectives are different from goals in that objectives are specific and measurable, while goals are broad statements of intended outcomes that do not have to be measurable.
Related term: Objectives
Variants: Goals
Health impacts
Health impacts are positive or negative outcomes that affect human health in a population. Some environmental education efforts focus on educating people about:
- Health problems associated with toxins and contaminants in air, water, food, land, and the built environment
- Ways to avoid contact with toxins and contaminants
- Ways to reduce or prevent the introduction of toxins and contaminants to the environment.
Consequently, human health impacts of successful environmental education programs may include declining rates of particular diseases based on improvements in environmental quality, or increased ability of individuals to protect themselves from harmful impacts of toxins and contaminants.
Variants: Health impact
Hierarchical linear modeling
Hierarchical linear modeling is an advanced form of statistical regression that allows for analysis of data at multiple levels. For example, in educational program evaluation, the evaluator is often interested in the impact of a program on students. However, the students are nested within classrooms nested within schools – and both classrooms and schools can have significant impacts on students. Hierarchical linear modeling allows researchers to account for and remove the effects of factors that are not of primary interest (e.g., the effect of being in a certain classroom) so that the impact of interest (i.e., the program) can be estimated.
Variants: HLM, multi-level analysis
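A highly simplified sketch of a two-level model is shown below, using the statsmodels library; all data are invented, and a real analysis would require far more students and classrooms:

```python
# A minimal sketch of a two-level model: students (level 1) nested within
# classrooms (level 2). All data are invented for illustration.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "score":     [78, 85, 90, 72, 88, 95, 84, 91, 70, 82, 75, 68, 77, 73, 80, 71],
    "program":   [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "classroom": ["A"] * 4 + ["B"] * 4 + ["C"] * 4 + ["D"] * 4,
})

# A random intercept for classroom absorbs classroom-to-classroom differences,
# so the "program" coefficient reflects the program's estimated effect on scores.
model = smf.mixedlm("score ~ program", data, groups=data["classroom"])
result = model.fit()
print(result.summary())
```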
Homogeneous sample
A homogeneous sample is one in which the researcher chooses participants who are alike – for example, participants who belong to the same subculture or have similar characteristics. A homogeneous sample may be chosen with respect to only a certain variable – for instance, the researcher may be interested in studying participants who work in a certain occupation, or are in a certain age group. Homogeneous sampling can be of particular use for conducting focus groups because individuals are generally more comfortable sharing their thoughts and ideas with other individuals who they perceive to be similar to them.
Patton, M. (2001). Qualitative Research & Evaluation Methods. Thousand Oaks, CA: Sage Publications.
Impact evaluation
Impact evaluation is evaluation that focuses on the benefits or payoff of a program rather than on examining program processes, delivery or implementation.
Scriven, M. (1991). Evaluation Thesaurus. Newbury Park, CA: Sage Publications.
For information on the differences between impacts and outcomes in EE, click here.
Impacts
In evaluation, impacts are the broad, ultimate changes that occur within a community, organization, society, or environment as a result of a program or activity. Impacts are typically long-term changes, sometimes affecting individuals other than the direct program participants. Desired types of environmental education program impacts can include environmental impacts (e.g., improved water quality), educational impacts (e.g., improved achievement in school), and health impacts (e.g., lower asthma rates).
Scriven, M. (1991). Evaluation Thesaurus. Newbury Park, CA: Sage Publications.
For information on the differences between impacts and outcomes in EE, click here.
Variants: Impact
Implementation evaluation
Implementation evaluations focus on telling “decision makers what is going on in a program, how the program has developed, and how and why programs deviate from initial plans and expectations.” Rather than focusing on outcomes, implementation evaluation pays attention to inputs, activities, processes, and structures within programs.
“Implementation evaluations answer the following kinds of questions: What do clients in the program experience? What services are provided to clients? What does staff do? What is it like to be in the program? How is the program organized?”
This type of evaluation reflects the idea that in order to understand how effective a program was after it was implemented, it is important to examine the extent and ways in which the program was actually implemented. Understanding the differences between how the program was intended to be and how it was actually implemented can provide useful information for making changes to improve the program in the future.
Patton, M. (2001). Qualitative Research & Evaluation Methods. Thousand Oaks, CA: Sage Publications.
Variants: Process evaluation
Independent variable
The independent variable is the presumed cause in a study that can predict or explain the values of another variable in the study. Thus, the independent variable is the hypothesized cause of an outcome (dependent variable). For example, if you are examining the effect of a professional development workshop on teachers’ intentions to teach about the environment, participation in the workshop would be the independent variable, and teachers’ intentions would be the dependent variable.
Vogt, W. P. (2005). Dictionary of Statistics and Methodology. London: Sage Publications.
Related terms: Dependent variable
Indicator
An indicator is an observable measure of achievement, performance, or change. It provides evidence of a condition or result. Examples of indicators include test scores, number of participants, program completion rates, types of environmentally responsible behaviors observed, or participants’ confidence in their environmental problem solving skills.
Variants: Indicators
Inferential statistics
“With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data alone. For instance, we use inferential statistics to try to infer from the sample data what the population might think. Or, we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one rather than one that might have happened by chance in this study. Thus, we use inferential statistics to make inferences from our data to more general conditions. In contrast, we use descriptive statistics simply to describe what's going on in our data.”
You can use inferential statistics to:
- Test hypotheses (e.g., determine if there is a statistically significant relationship between variables in your evaluation)
- Predict future outcomes
- Describe associations between variables (e.g., correlations)
- Model relationships (e.g., regression)
Research Methods Knowledge Base. (Retrieved August 26, 2007). Inferential Statistics. http://www.socialresearchmethods.net/kb/statinf.php
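As a small illustration, the sketch below runs one kind of inferential test (a Pearson correlation) on invented data relating hours spent in a program to posttest scores, using the scipy library:

```python
# A minimal sketch of an inferential test: is the association between hours
# in a program and posttest scores likely to be more than chance?
# (All data are invented for illustration.)
from scipy import stats

hours  = [2, 4, 5, 7, 8, 10, 12, 15]
scores = [60, 65, 63, 72, 75, 78, 85, 90]

r, p_value = stats.pearsonr(hours, scores)
print("correlation r:", round(r, 2))
print("p-value:", round(p_value, 4))  # a small p-value suggests the association
                                      # is unlikely to be due to chance alone
```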
Informal education
Formal, nonformal, and informal education are not completely distinct. However, notwithstanding some aspects of overlap, informal education often refers to voluntary learning experiences where the learner chooses what, when, and how to learn. In informal learning settings, an educator may make opportunities and information available to the learner to use however he or she chooses. Examples of informal learning experiences could include watching a nature program on television or reading interpretive signs in a park.
Heimlich, J. (1993). Nonformal Environmental Education: Toward a Working Definition. ERIC Bulletin. May 1993.
Informed consent
In research practice, informed consent is a legal condition entered into by a person who agrees or “gives consent” to participate in a study based on learning about and understanding the facts and implications of participation. Before being asked to agree to participate, participants should be provided with information concerning the purpose of the program evaluation, what they will be asked to do as participants, and how the information they provide will be used. Individuals should also be informed that they can withdraw their consent and stop participating in the evaluation at any time. In some cases involving adults, verbal consent may be adequate, but in general, it is preferable to provide written information about the evaluation that individuals can take with them and to have people who agree to participate sign and return a written consent form that is kept by the evaluator. When minors are participating in an evaluation, consent from a parent or guardian is required and additional safeguards (e.g., written consent from school district) are usually required. For more information on this topic, refer to Evaluation Consent and the Institutional Review Board Process.
Institutional review board
Institutional review boards, or IRBs (also known as independent ethics committees, or human ethics committees), are committees mandated by federal regulation that review and monitor plans for research and evaluations involving human subjects. It is important to keep in mind that these committees not only protect evaluation participants from unethical methods or inappropriate questions, but also protect you and your program in the event that someone does not like part of the evaluation (Washington State University, 2004). Colleges, universities, and federal agencies, as well as some state and local agencies, have IRBs with the authority to review, approve, require revisions in, or disapprove plans for all research and evaluations involving human subjects conducted by their personnel or students. For more information about IRBs, refer to Evaluation Consent and the Institutional Review Board Process.
Washington State University Cooperative Extension. (2004). Participant Consent. Retrieved September 2007 http://ext.wsu.edu/lifeskills/participant.htm.
Variants: IRB
Instrument
In program evaluation, an instrument is a tool that is used to gather, organize, and analyze information needed to answer the evaluation questions. Instruments can include observation forms, interview protocols, questionnaires, and performance tests.
Online Evaluation Resource Library. (Retrieved August 26, 2007). Glossary of Plan Components. http://oerl.sri.com/plans/plansgloss.html
Intermediate testing
Intermediate testing is sometimes included in program evaluation designs. For example, in addition to conducting a pretest and posttest, an evaluator may decide to include an intermediate test by collecting some data (e.g., a survey or test of knowledge) while the participants are experiencing the program. Including an intermediate test is particularly useful for longer programs, and allows the evaluator to track participants’ progress as they experience the program over time. Intermediate testing can be added to a variety of program evaluation designs. In some cases, evaluators may decide to collect data at several intermediate points during a program.
Key incident analysis
In key incident analysis, multiple examples of major events are recorded and reflected upon to draw conclusions. Key incidents can be gathered in different ways, but one common approach is to ask a participant to tell a story about an experience they have had that is relevant to the phenomenon of interest. This technique is helpful for learning from non-routine events that may challenge common understandings and beliefs.
Variants: Critical incident analysis
Knowledge
Environmental educators are typically interested in four domains of environmental knowledge including:
- Knowledge in the natural sciences, including knowledge of ecological concepts and principles
- Knowledge of environmental problems and issues associated with them
- Knowledge of environmental problem-solving and action strategies
- Knowledge of the social sciences (with emphasis given to concepts and principles that inform our understanding of human-environment interactions and environmental problems, issues and solutions)
Marcinkowski, T. (1993). Assessment in Environmental Education. In R. J. Wilke (Ed.), Environmental Education Teacher Resource Handbook: A Practical Guide for K-12 Environmental Education (pp. 143-198). Millwood, NY: Kraus International Publications. (text abridged).
Likert scale
The Likert Scale (pronounced “lick-ert”) is a type of rating scale often used in surveys that asks respondents to indicate their level of agreement with a statement. A typical Likert scale item consists of a statement and five levels of agreement that the respondent can choose from.
Example:
Despite our special abilities humans are subject to the laws of nature.
- Strongly agree
- Agree
- Neither agree nor disagree
- Disagree
- Strongly disagree
Logic model
A logic model is a graphic representation of the linkages between program goals, resources, activities, and expected outcomes. Logic models illustrate the ways in which program inputs and activities are thought to lead to outputs and outcomes in both the short and long term. Logic models often include diagrams or pictures that illustrate these relationships. In program evaluation, logic models provide a basis for developing evaluation strategies. For more information, see Step 2.
Related terms: Outcomes, impacts, outputs
Maximum variation sample
A maximum variation sample is a purposefully selected sample of persons or settings that represent a wide range of experience related to the phenomenon of interest. With a maximum variation sample, the goal is not to build a random and generalizable sample, but rather to try to represent a range of experiences related to what one is studying. Maximum variation sampling is an emergent or sequential approach – what one learns from initial participants can inform the subsequent direction of the study.
This type of sample can also be useful in situations where a random sample cannot be drawn, and when the sample size is very small. For example, evaluators of a small residential environmental education program might handpick the most diverse students possible for their sample, including several highly successful students, several typical students, and several students who have expressed dislike or dropped out of the program.
Maykut, P, & Morehouse, R. (2000). Beginning Qualitative Research: A Philosophic and Practical Guide. London: RoutledgeFalmer.
Variants: Maximum diversity sample, maximum heterogeneity sample
Mean
The mean is a descriptive statistic that measures the central location of the data. It is calculated as an average – by finding the sum of all individual data points and dividing by the number of points. The mean is sensitive to outliers or extreme cases. In other words, having a few extreme measurements in a data set can lead to a mean score being very different from a median score. For this reason, it can be helpful to report an additional descriptive statistic such as a standard deviation with the mean. In environmental education the mean might be calculated for test scores, such as pre and posttest scores.
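The short sketch below illustrates the point about outliers with invented scores: adding a single extreme score pulls the mean down noticeably while the median barely moves.

```python
# A minimal sketch showing that the mean is more sensitive to outliers
# than the median (scores are invented).
import statistics

scores = [70, 72, 75, 78, 80, 82, 85]
scores_with_outlier = scores + [5]   # one very low, atypical score

print(statistics.mean(scores), statistics.median(scores))    # about 77.4 and 78
print(statistics.mean(scores_with_outlier),
      statistics.median(scores_with_outlier))                # about 68.4 and 76.5
```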
Mixed methods research
Mixed methods research is an approach for conducting social science research that combines different types of research. Generally, quantitative and qualitative approaches are combined in a mixed methods study. The goal of using a mixed methods approach is to achieve a more robust view of what is being studied. An example of a mixed methods program evaluation could include the use of surveys, interviews, and observations for data collection.
Variants: Multimethodology
Mixed treatment groups
When evaluators are interested in the effects of different components of a complex program, or in the effects that a program has on different groups of individuals, they can include multiple treatment groups. For example, if a nature center provides three different education programs to the public, an evaluator can consider the participants in the different programs to represent different treatment groups. Or, if the nature center is interested in the effectiveness of one program when administered to teenagers versus senior citizens, an evaluator can compare results for these two treatment groups. In general, results for treatment groups can be compared with each other, or with results for a control group, to examine the effects a program has had.
Needs assessment
A needs assessment is an analysis approach typically conducted in the initial stages of planning to determine the need for a project or program by considering aspects like resources available, extent of the problem and need to address it, and audience interest and knowledge. A needs assessment can also be conducted for writing a grant proposal in order to provide evidence to the potential funder of the need for the program.
Variants: Front-end evaluation
Non-formal education
Formal, nonformal and informal education are not completely distinct entities. However, notwithstanding some aspects of overlap, nonformal education often refers to voluntary, structured learning activities that take place outside of a formal learning setting. Workshops, seminars, service groups, zoos, tours, and nature centers are typical settings for nonformal learning in environmental education.
Heimlich, J. (1993). Nonformal Environmental Education: Toward a Working Definition. ERIC Bulletin. May 1993.
Objective
An objective is a specific and measurable result that can be reached to accomplish a particular goal. Several examples of objectives for an education program about waste reduction could include:
- Students will be able to identify three ways in which an item can be reduced, reused, and recycled
- Students will report having reused paper in the last week.
The related goal that these objectives could contribute to accomplishing could be to increase student reducing, reusing, and recycling behaviors.
Related term: Goal
Variants: Objectives
Outcome evaluation
Outcome evaluation is used to examine a program’s direct effects on specifically defined target outcomes, and to provide direction for program improvement. For example, outcome evaluation may show that an environmental education program was successful in achieving its target outcome of 90% of participants being able to explain the greenhouse effect.
For information on the differences between impacts and outcomes in EE, click here.
Outcomes
Outcomes are the likely or achieved short-term and medium-term effects of a program or intervention. In other words, outcomes are what happen as a result of the program or activities. Environmental education outcomes that are commonly measured include changes in knowledge, skills, and attitudes.
For information on the differences between impacts and outcomes in EE, click here.
Variants: Outcome
Outputs
Outputs are the products and services that are produced by a program. Output measures can be used to indicate the degree to which products and services were produced as planned. Example outputs of an environmental education program could include a teachers’ manual, a workshop series, an Earth Day event, or an environmental writing competition.
Variants: Output
Participatory evaluation
Participatory evaluation is a bottom-up approach to evaluation that is controlled either partially or fully by interested program participants, staff, board members, and community members. These participants ask the questions, plan the evaluation design, gather and analyze data, and determine actions to take based on the results.
For more information, see Participatory Evaluation.
Kellogg Foundation. (1998). W.K. Kellogg Foundation Evaluation Handbook. Retrieved June 2006 at: http://www.wkkf.org/knowledge-center/resources/2010/w-k-kellogg-foundation-evaluation-handbook.aspx
Zukowski, A. and M. Lulaquisen. (2002). Participatory Evaluation. What is it? Why do it? What are the challenges? http://depts.washington.edu/ccph/pdf_files/Evaluation.pdf
Pilot test
A pilot test is an initial trial of a program, instrument, or other activity intended to test out procedures and discover and correct potential problems before proceeding to full scale implementation. Pilot tests can be conducted either for a program (i.e., testing out the program with a small group of participants) or for an evaluation (i.e., testing out instruments and data collection procedures with a small group of people similar to program participants). When possible, a pilot test, or trial run, is conducted with a sample group that is representative of the target population. Based on the results of the pilot test, revisions and improvements can be made before wider implementation of the program, instrument, or activity.
Posttest
In program evaluation, a posttest is a test or measurement administered after services or activities have ended. Posttest results are often compared with pretest results to examine the effects of the program being evaluated.
Variants: Post-test
Power analysis
In statistics, power is defined as the probability of rejecting the null hypothesis (i.e., there is no effect) when the alternative hypothesis (i.e., there is a real effect) is true. Power is equal to 1 − β, where β (beta) is the probability of failing to detect an effect that is really there. Researchers want power to be as large as possible because greater power provides greater ability to detect an effect.
Power analysis is a statistical tool that helps researchers determine whether a study is likely to detect an effect if one actually exists. It is generally considered adequate to obtain a power of 0.8 or 0.9. That is, the researcher would have an 80% or 90% chance of finding an effect if there actually is one. Power increases as sample size increases.
For more information, see Power Analysis, Statistical Significance, & Effect Size.
Related terms: Effect size, statistical significance
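One common use of power analysis is deciding how many participants are needed before data collection begins. The sketch below shows how this might be done with the statsmodels library for a two-group comparison; the effect size, alpha, and power values are illustrative:

```python
# A minimal sketch of an a priori power analysis: how many participants per
# group are needed to detect a medium-sized effect (Cohen's d = 0.5) with
# alpha = .05 and power = .80? (The values are illustrative.)
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print("participants needed per group:", round(n_per_group))  # roughly 64
```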
Pretest
In program evaluation, a pretest is a test or measurement administered before the program or activities begin. The results of a pretest can later be compared with the results of a posttest to show evidence of the effects of the program being evaluated.
Variants: Pre-test
Program
In the context of evaluation, a program is the activity (or set of activities) that is being evaluated. In an environmental education evaluation, the program of interest may be one or more of:
- A classroom curriculum
- A field trip
- A workshop for teachers
- Or another type of activity
Purposeful sample
Purposeful sampling is a non-random method of sampling where the researcher selects “information-rich cases for study in depth. Information-rich cases are those from which one can learn a great deal about issues of central importance to the purpose of the research, thus the term purposeful sampling. For example, if the purpose of an evaluation is to increase the effectiveness of a program in reaching lower-socioeconomic groups, one may learn a great deal more by focusing in depth on understanding the needs, interests, and incentives of a small number of carefully selected poor families than by gathering standardized information from a large, statistically significant sample.”
Types of purposeful sampling include extreme case sampling, maximum variation sampling, homogeneous sampling, critical case sampling, snowball sampling, and criterion sampling.
Patton, M. (2001). Qualitative Research & Evaluation Methods. Thousand Oaks, CA: Sage Publications.
Qualitative data
Qualitative data consists of any information that is collected that is not numerical. Types of sources of qualitative data include interviews, observations, field notes, written documents, photographs, and videotapes.
Research Methods Knowledge Base. (Retrieved August 26, 2007). Qualitative Data. http://www.socialresearchmethods.net/kb/qualdata.php
Quantitative data
Quantitative data is data that is measured, analyzed, summarized, or presented in numerical form. For example, quantitative evaluation methods can be used to measure instances, participants, size, degree, extent, or frequencies. In turn, these data can be used to provide a quantitative picture of what is happening by reporting statistics either verbally or graphically (i.e., using tables, charts, histograms, or graphs).
Quasi-experimental design
In evaluation research, quasi-experimental design studies are different from experimental design studies in that participants are not randomly assigned to treatment and control groups. While a quasi-experimental design typically uses comparison groups, random assignment is not used in this design because it is often impractical or impossible to do so. To minimize threats to validity in quasi-experimental design studies, researchers can try to account for pre-existing factors (e.g., pretest knowledge or attitudes) that could affect the outcome differences between treatment and comparison groups.
R-squared
In statistics, R² tells you what proportion of the variability in the dependent variable (the outcome) is explained by an independent variable (often, the treatment) in your statistical model. In other words, R² helps you determine how well your statistical model fits the data. A low R² value suggests that your model, including the treatment, explains little of the variation in the outcome you are interested in; a high R² value suggests that your treatment or program explains much of the variation in the outcomes you have found.
Related terms: Regression analysis
Variants: R², coefficient of determination
Random assignment
Random assignment is an experimental design technique used to assign participants to different treatment and control groups such that each individual in each group is assigned entirely by chance. The intention underlying random assignment is that the characteristics for the different groups will be roughly equivalent and therefore any effect observed between groups can be linked to a treatment effect instead of to a characteristic of individuals in a group. Researchers often use computer programs to randomly assign participants to different groups. Other techniques as simple as pulling names out of a hat can work as well.
Vogt, W. P. (2005). Dictionary of Statistics and Methodology. London: Sage Publications.
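The "names out of a hat" idea can also be expressed in a few lines of code; the sketch below shuffles an invented list of participants and splits it into two groups:

```python
# A minimal sketch of randomly assigning invented participants to
# treatment and control groups.
import random

participants = ["Ana", "Ben", "Cara", "Dev", "Elif", "Femi", "Gia", "Hugo"]

random.shuffle(participants)            # put participants in a random order
half = len(participants) // 2
treatment_group = participants[:half]   # first half receives the program
control_group = participants[half:]     # second half does not

print("treatment:", treatment_group)
print("control:", control_group)
```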
Regression analysis
Regression analysis is a statistical tool used to investigate relationships between one dependent variable and one or more independent variables. In evaluation, regression analysis can be used to explain how the variation in the outcome depends on the program treatment – or in other words, to explain the effect of the treatment on the outcome. Regression analysis can also be used to predict future effects.
Related terms: R-squared
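As a small illustration, the sketch below fits a simple linear regression with the scipy library on invented data; squaring the reported correlation (rvalue) gives the R² described in the previous entry:

```python
# A minimal sketch of a simple linear regression relating invented program
# exposure (hours) to an outcome (score).
from scipy import stats

hours  = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [55, 58, 63, 64, 70, 72, 75, 80]

result = stats.linregress(hours, scores)
print("slope:", round(result.slope, 2))            # estimated change in score per hour
print("intercept:", round(result.intercept, 2))
print("R-squared:", round(result.rvalue ** 2, 3))  # proportion of variance explained
```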
Reliability
Reliability is the consistency or repeatability of a measurement. Thus, a test measurement would be considered reliable if the test was given twice and a person’s score on the test was the same or similar both times. For example, if Emma took a test about knowledge of climate change twice and her scores were very close both times, this would provide evidence for the reliability of the measures used to test her knowledge.
Two types of reliability include test/retest and internal consistency. The case of Emma taking a test twice and scoring similarly both times is an example of test/retest reliability.
In contrast, internal consistency evaluates reliability by including more than one question intended to measure the same thing in a test or questionnaire. If a responder provides similar answers to all of the questions intended to measure the same thing (i.e., if the questions are highly correlated), then the measures demonstrate reliability in the form of internal consistency.
Research Methods Knowledge Base. (Retrieved August 26, 2007). Reliability and Validity: What’s the Difference? http://www.socialresearchmethods.net/kb/reliable.php
Related term: Validity
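Internal consistency is often summarized with Cronbach's alpha. The sketch below computes it by hand for invented responses to three questionnaire items intended to measure the same thing:

```python
# A minimal sketch of Cronbach's alpha for three invented questionnaire items.
# Rows are respondents; columns are items rated on a 1-5 scale.
import statistics

responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 4, 3],
    [1, 2, 2],
    [4, 4, 5],
]

k = len(responses[0])                         # number of items
items = list(zip(*responses))                 # regroup the data item by item
item_variances = [statistics.variance(item) for item in items]
total_scores = [sum(row) for row in responses]

# Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance of totals)
alpha = (k / (k - 1)) * (1 - sum(item_variances) / statistics.variance(total_scores))
print("Cronbach's alpha:", round(alpha, 2))   # values near 1 suggest high consistency
```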
Retention test
A retention test is a test that is administered not immediately, but some time period after a program or activity has ended, to determine how well an outcome was retained. For example, a year after a composting workshop, participants could be given a retention test to determine how much they are still composting compared with how much they were composting immediately after the workshop ended.
Retrospective pre-test
A retrospective pretest is given after the program ends, rather than before the program. Typically administered in conjunction with the posttest, it asks participants to think back and answer questions about their level of understanding, skill, attitudes, or behavior before the program. For example, after completing a program on energy conservation, students could be asked how frequently they turned off lights that were not in use before participating in the program.
Retrospective pretests can be useful when it is difficult to administer traditional pretests. For example, in a professional development workshop, teachers might find it cumbersome to have to complete both a pre and a posttest. However, retrospective pretests also have weaknesses; for example, memory bias can lead respondents to distort their retrospective responses to be more similar to their current thinking.
Lamb, T. (2005). The retrospective pretest: An imperfect but useful tool. The Evaluation Exchange, 11(2), retrieved online on August 26, 2007 at http://www.hfrp.org/evaluation/the-evaluation-exchange/issue-archive/evaluation-methodology/the-retrospective-pretest-an-imperfect-but-useful-tool
Sample
A sample is a group of subjects or cases selected from a larger group in the hope that this smaller group (the sample) will reveal important things about the larger group (the population).
Vogt, W. P. (2005). Dictionary of Statistics and Methodology. London: Sage Publications.
Variants: Sampling
Self-report
Self-report is a method used to evaluate participant characteristics by asking the participants to provide ratings about themselves. For example, an evaluator might ask a participant to rate their level of science knowledge, their attitude toward a politician, their skill in addressing local environmental problems, or their recent volunteer participation.
One problem with self-report measures is that respondents sometimes (either intentionally or unintentionally) provide inaccurate ratings. One common cause of inaccuracy in self-report measures is social desirability bias, which is a tendency for people to provide responses that they believe the researcher wants to hear.
Skills
Two important domains of skills relevant to environmental education include cognitive skills and affective skills.
“In environmental education, two sets of cognitive skills are of particular relevance: (1) skills for investigating environmental problems and issues, including identification, analysis, and evaluation; and (2) skills for dealing with action strategies, including their appropriate selection and the planning, implementation, and evaluation of discrete actions.
…Two sets of affective skills are also of particular relevance: (1) being able to weigh, choose, affirm, and make a commitment to the resolution of particular environmental problems and issues; and (2) being able to weigh one’s own values – relative to proposed and actual actions, in light of problem-related evidence and issue-related information regarding others’ values, and in light of potential consequences of taking action – and affirming and carrying out verbal commitments to support problem or issue resolution.”
Marcinkowski, T. (1993). Assessment in Environmental Education. In R. J. Wilke (Ed.), Environmental Education Teacher Resource Handbook: A Practical Guide for K-12 Environmental Education (pp. 143-198). Millwood, NY: Kraus International Publications.
Snowball sample
A snowball sample is a purposive sample selected by relying on previously identified group members to identify other members of the same population. As new names are identified, the sample gets bigger, much like a rolling snowball. Any names mentioned repeatedly are likely to be particularly valuable. Snowball samples are useful when a desired characteristic of a population is rare, or when a list of population members is unavailable to the researcher.
Henry, G. (1990). Practical Sampling. Newbury Park, CA: Sage Publications.
Research Methods Knowledge Base. (Retrieved August 26, 2007). Nonprobability Sampling. http://www.socialresearchmethods.net/kb/sampnon.php
Variants: Chain sample
Stakeholder
In the context of program evaluation, a stakeholder is an individual who has an interest in, affects, or may be affected by a program, evaluation, or evaluation outcome. For example, stakeholders can include program funders, steering committee members, program staff, program participants, and others.
Standard deviation
A standard deviation is a statistical measure of the spread or variability in a distribution of scores. “It is a measure of the average amount the scores in a distribution deviate from the mean. The more widely the scores are spread out, the larger the standard deviation.”
Vogt, W. P. (2005). Dictionary of Statistics and Methodology. London: Sage Publications.
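The sketch below contrasts two invented sets of scores that have the same mean but very different spreads, and therefore very different standard deviations:

```python
# A minimal sketch: same mean, different spread, different standard deviation.
import statistics

tight_scores  = [78, 79, 80, 81, 82]    # clustered near the mean
spread_scores = [60, 70, 80, 90, 100]   # widely spread around the mean

print(statistics.mean(tight_scores), statistics.stdev(tight_scores))    # 80 and about 1.6
print(statistics.mean(spread_scores), statistics.stdev(spread_scores))  # 80 and about 15.8
```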
Statistical significance
A result is called "statistically significant" if it is unlikely to have occurred by chance. This term is often used to describe differences, for example whether or not the difference in scores for two groups is statistically significant. Note though, that even if a difference between groups is statistically significant, that only means there is a difference, and does not necessarily mean that the difference is large or important.
The significance level is represented by the Greek letter α (alpha). Evaluators often use a significance level of 5% (α = .05) for reporting statistical significance. If a test of significance gives a p-value lower than .05, then the null hypothesis (i.e., that there is no difference between groups) is rejected, and the finding is said to be statistically significant.
For example, if an evaluator finds there is a 1 in 100 chance that the difference in scores for treatment and control group participants could have happened by chance, then the significance level for her finding is .01. Because .01 is less than .05, the difference between the groups is called statistically significant.
Summative evaluation
Summative evaluation is evaluation designed to assess the effectiveness of a program or activity based on its original objectives, or for a variety of other purposes. Stakeholders, including supervisors and grant makers, are particularly interested in summative evaluation. Issues such as short-term and long-term program outcomes, cost of program development, ongoing costs in relation to efficiency, and effectiveness of a program are often examined in summative evaluations. The proven success of a program can justify current and future allocation of resources to the program. Examples of techniques used for summative evaluations include pretest-posttest with one group; pretest-posttest with experimental and control groups; one-group descriptive analysis; and analysis of costs, resources, and implementation.
Scriven, M. (1991). Evaluation Thesaurus. Newbury Park, CA: Sage Publications.
T-Test
There are several kinds of t-tests, but the most common is the two sample t-test, also known as the independent samples t-test. The two sample t-test assesses whether or not two independent populations have the same mean value for a measure. For example, a researcher could use a two sample t-test to test whether a treatment and a control group have the same mean posttest score for attitude toward raising gasoline taxes.
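The sketch below runs an independent samples t-test with the scipy library on invented attitude scores for a treatment and a control group:

```python
# A minimal sketch of a two sample (independent samples) t-test on invented
# posttest attitude scores for treatment and control groups.
from scipy import stats

treatment = [4.2, 3.8, 4.5, 4.0, 4.7, 3.9, 4.4, 4.1]
control   = [3.1, 3.5, 2.9, 3.6, 3.2, 3.4, 3.0, 3.3]

t_stat, p_value = stats.ttest_ind(treatment, control)
print("t statistic:", round(t_stat, 2))
print("p-value:", round(p_value, 4))  # p < .05 would suggest the group means differ
```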
Treatment group
The treatment group consists of the individuals who are participating in the program or intervention being studied. The treatment group can be compared with the control group (those who are not participating in the program or receiving the service) to determine what changes, if any, the program caused.
Related terms: Control group
Variants: Experimental group
Typical case sample
Typical case sampling is a type of purposeful sampling in which “subjects are selected who are likely to behave as most of their counterparts would.” For example, an evaluator might select students with a socio-demographic profile like that of the larger population of interest.
Bamberger, M., Rugh, J., & Mabry, L. (2006). Real World Evaluation: Working Under Budget, Time, Data, and Political Constraints. Thousand Oaks, CA: Sage Publications.
Unadjusted score
An unadjusted score is an original score (for example a score on a test) that has not been transformed. An example of an unadjusted score would be the number of questions that a student answered correctly on a test. An unadjusted score can be contrasted with a standard score, which indicates how many standard deviations an observation is above or below the mean.
For example, Corey’s test score can be represented as an unadjusted score (e.g., Corey correctly answered 9 out of 10 questions or 90%). Corey’s same test score can also be represented as a standard score (e.g., Corey’s test score was 1 standard deviation above the mean, or equal to or higher than about 84% of the other test scores in Corey’s class).
Variants: Raw score
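The conversion from an unadjusted (raw) score to a standard score is a short calculation; the sketch below uses invented class scores:

```python
# A minimal sketch of converting a raw score into a standard (z) score:
# how many standard deviations the score lies above or below the class mean.
import statistics

class_scores = [60, 70, 75, 80, 85, 90, 65, 75]   # invented raw scores
raw_score = 90

mean = statistics.mean(class_scores)
sd = statistics.stdev(class_scores)

z_score = (raw_score - mean) / sd
print("raw score:", raw_score)
print("standard (z) score:", round(z_score, 2))   # 1.5 -> above the class mean
```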
Unit of analysis
“The unit of analysis is the major entity that you are analyzing in your study. For instance, any of the following could be a unit of analysis in a study:
- individuals
- groups
- artifacts (books, photos, newspapers)
- geographical units (town, census tract, state)
- social interactions (dyadic relations, divorces, arrests)”
In environmental education program evaluation, your unit of analysis might, for example, be students, curriculum materials, or visits to your nature center or project website.
Research Methods Knowledge Base. (Retrieved August 31, 2007). Unit of Analysis. http://www.socialresearchmethods.net/kb/unitanal.php
Validity
Validity is used to “describe a measurement instrument or test that accurately measures what it is supposed to measure; the extent to which a measure is free of systematic error. Validity also refers to designs that help researchers gather data appropriate for answering their questions. Validity requires reliability, but the reverse is not true.
For example, say we want to measure individuals’ heights. If all we had was a bathroom scale, we could ask our individuals to step on the scale and record the results. Even if the measurements were highly reliable, that is, consistent from one weighing to the next, they would not be very valid. The weights would not be completely useless, however, since there generally is some correlation between height and weight.”
Vogt, W. P. (2005). Dictionary of Statistics and Methodology. London: Sage Publications.
Related terms: Reliability, generalizability