Module 5: Educational Measurement and Evaluation (Notes)

Measurement is the process of quantifying an individual’s achievement, personality, attitudes, habits, and skills.

Quantitative appraisal of observable phenomena

Process of assigning symbols to dimensions of phenomena

An operation performed on the physical world by an observer

Process by which information about the attributes or characteristics of things is determined and differentiated

Evaluation is a qualitative aspect of determining the outcomes of learning.

Process of ranking with respect to attributes or traits

Appraising the extent of learning

Judging effectiveness of educational experience

Interpreting and analyzing changes in behavior

Accurately describing the quantity and quality of a thing

Summing up the results of measurements or tests and giving them meaning based on value judgments

Systematic process of determining the extent to which instructional objectives are achieved

Considering evidence in the light of value standards and in terms of the particular situations and goals that a group of individuals is striving to attain

Testing is a technique of obtaining information needed for evaluation purposes. Tests, quizzes, and other measuring instruments are devices used to obtain such information.

Functions of Measurement

a) Instructional

i. Principal (basic purpose)

To determine what knowledge, skills, abilities, habits and attitudes have been acquired

To determine the progress or extent of learning attained

To determine strengths, weaknesses, difficulties and needs of students

ii. Secondary (auxiliary functions for effective teaching and learning)

To help in the formation of study habits

To develop the effort-making capacity of students

To serve as aid for guidance, counselling, and prognosis

b) Administrative or supervisory

To maintain standards

To classify or select for special purposes

To determine teachers’ efficiency, the effectiveness of the methods and strategies used (strengths, weaknesses, needs), and standards of instruction

To serve as a basis or guide for curriculum making and development

To serve as guide in educational planning of administrators and supervisors

To set up norms of performance

To inform parents of their children’s progress in school

To serve as basis for research 

Functions of Evaluation

i. Evaluation assesses or makes an appraisal of educational objectives, programs, curricula, instructional materials, and facilities

- Teacher

- Learner

- Public relations of the school

- Achievement scores of the learner 

ii. Evaluation is used in conducting research

Principles of Evaluation

Evaluation should be:

a) Based on clearly stated objectives

b) Comprehensive

c) Cooperative

d) Used judiciously

e) Continuous and an integral part of the teaching-learning process

Types of Evaluation used in classroom instruction

a) Diagnostic Evaluation detects a pupil’s learning difficulties that are not revealed by formative tests. It is more comprehensive and specific.

b) Formative Evaluation provides feedback on the student’s performance in attaining instructional objectives. It identifies learning errors that need to be corrected, and it provides information to make instruction more effective.

c) Placement Evaluation defines the student’s entry behavior. It determines the knowledge and skills the student possesses that are necessary at the beginning of instruction.

d) Summative Evaluation determines the extent to which the objectives of instruction have been attained and is used for assigning grades or marks and for providing feedback to students.

Qualities of a Good Measuring Instrument

1. Validity

Content validity – also called face validity or logical validity; used in evaluating achievement tests

Concurrent validity – the test agrees or correlates with a criterion (ex. an entrance examination)

Predictive validity – the degree of accuracy with which the test predicts the level of performance in the activity it intends to foretell

Construct validity – agreement of the test with a theoretical construct or trait (ex. IQ) 

Let’s consider a problem situation.

A fisherman catches one yellowfin tuna, weighs it, and finds that it weighs 100 kilograms. As he meets one friend after another, he tells each of them that the fish he caught weighed 130 kilograms. In a statistical sense, the story is reliable because it is consistent (he reports the same weight every time), but it is not truthful (the fish actually weighed 100 kilograms); hence it is reliable but not valid.

Lesson: A test can be reliable without being valid, but a valid test is also reliable.

2. Reliability

Methods of estimating reliability

Test-retest method (uses the Spearman rank correlation coefficient)

Parallel-forms or alternate-forms method (paired observations from the two forms are correlated)

Split-half method (the test is divided into odd and even halves, and the whole-test reliability is computed using the Spearman-Brown formula)

Internal consistency method (Kuder-Richardson Formula 20)

Scorer reliability method (two examiners independently score a set of test papers, then their scores are correlated)

3. Tests

Classification of Tests

a) According to manner of response: oral and written

b) According to method of preparation: subjective (essay) and objective

c) According to nature of answer: intelligence test, personality test, aptitude test, prognostic test, diagnostic test, achievement test, preference test, accomplishment test, scale test, speed test, power test, standardized test, teacher-made test, placement test

Classification of Measuring Instruments

i. Standardized tests

a) Psychological tests – intelligence test, aptitude test, personality test (rating scale), vocational and professional interest inventory

b) Educational tests

ii. Teacher-made tests

Stages in developing a teacher-made test: planning, preparing, reproducing, administering, scoring, evaluating, interpreting

Item Analysis

A teacher-made test is evaluated with the use of item analysis.

Effectiveness of distractors

A good distractor attracts more students in the lower group than in the upper group.

Index of discrimination

The index of discrimination is positive if more students in the upper group got the correct answer and negative if more students in the lower group got the correct answer.

Index of difficulty

Difficulty refers to the percentage of students getting the right answer on each item. The smaller the percentage, the more difficult the item.

Practice Task in Item Analysis: Test Item No. 5

Options      1    2    3*   4    5
Upper 27%    2    3    7    2    0
Lower 27%    4    2    3    5    0

*correct answer
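The practice task can be worked through in Python. The notes do not spell out the formulas, so this sketch assumes the commonly used upper-lower group formulas: difficulty = (RU + RL) / (NU + NL) and discrimination = RU/NU - RL/NL, where RU and RL are the numbers in the upper and lower groups who chose the keyed answer, and NU and NL are the group sizes, taken here as the row totals (14 each). The helper `judge_distractor` is a hypothetical name for this example.

```python
# Counts from the practice task (test item no. 5); option "3" is the keyed answer.
upper = {"1": 2, "2": 3, "3": 7, "4": 2, "5": 0}  # upper 27% group
lower = {"1": 4, "2": 2, "3": 3, "4": 5, "5": 0}  # lower 27% group
key = "3"

n_upper = sum(upper.values())  # group size assumed to be the row total (14)
n_lower = sum(lower.values())  # group size assumed to be the row total (14)

# Difficulty index: proportion of both groups combined who got the item right.
difficulty = (upper[key] + lower[key]) / (n_upper + n_lower)

# Discrimination index: proportion correct in the upper group minus the lower group.
discrimination = upper[key] / n_upper - lower[key] / n_lower


def judge_distractor(option: str) -> str:
    """Classify a wrong option by comparing how many in each group chose it."""
    if upper[option] == 0 and lower[option] == 0:
        return "not chosen by anyone; revise or replace"
    if lower[option] > upper[option]:
        return "effective (attracts more of the lower group)"
    return "ineffective (attracts as many or more of the upper group)"


print(f"Difficulty index: {difficulty:.2f}")
print(f"Discrimination index: {discrimination:.2f}")
for option in upper:
    if option != key:
        print(f"Option {option}: {judge_distractor(option)}")
```

With these counts the item has a difficulty index of about 0.36 and a discrimination index of about 0.29; options 1 and 4 behave as good distractors, option 2 draws more of the upper group, and option 5 attracts no one.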

Types of Teacher – Made Tests

1. Essay type

Advantages: easy to construct, economical, minimizes guessing, develops critical thinking, minimizes cheating and memorizing, develops good study habits

2. Objective type

i. Recall type – simple recall, completion type

ii. Recognition type – alternate response (true/false, yes/no, right/wrong, agree/disagree); Multiple choice (stem and options variety, setting and options variety, group term variety, structured response variety, contained option variety)

iii. Matching type

iv. Rearrangement type

v. Analogy type – purpose, cause and effect, synonym relationship, antonym relationship, numerical relationship

vi. Identification type

Multiple Choice Test (Recognition type)

a) Stem-and-options variety: the stem serves as the problem

b) Setting-and-options variety: the optional responses depend on a setting or foundation of some sort, e.g., a graphical representation

c) Group-term variety: consists of a group of words or terms in which one does not belong to the group

d) Structured-response variety: makes use of structured responses, which are commonly used in classroom testing for natural science subjects

e) Contained-option variety: designed to identify errors in a word, phrase, sentence, or paragraph

Examples of Analogy Type Items

i. Purpose: shoe is to shoelace as door is to __

a. transom   b. threshold   c. hinge   d. key

ii. Cause and effect: heat is to fire as water is to __

a. sky   b. rain   c. cloud   d. H2O

iii. Synonym relationship: dig is to excavate as kill is to __

a. try   b. avenge   c. convict   d. slay

iv. Antonym relationship: fly is to spider as mouse is to __

a. rat   b. cat   c. rodent   d. animal

v. Numerical relationship: 2 is to 8 as 1/3 is to __

a. 2/3   b. 4/3   c. 12   d. 4

Table of Specifications (TOS)

It is the teacher’s blueprint.

It determines the content validity of the test.

It is a two-way table that relates the instructional objectives to the course content.

It makes use of Bloom’s Taxonomy in determining the levels of the cognitive domain.

Criterion-Referenced and Norm-Referenced Tests

Criterion-Referenced Tests

It identifies the extent to which an individual’s performance has met a given criterion (ex. a score of 75% on all the test items could be considered satisfactory performance).

It points out what a learner can do, not how he compares with others.

It identifies weak and strong points in an individual’s performance.

It tends to focus on subskills, to be shorter, and to support mastery learning.

It could be both diagnostic and prognostic in nature.
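A criterion-referenced reading of scores can be sketched in a few lines of Python. The 75% cut-off is the example given above; the 40-item test, the learner labels, and their scores are hypothetical, and `mastery_report` is a made-up helper name.

```python
# Criterion taken from the example in these notes: 75% of the items correct.
CRITERION = 0.75


def mastery_report(correct: int, total_items: int) -> str:
    """Judge one learner against the fixed criterion, without reference to other learners."""
    percent = correct / total_items
    status = "met" if percent >= CRITERION else "not yet met"
    return f"{percent:.0%} correct: criterion {status}"


# Hypothetical results on a hypothetical 40-item test.
for name, correct in [("Learner A", 34), ("Learner B", 30), ("Learner C", 24)]:
    print(name, "-", mastery_report(correct, 40))
```

Each verdict depends only on the learner's own score and the fixed cut-off, so adding or removing other learners never changes it.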

Norm-Referenced Tests

It compares a student’s performance with the performance of other students in the class.

It uses the normal curve in distributing grades of students by placing them either above or below the mean.

The teacher’s main concern is the variability of the scores.

The more variable the scores, the better, because variability shows how each individual differs from the others.

It uses percentiles and standard scores.

It tends to be of average difficulty.
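Norm-referenced interpretation can be sketched with standard (z) scores and percentile ranks. The sketch reuses the ten scores from the central-tendency exercises in these notes as a hypothetical class; it assumes the population standard deviation and one common percentile-rank convention (percent of scores below, counting half of any tied scores). Other textbooks use slightly different conventions, and the helper names are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical class: the ten scores used in the central-tendency exercises.
scores = [75, 60, 78, 75, 76, 75, 88, 75, 81, 75]


def z_score(x: float, data: list[float]) -> float:
    """Standard score: how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean(data)) / pstdev(data)


def percentile_rank(x: float, data: list[float]) -> float:
    """Percent of scores below x, counting half of any scores tied with x."""
    below = sum(1 for s in data if s < x)
    ties = sum(1 for s in data if s == x)
    return 100 * (below + 0.5 * ties) / len(data)


for x in (60, 75, 88):
    print(f"score {x}: z = {z_score(x, scores):+.2f}, "
          f"percentile rank = {percentile_rank(x, scores):.0f}")
```

Here a score of 88 sits almost two standard deviations above the class mean, while 60 sits more than two below it; the same raw score would receive a different z and percentile rank in a different class.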

Measures of Central Tendency

Mean, Median, Mode

Measures of Variability

Range, Quartile Deviation, Standard Deviation 

Point Measures

Quartiles, Deciles, Percentiles

Measures of Central Tendency

Mode – the crude or inspectional average. It is the most frequently occurring score. It is the poorest measure of central tendency.

Advantage: The mode is always a real value, since it is a score that actually occurs in the distribution. It is simple to approximate by inspection for a small number of cases. It does not require arranging the values.

Disadvantage: It is not rigidly defined and is inapplicable to irregular distributions.

What is the mode of these scores?

75, 60, 78, 75, 76, 75, 88, 75, 81, 75
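The mode exercise can be checked in Python with `collections.Counter`, which tallies how often each score occurs. This is just an illustration; when two or more scores tie for the highest frequency, `most_common(1)` reports only one of them.

```python
from collections import Counter

scores = [75, 60, 78, 75, 76, 75, 88, 75, 81, 75]

# Tally each score; the mode is the score with the highest count.
counts = Counter(scores)
mode_score, frequency = counts.most_common(1)[0]
print(f"Mode = {mode_score} (occurs {frequency} times)")
```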

Median – the score that divides the distribution into halves. It is sometimes called the counting average.

Advantage: It is the best measure when the distribution is irregular or skewed. It can be located in an open-ended distribution or when the data are incomplete (ex. only 80% of the cases are reported).

Disadvantage: It requires arranging the items according to size before it can be computed.

What is the median?

75, 60, 78, 75, 76, 75, 88, 75, 81, 75
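A hand-style median computation in Python, following the steps described above: arrange the scores by size, then take the middle score (odd N) or the average of the two middle scores (even N). This is a sketch; `statistics.median` in the standard library gives the same result.

```python
scores = [75, 60, 78, 75, 76, 75, 88, 75, 81, 75]

# The median requires the scores to be arranged by size first.
ordered = sorted(scores)
n = len(ordered)
mid = n // 2
if n % 2 == 1:
    median = ordered[mid]                           # odd N: the middle score
else:
    median = (ordered[mid - 1] + ordered[mid]) / 2  # even N: mean of the two middle scores

print(ordered)
print(f"Median = {median}")
```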

Mean – the most widely used and familiar average. It is the most reliable and most stable of all measures of central tendency.

Advantage: It is the best measure for regular distributions.

Disadvantage: It is affected by extreme values.

What is the mean?

75, 60, 78, 75, 76, 75, 88, 75, 81, 75
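The mean exercise in Python, plus a small check of the stated disadvantage: dropping the single extreme low score (60) raises the mean by almost two points. The comparison is only an illustration on these ten scores.

```python
scores = [75, 60, 78, 75, 76, 75, 88, 75, 81, 75]

# Mean = sum of the scores divided by the number of scores.
mean = sum(scores) / len(scores)
print(f"Mean = {mean}")

# Removing the extreme low score shows how sensitive the mean is to outliers.
trimmed = [s for s in scores if s != 60]
trimmed_mean = sum(trimmed) / len(trimmed)
print(f"Mean without 60 = {trimmed_mean:.2f}")
```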

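Point Measures

Quartiles – points that divide the distribution into four equal parts

Q1: the point below which N/4 cases (25% of the distribution) lie

Q2: the point below which N/2 cases (50% of the distribution) lie; this is the same as the median of the distribution

Q3: the point below which 3N/4 cases (75% of the distribution) lie

Deciles – points that divide the distribution into ten equal parts

D1: the point below which N/10 cases (10% of the distribution) lie

D2: the point below which 2N/10 cases (20% of the distribution) lie

D3: the point below which 3N/10 cases (30% of the distribution) lie

D4: the point below which 4N/10 cases (40% of the distribution) lie

D5: the point below which 5N/10 cases (50% of the distribution) lie

D6: the point below which 6N/10 cases (60% of the distribution) lie

D7: the point below which 7N/10 cases (70% of the distribution) lie

D8: the point below which 8N/10 cases (80% of the distribution) lie

D9: the point below which 9N/10 cases (90% of the distribution) lie

Percentiles – points that divide the distribution into 100 equal parts

P1: the point below which N/100 cases (1% of the distribution) lie

P10: the point below which 10N/100 cases (10% of the distribution) lie

P25: the point below which 25N/100 cases (25% of the distribution) lie

P50: the point below which 50N/100 cases (50% of the distribution) lie

P75: the point below which 75N/100 cases (75% of the distribution) lie

P90: the point below which 90N/100 cases (90% of the distribution) lie

P99: the point below which 99N/100 cases (99% of the distribution) lie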
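The position formulas above (N/4, 2N/10, 10N/100, and so on) can be turned into a small Python function. The notes do not say how to handle a fractional position, so this sketch assumes the nearest-rank convention: round the position up and take the score at that rank in ascending order. Interpolating conventions and grouped-data formulas can give slightly different values; `point_measure` is a hypothetical helper name.

```python
import math

scores = [75, 60, 78, 75, 76, 75, 88, 75, 81, 75]


def point_measure(data: list[float], k: int, parts: int) -> float:
    """Return the k-th of `parts` division points using the position k*N/parts.

    Assumed convention (nearest rank): round the position up to a whole
    number and take the score at that rank in ascending order.
    """
    ordered = sorted(data)
    position = k * len(ordered) / parts  # e.g. 3N/4 for Q3, 9N/10 for D9
    rank = max(1, math.ceil(position))   # ranks start at 1
    return ordered[rank - 1]


print("Q1 =", point_measure(scores, 1, 4))      # parts=4 -> quartiles
print("Q3 =", point_measure(scores, 3, 4))
print("D9 =", point_measure(scores, 9, 10))     # parts=10 -> deciles
print("P10 =", point_measure(scores, 10, 100))  # parts=100 -> percentiles
```

With the ten exercise scores, this convention gives Q1 = 75, Q3 = 78, D9 = 81, and P10 = 60.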

Measures of Variability or Scatter

a) Range

R = highest score – lowest score

b) Quartile Deviation

QD = ½ (Q3 – Q1)

It is also known as the semi-interquartile range.

It is often paired with the median.

c) Standard Deviation

It is the most important and best measure of variability of test scores.

A small standard deviation means that the group has small variability, that is, it is relatively homogeneous.

It is used with the mean.
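Range, quartile deviation, and standard deviation for the same ten exercise scores, as a Python sketch. It relies on the standard library's `statistics.quantiles`, whose default "exclusive" method interpolates between scores, so its quartiles (here Q1 = 75.0 and Q3 = 78.75) can differ from a nearest-rank or hand method. It also assumes the population standard deviation (`pstdev`), treating the ten scores as the whole group; `statistics.stdev` would give the sample version.

```python
from statistics import mean, pstdev, quantiles

scores = [75, 60, 78, 75, 76, 75, 88, 75, 81, 75]

score_range = max(scores) - min(scores)  # R = highest score - lowest score

# quantiles(n=4) returns the three cut points [Q1, Q2, Q3]; the default
# "exclusive" method interpolates, so hand methods may differ slightly.
q1, _, q3 = quantiles(scores, n=4)
quartile_deviation = (q3 - q1) / 2       # QD = 1/2 (Q3 - Q1)

sd = pstdev(scores)  # population SD: treats these scores as the whole group

print(f"Range = {score_range}")
print(f"Q1 = {q1}, Q3 = {q3}, QD = {quartile_deviation}")
print(f"Mean = {mean(scores)}, SD = {sd:.2f}")
```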


