Qualitative Research Case Study Definition In Statistics

Before designing a quantitative research study, you must decide whether it will be descriptive or experimental, because this will dictate how you gather, analyze, and interpret the results. A descriptive study is governed by the following rules: subjects are generally measured once; the intention is only to establish associations between variables; and the study may include a sample of hundreds or thousands of subjects to ensure that a valid estimate of a generalized relationship between variables has been obtained. An experimental design measures subjects before and after a particular treatment, may use a very small and purposefully chosen sample, and is intended to establish causality between variables.


The introduction to a quantitative study is usually written in the present tense and from the third person point of view. It covers the following information:

  • Identifies the research problem -- as with any academic study, you must state clearly and concisely the research problem being investigated.
  • Reviews the literature -- review scholarship on the topic, synthesizing key themes and, if necessary, noting studies that have used similar methods of inquiry and analysis. Note where key gaps exist and how your study helps to fill these gaps or clarifies existing knowledge.
  • Describes the theoretical framework -- provide an outline of the theory or hypothesis underpinning your study. If necessary, define unfamiliar or complex terms, concepts, or ideas and provide the appropriate background information to place the research problem in proper context [e.g., historical, cultural, economic, etc.].


The methods section of a quantitative study should describe how each objective of your study will be achieved. Be sure to provide enough detail to enable the reader to make an informed assessment of the methods being used to obtain results associated with the research problem. The methods section should be presented in the past tense.

  • Study population and sampling -- describe where the data came from and how robust it is; note where gaps exist or what was excluded; and describe the procedures used to select the subjects;
  • Data collection – describe the tools and methods used to collect information and identify the variables being measured; describe the methods used to obtain the data; and, note if the data was pre-existing [i.e., government data] or you gathered it yourself. If you gathered it yourself, describe what type of instrument you used and why. Note that no data set is perfect--describe any limitations in methods of gathering data.
  • Data analysis -- describe the procedures for processing and analyzing the data. If appropriate, describe the specific instruments of analysis used to study each research objective, including mathematical techniques and the type of computer software used to manipulate the data.


The findings of your study should be written objectively and in a succinct and precise format. In quantitative studies, it is common to use graphs, tables, charts, and other non-textual elements to help the reader understand the data. Make sure that non-textual elements do not stand in isolation from the text but are used to supplement the overall description of the results and to help clarify key points being made.

  • Statistical analysis -- how did you analyze the data? What were the key findings from the data? The findings should be presented in a logical, sequential order. Describe but do not interpret these trends or negative results; save that for the discussion section. The results should be presented in the past tense.


Discussions should be analytic, logical, and comprehensive. The discussion should meld your findings with those identified in the literature review and place them within the context of the theoretical framework underpinning the study. The discussion should be presented in the present tense.

  • Interpretation of results -- reiterate the research problem being investigated and compare and contrast the findings with the research questions underlying the study. Did they affirm predicted outcomes or did the data refute them?
  • Description of trends, comparison of groups, or relationships among variables -- describe any trends that emerged from your analysis and explain all unanticipated and statistically insignificant findings.
  • Discussion of implications – what is the meaning of your results? Highlight key findings based on the overall results and note findings that you believe are important. How have the results helped fill gaps in understanding the research problem?
  • Limitations -- describe any limitations or unavoidable bias in your study and, if necessary, note why these limitations did not inhibit effective interpretation of the results.


End your study by summarizing the topic and providing a final comment and assessment of the study.

  • Summary of findings – synthesize the answers to your research questions. Do not report any statistical data here; just provide a narrative summary of the key findings and describe what was learned that you did not know before conducting the study.
  • Recommendations – if appropriate to the aim of the assignment, tie key findings with policy recommendations or actions to be taken in practice.
  • Future research – note the need for future research linked to your study’s limitations or to any remaining gaps in the literature that were not addressed in your study.


Copyright © 2010 Stefan Loehnert. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Gathered data are frequently not in a numerical form that allows immediate application of quantitative mathematical-statistical methods. This paper examines some basic aspects of how quantitative statistical methodology can be utilized in the analysis of qualitative data sets. The transformation of qualitative data into numeric values is considered the entry point to quantitative analysis. Related publications and the impacts of scale transformations are discussed along the way. Subsequently, it is shown how correlation coefficients can be used in conjunction with data aggregation constraints to construct relationship modelling matrices. For illustration, a case study is referenced in which ordinally scaled qualitative survey answers are allocated to process-defining procedures as aggregation levels. Finally, options for measuring the adherence of the gathered empirical data to such derived aggregation models are introduced, and a statistically based reliability check approach to evaluate the reliability of the chosen model specification is outlined.

1. Introduction

In this paper some aspects are discussed of how data of qualitative category type, often gathered via questionnaires and surveys, can be transformed into appropriate numerical values to enable the full spectrum of quantitative mathematical-statistical analysis methodology. To this end, the impacts of the chosen valuation transformation from ordinal scales to interval scales and their relations to statistical and measurement modelling are studied. This is applied to demonstrate ways to measure the adherence of quantitative data representations to qualitative aggregation assessments based on statistical modelling. Finally an approach to evaluate such adherence models is introduced. Along the way, a brief epitome of related publications is given and examples from a case study are referenced.

Gathering data references a data typology of two basic modes of inquiry, consequently associated with “qualitative” and “quantitative” survey results. Thereby “quantitative” is taken to be a response given directly as a numeric value and “qualitative” to be a nonnumeric answer. This differentiation has its roots within the social sciences and research. A brief comparison of this typology is given in [1, 2]. A refinement by adding the predicates “objective” and “subjective” is introduced in [3]. An elaboration of the method usage in social science and psychology is presented in [4]. A precis on the qualitative type can be found in [5] and for the quantitative type in [6]. A comprehensive book about the qualitative methodology in social science and research is [7]. Since both of these methodic approaches have advantages of their own, it is an ongoing effort to bridge the gap between them, to merge them, or to integrate them. Following [8], the conversion or transformation from qualitative data into quantitative data is called “quantizing” and the converse from quantitative to qualitative is named “qualitizing”. The research on mixed method designs has evolved within the last decade, starting with the analysis of very basic approaches like using sample counts as a quantitative base, a strict differentiation of applying quantitative methods to quantitative data and qualitative methods to qualitative data, and a significant loss of context information if qualitative data (e.g., verbal or visual data) are converted into a numerical representation with a single meaning only [9].

A well-known model in social science is “triangulation”, which applies both methodic approaches independently and finally combines the interpretation results. The main mathematical-statistical method applied thereby is cluster analysis [10]. Model types with gradual differences in methodic approaches, from classical statistical hypothesis testing to complex triangulation modelling, are collected in [11]. Recently it has been recognized that mixed methods designs can provide pragmatic advantages in exploring complex research questions. However, the analytic process of analyzing, coding, and integrating unstructured with structured data by quantizing qualitative data can be complex, time consuming, and expensive. In [12], Driscoll et al. present an example with simple statistical measures associated with strictly different response categories, whereby the sample size issue at quantizing is also sketched.

A way of linking qualitative and quantitative results mathematically can be found in [13], where fuzzy logic-based transformations are examined to gain insights from one aspect type about the other. Qualitative and quantitative concepts are also utilized in mathematical modeling. In terms of decision theory [14], Gascon examined properties of and constraints on timelines with LTL (linear temporal logic), categorizing qualitative as likewise nondeterministic structural, for example, cyclic, and quantitative as a numerically expressible identity relation. The object of special interest thereby is a symbolic representation of a ℤ-valuation, with ℤ denoting the set of integers. A symbolic representation defines an equivalence relation between ℤ-valuations and contains all the relevant information to evaluate constraints. This might be interpreted as a hint that quantizing qualitative surveys does not necessarily reduce the information content in an inappropriate manner if a valuation similar to a ℤ-valuation is utilized. In [15] Herzberg explores the relationship between propositional model theory and social decision making via premise-based procedures. Condensed, it is shown that certain ultrafilters, which in the context of social choice are decisive coalitions, are in a one-to-one correspondence to certain kinds of judgment aggregation functions constructed as ultraproducts. A special result is an “impossibility theorem for finite electorates” on judgment aggregation functions; that is, if the population is endowed with some measure-theoretic or topological structure, there exists a single overall consistent aggregation.

2. Interlock Qualitative and Quantitative Concepts

2.1. From Quantitative Results to Qualitative Insights

Fuzzy logic-based transformations are not the only options for qualitizing examined in the literature. The transformation from quantitative measures into qualitative assessments of software systems via judgment functions is studied in [16]. Based on Dempster-Shafer belief functions, certain objects from the realm of the mathematical theory of evidence [17], Kłopotek and Wierzchoń utilized exemplified decision tables as a (probability) measure of diversity in relational databases. The authors viewed the Dempster-Shafer belief functions as a subjective uncertainty measure, a kind of generalization of the Bayesian theory of subjective probability, and showed a correspondence to the join operator of relational database theory. This rough set-based representation of belief function operators then led to a nonquantitative interpretation. As a more direct approach, the net balance statistic, defined as the percentage of respondents replying “up” less the percentage replying “down”, is utilized in [18] as a qualitative yardstick to indicate the direction (up, same, or down) and size (small or large) of the year-on-year percentage change of corresponding quantitative data of a particular activity.
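The net balance statistic just described can be sketched in a few lines of Python; the survey responses below are invented purely for illustration:

```python
# Net balance statistic: percentage of "up" responses minus the
# percentage of "down" responses, used as a qualitative yardstick
# for the direction and size of a change.
def net_balance(responses):
    """responses: list of strings 'up', 'same', or 'down'."""
    n = len(responses)
    up = responses.count("up") / n * 100
    down = responses.count("down") / n * 100
    return up - down

# Hypothetical survey: 6 'up', 3 'same', 1 'down' out of 10 respondents.
answers = ["up"] * 6 + ["same"] * 3 + ["down"] * 1
balance = net_balance(answers)  # 60% - 10% = 50 -> direction "up"
```

A positive balance indicates an upward tendency; its magnitude gives a rough "small or large" size reading without assuming any interval scale on the answers.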

The following real life-based example demonstrates how misleading pure counting-based tendency interpretation can be and how important a valid choice of parametrization is, especially if an evolution over time has to be considered.

Example 1 (A Misleading Interpretation of Pure Counts). The situation the case study is based on is the following: projects are requested to answer an ordinally scaled survey about alignment and adherence to a specified procedural process framework in a self-assessment. Then the 104 survey questions are worked through with a project-external reviewer in an “initial review”. Based on these review results, improvement recommendations are given to the project team. After a certain period of time, a follow-up review is performed. So three samples are available: self-assessment, initial review, and follow-up. In the case of such timeline-dependent data gathering, the cumulated overall counts according to the scale values are useful to calculate approximation slopes and allow some insight into how the overall project behavior evolves. Now take a look at the pure counts of changes, which turned out to be 5% of the total count from self-assessment to initial review, and 12,5% from the initial review to the follow-up. The misleading interpretation is that the effect of the follow-up is greater than the effect of the initial review. Obviously the follow-up is not independent of the initial review, since the recommendations were given beforehand, at the initial review. A better effectiveness comparison is provided through the usage of statistically relevant expressions like the variance. For the self-assessment the answer variance was 6,3(%), for the initial review 5,4(%), and for the follow-up 5,2(%). This leads to the relative effectiveness rates shown in Table 1.

Table 1: Effectiveness rate example.

A variance-expression is the one-dimensional parameter of choice for such an effectiveness rating since it is a deviation measure on the examined subject-matter. The mean (or median or mode) values of alignment are not as applicable as the variances since they are too subjective at the self-assessment, and with high probability the follow-up means are expected to increase because of the outlined improvement recommendations given at the initial review.
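The variance-based comparison of Example 1 can be sketched numerically. The exact formula behind the effectiveness rates of Table 1 is not reproduced in the text, so this sketch assumes the effectiveness of a review step is the relative variance reduction between consecutive samples:

```python
# Sketch of a variance-based effectiveness rating. Assumption: the
# effectiveness of a review step is the relative reduction of the
# answer variance between consecutive samples (the exact formula
# behind Table 1 is not quoted in the text).
def relative_reduction(var_before, var_after):
    return (var_before - var_after) / var_before

var_self = 6.3      # self-assessment answer variance (%)
var_initial = 5.4   # after the initial review
var_followup = 5.2  # after the follow-up review

eff_initial = relative_reduction(var_self, var_initial)       # ~0.143
eff_followup = relative_reduction(var_initial, var_followup)  # ~0.037

# Contrary to the pure counts (5% vs. 12,5% changed answers), the
# variance view rates the initial review as the more effective step.
```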

Thereby, the empirical unbiased question variance is calculated from the survey results, with a_ij denoting the i-th answer (i = 1, …, N) to question j (j = 1, …, Q) and μ_j the corresponding expected single question mean μ_j = (1/N) Σ_i a_ij, that is,

v_q = (1/(Q−1)) Σ_j (μ_j − μ)².

In contrast to the one-dimensional full sample mean

μ = (1/(N·Q)) Σ_i Σ_j a_ij,

which is identical to the averaging of the single question means, μ = (1/Q) Σ_j μ_j, the question variance v_q is not identical to the unbiased empirical full sample variance

v = (1/(N·Q − 1)) Σ_i Σ_j (a_ij − μ)².

Also it is not identical to the expected answer mean variance

v̄ = (1/Q) Σ_j v_j,

where the answer variance at the j-th question is

v_j = (1/(N−1)) Σ_i (a_ij − μ_j)².

It is a qualitative decision to use v_q, triggered by the intention to gain insights into the overall answer behavior. The full sample variance v might be useful at the analysis of single project answers, in the context of question comparison, and for a detailed analysis of a specified single question the answer variance v_j is useful to evaluate the applied compliance and valuation criteria or to determine a predefined review focus scope. In fact, to enable such a kind of statistical analysis it is necessary to have the data available as, respectively transformed into, an appropriate numerical coding.

2.2. Transforming Qualitative Data for Quantitative Analysis

The research on and application of quantitative methods to qualitative data has a long tradition. The method of “Equal-Appearing Interval Scaling” is due to [19]. Essentially this means choosing a representative statement (e.g., to create a survey) out of each group of statements formed from a set of statements related to an attitude, using the median value of the single statements as the grouping criterion. A single statement's median is thereby calculated from the “favourableness” assigned to the statement towards the attitude by a group of judging evaluators on a given scale. A link with an example can be found at [20] (Thurstone Scaling). The technique of correspondence analysis, for instance, goes back to research in the 1940s; for a compendium about the history see Gower [21]. Correspondence analysis is also known under different synonyms such as optimal scaling, reciprocal averaging, quantification method (Japan), or homogeneity analysis [22]. Young refers to correspondence analysis and canonical decomposition (synonyms: parallel factor analysis or alternating least squares) as theoretical and methodological cornerstones for quantitative analysis of qualitative data. The great efficiency of applying principal component analysis at nominal scaling is shown in [23], where a nice example of an analysis of business communication in the light of negotiation probability is given. The authors introduced a five-stage approach with transforming a qualitative categorization into a quantitative interpretation (material sourcing—transcription—unitization—categorization—nominal coding). The issues related to timelines reflecting the longitudinal organization of data, exemplified in the case of life histories, are of special interest in [24]. Thereby so-called Self-Organizing Maps (SOMs) are utilized. SOMs are a technique of data visualization accomplishing a reduction of data dimensions and displaying similarities.
The authors consider SOMs as a nonlinear generalization of principal component analysis to deduce a quantitative encoding by applying a life history clustering algorithm based on the Euclidean distance (n-dimensional vectors in Euclidean space). Belief functions, to a certain degree a linkage between relation modelling and factor analysis, are studied in [25]. The authors used them to generate numeric judgments with nonnumeric inputs in the development of approximate reasoning systems utilized as a practical interface between the users and a decision support system. Another way to apply probabilities to qualitative information is given by the so-called “Knowledge Tracking (KT)” methodology described in [26]. Thereby the idea is to determine relations in qualitative data to get a conceptual transformation and to allocate transition probabilities accordingly. The emerging cluster network sequences are captured with a numerical score (“goodness of fit score”) which expresses how well a relational structure explains the data. Since such a listing of numerical scores can be ordered by the less-or-equal (≤) relation, KT provides an ordinal scaling. Limitations of ordinal scaling at clustering of qualitative data from the perspective of phenomenological analysis are discussed in [27].
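Thurstone's equal-appearing interval idea from the beginning of this section can be sketched in a few lines of Python. The statements and judge ratings below are invented for illustration, and the representative per group is chosen naively as the first statement with that median:

```python
# Minimal sketch of "Equal-Appearing Interval Scaling": each statement
# receives favourableness ratings from several judges; its scale value
# is the median rating, and one representative statement is chosen per
# scale value. All ratings here are hypothetical.
from statistics import median

ratings = {                     # statement -> judge ratings (1..11 scale)
    "s1": [2, 3, 2, 2],
    "s2": [3, 2, 2, 2],
    "s3": [7, 8, 7, 7],
    "s4": [7, 7, 8, 7],
}

scale_value = {s: median(r) for s, r in ratings.items()}

groups = {}                      # scale value -> statements with that median
for s, v in scale_value.items():
    groups.setdefault(v, []).append(s)

# One representative per group (here simply the first statement found).
representatives = {v: members[0] for v, members in sorted(groups.items())}
```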

3. Scaling

It is a well-known fact that parametrical statistical methods, for example, ANOVA (Analysis of Variance), need some kind of standardization of the gathered data to enable the comparable usage and determination of relevant statistical parameters like mean, variance, correlation, and other distribution-describing characteristics. A survey about conceptual data gathering strategies and context constraints can be found in [28]. One of the basics thereby is the underlying scale assigned to the gathered data. The main types of numerically (real number) expressed scales are:

  • nominal scale, for example, gender coding like “male = 0” and “female = 1”;
  • ordinal scale, for example, ranks; its difference to a nominal scale is that the numeric coding implies, respectively reflects, an (intentional) ordering (≤);
  • interval scale, an ordinal scale with well-defined differences, for example, temperature in °C;
  • ratio scale, an interval scale with a true zero point, for example, temperature in °K;
  • absolute scale, a ratio scale with (absolute) prefixed unit size, for example, numbers of inhabitants.

Let us first look at the difference between a ratio and an interval scale: the true or absolute zero point enables statements like “20°K is twice as warm/hot as 10°K” to make sense, while the same statement for 20°C and 10°C holds relative to the °C-scale only but not “absolutely”, since 293,15°K is not twice as “hot” as 283,15°K. Interval scales allow valid statements like the following: let the temperature on day A = 25°C, on day B = 15°C, and on day C = 20°C. Now the ratio (A−B)/(A−C) = 2 validates “The temperature difference between day A and day B is twice as large as the difference between day A and day C”.
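The temperature example can be checked numerically; this short sketch merely restates the arithmetic above and contrasts a difference ratio (scale-independent) with a raw-value ratio (scale-dependent):

```python
# Numerical check of the interval-scale example: ratios of differences
# are meaningful on an interval scale, ratios of raw values are not.
A, B, C = 25.0, 15.0, 20.0          # temperatures in deg C on days A, B, C

ratio_of_differences = (A - B) / (A - C)   # = 2.0, a valid statement

# The same days expressed in kelvin (a ratio scale): the difference
# ratio is unchanged, but the raw-value ratio differs between scales,
# so "A is 1.67 times as warm as B" is not an absolute statement.
A_K, B_K = A + 273.15, B + 273.15
ratio_celsius = A / B                # ~1.67, scale-dependent
ratio_kelvin = A_K / B_K             # ~1.03, scale-dependent
ratio_of_differences_K = (A_K - B_K) / (A_K - (C + 273.15))  # still 2.0
```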

As mentioned in the previous sections, nominal scale clustering allows nonparametric methods or (distribution free) principal component analysis-like approaches. Examples of nominal and ordinal scaling are provided in [29]. A distinction of ordinal scales into ranks and scores is outlined in [30]. While ranks just provide an ordering relative to the other items under consideration, scores enable a more precise idea of distance and can have an independent meaning. In case a score in fact has an independent meaning, that is, a meaningful usability not only for the items observed but through an independently defined difference, then the score provides an interval scale. An ordering is called strict if and only if for any two distinct items x ≠ y either x < y or y < x holds, that is, no two items occupy the same position.

Example 2 (Rank to score to interval scale). Let us evaluate the response behavior of an IT system. The evaluation answers, ranked according to a qualitative ordinal judgement scale, are:

  • deficient (failed),
  • acceptable (partial),
  • comfortable (compliant).

Now let us assign “acceptance points” to construct a score of “weighted ranking”, for example:

  • deficient = 0,
  • acceptable = 5,
  • comfortable = 8.

This gives an idea of (subjective) distance: 5 points are needed to reach “acceptable” from “deficient” and a further 3 points to reach “comfortable”. But from an interpretational point of view, an interval scale should fulfill that the five points from “deficient” to “acceptable” are in fact 5/3 of the three points from “acceptable” to “comfortable” (well-definedness) and that the same score is applicable to other IT systems too (independency). Therefore consider, as a “throughput” measure, time savings:

  • “deficient” = losing more than one minute = −1,
  • “acceptable” = between losing one minute and gaining one = 0,
  • “comfortable” = gaining more than one minute = 1.

For a fully well-defined situation, assume context constraints so that no more than two minutes can be gained or lost. So from “deficient” to “comfortable”, the distance will always be “two minutes”.
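The rank-to-score-to-interval chain of Example 2 can be sketched as follows. The absolute acceptance-point values are assumptions of this sketch, chosen only to be consistent with the stated distances of 5 and then 3 points:

```python
# Sketch of Example 2: ordinal ranks -> weighted-ranking scores ->
# interval-scale coding. The absolute score values (0, 5, 8) are
# assumptions matching the stated distances, not quoted from the text.
ranks = ["deficient", "acceptable", "comfortable"]            # ordinal scale

score = {"deficient": 0, "acceptable": 5, "comfortable": 8}   # weighted ranking

interval = {"deficient": -1, "acceptable": 0, "comfortable": 1}  # time savings

# The score ordering preserves the ordinal ordering ...
assert [score[r] for r in ranks] == sorted(score[r] for r in ranks)

# ... and the score distances reflect the subjective distances 5 and 3.
distances = [score[b] - score[a] for a, b in zip(ranks, ranks[1:])]
```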

3.1. Transforming Ordinal Scales into Interval Scales

Lemma 1. Each strict score s on a finite index set I can be bijectively transformed into an order-preserving ranking r : I → {1, …, |I|} with s(i) < s(j) if and only if r(i) < r(j).

Proof. Since the index set I is finite, I = {1, …, m} is a valid representation of the index set, and the strict ordering allows the score values to be arranged in ascending order s(i_1) < s(i_2) < … < s(i_m), with s(i_1) being the minimal scoring value. Thus r(i_k) = k is the desired mapping.
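A minimal sketch of the lemma's transformation; the function name and the dictionary-based representation of the score are assumptions of this sketch (a strict score takes pairwise distinct values):

```python
# Sketch of Lemma 1: a strict score (all values pairwise distinct) on a
# finite index set is bijectively mapped to the ranking 1..m while
# preserving the ordering of the score values.
def score_to_rank(scores):
    """scores: dict index -> score value, values pairwise distinct."""
    assert len(set(scores.values())) == len(scores), "score must be strict"
    ordered = sorted(scores, key=scores.get)     # indices by ascending score
    return {idx: k + 1 for k, idx in enumerate(ordered)}

s = {"a": 2.5, "b": 0.7, "c": 9.1}
r = score_to_rank(s)                             # {"b": 1, "a": 2, "c": 3}

# Order preservation: s[i] < s[j] if and only if r[i] < r[j].
assert all((s[i] < s[j]) == (r[i] < r[j]) for i in s for j in s)
```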

Aside from the rather abstract construction of Lemma 1, there is the calculus of weighted ranking: assign to the k-th ranked item a weight w_k with w_1 < w_2 < … < w_m; this assignment is order preserving and, since w_k < w_(k+1) holds for all k, it provides the desired (natural) ranking. Of course qualitative expressions might permit two or more items to occupy equal rank in an ordered listing, but when numeric values are assigned, differentiation aspects are lost if different items are represented by the same numeral. Approaches to transform (survey) responses expressed by (nonmetric) judges on an ordinal scale to an interval (or synonymously “continuous”) scale, to enable statistical methods to perform quantitative multivariate analysis, are presented in [31]. Thereby a transformation based on the decomposition into orthogonal polynomials (derived from certain matrix products) is introduced, which is applicable if equally spaced integer-valued scores, so-called natural scores, are used. Also the principal transformation approaches proposed by psychophysical theory, with the original intensity as the judged evaluation, are mentioned there.

Fechner's law
p = k · log(S/S₀)
with a constant k ∈ ℝ and a threshold stimulus level S₀.

Stevens' Power Law
p = k · S^a
where k depends on the number of units and a is a measure of the rate of growth of perceived intensity as a function of stimulus intensity.

The Beidler Model
p = S/(S + k)
with constant k usually close to 1.
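For illustration, the three transformations can be compared numerically in their common textbook forms; the parameter names and default values below are assumptions of this sketch, not taken from [31]:

```python
# Textbook forms of the three psychophysical transformations mentioned
# above (parametrization assumed, not quoted from the source):
# Fechner: perceived intensity grows logarithmically with the stimulus;
# Stevens: as a power of the stimulus; Beidler: saturating response.
import math

def fechner(s, k=1.0, s0=1.0):
    return k * math.log(s / s0)

def stevens(s, k=1.0, a=0.5):
    return k * s ** a

def beidler(s, k=1.0):
    return s / (s + k)

# All three are strictly increasing for s > 0, so each preserves the
# ordering of the original stimulus values, as Lemma 1 requires.
stimuli = [1.0, 2.0, 4.0, 8.0]
for f in (fechner, stevens, beidler):
    values = [f(s) for s in stimuli]
    assert values == sorted(values)
```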

Thereby the determination of the constants, or the fact that the original ordering may get lost, turns out to be problematic. From Lemma 1, on the other hand, we see that, given a strict ranking of ordinal values only, additional (qualitative context) constraints might need to be considered when assigning a numeric representation. Of course each such condition will introduce tendencies. So, without further calibration requirements, it follows:

Consequence 1. An equidistant interval scaling which is symmetric and centralized with respect to the expected scale mean minimizes dispersion and skewness effects of the scale.

Proof. If , let . Since and the symmetry condition holds for each , there exist an with
