
Introduction
The relationship between education and income has been widely studied for many countries. For instance, Psacharopoulos (1993), Psacharopoulos & Ng (1992) and Psacharopoulos & Patrinos (2004) present estimates of the return to education for a wide range of countries. Most of these studies use cross-sectional data to estimate the return to schooling (RTS), although the use of panel and pseudopanel data has increased recently.
It is well known that when individual characteristics (such as ability) are not taken into account and are correlated with the level of education, the ordinary least squares (OLS) estimate of the RTS is biased. To correct this "ability" bias, two methods have commonly been employed: panel data and instrumental variables. The main problem with instrumental variables is finding proper instruments. The main problem with panel data is the lack of such data (particularly for developing countries).
A third option is to use a series of cross-sections in the form of a pseudopanel. When analyzing the return to schooling, this data structure makes it possible to reduce the ability bias that is present in a single cross-section. Moreover, this approach can be applied in many developing countries, where it is common to find series of household income surveys that draw a new sample every year.
Pseudopanel data has other advantages over panel data. As Deaton (1997) points out, a pseudopanel does not suffer from attrition because it is constructed from new samples every year; also, cohort data is likely to be less susceptible to measurement error than panel data, for "the quantity that is tracked is normally an average and the averaging will nearly always reduce the effects of measurement error" (Deaton 1997, p. 120).
Given these advantages, pseudopanels have recently been used to estimate the returns to schooling. For instance, Warunsiri and McNown (2009) use pseudopanel techniques to calculate the return to schooling in Thailand. Dickerson et al. (2001) use a similar approach to calculate the returns to education in Brazil, while Kaymak (2008) uses cohort data to calculate the return to schooling in the US.
In spite of its advantages, the use of pseudopanel data raises a set of difficulties when it is applied to the estimation of the return to schooling. In particular, (a) the results can be sensitive to the pseudopanel setting and (b) attenuation bias might be present due to sampling error. Moreover, the estimation might suffer from various types of selectivity bias due to gender, retirement and self-employment decisions (2). In our application we examine (a) and (b), and discuss selectivity bias in less detail.
Our estimates are based on Deaton and Paxson's (1994) income decomposition. Their methodology decomposes income into three effects: age, cohort and year. Generally, once education is accounted for, the age effect is associated with experience, the year effect is related to macroeconomic fluctuations and the cohort effect is related to particular characteristics of a group (or cohort). In our application, the pseudopanel is constructed using Costa Rican household surveys from 1987 to 2008. In this construction, a set of cohorts is defined (by year(s) of birth) and each cohort is followed through the successive surveys.
We consider different settings of the pseudopanel and calculate different estimators. We use three settings for the construction of the pseudopanel. In our first example, we define each cohort by a single year of birth and we keep, in every year, those individuals between a minimum and a maximum age. This means that every year a new cohort "appears" and the oldest cohort "disappears" from the data set.
In a second example, a fixed set of cohorts is chosen and followed through all the years. In this case no cohorts are added or removed in any year, but the range of ages observed changes year by year. In particular, each year we use an older sample.
In a third example, the cohorts are defined by groups of people who are within a range of five years of age. For example, a cohort is defined for those who are between 15 and 19 years old in a particular year. These grouped cohorts are used to increase the number of observations within a cohort, which is recommended to reduce the measurement error bias. In spite of this advantage, in this setting, it is difficult to identify the year and age effects.
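The three settings can be sketched as simple rules that assign a cohort label and decide who stays in the sample. This is only an illustration: the function names, the 25-60 age window and the birth-year bounds are hypothetical choices made for the example, not the values used in the paper.

```python
def single_year_cohort(survey_year, age):
    """Settings 1 and 2: a cohort is a single year of birth."""
    return survey_year - age


def in_age_window(age, min_age, max_age):
    """Setting 1: keep, in every survey, only people inside a fixed age window."""
    return min_age <= age <= max_age


def in_fixed_cohorts(survey_year, age, first_birth, last_birth):
    """Setting 2: keep only people born within a fixed set of birth years."""
    return first_birth <= survey_year - age <= last_birth


def five_year_cohort(survey_year, age, base_year, base_min_age):
    """Setting 3: label people by their five-year age band as of base_year.

    Returns the lowest age of the band in base_year (15 for ages 15-19).
    """
    age_at_base = age - (survey_year - base_year)
    return base_min_age + 5 * ((age_at_base - base_min_age) // 5)


# Setting 1 with a hypothetical 25-60 window: one cohort enters and one
# leaves between consecutive surveys.
window_1987 = {single_year_cohort(1987, a) for a in range(100) if in_age_window(a, 25, 60)}
window_1988 = {single_year_cohort(1988, a) for a in range(100) if in_age_window(a, 25, 60)}
print(sorted(window_1988 - window_1987))  # [1963], the cohort that "appears"
print(sorted(window_1987 - window_1988))  # [1927], the cohort that "disappears"

# Setting 2 with the same cohorts held fixed: the sample ages over time.
ages_2008 = [a for a in range(100) if in_fixed_cohorts(2008, a, 1927, 1962)]
print(ages_2008[0], ages_2008[-1])  # 46 81
```

In the third setting, someone aged 22 in 1990 was 19 in 1987, so `five_year_cohort(1990, 22, 1987, 15)` places that person in the 15-19 band.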
In each example, we estimated the returns to schooling by OLS and weighted least squares (WLS). We also calculated the estimator proposed by Devereux (2007a). This last estimator takes into account the measurement error present in the pseudopanel.
Once we study the behavior of the RTS under different settings and estimators, we analyze the relationship between the RTS and the year, cohort and age effects mentioned above. We found that the RTS are higher for older samples of people. This effect is greater in the cross-sectional estimates than in the pseudopanel estimates. Therefore, it is important to define a criterion to choose the age range to be used.
Moreover, we found that the income of younger cohorts is greater than the income of older cohorts, once experience and short-run fluctuations of the economy are accounted for. This difference in income between generations is explained by differences in their levels of education. Other factors that differ between generations seem to be less important. Finally, we present preliminary evidence suggesting that short-run fluctuations in income affect those with less education to a greater extent.
The rest of the paper is divided into three parts. The next section presents the methodology and different estimates of the RTS for Costa Rica. The following section describes the relationship between years of schooling and the year, age and cohort effects. The conclusions are presented in the last section.

Methodology
We use Deaton's (1997) and Deaton & Paxson's (1994) methodology to analyze three effects on earnings: age, year and cohort. We also study their relationship with years of schooling. This methodology is suitable for analyzing Costa Rican household income, given the lack of a panel data set at the individual level. Instead of a panel data set, the Costa Rican National Institute of Statistics conducts a household survey each year, drawing a new sample every time. Deaton's methodology lets us link this set of cross-sectional data through time.
In this methodology, instead of a single individual, a group of people is followed. This group of people, or cohort, is defined by year of birth. For instance, those aged 15 in 1987 make up a cohort; those aged 16 in 1987 make up another cohort, and so on. To follow a variable for these cohorts through time, the mean of the variable is calculated for the members of the cohort and this average is linked to the average of the same variable for those one year older in the next survey. For instance, the income of the cohort which was 15 in 1987 is represented by the average income of those aged 15 in 1987. One year later (in 1988), the income of this cohort is defined by the average income of those aged 16 in 1988. This lets us construct a data set with observations for cohorts in different years. This type of data array is known as a pseudopanel or synthetic panel.
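The linking step described above can be sketched in a few lines. The record layout `(survey_year, age, income)` and the function name `build_pseudopanel` are assumptions made for this illustration, and the incomes are toy numbers rather than survey data.

```python
from collections import defaultdict
from statistics import mean


def build_pseudopanel(records):
    """Collapse repeated cross-sections into cohort-year cell means.

    records: iterable of (survey_year, age, income) tuples, one per
    surveyed individual (a hypothetical layout for this sketch).
    A cohort is identified by year of birth = survey_year - age.
    """
    cells = defaultdict(list)
    for survey_year, age, income in records:
        birth_year = survey_year - age
        cells[(birth_year, survey_year)].append(income)
    # One observation per (cohort, year): the cell mean and the cell size.
    return {
        key: {"mean_income": mean(values), "n": len(values)}
        for key, values in sorted(cells.items())
    }


# Toy data: the people aged 15 in 1987 and a fresh sample of people aged 16
# in 1988 belong to the same cohort (born in 1972).
sample = [
    (1987, 15, 100.0), (1987, 15, 120.0),
    (1987, 16, 150.0),
    (1988, 16, 130.0), (1988, 16, 150.0),
]
panel = build_pseudopanel(sample)
cohort_1972 = [
    (year, cell["mean_income"])
    for (birth, year), cell in panel.items()
    if birth == 1972
]
print(cohort_1972)  # [(1987, 110.0), (1988, 140.0)]
```

The cell size `n` is kept alongside each mean because the precision of a cohort average depends on how many individuals it is computed from.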
Pseudopanel data has been used to analyze different variables (in particular household income) when panel data is not available (3). For instance, Deaton and Paxson (1994) constructed a pseudopanel using 15 consecutive household income and expenditure surveys from Taiwan. Warunsiri and McNown (2009) used a pseudopanel to estimate the return to education for Thailand, while Dickerson et al. (2001) took a similar approach using data from Brazil.
As Deaton (1997) points out, pseudopanels have some advantages over most panels. Pseudopanels do not suffer from attrition (as most panels do) because they are constructed from new samples every year. Also, they are likely to be less susceptible to measurement error than panel data, because the quantity being tracked is an average and the averaging will nearly always reduce the effect of measurement error (Deaton 1997).
To apply this pseudopanel approach, we follow the Mincerian tradition and assume that earnings are explained by educational level and experience. Consider the following empirical formulation:
(1) $\ln(y^{*}_{it}) = \omega + s^{*}_{it}\rho + X^{*}_{it}\beta + \upsilon_{i} + \epsilon_{it}$
where $s^{*}_{it}$ refers to the educational level of individual i in year t, $\upsilon_{i}$ refers to characteristics of individual i (such as ability), and $X^{*}_{it}$ refers to other income determinants. If $\upsilon_{i}$ were correlated with $s^{*}_{it}$, the least squares estimation of (1) (without including the individual effect) would capture this correlation and would be biased. If the covariance between ability and education were positive, the OLS estimation of (1) would overestimate the return to schooling. If the covariance were negative, the OLS estimation of (1) would underestimate the RTS. The covariance is usually assumed to be positive, since individuals with higher ability tend to acquire higher levels of education. However, Warunsiri and McNown (2009) suggested that this covariance can be negative, since people with higher ability face a higher opportunity cost of schooling, which might lead to a negative correlation between ability and schooling.
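The direction of the bias follows from the standard omitted-variable formula. As a simplified sketch, suppose the other determinants $X^{*}_{it}$ are dropped and $\epsilon_{it}$ is uncorrelated with schooling (both simplifications are made here only for illustration). The OLS slope from regressing log earnings on schooling alone then converges to:

```latex
\operatorname{plim}\,\hat{\rho}_{OLS}
  = \rho + \frac{\operatorname{Cov}(s^{*}_{it},\,\upsilon_{i})}{\operatorname{Var}(s^{*}_{it})}
```

The second term is positive when ability and schooling covary positively, so the RTS is overestimated, and negative when they covary negatively, so the RTS is underestimated.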
Taking means over each cohort in (1) we obtain:
(2) $\ln(y_{ct}) = \omega + s_{ct}\rho + X^{*}_{ct}\beta + \upsilon_{c} + \epsilon_{ct}$
In (2) the subindex c represents cohorts. As Kaymak (2008) and Warunsiri and McNown (2009) noted, this averaging eliminates the ability bias, provided that $\upsilon_{c}$ is orthogonal to $s_{ct}$.
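A small simulation can illustrate the claim. It is a sketch under made-up assumptions, not the paper's data or estimator: ability has the same distribution in every cohort (so its cohort mean is unrelated to cohort schooling), schooling rises with individual ability, and the true return is set to 0.10. All parameter values and the names `ols_slope` and `simulate` are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate(rho=0.10, n_cohorts=20, n_per_cohort=2000, seed=0):
    """Return (individual-level OLS slope, cohort-means slope)."""
    rng = random.Random(seed)
    s_all, y_all, s_bar, y_bar = [], [], [], []
    for c in range(n_cohorts):
        # Average schooling differs across cohorts (from 6 to 12 years).
        mean_schooling = 6.0 + 6.0 * c / (n_cohorts - 1)
        s_c, y_c = [], []
        for _ in range(n_per_cohort):
            ability = rng.gauss(0.0, 1.0)  # same distribution in every cohort
            schooling = mean_schooling + ability + rng.gauss(0.0, 1.0)
            log_income = 1.0 + rho * schooling + ability + rng.gauss(0.0, 0.5)
            s_c.append(schooling)
            y_c.append(log_income)
        s_all += s_c
        y_all += y_c
        s_bar.append(sum(s_c) / n_per_cohort)
        y_bar.append(sum(y_c) / n_per_cohort)
    return ols_slope(s_all, y_all), ols_slope(s_bar, y_bar)


individual_ols, cohort_means = simulate()
print(f"individual-level OLS: {individual_ols:.3f}")
print(f"cohort-means regression: {cohort_means:.3f}")
```

Under these assumptions the individual-level slope is pushed well above 0.10 by the omitted ability term, while the regression on cohort means stays close to the true value because individual ability averages out within each cohort.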
To relate (2) to Deaton and Paxson's (1994) decomposition...
Returns to education, cohorts and business cycle in Costa Rica: 1987-2008.
Author:  Rojas, Diego 
Position:  Text in English 
Pages:  9(21) 
COPYRIGHT Universidad de Costa Rica