DUKE UNIVERSITY

EXPERIMENT 3

QUANTITATIVE ANALYSIS OF MULTICOMPONENT FLUORESCENT MIXTURES USING CHEMOMETRIC ANALYSIS (CHEMO)

The goal of this experiment is to teach you to use chemometric techniques in the analysis of multicomponent data. Chemometrics† is the use of mathematical and statistical methods for handling, interpreting, and predicting chemical data. When multicomponent mixtures are analyzed by fluorescence spectroscopy, deviations from the Beer-Lambert law are often observed. These can occur when the excitation peak of one component overlaps the emission peak of another component and energy transfer occurs. This experiment shows that Principal Component Factor Analysis (PCFA) and Target Transformation Factor Analysis (TTFA) are powerful techniques for obtaining quantitative information from spectra that exhibit such deviations from Beer's law. You will use PCFA and TTFA to analyze a data matrix consisting of spectra of multicomponent solutions and determine (a) the number of components contributing to the emission spectra, (b) the chemical identity of each component, and (c) the concentration of each component.

THEORY

Principal component factor analysis has been successfully applied in a number of areas of analytical chemistry.1,2 This experiment provides a systematic chemometric approach for analyzing multicomponent fluorescence spectra in which spectral overlap and its frequently observed consequence, energy transfer, are present. Our analysis requires that the data (fluorescence intensities and concentrations) be stored in matrices as follows:

Intensity Matrix [D]

Each column of the intensity matrix [D] represents a spectrum. Each row of the intensity matrix represents a particular wavelength. If, for example, an experiment contained 20 spectra and each spectrum consisted of intensities measured at 70 wavelengths, the intensity matrix holding that data would have 70 rows and 20 columns. The elements of the matrix D(i,j) represent the fluorescence intensity of solution j at wavelength i. If the excitation wavelength is held constant when determining spectra for different solutions, a row of [D] should have values proportional to the concentrations of the components present, while a column represents the fluorescence emission spectrum of a solution.

______________________________________________________________________________

† Chemometric analysis of multivariate data is also discussed in the theory section of Experiment 1, part II.

Concentration Matrix [C]

Each column of the concentration matrix represents a mixture. Each row of the concentration matrix represents a particular component of the mixture. For example, if the 20 spectra in the intensity matrix were measured on twenty different mixtures containing some combination of 4 components, the corresponding concentration matrix would have 4 rows and 20 columns. It is important that the columns of the intensity and concentration matrices correspond to the same mixtures. In other words, column 3 of the intensity matrix should be the spectrum of the mixture whose component concentrations are found in column 3 of the concentration matrix.

Using principal component factor analysis and the matrices [D] and [C], information about the number of fluorescent chromophores contributing to the spectra can be obtained. This information can be used to generate a calibration matrix which can then be used to determine the concentration of each component in unknown solutions, provided their emission spectra have been measured.
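As a minimal sketch of this layout, the Python (NumPy) lines below arrange the nominal concentrations of solutions A-J from the table in the Procedure as a concentration matrix, and stack placeholder spectra as the columns of an intensity matrix. The variable names, the row order (Tb(DPA)3 first), and the random placeholder spectra are assumptions made for illustration only. NumPy is used here as a stand-in for the MATLAB environment used in the course.

```python
import numpy as np

labels = list("ABCDEFGHIJ")

# Nominal concentrations (mol/L) taken from the Procedure table.
tb = np.array([0, 0, 0, 0, 2.50, 5.00, 1.25, 1.25, 2.50, 5.00]) * 1e-6
rh = np.array([0.125, 0.250, 0.500, 1.000, 0, 0, 0.125, 0.250, 0.250, 0.500]) * 1e-6

# Concentration matrix: one row per component, one column per mixture.
# Row order (Tb(DPA)3, then Rhodamine B) is an arbitrary choice.
C = np.vstack([tb, rh])

# Placeholder spectra (random numbers, NOT measurements), 81 points each.
rng = np.random.default_rng(0)
spectra = {label: rng.uniform(size=81) for label in labels}

# Intensity matrix: one row per wavelength, one column per mixture,
# stacked in the same order as the columns of C.
D = np.column_stack([spectra[label] for label in labels])

print(C.shape)  # (2, 10)
print(D.shape)  # (81, 10)
```

Building both matrices from the same `labels` list is one simple way to guarantee that column 3 of the intensity matrix and column 3 of the concentration matrix describe the same mixture.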

PRINCIPAL COMPONENT FACTOR ANALYSIS (PCFA)

Essentially, factor analysis involves the transformation of the n orthogonal axes (representing the variables) that span the data space into n new axes (representing linear combinations of the variables), such that these new axes lie along the directions of maximum variance. This concept can be visualized with the help of the two-dimensional example in Figure 1. It is obvious from Figure 1(a) that the direction of maximum variance lies neither along the x-axis nor along the y-axis, but rather along some direction between them, i.e. along some combination of x and y. Similarly, the axis describing the direction of the second greatest amount of variation away from the principal direction of variance is coincident neither with x nor with y. Figure 1(b) depicts the identical distribution to that of Figure 1(a), but referred to a new set of axes f1 and f2, such that f1 represents the direction of greatest variance, and f2 the direction of greatest variance orthogonal to f1. Now if the variation along f2 is minimal compared to that along f1, then it could justifiably be argued that the combination of x and y represented by f1 is adequate in describing the distribution of data points in the two-dimensional space spanned by x and y. In other words, a reduction in the dimensionality of the data point distribution from two to one has been achieved.

FIGURE 1: The relation between data distribution and (a) variable axes x and y and (b) factors f1 and f2. Note that the origin of the factor space actually lies at the midpoint of the data distribution; the factor axes in (b) are merely intended to emphasize the directions of variance.

In the case of an n-dimensional problem, factor analysis yields up to n orthogonal factors (linear combinations of the original variables) lying along, respectively, the axis of largest variance, the axis of second largest variance, the axis of third largest variance, and so on. Often the number of factors needed to describe, say, 95% of the sample variance is less than n, so that factor analysis affords a technique whereby the dimensionality of the parameter space can be reduced. The task of the chemist is then to interpret, in chemical terms, the factors extracted from the data matrix by factor analysis.

Principal Component Factor Analysis (PCFA) extracts, from the data themselves, the axes (or eigenvectors) that best span the data matrix. The first eigenvector is computed such that the sum of the squared projections of all points on that vector is a maximum, i.e. as much variation in the data as possible lies along the direction of the first eigenvector. The projection of each data point on the eigenvector will be the coordinate of that datum along the vector. The second eigenvector is chosen, orthogonal to the first, so that as much as possible of the remaining variation lies along this vector. Subsequent vectors and the projections of data thereon are constructed in like manner until all the variation in the data can be described in terms of the extracted eigenvectors and associated coordinates along these vectors. The data matrix is thus decomposed into two matrices, the row cofactor matrix and the column cofactor matrix, which are composed of the coordinates and eigenvectors respectively:

$$[D]_{PCFA} = [R]_{abstract}\,[C]_{abstract} \qquad (1)$$

where [D]PCFA is the data matrix, [R]abstract is the row matrix, and [C]abstract is the column matrix.

NOTE: The mathematical basis for Factor Analysis is explained in a tutorial in Appendix A following this experiment.

Since this factor analytical solution is purely mathematical and is devoid of physical meaning, these matrices are called abstract matrices. The columns of [R]abstract are called abstract factors.
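The decomposition in equation (1) can be illustrated with a singular value decomposition, which is one standard route to the same eigenvectors. The sketch below is an assumption-laden stand-in, not the course's "vca" function file: the data are synthetic and noise-free, and the names `R_abstract`, `C_abstract`, and `D_pcfa` are chosen only to mirror the symbols above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic, noise-free two-component data (NOT measurements).
spectra = rng.uniform(size=(70, 2))   # hypothetical component spectra
conc = rng.uniform(size=(2, 20))      # hypothetical concentrations
D = spectra @ conc                    # 70 wavelengths x 20 mixtures

# Singular value decomposition: D = U * diag(s) * Vt.
U, s, Vt = np.linalg.svd(D, full_matrices=False)

n = 2                                 # number of factors retained
R_abstract = U[:, :n] * s[:n]         # coordinates along each eigenvector
C_abstract = Vt[:n, :]                # orthonormal eigenvectors, one per row
D_pcfa = R_abstract @ C_abstract      # equation (1)

print(np.allclose(D_pcfa, D))         # True for noise-free rank-2 data
```

The squared singular values `s**2` are the eigenvalues of the square matrix formed by multiplying the data matrix by its transpose, so the two descriptions rank the factors identically.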

Since the abstract solution should involve a physically meaningful number of factors, determination of n, the correct factor "size", is a particularly important step. As a result of this step we obtain an estimate of the complexity of the data space, information normally lacking even for the simplest chemical problems.

The objective of factor analysis is to develop a complete, physically meaningful model for the data. Hence we need to convert the abstract solution into a real solution. To do this, we mathematically "transform" the abstract factors into physically significant, "real" factors.

TARGET TRANSFORMATION FACTOR ANALYSIS (TTFA)

The rows of [R]abstract and the columns of [C]abstract are factors accounting for the variance in the data since they can be multiplied together to reproduce the original data matrix [D]. They are abstract factors, however, and have no physical meaning. We will use the chemometric technique known as Target Transformation Factor Analysis (TTFA) to determine the physical meaning or interpretation of these abstract factors. PCFA provides a significant head start, since at least we know how many factors we are looking for. Despite the complexity of the data space, we can identify individual factors through this method by performing target testing using the spectra of component compounds suspected to be factors. The target test can be performed either on rows or on columns. The procedure for columns (which we will use in this experiment) can be summarized as follows:

(2)

Here [C]abstract is the abstract column matrix of significant eigenvectors (from equation (1)). The target transformation vector, T, results from a least-squares operation involving the PCFA solution ([C]abstract) and the individual "target" being tested, designated by the vector Ctest. The least-squares procedure finds the transformation vector which minimizes the deviation between the test vector and the predicted vector. If Ctest is a real factor, the point-by-point agreement between it and the predicted vector Cpredicted should be good. If the test vector and the predicted vector are significantly different, it is unlikely that the test vector is a real factor. In order to relate the variables used in equation (2) to the nomenclature you will use in the experimental procedure, we will rewrite this equation as follows:

(3)

where [vt] is the test vector, and [pd] is the predicted vector, from TTFA. The most important part of target testing vectors is deciding which vectors to test. Since it would be difficult to test every physical factor known, scientific intuition can help a great deal.
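One common way to carry out a target test is an ordinary least-squares projection of the test vector onto the retained abstract factors. The sketch below follows that approach. The function name `target_test`, the Gaussian stand-in bands, and the orientation of the factor matrix (one factor per column) are assumptions for illustration, not the handout's own notation or code.

```python
import numpy as np

def target_test(abstract_factors, test_vector):
    """Least-squares target test.

    abstract_factors: (points x n) matrix whose columns span the factor space.
    test_vector: candidate real factor, one value per point.
    Returns (T, predicted, rms_error).
    """
    T, *_ = np.linalg.lstsq(abstract_factors, test_vector, rcond=None)
    predicted = abstract_factors @ T
    rms_error = float(np.sqrt(np.mean((predicted - test_vector) ** 2)))
    return T, predicted, rms_error

# Synthetic example: Gaussian bands standing in for real spectra.
wl = np.linspace(450.0, 650.0, 81)
band_a = np.exp(-((wl - 580.0) / 15.0) ** 2)   # species present in the data
band_b = np.exp(-((wl - 500.0) / 10.0) ** 2)   # species absent from the data

# Three one-component "spectra" at different concentrations.
D = np.outer(band_a, [0.125, 0.25, 1.0])
U, s, Vt = np.linalg.svd(D, full_matrices=False)
factors = U[:, :1]                             # keep the one significant factor

_, _, err_a = target_test(factors, band_a)
_, _, err_b = target_test(factors, band_b)
print(err_a, err_b)
```

In this synthetic case the error for `band_a` is at the level of rounding error, while the error for `band_b` is comparable to the size of the band itself, which is the pattern expected for a real factor and a false one.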

In this experiment you will use the standard spectrum of Rhodamine B in a target transformation of the columns of the data matrix. If the error between the tested and predicted vectors is small, you have confirmed that the spectrum of Rhodamine B is the factor accounting for most of the variance in the columns, and you may conclude that Rhodamine B is present in the solution. The procedure followed for single component solutions can be repeated for multicomponent solutions by adding further target vectors.

CHEMOMETRIC ANALYSIS USING MATLAB

MATLAB will be used to perform the factor analysis. Factor analysis uses a method of successive approximation to arrive at a set of eigenvectors [vc] and eigenvalues [vl] that span the data in the intensity matrix [D]. The mathematical basis for PCFA is given in Appendix A at the end of this experiment. Initially the data matrix is multiplied by its transpose to produce the covariance matrix [Cov], a square matrix as required for the eigenanalysis.

In performing PCFA, the first eigenvalue and eigenvector of [Cov] are determined by standard statistical methods. An approximation of the original matrix is computed from the first eigenvalue and eigenvector and is subtracted from the original matrix, leaving a residual matrix. The first eigenvalue and eigenvector of the residual matrix are then determined. The process repeats until the residual matrix approximates zero. The matrix [vc] is composed of the eigenvectors and the matrix [vl] contains the associated eigenvalues. Each eigenvalue measures the relative importance of the associated eigenvector. A large eigenvalue indicates a major factor, whereas a very small eigenvalue indicates an unimportant factor. Thus, the number of eigenvalues greater than experimental error is equivalent to the number of factors contributing to the variance in the data. In equation (1), [C]abstract is a condensed column matrix containing only the eigenvectors whose eigenvalues are significant and [R]abstract is the row matrix after being condensed accordingly. [D]PCFA is similar to [D] and can be thought of as the original data excluding experimental error. The eigenvectors, [vc], and eigenvalues, [vl], are generated by the MATLAB function file "vca".
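The extract-and-subtract cycle described above can be sketched with power iteration followed by deflation. This is one textbook way to realize the idea and is offered under assumptions: the internals of "vca" are not shown in this handout, and the function names, iteration count, and synthetic data below are hypothetical.

```python
import numpy as np

def leading_eigenpair(M, iterations=300, seed=0):
    """Power iteration for the largest eigenpair of a symmetric matrix M."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iterations):
        w = M @ v
        norm = np.linalg.norm(w)
        if norm == 0.0:          # residual is exactly zero, nothing left
            break
        v = w / norm
    return float(v @ M @ v), v   # Rayleigh quotient and eigenvector

def deflation_eigenanalysis(D, n_factors):
    """Extract n_factors eigenpairs of D'D by successive subtraction."""
    residual = D.T @ D           # square (columns x columns) matrix
    eigenvalues, eigenvectors = [], []
    for _ in range(n_factors):
        lam, v = leading_eigenpair(residual)
        eigenvalues.append(lam)
        eigenvectors.append(v)
        residual = residual - lam * np.outer(v, v)   # remove this factor
    return np.array(eigenvalues), np.column_stack(eigenvectors)

# Synthetic two-component data with a little noise (NOT measurements).
wl = np.linspace(450.0, 650.0, 81)
X = np.column_stack([np.exp(-((wl - 545.0) / 8.0) ** 2),
                     np.exp(-((wl - 580.0) / 15.0) ** 2)])
Y = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.5, 0.2, 0.4, 0.1]])
D = X @ Y + np.random.default_rng(3).normal(scale=1e-3, size=(81, 4))

vals, vecs = deflation_eigenanalysis(D, n_factors=2)
print(vals)
```

Power iteration converges slowly when two eigenvalues are close in size, so a library eigenvalue routine is usually preferable in practice; the loop is shown only because it mirrors the description in the text.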

If the data were free of experimental error, PCFA would yield exactly n eigenvalues, one for each of the controlling factors. Because of experimental error, PCFA solutions may generate c eigenvectors, one for each of the c columns in the data matrix. However, only n of this set of c eigenvectors, associated with the n largest eigenvalues, have physical meaning. Because there are c factors, the eigenvector matrix [vc] contains c columns and the eigenvalue matrix [vl] contains c rows. The complete solution overspans the true factor space, involving more eigenvectors than are necessary.

Because of the least-squares nature of PCFA, the eigenvalue matrix [vl] is formed using eigenvectors in decreasing order of importance. Factors are ranked according to their ability to account for variation in the data. The first row of the eigenvalue matrix, representing the first factor, is associated with the largest, most important eigenvalue. The last row is the least important, being associated with the smallest eigenvalue. The first factor accounts for the greatest percentage of variation in the data, the second for the next greatest percentage, and so forth, so that the complete set of c abstract factors accounts exactly for the data, including the experimental error in the measurements.

Having calculated the complete PCFA solution, we seek to discover how many of the c factors are physically important. The abstract factors can be divided into two sets: a primary set of n factors, which account for the real, measurable features of the data, and a secondary set of c - n factors, which is associated entirely with experimental error. By eliminating the secondary factors from the initial solution, we "compress" the factor model to incorporate only the physically significant factors.

A stepwise procedure is often utilized to deduce the correct number of factors. Each stage of the reproduction involves the following computation and comparison:

$$[R]_{abstract}\,[C]_{abstract} = [D]_n \approx [D] \qquad (4)$$

Here [R]abstract and [C]abstract are the abstract matrices based on the n most important eigenvectors; [D]n is the data matrix reproduced using the first n abstract factors; and [D] is the original data matrix. In the first reproduction attempt, only the single most important factor (n=1) is used; in the second stage of reproduction, the first and second most important factors (n=2) are employed simultaneously; and so forth, until all factors are used together in the final reproduction.

As additional factors are incorporated in Equation (4), data reproduction becomes more accurate, since, cumulatively, a greater fraction of the variation in the data is accounted for. When the correct number of factors is employed, the reproduced data matrix, [D]n, should equal the original data matrix within experimental error. The MATLAB function file "pcrregen" uses the eigenvector matrix [vc] and the chosen number of factors (n) to regenerate the original data in a matrix called [regen]. The residuals of the regeneration are contained in the matrix [resid]. The residual data is the subset of the original data that is not spanned by n factors. If too few factors are employed, the data will not be reproduced with sufficient accuracy. If too many factors are used, the extra factors will reproduce experimental error and will therefore serve no useful purpose.
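The stepwise reproduction can be sketched as follows. The function `regenerate` is a hypothetical NumPy analogue written for this illustration; it borrows the names [regen] and [resid] from the text but does not reproduce the signature or internals of "pcrregen". The data are synthetic, with a known noise level `sigma`.

```python
import numpy as np

def regenerate(D, n):
    """Reproduce D from its n most important factors; return (regen, resid)."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    regen = (U[:, :n] * s[:n]) @ Vt[:n, :]
    resid = D - regen
    return regen, resid

# Synthetic two-component data with known noise (NOT measurements).
wl = np.linspace(450.0, 650.0, 81)
X = np.column_stack([np.exp(-((wl - 545.0) / 8.0) ** 2),
                     np.exp(-((wl - 580.0) / 15.0) ** 2)])
Y = np.array([[2.5, 5.0, 1.25, 1.25, 2.5, 5.0],
              [0.0, 0.0, 0.125, 0.25, 0.25, 0.5]])
sigma = 0.002
D = X @ Y + np.random.default_rng(2).normal(scale=sigma, size=(81, 6))

rms = {}
for n in (1, 2, 3):
    regen, resid = regenerate(D, n)
    rms[n] = float(np.sqrt(np.mean(resid ** 2)))
    print(n, rms[n])
```

With these synthetic data the root-mean-square residual drops sharply from n = 1 to n = 2 and then changes little at n = 3, because the third factor only reproduces noise.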

Using MATLAB, the correct number of factors may be determined by inspection of the eigenvalue matrix [vl] or by use of the function file "pcrfrac". This uses the eigenvalues, [vl], to generate an empirical indicator function, [frac], which is a measure of the percentage data variation represented by the nth factor compared to the total variation remaining in the data after the variation in the first n-1 factors is removed. We plot [frac] and look for a minimum in the indicator function, noting how many factors correspond to that minimum. The recommended number of factors to keep is one less than that number.
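Read literally, the description of [frac] corresponds to dividing each eigenvalue by the sum of itself and all smaller eigenvalues. The sketch below implements that reading; it is an interpretation of the text, not the code of "pcrfrac", and the eigenvalues are made-up numbers.

```python
import numpy as np

def indicator_fraction(eigenvalues):
    """frac[k] = eigenvalue k / (sum of eigenvalues k, k+1, ..., last).

    Eigenvalues must be supplied largest first.
    """
    vl = np.asarray(eigenvalues, dtype=float)
    remaining = np.cumsum(vl[::-1])[::-1]   # variation not yet removed
    return vl / remaining

vl = [120.0, 30.0, 0.05, 0.04, 0.03]        # made-up eigenvalues
frac = indicator_fraction(vl)
position_of_minimum = int(np.argmin(frac)) + 1   # 1-based factor number
factors_to_keep = position_of_minimum - 1

print(frac)
print(position_of_minimum, factors_to_keep)
```

For these made-up values the minimum falls at the third factor, so two factors would be kept. The last entry of `frac` is always 1 under this reading, since nothing remains after the final factor.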

FACTOR ANALYSIS OF FLUORESCENCE DATA

For a dilute sample containing a single emitting species we shall assume that the elements of the data matrix are given to an adequate approximation by

$$D(i,j) = 2.303\, r\, \phi_f\, P_{\lambda_i}\, \varepsilon_{\lambda_i}\, b\, c_{sol\,j} \qquad (5)$$

where r is the ratio of photons measured to photons emitted (a measure of instrument "response"); $\phi_f$ is the quantum efficiency of the fluorescence; $P_{\lambda_i}$ is the radiant power of exciting radiation at the wavelength corresponding to row i in the data matrix; $\varepsilon_{\lambda_i}$ is the molar absorptivity of the emitting species at wavelength $\lambda_i$; b is the pathlength or cell thickness; and $c_{sol\,j}$ is the concentration of the emitting species in the solution corresponding to column j in the data matrix.

Equation (5) can be expressed more compactly as

$$D(i,j) = K\, x(i)\, y(j) \qquad (6)$$

where $K = 2.303\, r\, \phi_f\, b$ is a solution- and wavelength-independent factor; $x(i) = P_{\lambda_i}\varepsilon_{\lambda_i}$; and $y(j) = c_{sol\,j}$.

For data taken from a sample containing n components, this equation becomes

$$D(i,j) = K \sum_{k=1}^{n} x(i,k)\, y(k,j) \qquad (7)$$

where y(k, j) is proportional to the concentration of component k in mixture j; and x(i, k) is proportional to the number of photons emitted by component k at wavelength i. Equation (7) is simply the equation of the inner product of two vectors, and can be rewritten as

$$D(i,j) = K\, \mathbf{X}(i) \cdot \mathbf{Y}(j) \qquad (8)$$

and the matrix [D] is thus the product of two matrices, one a row matrix, [R'], consisting of row vectors X(i), the other a column matrix, [C'], consisting of column vectors Y(j), i and j going from 1 to the number of components, n.

[D'] = [R'] [C'] (9)

Equation (9) is the fluorescence equivalent of equation (1). From this argument we expect spectra of components to be factors accounting for variance in columns of matrix [D'] and concentration vectors to account for variance in rows of matrix [D'].
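The step from the element-wise sum in equation (7) to the matrix product in equation (9) can be checked numerically. In the sketch below the band positions, the concentrations, and the value K = 1 are arbitrary assumptions, and the model is the ideal one with no energy transfer.

```python
import numpy as np

wl = np.linspace(450.0, 650.0, 81)

# x(i, k): hypothetical emission profile of component k at wavelength i.
X = np.column_stack([np.exp(-((wl - 545.0) / 8.0) ** 2),
                     np.exp(-((wl - 580.0) / 15.0) ** 2)])   # 81 x 2

# y(k, j): hypothetical concentration of component k in mixture j.
Y = np.array([[0.0, 2.5, 1.25, 5.0],
              [0.5, 0.0, 0.25, 0.5]])                        # 2 x 4

K = 1.0                       # arbitrary constant for this illustration
D = K * (X @ Y)               # matrix form, equation (9)

# Equation (7) evaluated for a single element.
i, j = 40, 2
element = K * sum(X[i, k] * Y[k, j] for k in range(X.shape[1]))

rank = np.linalg.matrix_rank(D)
print(np.isclose(D[i, j], element), rank)
```

For ideal data of this kind the rank of the intensity matrix equals the number of emitting components, which is why counting significant factors estimates the number of components.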

When the correct number of factors has been determined, the intensity and concentration matrices for 'known' solutions are used to set up a calibration matrix [fcal]. The MATLAB function file "pcrcal" generates the calibration coefficients for a given set of calibration solutions. The calibration matrix can then be used to calculate the concentrations of components in unknown solutions [Cu] if their spectra are set up in an intensity matrix [Du]:

[Cu] = [fcal]*[Du] (10)
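A calibration of the form in equation (10) can be sketched as a principal component regression, in which the known intensity matrix is inverted using only the n retained factors. The function `pcr_calibration` below is a hypothetical illustration under that assumption; the handout does not show how "pcrcal" computes [fcal], and the data here are synthetic and perfectly linear (no energy transfer, no noise).

```python
import numpy as np

def pcr_calibration(D_known, C_known, n):
    """Calibration matrix from known spectra and concentrations.

    D_known: (wavelengths x mixtures) intensities of the calibration set.
    C_known: (components x mixtures) concentrations of the same mixtures.
    n: number of factors retained.
    """
    U, s, Vt = np.linalg.svd(D_known, full_matrices=False)
    # Rank-n pseudoinverse of D_known, shape (mixtures x wavelengths).
    D_pinv = Vt[:n, :].T @ np.diag(1.0 / s[:n]) @ U[:, :n].T
    return C_known @ D_pinv          # shape (components x wavelengths)

wl = np.linspace(450.0, 650.0, 81)
X = np.column_stack([np.exp(-((wl - 545.0) / 8.0) ** 2),
                     np.exp(-((wl - 580.0) / 15.0) ** 2)])

# Five calibration mixtures (hypothetical values, mol/L).
C_known = np.array([[5.0, 1.25, 1.25, 2.5, 5.0],
                    [0.0, 0.125, 0.25, 0.25, 0.5]]) * 1e-6
D_known = X @ C_known

fcal = pcr_calibration(D_known, C_known, n=2)

# "Unknown" with concentrations we pretend not to know.
c_true = np.array([[3.0e-6], [0.3e-6]])
Du = X @ c_true
Cu = fcal @ Du                       # equation (10)
print(Cu.ravel())
```

In this idealized case the recovered concentrations match `c_true` to within rounding error. With real spectra the agreement depends on noise and on how well the calibration set spans the behavior of the unknown.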

Mixtures of Tb(DPA)3 (DPA = Dipicolinic acid) and Rhodamine B undergo significant energy transfer due to the overlap of the excitation spectrum of Rhodamine B and the emission spectrum of Tb(DPA)3 (Figure 2). The magnitude of energy transfer, which can also be thought of as the deviation of the mixture's spectrum from Beer's law, is shown in Figure 3. Spectrum C, the deviation from Beer's law, was obtained by subtracting the actual spectrum of the mixture, spectrum B, from the theoretical Beer's law spectrum, spectrum A.

FIGURE 2: (___) Emission spectrum of Tb(DPA)3; (. . .) excitation spectrum of Rhodamine B; (---) emission spectrum of Rhodamine B.

FIGURE 3: A: Theoretical Beer's law spectrum of a mixture of 0.25 µM Rhodamine B and 1.125 µM Tb(DPA)3; B: Observed spectrum of the same mixture; C: Deviation from Beer's law.
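The subtraction behind Figure 3 can be written out in a few lines. The sketch assumes that the theoretical Beer's law spectrum is the sum of the single-component spectra measured at the same concentrations as in the mixture; the function name and the four-point arrays are made up for illustration and are not data from the figure.

```python
import numpy as np

def beers_law_deviation(component_spectra, observed):
    """Return (A, C): theoretical spectrum A and deviation C = A - B."""
    theoretical = np.sum(component_spectra, axis=0)   # spectrum A
    deviation = theoretical - observed                # spectrum C
    return theoretical, deviation

# Made-up four-point "spectra" (NOT data from Figure 3).
tb_alone = np.array([0.0, 8.0, 2.0, 0.0])
rh_alone = np.array([0.0, 1.0, 5.0, 2.0])
observed = np.array([0.0, 6.0, 8.0, 2.5])             # spectrum B

A, C = beers_law_deviation([tb_alone, rh_alone], observed)
print(A)
print(C)
```

A positive value in C marks a wavelength where the mixture emits less than Beer's law predicts, and a negative value marks one where it emits more.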

In this experiment you will use PCFA to determine the concentrations and identities of emitting chromophores in unknown solutions.

INSTRUMENTATION

You will use a Ratio Spectrofluorometer to measure the emission spectra of your analytes. This instrument consists of two monochromators (an excitation and an emission monochromator) for wavelength selection, a precision sine drive system, a xenon light source, a sample compartment, a reference compartment, and two PM tube detectors.

The high intensity xenon lamp (Figure 4) is used as the excitation source. Its output is collimated and focused on the entrance slit of the excitation monochromator. The light from the slit is reflected by a mirror to a grating. This grating directs the dispersed light to a mirror which focuses the light on the exit slit. These slits control the bandpass and the amount of light incident on the sample. A portion (10 to 15 percent) of the light is directed through the reference compartment to a diffuser. The diffuser acts as a beam scrambler for the reference PM tube to minimize the effects of PM tube cathode nonuniformities. The output of this tube is used by the Ratio Photometer to correct for xenon lamp fluctuations and to measure percent transmittance (%T) of samples placed in the reference compartment cuvet holder.

FIGURE 4: Simplified Optical Diagram of the Spectrofluorimeter.

Fluorescent light emitted by a solution placed in the sample compartment enters the emission monochromator through an entrance slit. The fluorescent light is reflected by a mirror onto a grating. This grating directs the light to a mirror which focuses the light on the exit slit. The entrance and exit slits control the bandpass and the amount of fluorescent light incident on the signal PM tube. The signal PM tube converts the incident light into a corresponding electrical signal which is applied to the Ratio Photometer measuring circuits. The high voltages used to operate the PM tubes are provided by the Ratio Photometer.

EXPERIMENTAL SECTION

Reagents:

A. 0.1 M TRIS (Tris(hydroxymethyl)aminomethane) buffer at pH 8.0

B. 2 x 10^-6 M Rhodamine B in 0.1 M TRIS

C. 4 x 10^-5 M Tb(DPA)3 (DPA = Dipicolinic acid = 2,6-Pyridinedicarboxylic acid) in 0.1 M TRIS
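The stock concentrations above fix the volume of each stock needed for a given target concentration, using C(stock) x V(stock) = C(target) x V(final). The short sketch below works this out for a 10 ml final volume, the volume used in the Procedure; the function name and the choice of example concentrations are assumptions for illustration.

```python
STOCK_M = {"Rhodamine B": 2e-6, "Tb(DPA)3": 4e-5}   # stock concentrations, mol/L
FINAL_VOLUME_ML = 10.0                               # solutions are made up to 10 ml

def stock_volume_ml(target_m, stock_m, final_ml=FINAL_VOLUME_ML):
    """Volume of stock (ml) to dilute to final_ml for a target concentration."""
    if target_m > stock_m:
        raise ValueError("target concentration exceeds the stock concentration")
    return target_m / stock_m * final_ml

# Example: 0.125 x 10^-6 M Rhodamine B and 1.25 x 10^-6 M Tb(DPA)3 in 10 ml.
v_rh = stock_volume_ml(0.125e-6, STOCK_M["Rhodamine B"])
v_tb = stock_volume_ml(1.25e-6, STOCK_M["Tb(DPA)3"])
print(f"Rhodamine B stock: {v_rh:.4f} ml")
print(f"Tb(DPA)3 stock:    {v_tb:.4f} ml")
```

For this example the volumes come to 0.625 ml of the Rhodamine B stock and 0.3125 ml of the Tb(DPA)3 stock, with TRIS buffer added to the 10 ml mark.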

Procedure

1. Using the stock solutions, prepare 10 ml of each of the following solutions:

Solution #    [Tb(DPA)3] (x 10^-6 M)    [Rhodamine B] (x 10^-6 M)
______________________________________________________________________________
A             0                         0.125
B             0                         0.250
C             0                         0.500
D             0                         1.000
E             2.50                      0
F             5.00                      0
G             1.25                      0.125
H             1.25                      0.250
I             2.50                      0.250
J             5.00                      0.500
______________________________________________________________________________

Notes: (1) Rhodamine B and its solutions are highly toxic. You must wear gloves at all times when handling solutions containing this compound. All solutions must be prepared in a fume hood.

(2) Solutions of Rhodamine B are photosensitive. To minimize photodecomposition, all solutions of this compound should be wrapped in aluminum foil and stored in the desk (cabinet under the spectrofluorimeter) when not in use.

2. In addition, your TA will give you 10 ml each of three unknowns:

solK - containing an unknown concentration of Rhodamine B in TRIS buffer

solL - containing a mixture of Rhodamine B and Tb(DPA)3 in TRIS buffer

solM - containing an unknown concentration of Rhodamine B or Tb(DPA)3 or a mixture of both compounds in TRIS buffer.

Make two dilutions of the unknown solution M as follows:

(a) Dilute 5 ml of solution M with 0.1 M TRIS buffer in a 10 ml volumetric flask. This is solution N (solN).

(b) Dilute 5 ml of solution N with 0.1 M TRIS buffer in another 10 ml volumetric flask. This is solution O (solO).

3. Turn on the power bar to the computer/fluorimeter and wait for the C:\> prompt to appear. At the prompt type fldm and wait for the main screen to appear.

Select Emission Options from the Instrument menu.

Highlight Emission Filter: 430*

*The Emission Filter may or may not be used...consult your TA.

Select Scan from the Instrument menu and wait for the scan options screen to appear. Make sure Emission is highlighted and the following parameters are chosen:

From: 450
To: 650
Excitation: 260
Scan Speed: 240
Ex. Slit: 5.0
EM Slit: 5.0
No. of Scans: 1
Increment: 0
Destination: solnNX.sp (where N = the letter of the solution being measured; X = student #)

Record the emission spectrum of each of the 15 solutions (SOLA-SOLO) from 450 to 650 nm in the 1 cm quartz cuvet. Each scan is initiated by selecting OK at the bottom of the Scan menu screen.

4. After each scan, the cuvet should be rinsed using 3 x 1 ml portions of TRIS buffer and then 3 x 0.5 ml aliquots of the next solution to be studied. All solutions should be introduced into and removed from the cuvet using a clean Pasteur pipet. Be careful not to scratch the cuvet with the pipet. NOTE: THE QUARTZ FLUORESCENCE CUVET COSTS $100.00. PLEASE EXERCISE EXTREME CAUTION IN USING IT AND AVOID BREAKAGE.

5. As each spectrum is run it will appear on the screen in a view window. When the screen indicates that a run is complete select scan from the Instrument menu to prepare for the next sample. Be sure to alter the filename in the destination box before initiating the next run. If you forget to do this a dialog box will appear to remind you and offer you a chance to go back (select cancel) and alter the filename.

6. When all 15 samples have been run you will have 15 files stored in the Data Region window on the Main Screen. You must now export your data to your diskette (Drive B:) and then, in a separate step, edit your data files so they will be accepted by the Matlab analysis program.

a. Close the View screen.

b. Place your diskette in Drive B:.

c. Select the List Ascii option from the Options menu.

d. Input your first filename (solnNX.sp) in the Source Data box.

e. Input B:\ in Output Directory.

f. Input the same filename (solnNX.sp) in Output File.

g. Input From: 450 To: 650.

h. Input , (comma) as Data Separator and select OK. Repeat this procedure for all 15 files.

i. Close the Ascii Data Formatting Window.

j. Open the Drive B: icon to be sure all of your files are now on your diskette.

k. Select Quit to DOS from the File menu and clear the Data Region window upon exiting.

7. At the C:\> prompt type PE.

a. Enter the name of the first file (b:\solnNX.sp) and hit enter on the keyboard. You will see

There are 401 data pairs in this file.
The data run from 450 to 650 nm. Press any key to continue...

b. Save the wavelength data as b:\SOLNX.WVL, where N = the letter of the solution measured and X = the number assigned to you in this lab course.

c. Save the intensity data as b:\SOLNX.dat.

d. Repeat the procedure for all 15 files. When you are done you will have 15 intensity files (SOLAX.dat to SOLOX.dat) each containing 81 data points.
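The point counts quoted in step 7 imply a wavelength spacing, provided the points are evenly spaced and both endpoints (450 and 650 nm) are included. Both conditions are assumptions, as is the helper name below; the handout does not state how the 401 exported pairs become 81 saved points.

```python
def point_spacing_nm(start_nm, stop_nm, n_points):
    """Spacing between evenly spaced points that include both endpoints."""
    return (stop_nm - start_nm) / (n_points - 1)

exported = point_spacing_nm(450, 650, 401)   # pairs reported by PE
saved = point_spacing_nm(450, 650, 81)       # points in each .dat file

print(exported)  # 0.5
print(saved)     # 2.5
```

Under these assumptions the saved files would hold one intensity every 2.5 nm, which is a quick consistency check to run on the .WVL file.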

DATA ANALYSIS AND DISCUSSION QUESTIONS

NOTE: It is recommended that you perform this analysis using the terminals in Room 230C. If you try to use the Mac II terminals in the Chemistry Library some of the more complex graphs might appear distorted on the screen. Despite these visual problems, your printout should be normal.

1. Log on to Chem133, the SUN workstation, and invoke MATLAB by typing matlab <ENTER>.

2. You will use the MATLAB subroutine fluor to perform chemometric analysis of your fluorescence data. At the MATLAB >> prompt type fluor <ENTER> and follow the prompts on the screen.

3. You will be prompted for your name, the name of your lab partner, and the value of your student number (X) used in your data files solNX.dat.

NOTE: If, for any reason, your data files are not named solNX.dat (where N = A-O), you must now rename your files. Consult your TA if in doubt.

4. Follow the prompts and you will be asked to enter the concentrations (in moles/liter) of Rhodamine B and Tb(DPA)3 in solutions A - J. For example, the concentration of Rhodamine B in solution A is 0.125e-6 moles/liter. Enter the calculated concentrations of YOUR solutions, which may not be exactly the same as the concentrations listed in the table shown in step (1) of the experimental procedure. Do not worry if you make an error in entering concentration data; the program will give you the opportunity to correct any mistakes.

5. When the concentration data has been correctly entered, the main menu of the chemometrics program will appear on the screen:

(1) Analysis of One Component Spectra and Unknown Solution K
(2) Analysis of Two Component Spectra and Unknown Solution L
(3) Analysis of n-Component Unknown Solution M
(4) Exit program

You should proceed through each analysis in the order shown. First select option (1) to analyze the one-component solutions containing only Rhodamine B.

6. The emission spectra of solutions A, B, and D will be plotted. Press <ENTER> when you are ready to continue. MATLAB will now perform a principal component factor analysis (PCFA) of your data and generate the eigenvectors (in an 81 x 81 matrix [vc]) and eigenvalues (in an 81 x 1 matrix [vl]) which span the spectral data space of these three solutions. You must now determine how many factors (n) are significant and should be retained. This determination is made in two ways:

(i) Determine the number of significant eigenvalues from a plot of [vl]. Consult your TA if you need assistance with this.

(ii) Use the indicator function [frac] as described in the theory section of this experiment. The eigenvalues are used to generate an empirical function [frac], which is a measure of the percentage data variation represented by the nth factor compared to the total variation remaining in the data after the variation in the first n-1 factors is removed. Look for a minimum in the plot of [frac], and note how many factors correspond to that minimum. The recommended number of factors to keep is one less than that number.

When the plots of (i) [vl] and (ii) [frac] appear on the screen, make careful note of the appropriate value of n. (CAUTION: These plots may not be retrieved without rerunning the program from the beginning. Therefore you should decide on the appropriate value of n while each graph is on the screen.)

7. Comment in your lab report on the number of factors (n) suggested by analysis of the eigenvalues and by analysis of the indicator function. Was the result what you would have predicted? If not, why not? If the two methods of analysis suggest different values of n explain why this occurs. In your lab report discuss the physical meaning of each significant factor. The program will give you the opportunity to examine the effect of using different values of n.

8. Now you will examine how well your chosen number of factors will recreate the original spectral data. If you have chosen well, the regenerated data will be superimposed on the original spectra (watch the screen carefully!), and the residuals will be small and contain only noise (make particular note of the y-axis scale on the plot of residuals). The residuals represent data that could not be spanned with the subset of n factors. You may repeat this analysis as many times as you wish using any integral value for n. Explain each choice that you make. Explain any differences you observe in each analysis. Beware of using values of n which are too high. The optimum value of n is the minimum value which allows regenerated spectra to be superimposed over the original data, with low residuals.

9. You have determined the number of significant factors (n) which are required to reproduce the original data. You must now identify the physical meaning of each factor using target transformation factor analysis (TTFA). You will use standard solutions of Rhodamine B (solution C) and of Tb(DPA)3 (solution E) as test vectors in the data space generated by the spectra of solutions A, B and D. As described in the theory section TTFA uses a least squares procedure to minimize the difference between the test spectrum (vector [vt]) and the predicted spectrum [pr]. If the test spectrum is a real factor, the point-by-point agreement between it and the predicted spectrum should be good. If the test spectrum and the predicted spectrum are significantly different, it is unlikely that the test spectrum is a real factor. If there is good agreement between the spectrum of solution C and the predicted spectrum, then you have successfully identified Rhodamine B as a component in solutions A, B, and D. Explain the success or failure of the test using solution E as a test vector.

10. In order to confirm the presence of Rhodamine B in the one-component unknown solution K, we will use its spectrum as a test vector in TTFA of the data space spanned by the spectra of solutions A, B, and D. Comment quantitatively on the results.

11. Now that you have confirmed the presence of Rhodamine B in the unknown solution K, you will use PCFA to determine its concentration. The spectral and concentration data matrices for the known solutions A, B, and D are used to set up a calibration matrix. Using this calibration matrix and the spectral intensity matrix for solution K, you can calculate the concentration of Rhodamine B in this solution. When prompted for the number of factors to use in this calculation, use the optimum value of n determined in step (8). Although the results of your analysis will be sent to the printer, it would be wise to record the concentration data in your laboratory notebook when it appears on the screen.

12. You are now returned to the main menu of the chemometrics program. Proceed with option (2), the analysis of two-component solutions. Follow the on-screen prompts as before. All issues discussed in your study of one-component solutions should also be addressed here. Some of these issues will be raised by on-screen text questions. Of particular importance in the two-component system, you should discuss which of the two largest factors is the spectrum of Rhodamine B, and which is the spectrum of Tb(DPA)3. Hint: An examination of the spectral regeneration and, in particular, of the residuals, should help you with this discussion. Explain the difference in magnitude of the two major factors.

13. You will use PCFA to calculate the concentration of "known" solution J, based on its spectral intensity matrix and a calibration matrix generated from the intensity and concentration data of solutions F, G, H, I, and J. If you wish, you may repeat this calculation using a different number of factors; make careful note of your results. How do your results compare with the known concentrations of the two components in solution J? Use these data to estimate the percent uncertainty involved in using PCFA to determine the concentrations of Rhodamine B and Tb(DPA)3 in unknown solutions.

14. Again you are returned to the main menu to complete the analysis of the unknown n-component solution M. This solution may contain Rhodamine B, Tb(DPA)3, a mixture of both chromophores, or neither species! Select option (3) and follow the on-screen prompts as before. Your data matrix will consist of the spectrum of solution M and the spectra of solutions N and O, prepared by diluting solution M. Use PCFA to determine the number of components present. Confirm that you are correct by regenerating the original data. Use TTFA to determine the identity of each component using standard spectra of Rhodamine B (solution C) and Tb(DPA)3 (solution E). As before, comment quantitatively on the agreement between the test and predicted spectra. Finally, determine the concentration of each component in solution M using the calibration data generated from the two-component solutions F, G, H, I, and J.

Notes: (1) The use of two-component data to create the calibration matrix will allow you to determine the concentrations of zero, one, or two emitting chromophores.

(2) If your unknown solution, M, contains only one component, the concentration calculated for the second component should be negligible.

15. Now that you have completed your analysis, you may exit the program by selecting option (4) from the main menu. All of your graphs and concentration calculations will be sent to the printer in the order in which they were generated. Type exit <ENTER> to quit MATLAB and type logout <ENTER> to log out of the SUN computer.

16. Summarize your results clearly at the end of your report.
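The calculations behind steps 8 through 11 can be sketched in a few lines of Python. This is only an illustrative sketch, not the laboratory's MATLAB program: the function names, the four-point emission profile, and the "unknown" at 0.6 x 10^-6 M are all made up for the example, and the concentration step uses a simple least-squares mapping from factor scores to concentration that is assumed here rather than taken from the program.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def column_factors(d, n, sweeps=200):
    """Rows of the result are the n leading eigenvectors of Z = D^T D."""
    z = matmul(transpose(d), d)
    size = len(z)
    factors = []
    for _ in range(n):
        c = [1.0 / math.sqrt(size)] * size       # arbitrary unit starting vector
        lam = 0.0
        for _ in range(sweeps):                  # power iteration
            zc = [sum(a * b for a, b in zip(row, c)) for row in z]
            lam = math.sqrt(sum(v * v for v in zc))
            if lam == 0.0:
                break
            c = [v / lam for v in zc]
        factors.append(c)
        # Deflate: remove the variance carried by this factor.
        z = [[z[i][j] - lam * c[i] * c[j] for j in range(size)] for i in range(size)]
    return factors

def regenerate(d, n):
    """Step 8: rebuild D from n factors and return the residuals as well."""
    c_star = column_factors(d, n)
    r_star = matmul(d, transpose(c_star))
    d_hat = matmul(r_star, c_star)
    residuals = [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(d, d_hat)]
    return r_star, c_star, residuals

def scores(r_star, spectrum):
    """Least-squares coordinates of a spectrum on the mutually orthogonal row factors."""
    return [sum(a * b for a, b in zip(col, spectrum)) / sum(a * a for a in col)
            for col in transpose(r_star)]

def target_test(r_star, test_vector):
    """Steps 9 and 10: predicted spectrum and its RMS difference from the test spectrum."""
    t = scores(r_star, test_vector)
    predicted = [sum(tk * rk for tk, rk in zip(t, row)) for row in r_star]
    rms = math.sqrt(sum((p - v) ** 2 for p, v in zip(predicted, test_vector))
                    / len(test_vector))
    return predicted, rms

def concentration(r_star, c_star, known_conc, unknown_spectrum):
    """Step 11 (assumed scheme): map factor scores to concentration via the standards."""
    calib = [sum(k * cj for k, cj in zip(known_conc, factor)) for factor in c_star]
    return sum(a * b for a, b in zip(calib, scores(r_star, unknown_spectrum)))

# Hypothetical noise-free one-component data: rows are wavelengths, columns are spectra.
shape = [1.0, 3.0, 2.0, 0.5]                     # made-up emission profile
standards = [0.125, 0.250, 1.000]                # solutions A, B, D (x 10^-6 M)
d = [[s * c for c in standards] for s in shape]

r_star, c_star, residuals = regenerate(d, 1)
max_residual = max(abs(x) for row in residuals for x in row)
_, rms_good = target_test(r_star, [0.5 * s for s in shape])   # same profile, like solution C
_, rms_bad = target_test(r_star, [2.0, 0.5, 1.0, 3.0])        # unrelated profile
conc_k = concentration(r_star, c_star, standards, [0.6 * s for s in shape])
```

With one factor, the residuals and the RMS difference for the matching test vector are essentially zero, while the unrelated test vector gives a large RMS difference. Real spectra contain noise, so the residuals and RMS values should be judged against the noise level rather than against zero.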

LITERATURE CITED

(1) R. W. Rozett and E. M. Petersen, Anal. Chem., 1976, 48, 817.

(2) J. T. Bulmer and H. F. Shurvell, J. Phys. Chem., 1973, 77, 256.

(3) E. R. Malinowski and D. G. Howery, Factor Analysis in Chemistry, John Wiley & Sons, New York, 1980.

(4) D. A. Skoog, D. M. West, and F. J. Holler, Fundamentals of Analytical Chemistry, 6th edition, Saunders College Publishing, 1992, Chapter 20, Section 20D and subsections; Chapter 23.

(5) C. H. Lochmüller and B. Meiseles, "Quantitative Analysis of Multicomponent Fluorescent Mixtures Using Target Transformation Factor Analysis", Duke University, unpublished results. Experiment 3 is based upon this work.

EXPERIMENT 3

PRE-LABORATORY ASSIGNMENT

1. Given the following stock solutions:

A. 0.1 M TRIS (Tris(hydroxymethyl)aminomethane) buffer at pH 8.0

B. 2 x 10^-6 M Rhodamine B in 0.1 M TRIS

C. 4 x 10^-5 M Tb(DPA)3 in 0.1 M TRIS

state exactly how you would prepare 10 ml of each of the following solutions:

Solution #    [Tb(DPA)3] (x 10^-6 M)    [Rhodamine B] (x 10^-6 M)
______________________________________________________________________________

A             0                         0.125
B             0                         0.250
C             0                         0.500
D             0                         1.000
E             2.50                      0
F             5.00                      0
G             1.25                      0.125
H             1.25                      0.250
I             2.50                      0.250
J             5.00                      0.500

______________________________________________________________________________

Note: In this experiment you will use adjustable pipets, which may be set to deliver small, precise (±0.01 ml) solution volumes. This should allow you to prepare solutions having exactly the concentrations listed in this table.

2. What is energy transfer and when does it occur?

3. Following absorption of light, what are the principal modes of deactivation of the excited state(s)? (See reference (4).)
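The dilution arithmetic needed for question 1 follows from C(stock) x V(stock) = C(target) x V(total). The sketch below applies it to a hypothetical mixture that is deliberately not one of the solutions in the table; the function name and the target concentrations are assumptions made for illustration, and volumes are rounded to the 0.01 ml setting of the pipets.

```python
def dilution_plan(components, total_volume_ml=10.0):
    """Return (stock volumes in ml, buffer volume in ml).

    components: list of (stock_molarity, target_molarity) pairs, one per stock.
    """
    volumes = [total_volume_ml * target / stock for stock, target in components]
    buffer_ml = total_volume_ml - sum(volumes)
    if buffer_ml < 0:
        raise ValueError("targets are too concentrated for these stock solutions")
    return [round(v, 2) for v in volumes], round(buffer_ml, 2)

# Hypothetical target: 0.4 x 10^-6 M Rhodamine B and 2.0 x 10^-6 M Tb(DPA)3 in 10 ml.
stock_volumes, buffer_volume = dilution_plan([
    (2e-6, 0.4e-6),   # Rhodamine B stock (solution B of the stock list)
    (4e-5, 2.0e-6),   # Tb(DPA)3 stock (solution C of the stock list)
])
```

For this made-up target, the plan is 2.0 ml of the Rhodamine B stock, 0.5 ml of the Tb(DPA)3 stock, and 7.5 ml of TRIS buffer.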

APPENDIX A

FACTOR ANALYSIS: TUTORIAL

Factor analysis is a purely mathematical technique that neither claims nor requires previous knowledge of the physical system generating the data. The following tutorial was adapted from Malinowski (3) to explain the steps involved in factor analysis. Factor analysis is usually used on large data sets. We will use a small data matrix (10x3) so that we can do most of the steps by hand or with a calculator. After completing this tutorial, you should examine the same data set using the factor analysis programs used in Experiments 1 and 3.

MATRIX MULTIPLICATION: A REVIEW

A matrix [D] may be formed from the product of a row factor matrix [R] and a column factor matrix [C] as follows:

[R] x [C] = [D]

DATA ANALYSIS

To use factor analysis, the experimenter initially begins with a data matrix [D]. For example, in the HPLC experiment data matrix, the columns consist of solvent systems and the rows correspond to each solute. Our sample matrix is the following:

[D] =

1. Construction of the covariance matrix

The first step the factor analysis program takes is the construction of the square covariance matrix [Z]. The covariance matrix is the dot product of the column vectors of the data matrix or the product of the data matrix and its transpose [D]T:

[Z] = [D]T[D]

Using our sample matrix, calculate [Z].

answer:

[Z] =

(A normalized covariance matrix, called a correlation matrix, can also be generated. Normalization is a scaling of the rows or columns such that the spread of every row or column is the same. We will not deal with normalized data in this tutorial.)
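As a check on hand calculations, the product [Z] = [D]T[D] can be formed with a few lines of Python. The 4 x 2 matrix below is a made-up example, not the tutorial's sample matrix, and the function name is hypothetical.

```python
def covariance_matrix(d):
    """Form Z = D^T D: each element is the dot product of two columns of D."""
    columns = list(zip(*d))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]

# Hypothetical data matrix with 4 rows and 2 columns.
d = [[1, 2],
     [2, 4],
     [3, 5],
     [4, 9]]
z = covariance_matrix(d)   # [[30, 61], [61, 126]]
```

The result is square, with one row and one column per column of [D], and symmetric, because the dot product of columns i and j does not depend on their order.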

2. Determination of the First Eigenvector

The data is now in the correct form for eigenvector analysis. The factor analysis program computes the first eigenvector (which accounts for as much variance as possible in the data) by decomposition of the covariance matrix. In this process the eigenvector is obtained by the method of iteration.

(a) First, we set up the equation:

[Z]C1 = L1C1

Where C1 is the first eigenvector and L1 is the eigenvalue.

(b) We arbitrarily set C1 = and multiply it by the covariance matrix to obtain a value for [Z]C1 and therefore (L1C1):

(c) The resulting column vector is normalized by dividing each element by the square root of the sum of the squares of its elements:

Calculate the sum of the squares of the elements of the column vector:

Calculate the normalized column vector:

So:

[Z]C1 = (14537.4)-1C1

and (14537.4)-1 is the first approximation for L1.

(d) We now have a new approximation for C1:

(e) We now set up the equation as follows:

[Z][2ndC1] = L1[3rdC1]

The process is repeated, generating new and better approximations of C1 and L1 until the equation [Z][C1] = L1[C1] is satisfied.

Continue the iteration and calculate the final values for L1 and C1.

Answer:

L1 = 26868.9

C1 =

(f) Test these results:

[Z] X [C1] = L1[C1]

L1[C1] ÷ L1 = [C1]

L1 is the first eigenvalue.
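The iteration in steps (a) through (f) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the starting vector, the convergence tolerance, and the 2 x 2 matrix are all chosen for the example and are not the tutorial's values. It relies on [Z] = [D]T[D] having no negative eigenvalues, so the normalization constant converges to the eigenvalue.

```python
import math

def power_iteration(z, tol=1e-12, max_iter=1000):
    """Return (eigenvalue, eigenvector) for the largest eigenvalue of a covariance matrix."""
    size = len(z)
    c = [1.0 / math.sqrt(size)] * size            # arbitrary unit-length starting guess
    eigenvalue = 0.0
    for _ in range(max_iter):
        zc = [sum(a * b for a, b in zip(row, c)) for row in z]   # [Z]C1
        eigenvalue = math.sqrt(sum(v * v for v in zc))           # normalization constant
        if eigenvalue == 0.0:
            break
        new_c = [v / eigenvalue for v in zc]      # normalized vector: next approximation
        change = max(abs(a - b) for a, b in zip(new_c, c))
        c = new_c
        if change < tol:                          # vector no longer changes: converged
            break
    return eigenvalue, c

# Hypothetical 2 x 2 covariance matrix (not the tutorial's sample data).
z = [[30.0, 61.0], [61.0, 126.0]]
l1, c1 = power_iteration(z)

# Step (f): [Z][C1] should equal L1[C1] element by element.
check = [sum(a * b for a, b in zip(row, c1)) for row in z]
```

Each pass multiplies the current vector by [Z] and renormalizes it, so the component along the largest eigenvector grows fastest and eventually dominates.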

3. Calculation of the First Residual Matrix

A new matrix can be calculated in which all variance accounted for by the first eigenvector has been removed. There is no variance due to the first factor in this residual matrix [R]1.

[R]1 = [Z] - L1C1C1T

Calculate [R]1:

Answer:

4. Calculate the second eigenvector and corresponding eigenvalue using the same procedure as before.

Answer:

L2 = 4040.2

We would expect the value of eigenvalue L2 to be less than the value of eigenvalue L1 since the eigenvector C2 accounts for less variance than the eigenvector C1.

5. Calculate the second residual matrix.

Answer:

Because the residual matrix is essentially zero, the eigenvector analysis is complete for this set of data. This means that two factors are sufficient to account for the variance in the data. More complicated (larger) data sets will produce more eigenvectors.
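Steps 2 through 5 amount to a loop: find the largest eigenvector, subtract its contribution from the matrix, and repeat until the residual is essentially zero. The sketch below assumes a made-up 3 x 3 covariance matrix built from two underlying factors (not the tutorial's data); the function names and the cutoff used to decide that an eigenvalue is negligible are also assumptions.

```python
import math

def power_iteration(z, tol=1e-12, max_iter=1000):
    """Largest eigenvalue and eigenvector of a symmetric matrix with no negative eigenvalues."""
    size = len(z)
    c = [1.0 / math.sqrt(size)] * size
    eigenvalue = 0.0
    for _ in range(max_iter):
        zc = [sum(a * b for a, b in zip(row, c)) for row in z]
        eigenvalue = math.sqrt(sum(v * v for v in zc))
        if eigenvalue == 0.0:
            break
        new_c = [v / eigenvalue for v in zc]
        change = max(abs(a - b) for a, b in zip(new_c, c))
        c = new_c
        if change < tol:
            break
    return eigenvalue, c

def deflate(z, eigenvalue, c):
    """Residual matrix: [Z] - L * C * C^T."""
    n = len(c)
    return [[z[i][j] - eigenvalue * c[i] * c[j] for j in range(n)] for i in range(n)]

def extract_factors(z, rel_cutoff=1e-9):
    """Collect (eigenvalue, eigenvector) pairs until the remaining variance is negligible."""
    total = sum(z[i][i] for i in range(len(z)))   # trace equals the sum of all eigenvalues
    residual = [row[:] for row in z]
    pairs = []
    for _ in range(len(z)):
        lam, c = power_iteration(residual)
        if lam <= rel_cutoff * total:             # nothing significant left
            break
        pairs.append((lam, c))
        residual = deflate(residual, lam, c)
    return pairs, residual

# Hypothetical covariance matrix generated by exactly two factors.
z = [[6.0, 15.0, 6.0],
     [15.0, 47.0, 34.0],
     [6.0, 34.0, 44.0]]
pairs, residual = extract_factors(z)
largest_residual = max(abs(x) for row in residual for x in row)
```

For this matrix the loop stops after two eigenvalues, which sum to the trace of [Z], and the final residual matrix is zero to within rounding error. With noisy experimental data the residual never reaches zero exactly, so the cutoff has to reflect the noise level.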

6. Determination of Abstract Column and Row Factor Matrices

(a) The column cofactor matrix [C*] is determined as follows:

(b) The row cofactor matrix [R*] is calculated as follows:

[R*] = [D][C*]T

Calculate [R*]:

Answer:

[R*] =

As a check, [R*] x [C*] should reproduce [D].

NOTE: [C*] is equivalent to [vc] in MATLAB

[R*] is equivalent to [vr] in MATLAB

is equivalent to [v ] in MATLAB
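The relation between the abstract factor matrices and the data can be verified numerically. In the sketch below the rows of [C*] are taken to be the significant eigenvectors, which is the arrangement implied by [R*] = [D][C*]T; the 4 x 3 data matrix, the function names, and the fixed number of iteration sweeps are assumptions made for illustration.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def leading_eigenvectors(z, n, sweeps=500):
    """Return the n leading eigenvectors of z as rows (power iteration with deflation)."""
    size = len(z)
    z = [row[:] for row in z]
    vectors = []
    for _ in range(n):
        c = [1.0 / math.sqrt(size)] * size
        lam = 0.0
        for _ in range(sweeps):
            zc = [sum(a * b for a, b in zip(row, c)) for row in z]
            lam = math.sqrt(sum(v * v for v in zc))
            if lam == 0.0:
                break
            c = [v / lam for v in zc]
        vectors.append(c)
        z = [[z[i][j] - lam * c[i] * c[j] for j in range(size)] for i in range(size)]
    return vectors

# Hypothetical data matrix generated by two factors (4 rows, 3 columns).
d = [[1.0, 2.0, 0.0],
     [2.0, 5.0, 2.0],
     [0.0, 3.0, 6.0],
     [1.0, 3.0, 2.0]]

z = matmul(transpose(d), d)             # covariance matrix [Z] = [D]T[D]
c_star = leading_eigenvectors(z, 2)     # abstract column factors, one eigenvector per row
r_star = matmul(d, transpose(c_star))   # abstract row factors, [R*] = [D][C*]T
d_rebuilt = matmul(r_star, c_star)      # [R*] x [C*]
largest_error = max(abs(x - y) for r1, r2 in zip(d, d_rebuilt) for x, y in zip(r1, r2))
```

Because two factors account for all of the variance in this made-up matrix, the rebuilt matrix matches [D] to within rounding error. Keeping only one factor would leave a visible residual.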
