[Translation] [Serial] CB Predictor User Manual (Crystal Ball Predictor)

Regression methods

CB Predictor uses one of three regression methods: standard regression, forward stepwise regression, and iterative stepwise regression.

Standard regression

Standard regression performs multiple linear regression, generating regression coefficients for each independent variable you specify, no matter how significant.

Forward stepwise regression

Forward stepwise regression adds one independent variable at a time to the multiple linear regression equation, starting with the independent variable with the most significant probability of the correlation (partial F statistic). It then recalculates the partial F statistic for the remaining independent variables, taking the existing regression equation into consideration.

CB Predictor Note: The resulting multiple linear regression equation will always have at least one independent variable.

Forward stepwise regression continues to add independent variables until either:
•It runs out of independent variables.
•It reaches one of the selected stopping criteria in the Stepwise Options dialog.
•The number of included independent variables reaches one-third the number of data points in the series.
There are two stopping criteria:

R-squared
Stops the stepwise regression if the difference between a specified statistic (either R2 or adjusted R2) for the previous and new regression solutions is below a threshold value. When this happens, CB Predictor does not use the last independent variable. For example, the third step of a stepwise regression results in an R2 value of 0.81, and the fourth step adds another independent variable and results in an R2 value of 0.83. The difference between the R2 values is 0.02. If the threshold value is 0.03, CB Predictor returns to the regression equation for the third step and stops the stepwise regression.

F-test significance
Stops the stepwise regression if the probability of the F statistic for a new solution is above a maximum value. For example, if you set the maximum probability to 0.05 and the F statistic for the fourth step of a stepwise regression results in a probability of 0.08, CB Predictor returns to the regression equation for the third step and stops the stepwise regression.
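As an illustration only, forward stepwise regression with the R-squared stopping criterion can be sketched in Python with NumPy. The function names and the default threshold of 0.03 are assumptions made for this sketch and are not CB Predictor's own code. Candidates are ranked by the R2 they would produce, which picks the same variable as the largest partial F statistic when one variable is added to a fixed equation.

```python
import numpy as np

def r_squared(X, y):
    """R2 of an ordinary least-squares fit of y on the columns of X plus a constant."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    return 1.0 - sse / sst

def forward_stepwise(X, y, threshold=0.03):
    """Add one column of X at a time; stop when the R2 gain falls below threshold."""
    n, k = X.shape
    chosen, best_r2 = [], 0.0
    max_vars = max(1, n // 3)  # cap from the text: one-third of the data points
    while len(chosen) < min(k, max_vars):
        remaining = [j for j in range(k) if j not in chosen]
        # Hypothetical ranking rule: R2 after adding each remaining candidate.
        scores = {j: r_squared(X[:, chosen + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        # The first variable is always kept, so the equation never ends up empty.
        if chosen and scores[j_best] - best_r2 < threshold:
            break  # drop the last candidate and keep the previous equation
        chosen.append(j_best)
        best_r2 = scores[j_best]
    return chosen, best_r2
```

With the example in the text (R2 of 0.81 at step three, 0.83 at step four, threshold 0.03), the gain of 0.02 is below the threshold, so the loop would stop and keep the three-variable equation.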

Iterative stepwise regression

Iterative stepwise regression adds or removes one independent variable at a time to or from the multiple linear regression equation.
To perform iterative stepwise regression, CB Predictor:

1.Calculates the partial F statistic for each independent variable.

2.Adds the independent variable with the most significant correlation (partial F statistic).

3.Checks the partial F statistic of the independent variables in the regression equation to see if any became insignificant (have a probability below the minimum) with the addition of the latest independent variable.

4.Removes the least significant of any insignificant independent variables one at a time.

5.Repeats step 3 until no insignificant variables remain in the regression equation.

6.Repeats steps 1 through 5 until:
•The model runs out of independent variables.
•The regression reaches one of the stopping criteria (see the previous section for information on how the stopping criteria work).
•The same independent variable is added and then removed.

CB Predictor Note: The resulting equation will always have at least one independent variable.

Regression statistics

Once CB Predictor finds the regression equation, it calculates several statistics to help you evaluate the regression. See Appendix E, “Regression Statistic Formulas” for more information on the formulas CB Predictor uses to calculate these statistics.


R2

Coefficient of determination. This statistic indicates the percentage of the variability of the dependent variable that the regression equation explains.

For example, an R2 of 0.36 indicates that the regression equation accounts for 36% of the variability of the dependent variable.

Adjusted R2

Corrects R2 to account for the degrees of freedom in the data. In other words, the more data points you have, the more universal the regression equation is. However, if you have only the same number of data points as variables, the R2 might appear deceivingly high. This statistic corrects for that.


For example, the R2 for one equation might be very high, indicating that the equation accounted for almost all the error in the data. However, this value might be inflated if the number of data points was insufficient to calculate a universal regression equation.



SSE

Sum of square deviations. The least squares technique for estimating regression coefficients minimizes this statistic, which measures the error not eliminated by the regression line.

For any line drawn through a scatter plot of data, there are a number of different ways to determine which line fits the data best. One method of comparing the fit of lines is to calculate the SSE (sum of the squared errors) for each line. The lower the SSE, the better the fit of the line to the data.
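To make these three statistics concrete, here is a small pure-Python sketch for a regression with one independent variable. The function names are hypothetical, and the adjusted R2 line uses the common textbook formula 1 - (1 - R2)(n - 1)/(n - k - 1), which is an assumption here; the formulas CB Predictor actually uses are the ones in Appendix E.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one independent variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def regression_stats(x, y):
    """Return SSE, R2, and adjusted R2 for the fitted line."""
    intercept, slope = fit_line(x, y)
    n, k = len(y), 1  # k = number of independent variables
    mean_y = sum(y) / n
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - sse / sst
    # Assumed textbook correction for degrees of freedom.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return {"sse": sse, "r2": r2, "adj_r2": adj_r2}

stats = regression_stats([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

For this toy data the line explains 60% of the variability (R2 of 0.6), while the adjusted value is lower because only five data points support the fit.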

F statistic

Tests the significance of the regression equation as measured by R2. If this value is significant, it means that the regression equation does account for some of the variability of the dependent variable.

t statistic

Tests the significance of the relationship between the coefficients of the dependent variable and the individual independent variable, in the presence of the other independent variables. If this value is significant, it means that the independent variable does contribute to the dependent variable.


p

Indicates the probability of your calculated F or t statistic being as large as it is (or larger) by chance. A low p value is good and means that the F statistic is not coincidental and, therefore, is significant. A significant F statistic means that the relationship between the dependent variable and the combination of independent variables is significant.

Generally, you want your p to be less than 0.05.


Durbin-Watson

Detects autocorrelation at lag 1. This means that each time-series value influences the next value. This is the most common type of autocorrelation. For the formula, see page 138.
The value of this statistic can be any value between 0 and 4. Values indicate slow-moving, none, or fast-moving autocorrelation, as shown in Table 2.2.

Table 2.2 Interpreting the Durbin-Watson statistic
Durbin-Watson statistic    Means
Less than 1                The errors are positively correlated. An increase in one period follows an increase in the previous period.
2                          No autocorrelation.
More than 3                The errors are negatively correlated. An increase in one period follows a decrease in the previous period.

Avoid using independent variables that have errors with a strong positive or negative correlation, since this can lead to an incorrect forecast for the dependent variable.
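The following sketch computes the statistic from a list of regression errors using the usual textbook definition (sum of squared successive differences divided by the sum of squared errors). The formula on page 138 is not reproduced in this excerpt, so treat this as an assumption about its form; the function name is hypothetical.

```python
def durbin_watson(errors):
    """Textbook Durbin-Watson statistic for a sequence of regression errors."""
    numerator = sum((errors[t] - errors[t - 1]) ** 2 for t in range(1, len(errors)))
    denominator = sum(e * e for e in errors)
    return numerator / denominator

# Errors that flip sign every period: negatively correlated.
fast_moving = durbin_watson([1, -1, 1, -1])
# Errors that stay on one side for several periods: positively correlated.
slow_moving = durbin_watson([1, 1, 1, -1, -1, -1])
```

The first series lands in the "more than 3" row of Table 2.2 and the second in the "less than 1" row.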



Historical data statistics

CB Predictor automatically calculates the following statistics for historical data series:

mean
The mean of a set of values is found by adding the values and dividing their sum by the number of values. The term “average” usually refers to the mean. For example, 5.2 is the mean or average of 1, 3, 6, 7, and 9.

standard deviation
The standard deviation is the square root of the variance for a distribution. Like the variance, it is a measure of dispersion about the mean and is useful for describing the “average” deviation.

For example, you can calculate the standard deviation of the values 1, 3, 6, 7, and 9 by finding the square root of the variance that is calculated in the variance example below.
The standard deviation, denoted as $s$, is calculated from the variance as follows:

$$s = \sqrt{s^2}$$

where the variance is a measure of the dispersion, or spread, of a set of values about the mean. When values are close to the mean, the variance is small. When values are widely scattered about the mean, the variance is larger.

To calculate the variance of a set of values:

1.Find the mean or average.

2.For each value, calculate the difference between the value and the mean.

3.Square these differences.

4.Sum the squared differences and divide by n-1, where n is the number of differences.

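For example, suppose your values are 1, 3, 6, 7, and 9. The mean is 5.2. The variance, denoted by $s^2$, is calculated as follows:

$$s^2 = \frac{(1-5.2)^2 + (3-5.2)^2 + (6-5.2)^2 + (7-5.2)^2 + (9-5.2)^2}{5-1} = \frac{40.8}{4} = 10.2$$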


The minimum is the smallest value in the data range.

The maximum is the largest value in the data range.
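The example values used in this section can be checked with Python's standard `statistics` module, which uses the same n-1 divisor for the variance. This is only a verification sketch; the variable names are arbitrary.

```python
import statistics

values = [1, 3, 6, 7, 9]

mean = statistics.mean(values)          # sum of the values divided by their count
variance = statistics.variance(values)  # squared differences summed, divided by n-1
std_dev = statistics.stdev(values)      # square root of the variance
smallest, largest = min(values), max(values)

print(mean, variance, round(std_dev, 2), smallest, largest)
```

The mean is 5.2, the variance is 10.2, and the standard deviation is about 3.19.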

Ljung-Box statistic
Measures whether a set of autocorrelations is significantly different from a set of autocorrelations that are all zero. See page 139 for the formula.
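As a rough illustration, the sketch below uses the standard textbook form of the statistic, Q = n(n+2) Σ r_k² / (n-k) summed over lags k = 1..h, where r_k is the autocorrelation at lag k. The formula on page 139 is not shown in this excerpt, so this form and the function names are assumptions.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return sum(dev[t] * dev[t - lag] for t in range(lag, n)) / denom

def ljung_box(series, max_lag):
    """Textbook Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(series)
    total = sum(autocorrelation(series, k) ** 2 / (n - k) for k in range(1, max_lag + 1))
    return n * (n + 2) * total

q = ljung_box([1, 2, 1, 2, 1, 2, 1, 2], max_lag=1)
```

A large Q relative to a chi-square distribution with `max_lag` degrees of freedom suggests the autocorrelations are not all zero.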


Next installment: Chapter 3 of the manual, Forecasting with CB Predictor.
Chapter 3
Forecasting with CB Predictor

In this chapter
•Creating a spreadsheet with historical data
•Loading and starting CB Predictor
•Guidelines for using CB Predictor
•Analyzing the results
•Customizing reports and charts

This chapter contains detailed procedures for using CB Predictor. It describes how to forecast using both time-series forecasting methods and multiple linear regression. It also describes all the settings you can choose for your results.


When you use CB Predictor for the first time, there are several things you need to do:

1.Create an Excel spreadsheet with your historical data.

2.Start CB Predictor.

3.Run CB Predictor.

4.Analyze your results.

5.Customize your results.

This chapter describes how to complete each of these steps to make forecasting your historical data an easy task.

Creating a spreadsheet with historical data

Before using CB Predictor, you must create an Excel spreadsheet with your historical data. Creating a spreadsheet for use with CB Predictor is easy. The spreadsheet must include:

•Optionally, a descriptive spreadsheet title.

•Optionally, a date (or other time period, such as Q2-2004) column or row, either at the top or along the left side of your data. If you format your dates as Excel dates, CB Predictor’s Intelligent Input can find the dates, extend them with the forecasted values, and use them as labels on charts.

•Historical data, spaced equal time periods apart, in columns or rows adjacent to the date column or row. You can use CB Predictor to simultaneously forecast from one to 10,000 adjacent historical data series.

Excel Note: Excel only has 256 possible columns, compared to 16,000 or 65,000 possible rows (depending on your version). So, if you have a large number of historical data points, organize your data in columns. If you have a large number of data series, organize your data in rows.

To produce a reasonable forecast, you should have at least 6 historical data values. To use seasonal forecasting methods, you need at least two complete cycles of data.

•Optionally, headings for each data column or row, such as SKU 23442, Gas Usage, or Interest Rate.

The Toledo Gas spreadsheet has all these components.


Loading and starting CB Predictor

If you have Crystal Ball Professional or Premium Edition, CB Predictor is loaded when you start Crystal Ball.

To start CB Predictor, use one of these:

•In the menubar, choose Run > CB Predictor, or
•Type Alt-r, p.

When you start CB Predictor, the CB Predictor dialog, or wizard, appears as shown in Figure 3.2. It has four tabs, arranged from left to right in the typical order of their use. For descriptions of each setting on each tab, see Appendix A, “The CB Predictor Wizard.”

Guidelines for using CB Predictor

After you create your Excel spreadsheet with historical data, forecasting data using CB Predictor follows a 10-step process:

1.Select a cell range with your historical data to forecast.

2.Specify your data arrangement.

3.View a graph of your historical data to identify any seasonality (data cycles) or trend and to see summary statistics.

4.Identify your time periods, whether your data have seasonality, and, if so, how long your season is.

5.Set whether you want to use multiple linear regression to forecast any variables.

6.Select the time-series forecast methods to try for each variable.

7.Enter the number of periods you want to forecast.

8.Select a confidence interval to calculate or display with your forecasted values.

9.Select the results you want.

10.Preview and run the forecast, creating your results.

CB Predictor leads you through these steps using the CB Predictor wizard, but you can also go directly to any of these steps to change methods or select different settings and reforecast data.

Also, many of these steps might be done automatically by CB Predictor. For example, CB Predictor’s Intelligent Input can find and select your data, their arrangement, and the data units. You can also skip many other steps if the settings are already what you want. For example, all the time-series forecasting methods are selected by default, and that is probably how most users will leave them.

Selecting historical data

When you select your historical data, you must identify the Excel cells that contain the data, including dates and headers, and set settings to identify dates, headers, and orientation in the data.

To produce a reasonable forecast, you should have a minimum number of historical data points. CB Predictor imposes several requirements defining a minimum number of points to use to create a forecast. The bare minimum is 5. However, other limitations include:

•Single moving average requires that the number of historical data points is twice the number of points to forecast.

•Double moving average requires that the number of historical data points be three times the number of points to forecast (or at least 6, whichever is higher).

•To use seasonal methods, you must have at least two seasons (complete cycles) of historical data.

•For multiple linear regression, the number of historical data points must be three times the number of independent variables (counting the included constant as an independent variable).

•To lag an independent variable in multiple linear regression, the number of historical data points must be three times the lag.
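The limits listed above can be collected into a small helper for checking a data set before running a forecast. This is a hypothetical sketch: the function and parameter names are invented here, and taking the larger of each rule and the bare minimum of 5 is an assumption about how the rules combine.

```python
def minimum_points(n_forecast=0, season_length=0, n_independent=0,
                   include_constant=True, max_lag=0):
    """Minimum historical data points per method, following the limits in the text."""
    bare_minimum = 5
    required = {
        "any method": bare_minimum,
        "single moving average": max(bare_minimum, 2 * n_forecast),
        "double moving average": max(6, 3 * n_forecast),
    }
    if season_length:
        # Two complete seasons (cycles) of data.
        required["seasonal methods"] = max(bare_minimum, 2 * season_length)
    if n_independent:
        # The included constant counts as an independent variable.
        variables = n_independent + (1 if include_constant else 0)
        required["multiple linear regression"] = max(bare_minimum, 3 * variables)
    if max_lag:
        required["lagged independent variable"] = max(bare_minimum, 3 * max_lag)
    return required

limits = minimum_points(n_forecast=4, season_length=12, n_independent=2, max_lag=1)
```

For a 4-period forecast of monthly data with two independent variables, the seasonal requirement of 24 points is the binding one.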

If your data has empty cells in the middle of a data series, CB Predictor returns an error. CB Predictor treats zeros in data series as data values. If you are trying to forecast several data series at once, your data series do not have to start at the same time period. However, all the data series must end at the same time period.

When you initially open a spreadsheet, there are three ways to select the historical data to forecast:

•Use CB Predictor’s Intelligent Input.

•Select your data before you start the wizard.

•Select your data after you start the wizard.

Automatic data selection

The easiest way to select data is to select one cell somewhere in your continuous data range before you start the CB Predictor wizard. When you start the wizard, CB Predictor’s Intelligent Input searches for all the adjacent cells with numbers, dates, and headers and makes some other assumptions about your input data, such as whether your data are in rows or columns. This often completes most of the fields and settings on the Input Data tab.

Manual data selection before starting CB Predictor

The second way to select your historical data is to highlight the data range (including headers and dates) before you start CB Predictor.

Manual data selection within CB Predictor

The third way to select historical data is to start CB Predictor with no data, date, or header cells selected. The Input Data tab of the wizard appears with the Range field blank. At this point, you must select your historical data manually.

To select historical data manually from the Input Data tab:

1.Start CB Predictor.
The Input Data tab of the wizard dialog appears. For more information on this dialog, see “Input Data tab” on page 100.

2.Under Step 1, in the Range field, either type a range name (if defined), enter the range of cells with the historical data, including any headers (e.g., A4:B42), or:

a. Click Select.
The Select Range dialog replaces the wizard dialog.

b. Select the cells with the historical data, including any cells with headers and dates.

c. Click OK to return to the wizard dialog.
The selected range appears in the Range field.

Specifying data arrangement

No matter which selection method you use, you must also specify your data arrangement to help CB Predictor identify whether you have dates and headers adjacent to your data series and whether your data are in rows or columns. If you used the Intelligent Input to select your data, these settings should already be set correctly.

To set arrangement settings, use the Input Data tab of the CB Predictor wizard as shown in Figure 3.2 on page 64:

1.Under Step 2 of the wizard, if your historical data is in:
•Rows, select Data In Rows
•Columns, select Data In Columns

2.If you have headers (titles) at the top of your columns or to the left of your rows, select the First Row (Column) Has Headers setting.

3.If your first column or row lists the dates or time periods for your data series, select the First Column (Row) Has Dates setting.

Viewing your historical data

As you progress through the wizard, you need to know if your data are seasonal (increase and decrease in a regular cycle) and, if so, what the season or cycle is. If you don’t already have a feel for the behavior of your data, you might want to view your selected historical data before you continue.

To view a graph of your historical data:

1.Under Step 3 of the CB Predictor wizard (on the Input Data tab), click View Data.

The View Historical Data dialog appears as shown in Figure 3.3.


[Figure 3.3: View Historical Data dialog]

2.If your data have a recurring pattern, your data might be seasonal, and you should note how many dates or time periods are in one cycle of the pattern.

3.If you selected more than one historical data series, change the graph to view another data series by selecting it from the Series list.

CB Predictor Note: When the Series list is selected, you can also use the up and down arrows to scroll through the list.

4.To see the three highest autocorrelations and the Ljung-Box statistic, select Autocorrelations from the View list.

The View Historical Data dialog appears in Autocorrelations view as shown in Figure 3.4.

For information on the Ljung-Box statistic, see “Ljung-Box statistic” on page 53.
For more information on both views of the View Historical Data dialog, see page 102.

5.Click Close.
The Input Data tab reappears.


Identifying time periods and seasonality

You need to identify your time periods and seasonality for CB Predictor.
To identify your data’s time periods and seasonality:

1.In the Input Data tab, click Next.
The Data Attributes tab appears. For more information on this tab, see “Data Attributes tab.”

2.Under Step 4 on the Data Attributes tab, identify the time period for your data values.
For example, if your data represent monthly numbers, select months.

3.Indicate the seasonality of your data:
•If any of your data series are seasonal, select the Seasonality setting and enter the number of time periods it takes before your data pattern repeats. You must have at least two seasons (complete cycles) of data to use the seasonal methods.

This number is usually the number of periods per year. For example, if you have 24 monthly data points, and your data has peaks every December, your seasonality (repeating pattern) has a period of one year or 12 months.

CB Predictor Note: You can also view the autocorrelations on the View Historical Data dialog to discover how many periods you have in a season.
•If none of your data are seasonal, select the No Seasonality setting.
•If you are forecasting multiple data series and each has a different seasonality, you must forecast each individually.
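One way to read a season length off the autocorrelations, as the note above suggests, is to look for the lag with the highest autocorrelation. The sketch below is a crude heuristic written for illustration, with hypothetical function names; it is not how the View Historical Data dialog works internally. It searches lags up to half the series length, since two complete cycles are needed, and it can be misled by trended or smooth data, where low lags also show large autocorrelations.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return sum(dev[t] * dev[t - lag] for t in range(lag, n)) / denom

def likely_season_length(series):
    """Lag (2 or more) with the highest autocorrelation, up to half the series length."""
    max_lag = len(series) // 2
    return max(range(2, max_lag + 1), key=lambda k: autocorrelation(series, k))

# 24 monthly values with a peak every December, as in the example in the text.
monthly = ([1.0] * 11 + [10.0]) * 2
season = likely_season_length(monthly)
```

For the 24 monthly points with December peaks, the strongest autocorrelation is at lag 12, matching the 12-month season described above.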

Using multiple linear regression

If you know that some independent variables affect another variable of interest, you should use multiple linear regression as the forecasting method for that particular dependent variable. For example, summer temperatures affect electricity usage because as it gets hotter, more people run their air conditioning. This means that electricity usage (the dependent variable) is dependent on the temperature (an independent variable).

CB Predictor Note: The “multiple” in multiple linear regression represents the fact that you can have more than one independent variable.
To forecast a dependent variable with regression, CB Predictor:

a. Creates an equation that defines the mathematical relationship between the independent variables and a dependent variable. This is the regression equation.

b. Forecasts each independent variable by running all the selected time-series forecasting methods for each one and using the best method for each.

c. Calculates the regression equation with the forecasted independent variable values to create the forecast for the dependent variable.

This process of creating the regression equation, forecasting the independent variables, and calculating the results to forecast the dependent variable in one easy step is called HyperCasting™.
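The three steps a through c can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the HyperCasting implementation: the function names are invented, and a straight-line trend stands in for step b, where CB Predictor would instead run all the selected time-series methods and keep the best one for each independent variable.

```python
import numpy as np

def fit_regression(X, y):
    """Step a: least-squares coefficients [constant, b1, b2, ...]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def trend_forecast(series, periods):
    """Step b (stand-in): extend a series with a fitted straight-line trend."""
    t = np.arange(len(series))
    slope, intercept = np.polyfit(t, series, 1)
    future_t = np.arange(len(series), len(series) + periods)
    return intercept + slope * future_t

def hypercast(X, y, periods):
    """Step c: evaluate the regression equation on the forecasted independents."""
    coef = fit_regression(X, y)
    X_future = np.column_stack(
        [trend_forecast(X[:, j], periods) for j in range(X.shape[1])]
    )
    return coef[0] + X_future @ coef[1:]
```

The forecast for the dependent variable is only as good as the forecasts of the independent variables that feed it, which is why step b matters as much as the regression fit.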

To use multiple linear regression:

1.Under Step 5 on the Data Attributes tab, if one or more variables depend on other variables that you have, select the Use Multiple Linear Regression setting.
The Regression Variables dialog appears as shown on page 106.

2.Usually the dependent variable or variables already appear in the Dependent Variables list. If they do not, follow these steps:

a. Select the name of your dependent variable in the All Series list.
You can have more than one dependent variable. CB Predictor forecasts them all, one at a time, as functions of all the same independent variables.

b. Click >> next to the Dependent Variables list.
The variable moves to the Dependent Variables list.

3.Verify that all independent variables are included in the Independent Variables list. If not, add them the same way:

a. Select the names of your independent variables in the All Series list.
To select multiple names, hold down either the <Ctrl> key or the <Shift> key or drag the mouse over the list.

b. Click >> next to the Independent Variables list.
The variables move to the Independent Variables list.

4.To lag independent variable data by a number of time periods:

a. Select a variable from the Independent Variable list.

b. Enter a number in the Lag field at the bottom of the list.

c. Repeat for any other independent variables you want to lag.
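Lagging an independent variable by k periods means that the dependent variable in period t is paired with the independent variable from period t-k. A minimal sketch of that alignment is shown below; the function name is hypothetical and how CB Predictor stores lagged data internally is not described in this manual excerpt.

```python
def lag_pairs(independent, dependent, lag):
    """Pair each dependent value with the independent value `lag` periods earlier."""
    if len(independent) != len(dependent):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(independent):
        raise ValueError("lag must be between 0 and the series length minus 1")
    # The first `lag` dependent values have no earlier partner and are dropped.
    return list(zip(independent[:len(independent) - lag], dependent[lag:]))

pairs = lag_pairs([10, 20, 30, 40], [1, 2, 3, 4], lag=1)
```

Each lagged period costs one usable data point, which is consistent with the requirement that the number of historical data points be at least three times the lag.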

5.For any independent variables you don’t want to forecast:

a. Select the variable from the Independent Variable list.

b. Select the Do Not Forecast setting at the bottom of the list.

c. Repeat for any other independent variables you don’t want CB Predictor to forecast.

6.Click OK.
The Data Attributes tab reappears.

7.Select the regression method to use, either standard, forward stepwise, or iterative stepwise.

8.If you selected a stepwise regression, you can set settings associated with stepwise regression.

a. Click Stepwise Options.
The Stepwise Options dialog appears as shown on page 108.

b. Set the settings. For more information on these settings, see “Regression methods.”

c. Click OK.
You return to the wizard.

9.If you want CB Predictor to calculate the regression equation without a constant (to force the resulting equation to pass through the mathematical origin), be sure the Include Constant setting is not selected.
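The effect of the Include Constant setting can be seen in a one-variable example. The sketch below is plain Python with hypothetical function names; it compares an ordinary least-squares line with a line forced through the origin, whose slope is Σxy / Σx².

```python
def fit_with_constant(x, y):
    """Least-squares (constant, slope) for y = constant + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_through_origin(x, y):
    """Least-squares slope for y = slope * x (no constant term)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

x, y = [1, 2, 3], [2, 4, 7]
constant, slope = fit_with_constant(x, y)
origin_slope = fit_through_origin(x, y)
```

With the constant included, the fitted line is about y = -0.67 + 2.5x; without it, the line is about y = 2.21x and passes through (0, 0).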

