
Data Mining Exercises (in English)

Source: 意榕旅游网
2.4. Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.

(a) What is the mean of the data? What is the median?

(b) What is the mode of the data? Comment on the data's modality (i.e., bimodal, trimodal, etc.).

(c) What is the midrange of the data?

(d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?

(e) Give the five-number summary of the data.

(f) Show a boxplot of the data.

(g) How is a quantile-quantile plot different from a quantile plot?
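Parts (a) to (e) can be checked numerically. The following is a minimal sketch, not an official solution: the function name `summarize` is hypothetical, and the quartiles use the default ("exclusive") convention of Python's `statistics.quantiles`. Other quartile conventions can give slightly different values, which is consistent with the exercise asking only for rough quartiles.

```python
from statistics import mean, median, multimode, quantiles

# Age values copied from exercise 2.4.
AGES = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    data = sorted(values)
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, _, q3 = quantiles(data, n=4)
    return {
        "mean": mean(data),
        "median": median(data),
        "modes": multimode(data),  # every value tied for the highest count
        "midrange": (data[0] + data[-1]) / 2,
        "five_number": (data[0], q1, median(data), q3, data[-1]),
    }


if __name__ == "__main__":
    for name, value in summarize(AGES).items():
        print(f"{name}: {value}")
```

The number of entries in `modes` answers the modality question in part (b): two entries means the data is bimodal.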

2.9. Suppose a hospital tested the age and body fat data for 18 randomly selected adults with the following results:

age 23 23 27 27 39 41 47 49 50

%fat 9.5 26.5 7.8 17.8 31.4 25.9 27.4 27.2 31.2

age 52 56 57 58 58 60 61

%fat 34.6 42.5 28.8 33.4 30.2 34.1 32.9 41.2 35.7

(a) Calculate the mean, median and standard deviation of age and %fat.

(b) Draw the boxplots for age and %fat.

(c) Draw a scatter plot and a q-q plot based on these two variables.

(d) Normalize the two variables based on z-score normalization.

(e) Calculate the correlation coefficient (Pearson's product-moment coefficient). Are these two variables positively or negatively correlated?

2.11. Use the two methods below to normalize the following group of data:

200; 300; 400; 600; 1000

(a) min-max normalization by setting min = 0 and max = 1

(b) z-score normalization
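Both normalizations in exercise 2.11 are short formulas, so a sketch may help when checking hand calculations. The function names `min_max` and `z_score` are hypothetical. The z-score here assumes the population standard deviation (dividing by n); using the sample standard deviation (dividing by n - 1) would give slightly smaller magnitudes.

```python
from statistics import mean, pstdev

# Data copied from exercise 2.11.
VALUES = [200, 300, 400, 600, 1000]


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct values")
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


def z_score(values):
    """Center on the mean and divide by the population standard deviation (assumption)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-score normalization needs non-constant data")
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    print(min_max(VALUES))
    print(z_score(VALUES))
```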

4.4. Suppose that a base cuboid has three dimensions A, B, C, with the following number of cells: |A| = 1,000,000, |B| = 100, and |C| = 1000. Suppose that each dimension is evenly partitioned into 10 portions for chunking.

(a) Assuming each dimension has only one level, draw the complete lattice of the cube.

(b) If each cube cell stores one measure with 4 bytes, what is the total size of the computed cube if the cube is dense?

(c) State the order for computing the chunks in the cube that requires the least amount of space, and compute the total amount of main memory space required for computing the 2-D planes.
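For parts (a) and (b), the lattice and the dense cube size can be enumerated directly. This is a sketch under one stated assumption: every cuboid in the lattice is fully materialized and dense, so its cell count is the product of the cardinalities of its dimensions, with the apex cuboid holding a single cell. The names `cuboid_sizes` and `dense_cube_bytes` are hypothetical. Part (c) is not covered here.

```python
from itertools import combinations
from math import prod

# Cardinalities copied from exercise 4.4.
CARDINALITIES = {"A": 1_000_000, "B": 100, "C": 1000}


def cuboid_sizes(cardinalities):
    """Map each cuboid (a tuple of dimension names) to its dense cell count."""
    names = sorted(cardinalities)
    sizes = {}
    for k in range(len(names) + 1):
        for dims in combinations(names, k):
            # The apex cuboid is the empty tuple; prod of nothing is 1 cell.
            sizes[dims] = prod(cardinalities[d] for d in dims)
    return sizes


def dense_cube_bytes(cardinalities, bytes_per_cell=4):
    """Total size in bytes when every cuboid of the lattice is stored densely."""
    return sum(cuboid_sizes(cardinalities).values()) * bytes_per_cell


if __name__ == "__main__":
    for dims, cells in cuboid_sizes(CARDINALITIES).items():
        print(dims or ("apex",), cells)
    print(dense_cube_bytes(CARDINALITIES), "bytes")
```

Summing over all subsets of dimensions is equivalent to multiplying (cardinality + 1) across the dimensions, which gives a quick cross-check.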

5.3. A database has five transactions. Let min_sup = 60% and min_conf = 80%.

(a) Find all frequent itemsets using Apriori and FP-growth, respectively. Compare the efficiency of the two mining processes.

(b) List all of the strong association rules (with support s and confidence c) matching the following metarule, where X is a variable representing customers, and item_i denotes variables representing items (e.g., "A"):

∀x ∈ transaction, buys(X, item_1) ∧ buys(X, item_2) ⇒ buys(X, item_3) [s, c]

6.14. The following table shows the midterm and final exam grades obtained for students in a database course.

(a) Plot the data. Do x and y seem to have a linear relationship?

(b) Use the method of least squares to find an equation for the prediction of a student's final exam grade based on the student's midterm grade in the course.

(c) Predict the final exam grade of a student who received an 86 on the midterm exam.
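The grade table for exercise 6.14 is not reproduced on this page, so the sketch below only shows the least-squares computation itself and contains no exam data. The function names `fit_line` and `predict` are hypothetical; `xs` would hold the midterm grades and `ys` the final grades.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return intercept + slope * x
```

For part (c), calling `predict(86, slope, intercept)` with the coefficients fitted on the table's data gives the predicted final grade.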

7.2. Given the following measurements for the variable age:

18; 22; 25; 42; 28; 43; 33; 35; 56; 28;

standardize the variable by the following:

(a) Compute the mean absolute deviation of age.

(b) Compute the z-score for the first four measurements.
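A sketch of the two steps in exercise 7.2 follows. It assumes that the z-score in part (b) divides by the mean absolute deviation from part (a) rather than by the standard deviation, which is how the exercise's "standardize the variable by the following" wording reads. The function names are hypothetical.

```python
# Measurements copied from exercise 7.2.
AGES = [18, 22, 25, 42, 28, 43, 33, 35, 56, 28]


def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    m = sum(values) / len(values)
    return sum(abs(v - m) for v in values) / len(values)


def standardize(values):
    """z-scores that use the mean absolute deviation as the scale (assumption)."""
    m = sum(values) / len(values)
    s = mean_absolute_deviation(values)
    if s == 0:
        raise ValueError("cannot standardize constant data")
    return [(v - m) / s for v in values]


if __name__ == "__main__":
    print(mean_absolute_deviation(AGES))
    print(standardize(AGES)[:4])  # first four measurements, as in part (b)
```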

7.3. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):

(a) Compute the Euclidean distance between the two objects.

(b) Compute the Manhattan distance between the two objects.

(c) Compute the Minkowski distance between the two objects, using p = 3.
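All three distances in exercise 7.3 are instances of the Minkowski distance, with p = 1 for Manhattan and p = 2 for Euclidean. A minimal sketch, with the hypothetical function name `minkowski`:

```python
# Tuples copied from exercise 7.3.
X = (22, 1, 42, 10)
Y = (20, 0, 36, 8)


def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length tuples."""
    if len(a) != len(b):
        raise ValueError("tuples must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1 / p)


if __name__ == "__main__":
    print("Euclidean:", minkowski(X, Y, 2))
    print("Manhattan:", minkowski(X, Y, 1))
    print("Minkowski (p = 3):", minkowski(X, Y, 3))
```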

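Example 2.4.3: Given 20 samples, each with 2 features, with the data distributed as shown in the figure below, use the C-means method to classify the samples (C = 2).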
