Image Processing
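This is my first time importing my Markdown notes into WordPress, and there are quite a few formatting problems: inline formulas and images disappear on import, which is a headache. I will fix these gradually. The notes cover the basics from week 2 to week 8.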
Spatial domain operations
Neighbourhood operations
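Spatial filtering operates on groups of pixels, so it is also simply called filtering.
Convolution of an image $f(x,y)$ with a kernel $h(x,y)$:
$$g(x,y) = (f * h)(x,y) = \sum_{i}\sum_{j} f(x-i,\,y-j)\,h(i,j) = \sum_{i}\sum_{j} f(i,j)\,h(x-i,\,y-j)$$
A kernel (convolution filter) changes selected features of the image. The two sums are equivalent readings of the same operation:
- First form: the image coordinates are flipped
- Second form: the kernel coordinates are flipped
- Either way, the kernel is flipped and shifted across the image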
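A minimal sketch of 2D convolution in plain Python, assuming the image and the kernel are lists of rows and that pixels outside the image count as zero (zero padding). The name `convolve2d` is made up for illustration. Convolving a single bright pixel with a kernel should reproduce the kernel itself, which is a quick check that the flip is handled correctly.

```python
def convolve2d(image, kernel):
    """Same-size 2D convolution; pixels outside the image are treated as 0."""
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    r_off, c_off = k_rows // 2, k_cols // 2  # position of the kernel centre
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            total = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    # Flipped access: the kernel offset is subtracted from the
                    # output position, as in g(x, y) = sum f(x - i, y - j) h(i, j).
                    yy, xx = y - (i - r_off), x - (j - c_off)
                    if 0 <= yy < rows and 0 <= xx < cols:
                        total += image[yy][xx] * kernel[i][j]
            out[y][x] = total
    return out


impulse = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
response = convolve2d(impulse, kernel)  # equals the kernel, not its mirror image
```

Without the flip (adding the offset instead of subtracting it) the same loop computes correlation, and the impulse response would come out mirrored.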

The kernel does not always fit inside the image
This is the border problem. Some solutions:
- Padding
- Clamping
- Wrapping (not often used)
- Mirroring

Some filters
- Smoothing filters are often used for image blurring and noise reduction
- The simplest smoothing filter is the uniform filter
- Gaussian filter
- Median filter, also called an order-statistics filter
The median filter can eliminate isolated random noise pixels
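A small sketch of a 3×3 median filter in plain Python, assuming a grayscale image stored as a list of rows. Borders are handled by clamping, one of the border strategies listed earlier. The function name `median_filter3x3` is hypothetical.

```python
from statistics import median


def median_filter3x3(image):
    """Replace every pixel by the median of its 3x3 neighbourhood."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(rows):
        for x in range(cols):
            # Clamping: coordinates outside the image are moved to the nearest edge pixel.
            window = [
                image[min(max(y + dy, 0), rows - 1)][min(max(x + dx, 0), cols - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = median(window)
    return out


noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]  # one outlier pixel
cleaned = median_filter3x3(noisy)
```

A uniform filter would smear the outlier over its neighbours, while the median simply discards it.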

Spatial domain operations
- operate directly on the pixels of an image, e.g. point operations and neighbourhood operations
Transform domain operations
- mainly in Fourier space
Spatial vs frequency domain
- Spatial: changes in pixel position correspond to changes in the scene
- Frequency: changes in position correspond to changes in spatial frequency
• High frequencies correspond to rapidly changing intensities across pixels
• Low-frequency components correspond to large-scale image structures
Images are processed in the frequency domain through the Fourier transform:

Fourier transform (1D)
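As a reminder, the continuous 1D Fourier transform is commonly written as $F(u) = \int_{-\infty}^{\infty} f(x)\, e^{-i 2\pi u x}\, dx$. The sketch below computes its discrete counterpart (the DFT) directly from the definition. It is an illustration only: the name `dft` is made up, and the direct sum costs O(N²), so real code would use an FFT routine instead.

```python
import cmath


def dft(signal):
    """Discrete Fourier transform: F[u] = sum_x f[x] * exp(-2*pi*i*u*x / N)."""
    n = len(signal)
    return [
        sum(signal[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
        for u in range(n)
    ]


constant = dft([1, 1, 1, 1])        # all energy in the zero-frequency term F[0]
alternating = dft([1, -1, 1, -1])   # all energy in the highest-frequency term F[2]
```

This matches the bullets above: a slowly varying signal lands in the low frequencies, a rapidly changing one in the high frequencies.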
Feature representation
Features are extracted for further processing, such as:
- object detection
- image segmentation
- image classification
- image retrieval
- image stitching
- object tracking
Desirable properties of features
Reproducibility (robustness)
- Should be detectable at the same locations in different images despite changes in illumination and viewpoint
Saliency (descriptiveness)
- Similar salient points in different images should have similar features
- Features should also be able to tell different objects apart
Compactness (efficiency)
- Fewer features
- Smaller features
- Lower dimensionality reduces computation and storage cost
Types of image features
| Colour features | Texture features | Shape features |
|---|---|---|
| Colour histogram | Haralick texture features | Basic shape features |
| Colour moment | Local binary patterns (LBP) | Shape context |
| | | Histogram of oriented gradients (HOG) |
Colour features (colour is the simplest feature to compute)
Colour histogram: Represent the global distribution of pixel colours in an image
– Step 1: Construct a histogram for each colour channel (R, G, B)
– Step 2: Concatenate the histograms (vectors) of all channels as the final feature vector
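The two steps can be sketched in plain Python, assuming the image is given as a flat list of `(r, g, b)` tuples with 8-bit values. The function name and the `bins` parameter are illustrative choices, not from the notes.

```python
def colour_histogram(pixels, bins=4):
    """Per-channel histograms of 8-bit RGB pixels, concatenated as [R..., G..., B...]."""
    width = 256 // bins  # intensity range covered by one bin
    hist = [0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # min() keeps the top values in the last bin when bins does not divide 256.
            hist[channel * bins + min(value // width, bins - 1)] += 1
    return hist


pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255)]
feature = colour_histogram(pixels, bins=2)
```

Because the histogram is global, two images with the same colours in different places get the same feature vector.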

Colour moment
Moment-based representation of colour distributions
– Gives a feature vector of only 9 elements for RGB images (three moments, typically mean, standard deviation, and skewness, for each of the three channels)
– Lower representation capability than the colour histogram
Texture features
Texture describes the visual characteristics and appearance of an object. Texture features are used especially for texture classification.
Haralick texture features
• Array of statistical descriptors of image patterns
• Capture the spatial relationship between neighbouring pixels
The gray-level co-occurrence matrix (GLCM) counts how often pairs of gray values occur together at a given distance and angle. Statistics extracted from it, such as contrast, correlation, energy, and homogeneity, describe the texture pattern.
1: Construct the gray-level co-occurrence matrix (GLCM)
2: Compute the Haralick feature descriptors from the GLCM
Example: Haralick features are often used in medical imaging studies due to their simplicity and interpretability
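Local binary patterns (LBP)
LBP describes the spatial structure of local image texture. Select a local region (usually a small window), compare the centre pixel with each surrounding pixel to produce a binary pattern (an 8-bit number), then count how often each pattern occurs to form a histogram. LBP can be made multi-resolution and rotation-invariant, which makes it well suited to texture classification.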
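A minimal sketch of the basic 3×3 LBP in plain Python, assuming a grayscale image as a list of rows. The bit order (clockwise from the top-left neighbour) and the `>=` comparison are conventions chosen here for illustration, and this basic variant is not rotation-invariant.

```python
# Neighbour offsets, clockwise starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, y, x):
    """8-bit pattern: bit k is set when neighbour k is >= the centre pixel."""
    centre = image[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        if image[y + dy][x + dx] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all pixels that have a full 3x3 neighbourhood."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            hist[lbp_code(image, y, x)] += 1
    return hist


patch = [[9, 9, 9], [1, 5, 1], [1, 1, 1]]
code = lbp_code(patch, 1, 1)  # only the three top neighbours are >= 5
```

The 256-bin histogram is the texture feature that is passed on to a classifier.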
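Typical feature descriptor
- SIFT algorithm (Scale-Invariant Feature Transform)

SIFT features are very commonly used in image matching, image stitching, and object recognition. Matches are usually filtered with the nearest neighbour distance ratio (NNDR).
Descriptor matching: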
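A sketch of NNDR matching in plain Python, assuming descriptors are plain tuples of numbers (real SIFT descriptors have 128 dimensions) and that the second set contains at least two descriptors. The function name and the ratio threshold of 0.8 are assumptions made for illustration.

```python
import math


def match_nndr(desc_a, desc_b, ratio=0.8):
    """Keep a match only when the best distance is clearly smaller than the second best."""
    matches = []
    for i, a in enumerate(desc_a):
        # Distances from descriptor a to every descriptor in desc_b, closest first.
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches


desc_a = [(0, 0), (5, 5)]
desc_b = [(0, 1), (10, 10), (4, 6), (6, 4)]
matches = match_nndr(desc_a, desc_b)
```

The second descriptor in `desc_a` has two equally close candidates, so the ratio test rejects it as ambiguous; only the first one yields a match.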

How do we represent an entire image using a set of SIFT features?
Feature encoding:
Global encoding of local SIFT features
Most popular method: Bag-of-Words (BoW)
- Step 1: Create the vocabulary by k-means clustering of the local descriptors
- Step 2: Image encoding (the "visual words" in this "vocabulary" are used to represent an image)
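Step 2 can be sketched as follows, assuming the vocabulary (the k-means cluster centres from step 1) has already been computed and that descriptors are plain tuples. The function name and the normalisation to frequencies are choices made for this example.

```python
import math


def bow_encode(descriptors, vocabulary):
    """Histogram of nearest visual words, normalised to sum to 1."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        # Assign the descriptor to its closest visual word (cluster centre).
        nearest = min(range(len(vocabulary)), key=lambda k: math.dist(d, vocabulary[k]))
        hist[nearest] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


vocabulary = [(0, 0), (10, 10)]                  # two hypothetical visual words
descriptors = [(1, 0), (0, 2), (9, 9), (1, 1)]   # local descriptors of one image
encoding = bow_encode(descriptors, vocabulary)
```

Every image gets a vector of the same length (the vocabulary size), no matter how many local features it has.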
Shape features
Shape features can be used to identify and classify objects. Example: object recognition
Basic shape features
- Simple geometrical shape descriptors computed from the object contour, such as convex hull area, compactness, circularity ($4\pi A / P^2$, which equals 1 for a perfect circle), elongation, and eccentricity
- Boundary descriptors
Shape context
Shape context is a point-wise local feature descriptor
- Pick $n$ points $p_i$ on the contour of a shape
Example: shape matching
Histogram of oriented gradients (HOG)
- The HOG descriptor captures the local shape of an object through the distribution of gradient orientations within local regions (called "cells").
- Step 1: Calculate the gradient vector at each pixel
- Step 2: Construct the gradient histogram of all pixels in a cell
- Step 3: Generate detection-window level HOG descriptor
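Steps 1 and 2 for a single cell can be sketched in plain Python. The assumptions here are central-difference gradients, unsigned orientations in 0°–180°, 9 bins, and votes weighted by gradient magnitude; the function name is made up. Block normalisation and the window-level concatenation of step 3 are left out.

```python
import math


def cell_hog(cell, bins=9):
    """Gradient-orientation histogram of one cell (interior pixels only)."""
    hist = [0.0] * bins
    rows, cols = len(cell), len(cell[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            # The trailing % bins guards against an angle that rounds to exactly 180.
            hist[int(angle // (180.0 / bins)) % bins] += magnitude
    return hist


vertical_edge = [[0, 0, 10, 10]] * 4                                # intensity changes along x
horizontal_edge = [[0] * 4, [0] * 4, [10] * 4, [10] * 4]            # intensity changes along y
```

For the vertical edge all votes fall into the first bin (orientation 0°), and for the horizontal edge into the bin that contains 90°.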
Pattern Recognition
Basic concepts:
1. Pre-processing
2. Feature extraction
3. Feature descriptors
4. Feature vectors
5. Feature selection
6. Models
7. Training samples
8. Cost
9. Decision boundary
Feature vector: features represent knowledge about the object, such as [length, colour, lightness, …]
Pattern Recognition Models:
- generative models: Model the “mechanism” by which the data was generated
- discriminative models: Applicable to supervised learning tasks (labelled data)
Classification: common classifiers
A classifier performs object recognition by assigning a class label to an object, using the object description in the form of features.
- Binary classification: two-class tasks are the most common case
Nearest Class Mean Classifier:
For each class, compute the mean (centroid) of its training samples, then assign a new sample to the class whose mean is closest in Euclidean distance.
pros:
- Simple and fast
- Works well when classes are compact and far from each other
cons:
- Poor results for complex classes (multimodal, non-spherical)
- Cannot handle outliers and noisy data well
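A small sketch of the nearest class mean classifier in plain Python, assuming samples are tuples of numbers. The function names and the labels "A" and "B" are placeholders.

```python
import math


def fit_class_means(samples, labels):
    """Return {label: mean feature vector} computed from the training data."""
    sums, counts = {}, {}
    for x, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_ncm(means, x):
    """Assign x to the class whose mean is closest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(x, means[label]))


samples = [(1, 1), (3, 1), (10, 10), (12, 10)]
labels = ["A", "A", "B", "B"]
means = fit_class_means(samples, labels)
```

Training only stores one mean per class, which is why the method is fast but cannot represent multimodal classes.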
K-Nearest Neighbours Classifier
For a test sample, compute its distance to all training samples, select the K nearest neighbours, and decide the class by majority vote.
pros:
- ✓ Very simple and intuitive
- ✓ Easy to implement
- ✓ No a priori assumptions
- ✓ No training step
- ✓ Decision surfaces are non-linear
cons:
- × Slow algorithm for big data sets
- × Needs homogeneous (similar nature) feature types and scales
- × Does not perform well when the number of variables grows (curse of dimensionality)
- × Finding the optimal K (number of neighbours) to use can be challenging
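A brute-force K-nearest-neighbours sketch in plain Python, assuming Euclidean distance and tuple-valued samples. The function name and the toy data are illustrative only.

```python
import math
from collections import Counter


def predict_knn(samples, labels, x, k=3):
    """Majority vote among the k training samples closest to x."""
    nearest = sorted(zip(samples, labels), key=lambda pair: math.dist(x, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


samples = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
labels = ["A", "A", "A", "B", "B"]
query = (4, 4)
```

With this data the query is labelled "B" for K = 1 and K = 3 but "A" for K = 5, which shows how much the choice of K matters. Every prediction scans the whole training set, which is the reason the method is slow on big data sets.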
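Bayesian Decision Theory
- Use Bayes' formula to compute the posterior probability; the decision rule picks the class that maximises $p(c_i \mid x)$
- In fish classification, the prior probability (e.g. how often a species appeared in historical data) is combined with a feature (e.g. fish length) to compute the posterior probability of each class, and the class with the highest posterior is chosen.
- Posterior probability:
$$p(c_i \mid x) = \frac{p(x \mid c_i)\,p(c_i)}{p(x)}$$
- Bayesian decision rule:
$$c = \arg\max_i \, p(c_i \mid x), \quad \text{which is equivalent to} \quad c = \arg\max_i \, p(x \mid c_i)\,p(c_i)$$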
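A tiny sketch of the decision rule in plain Python, assuming the class-conditional likelihoods $p(x \mid c_i)$ for the observed $x$ and the priors $p(c_i)$ are already known. The function name, the labels "A" and "B", and the numbers are made up for illustration.

```python
def bayes_decide(likelihoods, priors):
    """Return the class with the highest posterior, plus all posteriors."""
    scores = {c: likelihoods[c] * priors[c] for c in priors}  # p(x|c) * p(c)
    evidence = sum(scores.values())                           # p(x)
    posteriors = {c: s / evidence for c, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors


priors = {"A": 0.7, "B": 0.3}        # class A is more common overall
likelihoods = {"A": 0.2, "B": 0.6}   # but the observed feature fits class B better
decision, posteriors = bayes_decide(likelihoods, priors)
```

Here the likelihood outweighs the prior, so the rule picks "B". Dividing by the evidence does not change the arg max, which is why the two forms of the rule are equivalent.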
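Bayesian Decision Risk
When different classification errors have different costs (for example, selling different fish species carries different economic costs), the decision is made by minimising the conditional risk:
$$R(\alpha_i \mid x) = \sum_j \lambda(\alpha_i \mid c_j)\, p(c_j \mid x)$$
where $\lambda(\alpha_i \mid c_j)$ is the loss for taking action $\alpha_i$ when the true class is $c_j$. $R(\alpha_i \mid x)$ is also called the conditional risk. The optimal Bayes decision strategy is to choose the action that minimises it.
pros: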
- ✓ Simple and efficient
- ✓ Considers uncertainties
- ✓ Permits combining new information with current knowledge
cons:
- × Struggles with complex data relationships
- × Choice of priors can be subjective
Decision Tree
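A decision tree asks a series of feature-based questions (for example, "is the fish longer than 80 cm?") and splits the samples step by step until they reach a leaf node; each leaf corresponds to a class.
Entropy and information gain from information theory are commonly used to decide which feature to split on at each node.
Entropy:
$$H(S) = -\sum_i p_i \log_2 p_i$$
Information gain:
$$IG(S, A) = H(S) - \sum_{v} \frac{|S_v|}{|S|}\, H(S_v)$$
where $p_i$ is the fraction of samples in $S$ that belong to class $i$, and $S_v$ is the subset of $S$ for which feature $A$ takes value $v$.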
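Entropy and information gain can be sketched in a few lines of plain Python, assuming class labels are given as lists and a candidate split is given as the list of label groups it produces. The function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, groups):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)


labels = ["a", "a", "b", "b"]
perfect_split = [["a", "a"], ["b", "b"]]   # children are pure
useless_split = [["a", "b"], ["a", "b"]]   # children are as mixed as the parent
```

The tree-building algorithm evaluates each candidate split this way and greedily takes the one with the largest gain.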

Pros:
- ✓ Easy to interpret
- ✓ Can handle both numerical and categorical data
- ✓ Robust to outliers and missing values
- ✓ Gives information on importance of features (feature selection)
Cons:
- × Tends to overfit
- × Only axis-aligned splits
- × Greedy algorithm (may not find the best tree)
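Random Forests
An ensemble method made up of many decision trees. Each tree is trained on a bootstrap sample and considers a random subset of the features at each split, which reduces the overfitting risk of a single decision tree. The final prediction is obtained by majority vote.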
pros:
✓ High accuracy among traditional classification algorithms for many problems
✓ Works efficiently and effectively on large datasets
✓ Handles thousands of input features without feature selection
✓ Handles missing values effectively
cons:
× Less interpretable than an individual decision tree
× More complex and more time-consuming to construct than decision trees
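Linear classifier (linearly separable data)
A binary classification problem can be modelled by a separation function $f(x)$, learned from the data, such that:
$$f(x_i) > 0 \text{ if } y_i = +1, \qquad f(x_i) < 0 \text{ if } y_i = -1$$
A linear classifier has the form:
$$f(x) = w^T x + b$$
where $w$ is the weight vector and $b$ is called the bias.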
Support Vector Machines (SVMs)
Hard-margin linear SVM:
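- Assuming the classes are perfectly linearly separable, the decision boundary is built by maximising the distance between the separating hyperplane and the nearest samples. The optimisation can be written as a quadratic programming problem, and Lagrangian duality gives its dual form.
- A hard-margin SVM does not allow any misclassification of samples
Soft Margin SVM:
- When the data is noisy or the classes overlap, slack variables are introduced so that some samples may violate the margin, trading classification errors against margin width.
- Allows some misclassification of samples; soft-margin SVMs are better able to handle noisy data.
Nonlinear SVM:
- Map the original data into a higher-dimensional feature space in which the data becomes linearly separable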
- Feature mapping into a higher dimensional space can be done using a kernel function which reduces the complexity of the optimization problem
pros:
✓ Very effective in high dimensional feature spaces
✓ Effective when the number of features is larger than the training data size
✓ Among the best algorithms when the classes are (well) separable
✓ Work very well when the data is sparse
✓ Can be extended to nonlinear classification via kernel trick
cons:
× For larger datasets it takes more time to process
× Does not perform well for overlapping classes
× Hyperparameter tuning needed for sufficient generalization
Multiclass classifier
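If there are more than two classes, we must build a multiclass classifier. Some methods can be used directly for multiclass classification:
– K-nearest neighbours
– Decision trees
– Bayesian techniques
Binary classifiers can be combined instead:
One-vs-rest: build one classifier per class that separates that class from all the others.
One-vs-one: build one classifier for every pair of classes and decide the final class by voting.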

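Evaluation of classification performance
Receiver operating characteristic (ROC) curve: shows the trade-off between the true-positive rate and the false-positive rate. The area under the ROC curve (AUC or AUROC) summarizes overall performance; the larger the AUC, the better the classifier.
Confusion matrix: shows how the predictions are distributed across classes. Values on the diagonal are correct classifications; off-diagonal values are misclassifications.
Binary confusion matrix
| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | TP | FN |
| Actual negative | FP | TN |

Precision / correctness: $\text{Precision} = \frac{TP}{TP + FP}$
Recall / sensitivity / completeness: $\text{Recall} = \frac{TP}{TP + FN}$
Precision measures how correct the positive predictions are, and recall measures how many of the actual positives are covered. The F1 score is commonly used as a combined metric.
F1 score: $F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$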
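A short sketch that computes the three metrics from confusion-matrix counts. The function name is made up, and returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


# Hypothetical counts: 8 hits, 2 false alarms, 8 misses.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```

In this example the precision is 0.8 and the recall is 0.5; the F1 score of about 0.62 sits closer to the weaker of the two, which is the point of using the harmonic mean.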
Image Segmentation
- Image classification: assigning a single label to the whole image
- Object localization / object detection: finding the bounding boxes of all relevant objects in the image
- Semantic segmentation: classifying each pixel in the whole image
- Instance segmentation: uniquely labelling the pixels of each object instance in the image
Basic segmentation methods
Thresholding: image → histogram → threshold → segmentation
- Works fine if regions have sufficiently different intensity distributions, but there are problems when regions have overlapping intensity distributions.
K-means clustering
- With the number of clusters K specified in advance, cluster the features (for example colour or texture) of pixels or image patches to divide the image into K regions. This is a problem if the number of clusters is not known a priori.
Feature-based pixel classification
- Extract local features for each pixel (for example neighbourhood colour or texture statistics), then classify each pixel with a trained classifier to produce the segmentation.
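Advanced segmentation methods
Region splitting and merging
- First split the whole image into sub-regions, then merge similar regions based on region statistics (such as mean and variance).
- Suitable for further refining connected regions after image pre-processing.
Watershed segmentation
Treat the image as a topographic surface in which pixel intensity is the height. "Flood" it starting from the local minima until the water from different regions meets. The watershed lines are the region boundaries.

Noise in the original image can cause over-segmentation, so pre-processing (such as smoothing) and post-processing (such as region merging) are usually needed.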
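Maximally stable extremal regions (MSER)
- Vary the threshold step by step, observe how the shape of the connected regions changes, and select the regions that stay stable over many thresholds as candidate segments.
- Fairly robust in tasks such as text detection and scene understanding.
Mean shift
- Mean shift is a clustering method based on density estimation. It iteratively moves a sampling window towards the region of highest local density to find the peaks of the data distribution.
- Pros: no need to specify the number of clusters in advance, and fairly robust to noise. Cons: computationally expensive, and the result depends on the window size parameter.
Superpixel segmentation
- Group similar pixels into superpixels, which serve as the basic units for later segmentation and classification. A common method is SLIC (simple linear iterative clustering). Advantages: lower computational cost and preserved boundary information, which suits later high-level segmentation or object recognition tasks.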
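Conditional random field (CRF)
- Uses a probabilistic graphical model to cast segmentation as an energy minimisation problem. It combines information from the nodes (superpixels) and the edges (similarity between adjacent regions) to obtain a globally consistent segmentation. Often used to refine an initial segmentation.
Active contour segmentation (snakes)
- Define an initial curve, then adjust it iteratively using internal smoothness forces and external image-gradient forces so that the curve fits the object edge. Needs a good initialisation and tends to fail when edges are weak or noise is strong.
Level-set segmentation
- The level-set method represents the segmentation curve with an implicit function and solves a partial differential equation so that the zero level set evolves into the object boundary. It handles complex shapes and topology changes (such as splitting and merging) automatically, but is computationally expensive.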
Evaluating segmentation
Segmented object pixels: $S$. True object pixels: $T$.

- True positives: $TP = S \cap T$, pixels correctly segmented as object
- True negatives: $TN = S^c \cap T^c$, pixels correctly segmented as background
- False positives: $FP = S \cap T^c$, pixels incorrectly segmented as object
- False negatives: $FN = S^c \cap T$, pixels incorrectly segmented as background
- Sensitivity (= true-positive rate): $TPR = \frac{|TP|}{|TP| + |FN|}$
- Specificity (= true-negative rate): $TNR = \frac{|TN|}{|TN| + |FP|}$
Jaccard and Dice similarity coefficients
Jaccard similarity coefficient (JSC, also called IoU): $JSC = \frac{|S \cap T|}{|S \cup T|}$
Dice similarity coefficient (DSC): $DSC = \frac{2\,|S \cap T|}{|S| + |T|}$
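Both coefficients can be sketched with Python sets, assuming each segmentation is given as a set of `(row, col)` coordinates of object pixels. The function name is hypothetical, and returning 1.0 when both sets are empty is a convention chosen here.

```python
def jaccard_dice(segmented, truth):
    """Jaccard (IoU) and Dice coefficients of two sets of object-pixel coordinates."""
    intersection = len(segmented & truth)
    union = len(segmented | truth)
    jaccard = intersection / union if union else 1.0
    total = len(segmented) + len(truth)
    dice = 2 * intersection / total if total else 1.0
    return jaccard, dice


S = {(0, 0), (0, 1), (1, 0)}   # segmented object pixels
T = {(0, 1), (1, 0), (1, 1)}   # true object pixels
jsc, dsc = jaccard_dice(S, T)
```

The two measures are linked by $DSC = 2\,JSC / (1 + JSC)$, so they always rank segmentations in the same order.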
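Improve image segmentation results
Morphological operations can clean up a segmentation:
- erosion: removes small, meaningless objects
- dilation: fills holes
- opening: erosion followed by dilation; removes small objects, smooths object contours, and breaks thin connections
- closing: dilation followed by erosion; fills small holes in foreground objects, connects nearby objects, smooths contours, and removes small dark regions
- gradient: the morphological gradient, which gives the object outline
- top-hat: the difference between the original image and its opening
- black-hat: the difference between the closing of the image and the original image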
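A plain-Python sketch of the first four operations on a binary mask, assuming a 3×3 square structuring element. Near the border the window is simply clipped to the image, which is one possible convention. All function names are made up for this example.

```python
def _morph(image, pick):
    """Apply pick (min or max) over the 3x3 neighbourhood of every pixel."""
    rows, cols = len(image), len(image[0])
    out = []
    for y in range(rows):
        row = []
        for x in range(cols):
            # The window is clipped at the image border.
            window = [
                image[yy][xx]
                for yy in range(max(y - 1, 0), min(y + 2, rows))
                for xx in range(max(x - 1, 0), min(x + 2, cols))
            ]
            row.append(pick(window))
        out.append(row)
    return out


def erode(image):
    return _morph(image, min)


def dilate(image):
    return _morph(image, max)


def opening(image):
    return dilate(erode(image))  # erosion first, then dilation


def closing(image):
    return erode(dilate(image))  # dilation first, then erosion


speck = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
holed = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
```

Opening `speck` removes the isolated pixel in the corner while keeping the 3×3 block, and closing `holed` fills the one-pixel hole.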
Deep learning
Neural Networks
- Linear classifiers: image classification with a linear classifier

Activation function:

Convolutional Neural Networks (CNNs)
Convolution layer
ReLU activation
Pooling layer
Flattening
Fully Connected (FC) Layers
Output Layer
Stride
padding
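Stride and padding determine the spatial size of a convolution layer's output. A commonly used relation is $O = \lfloor (W + 2P - K) / S \rfloor + 1$ for input size $W$, kernel size $K$, padding $P$, and stride $S$. The helper below is a sketch of that formula for one spatial dimension; its name and parameter names are illustrative.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one dimension: floor((W + 2P - K) / S) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


no_padding = conv_output_size(32, 5)                        # the output shrinks
same_size = conv_output_size(32, 3, padding=1)              # padding keeps the size
downsampled = conv_output_size(224, 7, stride=2, padding=3) # stride 2 halves the size
```

The same formula applies to pooling layers, with the pooling window size in place of the kernel size.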
