marblech – Page 2

海思3403-SS928 yolov5 c++开发日记（2）

昨天，把供应商提供的（据称）官方的SDK包里的python 预处理的坑给填了，下面是这个python 的原码，然后分析里面的坑
import os import numpy as np from PIL import Imagedef process(input_path): try: input_image = Image.open(input_path) input_image = input_image.resize((640, 640),resample=Image.BILINEAR) # hwc img = np.array(input_image) # rgb to bgr img = img[:, :, ::-1] shape = img.shape img = img.astype("uint8") # img = img.astype("float32") img = img.reshape([1] + list(shape)) result = img.transpose([0, 3, 1, 2]) output_name = input_path.split('.')[0] + "_yolov5.bin" result.tofile(output_name) except Exception as except_err: print(except_err) return 1 else: return 0

这段python 代码作做主要是为了对图片进行尺寸、和输入数据统一化的处理。yolov5的输入一般是640×640 的尺寸是以 NCHW 的布局排列，其中N 是批次数一船是1 ，C是通道数一般是3 （代表RGB 三个通道），H是 height 高度，W是width 宽度，所以这个代码主要作用是为了把图片的长，宽转成640×640。然后把图片转为numpy 数组，增加一个批次维度，然后再把数组的shaep 即布局转为 NCHW 即[1,3,640,640]

但从这个代码看，他并没有考虑图像的长宽比因为resize 的强行处理，被强行扭曲了。这将导致图像的内容会被压扁扭曲了。在一般的模型，包括官方的，或者使用自行标注的数据集，都不会使用这样扭曲了的图像进行训练。需要某些时候会使用一些方法对原始采集的图片进行一些人工处理凭空造一些数据。但这么扭曲压扁的是比较少的。所以，这么对输入图像进行处理是会造成模型识别结果不正确的问题。这是问题其一。

其二，这个代码使用Pillow包进行图像的处理，PIL的Image.open 是把原始图像以RGB的色域格式进行读取处理。但他在下面的代码里进行了 img = img[:, :, ::-1] 处理，这个处理是把RGB 转为 BGR 。这么干其实是错的，也可能是多此一举。原因是，yolov5 的输入图像是RGB 格式的，因为训练的程序就是以RGB的格式进行训练的，所以模型是不能识别BGR格式的图像。在供应商的提拱的官方文档中说到了一个–-insert_op_conf= 参数。经过我对这个文档的研究和对照yolov5 的官方算法处理流程。个人认为，这个其实是定义一个预处理的的方式和参数的东西。相当于把预处理以预先定义好的参数文件整合到模型里。里面提到了把BGR转为RGB的方法，这里可以以使用这个把BGR的输入图像转回RGB再输入到真正模型进行处理。但个人认为，这么做是能对应上python 文件的RGB 转BGR的操作。但这纯粹的多此一举。因为我的第一目标是要把模型推理在NPU上跑起来，且跑正确了，下一步再考虑性能的问题。所以我是直接在模型转换时不加 –insert_op_conf 这个参数。直接是：

atc –model=./yolov5s_v6.2.onnx –framework=5 –input_shape=”images:1,3,640,640″ –output=v5s_o –soc_version=”OPTG” –output_type=FP32

其三，也是最后的坑，即使修改这个python 程序，把RGB 的转换去掉了，再把图像resize 这个处理改掉，保持原图模宽比。但最后输出的.bin 文件，再调用官方的库读进去处理，模型识别的结果还是错的。原因，目前还不明，估计是保存或读取文件到内存的方式哪里不对导致最后读进去处理，输入到模型的数据是错的。在这个问题是我花了大量的时间来调试代码。一开始以为是后处理代码的问题，怎么都不对。然后才怀疑前处理的数据问题。最后，解决方法是，直接不要官方的这个python 代码生成.bin 文件，把他这个处理，直接使用c++重写。然后，把处理后数据直接输入到 ACL 库的对应输入数据函数。然后，调用ACL库进行模型推理。

以下是整个过程的完整代码，里面有部份是调试过程中产生的无效代码，也有些是走了弯路的代码，懒得整理重构去掉了，先记下来再说：

#include <iostream>

#include <map>

#include <sstream>

#include <algorithm>

#include <functional>

#include <sys/stat.h>

#include <fstream>

#include <cstring>

#include <sys/time.h>

using namespace std;

#include “acl/acl.h”

#include “opencv2/opencv.hpp”

#define INFO_LOG(fmt, …) fprintf(stdout, “[INFO] ” fmt “\n”, ##__VA_ARGS__)

#define WARN_LOG(fmt, …) fprintf(stdout, “[WARN] ” fmt “\n”, ##__VA_ARGS__)

#define ERROR_LOG(fmt, …) fprintf(stderr, “[ERROR] ” fmt “\n”, ##__VA_ARGS__)

typedef enum Result {

SUCCESS=0,

FAILED=1

} Result;

bool g_isDevice = false;

const char *aclConfigPath = “acl.json”;

const char *MODEL_PATH=”model/v5s.om”;

// const char *PIC=”data/t_yolov5.bin”;

const char *PIC=”data/output2.jpg”;

const int loop_count=11;

// YOLOv5 相关参数

const int INPUT_SIZE = 640;

const float CONF_THRESHOLD = 0.5;

const float NMS_THRESHOLD = 0.45;

const int NUM_CLASSES = 80;

const float OBJ_THRESH = 0.5;

const float NMS_THRESH = 0.4;

// 定义检测框结构体

struct Box {

floatx, y, w, h,x1,y1,x2,y2;

floatscore;

floatconfidence;

floatclass_score;

intclass_id;

};

// 定义检测结果结构体

struct Detection {

Boxbox;

floatscore;

intclass_id;

};

// 定义NMS输出结构

struct DetectionResult {

floatx, y, w, h; // 中心点坐标和宽高

floatprob; // 最终概率

intobj_id; // 类别ID

};

// 定义检测框结构

struct bbox_t {

cv::Rectbox;

floatx, y, w, h;

floatscore;

floatprob;

intobj_id;

intclass_id;

};

// sigmoid激活函数

inline float sigmoid(float x) {

return1.0f/ (1.0f+expf(-x));

}

/// <summary>

/// preprocess image

/// </summary>

/// <param name=”image”></param>

/// <param name=”target_size”></param>

/// <returns></returns>

cv::Mat preprocess_image(const cv::Mat& frame) {

// Format frame

intw=frame.cols;

inth=frame.rows;

int_max=std::max(h, w);

cv::Matimage=cv::Mat::zeros(cv::Size(_max, _max), CV_8UC3);

cv::Rectroi(0, 0, w, h);

frame.copyTo(image(roi));

// Fix bug, boxes consistency!

floatx_factor=image.cols/static_cast<float>(640);

floaty_factor=image.rows/static_cast<float>(640);

cv::Matblob=cv::dnn::blobFromImage(image, 1/255.0, cv::Size(640, 640), cv::Scalar(0, 0, 0), true, false);

// size_t tpixels = model_session.input_model_height * model_session.input_model_width * 3;

// std::array<int64_t, 4> input_shape_info{ 1, 3, model_session.input_model_height, model_session.input_model_width };

// return { blob, tpixels, input_shape_info, x_factor, y_factor };

returnblob;

}

// 计算两个边界框的IoU(交并比)

float calculateIoU(const Box& box1, const Box& box2) {

floatx1=std::max(box1.x1, box2.x1);

floaty1=std::max(box1.y1, box2.y1);

floatx2=std::min(box1.x2, box2.x2);

floaty2=std::min(box1.y2, box2.y2);

floatintersection_area=std::max(0.0f, x2-x1) *std::max(0.0f, y2-y1);

floatbox1_area= (box1.x2-box1.x1) * (box1.y2-box1.y1);

floatbox2_area= (box2.x2-box2.x1) * (box2.y2-box2.y1);

floatunion_area=box1_area+box2_area-intersection_area;

returnintersection_area/union_area;

}

// 计算IOU

float calculate_iou(const Box& box1, const Box& box2) {

// 计算每个框的左上角和右下角坐标

floatx1_min=box1.x-box1.w/2;

floaty1_min=box1.y-box1.h/2;

floatx1_max=box1.x+box1.w/2;

floaty1_max=box1.y+box1.h/2;

floatx2_min=box2.x-box2.w/2;

floaty2_min=box2.y-box2.h/2;

floatx2_max=box2.x+box2.w/2;

floaty2_max=box2.y+box2.h/2;

// 计算交集区域

floatinter_x_min=std::max(x1_min, x2_min);

floatinter_y_min=std::max(y1_min, y2_min);

floatinter_x_max=std::min(x1_max, x2_max);

floatinter_y_max=std::min(y1_max, y2_max);

floatinter_width=std::max(0.0f, inter_x_max-inter_x_min);

floatinter_height=std::max(0.0f, inter_y_max-inter_y_min);

floatinter_area=inter_width*inter_height;

// 计算并集区域

floatbox1_area=box1.w*box1.h;

floatbox2_area=box2.w*box2.h;

floatunion_area=box1_area+box2_area-inter_area;

// 计算IOU

returninter_area/union_area;

}

// 非极大值抑制

std::vector<Box> nonMaxSuppression(const std::vector<Box>& boxes, float threshold) {

std::vector<Box>result;

if (boxes.empty()) returnresult;

// 按置信度降序排列边界框

std::vector<Box>sorted_boxes=boxes;

std::sort(sorted_boxes.begin(), sorted_boxes.end(),

[](const Box& a, const Box& b) {

returna.score>b.score;

});

std::vector<bool>is_removed(sorted_boxes.size(), false);

for (size_ti=0; i<sorted_boxes.size(); ++i) {

if (is_removed[i]) continue;

result.push_back(sorted_boxes[i]);

// 移除与当前框IoU较高的其他框

for (size_tj=i+1; j<sorted_boxes.size(); ++j) {

if (is_removed[j]) continue;

// 只比较同一类别的边界框

if (sorted_boxes[i].class_id==sorted_boxes[j].class_id) {

floatiou=calculateIoU(sorted_boxes[i], sorted_boxes[j]);

if (iou>=threshold) {

is_removed[j]=true;

}

returnresult;

}

// 计算 IOU（交并比）

float iou(const Box& a, const Box& b) {

floatinterArea=std::max(0.0f, std::min(a.x+a.w, b.x+b.w) -std::max(a.x, b.x)) *

std::max(0.0f, std::min(a.y+a.h, b.y+b.h) -std::max(a.y, b.y));

floatunionArea=a.w*a.h+b.w*b.h-interArea;

returninterArea/unionArea;

}

// 非极大值抑制

std::vector<Detection> nms(const std::vector<Detection>& detections, float iou_threshold) {

std::vector<Detection>result;

std::vector<bool>suppressed(detections.size(), false);

for (size_ti=0; i<detections.size(); ++i) {

if (suppressed[i]) continue;

result.push_back(detections[i]);

for (size_tj=i+1; j<detections.size(); ++j) {

if (iou(detections[i].box, detections[j].box) >iou_threshold) {

suppressed[j]=true;

}

returnresult;

}

void draw_boxes(cv::Mat& image, const std::vector<DetectionResult>& boxes) {

for(size_ti=0; i<boxes.size(); i++) {

intidx=boxes[i].obj_id;

cv::Rectrect={(int)boxes[i].x,(int)boxes[i].y,(int)boxes[i].w,(int)boxes[i].h};

cv::rectangle(image, rect, cv::Scalar(0, 0, 255), 2, 8);

cv::rectangle(image, cv::Point(boxes[i].x, boxes[i].y-20),

cv::Point(boxes[i].x, boxes[i].y), cv::Scalar(0, 255, 255), -1);

putText(image, to_string(idx), cv::Point(boxes[i].x, boxes[i].y), cv::FONT_HERSHEY_PLAIN, 2.0, cv::Scalar(255, 0, 0), 2, 8);

}

vector<cv::Rect> xywh2xyxy(const vector<cv::Rect2f> &boxes) {

vector<cv::Rect>output_boxes;

for (constauto&box : boxes) {

floatx1=box.x-box.width/2.0;

floaty1=box.y-box.height/2.0;

floatx2=box.x+box.width/2.0;

floaty2=box.y+box.height/2.0;

output_boxes.emplace_back(cv::Rect(cv::Point(x1, y1), cv::Point(x2, y2)));

}

returnoutput_boxes;

}

// 过滤低置信目标

void filter_boxes(vector<vector<float>> &boxes, vector<float> &confidences, vector<int> &class_ids) {

vector<vector<float>>filtered_boxes;

vector<float>filtered_confidences;

vector<int>filtered_classes;

for (size_ti=0; i<confidences.size(); i++) {

if (confidences[i]>=OBJ_THRESH) {

filtered_boxes.push_back(boxes[i]);

filtered_confidences.push_back(confidences[i]);

filtered_classes.push_back(class_ids[i]);

}

boxes=filtered_boxes;

confidences=filtered_confidences;

class_ids=filtered_classes;

}

// NMS 非极大值抑制

std::vector<DetectionResult> nms_boxes(const std::vector<Box>& boxes,

floatnms_threshold,

floatconf_threshold) {

std::vector<DetectionResult>result;

// 准备OpenCV NMS所需的数据结构

std::vector<int>classIds;

std::vector<float>confidences;

std::vector<cv::Rect>cv_boxes;

// 遍历所有检测框

for (constauto&box : boxes) {

// 计算最终得分

floatscore=box.confidence*box.class_score;

// 应用置信度阈值

if (score>conf_threshold) {

// 转换为左上角和宽高表示

floatx1=box.x-box.w/2;

floaty1=box.y-box.h/2;

floatw=box.w;

floath=box.h;

// 确保在有效范围内

x1=std::max(0.0f, x1);

y1=std::max(0.0f, y1);

// 添加到列表

classIds.push_back(box.class_id);

confidences.push_back(score);

cv_boxes.push_back(cv::Rect(x1, y1, w, h));

}

// 应用NMS

std::vector<int>indices;

cv::dnn::NMSBoxes(cv_boxes, confidences, conf_threshold, nms_threshold, indices);

std::cout<<“NMS前有效框数量: “<<cv_boxes.size() <<“, NMS后保留: “<<indices.size() <<std::endl;

// 构建输出结果

for (size_ti=0; i<indices.size(); ++i) {

intidx=indices[i];

DetectionResultdet;

det.x=cv_boxes[idx].x+cv_boxes[idx].width/2; // 转回中心点表示

det.y=cv_boxes[idx].y+cv_boxes[idx].height/2;

det.w=cv_boxes[idx].width;

det.h=cv_boxes[idx].height;

det.prob=confidences[idx];

det.obj_id=classIds[idx];

result.push_back(det);

}

returnresult;

}

// 在图像上绘制检测框

void drawDetectionResults(cv::Mat& image, const std::vector<Box>& detections) {

// 定义颜色表 – 为不同类别分配不同颜色

conststd::vector<cv::Scalar>colors= {

cv::Scalar(255, 0, 0), // 蓝色

cv::Scalar(0, 255, 0), // 绿色

cv::Scalar(0, 0, 255), // 红色

cv::Scalar(255, 255, 0), // 青色

cv::Scalar(0, 255, 255), // 黄色

cv::Scalar(255, 0, 255), // 紫色

cv::Scalar(128, 0, 0), // 深蓝

cv::Scalar(0, 128, 0), // 深绿

cv::Scalar(0, 0, 128), // 深红

cv::Scalar(128, 128, 0) // 橄榄

};

// 类别名称，根据您的模型调整

conststd::vector<std::string>class_names= {

“person”, “bicycle”, “car”, “motorcycle”, “airplane”, “bus”, “train”, “truck”, “boat”,

“traffic light”, “fire hydrant”, “stop sign”, “parking meter”, “bench”, “bird”, “cat”,

“dog”, “horse”, “sheep”, “cow”, “elephant”, “bear”, “zebra”, “giraffe”, “backpack”,

“umbrella”, “handbag”, “tie”, “suitcase”, “frisbee”, “skis”, “snowboard”, “sports ball”,

“kite”, “baseball bat”, “baseball glove”, “skateboard”, “surfboard”, “tennis racket”,

“bottle”, “wine glass”, “cup”, “fork”, “knife”, “spoon”, “bowl”, “banana”, “apple”,

“sandwich”, “orange”, “broccoli”, “carrot”, “hot dog”, “pizza”, “donut”, “cake”, “chair”,

“couch”, “potted plant”, “bed”, “dining table”, “toilet”, “tv”, “laptop”, “mouse”,

“remote”, “keyboard”, “cell phone”, “microwave”, “oven”, “toaster”, “sink”, “refrigerator”,

“book”, “clock”, “vase”, “scissors”, “teddy bear”, “hair drier”, “toothbrush”

};

// 遍历所有检测框

for (constauto&det : detections) {

// 获取边界框坐标

intx1=static_cast<int>(det.x1);

inty1=static_cast<int>(det.y1);

intx2=static_cast<int>(det.x2);

inty2=static_cast<int>(det.y2);

// 确保边界框在图像范围内

x1=std::max(0, std::min(x1, image.cols-1));

y1=std::max(0, std::min(y1, image.rows-1));

x2=std::max(0, std::min(x2, image.cols-1));

y2=std::max(0, std::min(y2, image.rows-1));

// 计算颜色索引

intcolor_idx=det.class_id%colors.size();

cv::Scalarcolor=colors[color_idx];

// 绘制矩形框

cv::rectangle(image, cv::Point(x1, y1), cv::Point(x2, y2), color, 2);

// 准备标签文本

std::stringclass_name= (det.class_id<class_names.size()) ?

class_names[det.class_id]:

“class “+std::to_string(det.class_id);

std::stringlabel=class_name+” “+std::to_string(static_cast<int>(det.score*100)) +”%”;

// 绘制填充的矩形作为标签背景

intbaseline=0;

cv::Sizelabel_size=cv::getTextSize(label, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseline);

cv::rectangle(image,

cv::Point(x1, y1-label_size.height-5),

cv::Point(x1+label_size.width, y1),

color, -1);

// 绘制文本标签

cv::putText(image, label, cv::Point(x1, y1-5),

cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(255, 255, 255), 1);

}

// —————– 解析 ACL 输出为 C++ 数据 —————–

vector<cv::Mat> parseAclOutput(aclmdlDataset *dataset) {

vector<cv::Mat>outputs;

size_tnum_tensors=aclmdlGetDatasetNumBuffers(dataset);

for (size_ti=0; i<num_tensors; ++i) {

aclDataBuffer*buffer=aclmdlGetDatasetBuffer(dataset, i);

void*data=aclGetDataBufferAddr(buffer);

size_tbuffer_size=aclGetDataBufferSizeV2(buffer);

// 获取 shape (1, 255, 80, 80) / (1, 255, 40, 40) / (1, 255, 20, 20)

size_tgrid_size= (i==0) ?80: (i==1) ?40:20;

intnum_anchors=3;

intnum_classes=80;

intchannels=num_anchors* (num_classes+5); // 255 = 3 * (80 + 5)

float*float_data=reinterpret_cast<float*>(data);

// 创建 OpenCV Mat 格式: (grid_size, grid_size, channels)

cv::Matmat(grid_size*grid_size, channels, CV_32F, float_data);

// 复制数据，确保 ACL 释放后仍可用

cv::Matmat_clone=mat.clone();

outputs.push_back(mat_clone);

}

returnoutputs;

}

void postprocess(cv::Mat& frame, cv::Mat boxes_m, cv::Mat confs_m,float conf_threshold, float nms_threshold, std::vector<Detection>& final_detections){

intnum_boxes=boxes_m.rows;

intnum_classes=confs_m.cols;

std::vector<Detection>boxes;

std::vector<cv::Rect>boxes_list;

std::vector<float>conf_list;

std::vector<int>class_id_list;

floatscale_x=1920/416.0f;

floatscale_y=1080/416.0f;

// 遍历所有框

for (inti=0; i<num_boxes; ++i) {

// 获取框坐标

Boxrect{boxes_m.at<float>(i, 0)*416, boxes_m.at<float>(i, 1)*416,

boxes_m.at<float>(i, 2)*416, boxes_m.at<float>(i, 3)*416};

rect.x=std::max(0.0f, rect.x*scale_x);

rect.y=std::max(0.0f, rect.y*scale_y);

rect.w=std::max(0.0f, (rect.w*scale_x)-rect.x);

rect.h=std::max(0.0f, (rect.h*scale_y)-rect.y);

// 获取该框的最大类别置信度

intclass_id=-1;

floatmax_conf=0.0f;

for (intj=0; j<num_classes; ++j) {

floatconf=confs_m.at<float>(i, j);

if (conf>max_conf) {

max_conf=conf;

class_id=j;

}

// Detection det;

// 将框和置信度存入结构体

// boxes.push_back({rect, max_conf, class_id});

boxes_list.push_back({rect.x,rect.y,rect.w,rect.h});

conf_list.push_back(max_conf);

class_id_list.push_back(class_id);

}

for(autobox:boxes){

cout<<“Box “<<“: [“<<box.box.x<<“, “<<box.box.y<<“, “<<box.box.w<<“, “<<box.box.h<<“]”<<endl;

}

cout<<“num_boxes: “<<num_boxes<<endl;

cout<<“num_classes: “<<num_classes<<endl;

std::vector<int>indices;

cv::dnn::NMSBoxes(boxes_list, conf_list, conf_threshold, nms_threshold, indices);

for (intidx : indices) {

Detectiondet;

det.box.x=boxes_list[idx].x;

det.box.y=boxes_list[idx].y;

det.box.w=boxes_list[idx].width;

det.box.h=boxes_list[idx].height;

det.class_id=class_id_list[idx];

det.score=conf_list[idx];

boxes.push_back(det);

}

// // 调用非极大值抑制进行后处理

// float iouThreshold = 0.5f;

// float confThreshold = 0.5f;

// std::vector<Detection> finalBoxes = nonMaximumSuppression(boxes, iouThreshold, confThreshold);

// // 输出最终保留的框

std::cout<<“Final Detected Boxes after NMS: “<<boxes.size() <<std::endl;

for (constauto&box : boxes) {

std::cout<<“Box “<<“: [“<<box.box.x<<“, “<<box.box.y<<“, “<<box.box.w<<“, “<<box.box.h<<“]”<<“, Confidence: “<<box.score<<“, Class: “<<box.class_id<<std::endl;

}

// draw_boxes(frame,boxes);

}

void processFeatureMap(const float* feature_map, std::vector<Box>& detections,

intgrid_h, intgrid_w, intnum_anchors, intnum_outputs,

floatconf_threshold, intinput_w, intinput_h) {

// YOLOv5 v6.2 anchors和strides

std::vector<std::vector<int>>anchors= {

{10, 13, 16, 30, 33, 23}, // 80×80

{30, 61, 62, 45, 59, 119}, // 40×40

{116, 90, 156, 198, 373, 326} // 20×20

};

intstrides[] = {8, 16, 32}; // 对应80×80, 40×40, 20×20

// 确定当前特征图对应的索引

intfeature_idx=0;

if (grid_h==80) feature_idx=0;

elseif (grid_h==40) feature_idx=1;

elseif (grid_h==20) feature_idx=2;

else {

std::cout<<“Unsupported feature map size: “<<grid_h<<“x”<<grid_w<<std::endl;

return;

}

intstride=strides[feature_idx];

intnum_classes=num_outputs-5; // 85 – 5 = 80类

// 调试信息

std::cout<<“处理特征图: “<<grid_h<<“x”<<grid_w

<<” stride=”<<stride

<<” anchor_group=”<<feature_idx<<std::endl;

intdebug_count=0; // 用于限制调试输出

// 遍历每个anchor、每行、每列

for (inta=0; a<num_anchors; ++a) {

for (inti=0; i<grid_h; ++i) {

for (intj=0; j<grid_w; ++j) {

// 计算当前网格点在特征图中的索引

// 对应 record 变量的计算

float*record=const_cast<float*>(feature_map) +

a*grid_h*grid_w*num_outputs+

i*grid_w*num_outputs+

j*num_outputs;

// 指向类别分数的指针

float*cls_ptr=record+5;

// 遍历所有类别

for (intcls=0; cls<num_classes; ++cls) {

// 计算类别置信度 = sigmoid(类别分数) * sigmoid(objectness)

floatscore=sigmoid(cls_ptr[cls]) *sigmoid(record[4]);

// 只处理高于阈值的检测结果

if (score>conf_threshold) {

// 解码边界框坐标

floatcx= (sigmoid(record[0]) *2.0f-0.5f+j) *stride;

floatcy= (sigmoid(record[1]) *2.0f-0.5f+i) *stride;

floatw=pow(sigmoid(record[2]) *2.0f, 2) *anchors[feature_idx][2*a];

floath=pow(sigmoid(record[3]) *2.0f, 2) *anchors[feature_idx][2*a+1];

// 创建Box对象并保存检测结果

Boxbox;

box.x=cx; // 中心x坐标

box.y=cy; // 中心y坐标

box.w=w; // 宽度

box.h=h; // 高度

box.confidence=sigmoid(record[4]); // objectness

box.class_id=cls; // 类别ID

box.class_score=sigmoid(cls_ptr[cls]); // 类别置信度

// 添加到检测结果列表

detections.push_back(box);

// 输出部分检测结果用于调试

if (debug_count<4) {

std::cout<<“找到目标: 类别=”<<cls

<<” 置信度=”<<score

<<” 边界框=[“<<cx<<“, “<<cy<<“, “<<w<<“, “<<h<<“]”

<<std::endl;

debug_count++;

}

std::cout<<“特征图 “<<grid_h<<“x”<<grid_w<<” 共检测到 “

<<detections.size() <<” 个目标”<<std::endl;

}

void postprocess(cv::Mat& frame, const int num_boxes, const float* boxes, const float* confs, float conf_threshold, float nms_threshold, std::vector<Detection>& final_detections) {

vector<cv::Rect>temp_boxes;

vector<float>temp_confidences;

vector<int>temp_class_ids;

intorg_width=frame.cols;

intorg_height=frame.rows;

for(inti=0;i<num_boxes;i++){

// cv::Mat class_scores = scores.row(0).col(i);

std::vector<float>scores_list;

doubleconfidence=-1;

intclass_id=-1;

for(intj=0;j<80;j++){

// scores_list.push_back(confs[i*80+j]);

if(confs[i*80+j]>confidence){

confidence=confs[i*80+j];

class_id=j;

}

// cv::Mat class_scores(scores_list,false);

// cv::Point class_id_point;

// minMaxLoc(class_scores, 0, &confidence, 0, &class_id_point);

// int class_id = class_id_point.y;

if(confidence>conf_threshold){

intx=static_cast<int>(boxes[i*4+0]*org_width);

inty=static_cast<int>(boxes[i*4+1]*org_height);

intx2=static_cast<int>(boxes[i*4+2]*org_width);

inty2=static_cast<int>(boxes[i*4+3]*org_height);

cout<<“Box “<<i<<“: [“<<x<<“, “<<y<<“, “<<x2<<“, “<<y2<<“]”<<endl;

temp_boxes.push_back({cv::Point(x,y),cv::Point(x2,y2)});

temp_confidences.push_back(static_cast<float>(confidence));

temp_class_ids.push_back(class_id);

}

// 非极大值抑制

std::vector<int>indices;

cv::dnn::NMSBoxes(temp_boxes, temp_confidences, conf_threshold, nms_threshold, indices);

vector<cv::Rect>final_boxes;

vector<float>final_confidences;

vector<int>final_class_ids;

std::vector<Detection>detection_list;

for (intidx : indices) {

Detectiondet;

final_boxes.push_back(temp_boxes[idx]);

det.box.x=temp_boxes[idx].x;

det.box.y=temp_boxes[idx].y;

det.box.w=temp_boxes[idx].width;

det.box.h=temp_boxes[idx].height;

final_confidences.push_back(temp_confidences[idx]);

final_class_ids.push_back(temp_class_ids[idx]);

det.class_id=temp_class_ids[idx];

det.score=temp_confidences[idx];

detection_list.push_back(det);

}

cout<<“final_boxes: “;

for (constauto&box : final_boxes) {

cout<<“[“<<box.x<<“, “<<box.y<<“, “<<box.width<<“, “<<box.height<<“] “;

}

cout<<endl;

cout<<“final_confidences: “;

for (constauto&conf : final_confidences) {

cout<<conf<<” “;

}

cout<<endl;

cout<<“final_class_ids: “;

for (constauto&id : final_class_ids) {

cout<<id<<” “;

}

cout<<endl;

// draw_boxes(frame,detection_list);

}

static inline int64_t getCurrentTimeUs()

{

structtimevaltv;

gettimeofday(&tv, NULL);

returntv.tv_sec*1000000+tv.tv_usec;

}

uint32_t load_data(void *&inputBuff)

{

std::ifstreaminput_file(PIC, std::ifstream::binary);

if (input_file.is_open() ==false) {

ERROR_LOG(“open file %s failed”, PIC);

}

input_file.seekg(0, input_file.end);

uint32_tfile_szie=input_file.tellg();

if (file_szie==0) {

ERROR_LOG(“binfile is empty, filename is %s”, PIC);

input_file.close();

}

cout<<“————>file size “<<file_szie<<endl;

input_file.seekg(0, input_file.beg);

input_file.read(static_cast<char*>(inputBuff), file_szie);

input_file.close();

returnfile_szie;

}

int main()

{

/***************************************************/

/*****************Init ACL**************************/

/***************************************************/

cout<<“->ACL INIT “<<endl;

aclErrorret=aclInit(aclConfigPath);

if (ret!=ACL_SUCCESS) {

ERROR_LOG(“acl init failed, errorCode = %d”, static_cast<int32_t>(ret));

returnFAILED;

}

/***************************************************/

/*****************apply resource********************/

/***************************************************/

// set device only one device

int32_tdeviceId_=0;

ret=aclrtSetDevice(deviceId_);

if (ret!=ACL_SUCCESS) {

ERROR_LOG(“acl set device %d failed, errorCode = %d”, deviceId_, static_cast<int32_t>(ret));

returnFAILED;

}

cout<<“->set device “<<deviceId_<<endl;

// create context (set current)

cout<<“->create context”<<endl;

aclrtContextcontext_;

ret=aclrtCreateContext(&context_, deviceId_);

if (ret!=ACL_SUCCESS) {

ERROR_LOG(“acl create context failed, deviceId = %d, errorCode = %d”,

deviceId_, static_cast<int32_t>(ret));

returnFAILED;

}

// create stream

cout<<“->create stream”<<endl;

aclrtStreamstream_;

ret=aclrtCreateStream(&stream_);

if (ret!=ACL_SUCCESS) {

ERROR_LOG(“acl create stream failed, deviceId = %d, errorCode = %d”,

deviceId_, static_cast<int32_t>(ret));

returnFAILED;

}

// get run mode

cout<<“->get run mode”<<endl;

aclrtRunModerunMode;

ret=aclrtGetRunMode(&runMode);

if (ret!=ACL_SUCCESS) {

ERROR_LOG(“acl get run mode failed, errorCode = %d”, static_cast<int32_t>(ret));

returnFAILED;

}

g_isDevice=(runMode==ACL_DEVICE) ;

/***************************************************/

/********load model and get infos of model**********/

/***************************************************/

uint32_tmodelId_=0;

ret=aclmdlLoadFromFile(MODEL_PATH,&modelId_);

if (ret!=ACL_SUCCESS) {

ERROR_LOG(“load model from file failed, model file is %s, errorCode is %d”,

MODEL_PATH, static_cast<int32_t>(ret));

returnFAILED;

}

cout<<“->load mode “<<“\””<<MODEL_PATH<<“\””<<” model id is “<<modelId_<<endl;

//get model describe

cout<<“->create model describe”<<endl;

aclmdlDesc*modelDesc_;

modelDesc_=aclmdlCreateDesc();

if (modelDesc_==nullptr) {

ERROR_LOG(“create model description failed”);

returnFAILED;

}

cout<<“->get model describe”<<endl;

ret=aclmdlGetDesc(modelDesc_, modelId_);

if (ret!=ACL_SUCCESS) {

ERROR_LOG(“get model description failed, modelId is %u, errorCode is %d”,

modelId_, static_cast<int32_t>(ret));

returnFAILED;

}

deviceId_=0;

/***************************************************/

/******************prepare input data buffer********/

/***************************************************/

if (modelDesc_==nullptr) {

ERROR_LOG(“no model description, create input failed”);

returnFAILED;

}

cv::Matframe=cv::imread(PIC);

if (frame.empty()) {

ERROR_LOG(“read image failed, image path is %s”, PIC);

returnFAILED;

}

cout<<“->read image “<<PIC<<endl;

cv::Matresized_frame=preprocess_image(frame);

aclmdlDataset*input_;

void*inputDataBuffer=nullptr;

size_tmodelInputSize=aclmdlGetInputSizeByIndex(modelDesc_, 0);

cout<<“->get input size “<<modelInputSize<<endl;

cout<<“->apply input mem “<<endl;

aclErroraclRet=aclrtMalloc(&inputDataBuffer, modelInputSize, ACL_MEM_MALLOC_NORMAL_ONLY);

if (aclRet!=ACL_SUCCESS) {

ERROR_LOG(“malloc device buffer failed. size is %zu, errorCode is %d”,

modelInputSize, static_cast<int32_t>(aclRet));

returnFAILED;

}

cout<<“->copy data to device “<<endl;

ret=aclrtMemcpy(inputDataBuffer, modelInputSize, resized_frame.data, modelInputSize, ACL_MEMCPY_HOST_TO_DEVICE);

if (ret!=ACL_SUCCESS) {

ERROR_LOG(“copy data to device failed, errorCode is %d”, static_cast<int32_t>(ret));

(void)aclrtFree(inputDataBuffer);

inputDataBuffer=nullptr;

returnFAILED;

}

cout<<“->copy data to device success “<<endl;

cout<<“->create input dataset “<<endl;

input_=aclmdlCreateDataset();

if (input_==nullptr) {

ERROR_LOG(“can’t create dataset, create input failed”);

returnFAILED;

}

cout<<“->create databuffer”<<endl;

aclDataBuffer*inputData=aclCreateDataBuffer(inputDataBuffer, modelInputSize);

if (inputData==nullptr) {

ERROR_LOG(“can’t create data buffer, create input failed”);

returnFAILED;

}

cout<<“->get input data buffer”<<endl;

size_tinputNum=aclmdlGetDatasetNumBuffers(input_);

cout<<“->get input dataset num “<<inputNum<<endl;

if (inputNum!=0) {

ERROR_LOG(“dataset buffer num is not 0, create input failed”);

(void)aclDestroyDataBuffer(inputData);

inputData=nullptr;

returnFAILED;

}

cout<<“->get input data buffer success “<<endl;

cout<<“->add data to datasetbuffer”<<endl;

ret=aclmdlAddDatasetBuffer(input_, inputData);

if (ret!=ACL_SUCCESS) {

ERROR_LOG(“add input dataset buffer failed, errorCode is %d”, static_cast<int32_t>(ret));

(void)aclDestroyDataBuffer(inputData);

inputData=nullptr;

returnFAILED;

}

INFO_LOG(“create model input success”);

/***************************************************/

/************prepare output data buffer*************/

/***************************************************/

aclmdlDataset*output_;

cout<<“->create dataset”<<endl;

output_=aclmdlCreateDataset();

if (output_==nullptr) {

ERROR_LOG(“can’t create dataset, create output failed”);

returnFAILED;

}

size_toutput_num=aclmdlGetNumOutputs(modelDesc_);

cout<<“->get num of output “<<output_num<<endl;

for (size_ti=0; i<output_num; ++i) {

size_tmodelOutputSize=aclmdlGetOutputSizeByIndex(modelDesc_, i);

cout<<“-> output size[“<<i<<“] :”<<modelOutputSize<<endl;

void*outputBuffer=nullptr;

aclErrorret=aclrtMalloc(&outputBuffer, modelOutputSize, ACL_MEM_MALLOC_NORMAL_ONLY);

if (ret!=ACL_SUCCESS) {

ERROR_LOG(“can’t malloc buffer, size is %zu, create output failed, errorCode is %d”,

modelOutputSize, static_cast<int32_t>(ret));

returnFAILED;

}

//apply output buffer

cout<<“->apply output buffer”<<endl;

aclDataBuffer*outputData=aclCreateDataBuffer(outputBuffer, modelOutputSize);

if (outputData==nullptr) {

ERROR_LOG(“can’t create data buffer, create output failed”);

(void)aclrtFree(outputBuffer);

returnFAILED;

}

cout<<“->AddDatasetBuffer”<<endl;

ret=aclmdlAddDatasetBuffer(output_, outputData);

if (ret!=ACL_SUCCESS) {

ERROR_LOG(“can’t add data buffer, create output failed, errorCode is %d”,

static_cast<int32_t>(ret));

(void)aclrtFree(outputBuffer);

(void)aclDestroyDataBuffer(outputData);

returnFAILED;

}

cout<<“-> get original output test”<<endl;

aclDataBuffer*dataBuffer=aclmdlGetDatasetBuffer(output_, i);

void*data=aclGetDataBufferAddr(dataBuffer);

uint32_tlen=aclGetDataBufferSizeV2(dataBuffer);

cout<<“-> getDataBufferSizeV2[“<<i<<“] :”<<len<<endl;

float*outData=NULL;

outData=reinterpret_cast<float*>(data);

for(intnum=0;num<10;num++){

cout<<outData[num]<<endl;

}

cout<<“->create model output success “<<endl;

/***************************************************/

/******************inference************************/

/***************************************************/

// for(int i=0;i<100000;i++){

cout<<“->begin inference “<<“model id is “<<modelId_<<endl;

int64_tsum=0;

int64_tstart_time=0;

int64_tend_time=0;

int64_teclipes_time=0;

// for(int i = 0; i < loop_count;i++){

start_time=getCurrentTimeUs();

//ret = aclmdlExecuteAsync(modelId_, input_, output_,stream_);

ret=aclmdlExecute(modelId_, input_, output_);

if (ret!=ACL_SUCCESS) {

ERROR_LOG(“execute model failed, modelId is %u, errorCode is %d”,

modelId_, static_cast<int32_t>(ret));

returnFAILED;

}

end_time=getCurrentTimeUs();

eclipes_time=end_time-start_time;

// if(i!=0){

// sum =sum + eclipes_time;

// }

printf(“——————use time %.2f ms\n”, eclipes_time/1000.f);

// }

// printf(“loop %d averages use time %.2f ms\n”,loop_count-1,(sum/1000.f)/(loop_count-1));

/***************************************************/

/******************post process*********************/

/***************************************************/

std::vector<cv::Mat>output_list=parseAclOutput(output_);

std::vector<int>output_shape;

aclFormatformat;

std::vector<constchar*>output_names;

aclDataTypedata_type;

intnum_boxes;

intnum_classes;

for(intnum=0;num<(int)output_num;num++){

constchar*output_name;

size_toutput_stride;

uint32_tlen ;

aclmdlIODimsdims;

output_shape.clear();

aclmdlGetOutputDims(modelDesc_, num,&dims);

output_name=aclmdlGetOutputNameByIndex(modelDesc_,num);

output_names.push_back(output_name);

format=aclmdlGetOutputFormat(modelDesc_,num);

data_type=aclmdlGetOutputDataType(modelDesc_,num);

aclDataBuffer*dataBuffer=aclmdlGetDatasetBuffer(output_, num);

void*data=aclGetDataBufferAddr(dataBuffer);

len=aclGetDataBufferSize(dataBuffer);

cout<<“->dims [ “;

for(intdim_num=0;dim_num<(int)dims.dimCount;dim_num++){

cout<<dims.dims[dim_num]<<” “;

output_shape.push_back(dims.dims[dim_num]);

}

cout<<“] output format is “<<format<<” data_type is”<<data_type<<” output_stride is “<<output_stride<<” output_name is “<<output_names[num]<<“-> getDataBufferSize[“<<num<<“] :”<<len<<endl;

if(num==0){

num_boxes=output_shape[1];

}else{

num_classes=output_shape[2];

}

float*outData=NULL;

outData=reinterpret_cast<float*>(data);

//show output data

// for(int show_num=0;show_num<0;show_num++){

// printf(“%f \n”,outData[show_num]);

// }

}

// yolov5的结果集为[1,3,80,80,85] [1,3,40,40,85] [1,3,20,20,85] 下面对这个结果集进行处理

// 这里添加处理YOLOv5结果集的代码

// 获取三个输出特征图

constfloat*feature_map1=nullptr;

constfloat*feature_map2=nullptr;

constfloat*feature_map3=nullptr;

// 遍历输出数据集，获取特征图

std::vector<cv::Mat>feature_maps;

for (size_ti=0; i<aclmdlGetDatasetNumBuffers(output_); ++i) {

aclDataBuffer*buffer=aclmdlGetDatasetBuffer(output_, i);

void*data=aclGetDataBufferAddr(buffer);

if (i==0) {

feature_map1=reinterpret_cast<float*>(data); // 80×80

} else if (i == 1) {

feature_map2=reinterpret_cast<float*>(data); // 40×40

} else if (i == 2) {

feature_map3=reinterpret_cast<float*>(data); // 20×20

}

// 处理参数

constfloatconf_threshold=0.5f; // 置信度阈值

constfloatnms_threshold=0.45f; // NMS阈值

constintinput_width=640; // 输入图像宽度

constintinput_height=640; // 输入图像高度

// 确保创建一个全局向量存储所有检测结果

std::vector<Box>all_detections;

// 处理三个特征图

std::vector<Box>detections_80x80;

processFeatureMap(feature_map1, detections_80x80, 80, 80, 3, 85, conf_threshold, input_width, input_height);

std::vector<Box>detections_40x40;

processFeatureMap(feature_map2, detections_40x40, 40, 40, 3, 85, conf_threshold, input_width, input_height);

std::vector<Box>detections_20x20;

processFeatureMap(feature_map3, detections_20x20, 20, 20, 3, 85, conf_threshold, input_width, input_height);

// 合并所有检测结果

all_detections.insert(all_detections.end(), detections_80x80.begin(), detections_80x80.end());

all_detections.insert(all_detections.end(), detections_40x40.begin(), detections_40x40.end());

all_detections.insert(all_detections.end(), detections_20x20.begin(), detections_20x20.end());

std::cout<<“所有特征图总共检测到 “<<all_detections.size() <<” 个目标”<<std::endl;

// 执行NMS

std::vector<DetectionResult>final_detections=nms_boxes(all_detections, nms_threshold, conf_threshold);

// 输出最终结果

std::cout<<“最终检测到 “<<final_detections.size() <<” 个目标”<<std::endl;

cv::Matimg=cv::imread(“data/output2.jpg”);

draw_boxes(img,final_detections);

cv::imwrite(“result.jpg”,img);

/***************************************************/

/*********************destroy model output*********/

/***************************************************/

for (size_ti=0; i<aclmdlGetDatasetNumBuffers(output_); ++i) {

aclDataBuffer*dataBuffer=aclmdlGetDatasetBuffer(output_, i);

void*data=aclGetDataBufferAddr(dataBuffer);

(void)aclrtFree(data);

(void)aclDestroyDataBuffer(dataBuffer);

}

(void)aclmdlDestroyDataset(output_);

output_=nullptr;

INFO_LOG(“destroy model output success”);

/***************************************************/

/*******************destroy model input*************/

/***************************************************/

for (size_ti=0; i<aclmdlGetDatasetNumBuffers(input_); ++i) {

aclDataBuffer*dataBuffer=aclmdlGetDatasetBuffer(input_, i);

(void)aclDestroyDataBuffer(dataBuffer);

}

(void)aclmdlDestroyDataset(input_);

input_=nullptr;

INFO_LOG(“destroy model input success”);

/***************************************************/

/******uninstall model and release resource*********/

/***************************************************/

cout<<“->unload model id is “<<modelId_<<endl;

ret=aclmdlUnload(modelId_);

if (ret!=ACL_SUCCESS) {

ERROR_LOG(“unload model failed, modelId is %u, errorCode is %d”,

modelId_, static_cast<int32_t>(ret));

returnFAILED;

}

INFO_LOG(“unload model success, modelId is %u”, modelId_);

// releasemodelDesc_

if (modelDesc_!=nullptr) {

aclmdlDestroyDesc(modelDesc_);

modelDesc_=nullptr;

}

INFO_LOG(“release modelDesc_ success, modelId is %u”, modelId_);

//release resorce

if (stream_!=nullptr) {

ret=aclrtDestroyStream(stream_);

if (ret!=ACL_SUCCESS) {

ERROR_LOG(“destroy stream failed, errorCode = %d”, static_cast<int32_t>(ret));

}

stream_=nullptr;

}

cout<<“->destroy stream done”<<endl;

if (context_!=nullptr) {

ret=aclrtDestroyContext(context_);

if (ret!=ACL_SUCCESS) {

ERROR_LOG(“destroy context failed, errorCode = %d”, static_cast<int32_t>(ret));

}

context_=nullptr;

}

cout<<“->destroy context done “<<endl;

ret=aclrtResetDevice(deviceId_);

if (ret!=ACL_SUCCESS) {

ERROR_LOG(“reset device %d failed, errorCode = %d”, deviceId_, static_cast<int32_t>(ret));

}

cout<<“->reset device id is “<<deviceId_<<endl;

ret=aclFinalize();

if (ret!=ACL_SUCCESS) {

ERROR_LOG(” failed, errorCode = %d”, static_cast<int32_t>(ret));

}

INFO_LOG(“end to finalize acl”);

}

上面代码的运行结果图

海思3403-SS928 yolov5 c++开发日记（1）

之所以写这个日记是为了记着整个过程的问题，因为过去的一段时间里，已经踩过不少的坑，到目前为止，有些坑还是没有解决。所以需要记着做过的尝试，成功的要记着，失败的也得记着。

首先，根据供应商和线上一些有限的资料看到，3403-SS928这个芯片号称的10.4T算力是由一个有4.8T算力和另一个5.6T算力的两个不同架构的NPU组成。这两个NPU互相不能通用，模型虽都是.om格式但不通用，调用模型的推理程序和库也不通用。

目前我可以做到4.8T（号称）的这个核的模型转换，使用的是供应商提供的官方模型转换工具 Ascend-cann-toolkit_5.13.t5.0.b050_linux-x86_64.run 这个ATC工具

这个工具的安装也有坑，具体是什么有空再写，总之目前我能成功装上的环境是使用windows10 的 wsl2 安装 ubuntu 20.04版本linux 再加上手工编译安装的 Python 3.8.16

安装atc 工具后，使用 source /${安装目录}/Ascend/ascend-toolkit/latest/x86_64-linux/bin/setenv.bash 引用atc工具环境

运行atc 进行模型转换，命令如下：

atc –model=./yolov5s_v6.2.onnx –framework=5 –input_shape=”images:1,3,640,640″ –output=v5s_o –soc_version=”OPTG” –output_type=FP32

看到网上有些大神的文章说使用 –insert_op_conf= xxx.cfg 定义AAPP，在研究了一翻供应商官方提供的关于ATC的官方文档后暂时觉得没有必要。这东西其实是把图像的前处理以某种方式整合到.om文件里，这是我的个人理解，不知道对不对。但现在我的问题并不是性能问题，而是需要先让整个模型推理过程运行正确。所以我现在没有加入 –insert_op_conf= 这个参数

另外，供应商提供的包里有一个把图片转成.bin 格式的 python 程序。对其分析研究一翻后，先前认为这个程序只是把.jpg图片转成需要输入模型的大小，即 640×640 ,并且把图片数组转为 NCHW 的布局，然后保存成 binary 格式的.bin 文件。但现在感觉这里有两个大坑。

一，图片是使用resize 强制转大小，并没有保持原图的横宽比，这将会导致图像压缩扭曲，使模型识别结果错误（暂时个人认为是这样）。

二，是程序使用PILLOW 库进行图像读取和处理，图像在初始读取时是以RGB 的色域格式进行读取的，但程序对这个RGB 色域进行了错误的转换（暂时我认为是错的），把色域转为BGR。所以，下一步我将修改这个python 使之保持RGB的格式。

编译opencv-4.8.1+ffmpeg的一些问题和解决方法

1、此版opencv貌似存在图片处理上的bug，在arm类的CPU上编译时会在photo上报错，所以得加上 -DBUILD_opencv_xphoto=OFF -DBUILD_opencv_optflow=OFF -DBUILD_opencv_rgbd=OFF 参数，把xphoto屏蔽掉。

2、在arm类的cpu上的linux 编译是不会下载ffmpeg库，即使添加了–DWITH_FFMPEG=ON 也一样，得使用系统APT或YUM下载的 ffmpeg 和相关库并安装才能找到并编译，但如果是有裁剪和嵌入的FFMPEG的需要，就需要自行编译FFMPEG.

3、编译FFMPEG不能使用静态库，并且需要把avresample 加入ffmpeg的编译，否则在编译OPENCV时，会提示找不到avresample 的引用而导致编译报错

4、需要加入-DOPENCV_EXTRA_MODULES_PATH 扩展库，否则在编译OPENCV时会报缺少引用而编译失败

5、怀疑是因为旧版本的编译器的关系，例如是在SS928（即海思3403)的交叉编译器上编译，在编译测试程序时会报错。需要加入-DBUILD_TESTS=OFF 强制不编译OPENCV的测试程序

最后，以下是我在SS928（即海思3403）编译环境上使用的完整OPENCV和FFMPEG的预编译命令：

cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_opencv_world=ON -DCMAKE_INSTALL_PREFIX=./install -DBUILD_opencv_xphoto=OFF -DBUILD_opencv_optflow=OFF -DBUILD_opencv_rgbd=OFF -DWITH_FFMPEG=ON -DOPENCV_EXTRA_MODULES_PATH=../../modules -DBUILD_TESTS=OFF ..

./configure –enable-gpl –enable-libx264 –prefix=./install –enable-pic –enable-avresample –enable-shared –disable-static

rk3588的RKNN_lite使用方法

1、在使用Python的 RKNN_toolkit_lite2 时提示没有找到 librknnrt.so 需从以下路径获取。

https://github.com/rockchip-linux/rknpu2/blob/master/runtime/RK3588/Linux/librknn_api/aarch64/librknnrt.so

2、貌似没有RKNN_toolkit2的板端部署的Python whl 所以，只能把RKNN_toolkit2先在X86 的PC上安装，并使用其进行模型转换，之后再把转换后的模型复制到板端部署使用。

3、模型转换最好使用ONNX格式，转换成目标的rknn格式。ONNX需转换成upset12或以上。

milk-v duo 扩展 / 空间的方法

milk-v duo 的官方镜像写入的 / 空间只有几百M，而我的sd 卡却有16G 的空间，这样就有13G 左右的空间闲置了，经过实测和多翻尝试，使用下面的方法可以扩展 / 空间。

1、把sd 卡插到一个读卡器上，然后用一个LINUX 系统读出，我用的是ubuntu 20.04

2、运行 lsblk 查看sd卡的名称

NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda      8:0    0 931.5G  0 disk 
(snip! a lot of extra junk was here)
sdf      8:80   1     0B  0 disk 
sdg      8:96   1 16.1G  0 disk <---- 我的SD卡
├─sdg1   8:97   1   128M  0 part 
├─sdg2   8:98   1   768M  0 part 
└─sdg3   8:99   1   256M  0 part

运行 fdisk /dev/sdg 修改分区

Command (m for help): d
Partition number (1-3, default 3): 2

Partition 2 has been deleted.

Command (m for help): d
Partition number (1,3, default 3): 3

Partition 3 has been deleted.

Command (m for help):

删除旧的分区

Command (m for help): n
Partition type
   p   primary (1 primary, 0 extended, 3 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (2-4, default 2): 2
First sector (262145-250347519, default 264192): 262145
Last sector, +/-sectors or +/-size{K,M,G,T,P} (262145-250347519, default 250347519): ＋12G 

Created a new partition 2 of type 'Linux' and of size 12 GiB.
Partition #2 contains a ext4 signature.

Do you want to remove the signature? [Y]es/[N]o:N  <-----是否移除签名，选N

保险起建，选一下分区类型

Command (m for help): t
Partition number (1,2, default 2): 2
Hex code or alias (type L to list all): 83

Changed type of partition 'Linux' to 'Linux'.

最后保存

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

运行 e2fsck -f /dev/sdg2 检查一下

e2fsck 1.46.5 (30-Dec-2021)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
rootfs: 4730/49152 files (0.6% non-contiguous), 180791/786432 blocks

最后运行 resize2fs /dev/sdg2 调整容量

resize2fs 1.46.5 (30-Dec-2021)
Resizing the filesystem on /dev/sdg2 to 125042684 (1k) blocks.
The filesystem on /dev/sdg2 is now 125042684 (1k) blocks long.

这个调整容量可能会很久，需要准备充足的时间。完成后就可以了。

c++ opencv 的 MOG2 算法参数定义和停止回归训练的方法

Ptr<BackgroundSubtractorMOG2> bgsubtractor = createBackgroundSubtractorMOG2();

// 用于训练背景的帧数，如果不手动设置learning rate，history就被用于计算当前的learning rate，
// history越大，learning rate越低，背景更新越慢
bgsubtractor->setHistory(500);

// 方差阈值，主要用于判断前景还是背景，值越大，灵敏度越低
// 如果光照变化明显，如阳光下的水面，建议设为25,36
bgsubtractor->setVarThreshold(16);

// 是否检测有影子，开启后会增加算法复杂度
bgsubtractor->setDetectShadows(true);

// 高斯模型个数，默认5个，最多8个
bgsubtractor->setBackgroundRatio(4);

// 高斯背景模型权重和阈值，nmixtures个模型按权重重排序后，
// 只取模型权重累加值大于backgroundRatio的前几个作为背景模型
bgsubtractor->setNMixtures(5);

// 新建高斯模型的方差初始值，默认15
bgsubtractor->setVarInit(15);

// 背景更新时，用于限制高斯模型方差的最大值，默认20
bgsubtractor->setVarMax(20);

// 背景更新时，用于限制高斯模型方差的最小值，默认4
bgsubtractor->setVarMin(4);

// 方差阈值，用于已经存在的匹配的模型，如果不存在则新建一个
bgsubtractor->setVarThresholdGen(100);

PS：实际使用时发现指针实例的算法模型并没有以上的函数去设置这些参数细节（opencv 版本 4.5.2）查阅了opencv的官方在线文档后，发现这些参数在c++版本中均为virtual 的函数方法。实际效果我并没有一一调用尝试，只是修改了指针实例化时可传入的 History,Threshold和shadowDetect 三个参数。实际使用中，感觉把History 参数和 Threshold 调大一些会比默认的参数好一些。

关于实际使用时的目标融入背景问题。参阅了多个国内外关于物品遗留的专利和论文，里面大致都会提到使用帧对比或者快、慢速建模的方法，并提到了用此方法可以避免前景目标融入背景，具体做法专利和论文均没有详细提及，有的只是画了一个简单的流程图，后经我查阅了OPENCV的官方文档和研究MOG2 算法处理的原理，得出只要在一开始对背景建模后，可以用一种方法让MOG2模型的训练停止，从而使其只是作推理计算而不作模型更新的办法，这样前景目标就可以不融入背景。具体方法就是在调用apply函数时，同时传入学习率参数，当训练到某程度时，把学习率设为0即可让模型停止更新。同理，只要把学习率设为1 周可以整体刷新模型让其重新学习。

cv::freetype 在windows 的中文乱码问题

在windows 的vs 环境中默认使用的是带BOM的UTF，但即使在保存源代码文件时选了UTF8 ，编译后同样还是会乱码。经再三研究后，发现需在工程文件的属性－》C/C++ －》命令行中加入%(AdditionalOptions) /source-charset:utf-8 /execution-charset:utf-8

加入后重新编译就可以了

Build opencv with gstreamer

因为从pip 下载的公开仓库的 opencv-python 并不包含 gstreamer 的调用，所以要使用opencv + gstreamer 需要自行build opencv 并把 gstreamer build 进去。

编译方法如下：

安装gstreamer

sudo apt-get install gstreamer1.0*

sudo apt install libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev

编译opencv

mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D INSTALL_PYTHON_EXAMPLES=ON \
-D INSTALL_C_EXAMPLES=OFF \
-D PYTHON_EXECUTABLE=$(which python3) \
-D BUILD_opencv_python2=OFF \
-D CMAKE_INSTALL_PREFIX=$(python3 -c “import sys; print(sys.prefix)”) \
-D PYTHON3_EXECUTABLE=$(which python3) \
-D PYTHON3_INCLUDE_DIR=$(python3 -c “from distutils.sysconfig import get_python_inc; print(get_python_inc())”) \
-D PYTHON3_PACKAGES_PATH=$(python3 -c “from distutils.sysconfig import get_python_lib; print(get_python_lib())”) \
-D WITH_GSTREAMER=ON \
-D BUILD_EXAMPLES=ON ..

python 解析byte 到具体值

使用 struct.unpack(‘类型’,byte对象) 方法，可以轻松解析byte 到具体值。类型对应的字节长度和字符串如下表

Format	C Type	Python type	Standard size	Notes
`x`	pad byte	no value		(7)
`c`	char	bytes of length 1	1
`b`	signed char	integer	1	(1), (2)
`B`	unsigned char	integer	1	(2)
`?`	_Bool	bool	1	(1)
`h`	short	integer	2	(2)
`H`	unsigned short	integer	2	(2)
`i`	int	integer	4	(2)
`I`	unsigned int	integer	4	(2)
`l`	long	integer	4	(2)
`L`	unsigned long	integer	4	(2)
`q`	long long	integer	8	(2)
`Q`	unsigned long long	integer	8	(2)
`n`	ssize_t	integer		(3)
`N`	size_t	integer		(3)
`e`	(6)	float	2	(4)
`f`	float	float	4	(4)
`d`	double	float	8	(4)
`s`	char[]	bytes		(9)
`p`	char[]	bytes		(8)
`P`	void*	integer		(5)

cuda 12.x.x 在编译 darknet 时的问题和解决

cuda 12 开始不再支持compute_35 这么就造成如果使用这个或更高版本的cuda 时编译 darknet 不过的问题，并且，因为cuda 12 在编译安装后的库目录也做了一些改变，增加了 stubs 目录，在/usr/lib 目录里，部份.so 文件放在了这个目录里，这么就会造成编译 darknet 时提示找不到 -lcuda 的问题。经过我多次尝试和网上查找相关资料，终于找到解决方法

1、修改 Makefile 找到

ARCH= -gencode arch=compute_35,code=sm_35

修改为 ARCH= -gencode arch=compute_75,code=sm_75

2、找到 -lcuda 指向的目录/usr/lib

修改为 /usr/lib/stubs

December 2025
M	T	W	T	F	S	S
« Nov
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31