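An Intel NIC Installation Adventure

A recent project threw up something interesting: the customer required a RoCE network, but the NICs supplied by the cloud provider were Intel 800 series, and the driver had not been adapted to domestic (Chinese) operating systems.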

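Product page on the official site

Intel's official driver supports only Red Hat and SUSE

The NIC model reported by the BMC is NIC-E810CQDA2G2P5-E810. Looking it up on Intel's website (I only found the complete driver package on my second search) showed that OS support is very narrow (Red Hat, SUSE, Ubuntu and so on), with no support for the domestic Kylin or Euler distributions.

The first time I searched for the E810 driver, the base driver and the RDMA driver each had a separate page

Honestly, Intel's driver lookup site is nowhere near as good as NVIDIA's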

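Driver and operating system compatibility


As the table above shows, the 810 series NIC actually has two kinds of driver, a base driver and an RDMA driver, and the base driver in turn comes in a physical-machine variant and a virtual-machine variant.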

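Installing the driver

Installing Intel's driver is somewhat more involved than for an NVIDIA NIC. First, install the NIC's base driver, bring the interface up, and assign an IP address. Then install the RDMA driver and the RDMA libraries (downloaded from GitHub). Finally, to get the most out of RDMA, enable flow control (PFC or LFC) on the NIC and configure PFC on the corresponding switch ports as well.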
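The first two steps above (base driver, then RDMA driver) can be sanity-checked from sysfs. The following is a minimal sketch, assuming a Linux host: it reads which kernel driver an interface is bound to and lists the RDMA devices the kernel has registered. The interface name `eth0` is hypothetical, and the module names usually associated with the E810 (`ice` for the base driver, `irdma` for RDMA) should be checked against the package actually installed.

```python
import os
from pathlib import Path
from typing import List, Optional


def bound_driver(iface: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the kernel driver bound to a network interface, or None."""
    link = Path(sysfs_root) / "class" / "net" / iface / "device" / "driver"
    if not link.exists():
        return None
    # "driver" is a symlink; the last component of its target is the driver name.
    return os.path.basename(os.path.realpath(link))


def rdma_devices(sysfs_root: str = "/sys") -> List[str]:
    """List RDMA devices registered with the kernel (empty if none)."""
    ib_dir = Path(sysfs_root) / "class" / "infiniband"
    if not ib_dir.is_dir():
        return []
    return sorted(entry.name for entry in ib_dir.iterdir())


if __name__ == "__main__":
    iface = "eth0"  # hypothetical interface name; replace with the real one
    print("driver:", bound_driver(iface))
    print("rdma devices:", rdma_devices())
```

If the base driver is loaded but the RDMA device list is empty, the RDMA driver step has probably not taken effect yet.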

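Among the RDMA protocols, RoCEv2 is the one that benefits most from PFC. The reason is simple: RoCEv2 runs over UDP, and UDP is an even less reliable protocol on top of an Ethernet that was never reliable to begin with. Without flow control, once packets are dropped, transfers across the whole network become essentially unusable.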

The vendor

The server hardware vendor supplied a NIC driver package, "nic-e810-kylinlinuxv10sp2.rpm". Oddly, they could not find a download link for it.

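Driver download

Besides GitHub, there is SourceForge, which was the hottest open-source project site 20 years ago.
SourceForge is still doing well today and, for reasons best left unsaid, is freely accessible from within China, which beats GitHub hands down.

In summary, from now on I will look for drivers for supporting hardware in this order:

  • The original manufacturer's website, e.g. Intel, NVIDIA
  • The website of the integrator or distributor, which can usually be found via the original manufacturer's site, e.g. Intel
  • Open-source sites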

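Charlie Munger Has Died: Learning Everything Was the Sword He Wielded

This morning I saw the news of Munger's death. Of the many official-account articles about it, there was one I read most carefully, and the passage below resonated with me in particular. About a week ago, in a hotel room in Zhengzhou, I had a similar thought.

At 83, Munger said at the University of Southern California commencement: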

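I want to explain further why people must have a multidisciplinary mindset in order to live efficiently and maturely. Here I would like to borrow an important idea from the greatest lawyer of antiquity, Marcus Tullius Cicero.

Cicero is famous for saying that a man who does not know what happened before he was born goes through life like an ignorant child.

That is a very correct idea, and Cicero was right to ridicule those foolish enough to know nothing of history. But if you generalize Cicero's remark, as I think you should, there are many things beyond history that people must know.

Those many things are the important ideas of all the disciplines. But if you merely memorize a piece of knowledge so that you can score well on an exam, that knowledge will not help you much.

You must master many kinds of knowledge and let them form a framework of thinking in your head, so that you use them automatically in the days that follow. If you can do that, I solemnly promise that one day you will realize, almost without noticing: "I have become one of the most effective people of my age."

By contrast, if you make no effort to practice this multidisciplinary approach, many of the brightest among you will achieve only middling success, or even live in the shadows.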

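Over the past 40 years I have, deliberately or not, stayed away from certain fields of knowledge, such as health care, chemistry, and car repair. Most people are like me, worn out in mind and body by work. Even someone who loves learning tends to put their energy into things related to their job, and those who do have interests of their own focus on only a very narrow area.

We are born human and, as individuals, connect with the universe. The quality of our understanding of the universe rests on the valuable knowledge of all people, from ancient times to now, in China and abroad. Lacking knowledge in any area affects how we perceive the universe.

This is not to say that learning should have no priorities. After all, choosing knowledge you can put into practice right away is the best way to learn.

My core view is this: do not set thresholds for your own learning in advance, and do not use interest or self-interest as an excuse to cover up laziness. All unknown knowledge is something you should have the courage to explore, master, and practice. Otherwise you are voluntarily giving up the right to command your own small universe and handing it over to bad people.

Do not let laziness and timidity rule you. Face the whole universe, leave behind a little afterglow, and finally vanish among the stars.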

Appendix:
Charlie Munger Has Died: I Pass My Sword to Whoever Can Wield It

More Advanced minikube Commands

A few advanced minikube commands

Now that we have familiarized ourselves with the default minikube start command, let’s dive deeper into Minikube to understand some of its more advanced features.

The minikube start command by default selects a driver isolation software, such as a hypervisor or a container runtime, if one (VirtualBox) or multiple are installed on the host workstation. In addition it downloads the latest Kubernetes version components. With the selected driver software it provisions a single VM named minikube (with hardware profile of CPUs=2, Memory=6GB, Disk=20GB) or container to host the default single-node all-in-one Kubernetes cluster. Once the node is provisioned, it bootstraps the Kubernetes control plane (with the default kubeadm tool), and it installs the latest version of the default container runtime, Docker, that will serve as a running environment for the containerized applications we will deploy to the Kubernetes cluster. The minikube start command generates a default minikube cluster with the specifications described above and it will store these specs so that we can restart the default cluster whenever desired. The object that stores the specifications of our cluster is called a profile.

As Minikube matures, so do its features and capabilities. With the introduction of profiles, Minikube allows users to create custom reusable clusters that can all be managed from a single command line client.

The minikube profile command allows us to view the status of all our clusters in a table formatted output. Assuming we have created only the default minikube cluster, we could list the properties that define the default profile with:

$ minikube profile list

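|----------|------------|---------|----------------|------|---------|---------|-------|--------|
| Profile  | VM Driver  | Runtime | IP             | Port | Version | Status  | Nodes | Active |
|----------|------------|---------|----------------|------|---------|---------|-------|--------|
| minikube | virtualbox | docker  | 192.168.59.100 | 8443 | v1.25.3 | Running | 1     | *      |
|----------|------------|---------|----------------|------|---------|---------|-------|--------|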

This table presents the columns associated with the default properties such as the profile name: minikube, the isolation driver: VirtualBox, the container runtime: Docker, the Kubernetes version: v1.25.3, the status of the cluster – running or stopped. The table also displays the number of nodes: 1 by default, the private IP address of the minikube cluster’s control plane VirtualBox VM, and the secure port that exposes the API Server to cluster control plane components, agents and clients: 8443.

What if we desire to create several reusable clusters instead, with other drivers (Docker or Podman (still experimental on Linux)) for node isolation, or different Kubernetes versions (v1.23.3 or v1.24.4), another runtime (cri-o or containerd), and possibly 2, 3, or more nodes (if permitted by the resources of our host system)? What if we desire to further customize the cluster with a specific networking option or plugin? The minikube start command allows us to create such custom profiles with the --profile or -p flags. Several of the isolation drivers support creation of node VMs or node containers of custom sizes as well, features that we will not explore in this course as not all are very stable at the time of this writing.

Below are a few examples of more complex start commands that allow custom clusters to be created with Minikube. They assume that the desired driver software (Docker and/or Podman) has been installed on the host workstation. There is no need to download the desired CNI (network plugin) or the container runtime, they will be set up and enabled by Minikube on our behalf:

$ minikube start --kubernetes-version=v1.23.3 \
  --driver=podman --profile minipod

$ minikube start --nodes=2 --kubernetes-version=v1.24.4 \
  --driver=docker --profile doubledocker

$ minikube start --driver=virtualbox --nodes=3 --disk-size=10g \
  --cpus=2 --memory=4g --kubernetes-version=v1.25.1 --cni=calico \
  --container-runtime=cri-o -p multivbox

$ minikube start --driver=docker --cpus=6 --memory=8g \
  --kubernetes-version="1.24.4" -p largedock

$ minikube start --driver=virtualbox -n 3 --container-runtime=containerd \
  --cni=calico -p minibox
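When several profiles have to be recreated regularly, it can help to keep their settings as data and generate the command lines from them. The sketch below is one way to do that, not part of Minikube itself: `ProfileSpec` and `start_command` are hypothetical helper names, and only flags that appear in the commands above are used.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProfileSpec:
    """Settings for one Minikube profile (hypothetical helper type)."""
    name: str
    driver: str
    nodes: int = 1
    kubernetes_version: Optional[str] = None
    container_runtime: Optional[str] = None
    cni: Optional[str] = None


def start_command(spec: ProfileSpec) -> List[str]:
    """Build the argv for `minikube start` from a profile spec."""
    argv = ["minikube", "start", f"--driver={spec.driver}", f"--nodes={spec.nodes}"]
    if spec.kubernetes_version:
        argv.append(f"--kubernetes-version={spec.kubernetes_version}")
    if spec.container_runtime:
        argv.append(f"--container-runtime={spec.container_runtime}")
    if spec.cni:
        argv.append(f"--cni={spec.cni}")
    argv += ["--profile", spec.name]
    return argv


specs = [
    ProfileSpec("minipod", "podman", kubernetes_version="v1.23.3"),
    ProfileSpec("minibox", "virtualbox", nodes=3,
                container_runtime="containerd", cni="calico"),
]
for spec in specs:
    # Printed rather than executed; pass the list to subprocess.run() to start a cluster.
    print(shlex.join(start_command(spec)))
```

Building an argument list instead of a shell string avoids quoting problems if a value ever contains spaces.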

Once multiple cluster profiles are available (the default minikube and custom minibox), the profiles table will look like this:

$ minikube profile list

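|----------|------------|---------|----------------|------|---------|---------|-------|--------|
| Profile  | VM Driver  | Runtime | IP             | Port | Version | Status  | Nodes | Active |
|----------|------------|---------|----------------|------|---------|---------|-------|--------|
| minibox  | virtualbox | crio    | 192.168.59.101 | 8443 | v1.25.3 | Running | 3     |        |
| minikube | virtualbox | docker  | 192.168.59.100 | 8443 | v1.25.3 | Running | 1     | *      |
|----------|------------|---------|----------------|------|---------|---------|-------|--------|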

The active marker indicates the target cluster profile of the minikube command line tool. The target cluster can be set to minibox with the following command:

$ minikube profile minibox

The target cluster can be set to the default minikube with one of the following commands:

$ minikube profile minikube

$ minikube profile default

A practical example

Add a new cluster

$ minikube start --driver=docker -p minibox

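The Lonely Warrior

Yesterday I came across a very interesting comment.

To use ChatGPT in the great Eastern power, you have to fight the joint sanctions of two superpowers.

While on the toilet this morning, I got to thinking: could this be a grand chess move by the state strategists?

After all, we have always followed one logic: the enemy of my enemy is my friend, and whatever the enemy suppresses is what we should encourage.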

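So what kind of grand strategy has led us to adopt the same policy as the Beacon Nation? We know that the Beacon people's most outstanding technological achievement touching ordinary people's lives is artificial intelligence. Musk's rockets are advanced too, but they are still somewhat remote from ordinary people, and their contribution to productivity across society on Earth is very limited.

Artificial intelligence is exactly the opposite. It can genuinely raise productivity in every industry, and that is the Beacon's core competitiveness as a superpower. Among indicators of national wealth, the stock market is the most representative; just look at the surge of NVIDIA and Microsoft. If even this advantage were taken by the great Eastern power, I think that would be far more serious than geopolitical competition. It would be pulling the firewood out from under the cauldron. Look at the sorry state of the Big Goose today and the conclusion is easy to reach: hegemony without economic support ends in total defeat.

Has the great Eastern power not seen this? Would it really launch an attack on the other side's last national lifeline? Of course not. This is keeping a low profile on the grandest scale. It is like two oligopolists competing in one market who intend to cooperate in other, more important areas, where the form of cooperation is that one side shows weakness and concedes in the AI industry. So what is the more important area?

At the APEC meeting in December we shall wait and see, and the mystery area will reveal itself.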

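k8s Affinity Scheduling

This Alibaba Cloud article explains how to use affinity (or anti-affinity) to schedule pods.
Here is an example that schedules a pod onto a node with a specific name.

For example, to schedule the Pod onto the node whose nodeName is k8s-worker-2: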
apiVersion: v1
kind: Pod
metadata:
  name: webapp
  namespace: demo
  labels:
    app: webapp
spec:
  nodeName: 'k8s-worker-2'
  containers:
    - name: webapp
      image: nginx
      ports:
        - containerPort: 80

Below is ChatGPT's detailed explanation of this manifest:

The YAML configuration you've provided is for a Kubernetes Pod named "webapp" in the "demo" namespace. Here's a breakdown of the configuration:

- `apiVersion: v1` and `kind: Pod`: These fields specify the Kubernetes API version and the resource type, which, in this case, is a Pod.

- `metadata`: This section contains metadata information for the Pod, such as its name, namespace, and labels.

  - `name: webapp`: The name of the Pod is set to "webapp."

  - `namespace: demo`: The Pod is created within the "demo" namespace.

  - `labels`: Labels are key-value pairs that help you organize and categorize Pods. In this case, the label "app" is set to "webapp."

- `spec`: This section defines the specifications of the Pod, including its containers, resources, and other settings.

  - `nodeName: 'k8s-worker-2'`: This field names the node (Kubernetes worker node) on which the Pod must run. Setting it assigns the Pod directly to "k8s-worker-2" and bypasses the scheduler, so strictly speaking it is not node affinity: affinity rules and `nodeSelector` match nodes by label and are still evaluated by the scheduler. Pinning by name can be useful for hardware requirements or node-specific configurations, but the Pod will not run if that node is missing or lacks resources.

  - `containers`: This is a list of containers running within the Pod. In this case, there is one container defined.

    - `name: webapp`: The container is named "webapp."

    - `image: nginx`: This field specifies the Docker image to use for this container. In this case, it uses the official Nginx image.

    - `ports`: This field defines the ports that are exposed by the container.

      - `containerPort: 80`: The container exposes port 80, which is the default port for HTTP traffic, allowing the Nginx web server within the container to listen for incoming HTTP requests.

This configuration will create a Pod running an Nginx web server on a specific node named "k8s-worker-2" in the "demo" namespace. The web server within the Pod will be accessible on port 80, and it has the label "app=webapp" for categorization and selection purposes.
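For comparison with `nodeName`, the same Pod can be expressed with a node affinity rule that the scheduler evaluates. The sketch below builds such a manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. It assumes the node carries the well-known label `kubernetes.io/hostname` with the value `k8s-worker-2`; the function name is hypothetical.

```python
import json


def pod_with_node_affinity(name: str, namespace: str, image: str, hostname: str) -> dict:
    """Return a Pod manifest that must be scheduled onto the node with this hostname label."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace, "labels": {"app": name}},
        "spec": {
            "affinity": {
                "nodeAffinity": {
                    # Hard requirement at scheduling time; running Pods are not evicted later.
                    "requiredDuringSchedulingIgnoredDuringExecution": {
                        "nodeSelectorTerms": [
                            {
                                "matchExpressions": [
                                    {
                                        "key": "kubernetes.io/hostname",
                                        "operator": "In",
                                        "values": [hostname],
                                    }
                                ]
                            }
                        ]
                    }
                }
            },
            "containers": [
                {"name": name, "image": image, "ports": [{"containerPort": 80}]}
            ],
        },
    }


manifest = pod_with_node_affinity("webapp", "demo", "nginx", "k8s-worker-2")
print(json.dumps(manifest, indent=2))
```

Because the rule matches labels, adding more hostnames to `values` lets the scheduler choose among several nodes, which `nodeName` cannot do.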

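Thoughts on Reading Hsu Cho-yun's 《现代文明的成坏》

In the final chapter of the book, Hsu Cho-yun includes a short poem of his own, written after reading Bei Dao's 《青灯》. Its last stanza reads: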

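In endless refraction,
you, I, and he,
tonight and the future,
need no telling apart,
all merge into the infinity of N dimensions.
Mount Sumeru is seen in a mustard seed,
an instant is eternity:
the human world is in me, and I am in the human world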

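He argues that after the decline of Christianity, reason, once the bond between man and God, was left with only a sliver of function: "utility". Confucianism has been "human-centered" since antiquity, but under the impact of Western civilization it has been left battered beyond recognition.

As technology grows ever more important in economic life, reviving the building of civilization has to rely on modern technology.

Technology reshapes relationships between people into one great net. The individual joins forces with this net to explore the universe and so connect with it. Through this process man and the divine are linked once more, forming a transcendent value: the unity of heaven and man.

Technology plus Confucianism: from now on, each brings out the best in the other.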

kubelet and kubectl

Kubelet and kubectl are two essential components of the Kubernetes ecosystem, each serving distinct but interconnected roles.

Kubelet:

  • Definition:

    • Kubelet is an agent that runs on each node in a Kubernetes cluster. Its primary responsibility is to ensure that the containers running on the node are in the desired state, as defined by the Kubernetes control plane.
  • Key Functions:

    • Pod Lifecycle Management: Kubelet is responsible for starting, stopping, and maintaining the lifecycle of pods on its node. It communicates with the container runtime (e.g., Docker, containerd) to create and manage containers within pods.
    • Health Checks: Kubelet regularly performs health checks on the containers it manages. If a container or pod is not healthy, Kubelet can take corrective actions, such as restarting the container.
    • Resource Management: Kubelet ensures that containers have access to the necessary resources (CPU, memory) as specified in pod resource requests and limits.
    • Image Management: Kubelet pulls container images as needed and ensures they are available for pod instantiation.
    • Node Status: Kubelet reports the node’s status and resource usage to the Kubernetes control plane.

kubectl:

  • Definition:

    • kubectl (Kube Control) is the command-line tool used to interact with a Kubernetes cluster. It serves as the primary interface for administrators, developers, and operators to manage and control the cluster.
  • Key Functions:

    • Cluster Management: kubectl allows you to create, update, and delete Kubernetes resources like pods, services, deployments, and config maps.
    • Resource Inspection: You can use kubectl to inspect the status and details of resources in the cluster, e.g., kubectl get pods, kubectl describe service.
    • Scaling: kubectl enables you to scale deployments or replica sets up or down, e.g., kubectl scale deployment my-app --replicas=3.
    • Resource Creation: You can create resources from YAML or JSON files using kubectl apply -f <file.yaml>.
    • Pod Execution: You can execute commands inside pods using kubectl exec, access logs with kubectl logs, and copy files in and out of pods with kubectl cp.
    • Interactivity: kubectl offers an interactive mode, where you can run a shell in a container using kubectl exec -it.

In summary, Kubelet is an agent running on each node that takes care of pod and container management on that node, while kubectl is the command-line tool used to manage and control the entire Kubernetes cluster from a centralized location. They work together to ensure that containerized workloads are orchestrated and maintained according to the desired configuration.
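The two sides meet in the Pod status: each kubelet reports the phase of the Pods on its node, and kubectl reads that back from the API server. As a rough sketch, the helper below counts Pods by `status.phase` from the output of `kubectl get pods -o json`. The function names are hypothetical, `fetch_pods_json` needs kubectl and a reachable cluster, and the sample document is hand-written for illustration rather than real cluster output.

```python
import json
import subprocess
from collections import Counter


def fetch_pods_json(namespace: str = "default") -> str:
    """Run `kubectl get pods -o json` and return its stdout."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def count_phases(pods_json: str) -> Counter:
    """Count Pods by status.phase in a `kubectl get pods -o json` document."""
    items = json.loads(pods_json).get("items", [])
    return Counter(item.get("status", {}).get("phase", "Unknown") for item in items)


# Hand-written sample in the shape of a Pod list; not real cluster output.
sample = json.dumps({
    "kind": "List",
    "items": [
        {"metadata": {"name": "webapp"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "job-1"}, "status": {"phase": "Succeeded"}},
        {"metadata": {"name": "api"}, "status": {"phase": "Running"}},
    ],
})
print(dict(count_phases(sample)))
```

Against a live cluster, the call would be `count_phases(fetch_pods_json("demo"))`.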

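Shi Kang's October 24 Official-Account Article

Shi Kang's official-account article of October 24, 《印度智慧》, ends with these lines:

In the long river of history, everyone (meaning India and other countries) walks their own road, takes in their own views, voices their own feelings, enjoys their own fortune, and brings on their own humiliation, knowing for themselves what is warm and what is cold, until death.
It is merely reaping what one sows, nothing to wonder at.

The question is how you will respond.
To the world, you are a speck of dust for an instant.
To yourself, you are one whole lifetime.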

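Flask Development Mode and Production Mode (WSGI)

The article 《Flask容器化部署》 describes a way to deploy Flask to production using WSGI.
Introduction to WSGI

I wanted to understand flask run; below is ChatGPT's reply: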

Flask is a micro web framework for Python that allows you to build web applications easily. When you use flask run to set up a web service, you are starting a development web server that comes bundled with Flask. This server is typically used for development and debugging purposes. It is not suitable for production use because it is not designed to be particularly efficient, stable, or secure, and it is not built to handle a high volume of requests.

On the other hand, when you use a WSGI (Web Server Gateway Interface) server to deploy your Flask application, you are using a production-ready server to serve your application. WSGI is a standardized interface that allows web applications written in Python, like Flask, to be deployed on various web servers. When you deploy your Flask application with a WSGI server, such as Gunicorn, uWSGI, or mod_wsgi, you get a more robust and scalable web server setup that can handle concurrent requests and is better suited for production use.

Here are some key differences and considerations:

  1. Development vs. Production: flask run is intended for development and debugging, while WSGI servers are used in production environments.

  2. Scalability: WSGI servers can handle multiple concurrent requests and are suitable for scaling your application as traffic increases. Flask’s development server is not optimized for high concurrency.

  3. Reliability: WSGI servers are more reliable and have features like process management, worker processes, and failover mechanisms, making them better suited for handling production workloads.

  4. Deployment: When deploying your Flask application in a production environment, it’s recommended to use a WSGI server in conjunction with a reverse proxy server (like Nginx or Apache) for better performance, security, and load balancing.

In summary, while flask run is convenient for quickly testing your Flask application during development, it’s not suitable for production use. When deploying your Flask application for production, you should use a WSGI server for better performance, concurrency, and reliability.
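To make the WSGI interface itself concrete, here is a minimal sketch using only the standard library. A WSGI application is a callable that takes `environ` and `start_response`; a Flask application object is one such callable, which is why a server like Gunicorn can host it. The names `application`, `call_once`, and `serve` are illustrative, and `wsgiref`'s reference server is, like `flask run`, meant for development only.

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A WSGI application: called once per request by whichever server hosts it."""
    body = f"Hello from {environ.get('PATH_INFO', '/')}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_once(app, path="/"):
    """Invoke a WSGI app directly, the way a server would, and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys a WSGI server must provide
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


def serve(host="127.0.0.1", port=8000):
    """Serve over HTTP with the stdlib reference server (development only)."""
    with make_server(host, port, application) as httpd:
        httpd.serve_forever()


if __name__ == "__main__":
    print(call_once(application, "/health"))
```

Swapping servers means changing only who calls `application`; the application code stays the same, which is the point of the standard.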