The magic number "42"

The choice of random_state=42 is somewhat of a cultural reference and an inside joke in the programming and data science community. Here’s the background and explanation:

Why 42?

Cultural Reference:
The number 42 is famously known as "the answer to the ultimate question of life, the universe, and everything" from Douglas Adams’ science fiction series "The Hitchhiker’s Guide to the Galaxy". This reference has made the number 42 a playful, arbitrary choice that many programmers and data scientists use when a random seed is needed.
Consistency:
Since many tutorials and examples use 42, it has become a de facto standard. Using the same random seed can make it easier for others to follow along with examples and verify results.
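
As a minimal sketch of why a fixed seed helps, the snippet below uses Python's standard-library random module rather than scikit-learn (an assumption, made only to keep the example dependency-free). shuffle_rows is a hypothetical helper, not a scikit-learn function. Two calls with the same seed produce the same shuffle, which is the property random_state=42 is meant to give you in libraries that accept it.

```python
import random

def shuffle_rows(rows, seed):
    """Return a shuffled copy of rows; the same seed gives the same order."""
    rng = random.Random(seed)   # private generator, global state untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled

rows = list(range(10))
first = shuffle_rows(rows, seed=42)
second = shuffle_rows(rows, seed=42)

print(first == second)          # True: same seed, same order
print(sorted(first) == rows)    # True: still a permutation of the input
```

The value 42 itself carries no special meaning here; any fixed integer would make the runs repeatable.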

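Data preprocessing in a decision-tree project

I am using a decision tree, which cannot work with non-numeric values, so some of the columns in the DataFrame have to be converted to numbers. I used the following code: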

from sklearn.preprocessing import LabelEncoder
from collections import defaultdict
d = defaultdict(LabelEncoder)

X_trans = X.apply(lambda x: d[x.name].fit_transform(x))
X_trans.head()

ChatGPT-4 explained it as follows:

The expression lambda x: d[x.name].fit_transform(x) is a compact way of applying a function to each column of a pandas DataFrame. Let’s dissect this expression further to understand how it works in the context of transforming categorical data into numerical format using LabelEncoder in a Python environment.

The lambda Function

A lambda function in Python is a small anonymous function defined with the keyword lambda. The syntax is:

lambda arguments: expression

In this case, the lambda function has one argument x (representing a column of the DataFrame) and the expression involves applying the fit_transform method from a LabelEncoder instance to x.
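
A small illustration of the syntax, independent of pandas: the names double, double_named, and words are made up for this sketch. It shows that a lambda is just a function written inline.

```python
# A lambda and the equivalent named function.
double = lambda x: x * 2

def double_named(x):
    return x * 2

print(double(21))         # 42
print(double_named(21))   # 42

# Lambdas are most useful inline, for example as a sort key.
words = ["pear", "fig", "banana"]
print(sorted(words, key=lambda w: len(w)))   # ['fig', 'pear', 'banana']
```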

How the lambda Function is Applied

  • x: Within the context of the apply() method, x represents each column in the DataFrame one at a time. For instance, if the DataFrame has columns "Color" and "Size", x will be the "Color" column data in one iteration and the "Size" column data in the next.

Usage of d[x.name]

  • d: This is a defaultdict containing instances of LabelEncoder. The use of defaultdict ensures that if a LabelEncoder for a particular column name does not exist, it will automatically create a new LabelEncoder instance without raising an error.

  • x.name: This attribute fetches the name of the column currently being processed (like "Color" or "Size").

  • d[x.name]: This retrieves the LabelEncoder instance for the column named x.name. If it’s the first time this column is being accessed, defaultdict initializes a new LabelEncoder for this key.

The fit_transform Method

  • fit_transform(): This method is part of LabelEncoder and serves a dual purpose:
    • fit: It learns the mapping from the labels in the column to integers. LabelEncoder assigns the integers in sorted label order, so for a column with labels ["Red", "Blue", "Green"] it learns the mapping Blue -> 0, Green -> 1, Red -> 2.
    • transform: It then transforms the labels into these integers based on the mapping it has learned.

Putting It All Together

When you call X.apply(lambda x: d[x.name].fit_transform(x)), it performs the following steps for each column in DataFrame X:

  • Accesses each column as x.
  • Fetches the LabelEncoder for that column by x.name.
  • Applies fit_transform to the column, converting text labels into integers.
  • Returns a new DataFrame where each text label has been replaced by an integer.

This method is particularly useful when preparing data for machine learning models, which generally require numeric input features rather than text labels. The transformation is applied column by column and returns a new DataFrame, leaving X itself unchanged. Because the defaultdict d keeps every fitted LabelEncoder, the integer codes can later be mapped back to the original labels with inverse_transform.
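
The one-encoder-per-column pattern can be sketched without pandas or scikit-learn. In the snippet below, TinyLabelEncoder is a hypothetical stand-in that only mimics the sorted-label behaviour of LabelEncoder, and a plain dict of lists stands in for the DataFrame. Both are assumptions made so the example runs with the standard library alone.

```python
from collections import defaultdict

class TinyLabelEncoder:
    """Hypothetical stand-in for LabelEncoder: sorted labels -> 0..n-1."""

    def fit_transform(self, values):
        self.classes_ = sorted(set(values))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return [self._index[v] for v in values]

    def inverse_transform(self, codes):
        return [self.classes_[c] for c in codes]

# One encoder per column, created lazily on first access.
encoders = defaultdict(TinyLabelEncoder)

# Hypothetical data: column name -> values.
table = {
    "Color": ["Red", "Blue", "Green", "Blue"],
    "Size": ["S", "M", "S", "L"],
}

encoded = {name: encoders[name].fit_transform(col) for name, col in table.items()}

print(encoded["Color"])   # [2, 0, 1, 0]
print(encoded["Size"])    # [2, 1, 2, 0]
print(encoders["Color"].inverse_transform([2, 0]))   # ['Red', 'Blue']
```

Keeping the encoders in a dictionary keyed by column name is what makes the reverse mapping possible later; a single shared encoder would mix the label sets of different columns.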

simple llm app

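I have been testing the SenseNova API recently. Uploading from code every time is too much hassle, so I plan to build a small app.

The first step is system design.

I used the Software Architecture Visualiser tool in ChatGPT.

First attempt: the initial generated diagram.

Second attempt: I adjusted the input and, after working through a few syntax errors, got the final diagram.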

Notes from Ruan Yifeng's Tech Enthusiast Weekly, issue 280

IP address lookup tool

https://ip.guide/

curl -sL ip.guide/178.173.224.69

Python full-stack development: a chat room

Only 115 lines of code, built with the Flask framework
https://github.com/yuxiaoy1/chatfairy

Microsoft's generative AI course

https://github.com/Microsoft/generative-ai-for-beginners

A tutorial series on machine learning and Docker

An English-language personal site
https://ataiva.com/archives/

Turning natural-language requirements into software with an LLM!

https://github.com/kuafuai/DevOpsGPT
Multi agent system for AI-driven software development. Combine LLM with DevOps tools to convert natural language requirements into working software. Supports any development language and extends the existing code.

An ultra-lightweight personal blog framework

https://github.com/Meekdai/Gmeek
Gmeek
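A blog framework: an ultra-lightweight personal blog template built entirely on GitHub Pages, GitHub Issues, and GitHub Actions. No local deployment is needed. Going from setup to writing takes only 18 seconds: two steps to set up the blog, and the third step is writing.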

Most AI startups are doomed

https://weightythoughts.com/p/most-ai-startups-are-doomed

A comparison of six game development platforms

https://ruoyusun.com/2023/10/12/one-game-six-platforms.html

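Restarting Flask, part 1

After a break of two or three months, the most common problem when getting back to programming is setting up the environment:

  • On a different computer, the operating system version and the Python dependencies are all different. My iMac, for example, will probably stay on macOS Monterey forever because of hardware limits.

  • Even an old computer that used to work smoothly can run into environment problems after a major upgrade (such as macOS 13.0, released in October 2022), or when a restricted office network prevents patches from downloading.

The import _tkinter problem

The first problem I hit was with tkinter, as shown below: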

import _tkinter # If this fails your Python may not be configured for Tk

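Searching Stack Overflow, most answers say that the tkinter library is not installed properly, and that the fix is simple: install it with brew.

For me it was not that simple, because running that command produced a new error: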

fatal: not in a git directory
Error: Command failed with exit 128: git

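According to the analysis linked in the reference section below, this happens because the Homebrew components homebrew-core and homebrew-cask are not recognized as Git repositories.

The fix is also simple: just follow brew's own hints. Run: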

brew -v

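Then simply follow the instructions it prints.

It still failed at first. When I copied the suggested command, I did not notice that the last line was not joined to the previous one with a "/", which produced a "no such file or directory" message. Copying the command correctly fixes it.

After that, the installation runs normally: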

brew install python-tk

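Was that the end of it? Of course not. Running flask run then failed with an error saying that the user table does not exist, so this was clearly a database problem.

The database problem

Because the app accesses its database through SQLAlchemy, we first need to create the database file.

Once that is done, a file named data.db appears in the project directory. The file name comes from the app.config setting in app.py: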

app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////' + os.path.join(app.root_path, 'data.db') 
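
The "no such table" symptom can be reproduced with the standard-library sqlite3 module alone. This is only a sketch: the temporary directory stands in for app.root_path, and the hand-written CREATE TABLE with made-up columns stands in for whatever the project uses to create its tables (in a Flask-SQLAlchemy project that is typically db.create_all(), which is an assumption here because the original notes do not show that step).

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as root_path:   # stand-in for app.root_path
    db_path = os.path.join(root_path, "data.db")
    conn = sqlite3.connect(db_path)                # open the database at db_path

    # Querying before any table exists fails, like the error seen with flask run.
    try:
        conn.execute("SELECT * FROM user")
        error_before = None
    except sqlite3.OperationalError as exc:
        error_before = str(exc)

    # Create the table (hypothetical columns), then the same kind of query works.
    conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO user (name) VALUES (?)", ("demo",))
    conn.commit()
    rows_after = conn.execute("SELECT name FROM user").fetchall()
    conn.close()

print(error_before)   # no such table: user
print(rows_after)     # [('demo',)]
```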

reference

Command failed with exit 128:git

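Restarting my Flask studies

After a long break I am restarting my Flask studies. This time I intend to hold myself strictly to the following practices:

  • Use git for version control
  • Use venv virtual environments
  • Use the nano or vi editor
  • Use VS Code
  • Keep studying every day

Delete the venv environment, because I got the directory name wrong

Upgrade pip and pip3

Clone the repository from GitHub

Detach from one remote repository, then link to a different one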

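Fixing a git push that would not go through

Background

On May 16, the day before yesterday, I started working through Li Hui's Flask beginner tutorial (Flask入门教程).

The reason is simple. I had spent a week on Miguel Grinberg's The Flask Mega-Tutorial, and nearly three days of that went into Chapter 4, "Database". I felt I had been moving too fast, taking strides that were too long, and it finally caught up with me.

According to an expert on Twitter, enjoying an ocean of programming job opportunities means getting through nine levels of hardship: Flask is level 7 and Database is level 8.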


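After being stuck on level 8 for a long time, I switched to Li Hui's tutorial, which happens to complement Miguel Grinberg's material on several points. For example, when the Mega-Tutorial covers git it only introduces git clone, so that readers do not have to type the code line by line. The Flask beginner tutorial is more practical about git: it starts from creating a git repo and includes how to manage your own GitHub page.

After finishing the watchlist project with git, I followed the same recipe and ran similar commands on the microblog project, and that is where the trouble started: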

22-05-18/Users/xxxxx/study/test~%>git branch -a
*flask
master

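I seemed to be trapped in the Mega-Tutorial's loop. My guess was that after git clone the default branch was flask rather than the master branch I had created myself, and that flask, being the main branch, could not be deleted (git refuses to delete the branch that is currently checked out).

The most direct approach was to create a new test directory and copy all the files from flasktest into it, overwriting whatever was already there: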

cp -rf flasktest/. test

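However, apart from the main branch becoming master, nothing changed.

Symptoms and analysis

Symptoms

From the start I assumed it was a permissions problem, because of the words "access rights".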

22-05-18/Users/XXXXX/study/flasktest~%>git pull
ERROR: Repository not found.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

So I put my effort into fixing authorization, for example by adding the private key again:

ssh-add ~/.ssh/id_rsa
22-05-18/Users/XXXXX/study/flasktest~%>ssh-add -l
3072 SHA256:n9EHVCpvMuEfdkajdfkajfdkjakdfjakjdfak//Q5pbok XXXXXXXX@dfjakjdfjdfajdfa (RSA)

Analysis

Because "no work tree" also appeared in the warnings, similar analyses online suggested that this was a bare repository:
The direct reason for the error is that yes, it’s impossible to use git-add with a bare repository. A bare repository, by definition, has no work tree. git-add takes files from the work tree and adds them to the index, in preparation for committing.
I tried unsetting bare:

git config --unset core.bare

Others suggested commenting out everything related to origin in the git config file:

22-05-18/Users/xxxxxxxxx/study/test/.git~%>vi config
[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
        ignorecase = true
        precomposeunicode = true
#[remote "origin"]
#       url = git@github.com:xxxxxxxx/flasktest.git
#       fetch = +refs/heads/*:refs/remotes/origin/*
#[remote "origin"]
#       url = git@github.com:xxxxxxxx/flasktest.git
#       fetch = +refs/heads/*:refs/remotes/origin/*
[remote "origin"]
        url = git@github.com:xxxxxxxxx/flasktest.git
        fetch = +refs/heads/*:refs/remotes/origin/*

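Solution

After two days of struggling, all I could do was stare at the warning on the screen. Then I noticed the words "and the repository exists", which made me wonder: had this repo ever actually been created?

I went straight to GitHub and created the repo manually.

After that, everything fell into place: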

22-05-18/Users/xxxxxx/study/flasktest~%>git branch -a
* master
  remotes/origin/master

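Reflection

  1. I took it for granted that the repo gets created from the git command line, so I kept assuming that flasktest.git already existed;
  2. I had not taken git seriously, and had not studied the git training material properly.

Git training material

Git

Basic operations

Set the user name and email address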

(venv) 22-05-17/Users/XXXXX/study/watchlist~%>git --version
git version 2.30.1 (Apple Git-130)
(venv) 22-05-17/Users/xxxxx/study/watchlist~%>git config --global user.name "xxxxxxx"
(venv) 22-05-17/Users/xxxxx/study/watchlist~%>git config --global user.email "xxxxxx@outlook.com"

List local branches

test~%>git branch -vv
* flask     71a338f this is a test of microblog project
  flasktest 71a338f this is a test of microblog project
  master    71a338f this is a test of microblog project

List remote branches

watchlist~%>git branch -r
  origin/master

Delete a local branch

microblog/test~%>git branch -d flasktest
Deleted branch flasktest (was 71a338f).

Show the configuration list

test~%>git config --list
remote.origin.url=git@github.com:XXXXXX/flasktest.git
remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
branch.master.remote=origin

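push and pull

The push command

The push command uploads the commits on a local branch to a branch in the remote repository. When the local and remote branch names are the same, only one branch name needs to be given. The origin in the push command can look puzzling at first; it is simply the name given to the remote repository.

The pull command

Reference for the push and pull commands

Adding or removing the link between a local and a remote repository

To link a local repository to a remote one, you have to give the remote a name. By convention, origin is the default name: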

xxx/flasky/watchlist~%>git remote add origin git@github.com:xxx/watchlist.git

To remove the link between the local and the remote repository. Note that this only deletes the mapping between the two; no actual content is deleted.

xxx/flasky/watchlist/watchlist~%>git remote rm origin
xxx/flasky/watchlist/watchlist~%>git remote -v

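If a remote is linked, git remote -v lists its name and its URL for fetch and push.

Because I removed the link with git remote rm origin, git remote -v now prints nothing, in keeping with the Unix idea that "no news is good news" 😊

fatal: refusing to merge unrelated histories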

xxxx/flasky/watchlist~%>git pull origin master
From https://github.com/xxxx/watchlist
 * branch            master     -> FETCH_HEAD
fatal: refusing to merge unrelated histories

The “fatal: refusing to merge unrelated histories” Git error occurs when you try to merge two unrelated projects: the two projects are unaware of each other’s existence and have mismatched commit histories.

For a little background: the .git directory, which is usually hidden, contains all the changes or “commits” of the repo that are being tracked. Rewriting the repository history is possible, but it is not the typical use case. Git is used for version control, which means tracking the history of files.
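
To make "unrelated" concrete, here is a toy model in Python. It is not how Git is implemented; the commit ids and the parents mapping are invented for illustration. The idea it captures is that two histories are related only when they share at least one ancestor commit.

```python
def ancestors(commit, parents):
    """Return the commit plus everything reachable through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def related(tip_a, tip_b, parents):
    """Two histories are related when they share at least one commit."""
    return bool(ancestors(tip_a, parents) & ancestors(tip_b, parents))

# Hypothetical commit graph: commit id -> list of parent ids.
parents = {
    "local2": ["local1"], "local1": [],        # history started with git init
    "remote2": ["remote1"], "remote1": [],     # history started on the server
    "feature": ["remote1"],                    # branch cut from remote1
}

print(related("local2", "remote2", parents))    # False: no common ancestor
print(related("feature", "remote2", parents))   # True: both contain remote1
```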

Use Cases that lead to git fatal: refusing to merge unrelated histories

You have cloned a repository and, for some reason, the .git folder is corrupted or deleted. Since git is then unaware of your local history, any action such as git pull or git push against the remote repository can throw this error, as there is no tracking information for the current branch.
You have created a new repository, made a few commits to it, and now try to pull from a remote repository that already has its own commits. Git throws the error here because it does not know how the two projects and their commits are related.

The solution to Refusing to merge unrelated histories

Git has refused such merges by default since version 2.9.0. To resolve the issue, pass the --allow-unrelated-histories flag when pulling from the remote repository.

Reference for fixing the unrelated histories problem

Summary of the whole workflow

  • 1 Set the user name and email address (and register the SSH key fingerprint in advance)
  • 2 mkdir watchlist
  • 3 Initialize the local repository
xxxx/flasky/watchlist~%>git init
Initialized empty Git repository in /Users/xxxx/Study/flasky/watchlist/.git/
  • 4 Link the local repository to the remote
flasky/watchlist~%>git remote add origin git@github.com:madapapa/watchlist.git
  • 5 Check the remote
xxxx/flasky/watchlist~%>git remote -v
origin	git@github.com:madapapa/watchlist.git (fetch)
origin	git@github.com:madapapa/watchlist.git (push)
  • 6 Sync the remote content, allowing unrelated histories
xxxx/flasky/watchlist~%>git pull origin master --allow-unrelated-histories
remote: Enumerating objects: 49, done.
remote: Counting objects: 100% (49/49), done.
remote: Compressing objects: 100% (33/33), done.
remote: Total 49 (delta 16), reused 38 (delta 10), pack-reused 0
Unpacking objects: 100% (49/49), 438.70 KiB | 477.00 KiB/s, done.
From github.com:madapapa/watchlist
 * branch            master     -> FETCH_HEAD
 * [new branch]      master     -> origin/master
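  • 7 I did not create a .gitignore file

  • 8 After editing files locally, commit the changes

[Note] The first time you use push here, it is generally recommended to add the -u flag. Git then not only pushes the local master branch to the newly created remote master branch, but also sets the remote master as the upstream of the local master, so later pushes and pulls can use the short form (no -u needed, just git push).

Common git commands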

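What is __name__ in Python actually for?

Python has no required main() function like C and some other languages. When we hand a Python program to an interpreter, such as CPython or the interactive IPython shell (released under a BSD license), the interpreter simply runs the file from the top.

For example, when doing web programming with the Flask framework:

export FLASK_APP=test1.py

Normally, the interpreter starts at the first unindented statement and executes the code in order.
Before executing, however, the interpreter defines a few special variables, and __name__ is one of them.

If the file is run directly, the interpreter sets __name__ to "__main__". If the file is imported, __name__ is set to the name of the imported module.

By checking the __name__ variable, we can tell whether the file is being run directly or imported.

Here is an example.
A file named test1.py: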

print("test1 __name__ is %s"   %__name__)
if __name__ == "__main__":
    print("test1 is being run directly")
else:
    print("test1 is being imported")

The output is:

(venv) *****microblog/test/test1.py
test1 __name__ is __main__
test1 is being run directly

The file test2.py:

import test1

print("test2 __name__ is %s " %__name__)

The output is:

(venv) *****/microblog/test/test2.py
test1 __name__ is test1
test1 is being imported
test2 __name__ is __main__
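
The same behaviour can be checked from a single script with the standard-library runpy module, which runs a file under a chosen __name__. This is a sketch: the file name demo_mod.py and the role variable are invented for the example.

```python
import os
import runpy
import tempfile

# A one-line module that records how it was executed.
SOURCE = 'role = "script" if __name__ == "__main__" else "module"\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_mod.py")
    with open(path, "w") as f:
        f.write(SOURCE)

    # Run the file as if it were started directly.
    as_script = runpy.run_path(path, run_name="__main__")
    # Run the same file under a module-style name, as an import would set it.
    as_module = runpy.run_path(path, run_name="demo_mod")

print(as_script["role"])   # script
print(as_module["role"])   # module
```

This is why the usual guard, if __name__ == "__main__":, lets a file serve both as a runnable script and as an importable module.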